Ever since I started my “A look under ARC’s hood” series of blog posts I have been intrigued by objc_retainAutoreleasedReturnValue. It’s been covered by Mike Ash on his blog from a conceptual point of view, but I haven’t found a decent explanation of exactly how it works. So I took a look and here’s what I found.
What is it meant to do?
The concept behind objc_retainAutoreleasedReturnValue is that if a value is to be returned from a function autoreleased, but the very next thing the caller does is retain that object, then the autorelease and the retain are pointless; we’re just wasting cycles. So if we can somehow determine that the caller is about to retain, we can skip both. Over the course of a running application this could add up to quite a lot of time saved.
In Apple’s code they say this:
objc_autoreleaseReturnValue() examines the caller’s instructions following the return. If the caller’s instructions immediately call objc_autoreleaseReturnValue, then the callee omits the -autorelease and saves the result in thread-local storage. If the caller does not look like it cooperates, then the callee calls -autorelease as usual.
objc_autoreleaseReturnValue checks if the returned value is the same as the one in thread-local storage. If it is, the value is used directly. If not, the value is assumed to be truly autoreleased and is retained again. In either case, the caller now has a retained reference to the value.
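I think there is a typo there, in that the first paragraph should read:
“If the caller’s instructions immediately call objc_retainAutoreleasedReturnValue”
The second paragraph likewise describes what objc_retainAutoreleasedReturnValue does on the caller’s side, not objc_autoreleaseReturnValue.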
So basically what it means is that if you consider this bit of code:
Just ignore the fact that you wouldn’t really do that, but if we rewrite that to include the retains, releases & autoreleases that will be going on behind the scenes then it looks like this:
Now then if we inline the createMeAnObject code into init:
Here we notice that there is a [temp autorelease] followed immediately by a [temp retain]. It is this optimisation that the new Objective-C runtime can help us with.
How does it work – objc_autoreleaseReturnValue?
The code is out there for the x86 version of this, but there’s no ARM code so I had to go digging into the disassembly for it.
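Here’s the disassembly for objc_autoreleaseReturnValue:
00006ec4 b580 push {r7, lr}
00006ec6 466f mov r7, sp
00006ec8 f01e0f01 tst.w lr, #1 @ 0x1
00006ecc d004 beq.n 0x6ed8
00006ece f83e1c01 ldrh.w r1, [lr, #-1]
00006ed2 f244623f movw r2, 0x463f
00006ed6 e005 b.n 0x6ee4
00006ed8 f8de1000 ldr.w r1, [lr]
00006edc f2470207 movw r2, 0x7007
00006ee0 f2ce12a0 movt r2, 0xe1a0
00006ee4 4291 cmp r1, r2
00006ee6 d106 bne.n 0x6ef6
00006ee8 ee1d1f70 mrc 15, 0, r1, cr13, cr0, {3}
00006eec f0210103 bic.w r1, r1, #3 @ 0x3
00006ef0 f8c100c4 str.w r0, [r1, #196]
00006ef4 bd80 pop {r7, pc}
00006ef6 f00bfb93 bl _objc_autorelease
00006efa bd80 pop {r7, pc}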
Let’s break that down then…
00006ec4 b580 push {r7, lr}
00006ec6 466f mov r7, sp
This is a standard prologue for a method in ARM.
00006ec8 f01e0f01 tst.w lr, #1 @ 0x1
00006ecc d004 beq.n 0x6ed8
Here we are doing the first bit of our sniffing of the following instructions. lr is the “link register” and contains the address that we’re returning to. Since this function is always called as a tail call, this will be the return address in the caller of the method that’s returning an autoreleased value.
The tst instruction does a bitwise AND of the value in lr and the integer value 1. Then the beq will branch if the zero flag is set, i.e. if (lr & 1) == 0. So we are testing whether the lowest bit is clear. You can either read up about ARM processors or take it from me that if the low bit of the link register is set then the caller is in Thumb mode. So if we’re going back to ARM code we branch over a few instructions to 0x6ed8, whereas if we’re going back to Thumb code we don’t branch.
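The low-bit test can be sketched in C. This is only a minimal model and not the runtime’s code: the return address is treated as a plain 32-bit integer, and returns_to_thumb and next_instruction_address are hypothetical helper names of my own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when the return address has the Thumb bit set. */
static bool returns_to_thumb(uint32_t lr) {
    return (lr & 1u) != 0;
}

/* Hypothetical helper: address of the instruction that runs after the return.
   A Thumb return address carries the mode flag in bit 0, so it is stripped. */
static uint32_t next_instruction_address(uint32_t lr) {
    return returns_to_thumb(lr) ? lr - 1u : lr;
}
```

An even lr means ARM and is used as is; an odd lr means Thumb and points one byte past the real instruction.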
00006ece f83e1c01 ldrh.w r1, [lr, #-1]
00006ed2 f244623f movw r2, 0x463f
00006ed6 e005 b.n 0x6ee4
This is the case that gets run if the branch was not taken, i.e. we are returning to Thumb code. We load a half word (16 bits) from lr - 1 into r1 (we need the -1 because, as explained above, the low bit is set in Thumb mode, so the next instruction after the return is actually at lr - 1). We then put 0x463f into r2 and jump to 0x6ee4.
00006ed8 f8de1000 ldr.w r1, [lr]
00006edc f2470207 movw r2, 0x7007
00006ee0 f2ce12a0 movt r2, 0xe1a0
This is the case that gets run if the branch was taken, i.e. we are returning to ARM code. Here we load a whole 32 bits from lr into r1 and load 0xe1a07007 into r2 (movw sets the low half, movt the high half).
00006ee4 4291 cmp r1, r2
00006ee6 d106 bne.n 0x6ef6
The next section compares the two registers that we’ve just set in one of two ways. If they are not equal then we branch over to 0x6ef6. So we’re matching against either 0x463f (if it’s Thumb mode) or 0xe1a07007 (if it’s ARM mode). Why do we care that the instructions we’re about to run when we return have those particular binary values? Well, if we compile a method that does the objc_autoreleaseReturnValue and objc_retainAutoreleasedReturnValue dance then we see that the compiler adds in an instruction which acts as a marker. Let’s see what it looks like:
Well take a look at that. It’s added in a mov r7, r7 in each case, which is a no-op (i.e. it does nothing, as it moves r7 back into itself). If you examine the binary values for these instructions then you’ll see they match the values that we compare against. The compiler has added this as a marker to tell objc_autoreleaseReturnValue that the caller is about to call objc_retainAutoreleasedReturnValue.
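To check that those two constants really are mov r7, r7, here is a small C sketch that builds the encodings and performs the same comparison. The bit layouts are the standard Thumb and ARM register-move encodings; thumb_mov, arm_mov and caller_has_marker are hypothetical names, and a byte array stands in for the caller’s code, so this is a model of the check and not the runtime’s implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Thumb 16-bit "mov Rd, Rm": 0100 0110 D Rm(4 bits) Rd(low 3 bits). */
static uint16_t thumb_mov(unsigned rd, unsigned rm) {
    return (uint16_t)(0x4600u | ((rd & 8u) << 4) | ((rm & 0xFu) << 3) | (rd & 7u));
}

/* ARM 32-bit "mov Rd, Rm": condition AL, no shift applied to Rm. */
static uint32_t arm_mov(unsigned rd, unsigned rm) {
    return UINT32_C(0xE1A00000) | ((uint32_t)(rd & 0xFu) << 12) | (rm & 0xFu);
}

/* Model of the sniffing. `code` holds little-endian instruction bytes and
   `lr` is an offset into it whose low bit is the Thumb flag. */
static bool caller_has_marker(const uint8_t *code, uint32_t lr) {
    if (lr & 1u) {
        const uint8_t *p = code + (lr - 1u);      /* Thumb: half word at lr - 1 */
        uint16_t insn = (uint16_t)(p[0] | (p[1] << 8));
        return insn == thumb_mov(7, 7);
    }
    const uint8_t *p = code + lr;                 /* ARM: full word at lr */
    uint32_t insn = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return insn == arm_mov(7, 7);
}
```

Calling thumb_mov(7, 7) gives 0x463f and arm_mov(7, 7) gives 0xe1a07007, the two values the disassembly loads into r2.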
00006ee8 ee1d1f70 mrc 15, 0, r1, cr13, cr0, {3}
00006eec f0210103 bic.w r1, r1, #3 @ 0x3
00006ef0 f8c100c4 str.w r0, [r1, #196]
00006ef4 bd80 pop {r7, pc}
This is the code that gets run if the instructions matched. It appears to be getting a value from a coprocessor (the mrc instruction), then clearing its two low bits and storing r0 (which will be the value that’s to be returned) into the memory location computed from it. Then it returns. I’m not entirely sure what this coprocessor magic is doing but it will probably become apparent when we look at the code for objc_retainAutoreleasedReturnValue. But essentially it’s stashing the returned pointer somewhere that we’ll read later.
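For what it’s worth, the address arithmetic of those instructions can be modelled in C. As far as I can tell, on ARMv7 this mrc encoding (coprocessor 15, c13, c0, opcode 3) reads the user read-only thread ID register, which would fit Apple’s comment about thread-local storage. Treating the result as the base of a per-thread array of 4-byte pointers is an assumption on my part, and tls_slot_address and tls_slot_index are hypothetical names.

```c
#include <stdint.h>

enum { SLOT_OFFSET = 196 };  /* the #196 immediate from the disassembly */

/* bic.w r1, r1, #3 clears the two low bits; str.w/ldr.w then add #196. */
static uint32_t tls_slot_address(uint32_t thread_register) {
    uint32_t base = thread_register & ~UINT32_C(3);
    return base + SLOT_OFFSET;
}

/* Assumption: if the base is an array of 4-byte pointers, offset 196 is
   entry 49 of that array. */
static unsigned tls_slot_index(void) {
    return (unsigned)(SLOT_OFFSET / sizeof(uint32_t));
}
```

Both functions compute the same slot, which is what lets one of them leave a note for the other.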
00006ef6 f00bfb93 bl _objc_autorelease
00006efa bd80 pop {r7, pc}
Finally, this is where we get to if the instructions did not match. This performs a normal call to objc_autorelease in case the caller is not about to retain the object.
How does it work – objc_retainAutoreleasedReturnValue?
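Let’s now take a look at objc_retainAutoreleasedReturnValue:
00012bbc b580 push {r7, lr}
00012bbe ee1d1f70 mrc 15, 0, r1, cr13, cr0, {3}
00012bc2 466f mov r7, sp
00012bc4 f0210103 bic.w r1, r1, #3 @ 0x3
00012bc8 f8d110c4 ldr.w r1, [r1, #196]
00012bcc 4281 cmp r1, r0
00012bce d107 bne.n 0x12be0
00012bd0 ee1d1f70 mrc 15, 0, r1, cr13, cr0, {3}
00012bd4 2200 movs r2, #0
00012bd6 f0210103 bic.w r1, r1, #3 @ 0x3
00012bda f8c120c4 str.w r2, [r1, #196]
00012bde bd80 pop {r7, pc}
00012be0 f7f3f976 bl _objc_retain
00012be4 bd80 pop {r7, pc}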
And again, breaking that down we get:
00012bbc b580 push {r7, lr}
Standard prologue for a method.
00012bbe ee1d1f70 mrc 15, 0, r1, cr13, cr0, {3}
00012bc2 466f mov r7, sp
00012bc4 f0210103 bic.w r1, r1, #3 @ 0x3
00012bc8 f8d110c4 ldr.w r1, [r1, #196]
Here we get some more context on what that mrc was all about before. We’re running the same mrc instruction as before, doing the same bic, and then loading the value stored at the computed address into r1.
00012bcc 4281 cmp r1, r0
00012bce d107 bne.n 0x12be0
Now this is the interesting bit. We’re checking that the value we obtained from doing the dance with the coprocessor (r1) is the same as the object passed into this method (r0). If these two match then we know that the object we are trying to retain has just been returned from a method that had called objc_autoreleaseReturnValue. So we don’t need to do anything. It’s not been autoreleased so we’re not going to retain it.
00012bd0 ee1d1f70 mrc 15, 0, r1, cr13, cr0, {3}
00012bd4 2200 movs r2, #0
00012bd6 f0210103 bic.w r1, r1, #3 @ 0x3
00012bda f8c120c4 str.w r2, [r1, #196]
00012bde bd80 pop {r7, pc}
This is the code that then gets run if the comparison was true, i.e. this object had just gone through an objc_autoreleaseReturnValue. We clear out the stored value by writing zero to the same location and return.
00012be0 f7f3f976 bl _objc_retain
00012be4 bd80 pop {r7, pc}
If it didn’t match, then we know that this object has not gone through an objc_autoreleaseReturnValue, which is likely because the method we called was not compiled with ARC enabled. So we do a retain.
Phew, so explain that again please?
It’s probably easiest to consider the following pseudo code:
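Here is a minimal C sketch of the whole handshake. It is a toy model and not the runtime’s code: Object only tracks the two counts the handshake affects, the caller_cooperates flag stands in for the result of sniffing the caller’s instructions, and every name in it is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy object: only the counts that the handshake affects. */
typedef struct {
    int retain_count;
    int pending_autoreleases;  /* releases owed when the pool drains */
} Object;

/* Stand-in for the per-thread slot the runtime stores the object in. */
static _Thread_local Object *tls_reclaimed_object = NULL;

/* Callee side: skip the autorelease if the caller is known to retain. */
static Object *autorelease_return_value(Object *obj, bool caller_cooperates) {
    if (caller_cooperates) {
        tls_reclaimed_object = obj;   /* leave a note instead of autoreleasing */
        return obj;
    }
    obj->pending_autoreleases++;      /* ordinary autorelease */
    return obj;
}

/* Caller side: skip the retain if the callee left a note for this object. */
static Object *retain_autoreleased_return_value(Object *obj) {
    if (obj == tls_reclaimed_object) {
        tls_reclaimed_object = NULL;  /* consume the note; ownership transfers */
        return obj;
    }
    obj->retain_count++;              /* ordinary retain */
    return obj;
}
```

When both sides cooperate neither count changes; when the callee cannot see the marker, the object gets one autorelease and one retain, just as before ARC.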
That is basically what it all boils down to. With some tail call optimisations this is far cheaper than all the redundant autorelease followed by retain pairs that we must have had in code before ARC was invented.
Conclusions
This is yet again some awesome stuff from the Apple engineers. Sniffing the next instructions to be executed is a very clever way to decide when the autorelease and retain can safely be skipped. They must have to ensure that the marker (mov r7, r7) isn’t moved by the optimiser, for instance, and I’m sure there are lots of edge cases, but it seems to work a treat!