How does objc_retainAutoreleasedReturnValue work?

Ever since I started doing my “A look under ARC’s hood” series of blog posts I have been intrigued by objc_retainAutoreleasedReturnValue. It’s been covered by Mike Ash on his blog from a conceptual point of view but I haven’t found a decent explanation of exactly how it works. So I took a look and here’s what I found.

What is it meant to do?

The concept behind objc_retainAutoreleasedReturnValue is that if a value is to be returned from a function autoreleased, but the very next thing the caller does is retain that object, then doing the autorelease and the retain is absolutely pointless: we’re just wasting cycles. So if we can somehow determine that the caller is about to retain, we can skip both. Over the course of a running application this could add up to quite a lot of time and effort saved.

In Apple’s code they say this:

objc_autoreleaseReturnValue() examines the caller’s instructions following the return. If the caller’s instructions immediately call objc_autoreleaseReturnValue, then the callee omits the -autorelease and saves the result in thread-local storage. If the caller does not look like it cooperates, then the callee calls -autorelease as usual.

objc_autoreleaseReturnValue checks if the returned value is the same as the one in thread-local storage. If it is, the value is used directly. If not, the value is assumed to be truly autoreleased and is retained again. In either case, the caller now has a retained reference to the value.

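I think there is a typo there (two, in fact: the function named at the start of the second paragraph is the one that checks thread-local storage and retains, so it should also be objc_retainAutoreleasedReturnValue). The first one should read: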

“If the caller’s instructions immediately call objc_retainAutoreleasedReturnValue”

So basically what it means is that if you consider this bit of code:

- (SomeClass*)createMeAnObject {
   SomeClass *obj = [[SomeClass alloc] init];
   obj.string = @"Badger";
   obj.number = 10;
   return obj;
}

- (id)init {
    if ((self = [super init])) {
        self.myObject = [self createMeAnObject];
    }
    return self;
}

Ignore the fact that you wouldn’t really write it like that. If we rewrite it to include the retains, releases & autoreleases that go on behind the scenes then it looks like this:

- (SomeClass*)createMeAnObject {
   SomeClass *obj = [[SomeClass alloc] init];
   obj.string = @"Badger";
   obj.number = 10;
   return [obj autorelease];
}

- (id)init {
    if ((self = [super init])) {
        [myObject release];
        SomeClass *temp = [self createMeAnObject];
        myObject = [temp retain];
    }
    return self;
}

Now then if we inline the createMeAnObject code into init:

- (id)init {
    if ((self = [super init])) {
        [myObject release];
        SomeClass *temp = [[SomeClass alloc] init];
        temp.string = @"Badger";
        temp.number = 10;
        [temp autorelease];
        myObject = [temp retain];
    }
    return self;
}

Here we notice that there is a [temp autorelease] followed immediately by a [temp retain]. It is this optimisation that the new Objective-C runtime can help us with.

How does it work – objc_autoreleaseReturnValue?

The code is out there for the x86 version of this, but there’s no ARM code so I had to go digging into the disassembly for it.

Here’s the disassembly for objc_autoreleaseReturnValue:

_objc_autoreleaseReturnValue:
00006ec4            b580        push    {r7, lr}
00006ec6            466f        mov     r7, sp
00006ec8        f01e0f01        tst.w   lr, #1  @ 0x1
00006ecc            d004        beq.n   0x6ed8
00006ece        f83e1c01        ldrh.w  r1, [lr, #-1]
00006ed2        f244623f        movw    r2, 0x463f
00006ed6            e005        b.n     0x6ee4
00006ed8        f8de1000        ldr.w   r1, [lr]
00006edc        f2470207        movw    r2, 0x7007
00006ee0        f2ce12a0        movt    r2, 0xe1a0
00006ee4            4291        cmp     r1, r2
00006ee6            d106        bne.n   0x6ef6
00006ee8        ee1d1f70        mrc     15, 0, r1, cr13, cr0, {3}
00006eec        f0210103        bic.w   r1, r1, #3      @ 0x3
00006ef0        f8c100c4        str.w   r0, [r1, #196]
00006ef4            bd80        pop     {r7, pc}
00006ef6        f00bfb93        bl      _objc_autorelease
00006efa            bd80        pop     {r7, pc}

Let’s break that down then…

00006ec4            b580        push    {r7, lr}
00006ec6            466f        mov     r7, sp

This is a standard prologue for a method in ARM.

00006ec8        f01e0f01        tst.w   lr, #1  @ 0x1
00006ecc            d004        beq.n   0x6ed8

Here we are doing the first bit of our sniffing of the following instructions. lr is the “link register” and contains the address that we’ll return to. Since this function is always called as a tail call, lr still holds the return address inside the caller of the method that’s returning an autoreleased value, i.e. the address of the instruction that caller will run next.

The tst instruction does a bitwise AND of the value in lr and the integer value 1 and sets the flags from the result. The beq then branches if the zero flag is set, i.e. if (lr & 1) == 0. So we are testing whether the lowest bit is clear. You can either read up about ARM processors or take it from me that if the low bit is set on the link register then it means the caller is in Thumb mode. So if we’re going back to ARM code then we branch over a few instructions to 0x6ed8, whereas if we’re going back to Thumb code then we don’t branch.
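To make the low-bit trick concrete, here is a minimal sketch in C of the same test. The function names are my own invention for illustration and are not part of the runtime; the return address is just passed in as an integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the return address points at Thumb code.
 * This is the same test as `tst.w lr, #1`. */
bool returns_to_thumb(uintptr_t lr) {
    return (lr & 1u) != 0;
}

/* Hypothetical helper: the real address of the instruction we return to. */
uintptr_t next_instruction_address(uintptr_t lr) {
    if (returns_to_thumb(lr)) {
        return lr - 1;          /* drop the Thumb bit, as in [lr, #-1] */
    }
    assert((lr & 3u) == 0);     /* ARM instructions are word aligned */
    return lr;
}
```

With a made-up return address of 0x6ec5 the low bit is set, so the caller is in Thumb mode and its next instruction lives at 0x6ec4. With 0x6ec4 the caller is in ARM mode and the address is used as it is.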

00006ece        f83e1c01        ldrh.w  r1, [lr, #-1]
00006ed2        f244623f        movw    r2, 0x463f
00006ed6            e005        b.n     0x6ee4

This is the case that gets run if we didn’t branch, i.e. we’re returning to Thumb code. We load a half word (16 bits) from lr - 1 into r1. We need the -1 because, as described above, the low bit of lr is set when the caller is in Thumb mode, so the next instruction after the return is actually at lr - 1. We then put 0x463f into r2 and jump to 0x6ee4.

00006ed8        f8de1000        ldr.w   r1, [lr]
00006edc        f2470207        movw    r2, 0x7007
00006ee0        f2ce12a0        movt    r2, 0xe1a0

This is the case that gets run if we did branch, i.e. we’re returning to ARM code. Here we load a whole 32 bits from lr into r1 and build 0xe1a07007 in r2 (movw sets the bottom half and movt sets the top half).

00006ee4            4291        cmp     r1, r2
00006ee6            d106        bne.n   0x6ef6

The next section compares the two registers that we’ve just set up in one of those two ways. If they are not equal then we branch over to 0x6ef6.
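Putting the two load paths and the comparison together, the check can be sketched in C roughly as follows. This is only a model: the caller’s code is simulated as a byte array, lr is an offset into that array rather than a real address, and the names are hypothetical. Bytes are read one at a time so that the sketch behaves the same on any host, whatever its byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define THUMB_EXPECTED 0x463fu        /* value loaded by movw on the Thumb path */
#define ARM_EXPECTED   0xe1a07007u    /* value built by movw + movt on the ARM path */

/* Little-endian loads, as the ARM code in the disassembly would see them. */
static uint32_t load16(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}
static uint32_t load32(const uint8_t *p) {
    return load16(p) | (load16(p + 2) << 16);
}

/* code: simulated caller instructions.
 * lr:   offset of the return point, with bit 0 set for a Thumb caller. */
bool caller_instruction_matches(const uint8_t *code, uint32_t lr) {
    assert(code != NULL);
    if (lr & 1u) {
        return load16(code + (lr - 1)) == THUMB_EXPECTED;   /* ldrh.w r1, [lr, #-1] */
    }
    return load32(code + lr) == ARM_EXPECTED;               /* ldr.w r1, [lr] */
}
```

For example, a simulated Thumb caller whose next two bytes are 0x3f, 0x46 matches when lr is 1, and a simulated ARM caller whose next four bytes are 0x07, 0x70, 0xa0, 0xe1 matches when lr is 0.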

So we’re matching against either 0x463f (if it’s Thumb mode) or 0xe1a07007 (if it’s ARM mode). Why do we care that the instructions that we’re about to run when we return have those particular binary values? Well if we compile a method that does the objc_autoreleaseReturnValue and objc_retainAutoreleasedReturnValue dance then we see that the compiler adds in an instruction which acts as a marker. Let’s see what it looks like:

Thumb mode:
f7ffef56   blx  _objc_msgSend
    463f  mov  r7, r7
f7ffef54   blx  _objc_retainAutoreleasedReturnValue

ARM mode:
ebffffa0   bl   _objc_msgSend
e1a07007   mov  r7, r7
ebffff9e   bl   _objc_retainAutoreleasedReturnValue

Well take a look at that. The compiler has added a mov r7, r7 in each case, which is a no-op (it does nothing, as it moves r7 back into itself). If you examine the binary values for these instructions then you’ll see they match the values that objc_autoreleaseReturnValue compares against. The compiler has added this as a marker to tell objc_autoreleaseReturnValue that the caller is about to call objc_retainAutoreleasedReturnValue.
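If you want to convince yourself that those magic numbers really are mov r7, r7, the encodings can be rebuilt by hand. The sketch below assumes the 16-bit Thumb register-to-register MOV layout (0100 0110, then the top bit of Rd, four bits of Rm, and the low three bits of Rd) and the ARM unshifted register MOV with the “always” condition. The function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit Thumb "mov Rd, Rm": 0100 0110 D Rm[3:0] Rd[2:0], where D is bit 3 of Rd. */
uint16_t thumb_mov(unsigned rd, unsigned rm) {
    assert(rd < 16 && rm < 16);
    return (uint16_t)(0x4600u | ((rd >> 3) << 7) | (rm << 3) | (rd & 7u));
}

/* ARM "mov Rd, Rm", condition AL, no shift: 0xe1a00000 | Rd << 12 | Rm. */
uint32_t arm_mov(unsigned rd, unsigned rm) {
    assert(rd < 16 && rm < 16);
    return 0xe1a00000u | ((uint32_t)rd << 12) | rm;
}
```

Calling thumb_mov(7, 7) gives 0x463f and arm_mov(7, 7) gives 0xe1a07007, which are the two values being compared against. As a sanity check on the Thumb layout, thumb_mov(7, 13) gives 0x466f, which is the mov r7, sp from the prologue in the disassembly (sp is r13).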

00006ee8        ee1d1f70        mrc     15, 0, r1, cr13, cr0, {3}
00006eec        f0210103        bic.w   r1, r1, #3      @ 0x3
00006ef0        f8c100c4        str.w   r0, [r1, #196]
00006ef4            bd80        pop     {r7, pc}

This is the code that gets run if the instructions matched. It reads a value from a coprocessor register (the mrc instruction), clears the bottom two bits of it (the bic), and then stores r0 (which is the value that’s to be returned) at offset 196 from the resulting address. Then it returns without calling autorelease. Given that Apple’s comment talks about saving the result in thread-local storage, this coprocessor dance is presumably how the runtime finds the current thread’s storage, and that should become clearer when we look at the code for objc_retainAutoreleasedReturnValue. Essentially it’s stashing the object pointer somewhere that we’ll read later.
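Here is a rough C model of what those four instructions appear to do, assuming the value read by mrc really is a per-thread base pointer. The address arithmetic mirrors the bic and the #196 offset from the disassembly. The C11 _Thread_local variable is only a portable stand-in for that slot, and all the names are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SLOT_OFFSET = 196 };   /* the #196 in the str.w instruction */
static_assert(SLOT_OFFSET % 4 == 0, "a 32-bit pointer slot should be word aligned");

/* Address of the slot, given the raw value read by mrc. */
uintptr_t slot_address(uintptr_t raw_thread_register) {
    uintptr_t base = raw_thread_register & ~(uintptr_t)3;   /* bic.w r1, r1, #3 */
    return base + SLOT_OFFSET;
}

/* Portable stand-in for the per-thread slot (hypothetical name). */
static _Thread_local void *returned_object_slot = NULL;

/* Models the matched path: remember the object instead of autoreleasing it. */
void *stash_return_value(void *object) {
    returned_object_slot = object;   /* str.w r0, [r1, #196] */
    return object;                   /* r0 is left untouched for the caller */
}
```

Because the slot is per thread, two threads returning objects at the same time can’t trample on each other’s stashed value.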

00006ef6        f00bfb93        bl      _objc_autorelease
00006efa            bd80        pop     {r7, pc}

Finally, this is where we get to if the instructions did not match. This performs a normal call to objc_autorelease in case the caller is not about to retain the object.

How does it work – objc_retainAutoreleasedReturnValue?

Let’s now take a look at objc_retainAutoreleasedReturnValue:

_objc_retainAutoreleasedReturnValue:
00012bbc            b580        push    {r7, lr}
00012bbe        ee1d1f70        mrc     15, 0, r1, cr13, cr0, {3}
00012bc2            466f        mov     r7, sp
00012bc4        f0210103        bic.w   r1, r1, #3      @ 0x3
00012bc8        f8d110c4        ldr.w   r1, [r1, #196]
00012bcc            4281        cmp     r1, r0
00012bce            d107        bne.n   0x12be0
00012bd0        ee1d1f70        mrc     15, 0, r1, cr13, cr0, {3}
00012bd4            2200        movs    r2, #0
00012bd6        f0210103        bic.w   r1, r1, #3      @ 0x3
00012bda        f8c120c4        str.w   r2, [r1, #196]
00012bde            bd80        pop     {r7, pc}
00012be0        f7f3f976        bl      _objc_retain
00012be4            bd80        pop     {r7, pc}

And again, breaking that down we get:

00012bbc            b580        push    {r7, lr}

Standard prologue for a method.

00012bbe        ee1d1f70        mrc     15, 0, r1, cr13, cr0, {3}
00012bc2            466f        mov     r7, sp
00012bc4        f0210103        bic.w   r1, r1, #3      @ 0x3
00012bc8        f8d110c4        ldr.w   r1, [r1, #196]

Here we get some more context on what that mrc was all about before. We can see that we’re running the same mrc instruction as before, doing the same bic, and then loading the value stored at the computed address into r1.

00012bcc            4281        cmp     r1, r0
00012bce            d107        bne.n   0x12be0

Now this is the interesting bit. We’re checking that the value we obtained from doing the dance with the coprocessor (r1) is the same as the object passed into this method (r0). If these two match then we know that the object we are trying to retain has just been returned from a method that had called objc_autoreleaseReturnValue. So we don’t need to do anything. It’s not been autoreleased so we’re not going to retain it.

00012bd0        ee1d1f70        mrc     15, 0, r1, cr13, cr0, {3}
00012bd4            2200        movs    r2, #0
00012bd6        f0210103        bic.w   r1, r1, #3      @ 0x3
00012bda        f8c120c4        str.w   r2, [r1, #196]
00012bde            bd80        pop     {r7, pc}

This is the code that then gets run if the comparison was true, i.e. this object had just gone through an objc_autoreleaseReturnValue. We clear out the stored value (by writing zero to the same location, found via the same coprocessor dance) and return.

00012be0        f7f3f976        bl      _objc_retain
00012be4            bd80        pop     {r7, pc}

If it didn’t match, then we know that this object has not gone through an objc_autoreleaseReturnValue, which is likely because the method we called was not compiled with ARC enabled. So we do a retain.

Phew, so explain that again please?

It’s probably easiest to consider the following pseudo code:

id objc_autoreleaseReturnValue(id object) {
    if ((thumb_mode && next_instruction_after_return == 0x463f) ||
        (arm_mode   && next_instruction_after_return == 0xe1a07007))
    {
        // Caller is about to retain: stash the object and skip the autorelease
        set_flag(object);
        return object;
    } else {
        return objc_autorelease(object);
    }
}

id objc_retainAutoreleasedReturnValue(id object) {
    if (get_flag(object)) {
        // Stashed by objc_autoreleaseReturnValue: skip the retain
        clear_flag();
        return object;
    } else {
        return objc_retain(object);
    }
}

That is basically what it all boils down to. Combined with some tail call optimisations, this saves a lot of work compared to all the redundant autorelease followed by retain pairs that we must have had in code before ARC was invented.
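To see the saving, here is a small toy model in plain C. It is a sketch under some loud assumptions: objects are just a struct with a retain count, the autorelease pool is a per-object counter, and the instruction sniffing is replaced by a boolean parameter. None of these names come from the runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy object: a retain count plus the number of autoreleases still pending. */
typedef struct { int retain_count; int pending_autoreleases; } Object;

static _Thread_local Object *handoff_slot = NULL;   /* models the thread-local slot */
static int autorelease_calls = 0;                   /* counters for the demo */
static int retain_calls = 0;

/* caller_has_marker stands in for sniffing the instruction after the return. */
Object *autorelease_return_value(Object *obj, bool caller_has_marker) {
    assert(obj != NULL);
    if (caller_has_marker) {
        handoff_slot = obj;              /* hand our reference straight over */
    } else {
        obj->pending_autoreleases++;     /* models a real autorelease */
        autorelease_calls++;
    }
    return obj;
}

Object *retain_autoreleased_return_value(Object *obj) {
    assert(obj != NULL);
    if (handoff_slot == obj) {
        handoff_slot = NULL;             /* take over the handed-over reference */
    } else {
        obj->retain_count++;             /* models a real retain */
        retain_calls++;
    }
    return obj;
}

/* Models the pool draining later on: each pending autorelease becomes a release. */
void drain_pool(Object *obj) {
    obj->retain_count -= obj->pending_autoreleases;
    obj->pending_autoreleases = 0;
}
```

Start with an object whose retain count is 1 and pass it through both functions. With caller_has_marker set to true, the count stays at 1 and neither counter moves. With it set to false, the count goes up to 2 with one autorelease pending, and only drops back to 1 once the pool is drained. Either way the caller ends up owning exactly one reference; the fast path just gets there without touching the pool.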

Conclusions

This is yet again some awesome stuff from the Apple engineers. Sniffing the next instructions to be executed is very clever, and making sure that it always works can’t be easy. They must have to ensure that the marker (mov r7, r7) isn’t moved by the optimiser, for instance, and I’m sure there are lots of other edge cases, but it seems to work a treat!

Matt Galloway
