This is a follow-on post to A look inside blocks: Episode 1, in which I looked into the innards of blocks and how the compiler sees them. In this article I take a look at blocks that are not constant and how they are formed on the stack.
Block types
In the first article we saw the block have a class of _NSConcreteGlobalBlock. The block structure and descriptor were both fully initialised at compile time since all variables were known. There are a few different types of block, each with its own associated class. However, for simplicity’s sake, we just need to consider 3 of them:
_NSConcreteGlobalBlock is a block defined globally where it is fully complete at compile time. These are blocks that don’t capture any scope, such as an empty block.
_NSConcreteStackBlock is a block located on the stack. This is where blocks that capture scope start out, before any copy onto the heap.
_NSConcreteMallocBlock is a block located on the heap. After copying a block, this is where it ends up. Once here, blocks are reference counted and freed when the reference count drops to zero.
A block that captures scope
This time we’re going to look at the following bit of code:
[Code listing not preserved in this copy: the source containing foo, runBlockA and doBlockA.]
The function called foo is just there so that the block captures something, by having a function to call with a captured variable. Once again, we look at the armv7 assembly produced, relevant bits only:
[Assembly listing not preserved in this copy: armv7 output for runBlockA.]
First of all, the runBlockA function is the same as before. It’s calling the invoke function of the block. Then onto doBlockA:
[Assembly listing not preserved in this copy: armv7 output for doBlockA.]
Well, this is very different to before. Instead of seeing a block get loaded from a global symbol, it looks like a lot more work is being done. It might look daunting, but it’s pretty easy to see what’s going on. It’s probably best to consider the function rearranged; believe me that this doesn’t alter anything functionally. The compiler has emitted the instructions in the order it has as an optimisation, to reduce pipeline bubbles and the like. So, rearranged, the function looks like this:
[Assembly listing not preserved in this copy: doBlockA with its instructions rearranged.]
This is what that is doing:
Function prologue. r7 is pushed onto the stack because it’s going to get overwritten and is a register which must be preserved across function calls. lr is the link register and contains the address of the next instruction to execute when this function returns. See the function epilogue for more on that. Also, the stack pointer is saved into r7.
Subtract 24 from the stack pointer. This makes room for 24 bytes of data to be stored in stack space.
This little block of code is doing a lookup of the L__NSConcreteStackBlock$non_lazy_ptr symbol, relative to the program counter so that it works wherever the code ends up in the binary when finally linked. The value is then stored to the address held in the stack pointer.
The value 1073741824 is stored to the stack pointer + 4.
The value 0 is stored to the stack pointer + 8. By now it may be becoming clear what’s going on. A Block_layout structure is being created on the stack! Up until now the isa pointer, the flags and the reserved values have been set.
The address of ___doBlockA_block_invoke_0 is stored at the stack pointer + 12. This is the invoke member of the block structure.
The address of ___block_descriptor_tmp is stored at the stack pointer + 16. This is the descriptor member of the block structure.
The value 128 is stored at the stack pointer + 20. Ah. If you look back at the Block_layout struct you’ll see that there are only 5 values in it. So what is this being stored after the end of the struct then? Well, you’ll notice that the value is 128, which is the value of the variable captured in the block. So this must be where blocks store values that they use: after the end of the Block_layout struct.
The stack pointer, which now points to a fully initialised block structure, is put into r0 and runBlockA is called. (Remember that r0 contains the first argument to a function in the ARM EABI.)
Finally the stack pointer has 24 added back to it to balance out the subtraction at the start of the function. Then 2 values are popped off the stack into r7 and pc respectively. The r7 balances the push from the prologue and the pc will now get the value that was in lr when the function began. This effectively performs the return of the function, as it sets the CPU to continue executing (the pc, program counter) from where the function was told to return to (lr, the link register).
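To make those offsets concrete, here is a minimal sketch in C that models the structure being built on the stack. It is not the runtime’s own definition: the names ModelBlockLayout, ModelCapturingBlock and model_build_block are made up for illustration, and pointers are modelled as uint32_t so that the layout matches 32-bit armv7 whatever machine compiles it. Note that 1073741824 is simply 1 << 30.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the five standard members on 32-bit ARM.
   Pointers are written as uint32_t so offsets match armv7 on any host. */
typedef struct {
    uint32_t isa;        /* sp + 0:  address of _NSConcreteStackBlock */
    int32_t  flags;      /* sp + 4:  1073741824, i.e. 1 << 30 */
    int32_t  reserved;   /* sp + 8:  0 */
    uint32_t invoke;     /* sp + 12: address of the invoke function */
    uint32_t descriptor; /* sp + 16: address of the block descriptor */
} ModelBlockLayout;

/* The captured int is stored straight after the standard members. */
typedef struct {
    ModelBlockLayout header;
    int32_t captured_a;  /* sp + 20: the captured value, 128 */
} ModelCapturingBlock;

/* Compile-time checks of the sizes and offset discussed in the text. */
static_assert(sizeof(ModelBlockLayout) == 20, "standard members take 20 bytes");
static_assert(sizeof(ModelCapturingBlock) == 24, "one captured int makes 24 bytes");
static_assert(offsetof(ModelCapturingBlock, captured_a) == 20, "capture at +20");

/* Fills in the members in the same order as the stores described above.
   The three address arguments are placeholders, not real symbol addresses. */
ModelCapturingBlock model_build_block(uint32_t isa, uint32_t invoke,
                                      uint32_t descriptor) {
    ModelCapturingBlock block;
    block.header.isa = isa;
    block.header.flags = 1 << 30;
    block.header.reserved = 0;
    block.header.invoke = invoke;
    block.header.descriptor = descriptor;
    block.captured_a = 128;
    return block;
}
```

With 4-byte members throughout, the header comes to 20 bytes and the captured int lands at offset 20, for 24 bytes in total. That matches the 24 subtracted from the stack pointer.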
Wow! You still with me? Brilliant!
The final bit of this little section is to check what the invoke function and the descriptor look like. We would expect them to be not much different to the global block from episode 1. Here they are:
[Assembly listing not preserved in this copy: the invoke function and the block descriptor.]
And yep, there’s not much difference really. The only difference is the size parameter of the block descriptor. It’s now 24 rather than 20. This is because there’s an integer value captured by the block and so the block structure is 24 bytes rather than the standard 20. We saw the extra bytes being added to the end of the structure when it was created.
Also in the actual block function, i.e. __doBlockA_block_invoke_0, you can see the value being read out of the end of the block structure, i.e. r0 + 20. This is the variable captured in the block.
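The same idea can be sketched in plain C, without any blocks syntax. This is only a model of what the compiler appears to generate, not its actual output: struct HostBlock, do_block_a_invoke, run_block_a and do_block_a are hypothetical names, and foo here just records its argument so the call can be observed. The point is that the invoke function receives the block itself as its first argument and reads the captured value from the member that follows the five standard ones.

```c
#include <assert.h>
#include <stddef.h>

struct HostBlock;
typedef void (*InvokeFn)(struct HostBlock *self);

/* Hypothetical host-side model of a block that captures one int. */
struct HostBlock {
    void *isa;
    int flags;
    int reserved;
    InvokeFn invoke;
    const void *descriptor;
    int captured_a; /* sits after the five standard members */
};

static int last_foo_argument = -1;

/* Stand-in for foo: remembers what it was called with. */
static void foo(int value) {
    last_foo_argument = value;
}

/* Model of the invoke function: the block arrives as the first argument
   and the captured value is read from the end of the structure. */
static void do_block_a_invoke(struct HostBlock *self) {
    foo(self->captured_a);
}

/* Model of runBlockA: call the invoke member, passing the block itself. */
static void run_block_a(struct HostBlock *block) {
    assert(block != NULL && block->invoke != NULL);
    block->invoke(block);
}

/* Model of doBlockA: build the block on the stack, then run it.
   Returns the value that foo ended up receiving. */
int do_block_a(void) {
    struct HostBlock block = { NULL, 1 << 30, 0, do_block_a_invoke, NULL, 128 };
    run_block_a(&block);
    return last_foo_argument;
}
```

Calling do_block_a() returns 128, the value that travelled inside the block structure rather than being passed as an explicit argument.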
What about capturing object types?
The next thing to consider is what happens if, instead of capturing an integer, the block captures an object type such as an NSString. To see what happens there, consider the following code:
[Code listing not preserved in this copy: the example changed to capture an object type.]
I won’t go into the details of doBlockA because that doesn’t change much. What is interesting is the block descriptor structure that’s created:
[Assembly listing not preserved in this copy: the block descriptor structure.]
Notice there are pointers to functions called ___copy_helper_block_ and ___destroy_helper_block_. Here are the definitions of those functions:
[Assembly listing not preserved in this copy: ___copy_helper_block_ and ___destroy_helper_block_.]
I assume these functions are what gets run when blocks are copied and destroyed. They must be retaining and releasing the object that was captured by the block. It looks like the copy function takes 2 parameters, as both r0 and r1 are addressed as if they contain valid data. The destroy function looks like it just takes 1. All of the hard work looks like it’s done by _Block_object_assign and _Block_object_dispose. The code for those is within the block runtime code, part of the compiler-rt project within LLVM.
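Here is a toy sketch in C of how I imagine those helpers being used. It is an assumption drawn from the assembly, not the real runtime: ToyObject, ToyBlock, ToyDescriptor, toy_block_copy and toy_block_release are all invented names, and a plain counter stands in for retaining and releasing the captured object. The copy helper takes 2 parameters and the dispose helper takes 1, as observed above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a reference-counted object such as an NSString. */
typedef struct {
    int retain_count;
} ToyObject;

struct ToyBlock;

/* Toy descriptor: a size plus the two helper pointers. */
typedef struct {
    unsigned long reserved;
    unsigned long size;
    void (*copy)(struct ToyBlock *dst, const struct ToyBlock *src);
    void (*dispose)(struct ToyBlock *block);
} ToyDescriptor;

/* Toy block, cut down to a descriptor and one captured object. */
typedef struct ToyBlock {
    const ToyDescriptor *descriptor;
    ToyObject *captured;
} ToyBlock;

/* Copy helper: the new block takes its own reference to the object. */
static void toy_copy_helper(ToyBlock *dst, const ToyBlock *src) {
    dst->captured = src->captured;
    dst->captured->retain_count++;
}

/* Dispose helper: the dying block gives its reference back. */
static void toy_dispose_helper(ToyBlock *block) {
    block->captured->retain_count--;
}

static const ToyDescriptor toy_descriptor = {
    0, sizeof(ToyBlock), toy_copy_helper, toy_dispose_helper
};

/* Toy analogue of copying a stack block to the heap. */
ToyBlock *toy_block_copy(const ToyBlock *src) {
    assert(src != NULL && src->descriptor != NULL);
    ToyBlock *dst = malloc(src->descriptor->size);
    if (dst == NULL) {
        return NULL;
    }
    memcpy(dst, src, src->descriptor->size);
    if (src->descriptor->copy != NULL) {
        src->descriptor->copy(dst, src);
    }
    return dst;
}

/* Toy analogue of destroying a heap block. */
void toy_block_release(ToyBlock *block) {
    assert(block != NULL && block->descriptor != NULL);
    if (block->descriptor->dispose != NULL) {
        block->descriptor->dispose(block);
    }
    free(block);
}
```

The design point is that the runtime side (toy_block_copy here) knows nothing about what the block captured. It copies the number of bytes the descriptor reports and leaves ownership of the captured object to the compiler-generated helpers.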
If you want to go away and have a read of the code for the blocks runtime, take a look at the source, which can be downloaded from http://compiler-rt.llvm.org. In particular, runtime.c is the file to look at.
What next?
In the next episode I shall take a look into the blocks runtime by investigating the code for Block_copy and see just how that does its business. This will give an insight into the copy and destroy helper functions we’ve just seen get created for blocks that capture objects.