Let's have a look at how recursion works at low-level by having a look at some Dart AOT code!
the best way to demonstrate what a recursive function is, is through an example. given the following Dart code:
import 'dart:io' show exit;
int incrementUntilValueIs100OrMore(int value) {
if (value >= 100) {
return value;
} else {
return incrementUntilValueIs100OrMore(value + 1);
}
}
void main(List<String> args) {
final value = incrementUntilValueIs100OrMore(0);
print(value);
exit(0);
}
we'll get the following AOT for the incrementUntilValueIs100OrMore
function:
Precompiled____incrementUntilValueIs100OrMore_1436:
000000000009a738 push rbp ; CODE XREF=Precompiled____main_1435+17, Precompiled____incrementUntilValueIs100OrMore_1436+38
000000000009a739 mov rbp, rsp
000000000009a73c cmp rsp, qword [r14+0x40]
000000000009a740 jbe loc_9a769
loc_9a746:
000000000009a746 mov rax, qword [rbp+arg_0] ; CODE XREF=Precompiled____incrementUntilValueIs100OrMore_1436+56
000000000009a74a cmp rax, 0x64
000000000009a74e jl loc_9a759
000000000009a754 mov rsp, rbp
000000000009a757 pop rbp
000000000009a758 ret
; endp
loc_9a759:
000000000009a759 add rax, 0x1 ; CODE XREF=Precompiled____incrementUntilValueIs100OrMore_1436+22
000000000009a75d push rax
000000000009a75e call Precompiled____incrementUntilValueIs100OrMore_1436 ; Precompiled____incrementUntilValueIs100OrMore_1436
000000000009a763 pop rcx
000000000009a764 mov rsp, rbp
000000000009a767 pop rbp
000000000009a768 ret
; endp
loc_9a769:
000000000009a769 call qword [r14+0x240] ; CODE XREF=Precompiled____incrementUntilValueIs100OrMore_1436+8
000000000009a770 jmp loc_9a746
let's break it down
these two bits of code are related to each other and as pointed by Vyacheslav Egorov, they are there as overflow checks:
Precompiled____incrementUntilValueIs100OrMore_1436:
000000000009a738 push rbp ; CODE XREF=Precompiled____main_1435+17, Precompiled____incrementUntilValueIs100OrMore_1436+38
000000000009a739 mov rbp, rsp
000000000009a73c cmp rsp, qword [r14+0x40]
000000000009a740 jbe loc_9a769
...
loc_9a769:
000000000009a769 call qword [r14+0x240] ; CODE XREF=Precompiled____incrementUntilValueIs100OrMore_1436+8
000000000009a770 jmp loc_9a746
so I won't go through them too much now since we've looked at them in issue-4 just briefly. the important part is the jmp
that is a short jump to the loc_9a746
tag where the actual procedure code is laid out. then we get to loc_9a746
which starts like this:
loc_9a746:
000000000009a746 mov rax, qword [rbp+arg_0] ; CODE XREF=Precompiled____incrementUntilValueIs100OrMore_1436+56
000000000009a74a cmp rax, 0x64
000000000009a74e jl loc_9a759
the mov
instruction there is setting the 64-bit value of rax
to the value of the value
argument, that we passed to this function to start with. then you can see cmp rax, 0x64
where 0x64
is the base-16 value of 100, basically our if statement. the cmp
is there to do the if
basically and compare rax
with 100. this is then followed by jl
which is "Jump near if less (SF≠ OF).". this will jump to loc_9a759
if rax
is less than 100. in our code we said if value >= 100
but Dart is translating this to if value < 100
and then it jumps to loc_9a759
. if that's not the case, in other words, if rax
is greater than or equal to 100, then this happens:
000000000009a754 mov rsp, rbp
000000000009a757 pop rbp
000000000009a758 ret
; endp
I asked Vyacheslav about the calling convention of Dart and he said "it's a custom one" and I can see a little indication that the callee stores the return value in rax
in this case we are returning an int
in our function so it seems like Dart is reserving rax
for that return purpose. So keep that in mind!
however if rax
is less than 100, then we jump short to loc_9a759
which is this:
loc_9a759:
000000000009a759 add rax, 0x1 ; CODE XREF=Precompiled____incrementUntilValueIs100OrMore_1436+22
000000000009a75d push rax
000000000009a75e call Precompiled____incrementUntilValueIs100OrMore_1436 ; Precompiled____incrementUntilValueIs100OrMore_1436
000000000009a763 pop rcx
000000000009a764 mov rsp, rbp
000000000009a767 pop rbp
000000000009a768 ret
; endp
as you can see, rax
is getting incremented by 1 and then pushed into the stack, and Dart is calling the Precompiled____incrementUntilValueIs100OrMore_1436
procedure again, while we are already in that procedure. in this case Precompiled____incrementUntilValueIs100OrMore_1436
becomes both the caller and the callee.
back in the main
function where we called our procedure from, you can see this:
Precompiled____main_1435:
000000000009a700 push rbp ; CODE XREF=Precompiled____main_main_1437+17
000000000009a701 mov rbp, rsp
000000000009a704 xor eax, eax
000000000009a706 cmp rsp, qword [r14+0x40]
000000000009a70a jbe loc_9a72f
loc_9a710:
000000000009a710 push rax ; CODE XREF=Precompiled____main_1435+54
000000000009a711 call Precompiled____incrementUntilValueIs100OrMore_1436 ; Precompiled____incrementUntilValueIs100OrMore_1436
000000000009a716 pop rcx
000000000009a717 push rax
000000000009a718 call Precompiled____print_911 ; Precompiled____print_911
000000000009a71d pop rcx
000000000009a71e call Precompiled____exit_1024 ; Precompiled____exit_1024
000000000009a723 mov rax, qword [r14+0xc8]
000000000009a72a mov rsp, rbp
000000000009a72d pop rbp
000000000009a72e ret
; endp
loc_9a72f:
000000000009a72f call qword [r14+0x240] ; CODE XREF=Precompiled____main_1435+10
000000000009a736 jmp loc_9a710
with this part being the most interesting part to me:
000000000009a710 push rax ; CODE XREF=Precompiled____main_1435+54
000000000009a711 call Precompiled____incrementUntilValueIs100OrMore_1436 ; Precompiled____incrementUntilValueIs100OrMore_1436
000000000009a716 pop rcx
000000000009a717 push rax
000000000009a718 call Precompiled____print_911 ; Precompiled____print_911
000000000009a71d pop rcx
seems like the Precompiled____incrementUntilValueIs100OrMore_1436
procedure is keeping its return value in rax
after it's returning. so seems like the callee returns its value in rax
in Dart. and the push
and pop
is just to balance the stack since value
went into the stack with push rax
so the caller is responsible for balancing the calls on the stack and that's what the pop rcx
seems to be doing there. so we can conclude that the Precompiled____incrementUntilValueIs100OrMore_1436
procedure here is keeping its return value in rax
all the while it is calling itself on and on until it gets to the JL
instruction where it pops out of the whole routine!
to better understand how recursive functions actually work in Dart, you'd need to know how call
, ret
, stack pointers, base pointer, etc work at assembly level so let's dig into those things now.
let's first talk about stacks. to know about stack, you'd need to know about segments. a segment is usually a defined piece of a software that has a limit, usually maximum of 4 gigs of memory on modern hardware. then you'd have a pointer, let's say the stack pointer, that starts at the top of the segment. so let's say we have a stack that is 1 megabytes, 1024 bytes in other words. in normal conditions, the program would set the stack up for you, in this case Dart, and then you'd have a stack pointer, or SP (stack pointer), that is stored in esp
under x86_32 and rsp
in x86_64. the stack pointer's value in this case would be 1024, so it would point to the top of the stack.
when you pass a function to a procedure, the compiler would set up the stack using a calling convention. a calling convention is an agreed-upon way of calling other procedures. for instance, a calling convention might be that the first parameter to a function is passed into rax
, the second into rcx
and the rest are pushed into the stack as 64-bit pointers. something like that. i'm making this up but the idea stays corrected. Dart, as I've understood from Vyacheslav Egorov, has a custom calling convention meaning that it's not really documented in a central place and this is nothing strange. the calling convention has to make sense to those who write a specific compiler, otherwise you and I who use the compiler won't even notice how it is working under the hood but a good understanding of how the stack works will help you in understanding recursive functions.
to continue you'd also need to know about the base pointer, or ebp
in 32-bit mode or rbp
in 64-bit protected mode. usually what happens inside the creation of a stack frame for a procedure, depending on who or what has compiled the code, would look like this:
push rbp
mov rbp, rsp
sub rsp, NNN
... now we have NNN bytes of space on the stack
leave
ret
when we enter this procedure, usually what happens is that the stack pointer points to the top of the stack for the current procedure, so any values above that pointer are usually the values that have been pushed to the stack. again depending on the calling convention, maybe arguments will get passed to gprs (general purpose registers) and from what Vyacheslav said, Dart has a custom calling convention so I cannot really document it here but we will assume that in our fictitious calling convention in this article, all arguments are passed in the stack! in that case when we enter our procedure shown before, the stack pointer would point to the last argument pushed to the stack. then you would use the base pointer to point to the memory the procedure needs to allocate for its local variables as well as the variables that the caller pushed to the stack and passed to us. if you see rbp + NNN
it may mean that the procedure is reading arguments passed to it while if you see rsp + NNN
after a sub rsp, NNN
it may mean that the procedure is working with its local variables.
so to summarize, calling conventions define how a caller and callee communicate with each other while the base pointer and the stack pointer are used to coordinate the access to both arguments passed to a procedure and the locally stored stack space for the procedure.
knowing that, and understanding that upon a Dart function getting called, it will set up its stack using the stack pointer and the base pointer, and knowing that every value pushed to the stack decreases the stack pointer by the number of bytes needed for that variable to be stored in the stack, you'll understand that the stack can actually run out of space if you recursively call a function with no proper exit.
the thing to take away from all of this rant is that every time we enter a new procedure through CALL
where that procedure is setting up its stack, the stack pointer is decremented since we start at the top of the stack segment, so if you continue calling procedures like in our case, recursively, without actually leaving the nested procedure calls, the stack pointer will get so low that the runtime will and should eventually throw a stack overflow, which I'm sure Dart does.
here is also a good little bit of information about the stack pointer from Intel:
Items are placed on the stack using the PUSH instruction and removed from the stack using the POP instruction. When an item is pushed onto the stack, the processor decrements the ESP register, then writes the item at the new top of stack. When an item is popped off the stack, the processor reads the item from the top of stack, then incre- ments the ESP register. In this manner, the stack grows down in memory (towards lesser addresses) when items are pushed on the stack and shrinks up (towards greater addresses) when the items are popped from the stack.
i can't talk about recursive functions without paying tribute to the classical factorial function implementation that uses recursion so let's have a look at that. factorial of N is the product of all numbers from and including 1 up to and including N, so the factorial of 6 is 1*2*3*4*5*6 = 720
. a non-recursive way of calculating factorial of N would be like this:
import 'dart:io' show exit;
int factorial(int value) {
var result = 1;
for (var count = 1; count <= value; count++) {
result *= count;
}
return result;
}
void main(List<String> args) {
print(factorial(6));
exit(0);
}
and this is quite straightforward but not what I want to demonstrate in this issue. let's have a look at how this would look like if we used recursion:
import 'dart:io' show exit;
int factorial(int value) => value == 1
? value
: value * factorial(value - 1);
void main(List<String> args) {
print(factorial(4));
exit(0);
}
let's check this function's compiled AOT and see how that looks like:
; ================ B E G I N N I N G O F P R O C E D U R E ================
; Variables:
; arg_0: int, 16
Precompiled____factorial_1436:
000000000009a73c push rbp ; CODE XREF=Precompiled____main_1435+20, Precompiled____factorial_1436+65
000000000009a73d mov rbp, rsp
000000000009a740 cmp rsp, qword [r14+0x40]
000000000009a744 jbe loc_9a793
loc_9a74a:
000000000009a74a mov rcx, qword [rbp+arg_0] ; CODE XREF=Precompiled____factorial_1436+94
000000000009a74e mov rax, rcx
000000000009a751 add rax, rax
000000000009a754 jno loc_9a763
000000000009a75a call Precompiled_Stub__iso_stub_AllocateMintSharedWithoutFPURegsStub ; Precompiled_Stub__iso_stub_AllocateMintSharedWithoutFPURegsStub
000000000009a75f mov qword [rax+7], rcx
loc_9a763:
000000000009a763 cmp rax, 0x2 ; CODE XREF=Precompiled____factorial_1436+24
000000000009a767 jne loc_9a775
000000000009a76d mov rax, rcx
000000000009a770 jmp loc_9a78e
loc_9a775:
000000000009a775 mov rax, rcx ; CODE XREF=Precompiled____factorial_1436+43
000000000009a778 sub rax, 0x1
000000000009a77c push rax
000000000009a77d call Precompiled____factorial_1436 ; Precompiled____factorial_1436
000000000009a782 pop rcx
000000000009a783 mov rcx, qword [rbp+arg_0]
000000000009a787 imul rcx, rax
000000000009a78b mov rax, rcx
loc_9a78e:
000000000009a78e mov rsp, rbp ; CODE XREF=Precompiled____factorial_1436+52
000000000009a791 pop rbp
000000000009a792 ret
; endp
loc_9a793:
000000000009a793 call qword [r14+0x240] ; CODE XREF=Precompiled____factorial_1436+8
the first part of the procedure is just setting up the stack:
Precompiled____factorial_1436:
000000000009a73c push rbp ; CODE XREF=Precompiled____main_1435+20, Precompiled____factorial_1436+65
000000000009a73d mov rbp, rsp
000000000009a740 cmp rsp, qword [r14+0x40]
000000000009a744 jbe loc_9a793
so i won't go through it again! after that we are going to loc_9a74a
which looks like this:
loc_9a74a:
000000000009a74a mov rcx, qword [rbp+arg_0] ; CODE XREF=Precompiled____factorial_1436+94
000000000009a74e mov rax, rcx
000000000009a751 add rax, rax
000000000009a754 jno loc_9a763
000000000009a75a call Precompiled_Stub__iso_stub_AllocateMintSharedWithoutFPURegsStub ; Precompiled_Stub__iso_stub_AllocateMintSharedWithoutFPURegsStub
000000000009a75f mov qword [rax+7], rcx
loc_9a763:
000000000009a763 cmp rax, 0x2 ; CODE XREF=Precompiled____factorial_1436+24
000000000009a767 jne loc_9a775
000000000009a76d mov rax, rcx
000000000009a770 jmp loc_9a78e
i truly got puzzled by all of this so I asked Vyacheslav on Twitter what all of this means and he answerd with this:
it boxes an unboxed int64 value (either into a smi or a mint box if it does not fit into a smi) and then compares result with with a smi representation of 1 it's a bug - boxing should not be needed, will be fixed by https://t.co/PIKVq6kYFD
Since it seems to be a known bug that the compiler is doing additional work here than it should, and the work should essentially be replaced by doing calculations in a gpr, I will skip explaining this. but if you're interested in what mint representations and smi are in Dart, I suggest that you have a look at this resource.
we then get to the juciest part of the procedure, loc_9a775
, which in my opinion is this:
loc_9a775:
000000000009a775 mov rax, rcx ; CODE XREF=Precompiled____factorial_1436+43
000000000009a778 sub rax, 0x1
000000000009a77c push rax
000000000009a77d call Precompiled____factorial_1436 ; Precompiled____factorial_1436
000000000009a782 pop rcx
000000000009a783 mov rcx, qword [rbp+arg_0]
000000000009a787 imul rcx, rax
000000000009a78b mov rax, rcx
so it looks like the current accumulated value is being placed inside rbp+arg_0
and the current value
is being processed by the first three instructions in loc_9a775
. if you look at the original Dart code you may get confused by the ternary statement comparing value
with 1 and then passing the value - 1
(which by the way is the sub rax, 0x01
instruction above) into itself, and you'd wonder how value
can both be 1
and also can be the result of the calculation of the factorial
function but one thing we need to keep in mind is that the calculation done inside the factorial
function is being saved in the stack, and the result is being stored in the stack too, and when value
then gets subtracted, it gets pushed as a new value and passed to the stack and then read from the stack by the new factorial
procedure call. so we essentially have two representations of the value
! one is being essentially used as a counter, and the other as an accumulator, perfect for rcx
and rax
(or vice versa) if you ask me!
- recursive functions do indeed call themselves, and this comes with the overhead of settign up a new stack and unwinding the stack in every pass through the function. you may not incur the same execution cost if you write your functions using iterators!
- if the exit scenario for a recursive function is not set properly, you will get stack overflow since you will run out of stack space after N execution depending on what Dart is allocated the stack size for. Usually the stack can be up to 4GB long, but I'm unsure as how large the stack is allowed to grow in Dart. The only part-official information about this is provided in GitHub on the Dart SDK repo from Ivan Posva who wrote "Stack space for the main thread is not set by the VM, but by the OS on launch."