Thursday, December 13, 2007

Calling Conventions --- __cdecl vs. __stdcall

In an earlier post I mentioned that the __stdcall calling convention reduces code size by putting the stack cleanup in the callee, just before its return, instead of repeating it after every call. If this is better than __cdecl, then why do we need the "__cdecl" calling convention at all?

As we know, in "__cdecl" the caller clears the stack. This is the ideal behavior when the callee does not know how much of the stack to clear, which is exactly the situation for functions with a variable number of arguments. When the callee is compiled, it is not known how many arguments it will receive, so the compiler cannot emit a correct stack adjustment for it. The calling function, on the other hand, has this information, and an appropriate stack adjustment can be placed right after the call instruction. This is where the "__cdecl" convention comes to the rescue.
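To make this concrete, here is a minimal sketch in standard C of a function with a variable number of arguments. The name sum_ints and its leading count parameter are my own illustration, not something from the earlier post. The thing to notice is that the body is compiled once, yet each call site may push a different number of bytes, and only the caller knows that number at compile time.

```c
#include <stdarg.h>

/* Hypothetical helper: adds up `count` int arguments.
 * The function body is compiled once, with no knowledge of how many
 * arguments any particular caller will pass; it learns the number only
 * at run time, from `count`. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);          /* start reading after the last named parameter */
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int); /* fetch the next int argument */
    }
    va_end(args);

    return total;
}
```

A call such as sum_ints(2, 10, 20) passes three ints, while sum_ints(4, 1, 2, 3, 4) passes five. Under a caller-cleanup convention, each call site is followed by its own stack adjustment of the right size.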

Functions with a variable number of arguments are therefore qualified with the "__cdecl" calling convention.

2 comments:

  1. The bigger problem with implementing a variable number of arguments with __stdcall is that arguments are pushed on the stack from left to right. Since the stack on the x86 and most other processors grows "down", the last argument pushed onto the stack will be the one closest to the stack pointer. For example, if I have a function foo(int, int, int), it is called as foo(2, 3, 4), and we use the __stdcall convention, the stack will look like this:

    esp-4---> | local variables |
    esp-----> |frame pointer [ebp]|
    esp+4---> | 4 |
    esp+8---> | 3 |
    esp+12--> | 2 |

    If we'd used the C calling convention instead, we would have a stack that looked like this:

    esp-4---> | local variables |
    esp-----> |frame pointer [ebp]|
    esp+4---> | 2 |
    esp+8---> | 3 |
    esp+12--> | 4 |

    If I had to implement a function like printf where the first argument - the format string - tells the function how many more arguments there are, the __stdcall convention becomes problematic because the first argument is at some unknown offset from the stack pointer that depends on the number of arguments passed! On the other hand, for __cdecl the first argument, because it is pushed last, is always at esp+4.

    My point is, the argument that in __stdcall you don't know how many bytes to clean up is not strictly correct - if you can access each parameter, surely you can clean up the stack too. The bigger problem is the ordering of stack pushes with __stdcall which makes finding the number of actual parameters difficult.

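The comment's point that a printf-like function learns the number of remaining arguments from its first argument can be sketched in standard C as follows. This is only an illustration under my own assumptions: sum_by_format is a hypothetical name, and it recognizes nothing but "%d". It relies on the first argument being at a known location, from which the rest are read one at a time.

```c
#include <stdarg.h>

/* Hypothetical printf-like helper: every "%d" in `fmt` consumes one int
 * argument, and the function returns the sum of the ints it consumed.
 * The number of extra arguments is discovered only by walking `fmt`. */
int sum_by_format(const char *fmt, ...)
{
    va_list args;
    int total = 0;

    va_start(args, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (p[0] == '%' && p[1] == 'd') {
            total += va_arg(args, int); /* one more argument, as the format says */
            p++;                        /* skip the 'd' */
        }
    }
    va_end(args);

    return total;
}
```

For example, sum_by_format("%d %d %d", 2, 3, 4) reads three ints, while sum_by_format("no specifiers") reads none.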