Saturday, March 10, 2012
Analysis of the Duqu Trojan worm by Kaspersky Labs
Monday, October 3, 2011
Watch points for variables in Ruby - Object#freeze
Almost every programmer knows about watchpoints, especially the ones doing native development with C/C++. Watchpoints were really helpful to me when I was working with C/C++; they were, sort of, my go-to weapons whenever I wanted to understand how some third-party code worked. They were something I dearly missed when I started with Ruby. I am fairly new to Ruby and have never used the ruby-debug (or ruby-debug19) gem, because until today simple print statements were sufficient most of the time.
Today I was at a loss: I was unable to figure out where a particular hash variable was picking up two new key-value pairs. It was an instance variable with just an attr_reader defined, so obviously a reference to it was being passed around to the place where it was being modified, and my initial idea of writing a custom write accessor was probably not going to work (I did not try it). That is when I came across this: http://ruby-doc.org/docs/ProgrammingRuby/html/trouble.html#S3. The last bullet point in that section has the answer.
You just freeze the object/variable you want to watch by calling the "freeze" instance method on it, and anyone modifying that object after it is frozen will cause an exception to be raised, giving you the precise location where the modification is happening. This probably isn't as elegant as running a debugger and setting a watchpoint, but it gets the job done nevertheless. RTFM after all..!! This tool is definitely going into my belt. :)
Sunday, August 17, 2008
How does libffi actually work?
I came across libffi when I thought of working on js-ctypes. Though my contribution remained very close to nil, I got to know about this fantastic new thing, libffi. But I did not understand how it actually worked when I first read about it. I just got a high-level overview and assumed that the library abstracts away several things that I need not worry about when making calls across different programming languages. I just followed the usage instructions and started looking at the js-ctypes code; that was sufficient to understand it. But today, when I was once again going through the README, I came across one line that made things a little clearer. The first paragraph in "What is libffi?" tells it all: it's the calling conventions that this library exploits. I had written a post about calling conventions some time back. As mentioned in the README, the calling convention is the guiding light for the compiler. So when generating the binary code, the compiler will assume that the arguments being passed to a function will be available in some known place, and it also knows about a place where the return value will be kept, so that the code in the calling function knows where to look for it.
Now we know that the ultimate binary code we get after compilation is machine code, so it is basically a set of machine-supported instructions. These instructions are certainly not as sophisticated as those available in C, like functions. So what do these functions translate to? Nothing but a jump of the IP (instruction pointer) to the address where the binary code of that function is. Inside this code, the arguments passed are accessed by referencing some memory location. During compilation the compiler will not know the precise memory locations (obviously), but it has to put in some address when generating the code. How does it decide on the memory location? This is where calling conventions come into the picture: the compiler follows these standard steps. Since it is the compiler that generates both the code for calling a function, where arguments are sent, and the code of the called function, where those args are received, it knows where it has put the arguments and can generate code to use the data in those locations. From this point of view, the main (or is it the only?) condition for the binary code of a function to work properly is to have its arguments in the memory locations it thinks they are in and, at the end, to place the return value back in the right location. And from what I have understood so far, it is this point that libffi exploits.
When we are forwarding calls from an interpreted language, like JS, to some binary code generated from a compiled language, like C, there are two things to be done:
0. As discussed earlier, whatever arguments are passed from JS have to be placed in the right locations, and later the return value has to be picked up and given back to JS.
1. Type conversions -- The types used by JS are not understood by C. So someone should rise to the occasion and map these JS types to their C counterparts and vice versa.
The first of these is taken care of by libffi. But the second is very context specific and totally depends on the interpreted language; it does not make sense to have a catalog of type converters for every interpreted language on this earth. So these two steps were separated out as two interacting layers between the binary code and the interpreted-language code. libffi takes care of the calling-convention part, making sure the binary code runs and gives back the results, while type conversion is the job of another layer specific to the interpreted language that wants to call into the binary code. Hence we have different type-converting layers for different interpreted languages: ctypes for Python, js-ctypes for JS (and probably some more exist). Well, with this renewed and clearer understanding I hope to actually contribute to js-ctypes.
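To make the argument-placement part concrete, here is a minimal sketch of libffi's core API (ffi_prep_cif and ffi_call) being used from plain C to call an ordinary C function. The add function and the values are made up for illustration and have nothing to do with js-ctypes itself.

#include <ffi.h>
#include <stdio.h>

/* An ordinary C function that we will invoke through libffi. */
static int add(int a, int b) { return a + b; }

int main(void)
{
    ffi_cif cif;                                   /* description of the call           */
    ffi_type *arg_types[2] = { &ffi_type_sint, &ffi_type_sint };
    int a = 2, b = 3;
    void *arg_values[2] = { &a, &b };              /* pointers to the actual arguments   */
    ffi_arg result;                                /* wide enough to hold an int return  */

    /* Describe the call: default ABI (calling convention), 2 args, int return. */
    if (ffi_prep_cif(&cif, FFI_DEFAULT_ABI, 2, &ffi_type_sint, arg_types) == FFI_OK) {
        /* libffi places a and b wherever the calling convention says they must be. */
        ffi_call(&cif, FFI_FN(add), &result, arg_values);
        printf("add(2, 3) = %d\n", (int)result);
    }
    return 0;
}

A binding layer like js-ctypes builds the arg_types and arg_values arrays from the interpreted language's values, which is exactly the type-conversion job described above.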
Happy calling (with conventions) ;-)
Thursday, April 24, 2008
JS-CTYPES Status
Mark Finkle's Weblog » JS-CTYPES Status
This is what mfinkle recently had to say about js-ctypes. As he says, because of the hectic FF3 schedule the devs at Mozilla were pretty busy, so nothing could be done. Being a lazy bot, I did not do anything either. In the blog post above, Mark talks about struct support (which I had said I would be doing). I exhibited my initial enthusiasm by writing up a doc, which was not really useful (it's here). After that nothing much happened except for a small bug fix.
Now that mfinkle has rekindled the interest, I will also try to contribute. Struct support is still pending, and I will start with getting the XPCOM code exchanging struct information with libffi. After that we can look at the other interface, JS exchanging structs with XPCOM, to finally complete the loop. In the meantime I am also looking at embedding Mozilla. Will put up a post about that very soon.
Again, get ready to use deadly binaries in your JS code. js-ctypes is soon going to be here.. ;-)
Wednesday, April 9, 2008
#pragma comment(lib, "libfilename") -- A cool way to indicate dependency.
.NET programmers are well aware of the References section of a .NET project, using which we specify the dependencies. Things are pretty straightforward in the managed world, and with an IDE like Visual Studio it couldn't be easier: if you are using a type from a particular assembly, just add that assembly to the references of the project. The rest is taken care of by the IDE and the framework. When we move to the native/unmanaged world, we get slightly more power, with of course increased responsibility.
Any external symbol that we use must be approved by two tools/stages: the compiler and the linker. The compiler can be tackled by having the appropriate header file (.h file), which has just the declaration and no definition/implementation. Using this is simple: #include the header file in whichever source file uses the symbol. Just make sure the compiler knows where to find that header file, that is, maintain an include directory and tell the compiler about it.
The linker is a less tame beast, or rather a more difficult beast to tame. One simple reason is that, since it handles object code, its error messages cannot be traced back to a line in the source code. Another is that we have two kinds of libraries: static and dynamic.
To satisfy the linker, we have to provide each of the libs containing the symbols we use as input to it. In the VS IDE, we have to list all the required libs in the project settings and also specify the directory containing them. I say this is not as simple as the references mechanism in .NET because, to change anything, you have to visit the Project Settings page. And if you are developing without an IDE, your compiler invocation command line will be really, really long (though you can shorten it with some gmake variables and system variables). But apart from these geeky build methods there is another simple way to tell the linker which .lib files it has to look in for symbols when linking the current code, and that is where this #pragma directive comes into the picture.
#pragma comment(lib, "requiredLibrary.lib") --- This line of code will make the linker look for the library file requiredLibrary.lib. Isn't that cool? Don't you get a feeling that you, kind of, tamed the linker using your C/C++ code?
All you need to do is put this line of code in one of your source files and the compiler will know what to do. When this code is encountered, the compiler adds a "search directive" to the object file (.obj file) that it generates. When the linker reads that search record, it knows which lib file it has to look in to find any external symbol present in the object file it is currently linking.
So, suppose you are providing a library and you want to enforce this method on anyone who consumes it: you can provide a header file that they must include to use your symbols, and in that header file you specify the above pre-processor directive. This way, anyone who uses your lib will implicitly tell the linker which lib file to look for.
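As an illustration of that header trick, here is a minimal sketch. The names mathlib.h, mathlib.lib and math_add are hypothetical, invented for the example, and MSVC is assumed since #pragma comment(lib, ...) is an MSVC feature.

/* mathlib.h -- shipped alongside mathlib.lib (all names hypothetical) */
#pragma once
/* Anyone who includes this header automatically asks the linker for mathlib.lib. */
#pragma comment(lib, "mathlib.lib")
int math_add(int a, int b);   /* declaration only; the definition lives in the lib */

/* consumer.c -- no extra linker settings needed, as long as mathlib.lib is on the lib path */
#include <stdio.h>
#include "mathlib.h"

int main(void)
{
    printf("%d\n", math_add(2, 3));
    return 0;
}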
In a broader sense, #pragma comment is used to inject extra information into the object code generated by the compiler, and that information can then be used by the linker. Here is the MSDN article on everything #pragma comment can do: This page.
Thursday, December 13, 2007
References issue with the __fastcall calling convention
I talked about calling conventions in an earlier post. There I mentioned that if a function is qualified with the __fastcall calling convention, the system will try to pass the parameters in registers instead of on the stack, and thereby tries to speed up the function call. But this is just an attempt; we are not assured that our parameters will always go into registers, because that depends on register availability.
Apart from this, there is another issue associated with __fastcall. Since the parameters are put in registers, those variables do not have a memory address. So if we try to fetch or use the address of such a parameter, the system would be in a fix, as it has no address to return. In that case, the params in the registers are copied to some temporary memory location and the address of that location is returned. This temporary location can be anything; it can be the regular stack itself.
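Here is a minimal sketch of that situation (MSVC on 32-bit x86 assumed; the function name and values are made up). Taking the address of a register-passed parameter forces the compiler to spill it to memory first.

#include <stdio.h>

/* Under __fastcall on x86, a and b are candidates for the ECX/EDX registers. */
void __fastcall show(int a, int b)
{
    /* Taking &a requires a real memory location, so the compiler copies the
       register value to a temporary slot (typically on the stack) and hands
       back that slot's address. */
    int *p = &a;
    printf("a is at %p and holds %d (b = %d)\n", (void *)p, *p, b);
}

int main(void)
{
    show(10, 20);
    return 0;
}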
Calling Conventions --- __cdecl V/s __stdcall
In an earlier post I mentioned that the __stdcall calling convention reduces code size by putting the stack-clearing statement just before the return statement. If this is better than __cdecl, then why do we need the __cdecl calling convention at all?
As we know, in __cdecl the caller clears the stack. This is the ideal behavior when the callee does not know how much stack to clear, which is exactly the situation for functions with a variable number of arguments. The callee will not know how many arguments it is going to receive when its code is compiled, and hence cannot emit a proper stack-adjusting statement. On the other hand, this information is available to the calling function, and an appropriate stack-adjusting statement can be put right after the function call instruction. This is where the __cdecl convention comes to the rescue.
Functions with variable arguments are therefore qualified with the __cdecl calling convention.
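A compilable sketch of that point (the function name and values are only illustrative, and the __cdecl keyword is MSVC-specific; on most compilers it is the default anyway): a variadic function must be __cdecl, because only the caller knows how many arguments it pushed and therefore how much stack to clean up.

#include <stdarg.h>
#include <stdio.h>

/* Variadic, hence __cdecl: the callee cannot know its own argument count
   at compile time, so the caller pops the arguments after the call. */
int __cdecl sum(int count, ...)
{
    va_list ap;
    int i, total = 0;
    va_start(ap, count);
    for (i = 0; i < count; ++i)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

int main(void)
{
    printf("%d\n", sum(3, 1, 2, 3));  /* the caller cleans up all four pushed ints */
    return 0;
}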
Tuesday, December 11, 2007
Calling Conventions
This is the outcome of today's presentation by Anand - our tech lead.
There are four calling conventions, and the Microsoft terminologies for them are:
- __stdcall
- __cdecl
- __thiscall
- __fastcall
__cdecl is the old 'C' way, where the stack is cleared by the calling function. Clearing the stack is nothing but a statement that moves the stack pointer up by some amount (generally the total size of the arguments). So when a call is placed, immediately after the call instruction there is an instruction to move the stack pointer. This way we have an extra instruction associated with every function call, which increases the size of the executable: one extra instruction per call site.
__stdcall is the newer way, where every function is responsible for clearing the stack allocations made for its parameters. That is, the callee is the one who rewinds the stack and moves the stack pointer. In this approach a statement to move the stack pointer is introduced before the function's return, so the number of instructions does not grow with the number of call sites.
__thiscall is the object-oriented way of calling functions. Here the first parameter passed is always the "this" pointer, so any instance member function of a class follows this calling convention. Unlike the other parameters, which go on the stack, the "this" parameter is passed in the ECX register; this is how the "this" pointer is implicitly available to all instance member functions. No function can be explicitly qualified with this calling convention: any instance member function gets it implicitly, but it can be overridden with any other calling convention.
__fastcall is an approach wherein the compiler tries to pass the parameters in registers instead of on the stack, just to get a performance boost. But, as with the variable qualifier "register", if registers are not available then the stack is used for the parameters again.
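To see the difference, here is a minimal sketch (MSVC on 32-bit x86 assumed; function names and values are made up) declaring the same trivial function under three of the conventions. Inspecting the generated assembly shows who pops the stack and where the arguments travel.

#include <stdio.h>

int __cdecl    add_cdecl(int a, int b)    { return a + b; }  /* caller pops the 8 bytes of args */
int __stdcall  add_stdcall(int a, int b)  { return a + b; }  /* callee pops them (ret 8)        */
int __fastcall add_fastcall(int a, int b) { return a + b; }  /* a and b arrive in ECX and EDX   */

int main(void)
{
    printf("%d %d %d\n",
           add_cdecl(1, 2),
           add_stdcall(3, 4),
           add_fastcall(5, 6));
    return 0;
}

__thiscall is left out because, as noted above, it cannot be requested explicitly; it is what instance member functions get by default.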
There are a couple of other posts regarding the specifics of the calling conventions and comparison. If this post made any sense then check out those also.
Sunday, May 6, 2007
Thinking C: Behaviour -- Undefined or Unspecified?
C has been my cherished programming language right from the time I got hooked on computer engineering, and I still love it.
Now here is a discussion and a concept that I recently got to know about through a discussion on IRC (server: irc.freenode.net, channel: ##C).
I was aware of the concept of sequence points and undefined behaviour in C, but it was during this discussion that I got to know about unspecified behaviour.
Generally, both unspecified and undefined behaviour arise because side effects are not required to be complete until a sequence point is reached.
In the case of undefined behaviour, the C standard says nothing about that kind of situation. Implementors are free to do anything: they can simply leave it unimplemented, they may flash an error, or they may follow their own conventions. We will be in no position to predict the behaviour, hence the name.
Eg:
int a[10], i;
/* The array a is filled with some values */
a[i] = i++;
In the case of unspecified behaviour, the C standard gives certain options for handling the situation, and implementors are expected to follow one of those options. Unlike undefined behaviour, they do not have complete freedom; they have freedom of choice. Hence the behaviour of code classified as unspecified can be predicted for a particular compiler on a particular platform.
Eg:
int a = 5;
a = scanf("%d", &a) + a;
Here we are not sure whether scanf is executed first or the second operand a is evaluated first. Hence the unspecified behaviour.
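For experimenting with different compilers, here is a compilable sketch that puts the two examples above side by side (the values are arbitrary). The undefined line is left commented out, since the standard imposes no requirements at all on what it does.

#include <stdio.h>

int main(void)
{
    int a[10] = {0};
    int i = 2;

    /* Undefined behaviour: i is modified (i++) and also read for indexing
       in the same expression with no intervening sequence point. */
    /* a[i] = i++; */

    int n = 5;
    /* Unspecified behaviour: the order in which the two operands of '+' are
       evaluated is not fixed, so n may or may not see the value scanf stored. */
    n = scanf("%d", &n) + n;

    printf("%d %d\n", n, a[i]);
    return 0;
}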