
Sunday, January 1, 2012

MongoDB concurrency - Global write lock and yielding locks

There has been a lot of hue and cry about MongoDB's global write lock. Quite a few people have said (in blog posts, mailing lists etc.) that this design ties down MongoDB to a great extent in terms of performance. I too was surprised (actually shocked) when I first read that the whole DB is locked whenever a write happens - i.e. a create or an update. I can't even read a different document during this time. It did not make any sense to me initially. Before this revelation I was very pleased that MongoDB did not have transactions, and always thought of that as a design choice which avoided locking the DB while running expensive transactions. However this global lock sent me wondering whether MongoDB is worth using at all! I was under the assumption that the art of record-level locking had been mastered by database developers. This made MongoDB look like a tool of the stone age.

Well, I was wrong. It turns out that record-level locking is not that easy (the reasons for that warrant a separate post altogether) and, from what I understand, MongoDB has no plans of implementing it in the near future. However this does not mean the DB stays locked for long durations (long on some scale) on every write operation. MongoDB is designed and implemented differently from other databases, and there are mechanisms in place to avoid delays to a large extent. Here are a couple of things to keep in mind:

MongoDB uses memory-mapped files to access its DB files. So a considerable chunk of your data resides in RAM, which results in fast access - fast reads all the time, very fast writes without journaling and pretty fast writes with journaling. This means that for many regular operations MongoDB will not hit the disk at all before sending the response - including write operations. So the global lock is held only for the duration needed to update the record in RAM, which is orders of magnitude faster than writing to disk. The DB is locked for a very tiny amount of time, and this global lock is after all not as bad as it sounds at first.

But then the entire database cannot be in RAM. Only a part of it (often referred to as the working set) is in RAM. When a record not present in RAM is requested or updated, MongoDB hits the disk. Oh no, wait... so does that mean the DB is locked while Mongo reads from or writes to that (slow) disk? Definitely not. This is where the "yield" feature comes in. Since version 2.0, MongoDB yields the lock when it has to hit the disk: once Mongo realizes it is going to the disk, it temporarily releases the lock until the data from disk is loaded and available in RAM.
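
If you want to see how much time the server actually spends holding this lock, serverStatus exposes some global lock statistics. A quick sketch from the mongo shell - the field names below are from the 2.0 era as I remember them, so treat them as assumptions:

    // mongo shell: inspect global lock stats (field names are assumptions)
    var s = db.serverStatus();
    print("uptime (us):            " + s.globalLock.totalTime);
    print("total lock time (us):   " + s.globalLock.lockTime);
    print("lock ratio:             " + s.globalLock.ratio);
    print("queued readers/writers: " + s.globalLock.currentQueue.readers +
          "/" + s.globalLock.currentQueue.writers);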

Although I would still prefer record-level locking in MongoDB, the two features mentioned above are sufficient to reinstate my respect and love for MongoDB. :)

Tuesday, March 30, 2010

Mozilla @ SJCE : Static Analysis projects

It has been a very long time since I posted anything about the Mozilla related activities going on at my college, SJCE, Mysore. That in no way means an absence of activity. In a previous post I mentioned that, along with the attempt to introduce Mozilla as a standard course, I was working to get the current final year students (who are enrolled under the larger VTU University) to start working on Mozilla, using their final semester project as a means. Well, I am happy to say that this has materialized. 8 final semester students from CS expressed interest in working with the Mozilla community as part of their final year project, and it has been nearly a month since they started their work. Here is a brief write-up about that.

As with most Computer Science students in India approaching the Mozilla community, these 8 students also wanted to do something related to compilers. The JS engine and static analysis are the two projects in Mozilla which come under the compiler banner. These 8 students wanted to work on something substantial which could be presented by two teams of 4 students each as their final semester project, so the bugs that they would be working on had to be related. This was possible only with static analysis, as there are a lot of related tasks available. Also, static analysis would be something new to the students and would give them an opportunity to understand the internals of the compiler (GCC here), like the AST (Abstract Syntax Tree) and other representations of the code. Moreover, the static analysis APIs are exposed in JS and hence the analysis scripts would be written in JS; that way the students would learn JS too. Above all, these students would be doing something genuinely new.

The students could not be asked to start working on the bugs directly. They were new to open source development and the tools used there: Bugzilla, using email as a formal medium of communication, the source control system (to add to the complexity, Mozilla now uses a distributed RCS - Mercurial [hg]), IRC, the Linux development environment, etc. These are the things the students have been learning for this past month or so, in the form of accomplishing the tasks which form the prerequisites for the actual static analysis work: downloading the gcc and mozilla sources from the ftp hosts and from the mercurial repository respectively, applying the mozilla-specific patches to gcc for plugin support, and so on. These are all listed here. Note that some things, like installing all the dependency packages needed to build these applications from source, or learning to use the Linux command line itself, are not on that page but were new to these students nonetheless.

All the students have been putting in substantial effort and have picked up the traits of an open source hacker pretty quickly. We have had a few IRC meetings and a lot of formal communication over email. In parallel we were also working towards shortlisting 8 static analysis bugs. Based on the feasibility of a bug being completed by an amateur developer within a span of 2.5 months, and based on the students' interest, we finally decided on these 8 bugs:
  1. Bug 525063 - Analysis to produce an error on uninitialized class members
  2. Bug 500874 - Static analysis to find heap allocations that could be stack allocations
  3. Bug 500866 - Warn about base classes with non-virtual destructors
  4. Bug 500864 - Warn on passing large objects by value
  5. Bug 528206 - Warn on unnecessary float->double conversion
  6. Bug 526309 - Basic check for memory leaks
  7. Bug 542364 - Create a static analysis script for detecting reentrancy on a function
  8. Bug 500875 - Find literal strings/arrays/data that should be const static
These tasks are good, challenging and provide an opportunity to understand compilers very closely.

Currently the students have downloaded gcc, applied the patches, built it along with the Dehydra plugin support and are ready to run static analysis on the mozilla code. They are now trying to run simple analysis scripts, like listing all classes in the mozilla code, or all classes and their corresponding member functions. There is still quite a long way to go, but it has been a really good start. Let's wait and watch what great feats are in the pipeline.
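
For the curious: a Dehydra analysis script is just a JavaScript file defining a few callbacks that GCC invokes as it compiles. A minimal sketch of the "list all classes" kind - the callback and property names are from my memory of the Dehydra API, so treat them as assumptions:

    // listClasses.js - print every class and every member function seen
    function process_type(t) {
      if (t.kind == "class")
        print("class " + t.name);
    }

    function process_function(decl, body) {
      if (decl.memberOf)  // set only for member functions
        print(decl.memberOf.name + "::" + decl.shortName);
    }

The script is handed to the patched gcc through its plugin flags, roughly: g++ -fplugin=gcc_dehydra.so -fplugin-arg-gcc_dehydra-script=listClasses.js file.cpp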

I hope to keep this blog updated at the same pace at which the students are working.

Good luck to the students. :-)

Friday, April 17, 2009

Profiling (timing) the firefox build process

It's been nearly 2 years now that I have been building Mozilla Firefox myself on my machines - various machines of varying capacity.

Initially it was a desktop with an Intel Pentium 4, Single Core (obviously), 2.6GHz, 256MB RAM, running Slackware 11. It probably used to take about 1.5 to 2 hours (I do not remember now). I never profiled it at that time; getting a mozilla build itself was a big achievement for me.

After that it was another desktop with an Intel Core 2 Duo, Dual Core (obviously, again), 2.4GHz, 2GB RAM, running Windows XP. This generally took about 45 minutes. As far as I remember, I had several other programs running while firefox was building.

This discussion of the amount of time taken to build firefox came up a few times on IRC, and I myself had the wish to time the build process. Of late, that is from about last week, this wish became very strong, and today I finally did time it, that too on two machines: my laptop and my desktop. This blog post is the result of these two profiling tasks. Here are the results.

Note: By profiling I did not do anything complicated or intricate. I just used the "time" utility, which tells you how much time a command takes.
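
In other words, it was just a matter of prefixing the build command with time. The exact command depends on your mozconfig, but a typical invocation back then looked something like this (treat it as a sketch):

    $ time make -f client.mk build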

1) On my laptop. --- Build was the only application running apart from the services.
Specs:
IBM Thinkpad T60p. (The one which heats up a lot)
Intel Centrino Duo, T2600 @ 2.16GHz, (Dual Core)
2GB RAM
Windows XP Pro SP2.

The results shown by the "time" utility are:

real 36m33.476s
user 4m17.776s
sys 4m36.271s

The build was done in the MinGW shell that the mozilla build system provides. I am not sure to what extent these numbers are reliable, but the real time is pretty much acceptable.

Edit:

Today I ran the build command from the shell history, and hence the "time" prefix got in automatically and the build was timed again. Surprisingly, today's times are way off from the last ones. Here are the times:

real 74m26.484s
user 9m53.015s
sys 8m18.365s

Well, this time a lot of other apps were running: Firefox (2 instances), Chatzilla, Outlook, several command windows, Komodo Edit (which again is like another firefox), notepad and a couple of explorer windows. So the time being doubled is not a surprise. Guess this just gives some perspective. :)

2) On my desktop --- Apart from the build process, FF with Chatzilla was running and several instances of bash (Terminals) were running.
Specs:
DELL Optiplex 755 (Sleek, powerful, sometimes fragile)
Intel Core 2 Quad, Q6600 @ 2.4Ghz
4GB RAM
Ubuntu, Gutsy Gibbon

And the results are:

real 20m40.497s
user 19m4.320s
sys 1m17.885s


The two sets of times are a little confusing: real is wall-clock time, while user and sys are CPU time summed across all cores, which is why they need not add up to real. But real is all that matters, as that is how long I have to wait for the build to be ready.

If you are planning a build then this info might help you plan things accordingly.

Happy building.


Friday, April 3, 2009

Making firefox use a little less memory

A lot of people tell me that firefox uses a lot of memory, and that its usage grows like anything when it runs for a long duration. From developer discussions and some blog posts/forums, it appears that prior to FF3 memory leaks formed a major part of this. Apart from that, one of the important features, "bfcache - Back/Forward Cache", which, arguably, makes back and forward navigation very fast, uses a lot of memory - on one of the pages I read, apparently up to 4MB per page. This cache essentially keeps the parsed HTML of the page in memory along with its JavaScript state. Sometimes, or more like most of the time, this can be very taxing. You might be the kind of person who does not really use Back-Forward navigation, and when you do, you are OK with a slight delay (which may not be perceivable with a good connection). If indeed you are that kind, then disabling this bfcache will probably make your firefox eat less memory.

The setting/preference that controls this behavior is: browser.sessionhistory.max_total_viewers

By default it is -1, which lets Firefox pick a limit automatically based on how much RAM the machine has. Setting it to 0 (zero) disables this cache completely; essentially you are telling Firefox not to store the state of any document in memory. If you want to keep the feature with a saner limit, set it to an integer representing the number of pages you want saved in memory.

Like any other pref not exposed through the Options dialog, this has to be edited by visiting the page about:config. Open it in a tab, key in (or copy and paste) the preference name, and double click the entry to edit it.
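
If you prefer config files, the same thing can go into the user.js file in your profile directory; a one-line sketch:

    // 0 disables the bfcache entirely; a small positive number caps it instead
    user_pref("browser.sessionhistory.max_total_viewers", 0);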

Maybe you can now lose less hair, as your other applications will run smoothly. :P

Hari Om.

Thursday, July 3, 2008

Optimized resource utilization by Firefox

I have now started working very closely with firefox, and in the course of it I have discovered a few things. Firefox does a really good job of optimizing resource usage, mainly network resources. I specifically mean the connections established to remote servers and DNS resolution. The second is handled by caching, whereas the first is handled by connection sharing (at the network layer).

For example, say there are two tabs open in mozilla. In the first one some page is loading, say http://sribrahmana.blogspot.com. While it loads, the DNS resolution is done, a TCP/IP connection is established, and application layer connections are made over it. Now if you try to load the same URL in the second tab, a lot of redundant work is avoided. The DNS resolution result which is already available is used instead of resolving the name again; this is possible because of the DNS cache. (There are, of course, ways to disable this cache, in which case the redundancy comes back.) And since there is already a network layer connection established for the first tab, and there is every possibility that it is still alive, the same connection is used for the second tab also. This results in lower network resource utilization and also in lower memory consumption, since the memory that new sockets would take is saved.
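
The DNS cache knobs live in about:config; the pref names below are from what I remember of Necko's options, so treat them as assumptions:

    // 0 entries effectively disables the in-memory DNS cache
    user_pref("network.dnsCacheEntries", 0);
    // how long (in seconds) a resolved entry stays cached
    user_pref("network.dnsCacheExpiration", 0);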

All of this is cool, but not precisely what I want. I in fact do not want it to share connections. I do not yet know how to tell FF to do that; I will put it here once I figure out how.

And just as an FYI, here is how you can override the cache: MDC doc

Tuesday, December 11, 2007

Calling Conventions

This is the outcome of today's presentation by Anand - our tech lead.
There are four calling conventions, and the Microsoft terminologies for them are:
  1. __stdcall
  2. __cdecl
  3. __thiscall
  4. __fastcall
As is known, a calling convention tells who, amongst the caller and the callee, will clear the parameters from the top of the stack when the function call completes.

__cdecl is the old 'C' way, where the stack is cleared by the calling function. Clearing the stack is nothing but moving the stack pointer up by some number (the total size of the arguments, in bytes). So immediately after every call instruction there will be an instruction to move the stack pointer. This extra instruction accompanies every single function call, which increases the size of the executable. The upside is that only the caller knows how many arguments it actually passed, which is why variadic functions like printf need this convention.

__stdcall is the newer way, where every function is responsible for clearing the stack allocations made for its parameters. That is, the callee is the one who rewinds the stack and moves the stack pointer: before the function returns, a single instruction inside it moves the stack pointer. This way the number of cleanup instructions does not grow with the number of call sites.

__thiscall is the object-oriented way of calling functions. Here the first parameter passed is always the "this" pointer, so every instance member function of a class follows this calling convention. Unlike the other parameters, which go on the stack, the "this" parameter is passed in the ECX register. This is how the "this" pointer is implicitly available to all instance member functions. No function can be explicitly qualified with this calling convention: any instance member function gets it implicitly, though it can be overridden with another calling convention.

__fastcall is an approach wherein the system will try to pass the first few parameters in registers instead of on the stack, just to get a performance boost (on MSVC x86 the first two suitable arguments go in ECX and EDX). But as with the variable qualifier "register", if registers are not available then the stack is used for the parameters again.
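
To make this concrete, here is how the qualifiers look in MSVC-flavored C++. This is a 32-bit x86 sketch with made-up function names; on x64, MSVC ignores these keywords:

    // caller cleans the stack; the convention varargs functions need
    int __cdecl    sum_cdecl(int a, int b);
    // callee cleans the stack; the Win32 API style
    int __stdcall  sum_stdcall(int a, int b);
    // a in ECX, b in EDX; callee cleans whatever spills to the stack
    int __fastcall sum_fastcall(int a, int b);

    struct Counter {
        int n;
        // implicitly __thiscall: 'this' arrives in ECX
        int bump(int by) { return n += by; }
    };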

There are a couple of other posts regarding the specifics of the calling conventions and their comparison. If this post made any sense, check out those also.

Tuesday, October 16, 2007

Mozilla - Bookmarks' keywords & Quick Searches

This world is really full of amazing things. However much we discover and find out, there will be something interesting right next to us that we have not yet figured out. This post is about one such thing, which was right here all along and I had not figured it out.

The first hints of this came to me when I saw one of my colleagues at NI, Pramod S, using short forms for Google (g) and Wikipedia (wp) when searching. I thought he had come across some hack and tweaked his browser. But just today I realized that it is in fact a feature in Firefox.

For any URL or webpage that we bookmark, we can assign a keyword - a short form - which can be entered in the URL bar to visit that bookmarked page. This is really useful for pages we visit over and over again. In my case http://mxr.mozilla.org/seamonkey/ is one site that I cannot live without, so I have keywords for the 3 different searches available on that page, viz:
  • mxrf - for file search (find)
  • mxri - for identifier search
  • mxrt - for free form text search

This feature is really handy and increases browsing speed to a great extent.

How do you get this working, i.e. how do you assign the keywords? This again is really simple. For existing bookmarks, just go to the bookmark's properties by right-clicking on it, and you get a small dialog with a text field for Keyword. Key in whatever you want and click OK. That's it, you are ready to fire the internet with your keywords. For new bookmarks it is easier still: when you are creating a bookmark with the helper dialog you get the keyword field right there, which you can fill in or leave blank (either to be filled later or not at all).

This is not the end; there is yet another enhancement to this. Certain bookmarked URLs have query strings, as in the case of search URLs. The google URL http://www.google.com/ is followed by a query string of the form "search?hl=en&q=search string", which carries the search string entered. Such strings work with keywords too, meaning you can search for something on google using the keyword g like this:

"g any-search-string" - this will be converted to the appropriate URL, with a proper query string as specified in the bookmark properties. As it appears, this of course requires a little extra work. When saving a URL as a bookmark, for instance when saving http://www.google.com/, you save it along with the query string, but with the search string substituted by "%s"; whatever string you type after the keyword will then replace every occurrence of %s in the URL. So this is how the bookmark URL will look:

http://www.google.com/search?hl=en&q=%s ---- with g as the keyword. Now, when using this bookmark, if you enter something like this in the URL bar:

"g bookmarks" -- this becomes -- http://www.google.com/search?hl=en&q=bookmarks -- finally giving you search results for bookmarks.

Now this is how I have arranged my mxr bookmarks - keyword to URL, roughly like this (the query paths below are illustrative; check your own bookmark properties for the exact strings):
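
    g    -> http://www.google.com/search?hl=en&q=%s
    mxrf -> http://mxr.mozilla.org/seamonkey/find?string=%s
    mxri -> http://mxr.mozilla.org/seamonkey/ident?i=%s
    mxrt -> http://mxr.mozilla.org/seamonkey/search?string=%s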

Now, does this method appear too complex and cumbersome? Well, mozillians come to your rescue yet again. You can save such enhanced search bookmarks, called "Quick Search" bookmarks, very easily like this:

Go to the search page, right-click on the search bar, and click on "Add a keyword for this search". That will give you the small bookmarking dialog, with a field for the keyword. All that you have to do is enter the keyword that you want and click OK. All the URL, query string and %s headache is taken care of by Mozilla.

Well, mozilla is here just to make your browsing experience easier, safer and more worthwhile, isn't it?

And I am proud to be associated with this awesome community. ;-)

Happy and Fast Browsing.

Subtle things about s/w code Performance

Any software product is benchmarked for its performance before it is shipped out. No company/developer wants complaints about the performance of the product from its users; this one thing can make users simply move away to another, better performing product. Moreover, such a complaint would imply inefficient coding and, to some extent, incompetence of the developers, which we software developers can't really take.

Performance is generally measured with 2 parameters:
1) Time taken by the application
2) Memory consumed by the application

Typically, on current day desktop systems the memory criterion is almost ruled out, unless the application is really memory hungry, like 3D games and other high end stuff.

But time has always been a concern. Many algorithms have been developed to solve the same problem, with each algorithm being a good choice under some set of conditions. Those are all the high-funda things involving math: O(n) analysis, exponential complexity, etc.

Apart from these, there are a lot of very small things which appear trivial and insignificant but eventually make a lot of difference. I learnt about 2 such things at my workplace, NI, yesterday while working on my first task here. I of course cannot speak of the nature of my work and what exactly I was doing; here are the generic things that I learnt:


1) Avoid calling a function in a loop; instead pass the data you are iterating over to the function being called.


Suppose you have a large amount of data to be processed, a helper function and an actual processing function. The most common way of accomplishing such a task is to iterate through the data in the helper function and, for every single instance of the data, call the actual data processing function. In a lot of cases this can be optimized to a great extent. By iterating in the helper function and calling the actual function per instance, we introduce a lot of function-call overhead, which eats up a noticeable chunk of time. If possible, which it indeed is in many cases, we should instead pass the whole data chunk to the actual data processing function, iterate over it there, and do the necessary processing then and there as we fetch each instance of the data.

PS: By data instance here I mean one data unit - something like an element if the data chunk is an array, or one structure if it is an array of structures.

This way we reduce the overhead caused by the extensive function calls, thereby improving performance.
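
A minimal C++ sketch of the idea, with made-up names:

    #include <cstddef>

    void process_one(int value);  // the per-element worker (declaration only)

    // Version 1: the helper loops and calls the worker once per element.
    // Every iteration pays the call overhead (argument setup, call, return).
    void helper_slow(const int* data, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            process_one(data[i]);          // n function calls
    }

    // Version 2: hand the whole chunk to the worker and loop inside it.
    // One call in total; the loop body runs free of call overhead.
    void process_all(const int* data, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            // ... process data[i] in place ...
        }
    }

    void helper_fast(const int* data, std::size_t n) {
        process_all(data, n);              // 1 function call
    }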

2) If we have an if/else block which is hit a lot of times continuously, then we should put the code for the most probable case in the if block.

This is particularly interesting. Yesterday, when my tech lead told me about this, I was really surprised that optimization can be done at this level. I liked it very much.

Consider the following code fragment:

for (/* some number of iterations */) {
    if (aCondition) {
        .....
    } else {
        .....
    }
}
Something like the above, when compiled into machine language, would look roughly like this:

loopLabel:  instruction for condition check
            JMP to elseLabel if false
            code for the true case (if block)
            some more code for the true case
            loop condition check
            JMP to loopLabel if true

elseLabel:  code for the false case (else block)
            some more code for the false case
            loop condition check
            JMP to loopLabel if true
Here the if/else block is hit repeatedly, as it is the main body of the loop. In the assembly code, when the if condition turns out to be true, the code for the true case is executed and then a single JMP is made back to the loop beginning. BUT, in the else case there are "2" JMP instructions: one to go to the else block and another to go back to the loop beginning. This double jump is an overhead and takes more execution time. Hence we should design the if/else in such a way that the code for the most common, most probable case sits in the if block, so that the bulk of the execution happens with only a single jump.

This might seem so subtle as to lead you to think that the performance gain from this trick is negligible. It's not so. Believe me, the gain is really good in most cases where the if/else block is hit repeatedly.

So stop jumping around and start improving your code. ;-)