
Monday, July 17, 2017

$PATH for the pathless

With the rise in popularity of scripting languages and their adoption in the industry, a lot of first-time professional programmers (a.k.a. fresh graduates) are being exposed to the Linux command line environment, and sometimes Linux as a system itself is new to them. So they tend to have difficulty understanding how a program is actually executed and/or what is needed to make a program run. This, coupled with the magic spewed around by tools like rvm / chruby for Ruby and nvm for Node.js (and their ilk), just adds to the confusion about how things are executed in the first place. I am not at all saying that such tools are bad. I use them myself and they are very helpful. But if a person is using them without getting the basics right, he/she is going to have a mighty problem wrapping their head around how things are running!

In this context, I have found myself explaining what $PATH means and what its purpose is a few times of late. So instead of repeating myself every time, I thought I would just document what I know about $PATH to help the pathless get on the right PATH (or the PATH OF RIGHTEOUSNESS if you will).

GEEK NOTE : This post is aimed at absolute newbies, folks who just got started with something like Ubuntu and are fiddling around with some of the cool new technologies like Node.js or Ruby. So this post will not go into the depths, nor will it try to be GEEK ACCURATE, i.e. it will not deal with things like POSIX compliance or the behaviour on non-Linux systems, etc. For that matter, this will to a large extent be specific to Ubuntu and most likely Bash. We will also not try to distinguish between login and non-login interactive shells. As the screenshots below indicate, the assumed environment is the "Terminal" application in Ubuntu.

So, let's start with the basics and a few assumptions.

Assumptions :

1. We are primarily interested in understanding how programs are run in the Linux command line environment.
2. We will restrict ourselves to the "Terminal" program of the Ubuntu system as the environment. However, most of the following applies to similar programs on other flavours of Linux.

Basics :

Whenever you launch the "Terminal" program, what you are actually doing is launching a "Shell". As with any type of software in the Linux world, choices are abundant and there are a lot of shell programs. However, the "Terminal" program in Ubuntu (as of this writing) uses the "Bash" shell and that is what we will stick to. So on launching "Terminal" you are effectively launching the Bash shell.

The "Terminal" application in Ubuntu 16.04


What is a Shell?

In the simplest terms, a Shell is a program which helps you run other programs. It is like an eternal waiter who just waits for you to tell it what program to run; when you give it a program to run, it runs it, tells you the result of running that program and goes back to waiting. Broadly speaking this is all that the shell does : Wait - Run a program - Show results - Wait. Wash-Rinse-Repeat.

If you have used the Linux command line (a.k.a. the shell), you might be thinking that the shell certainly does a lot more than this. You might say "I run so many commands on the shell and it allows me to do a variety of things like search for files, list directories, check the date etc etc. What about all those functionalities??!". Well, almost all of those actions are nothing but the shell running some program or the other. Almost every "command" that you have run is an independent program provided with the Linux system by default. Whenever you "run a command", you are just telling the shell to search for that program on the system and run it. The standard commands that most Linux users are familiar with are stored in certain predefined standard directories / locations. On Ubuntu those locations are "/bin" and "/usr/bin". While "/bin" has the core system commands, "/usr/bin" has the commands added by other software installed on the system.

A listing of the /bin directory showing system program executable files. Notice the programs corresponding to the most commonly used commands like "cp", "ls", "mkdir" etc
While these are what could be called "standard system programs", the shell is not limited to them. The shell can run just about any program (technically, any executable file) on the system. All you have to do is specify the full path to the location of that file on the command line and the shell will execute it, reporting the result of that execution. So if you have a file in a deep directory structure as shown below :

The full path to "my_program" will be /home/srirang/work/path_learning/custom_programs/my_program
When you have such a setup, to run the file "my_program" you just have to type the full path to it and hit enter. The shell will take the file at that location and run it.

/home/srirang/work/path_learning/custom_programs/my_program
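
For example, here is roughly what that looks like in a terminal session (a hypothetical sketch; the directory and program name are just the ones from the example above):

# Create the directory tree and a tiny executable script (hypothetical example)
mkdir -p /home/srirang/work/path_learning/custom_programs
cat > /home/srirang/work/path_learning/custom_programs/my_program <<'EOF'
#!/bin/bash
echo "my_program was run!"
EOF
chmod +x /home/srirang/work/path_learning/custom_programs/my_program

# Run it by giving the shell the full path
/home/srirang/work/path_learning/custom_programs/my_program
# my_program was run!
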
There might be two questions now in your head :

1. For the standard commands I never specified the full path. I just specified the program name and shell ran it without any issues. How did that happen?
2. If I have to specify the full path each and every time, that seems mighty tedious, especially for programs that I run very often. Is there no easier way?

The answer to both of these is the "PATH" environment variable.

You need not specify the full path to the executable program all the time. When you specify only the program name, the shell has a mechanism to search for that program on the system and run it. Searching the entire system for a program every time would be massively inefficient. So the shell lets us list the specific locations it should search whenever we do not specify the full path. The way to specify this is by putting all these locations in an environment variable called "PATH".

Whenever a program name is specified without the full path, the shell will take the directory locations listed in the "PATH" environment variable one by one and search for the program in each of those locations sequentially. So if you have one or more custom programs that you run frequently, it is best to add the directory locations of those programs to the PATH variable.
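
You can also ask the shell where it is finding a command instead of searching by hand. This is a quick way to see the PATH lookup in action (exact output may vary slightly on your machine):

type -P ls        # prints the full path the shell found, e.g. /bin/ls
command -v mkdir  # same idea, e.g. /bin/mkdir

# A program whose directory is not listed in PATH is simply not found by name
my_program        # bash: my_program: command not found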

Environment variables : Brief introduction :

As mentioned earlier PATH is an environment variable. This post will not go into the details of what environment variables are. For the sake of this discussion here are a few points to remember :
  1. Environment variables are like the settings for the shell. The values of the environment variables are used by the shell to decide how it should act.
  2. By convention, environment variable names use uppercase letters, numbers and underscores.
  3. You can have as many variables as you want, although the shell will use only the ones that are meaningful to it.
  4. You can set the value of an environment variable like this :

    VARIABLE_NAME=value

    Examples:
    MY_VAR=1
    RUBY_VERSION=2.0.0
    LANG=en_IN
  5. Very often, to make sure that the value of an environment variable is available to the programs launched from the shell (and not limited to the shell itself), the "export" keyword is used while setting the value of the variable. Examples :

    export MY_VAR=1
    export RUBY_VERSION=2.0.0
    export LANG=en_IN
  6. You can read the value of a variable by prefixing the variable name with the $ symbol, like this :

    $VARIABLE_NAME
  7. The value of a variable can be used to set the value of another variable by using the same $ symbol. Example :

    RUBY_DIRECTORY=/usr/lib/$RUBY_VERSION/
  8. If you want to see the value of an environment variable, use the "echo" command, as in the example below :
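
For example, setting a couple of variables and reading them back with echo looks like this (the values are just illustrations):

export RUBY_VERSION=2.0.0
echo $RUBY_VERSION
# 2.0.0

RUBY_DIRECTORY=/usr/lib/$RUBY_VERSION/
echo $RUBY_DIRECTORY
# /usr/lib/2.0.0/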

Managing the PATH variable :

Here are a few characteristics of the PATH variable :
  1. The value of the PATH variable is a string.
  2. There can (and will) be multiple directory locations mentioned in the PATH variable.
  3. The individual directory locations are separated by the : (colon) character.
  4. The most common default PATH value on Ubuntu is typically this :

    /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
  5. To see the current value of PATH, use echo $PATH.
  6. The most common way of adding a new directory location to the PATH is this :

    export PATH=$PATH:/the/new/directory/location
Once you add a directory location to the PATH like this, any executable program in that directory can be run on the command line directly by specifying just the program name.

On my system, I have installed Ruby into a particular directory and I want the ruby program to be available on the command line. So this is how my PATH variable looks :

Contents of the PATH variable on my Ubuntu system.
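
As a rough sketch of the same thing (the Ruby install directory below is hypothetical; use whatever location you actually installed to):

# Append the directory containing the ruby executable to PATH (directory is hypothetical)
export PATH=$PATH:/home/srirang/rubies/ruby-2.0.0/bin

echo $PATH
# /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/srirang/rubies/ruby-2.0.0/bin

# Now 'ruby' can be run by name from anywhere
ruby --version
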
A Few Gotchas :

  1. Environment variables are local to each shell. So if you are running two terminal windows or tabs, the shells running in them will have their own set of environment variables. The value that you set in one shell is not available in the other shells.
  2. The value stored in an environment variable is available only until the shell process quits. The values are not persisted beyond the lifetime of the shell. There is no such thing as saving environment variables to the hard disk.
This would imply that any PATH settings that you would need will have to be done every time you open up a new Terminal window or tab (a.k.a every time you launch a shell). That sounds awfully painful and tedious. Sure it is. But Linux has a solution for you.

Like many Linux programs, the Bash shell (and almost every other shell too) supports a startup configuration file. This file lives in your home directory with the name .bashrc (yes, the name starts with a dot). It can contain any number of commands that you would otherwise run on the command line, and all of those will be executed every time a Bash shell is launched (a.k.a. every time you open a new Terminal window or tab).

The common practice is to put all the commands setting the values of the necessary environment variables into this .bashrc file, as in the sketch below. That way you are guaranteed that every time you open a new Terminal window or tab, all of these environment variables will be set to the appropriate values. This is why magic tools like "rvm" or "chruby" or "nvm" ask you to add some stuff to your .bashrc file.
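
For example, the tail end of a .bashrc might have lines like these (the directories here are placeholders; tools like rvm and nvm append their own, slightly fancier, versions of the same idea):

# ~/.bashrc -- executed for every new interactive Bash shell

# Custom environment variables
export EDITOR=vim

# Make locally installed programs available by name (directories are examples)
export PATH=$PATH:$HOME/bin
export PATH=$PATH:$HOME/rubies/ruby-2.0.0/bin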

This should give you an initial understanding of what the PATH variable is for and what someone means when they say "PATH is not set up properly" or "PATH needs to be set up". From this point on, you should be able to google around for additional information and make sense of it. If something mentioned here is not clear, drop your questions in the comments and I will try my best to respond with answers.

Sunday, October 27, 2013

Invalid credentials : OmniAuth + oAuth2 + Rails 4 encrypted cookie store + simultaneous requests

OmniAuth is a very well known gem in the Ruby/Rails world. Almost every Rails application out there is probably using it to authenticate with one of the various mechanisms it supports. OmniAuth is just awesome!

I have been using OmniAuth for about 3 years now in various Rails projects and it has worked very well, although I have had to monkey patch it once or twice to allow me to exploit some of the Facebook features (like authenticated referrals). But all in all, it just works as advertised and you will have an authentication system in pretty much no time at all.

Yesterday, however, I was having quite a hard time getting OmniAuth to do a simple "Login with Facebook" oAuth2 authorization. It was something that had worked seamlessly on innumerable previous occasions, but yesterday it just kept failing repeatedly, succeeding only once in a while. And it always failed with the same obscure error, "Invalid Credentials", during the callback phase. (OmniAuth operates in three phases : Setup, Request and Callback. Apart from the OmniAuth wiki on github, this is a good place to read about it : http://www.slideshare.net/mbleigh/omniauth-from-the-ground-up). The fact that the error message was not so helpful made the whole process a lot more frustrating. After some hunting on the inter-webs I found that the culprit could be a bad "state" parameter.

Wait, what is this "state" parameter?

Background : oAuth2 CSRF protection


oAuth2 specifies the use of a non-guessable, secure random string as a "state" parameter to prevent CSRF attacks on oAuth. More details here : http://tools.ietf.org/html/rfc6749#section-10.12. This came out almost a year back and many oAuth providers already implement it, including Facebook. OmniAuth also implemented it last year. Although not written with the best grammar, this article will tell you why this "state" parameter is needed and what happens without it.

To sum it up, the oAuth client (our web application) creates a random string, stores it in an accessible place and also sends it to the oAuth provider (ex : Facebook) as "state" during the request phase. Facebook keeps it, authenticates the end user, asks for permissions and, when granted, sends a callback to our web application by redirecting the user back to our website with the "state" as a query parameter. The client (our web application) compares the "state" sent by the provider with the one it had stored previously and proceeds only if they match. If they don't, then there is no proof that the callback our web application received is actually from the provider. It could be from some attacker trying to trick our web application into thinking (s)he is someone else.

In case of OmniAuth oAuth2, this state parameter is stored as a property in the session with the key 'omniauth.state' during the request phase. The result of the request phase is a redirect to the provider's URL. The new session with the "state" stored in it will be set on the client's browser when it receives this redirect (302) response for the request /auth/:provider (This is the default OmniAuth route to initiate the request phase). After the provider (Facebook) authenticates the user and user authorizes our application, the provider makes a callback to our application by redirecting the user back to our web application at the callback URL /auth/:provider/callback along with the "state" as a query parameter. When this callback URL is requested by the browser, the previously stored session cookie containing the 'omniauth.state' property is also sent to our web application.

OmniAuth checks both of these and proceeds only if they match. If they don't match it raises the above mentioned "Invalid Credentials" error. (Yeah, I know, not really a helpful error message..!).

Ok, that is good to hear, but why will there be a mismatch?


A mismatch is possible only if the session cookie stored on the user's browser is changed such that the 'omniauth.state' property is removed or altered after the request phase has set it. This can happen if a second request to our web application is initiated while the request phase of oAuth is running, and it completes after the request phase completes but before the callback phase starts. Sounds complex? The diagram below illustrates it.





The diagram makes it clear as to when and how the 'omniauth.state' gets removed from the session, leading to the error. However, apart from the timeline requirements (i.e. when requests start and end), there is another essential criterion for this error to occur :
The response of the "other simultaneous request" must set a new session cookie, overriding the existing one. If it does not explicitly specify a session cookie in the response headers, the client's browser will retain the existing cookie and 'omniauth.state' will be preserved in the session.
Now, from what I have observed, Rails (or one of the Rack middlewares) has this nifty feature of not serializing the session and not setting the session cookie in the response headers if the session has not changed in the course of processing a request. So, in our case, if the intermediate simultaneous request does not make any changes to the session, Rails will not explicitly set the session cookie, thereby preventing the loss of the 'omniauth.state' property in the session.

Ok, then why will the session cookie change and lose the 'omniauth.state' property?


One obvious thing is that the "other simultaneous request" might change the session - either add or remove or edit any of the properties. There however is another player involved.

This is where the "Encrypted Cookie Store" of Rails 4 comes into the picture. Prior to Rails 4, Rails did not encrypt its session cookie. It merely signed it and verified the signature when it had to de-serialize the session from the request cookie. Read how Rails 3 handles cookies for a detailed breakdown. Rails 4 goes one step further and encrypts the session data with AES-256 (along with the old signing mechanism; more details on that coming up in a new post). The implementation used is AES-256-CBC from OpenSSL. I am not a cryptography expert, but the way I understand it, the ciphertext differs every time mainly because the Rails encryption scheme initializes the encryptor with a random initialization vector for every encryption (implementation here), so encrypting the same session plaintext twice produces two different ciphertexts. Either way, the session cookie contents are new for every request even when the actual session object or session contents remain unchanged. As a result Rails will always set the session cookie in the response header for every request, and the browser will update that cookie in its cookie store.

In our case, this results in the session being clobbered at the end of the "other simultaneous request" and we end up losing the 'omniauth.state' property and oAuth fails.

Umm.. ok, but when and how does this happen in real world, if at all it can?!


All these requirements/constraints described above, especially the timing constraints, make one wonder if this can really happen in the real world. Well, for starters it happened to me (hence this blog post..!!). I also tried to think of scenarios other than mine where this could happen. Here are a couple that I could think of :

Scenario - 1) FB Login is in a popup window and the "Simultaneous request" is a XHR - Ex : an analytics or tracking request.

Here is the flow :
  1. User clicks on a "Login with FB" button on your website.
  2. You popup the FB Login page in a new popup window. Request phase is initiated. But there is a small window of time before the redirect response for '/auth/facebook' is received and 'omniauth.state' is set.
  3. During that small window of time, in the main window, you send an XHR to your web app to, may be, track the click on the "Login with FB" button. You might do this to just track usage or for some A/B testing or to build a funnel, etc. This request sends the session without the 'omniauth.state'.
  4. While the XHR is in progress, the redirect from the request phase is complete and the session with 'omniauth.state' is set. The user now sees FB Login page loading and proceeds to login once it is loaded.
  5. While the user is logging in to FB and approving our app, the XHR has completed and has come back with a session without 'omniauth.state'. This is stored by the browser now.
  6. Once the user logs in and approves your app, the callback phase starts. But the session sent to your web app is now missing the 'omniauth.state'. 
  7. oAuth fails.
How big a deal is this scenario?

If you are indeed making an XHR in the background, then this scenario needs to be taken care of. Since the "other simultaneous request" is automatically triggered every time, it is very likely that the session will get clobbered.

How to solve this?

You can first send the XHR and then, in the response handler of that XHR, open the FB Login page in the popup. Also have a timeout, just to make sure you don't wait too long (or forever) for a response to that XHR.

Alternatively, if you can push your tracking events in a queue stored in a cookie, you can do that and then open the FB Login page. Once the FB Login completes, you can pull that event out of the queue and send it. As a backup have a code that runs on every new page load to look for pending events from the queue in the cookie and send those events.

With HTML5 in place, it's probably better to use localStorage for the queue than the cookie. But again that needs the user's permission. Your call.

Scenario - 2) FB Login is in the same window/tab but User has the website opened in two tabs.

Here is the flow :
  1. User has your website opened in a browser tab - Tab-1
  2. User opens a link on your website in a second tab - Tab-2 (Ctrl + Click or 'Open in a new tab' menu item). This request sends the session without 'omniauth.state'.
  3. While that Tab-2 is loading, user clicks on "Login with FB" in Tab-1 initiating the request phase.
  4. If the request loading in Tab-2 is a little time consuming, the redirect of the request phase of oAuth in Tab-1 completes before the request in Tab-2, setting the session with 'omniauth.state'. After that the FB Login page is shown and the user proceeds to login and authorize.
  5. While the user is logging in, the request in Tab-2 completes, but with a session that is missing 'omniauth.state'.
  6. After logging in to FB, the callback phase is initiated with a redirect to your web app, but with a session that doesn't have 'omniauth.state'. 
  7. oAuth fails. 
How big a deal is this scenario?

Not a big deal actually. In your web app, in the oAuth failure handler, you can just redirect the user back to /auth/facebook, redoing the whole process, and guess what - this time it will succeed, and that too without the user having to do anything, because the user is already logged in to FB and has already authorized your app. But just to be on the safer side, you would want to be careful about this loop going infinite (i.e. you start FB auth, it fails and the failure handler restarts the FB auth). Setting a cookie (different from the session cookie) with the attempt count should be good. If the attempt count crosses a certain limit, send the user back to the homepage or show an error page or show a lolcats video - c'mon, be creative.

Ok, those are two scenarios that I could think of. I am not sure if there are more.

Can OmniAuth change something to solve this?


I believe so. If OmniAuth used a separate signed and/or encrypted cookie to store the state value, instead of the session cookie, none of this session clobbering would result in the loss of the state value. OmniAuth is a Rack based app and relies on the Session middleware. I am not entirely sure, but it could probably use the Cookie middleware instead: just set its own '_oa_state' cookie and use that during the callback for verification.

Will you send a pull request making this change?


I am not sure. I first will hit the OmniAuth mailing list and find out what the wise folks there have to say about this. If it makes sense and nobody in the awesome Ruby community provides an instant patch, I will try and send a patch myself.

THE END
 
Ok, so that was the awesome ride through the oAuth workings inside the OmniAuth gem. Along the way I got to know quite a bit of Rails and also Ruby internals. Looking forward to writing posts about those too. Okay, okay.. fine. I will try and keep those posts short and not make them this long..!!

Till then, happy oAuthing. :-/ !

P.S : Security experts, excuse me if I have used "authentication" and "authorization" in the wrong places. I guess I have used them interchangeably, as web applications typically do both with oAuth2.

Wednesday, October 16, 2013

Creating Wildcard self-signed certificates with openssl with subjectAltName (SAN - Subject Alternate Name)

For the past few hours I have been trying to create a self-signed certificate covering all the sub-domains of my staging setup, using a wildcard subdomain.

There are a lot of guides and tutorials on the internet which explain the process of creating a self-signed certificate using openssl in a good amount of detail. There are also guides for creating a self-signed cert for a wildcard domain. It's fairly easy: you just specify that your Common Name (CN), a.k.a. FQDN, is *.yourdomain.com while creating the certificate signing request (CSR).

This takes care of all of your sub-domains under yourdomain.com (like www.yourdomain.com or mail.yourdomain.com); however, your top level bare domain (yourdomain.com) itself is not covered by this certificate. When you use a certificate generated by specifying *.yourdomain.com, browsers will throw up an error when you hit your server with the top level domain name https://yourdomain.com/.

To address this, the X.509 certificate standard allows for an extension named subjectAltName (http://en.wikipedia.org/wiki/SubjectAltName). Using this you can specify a few other domain names for which the certificate is valid.

This requires specifying the use of this extension while generating the certificate request AND while signing the certificate. To do this you will have to add a few things to your openssl configuration file (typically /etc/ssl/openssl.cnf on an Ubuntu-like machine). Alternatively, you can copy the config file to another location, add the extension settings there and then specify the new config file for all your openssl commands. The commands shown below assume the default config file at /etc/ssl/openssl.cnf was updated with the extension details.

Here is one blog post which details the updates needed for the openssl config file : http://grevi.ch/blog/ssl-certificate-request-with-subject-alternative-names-san. It also has commands for generating the private key, converting the key to a format which does not ask for a password (a.k.a. an unencrypted key), generating the certificate request (CSR) and finally signing the certificate. The steps mentioned there up to the generation of the certificate request are correct. It is the last step of signing the certificate that is missing one small piece of information, because of which the extension mentioned above (subjectAltName) doesn't get added to the final certificate, despite it being present in the certificate request.

After a lot of searching on the internet, copy-pasting the commands exactly and trying my luck on IRC (irc.freenode.net#openssl), the answer finally appeared to me in the man pages (duh..! RTFM dude..!!). The man page for the x509 command of openssl (man x509) has this little entry :

-extfile filename
           file containing certificate extensions to use. If not specified then no extensions are added to the certificate.

So it turns out that just specifying the extensions in the openssl config file is not sufficient; you must also specify that same file as the file containing the extensions to be included, on the command line, using the above -extfile option.

With that added, you will get a self-signed certificate for your wildcard subdomain which is also valid for your top level bare domain.

Changes made to /etc/ssl/openssl.cnf

Uncomment the req_extensions = v3_req
req_extensions = v3_req # The extensions to add to a certificate request


Add subjectAltName to v3_req section

[ v3_req ]

# Extensions to add to a certificate request

basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName = @alt_names

Finally, add the alternate names for which you want this certificate to be valid. In this case it is your top level bare domain.

[alt_names]
DNS.1 = yourdomain.com

This last section, [alt_names], will not be present in the file by default. You can add it right after the [ v3_req ] section.

Once the config update is done, create your certificate.
Here are the commands that I used to create such a certificate :

Create a private key :
openssl genrsa -des3 -out ssl/staging/yourdomain.com.key 2048

This will ask you for a password. Key in a simple one and remember it.

Convert the private key to an unencrypted format

openssl rsa -in ssl/staging/yourdomain.com.key -out ssl/staging/yourdomain.com.key.rsa

This will ask you for a password. Key in the same thing that you used in the previous step.

Create the certificate signing request

openssl req -new -key ssl/staging/yourdomain.com.key.rsa -out ssl/staging/yourdomain.com.csr

This will ask you for a bunch of fields. Enter *.yourdomain.com when it asks for Common Name (or FQDN). The fields after that can be left blank by just hitting return (Enter) key.

Sign the certificate with extensions

openssl x509 -req -extensions v3_req -days 365 -in ssl/staging/yourdomain.com.csr -signkey ssl/staging/yourdomain.com.key.rsa -out ssl/staging/yourdomain.com.crt -extfile /etc/ssl/openssl.cnf

Note that here we specify the openssl config file as the file containing the extensions, as that is where we have defined them. We could probably put the extensions in a separate file too, but I haven't tried that. (Homework?! :P)

That's it. Now you have a self-signed wildcard subdomain certificate which is also valid for your top level domain.
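
To confirm that the subjectAltName actually made it into the final certificate (and was not silently dropped, which was my original problem), you can inspect the certificate like this:

openssl x509 -in ssl/staging/yourdomain.com.crt -noout -text | grep -A 1 "Subject Alternative Name"
#    X509v3 Subject Alternative Name:
#        DNS:yourdomain.com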

Despite this, the first time you hit your SSL-enabled website with the above generated certificate, your browser will still show the standard "Invalid Certificate - Untrusted" page. That is because your certificate is self-signed. You can view the errors in the "Technical Details" section.

Sunday, January 1, 2012

MongoDB concurrency - Global write lock and yielding locks

There has been a lot of hue and cry about MongoDB's global write lock. Quite a few people have said (in blog posts, mailing lists etc.) that this design ties down MongoDB to a great extent in terms of performance. I too was surprised (actually shocked) when I first read that the whole DB is locked whenever a write happens - i.e. a create or update. I can't even read a different document during this time. It did not make any sense to me initially. Before this revelation I was very pleased to see MongoDB not having transactions, and always thought of that as something which avoided locking the DB while running expensive transactions. However this global lock set me wondering whether MongoDB is worth using at all..!! I was under the assumption that the art of "record level locking" had been mastered by database developers. This made MongoDB look like a tool from the stone age.

Well, I was wrong. It turns out that "record level locking" is not that easy (the reasons for that warrant a different post altogether) and, from what I understand, MongoDB has no plans of implementing such a thing in the near future. However this doesn't mean the DB will be tied up for long durations (long on some scale) for every write operation. The reason is that MongoDB is designed and implemented in ways different from other databases, and there are mechanisms in place to avoid delays to a large extent. Here are a couple of things to keep in mind :

MongoDB uses memory mapped files to access its DB files. So a considerable chunk of your data resides in RAM and hence results in fast access - fast reads all the time, very fast writes without journaling and pretty fast writes with journaling. This means that for several regular operations MongoDB will not hit the disk at all before sending the response - including write operations. So the global lock is held only for the duration of time needed to update the record in RAM. This is orders of magnitude faster than writing to the disk, so the DB is locked for a very tiny amount of time. This global lock is, after all, not as bad as it sounds at first.

But then the entire database cannot be in RAM. Only a part of it (often referred to as the working set) is in RAM. When a record not present in RAM is requested/updated, MongoDB hits the disk. Oh no, wait.. so does that mean the DB is locked while Mongo reads/writes that (slow) disk? Definitely not. This is where the "yield" feature comes in. Since 2.0, MongoDB will yield the lock if it is hitting the disk. This means that once Mongo realizes it is going to the disk, it temporarily releases the lock until the data from the disk is loaded and available in RAM.

Although I would still prefer to have record level locking in MongoDB, these two features are sufficient to reinstate my respect and love for MongoDB. :)

Saturday, October 8, 2011

"List all Tabs" in Firefox 7 has different backgrounds for onscreen and off-screen tabs

A few days ago, my Firefox automatically updated to version 7.0.1 and ever since, one new feature (which hasn't been advertised as a feature at all) has been bugging me, because I was not able to understand what it does. The feature in question is the different background color for some tabs in the "List All Tabs" menu - the menu that you get when you click the small (almost inconspicuous) inverted triangle between the "New Tab" (+) and the "Window minimize" (-) buttons.

The first time I saw the differential background I thought it distinguished read and unread tabs, like one of the tab management add-ons does. But no. No matter how many times I read the page in a tab, its background color did not change.

The next thought was that it corresponded to background tabs which are lazily loaded, i.e. the page is not actually loaded from the internet until you bring that tab into focus. But no, it was not that either. Even after several visits some tabs stayed with a dark background.

No amount of searching on the internet helped at that time and I had to calm myself down and let go of the quest for this eternal answer. Finally, today I gave it another shot with some variations in search strings and I landed on this page which lists all the changes in version 7.0. There I started searching for any bug related to tabs and finally found the right bug. This bug states the purpose of the differential background color in the tabs list.

As the bug states, the dark colored (highlighted) tabs are the ones which are currently displayed on the screen in the tab bar, and the light colored ones are scrolled off the screen (horizontally). The rationale is that this makes it very easy to figure out where a particular tab that you are looking for is, in case you are confused about its position.
I am yet to see the feature being actually useful. Nevertheless, for the moment I am happy that I know what the feature does. :)

Monday, October 3, 2011

Watch points for variables in Ruby - Object#freeze

Almost every programmer knows about watch points, especially the ones doing native development with C/C++. Watch points were really helpful to me when I was working with C/C++. They were, sort of, my go-to weapons whenever I wanted to understand how some third party code worked. It was something that I dearly missed when I started with Ruby. I am fairly new to Ruby and I have never used the ruby-debug (or ruby-debug19) gem, because until today simple print statements were sufficient most of the time.

Today I was at a loss as I was unable to figure out where a particular hash variable was getting two new key-value pairs added to it. It was an instance variable with just an attr_reader defined. So obviously a reference to the instance variable was being passed around to the place where it was being modified, and my initial idea of writing a custom write accessor method was probably not going to work (I did not try it). That is when I came across this : http://ruby-doc.org/docs/ProgrammingRuby/html/trouble.html#S3. The last bullet point in that section has the answer.

You just freeze the object/variable that you want to watch by calling the "freeze" instance method on it, and anyone modifying that object after it is frozen will cause an exception to be raised, giving you the precise location of where that modification is happening. This probably isn't as elegant as running a debugger and setting a watch point, but it gets the work done nevertheless. RTFM after all..!! This tool is definitely going into my belt. :)
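
Here is a quick way to see the behaviour from the command line (a minimal illustration; the exact exception class and message vary between Ruby versions):

# Freeze a hash and then try to modify it - Ruby raises an exception
# pointing at the exact line where the modification was attempted.
ruby -e 'h = { :name => "watch me" }.freeze
h[:sneaky] = "got you"'
# => -e:2:in `[]=': can't modify frozen Hash (RuntimeError)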

Tuesday, October 12, 2010

Building Ruby 1.9.2 and installing rails 3.0 on it -- On Ubuntu 10.04 - Lucid Lynx

Issues that I faced while building Ruby 1.9.2, then installing Rails 3.0 and finally running the example in the "Getting Started with Rails" guide.

Make sure the following development libraries are installed before you start building Ruby:
(The ruby configure, make and make install steps (i.e. building and installing) will not tell you anything about these missing libraries)

1) zlib-dev (I think the package name is zlib1g-dev) -- Needed when you try to install the rails gem. If this is not available you will get the following error when you try to install rails with the command :
gem install rails

ERROR: Loading command: install (LoadError) no such file to load -- zlib
2) libssl-dev -- Needed when you try to run the inbuilt rails WEBrick server and load the first example app in the getting started guide. You will get an error of the type:
"LoadError: no such file to load -- openssl"
In my case I did not have this library the first time I built ruby. So I followed the instructions given here to build the openssl-ruby module/binding.
After this I ran `make` and `make install` from the top ruby source directory. Maybe that was not necessary, but I did it anyway.

Also, I am guessing that if this package had been available when I first built ruby, the openssl-ruby module would have been built by default. If not, there should be a configure option to enable this `feature`. The configure help output does not provide any info on this (not even with the --help=recursive option).
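
In short, installing the development packages up front and then building avoids both problems. A rough sequence (package names as on Ubuntu 10.04; adjust the source directory name to match the Ruby tarball you downloaded):

# Install the headers needed by Ruby's zlib and openssl extensions
sudo apt-get install zlib1g-dev libssl-dev

# Build and install Ruby from source as usual
cd ruby-1.9.2-p0
./configure
make
sudo make install

# With the headers in place, installing Rails should now work
gem install rails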

==== Upgrading from older ruby versions ====

Older ruby versions used the folder /usr/local/lib/ruby/site_ruby//rubygems . Now apparently this directory is replaced by /usr/local/lib/ruby//rubygems .

So you will have to get rid of the site_ruby folder (i.e. delete it) so that the gems are not searched for and used from a stale folder.

Not doing this might result in you not being able to run gem at all.

Saturday, June 26, 2010

What is Cloud? -- Simple terms please

Cloud has been making a lot of noise and almost every tech (or tech-related) person knows about it or has at least heard of it. Now, for those who have just heard about it but do not know what it means, here is a quick definition from Dave Neilsen, the founder of Cloud-Camp. He says, "For something to be called cloud, it should have these properties :

  • Hosted by someone else
  • On-demand. Do not have to wait or call somebody to get it.
  • Metered somehow. So you know exactly how much you are using and how much you are paying.
  • Scalable, both ways - up and down as and when you require."
He goes on to say that Cloud could mean different things for different people. Here are a few examples stating what cloud is for a particular person :

For an IT guy -- Infrastructure as a Service
For a Web Developer -- Platform. Just dump your code and don't worry what runs it.
For a Business guy -- SaaS (Software as a Service)

That was pretty neat. Helps me answer the standard question "What the hell is this cloud thing?" in a sane manner. Earlier I could never figure out what a proper answer should be for this question, because there was so much to tell.

Here is my attempt to elaborate on above mentioned examples.

So cloud is basically having the infrastructure to do what you do hosted by someone else, and having it totally scalable. For example, in the above list, for a web developer cloud is a platform where he can dump his code and expect it to run as he has designed it. He does not worry about the machines, the network connectivity or the bandwidth. He just pays for those in a metered manner. He scales his platform whenever he wants. He can increase his bandwidth quota, move to a better machine, increase the number of machines - all of this without calling customer care or the sales guy. He will do it by logging into the cloud service's website, or he would have a script do it for him automatically, i.e. if he is geeky enough.

Similarly, for a business man it is software as a service. An e-mail service would probably be a good example. The business man does not know what software runs the email system; he does not worry about what version of the email server is running, what OS it is running on, what DB it is using to store the emails or what protocols it is making use of. If the email contents are not that sensitive he would not even worry about the physical location of the servers storing these emails. He just buys the email software as a service and uses it. All that he probably worries about is how many email accounts are available to him/his company and how reliable/usable they are. At any point he can increase or decrease the number of accounts, once again without making a call.

That's cloud.

Note : I got this definition from one of the IBM developerWorks podcasts which is available here.

Oh, and remember, all this time every reference to Cloud meant "Cloud Computing", not just plain "cloud".

Monday, June 7, 2010

Very high startup time for vim under screen (GNU-Screen) -- SESSION_MANAGER

I have been using GNU-Screen for a while now and it has been very useful. This morning when I started working, I noticed that vim was taking an unusually long time to start up. It was very irritating. I had faced this same issue some time back but could not recall the solution. I just remembered that it had something to do with GNOME and the display settings. On searching I found a couple of posts which said that this happens when vim tries to connect to an X server which is either on a distant machine (distant in terms of network delay) or non-existent. Another post on the Ubuntu forums suggested that this could be because of multiple entries for 127.0.0.1 in the /etc/hosts file. Various combinations of commenting/un-commenting entries did not help. I checked the DISPLAY env variable. It looked good too.

Finally I resorted to the last option of using strace. strace did reveal interesting stuff. I saw that the wait/delay was because of a connect() call. Here are a few lines from the strace output :

:~$ cat strace.vim.out | grep connect
connect(3, {sa_family=AF_FILE, path="/tmp/.ICE-unix/6386"}, 21) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_FILE, path="/tmp/.ICE-unix/6386"}, 21) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_FILE, path="/tmp/.ICE-unix/6386"}, 21) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_FILE, path="/tmp/.ICE-unix/6386"}, 21) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_FILE, path="/tmp/.ICE-unix/6386"}, 21) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_FILE, path="/tmp/.ICE-unix/6386"}, 21) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_FILE, path="/tmp/.X11-unix/X0"}, 110) = 0
connect(3, {sa_family=AF_FILE, path="/tmp/.X11-unix/X0"}, 110) = 0
connect(3, {sa_family=AF_FILE, path="/tmp/.X11-unix/X0"}, 110) = 0
connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(4, {sa_family=AF_FILE, path="/var/centrifydc/daemon2"}, 25) = 0


This confirmed that it was a network related thing, i.e. vim was trying to connect to something that did not exist. I thought my X (or GDM to be more precise) was screwed up and considered logging out and logging back in. But first I wanted to do some more experiments with this setup to find out what caused the problem.

All of this was running under my gnu-screen session. I opened another gnome terminal to read the redirected output of strace. Accidentally I used vim itself to open the file. Before I realized my mistake and could start cursing myself, vim popped up..! It was there, up and running as fast as it could be... !! Then it hit me that it could be my screen session which was causing this. I did not know how to find the differences between the two environments - in and out of screen. To solve this particular problem I ran strace on vim in the new terminal so that I could compare the two and find out what was lacking. Here is what strace told me in the terminal outside screen :

:~$ cat outside.strace.vim.out | grep connect
connect(3, {sa_family=AF_FILE, path="/tmp/.ICE-unix/28919"}, 22) = 0
connect(4, {sa_family=AF_FILE, path="/tmp/.X11-unix/X0"}, 110) = 0
connect(5, {sa_family=AF_FILE, path="/tmp/.X11-unix/X0"}, 110) = 0
connect(6, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(6, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(6, {sa_family=AF_FILE, path="/var/centrifydc/daemon2"}, 25) = 0


The difference in the path was obvious. On searching through the list of environment variables, SESSION_MANAGER came up. It is the variable which tells all GNOME based apps how to contact the session manager of the current GNOME session. I do not know what caused this disparity, but most likely setting the appropriate value inside the screen session would have worked. Well, it would have worked in the one screen window in which I changed the value; I have several such windows, so I just chose to start a new screen session.
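
For reference, here is roughly what fixing it in place would have looked like (a sketch; the actual SESSION_MANAGER value is whatever a terminal opened outside screen reports, the one below is made up):

# In a terminal outside screen - note the current value
echo $SESSION_MANAGER
# local/myhost:@/tmp/.ICE-unix/28919,unix/myhost:/tmp/.ICE-unix/28919

# Inside the stale screen window - point the variable at the live session
export SESSION_MANAGER='local/myhost:@/tmp/.ICE-unix/28919,unix/myhost:/tmp/.ICE-unix/28919'
vim some_file    # starts up fast again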

Tuesday, March 30, 2010

Mozilla @ SJCE : Static Analysis projects

It has been a very long time since I posted anything about the Mozilla related activities going on at my college, SJCE, Mysore. That in no way means an absence of activity. In a previous post I mentioned that, along with the attempt to introduce Mozilla as a standard course, I was working to get the current final year students (who are enrolled under the larger VTU University) to start working on Mozilla, using their final semester project as a means. Well, I am happy to say that this has materialized. 8 final semester students from CS expressed interest in working with the Mozilla community as part of their final year project, and it has been nearly a month since they started their work. Here is a brief write-up about that.

As with most of the Computer Science students in India approaching the Mozilla community, these 8 students also wanted to do something related to compilers. The JS engine and static analysis are two projects in Mozilla which would come under the compiler banner. These 8 students wanted to work on something substantial which could be presented by two teams of 4 students each as their final semester project. So the bugs that they would be working on had to be related. This was possible only with static analysis, as there are a lot of related tasks available. Also, static analysis would be something new to the students and it would give them an opportunity to understand the internals of the compiler (GCC here), like the AST (Abstract Syntax Tree) and other representations of the code. Moreover, the static analysis APIs are exposed in JS and hence the analysis scripts would be written in JS. That way students would learn JS also. Above all, these students would be doing something genuinely new.

The students could not be asked to start working on the bugs directly. They were new to open source development, the tools used there like bugzilla, using email as a formal medium of communication, the source control system (to add to the complexity, Mozilla now uses a distributed RCS - Mercurial [hg]), using IRC, the Linux development environment, etc. It is these things that the students have been learning for the past month or so. This learning has been in the form of accomplishing the tasks which form the prerequisites for the actual static analysis work. These are things like downloading the gcc and mozilla sources from the ftp hosts and from the mercurial repository respectively, applying the mozilla specific patches to gcc for plugin support, etc. These are all listed here. Note that some things, like installing all the dependency packages for building these applications from source and learning to use the Linux command line itself, are not on that page but were new to these students nonetheless.

All the students have been putting in substantial effort and have picked up the traits of an open source hacker pretty quickly. We have had a few IRC meetings and a lot of formal communication over email. In parallel, we were also working towards shortlisting 8 static analysis bugs. Based on the feasibility of a bug being completed by an amateur developer within a span of 2.5 months, and based on the students' interest, we finally decided on these 8 bugs :
  1. Bug 525063 - Analysis to produce an error on uninitialized class members
  2. Bug 500874 - Static analysis to find heap allocations that could be stack allocations
  3. Bug 500866 - Warn about base classes with non-virtual destructors
  4. Bug 500864 - Warn on passing large objects by value
  5. Bug 528206 - Warn on unnecessary float->double conversion
  6. Bug 526309 - Basic check for memory leaks
  7. Bug 542364 - Create a static analysis script for detecting reentrancy on a function
  8. Bug 500875 - Find literal strings/arrays/data that should be const static
These tasks are good, challenging and provide an opportunity to understand compilers very closely.

Currently the students have downloaded gcc, applied the patches, built it along with the dehydra plugin support and are ready to run static analysis on the mozilla code. They are now trying to run simple analysis scripts, like listing all classes in the mozilla code and listing all classes with their corresponding member functions. There is still quite a long way to go, but it has been a really good start. Let's wait and watch what great feats are in the pipeline.

I hope to keep this blog updated at the same pace at which the students are working.

Good luck to the students. :-)

Thursday, November 19, 2009

Mozilla @ SJCE -- Contributing to Mozilla informally - Final semester student projects

Along with the attempt to introduce Mozilla as a formal elective at SJCE, I have been working on getting some development work started informally too. The current final year students are enrolled under the central university - VTU - and hence cannot be offered any new subject. However, they are expected to do a project as part of the course completion. I thought of using this to get students to do their final semester project with Mozilla. Also, Mozilla Labs organizes a program named Design Challenge where enthusiasts -- students, teachers, academicians, software developers, etc. -- are invited to submit innovative ideas to make the Firefox web browser better software and the Internet in general a better place. This has received tremendous participation from the student community across the globe. The best part about this program is that it is not just a competition. The selected students are trained on various Mozilla related technologies by the very Mozilla developers who are developing those technologies. After such mentoring the students can start contributing code to Mozilla and products based on it. And not to mention the wealth of knowledge they stand to gain and how much of a positive influence it will have with prospective employers or while applying for higher studies. This suited the final year and the pre-final year students very well.

So when I visited my college last Saturday (14-Nov-2009) to meet my HoD and the coordinating lecturer about the Mozilla elective to be offered, I decided to talk to the final and pre-final year students and motivate a few of them to participate in the upcoming design challenge (Nov-09 to Mar-10) and also take up Mozilla work for their final year projects. I talked to the coordinating lecturer (Shri P M Shivamurthy) and asked him to make an announcement regarding this and have the students assembled in one of the classrooms or the seminar hall. After going to the college I got to know that the pre-final year (5th sem) students would not be available as they have their internal assessment tests starting from Monday. The HoD suggested that I should stay back till Monday evening and address the 5th sem students after that day's tests. That was not possible for me, so I decided to visit the college again the next Saturday for that and talk to the final year (7th sem) students for now. As a result I decided to stick to the final year project only. Things were set up in the Network Lab and there were about 30 students.

Standing in front of them I blabbered a bunch of things about Mozilla, open source software, how graduating engineers are not really industry ready, the fact that they do not have any experience working with real world applications with huge code bases and contributions from a large number of developers, and finally how participating in Mozilla would help them fill that gap. I also told them about the vast number of options that Mozilla provides in terms of technologies, and that they could find some work or the other which lies in their area of interest. At the end I asked if anyone had any questions and, as expected, nobody did. Then, on asking how many would be willing to try something like this, I saw something like 3 to 4 half-hands rising up in the air. This was certainly not a good sign. So I started with the "motivational" speech. "This will really help you guys to be ahead of students from other colleges. You will be industry ready whereas others will require a lot more training and mentoring. This is all HoD pre-approved... and on and on and on" for a few more minutes. That really did the trick. After this I had about 10 - 12 hands, full ones. Quite satisfied, I told them to get my contact details from PMS sir and contact me for any queries. Till now I have received emails from 6 students (one of them representing a project group of 4 students, so 9 students actually). I have sent them a couple of links to start reading. None of them have responded after that. But I am still hopeful.

A little later I was talking to some of the students offline and I got to know some facts which would have been very useful to me in positioning this Mozilla project idea in a much stronger way.

1) Campus recruitment is pretty bad this time. Only 6 students in Computer Science have got job offers, compared to a daily average of 20 - 25 students a couple of years back. --- I could have talked about how open source development experience will help them with jobs. It did help me.
2) Project teams (generally of 3 to 4 students) have already been formed and a guide (a member of the faculty) has been assigned to each. This has two effects:
    a) Some teams have already been given their project work, which is a small part of the guide's doctorate thesis/research. The guide will now not happily approve of students under him/her pursuing a different project. -- We could talk to the HoD and reason it out with the guide. I could have told the students that such a thing is very much possible.

    b) In a project team of 4, generally one or two students are the smart ones and the others will be banking on them for the project to be completed. I had told them that in Mozilla it is generally individual contribution, or a team of 2 at the max. Teams like those mentioned earlier cannot be divided, as the dependent folks would get into a problematic position. -- I could have told them that Mozilla does not bother if the work done by one student is presented by 4 as team work. So let the team enroll for a Mozilla project. Either all or a few in the team will work. If it's all of them, each one will have a bug assigned, or the bug will be assigned to one guy with all of them working on it. If it is just one or two of them, then there are no issues.

    c) On a related note to the above two points, some students told me that they would like to do a Mozilla project in addition to an already assigned final semester project. This really delighted me. But it also was, sort of, a matter of concern, as it appeared to me that people were desperate to do something like this with the hope that it will add a line to their resume and help them get a job. I might be wrong and I wish and hope I am. Students doing open source development just out of pure interest, and not as part of any course requirements, is the best thing. But let me see what it turns out to be.

3) I did not make an announcement about the design challenge because the mentoring classes for it go on from Dec-09 to Feb-10 and these guys have their exams in the second half of December. But I later got to know that no mentoring classes will be held from approximately 21st Dec to 4th Jan because of the holiday season in the US. So I talked to a smaller number of guys, those who stayed back to talk to me, about the design challenge and am hoping to have 1 to 3 ideas submitted.

I am going to use these points during my next visit, this coming Saturday.

Tuesday, November 17, 2009

Mozilla @ SJCE -- Modern teaching methods still carry a stigma and are considered unreliable.

In my effort to get open source development into mainstream academics at my engineering college SJCE, I have been working with the Mozilla Education team for some time now. With our college getting autonomous status, with great help from the MozEdu folks like Prof David Humphrey and Frank Hecker, and after considerable persuasion (more about that in a different post), I could get an elective named "Learning Open Source Software with Mozilla" added to the curriculum of the 6th semester students. We (me and the members of the college faculty) decided to roll out this elective in 2010.

The next step for me, I thought, was to get at least one lecturer trained with the curriculum and in general get that lecturer involved with Mozilla development and practices. Also, with Prof David making the videos of his lecture sessions freely available on the internet for everyone's use, my plan was to give the lecturer a head start (w.r.t the students) so that at the beginning of the next semester he could start teaching the students those parts which he had already learnt. In the meantime he himself could continue his learning by going through David's lecture sessions and other resources available on the internet. I would be visiting the college on alternate weekends; I thought maybe I could take a couple of hours of classes on Saturday for both the students and the lecturer(s). I could use my presence to answer the queries that the students and the lecturers had, or at least point them in the right direction. Apart from these fortnightly visits I planned to be in touch with the college folks continuously on the internet -- email, IRC, Skype, etc. I would be actively involved the first time this subject is taught. After that the lecturer would be considerably capable and the subsequent batch of students would have their seniors to help them out. At that point the program would not greatly depend on me and would be sort of self-sufficient, with people directly talking to the Mozilla developers and the community in general. As a bonus, the students who studied this subject would get to carry out their final year project work with Mozilla, either in terms of some feature implementation or certain bug fixes or any such task. It appeared like a sound plan and I had even decided that we would try to get about 15 students the first time and gradually increase the number.

Last Saturday (14-Nov-2009) I went to meet the HoD and the lecturer who was coordinating this from the college side to get things started. The meeting was a big disappointment. Our HoD made it absolutely clear that this elective will be offered only if 50% of the students (which translates to about 70 - 80 students) take up the elective. So the idea of first starting with a small number, so that coordinating things on the internet would be easier, and all that, was just blown away. The reason for this is apparently that there are not enough classrooms to teach more than 2 groups of students from the same semester!! It has to be a 50-50 division between two electives. So though there are 5 or 6 electives available to the students, they actually have to choose from just 2 of them, based on the majority and not on interest.

Well, my plan was not killed completely, yet, as the ideas in it were sort of the perfect solution for "the lack of classrooms" problem. I put forth the rest of my idea, saying that remote teaching and a lot of learning on an individual basis (by reading up the resources on the internet and interacting with the Mozilla devs) would virtually eliminate the need for full-blown classroom teaching all the time. But the HoD flatly rejected this idea and said that he understood what I was suggesting but there are rules saying classes must be conducted for a fixed number of hours for any subject offered, and it has to be the traditional way. The idea of training the lecturers in a, sort of, asynchronous manner was also not acceptable. He would want a training session to be conducted - typically a week to a month long session, maybe with a certification at the end of it. Moreover, currently I have one lecturer ready to take this up but the department mandates at least 2 or 3. Now I have an additional task of motivating at least two more faculty members. For this I have to prepare a write-up explaining what the lecturers stand to learn/gain by taking up this new thing. After that, if any of them express their interest in taking this up, I will have to train them and probably it has to be in the traditional way - not sure yet.

Another problem is the pace. The next semester will be starting sometime in Feb or Mar 2010 and my HoD keeps saying "Let's go slowly at first and see. If not in 2010 we will offer this in 2011" !!.. :-( . I hope we can get this thing started in 2010 itself. Another year of idle waiting might just terminate the interest that I currently have.

All in all, the wall between open source and my college appears to be getting thicker and thicker. I intend to meet the HoD again this coming Saturday and try to convince him to give his approval for the "internet based learning" approach. Let's see how it works out.

Apart from this I talked to a bunch of final year students about carrying out their final year project with Mozilla and also about participating in the upcoming design challenge. More about that in another post.

Wednesday, November 4, 2009

Mozilla Developer Network (MDN) survey -- My inputs

I just finished the MDN survey and here is what I said in that last box which was put there for people like us to pen down our rants. ;-)

  • Project documentation needs improvement. It has improved and is improving, but a lot still needs to be done specifically about the oldest lines of code.
  • I hear from some of the core developers that there are lots of hacks which make the code not entirely predictable. These need to be removed and replaced by proper, reliable code. Again, the cleanup is going on; I am just saying that it is really important so that there is some sort of SLA based on which people can develop applications.
  • Consolidation of the content on MDC and MozEdu so that we can have a "The Mozilla Book", which any beginner can go through and dive into Mozilla related development -- either the platform or the browser or the add-ons or anything.
  • Finally, making various Mozilla components available in the form of easily pluggable library modules and step by step guides telling us how to use them.

I do not know if any of this is useful to anyone else in the community, but for me, these appeared to be very important based on my association with Mozilla for about 2.5 years now.

Wednesday, October 28, 2009

Incrementally building Mozilla/Firefox

The Mozilla code base is really huge and has a variety of files which are built in a variety of ways. It was always sort of confusing for me to figure out where all I should run make after I change any of the files. I generally asked on IRC and someone just told me where to run make.

Today it was the same thing. But this time I thought I might as well learn the logic, so that I can decide for myself the next time. Here is the chat transcript of NeilAway answering these questions.

The MDC page (https://developer.mozilla.org/en/Incremental_Build) has almost the same content for the native code. Neil here explains it for all the types of files involved.

Also, to add to the following: running "make check" from the objdir will run the automated tests.

  • For xul/js/css/xbl it usually suffices to find the jar.mn (it may be in an ancestor folder) and make realchrome in the corresponding objdir
  • For idl you're often looking at a full rebuild, depending on how widely it's used
  • For .cpp and .h you obviously have to make in the folder itself, and then look for a corresponding build folder
  • Except for uriloader where you use docshell/build and content, dom, editor and view use layout/build
  • If you're building libxul or static then this is all wrong
  • You don't look for a build folder, I think for libxul you build in toolkit/library and for static you build in browser/app

Tuesday, October 6, 2009

OpenSSL base64 filter BIO needs an EOL and memory BIO needs to know about EOF

I recently started working with the OpenSSL library to do some HTTPS stuff (sort of obviously). OpenSSL, apart from having an implementation of the SSL encryption part, also has nifty algorithms for certificate handling and, more importantly, an abstract I/O layer implementation called BIO, which probably stands for Basic I/O or Buffered I/O or something else. I do not know; I could not find out. Nevertheless, the items of interest here are BIO_f_base64() -- the base64 encode/decode filter BIO -- and BIO_s_mem() -- the memory BIO, which can hold data in a memory buffer.

The BIO man page (or its online version present here: http://www.openssl.org/docs/crypto/bio.html) gives a nice introduction. For now just consider BIOs as black boxes to which you can write data or from which you can read it. If the BIO is a filter BIO then the data will be processed whenever you read or write to it.

The name, BIO_f_base64, says it all about the functionality of this BIO. If you read from this BIO, then whatever data is being read is first base64 decoded and given to you. OTOH, if you write something to this BIO it will be base64 encoded and then written to the destination. These BIOs can be arranged in the form of chains to do a series of processing steps on the data that you are reading or writing, all with just a single call to read() or write(). It's all abstracted. Saves a lot of time.

I was trying to decode some base64 encoded data which I had in a buffer, a char [] to be precise. So if you read up about the BIOs it becomes obvious that you first have to create a memory BIO, which will hold the actual encoded data, and write the encoded data to that memory BIO. Then you chain the memory BIO with a base64 BIO and read from that chain. Any data that you read from the chain will actually come from the memory BIO, but before it reaches you it passes through the base64 BIO. So essentially you are reading from the base64 BIO. As mentioned in the earlier paragraph, when you read from a base64 BIO it decodes the data and gives it to you. So the base64 encoded data present in the memory BIO is decoded and presented to you. That's it. base64 decoding is done in one simple read() call!

But there is a small catch here. For some reason, which I have only partially understood, base64 requires that the data it is handling always be terminated with a newline character. If the data does not have any newline character, meaning all your data is present on a single line, then you have to explicitly tell that to the BIO by setting the appropriate flag. Here is what the man page says:

The flag BIO_FLAGS_BASE64_NO_NL can be set with BIO_set_flags() to encode the data all on one line or expect the data to be all on one line.

That's about the base64 EOL. Now the other BIO involved here, the memory BIO, is also an interesting guy. When the data it holds runs out, it doesn't say "Hey, it's over, stop it!". Instead it says "Dude, you got to wait for some more data to arrive. Hang on and keep trying". !!! This is very much suitable when you are using the BIO like a pipe, where you keep pumping data in from one end by acquiring it from somewhere and some other guy consumes that data. But in a situation like mine, where the data is all fixed, I simply want it to tell me that the data is over so that I can stop. To do this, again, I have to explicitly set an appropriate flag, and here is what the man page says:

BIO_set_mem_eof_return() sets the behaviour of memory BIO b when it is empty. If the v is zero then an empty memory BIO will return EOF (that is it will return zero and BIO_should_retry(b) will be false). If v is non zero then it will return v when it is empty and it will set the read retry flag (that is BIO_read_retry(b) is true). To avoid ambiguity with a normal positive return value v should be set to a negative value, typically -1.

And this same thing is explained very well here: http://www.openssl.org/support/faq.html#PROG15.
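Putting these pieces together, here is a minimal C sketch of the whole flow, assuming the encoded data sits in a plain char buffer (the helper name base64_decode and the sample string are my own, just for illustration): write the encoded data into a memory BIO, chain it behind a base64 filter BIO, set the two flags discussed above, and pull the decoded bytes out with a single read.

#include <stdio.h>
#include <string.h>
#include <openssl/bio.h>
#include <openssl/evp.h>   /* declares BIO_f_base64() and BIO_FLAGS_BASE64_NO_NL */

/* Decode a base64 string that sits entirely on a single line (no newline).
 * Returns the number of decoded bytes, or -1 on error. */
static int base64_decode(const char *encoded, unsigned char *out, int out_len)
{
    BIO *mem = BIO_new(BIO_s_mem());      /* memory BIO to hold the encoded data */
    BIO *b64 = BIO_new(BIO_f_base64());   /* base64 filter BIO */
    if (mem == NULL || b64 == NULL) {
        BIO_free(mem);
        BIO_free(b64);
        return -1;
    }

    BIO_write(mem, encoded, (int)strlen(encoded));

    /* Our data is all on one line with no newline, so tell the base64 BIO. */
    BIO_set_flags(b64, BIO_FLAGS_BASE64_NO_NL);

    /* Make the memory BIO return EOF (zero, no retry) once its data runs out,
     * instead of asking us to keep retrying. */
    BIO_set_mem_eof_return(mem, 0);

    /* Chain them: reading from b64 pulls from mem and decodes on the fly. */
    BIO_push(b64, mem);

    int n = BIO_read(b64, out, out_len);

    BIO_free_all(b64);   /* frees the whole chain */
    return n;
}

int main(void)
{
    unsigned char buf[64];
    int n = base64_decode("SGVsbG8sIEJJTyE=", buf, sizeof(buf));
    if (n > 0)
        printf("Decoded %d bytes: %.*s\n", n, n, (const char *)buf);
    return 0;
}

Encoding works the same way in the other direction: chain the base64 BIO in front of a memory (or file) BIO, write the raw data to the chain, and call BIO_flush() before collecting the encoded output.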

I thank Dr. Stephen N. Henson of the OpenSSL project for helping me out in understanding this. Here is the mailing list posting that taught me this: http://groups.google.com/group/mailing.openssl.users/browse_thread/thread/f0fc310c1bc6ec65#

Happy BIOing. :-)

Wednesday, July 22, 2009

Getting the size of an already loaded page (from cache) in a Firefox extension.

Today this question came up on IRC (moznet, #extdev). One of the add-on developers wanted to get the size of a page, either in bytes or in number of characters. The most obvious things that came to my mind were progress listeners for a definitive answer, or the content length from the channel for a not-so-critical scenario. But then he said he wants it for an already loaded page. And he further said that the information is already there somewhere, as it is shown by the Page Info dialog (right click on a web page and select View Page Info). He was indeed right. Somebody in the code is already going through the trouble of calculating the data size and we can just re-use that. So I immediately started the quest to find that out.

As usual, to figure out any browser component I opened up DOM Inspector. That tool is improving, which was against my earlier perception (sorry Shawn Wilsher), though the highlighting part is still screwed up. Nevertheless, locating that particular label "Size" and the textbox in front of it containing the value was not difficult at all. I got the "id" of the textbox containing the size value. (It's "sizetext" :) ).

Next it was MXR (http://mxr.mozilla.org/) in action. I did a text search for the id and got a bunch of results, one of which was pageInfo.js with this entry: line 489 -- setItemValue("sizetext", sizeText);. It is here. That very line made it apparent that it is the place where the value is being set and hence the place from which I can get to know how the value is being calculated.

Once I saw the code it was very clear, straightforward and pretty simple too. We have the URL. From the URL we get the cache entry for that URL. (Every cache entry has a key and that key is the URL - so neat.) We try to get the cache entry from the HTTP session first and if that fails we try the FTP session. The cache entry has the size as an attribute on itself, so it's just a matter of getting that attribute value. DONE.

I am not sure how this will behave if we have disabled every type of cache. AFAIK, there will still be some in-memory cache as long as the page is still loaded. Probably good enough.

That was the end of a small but interesting quest. :-)

Saturday, June 27, 2009

Getting Open Source/Mozilla in my college - SJCE - Part I

Earlier I had written about my first Mozilla Education Status Call, in which I mentioned my interest in bringing open source software in general and Mozilla in particular to my college, SJCE. Well, good news, it did not stop at that blog post. Actually speaking, it had not started with that blog post either. It has been a long standing wish of mine, ever since my college days when I participated in the Google Summer of Code 2007. (More about it here). Back then there were a lot of shortcomings, both on my side and the institution's side, which kept this idea from becoming a reality. Nevertheless, past is past and there is no point in brooding about it now. The good thing is that now both I and my institution have overcome our shortcomings and we have started working towards making that idea a reality.

Now for some background aka story telling (which I like the most :) )

As stated earlier, nothing happened about this idea when I was in my college. Then, after passing out of college and having worked in the industry for about 12-18 months, it hit me, very hard, that I did not learn a lot of things during my life as an engineering student which otherwise would have helped me a lot in my professional life. I also learnt that I was not competent enough as an engineering graduate compared to some of my foreign counterparts and also those from some of the "famed premier" engineering institutions of the country. It was not just one or two things; I saw differences in many aspects, both theoretical and practical. Gradually it occurred to me that these two aspects are inter-related. Since we did not have proper practical experience and exposure to real world software development, we never really appreciated the basic theoretical concepts of computer science, which formed our regular syllabus. Note that here I am saying "we" and not "I". This part of the story talks about the state of most of my classmates, and that is the worst part. Nevertheless, the moral of the story is the good old philosophy of teaching that theory and practice should go hand in hand.

Now there is another part to this story. Before I start with it, let me tell you that whatever I am putting here is based on what I have perceived. I may be wrong, but I personally don't think so. And this is absolutely not about boasting about myself. So the other part of the story is that my association with open source development communities, Mozilla to be specific, has greatly helped me in my professional life. I am not going to give examples, but it has really, really helped me a lot. It has also, sort of, put me ahead of several other capable classmates and most juniors of mine (with the difference being considerably more in the case of juniors). The only differentiating factor between me and them was my exposure to developing a real world application, Mozilla Firefox, and the various lessons that I have learnt by being a part of the global developer community. I am also certain that I could have been a much better computer engineer if I had started working with Mozilla at a much earlier stage, say 2nd year or early 3rd year of engineering, and had dedicated more time to it. I still continue to learn a lot of general computer science and software development concepts (concepts not specific to Mozilla development) even now, whenever I try to fix a Mozilla bug or answer a query on IRC, and many a time even when I just observe a few people conversing on IRC.

Ok, enough of story telling. Now for the moral of the story. Here is what I inferred from these experiences:

1) Engineering students, specifically Computer Science engineering students, must get exposure to real world engineering (aka application development) to understand and appreciate the theoretical concepts they learn.
2) Open source software development communities provide a suitable environment for students to work with real world applications - suitable in terms of opportunities, cost, mentoring, and certainly a few more good things.

With these two points, it was clear to me that we badly needed open source education/exposure for students in my college. I knew that once this happened the possibilities would be endless. Every time I heard/read about some of the classic Free Software implementations done at universities abroad, I thought that our college could at least have several continuous contributors to currently existing open source projects, if not creators of some totally new world class software projects. We could have several different groups of students working on different types of software which operate at various levels (which translates to contributing to different open source software). Then they all could interact to help each other in troubleshooting problems. I thought of scenarios/discussions like this happening in the hostel corridors:

Student_1 and Student_2 are working on the Mozilla Download Manager (and here goes the conversation)

Student_1 : Hey, I want to test my new implementation for Mozilla Download Manager for HTTPS downloads. I am unable to configure my test server for HTTPS. You got any idea?
Student_2 : No dude, never done any server side stuff. Lets ask Student_3 from the Apache team.

Then Student_3 comes and sets up Apache for HTTPS within minutes (because that's day-to-day kind of stuff for him) and Student_1 continues testing his new implementation.

After some time:

Student_1 : Oh man, SSL handshake is taking too much time. I need to talk to Student_4, he knows the SSL library code base.
Student_2 : Yeah, I talked about that to Student_4. He is coming up with a patch to reduce the handshake time. It will probably be ready by tomorrow, I guess. Apparently it was a race condition causing the delays.

And so on.

Something like this is really possible. In fact many things much bigger than this are possible. But only if our students start working with and for open source communities.

So this set of thoughts made me work towards getting Open Source into my college. Now that's the background and the story. In the next part I will write about the first set of steps taken towards this, how many of them worked and how many were dead even before they started. And just FYI, the next post too will have some story telling (Obviously since this is just a record of my experiences and my (our) actions).

Thursday, June 25, 2009

Vim -- Restoring cursor from previous session

I am sure every Vim user needs this. It's just so frustrating to open a source file and see the #includes (the first few lines) of the file when we will actually be editing many hundred lines later. Today especially, I was juggling several source files: adding something to a .h file, then coming back and doing something in the .cpp file, then again going to some .c file, and so on. Every time I opened a file the cursor was at the first line, and every time I invariably had to search for the function I was editing and cycle through the matches to reach it. So I set off on a "Search Mission" - a mission to search for the appropriate .vimrc settings to make Vim remember the cursor position from the previous session.

I got a lot of links. In fact there is a separate Vim tip - Vim Tip #80 - for this. But it has so many code lines and I was a little wary of putting all that in my .vimrc file. Continued searching revealed a simpler way: a two-line solution, found here, in some Mr. Gopinath's .vimrc file. It's very big, but the lines concerning me are:

" VimTip 80: Restore cursor to file position in previous editing session
" for unix/linux/solaris
set viminfo='10,\"100,:20,%,n~/.viminfo

" only for windows [give some path to store the line number info]
"set viminfo='10,\"100,:20,%,nc:\\Winnt\\_viminfo
au BufReadPost * if line("'\"") > 1 && line("'\"") <= line("$") | exe "normal! g`\"" | endif

Looks like he also picked this up from the same Vim Tip #80 but was smart enough to take out only the necessary part. Nevertheless, this works for me. Thank you Mr. Gopinath.

Happy Vimming :-)

Edit:

I read Vim Tip #80 again and it made more sense this time. I picked up the last line from a user comment. Now the cursor is put back on the same column too.

Friday, April 17, 2009

Profiling (timing) the firefox build process

It's been nearly 2 years since I started building Mozilla Firefox myself on my machines - various machines of varying capacity.

Initially it was a desktop with an Intel Pentium 4, Single Core (obviously), 2.6GHz, 256MB RAM, running Slackware 11. It probably used to take about 1.5 to 2 hours (I do not remember now). I never profiled it at that time. Getting a Mozilla build at all was a big achievement for me.

After that it was another desktop with an Intel Core 2 Duo, Dual Core (obviously, again), 2.4 GHz, 2GB RAM, running Windows XP. This generally took about 45 minutes. As far as I remember, I had several other programs running while Firefox was building.

This discussion of the amount of time taken to build Firefox came up a few times on IRC, and I myself had wished to time the build process. Of late, that is from about last week, this wish became very strong, and today I finally did time it, on two machines: my laptop and my desktop. This blog post is the result of those two profiling tasks. Here are the results.

Note: By profiling I do not mean anything complicated or intricate. I just used the "time" utility, which tells how much time a command takes.

1) On my laptop --- the build was the only application running apart from the services.
Specs:
IBM Thinkpad T60p. (The one which heats up a lot)
Intel Centrino Duo, T2600 @ 2.16GHz, (Dual Core)
2GB RAM
Windows XP Pro SP2.

The results shown by the "time" utility are:

real 36m33.476s
user 4m17.776s
sys 4m36.271s

The build was done in the MinGW shell that the Mozilla build system provides. I am not sure to what extent these numbers are reliable, but the real time is pretty much acceptable.

Edit:

Today I ran the build command from the history, hence the "time" prefix got in automatically and the build was timed again. Surprisingly, today's times are way off from the last ones. Here they are:

real 74m26.484s
user 9m53.015s
sys 8m18.365s

Well, this time a lot of other apps were running: Firefox (2 instances), Chatzilla, Outlook, several command windows, Komodo Edit (which again is like another Firefox), Notepad and a couple of Explorer windows. So the time being doubled is not a surprise. Guess this just gives some perspective. :)

2) On my desktop --- apart from the build process, Firefox with Chatzilla and several instances of bash (terminals) were running.
Specs:
DELL Optiplex 755 (sleek, powerful, sometimes fragile)
Intel Core 2 Quad, Q6600 @ 2.4GHz
4GB RAM
Ubuntu, Gutsy Gibbon

And the results are:

real 20m40.497s
user 19m4.320s
sys 1m17.885s


The two sets of times are a little confusing, but "real" is all that matters, as that is how long I have to wait for the build to be ready.

If you are planning a build then this info might help you plan things accordingly.

Happy building.