Sunday, March 18, 2012

Rails cookie handling -- serialization and format

A typical Rails session cookie has this format: cookie-value--signature (the two dashes are literal). The cookie-value part is a URL-encoded, Base64-encoded string of the binary dump (via Marshal.dump) of whatever was set in the session. The signature part is an HMAC-SHA1 digest, created using the cookie-value as the data and a secret key. This secret key is typically defined in [app-root]/config/initializers/secret_token.rb.

Let us try to reverse engineer a session cookie for a local app that I am running. I am using Devise for authentication, which in turn uses Warden. I use the Firecookie extension for Firebug to keep track of cookies. It is pretty handy.

Here is the session cookie set by Rails:

# Cookie as seen in Firebug

As mentioned at the beginning, it has two parts separated by two dashes (--).

The cookie-value in this case is:

# The cookie-value part

The signature is:

Whenever Rails gets a cookie, it checks that the cookie has not been tampered with, by verifying that the HMAC-SHA1 digest it computes over the cookie-value matches the signature sent along with it. We can do the verification ourselves. Fire up irb and try the following:
$ irb

irb(main):003:0> cookie_str = "BAh7B0kiGXdhcmRlbi51c2VyLnVzZXIua2V5BjoGRVRbCEkiCVVzZXIGOwBGWwZvOhNCU09OOjpPYmplY3RJZAY6CkBkYXRhWxFpVGkvaQGsaQGwaRBpAdFpCGk9aQHtaQBpAGkGSSIiJDJhJDEwJEZseHh3c293Q29LcHhneWMxODR2b08GOwBUSSIPc2Vzc2lvbl9pZAY7AEYiJTUwNDdkOTMwNDNkNGEzOTA4YTkwN2U2MDY5OGRmOTdm"

# This cookie_secret comes from [app-root]/config/initializers/secret_token.rb. Obviously you need to keep this secret for your production apps.
irb(main):005:0> cookie_secret = '392cacbaac74af104375eb91324e254ba232424130e69022690aa98c1d0dfade159260588677e2859204298181385a83b923e58c4ef24bb3a40bdad9a41431b4'
=> "392cacbaac74af104375eb91324e254ba232424130e69022690aa98c1d0dfade159260588677e2859204298181385a83b923e58c4ef24bb3a40bdad9a41431b4"

irb(main):006:0> OpenSSL::HMAC.hexdigest(OpenSSL::Digest::SHA1.new, cookie_secret, cookie_str)
=> "51f90f7176326f61636b89ee9a1fce2a4972d24f"

As can be seen, the HMAC-SHA1 hexdigest generated from the cookie-value matches the signature part of the cookie. Hence the cookie has not been tampered with.

Now that the cookie authenticity is validated, let us see what information it holds.

Let us retrace the steps Rails took to generate this cookie value, so that we can then reverse them and get at the session data. The steps are:
  1. session_dump = Marshal.dump(session)
  2. b64_encoded_session = Base64.encode64(session_dump)
  3. final_cookie_value = url_encode(b64_encoded_session)
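The three steps above can be sketched in plain Ruby. This is only an approximation of what Rails 3.0 actually does (for instance, Rails strips the newlines that Base64.encode64 inserts), and both the session contents and the secret below are made-up placeholders:

```ruby
require 'base64'
require 'cgi'
require 'openssl'

# A toy session and a placeholder secret (not a real token).
session       = { "session_id" => "5047d93043d4a3908a907e60698df97f" }
cookie_secret = 'some-long-random-secret'

session_dump        = Marshal.dump(session)                       # step 1
b64_encoded_session = Base64.encode64(session_dump).delete("\n")  # step 2
final_cookie_value  = CGI.escape(b64_encoded_session)             # step 3

# Sign the cookie-value with HMAC-SHA1, as described at the top.
signature = OpenSSL::HMAC.hexdigest(OpenSSL::Digest::SHA1.new,
                                    cookie_secret, final_cookie_value)
cookie = "#{final_cookie_value}--#{signature}"
puts cookie
```

Running the reverse steps below on final_cookie_value gives back the original session hash, which is a quick sanity check that the two directions line up.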

The reverse process would be:
  1. url_decoded_cookie = CGI::unescape(cookie_value)
  2. b64_decoded_session = Base64.decode64(url_decoded_cookie)
  3. session = Marshal.load(b64_decoded_session)

And with a beautiful language like Ruby, all three of these steps can be done in a single line of code. Here it is:
(Btw, I need to require 'mongo' because one of the values stored here is of type BSON::ObjectId, which is defined in the mongo gem. Without it, Marshal.load will error out.)

irb(main):001:0> require 'mongo'
=> true
irb(main):002:0> require 'cgi'
=> true
irb(main):003:0> cookie_str = "BAh7B0kiGXdhcmRlbi51c2VyLnVzZXIua2V5BjoGRVRbCEkiCVVzZXIGOwBGWwZvOhNCU09OOjpPYmplY3RJZAY6CkBkYXRhWxFpVGkvaQGsaQGwaRBpAdFpCGk9aQHtaQBpAGkGSSIiJDJhJDEwJEZseHh3c293Q29LcHhneWMxODR2b08GOwBUSSIPc2Vzc2lvbl9pZAY7AEYiJTUwNDdkOTMwNDNkNGEzOTA4YTkwN2U2MDY5OGRmOTdm"

# Reverse engineering the cookie to get the session object
irb(main):004:0> session = Marshal.load(Base64.decode64(CGI.unescape(cookie_str)))
=> {"warden.user.user.key"=>["User", [BSON::ObjectId('4f2aacb00bd10338ed000001')], "$2a$10$FlxxwsowCoKpxgyc184voO"], "session_id"=>"5047d93043d4a3908a907e60698df97f"}

This is the session data that the session cookie was holding. Warden and Devise subsequently use this data to fetch the user from the DB and perform the authentication.

And that is how Rails handles cookies (at least how Rails 3.0.11 does; I am not sure if things have changed in later versions).

Thursday, March 15, 2012

NAS and SAN explained -- with technical differences.

Acronyms and fancy buzzwords (specifically computer science related ones) have always troubled me, at times making me very angry at the person using them and in many cases eventually leaving me confused. So whenever I come across such acronyms or buzzwords I try to dissect them and prepare a mental visual map that I can use every time the acronym comes up in the future. The acronyms for this write-up are NAS (Network Attached Storage) and SAN (Storage Area Network).

These might be very simple and obvious things for many people but I am sure I have lost quite a bit of my hair whenever someone mentioned these acronyms to me. So here is my attempt to decipher them.

First, the basics. Both of these consist of two building blocks: storage and network. Or, to put it less naively, both SAN and NAS allow applications on one machine to access data present on another machine. Okay, so why two names, why two acronyms? To answer that, let me take up these two building blocks separately.

In the simplest sense, "storage" means dealing with files stored on the hard disk attached to the system. We do that with the APIs (or "methods", if you want to avoid the acronym) made available by the filesystem, and libraries built using those methods. As application programmers we almost never worry about how the files are actually stored on the disk. That is the responsibility of the filesystem, the kernel and the disk driver. The application always views the data stored on the disk in terms of files (used in a generic sense to refer to both files and directories) - more so as a stream of bytes. If we dig a little deeper we find that these disks are actually made available to the filesystem by the disk drivers as block devices - i.e. whenever they accept or return data they do it in quanta of blocks. A disk doesn't return a single byte of data when you read from it; it always returns one or more blocks. From what I understand, a typical block size these days is 4KB. The amount of data transferred to or from the disk is a multiple of this block size. Allocation of space for files is also made in terms of blocks, which sometimes leads to a file utilizing its last block only partially (and that is why we see a difference between a file's actual size and the space it occupies on disk).

That's about storage. To summarize: data is made available as files by the filesystem software, but the device actually makes it available as blocks.
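We can actually observe this block-level accounting from Ruby via File::Stat. A quick sketch (#blocks reports 512-byte units on Linux; the exact numbers depend on your filesystem):

```ruby
require 'tempfile'

file = Tempfile.new('blocks-demo')
file.write('a' * 10)  # just 10 bytes of data
file.flush

stat = File.stat(file.path)
puts "Logical size : #{stat.size} bytes"         # 10
puts "FS block size: #{stat.blksize} bytes"      # typically 4096
puts "Disk usage   : #{stat.blocks * 512} bytes" # usually one full block,
                                                 # though delayed allocation
                                                 # can briefly report 0 here
file.close!
```

Even though the file holds only 10 bytes, the filesystem hands it a whole block - exactly the partial-block effect described above.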

Network, in the simplest sense, is communication between two processes - running either on the same machine or on different machines. To simplify further, let's limit ourselves to communication between two processes on two different machines. Typically one of them is a server process and the other a client process. The server process listens on a specified port, to which the client connects. The client can then send requests over the connection, which the server will "serve" by sending back a suitable response. The formats of the request and the response are specified beforehand, and the client and the server agree to conform to that specification. This conformance is what is called the "protocol" that the two processes (or in this case the two machines) use for their communication. The client typically asks for some data, and the server fetches it from some place and sends the requested data back as the response. The client doesn't know where the server fetches the data from, and the server doesn't know what the client does with the data. The protocol is all that matters to them.

That's network. No summary here.
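Okay, a tiny summary in code instead. The client/server exchange described above fits in a few lines of Ruby; the one-line protocol here (a key goes in, a value comes back) is entirely made up for illustration:

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0)  # port 0 = pick any free port
port   = server.addr[1]
data   = { "greeting" => "hello" }      # what the server "fetches"

server_thread = Thread.new do
  client  = server.accept
  request = client.gets.strip           # the agreed-upon protocol:
  client.puts(data.fetch(request, ''))  # one line in, one line out
  client.close
end

socket = TCPSocket.new('127.0.0.1', port)
socket.puts('greeting')                 # client sends a request
response = socket.gets.strip            # server sends back a response
socket.close
server_thread.join
puts response  # → hello
```

Note that the client never learns the response came from an in-memory hash, and the server never learns what the client does with it - only the protocol is shared.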

Okay, so how do storage and network come together now?

In the storage example the data on the hard disk (referred to as "our hard disk" henceforth) was being accessed by the applications running on the same machine (referred to as the "host machine" henceforth). Now what if applications running on a different machine (referred to as the "new machine" henceforth) want to access the data on our hard disk? Let us call this requirement as "remote data access".

The traditional filesystem software is designed to interact with a disk made available to it on the local system by the disk driver, and the driver is designed to handle a disk that is attached to this local system. For our "remote data access", either the filesystem software has to get smarter and start talking over the network to our "host machine", or the disk driver has to become smarter and make the disk on our "host machine" available as a local device on the "new machine". These two options are what the two acronyms stand for. One means a smarter filesystem with the same old driver, and the other means a smarter driver with the same old filesystem. That is the difference between the two, and the reason there are two names and two acronyms!

NAS - Network Attached Storage -- This one has a smarter filesystem and the same old driver. In our setup, the filesystem on the "new machine" knows that the disk is on the "host machine" and every time an application requests a file (either for reading or writing) it has to contact the "host machine" over network and retrieve the file. In other words the filesystem on the "new machine" makes a request to the "host machine" - making it a client process. To accept and respond to that request there must be a server process running on the "host machine". This server process fetches the requested file from the disk (using the old driver) and sends it back to the client. The client process, which is the filesystem software, in turn makes that file available to the application that requested it. We can see that the data on the server is made available to the client as a file. This is what defines NAS.

So for the filesystem software to get smart, it now needs two components - a client part used by the applications, and a server part which handles the disk. There are quite a few such smart filesystems out there. The most common in the UNIX/Linux world is NFS - Network File System. The server part of NFS is named "nfsd". On the client side, the standard "mount" command is smart enough to mount drives of the "nfs" filesystem type.

Note that here the filesystem software is aware that the disk (and hence the data) is on a remote machine. This is another defining trait of NAS.

More details are available here: http://nfs.sourceforge.net/ and here: https://help.ubuntu.com/8.04/serverguide/C/network-file-system.html

SAN - Storage Area Network -- This one has a smarter disk driver and the same old filesystem. The disk driver on the "new machine" lies to the OS and the filesystem software that there is a disk attached to the system locally. The OS and the filesystem software believe the driver and continue to use the fake disk that the driver provided. Whenever the disk driver is asked to fetch a block (not a file, a block), it in turn sends a request to the "host machine" and retrieves that block of data - thereby becoming the client process in the setup. Accordingly there will be a server process running on the "host machine" which accepts this request, fetches the corresponding block from the actual disk and sends it back to the client. The client, which is the smart disk driver in this case, in turn passes that data to the filesystem software and eventually to the application that requested the file data. It is evident here that the data on the server was made available to the client as "blocks" and not as files. This is what defines SAN.

Note that here the filesystem (and every other component apart from disk driver) is not aware that the disk (and the data) is on a remote machine. This is another defining trait of SAN.
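The contrast between the two can be boiled down to a toy sketch: a "NAS-style" server hands back a whole named file, while a "SAN-style" server hands back a numbered block. The function names and BLOCK_SIZE are made up for illustration, and the network part is skipped entirely - real NFS/iSCSI protocols are far richer:

```ruby
require 'tempfile'

BLOCK_SIZE = 4096

# "NAS-style": the unit of exchange is a whole named file.
def serve_file(path)
  File.read(path)
end

# "SAN-style": the unit of exchange is a numbered block.
def serve_block(path, index)
  File.open(path, 'rb') do |f|
    f.seek(index * BLOCK_SIZE)
    f.read(BLOCK_SIZE)
  end
end

demo = Tempfile.new('san-demo')
demo.write('x' * BLOCK_SIZE + 'y' * 10)  # one full block plus a partial one
demo.flush

whole  = serve_file(demo.path)      # the entire file, 4106 bytes
block1 = serve_block(demo.path, 1)  # just the second block: "yyyyyyyyyy"
puts whole.length
puts block1
demo.close!
```

The SAN-style server never knows (or cares) that block 1 happens to be the tail of a file - interpreting blocks as files is the client-side filesystem's job.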

A very common and popular appearance of SAN these days is in the various cloud offerings. For instance, Amazon's cloud offering has a service named EBS - Elastic Block Store - which makes network storage available as a locally attached disk. We can have any of the regular filesystems, like ext4 or xfs, on top of an EBS drive.

That's it. The two acronyms have been conquered!

Saturday, March 10, 2012

Analysis of the Duqu Trojan worm by Kaspersky Labs

I happened to come across the discovery and research of the Duqu Trojan, which is apparently the successor of the notorious Stuxnet worm. There are a lot of articles to read, and I am feeling a little sleepy now, so I may not finish all of them and stay awake to write a summary of my understanding. So instead of bookmarking all those tabs I am documenting them here, with a little metadata to identify what each link talks about.

(Note: Yesterday night I did doze off in the course of writing this post. :P)

  1. The FAQ link - http://www.securelist.com/en/blog/208193178/Duqu_FAQ

    A standard FAQ page; a good starting point if you are totally new to Duqu or Stuxnet. Also answers some noob questions. Btw, it mentions that one of the Command & Control servers was hosted in India!

  2. The mystery of Duqu - Part one - http://www.securelist.com/en/blog/208193182/The_Mystery_of_Duqu_Part_One

    This one provides a bird's eye view of the worm - its components, the files involved and how they play together, and a comparison with Stuxnet (with a missile analogy). It also gives a chronological view of the discovery and detection of this worm. More importantly, it talks about the various device drivers - signed and unsigned - which were used as a disguise.

  3. The Mystery of Duqu: Part Two - http://www.securelist.com/en/blog/208193197/The_Mystery_of_Duqu_Part_Two

    This one talks about the first real-world infections, detected using their cloud-based Kaspersky Security Network. These were in Sudan and Iran, but with no direct link to Iran's nuclear program yet. One thing stands out - the worm was different on every infection: different driver name, different checksum, and in one case a different size too. So the mystery actually continues.

  4. The Mystery of Duqu: Part Three - http://www.securelist.com/en/blog/208193206/The_Mystery_of_Duqu_Part_Three

    A short entry which corrects a mistake made in the previous post about a network attack. More interestingly, it reveals the starting point of the infection - a.k.a. the dropper. It turns out to have been a 0-day exploit delivered via Microsoft Word, related to the file win32k.sys (CVE-2011-3402). The infected Word file was sent to specific people via email. Also, each infected file was different, which means the file was crafted individually for each target.

  5. The Duqu Saga Continues: Enter Mr. B. Jason and TV’s Dexter - http://www.securelist.com/en/blog/208193243/The_Duqu_Saga_Continues_Enter_Mr_B_Jason_and_TVs_Dexter

    This one gets a little technical and walks us through the modus operandi, taking one of the infections mentioned in the previous post. It reveals a bunch of things and confirms most of the assumptions made previously, viz: a very targeted attack, dynamic modules with little to no trace on the target machine, different C&C servers for different targets, etc. It also tells us how the worm authors got creative, creating a font named Dexter Regular and crediting its creator as Showtime Inc.

    What is more interesting is the way the comments get even more creative. One comment offers a new interpretation of a hex string found in the trojan code - 0xAE790409. Earlier it was thought to be related to the death of Habib Elghanian (http://en.wikipedia.org/wiki/Habib_Elghanian), as in the Stuxnet case. The new interpretation is that AE means "Atomic Energy" and (19)79-04-09 is the date on which the USA and the USSR signed the SALT II treaty limiting nuclear weapons. This is wrong, though, because SALT II was signed on June 18, 1979 - http://en.wikipedia.org/wiki/Strategic_Arms_Limitation_Talks#SALT_II

    Another comment interprets the sender email bjasonxxxx@xxx.com as "Bourne Jason", the ultimate spy/operative from the famous Bourne novel/movie series.

  6. The Mystery of Duqu: Part Five - http://www.securelist.com/en/blog/606/The_Mystery_of_Duqu_Part_Five

    This one dives deep into the structure and layout of the trojan's DLL and PNF files, the registry entries, the config files, the processes it affects, etc. It gets very technical, and requires knowledge of binary file formats and the DLL loading mechanism to understand fully. The loader part is fully dissected here; however, the payload is still not known. They say it is some C++ code with heavy use of STL, and probably a custom framework.

  7. The Mystery of Duqu: Part Six (The Command and Control servers) - http://www.securelist.com/en/blog/625/The_Mystery_of_Duqu_Part_Six_The_Command_and_Control_servers

    This one analyzes the command and control servers used by the Duqu trojan. This is the first post where details of the Indian C&C server were mentioned. It belonged to a web hosting company named Webwerks - http://www.web-werks.com/ and http://www.webwerks.in/. The Kaspersky guys say this was the most interesting of all the C&C servers - probably because it was the first one and also the longest serving. Unfortunately they were not able to analyze it, as it was wiped clean just hours before the hosting company agreed to make an image of the server. Nevertheless they analyzed two other servers - one in Vietnam and one in Germany - and dug up a boatload of information. The final stand is that either OpenSSH 4.3 has a 0-day vulnerability, or the server admins had very weak passwords that the attackers cracked by brute force.

  8. The Mystery of the Duqu Framework - http://www.securelist.com/en/blog/667/The_Mystery_of_the_Duqu_Framework

    This post details the code structure of the payload and tries to decipher the programming language and framework used. Although many parts appear to be standard C++ with heavy use of STL, a significant portion of the main payload code appears to have no link to the standard C runtime and does not appear to be compiled with the Microsoft Visual C++ compiler. The code uses the Win32 native API directly, bypassing the runtime. This means the trojan authors either used a very obscure programming language and compiler, or came up with their own. The comments talk about various possibilities, but few actually make sense. One commenter is very sure it is one of the big US software companies and pinpoints IBM as the prime suspect, along with his own myriad set of proofs.

The bottom line is that the sponsors of the Duqu worm have deep pockets, are very organized and have very specific targets. Also, different parts were probably developed by different teams, with no team knowing the full picture. This very likely means it is state sponsored. My guess: that information will never come out.