
Saturday, February 7, 2015

AJAX for beginners - what it is and how it evolved

Today's web pages are complex structures involving several components. It can be daunting for a beginner to understand them all at once. To ease the process, let's break things down and see how the various pieces fit together.

At the core of web pages is HTML. It is the language that browsers understand and what web developers use to present rich content. Without HTML, all our web pages would be plain text files. I will assume that you are aware of HTML and how it is used in web pages. Another critical piece that works closely with HTML is CSS, but we can skip that one for now.

Plain HTML web pages are pretty dull. They are not really interactive. Apart from text, they mainly contain:
  1. Some formatting information which tells the browser how to display the contents.
  2. Some media content like images or videos.
  3. A bunch of hyperlinks which lead to other similar web pages.
  4. In slightly more evolved pages, forms which allow a user to send data back to the server.

It is the third one which helped form this "web" of pages that we call the World Wide Web. How do these links work? Pretty simple, actually. Each link typically points to a web page on a server. Whenever a user clicks on a link, the browser fetches that page, removes the existing page and shows the newly downloaded one. Sometimes the link points back to the same page, in which case clicking on it just "refreshes" the existing page. We all know about the refresh/reload button in browsers, don't we?

There are a few things to note here.
  1. These basic web pages were dull in the sense that they were not interactive. They did not respond to user actions, except for clicks on links, in which case the existing web page went away and a fresh page was loaded.
  2. Requests to the server were made only when a link was clicked. The web page itself had little control over when a request should be made.
  3. While the browser was making a request in response to a link click, there was pretty much nothing you could do with the web page until the request completed. Every request to the server blocked the entire page, and if multiple requests had to be made, they had to be made sequentially, one after the other. A fancy technical word to describe this is "synchronous".
  4. An obvious observation is that the data that came from the server was always HTML, i.e. the data and the information on how to display that data were sent back from the server every time.

While the fourth point may not look like a big problem, the first three are definite limitations.

To work around the first problem, browser developers came up with JavaScript. They told web developers that they could make their web pages interactive and responsive to user actions by writing code to handle those interactions in this new language called JavaScript and including it in their web pages. Web developers could now write different pieces of code to handle different user actions. The browser provided a set of functions (an API) in JavaScript which web developers could use to make changes to the web page. This API is called the DOM (Document Object Model). Thus web pages became interactive and responsive.
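As a rough sketch of this idea (the element IDs and the message below are made up for illustration, not taken from any real page), a piece of JavaScript can listen for a user action and then modify the page through the DOM:

    // Hypothetical sketch: react to a button click and update the page via the DOM.
    // The element IDs 'greet-button' and 'greeting' are made up for this illustration.
    var button = document.getElementById('greet-button');
    var output = document.getElementById('greeting');

    button.addEventListener('click', function () {
      // Change part of the page in response to the user's action, no server involved.
      output.textContent = 'Hello! You clicked the button.';
    });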

The second and third limitations still existed, however. The only way to request data from the server was through links, or in the case of evolved pages, through forms. Also, the web page was one single unit: whenever something had to change, the whole page had to be reloaded. Imagine something like that with your Gmail page. How would it be if the page refreshed every time you received a chat message, or worse, every time one of your contacts logged in or out of chat? That would be disastrous. The fate of a sports website would be similar: the small corner showing the score needs to be updated far more frequently than the rest of the page.

To solve this, some people thought of iframes as a solution. An iframe is an HTML tag inside which you can put another HTML page. With this arrangement the inner page can be reloaded without affecting the outer page. So the sports website could put the score part in an iframe and refresh it without affecting the rest of the page. Similarly, Gmail could break up its page into multiple iframes (it actually does this in reality, but it also does a lot more than that).
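A minimal sketch of the iframe trick (the frame ID, the score URL and the refresh interval are assumptions made for this illustration): the outer page reloads just the score iframe, leaving everything else untouched.

    // Hypothetical sketch: refresh only the score iframe every 30 seconds.
    // 'score-frame' and '/score.html' are made-up names for this illustration.
    var scoreFrame = document.getElementById('score-frame');

    setInterval(function () {
      // Re-assigning the src reloads only the inner page; the outer page stays as it is.
      scoreFrame.src = '/score.html?ts=' + Date.now();
    }, 30000);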

With iframes, the "synchronous" problem was somewhat solved. Multiple requests could now be made in parallel from a single web page. However, the method of making the requests largely remained the same: links or forms. Making requests in any other manner, say whenever the user typed something (as in Google's search suggestions), was either not seamless or almost impossible. Now let's see what capabilities we have on top of basic web pages.
  1. Web pages can respond to user actions. Some code can be executed on various user actions made on different parts of the page.
  2. This code can modify the web page using DOM.
  3. Browsers are now capable of making multiple requests in parallel without blocking the web page.

The browser folks then combined these abilities, allowing web developers to make requests to servers in response to user actions using JavaScript. These requests could be made in parallel without blocking the page, i.e. asynchronously. When the response came back, the same JavaScript code could add the HTML received in the response to the existing page using the DOM.
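Here is a rough sketch of that combination (the URL and element IDs are made up for illustration): a request is fired from JavaScript when the user clicks, the page stays usable while it is in flight, and the returned HTML is spliced into the page using the DOM.

    // Hypothetical sketch: fetch an HTML fragment asynchronously and insert it into the page.
    // '/latest-news.html', 'load-news' and 'news-panel' are made-up names for this illustration.
    document.getElementById('load-news').addEventListener('click', function () {
      var xhr = new XMLHttpRequest();
      xhr.open('GET', '/latest-news.html', true);   // 'true' means asynchronous

      xhr.onload = function () {
        if (xhr.status === 200) {
          // Update just one part of the page when the response arrives.
          document.getElementById('news-panel').innerHTML = xhr.responseText;
        }
      };

      xhr.send();
    });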

The developers went a step further and said, "Why receive HTML every time? In most cases the formatting or presentation information is already there in the web page. What is needed is only new data or content (for example, in the sports website, all that is needed is the updated score; information on how to show that score is already present in the page). So instead of fetching HTML every time, let's just fetch data and make things much faster and more efficient." Back then the most popular way of representing data in a structured manner was XML, so developers started receiving data in XML format from the server and using it to update the web page. XML was so popular that browsers added support for automatically parsing XML data received in response to these requests.
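Continuing the sports-score illustration (the URL, the XML structure and the element ID are assumptions made for this sketch), the server now sends only the data and the page pulls out what it needs:

    // Hypothetical sketch: fetch only the data as XML and update the score element.
    // '/score.xml', the <score> element and 'score-box' are made up for this illustration.
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/score.xml', true);

    xhr.onload = function () {
      if (xhr.status === 200 && xhr.responseXML) {
        // The browser has already parsed the XML response for us.
        var score = xhr.responseXML.getElementsByTagName('score')[0].textContent;
        document.getElementById('score-box').textContent = score;
      }
    };

    xhr.send();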

This approach of making requests and updating parts of web pages became very popular and widely used. Since it was predominantly used to make requests over HTTP (the protocol of the web) to fetch XML data, this technology was called "XMLHttpRequest" (Internet Explorer called it something different; it probably still does).

What all does this XHR offer?
  1. Making requests "Asynchronously"
  2. Making requests from "JavaScript"
  3. Making requests for "XML"

Put together it forms "AJAX - Asynchronous JavaScript and XML".

Notes :
  1. Several things have been simplified here as this is aimed at people new to web development and Ajax in particular. This should nevertheless give a good starting point.
  2. References to XML should be read as "something other than HTML". Nowadays XML has been largely replaced by JSON because of several advantages.
  3. XMLHttpRequest does not necessarily mean making an asynchronous request. It supports synchronous requests too, but I don't know of a good reason to use them. A synchronous XHR will block the entire page until the request completes. Really, don't do that.
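For the record, the difference is just the third argument to open() (the URL here is hypothetical); keep it true:

    var xhr = new XMLHttpRequest();

    // Asynchronous (the right choice): the page stays responsive while waiting.
    xhr.open('GET', '/data.xml', true);

    // Synchronous: passing false here would freeze the whole page until the response arrives.
    // xhr.open('GET', '/data.xml', false);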

Sunday, October 27, 2013

Invalid credentials : OmniAuth + oAuth2 + Rails 4 encrypted cookie store + simultaneous requests

OmniAuth is a very well known gem in the Ruby/Rails world. Almost every Rails application out there is probably using it to authenticate with one of the various mechanisms it supports. OmniAuth is just awesome!

I have been using OmniAuth for about 3 years now in various Rails projects and it has worked very well, although I have had to monkey patch it once or twice to exploit some Facebook features (like authenticated referrals). But all in all, it just works as advertised and you will have an authentication system up in pretty much no time at all.

Yesterday, however, I was having quite a hard time getting OmniAuth to do a simple "Login with Facebook" oAuth2 authorization. It was something that had worked seamlessly on innumerable previous occasions. But yesterday it just kept failing repeatedly, succeeding only once in a while. And it always failed with the same obscure error, "Invalid Credentials", during the callback phase. (OmniAuth operates in three phases : Setup, Request and Callback. Apart from the OmniAuth wiki on github, this is a good place to read about it : http://www.slideshare.net/mbleigh/omniauth-from-the-ground-up). The fact that the error message was not very helpful made the whole process a lot more frustrating. After some hunting on the inter-webs I found that the culprit could be a bad "state" parameter.

Wait, what is this "state" parameter?

Background : oAuth2 CSRF protection


oAuth2 specifies the use of a non-guessable, secure random string as a "state" parameter to prevent CSRF attacks on oAuth. More details here : http://tools.ietf.org/html/rfc6749#section-10.12. This came out almost a year back and many oAuth providers already implement it, including Facebook. OmniAuth also implemented it last year. Although not written with the best grammar, this article will tell you why this "state" parameter is needed and what happens without it.

To sum it up, the oAuth client (our web application) creates a random string, stores it in an accessible place and also sends it to the oAuth provider (e.g. Facebook) as "state" during the request phase. Facebook keeps it, authenticates the end user, asks for permissions and, when they are granted, sends a callback to our web application by redirecting the user back to our website with the "state" as a query parameter. The client (our web application) compares the "state" sent by the provider with the one it had stored previously and proceeds only if they match. If they don't, there is no proof that the callback our web application received actually came from the provider. It could be from some attacker trying to trick our web application into thinking (s)he is someone else.

In the case of OmniAuth oAuth2, this state parameter is stored as a property in the session with the key 'omniauth.state' during the request phase. The result of the request phase is a redirect to the provider's URL. The new session, with the "state" stored in it, is set on the client's browser when it receives this redirect (302) response for the request /auth/:provider (this is the default OmniAuth route to initiate the request phase). After the provider (Facebook) authenticates the user and the user authorizes our application, the provider makes a callback to our application by redirecting the user back to our web application at the callback URL /auth/:provider/callback, along with the "state" as a query parameter. When this callback URL is requested by the browser, the previously stored session cookie containing the 'omniauth.state' property is also sent to our web application.

OmniAuth checks both of these and proceeds only if they match. If they don't match it raises the above mentioned "Invalid Credentials" error. (Yeah, I know, not really a helpful error message..!).

Ok, that is good to hear, but why will there be a mismatch?


A mismatch is possible only if the session cookie stored on the user's browser is changed such that the 'omniauth.state' property is removed or altered after the request phase has set it. This can happen if a second request to our web application is initiated while the request phase of oAuth is running, and it completes after the request phase completes but before the callback phase starts. Sounds complex? The diagram below illustrates it.

[Diagram: timeline of the oAuth request phase, the "other simultaneous request" and the callback phase, showing how the response of the intermediate request overwrites the session cookie and drops 'omniauth.state']

The diagram makes it clear when and how 'omniauth.state' gets removed from the session, leading to the error. However, apart from the timeline requirements (i.e. when requests start and end), there is another essential criterion for this error to occur :
The response of the "other simultaneous request" must set a new session cookie, overriding the existing one. If it does not explicitly specify a session cookie in the response headers, the client's browser will retain the existing cookie and 'omniauth.state' will be preserved in the session.
Now, from what I have observed, Rails (or one of the Rack middlewares) has this nifty feature of not serializing the session and not setting the session cookie in the response headers if the session has not changed in the course of processing a request. So, in our case, if the intermediate simultaneous request does not make any changes to the session, Rails will not explicitly set the session cookie, thereby preventing the loss of the 'omniauth.state' property in the session.

Ok, then why will the session cookie change and lose the 'omniauth.state' property?


One obvious thing is that the "other simultaneous request" might change the session - add, remove or edit any of the properties. There is, however, another player involved.

This is where the "Encrypted Cookie Store" of Rails 4 comes into the picture. Prior to Rails 4, Rails did not encrypt its session cookie. It merely signed it and verified the signature when it had to de-serialize the session from the request cookie. Read how Rails 3 handles cookies for a detailed breakdown. Rails 4 goes a step further and encrypts the session data with AES-256 (along with the old signing mechanism. More details on that coming up in a new post). The implementation used is AES-256-CBC from OpenSSL. I am not a cryptography expert, but AES-CBC itself is deterministic for a given key and IV; what makes the ciphertext different each time is that the Rails encryption scheme initializes the encryptor with a random initialization vector every time it encrypts a session (Implementation here). Either way, the session cookie contents are new for every request, even when the actual session object or session contents remain unchanged. As a result, Rails will always set the session cookie in the response header for every request, and the browser will update that cookie in its cookie store.
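To see that effect in isolation, here is a small sketch in Node.js (this is not Rails' actual implementation, just an illustration of AES-256-CBC with a fresh random IV): encrypting the same plaintext twice gives two different ciphertexts.

    // Sketch in Node.js, not Rails' implementation: same key, same plaintext,
    // but a fresh random IV per encryption yields a different ciphertext every time.
    var crypto = require('crypto');

    var key = crypto.randomBytes(32);            // 256-bit key
    var plaintext = 'unchanged session contents';

    function encrypt(text) {
      var iv = crypto.randomBytes(16);           // random initialization vector
      var cipher = crypto.createCipheriv('aes-256-cbc', key, iv);
      return Buffer.concat([cipher.update(text, 'utf8'), cipher.final()]).toString('hex');
    }

    console.log(encrypt(plaintext));
    console.log(encrypt(plaintext));             // different output for the same input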

In our case, this results in the session being clobbered at the end of the "other simultaneous request" and we end up losing the 'omniauth.state' property and oAuth fails.

Umm.. ok, but when and how does this happen in real world, if at all it can?!


All these requirements/constraints described above, especially the timing constraints, make one wonder if this can really happen in the real world. Well, for starters, it happened to me (hence this blog post..!!). I also tried to think of scenarios other than mine where this would happen. Here are a couple that I could think of :

Scenario - 1) FB Login is in a popup window and the "simultaneous request" is an XHR - Ex : an analytics or tracking request.

Here is the flow :
  1. User clicks on a "Login with FB" button on your website.
  2. You popup the FB Login page in a new popup window. Request phase is initiated. But there is a small window of time before the redirect response for '/auth/facebook' is received and 'omniauth.state' is set.
  3. During that small window of time, in the main window, you send an XHR to your web app to, may be, track the click on the "Login with FB" button. You might do this to just track usage or for some A/B testing or to build a funnel, etc. This request sends the session without the 'omniauth.state'.
  4. While the XHR is in progress, the redirect from the request phase is complete and the session with 'omniauth.state' is set. The user now sees FB Login page loading and proceeds to login once it is loaded.
  5. While the user is logging in to FB and approving our app, the XHR has completed and has come back with a session without 'omniauth.state'. This is stored by the browser now.
  6. Once the user logs in and approves your app, the callback phase starts. But the session sent to your web app is now missing 'omniauth.state'.
  7. oAuth fails.
How big a deal is this scenario?

If you are indeed making an XHR in the background, then this scenario needs to be taken care of. Since the "other simultaneous request" is automatically triggered every time, it is very likely that the session will get clobbered.

How to solve this?

You can first send the XHR and open the FB Login popup only in the response handler of that XHR. Also have a timeout, just to make sure you don't wait too long (or forever) for that XHR's response.
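A rough sketch of that ordering (the tracking endpoint, popup options and the timeout value are assumptions made for this illustration):

    // Hypothetical sketch: send the tracking XHR first, open the FB login popup in its
    // response handler, and use a timeout so a slow tracking call never blocks the login.
    // '/track' and the 2-second timeout are made-up values for this illustration.
    function openFbLoginPopup() {
      window.open('/auth/facebook', 'fb_login', 'width=600,height=400');
    }

    function trackThenLogin() {
      var opened = false;
      function openOnce() {
        if (!opened) { opened = true; openFbLoginPopup(); }
      }

      var xhr = new XMLHttpRequest();
      xhr.open('POST', '/track', true);
      xhr.onload = openOnce;            // open the popup once tracking has completed
      xhr.onerror = openOnce;           // a failed tracking call should not block login
      xhr.send('event=fb_login_click');

      setTimeout(openOnce, 2000);       // fallback: don't wait forever for the XHR
    }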

Alternatively, if you can push your tracking events into a queue stored in a cookie, do that and then open the FB Login page. Once the FB Login completes, pull the event out of the queue and send it. As a backup, have code that runs on every new page load, looks for pending events in the queue in the cookie and sends them.

With HTML5 in place, it's probably better to use localStorage for the queue than a cookie, though its availability can vary (for example in some private-browsing modes). Your call.
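A minimal sketch of such a queue using localStorage (the storage key and the tracking endpoint are made up for this illustration):

    // Hypothetical sketch of a tiny event queue kept in localStorage.
    // 'pending_events' and '/track' are made-up names for this illustration.
    function enqueueEvent(event) {
      var queue = JSON.parse(localStorage.getItem('pending_events') || '[]');
      queue.push(event);
      localStorage.setItem('pending_events', JSON.stringify(queue));
    }

    function flushEvents() {
      var queue = JSON.parse(localStorage.getItem('pending_events') || '[]');
      queue.forEach(function (event) {
        var xhr = new XMLHttpRequest();
        xhr.open('POST', '/track', true);
        xhr.send(JSON.stringify(event));
      });
      localStorage.removeItem('pending_events');
    }

    // On the login click: enqueueEvent({name: 'fb_login_click'}); then start the FB login.
    // On every page load (and after the login callback completes): flushEvents();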

Scenario - 2) FB Login is in the same window/tab but the user has the website open in two tabs.

Here is the flow :
  1. User has your website opened in a browser tab - Tab-1
  2. User opens a link on your website in a second tab - Tab-2 (Ctrl + Click or 'Open in a new tab' menu item). This request sends the session without 'omniauth.state'.
  3. While that Tab-2 is loading, user clicks on "Login with FB" in Tab-1 initiating the request phase.
  4. If the request loading in Tab-2 is a little time-consuming, the redirect of the oAuth request phase in Tab-1 completes before the request in Tab-2, setting the session with 'omniauth.state'. After that, the FB Login page is shown and the user proceeds to log in and authorize.
  5. While the user is logging in, the request in Tab-2 completes, but with a session that is missing 'omniauth.state'.
  6. After logging in to FB, the callback phase is initiated with a redirect to your web app, but with a session that doesn't have 'omniauth.state'. 
  7. oAuth fails. 
How big a deal is this scenario?

Not a big deal actually. In your web app, in the oAuth failure handler, you can just redirect the user back to /auth/facebook, redoing the whole process again - and guess what, this time it will succeed, and without the user having to do anything, because the user is already logged in to FB and has already authorized your app. But just to be on the safer side, you would want to make sure this loop doesn't go infinite (i.e. you start FB auth, it fails and the failure handler restarts FB auth). Setting a cookie (different from the session cookie) with the attempt count should be good. If the attempt count crosses a certain limit, send the user back to the homepage, show an error page or show a lolcats video - c'mon, be creative.

Ok, those are two scenarios that I could think of. I am not sure if there are more.

Can OmniAuth change something to solve this?


I believe so. If OmniAuth used a separate signed and/or encrypted cookie to store the state value instead of the session cookie, none of this session clobbering would result in the loss of the state value. OmniAuth is a Rack-based app and relies on the Session middleware. I am not entirely sure, but it could probably use the Cookie middleware instead - just set its own '_oa_state' cookie and use that during the callback for verification.

Will you send a pull request making this change?


I am not sure. I will first hit the OmniAuth mailing list and find out what the wise folks there have to say about this. If it makes sense and nobody in the awesome Ruby community provides an instant patch, I will try and send a patch myself.

THE END
 
Ok, so that was the awesome ride through the oAuth workings inside the OmniAuth gem. Along the way I got to know quite a bit of Rails and also Ruby internals. Looking forward to writing posts about those too. Okay, okay.. fine. I will try and keep those posts short and not make them this long..!!

Till then, happy oAuthing. :-/ !

P.S : Security experts, excuse me if I have used "authentication" and "authorization" in the wrong places. I guess I have used them interchangeably, as web applications typically do both with oAuth2.

Sunday, January 1, 2012

MongoDB concurrency - Global write lock and yielding locks

There has been a lot of hue and cry about MongoDB's global write lock. Quite a few people have said (in blog posts, mailing lists etc.) that this design ties down MongoDB to a great extent in terms of performance. I too was surprised (actually shocked) when I first read that the whole DB is locked whenever a write happens - i.e. a create or update. I can't even read a different document during that time. It did not make any sense to me initially. Before this revelation I was very pleased to see MongoDB not having transactions, thinking that this avoided locking the DB while running expensive transactions. This global lock, however, left me wondering whether MongoDB is worth using at all..!! I was under the assumption that the art of "record level locking" had been mastered by database developers. This made MongoDB look like a tool from the stone age.

Well, I was wrong. Turns out that "record level locking" is not that easy (the reasons for that warrant a different post altogether) and, from what I understand, MongoDB has no plans of implementing such a thing in the near future. However, this doesn't mean the DB will be tied up for long durations (long on some scale) for every write operation. The reason is that MongoDB is designed and implemented differently from other databases, and there are mechanisms in place to avoid delays to a large extent. Here are a couple of things to keep in mind :

MongoDB uses memory mapped files to access its DB files. So a considerable chunk of your data resides in RAM, which results in fast access - fast reads all the time, very fast writes without journaling and pretty fast writes with journaling. This means that for many regular operations MongoDB will not hit the disk at all before sending the response - including write operations. So the global lock is held only for the time needed to update the record in RAM, which is orders of magnitude faster than writing to the disk. The DB is therefore locked for a very tiny amount of time, and this global lock is after all not as bad as it sounds at first.

But then the entire database cannot be in RAM. Only a part of it (often referred to as the working set) is in RAM. When a record not present in RAM is requested or updated, MongoDB hits the disk. Oh no, wait.. so does that mean the DB is locked while Mongo reads/writes that (slow) disk? Definitely not. This is where the "yield" feature comes in. Since 2.0, MongoDB will yield the lock if it is hitting the disk. This means that once Mongo realizes it is going to the disk, it temporarily releases the lock until the data from the disk is loaded and available in RAM.

Although I would still prefer record level locking in MongoDB, these two features are sufficient to reinstate my respect and love for MongoDB. :)