OnSwipe redirect code

Friday, November 25, 2011

Transactions - both single node and distributed - are hardwired in Windows - since Win 95

Transactions or "Atomic Transactions" to be precise, are very well known to anyone who has worked with databases. With the recent advent of NoSQL databases and the CAP theorem being used/abused by anyone and everyone, words like "consistency" and "transactional model" have become run-of-the-mill jargon. But what is actually interesting is that the concept of transactions or transactional model goes beyond our typical RDBMS. Things get even more challenging when we try to achieve transactions in a distributed system. Because transactions inherently lock the resource(s)/data they are operating on until the transaction completes, those resources can become inaccessible altogether very easily in a distributed setup if one of the node fails or if there is some problem with the network or any such thing, there by increasing the complexity of implementing distributed transactions by many folds compared to transactions on a single node.

Today I was trying to figure out if there is a way to "simulate" (albeit it will be very crude) some sort of transactions in my application which uses MongoDB (which doesn't support transactions by design - to avoid the locking mentioned above, although ironically there is a global write lock..!!). Searching on the internet lead me to this blog of a Raven DB developer. The author there mentions that RavenDB supports both sharding and transactions, which means it has implemented distributed transaction support. At first read I was pretty impressed (this was the first time I had heard about RavenDB). Before I could ask the author about the implementation details I saw a comment in which the author had mentioned that they use DTC (which again was a new thing). Turns out DTC, Distributed Transaction Controller, is a service that is baked right in the Windows OS itself, that too dating back to the Windows 95 days (wow.. now I am impressed with Windows..!). Here is the MSDN article describing the service.

The MSDN article clearly explains the basics of distributed transactions and how it is modeled. What is worth noting is that, by abstracting out the code for carrying out distributed transactions as a service, multiple resource managers (like different databases, queue servers, file servers/managers, etc..) can all interact together in a single transaction. For example, lets say that you have web application where in a client request results in a job being picked up from a queue for processing and simultaneously you update the status of the job in a DB and also create a new file associated with the start of the job. Very evidently all the three resource managers and the web application itself can be (very likely will be) on different nodes. With something like DTC you can easily create a new transaction, send across a commit message and you will get a success notification only if all three actions were successful or else none of the actions go through. Of course, this is possible only if all the three resource managers involved here adhere to the Microsoft's DTC specification and provide the necessary interface to work with it.

The previous example might make DTC appear like this Jason Bourne kind of super dude who can take care of all the heavy lifting and also do it very efficiently. But remember even Bourne gets shot at and also loses his girl. So DTC is not fully immune to problems either. Here is one blog post titled "My beef with MSDTC and two phase commits". It is definitely worth reading. Note that my impression about DTC is purely based on reading the documentation. I have not written a single line of code using DTC.

No comments:

Post a Comment