

Thursday, March 15, 2012

NAS and SAN explained -- with technical differences.

Acronyms and fancy buzzwords (specifically computer-science-related ones) have always troubled me, at times making me very angry at the person using them and in many cases just leaving me confused. So whenever I come across such acronyms/buzzwords I try to dissect them and prepare a mental visual map that I can use every time the acronym comes up in the future. The acronyms for this write-up are NAS (Network Attached Storage) and SAN (Storage Area Network).

These might be very simple and obvious things for many people, but I am sure I have lost quite a bit of my hair whenever someone mentioned these acronyms to me. So here is my attempt to decipher them.

First, the basics. Both of these consist of two building blocks: storage and network. Or, to put it in a less naive manner, both SAN and NAS allow applications on one machine to access data present on another machine. Okay, so why two names, why two acronyms? To answer that let me just take up these two building blocks separately.

In the simplest sense "Storage" means dealing with files stored on the hard disk attached to the system. We do that with the APIs (or "methods" if you want to avoid the acronym) made available by the filesystem and libraries built using those methods. As application programmers we almost never worry about how the files are actually stored on the disk. That is the responsibility of the filesystem, the kernel and the disk driver. The application always views the data stored on the disk in terms of files (used in a generic sense to refer to both files and directories) - more so as a stream of bytes. If we dig a little deeper we find that these disks are actually made available to the filesystem by the disk drivers as block devices - i.e. whenever they accept or return data they do it in units of blocks. A disk doesn't return a single byte of data when you read from it; it always returns one or more blocks. From what I understand the size of a block these days is typically 4KB. The amount of data transferred to or from the disk is a multiple of this block size. Allocation of space for files is also made in terms of blocks, which sometimes leads to a file utilizing its last block only partially (and that is why we see a difference between a file's actual size and the space it occupies on disk).
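That difference is easy to see from the shell. For example (the file name is just a placeholder, and the stat format string assumes GNU coreutils):

ls -l notes.txt
du -B1 notes.txt
stat -c 'allocated %b blocks of %B bytes, preferred I/O block size %o bytes' notes.txt

The first command shows the logical size in bytes, the second the space actually allocated on disk, and the third the block-level details the filesystem reports.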

That's about storage. To summarize: data is made available as files by the filesystem software, but the device actually makes it available as blocks.

Network in the simplest sense is communication between two processes - running either on the same machine or on different machines. To simplify it further let's just limit ourselves to the case of communication between two processes on two different machines. Typically one of these two processes will be a server process and the other a client process. The server process listens on a specified port to which the client can connect. The client can then send requests over the connection which the server will "serve" by sending back a suitable response. The format of the request and the response is specified beforehand, and the client and the server agree to conform to that specification. This conformance is what is called the "protocol" which the two processes (or in this case the two machines) are using for their communication. The client typically asks for some data and the server fetches it from some place and sends the requested data as the response. The client doesn't know where the server is fetching the data from and the server doesn't know what the client is doing with the data. The protocol is all that matters to them.
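As a quick, concrete illustration, HTTP is exactly such a protocol, and you can play the client by hand with netcat (assuming nc is installed; example.com is just a convenient public web server):

printf 'GET / HTTP/1.0\r\nHost: example.com\r\n\r\n' | nc example.com 80

The line you type is the request in the agreed format, and everything that comes back is the server's response; neither side knows or cares what the other does with the data beyond that.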

That's network. No summary here.

Okay, so how do storage and network come together now?

In the storage example the data on the hard disk (referred to as "our hard disk" henceforth) was being accessed by the applications running on the same machine (referred to as the "host machine" henceforth). Now what if applications running on a different machine (referred to as the "new machine" henceforth) want to access the data on our hard disk? Let us call this requirement "remote data access".

The traditional filesystem software is designed to interact with a disk made available on the local system by the disk driver, and the driver is designed to handle a disk that is physically attached to that local system. For our "remote data access", either the filesystem software has to get smarter and start talking over the network to our host machine, or the disk driver has to become smarter and make the disk on our host machine appear as a local device on the new machine. It is these two options that the two acronyms stand for: one means smarter filesystem software with the same old driver, and the other means a smarter driver with the same old filesystem. That's the difference between the two and the reason there are two names and two acronyms!

NAS - Network Attached Storage -- This one has a smarter filesystem and the same old driver. In our setup, the filesystem on the "new machine" knows that the disk is on the "host machine" and every time an application requests a file (either for reading or writing) it has to contact the "host machine" over the network and retrieve the file. In other words, the filesystem on the "new machine" makes a request to the "host machine", making it a client process. To accept and respond to that request there must be a server process running on the "host machine". This server process fetches the requested file from the disk (using the old driver) and sends it back to the client. The client process, which is the filesystem software, in turn makes that file available to the application that requested it. We can see that the data on the server is made available to the client as a file. This is what defines NAS.

So for the filesystem software to get smart, it now needs two components - a client part used by the applications and a server part which handles the disk. There are quite a few such "smart filesystems" out there. The most common in the UNIX/Linux world is NFS - Network File System. The server part of NFS is named "nfsd". On the client side, the standard "mount" command is smart enough to mount drives with the "nfs" filesystem type.
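For example, assuming the host machine is running nfsd and exports a directory named /export/data (the hostname and paths here are placeholders), mounting it on the new machine is just:

sudo mkdir -p /mnt/data
sudo mount -t nfs hostmachine:/export/data /mnt/data

From then on, applications on the new machine see the remote files under /mnt/data as if they were local.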

Note that here the filesystem software is aware that the disk (and hence the data) is on a remote machine. This is another defining trait of NAS.

More details are available here: http://nfs.sourceforge.net/ and here: https://help.ubuntu.com/8.04/serverguide/C/network-file-system.html

SAN - Storage Area Network -- This one has a smarter disk driver and the same old filesystem. The disk driver on the "new machine" lies to the OS and the filesystem software, telling them that there is a disk attached to the system locally. The OS and the filesystem software believe the driver and continue to use the fake disk that the driver provided. Whenever the disk driver is asked to fetch a block (not a file, a block), it in turn sends a request to the "host machine" and retrieves that block of data - thereby becoming the client process in the setup. Accordingly there will be a server process running on the "host machine" which accepts this request, fetches the corresponding block from the actual disk and sends it back to the client. The client, which is the smart disk driver in this case, in turn passes that data to the filesystem software and eventually to the application that requested the file data. It is evident here that the data on the server was made available to the client as "blocks" and not as files. This is what defines SAN.

Note that here the filesystem (and every other component apart from the disk driver) is not aware that the disk (and the data) is on a remote machine. This is another defining trait of SAN.

A very common and popular appearance of SAN these days is in the various cloud offerings. For instance the Amazon cloud offering has a service named EBS - Elastic Block Store - which makes network storage available as a locally attached disk. We can have all the regular filesystems like ext4 or XFS on top of this EBS drive.
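For instance, once an EBS volume is attached to an instance it typically shows up as a block device such as /dev/xvdf (the device name and mount point below are only placeholders), and from there on it is treated exactly like a local disk:

sudo mkfs.ext4 /dev/xvdf
sudo mkdir -p /mnt/ebs
sudo mount /dev/xvdf /mnt/ebs

Every read and write eventually becomes block requests that travel over the network to Amazon's storage servers, but ext4 neither knows nor cares.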

That's it. The two acronyms have been conquered... !

Thursday, February 2, 2012

Network does not work in Ubuntu after Hibernate and Resume

I run Ubuntu 10.04 (Lucid Lynx) in VMware Player and Ubuntu has this habit of silently hibernating when it gets a report that the battery level is low. Now VMware Player doesn't do a good job of reporting the right battery status, and that leads to the virtual machine just hibernating, without asking me anything, while I am in the middle of something. When I restart the virtual machine and resume the system there would be no network! I would then close all my open applications - editors, DB, Rails, etc. - and reboot the VM. This was a pain.

Today I finally found a solution for this. Turns out that the problem lies with the networking module being used. In my VM I use VMware Player's vmxnet module. I just removed the module and re-added it, and that worked. Just two simple commands:

sudo modprobe -r vmxnet
sudo modprobe vmxnet

If you are not running Ubuntu as a VM in VMware Player, your network module name will be different. lsmod might help you find out which one you are using.
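For example, something along these lines usually narrows it down (the driver names in the pattern are just common ones; yours may differ):

lsmod | grep -i -E 'vmxnet|e1000|r8169|8139|ath|iwl'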

Friday, November 25, 2011

Transactions - both single node and distributed - are hardwired in Windows - since Win 95

Transactions, or "Atomic Transactions" to be precise, are very well known to anyone who has worked with databases. With the recent advent of NoSQL databases and the CAP theorem being used/abused by anyone and everyone, words like "consistency" and "transactional model" have become run-of-the-mill jargon. But what is actually interesting is that the concept of transactions or the transactional model goes beyond our typical RDBMS. Things get even more challenging when we try to achieve transactions in a distributed system. Because transactions inherently lock the resource(s)/data they are operating on until the transaction completes, those resources can very easily become inaccessible altogether in a distributed setup if one of the nodes fails or if there is some problem with the network or any such thing, thereby making distributed transactions many times more complex to implement than transactions on a single node.

Today I was trying to figure out if there is a way to "simulate" (albeit in a very crude fashion) some sort of transactions in my application which uses MongoDB (which doesn't support transactions by design - to avoid the locking mentioned above - although ironically there is a global write lock!). Searching on the internet led me to this blog of a RavenDB developer. The author there mentions that RavenDB supports both sharding and transactions, which means it has implemented distributed transaction support. At first read I was pretty impressed (this was the first time I had heard about RavenDB). Before I could ask the author about the implementation details I saw a comment in which the author had mentioned that they use DTC (which again was a new thing for me). Turns out DTC, the Distributed Transaction Coordinator, is a service that is baked right into the Windows OS itself, and that too dating back to the Windows 95 days (wow.. now I am impressed with Windows..!). Here is the MSDN article describing the service.

The MSDN article clearly explains the basics of distributed transactions and how they are modeled. What is worth noting is that, by abstracting out the code for carrying out distributed transactions into a service, multiple resource managers (like different databases, queue servers, file servers/managers, etc.) can all take part in a single transaction. For example, let's say that you have a web application wherein a client request results in a job being picked up from a queue for processing, and simultaneously you update the status of the job in a DB and also create a new file associated with the start of the job. Very evidently all three resource managers and the web application itself can be (very likely will be) on different nodes. With something like DTC you can easily create a new transaction, send across a commit message, and you will get a success notification only if all three actions were successful; otherwise none of the actions go through. Of course, this is possible only if all three resource managers involved adhere to Microsoft's DTC specification and provide the necessary interface to work with it.

The previous example might make DTC appear like this Jason Bourne kind of super dude who can take care of all the heavy lifting and also do it very efficiently. But remember even Bourne gets shot at and also loses his girl. So DTC is not fully immune to problems either. Here is one blog post titled "My beef with MSDTC and two phase commits". It is definitely worth reading. Note that my impression about DTC is purely based on reading the documentation. I have not written a single line of code using DTC.

Wednesday, November 16, 2011

Microsoft's Virtual Wifi adapter (or virtual wifi card) -- cool technology

I wasn't aware of the very interesting research on Virtual Wifi Adapters that Microsoft guys have been carrying out. Apparently they have been doing it for quite some time now. What this research group is trying to do is basically allow us to have an unlimited number of "virtual" wireless cards on our computers, each connecting to a different wireless connection, and all of it using just a single physical card. That is some awesome stuff.. !

A couple of days ago I opened up the Dell Support Center tool on my laptop and it popped up a message saying a device on my system is in the disabled state. I was pretty startled to see that, as pretty much every device on my laptop is used by me daily. On clicking the message it told me that the disabled device is "Microsoft Virtual WiFi Miniport". That did not make any sense to me. I had absolutely no clue about this device.

Searching the internet led me to this Microsoft page (along with several others, of course) which gave me a fair idea of what this device might be, but nothing concrete. It was this fine article on istartedsomething.com that clearly explained what this is all about. In the same article the author tells us that Microsoft has been carrying out research in this regard for a few years now. But nothing was given to end users until Windows 7 baked in this wifi card virtualization natively. And not just that, all WiFi card providers are expected to add support for this virtualization in their drivers if they want their drivers to be properly signed digitally and recognized by Windows during installation. I say that is "Wicked cool".. :)

About the technology itself, it can be described as a way to make "software copies" of your Wireless card and use those copies to "connect to multiple networks simultaneously".  Although research prototypes can apparently create any number of virtual devices over the single actual hardware device, Windows 7 limits it to just one copy/virtual device.

This whole research is doubly fascinating.

First because the applications of this research work are very interesting. One such application is explained in the article mentioned above. It talks about being able to connect to an existing wireless access point with your laptop and at the same time making your laptop a wireless access point in itself. It means that if someone is far from the actual access point and your laptop happens to be closer, he/she can connect to your laptop instead and your laptop will forward their connections to the actual wireless access point. Of course, this can only happen when the two laptops involved are in the same security/trust group. I wouldn't go and connect via some random stranger's laptop. It is like letting that person look at all the data coming in and going out of your computer over the internet (or the network in general). Despite such caveats, this is very much a practical use case. Maybe you wouldn't use it to be a hop in the network (or more like a virtual signal booster), but you might use it to make P2P/direct connections with another laptop close by for sharing files instead of doing it over the wireless LAN. Or, if access to the wireless network is possible only after you authenticate via a certificate (like in a corporate setup) and the certificate can be put on only one of your laptops (the official company laptop), the connection sharing will indeed come in handy.
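In Windows 7 this "laptop as an access point" idea is exposed as the hosted network feature built on top of the virtual adapter. Assuming your WiFi driver supports it, something along these lines sets it up from an elevated command prompt (the SSID and key below are placeholders):

netsh wlan set hostednetwork mode=allow ssid=MySharedAP key=MyPassphrase123
netsh wlan start hostednetwork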

Secondly, and more importantly, there are a lot of complexities associated with this, and that makes it all the more exciting. If we delve a little deeper into what this virtual adapter is and how it works, we will see that it is actually a piece of software sitting between the actual device driver and the rest of the network stack (i.e. everything above the MAC layer in the OSI model). This little piece of software is supposed to appear as one or more "devices" to the OS and hence it invariably has to have its own device driver. That is the "Virtual WiFi Filter" driver or VWifi driver. This VWifi driver tells the OS that there are multiple wireless cards and the OS then allows the user to connect to different available wireless networks via these virtual cards. But note that all this time there is only one physical card, and hence at any given point in time that one physical card can be connected to (or can communicate with) only one wireless network. It is the job of the virtual adapter software to cycle over all the virtual wireless cards and service the network requests made through them using the one physical card, in a time-shared manner, all the while keeping it transparent to the user. Although it sounds very similar to the kernel's process scheduling, which makes use of a single processor in a time-shared manner, this is actually somewhat different because of the way wireless networks work.

Note that different wireless networks behave differently. They might be operating at different frequencies, they might be using different authentication/encryption schemes, the bandwidth might be different, and there are probably many other factors that I can't think of right now. So every time the actual wireless card switches to a different connection, there can be a shift in all or some of the above mentioned attributes. The card might have to step up or down its operating frequency, change to a different encryption scheme, and do all of this on the fly. Now that is a lot of work to do, and in fact all of this switching can just drag the network performance to the ground. This makes the design and implementation of the virtualization software pretty challenging. This and many other challenges/caveats are discussed on Microsoft Research's FAQ page.

I have been very excited about this research work ever since I read about it and have been meaning to try writing some network code using the virtual adapter. Sadly I have zero experience with network programming on Windows and currently don't have enough time to read up on all of that. I hope such a thing will come up in Linux sometime soon, if it isn't already there.

Saturday, June 26, 2010

What is Cloud? -- Simple terms please

Cloud has been making a lot of noise and almost every tech (or tech-related) person knows about it or has at least heard of it. Now for those who have just heard about it but do not know what it means, here is a quick definition from Dave Nielsen, the founder of CloudCamp. He says, "For something to be called cloud, it should have these properties:

  • Hosted by someone else
  • On-demand. Do not have to wait or call somebody to get it.
  • Metered somehow. So you know exactly how much you are using and how much you are paying.
  • Scalable, both ways - up and down as and when you require."
He goes on to say that Cloud could mean different things for different people. Here are a few examples stating what cloud is for a particular kind of person:

For an IT guy -- Infrastructure as a Service
For a Web Developer -- Platform. Just dump your code and don't worry about what runs it.
For a Business guy -- SaaS (Software as a Service)

That was pretty neat. Helps me answer the standard question "What the hell is this cloud thing?" in a sane manner. Earlier I could never figure out what a proper answer should be for this question, because there was so much to tell.

Here is my attempt to elaborate on the above-mentioned examples.

So cloud is basically having the infrastructure to do what you do hosted by someone else, and having it totally scalable. For example, in the above list, for a web developer cloud is a platform where he can dump his code and expect it to run as he has designed it. He does not worry about the machines, the network connectivity, the bandwidth. He just pays for those in a metered manner. He scales his platform whenever he wants. He can increase his bandwidth quota, move to a better machine, increase the number of machines, and all of this without calling customer care or the sales guy. He does it by logging into the cloud service's website, or he has a script do it for him automatically - that is, if he is geek enough.

Similarly, for a businessman it is software as a service. An e-mail service would probably be a good example. The businessman does not know what software runs the email system; he does not worry about what version of the email server is running, what OS it is running on, what DB it is using to store the emails, or what protocols it is making use of. If the email contents are not that sensitive he would not even worry about the physical location of the servers storing these emails. He just buys the email software as a service and uses it. All that he probably worries about is how many email accounts are available to him/his company and how reliable/usable they are. At any point he can increase or decrease the number of accounts, once again without making a call.

That's cloud.

Note : I got this definition from one of the IBM developerWorks podcasts which is available here.

Oh, and remember, all this time every reference to Cloud meant "Cloud Computing", not just plain "cloud".

Tuesday, October 6, 2009

OpenSSL base64 filter BIO needs an EOL and memory BIO needs to know about EOF

I recently started working with the OpenSSL library to do some https stuff (sort of obviously). OpenSSL, apart from having an implementation of the SSL encryption part, also has nifty facilities for certificate handling and, more importantly, an abstract I/O layer implementation called BIO, which probably stands for Basic I/O or Buffered I/O or something else. I do not know; I could not find out. Nevertheless, the items of interest here are BIO_f_base64() -- the base64 encode/decode filter BIO -- and BIO_s_mem() -- the memory BIO, which can hold data in a memory buffer.

The BIO man page (or its online version present here: http://www.openssl.org/docs/crypto/bio.html) gives a nice introduction. For now just consider BIOs as black boxes from which you can read data or to which you can write data. If the BIO is a filter BIO then the data will be processed whenever you read from or write to it.

The name, BIO_f_base64, says it all about the functionality of this BIO. If you read from this BIO, then whatever data is being read is first base64 decoded and given to you. OTOH, if you write something to this BIO it will be base64 encoded and then written to the destination. These BIOs can be arranged in the form of chains to do a series of processing steps on the data that you are reading or writing, all with just a single call to read() or write(). It's all abstracted. Saves a lot of time.

I was trying to decode some base64 encoded data which I had in a buffer, a char[] to be precise. So if you read up about the BIOs it becomes obvious that you first have to create a memory BIO, which will hold the actual encoded data. Write the encoded data to the memory BIO. Then you chain that memory BIO with a base64 BIO and read from that chain. Any data that you read from the chain will actually come from the memory BIO, but before it reaches you it passes through the base64 BIO. So essentially you are reading from the base64 BIO. As mentioned in the earlier paragraph, when you read from a base64 BIO it decodes the data and gives it to you. So the base64 encoded data present in the memory BIO is decoded and presented to you. That's it. base64 decoding is done in one simple read call!

But there is a small catch here. For some reason, which I have only partially understood, the base64 BIO requires that the data it is handling always be terminated with a newline character. If the data does not have any newline character, meaning all your data is present on a single line, then you have to explicitly tell that to the BIO by setting the appropriate flag. Here is what the man page says:

The flag BIO_FLAGS_BASE64_NO_NL can be set with BIO_set_flags() to encode the data all on one line or expect the data to be all on one line.

That's about the base64 BIO's EOL requirement. Now the other BIO involved here, the memory BIO, is also an interesting guy. When the data it has runs out, it doesn't say "Hey, it's over, stop it!". Instead it says "Dude, you've got to wait for some more data to arrive. Hang on and keep trying." This is very suitable when you are using the BIO like a pipe, where you keep pumping data in from one end, acquiring it from somewhere, and some other guy consumes that data. But in a situation like mine, where the data is all fixed, I simply want it to tell me that the data is over and that I need to stop. To do this I again have to explicitly set an appropriate flag, and here is what the man page says:

BIO_set_mem_eof_return() sets the behaviour of memory BIO b when it is empty. If the v is zero then an empty memory BIO will return EOF (that is it will return zero and BIO_should_retry(b) will be false). If v is non zero then it will return v when it is empty and it will set the read retry flag (that is BIO_read_retry(b) is true). To avoid ambiguity with a normal positive return value v should be set to a negative value, typically -1.

And this same thing is explained very well here: http://www.openssl.org/support/faq.html#PROG15.
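Putting the two flags together, here is a minimal sketch of decoding a single-line base64 buffer with a memory BIO and a base64 BIO (the encoded string is just "hello world" plus a newline; error handling is kept to a bare minimum):

#include <stdio.h>
#include <openssl/bio.h>
#include <openssl/evp.h>

int main(void)
{
    /* "hello world\n" base64 encoded, all on one line, no trailing newline */
    static const char encoded[] = "aGVsbG8gd29ybGQK";
    char decoded[64];

    /* Memory BIO holding the encoded data (read-only). */
    BIO *mem = BIO_new_mem_buf((void *)encoded, -1);
    /* Make it report EOF (return 0) when the data runs out,
     * instead of asking us to retry. */
    BIO_set_mem_eof_return(mem, 0);

    /* base64 filter BIO; data read through it comes out decoded. */
    BIO *b64 = BIO_new(BIO_f_base64());
    /* The data is all on one line with no newline, so set this flag. */
    BIO_set_flags(b64, BIO_FLAGS_BASE64_NO_NL);

    /* Chain them: reads from b64 pull data from mem and decode it. */
    BIO_push(b64, mem);

    int n = BIO_read(b64, decoded, sizeof(decoded) - 1);
    if (n <= 0) {
        fprintf(stderr, "decode failed\n");
        return 1;
    }
    decoded[n] = '\0';
    printf("%s", decoded);   /* prints "hello world" followed by a newline */

    BIO_free_all(b64);
    return 0;
}

Compile and link with -lcrypto. Without the BIO_FLAGS_BASE64_NO_NL flag the decode of this single-line input fails, and BIO_set_mem_eof_return(mem, 0) is what makes the memory BIO report a clean EOF instead of asking for a retry.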

I thank Dr. Stephen N. Henson of the OpenSSL project for helping me out in understanding this. Here is the mailing list posting that taught me this thing: http://groups.google.com/group/mailing.openssl.users/browse_thread/thread/f0fc310c1bc6ec65#

Happy BIOing. :-)

Wednesday, July 22, 2009

Getting the size of an already loaded page (from cache) in a Firefox extension.

Today this question came up on IRC (moznet, #extdev). One of the add-on developers wanted to get the size of a page, either in bytes or in number of characters. The most obvious thing that came to my mind was progress listeners for a definitive answer, or the content length from the channel for a not-so-critical scenario. But then he said he wants it for an already loaded page. And he further said that the information is already there somewhere, as it is shown by the Page Info dialog (right-click on a web page and select View Page Info). He was indeed right. Somebody in the code is already going through the trouble of calculating the data size and we can just re-use that. And I immediately started the quest to find that out.

As usual, to figure out any browser component I opened up DOM Inspector. That tool is improving, which was against my earlier perception (sorry, Shawn Wilsher), though the highlighting part is still screwed up. Nevertheless, locating that particular label "Size" and the textbox in front of it containing the value was not difficult at all. I got the "id" of the textbox containing the size value. (It's "sizetext" :) ).

Next it was MXR (http://mxr.mozilla.org/) in action. I did a text search for the id and got a bunch of results, one of which was pageInfo.js with this entry: line 489 -- setItemValue("sizetext", sizeText);. It is here. That very line made it apparent that it is the place where the value is being set and hence the place from where I can get to know how the value is being calculated.

Once I saw the code it was very clear and straightforward, and pretty simple too. We have the URL. From the URL we get the cache entry for that URL. (Every cache entry has a key and that key is the URL - so neat.) We try to get the cache entry from the HTTP session first and if that fails we try the FTP session. The cache entry has the size as an attribute on itself, so it's just a matter of getting that attribute value. DONE.

I am not sure how this will behave if we have disabled every type of cache. AFAIK, there will still be some in-memory cache as long as the page is still loaded. Probably good enough.

That was the end of a small but interesting quest. :-)

Tuesday, March 3, 2009

Browsers are undergoing continuous innovation.

It has been some time since I blogged about anything. I have had several things on my mind and many of them are present as drafts. But this one really caught my attention and I felt I should put in my thoughts about it.

Until a couple of years ago, people rarely looked beyond Internet Explorer for their browsing experience, though they kept cursing it a lot. After that came the Mozilla Firefox web browser with its pack of add-ons allowing users to actually customize the browser for their needs. They could actually make the browser do what they wanted it to do and not just set some options. The browser wars had started again. At least I started reading about browsers and started following up on things happening with these browsers. Opera and Safari and a few other, actually used, browsers were not really "news" as such.
This was dying off. Firefox had created a sizable chunk of user base and was pretty stable with it. Except for a few traditional enhancements like memory optimizations, bug fixes, etc., nothing big was happening. It was all rather quiet.

Then came the next wave with Google announcing the release of its browser with a nice, easy-to-understand "comic" book. It came with a whole new paradigm for building browsers with the "process-per-tab" concept. It was really innovative. Though discussions about this had happened in other browser communities also, Google Chrome was the first one to implement it. It also boasted of its super-fast V8 JavaScript engine and of a browser UI which gave more screen real estate to content than to chrome. This was a real big thing and made the browser wave go much higher than it had ever been. With the Google branding, a vast majority of internet users rushed to have a sneak peek at Google Chrome, or maybe try it, or even keep it as their regular browser. There was a lot of noise about this and people indeed listened. The Mozilla and IE people were not silent and did respond very well. Mozilla came up with its ultra-fast "TraceMonkey" JS engine, implementing trace trees and thereby making it much faster than V8. IE8 also has the "process-per-tab" and "private browsing" features first presented by Chrome. But like any other sound this too dampened a little after it was created, and browsers were back in the silent phase doing traditional improvements and bug fixes. At least that's how I perceived the situation.

[Edit 06-Mar-09: Shawn Wilsher suggests that TraceMonkey had appeared before Chrome made its public appearance. He certainly knows these things better than me and I believe he is right. But still I wanted to keep my original post as it is and instead I put in this separate edit note. :) ]

But I was clearly, totally wrong. People have realized that the internet is the place to be in the future and that's where a large part of our lives will be. With the browser being the main and central interface for people to use that internet, it makes real good sense to make the browser as robust and reliable as possible. New things keep coming up on the internet, both good and bad, the latter more often, and the browser has to keep up with all of it. What we thought of yesterday as a good design approach might just look senseless tomorrow. There just seems to be no end to it and researchers appear all ready for it. I am coming across so many innovations happening in the browser domain. This article: Researchers Say Gazelle Browser Offers Better Security -- Campus Technology -- gives us an idea of how much effort the scientists are putting into making our internet lives better. The article is about Gazelle, but it also mentions another experimental browser, OP.

This browser, Gazelle, uses the Trident rendering engine (used by IE) but builds upon an OS-based process architecture where websites form the processes and they communicate by passing messages, much like IPC (inter-process communication).

I am not yet sure how this will all work out. Probably websites need to be built with some intelligence so that they can do the so-called IPC when they actually have to. But the evolution in browsers is for sure. They will not be the same as they are now. Several things are going on along the UI front. When these things mature and come together you can have the scenes of your favourite sci-fi movies right in front of you every day.

Let's just hope these come soon enough for people like me to enjoy, not when I am all grey-haired and toothless.

Hari Om.

Friday, November 21, 2008

Editing remote files smoothly in vim on Windows

I have a laptop and a desktop. The desktop runs Ubuntu, by choice, and the laptop has to run Windows, by force. But most of my work happens on the Linux desktop itself. So when I am away from the desktop I log in to it via SSH using PuTTY. It works fine when I am still on the corporate LAN, but the problem starts when I go home and get on the VPN. The speed and the responsiveness simply demotivate me and I tend to waste a lot of time, especially when I am editing files with vim, because every keystroke has to travel across the network to my desktop and the response has to be sent back to my laptop. Coding really becomes hell with this.

Recently I got to know that VIM identified this problem quite some time back and has a solution in store. You can open a remote file over SCP, wherein VIM brings that file down to the local system and stores it in some temp location. You edit that temp file, with VIM running on your own machine, so you do not have to wait for keystrokes to be processed by the remote machine. When you write the file, VIM updates the remote file using SCP.
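For example, opening a file sitting on my desktop from the laptop looks something like this (the user, host and path are placeholders; note the double slash for an absolute path):

vim scp://hari@desktop//home/hari/code/main.c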

[Note]: If the file is read-only, w! will not work. The user account used for SCP must have write permission for the file you are editing. Otherwise, obviously, the remote write fails and VIM will promptly report it.

Look at this page for the syntax and more details.

This is straightforward on a Linux box as both vim and scp come packaged with the OS, they are in the shell execution path, and everything is set up by default. Things need some extra work on Windows.

The first obvious thing is to install VIM. Then you need an SCP program. And once again PuTTY comes to the rescue. They have PSCP.exe, which makes you feel at home even on Windows. Get it here.

To improve this a little more you can rename PSCP.exe to scp.exe and place it in "C:\Windows\System32\" so that it will be picked up from anywhere at the command prompt. Also note that you can use your PuTTY saved sessions directly with PSCP.
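After the rename, copying a file from the command prompt looks just like it does on Linux, with either user@host or a saved session name in its place (everything below is a placeholder):

scp hari@desktop:/home/hari/notes.txt C:\temp\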

Happy remote VIMming. :-)

Hari Om

Tuesday, October 21, 2008

The ISP Cat and Mouse game and CDNs finally benefiting from it.

Politics and policies are everywhere. They say policies are made to govern us, and I say many of them are there for inertia - resistance to change. The big and powerful want to stay that way, always, and do not want others to get there. This is well known and very much a cliche. But what has this got to do with ISPs in particular? Here we go:

ISPs are the people who sort of own the "Internet network" physically. It is they who actually connect the various computers with physical cables. And yes, that is why we pay them: for getting us connected to the rest of the world. Now there is no single ISP who has its cables connected to all the computers in the world. In fact there is no one who can even boast of a majority stake in the market. So obviously when data travels through the internet, it goes through infrastructure laid and maintained by different ISPs. A simple example will illustrate this:

Let's say a user is connected to the internet via a Bharti Airtel connection and he is trying to access, say, the Indian Railways website, which is hosted on a machine connected to the internet, for the sake of illustration, via a BSNL connection. So the path of the request from the client to the server involves both the Airtel network and the BSNL network. The client sends the request to the Airtel server. The Airtel network will route the data within its own network to the extent possible. At one point it needs to request the BSNL network to take up the data and deliver it to the destination server machine. Now at this cross-over point, Airtel is requesting a service from BSNL. Essentially Airtel is making use of BSNL's network infrastructure to carry its data. Now there is no point in BSNL giving this away for free. So obviously it charges Airtel some amount of money. Airtel does not mind paying it as it mostly gets translated into user charges. This is not really the issue. The problem arises when BSNL refuses to take the request and Airtel has to find some alternate path, which generally ends up being very, very long. Consider this:

Client -> Last Airtel machine (router) -- m number of hops
Last Airtel machine -> Destination machine in BSNL network (Direct path) -- 4 hops.
Last Airtel machine -> Destination machine in BSNL network (Indirect path via some other ISP or via some other cross-over point) -- 20 hops

So in total the data has to do m+4 hops if BSNL takes up the request from the last Airtel machine. At a time when BSNL is experiencing heavy traffic in the region where the Airtel-BSNL crossover is happening, it would not be willing to accept more data, that too from a different ISP. So two techniques are followed here:

1. Simply drop the data packets, which results in a bad experience for the end user.
2. As routing happens based on the least number of hops, the first BSNL router at the crossover point will tell the last Airtel machine that the number of hops to the destination machine is actually 25, even though that is totally wrong. As a result the last Airtel machine will instead choose the indirect path with 20 hops. This obviously slows down the internet and again results in a bad experience for the end user.
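You can actually watch where this hand-off happens with traceroute: somewhere along the route the reverse-DNS names of the hops switch from one ISP's domain to the other's, and that is the cross-over point (the hostname is just an example):

traceroute www.irctc.co.in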

Now you see how policies and profits affect technology. This is, as stated by an electronics professor at my college SJCE, TECHNO-POLITICS.

The solution for this would be to make the data available within every ISP's own network. And that is precisely what the CDN - Content Delivery Network - companies do. These companies have a huge number of servers placed in various parts of the world. In most cases they are placed in the data centers of these ISPs. It is symbiosis. With CDNs placing their servers in ISPs' data centers, the ISP has a lot of data in its own network, even though the original website (or content owner) might be using a different ISP. This avoids a lot of requests to different ISPs and thereby reduces costs significantly. In return the CDN companies get a very sweet deal on the rack space for their machines.

Saturday, August 30, 2008

offline cache discussion with campd

[ 2:20 am] <brahmana> hi all,
[ 2:20 am] <brahmana> Looks like my earlier question about offline cache got lost here...
[ 2:21 am] <brahmana> I read the HTML 5 spec and understood that the cache will be versioned and hence multiple versions of the same cached element will be present on the client's disk. Is that so?
[ 2:22 am] <campd> yeah
[ 2:22 am] <campd> as of 3.1
[ 2:22 am] <campd> in 3.0, there's only one version
[ 2:23 am] <brahmana> ok..
[ 2:23 am] <campd> brahmana: though they won't stick around long
[ 2:23 am] <brahmana> ok..
[ 2:23 am] <campd> brahmana: when you visit a page, it'll download a new version. Once any page using he old version is navigated away from, it is cleaned up
[ 2:25 am] <brahmana> campd, So everytime the user goes online and visits a website which has offline cache, the cache is refreshed provided no page is using the old cache.
[ 2:26 am] <campd> brahmana: sorta
[ 2:26 am] <campd> brahmana: every time they visit an offline cached website
[ 2:26 am] <campd> brahmana: it will check for a new version of the cache manifest
[ 2:26 am] <brahmana> ok..
[ 2:27 am] <campd> brahmana: if there's a new manifest, a new version of the cache will be created and fetched
[ 2:27 am] <brahmana> campd, ok.. answers my question fully..
[ 2:27 am] <campd> cool
[ 2:27 am] <brahmana> campd, However I have another question..
[ 2:27 am] <campd> ok
[ 2:28 am] <brahmana> Now is this cache application specific? As in if a image with the same src is referenced by two websites, the image will be cached separately for each webapp?
[ 2:28 am] <campd> yes.
[ 2:29 am] <brahmana> ok..
[ 2:30 am] <brahmana> campd, Will this offline cache in anyway affect the regular browsing when the user is online?
[ 2:30 am] <campd> if they're online, browsing an offline app, it will be loaded from the cache first
[ 2:30 am] <campd> it won't affect online browsing of non-offline pages
[ 2:31 am] <brahmana> ok..
[ 2:31 am] <campd> so if http://www.foo.com/offline.html is an offline app that references http://www.bar.com/another.html
[ 2:31 am] <campd> going to http://www.bar.com/another.html will NOT load it from the offline cache
[ 2:31 am] <campd> but going to http://www.foo.com/offline.html WILL be loaded from the offline cache
[ 2:32 am] <brahmana> okay..
[ 2:33 am] <brahmana> campd, Regarding the local storage, Can it be looked at as an extended form of what currently is cookie?
[ 2:34 am] <campd> kinda, yeah
[ 2:34 am] <brahmana> Is there any limit on the amount of data that each web-app gets on this local storage?
[ 2:35 am] <campd> yep
[ 2:35 am] <brahmana> Because the spec says that the web-app can use this to store user created _documents_
[ 2:35 am] <campd> 5 megs for the etld+1
[ 2:35 am] <campd> if the domain has the offline-app permission it gets more, but I forget the exact number
[ 2:35 am] <mconnor> campd: is that right? I thought I remembered some wacky combination thing
[ 2:36 am] <brahmana> oh.. ok.. thats pretty big space..
[ 2:36 am] <campd> (which I assume is the wacky combination mconnor's referring to ;))
[ 2:36 am] <mconnor> no
[ 2:36 am] <mconnor> it was something like "foo.bar.com can have 3 MB, and bar.com can have 2 MB" or something
[ 2:36 am] <mconnor> in whatever combination
[ 2:37 am] <mconnor> maybe that was the spec that got deprecated?
[ 2:37 am] <campd> I think right now it's just "5 for the whole etld"
[ 2:37 am] <campd> err, etld+1