OnSwipe redirect code

Thursday, March 15, 2012

NAS and SAN explained -- with technical differences.

Acronyms and fancy buzz words (specifically computer science related ones) have always troubled me, at times making me very angry at the person using them and in many cases leaving me in a confused state eventually. So whenever I come across such acronyms/buzz words I try to dissect them and prepare a mental visual map that I will use every time the acronym is used in the future. The acronyms for this write up are NAS (Network Attached Storage) and SAN (Storage Area Network).

These might be very simple and obvious things for many people but I am sure I have lost quite a bit of my hair whenever someone mentioned these acronyms to me. So here is my attempt to decipher them.

First the basics. Both of these consist of two building blocks storage and network, or to put in a less naive manner both SAN and NAS allow applications on one machine to access data present on another machine. Okay, so why two names, why two acronyms? To answer that let me just take up these two building blocks separately.

In the simplest sense "Storage" means dealing with files stored on the hard disk attached to the system. We do that with the the APIs (or "methods" if you want to avoid the acronym) made available by the filesystem and libraries built using those methods. As an application programmer we almost never worry about how the files are actually stored on the disk. It is the responsibility of the filesystem, the kernel and the disk driver. The application always views the data stored on the disk in terms of files (used in a generic sense to refer to both files and directories) - more so as a stream of bytes. If we dig a little deeper we find that these disks are actually made available to the filesystem by the disk drivers as block devices - i.e. whenever they accept or return data they do it in quantum of blocks. A disk doesn't return you a single byte of data when you read from it. It always returns one or more blocks. From what I understand the size of a block these days is typically 4KB. Amount of data transferred to or from the disk is a multiple of this block size. Allocation of space for files is also made in terms of blocks, which some times leads to a file utilizing the last block partially (and that is why we see the difference in the actual file size and file size of disk entries).

That's about storage. To summarize; data is made available as files by filesystem software, but the device actually makes it available as blocks.

Network in the simplest sense is communication between two processes - running either on the same machine or on different machines. To simplify it further let's just limit to the case of communication between two processes on two different machines. Typically one of these two processes will be a server processes and the other a client process. The server process would be listening on a specified port to which the client can connect. The client can then send requests over the connection which the server will "serve", by sending back a suitable response. The format of the request and the response are specified before hand and the client and the server agree to conform to that specification. This conformance is what is called the "protocol" which the two processes (or in this case the two machines) are using for their communication. The client typically asks for some data and the server fetches it from some place and sends the requested data as response. The client doesn't know where the server is fetching the data from and the server doesn't know what the client is doing with the data. The protocol is all that matters to them.

That's network. No summary here.

Okay, so how do storage and network come together now?

In the storage example the data on the hard disk (referred to as "our hard disk" henceforth) was being accessed by the applications running on the same machine (referred to as the "host machine" henceforth). Now what if applications running on a different machine (referred to as the "new machine" henceforth) want to access the data on our hard disk? Let us call this requirement as "remote data access".

The traditional filesystem software is designed to interact with a disk that was made available to it on the local system by the disk driver and the driver is designed to handle a disk that is attached to this local system. For our "remote data access" either the filesystem software has to get smarter and start talking to the device available on our host machine or the disk driver has to become smarter and make the disk on our host machine available as a local device on the new machine. It is these two options that the two acronyms stand for. One acronym means a smarter filesystem software with the same old driver and another means a smarter driver with the same old filesystem. That's the difference between the two and the reason there are two names and two acronyms.. !

NAS - Network Attached Storage -- This one has a smarter filesystem and the same old driver. In our setup, the filesystem on the "new machine" knows that the disk is on the "host machine" and every time an application requests a file (either for reading or writing) it has to contact the "host machine" over network and retrieve the file. In other words the filesystem on the "new machine" makes a request to the "host machine" - making it a client process. To accept and respond to that request there must be a server process running on the "host machine". This server process fetches the requested file from the disk (using the old driver) and sends it back to the client. The client process, which is the filesystem software, in turn makes that file available to the application that requested it. We can see that the data on the server is made available to the client as a file. This is what defines NAS.

So for the filesystem software to get smart, it now needs two components - a client part used by the applications and the server part which handles the disk. There are quite a few such "smart filesystem software" out there. The most common in the UNIX/LINUX world is NFS - Network File System. The server part of NFS is named "nfsd". On the client side, the standard "mount" command is smart enough to mount drives with "nfs" filesystem type.

Note that here the filesystem software is aware that the disk (and hence the data) is on a remote machine. This is another defining trait of NAS.

More details are available here : http://nfs.sourceforge.net/ and here : https://help.ubuntu.com/8.04/serverguide/C/network-file-system.html

SAN - Storage Area Network -- This one has a smarter disk driver and the same old filesystem. The disk driver on the "new machine" lies to the OS and the filesystem software that there is a disk attached to the system locally. The OS and the filesystem software believe the driver and continue to use the fake disk that the driver provided. Whenever the disk driver is asked to fetch a block (not a file, a block), it in turn sends a request to the "host machine" and retrieves that block of data - thereby becoming the client process in the setup. Accordingly there will be a server process running on the "host machine" which accepts this request, fetches the corresponding block from the actual disk and sends it back to the client. The client, which is the smart disk driver in this case, in turn passes that data to the filesystem software and eventually to the application that requested the file data. It is evident here that the data on the server was made available to the client as "blocks" and not as files. This is what defines SAN.

Note that here the filesystem (and every other component apart from disk driver) is not aware that the disk (and the data) is on a remote machine. This is another defining trait of SAN.

A very common and popular appearance of SAN these days is in the various cloud offerings. For instance the Amazon cloud offering has a service named EBS - Elastic Block Storage, which makes network storage available as locally attached disk. We can have all the regular filesystems like ext4 or xfs on top of this EBS drive.

That's it. The two acronyms have been conquered... !

3 comments:

  1. AWESOME !!!!
    SAN and NAS explained beautifully.
    Thanx a lot :D

    -Om

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete
  3. Thank you for publishing this. It's a great explanation that lays out the meaningful difference clearly. There are so many glib explanations of NAS vs SAN on the web that gloss over the fundamental difference between the two, just leading to more confusion.

    ReplyDelete