Monday, July 17, 2017

$PATH for the pathless

With the rise in popularity of scripting languages and their adoption in the industry a lot of first time professional programmers (a.k.a fresh graduates) are being exposed to the Linux command line environment and some times Linux as a system itself might be new to them. So they tend to have difficulty in understanding how a program is actually being executed and/or what all is needed to make a program run. This coupled with the magic spewed around by certain tools like rvm / chruby for Ruby and nvm for Node.js (and their ilk) are just adding to the confusion of how things are executed in the first place. I am not at all saying that such tools are bad. I myself use them and they are very helpful. But if a person is using them without getting the basics right, he/she is going to have a mighty problem wrapping their head around how things are running..!

In this context, I have been explaining what the $PATH means and what is its purpose a few times of late. So instead of repeating myself every time I thought I would just document what I know about $PATH to help the pathless get on the right PATH (or the PATH OF RIGHTEOUSNESS if you will).

GEEK NOTE : This post is aimed at absolute newbies, folks who just got started with something like Ubuntu and are fiddling around with some of the cool new technologies like Node.js / Ruby / . So this post will  not go into the depths nor will it try to be GEEK ACCURATE, i.e. it will not deal with things like POSIX compliance or the behaviour on non-Linux systems, etc. For that matter this will to a large extent be specific to Ubuntu and most likely Bash. We will also not try to distinguish between login and non-login interactive shells. As the screenshots shown below indicate, the assumed environment is running the "Terminal" application in Ubuntu.

So, lets start with the basics and a few assumptions.

Assumptions :

1. We are primarily interested in understanding how programs are run in the Linux command line environment.
2. We will restrict ourselves to the "Terminal" program of the Ubuntu system as the environment. However all / most of the following might be applicable to all such similar programs on different flavours of Linux.

Basics :

Whenever you launch the "Terminal" program what you are doing is actually launching a "Shell". Like with any type of software in the Linux world, choices are abundant and there are a lot of shell programs. However the "Terminal" program in Ubuntu (as of this writing) uses the "Bash" shell and that is what we will stick to. So on launching "Terminal" you are effectively launching the bash shell.

The "Terminal" application in Ubuntu 16.04


What is a Shell?

In the simplest terms, a Shell is a program which helps you run several other programs. It is like this eternal waiter who just waits for you to tell him what program to run and when you give him a program to run, it just runs it, tells you what was the result of running that program and goes back to waiting. Broadly speaking this is all that the shell does : Wait - Run a program - Show results - Wait. Wash-Rinse-Repeat.

If you have used the Linux command line (a.k.a the shell), you might be thinking that the shell certainly does a lot more than this. You might say "I run so many commands on the shell and it allows me to do a variety of things like search of files, list the directories, check the date etc etc. What about all those functionalities??!". Well, almost all of those actions are nothing but the shell running some program or the other. Almost every "command" that you have run are independent programs provided with the Linux system by default. Whenever you "run a command", you are just telling the shell to search for that program on the system and run it. The standard commands that most Linux users are familiar with are stored in certain predefined standard directories / locations. On Ubuntu those locations are "/bin" or "/usr/bin". While "/bin" has the core system commands, "/usr/bin" has the commands added by other softwares that are installed on the system.

A listing of the /bin directory showing system program executable files. Notice the programs corresponding to the most commonly used commands like "cp", "ls", "mkdir" etc
While these are what could be called as "standard system programs", the shell is not limited to these. The shell can run just about any program (technically, any executable file) on the system. All you have to do specify the full path to the location of that file on the command line and the shell will execute it, reporting the result of that execution. So if you have a file in a deep directory structure as shown below :

The full path to "my_program" will be /home/srirang/work/path_learning/custom_programs/my_program
When you have such a setup, to run the file "my_program" you just have to type the full path to it and hit enter. Shell will take the file at that location and run it.

/home/srirang/work/path_learning/custom_programs/my_program
There might be two questions now in your head :

1. For the standard commands I never specified the full path. I just specified the program name and shell ran it without any issues. How did that happen?
2. If I have to specify the full path for each and every time, it seems a mighty tedious way, especially for programs that I run very often. Is there no easier way?

The answer for both of the these  is the "PATH" environment variable.

You need not specify the full path to the executable program all the time. When you specify only the program name, shell has a mechanism to search for that program on the system and run it. Searching for a program in the entire system every time is massively inefficient. So the shell has a mechanism for us to tell the specific locations where the shell should search for a program whenever we do not specify the full path. The way to specify this is by putting all these locations in an environment variable called the "PATH".

Whenever a program name is specified with the full path, shell will take the directory locations specified in the "PATH" environment variable one by one and search for the program in each of those locations sequentially. So if you have one or more custom programs that you would be running frequently, then it is best to add the directory locations of those programs to the PATH variable.

Environment variables : Brief introduction :

As mentioned earlier PATH is an environment variable. This post will not go into the details of what environment variables are. For the sake of this discussion here are a few points to remember :
  1. Environment variables are like the settings for the shell. The values of the environment variables are used by the shell to decide how it should act.
  2. By convention all environment variables are specified using upper case alphabets, numbers and underscore.
  3. You can have as many variables as you want, although the shell will use only the ones it has specified.
  4. You can set the value of a environment variable like this :

    VARIABLE_NAME=value

    Examples:
    MY_VAR=1
    RUBY_VERSION=2.0.0
    LANG=en_IN
  5. Very often to make sure that the value of the environment variable is persisted across multiple programs (and not limit it to a single program or command), the "export" keyword is used while setting the value of the variable. Examples :

    export MY_VAR=1
    export RUBY_VERSION=2.0.0
    export LANG=en_IN
  6. You can read the value of a variable by prepending the $ symbol before the variable name like this :

    $VARIABLE_NAME
  7. The value of a variable can be used to set the value of another variable by using the same $ symbol. Example :

    RUBY_DIRECTORY=/usr/lib/$RUBY_VERSION/
  8. If you want to see the value of an environment variable use the "echo" command. See the image below :

Managing the PATH variable :

Here are a few characteristics of the PATH variable :
  1. The value of the PATH variable is a string.
  2. There can (and will) be multiple directory locations mentioned in the PATH variable.
  3. The individual directory locations are separated by the : (colon) character.
  4. Most common default PATH value is typically this :

    /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
  5. To see the current value of PATH use echo $PATH. See the image below.
  6. The most common way of adding a new directory location to the PATH is this :

    export PATH=$PATH:/the/new/directory/location
Once you  add a directory location to the PATH like this any executable program in that directory can be run on the command line directly by specifying just the program name.

In my system, I have installed Ruby to a particular directory and I want the ruby program to be available on the command line. So this is how my PATH variable looks :

Contents of the PATH variable on my Ubuntu system.
Few Gotchas :

  1. Environment variables are local to each shell. So if you are running two terminal windows or tabs, the shells running in them will have their own set of environment variables. The value that you set in one shell is not available in the other shells.
  2. The value stored in the environment variables is available until the shell process quits. The values are not persisted beyond the lifetime of the shell. Nothing like saving the environment variables to the hard disk.
This would imply that any PATH settings that you would need will have to be done every time you open up a new Terminal window or tab (a.k.a every time you launch a shell). That sounds awfully painful and tedious. Sure it is. But Linux has a solution for you.

Like many Linux programs, the Bash shell (and almost every other shell too) support a startup configuration file. This file is in your home directory with the name .bashrc (yes, the name starts with a dot). This file can contain any number of commands that you would run on the command line and all of those will be "executed" every time a Bash shell is launched (a.k.a every time you open a new Terminal window or tab).

The common practice is to put all the commands setting the values of the necessary variables into this .bashrc file. That way you are guaranteed that every time you open a new Terminal window or tab, all of these environment variables will be set to the appropriate values. This is why magic tools like "rvm" or "chruby" or "nvm" ask you to add some stuff to your .bashrc file.

This should give you an initial understanding of what the PATH variable is for and what someone means when they say "PATH is not setup properly" or "PATH needs to be setup". From this point on, you should be able to google around for additional information and make sense out of it. If something mentioned here is not clear, drop in your questions in the comments and I will try my best to respond with answers.