This is the first post in a series that will go over some of the computing tools and practices that make my life as a scientific researcher easier. Today I will tell you about tmux and some of its use cases. Tmux is a modern terminal multiplexer and has become an extremely useful component of my remote data analysis and software development workflow.
Both in industry and in many scientific disciplines, some datasets are too big to be conveniently handled by a personal computer. For example, I could not even think about processing a multi-terabyte dataset from the CMS experiment using only my Mid-2014 MacBook Pro (4 cores, 8 GB RAM, 128 GB SSD).
Luckily enough, research institutions and companies nowadays provide an infrastructure of remote computing resources (e.g. computing clusters, storage solutions, distributed file systems and/or virtual machine instances on demand).
While in theory remote computing resources have a huge advantage over local systems in terms of management, reliability and computing power, people used to working locally sometimes have a rough time adapting to a remote computing paradigm. An important part of this blog series will be dedicated to solving issues which might arise in this transition.
Today we will deal with remote session persistence and session/window management, but in the near future I will also tell you how to access remote data as if it were on your computer and how to interactively carry out data analyses and visualize the results remotely using only your web browser.
Imagine you just got access to a remote machine provided by your institution (e.g. the lxplus Linux service at CERN). You connect to it through SSH from your local computer, set up a development environment and start working right away.
However, your network connection happens to be a bit shaky and you suddenly get disconnected, so your SSH session breaks and whatever you are running is killed. In addition, every time you reconnect you have to set up your development environment again.
In a way, when you use plain SSH access, the processes you are running are tied to the terminal you connect from, so if the connection breaks they are killed. By using a terminal multiplexer like tmux (or its older predecessor GNU Screen), you add an intermediate layer between your processes and your SSH session, so if you detach or the connection breaks, everything keeps running in the background and you can attach to it again when you reconnect.
Tmux is becoming an increasingly popular tool, so it is likely already installed on your remote computing system. For example, at the time of writing this post, the default SLC 6.8 nodes of lxplus @ CERN have tmux 1.6 installed, which is enough for basic usage but misses some useful newer features.
If you have administrator privileges on the remote system you can install it from the official distribution repositories (e.g. sudo apt-get install tmux for Ubuntu-based distributions), or run this script (tested on lxplus) to build a static executable of the latest tmux currently available.
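Before installing anything, it is worth checking what the remote machine already offers. The snippet below is a minimal sketch, assuming a POSIX shell on the remote side; the variable name tmux_status is only illustrative.

```shell
# Report whether tmux is on the PATH and, if so, which version it is.
if command -v tmux >/dev/null 2>&1; then
    tmux_status="found: $(tmux -V)"
else
    tmux_status="tmux is not installed on this machine"
fi
echo "$tmux_status"
```

If the reported version is an old one, basic usage should still work, but some options mentioned in newer tutorials may be missing.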
So now you have tmux available on your remote system. To start it for the first time, you only have to run the tmux command with no arguments.
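Running tmux with no arguments gives you a session identified only by a number. A small variation that I find handy is to give the session a name, so it is easier to find later. The following is a sketch, not the only way to do it: the session name analysis is a hypothetical choice, and the session is created detached (-d) so the snippet also works from a script.

```shell
session="analysis"   # hypothetical session name, pick your own

if command -v tmux >/dev/null 2>&1; then
    # Create the session only if it does not exist yet; -d keeps it detached.
    tmux has-session -t "$session" 2>/dev/null || tmux new-session -d -s "$session"
    # Show every session known to the tmux server on this machine.
    tmux list-sessions
else
    echo "tmux is not installed on this machine"
fi
```

In an interactive terminal you would usually drop the -d flag, so that tmux opens the new session right away.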
A new tmux session will be started, opening a new virtual terminal, and a status bar will appear at the bottom of the terminal. Now you can work as you usually would. After a while, you need to attend a meeting or get some food, so you just close the terminal (a better practice is to detach first with Ctrl-b followed by d). Some minutes/hours/days later you are full of new ideas (or just food) and are ready to continue what you started, so you access the same machine again and reattach to the tmux session by running tmux attach.
After reattaching, everything will be exactly as you left it, because it kept running in the background. The point of accessing the same machine is really important, especially for load-balanced clusters such as lxplus @ CERN. Basically, when you access lxplus, by default you are assigned to a machine with a low workload, so resources are better distributed among the users. However, you cannot access a background process on a different machine, so in this case we want to make sure that we access the same machine every time. An easy way to bypass the load balancer is to get the hostname of the computer we are going to run tmux on and then just SSH directly to this machine:
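The reattach step described above can be sketched as follows. This is a minimal example under the assumption that you are back on the same machine where the session was started; attaching needs a real terminal, so the snippet skips that step when it is run non-interactively.

```shell
# Show which sessions are still alive on this machine.
sessions="$(tmux list-sessions 2>/dev/null || echo "no tmux sessions found on this machine")"
echo "$sessions"

# Reattach to the most recently used session. attach-session needs a real
# terminal, so it is skipped when this runs from a script or a pipe.
if [ -t 0 ] && [ -t 1 ] && command -v tmux >/dev/null 2>&1; then
    tmux attach-session
fi
```

To leave the session again without stopping anything, press Ctrl-b followed by d. If you gave your session a name, tmux attach-session -t followed by that name picks a specific one.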
ssh -Y firstname.lastname@example.org # the last part is the hostname
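To avoid forgetting which node holds your session, you can let the remote machine print the command you will need later. This is a small sketch, assuming the node name reported by uname -n is reachable from outside; on some clusters it is a short name and you would have to append the site's domain yourself.

```shell
# Run this on the remote machine, right before (or inside) your tmux session.
node="$(uname -n)"     # name of the node the load balancer assigned to you
user="$(id -un)"       # your login name on that node
reconnect="ssh -Y ${user}@${node}"
echo "To come back to this tmux session, use: ${reconnect}"
```

Copying that line into a note on your local computer is enough to land on the right node next time.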
So far we have gone through the simplest use case of tmux, which is keeping processes alive when we close the SSH access terminal. However, the real power of tmux is that it is like a “window” manager, but for your terminal sessions. After you attach to your tmux process for the first time, you only get a single terminal and the status bar. However, the actual magic is that you can split this terminal into panes or create new windows, all within the same tmux session, and they will all be there when you reconnect and reattach.
Imagine that you are working on a script and you want to test it and see the result and the code simultaneously in your terminal. You can split the window into two side-by-side panes with the default shortcut (Ctrl-b followed by %) and then comfortably work in split terminal mode. In this way, tmux also takes over part of the use cases of fancy terminal clients like iTerm2 (Mac OS X) or terminator (Linux), which I know some of the members of this network use.
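Windows can be created with the keyboard (Ctrl-b followed by c for a new window, Ctrl-b followed by n or p to move to the next or previous one), but the same actions are available as commands. The sketch below assumes a hypothetical session called analysis and a hypothetical window name logs; neither comes from a real setup.

```shell
session="analysis"   # hypothetical session name

if command -v tmux >/dev/null 2>&1; then
    # Make sure the session exists, without attaching to it.
    tmux has-session -t "$session" 2>/dev/null || tmux new-session -d -s "$session"
    # Add a second window named "logs"; the trailing colon asks tmux to use
    # the next free window index in that session.
    tmux new-window -t "$session:" -n logs
    # List the windows that now live inside the session.
    tmux list-windows -t "$session"
else
    echo "tmux is not installed on this machine"
fi
```

Because the windows belong to the session and not to your terminal, they are still listed after you disconnect and reattach.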
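The split layout can also be prepared from the command line, which is handy if you want the same arrangement every time. This is only a sketch: the session name analysis and the echoed placeholder command are hypothetical, and split-window -h is the command-line counterpart of the Ctrl-b % shortcut.

```shell
session="analysis"   # hypothetical session name

if command -v tmux >/dev/null 2>&1; then
    # Make sure the session exists, without attaching to it.
    tmux has-session -t "$session" 2>/dev/null || tmux new-session -d -s "$session"
    # Split the current window into left and right panes (like Ctrl-b %).
    tmux split-window -h -t "$session:"
    # Type a placeholder command into the newly created pane and press Enter.
    tmux send-keys -t "$session:" "echo 'tests would run here'" Enter
    # Show the panes of the current window.
    tmux list-panes -t "$session:"
else
    echo "tmux is not installed on this machine"
fi
```

Once attached, Ctrl-b followed by o moves the cursor between panes, and Ctrl-b followed by " gives a top/bottom split instead.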
That will have to suffice as a basic presentation of tmux and its use cases. For more advanced uses, you can check out the O(1000) tutorials and posts available on the internet regarding this tool (e.g. this one and this one are quite extensive).
Beware that tmux is a fully configurable and extensible tool, both aesthetically and in usability, so you can spend hours setting up your development environment to your liking instead of doing actual work. I might talk about software configuration files (a.k.a. dotfiles) and how to keep them synced between machines in a future post of this series, but the other two topics I mentioned at the beginning, i.e. how to access remote data as if it were on your computer and how to interactively carry out data analyses and visualize the results remotely using only your web browser, are first in the queue.
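To give a flavour of what such a configuration looks like, here is a very small starting point. It is a sketch rather than my actual setup: the options are standard tmux settings, but the values and key choices are arbitrary, and the file is written to a scratch location (a hypothetical path) so that nothing in your home directory is overwritten.

```shell
# Hypothetical scratch path; tmux reads its real configuration from ~/.tmux.conf.
conf="${TMPDIR:-/tmp}/tmux.conf.example"

cat > "$conf" <<'EOF'
# Keep more scrollback lines per pane
set -g history-limit 10000
# Number windows starting from 1 instead of 0
set -g base-index 1
# Easier-to-remember split bindings: prefix | and prefix -
bind | split-window -h
bind - split-window -v
EOF

cat "$conf"
```

If you like the result, you can copy the file to ~/.tmux.conf, or load it into a running session with tmux source-file followed by the path.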
If you have any doubts about terminal multiplexing, want to give me some feedback about this post in particular or the perks of working on remote computers, please do so in the comment section. See you around!