Cluster SSH access


Most cluster applications will be controlled using the SSH protocol.

In order for this to work, you will have to set up SSH keys in your math account so that you can securely access the cluster without entering your password for each operation.

The first step is creating a public/private SSH key pair for your account. This part is pretty simple. At the command prompt, type

ssh-keygen

Press Enter at every prompt. One of the prompts asks for a 'passphrase'; press Enter there too, so that the key has no passphrase.

Once this has run, in your account there will be a hidden directory (filenames starting with a dot are hidden) called .ssh

Inside this folder there are two files, id_rsa and id_rsa.pub. id_rsa is your secret (private) key. It should never be shared with anyone, because it is the key that gives you access. id_rsa.pub is the public key that belongs to your private key. It can be shared and emailed, because on its own it grants no access; it only lets a machine grant access to someone who holds the matching private key.

Now that the keys exist, you must authorize the public key for your account. This is done through a file in your .ssh directory called authorized_keys, which may or may not already exist. To append your public key to your authorized_keys file, type the following command:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
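
Running the append command a second time adds a duplicate line, and sshd is commonly configured to ignore keys when .ssh or authorized_keys is writable by other users. A minimal sketch that guards against both, assuming the id_rsa.pub file name used above; the function name add_key_once and its optional directory argument are made up for this illustration:

```shell
# Hypothetical helper: append the public key to authorized_keys only if it is
# not already listed, then tighten permissions.
# usage: add_key_once [SSH_DIR]   (SSH_DIR defaults to ~/.ssh)
add_key_once() {
    ssh_dir="${1:-$HOME/.ssh}"
    pub="$ssh_dir/id_rsa.pub"
    auth="$ssh_dir/authorized_keys"

    [ -f "$pub" ] || { echo "no public key at $pub" >&2; return 1; }

    touch "$auth"
    # -x: match whole lines, -F: fixed strings, -f: take the pattern from the key file
    if ! grep -qxF -f "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi

    chmod 700 "$ssh_dir"   # directory: owner only
    chmod 600 "$auth"      # file: owner read/write only
}
```

Calling add_key_once with no argument works on ~/.ssh.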

You now have ssh access to your own account on other machines without a password. There are still a few more steps before you are ready for cluster access, but you can test your access now by typing

ssh ramsey

The system may prompt you to accept the host key for ramsey if it's not already in your known_hosts file. Say yes to the prompt.

You should now be logged in to ramsey without having typed your password.

Leave ramsey by typing 'exit'. You should be back on the machine where you started, ready for the next step.

Setting Up Your known_hosts File

When you connect to a host using ssh, the system checks whether the host's public host key is in your .ssh/known_hosts file. If it is not, the system prompts you to accept the key. If a matching key is already there, you will be logged in. However, if there is a key for that host and it does not match the one the remote host has sent, you will receive a warning and the command will not continue.
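
The three outcomes described above can be pictured with a small shell function. This is only a simplified illustration, not how ssh itself is implemented: it assumes an unhashed known_hosts file with one "host keytype key" entry per line, and check_host_key is a hypothetical name chosen for this sketch.

```shell
# Simplified illustration of the known_hosts decision. Real ssh also handles
# hashed host names, comma-separated host lists, and several key types per host.
# usage: check_host_key FILE HOST KEYTYPE KEY
check_host_key() {
    file="$1"; host="$2"; type="$3"; key="$4"
    awk -v h="$host" -v t="$type" -v k="$key" '
        # An entry for this host and key type exists; remember whether the key agrees.
        $1 == h && $2 == t { seen = 1; if ($3 == k) match_found = 1 }
        END {
            if (!seen)            print "unknown"   # ssh would ask you to accept the key
            else if (match_found) print "match"     # ssh would continue silently
            else                  print "mismatch"  # ssh would warn and stop
        }' "$file"
}
```

To look up or remove an entry in your real file, ssh-keygen -F ramsey and ssh-keygen -R ramsey are the usual tools.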

For accessing a cluster, it would be really tedious to have to connect to each host and then say 'yes' to the prompt in order to get the keys into your known_hosts file. So, to avoid this, we have a script that you can run which will remove any outdated keys from your known_hosts file and set up the file so you have automatic access to the cluster and other calculation machines.

At the command prompt, type

setup_known_hosts

You should see the progress of your known_hosts file being set up. Now your keys are ready for cluster access.

Testing Cluster Access

To test your access to the cluster, you'll use the pdsh command. pdsh uses ssh to send commands in parallel to a group of machines. Only send commands that finish quickly and never prompt for input: a command that waits for input that never comes will cause your pdsh command to hang. Commands like 'uptime' or 'date', which print their output and then exit, are safe choices.

'date' is a good example: it displays the system time on the remote machine and then exits. It's a good idea to run it to make sure all of the remote machines have synchronized clocks. Type the following command (you may want to stretch the terminal window vertically to see all 64 lines of output):

pdsh -R ssh -w pnode[01-64] 'date'

This will display the date from each of the remote machines. This should run without errors and come back to a command prompt. If it has errors or does not finish, there may be a problem with either your keys or connectivity to the cluster.

Don't worry if the times differ by one second, because the clock may have ticked over while the command was running. If they differ by more than a second, there may be a problem, but it should not affect most commands.
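
Comparing 64 lines of date output by eye is error-prone. One possible approach, sketched here, is to ask each node for its time in seconds since the epoch with 'date +%s' and compute the spread between the earliest and latest answer. The sketch assumes pdsh's usual "hostname: output" line format; clock_spread is a hypothetical helper name.

```shell
# Hypothetical helper: read lines of the form "host: epoch_seconds" on stdin
# and print the difference between the largest and smallest timestamp.
clock_spread() {
    awk '{
            t = $2 + 0                            # second field is the timestamp
            if (NR == 1 || t < min) min = t
            if (NR == 1 || t > max) max = t
         }
         END { if (NR > 0) print max - min }'
}

# Example use on the cluster (not run here):
#   pdsh -R ssh -w pnode[01-64] 'date +%s' | clock_spread
```

The printed number includes the time pdsh needed to reach all nodes, so a result of 0 or 1 is what you would hope to see.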

To see how busy the remote machines are, you can do:

pdsh -R ssh -w pnode[01-64] 'uptime'

which will tell you how long each machine has been up and its system load.

You can use this same command to run things on a subset of nodes, like this command to check the load on the last 8 nodes:

pdsh -R ssh -w pnode[57-64] 'uptime'
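
The bracketed range is expanded by pdsh, not by the shell. For illustration only, the sketch below prints the host names that pnode[57-64] stands for, assuming the two-digit node numbering used on this page; expand_nodes is a hypothetical helper, not part of pdsh.

```shell
# Hypothetical helper: print the host names covered by a numeric range.
# usage: expand_nodes PREFIX FIRST LAST
expand_nodes() {
    # %02g pads each number to two digits, e.g. 1 becomes 01
    seq -f "$1%02g" "$2" "$3"
}

expand_nodes pnode 57 64
```

Because square brackets are also shell glob characters, quoting the host list, as in -w 'pnode[57-64]', may be needed in some shells such as zsh.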

The same form works for other computation hosts where you have keys set up, such as fibonacci and ramsey:

pdsh -R ssh -w fibonacci,ramsey 'uptime'
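
When choosing where to run a job, it can help to sort the uptime output by load. A rough sketch, assuming pdsh's "hostname: output" prefix and an uptime line that contains "load average:" (or "load averages:") followed by the 1-minute figure; least_loaded is a hypothetical helper name.

```shell
# Hypothetical helper: read pdsh 'uptime' output on stdin and print
# "LOAD HOST" pairs sorted by 1-minute load, least busy first.
least_loaded() {
    # Keep the host name (text before the first colon) and the first number
    # after "load average:"; lines without a load figure are dropped.
    sed -n 's/^\([^:]*\):.*load averages\{0,1\}: *\([0-9.]*\).*/\2 \1/p' |
        LC_ALL=C sort -n
}

# Example use on the cluster (not run here):
#   pdsh -R ssh -w pnode[01-64] 'uptime' | least_loaded | head -n 8
```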

Once you have these commands working, you're ready to configure your distributed application.