On your Linux desktop, start up Mathematica.
We'll start by launching kernels on
For this example, we're going to start 4 kernels each on ramsey, fibonacci, and boole.
In the Mathematica window, type the following:
<nowiki>LaunchKernels[{"ssh://ramsey/?4","ssh://fibonacci/?4","ssh://boole/?4" }]</nowiki>
Then, right-click that cell and choose 'Evaluate Cell'. The system will launch the remote kernels, and show its progress while it does that. Once all the kernels are launched, you can run calculations on them. To close those kernels, do
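For reference, here is a minimal sketch of the whole cycle in one place: launch, check, compute, and shut down. It reuses the host names from the example above. The <code>ParallelTable</code> workload is only a placeholder for your own calculation, and the sketch assumes that passwordless ssh to each host already works. <code>CloseKernels[]</code> is the built-in function that shuts down all running subkernels.

```mathematica
(* Launch 4 subkernels on each host, using the same spec as the example above *)
kernels = LaunchKernels[{"ssh://ramsey/?4", "ssh://fibonacci/?4", "ssh://boole/?4"}];

(* Check how many subkernels are running, and which machine each one is on *)
$KernelCount
ParallelEvaluate[$MachineName]

(* Placeholder workload (an assumption): replace this with your own calculation *)
ParallelTable[PrimeQ[2^n - 1], {n, 1, 200}];

(* Shut down all running subkernels when you are finished *)
CloseKernels[];
```

Closing the kernels when you are done frees the CPU threads on those machines for other users.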
Each cluster node has 8 CPU threads. Ideally, if no one else is using the cluster, your job should bring each node up to a load of 8 so that you're making full use of its CPU. If the load is less than 8, you're not using the whole CPU, and if it's more than 8, some of your kernels are waiting for resources.
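One way to match the kernel count to the thread count is to build the launch specification from a list of hosts. This is only a sketch: the names <code>hosts</code>, <code>threadsPerNode</code>, and <code>specs</code> are made up for illustration, and it assumes that each kernel keeps roughly one CPU thread busy and that nobody else is using the nodes.

```mathematica
(* Hypothetical helper variables; the host names are the ones used in the example above *)
hosts = {"ramsey", "fibonacci", "boole"};
threadsPerNode = 8;  (* cluster nodes have 8 CPU threads each *)

(* Build one "ssh://host/?n" string per host, in the same form as the example above *)
specs = Map["ssh://" <> # <> "/?" <> ToString[threadsPerNode] &, hosts]

(* Uncomment to launch; lower threadsPerNode if other people are using the nodes *)
(* LaunchKernels[specs] *)
```

After launching, check the load graphs to see whether each node really sits near 8, and adjust the count if it doesn't.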
You can see the status of the cluster nodes here:
[http://graph.math.cornell.edu:3000/d/BPzrVrznz/cluster-pnodes Cluster Pnodes] (You must be on the Cornell network or VPN to use this link)
Note that these graphs usually show the last 24 hours. To get a better picture of what is going on, click the little clock at the upper right and choose a shorter time interval, such as the last hour. The graphs update only once a minute, so be patient.
If you're running remote kernels on the Ryzen crunchers, they have 32 CPU threads each, so you would want a load of 32 to make full use of the CPU, assuming the load was zero before you started. Note that other people are using these machines, so be a good neighbor and don't hog the entire machine.
The status of the crunchers is here:
[http://graph.math.cornell.edu:3000/d/f8zUy_L7k/crunchers Crunchers] (You must be on the Cornell network or VPN to use this link)
==Power Consumption==
If you really want to geek out, you can look at the server room power consumption when you're running your job.
[http://graph.math.cornell.edu:3000/d/_wfSvpjmz/server-room-power Server Room Power] (You must be on the Cornell network or VPN to use this link)
The orange and purple circuits are the cluster. The power usage is around 620 watts when nothing is going on, with an additional 100 watts being used by the two UPS units. It will go up when the machines get busy. If the orange circuit goes above 2200 watts or the purple circuit goes above 2800 watts, that circuit will shut down. At this time, the combined capacity of the two circuits should be more than the entire cluster can use. See if you can do enough math on the cluster to cause a shutdown.
Note the room temperature! The room has a large cooler that uses Cornell Lake-source cooling to cool the room, but your job will still probably warm up the room by a few degrees.
The blue circuit feeds the Ryzen GPU cruncher machines. That circuit also has a limit, of 2700 watts total. There should be enough headroom to load up all of those machines, including their GPUs, without overloading the circuit. Again, you might be able to make it shut down. That's OK, as long as you tell the system administrator. It's better to cause this problem now so we can rearrange things to avoid it in the future. So far, though, there seems to be enough capacity on this circuit to handle all of the cruncher machines.
==Network Bottlenecks==