In this section we’ll monitor the status of the GPUs using nvtop, an interactive GPU monitoring tool.
First, find the node the job is running on with squeue. For example, if the job is running on p5-dy-cr-48xlarge-[1-10], we’ll use p5-dy-cr-48xlarge-1.
ubuntu@ip-10-0-21-245:~$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
6 p5 megatron ubuntu R 0:22 1 p5-dy-cr-48xlarge-1
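
When the NODELIST column shows a bracketed range such as p5-dy-cr-48xlarge-[1-10], the first node name can be derived mechanically. The sketch below is a minimal illustration that assumes a single bracket group of the form prefix[a-b] or prefix[a,b]. The function name first_node is hypothetical and is not part of Slurm.

```shell
#!/usr/bin/env bash
# first_node: hypothetical helper (not a Slurm command).
# Prints the first hostname in a simple Slurm node list.
# Assumes at most one bracket group, e.g. "name-[1-10]" or "name-[3,7]".
first_node() {
  local nodelist="$1"
  case "$nodelist" in
    *\[*\]*)
      local prefix="${nodelist%%\[*}"   # text before the '['
      local ranges="${nodelist#*\[}"    # text after the '['
      ranges="${ranges%%\]*}"           # drop the ']' and anything after it
      local first="${ranges%%[,-]*}"    # first index, before any ',' or '-'
      printf '%s%s\n' "$prefix" "$first"
      ;;
    *)
      # No brackets: take the first entry of a comma-separated list.
      printf '%s\n' "${nodelist%%,*}"
      ;;
  esac
}

first_node 'p5-dy-cr-48xlarge-[1-10]'
```

On the cluster itself, Slurm can do the expansion for you and handles the general node list syntax: scontrol show hostnames "$(squeue -h -j 6 -o %N)" | head -n 1 prints the first node of job 6.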
Next, SSH into that node and install nvtop:
ssh p5-dy-cr-48xlarge-1
sudo apt-get -y install nvtop
Finally, run nvtop to watch GPU utilization and memory usage while the job runs:
nvtop