Affected services:

  • Login nodes
  • Interactive nodes
  • Interactive GPU nodes
  • Web portal
  • Storage
  • Scheduler
  • Compute nodes

Peregrine maintenance

Scheduled for Monday 10th August 2020 at 08:00 (Amsterdam)

Schedule/description of work

Regular Peregrine maintenance during which we will update the firmware/drivers, operating system, SLURM, and file system.

Scheduled start time: August 10, 2020 08:00
Duration: 3 days 10 hours
Status: Finished

Updates

For updates about the vulture partition and web portal issues, please go to: https://status.hpc.rug.nl/issue/0211dd2d-5aef-464a-9921-adc31d563c34

Posted by Bob

If you still encounter any issues, please let us know (hpc@rug.nl).

Posted by Bob

The Peregrine cluster is back online! Everything should work again except for the vulture partition: we still need to fix the nodes in this partition, which we will do today. Also, the interactive apps of the HPC web portal (portal.hpc.rug.nl) may not work yet; we're still looking into this. We will post updates here once each of these issues has been fixed.

Posted by Bob

We solved the issue with the /scratch file system, and we're currently doing some final tests and fixes. We expect to bring the cluster back online shortly.

Posted by Bob

Since the updates, we have been experiencing timeouts in file operations on /scratch. This means that the system is not ready for production yet. Unfortunately, the maintenance window has to be extended again.

Posted by Fokke

The maintenance has not finished yet, as some further work is needed on the storage systems.

Posted by Fokke