HPC Status Page
Affected services:
Scheduled for Friday 16th October 2020 at 16:00 (Amsterdam)
Schedule/description of work
The data center at DUO, where Peregrine is hosted, will perform maintenance on its entire power infrastructure on the 17th and 18th of October. As a result, the cluster will be without power on both days. The cluster will be brought down the day before, on the 16th at 16:00, and, if all goes well, will be back up on the 19th from 09:00.
The portal (portal.hpc.rug.nl) has been restarted as well. Everything, except for a few compute nodes, should be back online. If you encounter any issues, please let us know.
The issues with the module environment have been solved. We removed the reservation in the scheduler, and jobs have started running again.
Because of these issues with the module environment, the scheduler will not start any new jobs (as most of them will crash right away) until the issues have been resolved.
The majority of the nodes are back online, including the login nodes. The compute nodes just started running jobs again, but we found that the software modules are not available. Unfortunately, jobs that depend on modules may have failed because of this. We are still investigating and hope to fix it as soon as possible.