Affected services:

  • Login nodes
  • Storage
  • Compute nodes
  • Interactive nodes
  • Interactive GPU nodes
  • Web portal

High load on storage causing login issues and unresponsiveness of cluster

Opened on Thursday 16th September 2021

Resolved — The issue has been resolved.

Posted by Fokke

Investigating — The storage is in an unhealthy state, making it impossible to access the shared file systems. This also means that logging in is currently not possible.

Posted by Cristian

Monitoring — We have run several checks on the shared file systems and their metadata, and they are back online. We are still monitoring the stability of the storage, but you should be able to use it again.

Posted by Bob

Investigating — The storage is in an unhealthy state, making it impossible to access the shared file systems. This also means that logging in is currently not possible.

Posted by Bob

Investigating — Due to high load on the storage, logging in to the login node is currently difficult. If you cannot log in there, try the interactive node (pg-interactive.hpc.rug.nl) instead.

Posted by Bob