Affected services:

  • Compute nodes
  • Login nodes
  • Interactive nodes
  • Web portal
  • Network
  • Scheduler

Hábrók network outage

Opened on Tuesday 11th July 2023, last updated

Resolved — Cluster is operational again.

Posted by Fokke

Monitoring — Except for a couple of nodes that are still having issues, the cluster nodes are back online and running jobs again. If you still encounter any issues, please let us know.

Posted by Bob

Identified — We are currently reinstalling the compute nodes of the cluster to get them into the correct state again. This will take at least the rest of the day.

Posted by Fokke

Identified — Unfortunately, we haven't been able to restore the network yet. Again, our sincere apologies for the inconvenience this is causing.

Posted by Bob

Identified — Due to a misconfiguration around 16:00 today, the network settings of Hábrók were removed. This resulted in the cluster's bare-metal nodes losing both their external and internal network connectivity. Unfortunately, all running jobs will be lost. We are currently working on restoring the network, after which we expect Hábrók to be working again. We apologize for the inconvenience this has caused and are taking steps to make sure this does not happen again.

Posted by Bob

Investigating — All Hábrók compute nodes (except for the V100 GPU nodes) are currently unreachable due to a network issue. We're investigating the issue.

Posted by Bob