Affected services:

  • Compute nodes

Several V100 GPU nodes down

Opened on Wednesday 17th November 2021, last updated

Resolved — All V100 nodes are back online. Unfortunately, jobs that crashed during the outage will have to be resubmitted.

Posted by Bob

Identified — Due to a power outage in one of the data centers, several V100 GPU nodes went down, and jobs running on these nodes have crashed. We hope to bring these nodes back online today.

Posted by Bob