Affected services:

  • Storage
  • Compute nodes

Jobs and nodes stuck in "completing" due to file system issues

Opened on Friday 1st July 2022, last updated

Resolved: Since we have not seen this issue in recent months, it appears to have been resolved.

Posted by Fokke

Investigating: Some nodes are having trouble accessing the shared /scratch file system, which causes jobs to hang when they access files on it. The cleanup stage of jobs may also fail, leaving these jobs stuck in the "CG" (completing) state.

Posted by Bob