The Coeus HPC cluster leaf switch has been replaced and the affected compute nodes have been brought back online. The system is now running normally.
Posted Aug 06, 2018 - 15:27 PDT
Update
The replacement OPA leaf switch for the Coeus HPC cluster should arrive soon. The cluster should be fully operational later today.
Posted Aug 06, 2018 - 09:59 PDT
Update
The replacement OPA leaf switch for the Coeus HPC cluster should arrive soon. The cluster should be fully operational later today.
Posted Aug 06, 2018 - 09:14 PDT
Update
The Coeus HPC cluster will continue to run with 30 compute nodes down (long and interactive partitions) while we wait for a replacement switch. Users can still run jobs on the medium, phi and himem partitions.
Posted Aug 03, 2018 - 10:35 PDT
Monitoring
Because of the OPA leaf switch failure, file services remain unavailable on compute nodes 97-128. We are working with the system vendor and Intel to get a replacement switch.