Quest Research Computing Cluster June Maintenance
Scheduled for Jun 8, 07:00 CDT  -  Jun 14, 17:00 CDT
Scheduled
Quest, including the Quest Analytics Nodes, the Genomics Compute Cluster (GCC), the Kellogg Linux Cluster (KLC), and Quest OnDemand, will be unavailable for scheduled maintenance starting at 07:00 CDT on Saturday, June 8, and ending at 17:00 CDT on Friday, June 14. Data stored on Quest will be inaccessible for the entire maintenance period. Globus will also be unavailable for file transfers to and from Quest, and between Quest and the Research Data Storage Services (RDSS)/FSMRESFILES. This maintenance is necessary to refresh hardware and apply critical system upgrades.

During this downtime, the following maintenance will be performed:

- Slurm, Quest's job scheduler, will be upgraded to version 23.11
- New Slurm management nodes will be deployed to replace older hardware and enhance scheduler stability
- Storage system firmware and InfiniBand interconnect drivers will be upgraded to integrate the next generation of computing and storage hardware
- A new InfiniBand interconnect configuration will be implemented to improve connectivity between nodes and Quest storage, enhancing compute performance
- Security patches will be applied to all Quest nodes
- Power and liquid-cooling maintenance will be performed to keep the computing infrastructure operating reliably

Impact on Quest and Globus Users:

- Users cannot log in to Quest, submit new jobs, run jobs, access files stored on Quest, or use the Quest Analytics Nodes, GCC, KLC, and Quest OnDemand during the maintenance window
- Jobs submitted to Quest with a requested wall time that extends beyond the start of the downtime will not run and must be resubmitted after the maintenance. These jobs will show "ReqNodeNotAvail, Reserved_for_maintenance" as the queue reason.
- User jobs and processes running on the Quest login nodes, the Quest Analytics Nodes, and KLC will be canceled at the beginning of the downtime.
- The Globus data transfer tool will be unavailable for file transfers to and from Quest, and between Quest and the Research Data Storage Services (RDSS)/FSMRESFILES.
- Requests submitted for Quest and Globus shortly before or during the downtime will be addressed following the maintenance period.
- The expiration dates of all user files in Quest global scratch space will be extended by 30 days before the downtime. File timestamps in scratch space will be updated so that the data can remain there. Scratch users can run the "checkscratch" utility to monitor expiration dates in scratch space: https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=1546#checkscratch
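
As an illustration of the wall time rule above, the following minimal Python sketch checks whether a job would finish before the downtime begins. It is not an official Quest tool. The function names are hypothetical, the job is assumed to start at the moment given (any time spent waiting in the queue would push its end later), and only the HH:MM:SS and D-HH:MM:SS wall time formats are handled.

```python
from datetime import datetime, timedelta, timezone

# CDT is UTC-5; a fixed offset keeps this sketch dependency-free.
CDT = timezone(timedelta(hours=-5), "CDT")

# Start of the maintenance window, taken from the announcement.
MAINTENANCE_START = datetime(2024, 6, 8, 7, 0, tzinfo=CDT)


def parse_walltime(walltime: str) -> timedelta:
    """Parse a wall time written as HH:MM:SS or D-HH:MM:SS (hypothetical helper)."""
    days = 0
    if "-" in walltime:
        day_part, walltime = walltime.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def fits_before_maintenance(walltime: str, start: datetime) -> bool:
    """Return True if a job starting at `start` ends no later than the downtime start."""
    return start + parse_walltime(walltime) <= MAINTENANCE_START


if __name__ == "__main__":
    # Assumed start time for the example: Friday, June 7, 09:00 CDT.
    start = datetime(2024, 6, 7, 9, 0, tzinfo=CDT)
    for requested in ("12:00:00", "1-00:00:00"):
        verdict = "fits" if fits_before_maintenance(requested, start) else "extends into downtime"
        print(f"{requested}: {verdict}")
```

With the assumed start time, a 12-hour request ends on June 7 at 21:00 CDT and fits, while a 1-day request would end on June 8 at 09:00 CDT and extends into the downtime. Requesting a shorter wall time is one way to let a job run before the maintenance begins.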

For any questions about this maintenance, please contact quest-help@northwestern.edu.
Posted Apr 17, 2024 - 07:30 CDT
This scheduled maintenance affects: Research Technologies and Support (Kellogg Linux Cluster (KLC) (Server Management), Quest Analytics Nodes, Quest High-Performance Computing Cluster (HPCC) (Server Management)).