Quest Research Computing Cluster December Maintenance
Scheduled for Dec 14, 08:00 CST  -  Dec 18, 17:00 CST
Scheduled
Quest, including access to data on Quest, the Quest Analytics Nodes, the Genomics Compute Cluster (GCC), the Kellogg Linux Cluster (KLC), and Quest OnDemand, will be unavailable starting at 8 a.m. on Saturday, December 14, and ending at 5 p.m. on Wednesday, December 18. Globus will also be unavailable for file transfers to and from Quest and between Quest and the Research Data Storage Service (RDSS)/FSMRESFILES.

Maintenance Details

This maintenance is necessary to improve networking stability, deploy a new storage system, and upgrade the scheduler. The new storage system will provide 12 PB of disk (HDD) storage and 500 TB of flash (SSD) storage. The flash tier will be used for Quest scratch directories, and all users are eligible to apply for a scratch directory. Quest users are encouraged to use scratch to improve the speed and performance of their compute jobs.

Additional maintenance details include:

- Quest's job scheduler, Slurm, will be upgraded to version 24.05 and compiled with the Nvidia Management Library (NVML) for better GPU statistics.
- The final data sync will be performed to complete the migration of all data to the new 12 PB storage system.
- The Ethernet network will be reconfigured for better performance.

Impact on Quest and Globus Users:

- Users cannot log in to Quest, submit new jobs, run jobs, access files stored on Quest, or use the Quest Analytics Nodes, GCC, KLC, and Quest OnDemand during the maintenance window.
- Processing of new Quest allocations will be paused from Monday, December 9, through Thursday, December 19. Support requests submitted for Quest and Globus shortly before or during the downtime will be addressed after the maintenance period.
- Jobs submitted to Quest with a wall time that extends beyond the start of the downtime will be held in the queue and will become eligible to run after maintenance.
- User sessions and processes running on the Quest login nodes, the Quest Analytics Nodes, and KLC will be canceled at the beginning of the downtime.
- Before the downtime, the expiration dates of all user files in Quest global scratch space will be extended by another 30 days.
- The Globus data transfer tool will be unavailable for transferring files to and from Quest and between Quest and the Research Data Storage Service (RDSS)/FSMRESFILES.
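
Because jobs whose wall time extends past the start of the downtime are held, it can help to check how much wall time is still available before submitting. The following is a minimal sketch, not an official Quest tool. It assumes the downtime start stated above (December 14, 08:00 CST, treated as UTC-6). The function names and the 30-minute `safety_margin` are hypothetical; the margin is only a rough allowance for time the job may spend waiting in the queue. The output uses the `days-hours:minutes:seconds` form that Slurm accepts for `--time`.

```python
from datetime import datetime, timedelta, timezone

# CST is UTC-6; the start time is taken from the announcement.
CST = timezone(timedelta(hours=-6), "CST")
DOWNTIME_START = datetime(2024, 12, 14, 8, 0, tzinfo=CST)


def max_walltime(now, safety_margin=timedelta(minutes=30)):
    """Return the longest wall time that still ends before the downtime.

    Returns None when no time is left. `safety_margin` is a hypothetical
    buffer for queue wait, not a value from the announcement.
    """
    remaining = DOWNTIME_START - now - safety_margin
    return remaining if remaining > timedelta(0) else None


def slurm_time(delta):
    """Format a timedelta as D-HH:MM:SS for use with sbatch --time."""
    total = int(delta.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    limit = max_walltime(datetime.now(CST))
    if limit is None:
        print("No wall time is left before the downtime.")
    else:
        print(f"--time={slurm_time(limit)}")
```

A job only fits if it actually starts soon enough, so a longer queue wait than the assumed margin can still cause it to be held.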

For any questions about this maintenance, please contact quest-help@northwestern.edu.
Posted Oct 11, 2024 - 13:16 CDT
This scheduled maintenance affects: Research Technologies and Support (Kellogg Linux Cluster (KLC) (Server Management), Quest Analytics Nodes, Quest High-Performance Computing Cluster (HPCC) (Server Management), Globus File Transfer).