Quest Research Computing Cluster RHEL8 Update and New Storage System

Scheduled Maintenance Report for Northwestern Information Technology

Completed

Scheduled maintenance has been completed.
Posted Mar 31, 2025 - 15:41 CDT

In progress

Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted Mar 22, 2025 - 08:00 CDT

Update

We will be undergoing scheduled maintenance during this time.
Posted Feb 24, 2025 - 10:50 CST

Update

We will be undergoing scheduled maintenance during this time.
Posted Feb 24, 2025 - 10:46 CST

Scheduled

Quest, including access to data on Quest, the Quest Analytics Nodes, the Genomics Compute Cluster (GCC), the Kellogg Linux Cluster (KLC), and Quest OnDemand, will be unavailable starting at 8 a.m. on Saturday, March 22, and ending at 5 p.m. on Monday, March 31. Globus will also be unavailable for file transfers to and from Quest and between Quest and the Research Data Storage Service (RDSS)/FSMRESFILES.

Maintenance Details
This maintenance is necessary to deploy a new storage system and update the operating system to RHEL8. The new storage system will provide 12 PB of disk (HDD) storage and 500 TB of flash (SSD) storage. The flash tier will be used for Quest scratch directories, and all users are eligible to apply for a scratch directory. Quest users are encouraged to use scratch to improve the speed and performance of their compute jobs. Upgrading the operating system will allow us to deploy and use the latest hardware on Quest.

Impact on Quest and Globus Users:

- Users cannot log in to Quest, submit new jobs, run jobs, access files stored on Quest, or use the Quest Analytics Nodes, GCC, KLC, and Quest OnDemand during the maintenance window. User sessions and processes running on the Quest login nodes, the Quest Analytics Nodes, and KLC will be canceled at the beginning of the downtime.

- Processing of new Quest allocations and accounts will be paused from Friday, March 14 through Monday, April 7.

- Jobs submitted to Quest through Slurm with a wall time that extends beyond the start of the downtime will not run and must be resubmitted after the maintenance. These jobs will receive a "ReqNodeNotAvail, Reserved_for_maintenance" message as the queue reason.

- NVIDIA GPU drivers on all GPUs, including Buy-in GPUs, will be updated to version 570.86.15, which supports CUDA versions up to and including 12.8.

- JupyterHub on Quest Analytics Nodes will be upgraded to 5.2.1.

- After the downtime, a new URL, `login.quest.northwestern.edu`, will be available for connecting to Quest login nodes, in addition to `quest.northwestern.edu`. This is due to the implementation of a new load balancer, which will improve the user experience on the login nodes.

- After the downtime, new URLs, `rstudio.quest.northwestern.edu`, `jupyterhub.quest.northwestern.edu`, and `sasstudio.quest.northwestern.edu`, will be available for connecting to the Quest Analytics Node services RStudio Server, JupyterHub, and SAS Studio, respectively. The existing URLs, `rstudio.questanalytics.northwestern.edu`, `jupyter.questanalytics.northwestern.edu`, and `sas.questanalytics.northwestern.edu`, will continue to work. This is due to the implementation of a new load balancer, which will improve the user experience on the Analytics Nodes.

- Before the downtime, the expiration dates of all user files in Quest global scratch space will be extended by another 30 days.

- The Globus data transfer tool will be unavailable for transferring files to and from Quest and between Quest and the Research Data Storage Service (RDSS)/FSMRESFILES. In addition, all Globus jobs associated with either of these endpoints will be canceled and will need to be resubmitted after the downtime.

- Support requests submitted for Quest and Globus shortly before or during the downtime will be addressed following the maintenance period.
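
The Slurm wall-time rule above can be checked with a little date arithmetic before submitting a job. The following is a minimal sketch, not an official Quest tool: the function names `finishes_before_downtime` and `max_wall_time` are hypothetical, times are treated as local Central time, and the check assumes the job would start at the moment given (a job that waits in the queue has less time available).

```python
from datetime import datetime, timedelta

# Start of the maintenance window as stated in this notice
# (8 a.m. on Saturday, March 22, 2025, local Central time).
DOWNTIME_START = datetime(2025, 3, 22, 8, 0)


def finishes_before_downtime(start_time: datetime,
                             wall_time: timedelta,
                             downtime_start: datetime = DOWNTIME_START) -> bool:
    """Return True if a job starting at start_time with the requested
    wall time would not extend beyond the start of the downtime."""
    return start_time + wall_time <= downtime_start


def max_wall_time(start_time: datetime,
                  downtime_start: datetime = DOWNTIME_START) -> timedelta:
    """Return the longest wall time that still ends by the downtime start.

    Returns a zero duration if start_time is already at or past the start
    of the downtime.
    """
    return max(downtime_start - start_time, timedelta(0))


if __name__ == "__main__":
    # Hypothetical example: a job that would start the evening before.
    start = datetime(2025, 3, 21, 20, 0)
    print(finishes_before_downtime(start, timedelta(hours=48)))
    print(max_wall_time(start))
```

If the requested wall time is longer than the value returned by `max_wall_time`, the job would be expected to stay pending with the queue reason quoted above, so requesting a shorter wall time or waiting until after the maintenance are the two options.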

For any questions about this maintenance, please contact quest-help@northwestern.edu.
Posted Jan 22, 2025 - 16:34 CST
This scheduled maintenance affected: Research Technologies and Support (Quest Analytics Nodes, Quest High-Performance Computing Cluster (HPCC) (Server Management), Research Data Storage Service (RDSS)).