Quest Research Computing Cluster June Maintenance
Scheduled Maintenance Report for Northwestern Information Technology
Completed
Quest maintenance is complete, and the service is now available for use, including the Quest Analytics Nodes, the Genomics Compute Cluster (GCC), the Kellogg Linux Cluster (KLC), and Quest OnDemand.

During this downtime, Northwestern IT:

• Upgraded Quest's job scheduler, Slurm, to version 23.11.
• Deployed new Slurm management nodes to replace older hardware and enhance scheduler stability.
• Upgraded the storage system firmware and InfiniBand interconnect drivers to integrate the next generation of computing and storage hardware.
• Implemented a new InfiniBand interconnect configuration to improve connectivity between nodes and Quest storage, enhancing compute performance.
• Performed power and liquid-cooling maintenance to ensure the healthy operation of computing infrastructure.

Globus Users

To transfer data between RDSS/FSMResFiles and Quest with Globus, Globus users must repeat the following steps to remount their shares on the data transfer node:

1. Log in to the GlobalProtect VPN.
2. Log in to the Globus node by opening a terminal and typing "ssh netid@qglobus12.ci.northwestern.edu", replacing "netid" with your NetID.
3. Type "exit" to log out of qglobus12.

Logging in to qglobus12 automounts your RDSS share or FSMResFiles folder on the Quest RDSS Globus endpoint. Information on using Globus with RDSS/FSMResFiles can be found in this Knowledge Base article.
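
For users who prefer to script the login and logout steps above, the following is a minimal Python sketch. The helper names remount_command and remount_shares are hypothetical, and the NetID check (letters and digits only) is an assumption for illustration. It also assumes that a non-interactive SSH login triggers the same automount as an interactive one, which this notice does not confirm; if the share does not appear, follow the manual steps. An active GlobalProtect VPN session is still required.

```python
import subprocess

# Host name taken from the steps in this notice.
GLOBUS_HOST = "qglobus12.ci.northwestern.edu"


def remount_command(netid: str) -> list[str]:
    """Build an ssh command that logs in to the Globus node and exits at once."""
    # Assumption: NetIDs contain only letters and digits.
    if not netid or not netid.isalnum():
        raise ValueError("netid must be a non-empty alphanumeric string")
    # "exit" is run as the remote command, so the session closes right after login.
    return ["ssh", f"{netid}@{GLOBUS_HOST}", "exit"]


def remount_shares(netid: str) -> int:
    """Run the login and return ssh's exit code (0 means login and logout succeeded)."""
    return subprocess.run(remount_command(netid), check=False).returncode
```

Calling remount_shares("abc1234") with your own NetID in place of the placeholder runs the equivalent of steps 2 and 3; ssh will still prompt for your password or other authentication as usual.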

Contact quest-help@northwestern.edu for any support related to using Quest. For assistance with Globus, contact globus-help@northwestern.edu.
Posted Jun 14, 2024 - 10:55 CDT
In progress
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted Jun 08, 2024 - 07:00 CDT
Scheduled
Quest, including the Quest Analytics Nodes, the Genomics Compute Cluster (GCC), the Kellogg Linux Cluster (KLC), and Quest OnDemand, will be unavailable for scheduled maintenance starting at 7:00 CDT on Saturday, June 8, and ending at 17:00 CDT on Friday, June 14. Globus will also be unavailable for file transfers to and from Quest, and between Quest and the Research Data Storage Services (RDSS)/FSMRESFILES. This maintenance is necessary to refresh hardware and apply critical system upgrades. Quest, including access to data on Quest, will be unavailable throughout the entire maintenance period.

During this downtime, the following maintenance will be performed:

- Slurm, Quest's job scheduler, will be upgraded to version 23.11
- New Slurm management nodes will be deployed to replace older hardware and enhance scheduler stability
- Storage system firmware and InfiniBand interconnect drivers will be upgraded to integrate the next generation of computing and storage hardware
- A new InfiniBand interconnect configuration will be implemented to improve connectivity between nodes and Quest storage, enhancing compute performance
- Security patches will be applied to all Quest nodes
- Power and liquid-cooling maintenance will be performed for the healthy operation of computing infrastructure

Impact on Quest and Globus Users:

- Users cannot log in to Quest, submit new jobs, run jobs, access files stored on Quest, or use the Quest Analytics Nodes, GCC, KLC, or Quest OnDemand during the maintenance window.
- Jobs submitted to Quest with a wall time that extends beyond the start of the downtime will not run and must be resubmitted after the maintenance. These jobs will receive a "ReqNodeNotAvail, Reserved_for_maintenance" message as the queue reason.
- User jobs and processes running on the Quest login nodes, the Quest Analytics Nodes, and KLC will be canceled at the beginning of the downtime.
- The Globus data transfer tool will be unavailable for transferring files to and from Quest, and between Quest and the Research Data Storage Services (RDSS)/FSMRESFILES.
- Requests submitted for Quest and Globus shortly before or during the downtime will be addressed following the maintenance period.
- Before the downtime, the expiration dates of all user files in Quest global scratch space will be extended by another 30 days. Timestamps on data in scratch space will be updated so that the data can remain in scratch. Scratch users can run the "checkscratch" utility to monitor the expiration dates in scratch space: https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=1546#checkscratch
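
To illustrate the wall-time rule in the list above, here is a rough Python sketch that checks whether a job could finish before the downtime begins. The function names are hypothetical. It makes two simplifying assumptions: the job starts the moment it is submitted (the best case, since queue wait only makes things worse), and the wall time is given in Slurm's "HH:MM:SS" or "D-HH:MM:SS" form (Slurm accepts other forms that this sketch does not parse). It approximates the rule and does not reproduce the scheduler's own logic.

```python
from datetime import datetime, timedelta, timezone

# CDT is UTC-5; the start time is taken from this notice.
CDT = timezone(timedelta(hours=-5))
MAINTENANCE_START = datetime(2024, 6, 8, 7, 0, tzinfo=CDT)


def parse_walltime(walltime: str) -> timedelta:
    """Parse "HH:MM:SS" or "D-HH:MM:SS" into a timedelta."""
    days = 0
    if "-" in walltime:
        day_part, walltime = walltime.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def fits_before_maintenance(submit_time: datetime, walltime: str) -> bool:
    """True if a job starting at submit_time would end by the maintenance start."""
    return submit_time + parse_walltime(walltime) <= MAINTENANCE_START


# A 48-hour request submitted Friday, June 7 at 09:00 CDT cannot finish in time.
friday_morning = datetime(2024, 6, 7, 9, 0, tzinfo=CDT)
print(fits_before_maintenance(friday_morning, "48:00:00"))  # False
print(fits_before_maintenance(friday_morning, "12:00:00"))  # True
```

A job that fails this check would be expected to sit in the queue with the "ReqNodeNotAvail, Reserved_for_maintenance" reason; requesting a shorter wall time, where the work allows it, is the only way for it to run before the downtime.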

For any questions about this maintenance, please contact quest-help@northwestern.edu.
Posted Apr 17, 2024 - 07:30 CDT
This scheduled maintenance affected: Research Technologies and Support (Kellogg Linux Cluster (KLC) (Server Management), Quest Analytics Nodes, Quest High-Performance Computing Cluster (HPCC) (Server Management)).