FYI, there is still some file system maintenance that has to be done.
So, even though there is no expected data loss, the recommendation is to
avoid creating new directories or renaming existing ones.
-- Greg
Just a heads up that plato will be down starting October 30th and might
not be back up until November 30th. We expect to be back up earlier,
but can't guarantee it. This is because plato depends on Wynton's
BeeGFS network filesystem, and that is going to be upgraded then. For
reference, the Wynton announcement is included below.
Since plato has already migrated to Rocky 8 from CentOS 7, we don't
anticipate any plato-specific issues. Scooter and I are directly
involved in the Wynton upgrade, so we'll know when it is safe to come back up.
So during the downtime:
* email will be unavailable
* web sites hosted by plato will be unavailable
o CGL/RVBI, ChimeraX toolshed, SPOKE, SFLD
o web services (e.g., for ChimeraX) will be unavailable
You should consider taking a few vacation days. 🙂
-- Greg
-------- Forwarded Message --------
Subject: [Wynton-Announce] Upcoming Wynton Downtime Oct 30
Date: Fri, 13 Oct 2023 18:08:37 +0000
From: Ellestad, Erik <0000090458849058-dmarc-request(a)LISTSRV.UCSF.EDU>
Reply-To: support(a)WYNTON.UCSF.EDU
To: WYNTON(a)LISTSRV.UCSF.EDU
TLDR:
What: Full Wynton downtime, including no access to login, dev, data
transfer, or app nodes
When: 9am Monday October 30 through End of Business November 3rd
Why: To update the Wynton HPC OS to Rocky 8 Linux, Update BeeGFS,
Replace aging hardware, and to accommodate work by UCSF Facilities.
How: The downtime has been added to SGE's calendar. If your job's
runtime limit (h_rt) extends into the maintenance window, it won't
start before the maintenance.
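
As a rough illustration of the h_rt rule above, here is a minimal Python sketch of the arithmetic, not the scheduler's actual logic. It assumes h_rt is given in HH:MM:SS form and that the downtime starts at 9am local time on October 30, 2023 (the announcement gives no time zone). The names `DOWNTIME_START`, `parse_h_rt`, and `can_start_before_downtime` are hypothetical and are not part of SGE.

```python
from datetime import datetime, timedelta

# Start of the maintenance window from the announcement (9am Monday October 30, 2023).
# Naive local time is an assumption; the announcement gives no time zone.
DOWNTIME_START = datetime(2023, 10, 30, 9, 0)


def parse_h_rt(h_rt: str) -> timedelta:
    """Convert an h_rt value written as "HH:MM:SS" into a timedelta (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in h_rt.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def can_start_before_downtime(now: datetime, h_rt: str) -> bool:
    """Return True if a job started at `now` would hit its runtime limit
    no later than the start of the downtime (hypothetical helper)."""
    return now + parse_h_rt(h_rt) <= DOWNTIME_START
```

In other words, the closer the downtime gets, the shorter the h_rt a job needs to request in order to start before the maintenance. Whether a job ending exactly at 9am counts as fitting is a guess here; the scheduler may treat that boundary differently.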
The longer story:
Rocky Linux 8 Migration. Wynton HPC is currently based on CentOS 7
Linux. The CentOS 7 operating system will reach its end of life in early
2024. To allow for security patches, newer versions of libraries and
applications, and continued support we need to upgrade to a newer Linux
Operating System. We have identified Rocky Linux 8, which has an end of
life in 2029, as being the most compatible with our current needs.
NOTE: All local partitions WILL BE ERASED during the OS Upgrade to Rocky
8 Linux (including /scratch) on app, dev, and compute nodes (unless
the node is already running Rocky 8 Linux before the downtime).
BeeGFS Upgrade. We will update the version of the underlying shared file
system, aka BeeGFS, to enable newly implemented features, increase
reliability, and implement optimizations.
NOTE: NO DATA ON WYNTON IS BACKED UP. While we have tested the BeeGFS
upgrade and expect no problems or data loss due to the upgrade of the
shared file system version, before the downtime, be sure you have
migrated or backed up any working data from /wynton to its canonical
storage location.
Hardware Replacement. As part of the downtime we will be replacing
several older components in our infrastructure.
We have done our best to plan ahead and test for this downtime, but due
to the number of systems which need to be updated and have their
configurations migrated, we expect Wynton HPC to be unavailable until
Friday November 3rd.
More information about the Rocky 8 Linux migration project is available
on our website:
https://wynton.ucsf.edu/hpc/software/rocky-8-linux.html
--
Erik Ellestad
Wynton Cluster SysAdmin
UCSF
------------------------------------------------------------------------
This list is used to keep users of the Wynton cluster updated on
outages, system upgrades etc.
List membership is automatically generated based on registered cluster
users. To unsubscribe,
please email the cluster admins at <support(a)wynton.ucsf.edu> to close
your account.
Hi all,
Recently we sent e-mail informing users of the upcoming plato downtime
for the Wynton Rocky 8 upgrade. It said that plato will be down
starting October 30th and might not be back up until November 30th.
/This is incorrect/. Plato will be down beginning on October 30th and
might not be back up until *November 3rd*, although we hope to have it
back earlier.
Sorry for the noise...
-- scooter
Hello everyone,
TL;DR -- change your incoming and outgoing email servers to mail.cgl.ucsf.edu
I goofed and accidentally cancelled the update to our email certificate. So we have an expired certificate with email right now. That might make your email client unhappy when sending or viewing email.
So while we recommended using plato.cgl.ucsf.edu for both your incoming and outgoing email servers (incoming uses the IMAPS or POP3S protocol, and outgoing uses the SMTP protocol), you can work around the problem by setting both to mail.cgl.ucsf.edu (because I borrowed the webmail server certificate). mail.cgl.ucsf.edu limits you to a single host in the plato cluster, but it will work without complaint.
We should get a replacement certificate within a few days. mail.cgl.ucsf.edu will continue to work when that happens. Our email certificate also includes email servers that don't run webmail. And if you're not asleep yet, having an out-of-date certificate for email servers doesn't stop email from being delivered. There are just warnings in the log files. I suspect that will change as the Internet gets more paranoid.
My apologies,
Greg
Hi all,
Due to power work in BH-101, where the plato nodes are located, we
will be taking the plato cluster down tomorrow evening. We're taking
the opportunity to complete our operating system upgrade to avoid any
additional downtime.
-- scooter