class-community - Re: Kubernetes and networking event.
Subject: Entire CLASS Cohort Community.
List archive
- From: Timothy Middelkoop <>
- To: Tony Cricelli <>, Rob Fatland <>
- Cc: Eric Jackson <>, "Keist, CJ" <>, "" <>
- Subject: Re: Kubernetes and networking event.
- Date: Mon, 28 Feb 2022 17:07:54 +0000
Thanks for the update and for sharing! I wondered what would happen when it turned real. I assume it “scaled” and shut down nodes when they were not in use. Did you look at purchasing reserved instances, spot instances, etc.? I’m guessing it was still “way more” than going back on-prem.
Tim
From: Tony Cricelli <>
Hi Rob,
Your question is complicated to answer directly, but I can share our experience deploying a SLURM cluster on GCP. First, the Tim/Internet2 training was instrumental in getting it deployed (thank you, Tim!). We used Terraform to deploy the cluster.
The cluster worked as expected: as jobs were submitted, VMs booted and the jobs ran. At this point we were happy: no hardware to take care of, "infinitely" expandable, etc. So we let it rip and started onboarding researchers. Everything was great until the administration saw that our first bill was $17k/mo with only about 10% of the users onboarded.
I was told to redeploy on campus as soon as possible and turn off the cloud cluster. We had 10 servers (64 cores/512 GB each) and 300 TB of CORAID storage sitting around, so we installed them in our campus data center. I think campus charges $1k/mo per rack.
Our monthly "costs" are now fixed, and students and researchers can go crazy submitting jobs. If our cluster is not big enough, campus offers a cluster with many thousands of cores (at a cost).
It was a bit of labor to install the servers. Energy costs are absorbed by campus, so other than the initial server costs and the campus colo fee, on-prem seems to work better for us. Researchers are not charged, and they know that if it breaks, they may be down until we can get access to the data center.
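For anyone who wants to run the same comparison with their own numbers, here is a rough sketch of the break-even arithmetic. The $17k/mo cloud bill and the $1k/mo rack fee come from this message. The hardware price, the single rack, and the assumption that the cloud bill stays flat are placeholders, not figures from our deployment.

```python
def breakeven_months(hardware_cost, cloud_monthly, onprem_monthly):
    """Return the number of months after which on-prem is cheaper than cloud.

    Returns None when on-prem never catches up, i.e. when its monthly
    cost is at least as high as the cloud bill.
    """
    monthly_savings = cloud_monthly - onprem_monthly
    if monthly_savings <= 0:
        return None
    return hardware_cost / monthly_savings


CLOUD_MONTHLY = 17_000      # first cloud bill quoted in this message (USD/month)
RACK_FEE_MONTHLY = 1_000    # campus colo fee per rack; one rack is assumed

# Hypothetical purchase price for the servers and storage.
# Use 0 if the hardware is already owned, as it was in our case.
HYPOTHETICAL_HARDWARE_COST = 160_000

months = breakeven_months(HYPOTHETICAL_HARDWARE_COST, CLOUD_MONTHLY, RACK_FEE_MONTHLY)
print(f"Break-even after about {months:.1f} months")
```

This leaves out labor, energy, and hardware refresh, and since the cloud bill was for only about 10% of users, the real gap would likely be larger.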
When it comes to teaching, we cannot have downtime, so we deployed Kubernetes (GKE) on GCP. We mostly run JupyterHub, and costs are stable since no students want to do extra homework :) In this case the cloud is the best option for us and is worth the cost.
I guess it comes down to finances, risk tolerance, and the type of computing being done.
Tony
On Sat, Feb 26, 2022 at 2:17 PM Rob Fatland <> wrote: