
class-community - Re: Kubernetes and networking event.

  • From: Tony Cricelli <>
  • To: Timothy Middelkoop <>
  • Cc: Rob Fatland <>, Eric Jackson <>, "Keist, CJ" <>, "" <>
  • Subject: Re: Kubernetes and networking event.
  • Date: Mon, 28 Feb 2022 09:23:10 -0800

Hi Tim,

Yes, it scaled perfectly (once the VMs were ready) and the nodes did shut down during our tests. Once the researchers launched their regressions, the nodes crunched away and never had a chance to shut down :)

I could have done a better job of explaining to the administration that the 10% we had onboarded were our most urgent users, and that the remaining 90% would probably not use as many computing resources.

Tony

On Mon, Feb 28, 2022 at 9:07 AM Timothy Middelkoop <> wrote:

Thanks for the update and for sharing!!!!  I wondered what would happen when it turned real...  I assume it “scaled” and shut down nodes when not in use...  Did you look at purchasing reserved instances, spot instances, etc.?  I’m guessing it was still “way more” than going back on-prem.  Tim.

 

From: Tony Cricelli <>
Date: Monday, February 28, 2022 at 10:35 AM
To: Rob Fatland <>
Cc: Eric Jackson <>, Keist, CJ <>, Timothy Middelkoop <>, <>
Subject: Re: Kubernetes and networking event.

Hi Rob,

 

Your question is complicated to answer directly, but I can share our experience deploying a SLURM cluster on GCP. First, the Tim/Internet2 training was instrumental in getting it deployed (thank you, Tim!). We used Terraform to deploy the cluster.

 

The cluster worked as expected: as jobs were submitted, VMs booted and the jobs ran. At this point we were happy: no hardware to take care of, "infinitely" expandable, etc. So we let it rip and started onboarding researchers. Everything was great until the administration saw that our first bill was $17k/mo with only about 10% of the users onboarded.
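
For a rough sense of scale, here is a back-of-the-envelope extrapolation of that bill, a sketch only: the $17k/mo and ~10% figures come from this message, while the assumption that the remaining 90% of users generate lighter workloads (and the 40% factor used for it) is purely hypothetical.

    # Rough cloud-bill extrapolation; placeholder assumptions are marked.
    first_bill = 17_000          # USD/mo with ~10% of users onboarded (from this message)
    onboarded_fraction = 0.10

    # Naive linear extrapolation: every user behaves like the first 10%.
    linear_estimate = first_bill / onboarded_fraction

    # Discounted extrapolation: hypothetically, the remaining 90% of users
    # spend only 40% as much per user as the early heavy users.
    light_usage_factor = 0.40
    discounted_estimate = first_bill + first_bill * (0.90 / 0.10) * light_usage_factor

    print(f"Linear estimate:     ${linear_estimate:,.0f}/mo")      # $170,000/mo
    print(f"Discounted estimate: ${discounted_estimate:,.0f}/mo")  # $78,200/mo

Either way, even the optimistic projection lands well above typical on-prem colo costs, which presumably is what alarmed the administration.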

 

I was told to redeploy on campus as soon as possible and turn off the cloud cluster. We had 10 servers (64 cores/512 GB RAM each) and a 300 TB CORAID storage array sitting around, so we installed them in our campus data center. I think campus charges $1k/mo per rack.

 

We have now fixed our monthly "costs," and students and researchers can go crazy submitting jobs. If our cluster is not big enough, campus offers a cluster with many thousands of cores (at a cost).

 

It was a bit of labor to install the servers. Energy costs are absorbed by campus, so other than the initial server costs and the campus colo fee, it seems on-prem works better for us. Researchers are not charged, and they know that if something breaks, they may be down until we can get access to the data center.
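
For anyone weighing the same trade-off, a minimal break-even sketch: the $17k/mo cloud bill, the 10 servers, and the ~$1k/mo-per-rack colo fee come from this thread; the per-server price, the single rack, and the amortization period are hypothetical placeholders.

    # Rough cloud-vs-colo comparison; placeholder numbers are hypothetical.
    cloud_bill = 17_000            # USD/mo with only ~10% of users onboarded (from this thread)
    colo_fee_per_rack = 1_000      # USD/mo; "I think campus charges $1k/mo per rack"
    racks = 1                      # hypothetical: assume the 10 servers fit in one rack

    server_count = 10              # from this thread
    server_price = 12_000          # USD each, hypothetical placeholder
    amortization_months = 48       # hypothetical 4-year hardware lifetime

    onprem_monthly = racks * colo_fee_per_rack + (server_count * server_price) / amortization_months
    print(f"On-prem estimate: ${onprem_monthly:,.0f}/mo")      # ~$3,500/mo
    print(f"Cloud bill at 10% of users: ${cloud_bill:,.0f}/mo")

    # Months of avoided cloud spend needed to pay for the hardware outright:
    payback = (server_count * server_price) / (cloud_bill - racks * colo_fee_per_rack)
    print(f"Hardware paid back in ~{payback:.1f} months")      # ~7.5 months

Under these placeholder numbers the fixed on-prem cost sits far below even the partial-rollout cloud bill (staff time and the risk of downtime aside), which is consistent with the decision described above.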

 

When it comes to teaching, we cannot have downtime, so we deployed Kubernetes (GKE) on GCP. We mostly run JupyterHub, and costs are stable since no students want to do extra homework :) The cloud in this case is the best option for us and is worth the cost.
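
For anyone curious how costs stay predictable on a setup like this, one common approach (not necessarily what Tony's team does) is to cap the JupyterHub namespace with a Kubernetes ResourceQuota so user pods can never request more than a fixed slice of the cluster. A minimal sketch using the official Kubernetes Python client; the namespace name and the limits are hypothetical.

    # Minimal sketch: cap a JupyterHub namespace with a ResourceQuota.
    # The namespace name and limits below are hypothetical; adjust for your cluster.
    from kubernetes import client, config

    config.load_kube_config()      # or config.load_incluster_config() when run inside a pod
    v1 = client.CoreV1Api()

    quota = client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name="jupyterhub-quota"),
        spec=client.V1ResourceQuotaSpec(
            hard={
                "requests.cpu": "64",        # at most 64 CPUs requested across all user pods
                "requests.memory": "256Gi",  # and 256 GiB of memory
                "pods": "200",               # hard cap on concurrent user pods
            }
        ),
    )

    v1.create_namespaced_resource_quota(namespace="jhub", body=quota)
    print("ResourceQuota applied to namespace 'jhub'")

Combined with GKE's cluster autoscaler, a quota like this puts a ceiling on how far the node pool can grow, which is one way to keep the monthly bill from surprising anyone.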

 

I guess it comes down to finances, risk tolerance and type of computing being done. 

 

Tony

 

On Sat, Feb 26, 2022 at 2:17 PM Rob Fatland <> wrote:

Where I'm at is: I know enough about container orchestration to describe it at a high level. While that works, I can't really claim I'm "happy" with my degree of comprehension, and in addition I have a sense of where the rubber meets the road for cloud research computing advocacy. By this I mean outside the curriculum scope that CJ describes (also super important). So: can the cloud hold a candle to on-prem distributed HPC? My feeling is that the poster-child implementation is computational fluid dynamics. Supposing I were to dive into k8s, I'd first spend some time looking into Singularity containerization, and the main goal would be an HPC implementation of a CFD-style computation, with benchmarking, to get to unequivocal remarks compared against an on-prem cluster / SLURM / MPI / InfiniBand run on the same problem. I'm sure this has been done a number of (zillion?) times and I'd be super pleased to get a pointer to a good example... in my case it's always a matter of finding uninterrupted blocks of time. Anyway, that's my take on this important topic.
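
On the benchmarking idea: the interconnect is usually what separates cloud from on-prem for tightly coupled CFD, so a first data point is often a bare MPI collective timing rather than a full solver run. A minimal sketch with mpi4py (a library choice for illustration, not something mentioned in this thread); message sizes and repeat counts are arbitrary.

    # Minimal MPI Allreduce microbenchmark sketch (assumes mpi4py and NumPy are installed).
    # Run with, e.g.: mpirun -n 64 python allreduce_bench.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    reps = 20

    for count in (1_000, 100_000, 10_000_000):       # doubles per Allreduce; arbitrary sizes
        send = np.ones(count, dtype=np.float64)
        recv = np.empty_like(send)

        comm.Barrier()                               # line everyone up before timing
        t0 = MPI.Wtime()
        for _ in range(reps):
            comm.Allreduce(send, recv, op=MPI.SUM)
        elapsed = (MPI.Wtime() - t0) / reps

        if rank == 0:
            mb = send.nbytes / 1e6
            print(f"{mb:10.2f} MB Allreduce: {elapsed * 1e3:8.3f} ms avg over {reps} reps")

Running the same script on a cloud cluster and on an on-prem InfiniBand cluster gives a first-order read on the latency/bandwidth gap before anyone invests in a full CFD comparison.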

best -r

 

 

On Sat, Feb 26, 2022 at 12:59 PM Eric Jackson <> wrote:

Tim,

I'm also interested in this event. I think the learning format works! A month's pace is fine with me, since I am very new to Kubernetes. Currently, Kubernetes isn't on my radar, but I know that it will be of some use to me in the future.

 

Thanks,

Eric

 

On Thu, Feb 24, 2022 at 9:31 AM Keist, CJ <> wrote:

Tim,

This event sounds very interesting, and I'm glad that you forwarded it on to this list. We are using Kubernetes on AWS to manage JupyterHub instances for courses here at OSU. Any extra training on Kubernetes is welcome, especially when it’s free 😉.

 

 

-- 

CJ Keist

Manager for Digital Research and Infrastructure

Subdivision of Technical & Solutions Architecture

Oregon State University

 

 

From: <> on behalf of Timothy Middelkoop <>
Date: Wednesday, February 23, 2022 at 4:05 PM
To: <>
Subject: Kubernetes and networking event.


Question for the community,

 

Are events like these of use? 

https://www.nginx.com/c/microservices-march-2022-kubernetes-networking/

 

(there is a short explainer video on the page about what it is – basically a 16-hour Kubernetes tutorial)

 

Does the learning format work, or is it that when you need to learn something, you need it yesterday and don’t have time to consume it at a month’s pace?

 

Where is Kubernetes on your radar?

 

Is this reference of value to the community?

 

Regards,

 

 

Tim.

 

--

Timothy Middelkoop (he/him/his)

Senior Research Engagement Engineer

Internet2



