perfsonar-user - RE: [perfsonar-user] Cassandra runaway CPU
Subject: perfSONAR User Q&A and Other Discussion
- From: "Garnizov, Ivan (RRZE)" <>
- To: Casey Russell <>
- Cc: "" <>
- Subject: RE: [perfsonar-user] Cassandra runaway CPU
- Date: Wed, 6 Apr 2016 08:09:55 +0000
- Accept-language: en-GB, de-DE, en-US
Hi Casey,

I have no idea what the problem could be, but I suspect you are still some way from the real cause. My suggestion is to activate debug-level logging on the system, restart the service, and provide those details. If you do the same thing on a system with no such symptoms, you/we will be able to compare. There it should become apparent which parameters the Java application starts with.

Best regards,
Ivan

From: [mailto:]
On Behalf Of Casey Russell

Group,

Here's what I've done since last week. I've taken the box offline for several maintenance windows and booted it from live CDs to run memory diagnostics, HD diagnostics, and CPU and chipset diagnostics (cpuburn to heat up the box and look for fan problems, etc.). I upgraded the BIOS, thinking maybe I'd get better (or different) S.M.A.R.T. info, and did a full badblocks block-level check of the drive. Then I did a RAID controller consistency check on the mirrored pair, because, meh, why not? :-) I even followed the directions in the FAQ to nuke the Esmond database and re-initialize it.

However, after all that, I still have a system that consumes an entire CPU core as soon as Cassandra starts up. I notice that nuking the Esmond database per the instructions in the FAQ had no impact on the size of the Cassandra data files.

My hosts are in a mesh and use a Central MA. Is there any harm in just nuking the Cassandra database/data files on this host and starting fresh? Looking back through my mesh config file, the only thing that doesn't use the Central MA as the read/write host is PingER. It wouldn't break my heart to lose PingER data for this one host if that's all that's stored locally. If it's relatively safe, does anyone have a process or set of instructions for doing so?

Is there anything else I should be trying, or should I just consider re-installing this host, since it's a mesh node anyway and very little data will be lost?
Casey Russell
Network Engineer
Kansas Research and Education Network
2029 Becker Drive, Suite 282
Lawrence, KS 66047
(785)856-9820 ext 9809

On Thu, Mar 31, 2016 at 9:17 AM, Casey Russell <> wrote:

Andy,

Thank you for giving me an avenue to chase down.

Casey Russell
Network Engineer
Kansas Research and Education Network
2029 Becker Drive, Suite 282
Lawrence, KS 66047

On Thu, Mar 31, 2016 at 8:15 AM, Andrew Lake <> wrote:

Hi,

Have you checked for a failing disk or bad memory on the host in question? It could be something else, but I’ve seen similar before on our ESnet hosts when we have had hardware failures.

Thanks,
Andy

On March 30, 2016 at 6:07:34 PM, Casey Russell () wrote:
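
Ivan's suggestion in the top message, to compare the parameters the Java application starts with on the affected host and on a healthy one, could be sketched roughly as below. This is only an illustration, not something posted in the thread: the function names `jvm_flags` and `compare_flags` and the two sample command lines are hypothetical, and it assumes each command line has been copied by hand from the host (for example from the output of `ps -o args= -C java`).

```python
import shlex


def jvm_flags(cmdline: str) -> set[str]:
    """Return the option tokens (those starting with '-') from a command line.

    Values passed as separate tokens (e.g. the path after -cp) are ignored;
    this is a deliberate simplification for a quick comparison.
    """
    return {tok for tok in shlex.split(cmdline) if tok.startswith("-")}


def compare_flags(affected: str, healthy: str) -> dict[str, list[str]]:
    """List the options that appear on only one of the two hosts."""
    a, h = jvm_flags(affected), jvm_flags(healthy)
    return {
        "only_affected": sorted(a - h),
        "only_healthy": sorted(h - a),
    }


if __name__ == "__main__":
    # Hypothetical command lines, made up for illustration only.
    affected = "java -Xms1G -Xmx1G -ea org.apache.cassandra.service.CassandraDaemon"
    healthy = "java -Xms2G -Xmx2G -ea org.apache.cassandra.service.CassandraDaemon"
    for side, flags in compare_flags(affected, healthy).items():
        print(side, flags)
```

Options that are identical on both hosts drop out, so only the differences are left to inspect.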
- Re: [perfsonar-user] Cassandra runaway CPU, Casey Russell, 04/05/2016
- RE: [perfsonar-user] Cassandra runaway CPU, Garnizov, Ivan (RRZE), 04/06/2016
- Re: [perfsonar-user] Cassandra runaway CPU, Casey Russell, 04/06/2016
- Re: [perfsonar-user] Cassandra runaway CPU, Andrew Lake, 04/06/2016
- Re: [perfsonar-user] Cassandra runaway CPU, Casey Russell, 04/06/2016
- RE: [perfsonar-user] Cassandra runaway CPU, Garnizov, Ivan (RRZE), 04/06/2016