perfsonar-user - [perfsonar-user] Central MA database size snuck up on me
Subject: perfSONAR User Q&A and Other Discussion
- From: Casey Russell <>
- To: "" <>
- Subject: [perfsonar-user] Central MA database size snuck up on me
- Date: Wed, 26 Jul 2017 10:15:07 -0500
Group,
I dug myself a hole and I only see a couple of ways out now. I wasn't watching the database size on my central MA, and disk utilization is now over 90%. I've run the ps_remove_data.py script several times with several variations of its config file, but it invariably ends some minutes or hours later with a timeout like this:
Sending request to delete 24 rows for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=histogram-rtt, summary_type=aggregation, summary_window=3600
Deleted 24 rows for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=histogram-rtt, summary_type=aggregation, summary_window=3600
Query error for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=histogram-rtt, summary_type=base, summary_window=0, begin_time=0, expire_time=1485529380
Query error for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=histogram-rtt, summary_type=statistics, summary_window=0, begin_time=0, expire_time=1485529380
Sending request to delete 24 rows for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=histogram-rtt, summary_type=statistics, summary_window=3600
Deleted 24 rows for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=histogram-rtt, summary_type=statistics, summary_window=3600
Query error for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=histogram-ttl-reverse, summary_type=base, summary_window=0, begin_time=0, expire_time=1469545381
Query error for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=packet-count-lost-bidir, summary_type=base, summary_window=0, begin_time=0, expire_time=1469545381
Query error for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=packet-duplicates-bidir, summary_type=base, summary_window=0, begin_time=0, expire_time=1469545503
Sending request to delete 24 rows for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=packet-loss-rate-bidir, summary_type=aggregation, summary_window=3600
Deleted 24 rows for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=packet-loss-rate-bidir, summary_type=aggregation, summary_window=3600
Query error for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=packet-loss-rate-bidir, summary_type=base, summary_window=0, begin_time=0, expire_time=1469545505
Query error for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=packet-reorders-bidir, summary_type=base, summary_window=0, begin_time=0, expire_time=1469545509
Sending request to delete 6 rows for metadata_key=1be4b626486c46be88776b3530819ce8, event_type=packet-count-lost, summary_type=base, summary_window=0
Deleted 6 rows for metadata_key=1be4b626486c46be88776b3530819ce8, event_type=packet-count-lost, summary_type=base, summary_window=0
Sending request to delete 6 rows for metadata_key=1be4b626486c46be88776b3530819ce8, event_type=packet-count-lost, summary_type=base, summary_window=0
Error: Retried 1 times. Last failure was timeout: timed out
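Output like the above can be tallied to see how much a run actually removed before it timed out. The following is a minimal sketch, not part of ps_remove_data.py: the function name `summarize` is hypothetical, and the patterns assume only the three message prefixes visible in the output above ("Deleted N rows for", "Query error for", "Error:").

```python
import re
from collections import Counter

# Prefixes taken from the script output shown above; other message
# formats (if the script emits any) are ignored by this sketch.
DELETED = re.compile(r"^Deleted (\d+) rows for ")
QUERY_ERROR = re.compile(r"^Query error for ")
FATAL = re.compile(r"^Error: ")


def summarize(lines):
    """Count deleted rows, query errors, and fatal errors in captured output."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        match = DELETED.match(line)
        if match:
            counts["rows_deleted"] += int(match.group(1))
        elif QUERY_ERROR.match(line):
            counts["query_errors"] += 1
        elif FATAL.match(line):
            counts["fatal_errors"] += 1
    return counts
```

If the script's output has been captured to a file (the file name is up to you), passing the open file object to `summarize` yields the three counters.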
[root@ps-dashboard esmond]# du -h /var/lib/cassandra/data/esmond/
47G /var/lib/cassandra/data/esmond/raw_data
4.0K /var/lib/cassandra/data/esmond/stat_aggregations
9.9G /var/lib/cassandra/data/esmond/rate_aggregations
13G /var/lib/cassandra/data/esmond/base_rates
69G /var/lib/cassandra/data/esmond/
[root@ps-dashboard esmond]# df -h
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root   95G   82G  8.1G  91% /
tmpfs                         3.9G  4.0K  3.9G   1% /dev/shm
/dev/sda1                     477M   99M  353M  22% /boot
At the moment of the timeout, as I watch, the disk reaches 100% utilization. It appears to me that while deleting rows, Cassandra/esmond writes chunks of temporary data to disk and later flushes that data. During the run, disk utilization fluctuates between 91% and 100% until the disk finally fills and the timeout error occurs.
After a failed attempt, even if I restart Cassandra, disk utilization is approximately what it was before the run.
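The fluctuation described above can be recorded rather than watched by hand. This is a minimal sketch under stated assumptions: `percent_used` and `watch` are hypothetical names, the default path is the data directory from the du output above, and the interval and sample count are arbitrary. It relies only on the standard-library `shutil.disk_usage`; the percentage is roughly comparable to df's Use% column but may differ slightly because of filesystem reserved blocks.

```python
import shutil
import time


def percent_used(total, free):
    """Return the share of the filesystem that is not free, in percent."""
    return 100.0 * (total - free) / total


def watch(path="/var/lib/cassandra/data/esmond", interval=5, samples=12):
    """Sample disk usage for the filesystem holding `path`; return the peak."""
    peak = 0.0
    for _ in range(samples):
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        pct = percent_used(usage.total, usage.free)
        peak = max(peak, pct)
        print(f"{time.strftime('%H:%M:%S')} {pct:5.1f}% used, "
              f"{usage.free / 2**30:.1f} GiB free")
        time.sleep(interval)
    return peak
```

Running `watch()` in a second terminal while the removal script is active would show how close the disk gets to full and how much headroom a run needs.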
So, without enough disk space for the ps_remove_data.py script to finish, it appears I have two options: delete all my data and start over with a clean database, or shut the machine down and allocate more space to it (it's a VM, but I can't add the space "hot").
Before I take one of those approaches, does anyone else have other ideas or thoughts?