
perfsonar-user - [perfsonar-user] Central MA database size snuck up on me

Subject: perfSONAR User Q&A and Other Discussion

  • From: Casey Russell <>
  • To: "" <>
  • Subject: [perfsonar-user] Central MA database size snuck up on me
  • Date: Wed, 26 Jul 2017 10:15:07 -0500

Group,

     I dug myself a hole and I only see a couple of ways out now. I wasn't watching the database size on my central MA, and disk utilization is now over 90%. I've run the ps_remove_data.py script several times with several variations of the config file, but every run ends, minutes or hours later, with a timeout like this:

Sending request to delete 24 rows for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=histogram-rtt, summary_type=aggregation, summary_window=3600
Deleted 24 rows for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=histogram-rtt, summary_type=aggregation, summary_window=3600
Query error for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=histogram-rtt, summary_type=base, summary_window=0, begin_time=0, expire_time=1485529380
Query error for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=histogram-rtt, summary_type=statistics, summary_window=0, begin_time=0, expire_time=1485529380
Sending request to delete 24 rows for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=histogram-rtt, summary_type=statistics, summary_window=3600
Deleted 24 rows for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=histogram-rtt, summary_type=statistics, summary_window=3600
Query error for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=histogram-ttl-reverse, summary_type=base, summary_window=0, begin_time=0, expire_time=1469545381
Query error for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=packet-count-lost-bidir, summary_type=base, summary_window=0, begin_time=0, expire_time=1469545381
Query error for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=packet-duplicates-bidir, summary_type=base, summary_window=0, begin_time=0, expire_time=1469545503
Sending request to delete 24 rows for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=packet-loss-rate-bidir, summary_type=aggregation, summary_window=3600
Deleted 24 rows for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=packet-loss-rate-bidir, summary_type=aggregation, summary_window=3600
Query error for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=packet-loss-rate-bidir, summary_type=base, summary_window=0, begin_time=0, expire_time=1469545505
Query error for metadata_key=1bdb8f32fe9d4194828d134f37fb37b0, event_type=packet-reorders-bidir, summary_type=base, summary_window=0, begin_time=0, expire_time=1469545509
Sending request to delete 6 rows for metadata_key=1be4b626486c46be88776b3530819ce8, event_type=packet-count-lost, summary_type=base, summary_window=0
Deleted 6 rows for metadata_key=1be4b626486c46be88776b3530819ce8, event_type=packet-count-lost, summary_type=base, summary_window=0
Sending request to delete 6 rows for metadata_key=1be4b626486c46be88776b3530819ce8, event_type=packet-count-lost, summary_type=base, summary_window=0
Error: Retried 1 times. Last failure was timeout: timed out
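To see at a glance how far a run got before it failed, the output can be tallied. This is a minimal Python sketch, assuming the messages look exactly like the excerpt above ("Deleted N rows for metadata_key=..., event_type=...", "Query error for ...", "Error: ..."). Other versions of ps_remove_data.py may word them differently, and the function name summarize is hypothetical. In the excerpt, the final "Sending request to delete 6 rows" has no matching "Deleted" line, which is consistent with that request being the one that timed out.

```python
import re
from collections import Counter

# Patterns modelled on the ps_remove_data.py messages quoted above (an assumption
# about the format; adjust them if your version prints something different).
DELETED = re.compile(r"^Deleted (\d+) rows for metadata_key=(\w+), event_type=([\w-]+)")
QUERY_ERROR = re.compile(r"^Query error for metadata_key=(\w+), event_type=([\w-]+)")
FATAL = re.compile(r"^Error: ")


def summarize(lines):
    """Tally rows deleted, query errors per event type, and fatal errors."""
    rows_deleted = 0
    query_errors = Counter()
    fatal = []
    for line in lines:
        line = line.strip()
        deleted = DELETED.match(line)
        if deleted:
            rows_deleted += int(deleted.group(1))
            continue
        error = QUERY_ERROR.match(line)
        if error:
            # group(2) is the event_type, e.g. "histogram-rtt"
            query_errors[error.group(2)] += 1
            continue
        if FATAL.match(line):
            fatal.append(line)
    return rows_deleted, query_errors, fatal
```

Saved output can be passed in as any iterable of lines, e.g. summarize(open("remove_data.log")), where remove_data.log is a hypothetical file name.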

[root@ps-dashboard esmond]# du -h /var/lib/cassandra/data/esmond/
47G     /var/lib/cassandra/data/esmond/raw_data
4.0K    /var/lib/cassandra/data/esmond/stat_aggregations
9.9G    /var/lib/cassandra/data/esmond/rate_aggregations
13G     /var/lib/cassandra/data/esmond/base_rates
69G     /var/lib/cassandra/data/esmond/

[root@ps-dashboard esmond]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                       95G   82G  8.1G  91% /
tmpfs                 3.9G  4.0K  3.9G   1% /dev/shm
/dev/sda1             477M   99M  353M  22% /boot
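The du and df figures can be compared directly. If the cleanup needs temporary disk space (an assumption at this point, not something the output establishes), a table that is larger than the free space is a plausible candidate for filling the disk. A minimal Python sketch, with parse_size and tables_larger_than_free as hypothetical helper names:

```python
# GNU du/df -h print sizes in powers of 1024 with a one-letter suffix.
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text):
    """Convert a human-readable size such as '9.9G' or '4.0K' to bytes."""
    text = text.strip()
    suffix = text[-1].upper() if text[-1].isalpha() else ""
    number = text[:-1] if suffix else text
    return int(float(number) * UNITS[suffix])


def tables_larger_than_free(table_sizes, available):
    """Names of tables whose on-disk size exceeds the available space.

    Assumption: a table larger than the free space may not have room for
    whatever temporary data the cleanup writes.
    """
    free = parse_size(available)
    return sorted(name for name, size in table_sizes.items() if parse_size(size) > free)


# Figures copied from the du and df output above.
esmond_tables = {
    "raw_data": "47G",
    "stat_aggregations": "4.0K",
    "rate_aggregations": "9.9G",
    "base_rates": "13G",
}
```

With these figures, tables_larger_than_free(esmond_tables, "8.1G") returns ['base_rates', 'rate_aggregations', 'raw_data']: three of the four tables are each larger than the 8.1G available.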

As I watch, the disk reaches 100% utilization at the moment of the "timeout". It appears to me that while rows are being deleted, Cassandra/Esmond writes chunks of temporary data to disk and then flushes it. Throughout the run, disk utilization swings between 91% and 100% until the disk finally fills and the timeout error occurs.
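To record that swing without keeping a df window open, a small poller can run in a second terminal alongside ps_remove_data.py. This is a minimal Python sketch using the standard library's shutil.disk_usage. The default path comes from the du output above, while the 98% threshold, the 5-second interval and the names disk_percent_used and watch are hypothetical choices. Percent used is computed as used / (used + available), roughly the way df computes Use%, so space reserved for root does not make the figure look lower than df's.

```python
import shutil
import time


def disk_percent_used(path):
    """Percent of the filesystem holding `path` in use, computed like df's Use%."""
    usage = shutil.disk_usage(path)
    # df uses used / (used + available), so space reserved for root is excluded.
    denominator = usage.used + usage.free
    return 100.0 * usage.used / denominator if denominator else 0.0


def watch(path="/var/lib/cassandra", threshold=98.0, interval=5.0, samples=None):
    """Print usage every `interval` seconds.

    Returns True as soon as usage reaches `threshold`, or False after
    `samples` readings (polls forever when `samples` is None).
    """
    peak = 0.0
    taken = 0
    while samples is None or taken < samples:
        pct = disk_percent_used(path)
        peak = max(peak, pct)
        print(f"{time.strftime('%H:%M:%S')} {pct:5.1f}% used (peak {peak:5.1f}%)")
        if pct >= threshold:
            return True
        taken += 1
        time.sleep(interval)
    return False
```

It only observes. It does not pause or stop the removal, so it shows how close each run comes to the limit but will not by itself prevent the timeout.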

After a failed attempt, even if I restart Cassandra, disk utilization is roughly what it was before the run.

So, without enough disk space for ps_remove_data.py to finish, it appears I have two options: delete all my data and start over with a clean database, or shut the machine down and allocate more space to it (it's a VM, but I can't add the space "hot").

Before I take one of those approaches, does anyone else have other ideas or thoughts?

Sincerely,
Casey Russell
Network Engineer
KanREN
phone 785-856-9809
2029 Becker Drive, Suite 282
Lawrence, Kansas 66047


