[perfsonar-user] Opensearch running out of shards (1000/1000)


  • From: Tim Chown <>
  • To: "" <>
  • Subject: [perfsonar-user] Opensearch running out of shards (1000/1000)
  • Date: Wed, 13 Sep 2023 14:09:47 +0000

Hi,

We’re seeing servers running out of shards, and thus perfSONAR falling over
and leaving us with some rather barren MaDDash views.

An example error:

" "ip"=>"2001:630:1:112:0:0:0:3"}},
"reference"=>{"psconfig"=>{"created-by"=>{"uuid"=>"61E213D2-F410-11ED-B798-E28714F07E7B",
"user-agent"=>"psconfig-pscheduler-agent"}}},
"id"=>"dc9c56e1-eaeb-4751-998f-7ce6d3c8c623"}],
:response=>{"index"=>{"_index"=>"pscheduler_latencybg-2023.09.13",
"_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception",
"reason"=>"Validation Failed: 1: this action would add [2] total shards, but
this cluster currently has [1000]/[1000] maximum shards open;"}}}}"

It looks like a latency test is trying to save a result, but it wants 2 more
shards and OpenSearch is capped at 1000/1000. That seems like a lot of shards -
is it creating a lot of new indices which in turn need more shards?
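
For reference, a rough way to see where the shards are actually going (just a
sketch, assuming OpenSearch is reachable on https://localhost:9200 and you have
admin credentials to hand - adjust the host and auth for your install) is
something like:

# Sketch: summarise OpenSearch index and shard usage (assumed host/credentials).
import requests

OS_URL = "https://localhost:9200"    # assumption: local OpenSearch endpoint
OS_AUTH = ("admin", "CHANGEME")      # assumption: basic-auth credentials

# Overall shard count versus the cluster-wide cap
health = requests.get(f"{OS_URL}/_cluster/health", auth=OS_AUTH, verify=False).json()
print("active shards:", health["active_shards"])

# Per-index primary/replica counts, to see which indices are eating the shards
indices = requests.get(f"{OS_URL}/_cat/indices?format=json", auth=OS_AUTH, verify=False).json()
for idx in sorted(indices, key=lambda i: i["index"]):
    print(idx["index"], "pri:", idx["pri"], "rep:", idx["rep"])
print("total indices:", len(indices))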

We’ve seen this on multiple systems. The error above is from
http://ps-london-bw.perf.ja.net/toolkit/, which only has 67 tests running,
with 100G interfaces.
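
In the meantime (purely as a stopgap while we work out the proper fix), the cap
seems to come from the cluster-wide cluster.max_shards_per_node setting, so
raising it should at least keep results flowing - something along these lines,
with the same assumed host and credentials as above:

# Sketch of a temporary workaround only: raise the cluster-wide shard cap.
# The growth in daily indices still needs curating (lifecycle/ISM policies,
# fewer replicas, etc.); this just buys headroom.
import requests

OS_URL = "https://localhost:9200"    # assumption: local OpenSearch endpoint
OS_AUTH = ("admin", "CHANGEME")      # assumption: basic-auth credentials

resp = requests.put(
    f"{OS_URL}/_cluster/settings",
    json={"persistent": {"cluster.max_shards_per_node": 2000}},
    auth=OS_AUTH,
    verify=False,
)
print(resp.json())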

Is this a known problem? I can’t immediately find an open issue mentioning
shards.

Thanks,
Tim

