
perfsonar-user - Re: [perfsonar-user] New Central MA

Subject: perfSONAR User Q&A and Other Discussion

  • From: Casey Russell <>
  • To: Andrew Lake <>
  • Cc: "" <>, "Garnizov, Ivan (RRZE)" <>
  • Subject: Re: [perfsonar-user] New Central MA
  • Date: Wed, 4 Oct 2017 08:39:20 -0500

Thanks Andrew,

     I thought I had covered myself. I changed the TTL on that DNS record to 1 hour several days before making these changes. Then, on the day in question (yesterday), I made the DNS change to put the new VM in place of the old one. After the change I flushed the cache on all our KanREN DNS resolvers, and finally I ran "nscd --invalidate=hosts" on each of our PS nodes to clear any local caching problems.

     But you appear to be correct. Even with all that, it appears this morning that most of the problems have aged out, and about 90% of the tests are now running clean. Only tests from two latency IPs are still failing consistently, and I suspect that, given enough time, they'll correct themselves as well. It looks like I just didn't understand the inner workings well enough to have the right expectations about how these tests would adjust. Thank you for your help.
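
For anyone repeating the DNS steps described above, here is a minimal sketch of how one might confirm from a test host that the local resolver already returns the new MA address. The hostname and the 164.113.48.16 address are taken from elsewhere in this thread; the function names are hypothetical, and the check only reflects what the local resolver (including nscd) answers at that moment, not what already-scheduled tasks may have stored.

```python
import socket


def resolved_addresses(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated addresses the resolver gives for hostname."""
    infos = resolver(hostname, None)
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def points_at(hostname, expected_ip, resolver=socket.getaddrinfo):
    """True if expected_ip is among the addresses currently returned for hostname."""
    return expected_ip in resolved_addresses(hostname, resolver)


def check_ma():
    # Hostname and address as quoted in this thread; adjust for your own MA.
    host = "ps-dashboard.perfsonar.kanren.net"
    new_ip = "164.113.48.16"
    print(host, "->", resolved_addresses(host))
    print("points at new MA:", points_at(host, new_ip))
```

Calling check_ma() on each test host performs a live lookup. The resolver argument exists only so the logic can be exercised without network access.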


Sincerely,
Casey Russell
Network Engineer
KanREN
phone 785-856-9809
2029 Becker Drive, Suite 282
Lawrence, Kansas 66047

On Wed, Oct 4, 2017 at 8:02 AM, Andrew Lake <> wrote:
Hi Casey,

I think maybe you just needed to wait? As of this morning I see 384 test results registered in the last two minutes, including a bunch from ps-wsu-lt.perfsonar.kanren.net. I think you fixed it, but it took some time to clear out the old powstream tasks (this is something we addressed in 4.0.2). I picked through a bunch of the newer tasks on ps-wsu-bw/lt and they seemed OK, and I also poked around the archive (e.g. http://ps-dashboard.perfsonar.kanren.net/esmond/perfsonar/archive/?time-range=120&limit=50). Let us know if anything is still amiss.
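
The archive check mentioned above can be scripted. This is a minimal sketch, assuming the esmond archive endpoint answers that query with a JSON array of metadata records; the function names are hypothetical, and the URL and parameters are the ones quoted in this message.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ARCHIVE = "http://ps-dashboard.perfsonar.kanren.net/esmond/perfsonar/archive/"


def archive_query_url(base=ARCHIVE, time_range=120, limit=50):
    """Build the query URL: records updated in the last time_range seconds."""
    query = urlencode({"time-range": time_range, "limit": limit})
    return f"{base}?{query}"


def count_recent_metadata(url, timeout=10):
    """Fetch the URL and count the records returned.

    Assumption: the response body is a JSON array, one element per record.
    """
    with urlopen(url, timeout=timeout) as resp:
        return len(json.load(resp))


print(archive_query_url())
```

Passing the printed URL to count_recent_metadata() performs the actual request. A count that stays at zero across several runs would suggest nothing is being registered.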

Thanks,
Andy





On October 4, 2017 at 1:27:54 AM, Garnizov, Ivan (RRZE) wrote:

Hello Casey,

 

I can hardly think of a reason for these failures.

Please make sure there are no password statements in your /etc/perfsonar/meshconfig-agent-tasks.conf file for the pS MA in question (in case there are other Esmond references).

 

You should also note that the DNS change will cause additional problems. These stem from the fact that pScheduler keeps a schedule 24h in advance, so everything that had already been scheduled will be sent to the wrong address.

The local DNS caching will further contribute to the problem.
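
As a rough illustration of the timing described above, the sketch below computes when runs scheduled before a DNS change would be expected to have drained, taking the 24h scheduling horizon from this message at face value. The change time used in the example is hypothetical, and local DNS caching could push the real time later.

```python
from datetime import datetime, timedelta

# From this message: pScheduler is described as scheduling 24h in advance.
SCHEDULE_HORIZON = timedelta(hours=24)


def stale_runs_clear_by(change_time, horizon=SCHEDULE_HORIZON):
    """Latest time a run scheduled before change_time could still execute."""
    return change_time + horizon


# Hypothetical change time, for illustration only.
change = datetime(2017, 10, 3, 12, 0)
print("pre-change runs should be gone by", stale_runs_clear_by(change))
```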

 

At https://github.com/perfsonar/pscheduler/wiki/Archivers you will also find examples of how to create customized pscheduler tasks with a specific measurement archive, which lets you get around the problem above in your diagnostics.
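
A minimal sketch of generating an archiver specification for such a one-off diagnostic task follows. The field names ("archiver", "data", "url", "_auth-token") are an assumption based on recollection of the esmond archiver format and should be verified against the wiki page above; the helper name is hypothetical.

```python
import json


def esmond_archiver_spec(url, auth_token=None):
    """Build an archiver spec dict pointing at one explicit esmond URL.

    Assumed layout: {"archiver": "esmond", "data": {"url": ..., "_auth-token": ...}}.
    Leaving the token out is meant to exercise IP-based authentication.
    """
    data = {"url": url}
    if auth_token is not None:
        data["_auth-token"] = auth_token
    return {"archiver": "esmond", "data": data}


spec = esmond_archiver_spec(
    "https://ps-dashboard.perfsonar.kanren.net/esmond/perfsonar/archive/"
)
print(json.dumps(spec, indent=2))
```

Using a literal IP address in the URL instead of the hostname would take DNS caching out of the picture for the test.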

 

Regards,

Ivan Garnizov

 

From: [mailto:] On Behalf Of Casey Russell
Sent: Tuesday, 3 October 2017 23:23
To:
Subject: Re: [perfsonar-user] New Central MA

 

Group,

 

     Ok, initially, I thought this was the Esmond authentication tokens (username or ip authentication).  Now I'm not sure.  Today I rebuilt the central MA as Ivan suggested with 3 individual entries for IP address authentication

 

python esmond/manage.py add_user_ip_address kanren_1 164.113.0.0/16

python esmond/manage.py add_user_ip_address kanren_2 198.248.0.0/16

python esmond/manage.py add_user_ip_address kanren_3 69.77.0.0/17

 

and a single user named "kanren7"

 

On several of my testing hosts I edited the measurement archive stanzas in the /etc/perfsonar/meshconfig-agent-tasks.conf file. A few hosts have no username and password (to force IP authentication); a few have the correct username and password for the "kanren7" user.

 

Virtually all of my tests are still failing to archive to this new MA.  As a reminder, the MA was a clean install of CentOS7 with the centralmanagement bundle. 

 

From a testing host (/var/log/perfsonar/pscheduler.log)

Oct  3 16:14:48 ps-wsu-bw archiver WARNING  13603154: Failed to archive https://localhost/pscheduler/tasks/743139d6-b1e9-4261-8c30-4bc2433a0d20/runs/27a37e6d-23b6-42ab-b9ae-2376694403da to esmond: 401: Invalid token.

Oct  3 16:14:48 ps-wsu-bw archiver WARNING  13603158: Failed to archive https://localhost/pscheduler/tasks/8c30e451-c257-4b70-8bc3-454950b5bab8/runs/250b4304-bf86-4459-801d-642168c147e2 to esmond: 401: Invalid token.

Oct  3 16:14:48 ps-wsu-bw archiver WARNING  13603156: Failed to archive https://localhost/pscheduler/tasks/4ce46425-1294-4f34-8370-ce0ce71a7f0b/runs/ed47fe16-cf32-422d-be30-0d009ffe77f8 to esmond: 401: Invalid token.

Oct  3 16:14:49 ps-wsu-bw archiver WARNING  13603098: Failed to archive https://localhost/pscheduler/tasks/c4456cf7-a91e-4907-b562-7906d6636dff/runs/250f98c1-4cde-41dd-8350-8b82a78c2b31 to esmond: Archiver permanently abandoned registering test after 2 attempt(s): 401: Invalid token.
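
To see whether the failures are all the same kind, the archiver warnings can be tallied by reason. This is a minimal sketch whose regular expression is fitted only to the line layout shown in the excerpt above; other pScheduler versions may log differently, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the "Failed to archive <run URL> to <archiver>: <reason>" tail of a line.
FAILURE = re.compile(
    r"Failed to archive (?P<run>\S+) to (?P<archiver>\w+): (?P<reason>.+)$"
)


def summarize_failures(lines):
    """Count archiving failures per (archiver, reason) pair."""
    reasons = Counter()
    for line in lines:
        m = FAILURE.search(line.strip())
        if m:
            reasons[(m.group("archiver"), m.group("reason"))] += 1
    return reasons


# Two of the lines quoted above, verbatim.
SAMPLE = [
    "Oct  3 16:14:48 ps-wsu-bw archiver WARNING  13603154: Failed to archive "
    "https://localhost/pscheduler/tasks/743139d6-b1e9-4261-8c30-4bc2433a0d20/runs/"
    "27a37e6d-23b6-42ab-b9ae-2376694403da to esmond: 401: Invalid token.",
    "Oct  3 16:14:49 ps-wsu-bw archiver WARNING  13603098: Failed to archive "
    "https://localhost/pscheduler/tasks/c4456cf7-a91e-4907-b562-7906d6636dff/runs/"
    "250f98c1-4cde-41dd-8350-8b82a78c2b31 to esmond: Archiver permanently abandoned "
    "registering test after 2 attempt(s): 401: Invalid token.",
]

for (archiver, reason), count in summarize_failures(SAMPLE).items():
    print(count, archiver, reason)
```

To run it against the real log, pass an open file handle for /var/log/perfsonar/pscheduler.log instead of SAMPLE.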

 

 

from the MA (/var/log/httpd/access_log)

164.113.32.153 - - [03/Oct/2017:16:16:55 -0500] "POST /esmond/perfsonar/archive/ HTTP/1.1" 401 27 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.10.3.el6.x86_64"

164.113.32.153 - - [03/Oct/2017:16:16:55 -0500] "POST /esmond/perfsonar/archive/ HTTP/1.1" 401 27 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.10.3.el6.x86_64"

164.113.32.153 - - [03/Oct/2017:16:16:55 -0500] "POST /esmond/perfsonar/archive/ HTTP/1.1" 401 27 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.10.3.el6.x86_64"

164.113.32.153 - - [03/Oct/2017:16:16:55 -0500] "POST /esmond/perfsonar/archive/ HTTP/1.1" 401 27 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.10.3.el6.x86_64"

164.113.32.153 - - [03/Oct/2017:16:16:55 -0500] "POST /esmond/perfsonar/archive/ HTTP/1.1" 401 27 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.10.3.el6.x86_64"

164.113.32.105 - - [03/Oct/2017:16:16:55 -0500] "POST /esmond/perfsonar/archive/ HTTP/1.1" 401 27 "-" "python-requests/2.6.0 CPython/2.7.5 Linux/3.10.0-693.2.2.el7.x86_64"

164.113.32.153 - - [03/Oct/2017:16:16:55 -0500] "POST /esmond/perfsonar/archive/ HTTP/1.1" 401 27 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.10.3.el6.x86_64"
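
On the MA side, a similar tally per client address and status code shows which hosts are being rejected and whether any succeed. A minimal sketch, assuming the access log keeps the layout shown above (client, two dash fields, timestamp, request line, status); the function name is hypothetical.

```python
import re
from collections import Counter

# Fitted to the access_log lines quoted above.
LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def count_by_client_and_status(lines, path_prefix="/esmond/perfsonar/archive/"):
    """Count requests to the archive per (client address, HTTP status)."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and m.group("path").startswith(path_prefix):
            counts[(m.group("client"), int(m.group("status")))] += 1
    return counts


# Two of the lines quoted above, verbatim.
SAMPLE = [
    '164.113.32.153 - - [03/Oct/2017:16:16:55 -0500] "POST /esmond/perfsonar/archive/ HTTP/1.1" '
    '401 27 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.10.3.el6.x86_64"',
    '164.113.32.105 - - [03/Oct/2017:16:16:55 -0500] "POST /esmond/perfsonar/archive/ HTTP/1.1" '
    '401 27 "-" "python-requests/2.6.0 CPython/2.7.5 Linux/3.10.0-693.2.2.el7.x86_64"',
]

for (client, status), n in sorted(count_by_client_and_status(SAMPLE).items()):
    print(client, status, n)
```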

 

The MA is at https://ps-dashboard.perfsonar.kanren.net/esmond/perfsonar/archive/ (although, since that DNS name changed in the last 48 hours and may be cached in some places, you may want to be safe and use https://164.113.48.16/esmond/perfsonar/archive/). You'll see that a few tests HAVE archived (perhaps 1 out of every 300), but I can't find any rhyme or reason as to which ones or why. Some of the tests that DID work came from a host where hundreds of other tests have failed both before and after.

 

Should I wait longer? Is there some process I should restart on the testing host or MA host after making these changes? Is it possible I've missed some part of the HTTPD configuration or an esmond config step? Does anyone have any ideas?

 

 


 

Sincerely,

Casey Russell

Network Engineer

KanREN

2029 Becker Drive, Suite 282
Lawrence, Kansas 66047


 

On Tue, Oct 3, 2017 at 10:16 AM, Casey Russell <> wrote:

Ivan,

 

     When I set up the new MA, I did (per the examples) use a single identifier with multiple IP blocks, for example:

 

python esmond/manage.py add_user_ip_address kanren_v4 164.113.0.0/16 198.248.0.0/16 69.77.0.0/17

 

     Are you saying you've had better luck breaking those up into 3 different entries with 3 different ids like so?

 

python esmond/manage.py add_user_ip_address kanren_1 164.113.0.0/16

python esmond/manage.py add_user_ip_address kanren_2 198.248.0.0/16

python esmond/manage.py add_user_ip_address kanren_3 69.77.0.0/17


 

Sincerely,

Casey Russell

Network Engineer

KanREN

2029 Becker Drive, Suite 282
Lawrence, Kansas 66047


 

On Tue, Oct 3, 2017 at 12:14 AM, Garnizov, Ivan (RRZE) <> wrote:

Hello Casey,

 

Could it be that you are using one and the same identifier for all IP registrations?

In my procedures, dating back to 3.5.1, I had to generate a different ID for every IP added, unless they were added as network groups.

Please note that this might have changed with the upgrades of pS. If it is still the case, though, it would make perfect sense that you get these messages on subsequent IP authorization attempts and are still denied on service requests from “authorized” systems.

 

I believe it would be quite easy to check this suggestion for a single IP and then later consider a more global approach.

 

Regards,

Ivan Garnizov

 

From: [mailto:] On Behalf Of Casey Russell
Sent: Friday, 29 September 2017 23:41
To:
Subject: [perfsonar-user] New Central MA

 

Group,

 

     I've recently activated my new central MA, but posts to the esmond database seem to be failing.  

 

2001:49d0:23c0:1003::2 - - [29/Sep/2017:16:33:37 -0500] "POST /esmond/perfsonar/archive/ HTTP/1.1" 401 27 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.6.3.el6.x86_64"

2001:49d0:23c0:1003::2 - - [29/Sep/2017:16:33:37 -0500] "POST /esmond/perfsonar/archive/ HTTP/1.1" 401 27 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.6.3.el6.x86_64"

2001:49d0:23c0:1003::2 - - [29/Sep/2017:16:33:38 -0500] "POST /esmond/perfsonar/archive/ HTTP/1.1" 401 27 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.6.3.el6.x86_64"

2001:49d0:23c0:1003::2 - - [29/Sep/2017:16:33:38 -0500] "POST /esmond/perfsonar/archive/ HTTP/1.1" 401 27 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.6.3.el6.x86_64"

2001:49d0:23c0:1003::2 - - [29/Sep/2017:16:33:38 -0500] "POST /esmond/perfsonar/archive/ HTTP/1.1" 401 27 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.6.3.el6.x86_64"

 

The 401 would indicate that they're "unauthorized" although they should be allowed by IP (v6) 

 

(esmond)[root@ps-dashboard esmond]# python esmond/manage.py add_user_ip_address kanren_v6 2001:49d0::/32

<clipping some stuff here for brevity>

Setting timeseries permissions.

IP 2001:49d0::/32 already assigned to kanren_v6, skipping creation
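
A quick sanity check that the client addresses in the log really fall inside the registered blocks can rule out a simple prefix mistake. This is a minimal sketch using Python's standard ipaddress module; the block list combines the prefixes quoted in this thread, the function name is hypothetical, and it tests only the address arithmetic, not how esmond itself matches clients.

```python
import ipaddress

# Prefixes registered with add_user_ip_address, as quoted in this thread.
AUTHORIZED = ["164.113.0.0/16", "198.248.0.0/16", "69.77.0.0/17", "2001:49d0::/32"]


def matching_networks(client, networks=AUTHORIZED):
    """Return the registered prefixes that contain the client address."""
    addr = ipaddress.ip_address(client)
    nets = (ipaddress.ip_network(n) for n in networks)
    # Compare only like with like: an IPv4 address never belongs to an IPv6 prefix.
    return [str(n) for n in nets if n.version == addr.version and addr in n]


for client in ("2001:49d0:23c0:1003::2", "164.113.32.153"):
    print(client, "->", matching_networks(client))
```

An empty list for a client that is returning 401 would point at the registration; a non-empty list suggests looking elsewhere, such as the credentials the client is sending.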

 

My reading of the documentation is that if these testing hosts in the mesh submit an old API key and username and that fails, esmond will fall back to IP authorization. Is that correct? Or is this 401 caused by something in the httpd configs and not esmond specifically?

 

I'm open to any guidance here.

 

Sincerely,

Casey Russell

Network Engineer

KanREN

2029 Becker Drive, Suite 282
Lawrence, Kansas 66047


 

 




