Hello Andy and Ivan,
I don’t know if this might provide any insight but I ended up purging the data in esmond on the new MA. After I did that some of the tests started reporting. My maddash dashboard on the new MA is still reporting missing data for some tests so I’m working
through those issues. I’ve postponed any data migration from the old MA to the new for now.
Thanks,
George
From: Andrew Lake <>
Date: Monday, October 8, 2018 at 9:34 AM
To: "Garnizov, Ivan ()" <>
Cc: George Uhl <>, perfsonar-user <>, "Noss, Martyn J. (GSFC-423.0)[InuTeq, LLC]"
<>, "DeBold, Daniel F. (GSFC-423.0)[InuTeq, LLC]" <>
Subject: Re: [non-nasa source] [perfsonar-user] FW: http POST 500 527 errors on central archive
Hi George,
No schema change in 4.1. Anything in the log files under /var/log/esmond/? It sounds like some type of problem with cassandra which stores the data. The postgresql portion seems to be working fine since the metadata is in there.
Thanks,
Andy
On Mon, Oct 8, 2018 at 9:10 AM Garnizov, Ivan <> wrote:
Hello George,
From my point of view this is a mystery. In your excerpts you point out at some uri-s, which denote the creation and operation with this MA structure. Your logs
show that there were exact requests with specific identifiers, meaning these have been obtained during the communication between pScheduler and Esmond. This is an odd case.
There is a small chance for failures in the pscheduler communication, so in order to take this out could you please check this command and share the output.
pscheduler result --diags --archivings <<RUN-URL>>
Please also share the versions of the perfSONAR software installed.
Regards,
Ivan Garnizov
GEANT SA1T2: pS deployments GN Operations
GEANT SA2T3: pS development team
GEANT SA3T5: eduPERT team
Jubiläumsjahr 2018 - IT in Bewegung
Das RRZE - der IT-Dienstleister der FAU
www.50-jahre.rrze.fau.de
Von: Uhl, George D. (GSFC-423.0)[SGT INC] [mailto:]
Gesendet: Freitag, 5. Oktober 2018 18:28
An: Andrew Lake <>; Garnizov, Ivan (RRZE) <>
Cc: perfsonar-user <>; Noss, Martyn J. (GSFC-423.0)[InuTeq, LLC] <>; DeBold, Daniel
F. (GSFC-423.0)[InuTeq, LLC] <>
Betreff: Re: [non-nasa source] [perfsonar-user] FW: http POST 500 527 errors on central archive
So I’m still having a problem with the new MA. There is metadata but no data results. Has the esmond DB schema changed over time? When I migrated the cassandra
and postgresql data I followed the instructions for migrating between different architecture/OS’s using node tool and pg_dump/pg_restore. The old MA is Centos 6.10 with pS version 4.0.2. The new MA is Centos 7.5 with pS version 4.1.2. I’m OK with blowing
up anything on the MA and starting over in order to normalize things.
curl http://127.0.0.1/esmond/perfsonar/archive/00e33431866d48db9b1d2bdfa1c9055d/throughput/base
[{"ts":1538176946,"val":342194714.0},{"ts":1538193784,"val":373138965.0},{"ts":1538206654,"val":414243868.0},{"ts":1538233274,"val":392761156.0},{"ts":1538239344,"val":396938506.0},{"ts":1538250393,"val":422353967.0},{"ts":1538250492,"val":327419777.0},{"ts":1538256840,"val":286180836.0},{"ts":1538264494,"val":409137240.0},{"ts":1538276435,"val":387952723.0},{"ts":1538283432,"val":92656195.0},{"ts":1538292387,"val":390943807.0},{"ts":1538297144,"val":337560494.0},{"ts":1538300458,"val":323470335.0},{"ts":1538307132,"val":363868238.0},{"ts":1538310906,"val":395132660.0},{"ts":1538322387,"val":328717941.0},{"ts":1538323883,"val":351052824.0},{"ts":1538335830,"val":464294398.0},{"ts":1538338356,"val":277059462.0},{"ts":1538352429,"val":390164621.0},{"ts":1538352850,"val":413313155.0},{"ts":1538374808,"val":347319943.0},{"ts":1538375897,"val":315045344.0},{"ts":1538393120,"val":301838232.0},{"ts":1538393173,"val":331426905.0},{"ts":1538400930,"val":307827704.0},{"ts":1538403597,"val":401989005.0},{"ts":1538410047,"val":256682546.0},{"ts":1538416296,"val":389765975.0},{"ts":1538422532,"val":419781239.0},{"ts":1538431725,"val":347150448.0},{"ts":1538434429,"val":393347041.0},{"ts":1538454458,"val":334131566.0},{"ts":1538460548,"val":206073951.0},{"ts":1538460904,"val":367130065.0},{"ts":1538467084,"val":331088965.0},{"ts":1538478704,"val":360601494.0},{"ts":1538485677,"val":438744298.0},{"ts":1538493209,"val":125849464.0},{"ts":1538497534,"val":327205637.0},{"ts":1538511358,"val":402021533.0},{"ts":1538517189,"val":254681290.0},{"ts":1538519057,"val":225262101.0},{"ts":1538521535,"val":352387039.0},{"ts":1538531738,"val":379275392.0},{"ts":1538532875,"val":358800094.0},{"ts":1538545841,"val":407971834.0},{"ts":1538547600,"val":327858651.0},{"ts":1538561816,"val":392739035.0},{"ts":1538566059,"val":395114433.0},{"ts":1538573463,"val":338501610.0},{"ts":1538579463,"val":421967555.0},{"ts":1538587897,"val":393370013.0},{"ts":1538590681,"val":266143730.0},{"ts":1538606011,"val":292249668.0},{"ts":1538608003,"val":390442394.0},{"ts":1538624277,"val":438816146.0},{"ts":1538625245,"val":348580647.0},{"ts":1538637398,"val":329249436.0},{"ts":1538649866,"val":320188818.0},{"ts":1538650066,"val":459650575.0},{"ts":1538664794,"val":436696538.0},{"ts":1538670612,"val":393987177.0},{"ts":1538672079,"val":341157918.0},{"ts":1538679183,"val":392162229.0},{"ts":1538682086,"val":278019286.0},{"ts":1538689836,"val":105675807.0},{"ts":1538690536,"val":277924661.0},{"ts":1538699094,"val":178722522.0},{"ts":1538704348,"val":329069993.0},{"ts":1538712563,"val":321945405.0},{"ts":1538724003,"val":330416285.0},{"ts":1538730937,"val":388382301.0},{"ts":1538739704,"val":185038721.0},{"ts":1538744813,"val":118534331.0},{"ts":1538753337,"val":282245395.0}]
Example metadata json snippet from new MA:
{
"url": "http://127.0.0.1/esmond/perfsonar/archive/0a0e1b97ca684443b6fdcd3ccd568cb7/",
"metadata-key": "0a0e1b97ca684443b6fdcd3ccd568cb7",
"subject-type": "point-to-point",
"event-types": [
{
"summaries": [],
"time-updated": 1538688241,
"base-uri": "/esmond/perfsonar/archive/0a0e1b97ca684443b6fdcd3ccd568cb7/failures/base",
"event-type": "failures"
},
{
"summaries": [],
"time-updated": 1466746041,
"base-uri": "/esmond/perfsonar/archive/0a0e1b97ca684443b6fdcd3ccd568cb7/throughput-subintervals/base",
"event-type": "throughput-subintervals"
},
{
"summaries": [
{
"summary-type": "average",
"uri": "/esmond/perfsonar/archive/0a0e1b97ca684443b6fdcd3ccd568cb7/throughput/averages/86400",
"time-updated": 1466746041,
"summary-window": "86400"
}
],
"time-updated": 1466746041,
"base-uri": "/esmond/perfsonar/archive/0a0e1b97ca684443b6fdcd3ccd568cb7/throughput/base",
"event-type": "throughput"
}
],
Missing values from the new MA:
From:
Andrew Lake <>
Date: Thursday, October 4, 2018 at 10:32 AM
To: George Uhl <>
Cc: perfsonar-user <>
Subject: Re: [non-nasa source] [perfsonar-user] FW: http POST 500 527 errors on central archive
Hi George,
Glad you were able to find the source of the issue. I captured the info you shared in a documentation issue. It sounds like we should make a note of that somewhere
since it is not immediately obvious the cause but easy to run into.
On Wed, Oct 3, 2018 at 4:35 PM Uhl, George D. (GSFC-423.0)[SGT INC] <> wrote:
It turns out it was an selinux issue. I’m running enforcing/active and somehow the esmond.log file was impacted after I attempted to migrate the data this morning.
I dug around and found an email from Jon Clausen dated Thurs Sep 14, 2017, 5:56 am and followed the procedure outlined (see below). It worked – the HTTP 500 errors have stopped. I’m going to change selinux to permissive and go from there. How/why this stopped
working is still unknown.
Our measurement archive is installed from the CentOS 7 *.iso, with perfSonar
components installed from the internet2 repo.
I chose to leave SELinux active/enforcing, which is actually not causing
many problems, generally - to my pleasant surprise.
The system has yum-cron installed, so updates happen 'whenever'.
Early this morning there were a lot of package updates, and subsequently the
toolkit URL was unable to load test results, showing an 'internal server
The cause turned out to be a change in SELinux 'type' on a couple of log
files, which resulted in permission denied errors in
[Thu Sep 14 10:59:19.652856 2017] [:error] [pid 14147] [remote
] ValueError: Unable to configure handler 'esmond_handler': [Errno 13]
Permission denied: '/var/log/esmond/esmond.log'
The 'cure' would have been to change object type using chcon, or as I did:
move the log file out of the way and restart apache.
The object type change can be seen here:
[jac@psma01 ~]$ ls -Z /var/log/esmond/esmond.log*
-rw-r--r--. apache apache system_u:object_r:httpd_log_t:s0
/var/log/esmond/esmond.log
-rw-r--r--. apache apache system_u:object_r:var_log_t:s0
/var/log/esmond/esmond.log.2017-08-21
This mail is mostly a 'heads up' to other people running the measurement
archive under SELinux, but I guess the info may be relevant to the perfSonar
Should I submit the info somwhere?
From:
<> on behalf of George Uhl <>
Date: Wednesday, October 3, 2018 at 4:12 PM
To: Andrew Lake <>
Cc: perfsonar-user <>
Subject: Re: [non-nasa source] [perfsonar-user] FW: http POST 500 527 errors on central archive
The log files were not copied from the old server. I did a chmod on esmond.log to 666 but that didn’t help either.
# ls -la /var/log/esmond
total 5056
drwxr-xr-x. 3 apache apache 177 Oct 3 14:23
.
drwxr-xr-x. 15 root root 4096 Oct 3 19:11
..
drwxr-xr-x. 2 apache apache 6 Mar 15 2018
crashlog
-rw-r--r--. 1 apache apache 5108885 Oct 3 14:52 django.log
-rw-r--r--. 1 apache apache 8230 Oct 3 14:23 django.log.1
-rw-r--r--. 1 apache apache 8480 Oct 3 14:23 django.log.2
-rw-r--r--. 1 apache apache 8480 Oct 3 14:23 django.log.3
-rw-r--r--. 1 apache apache 12720 Oct 3 14:23 django.log.4
-rw-r--r--. 1 apache apache 4240 Oct 3 14:23 django.log.5
-rw-rw-rw-. 1 apache apache 0 Mar 15 2018 esmond.log
-rw-r--r--. 1 apache apache 0 Mar 15 2018 install.log
From:
Andrew Lake <>
Date: Wednesday, October 3, 2018 at 4:00 PM
To: George Uhl <>
Cc: perfsonar-user <>
Subject: Re: [non-nasa source] [perfsonar-user] FW: http POST 500 527 errors on central archive
Hi George,
The error you are seeing is about the log permissions. Make sure the log files under /var/log/esmond are owned by the "apache" user then restart httpd. Did you
copy over logs from the old host as well?
On Wed, Oct 3, 2018 at 3:51 PM Uhl, George D. (GSFC-423.0)[SGT INC] <> wrote:
It appears that the existing user api key was overwritten during the migration process. Running both
/usr/sbin/esmond_manage add_ps_metadata_post_user
/usr/sbin/esmond_manage add_timeseries_post_user
reveal that the Key was changed to the value in the old archive for the user. Changing the auth-token
in the archives.d/*.json file on the testpoint servers didn’t help. I’m still getting HTTP 500 errors.
From:
<> on behalf of George Uhl <>
Date: Wednesday, October 3, 2018 at 1:52 PM
To: "" <>
Subject: [non-nasa source] [perfsonar-user] FW: http POST 500 527 errors on central archive
From:
George Uhl <>
Date: Wednesday, October 3, 2018 at 1:38 PM
To: "" <>
Subject: http POST 500 527 errors on central archive
I attempted to migrate data from an older pS archive to one built on pS 4.1 by following these
instructions. Unfortunately once I had completed the migration none of my pS testpoint servers are able to post data into esmond. Now I’m getting Code 500 errors instead.
Wed Oct 03 17:30:37.021250 2018] [:error] [pid 2836] [remote x.x.x.x:y] mod_wsgi (pid=2836): Exception occurred processing WSGI script ‘/usr/lib/esmond/esmond/wsgi.py’.
[Wed Oct 03 17:30:37.021312 2018] [:error] [pid 2836] [remote x.x.x.x:y] Traceback (most recent call last):
[Wed Oct 03 17:30:37.021344 2018] [:error] [pid 2836] [remote x.x.x.x:y] File "/usr/lib/esmond/esmond/wsgi.py", line 28, in application
[Wed Oct 03 17:30:37.021433 2018] [:error] [pid 2836] [remote x.x.x.x:y] return get_wsgi_application()(environ, start_response)
[Wed Oct 03 17:30:37.021452 2018] [:error] [pid 2836] [remote x.x.x.x:y] File "/usr/lib/esmond/lib/python2.7/site-packages/django/core/wsgi.py", line 13, in get_wsgi_application
[Wed Oct 03 17:30:37.021482 2018] [:error] [pid 2836] [remote x.x.x.x:y] django.setup(set_prefix=False)
[Wed Oct 03 17:30:37.021496 2018] [:error] [pid 2836] [remote x.x.x.x:y] File "/usr/lib/esmond/lib/python2.7/site-packages/django/__init__.py", line 22, in setup
[Wed Oct 03 17:30:37.021520 2018] [:error] [pid 2836] [remote x.x.x.x:y] configure_logging(settings.LOGGING_CONFIG, settings.LOGGING)
[Wed Oct 03 17:30:37.021533 2018] [:error] [pid 2836] [remote x.x.x.x:y] File "/usr/lib/esmond/lib/python2.7/site-packages/django/utils/log.py", line 75, in configure_logging
[Wed Oct 03 17:30:37.021558 2018] [:error] [pid 2836] [remote x.x.x.x:y] logging_config_func(logging_settings)
[Wed Oct 03 17:30:37.021572 2018] [:error] [pid 2836] [remote x.x.x.x:y] File "/usr/lib64/python2.7/logging/config.py", line 803, in dictConfig
[Wed Oct 03 17:30:37.021597 2018] [:error] [pid 2836] [remote x.x.x.x:y] dictConfigClass(config).configure()
[Wed Oct 03 17:30:37.021610 2018] [:error] [pid 2836] [remote x.x.x.x:y] File "/usr/lib64/python2.7/logging/config.py", line 585, in configure
[Wed Oct 03 17:30:37.021633 2018] [:error] [pid 2836] [remote x.x.x.x:y] '%r: %s' % (name, e))
[Wed Oct 03 17:30:37.021662 2018] [:error] [pid 2836] [remote x.x.x.x:y] ValueError: Unable to configure handler 'esmond_handler': [Errno 13] Permission denied: '/var/log/esmond/esmond.log'
[03/Oct/2018:17:35:42 +0000] x.x.x.x TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 "POST /esmond/perfsonar/archive/ HTTP/1.1" 527
x.x.x.x - - [03/Oct/2018:17:35:28 +0000] “POST /esmond/perfsonar/archive/ HTTP/1.1” 500 527
|