
grouper-dev - grouper diagnostics

Subject: Grouper Developers Forum

  • From: Chris Hyzer <>
  • To: Grouper Dev <>
  • Subject: grouper diagnostics
  • Date: Mon, 17 May 2010 03:21:54 -0400

I added a diagnostics URL to Grouper WS.  If you have used the loader, you may have had a job that didn't run when it was supposed to (for whatever reason: crash, misconfiguration, DB down, etc).  Now you can hook this diagnostics URL up to web monitoring software (e.g. Nagios, Big Brother, BMC, or whatever) and you will know when a job didn't run successfully.

 

https://bugs.internet2.edu/jira/browse/GRP-411

 

https://spaces.internet2.edu/display/GrouperWG/Grouper+diagnostics

 

Grouper diagnostics provides a URL on Grouper WS that reports the health of Grouper.  The checks can cover memory in the WS server, the connection to the Grouper Registry DB, whether sources can perform queries, and whether Grouper loader jobs are executing successfully.  If everything is OK, an HTTP 200 is returned, otherwise a 500, along with a description of the issue.  The point is that web monitoring software such as Nagios, Big Brother, BMC, etc. can be pointed at this URL.

General information is displayed on success as well: the server name, the number of WS requests (since the server started), the last error (if recent), etc.

There isn't any sensitive information in these calls, but if you want to lock them down, do that in your servlet container or web server (or don't map the servlet in the WS web.xml).  For example, you could restrict access to the source IP addresses of your PC and your Nagios server.

Each test can be individually ignored (without causing an error) in grouper-ws.properties.  You can also customize, in the same file, the number of minutes within which a SUCCESS must be found for a loader job.

Note that there is a lot of caching here, so repeated hits do not run the queries each time.
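The 200/500 contract described above maps naturally onto a monitoring plugin. What follows is only a minimal sketch, not part of Grouper: the function names `classify` and `check` are hypothetical, and the exit codes follow the usual Nagios plugin convention (0 = OK, 2 = CRITICAL, 3 = UNKNOWN). It assumes the first non-blank line of the response is a reasonable one-line summary.

```python
import urllib.error
import urllib.request

# Nagios plugin exit codes (standard plugin convention).
OK, CRITICAL, UNKNOWN = 0, 2, 3


def classify(status_code, body):
    """Map an HTTP status and response body to (exit_code, one-line message)."""
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    first = lines[0] if lines else "(empty response)"
    if status_code == 200:
        return OK, "OK - " + first
    if status_code == 500:
        return CRITICAL, "CRITICAL - " + first
    # Anything else (404, 403, ...) is not part of the documented contract.
    return UNKNOWN, "UNKNOWN - unexpected HTTP status %d" % status_code


def check(url, timeout=10):
    """Fetch the diagnostics URL and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error object still carries the body.
        return classify(err.code, err.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError) as err:
        # Treating an unreachable server as CRITICAL is a choice of this sketch.
        return CRITICAL, "CRITICAL - cannot reach %s: %s" % (url, err)
```

A wrapper script would call `check(...)` with the status URL, print the message, and exit with the returned code.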

Sample configuration

#whether to ignore each test.  Note, in job names, invalid chars (e.g. colon) need to be replaced with underscore
#invalid chars are anything matching this regex: [^a-zA-Z0-9._-]
ws.diagnostic.ignore.memoryTest = false
ws.diagnostic.ignore.dbTest_grouper = false
ws.diagnostic.ignore.source_jdbc = false
ws.diagnostic.ignore.loader_CHANGE_LOG_changeLogTempToChangeLog = false
 
#number of minutes that can go by without a success before an error is thrown
ws.diagnostic.minutesSinceLastSuccess.loader_SQL_GROUP_LIST__aStem_aGroup2 = 60
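To illustrate the name replacement rule from the comments above, here is a small sketch that applies the same regex to build property keys. The helper names (`diagnostic_name`, `ignore_key`, `minutes_key`) are hypothetical. Note also that the sample key above has no trailing unique ID, while the loader job names in the status output do; whether that suffix belongs in the key is not stated here, so the sketch simply sanitizes whatever name it is given.

```python
import re

# Same character class as in the config comment: anything outside it becomes "_".
INVALID_CHARS = re.compile(r"[^a-zA-Z0-9._-]")


def diagnostic_name(prefix, raw_name):
    """Build a test name such as loader_<job> or source_<id>, sanitized."""
    return prefix + "_" + INVALID_CHARS.sub("_", raw_name)


def ignore_key(prefix, raw_name):
    """Property key that ignores a test."""
    return "ws.diagnostic.ignore." + diagnostic_name(prefix, raw_name)


def minutes_key(raw_job_name):
    """Property key that overrides the success window for a loader job."""
    return "ws.diagnostic.minutesSinceLastSuccess." + diagnostic_name("loader", raw_job_name)


# Example: a job name containing a colon.
print(minutes_key("SQL_GROUP_LIST__aStem:aGroup2"))
```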
 

Trivial option

Use this for frequent checks.  In a cluster, you can use it on all nodes and run a deeper check on one node only.

https://url.to.grouper.edu/grouperWs/status?diagnosticType=trivial

Note, this is a success, but the last error is displayed because it occurred recently.

Server: mchyzer-PC, grouperVersion: 1.6.0, up since: 2010/05/17 02:19, 0 requests
SUCCESS memoryTest: Allocating 100000 bytes to an array to make sure not out of memory (11ms elapsed)
 
 
Diagnostics errors since start: 3 (11ms elapsed)
Last diagnostics error date: 2010/05/17 02:23:27
Last diagnostics error message:
There was an error in the diagnostic task DiagnosticLoaderJobTest, Loader job CHANGE_LOG_changeLogTempToChangeLog
 
:Cant find a success since: 2010/05/17 01:38:50.000, expecting one in the last 30 minutes
java.lang.RuntimeException: Cant find a success since: 2010/05/17 01:38:50.000, expecting one in the last 30 minutes
        at edu.internet2.middleware.grouper.ws.status.DiagnosticLoaderJobTest.doTask(DiagnosticLoaderJobTest.java:103)
        at edu.internet2.middleware.grouper.ws.status.DiagnosticTask.executeTask(DiagnosticTask.java:44)
        at edu.internet2.middleware.grouper.ws.status.GrouperStatusServlet.doGet(GrouperStatusServlet.java:129)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:433)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
        at java.lang.Thread.run(Thread.java:619)
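The cluster suggestion for the trivial option (trivial check on every node, a deeper check on one node) could be planned with something like the sketch below. The four `diagnosticType` values are the ones shown in this post; the function names and the node URLs in the example are made up for illustration.

```python
from urllib.parse import urlencode

# diagnosticType values that appear in this post.
KNOWN_TYPES = ("trivial", "db", "sources", "all")


def status_url(base_url, diagnostic_type):
    """Build the status URL for one Grouper WS base URL."""
    if diagnostic_type not in KNOWN_TYPES:
        raise ValueError("unknown diagnosticType: %r" % (diagnostic_type,))
    query = urlencode({"diagnosticType": diagnostic_type})
    return base_url.rstrip("/") + "/status?" + query


def cluster_plan(base_urls, deep_type="all"):
    """Trivial check on every node, plus one deeper check on the first node."""
    plan = [status_url(base, "trivial") for base in base_urls]
    if base_urls:
        plan.append(status_url(base_urls[0], deep_type))
    return plan


# Hypothetical two-node cluster.
for url in cluster_plan(["https://node1.example.edu/grouperWs",
                         "https://node2.example.edu/grouperWs"]):
    print(url)
```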

DB option

This does a lightweight query against the registry, plus the memory test.

https://url.to.grouper.edu/grouperWs/status?diagnosticType=db

Server: mchyzer-PC, grouperVersion: 1.6.0, up since: 2010/05/17 02:19, 0 requests
SUCCESS memoryTest: Allocating 100000 bytes to an array to make sure not out of memory (20ms elapsed)
SUCCESS dbTest_grouper: Retrieved object from database (28ms elapsed)
 
 
Diagnostics errors since start: 3 (28ms elapsed)

Subject sources

This does a find by ID on all sources, plus the DB test and the memory test.  Note that the same sources.xml settings that configure the Grouper startup settings apply here as well, i.e. you can skip a source or set the ID to search for.

https://url.to.grouper.edu/grouperWs/status?diagnosticType=sources

Server: mchyzer-PC, grouperVersion: 1.6.0, up since: 2010/05/17 02:19, 0 requests
SUCCESS memoryTest: Allocating 100000 bytes to an array to make sure not out of memory (37ms elapsed)
SUCCESS dbTest_grouper: Retrieved object from database (40ms elapsed)
SUCCESS source_g:gsa: Searched for subject by id: grouperTestSubjectByIdOnStartupASDFGHJ (42ms elapsed)
SUCCESS source_jdbc: Searched for subject by id: grouperTestSubjectByIdOnStartupASDFGHJ (45ms elapsed)
SUCCESS source_g:isa: Searched for subject by id: grouperTestSubjectByIdOnStartupASDFGHJ (45ms elapsed)
 
 
Diagnostics errors since start: 3 (45ms elapsed)
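If the monitoring side wants more than the HTTP code, the per-test lines in a successful response can be picked apart. This is a rough sketch that assumes the line shape seen in the samples here (`STATUS name: message (Nms elapsed)`); it is not a documented format, and the names `DiagnosticResult` and `parse_results` are hypothetical.

```python
import re
from typing import NamedTuple

# STATUS, test name (no spaces, may contain colons), message, elapsed ms.
LINE = re.compile(r"^([A-Z]+) (\S+): (.*) \((\d+)ms elapsed\)$")


class DiagnosticResult(NamedTuple):
    status: str
    name: str
    message: str
    elapsed_ms: int


def parse_results(body):
    """Return one DiagnosticResult per test line; other lines are skipped."""
    results = []
    for line in body.splitlines():
        match = LINE.match(line.strip())
        if match:
            status, name, message, elapsed = match.groups()
            results.append(DiagnosticResult(status, name, message, int(elapsed)))
    return results


sample = """Server: mchyzer-PC, grouperVersion: 1.6.0, up since: 2010/05/17 02:19, 0 requests
SUCCESS dbTest_grouper: Retrieved object from database (40ms elapsed)
SUCCESS source_g:gsa: Searched for subject by id: grouperTestSubjectByIdOnStartupASDFGHJ (42ms elapsed)

Diagnostics errors since start: 3 (45ms elapsed)"""

for result in parse_results(sample):
    print(result.name, result.status, result.elapsed_ms)
```

The header line and the trailing "Diagnostics errors since start" line do not match the pattern, so they are left out of the result list.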

Loader jobs

This tests all loader jobs (for a success within a certain threshold), does a find by ID on all sources, and runs the DB test and the memory test.  By default, each loader job must have a success within the last 25 hours.  The exception is change log jobs, which must have a success within the last 30 minutes.  This is configurable in grouper-ws.properties.

https://url.to.grouper.edu/grouperWs/status?diagnosticType=all

Server: mchyzer-PC, grouperVersion: 1.6.0, up since: 2010/05/17 02:45, 0 requests
SUCCESS memoryTest: Allocating 100000 bytes to an array to make sure not out of memory (6055ms elapsed)
SUCCESS dbTest_grouper: Retrieved object from database (6076ms elapsed)
SUCCESS source_g:gsa: Searched for subject by id: grouperTestSubjectByIdOnStartupASDFGHJ (6077ms elapsed)
SUCCESS source_jdbc: Searched for subject by id: grouperTestSubjectByIdOnStartupASDFGHJ (6091ms elapsed)
SUCCESS source_g:isa: Searched for subject by id: grouperTestSubjectByIdOnStartupASDFGHJ (6091ms elapsed)
SUCCESS loader_CHANGE_LOG_changeLogTempToChangeLog: Loader job CHANGE_LOG_changeLogTempToChangeLog ignored in config (6091ms elapsed)
SUCCESS loader_MAINTENANCE__grouperReport: Loader job MAINTENANCE__grouperReport ignored in config (6091ms elapsed)
SUCCESS loader_MAINTENANCE_cleanLogs: Found the most recent success: 2010/05/17 02:39:00.000, expecting one in the last 1500 minutes (6122ms elapsed)
SUCCESS loader_CHANGE_LOG_consumer_chrisTest: Loader job CHANGE_LOG_consumer_chrisTest ignored in config (6122ms elapsed)
SUCCESS loader_CHANGE_LOG_consumer_chrisTest: Loader job CHANGE_LOG_consumer_chrisTest ignored in config (6122ms elapsed)
SUCCESS loader_CHANGE_LOG_consumer_xmpp: Loader job CHANGE_LOG_consumer_xmpp ignored in config (6122ms elapsed)
SUCCESS loader_CHANGE_LOG_consumer_xmpp: Loader job CHANGE_LOG_consumer_xmpp ignored in config (6122ms elapsed)
SUCCESS loader_SQL_GROUP_LIST__aStem:aGroup2__f74068fd47124b079ea0c750354f6935: Found the most recent success: 2010/05/17 02:39:00.000, expecting one in the last 1500 minutes (6125ms elapsed)
SUCCESS loader_SQL_SIMPLE__aStem:aGroup__a186d80e0fe946b78dba45d16a2a1be7: Found the most recent success: 2010/05/17 02:39:00.000, expecting one in the last 1500 minutes (6132ms elapsed)
SUCCESS loader_ATTR_SQL_SIMPLE__penn:community:employee:orgPermissions:orgs__a8c2933dd66945af9755372efa9141b5: Found the most recent success: 2010/05/17 02:39:00.000, expecting one in the last 1500 minutes (6135ms elapsed)
 
 
Diagnostics errors since start: 0 (6135ms elapsed)
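The threshold rule for loader jobs (25 hours by default, i.e. the 1500 minutes in the output above, and 30 minutes for change log jobs) comes down to a simple time comparison. The sketch below is an illustration only, not the Grouper code: it assumes change log jobs can be recognized by a `CHANGE_LOG_` name prefix, and `overrides` is a hypothetical dict standing in for the `minutesSinceLastSuccess` settings.

```python
from datetime import datetime, timedelta

DEFAULT_MINUTES = 25 * 60   # 1500 minutes, as in "expecting one in the last 1500 minutes"
CHANGE_LOG_MINUTES = 30


def allowed_minutes(job_name, overrides=None):
    """Minutes that may pass without a success for this job."""
    overrides = overrides or {}
    if job_name in overrides:
        return overrides[job_name]
    # Assumption: change log jobs are identified by their name prefix.
    if job_name.startswith("CHANGE_LOG_"):
        return CHANGE_LOG_MINUTES
    return DEFAULT_MINUTES


def has_recent_success(job_name, last_success, now, overrides=None):
    """True if the last success falls inside the allowed window."""
    window = timedelta(minutes=allowed_minutes(job_name, overrides))
    return now - last_success <= window


now = datetime(2010, 5, 17, 2, 45)
print(has_recent_success("MAINTENANCE_cleanLogs", datetime(2010, 5, 17, 2, 39), now))
print(has_recent_success("CHANGE_LOG_changeLogTempToChangeLog", datetime(2010, 5, 17, 1, 38, 50), now))
```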

Here is an example of an error

 HTTP Status 500 -
 
type Exception report
 
message
 
description The server encountered an internal error () that prevented it from fulfilling this request.
 
exception
 
java.lang.RuntimeException:
There was an error in the diagnostic task DiagnosticLoaderJobTest, Loader job CHANGE_LOG_changeLogTempToChangeLog
 
:Cant find a success since: 2010/05/17 01:38:50.000, expecting one in the last 30 minutes
        edu.internet2.middleware.grouper.ws.status.GrouperStatusServlet.doGet(GrouperStatusServlet.java:191)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
 
root cause
 
java.lang.RuntimeException: Cant find a success since: 2010/05/17 01:38:50.000, expecting one in the last 30 minutes
        edu.internet2.middleware.grouper.ws.status.DiagnosticLoaderJobTest.doTask(DiagnosticLoaderJobTest.java:103)
        edu.internet2.middleware.grouper.ws.status.DiagnosticTask.executeTask(DiagnosticTask.java:44)
        edu.internet2.middleware.grouper.ws.status.GrouperStatusServlet.doGet(GrouperStatusServlet.java:129)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
 
note The full stack trace of the root cause is available in the Apache Tomcat/6.0.20 logs.


 



