r10 - 2015-06-29 - 19:30:53 - GeraldMitchell

CLM Server Monitoring plug-in

Authors: MichaelTabb JorgeAlbertoDiaz BorisKuschel GeraldMitchell
Build basis: Collaborative Lifecycle Management 4.0.6 and higher

Overview

The CLM Server Monitoring component of the Jazz Foundation Server is an Aspect Oriented Programming (AOP) based piece of software that enables instrumentation of existing foundation server code. The goal of CLM Server Monitoring is to identify performance problems or hot spots in a running server, which helps customers, support, and developers alike. The server-side plug-in will be bundled and shipped with the foundation server. This section provides the overall design of, and guidance for, the CLM Server Monitoring plug-in.

Technology

CLM Server Monitoring uses a combination of AspectJ and Java MBeans.

AspectJ - AspectJ provides the pointcuts and advice that let the monitoring code insert itself into the call stack of normal program flow, primarily before and after measurable aspects of server behavior. For example, to measure how long a SQL call takes, advice is placed before and after that SQL call. When the before advice fires, the clock starts; when the after advice fires, the clock stops and the metric is rolled up into a statistical collector for later computation.
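
The before/after timing described above can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use AspectJ, and the names TimedCallSketch, StatCollector, and measure are hypothetical and not taken from the plug-in. The measure method stands in for the before/after advice pair, and the lambda stands in for the SQL call.

```java
import java.util.function.Supplier;

class TimedCallSketch {

    /** Hypothetical statistical collector: keeps count, total, min, and max. */
    static final class StatCollector {
        private long count;
        private long total;
        private long min = Long.MAX_VALUE;
        private long max = Long.MIN_VALUE;

        synchronized void record(long elapsed) {
            count++;
            total += elapsed;
            min = Math.min(min, elapsed);
            max = Math.max(max, elapsed);
        }

        synchronized long count()   { return count; }
        synchronized long min()     { return count == 0 ? 0 : min; }
        synchronized long max()     { return count == 0 ? 0 : max; }
        synchronized long average() { return count == 0 ? 0 : total / count; }
    }

    /** Plays the role of the before/after advice pair around one call. */
    static <T> T measure(StatCollector stats, Supplier<T> call) {
        long start = System.nanoTime();               // "before" advice: start the clock
        try {
            return call.get();                        // the instrumented call runs
        } finally {
            stats.record(System.nanoTime() - start);  // "after" advice: stop and roll up
        }
    }

    public static void main(String[] args) {
        StatCollector sqlStats = new StatCollector();
        for (int i = 0; i < 3; i++) {
            measure(sqlStats, () -> "row");           // stand-in for a SQL call
        }
        System.out.println("calls=" + sqlStats.count()
                + " avgNanos=" + sqlStats.average());
    }
}
```

Recording in a finally block means a call that throws is still metered, which is one reasonable reading of "the after advice fires"; the plug-in itself may behave differently.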

Java MBean - Many of the monitors and other components of CLM Server Monitoring are exposed as Java MBeans. This enables a multitude of clients to harvest the collected information or to remotely invoke an operation.
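
As a rough illustration of what exposing a component as an MBean involves, the following sketch registers a small standard MBean and then reads an attribute, sets an attribute, and invokes an operation through the standard javax.management API. The DemoMonitor bean, its interface, and the example.monitoring domain are made up for this sketch; they only mimic the Count and Enabled attributes and the reset operation listed for the monitors, and are not the plug-in's own classes.

```java
import javax.management.Attribute;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

class MonitorBeanSketch {

    /** Hypothetical management interface; standard MBean interfaces must be public. */
    public interface DemoMonitorMBean {
        long getCount();                  // read-only attribute "Count"
        boolean isEnabled();              // attribute "Enabled" (getter)
        void setEnabled(boolean enabled); // attribute "Enabled" (setter)
        void reset();                     // operation "reset"
    }

    public static class DemoMonitor implements DemoMonitorMBean {
        private long count = 5;           // pretend five calls were already metered
        private boolean enabled;

        @Override public synchronized long getCount() { return count; }
        @Override public synchronized boolean isEnabled() { return enabled; }
        @Override public synchronized void setEnabled(boolean enabled) { this.enabled = enabled; }
        @Override public synchronized void reset() { count = 0; }
    }

    /** Registers the bean, then acts as a JMX client; returns Count before and after reset. */
    static long[] readResetRead() {
        try {
            MBeanServer server = MBeanServerFactory.newMBeanServer();
            // The domain "example.monitoring" is made up for this sketch.
            ObjectName name = new ObjectName("example.monitoring:type=Demo");
            server.registerMBean(new DemoMonitor(), name);

            long before = (Long) server.getAttribute(name, "Count");
            server.setAttribute(name, new Attribute("Enabled", true));
            server.invoke(name, "reset", new Object[0], new String[0]);
            long after = (Long) server.getAttribute(name, "Count");
            return new long[] { before, after };
        } catch (JMException e) {
            throw new IllegalStateException("JMX call failed", e);
        }
    }

    public static void main(String[] args) {
        long[] counts = readResetRead();
        System.out.println("Count before reset=" + counts[0] + ", after reset=" + counts[1]);
    }
}
```

The sketch uses a private in-process MBean server so that it is self-contained; a real client would talk to the server's JMX endpoint instead.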

Monitors

A monitor meters and tracks activity inside its given domain. A few areas have been identified that could yield useful performance metrics in a running server. All monitors are registered as managed beans (MBeans) in the Java MBean server, which allows them to be managed remotely through a Java interface. Two elements make up a monitor and its MBean:

  • Configuration - Provides the monitor with options that help it meter its domain, plus some light problem detection abilities
  • Counters - A rollup of the activity that has been metered inside its domain. This may differ from monitor to monitor depending on what is being watched. At a minimum, a monitor should be able to provide values such as the maximum, minimum, and average.

Each monitor can break out statistics into separate rollups per unique instance. For example, in a request monitor, a user may want to see statistics for the /scr and /rootservices URLs separately. This is possible by enabling the RollupEnabled property on the monitor. To view the monitor data or to change the monitor attributes, you can use a tool called JConsole. JConsole is a standalone GUI that comes with a JDK. The JMX server must be configured to allow remote connections. See the How-tos section below.

Once the JMX server is running, JConsole can be used to connect and introspect the available beans. This is what the monitoring beans look like through JConsole:

Monitors_Jconsole.png

The data is presented in a tree structure. The nodes can be expanded and traversed to find the attributes and their values as defined for each of the following monitors:

HTTP server request monitor

This monitor tracks incoming HTTP requests to the server.

  • Object Name: type=Request
  • Management interface: com.ibm.team.server.monitoring.management.request.RequestManagedMonitorMBean
  • Attributes:
    Name Type Description
    MinimumRequestSize Long The minimum size of the request payload. This only applies to requests that allow a payload, such as PUT or POST. This is computed using all historical requests while the server is up
    MaximumRequestSize Long The maximum size of the request payload. This only applies to requests that allow a payload, such as PUT or POST. This is computed using all historical requests while the server is up
    AverageRequestSize Long The average request size. This is computed using the total number of requests with a payload
    Count Long The total number of requests
    Duration Long The total time spent handling all requests
    Enabled Boolean Whether or not the monitor should be enabled. This allows this domain to be actively tracked
    MaximumResponseTime Long The longest time it took the server to handle a request
    MinimumResponseTime Long The shortest time it took the server to handle a request
    AverageResponseTime Long The average time it took the server to handle a request
    Name String The name of the monitor
    RollupEnabled Boolean Whether or not to allow rollups to be calculated on a per unique request
    Threshold Long If a request's duration exceeds this value, an issue is raised
    TimeSinceLastReset Long The amount of time that has passed since the counters for this monitor have been reset

  • Operations:
    Name Parameters Description
    dismissAllProblems   Dismisses all problems that are of a request nature
    reset   Resets all possible counters in the above mentioned attribute table

Service monitor

Monitors calls made to 0.6 SDK-style services. These are services that have been declared within the SDK framework.

  • Object Name: type=Service
  • Management interface: com.ibm.team.server.monitoring.management.ResourceMonitorMBean
  • Attributes:
    Name Type Description
    Count Long The total number of service calls
    Duration Long The total time spent handling all service calls
    Enabled Boolean Whether or not the monitor should be enabled. This allows this domain to be actively tracked
    MaximumResponseTime Long The longest time it took the server to handle a service call
    MinimumResponseTime Long The shortest time it took the server to handle a service call
    AverageResponseTime Long The average time it took the server to handle a service call
    Name String The name of the monitor
    RollupEnabled Boolean Whether or not to allow rollups to be calculated on a per unique service call
    Threshold Long If a service call duration exceeds this value, an issue is raised
    TimeSinceLastReset Long The amount of time that has passed since the counters for this monitor have been reset

  • Operations:
    Name Parameters Description
    dismissAllProblems   Dismisses all problems that are of a service nature
    reset   Resets all possible counters in the above mentioned attribute table

Transactional cache

Monitors all activity for the transactional cache subsystem that is included in the 0.6 SDK.

  • Object Name: type=Cache
  • Management interface: com.ibm.team.server.monitoring.management.cache.CacheManagedMonitorMBean
  • Attributes:
    Name Type Description
    AutoUpdated Long The number of times the cache has been autoupdated
    CacheAdded Long The number of times an object has been added to cache
    CacheHit Long The number of times the cache has had a successful hit
    CacheInvalidated Long The number of times the cache has invalidated an entry
    CacheMethodStats Tabular Statistics for the various methods used on the cache API
    CacheMiss Long The number of times the cache had a miss
    Count Long The total number of service calls
    Duration Long The total time spent handling all service calls
    Enabled Boolean Whether or not the monitor should be enabled. This allows this domain to be actively tracked
    MaximumResponseTime Long The longest time it took the server to handle a service call
    MinimumResponseTime Long The shortest time it took the server to handle a service call
    AverageResponseTime Long The average time it took the server to handle a service call
    Name String The name of the monitor
    RollupEnabled Boolean Whether or not to allow rollups to be calculated on a per unique service call
    Threshold Long If a service call duration exceeds this value, an issue is raised
    TimeSinceLastReset Long The amount of time that has passed since the counters for this monitor have been reset

  • Operations:
    Name Parameters Description
    dismissAllProblems   Dismisses all problems that are of a service nature
    reset   Resets all possible counters in the above mentioned attribute table

SPARQL monitor

Monitors SPARQL query execution.

  • Object Name: type=SPARQL
  • Management interface: com.ibm.team.server.monitoring.management.ResourceMonitorMBean
  • Attributes:
    Name Type Description
    Count Long The total number of SPARQL query executions
    Duration Long The total time spent handling all SPARQL query executions
    Enabled Boolean Whether or not the monitor should be enabled. This allows this domain to be actively tracked
    MaximumResponseTime Long The longest time it took the server to handle a SPARQL query execution
    MinimumResponseTime Long The shortest time it took the server to handle a SPARQL query execution
    AverageResponseTime Long The average time it took the server to handle a SPARQL query execution
    Name String The name of the monitor
    RollupEnabled Boolean Whether or not to allow rollups to be calculated on a per unique SPARQL query
    Threshold Long If a SPARQL query execution duration exceeds this value, an issue is raised
    TimeSinceLastReset Long The amount of time that has passed since the counters for this monitor have been reset

  • Operations:
    Name Parameters Description
    dismissAllProblems   Dismisses all problems that are of a SPARQL query nature
    reset   Resets all possible counters in the above mentioned attribute table

SQL monitor

Monitors all SQL statement activity.

  • Object Name: type=DB Statement
  • Management interface: com.ibm.team.server.monitoring.management.ResourceMonitorMBean
  • Attributes:
    Name Type Description
    Count Long The total number of SQL statement executions
    Duration Long The total time spent handling all SQL statement executions
    Enabled Boolean Whether or not the monitor should be enabled. This allows this domain to be actively tracked
    MaximumResponseTime Long The longest time it took the server to handle a SQL statement execution
    MinimumResponseTime Long The shortest time it took the server to handle a SQL statement execution
    AverageResponseTime Long The average time it took the server to handle a SQL statement execution
    Name String The name of the monitor
    RollupEnabled Boolean Whether or not to allow rollups to be calculated on a per unique SQL query statement
    Threshold Long If a SQL query execution duration exceeds this value, an issue is raised
    TimeSinceLastReset Long The amount of time that has passed since the counters for this monitor have been reset

  • Operations:
    Name Parameters Description
    dismissAllProblems   Dismisses all problems that are of a SQL query statement nature
    reset   Resets all possible counters in the above mentioned attribute table

HTTP client request

Monitors calls made from an Apache HTTP client. Usually these calls are for server-to-server communication.

  • Object Name: type=HTTP Client Request
  • Management interface: com.ibm.team.server.monitoring.management.ResourceMonitorMBean
  • Attributes:
    Name Type Description
    Count Long The total number of HTTP client requests
    Duration Long The total time spent handling all HTTP client requests
    Enabled Boolean Whether or not the monitor should be enabled. This allows this domain to be actively tracked
    MaximumResponseTime Long The longest time it took the server to handle an HTTP client request
    MinimumResponseTime Long The shortest time it took the server to handle an HTTP client request
    AverageResponseTime Long The average time it took the server to handle an HTTP client request
    Name String The name of the monitor
    RollupEnabled Boolean Whether or not to allow rollups to be calculated on a per unique HTTP client request URL
    Threshold Long If an HTTP client request duration exceeds this value, an issue is raised
    TimeSinceLastReset Long The amount of time that has passed since the counters for this monitor have been reset

  • Operations:
    Name Parameters Description
    dismissAllProblems   Dismisses all problems that are of an HTTP client request nature
    reset   Resets all possible counters in the above mentioned attribute table

Monitoring configuration

This MBean contains configuration information for the entire CLM Server Monitoring feature.

  • Object Name: type=Monitoring,name=Configuration
  • Management interface: com.ibm.team.server.monitoring.jmx.beans.SmarterServerConfigurationMBean
  • Attributes:
    Name Type Description
    AllMonitorsEnabled Boolean Acts as a one-stop monitor enablement. Setting this to true turns on all monitors, regardless of whether an individual monitor is disabled
    CacheDiskStoreDirectory String A directory to store cached monitoring artifacts
    CacheMaxBytesLocalDisk String The maximum amount of disk space allocated for the cache
    CacheMaxEntriesLocalHeap Long The maximum number of objects to keep in the Java heap
    CachingEnabled Boolean Whether or not to enable caching (never used; should be deprecated)
    CallTreeBufferSize Long The amount of heap to allocate per call tree buffer. This may not be necessary and should be set very low. The buffer is variable, but starts out with this initial value. Making it too big could use too much memory
    CallTreeCollectionEnabled Boolean Whether or not to collect a call tree. Not supported in production environments
    DiagnosticsEnabled Boolean Exposes server side diagnostics that come with a Jazz server base
    MaxProblemsToKeep Long The total number of problems to keep in memory before they get scrubbed
    ProblemCacheMaxBytesLocalDisk Long The max number of bytes to allocate on disk for the problem cache
    ProblemCacheMaxEntriesLocalHeap Long Max number of problems to keep on JVM heap
    ProblemScrubberInterval Long The interval to run the problem scrubber to remove excessive problems that go over the MaxProblemsToKeep threshold
    ProblemScrubberStrategy String Can either be lifo or fifo. This determines which problems to get rid of first when scrubbing
    RegisterActivityWithSession Boolean Associates a session object with an activity. The session bean will be registered on its own and associates itself with an HttpSession. Not supported in production environments

CLM Server Monitoring rules

Rules are entered differently depending on the version: 4.0.6 behaves differently from 5.0 and later.

CLM Server Monitoring rules 5.0+

The issue rules are defined in an XML document and sent to the server to be installed into the rules engine. This gives the engine a way to evaluate resources as they are executing. A rule consists of the target resource information (type and name) followed by one or more action elements. The action elements contain the logic that tests or evaluates the resource. See the Adding/Removing Rule section.

Rule Schema

<?xml version="1.0" encoding="UTF-8" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="ruleSet">
   <xsd:complexType>
      <xsd:sequence>
         <xsd:element name="rule" maxOccurs="unbounded" minOccurs="0">
            <xsd:complexType> 
                   <xsd:sequence>
                     <xsd:element name="action" maxOccurs="unbounded">
                        <xsd:complexType>                             
                           <xsd:choice>
                              <xsd:element name="and" maxOccurs="1" minOccurs="0" type="logicType"/>
                              <xsd:element name="or" maxOccurs="1" minOccurs="0" type="logicType"/>
                              <xsd:element name="not" maxOccurs="1" minOccurs="0" type="logicType"/>
                              <xsd:element name="test" maxOccurs="1" minOccurs="0" type="Test"/>
                           </xsd:choice>
                           <xsd:attribute name="name" use="required" type="xsd:string"/>
                        </xsd:complexType>
                     </xsd:element>
                   </xsd:sequence>
                   <xsd:attribute name="resourceType" type="xsd:string" use="required"/>
                   <xsd:attribute name="resourceName" type="xsd:string" use="optional"/>
                   <xsd:attribute name="id" type="xsd:string" use="optional"/>
            </xsd:complexType>
         </xsd:element>
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>
<xsd:complexType name="Test">
   <xsd:attribute name="operator" type="opTypes" use="required"/>
   <xsd:attribute name="attributePath" type="xsd:string" use="required"/>
   <xsd:attribute name="value" type="xsd:string" use="required"/>
</xsd:complexType>
<xsd:simpleType name="opTypes">
    <xsd:restriction base="xsd:string">
        <xsd:enumeration value="EQ" />
        <xsd:enumeration value="LT" />
        <xsd:enumeration value="GT" />
        <xsd:enumeration value="IN" />
        <xsd:enumeration value="NEQ" />
        <xsd:enumeration value="STARTSWITH" />
        <xsd:enumeration value="CONTAINS" />
    </xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="logicType">
   <xsd:sequence>
      <xsd:element name="not" maxOccurs="unbounded" minOccurs="0" type="logicType"/>
      <xsd:element name="and" maxOccurs="unbounded" minOccurs="0" type="logicType"/>
      <xsd:element name="or" maxOccurs="unbounded" minOccurs="0" type="logicType"/>
      <xsd:element name="test" maxOccurs="unbounded" minOccurs="0" type="Test"/>
   </xsd:sequence>
</xsd:complexType>
</xsd:schema> 

Examples

Example simple rule:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ruleSet>
    <rule resourceType="request" resourceName="/scr">
        <action name="problem">
            <test operator="GT" value="10000" attributePath="duration"/>
        </action>
    </rule>
</ruleSet>

Example or condition rule:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ruleSet>
    <rule resourceType="request" resourceName="/scr">
        <action name="problem">
             <or>
                <test operator="EQ" value="someValue" attributePath="someAttribute"/>
                <test operator="GT" value="10000" attributePath="duration"/>
             </or>
        </action>
    </rule> 
</ruleSet> 

Example and condition rule:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ruleSet>
    <rule resourceType="request" resourceName="/scr">
        <action name="problem">
             <and>
                <test operator="EQ" value="someValue" attributePath="someAttribute"/>
                <test operator="GT" value="10000" attributePath="duration"/>
             </and>
        </action>
    </rule> 
</ruleSet> 

Example mixed group condition and ungrouped tests:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ruleSet>
    <rule resourceType="request" resourceName="/scr">
        <action name="problem">
             <or>
                <test operator="EQ" value="someValue" attributePath="someAttribute"/>
                <test operator="GT" value="10000" attributePath="duration"/>
             </or>
             <test operator="EQ" value="foo" attributePath="another.attribute"/>
        </action>
    </rule> 
</ruleSet> 

Example nested conditions:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ruleSet>
    <rule resourceType="service">
        <action name="problem">
            <or>
                <and>
                    <not>
                        <test attributePath="name" value="com.ibm.team.repository.service.diagnostics.database.internal" operator="STARTSWITH"/>
                    </not>
                    <test attributePath="duration" value="10000" operator="GT"/>
                </and>
                <test attributePath="duration" value="70000" operator="GT"/>
            </or>
        </action>
        <action name="javacore">
            <test attributePath="duration" value="100000" operator="GT"/>
        </action>
    </rule>
</ruleSet>
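
To make the nested-conditions example easier to follow, here is a minimal sketch of how such a rule set could be evaluated against one resource. It is not the server's rules engine. Several points are assumptions made only for illustration: the class and method names are hypothetical, multiple children of one action and of a not element are ANDed together, a missing attribute never matches, resourceName is ignored, and the IN operator is left unimplemented because its semantics are not described here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class RuleEvalSketch {

    /** Names of the actions whose condition holds for a resource of the given type. */
    static List<String> firedActions(String ruleXml, String resourceType, Map<String, String> attrs) {
        List<String> fired = new ArrayList<>();
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(ruleXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            for (Element rule : children(root)) {
                if (!rule.getAttribute("resourceType").equals(resourceType)) continue;
                for (Element action : children(rule)) {
                    // Assumption: multiple children of one action are ANDed together.
                    if (all(action, attrs)) fired.add(action.getAttribute("name"));
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse rule set", e);
        }
        return fired;
    }

    private static boolean eval(Element e, Map<String, String> attrs) {
        switch (e.getTagName()) {
            case "and":  return all(e, attrs);
            case "or":   return any(e, attrs);
            case "not":  return !all(e, attrs);   // assumption: negates the AND of its children
            case "test": return test(e, attrs);
            default: throw new IllegalArgumentException("Unexpected element: " + e.getTagName());
        }
    }

    private static boolean test(Element t, Map<String, String> attrs) {
        String actual = attrs.get(t.getAttribute("attributePath"));
        String expected = t.getAttribute("value");
        if (actual == null) return false;         // assumption: a missing attribute never matches
        switch (t.getAttribute("operator")) {
            case "EQ":         return actual.equals(expected);
            case "NEQ":        return !actual.equals(expected);
            case "GT":         return Long.parseLong(actual) > Long.parseLong(expected);
            case "LT":         return Long.parseLong(actual) < Long.parseLong(expected);
            case "STARTSWITH": return actual.startsWith(expected);
            case "CONTAINS":   return actual.contains(expected);
            default: throw new UnsupportedOperationException(t.getAttribute("operator"));
        }
    }

    private static boolean all(Element parent, Map<String, String> attrs) {
        return children(parent).stream().allMatch(child -> eval(child, attrs));
    }

    private static boolean any(Element parent, Map<String, String> attrs) {
        return children(parent).stream().anyMatch(child -> eval(child, attrs));
    }

    private static List<Element> children(Element parent) {
        List<Element> out = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i) instanceof Element) out.add((Element) nodes.item(i));
        }
        return out;
    }

    /** Runs the nested-conditions rule against one hypothetical service call. */
    static List<String> demo(String serviceName, long durationMs) {
        String xml = "<ruleSet><rule resourceType=\"service\">"
                + "<action name=\"problem\"><or><and>"
                + "<not><test attributePath=\"name\" operator=\"STARTSWITH\""
                + " value=\"com.ibm.team.repository.service.diagnostics.database.internal\"/></not>"
                + "<test attributePath=\"duration\" value=\"10000\" operator=\"GT\"/>"
                + "</and><test attributePath=\"duration\" value=\"70000\" operator=\"GT\"/></or></action>"
                + "<action name=\"javacore\">"
                + "<test attributePath=\"duration\" value=\"100000\" operator=\"GT\"/></action>"
                + "</rule></ruleSet>";
        Map<String, String> attrs = new HashMap<>();
        attrs.put("name", serviceName);
        attrs.put("duration", Long.toString(durationMs));
        return firedActions(xml, "service", attrs);
    }

    public static void main(String[] args) {
        System.out.println(demo("com.example.FooService", 20000)); // prints [problem]
    }
}
```

Read this way, the rule flags any non-diagnostics service call longer than 10000, flags any service call longer than 70000, and additionally triggers the javacore action above 100000.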

Adding/Removing Rule

There are two ways to add or remove rules:

  1. Edit the issueRules.xml file - Provisional
  2. Use the ProblemRulesAdminMBean:

  • Object Name: type=Monitoring,name=Rules,rulesType=Issue
  • Management interface: com.ibm.team.server.monitoring.jmx.beans.ProblemRulesAdminMBean
  • Attributes: None

  • Operations:
    Name Parameters Description
    installRuleSet String ruleXml, boolean persist The rule set will be installed and replace the current one if it exists. The persist parameter is used to write to the issueRules.xml file
    listRules String List of rules running in server
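
The call pattern for the operations above can be sketched with the standard JMX invoke API. To stay self-contained, the sketch registers a stub bean in a private in-process MBean server; the stub, its interface name, and the example.monitoring domain are hypothetical (only the key properties of the object name are listed in this section), and treating listRules as returning a String is an assumption. A remote client would obtain an MBeanServerConnection, for example through javax.management.remote.JMXConnectorFactory, and call invoke in the same way.

```java
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

class InstallRuleSetSketch {

    /** Stub with the two operations from the table; not the real management interface. */
    public interface StubRulesAdminMBean {
        void installRuleSet(String ruleXml, boolean persist);
        String listRules(); // assumption: the rules come back as one String
    }

    public static class StubRulesAdmin implements StubRulesAdminMBean {
        private String current = "";

        @Override public synchronized void installRuleSet(String ruleXml, boolean persist) {
            current = ruleXml; // a newly installed rule set replaces the current one
        }

        @Override public synchronized String listRules() { return current; }
    }

    /** Installs a rule set through JMX and returns what listRules reports afterwards. */
    static String installAndList(String ruleXml) {
        try {
            MBeanServer server = MBeanServerFactory.newMBeanServer();
            // The domain "example.monitoring" is hypothetical.
            ObjectName name = new ObjectName(
                    "example.monitoring:type=Monitoring,name=Rules,rulesType=Issue");
            server.registerMBean(new StubRulesAdmin(), name);

            // The signature array names each parameter type: String and primitive boolean.
            server.invoke(name, "installRuleSet",
                    new Object[] { ruleXml, Boolean.FALSE },
                    new String[] { String.class.getName(), boolean.class.getName() });
            return (String) server.invoke(name, "listRules", new Object[0], new String[0]);
        } catch (JMException e) {
            throw new IllegalStateException("JMX call failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ruleSet><rule resourceType=\"request\"><action name=\"problem\">"
                + "<test operator=\"GT\" value=\"10000\" attributePath=\"duration\"/>"
                + "</action></rule></ruleSet>";
        System.out.println(installAndList(xml));
    }
}
```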

CLM Server Monitoring rules 4.0.6

CLM Server Monitoring needs a more robust way to identify performance issues. In 4.0.5, each monitor had its own Threshold attribute, a one-size-fits-all value that covered every resource the monitor was watching. For example, the request monitor can have a threshold value of 1000ms, which means an issue was raised for any request with a duration > 1000ms. This does not scale well for any of the monitors, because some operations are expected to take longer than others and flagging them as issues is premature. In 4.0.6, problem detection rules were put into place. This topic describes what they are and how to use them.

Currently, in 4.0.6, the rules are properties-based, and are stored in the same smarterserver.properties file as all other properties. This is provisional and is likely to change. See the Adding or removing rules section.

Syntax

[context/namespace].[groupName: optional].[resource type].[attribute].[operator]=[value]

context/namespace - To avoid future collisions with other possible rule types, this namespace uniquely identifies the rule type.
groupName - The name of the group this rule belongs to.
resource type - The type of resource to test. There are a few in the system. Some examples are Request, Service, Http_Client_Request, and Cache.
attribute - The name of the attribute to test.
operator - Equality or relational operator. The only possible comparisons are <, >, and =. The examples in this section write > as gt and = as eq.

A few basic rules about this syntax:

  1. Any rule missing a group name will be grouped into the DEFAULT group
  2. A group combines its rules using the logical AND operator. This means all rules in the group must evaluate to true
  3. Multiple groups are combined using the logical OR operator. This means only one of the groups has to evaluate to true for the compound statement to evaluate to true
  4. The monitor already has a built-in duration threshold for the resource. It is always included as a rule in the default group. Using a rule such as the following does not make much sense, but is acceptable:

problem.service.duration.gt=10000

The threshold on the monitor should be used instead.

Rule groups

Rules can be grouped together with AND semantics: all rules in the group must evaluate to true for the group to evaluate to true. Groups let you evaluate items under different conditions and give users finer-grained control over problem detection. All that is needed in addition to the simple syntax is a group name.

An example to set the threshold for a specific service (FooService):

problem.mygroup.service.duration.gt=20000
problem.mygroup.service.servicename.eq=FooService

This rule states that a problem will be flagged on a service named FooService if the call took > 20000ms. This supersedes the default threshold defined by the monitor: even if the monitor has an overall threshold of, say, 10000ms, it will not apply here.
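
The grouping semantics (AND inside a group, OR across groups) can be sketched as follows. This is an illustration only, not the plug-in's evaluator: the class and method names are hypothetical, the monitor's own threshold is not modeled, and the lt spelling for the < operator is an assumption, since only gt and eq appear in the examples.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

class GroupedRulesSketch {

    /** True when at least one group has all of its rules satisfied (AND inside, OR across). */
    static boolean isProblem(Properties rules, String resourceType, Map<String, String> attrs) {
        Map<String, Boolean> groupResults = new HashMap<>();
        for (String key : rules.stringPropertyNames()) {
            String[] parts = key.split("\\.");
            if ((parts.length != 4 && parts.length != 5) || !parts[0].equals("problem")) continue;
            String group = parts.length == 5 ? parts[1] : "DEFAULT"; // no group name: DEFAULT
            int typeIndex = parts.length - 3;
            if (!parts[typeIndex].equalsIgnoreCase(resourceType)) continue;
            boolean passed = test(attrs.get(parts[typeIndex + 1]), parts[typeIndex + 2],
                    rules.getProperty(key));
            groupResults.merge(group, passed, Boolean::logicalAnd);  // AND within the group
        }
        return groupResults.containsValue(Boolean.TRUE);             // OR across groups
    }

    private static boolean test(String actual, String operator, String expected) {
        if (actual == null) return false; // assumption: a missing attribute never matches
        switch (operator) {
            case "eq": return actual.equals(expected);
            case "gt": return Long.parseLong(actual) > Long.parseLong(expected);
            case "lt": return Long.parseLong(actual) < Long.parseLong(expected); // assumed spelling
            default: throw new IllegalArgumentException("Unknown operator: " + operator);
        }
    }

    /** Evaluates the FooService example for one hypothetical service call. */
    static boolean demo(String serviceName, long durationMs) {
        Properties rules = new Properties();
        try {
            rules.load(new StringReader(
                    "problem.mygroup.service.duration.gt=20000\n"
                    + "problem.mygroup.service.servicename.eq=FooService\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> attrs = new HashMap<>();
        attrs.put("servicename", serviceName);
        attrs.put("duration", Long.toString(durationMs));
        return isProblem(rules, "service", attrs);
    }

    public static void main(String[] args) {
        System.out.println(demo("FooService", 25000)); // true: both rules in mygroup hold
        System.out.println(demo("FooService", 15000)); // false: duration rule fails
    }
}
```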

Adding or removing rules

There are two ways to add or remove rules:

  1. Edit the smarterserver.properties - Provisional
  2. Use the ProblemRulesAdminMBean:

  • Object Name: type=Monitoring,name=Rules,rulesType=Issue
  • Management interface: com.ibm.team.server.monitoring.jmx.beans.ProblemRulesAdminMBean
  • Attributes: None

  • Operations:
    Name Parameters Description
    addRule String key, String value, boolean persist This rule will be dynamically installed and effective immediately
    removeRule String key Removes a rule if it exists
    listRules String List of rules running in server

Multivalue rules

Sometimes multiple values are valid for a particular attribute. Using the service resource as an example, you may want to flag a problem on both FooService and BarService if the duration is > 30000ms. The properties can be defined as follows:

problem.mygroup.service.duration.gt=30000
problem.mygroup.service.servicename.eq=FooService,BarService

The value for the attribute key must be a comma-separated string. Under the covers, the service names are grouped into an OR expression, and an overall AND expression combines the duration test with that OR expression.
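
The resulting expression, duration > threshold AND (name = FooService OR name = BarService), can be written out as a short sketch. The class and method names are hypothetical, and trimming whitespace around each comma-separated entry is an assumption.

```java
import java.util.Arrays;

class MultiValueSketch {

    /** The OR expression: true when actual equals any entry of the comma-separated list. */
    static boolean equalsAny(String actual, String commaSeparated) {
        return Arrays.stream(commaSeparated.split(","))
                .map(String::trim)             // assumption: surrounding spaces are ignored
                .anyMatch(actual::equals);
    }

    /** The overall AND expression: the duration test combined with the OR over names. */
    static boolean isProblem(String serviceName, long durationMs, long thresholdMs, String names) {
        return durationMs > thresholdMs && equalsAny(serviceName, names);
    }

    public static void main(String[] args) {
        String names = "FooService,BarService";
        System.out.println(isProblem("BarService", 31000, 30000, names)); // true
        System.out.println(isProblem("BazService", 31000, 30000, names)); // false
    }
}
```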

Actions

Sometimes an action must be taken when certain conditions are met. The problem rule vocabulary has been expanded to accommodate this. It is used in conjunction with an attribute value test.

Take for example a test on a request duration for > 2000ms:

problem.group1.request.duration.gt=2000

This makes any request that takes longer than 2000ms a problem. It may be desirable to perform an action when the duration is, say, > 10000ms. To do this, the original duration test must be extended. Using the example above, the new rule definition looks like:

problem.group1.request.duration.gt=2000;action.gt=10000:<some action>

In plain English, this says "flag the request as a problem if it exceeds a 2000ms threshold, and additionally take some action when the duration has exceeded a 10000ms threshold."

The number of actions available is limited. Currently, only three actions are available:

  • heap - Performs a Java heap dump
  • core - Performs a core dump
  • javacore - Performs a Java core dump

To force a core dump when a request exceeds 10000ms, you could write the following rule:

problem.group1.request.duration.gt=2000;action.gt=10000:core
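
How the extended value breaks down can be sketched as follows. This is a hypothetical parser for illustration only: it assumes the rule key uses the gt operator, handles only an action.gt clause, and reports the outcome as a string instead of performing any dump.

```java
class ActionRuleSketch {

    /** Returns "ok", "problem", or "problem+<action>" for a rule value and a duration. */
    static String outcome(String ruleValue, long durationMs) {
        String[] parts = ruleValue.split(";", 2);
        long problemThreshold = Long.parseLong(parts[0].trim());
        if (durationMs <= problemThreshold) {
            return "ok";                       // below the problem threshold: nothing to flag
        }
        if (parts.length == 2) {
            // Expected shape of the extension: action.gt=<threshold>:<action name>
            String[] keyAndRest = parts[1].split("=", 2);
            String[] thresholdAndAction =
                    keyAndRest.length == 2 ? keyAndRest[1].split(":", 2) : new String[0];
            if (!keyAndRest[0].trim().equals("action.gt") || thresholdAndAction.length != 2) {
                throw new IllegalArgumentException("Unsupported action clause: " + parts[1]);
            }
            if (durationMs > Long.parseLong(thresholdAndAction[0].trim())) {
                return "problem+" + thresholdAndAction[1].trim();
            }
        }
        return "problem";                      // flagged, but the action threshold was not met
    }

    public static void main(String[] args) {
        String value = "2000;action.gt=10000:core";
        System.out.println(outcome(value, 1500));  // ok
        System.out.println(outcome(value, 5000));  // problem
        System.out.println(outcome(value, 12000)); // problem+core
    }
}
```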

CLM Server Monitoring plug-in: How-tos

Section under construction
This section links to information that is specific to components of the CLM Server Monitoring plug-in. On the pages, you will find specific installation, configuration, and usage information. For an overview of the CLM Server Monitoring architecture and components, see the information in the CLM server monitoring architecture section.

CLM Server Monitoring data dictionary

The CLM Server Monitoring plug-in exposes performance monitoring information and management operations via its JMX interface. A data dictionary describes the JMX interface in detail, including how the generated JMX objects relate to each other and to the resource monitoring domains. The CLM Server Monitoring data dictionary wiki page contains this detailed information.

Related topics: None

External links:

  • None

Additional contributors: None
