r1 - 2015-04-28 - 15:14:51 - GeraldMitchell

CLM Server Monitoring Rules Engine

Authors: MichaelTabb JorgeAlbertoDiaz BorisKuschel GeraldMitchell
Build basis: Collaborative Lifecycle Management 4.0.6 and higher

Overview

The CLM Server Monitoring component of the Jazz Foundation Server is an Aspect Oriented Programming (AOP) based piece of software that enables instrumentation of existing foundation server code. Its goal is to identify performance problems or hot spots in a running server, which helps customers, support, and developers alike. The server-side plug-in is bundled and shipped with the foundation server. This section provides the overall design of, and guidance for, the CLM Server Monitoring plug-in.

How rules are entered depends on the version: CLM 4.0.6 and 4.0.7 use a different mechanism than CLM 5.0 and later.

  • There is a basic threshold mechanism that can be used for basic operation of a timed event occurrence.
  • In CLM 4.0.6 and 4.0.7, a textual rule set was used, which was restrictive in capability.
  • In CLM 5.0.0 and later, an XML rule set is used, which allows for more complex and flexible rules.

For CLM 5.0.0 and later, the XML rule set is recommended for best results.

Basic Rule - Threshold, Enabled

Each monitor has its own Threshold attribute, a one-size-fits-all value that applies to every resource the monitor watches.

For example, the request monitor can have a threshold value of 1000ms. This means that an issue is raised for any request with a duration > 1000ms.

This does not scale very well, because some operations are expected to take longer than others (this applies to every monitor). In these cases rules are needed, because flagging an issue on a general threshold is premature.

It is best to use the threshold only as a broad check that monitoring is enabled, and to set it to a very large value so that it shows which operations may need additional specialized rules.

Each monitor can also be enabled or disabled.
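
As a rough illustration of the threshold and enabled settings, the following Python sketch models a single monitor. The Monitor class and its field names are hypothetical and are not part of the plug-in. Treating a disabled monitor as one that never raises an issue is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Monitor:
    """Hypothetical stand-in for a monitor; not the plug-in's real class."""
    name: str
    threshold_ms: int
    enabled: bool = True

    def raises_issue(self, duration_ms: int) -> bool:
        # Assumption: a disabled monitor never raises an issue.
        if not self.enabled:
            return False
        # The same threshold applies to every resource the monitor watches.
        return duration_ms > self.threshold_ms


request_monitor = Monitor(name="request", threshold_ms=1000)
print(request_monitor.raises_issue(1500))  # True: 1500 > 1000
print(request_monitor.raises_issue(1000))  # False: not strictly greater
```

Because the comparison is the same for every resource, a long-running but healthy operation is flagged just like a genuinely slow one, which is the gap the rules are meant to close.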

CLM Server Monitoring Rules (CLM 5.0+)

The issue rules are defined in an XML document that is sent to the server and installed into the rules engine. This gives the engine a way to evaluate resources as they execute. A rule consists of the target resource information (type and name) followed by one or more action elements. The action elements contain the logic that tests or evaluates the resource. See the Adding/Removing Rules section.

Rule Schema

<?xml version="1.0" encoding="UTF-8" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="ruleSet">
   <xsd:complexType>
      <xsd:sequence>
         <xsd:element name="rule" maxOccurs="unbounded" minOccurs="0">
            <xsd:complexType>
               <xsd:sequence>
                  <xsd:element name="action" maxOccurs="unbounded">
                     <xsd:complexType>
                        <xsd:choice>
                           <xsd:element name="and" maxOccurs="1" minOccurs="0" type="logicType"/>
                           <xsd:element name="or" maxOccurs="1" minOccurs="0" type="logicType"/>
                           <xsd:element name="not" maxOccurs="1" minOccurs="0" type="logicType"/>
                           <xsd:element name="test" maxOccurs="1" minOccurs="0" type="Test"/>
                        </xsd:choice>
                        <xsd:attribute name="name" use="required" type="xsd:string"/>
                     </xsd:complexType>
                  </xsd:element>
               </xsd:sequence>
               <xsd:attribute name="resourceType" type="xsd:string" use="required"/>
               <xsd:attribute name="resourceName" type="xsd:string" use="optional"/>
               <xsd:attribute name="id" type="xsd:string" use="optional"/>
            </xsd:complexType>
         </xsd:element>
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>
<xsd:complexType name="Test">
   <xsd:attribute name="operator" type="opTypes" use="required"/>
   <xsd:attribute name="attributePath" type="xsd:string" use="required"/>
   <xsd:attribute name="value" type="xsd:string" use="required"/>
</xsd:complexType>
<xsd:simpleType name="opTypes">
    <xsd:restriction base="xsd:string">
        <xsd:enumeration value="EQ" />
        <xsd:enumeration value="LT" />
        <xsd:enumeration value="GT" />
        <xsd:enumeration value="IN" />
        <xsd:enumeration value="NEQ" />
        <xsd:enumeration value="STARTSWITH" />
        <xsd:enumeration value="CONTAINS" />
    </xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="logicType">
   <xsd:sequence>
      <xsd:element name="not" maxOccurs="unbounded" minOccurs="0" type="logicType"/>
      <xsd:element name="and" maxOccurs="unbounded" minOccurs="0" type="logicType"/>
      <xsd:element name="or" maxOccurs="unbounded" minOccurs="0" type="logicType"/>
      <xsd:element name="test" maxOccurs="unbounded" minOccurs="0" type="Test"/>
   </xsd:sequence>
</xsd:complexType>
</xsd:schema> 

Examples

Example simple rule:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ruleSet>
    <rule resourceType="request" resourceName="/scr">
        <action name="problem">
               <test operator="GT" value="10000" attributePath="duration"/>
        </action>
    </rule>
</ruleSet>

Example "or" condition rule:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ruleSet>
    <rule resourceType="request" resourceName="/scr">
        <action name="problem">
             <or>
                <test operator="EQ" value="someValue" attributePath="someAttribute"/>
                <test operator="GT" value="10000" attributePath="duration"/>
             </or>
        </action>
    </rule> 
</ruleSet> 

Example "and" condition rule:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ruleSet>
    <rule resourceType="request" resourceName="/scr">
        <action name="problem">
             <and>
                <test operator="EQ" value="someValue" attributePath="someAttribute"/>
                <test operator="GT" value="10000" attributePath="duration"/>
             </and>
        </action>
    </rule> 
</ruleSet> 

Example mixed grouped condition and ungrouped test:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ruleSet>
    <rule resourceType="request" resourceName="/scr">
        <action name="problem">
             <or>
                <test operator="EQ" value="someValue" attributePath="someAttribute"/>
                <test operator="GT" value="10000" attributePath="duration"/>
             </or>
             <test operator="EQ" value="foo" attributePath="another.attribute"/>
        </action>
    </rule> 
</ruleSet> 

Example nested conditions:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ruleSet>
    <rule resourceType="service">
        <action name="problem">
            <or>
                <and>
                    <not>
                        <test attributePath="name" value="com.ibm.team.repository.service.diagnostics.database.internal" operator="STARTSWITH"/>
                    </not>
                    <test attributePath="duration" value="10000" operator="GT"/>
                </and>
                <test attributePath="duration" value="70000" operator="GT"/>
            </or>
        </action>
        <action name="javacore">
            <test attributePath="duration" value="100000" operator="GT"/>
        </action>
    </rule>
</ruleSet>
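
To make the nesting concrete, here is a minimal Python sketch of how such a rule set could be evaluated against one resource. It is not the rules engine's implementation. The operator semantics, the handling of several children under not or action, and the sample resource dictionaries are all assumptions, and resourceName matching is ignored. The IN operator is left out because this page does not describe the format of its value.

```python
import xml.etree.ElementTree as ET

RULES_XML = """<ruleSet>
    <rule resourceType="service">
        <action name="problem">
            <or>
                <and>
                    <not>
                        <test attributePath="name" value="com.ibm.team.repository.service.diagnostics.database.internal" operator="STARTSWITH"/>
                    </not>
                    <test attributePath="duration" value="10000" operator="GT"/>
                </and>
                <test attributePath="duration" value="70000" operator="GT"/>
            </or>
        </action>
        <action name="javacore">
            <test attributePath="duration" value="100000" operator="GT"/>
        </action>
    </rule>
</ruleSet>"""

# Assumed operator semantics: numeric for GT/LT, string-based otherwise.
OPERATORS = {
    "EQ": lambda actual, expected: str(actual) == expected,
    "NEQ": lambda actual, expected: str(actual) != expected,
    "GT": lambda actual, expected: float(actual) > float(expected),
    "LT": lambda actual, expected: float(actual) < float(expected),
    "STARTSWITH": lambda actual, expected: str(actual).startswith(expected),
    "CONTAINS": lambda actual, expected: expected in str(actual),
}


def evaluate(node, resource):
    """Recursively evaluate a test/and/or/not element against a resource dict."""
    if node.tag == "test":
        actual = resource[node.get("attributePath")]
        return OPERATORS[node.get("operator")](actual, node.get("value"))
    results = [evaluate(child, resource) for child in node]
    if node.tag == "and":
        return all(results)
    if node.tag == "or":
        return any(results)
    if node.tag == "not":
        # Assumption: <not> negates the conjunction of its children.
        return not all(results)
    raise ValueError(f"unexpected element: {node.tag}")


def triggered_actions(rules_xml, resource_type, resource):
    """Return the names of the actions whose condition holds for the resource."""
    names = []
    for rule in ET.fromstring(rules_xml).iter("rule"):
        if rule.get("resourceType") != resource_type:
            continue
        for action in rule.findall("action"):
            # Assumption: several children directly under <action> are ANDed.
            if all(evaluate(child, resource) for child in action):
                names.append(action.get("name"))
    return names


# Hypothetical service resource, 15 seconds long.
print(triggered_actions(RULES_XML, "service",
                        {"name": "com.example.FooService", "duration": 15000}))
```

Under these assumptions, a 15-second call to a non-diagnostics service triggers only the problem action, while a diagnostics database service has to exceed 70000ms before it is flagged.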

Adding/Removing Rules

There are two ways to add or remove rules:

  1. Edit the issueRules.xml file - Provisional
  2. Use the SmarterServerConfigurationMBean:

  • Object Name: type=Monitoring,name=Rules,rulesType=Issue
  • Management interface: com.ibm.team.server.monitoring.jmx.beans.ProblemRulesAdminMBean
  • Attributes: None

  • Operations:
    | Name | Parameters | Description |
    | installRuleSet | String ruleXml, boolean persist | Installs the rule set, replacing the current one if it exists. The persist parameter controls whether the rule set is written to the issueRules.xml file. |
    | listRules | String | List of rules running in the server |
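
The ruleXml parameter of installRuleSet is a plain string, so it can be generated rather than written by hand. The following Python sketch builds a one-rule, one-action rule set. The build_rule_set helper is hypothetical, and how the resulting string is passed to the MBean (for example through a JMX console) is outside this sketch. Wrapping several tests in an and element reflects the schema above, whose xsd:choice appears to allow a single child element per action.

```python
import xml.etree.ElementTree as ET


def build_rule_set(resource_type, resource_name, action_name, tests):
    """Build a one-rule, one-action rule set and return it as an XML string.

    `tests` is a list of (attributePath, operator, value) tuples. More than
    one test is wrapped in <and>.
    """
    rule_set = ET.Element("ruleSet")
    rule = ET.SubElement(rule_set, "rule", resourceType=resource_type)
    if resource_name is not None:
        # resourceName is optional in the schema.
        rule.set("resourceName", resource_name)
    action = ET.SubElement(rule, "action", name=action_name)
    parent = action if len(tests) == 1 else ET.SubElement(action, "and")
    for attribute_path, operator, value in tests:
        ET.SubElement(parent, "test", operator=operator,
                      attributePath=attribute_path, value=str(value))
    return ET.tostring(rule_set, encoding="unicode")


# Equivalent in content to the simple rule example on this page.
rule_xml = build_rule_set("request", "/scr", "problem",
                          [("duration", "GT", 10000)])
print(rule_xml)
```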

Legacy Textual CLM Server Monitoring Rules (CLM 4.0.6 and 4.0.7) Reference

NOTE: This section is kept for reference only, for those on CLM 4.0.6 or 4.0.7. Use the XML rules for CLM 5.0.0 and higher.

CLM Server Monitoring needs a more robust way to identify performance issues than a single threshold. In CLM 4.0.6, problem detection rules were introduced. These rules are properties-based and are stored in the same smarterserver.properties file as all other properties. See the Adding or removing rules section.

Syntax

[context/namespace].[groupName: optional].[resource type].[attribute].[operator]=[value]

context/namespace - To avoid future collisions with other possible rule types, this namespace uniquely identifies the rule type.
groupName - The name of the group this rule belongs to.
resource type - The type of resource to test. There are several in the system, for example Request, Service, Http_Client_Request, and Cache.
attribute - The name of the attribute to test.
operator - Equality or relational operator. The only supported comparisons are <, >, and =. The examples on this page spell them as keywords in the key (gt for >, eq for =).

A few basic rules about this syntax:

  1. Any rule missing a group name will be grouped into the DEFAULT group
  2. A group combines its rules using the logical AND operator. This means all rules in the group must evaluate to true
  3. When multiple groups are evaluated, they are combined using the logical OR operator. This means only one of the groups has to evaluate to true for the compound statement to evaluate to true
  4. The monitor already has a built in duration for the resource. This rule is always included as a rule in the default group. Using a rule such as the following does not make much sense, but is acceptable:

problem.service.duration.gt=10000

Use the threshold on the monitor instead.
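
The key syntax can be illustrated with a small Python parser. This is only a sketch: the LegacyRule type and parse_rule function are hypothetical names, and the parser assumes that no component of the key contains a dot.

```python
from typing import NamedTuple


class LegacyRule(NamedTuple):
    context: str
    group: str
    resource_type: str
    attribute: str
    operator: str
    value: str


def parse_rule(line: str) -> LegacyRule:
    """Split 'context[.group].resourceType.attribute.operator=value'."""
    # Only the first '=' separates key and value; the value is kept whole.
    key, _, value = line.partition("=")
    parts = key.strip().split(".")
    if len(parts) == 4:
        # No group name: the rule falls into the DEFAULT group.
        context, resource_type, attribute, operator = parts
        group = "DEFAULT"
    elif len(parts) == 5:
        context, group, resource_type, attribute, operator = parts
    else:
        raise ValueError(f"unexpected rule key: {key!r}")
    return LegacyRule(context, group, resource_type, attribute,
                      operator, value.strip())


print(parse_rule("problem.service.duration.gt=10000"))
```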

Rule groups

Rules can be grouped together with AND semantics. This means all rules in the group must evaluate to true for the group to evaluate to true. Groups let you evaluate items under different conditions and give finer-grained control over problem detection. All that is needed in addition to the simple syntax is a group name.

An example to set the threshold for a specific service (FooService):

problem.mygroup.service.duration.gt=20000
problem.mygroup.service.servicename.eq=FooService

This rule states that a problem will be flagged on a service named FooService if the call took > 20000ms. This supersedes the default threshold defined by the monitor: even if the monitor has an overall threshold of, say, 10000ms, that threshold will not apply here.
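
The AND-within-a-group and OR-across-groups behavior can be sketched in Python as follows. This is an illustration under assumptions, not the plug-in's code: the is_problem function and the resource dictionaries are hypothetical, the "lt" keyword is assumed by analogy with "gt", and the monitor's built-in threshold is not modeled.

```python
from collections import defaultdict

# Keyword operators as they appear in the examples; "lt" is assumed.
COMPARE = {
    "gt": lambda actual, expected: float(actual) > float(expected),
    "lt": lambda actual, expected: float(actual) < float(expected),
    "eq": lambda actual, expected: str(actual) == expected,
}


def is_problem(properties, resource_type, resource):
    """AND the rules inside each group, then OR the groups together."""
    groups = defaultdict(list)
    for key, value in properties.items():
        parts = key.split(".")
        # Insert the DEFAULT group name when the key omits one.
        if len(parts) == 4:
            parts.insert(1, "DEFAULT")
        _context, group, rtype, attribute, operator = parts
        if rtype == resource_type:
            groups[group].append((attribute, operator, value))
    return any(
        all(COMPARE[op](resource[attr], expected)
            for attr, op, expected in rules)
        for rules in groups.values()
    )


props = {
    "problem.mygroup.service.duration.gt": "20000",
    "problem.mygroup.service.servicename.eq": "FooService",
}
print(is_problem(props, "service",
                 {"servicename": "FooService", "duration": 25000}))  # True
print(is_problem(props, "service",
                 {"servicename": "FooService", "duration": 15000}))  # False
```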

Adding or removing rules

There are two ways to add or remove rules:

  1. Edit the smarterserver.properties - Provisional
  2. Use the SmarterServerConfigurationMBean:

  • Object Name: type=Monitoring,name=Rules,rulesType=Issue
  • Management interface: com.ibm.team.server.monitoring.jmx.beans.ProblemRulesAdminMBean
  • Attributes: None

  • Operations:
    | Name | Parameters | Description |
    | addRule | String key, String value, boolean persist | The rule is installed dynamically and takes effect immediately |
    | removeRule | String key | Removes a rule if it exists |
    | listRules | String | List of rules running in the server |

Multivalue rules

Sometimes multiple values are valid for a particular attribute. Using the service resource as an example, you may want to flag a problem on both FooService and BarService if the duration is > 30000ms. The properties can be defined as follows:

problem.mygroup.service.duration.gt=30000
problem.mygroup.service.servicename.eq=FooService,BarService

The value for the attribute key must be a comma-separated string. Under the covers, the service names are grouped into an OR expression, and an overall AND expression combines the duration test with that OR expression.
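
A minimal Python sketch of this expansion follows. The matches_group function and the resource dictionaries are hypothetical, and the "lt" keyword is assumed by analogy with "gt".

```python
def matches_group(rules, resource):
    """Evaluate one group: AND across rules, OR across comma-separated values."""
    for attribute, operator, raw_value in rules:
        actual = resource[attribute]
        if operator == "eq":
            # "FooService,BarService" matches if either value matches.
            matched = any(str(actual) == v.strip()
                          for v in raw_value.split(","))
        elif operator == "gt":
            matched = float(actual) > float(raw_value)
        elif operator == "lt":  # assumed by analogy with "gt"
            matched = float(actual) < float(raw_value)
        else:
            raise ValueError(f"unknown operator: {operator}")
        if not matched:
            return False
    return True


mygroup = [
    ("duration", "gt", "30000"),
    ("servicename", "eq", "FooService,BarService"),
]
print(matches_group(mygroup, {"servicename": "BarService", "duration": 31000}))  # True
print(matches_group(mygroup, {"servicename": "BazService", "duration": 31000}))  # False
```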

Actions

Sometimes an action must be taken when certain conditions are met. The problem rule vocabulary has been expanded to accommodate this. It is used in conjunction with an attribute value test.

Take for example a test on a request duration for > 2000ms:

problem.group1.request.duration.gt=2000

This makes any request that takes longer than 2000ms a problem. It may be desirable to perform an action when the duration is say > 10000ms. To do this the original duration test must be extended. Using the example above, the new rule definition will look like:

problem.group1.request.duration.gt=2000;action.gt=10000:<some action>

In plain English, this says "flag the request as a problem if it exceeds a 2000ms threshold, and additionally take some action when the duration exceeds a 10000ms threshold."

The number of available actions is limited. Currently, there are only three:

  • heap - Performs a Java heap dump
  • core - Performs a core dump
  • javacore - Performs a Java core dump

To force a core dump when a request exceeds 10000ms, write the following rule:

problem.group1.request.duration.gt=2000;action.gt=10000:core
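
The extended value can be taken apart mechanically. The following Python sketch parses the value portion of such a rule. The RuleValue type and parse_rule_value function are hypothetical names, and the exact grammar is inferred from the single example above, so treat it as an assumption.

```python
from typing import NamedTuple, Optional

# The three actions listed on this page.
KNOWN_ACTIONS = {"heap", "core", "javacore"}


class RuleValue(NamedTuple):
    threshold: str
    action_operator: Optional[str] = None
    action_threshold: Optional[str] = None
    action: Optional[str] = None


def parse_rule_value(value: str) -> RuleValue:
    """Parse '2000' or '2000;action.gt=10000:core' into its parts."""
    threshold, sep, action_part = value.partition(";")
    if not sep:
        # Plain threshold with no action clause.
        return RuleValue(threshold.strip())
    condition, _, action = action_part.partition(":")
    key, _, action_threshold = condition.partition("=")
    prefix, _, operator = key.partition(".")
    if prefix.strip() != "action" or action.strip() not in KNOWN_ACTIONS:
        raise ValueError(f"unrecognized action clause: {action_part!r}")
    return RuleValue(threshold.strip(), operator.strip(),
                     action_threshold.strip(), action.strip())


print(parse_rule_value("2000;action.gt=10000:core"))
```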

Related topics: None

External links:

Additional contributors: None
