Changes in usage patterns or deployment size can require adjustments to your deployment topology. IBM recommends a proactive approach based on a “Measure-Analyze-Act” control loop.
You measure key health metrics, analyze them to identify current (or potential) problems, and then act on your analysis to address the problems.
Start by identifying the key indicators of deployment health, and then measure the values of those metrics periodically. Keep a historical record of the measurements so you can see changes over time. IBM strongly recommends that you use a commercial tool for Application Performance Monitoring (APM) to do this.
You can get a good initial assessment of deployment health by monitoring usage metrics at the operating system level. This includes:
Collect this information for all servers that are part of the deployment (including proxy servers and database servers).
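As a rough illustration of periodic OS-level sampling, the sketch below takes one timestamped measurement using only the Python standard library. The function name `sample_os_metrics`, the choice of metrics (disk usage and 1-minute load average), and the default path are assumptions made for this example, not part of ELM or any IBM tooling. An APM agent would normally collect this data for you.

```python
import os
import shutil
import time


def sample_os_metrics(path="/"):
    """Return one timestamped sample of basic OS-level usage metrics.

    Hypothetical helper for illustration only; the metric names are
    assumptions, not an ELM or APM convention.
    """
    disk = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    sample = {
        "timestamp": time.time(),
        "cpu_count": os.cpu_count(),
        "disk_used_pct": 100.0 * disk.used / disk.total,
    }
    # os.getloadavg() is only available on Unix-like systems and can
    # raise OSError if the load average cannot be obtained.
    if hasattr(os, "getloadavg"):
        try:
            sample["load_avg_1m"] = os.getloadavg()[0]
        except OSError:
            pass
    return sample


# Keep a historical record so that changes over time are visible.
history = []
history.append(sample_os_metrics())
```

Running this on a schedule (for example, from cron) on each server and appending the samples to durable storage gives you the historical record needed for trend analysis.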
ELM applications surface health indicators through an implementation of J2EE industry-standard managed beans (MBeans), which are defined as part of the Java Management Extensions (JMX) specification. Managed beans provide a defined, well-understood way of getting information about what ELM applications are doing.
Start by reviewing the introductory material here:
These articles introduce a set of starter metrics that can provide early indications of problems:
Documentation for additional MBeans can be found here:
You can also discover available MBeans (and inspect their attributes) from a running ELM Java process.
The “Analyze” part of the “Measure-Analyze-Act” loop involves looking at the data collected during measurement to identify current (or potential) problems. The simplest form of analysis is to assign a threshold to each metric. A metric that routinely exceeds its threshold value indicates a potential issue. For example, a server may be overloaded if the percentage of used CPU exceeds 90%. You can also look at patterns over time. For example, you might only flag a CPU utilization issue if 7 out of the last 10 measurements exceeded 90%. You can also derive rates of change from your historical data. For example, you might see that used disk space is growing at a rate that means you’ll run out in 30 days. That might be more useful to you than waiting for used disk space to reach 95%. Metrics are often correlated. For example, the response time of a service can be impacted by the CPU usage on a server, or by the number of transactions processed per second.
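The pattern-based and rate-of-change checks described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the default window sizes, and the use of a simple least-squares line for the growth rate are choices made for this example, not an IBM-recommended algorithm.

```python
def sustained_breach(samples, threshold, window=10, min_breaches=7):
    """True if at least `min_breaches` of the last `window` samples
    exceed `threshold` (for example, 7 of the last 10 CPU readings)."""
    recent = samples[-window:]
    if len(recent) < window:
        return False  # not enough history to judge a pattern
    return sum(1 for value in recent if value > threshold) >= min_breaches


def days_until_full(history, capacity):
    """Estimate days until `capacity` is reached.

    `history` is a list of (day, used) pairs in chronological order.
    Fits a least-squares line to the pairs and extrapolates from the
    latest measurement. Returns None if usage is not growing or there
    is too little data.
    """
    n = len(history)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in history) / n
    mean_u = sum(u for _, u in history) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in history)
    if var_t == 0:
        return None  # all samples share the same timestamp
    slope = sum((t - mean_t) * (u - mean_u) for t, u in history) / var_t
    if slope <= 0:
        return None  # usage is flat or shrinking
    latest_used = history[-1][1]
    return (capacity - latest_used) / slope


# Hypothetical data: CPU percentages and disk usage in GB per day.
cpu = [95, 92, 50, 91, 93, 96, 40, 97, 91, 60]
print(sustained_breach(cpu, threshold=90))      # True: 7 of 10 exceed 90
disk = [(0, 700), (1, 710), (2, 720)]
print(days_until_full(disk, capacity=1000))     # 28.0
```

A single spike above 90% does not trigger `sustained_breach`, which reduces noise compared with a plain threshold check. The linear fit in `days_until_full` assumes steady growth; bursty workloads would need a more careful model.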
An APM tool can greatly simplify the analysis process. IBM does not have a specific recommendation for APM tools, although some customers have successfully deployed Splunk. Other options include IBM Instana Observability and Prometheus.
An APM tool should be able to do the following things:
The final step in the Measure-Analyze-Act loop is to take action based on your analysis of the measurements. Some of the actions that you might take after analysis include:
After you take action, you start the Measure-Analyze-Act cycle again.
Attachment: MAA.png (5.8 K), uploaded 2023-10-28 16:01 by VaughnRokosz