Introduction
This document explains how to troubleshoot the HTTP connection pool used by the ELM server for server-to-server calls.
The ELM applications (ETM, EWM, and ERM) utilize HTTP connections for server-to-server communication. By default, each application is configured with a fixed number of connections, which can be adjusted in the advanced properties of each application.
These connections are used for internal communication: connections are created, reused, and returned to the pool when no longer needed. In some cases, however, connections are not returned to the pool, which gradually exhausts the server's HTTP connections. Over time, this can result in the server becoming unreachable.
The document outlines the following steps:
- Increase the number of connections as a short-term mitigation.
- Collect data related to connection pool.
- Redirect connection pool logs to a dedicated file.
- Parse connection pool logs using the foundation tool.
- Scan the connection pool logging.
Increase the number of connections as a short-term mitigation
By default, each application is configured with a fixed number of connections, which can be adjusted in the advanced properties of each application.
For example:
https://<<Server>>/ccm/admin#action=com.ibm.team.repository.admin.configureAdvanced
https://<<Server>>/qm/admin#action=com.ibm.team.repository.admin.configureAdvanced
https://<<Server>>/rm/admin#action=com.ibm.team.repository.admin.configureAdvanced
Search for the following properties (shown with their default values):
Maximum outgoing HTTP connections per destination: 50
Maximum outgoing HTTP connections total: 250
The per-destination limit can be raised to 250 and the total limit to 1000. Changes made through the advanced properties typically take effect immediately, although the save operation may take up to a minute, so exercise care on a busy server. Alternatively, instead of using the web UI, open the corresponding teamserver.properties file, append the following two lines at the end, and save the file. The changes are applied on the next server restart.
com.ibm.team.repository.maxOutgoingConnectionsPerRoute=250
com.ibm.team.repository.maxOutgoingTotalConnections=1000
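The teamserver.properties file for each application is typically located under the server configuration directory of the ELM installation; the paths below assume a default install directory and may differ in your deployment:
<JazzInstallDir>/server/conf/ccm/teamserver.properties
<JazzInstallDir>/server/conf/qm/teamserver.properties
<JazzInstallDir>/server/conf/rm/teamserver.properties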
Collect data related to connection pool
Connection usage logging is disabled by default because it is verbose. It can be enabled either through the repodebug interface or by modifying the log4j2.xml file. We use the latter method because it persists across server restarts and keeps the connection-related logs separate from the main application log file. Enabling this logging has no significant performance impact, but make sure sufficient disk space is available in the log folder to store the additional files.
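As a quick sanity check before enabling the logging, confirm that the log folder has enough free space, for example on Linux (substitute the log directory of your installation):
df -h <path to the ELM log folder>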
Redirect connection pool logs to a dedicated file
Here’s how to enable "connPoolByRoute" logging in the existing log4j2.xml file. Always ensure you back up the current log4j2.xml file before making any changes.
1. Open the file in an editor.
2. Locate the RollingFile appender definitions and add the RollingFile configuration shown below.
3. Locate the Logger definitions and add the Logger configuration shown below.
4. Save the updated file.
5. Upon successful configuration, the qm-conpool.log file (for the /qm application in this example) should be created.
This file will contain the entries logged by the ConnPoolByRoute class, because logging for that class has been redirected there. The qm.log file will no longer contain these entries because the additivity setting is set to false.
<RollingFile name="connPoolByRoute" fileName="${dir}/${app}-conpool.log" filePattern="${dir}/${app}-conpool%i.log">
    <PatternLayout pattern="%d{ISO8601_OFFSET_DATE_TIME_HHMM} %X{Tid} - %m%n"/>
    <Policies>
        <SizeBasedTriggeringPolicy size="10MB"/>
    </Policies>
    <DefaultRolloverStrategy max="5"/>
</RollingFile>

<Logger name="org.apache.http.impl.conn.tsccm.ConnPoolByRoute" additivity="false" level="DEBUG">
    <AppenderRef ref="connPoolByRoute"/>
</Logger>
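For orientation, the RollingFile appender belongs inside the existing <Appenders> element and the Logger inside the existing <Loggers> element of log4j2.xml. The trimmed skeleton below illustrates the placement only; the real file contains many more appenders and loggers, and the ${dir} and ${app} properties referenced above are assumed to be defined elsewhere in the shipped configuration.
<Configuration>
    <Appenders>
        <!-- existing appenders ... -->
        <RollingFile name="connPoolByRoute" fileName="${dir}/${app}-conpool.log" filePattern="${dir}/${app}-conpool%i.log">
            <!-- PatternLayout, Policies, and DefaultRolloverStrategy as shown above -->
        </RollingFile>
    </Appenders>
    <Loggers>
        <!-- existing loggers ... -->
        <Logger name="org.apache.http.impl.conn.tsccm.ConnPoolByRoute" additivity="false" level="DEBUG">
            <AppenderRef ref="connPoolByRoute"/>
        </Logger>
    </Loggers>
</Configuration>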
If the ConnPoolByRoute logger alone does not provide enough information, loggers for the following classes can be added, following the same pattern as above (additivity set to false and an AppenderRef pointing to the dedicated appender). Add new entries only if necessary; if unsure, consult your team or the engineering team for guidance.
<Logger name="com.ibm.team.application.contribution.server.rest.client.utils.RestClientForTrustedFriends" additivity="false" level="TRACE">
<Logger name="com.ibm.team.application.contribution.oslc.utils.OslcUtils" additivity="false" level="TRACE">
<Logger name="com.ibm.team.application.contribution.oslc.common.IHttpRequestHandler" additivity="false" level="TRACE">
<Logger name="com.ibm.team.scm.sdk.utils.ScmLinkUtils" additivity="false" level="DEBUG">
<Logger name="com.ibm.team.scm.sdk.ScmLinkQueryService" additivity="false" level="DEBUG">
Parse connection pool logs using the foundation tool
The connection pool log is highly detailed, containing information about connection leases and releases. Manually reviewing the log to identify whether leased connections are properly closed can be difficult. To address this, Glenn Bardwell developed a tool that parses HTTP connection pool logs to identify unclosed connections.
To run the tool you need the following:
1. The JAR file, which can be downloaded from the work item linked below.
2. An input file: your connection pool log.
3. An output file: the file where information about connection leaks will be recorded.
Download L3 parser tool
Usage (command line):
java -jar logconnection.jar <input file name> <output file name>
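For example, assuming the connection pool log from the /qm application and an arbitrarily named output file:
java -jar logconnection.jar qm-conpool.log qm-conpool-analysis.txt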
Sample results from the tool:
Mismatch in thread:[TID: 89595B58
Request : <No request matches thread id. Thread id may not include [TID
Leases:387 Line Number:30035
Lease Get Free Connection Line number:
30018
30029
30036
Lease CreateConnection Line number:
30024
Lease Release Connection Line number:
30025
30030
End Line Number:0
The sample output shows that three connections were leased (log line numbers: 30018, 30029, 30036), but only two were released (log line numbers: 30025, 30030).
To investigate, open the log file in an editor and enable line numbering (e.g., use Ctrl + G in Notepad++). Navigate to the specified lines and examine the details of the HTTP requests. Use the thread ID (TID) associated with each request to identify the lease and release patterns.
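You can also extract every entry for a given thread ID directly from the log on the command line; for example, using the TID from the sample output above (grep -n prints the matching line numbers):
grep -n "TID: 89595B58" qm-conpool.log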
If further assistance is required, reach out to the respective product team.
Note: If there are numerous connection log files, focus on the file where a potential leak is reported. To retain more log lines in a single file, increase the file size limit to 100–200 MB using the log4j2.xml configuration.
Scanning connection pool logging
Pay attention to the counters in the log:
total kept alive: 73, total issued: 379, total allocated: 452 out of 2000
The entry for thread ID 49A992E9 shows that 452 connections are currently in use from a pool of 2000, reflecting an increase in the "maxOutgoingTotalConnections" limit to 2000. Of these, 73 connections are kept alive and 379 have been issued. The "total allocated" count is the sum of kept-alive and issued connections (73 + 379 = 452). In a well-functioning system where connections are periodically released, the "total allocated" count should decrease over time and remain well below the maximum available connections.
2024-12-19T20:48:14,073+0100 [Default Executor-thread-3208 @@ 20:48 <unauthenticated> <Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0@10.94.117.167> /qm/proxy] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute [TID: 85C55A7D] - Pooling connection [{s}->https://jira.123.cloud:443][null]; keep alive indefinitely
2024-12-19T20:48:14,111+0100 [Default Executor-thread-3341 @@ 20:48 <unauthenticated> <Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0@10.94.117.167> /qm/proxy] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute [TID: 49A992E9] - [{s}->https://jira.123.cloud:443] total kept alive: 73, total issued: 379, total allocated: 452 out of 2000
......
.......
.......
2024-12-20T04:45:04,003+0100 [Default Executor-thread-3413 @@ 04:45 <unauthenticated> <Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0@10.94.117.77> /qm/proxy] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute [TID: FFEDA716] - [{s}->https://jira.123.cloud:443] total kept alive: 86, total issued: 420, total allocated: 506 out of 2000
This example log, extracted from a customer system, reveals connection leaks. Examine the connection usage counter for thread IDs 49A992E9 and FFEDA716: the log shows a growing number of allocated connections. Approximately eight hours later, 54 connections (506 - 452) have leaked. This indicates a persistent issue; further investigation is required, and assistance from the engineering team will be necessary to identify and resolve the problem.
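To see whether the counter keeps growing over a longer period, you can extract just the counter lines from the log, for example:
grep -n "total allocated" qm-conpool.log
If the "total allocated" value trends upward for hours and never drops back, a leak is likely and the collected data should be shared with the engineering team.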