IBM® Engineering Lifecycle Management (ELM) consists of a set of interconnected web applications running on multiple physical or virtual server instances. Engineering Lifecycle Management on Hybrid Cloud v1.0 provides containers of the ELM applications, coupled with an operator that makes it possible to quickly deploy ELM on a variety of cloud environments.
This article provides the results of performance testing for Engineering Lifecycle Management (ELM) applications using the IBM Cloud Kubernetes service. The objectives of the performance testing were to determine the user loads that the small and medium container configurations can support, and to identify the resource bottlenecks that limit scaling.
The supported user loads for the applications are shown below. In this table, the "Model" column shows the number of users defined for the small and medium deployments in the ELM 7.1 sizing overview, and the "Actual" column shows the user loads that were sustained during testing. Higher user loads can be tolerated, but response times will degrade.
The limiting factor is the number of CPUs assigned to the container. Once the CPU usage reaches 100%, response times degrade. For Report Builder, disk speed is also a bottleneck.
Abbreviations:
ELM: Engineering Lifecycle Management
ERM: Engineering Requirements Management (DOORS Next)
EWM: Engineering Workflow Management
ETM: Engineering Test Management
LQE: Lifecycle Query Engine
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multi-programming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
This testing was done to compare and characterize the performance differences between versions of the product. The results shown here should therefore be viewed as a relative comparison between versions, and not as an absolute benchmark of performance.
The high level architecture for Engineering Lifecycle Management on Hybrid Cloud is shown below.
The performance test environment was an instance of this architecture hosted in the IBM Cloud, using the IBM Cloud Kubernetes service. The Kubernetes cluster was configured with 8 worker nodes, each with 32 CPUs and 128G of RAM.
A bare metal database server running Oracle 19c was used for the application databases. This server was configured with 1 terabyte of RAM and 96 CPUs. A RAID array of 16 NVMe drives was used for the Oracle storage subsystem.
The performance tests focused on the ELM application containers, and applied load to containers of different sizes. The tests did not explicitly test the ELM operator or other parts of the IBM Cloud infrastructure.
The repositories used for testing were of medium size. The sizes of the application databases were:
Application | Database size (GB) |
---|---|
ERM DOORS Next | 240 |
LQE | 203 |
EWM | 98 |
ETM | 67 |
This section provides analysis of the performance testing conducted for IBM Engineering Requirements Management DOORS Next. Two configurations were tested: a small container with 4 CPUs and 8G of RAM, and a medium container with 8 CPUs and 16G of RAM.
Two workloads were used in the testing: a standard workload and a views-only workload. There were two variants of each workload: one with link validity enabled, and one with link validity disabled. All workloads were executed in a global configuration having 250 contributions.
Workload | Link Validity | Small (4CPU, 8G RAM) | Medium (8CPU, 16G RAM) |
---|---|---|---|
Standard | Enabled | 25 users | 125 users |
Standard | Disabled | 50 users | 200 users |
Views only | Enabled | 25 users | 50 users |
Views only | Disabled | 50 users | 100 users |
Note that the workloads simulate opening module views that display link columns. When link validity is enabled, there is additional CPU load, and the maximum user load is slightly lower than when link validity is disabled.
The small ERM container was configured with 4 CPUs and 8G of RAM.
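For illustration, the sketch below shows how a container of this size could be described with the official Kubernetes Python client. In practice the ELM operator manages container sizing, so this is not the operator's actual mechanism; the container name and image reference are hypothetical.

```python
# Illustrative only: in practice the ELM operator manages container sizing.
# This sketch describes a "small" ERM container (4 CPUs, 8G of RAM) using
# the official Kubernetes Python client.
from kubernetes import client

resources = client.V1ResourceRequirements(
    requests={"cpu": "4", "memory": "8Gi"},  # guaranteed allocation
    limits={"cpu": "4", "memory": "8Gi"},    # hard cap; CPU throttles at 100%
)

container = client.V1Container(
    name="erm-dng",              # hypothetical container name
    image="icr.io/elm/dng:7.1",  # hypothetical image reference
    resources=resources,
)
print(container.resources.limits)
```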
The test results from executing the standard workload are shown below.
CPU
User Load | Average CPU Usage | CPU Usage Percentage |
---|---|---|
10 | 1.35 | 34% |
25 | 2.35 | 59% |
50 | 3.18 | 80% |
75 | 3.79 | 95% |
100 | 3.98 | 100% |
125 | 4 | 100% |
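The "CPU Usage Percentage" column is the average number of cores in use divided by the 4 CPUs allocated to the container. A minimal Python check against the measured values:

```python
# Reproduce the "CPU Usage Percentage" column: average cores used divided
# by the 4 CPUs allocated to the small ERM container.
ALLOCATED_CPUS = 4

measurements = {10: 1.35, 25: 2.35, 50: 3.18, 75: 3.79, 100: 3.98, 125: 4.0}

for users, avg_cores in measurements.items():
    pct = avg_cores / ALLOCATED_CPUS * 100
    print(f"{users:>3} users: {avg_cores:.2f} cores -> {pct:.0f}%")
```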
Response Times
Response times increase between 25 and 50 users. At 75 users, when the CPU is nearly saturated (95%), response times for many operations exceed 1 minute.
Throughput
The test results from executing a views-only workload are shown below.
CPU
User Load | Average CPU Usage | CPU Usage Percentage |
---|---|---|
25 | 2.14 | 54% |
50 | 3.51 | 88% |
75 | 4 | 100% |
100 | 3.98 | 100% |
125 | 3.99 | 100% |
150 | 3.97 | 99% |
175 | 3.97 | 99% |
200 | 3.97 | 99% |
Response Time
Throughput
The medium container for ERM was configured with 8 CPUs and 16G of RAM.
CPU
User Load | Average CPU Usage | CPU Usage Percentage |
---|---|---|
25 | 1.64 | 21% |
50 | 2.69 | 34% |
75 | 3.56 | 45% |
100 | 4.18 | 52% |
125 | 5.56 | 70% |
150 | 7.6 | 95% |
175 | 7.88 | 99% |
200 | 7.9 | 99% |
250 | 7.6 | 95% |
Response Time
Throughput
CPU
User Load | Average CPU Usage | CPU Usage Percentage |
---|---|---|
25 | 2.38 | 30% |
50 | 3.91 | 49% |
75 | 4.94 | 62% |
100 | 6.77 | 85% |
125 | 6.95 | 87% |
150 | 7.66 | 96% |
175 | 7.82 | 98% |
200 | 7.88 | 99% |
250 | 6.58 | 82% |
Response Time
Throughput
This section provides analysis of the performance testing conducted for ETM. Two configurations were tested: a small container with 4 CPUs and 8G of RAM, and a medium container with 8 CPUs and 16G of RAM.
Metric | Small container | Medium container |
---|---|---|
CPU Utilization | 100% at 900 users | 30% at 1000 users |
Response Time | Response times increase above 900 users | Response time is stable |
Throughput | 65 transactions per second at 900 users | 75 transactions per second at 1000 users |
The bottleneck is CPU for the small container tests. The medium container handled 1000 users without problems.
The workload for the cloud testing was the same as that used in previous ETM tests in the IBM lab. For the container tests, the user load was increased in stages from 200 to 1000 users, with each stage lasting 15 minutes.
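As a rough sketch (assuming evenly spaced 200-user steps, which matches the measurement points below), the staged ramp can be described as:

```python
# Sketch of the staged ramp-up used in the ETM tests: user load grows from
# 200 to 1000 in steps of 200, holding each stage for 15 minutes.
STAGE_MINUTES = 15

def ramp_schedule(start=200, stop=1000, step=200):
    """Yield (minute_offset, user_load) pairs for each stage."""
    for i, users in enumerate(range(start, stop + step, step)):
        yield i * STAGE_MINUTES, users

for offset, users in ramp_schedule():
    print(f"t+{offset:>3} min: {users} concurrent users")
```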
The small ETM container was configured with 4 CPUs and 8G of RAM.
CPU is the bottleneck for the small container.
CPU
Users | Average CPU Usage | CPU Usage Percentage |
---|---|---|
200 | 1.39 | 34.75% |
400 | 2.09 | 52.25% |
600 | 2.72 | 68% |
800 | 3.21 | 80.25% |
1000 | 3.98 | 99.50% |
Throughput
Response Time
Response times are stable as load increases, up until the point at which the CPU usage reaches 100% (1000 users). At that point, response times increase.
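A simple way to read such a series programmatically is to find the first load level at which the allocated CPUs are effectively saturated. The sketch below applies this to the measured small-container values:

```python
# Locate the saturation point in the measured series: the first user load
# at which the small ETM container's 4 CPUs are effectively fully used.
samples = [(200, 34.75), (400, 52.25), (600, 68.0), (800, 80.25), (1000, 99.50)]

def saturation_point(samples, threshold=99.0):
    for users, cpu_pct in samples:
        if cpu_pct >= threshold:
            return users
    return None

print(saturation_point(samples))  # -> 1000, where response times begin to climb
```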
The medium container was configured with 8 CPU and 16G of RAM.
The medium container was capable of handling more than 1000 users.
CPU
Users | Average CPU Usage | CPU Usage Percentage |
---|---|---|
200 | 0.82 | 10.25% |
400 | 1.24 | 15.50% |
600 | 1.38 | 17.25% |
800 | 1.89 | 23.62% |
1000 | 2.38 | 29.75% |
Throughput
Response Time
The test results are summarized below.
Metric | Small Environment | Medium Environment |
---|---|---|
Peak Throughput | 50–56 transactions/sec at 20 threads | 180 transactions/sec at 100 threads |
CPU Utilization | 100% at 30 threads | 90% at 100 threads |
Estimated user limit | 600 users | 2000 users |
The bottleneck is CPU. Once the available CPUs are 100% utilized, performance starts to degrade. Adding more CPUs to the LQE container will increase throughput.
The performance tests characterize workload in terms of threads. The workload is generated by sending requests directly to the LQE container from a test program, which simulates multi-user load by running multiple threads. Each thread can be thought of as simulating a very active user that does nothing but make requests for links. There is no pausing, so each thread makes requests as fast as it can. This is an extremely intense workload that is designed to drive the LQE container to its limits.
Real users are much less active and interact with LQE indirectly through the applications. For example, when a user opens a view that has link columns using ERM DOORS Next, a request will be sent to the LQE container from the ERM DOORS Next container. You can consider each thread (or simulated user) to be the equivalent of 20 real users.
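The sketch below illustrates this closed-loop pattern: a fixed pool of threads issuing requests back-to-back with no think time. The endpoint URL is a placeholder, not the actual LQE API; the real tests sent link requests directly to the LQE container.

```python
# Minimal closed-loop load driver in the spirit described above: each thread
# issues its next request as soon as the previous one returns.
import urllib.request
from concurrent.futures import ThreadPoolExecutor

LQE_URL = "https://lqe.example.com/lqe/query"  # placeholder endpoint
THREADS = 20              # each thread approximates ~20 real, less-active users
REQUESTS_PER_THREAD = 100

def hammer(thread_id):
    # Closed loop: no pause between calls.
    for _ in range(REQUESTS_PER_THREAD):
        with urllib.request.urlopen(LQE_URL) as resp:
            resp.read()

with ThreadPoolExecutor(max_workers=THREADS) as pool:
    list(pool.map(hammer, range(THREADS)))

print(f"{THREADS} threads ~= {THREADS * 20} real users")
```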
The small container is configured with 8 CPUs and 16G of RAM. The test results for this configuration are shown below.
Throughput
CPU
Response Time
The medium container is configured with 16 CPUs and 32G of RAM. The test results for this configuration are shown below.
CPU is the primary bottleneck.
Throughput
CPU
Response Time
This section provides analysis of the performance testing conducted for Report Builder. Two configurations were tested: a small container with 2 CPUs and 4G of RAM, and a medium container with 8 CPUs and 16G of RAM.
The following reports were used when applying load to the Report Builder application:
Each simulated user selects one of these reports to execute. The reports are executed at the rate of one report per minute per user. The number of simulated users was increased until the Report Builder application was overloaded.
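A minimal sketch of that pacing logic is shown below; run_report() and the report names are hypothetical stand-ins for the actual report executions.

```python
# Sketch of the per-user pacing: each simulated user runs one randomly
# chosen report per minute.
import random
import time

REPORTS = ["report-a", "report-b", "report-c"]  # placeholder report names
PACE_SECONDS = 60  # one report per minute per user

def run_report(name):
    """Hypothetical stand-in for executing a Report Builder report."""
    print(f"executing {name}")

def simulate_user(iterations=5):
    for _ in range(iterations):
        start = time.monotonic()
        run_report(random.choice(REPORTS))
        # Sleep out the rest of the minute to hold the 1 report/min pace.
        elapsed = time.monotonic() - start
        time.sleep(max(0.0, PACE_SECONDS - elapsed))

simulate_user(iterations=2)
```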
Metric | Small (2 CPU, 4G RAM) | Medium (8 CPU, 16G RAM) |
---|---|---|
Max users | 75 | 250 |
Throughput | 6 reports/sec at 75 users | Peaks at 15 transactions/sec at 250 users |
CPU Utilization | 90% at 100 users | CPU usage never exceeds 25% |
Response Time | Sharp increase at 100 users | Response times increase as load increases past 100 users |
For the small container, the bottleneck is CPU. For the medium container, there is a bottleneck in Report Builder created by the storage subsystem.
The small container is configured with 2 CPUs and 4G of RAM.
Report Throughput
CPU
Report Performance
The medium container for Report Builder is configured with 8 CPUs and 16G of RAM.
An analysis of the Report Builder performance indicated that the throughput is limited by an internal bottleneck related to the storage subsystem. Report Builder stores runtime statistics on the persistent volume associated with the container. The writes to the persistent volume are limiting throughput.
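One way to sanity-check this kind of bottleneck is to time synchronous (fsync'd) writes against the persistent volume. The probe below is a generic sketch, not part of the actual test harness; the mount path is a placeholder.

```python
# Rough probe of persistent-volume write latency of the kind that can
# throttle Report Builder's runtime-statistics writes. Point PATH at a
# file on the volume under test.
import os
import time

PATH = "/mnt/pv/latency-probe.bin"  # hypothetical mount point
WRITES = 100

fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
start = time.perf_counter()
for _ in range(WRITES):
    os.write(fd, b"x" * 4096)  # 4 KB write, fsync'd so it reaches the disk
    os.fsync(fd)
elapsed = time.perf_counter() - start
os.close(fd)

print(f"{WRITES / elapsed:.0f} fsync'd writes/sec "
      f"({elapsed / WRITES * 1000:.2f} ms each)")
```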
Page Throughput
CPU
Report Performance
This section provides analysis of the performance testing conducted for EWM across small and medium containers in the IBM Cloud. Two configurations were tested: a small container with 4 CPUs and 8G of RAM, and a medium container with 8 CPUs and 16G of RAM.
The workload used in testing the EWM container included these operations:
Each operation accounts for 12.5% of the total user load, with the exception of the two "Import CSV" operations, which are limited to 2 users each. The simulation uses a 30-second think time. User load is increased incrementally until the container becomes overloaded.
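The sketch below illustrates this assignment scheme, assuming eight equally weighted operations (12.5% each) with the two CSV imports capped at 2 users apiece; the operation names are generic placeholders, not the actual workload scripts.

```python
# Sketch of the user-to-operation assignment described above. Each simulated
# user also pauses 30 s between operations (think time).
CSV_CAPS = {"import-csv-a": 2, "import-csv-b": 2}
OPERATIONS = ["op-1", "op-2", "op-3", "op-4", "op-5", "op-6",
              "import-csv-a", "import-csv-b"]

def assign_operations(total_users, operations, caps):
    """Spread users evenly over the uncapped operations, honoring caps."""
    assignments = dict(caps)
    uncapped = [op for op in operations if op not in caps]
    for i in range(total_users - sum(caps.values())):
        op = uncapped[i % len(uncapped)]
        assignments[op] = assignments.get(op, 0) + 1
    return assignments

# 70 users: the small container's supported load.
print(assign_operations(70, OPERATIONS, CSV_CAPS))
```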
Metric | Small container | Medium container |
---|---|---|
Max users | 70 | 125 |
CPU Utilization | 100% at 75 users | 100% at 150 users |
Response Time | Stable until 70 users | Stable until 125 users |
Throughput | 28 transactions per second at 70 users | 52 transactions per second at 125 users |
The small container is configured with 4 CPUs and 8G of RAM.
CPU
CPU usage reaches 100% at 75 users.
Users | % CPU |
---|---|
40 | 42.25% |
50 | 56.00% |
60 | 61.75% |
65 | 73.50% |
70 | 76.75% |
75 | 100% |
80 | 100% |
Page Element Throughput
The chart below shows how throughput (measured in transactions per second) varies with user load. A "page element" is an HTTP call. Throughput increases as user load increases, up to 75 users. Throughput then levels out, since the CPU is saturated and response times start to increase.
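As a rough consistency check (ignoring response time), 70 users with a 30-second think time issue about 2.3 page requests per second, so the measured 28 page elements per second at that load implies roughly 12 HTTP calls per page:

```python
# Back-of-envelope estimate of HTTP calls per page, not an exact model.
users = 70                      # measured point from the summary table
think_time_s = 30               # think time used in the workload
measured_elements_per_sec = 28  # page elements/sec at 70 users

pages_per_sec = users / think_time_s  # ~2.3 page requests/sec
elements_per_page = measured_elements_per_sec / pages_per_sec
print(f"{pages_per_sec:.2f} pages/sec -> ~{elements_per_page:.0f} HTTP calls per page")
```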
Response Time
Response times increase slightly as the workload increases, but there is a sharper increase in response times at 75 users (when the CPU is 100% utilized).
The medium container is configured with 8 CPUs and 16G of RAM.
CPU
Users | % CPU |
---|---|
100 | 55.37% |
125 | 70.75% |
150 | 100% |
200 | 100% |
250 | 100% |
300 | 100% |
Page Element Throughput
The chart below shows how throughput (measured in transactions per second) varies with user load. A "page element" is an HTTP call. Throughput increases as user load increases, up to a maximum of 52 transactions per second at 125 users. Throughput then levels out, since the CPU is saturated and response times increase.
Response Time
The response times for opening a dashboard tab that includes plan widgets degrade faster than those for other operations. The response time for that operation is shown below.
The response time increases once the CPU maxes out (at 150 users).