The 6.0 release of the CLM applications includes a number of major new features, and one of the biggest is the addition of configuration management capabilities to IBM Rational DOORS Next Generation (DNG) and IBM Rational Quality Manager (RQM). In 6.0, you can apply version control concepts to requirements or test artifacts, much as you have always been able to do with source code artifacts in Rational Team Concert (RTC). You can organize your requirements or your test artifacts into streams which can evolve independently, and you can capture the state of a stream in a baseline. This allows teams to work in parallel, keeping their work isolated from each other until they decide to deliver from one stream to another. The new features also support reuse. You can treat a stream as a component, and create variants of a component by creating child streams. The new versioning features can also apply to links between artifacts. You can specify how streams or baselines in one application are related to the streams or baselines in another, so that when you create links, you always find the right versions.
There are two new parts of the CLM infrastructure that support version-aware linking. The global configuration (GC) application allows you to specify how the streams and baselines in different applications are related. You create global configurations which reference streams or baselines from Rational Quality Manager or DOORS Next Generation, and then you can select the appropriate global configuration when working in either of those applications to ensure that links go to the right place. In Rational Team Concert, you can associate a global configuration with a release, and this makes sure that links created from work items will find the correct versions of RQM or DNG artifacts. The link indexing application (LDX) keeps track of the links and transparently provides you with information about which artifacts are linked to a specific version when you use the products (e.g. when you open the Links tab in a requirement).
This document discusses the performance characteristics of the global configuration application and the link indexing application, to help you decide how to best deploy these new capabilities.
Users were distributed across user groups, and each user group repeatedly ran at least one script (use case). Tests were run with a 30-second "think time" between pages for each user. Each test simulated multiple virtual users, scaled up over multiple stages. Users were added at a rate of one user per second.
The specific versions of software used were:
Software | Version |
---|---|
IBM Rational CLM Applications | 6.0 |
IBM HTTP Server and Web Server Plugin for WebSphere | 8.5.5.2 |
IBM WebSphere Application Server | 8.5.5.1 |
IBM Tivoli Directory Server | 6.1 |
DB2 Database | 10.5 |
Function | Number of Machines | Machine Type | CPU / Machine | Total # of CPU cores | Memory/Machine | Disk | Disk capacity | Network interface | OS and Version |
---|---|---|---|---|---|---|---|---|---|
Reverse Proxy Server (IBM HTTP Server and WebSphere Plugin) | 1 | IBM x3250 M3 | 1 x Intel Xeon CPU X3480 3.07GHz (quad-core) | 4 | 15.5GB | RAID 0 -- SAS Disk x 1 | 279GB | Gigabit Ethernet | Red Hat Enterprise Linux Server release 6.6 (Santiago) |
JTS/LDX Server | 1 | VM image | n/a | 8 vCPU | 32GB | SCSI (virtual) | 80GB | virtual | Red Hat Enterprise Linux Server release 6.6 (Santiago) |
GC Server | 1 | VM image | n/a | 8 vCPU | 32GB | SCSI (virtual) | 80GB | virtual | Red Hat Enterprise Linux Server release 6.6 (Santiago) |
RQM Server | 1 | VM image | n/a | 8 vCPU | 32GB | SCSI (virtual) | 80GB | virtual | Red Hat Enterprise Linux Server release 6.6 (Santiago) |
RTC Server | 1 | VM image | n/a | 8 vCPU | 32GB | SCSI (virtual) | 80GB | virtual | Red Hat Enterprise Linux Server release 6.6 (Santiago) |
DNG Server | 1 | VM image | n/a | 8 vCPU | 32GB | SCSI (virtual) | 80GB | virtual | Red Hat Enterprise Linux Server release 6.6 (Santiago) |
DB2 Server | 1 | IBM x3650 M3 | 2 x Intel Xeon CPU X5667 (quad-core) | 8 | 31.3GB | RAID 10 -- SAS Disk x 8 | 279GB | Gigabit Ethernet | Red Hat Enterprise Linux Server release 6.6 (Santiago) |
A global configuration is a new concept in the 6.0 release that allows you to assemble a hierarchy of streams or baselines from domain tools such as DNG, RQM, and RTC. You use a global configuration to work in a unified context that delivers the right versions of artifacts as you work in each tool, and navigate from tool to tool. This capability is delivered by a new web application (the global configuration application).
Our performance testing for 6.0 looked at the following aspects of the global configuration (GC) application:
We carried out these tests by first developing automation that would simulate users interacting with the GC web UI. We then executed the simulation while incrementally increasing the user load, watching for the point at which the transaction response times started to increase. We watched the CPU utilization for the different servers in the test topology to see how this varied as the throughput increased.
This is a summary of what we found.
User Grouping | Percentage of the workload | Use Case | Use Case Description |
---|---|---|---|
Reads | 70% | ListComponentsPaged | Search for the first 25 global components. Results returned in one page. |
| | | QuickSearch | Search for a global stream, component, and baseline (in that order). |
| | | SearchAndSelectComponent | Search for a global component and click link to view component details. |
Writes | 30% | CreateAndCommitBaseline | Search for a global stream configuration for a specific component, use the stream configuration to create a staging area for global baseline creation, replace the contributing stream configurations for DNG and RQM with baseline configurations, replace the contributing stream configuration for RTC with a snapshot, and commit the global baseline. |
| | | CreateStreamFromBaseline | Search for a global baseline for a specific component, use the baseline configuration to create a global stream, replace the contributing baseline configurations for DNG and RQM with stream configurations, and replace the contributing snapshot for RTC with a stream configuration. |
The create stream/baseline operations execute roughly 220 times per hour per 100 users. The read use cases execute roughly 4300 times per hour per 100 users.
Artifacts were created during the course of the performance tests. We did not observe that response times degraded due to artifact counts.
The charts below show the response times for the different use cases in the workload as a function of the number of simulated users. There is a sharp increase in many response times at the 1350 user level; the response times are relatively flat up to 1100 users. Based on these results, we use the 1100 user level as the maximum supportable workload for this test topology.
Note that many of the operations that degraded involve interaction with the other applications. For example, the rmSelectRandomBaseline operation measures the response time of the DNG server. Search and list operations in the GC UI remain fast even at 1350 users.
This chart shows how many transactions per second were processed by the GC application at different user loads.
This chart looks at the CPU utilizations for the different servers in the test topology, as a function of GC throughput. Note that the DNG server is near its CPU limit at 130 transactions per second, which suggests that the DNG server is the bottleneck for this particular workload. The GC application is only at 24% CPU; it is capable of processing more requests.
Given that the GC application did not reach its maximum, we used the available data to estimate the point at which the GC application would become overloaded. Here we assume that CPU would be the limiting factor, and that the GC application will behave linearly as the load increases. The chart below extends the CPU data to higher throughput levels, and from this we can estimate that the GC application is actually capable of processing 455 transactions per second (at which point the CPU utilization would be 83%).
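As a rough check on that extrapolation, the sketch below (illustrative only, not part of the test harness) applies the same linear assumption, anchored at the measured point of roughly 24% GC CPU at about 130 transactions per second from the previous chart.

```python
# A rough check on the linear extrapolation described above (illustrative only).
# Assumption: GC CPU utilization scales roughly linearly with GC throughput,
# anchored at the measured point of ~24% CPU at ~130 transactions per second.
measured_cpu_pct = 24.0      # GC CPU at the measured point (previous chart)
measured_tps = 130.0         # GC throughput at that point
cpu_per_tps = measured_cpu_pct / measured_tps

estimated_max_tps = 455.0    # estimated maximum from this report
projected_cpu = cpu_per_tps * estimated_max_tps
print(f"Projected GC CPU at {estimated_max_tps:.0f} TPS: ~{projected_cpu:.0f}%")  # ~84%
```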
The table below describes the limits of the GC application from a higher-level perspective. We derive the estimated maximums by using the ratio between the estimated maximum throughput (455 transactions per second) and the actual throughput at 1100 users (107 transactions per second). We multiply the values measured at 1100 users by this ratio (4.25) to estimate the maximum values.
Name | Measured value | Maximum (estimated) |
---|---|---|
# Users | 1100 users | 4675 |
Rate of stream, baseline creation | 2500 per hour | 10625 per hour |
Rate of search/browse | 141,000 per hour | 599,000 per hour |
GC transactions per second | 107 | 455 |
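The estimated maximums in the table are straightforward ratio scaling; the following sketch (illustrative only) reproduces the arithmetic from the measured values.

```python
# Reproduces the ratio-based estimates in the table above (illustrative only).
measured_tps = 107           # GC transactions per second at 1100 users
estimated_max_tps = 455      # extrapolated GC maximum
ratio = round(estimated_max_tps / measured_tps, 2)   # 4.25, as used in the report

measured_at_1100_users = {
    "users": 1100,
    "stream/baseline creations per hour": 2500,
    "search/browse operations per hour": 141_000,
}
for name, value in measured_at_1100_users.items():
    print(f"{name}: measured {value:,}, estimated maximum ~{value * ratio:,.0f}")
```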
Now we look at the behavior of the GC application when used in typical integration scenarios, such as creating links between a work item and a test case, or a test case and a requirement, or a work item and a requirement. These scenarios involve users interacting with the UIs of RTC, RQM, and DNG - the users don't interact directly with the GC Web UI. In these scenarios, there are two primary ways in which the GC application is used:
How you select a global configuration from RQM or DNG is shown below:
In terms of how this interacts with the GC app, step 3 in the diagram sends search transactions to the GC app to find streams or baselines. The applications then ask the GC app for more information about the selected stream or baseline by issuing a GET for the configuration URI:
GET /gc/configuration/<id>
This tells the applications which local streams or baselines correspond to the global stream or baseline.
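For illustration, a client could issue the same request directly; the sketch below is a minimal example in which the host name, configuration id, credentials, and Accept header are placeholder assumptions rather than documented values.

```python
import requests  # third-party HTTP client, used here purely for illustration

# Placeholder values -- substitute your own server, configuration id,
# and authentication mechanism.
BASE_URL = "https://clm.example.com"
CONFIG_ID = "1234"            # hypothetical global configuration id

session = requests.Session()
session.auth = ("user", "password")                        # placeholder credentials
session.headers.update({"Accept": "application/rdf+xml"})  # assumed media type

# GET /gc/configuration/<id> -- the applications use the response to determine
# which local streams or baselines correspond to the global configuration.
response = session.get(f"{BASE_URL}/gc/configuration/{CONFIG_ID}")
response.raise_for_status()
print(response.status_code, response.headers.get("Content-Type"))
```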
We've previously looked at how the GC app handles searching. In this next set of tests, we first look at how well the GC application can resolve global configurations to local configurations. Then, we look at an integration scenario to see how much stress DNG, RTC, and RQM place on the GC application.
User Grouping | Percentage of the workload | Use Case | Use Case Description |
---|---|---|---|
GC Simulation | 100% | QM_OSLCSim | Request global configuration via RQM proxy. |
| | | RM_OSLCSim | Request global configuration via DNG proxy. |
The other workload we used (described below) simulates artifact linking. Most of the simulation operates in previously-selected global configurations, but a small number of the simulated users (7% of the total) are selecting new global configurations. To put this into the perspective of the protocol:
Test results for this integration workload are presented in a later section.
User Grouping | Percentage of the workload | Use Case | Use Case Description |
---|---|---|---|
RTC-DNG Integration | 31% | LinkDefectToRequirement | Open a new defect, set the 'Filed Against' field to the current project and the 'Found In' field to the release that corresponds to the global configuration, save the defect, click the Links tab, click to add link type 'Implements Requirement', select to create a new requirement, input requirement information and save the requirement, and save the defect. |
RTC-RQM Integration | 31% | LinkDefectToTestCase | Open a new defect, set the 'Filed Against' field to the current project and the 'Found In' field to the release that corresponds to the global configuration, save the defect, click the Links tab, click to add link type 'Tested by Test Case', select to create a new test case, input test case information and save the test case, and save the defect. |
RQM-DNG Integration | 31% | LinkTestCaseToRequirement | In the context of a global stream configuration, open a new test case, input test case information and save, click Requirements tab, click to create a new requirement, input information and save requirement, and save test case. |
GC Selections | 7% (50/50) | QMSelectGlobalConfiguration | Resolve the local configuration context for RQM by searching for and selecting a specific global stream configuration. |
| | | RMSelectGlobalConfiguration | Resolve the local configuration context for DNG by searching for and selecting a specific global stream configuration. |
The tests described in this section simulate the interactions of the CLM applications with the GC application, but the simulation is designed to artificially stress the GC application in order to estimate its maximum throughput. We can then compare this maximum throughput to that observed in normal usage in order to estimate how much link creation a single GC application can support.
The simulations issue requests for global configuration URIs to the DNG or RQM applications. DNG and RQM then communicate with the JTS to authenticate the request, and finally forward it to the GC application. In our test topology, where the GC, DNG, JTS, and RQM applications are all on separate servers (with a reverse proxy on the front end), this interchange involves cross-server network communication.
In this test, we slowly increased the number of simulated users requesting information about global configurations, and we looked at the response times, CPU utilization, and throughput for the various servers in the test topology. Here's what we found.
Because the JTS is involved in authenticating requests to the GC application, both the JTS and the GC become very busy in this stress scenario. The implication for scalability is that the JTS and the GC application are coupled together, and the maximum throughput needs to take both applications into account. At the 500 user level, the combined CPU utilization of the JTS and the GC application is roughly 50%. If we consider that the recommended topology will co-locate GC and JTS on the same server, and that the JTS needs capacity to support other operations (including the link indexer), it is reasonable (if arbitrary) to use 50% CPU as the maximum amount we can afford to allocate to GC operations. That would mean that the effective maximum throughput for the GC application is 316 transactions per second.
This chart shows how the throughput processed by the GC application varies as the simulated user load increased. The throughput increases linearly up to the 700 user level, at which point it begins to fall off as the other applications start to struggle.
This chart shows how the CPU utilization of the test servers changes as the user load increases. Above 1400 simulated users, the CPU usage flattens out. This happens because the servers are struggling to process the load, and the response times are increasing as a consequence of that struggle. Because the transactions are taking longer to process, the throughput flattens out, and so the CPU usage flattens out as well.
This chart is an alternative view, showing the CPU utilization as a function of the throughput handled by the GC application.
Here are the results for the simulation of the integration scenario.
The conclusion is that the work of link creation is largely handled by the RQM, DNG, and RTC servers, with very little demand on the GC application. It is not possible to create links fast enough to overload the GC application in this test topology.
We have previously estimated that the GC application is capable of processing 316 transactions per second in a stress scenario simulating link creation. We also know that a CLM system becomes overloaded at the 1000 user level, when creating a total of 10,326 links per hour. At this rate of link creation, the GC application processes only 0.53 transactions per second, which is far below the maximum throughput that the GC application can sustain. An estimate of the maximum possible rate of link creation supported by a single GC application would therefore be:
10,326 links per hour × (316 TPS maximum / 0.53 TPS at 10,326 links per hour) ≈ 6.2 million links per hour
A single CLM deployment would not be able to achieve this rate. But you can use this rate as the upper limit for a more complex, federated deployment where there are multiple independent CLM instances (each with their own JTS), all sharing a centralized deployment of the GC application.
Number of links created per hour vs. user load:

Integration Scenario | 500 users | 600 users | 700 users | 800 users | 900 users | 1000 users |
---|---|---|---|---|---|---|
RTC-DNG | 1,664 | 2,018 | 2,360 | 2,697 | 2,949 | 3,205 |
RTC-RQM | 1,659 | 2,010 | 2,356 | 2,611 | 2,881 | 3,053 |
RQM-DNG | 2,277 | 2,728 | 3,143 | 3,540 | 3,838 | 4,068 |
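As a worked example (illustrative only), the sketch below sums the 1000-user column from the table to reproduce the 10,326 links per hour figure and the resulting upper-limit estimate.

```python
# Reproduces the upper-limit estimate for link creation against a single GC
# application, using the figures reported above (illustrative only).
links_per_hour_at_1000_users = 3205 + 3053 + 4068   # = 10,326 (table above)
gc_tps_at_that_rate = 0.53                          # GC transactions/sec observed
gc_max_tps = 316                                    # estimated GC maximum (stress test)

max_links_per_hour = links_per_hour_at_1000_users * (gc_max_tps / gc_tps_at_that_rate)
print(f"Estimated ceiling: ~{max_links_per_hour / 1e6:.1f} million links per hour")  # ~6.2
```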
The link indexing service keeps track of the links between versions of artifacts. The service is deployed as a separate web application, and will poll every minute to pick up changes in Rational Team Concert, global configurations or Rational Quality Manager. If links are created between artifacts, the back link service will store information about the two linked artifacts and the kind of link relating them. Later on, applications can query the back link service to determine what links are pointing back at a particular artifact. If you open the Links tab in a requirement, the back link service will provide information about any work items or test artifacts that reference the requirement. If you open a test artifact, the back link service will provide information about what work items or release plans reference the test artifact.
Our testing of the link service looked at the following things:
When you first deploy the 6.0 release, the configuration management features are not enabled, and the back link indexing service is not active. To enable configuration management, you must specify an activation key in the Advanced Properties UI for your RQM and DOORS Next Generation services. When you enter the activation key for Rational Quality Manager, the system will automatically locate your Rational Team Concert and Rational Quality Manager data and begin indexing all of your project areas. This can be a lengthy process, but you can continue to use your project areas while indexing is in progress.
In our testing, we found that the initial indexing could proceed at the following rates:
The rate for processing test cases can be used for other types of test artifacts. If you know roughly how many work items or test artifacts you have across all your project areas, you can use the rates above to estimate how long it will take for the initial indexing to complete.
Application | Artifact count | Time required |
---|---|---|
Rational Team Concert | 223,843 | 49m 8s |
Rational Quality Manager | 437,959 | 1hr 53m 36s |
During initial indexing, the size of the index grew from 192 MB to 3768 MB (to handle 438K links), which works out to roughly 816 MB per 100K links.
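If you want rough sizing estimates for your own repositories, the sketch below applies the measured figures above (the indexing times from the table and roughly 816 MB of index growth per 100K links). The repository sizes in the example are hypothetical, and because the rates come from our test systems, treat the output as a ballpark estimate only.

```python
# Ballpark sizing estimates based on the measurements reported above.
# Initial-indexing rates (artifacts per second), derived from the table:
rtc_rate = 223_843 / (49 * 60 + 8)               # ~76 work items/sec
rqm_rate = 437_959 / (1 * 3600 + 53 * 60 + 36)   # ~64 test artifacts/sec

# Index growth observed during initial indexing: ~816 MB per 100K links.
mb_per_100k_links = 816

# Example repository sizes (hypothetical -- substitute your own counts).
my_work_items = 500_000
my_test_artifacts = 300_000
my_links = 400_000

print(f"RTC initial indexing: ~{my_work_items / rtc_rate / 3600:.1f} hours")
print(f"RQM initial indexing: ~{my_test_artifacts / rqm_rate / 3600:.1f} hours")
print(f"Index disk growth:    ~{my_links / 100_000 * mb_per_100k_links / 1024:.1f} GB")
```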
The performance numbers above were taken from a set of Linux virtual machines, each of which had 8 virtual CPUs. The system topology consisted of an IHS server (8G of RAM), a DB2 database server (16G of RAM), and a CLM application server which included all applications including the back link indexer and the global configuration application (32G of RAM). During the indexing process, the application used roughly 50% of the CPU.
In this set of tests, we looked at 3 common integration scenarios, to see how fast the link indexer could process changes. The scenarios were:
We looked at each scenario individually, and we adjusted the automation scripts so that they executed the smallest set of transactions needed to create the links. This allowed us to artificially increase the rate of link creation, and get a clearer estimate of the processing capability of the link indexer. In the normal scenarios, the applications become overloaded long before the link indexer.
The detailed results are below, but here's a summary of what we found:
These results can be expressed in terms of the maximum steady-state rate of link creation which can be handled by the link indexer as follows:
These rates should be thought of as "per-server" rates. In other words, these are the rates at which the link indexer can process data provided by a single RQM or RTC server. As more servers are added, additional threads are added to the LDX to allow data from different servers to be processed in parallel. For example, if you had two RQM servers configured to use a common link indexer, then the maximum indexing rate across both servers would be 2 x 36000 = 72000 links per hour. Each additional server will, however, add to the CPU utilization on the link indexer. In our tests, we observed a typical CPU utilization of 4.0%. We could project a CPU utilization of 50% if we had 12 deployments similar to our test systems, all operating at their maximum capacity for link creation. Be aware that this is only a projection; we did not test at this level of scale.
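The per-server scaling argument can be summarized in a few lines; the sketch below assumes the same linear scaling described above, and the server and deployment counts in it are hypothetical.

```python
# Projection of aggregate LDX capacity and CPU, assuming linear scaling
# (as described above). Server and deployment counts here are hypothetical.
max_links_per_hour_per_server = 36_000   # per RQM/RTC server (from the text)
cpu_pct_per_deployment = 4.0             # typical LDX CPU observed per deployment

num_servers = 2
print(f"{num_servers} servers -> up to "
      f"{num_servers * max_links_per_hour_per_server:,} links per hour")

num_deployments = 12
print(f"{num_deployments} deployments -> projected LDX CPU "
      f"~{num_deployments * cpu_pct_per_deployment:.0f}%")  # in line with the ~50% projection
```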
Please refer to Appendix B for an overview of the LDX architecture, which will provide some additional context in which to interpret these results.
This section provides the details from the tests of link indexing performance. We conducted these tests by simulating the behavior of users that were creating pairs of new artifacts and then linking them together. We increased the number of simulated users until we encountered a bottleneck somewhere in the system. During the test, we monitored the link indexing application to extract information about the number of changes it processed per second. We also watched the link indexer to make sure it was keeping up with the incoming traffic and not falling behind.
The table below summarizes the rates at which the automation could create links at the different user loads.
Number of links created per hour vs. user load:

Integration Scenario | 30 users | 60 users | 90 users | 120 users | 150 users | 180 users | 210 users | 240 users | 270 users |
---|---|---|---|---|---|---|---|---|---|
RTC-DNG | 3,492 | 6,936 | 10,416 | 13,792 | 17,232 | 20,496 | 23,600 | 26,156 | 29,140 |
RTC-RQM | 3,488 | 6,984 | 10,484 | 13,972 | 17,452 | 20,884 | 24,440 | 27,944 | 31,280 |
RQM-DNG | 3,324 | 6,656 | 9,908 | 13,188 | 16,080 | -- | -- | -- | -- |
This test simulated a collection of users that were linking work items to requirements. The automation simulated the following operations:
We slowly ramped up the user load until the system became overloaded.
The chart below summarizes both the CPU utilization and the maximum link indexing rates. The left-hand Y axis shows the CPU utilization for the JTS, DNG, and RTC servers. The right-hand Y axis shows the indexing rate (in work items per second). The LDX server is not very busy (around 3% CPU). The RTC and the DNG servers are doing the bulk of the work. CPU utilization in the RTC server has risen to over 60% as the load reaches 270 simulated users.
The rate at which the LDX can index the newly created links is relatively stable (between 55 and 60 work items per second). The LDX was able to keep up with the link creation rate at all load levels (the maximum rate of link creation was 8 links per second).
Please refer to Appendix B for a description of how we calculated the link indexing rates.
We slowly ramped up the user load until the system became overloaded.
The chart below summarizes both the CPU utilization and the link indexing rates. The left-hand Y axis shows the CPU utilization for the JTS, RQM, and RTC servers. The right-hand Y axis shows the indexing rate (in work items per second). The LDX server is not very busy (around 4% CPU). The RTC and the RQM servers are doing the bulk of the work. CPU utilization in the RTC server has risen to over 60% as the load reaches 270 simulated users.
This test requires the link indexer to process both work items and test cases. The LDX can process work items at a maximum rate in the range of 50-55 work items per second. The LDX can process test cases at a maximum rate of around 20 test cases per second. The LDX was able to keep up with the link creation rate at all load levels (the maximum rate of link creation was 9 per second).
Please refer to Appendix B for a description of how we calculated the link indexing rates.
We slowly ramped up the user load until the system became overloaded.
The chart below summarizes both the CPU utilization and the link indexing rates. The left-hand Y axis shows the CPU utilization for the JTS, RQM, and DNG servers. The right-hand Y axis shows the indexing rate (in work items per second). The LDX server is not very busy (around 4% CPU). The DNG and the RQM servers are doing the bulk of the work. CPU utilization in the RQM server has risen to over 80% as the load reaches 150 simulated users.
This test requires the link indexer to process test cases. In this context, the LDX can process test cases at a maximum rate of around 10 test cases per second. The LDX was able to keep up with the link creation rate at all load levels (the maximum rate of link creation was 4.5 links per second). The processing rate for test cases in this scenario is nearly half of what it is when creating links from work items to test cases. This is because this scenario creates twice as many changes as the RTC-RQM scenario: it saves the test case twice (once on creation and once to add the requirement link), so the LDX does twice as much work for each link that is created.
Please refer to Appendix B for a description of how we calculated the link indexing rates.
JVM arguments were set to:
-Xgcpolicy:gencon -Xmx16g -Xms16g -Xmn4g -Xcompressedrefs -Xgc:preferredHeapBase=0x100000000 -XX:MaxDirectMemorySize=1G -Xverbosegclog:logs/garbColl.log
JVM arguments were set to:
-Xgcpolicy:gencon -Xmx16g -Xms16g -Xmn4g -Xcompressedrefs -Xgc:preferredHeapBase=0x100000000 -XX:MaxDirectMemorySize=1G -Xverbosegclog:logs/garbColl.log
JVM arguments were set to:
-Xgcpolicy:gencon -Xmx24g -Xms24g -Xmn6g -Xcompressedrefs -Xgc:preferredHeapBase=0x100000000 -XX:MaxDirectMemorySize=1G -Xverbosegclog:logs/garbColl.log
In httpd.conf:
<IfModule worker.c>
    ThreadLimit 25
    ServerLimit 80
    StartServers 1
    MaxClients 2000
    MinSpareThreads 25
    MaxSpareThreads 75
    ThreadsPerChild 25
    MaxRequestsPerChild 0
</IfModule>
The link indexing service is meant to run behind the scenes. It is automatically configured and used by the CLM application, so under normal conditions, you don't even need to know it is there. Still, a brief overview of the implementation of the link indexing service may help to put some of the test results into context.
The first thing to know is that changes made to work items, global configurations, and test artifacts are tracked by the applications and exposed by each application as a "tracked resource set" (also known as a TRS feed). From the perspective of the LDX, each TRS feed is considered a "data source". The LDX polls the data sources looking for new changes once per minute; there will be one data source per application per server.
In our test topology, for example, the LDX knows about three data sources: one for the RQM server, one for the RTC server, and one for the GC server. Since the integration scenarios did not involve GC creation, there is no indexing activity for the GC data source in these tests.
Deployments that have more servers will have more data sources, and the LDX will cycle through all of the data sources it knows about once per minute. When it wakes up, it retrieves the list of changes since the last time it checked. It then gets details about each of the changed artifacts, and updates the link index if it finds that a link has been added.
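Conceptually, the polling behavior looks something like the sketch below. This is not the actual LDX implementation; the data-source names and helper functions are invented for illustration.

```python
import time

# Conceptual sketch of the polling behavior described above -- NOT the actual
# LDX implementation. Data-source names and helpers are invented for illustration.
DATA_SOURCES = ["rtc-trs-feed", "rqm-trs-feed", "gc-trs-feed"]
POLL_INTERVAL_SECONDS = 60

def fetch_changes_since(source, last_checked):
    """Placeholder: would query the TRS feed for changes since last_checked."""
    return []

def update_link_index(source, changes):
    """Placeholder: would fetch details for each changed artifact and update
    the link index if a link was added."""
    pass

last_checked = {source: 0.0 for source in DATA_SOURCES}
while True:
    # Cycle through every known data source once per polling interval.
    for source in DATA_SOURCES:
        changes = fetch_changes_since(source, last_checked[source])
        update_link_index(source, changes)
        last_checked[source] = time.time()
    time.sleep(POLL_INTERVAL_SECONDS)
```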
The next thing to know is that each data source will be processed by 2 threads (by default, although this can be configured). This has two implications:
So, returning to our test topology: we have three data sources, with 2 threads per data source. If the change rates were high enough to keep the LDX busy all of the time, there would be 6 busy threads, each consuming a significant fraction of a CPU core (but not 100% of a core, since the LDX is frequently waiting for responses from the applications). If we estimate that each thread is busy 25% of the time, then on our 8-virtual-CPU systems we could expect a maximum CPU utilization of 6 threads × 0.25 cores per thread / 8 CPUs = 18.75%. In our tests, of course, the change rate could never get high enough to keep the LDX busy even half of the time, so the CPU used for indexing was much smaller.
Still, it is a good idea to think about how many RQM or RTC servers an LDX is likely to need to handle when sizing the LDX server. A shared LDX server that will be handling a large number of data sources would benefit from more processors.
The last thing to know concerns the types of changes that are exposed to the LDX by each application. For RTC, there is one update per new or modified work item. For RQM, things are somewhat more complicated. Because test artifacts are versioned resources, when we create a new test case and then save it later with a link, multiple changes are recorded in the change table: the creation of the new test case, then the creation of a second version of the test case (because the test case was saved with a link), and then a deletion request for the original version. The LDX therefore ends up doing more work to process test case changes than it does to process work item changes. For the case where the automation first creates a work item, then creates a test case, and then updates the work item with a link to the test case, only the creation of the new test case is recorded in the change tables (there are no additional saves).
This is why the processing rate for QM resources in our tests is lower for the scenario involving QM to RM linking but higher for the scenario involving CCM to QM linking. In the QM to RM case, more information has to be processed by the LDX because of the additional saves.
The LDX runs behind the scenes and does not normally require attention. It does, however, have a rich set of administration features (accessible via the ldx/web URL), and we used these features to get an idea of what the LDX was doing when the CLM system was under load. In particular, if you drill down into the status of the data sources, you will find a History tab that summarizes past indexing activity, and the "View change log" link provides detailed information about what happened during each LDX polling interval.
The information provided by the change log is shown below, for a single load level for each of the scenarios. The key bits of information are:
Dividing the number of retrieved artifacts by the duration gives the maximum rate at which the LDX can read from that data source. For the RTC change log in the RTC-DNG scenario below, the first entry shows that the LDX processed 446 work items in 9 seconds, for a rate of 49.56 work items per second.
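For reference, the calculation is simply retrieved artifacts divided by duration; in the sketch below, the first change-log entry reproduces the RTC example above and the remaining entries are made-up placeholders.

```python
# Computes indexing rates from change-log entries: retrieved artifacts / duration.
# The first entry reproduces the RTC example above; the others are hypothetical.
change_log_entries = [
    {"retrieved_artifacts": 446, "duration_seconds": 9},   # -> ~49.6 work items/sec
    {"retrieved_artifacts": 430, "duration_seconds": 8},   # hypothetical
    {"retrieved_artifacts": 460, "duration_seconds": 9},   # hypothetical
]

for entry in change_log_entries:
    rate = entry["retrieved_artifacts"] / entry["duration_seconds"]
    print(f"{rate:.2f} artifacts per second")
```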
There is some variation in the rates because the precision of the Duration value is not high (it is rounded to the nearest second). Additionally, the duration is often small (especially for RTC), which leads to further variation. In this document, we have looked at the range of calculated rates, and used a value in the middle of the range as the estimate.
We can also determine whether the LDX is keeping up by checking to see whether the duration is staying constant for a fixed load. If the duration (and the number of retrieved artifacts) is growing over time, then that is an indication that the LDX is not keeping up, so that on each polling interval, it finds an ever-growing backlog. In our tests, the LDX always kept up.