Introduction
This page provides guidance for sizing database servers that host the Lifecycle Query Engine - Relational store (LQE rs) application. LQE rs is an implementation of the Lifecycle Query Engine that replaces Jena with a relational database. This improves report scalability while reducing LQE resource requirements.
Standard disclaimer
The information in this document is distributed AS IS. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk. Any pointers in this publication to external Web sites are provided for convenience only and do not in any manner serve as an endorsement of these Web sites. Any performance data contained in this document was determined in a controlled environment, and therefore, the results that may be obtained in other operating environments may vary significantly. Users of this document should verify the applicable data for their specific environment.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multi-programming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
This testing was done as a way to compare and characterize the differences in performance between different versions of the product. The results shown here should thus be looked at as a comparison of the contrasting performance between different versions, and not as an absolute benchmark of performance.
Summary of results
We ran a series of tests to determine how much database memory was optimal for our performance workload. Based on these tests, we recommend setting your buffer cache or buffer pool to 10% of the disk size of your LQE rs database. Then, monitor the I/O on your database and adjust that number up or down based on your own reporting workload.
Background
The Jazz Reporting Service is the default reporting option for the IBM® Engineering solutions. You can quickly and easily consolidate data from a variety of sources across your tools and project areas. It consists of three components: Report Builder, Data Collection Component, and Lifecycle Query Engine.
The Lifecycle Query Engine (LQE) implements a Linked Lifecycle Data Index over data provided by one or more lifecycle tools. A lifecycle tool makes its data available for indexing by exposing its Linked Lifecycle Data via a Tracked Resource Set. With this index, you can build sophisticated traceability reports to understand relationships between your data (for example, find all requirements that are associated with failing test cases).
When LQE was introduced, it stored index data using Apache Jena TDB. Reports were implemented using SPARQL queries. The architecture is shown below:
This architecture has several disadvantages for large indexes:
- The TDB index must be local to the LQE server machine, which prevents addressing scale issues through clustering
- Large Jena indexes must be almost entirely cached in memory for acceptable performance, so LQE servers require large amounts of RAM (as well as large Java heaps)
- Jena does not have a sophisticated query optimization engine, so complex traceability reports can be slow
One big limitation of the Jena architecture is how it deals with a workload involving both reading and writing. In the context of LQE, reads come from report execution, and writes come from indexing. If a query is running when a write request is processed, the updated data is temporarily stored in a journal file. Jena can combine the data in the main database with the updates in the journal to present a consistent view of the state of the database at the time the query started. The journal is merged into the main database once there are no reads pending.
There are some problems with this approach. First, the journal will continue to grow as long as there are queries running. And queries will be running if the reports are slow, or if the report workload is high (i.e. many users are running reports frequently). A busy LQE server may not be idle very often, which means that the journal can grow very large. Additionally, queries get slower as the journal grows, since Jena has to do more work to identify the correct dataset for a given report. This means there is a feedback loop where slow queries cause the journal to grow, which in turn causes the queries to be slower and then prevents the journal from being merged into the main database. This feedback loop can even be triggered by a single slow query.
Finally, it can take a long time to merge a large journal into the main database. While the merging is in progress, all read operations are blocked. This creates a queue of report requests waiting for the merge to complete. And once these reports start running, the cycle can start again.
Because of these disadvantages, Jena is being replaced with a relational database (see below). LQE continues to access application data via TRS feeds, but the results of indexing are stored on a relational database server. Report Builder executes reports by running SQL against the relational store.
This improves the scalability of the LQE solution in several ways.
- Clustering becomes possible through standard database solutions like Oracle RAC
- Database optimization engines are more sophisticated than Jena, allowing for improved query execution
- Relational databases are designed for high data volumes, with many optimizations available to limit physical IO
- Relational databases are much better at dealing with concurrent reading and writing. While locking is still needed, the locking usually happens at the row level. Lock escalation to the table level is not frequent, so one slow SQL query will not disrupt the entire server.
- Since query processing shifts from the LQE server to the database server in this architecture, the LQE server requires less RAM. RAM freed up from LQE can be moved to the database server, where it can be used more efficiently.
It is also possible to deploy the two LQE architectures side-by-side, to reduce risk during the transition to the new architecture. A Report Builder can be configured to run both Jena and SQL queries, and even to compare the queries for correctness and performance.
Test results
The goal of these tests was to determine how much database memory should be allocated to the database caches, specifically:
- The buffer cache on Oracle
- The buffer pool on DB2
When a database executes a SQL statement, it will read table and index data from disk into these caches. Performance degrades if the database doesn't have enough of the data it needs in memory, since reading from disk is significantly slower than reading from memory. The amount of memory required depends on what the reports are doing, how big the index is, and how many reports are running concurrently. These tradeoffs are discussed below, but in order to estimate memory requirements, we used an LQE index of fixed size and executed a load test against the index. We then monitored I/O on the database server for a range of cache sizes. When caches are undersized, we would expect to see high physical I/O. Increasing the cache size should then reduce I/O, and we can determine the optimum cache size by charting I/O vs. cache size and looking for a point of diminishing returns.
For Oracle, we monitored the read IO requests per second, using an AWR snapshot covering the period of the load test. When this number is high, Oracle is not finding data in the buffer cache and therefore making requests out to disk to read in the data.
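If you are not familiar with AWR, a report covering a snapshot interval can be generated from SQL*Plus using the standard script that ships with Oracle (note that AWR requires the Diagnostics Pack license):

-- Generate an AWR report interactively; in SQL*Plus, "?" expands to ORACLE_HOME
@?/rdbms/admin/awrrpt.sql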
On DB2, we monitored the disk utilization on the operating system. DB2 allows us to disable file system caching, so we don't have to worry about the Linux page cache influencing the I/O numbers. When the disk utilization is high, DB2 is not finding data in the buffer pool and therefore making requests out to disk.
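As a sketch of this setup (the table space name LQE_TS is illustrative; NO FILE SYSTEM CACHING is set per table space, and iostat is one way to watch disk utilization on Linux):

# Disable file system caching for a DB2 table space (LQE_TS is a hypothetical name)
db2 "ALTER TABLESPACE LQE_TS NO FILE SYSTEM CACHING"
# Watch disk utilization (the %util column) every 30 seconds during the load test
iostat -x 30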
Test results for Oracle
In these tests, we kept the total memory allocated to Oracle fixed at 96G, and then varied the size of the buffer cache (by setting the parameter DB_CACHE_SIZE). For each value of buffer cache, we then ran a 100 user reporting workload (with report caching disabled). We collected operating system metrics during the test, and also captured an AWR report during the peak load. We extracted the number of Read IO requests per second from the AWR report.
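For reference, a minimal sketch of how the buffer cache can be resized between test runs (24G is one of our test points; SCOPE=BOTH assumes the instance uses an spfile, and when SGA_TARGET is set, DB_CACHE_SIZE acts as a minimum rather than an absolute size):

-- Check the current buffer cache setting
SHOW PARAMETER DB_CACHE_SIZE
-- Pin the buffer cache at 24G (run as SYSDBA)
ALTER SYSTEM SET DB_CACHE_SIZE = 24G SCOPE=BOTH;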
We disabled caching in report builder to have better control over when SQL statements execute. With report caching enabled, a request to generate a report may avoid executing a SQL statement, which reduces pressure on the buffer cache. With caching disabled, each request to generate a report runs a SQL statement. This results in a more predictable I/O load.
I/O as a function of buffer cache size is shown below. For this workload, a 24G buffer cache is optimal.
Other test details:
- Disk size of LQE rs database: 698G
- Total RAM on database server: 768G
- Database CPU utilization: 25%
- Disk busy (24G buffer cache): 12%
A 24G buffer cache is 3.4% of the disk size of the LQE rs test database. Since the amount of memory required to cache report data can be highly variable, we recommend starting with a buffer cache of 10% of the total database size.
The storage subsystem was not very busy (12% utilization) in these tests, even for small buffer caches. This is due to the large amount of RAM available on the test server (768G). Even though only 96G is allocated to Oracle, the database files are being kept in the Linux page cache (which is resident in memory). So even though Oracle is making physical reads, Linux is satisfying those reads from memory. If we had used less RAM on the Oracle server, we would have seen higher disk utilization with smaller buffer caches.
Test results for DB2
For DB2 tests, we set the total amount of database memory to be 128G, disabled self-tuning memory, and then varied the buffer pool size from 3G to 60G. We ran a reporting workload and measured the disk utilization (disk busy) of the DB2 server. We plotted disk utilization vs. buffer pool size (see chart below). The optimal buffer pool size is where the curve starts to flatten out (where there are diminishing returns from adding more memory). For this workload, the optimal buffer pool size is 27G.
We recommend disabling self-tuning memory for the LQE rs database. When self-tuning memory is enabled, DB2 can shrink the buffer pool, and this can temporarily lead to poor query performance. Performance improves once DB2 increases the size of the buffer pool. When self-tuning memory is disabled, report performance will be more consistent.
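As a sketch of the equivalent configuration, assuming the LQE rs database is named LQERS and uses a 32K-page buffer pool named BP32K (both names are illustrative; substitute your own):

# Fix total database memory at 128G (DATABASE_MEMORY is counted in 4K pages: 33554432 x 4K = 128G)
db2 "UPDATE DB CFG FOR LQERS USING DATABASE_MEMORY 33554432"
# Disable self-tuning memory so the buffer pool size stays fixed
db2 "UPDATE DB CFG FOR LQERS USING SELF_TUNING_MEM OFF"
# Size the buffer pool to 27G (884736 x 32K pages = 27G)
db2 "CONNECT TO LQERS"
db2 "ALTER BUFFERPOOL BP32K IMMEDIATE SIZE 884736"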
As in the Oracle tests, we disabled report caching in Report Builder so that each request to generate a report runs a SQL statement, giving a more predictable I/O load.
Other test details:
- Disk size of LQE rs database: 665G
- Total RAM on database server: 768G
- Database CPU utilization: 25%
A 27G buffer pool is 4% of the disk size of the LQE rs test database. Since the amount of memory required to cache report data can be highly variable, we recommend starting with a buffer pool of 10% of the total database size.
Note that the storage system is maxed out for smaller buffer pools, unlike the Oracle tests. That is because we disabled file system caching for the LQE rs database, which prevents the Linux page cache from loading the database files into memory.
Factors impacting database sizing
As a general rule, reports perform best if the database is able to cache the rows that a report operates on in memory. If the rows are not in memory, the database must read them in from disk, and this adds to the report execution time. It is not necessary to have so much memory that the database is held in memory in its entirety - you only need enough for whatever subset of rows is commonly used by reports. This is why we suggest sizing your database caches at 10% of the disk size of the LQE rs database, as a starting point.
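For example, under this rule of thumb a 700G LQE rs database would start with a buffer cache (or buffer pool) of roughly 70G, to be adjusted up or down based on monitoring.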
There are several factors that impact memory usage and may require you to increase (or decrease) the database memory. This section describes those factors in more detail.
Overall reporting workload
One factor that impacts database memory usage is the reports themselves. The amount of memory required depends on what the reports are doing, how big the index is, how many reports are running concurrently, how many rows are returned, and how long the reports run.
There are database tables that represent the different ELM artifacts and the relationships between them. A traceability report that looks at links between multiple artifacts (e.g. work items linked to test cases with failing test results) will need to examine more database tables than a report against a single artifact type (e.g. work items open in the current release), and this can require larger database caches to achieve optimum performance.
Additionally, the LQE rs schema stores artifact properties in their own database tables. So, the columns that you select for display in a report can result in examining additional database tables.
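As an illustration, the following hypothetical query (table and column names are simplified; the real LQE rs schema differs) shows how displaying a single string property pulls a second table into the query plan:

-- Hypothetical simplified schema: displaying a property joins the artifact
-- table to the property table, so both must be cached for best performance
SELECT r.TITLE, p.STRING_VAL
FROM REQUIREMENT r
JOIN REQUIREMENT_STRING_PROP p ON p.REQUIREMENT_ID = r.ID
WHERE p.PROP_NAME = 'Priority';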
Report scoping is another factor. For reports scoped to a global configuration, larger global configurations will require more memory. However, using conditions can help to reduce the memory required by allowing the database to filter out artifacts that are not relevant. Reports that return more rows will touch more parts of the database than reports that return fewer rows.
Finally, consider your overall reporting workload. Think about the number of concurrent users that might be running reports, either directly or through dashboards. Since Report Builder caches report results, what's important is the total number of active reports (and their variants). Each set of report parameters (including scope values like global configurations as well as the user running the report) results in a unique report instance, which then may touch different parts of the database.
So, if you have mostly complex reports that run with a variety of different parameters, under high concurrency, you should consider increasing the database caches to 15 or 20% of the disk size of the LQE database. Simpler reports running at lower concurrency will require less (say 5%).
Amount of indexed data
The amount of data you index is another factor. The memory needed is driven by the subset of rows that reports commonly touch, and that subset generally grows along with the index. As you onboard more projects and the LQE rs database grows, revisit your cache sizing using the monitoring guidance below.
Monitoring memory usage
Because memory requirements will vary based on your specific reporting workload, we recommend that you monitor the database server so that you can adjust database cache sizes as needed. You can monitor:
- The disk size of the LQE rs database
- On Oracle, how much of the tables and indexes are resident in memory
- The disk sizes of the individual tables and indexes in the LQE rs database
- Key operating system metrics (disk utilization, CPU utilization)
- On Oracle, the number of disk reads per second (from an AWR report)
Oracle: What's resident in memory
To see which tables and indexes are cached in memory on Oracle, run the following SQL command:
set pagesize 0
COLUMN OBJECT_NAME FORMAT A40
COLUMN NUMBER_OF_BLOCKS FORMAT 999,999,999,999
COLUMN OBJECT_TYPE FORMAT a10
-- V$BH has one row per buffer cache block; RITRS is the LQE rs schema owner
SELECT o.OBJECT_NAME, o.OBJECT_TYPE, COUNT(*) NUMBER_OF_BLOCKS
FROM DBA_OBJECTS o, V$BH bh
WHERE o.DATA_OBJECT_ID = bh.OBJD
  AND o.OWNER = 'RITRS'
GROUP BY o.OBJECT_NAME, o.OBJECT_TYPE
ORDER BY COUNT(*);
This lists the number of blocks of each table or index that are resident in memory. Multiply this by the block size of the database to convert to bytes.
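If you don't know the block size, it can be read from the instance parameters (8192 bytes, or 8K, in our test instance):

-- Show the database block size in bytes
SHOW PARAMETER DB_BLOCK_SIZE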
Some example output is shown below. Here, the REQUIREMENT table has 1,683,362 blocks in memory, and the block size for this Oracle instance is 8K. So roughly 13G (1,683,362 x 8K) of data from the REQUIREMENT table is loaded into the buffer cache.
RESOURCE_PROPERTY TABLE 580,249
WORK_ITEM TABLE 637,868
REQUIREMENT TABLE 1,683,362
Size of the LQE rs database
On DB2, you can get the size of the LQE rs database by running the following command:
db2 "call GET_DBSIZE_INFO (?,?,?,1)"
Value of output parameters
--------------------------
Parameter Name : SNAPSHOTTIMESTAMP
Parameter Value : 2023-03-23-12.42.05.471605
Parameter Name : DATABASESIZE
Parameter Value : 665232097280
Parameter Name : DATABASECAPACITY
Parameter Value : 2242420260864
The DATABASESIZE parameter is the size of the database in bytes.
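In this example, 665232097280 bytes is approximately 665G, matching the disk size reported earlier for the DB2 test database.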
On Oracle, run the following script to get the sizes of all tablespaces on the Oracle instance:
set pagesize 0
col "Tablespace" for a22
col "Used MB" for 99,999,999
-- Sum segment sizes per tablespace, reported in MB
select df.tablespace_name "Tablespace",
       tu.totalusedspace "Used MB"
from (select tablespace_name,
             round(sum(bytes) / 1048576) TotalSpace
      from dba_data_files
      group by tablespace_name) df,
     (select round(sum(bytes) / 1048576) totalusedspace, tablespace_name
      from dba_segments
      group by tablespace_name) tu
where df.tablespace_name = tu.tablespace_name;
The output of that script will be similar to this:
Tablespace Used MB
SYSTEM 1006
SYSAUX 37588
VNF_IDX 380097
VNF_32K 332177
LQE 1477
UNDOTBS1 172
USERS 32111
VSTR_32K 8
VSTR_IDX 11
To get the LQE rs database size, add up the "used MB" values for the VNF_IDX and VNF_32K tablespaces.
Getting table and index sizes from Oracle
On Oracle, you can get the sizes of the tables in the LQE rs database by running the following script. This shows table sizes in bytes.
set pagesize 0
COLUMN TABLESPACE_NAME FORMAT A10
COLUMN SEGMENT_NAME FORMAT A45
SELECT DS.TABLESPACE_NAME, SEGMENT_NAME, DS.BYTES
FROM DBA_SEGMENTS DS
WHERE (DS.TABLESPACE_NAME = 'VNF_32K') and SEGMENT_NAME IN (SELECT TABLE_NAME FROM DBA_TABLES)
ORDER BY DS.BYTES DESC;
Here is sample output taken from the IBM performance lab, showing the top 5 largest tables in our test configuration:
VNF_32K URL 6.4357E+10
VNF_32K REQUIREMENT_RELATIONSHIP 4.4023E+10
VNF_32K TEST_RICH_TEXT_SECTION 3.2749E+10
VNF_32K REQUIREMENT_STRING_PROP 3.1541E+10
VNF_32K TEST_SCRIPT_STEP_RESULT 2.3488E+10
Get the sizes of the indexes by running the following script. This shows index sizes in bytes.
set pagesize 0
COLUMN TABLESPACE_NAME FORMAT A10
COLUMN SEGMENT_NAME FORMAT A50
SELECT DS.TABLESPACE_NAME, SEGMENT_NAME, DS.BYTES
FROM DBA_SEGMENTS DS
WHERE DS.TABLESPACE_NAME = 'VNF_IDX' and SEGMENT_NAME IN (SELECT INDEX_NAME FROM DBA_INDEXES)
ORDER BY DS.BYTES DESC;
Here is sample output taken from the IBM performance lab, showing the top 5 largest indexes in our test configuration:
VNF_IDX REQUIREMENT_STRING_PROP_IDX3 2.5636E+10
VNF_IDX REQUIREMENT_STRING_PROP_IDX2 1.4361E+10
VNF_IDX SELECTIONS_SELECTS_IDX2 1.4093E+10
VNF_IDX SELECTIONS_SELECTS_IDX4 1.3556E+10
VNF_IDX SELECTIONS_SELECTS_IDX3 1.2549E+10
Note that the LQE rs schema stores tables and indexes in separate tablespaces.
Getting table and index sizes from DB2
To get the sizes of individual tables on DB2, connect to the LQE rs database and then run the following SQL statement:
SELECT SUBSTR(TABSCHEMA,1,18) AS TABSCHEMA,
       SUBSTR(TABNAME,1,30) AS TABNAME,
       (DATA_OBJECT_P_SIZE + INDEX_OBJECT_P_SIZE + LONG_OBJECT_P_SIZE
        + LOB_OBJECT_P_SIZE + XML_OBJECT_P_SIZE) AS TOTAL_SIZE_IN_KB,
       (DATA_OBJECT_P_SIZE + INDEX_OBJECT_P_SIZE + LONG_OBJECT_P_SIZE
        + LOB_OBJECT_P_SIZE + XML_OBJECT_P_SIZE) / 1024 AS TOTAL_SIZE_IN_MB,
       (DATA_OBJECT_P_SIZE + INDEX_OBJECT_P_SIZE + LONG_OBJECT_P_SIZE
        + LOB_OBJECT_P_SIZE + XML_OBJECT_P_SIZE) / (1024*1024) AS TOTAL_SIZE_IN_GB
FROM SYSIBMADM.ADMINTABINFO
WHERE TABSCHEMA = 'RITRS'
ORDER BY TOTAL_SIZE_IN_KB DESC;
Here is sample output taken from the IBM performance lab, showing the top 5 largest tables in our DB2 test configuration:
TABSCHEMA TABNAME TOTAL_SIZE_IN_KB TOTAL_SIZE_IN_MB TOTAL_SIZE_IN_GB
------------------ ------------------------------ -------------------- -------------------- --------------------
RITRS URL 118309888 115537 112
RITRS REQUIREMENT_RELATIONSHIP 100173824 97826 95
RITRS REQUIREMENT_STRING_PROP 56057856 54744 53
RITRS SELECTIONS_SELECTS 44907520 43855 42
RITRS VERSION_RESOURCE 37067776 36199 35
You can get the total size of all indexes defined for a table by running the following script:
SELECT distinct substr(tabname, 1, 30) as TABLE,
       substr(tabschema, 1, 20) as SCHEMA,
       index_object_p_size as IndexSizeKB
FROM TABLE(sysproc.admin_get_index_info('T','','')) AS t
WHERE t.tabschema = 'RITRS'
ORDER BY IndexSizeKB desc;
Here is sample output taken from the IBM performance lab, showing the tables that have the largest collection of indexes:
TABLE SCHEMA INDEXSIZEKB
------------------------------ -------------------- --------------------
URL RITRS 55036928
REQUIREMENT_RELATIONSHIP RITRS 53405696
SELECTIONS_SELECTS RITRS 31303680
VERSION_RESOURCE RITRS 18227200
REQUIREMENT_STRING_PROP RITRS 18143232
Test environment and data
In these tests, we used a single test deployment of the Engineering Lifecycle Management applications, and indexed that deployment using LQE rs (on both DB2 and Oracle) as well as the Jena-based LQE. This allows reports to be compared (both performance and content) across the three indexes. The deployment topology is shown below:
The servers in the topology are all physical servers. The hardware specifications are listed below.
Role | Server | Machine type | Processor | Total processors | Memory | Storage | OS and version
Proxy Server | IBM HTTP Server and WebSphere Plugin | IBM System x3550 M3 | 2 x Intel Xeon X5667 3.07 GHz (quad-core) | 16 | 16 GB | RAID 5 – 279GB SAS Disk x 2 | RHEL 7
DNG | WebSphere Liberty | IBM System x3550 M4 | 2 x Intel Xeon E5-2640 2.5GHz (six-core) | 24 | 64 GB | RAID 5 – 279GB SAS Disk x 4 | RHEL 7
JTS/GC/LDX | WebSphere Liberty | IBM System x3550 M4 | 2 x Intel Xeon E5-2640 2.5GHz (six-core) | 24 | 32 GB | RAID 5 – 279GB SAS Disk x 4 | RHEL 7
ETM | WebSphere Liberty | IBM System x3550 M4 | 2 x Intel Xeon E5-2640 2.5GHz (six-core) | 24 | 32 GB | RAID 5 – 279GB SAS Disk x 4 | RHEL 7
EWM | WebSphere Liberty | IBM System x3550 M4 | 2 x Intel Xeon E5-2640 2.5GHz (six-core) | 24 | 32 GB | RAID 5 – 279GB SAS Disk x 4 | RHEL 7
LQE rs Oracle/Jena | Oracle 19c | IBM System SR650 | 2 x Xeon Silver 4114 2.2GHz (ten-core) | 40 | 768 GB | RAID 10 – 900GB SAS Disk x 16 | RHEL 7
LQE rs DB2 | DB2 11.5 | IBM System SR650 | 2 x Xeon Silver 4114 2.2GHz (ten-core) | 40 | 768 GB | RAID 10 – 900GB SAS Disk x 16 | RHEL 7
Report Builder - DB2 | WebSphere Liberty | IBM System x3550 M4 | 2 x Intel Xeon E5-2640 2.5GHz (six-core) | 24 | 64 GB | RAID 5 – 279GB SAS Disk x 4 | RHEL 7
Report Builder - Oracle | WebSphere Liberty | IBM System x3550 M4 | 2 x Intel Xeon E5-2640 2.5GHz (six-core) | 24 | 64 GB | RAID 5 – 279GB SAS Disk x 4 | RHEL 7
Abbreviations:
- JTS = Jazz Team Server
- GC = Global configuration application
- LDX = Link index provider
- DNG = DOORS Next Generation
- EWM = Engineering Workflow Management
- ETM = Engineering Test Management
- LQE rs = Lifecycle Query Engine - Relational Store
- RHEL = Red Hat Enterprise Linux
Test data
The test deployment includes the following kinds of artifacts:
- DOORS Next: 10 million requirements and 60 million links. 5 million links have link validity.
- EWM: 2 million work items
- ETM: 5 million total test artifacts
- Global configurations: 1000 total components
When indexed, the LQE database sizes are:
- Jena LQE: 1320G
- LQE rs Oracle: 698G
- LQE rs DB2: 665G