r11 - 2014-04-02 - 15:10:34 - Main.gcovellYou are here: TWiki >

Deployment Web > DeploymentPlanningAndDesign > PerformanceDatasheetsAndSizingGuidelines > CaseStudyComparingDataManagerETLCLM404

Case Study: Comparing Data Manager ETL performance between CLM 4.0.3 and 4.0.4

Authors: WangPengPeng
Last updated: September 9, 2013
Build basis: CLM 4.0.4

Page contents

Introduction
- What our tests measure
Findings
Topology
- Data shape
Results
- CCM Data Manager ETL performance improvement
- RM Data Manager ETL performance improvement
Appendix A

Introduction

The Data Manager ETL is a powerful tool that extracts, transforms, and loads CLM operational data into the data warehouse, which the CLM Reporting features can output as complex statistics and trend charts. You can use the Data Manager ETL to load historical operational data from one or many CLM servers. The Data Manager ETL has an initial load and delta loads. The initial load is the first time that all CLM data is loaded. The duration of the initial load depends on the volume of data: if the volume is large, the inital load can take up to a day or more to complete. The delta loads load incremental changes to the data. The duration of a delta load depends on the time interval between loads and the amount of change to the data during that interval. In general, the time interval between delta loads is daily or weekly. If no big changes occur during an interval, the delta load can be completed in hours. However, if there are many incremental changes or there are many CLM servers, a delta load can take much longer. In the CLM 4.0.4 release, the development team made significant improvements to the Data Manager ETL for the Change and Configuration Management (CCM) and the Requirements Management (RM) applications.

This case study compares the performance between CLM 4.0.3 and CLM 4.0.4 for the CCM and RM Data Manager ETL, with identical data in the same test environment. Based on the test data, the Data Manager ETL performance improved significantly between releases.

Disclaimer

The information in this document is distributed AS IS. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk. Any pointers in this publication to external Web sites are provided for convenience only and do not in any manner serve as an endorsement of these Web sites. Any performance data contained in this document was determined in a controlled environment, and therefore, the results that may be obtained in other operating environments may vary significantly. Users of this document should verify the applicable data for their specific environment.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multi-programming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

This testing was done as a way to compare and characterize the differences in performance between different versions of the product. The results shown here should thus be looked at as a comparison of the contrasting performance between different versions, and not as an absolute benchmark of performance.

What our tests measure

We use predominantly automated tooling such as Rational Performance Tester (RPT) to simulate a workload normally generated by client software such as the Eclipse client or web browsers. All response times listed are those measured by our automated tooling and not a client.

The diagram below describes at a very high level which aspects of the entire end-to-end experience (human end-user to server and back again) that our performance tests simulate. The tests described in this article simulate a segment of the end-to-end transaction as indicated in the middle of the diagram. Performance tests are server-side and capture response times for this segment of the transaction.

Findings

The performance of the Data Manager ETL improved significantly from CLM 4.0.3 to CLM 4.0.4 based on the test data. The initial ETL load for CCM improved about 30% (100,000 work items with 2 history entries per work item). The delta load for CCM improved more than 10% (10% increment of work items and history). The initial ETL load for RM improved about 40% (400,000 requirements). The delta ETL load for RM also improved about 40% (10% increment of requirements).

Topology

The tests focused on a distributed server setup that is aligned with CLM Departmental topology D1 (see Figure 1). Unlike the D1 topology, the IBM Tivoli Directory Server was used for user authentication in these tests.

Figure 1: Departmental - Single application server, Windows/DB2 D1 diagram

This case study used the same test environment and same test data to test the ETL performance for CLM 4.0.3 and CLM 4.0.4. Test data was generated using automation. The test environment for the latest release was upgraded from the earlier one by using the CLM upgrade process. To create four VMs, one X3550 M3 7944J2A (at 2.67 GHz, 48 GB RAM, and 12 physical cores) was used. In the topology, four CLM applications (JTS, CCM, QM, and RM) were installed on VM1; the CLM repository was installed on VM2; the Data Manager ETL tool was installed on VM3; and the data warehouse was installed on VM4.

The same software configuration was used for both CLM 4.0.3 and CLM 4.0.4. The WebSphere Application Server was version 8.5.1, 64-bit. The database server was IBM DB2 9.7.5, 64-bit. The Rational Reporting for Development Intelligence tool was version 2.0.4. The Jazz Team Sever, CCM, QM, and RM applications co-existed in the same WebSphere Application Server profile. The JVM setting was as follows:

-verbose:gc -XX:+PrintGCDetails -Xverbosegclog:gc.log -Xgcpolicy:gencon
-Xmx8g -Xms8g -Xmn1g -Xcompressedrefs -Xgc:preferredHeapBase=0x100000000
-XX:MaxDirectMemorySize=1g

IBM Tivoli Directory Server was used for managing user authentication.

Topology 1 (DB2)		ESX Server1 (36 GB memory)
Server	CLM Server	DB Server	RRDI Data Manager
Host name	CLMsvr1	DBsvr1	DMsvr1
CPU	4 vCPU	4 vCPU	2 vCPU
Memory	16 GB	12 GB	4 GB
Hard disk	120 GB	100 GB	80 GB
OS	Win2008 R2 64-bit	Win2008 R2 64-bit	Win2008 R2 64-bit
Configuration 1	CLM 4.0.3, WAS 8.5.1	DB2 v9.7 fp5	RRDI Dev Tools 2.0.3
Configuration 2	CLM 4.0.4, WAS 8.5.1	DB2 v9.7 fp5	RRDI Dev Tools 2.0.4

Data shape

	Record type	Initial load	Delta load
CCM	APT_ProjectCapacity	1	1
	APT_TeamCapacity	0	0
	Build	0	0
	Build Result	0	0
	Build Unit Test Result	0	0
	Build Unit Test Events	0	0
	Complex CustomAttribute	0	0
	Custom Attribute	0	0
	File Classification	3	3
	First Stream Classification	3	3
	History Custom Attribute	0	0
	SCM Component	2	0
	SCM WorkSpace	2	1
	WorkItem	100026	10000
	WorkItem Approval	100000	10000
	WorkItem Dimension Approval Description	100000	10000
	WorkItem Dimension	3	0
	WorkItem Dimension Approval Type	3	0
	WorkItem Dimension Category	2	0
	WorkItem Dimension Deliverable	0	0
	WorkItem Dimension Enumeration	34	0
	WorkItem Dimension Resolution	18	0
	Dimension	68	0
	WorkItem Dimension Type	8	0
	WorkItem Hierarchy	0	0
	WorkItem History	242926	20100
	WorkItem History Complex Custom Attribute	0	0
	WorkItem Link	112000	10000
	WorkItem Type Mapping	4	0
RM	CrossAppLink	0	0
	Custom Attribute	422710	51010
	Requirement	422960	51150
	Collection Requirement Lookup	1110	21000
	Module Requirement Lookup	22000	2000
	Implemented BY	100	0
	Request Affected	5988	0
	Request Tracking	0	0
	REQUICOL_TESTPLAN_LOOKUP	0	0
	REQUIREMENT_TESTCASE_LOOKUP	0	0
	REQUIREMENT_HIERARCHY	12626	0
	REQUIREMENT_EXTERNAL_LINK	0	0
	RequirementsHierarchyParent	6184	0
	Attribute Define	10	10
	Requirement Link Type	176	176
	Requirement Type	203	203

Results

Based on the test data and test environment, the Data Manager ETL performance improved significantly. The CCM ETL improved about 30% and the RM ETL improved about 40%.

CCM Data Manager ETL performance improvement

The major improvements occurred in three ETL builds: WorkItemPreviouseHistory, WorkItemStateHistory, and WorkItemHistory. One improvement involved changing the Update Detection Method from "select" to "update." If the majority of records will be updated or inserted, the "update" method is most efficient. Otherwise, if fewer records need to be updated, the "select" method might be faster.

As the following four figures show, the duration of the full load performance was reduced about 30% based on the test data shape. For specific ETL function, the duration of WorkItemPreviousHistory build was reduced about 80%. The WorkItemStateHistory build was reduced 90%. The WorkItemHistory build was reduced 15%.

[Note]: The format of units in the charts for the ETL duration is HH:MM:SS.

RM Data Manager ETL performance improvement

The major improvement ocurred on the way staging was introduced, which reduced the number of times an operation fetched data from the REST service. In CLM 4.0.4, the RM ETL added two temporary tables that load most of the data at once. In the previous ETL, each record type would be loaded several times. For example, the Custom Attribute, Custom Attribute Type, and Requirement records use the same REST service. Before 4.0.4, the ETL would get the data three separate times and would take about 11 hours to fetch the data from the REST service. In 4.0.4, loading to the temporary table means that the data is fetched only once from the REST service. Then, the ETL builds fetch the data from the relational temporary table in the data warehouse, so the performance is improved.

Related jazz.net defects: 74735, 43066.

As the following two figures show, the throughput of RM Data Manager ETL improved more than 40%. (Throughput_4.0.4 - Throughput_4.0.3)/Throughput_4.0.4

Appendix A

Product	Version	Highlights for configurations under test
IBM WebSphere Application Server	8.5.0.1	JVM settings: GC policy and arguments, max and init heap sizes: -verbose:gc -XX:+PrintGCDetails -Xverbosegclog:gc.log -Xgcpolicy:gencon -Xmx8g -Xms8g -Xmn1g -Xcompressedrefs -Xgc:preferredHeapBase=0x100000000 -XX:MaxDirectMemorySize=1g
DB2	DB2 9.7.5	Transaction log setting of data warehouse: Transaction log size changed to 40960 db2 update db cfg using LOGFILSIZ=40960
LDAP server	IBM Tivoli Directory Server 6.3
License server		Hosted locally by JTS server
Network		Shared subnet within test lab

For more information

Collaborative Lifecycle Management 2012 Sizing Report (Standard Topology E1)

About the authors

WangPengPeng

Questions and comments:

What other performance information would you like to see here?
Do you have performance scenarios to share?
Do you have scenarios that are not addressed in documentation?
Where are you having problems in performance?

Warning: Can't find topic Deployment.PerformanceDatasheetReaderComments

Deployment

Community information and contribution guidelines

Status icon key:

To do
Under construction
New
Updated
Constant change
None - stable page

Smaller versions of status icons for inline text:

Copyright © by IBM and non-IBM contributing authors. All material on this collaboration platform is the property of the contributing authors.
Contributions are governed by our Terms of Use. Please read the following disclaimer.
Dashboards and work items are no longer publicly available, so some links may be invalid. We now provide similar information through other means. Learn more here.