r11 - 2014-04-02 - 15:10:34 - Main.gcovell
<div id="header-title" style="padding: 10px 15px; border-width:1px; border-style:solid; border-color:#FFD28C; background-image: url(<nop>https://jazz.net/wiki/pub/Deployment/WebPreferences/TLASE.jpg); background-size: cover; font-size:120%">

<!-- * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup, Main.TWikiDeploymentDatasheetsAuthorsGroup, Main.GrantCovell -->

---+!! Case Study: Comparing Data Manager ETL performance between CLM 4.0.3 and 4.0.4</br>
%DKGRAY% Authors: Main.WangPengPeng</br> Last updated: September 9, 2013</br> Build basis: CLM 4.0.4 %ENDCOLOR%</div></sticky>

<!-- Page contents top of page on right hand side in box -->
<sticky><div style="float:right; border-width:1px; border-style:solid; border-color:#DFDFDF; background-color:#F6F6F6; margin:0 0 15px 15px; padding: 0 15px 0 15px;"> %TOC{title="Page contents"}% </div></sticky>

<sticky><div style="margin:15px;"></sticky>

---++ Introduction

The Data Manager ETL extracts, transforms, and loads CLM operational data into the data warehouse, from which the CLM Reporting features produce complex statistics and trend charts. You can use the Data Manager ETL to load historical operational data from one or many CLM servers.

The Data Manager ETL runs two kinds of loads: an initial load and delta loads. The initial load is the first run, in which all CLM data is loaded. Its duration depends on the volume of data: if the volume is large, the initial load can take a day or more to complete. Delta loads pick up incremental changes to the data. The duration of a delta load depends on the time interval between loads and the amount of change to the data during that interval. Delta loads typically run daily or weekly. If few changes occur during an interval, the delta load can complete in hours. However, if there are many incremental changes or many CLM servers, a delta load can take much longer.

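The difference between an initial load and a delta load can be sketched in a few lines of Python. This is an illustration only, not the Data Manager ETL implementation: the =select_for_load= helper, the =modified= field, and the use of a last-load timestamp to pick out changed records are all assumptions made for the example, because this page does not describe how the ETL detects changes.

```python
from datetime import datetime


def select_for_load(records, last_load_time=None):
    """Pick the records to load (hypothetical helper, for illustration only).

    With no previous load time this is an initial load and every record is
    selected. Otherwise it is a delta load and only records modified after
    the previous load are selected.
    """
    if last_load_time is None:
        return list(records)
    return [r for r in records if r["modified"] > last_load_time]


# Hypothetical sample data: three work items with modification timestamps.
work_items = [
    {"id": 1, "modified": datetime(2013, 9, 1)},
    {"id": 2, "modified": datetime(2013, 9, 5)},
    {"id": 3, "modified": datetime(2013, 9, 8)},
]

initial = select_for_load(work_items)
delta = select_for_load(work_items, last_load_time=datetime(2013, 9, 4))
print(len(initial), len(delta))
```

In this sketch the delta selection grows with the length of the interval since the previous load and with the number of changes made in it, which mirrors why delta load durations vary.
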
In the CLM 4.0.4 release, the development team made significant improvements to the Data Manager ETL for the Change and Configuration Management (CCM) and the Requirements Management (RM) applications. This case study compares the CCM and RM Data Manager ETL performance of CLM 4.0.3 and CLM 4.0.4, with identical data in the same test environment. Based on the test data, the Data Manager ETL performance improved significantly between releases.

---+++!! Disclaimer

%INCLUDE{"PerformanceDatasheetDisclaimer"}%

---++ Findings

Based on the test data, the performance of the Data Manager ETL improved significantly from CLM 4.0.3 to CLM 4.0.4:

   * The initial ETL load for CCM improved about 30% (100,000 work items with 2 history entries per work item).
   * The delta load for CCM improved more than 10% (10% increment of work items and history).
   * The initial ETL load for RM improved about 40% (400,000 requirements).
   * The delta ETL load for RM also improved about 40% (10% increment of requirements).

---++ Topology

The tests focused on a distributed server setup that is aligned with CLM [[https://jazz.net/library/article/820#Departmental_Single_APP_server_WIN_][Departmental topology D1]] (see Figure 1). Unlike the D1 topology, the IBM Tivoli Directory Server was used for user authentication in these tests.

Figure 1: Departmental - Single application server, Windows/DB2

<img src="%ATTACHURLPATH%/Departmental_Single_Application_Server_Windows_DB2_640.png" alt="D1 diagram" width="70%" height="70%" />

This case study used the same test environment and the same test data to test the ETL performance for CLM 4.0.3 and CLM 4.0.4. Test data was generated using automation. The test environment for the later release was upgraded from the earlier one by using the CLM upgrade process. The four VMs were created on one X3550 M3 7944J2A (at 2.67 GHz, 48 GB RAM, and 12 physical cores).

In the topology, four CLM applications (JTS, CCM, QM, and RM) were installed on VM1; the CLM repository was installed on VM2; the Data Manager ETL tool was installed on VM3; and the data warehouse was installed on VM4.

The same software configuration was used for both CLM 4.0.3 and CLM 4.0.4. The !WebSphere Application Server was version 8.5.1, 64-bit. The database server was IBM DB2 9.7.5, 64-bit. The Rational Reporting for Development Intelligence tool was version 2.0.4. The Jazz Team Server, CCM, QM, and RM applications co-existed in the same !WebSphere Application Server profile. The JVM settings were as follows:

<verbatim>
 -verbose:gc -XX:+PrintGCDetails -Xverbosegclog:gc.log -Xgcpolicy:gencon -Xmx8g -Xms8g -Xmn1g -Xcompressedrefs -Xgc:preferredHeapBase=0x100000000 -XX:MaxDirectMemorySize=1g
</verbatim>

IBM Tivoli Directory Server was used for managing user authentication.

<table class="gray-table">
<tr> <th><strong>Topology 1 (DB2)</strong></th> <th></th> <th><strong>ESX Server1 (36 GB memory)</strong></th> <th></th> </tr>
<tr> <td><strong>Server</strong></td> <td>CLM Server</td> <td>DB Server</td> <td>RRDI Data Manager</td> </tr>
<tr> <td><strong>Host name</strong></td> <td>CLMsvr1</td> <td>DBsvr1</td> <td>DMsvr1</td> </tr>
<tr> <td><strong>CPU</strong></td> <td>4 vCPU</td> <td>4 vCPU</td> <td>2 vCPU</td> </tr>
<tr> <td><strong>Memory</strong></td> <td>16 GB</td> <td>12 GB</td> <td>4 GB</td> </tr>
<tr> <td><strong>Hard disk</strong></td> <td>120 GB</td> <td>100 GB</td> <td>80 GB</td> </tr>
<tr> <td><strong>OS</strong></td> <td>Win2008 R2 64-bit</td> <td>Win2008 R2 64-bit</td> <td>Win2008 R2 64-bit</td> </tr>
<tr> <td><strong>Configuration 1</strong></td> <td>CLM 4.0.3, WAS 8.5.1</td> <td>DB2 v9.7 fp5</td> <td>RRDI Dev Tools 2.0.3</td> </tr>
<tr> <td><strong>Configuration 2</strong></td> <td>CLM 4.0.4, WAS 8.5.1</td> <td>DB2 v9.7 fp5</td> <td>RRDI Dev Tools 2.0.4</td> </tr>
</table>

---+++ Data shape

<table class="gray-table"> <tr> <th></th> <th><strong>Record 
type</strong></th> <th><strong>Initial load</strong></th> <th><strong>Delta load</strong></th> </tr> <tr> <td><strong>CCM</strong></td> <td>APT_ProjectCapacity</td> <td>1</td> <td>1</td> </tr> <tr> <td> </td> <td>APT_TeamCapacity</td> <td>0</td> <td>0</td> </tr> <tr> <td> </td> <td>Build</td> <td>0</td> <td>0</td> </tr> <tr> <td> </td> <td>Build Result</td> <td>0</td> <td>0</td> </tr> <tr> <td> </td> <td>Build Unit Test Result</td> <td>0</td> <td>0</td> </tr> <tr> <td> </td> <td>Build Unit Test Events</td> <td>0</td> <td>0</td> </tr> <tr> <td> </td> <td>Complex !CustomAttribute</td> <td>0</td> <td>0</td> </tr> <tr> <td> </td> <td>Custom Attribute</td> <td>0</td> <td>0</td> </tr> <tr> <td> </td> <td>File Classification</td> <td>3</td> <td>3</td> </tr> <tr> <td> </td> <td>First Stream Classification</td> <td>3</td> <td>3</td> </tr> <tr> <td> </td> <td>History Custom Attribute</td> <td>0</td> <td>0</td> </tr> <tr> <td> </td> <td>SCM Component</td> <td>2</td> <td>0</td> </tr> <tr> <td> </td> <td>SCM !WorkSpace</td> <td>2</td> <td>1</td> </tr> <tr> <td> </td> <td> !WorkItem</td> <td>100026</td> <td>10000</td> </tr> <tr> <td> </td> <td> !WorkItem Approval</td> <td>100000</td> <td>10000</td> </tr> <tr> <td> </td> <td> !WorkItem Dimension Approval Description</td> <td>100000</td> <td>10000</td> </tr> <tr> <td> </td> <td> !WorkItem Dimension</td> <td>3</td> <td>0</td> </tr> <tr> <td> </td> <td> !WorkItem Dimension Approval Type</td> <td>3</td> <td>0</td> </tr> <tr> <td> </td> <td> !WorkItem Dimension Category</td> <td>2</td> <td>0</td> </tr> <tr> <td> </td> <td> !WorkItem Dimension Deliverable</td> <td>0</td> <td>0</td> </tr> <tr> <td> </td> <td> !WorkItem Dimension Enumeration</td> <td>34</td> <td>0</td> </tr> <tr> <td> </td> <td> !WorkItem Dimension Resolution</td> <td>18</td> <td>0</td> </tr> <tr> <td> </td> <td>Dimension</td> <td>68</td> <td>0</td> </tr> <tr> <td> </td> <td> !WorkItem Dimension Type</td> <td>8</td> <td>0</td> </tr> <tr> <td> </td> <td> !WorkItem 
Hierarchy</td> <td>0</td> <td>0</td> </tr>
<tr> <td></td> <td> !WorkItem History</td> <td>242926</td> <td>20100</td> </tr>
<tr> <td> </td> <td> !WorkItem History Complex Custom Attribute</td> <td>0</td> <td>0</td> </tr>
<tr> <td> </td> <td> !WorkItem Link</td> <td>112000</td> <td>10000</td> </tr>
<tr> <td> </td> <td> !WorkItem Type Mapping</td> <td>4</td> <td>0</td> </tr>
<tr> <td><strong>RM</strong></td> <td> !CrossAppLink</td> <td>0</td> <td>0</td> </tr>
<tr> <td> </td> <td>Custom Attribute</td> <td>422710</td> <td>51010</td> </tr>
<tr> <td> </td> <td>Requirement</td> <td>422960</td> <td>51150</td> </tr>
<tr> <td> </td> <td>Collection Requirement Lookup</td> <td>1110</td> <td>21000</td> </tr>
<tr> <td> </td> <td>Module Requirement Lookup</td> <td>22000</td> <td>2000</td> </tr>
<tr> <td> </td> <td>Implemented BY</td> <td>100</td> <td>0</td> </tr>
<tr> <td> </td> <td>Request Affected</td> <td>5988</td> <td>0</td> </tr>
<tr> <td> </td> <td>Request Tracking</td> <td>0</td> <td>0</td> </tr>
<tr> <td> </td> <td>REQUICOL_TESTPLAN_LOOKUP</td> <td>0</td> <td>0</td> </tr>
<tr> <td> </td> <td>REQUIREMENT_TESTCASE_LOOKUP</td> <td>0</td> <td>0</td> </tr>
<tr> <td> </td> <td>REQUIREMENT_HIERARCHY</td> <td>12626</td> <td>0</td> </tr>
<tr> <td> </td> <td>REQUIREMENT_EXTERNAL_LINK</td> <td>0</td> <td>0</td> </tr>
<tr> <td> </td> <td> !RequirementsHierarchyParent</td> <td>6184</td> <td>0</td> </tr>
<tr> <td> </td> <td>Attribute Define</td> <td>10</td> <td>10</td> </tr>
<tr> <td> </td> <td>Requirement Link Type</td> <td>176</td> <td>176</td> </tr>
<tr> <td> </td> <td>Requirement Type</td> <td>203</td> <td>203</td> </tr>
</table>

---++ Results

Based on the test data and test environment, the Data Manager ETL performance improved significantly. The CCM ETL improved about 30% and the RM ETL improved about 40%.

---+++ CCM Data Manager ETL performance improvement

The major improvements occurred in three ETL builds: !WorkItemPreviousHistory, !WorkItemStateHistory, and !WorkItemHistory.
One improvement involved changing the Update Detection Method from "select" to "update." If the majority of records will be updated or inserted, the "update" method is the most efficient. Otherwise, if fewer records need to be updated, the "select" method might be faster.

As the following four figures show, the duration of the full load was reduced by about 30% for the test data shape. For specific ETL builds, the duration of the !WorkItemPreviousHistory build was reduced by about 80%, the !WorkItemStateHistory build by 90%, and the !WorkItemHistory build by 15%.

<img src="%ATTACHURLPATH%/CCM_FULL_DM_ETL_DURATION.png" width="70%" height="70%" />
<img src="%ATTACHURLPATH%/CCM_DELTA_DM_ETL_DURATION.png" width="70%" height="70%" />
<img src="%ATTACHURLPATH%/CCM_FULL_DM_ETL_ThROUGHPUT.png" width="70%" height="70%" />
<img src="%ATTACHURLPATH%/CCM_DELTA_DM_ETL_THROUGHPUT.png" width="70%" height="70%" />

[Note]: The format of units in the charts for the ETL duration is HH:MM:SS.

---+++ RM Data Manager ETL performance improvement

The major improvement came from introducing staging, which reduced the number of times an operation fetched data from the REST service. In CLM 4.0.4, the RM ETL added two temporary tables into which most of the data is loaded at once. In the previous ETL, data shared by several record types was fetched several times. For example, the Custom Attribute, Custom Attribute Type, and Requirement records use the same REST service. Before 4.0.4, the ETL would get the data three separate times and would take about 11 hours to fetch the data from the REST service. In 4.0.4, loading to the temporary table means that the data is fetched only once from the REST service. The ETL builds then read the data from the relational temporary table in the data warehouse, which is why performance improved.

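The staging idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the RM ETL code: =FakeRestService=, =load_without_staging=, =load_with_staging=, and the =staging= table layout are hypothetical names, an in-memory SQLite database stands in for the DB2 data warehouse, and a single temporary table is used for brevity where the 4.0.4 ETL uses two.

```python
import sqlite3

# Record types that, per the text, are served by the same REST service.
RECORD_TYPES = ("custom_attribute", "custom_attribute_type", "requirement")


class FakeRestService:
    """Hypothetical stand-in for the RM REST service; counts fetches."""

    def __init__(self, rows):
        self._rows = rows
        self.fetch_count = 0

    def fetch_all(self):
        self.fetch_count += 1
        return [dict(row) for row in self._rows]


def load_without_staging(service):
    """One full REST fetch per record type (the pattern described for 4.0.3)."""
    loaded = {}
    for record_type in RECORD_TYPES:
        rows = service.fetch_all()
        loaded[record_type] = [row[record_type] for row in rows]
    return loaded


def load_with_staging(service, conn):
    """One REST fetch into a temporary table; each build then reads the table."""
    conn.execute(
        "CREATE TEMP TABLE staging "
        "(custom_attribute TEXT, custom_attribute_type TEXT, requirement TEXT)"
    )
    conn.executemany(
        "INSERT INTO staging VALUES "
        "(:custom_attribute, :custom_attribute_type, :requirement)",
        service.fetch_all(),
    )
    loaded = {}
    for record_type in RECORD_TYPES:
        # Column names come from the fixed RECORD_TYPES tuple, not user input.
        cursor = conn.execute(f"SELECT {record_type} FROM staging ORDER BY rowid")
        loaded[record_type] = [row[0] for row in cursor]
    return loaded


# Hypothetical sample data.
rows = [
    {"requirement": "REQ-1", "custom_attribute": "priority", "custom_attribute_type": "string"},
    {"requirement": "REQ-2", "custom_attribute": "risk", "custom_attribute_type": "integer"},
]

old_service = FakeRestService(rows)
old_result = load_without_staging(old_service)

new_service = FakeRestService(rows)
new_result = load_with_staging(new_service, sqlite3.connect(":memory:"))

print(old_service.fetch_count, new_service.fetch_count)
```

Both functions produce the same loaded data; the difference is that the staged version calls the slow remote service once and serves every later read from a local relational table.
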
Related jazz.net defects: [[https://jazz.net/jazz03/resource/itemName/com.ibm.team.workitem.WorkItem/74735][74735]], [[https://jazz.net/jazz03/resource/itemName/com.ibm.team.workitem.WorkItem/43066][43066]].

As the following two figures show, the throughput of the RM Data Manager ETL improved by more than 40%, calculated as (Throughput_4.0.4 - Throughput_4.0.3)/Throughput_4.0.4.

<img src="%ATTACHURLPATH%/RM_FULL_DM_ETL_DURATION.png" width="70%" height="70%" />
<img src="%ATTACHURLPATH%/RM_FULL_DM_ETL_ThROUGHPUT.png" width="70%" height="70%" />

---++ Appendix A

#AppendixA

<table class="gray-table"> <tbody>
<tr> <th align="left" width="200"><strong>Product</strong><br></th> <th align="left" width="100"><strong>Version</strong></th> <th align="left" width="600"><strong>Highlights for configurations under test</strong></th> </tr>
<tr> <td style="vertical-align: top;">IBM !WebSphere Application Server</td> <td style="vertical-align: top;">8.5.0.1</td> <td style="vertical-align: top;"><strong>JVM settings:</strong>

   * GC policy and arguments, max and init heap sizes:
<verbatim>
 -verbose:gc -XX:+PrintGCDetails -Xverbosegclog:gc.log -Xgcpolicy:gencon -Xmx8g -Xms8g -Xmn1g -Xcompressedrefs -Xgc:preferredHeapBase=0x100000000 -XX:MaxDirectMemorySize=1g
</verbatim>
</td> </tr>
<tr> <td>DB2</td> <td>DB2 9.7.5</td> <td style="vertical-align: top;"><strong>Transaction log setting of data warehouse:</strong>

   * Transaction log size changed to 40960
<verbatim>
 db2 update db cfg using LOGFILSIZ 40960
</verbatim>
</td> </tr>
<tr> <td>LDAP server</td> <td>IBM Tivoli Directory Server 6.3</td> <td> </td> </tr>
<tr> <td>License server</td> <td> </td> <td>Hosted locally by JTS server</td> </tr>
<tr> <td>Network</td> <td> </td> <td>Shared subnet within test lab</td> </tr>
</tbody> </table>

---++++!! For more information

   * [[SizingReportCLM2012][Collaborative Lifecycle Management 2012 Sizing Report (Standard Topology E1)]]

---++++!! About the authors

Main.WangPengPeng

--------------------

---+++++!! Questions and comments:

   * What other performance information would you like to see here?
   * Do you have performance scenarios to share?
   * Do you have scenarios that are not addressed in documentation?
   * Where are you having problems in performance?

%COMMENT{type="below" target="PerformanceDatasheetReaderComments" button="Submit"}%

%INCLUDE{"PerformanceDatasheetReaderComments"}%

<sticky></div></sticky>
Copyright © by IBM and non-IBM contributing authors. All material on this collaboration platform is the property of the contributing authors.