This situation is to help determine both cause and resolution where ETL (Extract, Transform, Load) processes take significantly longer than expected to run.
Introduction to ETLs
In order to understand how running ETLs can cause performance issues across the server, you should first understand what the ETL is doing.
First, the ETL (Extract, Transform, Load) jobs are initiated as a scheduled task, run by the following service as com.ibm.team.datawarehouse.service.internal.SnapshotRunnerTask
The ETLs are run in a certain sequence, detailed in the article Running the ETLs
After the task is initiated, the ETL will extract the data from the application database, transform it, and load it into the operational data store (ODS) tables within the data warehouse database. While the ETL is active, network latency, database configuration, and JDBC connections can contribute to long running ETLs or performance issues. If the application servers and database servers are located remotely, the loading process is affected.
There are two types of ETLs which run on the server - Full Data Collection Job and an Incremental, or Delta, data collection Job. On the Data Collection Jobs page, each are denoted by two different icons under Action. Hover over the icon for a description. The Full Data Collection Jobs occur during the initial run of the data collection jobs. These jobs take significantly longer because it is gathering all data. Subsequent job runs are done using an incremental or 'delta' run where only changed data from the last job is collected. These jobs are much faster.
Once all the application ETLs have run, the JTS Star ETL is run, which will again extract, transform and load the data, this time from the ODS and stored in Jazz Team Server. Again, network latency, database configuration, and JDBC connections can contribute to performance degradation at this step.
Both sets of ETLs require system resources to run the tasks and extract and load the data on both the server hosting the Collaborative Lifecycle Management (CLM) application and the database server.
The Data Collection Component (DCC)
is available as of CLM 5.0 and higher. Deploying DCC on another system will alleviate many of the known performance constraints described below when you are using the traditional Java ETL's. If you are running the out of the box CLM ETL's, it is strongly recommended you use DCC in place of the Java ETL's. If you are using Rational Insight to execute the out of the box CLM ETL's, it is also recommended to use the Insight/DCC integration scripts to invoke the CLM DCC ETL's from your Insight Data Manager ETL
. A video is also available on this page
Initial assessment and analysis
For information on how to assess and analyze information related to long running ETLs, review the Initial Assessment and Analysis
Migrating from the data mart to the data warehouse
Despite improvements to ETL and server performance, the migration between the data mart and data warehouse is still expected to be long running. If you previously used the data mart, the first ETL run into the data warehouse will automatically migrate the data over from the data mart, which is typically a longer run time than if you were to run a full data collection job. This is due to many factors including the time it takes to gather the records, and move them between databases (where network or database latency may come into play).
Older CLM versions
CLM versions prior to 4.0.1 might have run full collection for each build. Improvements have been made to ensure that delta builds are being run (ensuring that only changes between the last build and the current are captured) as well as overall ETL performance. Ensure you are running on the latest version of CLM to take advantage of these enhancements.
After an ETL is run, the database statistics are updated for the tables. In some environments, the database statistics would cause extremely slow running ETLs and is indicated in the ETL log where total time for each build is not the sum of each of the times above (See How do I read the ETL Log Files
for more information). If your database administrator (DBA) is running UPDATE STATS by default, you can turn it off from the Jazz Team Server.
We do not recommend turning this off by default because over time the database statistics it will increase performance due to the indexing that is completed.
To turn this feature off, navigate to your Jazz Team Server Data Warehouse Connections page:
and change "Automatically update the database statistics" to false.
The RICALM.REQUEST_BASELINE table will contain information necessary for reporting. There may be large amounts of historical data, causing the table to contain more information which may no longer be needed and be larger in size. It may be advantageous to remove some of the historical data if it is no longer needed. DO NOT PROCEED with removing the historical data without being under the advisement of support.
Oracle Data Guard
When running Oracle Data Guard, performance may be slow due to the functions of Data Guard. It may be advantageous to investigate turning Oracle Data Guard off. DO NOT PROCEED with turning Oracle Data Guard off without first consulting your Database Administrator.
There are certain indices on the RICALM.REQUEST_BASELINE table which may be contributing to performance degradation during updates. It may be possible to remove these indices. DO NOT PROCEED with removing the indices without being under the advisement of support.
Possible causes and solutions
After reviewing the above information on assessment and analysis, you might find an error that correlates to the long running ETL. Navigate to one of the following pages:
Additional contributors: None
Questions and comments: