Resolving Extract, Transform, Load issues on the initial data load from a large Rational Team Concert environment
Minori Bajpai, IBM Software Group, Rational
Last updated: August 29, 2014
Build basis: Rational Team Concert 4.0.2, Insight 1.1.1.2
This page describes a usage or software configuration that may not be supported by IBM.
This paper highlights the problems Rational Engineering Services encountered while populating a large Rational Team Concert (RTC) data warehouse with an initial data load. It provides the steps taken in looking for solutions on how to handle a huge data load when performing Extract, Transform, Load (ETL) via the Insight data manager.
The problems were specifically encountered in two “Out-of-the-Box” RTC jobs:
1) RTC_WorkitemPreviousHistory
2) RTC_HistoryCustomAttr3
Environment configuration:
- Rational solution for Collaborative Lifecycle Managmente 4.0.2
- Rational Insight 1.1.1.2 started with 32 bit application
- Data manager catalog 4.0.4
- DB2 9.5
- Windows 2008 R2 Enterprise Service Pack 1, 64 bit 8 GB RAM
- Cognos 10.1.1
Issues encountered during execution:
Approximately five days after starting the full data load for RTC, the ETL started failing. The table below provides a list of errors we encountered and the recommended fixes from development:ETL Issue | Error Message | Suggested Fix | Result | Development team Involved |
---|---|---|---|---|
Failed on RTC_WorkitemPreviousHistory job. | AuthenticationException: java.lang.OutOfMemoryError: | ODBC driver fix was applied. | Same job continued to fail. | Insight |
Failed on RTC_WorkitemPreviousHistory job. | CRRRE9000E: Internal Java error. | Jazz Authentication in XDC file. | Same job continued to fail. | Insight |
Failed on RTC_WorkitemPreviousHistory job. | CRRRE9000E: Internal Java error. | Change the page size from 10 to 2 in RTC XDC file for WI History. | Same job continued to fail. | RTC |
Failed on RTC_WorkitemPreviousHistory job. | com.ibm.rational.drivers .jdbc.xml.internal. PageFetcherThread:java.lang. NullPointerException | Enhanced JVM startup arguments. | Same job continued to fail. | Insight. |
Failed on RTC_WorkitemPreviousHistory job. | Database not fetching data. | Development had us increase table space for 3 tables in DW | Same job continued to fail. | SWAT |
Failed on RTC_WorkitemPreviousHistory job. | CCL_ASSERT(FALSE); | Enabled ODBC.log and UDA Trace. | Same job continued to fail. | Cognos |
Failed on RTC_WorkitemPreviousHistory job. | CCL_ASSERT(FALSE); | Increase Insight and RTC servers RAM | Same job continued to fail. | SWAT |
Failed on RTC_WorkitemPreviousHistory job. | CCL_ASSERT(FALSE); | Changed the isolation level of failing job to Read Uncommitted. | Same job continued to fail. | RTC |
Failed on RTC_WorkitemPreviousHistory job. | CCL_ASSERT(FALSE); | Changed the RTC job logic by removing “Name” from the lookup. | Same job continued to fail. | RTC |
RTC_HistoryCustomAttr3 | CCL_ASSERT(FALSE); | Apply windows patch: http://support.microsoft.com/kb/977332 | Same job continued to fail. | RTC |
Failed on RTC_WorkitemPreviousHistory job. | DM-DLV-0101 Delivery ‘Fact’: failed on statement UPDATE | RTC XDC files page size was changed to 1000. | Job failed after a longer time period. | RTC |
Failed on RTC_WorkitemPreviousHistory job. | <Exception Name=”CCLAssertError” Error=”0″ Severity=”Fatal”><Messages>< MessageText>CCL_ASSERT(FALSE);< /MessageText></Messages> <TraceInfo><Trace Text=”rtg.cpp(200): CCLAssertError: CCL_THROW: “/>< /TraceInfo></Exception> | Development provided a file to get better logs. | Same job continued to fail. | Cognos |
Failed on RTC_WorkitemPreviousHistory job. | DM-DLV-0101 Delivery ‘Fact’: failed on statement UPDATE | Upgraded to Insight 1112-64 bit | Continued to fail | Insight |
Passed RTC_WorkitemPreviousHistory job. | DM-DLV-0101 Delivery ‘Fact5’: failed on statement INSERT | Applied dimension break on both failing jobs. | RTC_History CustomAttr3 failed. | Cognos |
RTC_HistoryCustomAttr3 failed | Split the job in small chunks. | RTC | ||
RTC_WorkitemPreviousHistory job”Delta ETL | UDA-SQL-0530 Could not start a gateway thread to execute the request asynchronously | Use file caching for attribute” box under the properties in the Lookup_ReqHisUID_NoSource1 | Passed | Cognos |
Explanation of each of the above fixes:
- The initial suggestion by development was to increase memory allocation to the XML JDBC driver. Insight development applied a fix directly to our Insight server, the fix was applied to the ODBC driver, they replaced original file with below file: com.ibm.rational.dataservices.client_2.0.500.v20131213_1030.jar. Tracked in WI:
https://jazz.net/jazz/resource/itemName/com.ibm.team.workitem.WorkItem/294230 - Development suspected that some connection resources were not being closed during ‘Form authentication’ according to the java exception stack and hence recommended we use the ‘Jazz authentication’ method in XDC file instead of Form authentication.
- RTC development suggested changing the page size of history in the XDC file from 10 to 2.The idea was to process less data to address the lack of memory issue.
- On reviewing the Java cores, Insight development thought the root cause is that ODBC driver use default heap size to start JVM rather than assigning a special initial and maximum heap size. Normally, the JVM can work properly in most scenarios, but might encounter a memory issue in a critical case. Unfortunately, ODBC driver does not support customize JVM startup arguments, so they enhanced a version of ratlxml.dll and provided it to us. We applied it as follows:
- Stop ETL jobs and close Data Manager
- Open the window registry and find the ratlxml.dll location, Backup and replace with the new one.
- Open the window registry and set the new ‘JVMArguments’, the value: -Xms64m -Xmx256m -XX:+PrintGCDetails -Xmctl
- -Xms64m means the initial heap size is 64MB
- -Xmx256m means the maximum heap size is 256MB
- -XX:+PrintGCDetails and -Xmctl are used to print more information if OOM happens.
- Restart ETL jobs and you can find the following logs in ODBC driver log:
- 23:11:56: 0|5528 jdbc[192]: JVMArguments:-Xms64m -Xmx256m -XX:+PrintGCDetails -Xmctl
- 23:11:56: 0|5528 jdbc[290]: JVM argument[2]:-Xms64m
- 23:11:56: 0|5528 jdbc[290]: JVM argument[3]:-Xmx256m
- 23:11:56: 0|5528 jdbc[290]: JVM argument[4]:-XX:+PrintGCDetails
- 23:11:56: 0|5528 jdbc[290]: JVM argument[5]:-Xmctl
- Since data Manager was not fetching the data, development suggested checking the table space pool of the data warehouse to ensure it was not full and no locks were generated in the database side. Our DBA confirmed there were no issues reported in the DB logs. Insight development saw three tables had less than 60% free table space. We increased them and reran the ETL, but it failed with same errors.
- Cognos development started looking into this issue as we were now getting a Cognos error. They had us enable ODBC.log, UDA Trace but none returned any useful info.
- We monitored system resources; network connectivity etc in Insight server, DB server and RTC server all appeared to be good. We increased ETL servers RAM from 8 GB to 12 GB and RTC servers RAM from 4 GB to 8 GB.
- On RTC developments recommendation we changed the ‘Isolation Level’ on data source in RTC_workitemPreviousHistory job to Read Uncommitted.
- RTC development changed the failing job”s search logic by removing Name ( which is quite a big field and hence occupied large amount of memory) from the data lookup task. This change is also planned to be implemented in CLM 5.0.
- RTC development suggested applying a windows patch from Microsoft which our Insight server rejected.
https://jazz.net/jazz/web/projects/Jazz%20Foundation#action=com.ibm.team.workitem.viewWorkItem&id=152474 - RTC development suggested changing the XDC files page size to 1000 instead of 2, ETL execution lasted longer than previous runs. This change was done to address possible small page size of Work item history resources causing windows to run out of client socket connections.
- Cognos development provided some files to be applied in Insight server to get more detailed errors in the logs. After reviewing the logs, Cognos – UDA development confirmed the issue all along was out of memory and that was not getting reported properly.
- Insight development suggested upgrading Insight 1.1.1.2 to 64 bit and initiate ETL by Data Movement tasks to take advantage of 64 bit processes which does not have an upper limit of 2 GB memory allocation before failing the job. We upgrade the environment and restarted the ETL, the data movement task continued to use 32 bit processes and ETL continued to fail. Cognos development explained that Cognos 10.1.1 (which is included with Insight 1.1.1.2) is 64 bit compatible but it does not have 64-bit job execution capability. This is available with Cognos 10.2.1(which is included with Insight 1.1.1.3) but still the ODBC driver will continue to be 32 bit so further upgrading was of no use.
- Cognos dev showed us Dimension breaks for both the failing jobs.
http://pic.dhe.ibm.com/infocenter/cbi/v10r1m0/topic/com.ibm.swg.im.cognos.ug_ds.10.1.0.doc/ug_ds_id23424DimensionBreaks.html
With this last change the first failing job RTC_WorkItemPreviousHistory passed. - The second job RTC_HistoryCustomAttr continued to fail. To resolve this Cognos development suggested breaking the job in smaller chunks. RTC dev helped us in breaking the job in small chuncks, 3 months at a time. This helped us complete the Full ETL.
- The delta ETL job “RTC_HistoryCustomAttr” started failing because of memory allocation limit. This issue was resolved by checking “Use file caching for attribute” box under the properties in the Lookup_ReqHisUID_NoSource associated with this failing job.Instead of writing the lookup information in memory it wrote in temporary files which are deleted after the job has completed. This was recommended by Cognos development.
Below is the issue we faced during delta ETL:
Copyright © 2014 IBM Corporation
Dashboards and work items are no longer publicly available, so some links may be invalid. We now provide similar information through other means.
Learn more here.