Resolving Extract, Transform, Load issues on the initial data load from a large Rational Team Concert environment

This page describes a usage or software configuration that may not be supported by IBM.

This paper highlights the problems Rational Engineering Services encountered while populating a large Rational Team Concert (RTC) data warehouse during an initial data load. It describes the steps taken to find solutions for handling a very large data load when performing Extract, Transform, Load (ETL) via the Insight Data Manager.

The problems were specifically encountered in two out-of-the-box RTC jobs:

1) RTC_WorkitemPreviousHistory
2) RTC_HistoryCustomAttr3

Environment configuration:

  • Rational solution for Collaborative Lifecycle Management 4.0.2
  • Rational Insight 1.1.1.2, started as a 32-bit application
  • Data Manager catalog 4.0.4
  • DB2 9.5
  • Windows 2008 R2 Enterprise Service Pack 1, 64-bit, 8 GB RAM
  • Cognos 10.1.1

We had created a standalone replica of our production environment using a host-file redirect, so that production outages would not impact our ETL runs and we could reset admin passwords on our own schedule. We processed five years' worth of RTC data: 996,887 work items on the RTC01 server. To give a sense of the magnitude, this repository is around 10 times the size of jazz.net. We had previously successfully processed 439,756 work items for the RTC02 repository and 573,653 for the RTC03 repository.

Issues encountered during execution:

Approximately five days after starting the full data load for RTC, the ETL started failing. The table below provides a list of errors we encountered and the recommended fixes from development:


| ETL Issue | Error Message | Suggested Fix | Result | Development Team Involved |
| --- | --- | --- | --- | --- |
| Failed on RTC_WorkitemPreviousHistory job. | AuthenticationException: java.lang.OutOfMemoryError | ODBC driver fix was applied. | Same job continued to fail. | Insight |
| Failed on RTC_WorkitemPreviousHistory job. | CRRRE9000E: Internal Java error. | Switched to Jazz authentication in the XDC file. | Same job continued to fail. | Insight |
| Failed on RTC_WorkitemPreviousHistory job. | CRRRE9000E: Internal Java error. | Changed the page size from 10 to 2 in the RTC XDC file for work item history. | Same job continued to fail. | RTC |
| Failed on RTC_WorkitemPreviousHistory job. | com.ibm.rational.drivers.jdbc.xml.internal.PageFetcherThread: java.lang.NullPointerException | Enhanced JVM startup arguments. | Same job continued to fail. | Insight |
| Failed on RTC_WorkitemPreviousHistory job. | Database not fetching data. | Increased table space for three tables in the data warehouse. | Same job continued to fail. | SWAT |
| Failed on RTC_WorkitemPreviousHistory job. | CCL_ASSERT(FALSE); | Enabled ODBC.log and UDA trace. | Same job continued to fail. | Cognos |
| Failed on RTC_WorkitemPreviousHistory job. | CCL_ASSERT(FALSE); | Increased RAM on the Insight and RTC servers. | Same job continued to fail. | SWAT |
| Failed on RTC_WorkitemPreviousHistory job. | CCL_ASSERT(FALSE); | Changed the isolation level of the failing job to Read Uncommitted. | Same job continued to fail. | RTC |
| Failed on RTC_WorkitemPreviousHistory job. | CCL_ASSERT(FALSE); | Changed the RTC job logic by removing "Name" from the lookup. | Same job continued to fail. | RTC |
| Failed on RTC_HistoryCustomAttr3 job. | CCL_ASSERT(FALSE); | Applied Windows patch: http://support.microsoft.com/kb/977332 | Same job continued to fail. | RTC |
| Failed on RTC_WorkitemPreviousHistory job. | DM-DLV-0101 Delivery 'Fact': failed on statement UPDATE | Changed the page size in the RTC XDC files to 1000. | Job failed after a longer time period. | RTC |
| Failed on RTC_WorkitemPreviousHistory job. | <Exception Name="CCLAssertError" Error="0" Severity="Fatal"><Messages><MessageText>CCL_ASSERT(FALSE);</MessageText></Messages><TraceInfo><Trace Text="rtg.cpp(200): CCLAssertError: CCL_THROW: "/></TraceInfo></Exception> | Development provided a file to get better logs. | Same job continued to fail. | Cognos |
| Failed on RTC_WorkitemPreviousHistory job. | DM-DLV-0101 Delivery 'Fact': failed on statement UPDATE | Upgraded to Insight 1.1.1.2 64-bit. | Same job continued to fail. | Insight |
| Passed RTC_WorkitemPreviousHistory job. | DM-DLV-0101 Delivery 'Fact5': failed on statement INSERT | Applied a dimension break on both failing jobs. | RTC_HistoryCustomAttr3 failed. | Cognos |
| Failed on RTC_HistoryCustomAttr3 job. | DM-DLV-0101 Delivery 'Fact5': failed on statement INSERT | Split the job into small chunks. | Passed. | RTC |
| Delta ETL: RTC_WorkitemPreviousHistory job. | UDA-SQL-0530 Could not start a gateway thread to execute the request asynchronously | Checked the "Use file caching for attribute" box under the properties of Lookup_ReqHisUID_NoSource1. | Passed. | Cognos |

Explanation of each of the above fixes:

  1. The initial suggestion from development was to increase the memory allocation for the XML JDBC driver. Insight development applied a fix directly to the ODBC driver on our Insight server, replacing the original file with com.ibm.rational.dataservices.client_2.0.500.v20131213_1030.jar. Tracked in work item:
    https://jazz.net/jazz/resource/itemName/com.ibm.team.workitem.WorkItem/294230
  2. Based on the Java exception stack, development suspected that some connection resources were not being closed during 'Form authentication', and recommended we use the 'Jazz authentication' method in the XDC file instead.
  3. RTC development suggested changing the page size of history in the XDC file from 10 to 2. The idea was to process less data at a time to address the lack-of-memory issue.
  4. On reviewing the Java cores, Insight development concluded the root cause was that the ODBC driver starts the JVM with the default heap size rather than assigning explicit initial and maximum heap sizes. Normally the JVM works properly in most scenarios, but it can run into memory problems in critical cases. Unfortunately, the ODBC driver does not support customizing JVM startup arguments, so they provided an enhanced version of ratlxml.dll. We applied it as follows:
    • Stop the ETL jobs and close Data Manager.
    • Open the Windows registry, find the ratlxml.dll location, back up the original file, and replace it with the new one.
    • In the Windows registry, set the new 'JVMArguments' value: -Xms64m -Xmx256m -XX:+PrintGCDetails -Xmctl
      • -Xms64m means the initial heap size is 64MB
      • -Xmx256m means the maximum heap size is 256MB
      • -XX:+PrintGCDetails and -Xmctl print additional information if an OutOfMemoryError occurs.
    • Restart the ETL jobs; you should then see entries like the following in the ODBC driver log:
      • 23:11:56:  0|5528 jdbc[192]: JVMArguments:-Xms64m -Xmx256m -XX:+PrintGCDetails -Xmctl
      • 23:11:56:  0|5528 jdbc[290]: JVM argument[2]:-Xms64m
      • 23:11:56:  0|5528 jdbc[290]: JVM argument[3]:-Xmx256m
      • 23:11:56:  0|5528 jdbc[290]: JVM argument[4]:-XX:+PrintGCDetails
      • 23:11:56:  0|5528 jdbc[290]: JVM argument[5]:-Xmctl
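A quick way to confirm the new heap settings actually took effect is to scan the ODBC driver log for the reported JVM arguments. The following sketch is illustrative only; the log-line format is taken from the sample entries above, so adjust the pattern if your driver's output differs.

```python
import re

# Matches lines such as:
#   23:11:56:  0|5528 jdbc[290]: JVM argument[2]:-Xms64m
ARG_RE = re.compile(r"JVM argument\[\d+\]:(\S+)")

def jvm_args_from_log(lines):
    """Return the JVM arguments the ODBC driver reports it started with."""
    return [m.group(1) for line in lines if (m := ARG_RE.search(line))]

sample = [
    "23:11:56:  0|5528 jdbc[192]: JVMArguments:-Xms64m -Xmx256m -XX:+PrintGCDetails -Xmctl",
    "23:11:56:  0|5528 jdbc[290]: JVM argument[2]:-Xms64m",
    "23:11:56:  0|5528 jdbc[290]: JVM argument[3]:-Xmx256m",
    "23:11:56:  0|5528 jdbc[290]: JVM argument[4]:-XX:+PrintGCDetails",
    "23:11:56:  0|5528 jdbc[290]: JVM argument[5]:-Xmctl",
]

print(jvm_args_from_log(sample))
# ['-Xms64m', '-Xmx256m', '-XX:+PrintGCDetails', '-Xmctl']
```

In practice you would read the lines from the driver's log file rather than a hard-coded list.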
  5. Since Data Manager was not fetching data, development suggested checking the table space pool of the data warehouse to ensure it was not full and that no locks were being generated on the database side. Our DBA confirmed no issues were reported in the DB logs. Insight development found that three tables had less than 60% free table space. We increased them and reran the ETL, but it failed with the same errors.
  6. Cognos development started looking into this issue because we were now getting a Cognos error. They had us enable ODBC.log and the UDA trace, but neither returned any useful information.
  7. We monitored system resources and network connectivity; the Insight server, DB server, and RTC server all appeared good. We increased the ETL server's RAM from 8 GB to 12 GB and the RTC server's RAM from 4 GB to 8 GB.
  8. On RTC development's recommendation, we changed the 'Isolation Level' of the data source in the RTC_WorkitemPreviousHistory job to Read Uncommitted.
  9. RTC development changed the failing job's search logic by removing Name (which is quite a large field and hence occupied a large amount of memory) from the data lookup task. This change is also planned for CLM 5.0.
  10. RTC development suggested applying a Windows patch from Microsoft, which our Insight server rejected.
    https://jazz.net/jazz/web/projects/Jazz%20Foundation#action=com.ibm.team.workitem.viewWorkItem&id=152474
  11. RTC development suggested changing the XDC file page size to 1000 instead of 2; the ETL execution then ran longer than previous attempts before failing. This change was made to address the possibility that the small page size for work item history resources was causing Windows to run out of client socket connections.
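The effect of the page size is easy to quantify: each paged request fetches at most page-size history resources, so a tiny page size multiplies the number of client connections the driver opens. A minimal sketch, using the RTC01 repository size quoted above:

```python
import math

def paged_requests(total_items, page_size):
    """Number of paged requests needed to fetch all items."""
    return math.ceil(total_items / page_size)

work_items = 996_887  # size of the RTC01 repository
for page_size in (2, 10, 1000):
    print(f"page size {page_size:>4}: {paged_requests(work_items, page_size):,} requests")
```

At page size 2 the driver would issue roughly half a million requests, versus under a thousand at page size 1000, which is why the small setting can exhaust client socket connections.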
  12. Cognos development provided some files to be applied on the Insight server to get more detailed errors in the logs. After reviewing the logs, Cognos UDA development confirmed that the issue all along was out-of-memory, and that it was not being reported properly.
  13. Insight development suggested upgrading Insight 1.1.1.2 to 64-bit and initiating the ETL via Data Movement tasks to take advantage of 64-bit processes, which are not subject to the 2 GB memory-allocation ceiling at which 32-bit jobs fail. We upgraded the environment and restarted the ETL, but the Data Movement task continued to use 32-bit processes and the ETL continued to fail. Cognos development explained that Cognos 10.1.1 (included with Insight 1.1.1.2) is 64-bit compatible but does not have 64-bit job execution capability. That capability is available in Cognos 10.2.1 (included with Insight 1.1.1.3), but the ODBC driver would remain 32-bit, so further upgrading was of no use.
  14. Cognos development showed us how to apply dimension breaks to both failing jobs:
    http://pic.dhe.ibm.com/infocenter/cbi/v10r1m0/topic/com.ibm.swg.im.cognos.ug_ds.10.1.0.doc/ug_ds_id23424DimensionBreaks.html
    With this last change, the first failing job, RTC_WorkItemPreviousHistory, passed.
  15. The second job, RTC_HistoryCustomAttr3, continued to fail. To resolve this, Cognos development suggested breaking the job into smaller chunks. RTC development helped us split it into chunks of three months at a time. This allowed us to complete the full ETL.
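The chunking approach above amounts to running the same job repeatedly over consecutive three-month date windows. A minimal sketch of generating such windows (the date range is illustrative, mirroring the five years of RTC data we processed):

```python
from datetime import date

def three_month_windows(start, end):
    """Split [start, end) into consecutive three-calendar-month windows."""
    windows = []
    cur = start
    while cur < end:
        m = cur.month - 1 + 3                      # zero-based month, advanced by 3
        nxt = date(cur.year + m // 12, m % 12 + 1, 1)
        windows.append((cur, min(nxt, end)))
        cur = nxt
    return windows

# Five years of history -> 20 three-month chunks, each run as its own ETL pass
chunks = three_month_windows(date(2008, 1, 1), date(2013, 1, 1))
print(len(chunks))  # 20
```

Each (start, end) pair would then be used as a date filter on the job's extract, so no single run has to hold the full history in memory.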

  16. Below is the issue we faced during the delta ETL:

  17. The delta ETL job "RTC_HistoryCustomAttr" started failing because of the memory-allocation limit. This issue was resolved by checking the "Use file caching for attribute" box under the properties of the Lookup_ReqHisUID_NoSource associated with the failing job. Instead of holding the lookup information in memory, the job writes it to temporary files, which are deleted after the job has completed. This was recommended by Cognos development.
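The idea behind the file-caching option can be sketched as follows: keep the lookup table in a temporary on-disk store instead of process memory, and delete it when the job completes. This is illustrative only; Data Manager's internal mechanism is not public, and the REQHIS-style keys below are hypothetical.

```python
import os
import shelve
import shutil
import tempfile

# Temporary on-disk store standing in for the job's lookup cache
tmpdir = tempfile.mkdtemp()
cache_path = os.path.join(tmpdir, "lookup_cache")

with shelve.open(cache_path) as lookup:
    # Populate the lookup as rows are extracted (hypothetical history UIDs)
    for uid in range(1000):
        lookup[str(uid)] = f"REQHIS-{uid}"
    # The transform phase resolves keys from disk rather than RAM
    resolved = lookup["42"]

# Temporary files are removed once the job completes
shutil.rmtree(tmpdir)
print(resolved)  # REQHIS-42
```

Trading memory for disk I/O this way makes each lookup slower, but keeps the 32-bit process under its memory ceiling, which is what allowed the delta job to pass.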

Dashboards and work items are no longer publicly available, so some links may be invalid. We now provide similar information through other means.