Ensuring faster migrations between database instances, vendors, and product versions

One of the great utilities that comes with the Rational solution for Collaborative Lifecycle Management products is called “repotools”. It has been used (and abused) for many purposes related to database creation, migration, upgrade, indexing, fixing, and more. To appreciate this, just run repotools(.sh/.exe) -help and browse the exhaustive list of options.

One of the most useful features of repotools is that it allows you to move a repository from one database instance to another, or even from one vendor to another, because the tool extracts and loads data in a database-independent format (a tar ball with gzipped XML content). The commands for this operation are repotools with the “-export” option, to get the data out of one instance, and the “-import” option, to load the data into another. This method has also been used, on rare occasions, to migrate between versions of Jazz products (RQM 2.0.1.1 to RQM 3.0.1.2, for example) instead of the standard “-addTables” option that is usually used for that purpose. Generally speaking, we endeavor to provide the “-addTables” option for version upgrades, as it simply updates the previous version’s schema and data with deltas to bring it up to the new version and is therefore much quicker.
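Because the export format is simply a tar ball of gzipped XML, you can sanity-check an export before moving it between servers. Here is a minimal Python sketch, assuming a hypothetical archive path; it just lists a sample of the entries:

```python
# Minimal sketch: peek inside a repotools export archive to confirm it is
# a readable tar ball, as described above. The path is a hypothetical example.
import tarfile

EXPORT_PATH = "/backups/jts-export.tar"  # hypothetical export location

with tarfile.open(EXPORT_PATH) as archive:
    for member in archive.getmembers()[:10]:  # sample the first few entries
        print(member.name, member.size)
```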

Depending on the size of the repository in question, the import and export operations can take quite a long time, so it is important to identify and reduce bottlenecks before attempting such an operation. The most common bottleneck is the connection between the database and the middleware. It is very important, both for a repotools operation and for the normal operation of the product, to ensure that this connection has good throughput and low network latency. Due to the nature of the import/export bulk data transfer operations, latency becomes even more of an issue, so even if you haven’t experienced any pain related to it so far, it is good to ensure that it will not be an issue when running repotools. There are, of course, other bottlenecks that can occur, including disk I/O and CPU in certain situations.

Network Latency

Latency was a major contributor to a customer upgrade that was initially projected to take multiple days. Reducing the latency from 200 ms to less than 10 ms (by collocating), along with other optimizations and recent performance patches, brought the total down to one day. The goal should be near-zero latency between the middleware tiers.

Network latency is typically around 100 ms to 150 ms for a gigabit Ethernet connection. This is usually adequate for most databases until they reach tens of gigabytes in size. In such cases it may be necessary to collocate the middleware with the database (i.e., no network separation) solely for the purposes of running repotools, in order to eliminate network latency. It may also be necessary to do this if your network latency issue is not easily fixable, in order to complete the repotools process in an acceptable amount of time.
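Before committing to a long export or import, it can be worth quantifying the latency between the middleware and database hosts. This is a minimal Python sketch, assuming a hypothetical host and port (substitute your own database server’s values, e.g. 50000 for DB2 or 1521 for Oracle):

```python
# Minimal sketch: estimate round-trip latency to the database host by timing
# TCP connections to its listener port. Host and port are assumptions.
import socket
import time

DB_HOST = "db.example.com"  # hypothetical database host
DB_PORT = 50000             # hypothetical listener port

samples = []
for _ in range(10):
    start = time.perf_counter()
    with socket.create_connection((DB_HOST, DB_PORT), timeout=5):
        pass  # connection established; close immediately
    samples.append((time.perf_counter() - start) * 1000)

print(f"min {min(samples):.1f} ms, avg {sum(samples) / len(samples):.1f} ms")
```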

In effect, collocation requires installing and configuring the middleware on the database server for the purposes of running the repotools export and import. (There is an easier path: for export, you can simply zip up the middleware directory (usually JazzTeamServer), transfer it to a similar location on the database server, and run repotools from there.) For import, once the tar ball has been imported you should run “repotools -rebuildIndices” on the “real” middleware.

CAUTION: By eliminating the network bottleneck you may expose other bottlenecks in your database system. The most probable issues in this case are disk I/O and CPU.

Disk I/O

Configuring a relational database for performance is an exhaustive topic in itself, covering memory buffer pools, partitioning of tablespaces, and more. These topics are often vendor- and configuration-specific, so I will not cover them here. Suffice it to say, you should ensure that the database server is set up in a performant way before attempting any data-intensive operation like a repotools export/import. Nevertheless, we do have some sample recommendations for specific vendors.

A repotools export/import not only queries and inserts into the database but also reads and writes content (the tar ball) on disk. In the case where repotools (the middleware) is collocated with the database server, ensure that the tar ball is located on a disk that is not being used by the database server (database files, database tablespaces, etc.). If the middleware is on another machine, this is less of an issue because the bottleneck in such cases will most likely be the network latency mentioned previously.
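To check that the disk holding the tar ball can sustain reasonable sequential throughput, and is not contending with the database’s own files, a rough probe can help. This is a sketch only, assuming a hypothetical target path and a 1 GiB write; run it against the disk you plan to use, not a database volume:

```python
# Rough sketch: time a sequential write to the disk where the tar ball will
# live. Target path and write size are assumptions; adjust for your setup.
import os
import time

TARGET = "/exports/io-probe.bin"  # hypothetical tar ball destination disk
CHUNK = b"\0" * (1 << 20)         # 1 MiB buffer
TOTAL_MB = 1024                   # write ~1 GiB in total

start = time.perf_counter()
with open(TARGET, "wb") as f:
    for _ in range(TOTAL_MB):
        f.write(CHUNK)
    f.flush()
    os.fsync(f.fileno())          # force data to disk so the timing is honest
elapsed = time.perf_counter() - start

print(f"{TOTAL_MB / elapsed:.0f} MB/s sequential write")
os.remove(TARGET)                 # clean up the probe file
```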

CPU

Repotools is effectively bound to one CPU due to its single-threaded nature. This is usually not a problem, as the two issues above are usually the bounding factors. To determine whether the CPU is an issue, run a repotools export for a while and observe the effect on the CPU. Normally the CPU that repotools is using should not be pegged at 100%; some spikes may occur, but it should usually stay well below this.
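One way to watch this is to sample the repotools process directly. The sketch below uses the third-party psutil package; the PID is a hypothetical placeholder for that of your running repotools process:

```python
# Minimal sketch: sample the CPU usage of a running repotools process to see
# whether it is pegged near 100% of a single core. Requires: pip install psutil
import psutil

REPOTOOLS_PID = 12345  # hypothetical PID; look it up with your OS tools

proc = psutil.Process(REPOTOOLS_PID)
for _ in range(30):
    # Percent of one core over a 1-second window; sustained readings near
    # 100% suggest the single-threaded repotools process is CPU-bound.
    print(f"{proc.cpu_percent(interval=1.0):.0f}%")
```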

If the CPU is consistently pegged at 100%, then it is underpowered for a repotools operation, and a reallocation or collocation of the middleware should be performed for this purpose (see Network Latency above).

Beyond these bottlenecks, it is always good to ensure that there are no sessions on the database needlessly consuming resources before running a repotools operation. The surest way to ensure this (there are, of course, other less drastic ways specific to your particular RDBMS) is to restart the database instance.

Performance issues with repotools are usually a good indicator that your overall operational configuration is not optimal either. Good network latency, disk performance, and CPU performance are all conducive to a well-performing system, so many of the issues discussed in this article apply to normal operation as well.

Here is some information which can be useful in resolving performance issues: