File Storage and Content Compression in Jazz SCM
Matt Lennon, IBM
Last updated: June 27, 2012
Build basis: Rational Team Concert 3.0.1.x, 4.0
The Jazz SCM system stores versioned file content directly in the Jazz repository database in the form of BLOBs. In order to conserve space in its content storage tables, Jazz uses several strategies, described below.
When a new version of a file is checked in to Jazz source control, the system first attempts to locate other files already in the system with identical content. Jazz generates an SHA-256 hash of the file’s contents and uses that hash as a key to locate identical content in the content storage table. If it finds a match, it stores a reference to the existing content (blob) instead of storing another copy.
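The hash-keyed lookup described above can be illustrated with a minimal sketch. This is not Jazz's implementation: the `ContentStore` class, its in-memory dictionary, and its method names are hypothetical stand-ins for the content storage table, and only the use of SHA-256 as the lookup key comes from the text.

```python
import hashlib


class ContentStore:
    """Toy in-memory content store keyed by SHA-256 digest (hypothetical)."""

    def __init__(self):
        # Maps hex digest -> content bytes; stands in for the storage table.
        self._blobs = {}

    def put(self, content: bytes) -> str:
        """Store content and return its key, reusing any identical blob."""
        key = hashlib.sha256(content).hexdigest()
        # setdefault only stores the bytes if no blob has this hash yet,
        # so identical content is kept once and shared by reference.
        self._blobs.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        """Return the content stored under the given key."""
        return self._blobs[key]

    def blob_count(self) -> int:
        """Number of distinct blobs actually stored."""
        return len(self._blobs)
```

Checking in the same bytes twice returns the same key and leaves `blob_count()` unchanged, which is the space saving the paragraph describes.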
If the new file’s content can’t be shared, Jazz attempts to compress the content. It uses two compression methods: gzip and delta compression. The system compresses the file with each method in turn, then compares the results and stores whichever is smaller. If neither method yields significant compression, the file is stored uncompressed. The compression threshold and other compression settings are controlled by the server properties listed under “Configuring Compression Parameters” below.
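The "try each method, keep the smallest, fall back to uncompressed" decision can be sketched as follows. Everything here is an assumption made for illustration: the function name `choose_encoding`, the `max_ratio` and `min_size` parameters and their meaning, and the caller-supplied `delta_encode` callable, which stands in for a delta compressor the article does not specify. It is not a description of how Jazz interprets its own settings.

```python
import gzip
from typing import Callable, Optional, Tuple


def choose_encoding(
    content: bytes,
    delta_encode: Optional[Callable[[bytes], bytes]] = None,
    max_ratio: float = 0.75,  # hypothetical: keep result only if <= 75% of original
    min_size: int = 2048,     # hypothetical: skip compression for smaller content
) -> Tuple[str, bytes]:
    """Return (method, stored_bytes), picking the smallest acceptable candidate."""
    if len(content) < min_size:
        return ("none", content)

    # Compress with each available method in turn.
    candidates = [("gzip", gzip.compress(content))]
    if delta_encode is not None:
        candidates.append(("delta", delta_encode(content)))

    # Keep whichever candidate is smallest.
    method, best = min(candidates, key=lambda item: len(item[1]))

    # Store uncompressed unless the saving is significant.
    if len(best) <= max_ratio * len(content):
        return (method, best)
    return ("none", content)
```

With these assumed defaults, a highly repetitive 6 KB text comes back as `"gzip"`, while already-compressed or random data comes back as `"none"` because gzip cannot shrink it below the threshold.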
Compression and Content Type
The Jazz server applies the space-saving strategies described above to all versioned files regardless of content type – binary files as well as text files. But it goes further – applying similar strategies to non-versioned files such as work item attachments, and even certain kinds of non-file content such as build results, process definitions, and team member photos.
Jazz.net Content Storage Statistics
Because of the nature of the compression algorithms Jazz uses, the amount of compression and space savings you will observe on your own Jazz server depends on the nature of your file content and may be difficult to predict ahead of time. Some types of content such as text and source code tend to yield significant compression, while others do not – for instance JPEG files, RAR and zip archives.
However, the Jazz team’s experience with our own Jazz.net server may provide a helpful data point. Here are content storage statistics for the Jazz.net repository as of June 2012:
Jazz.net Server Content Statistics
|Metric|Value|
|Number of files under source control|888,000|
|Number of file versions|3,261,000|

|Content type|Uncompressed|Compressed|Compression rate|
|Versioned file content|97 GB|45 GB|54%|
|Attachment content|35 GB|22 GB|37%|
|Build Results content|54 GB|5 GB|91%|
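The compression rates in the table are consistent with reading "compression rate" as the fraction of space saved, that is, 1 minus compressed size divided by uncompressed size. That reading is an inference from the numbers, not something the article states, and the helper name below is made up for illustration.

```python
def compression_rate(uncompressed_gb: float, compressed_gb: float) -> float:
    """Fraction of space saved: 1 - compressed / uncompressed."""
    return 1 - compressed_gb / uncompressed_gb


# (uncompressed GB, compressed GB) as reported for Jazz.net, June 2012.
stats = {
    "Versioned file content": (97, 45),
    "Attachment content": (35, 22),
    "Build Results content": (54, 5),
}

for name, (raw, packed) in stats.items():
    print(f"{name}: {compression_rate(raw, packed):.0%}")
# Versioned file content: 54%
# Attachment content: 37%
# Build Results content: 91%
```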
You can see these statistics for your own Jazz server by viewing the “Latest Metrics by Namespace” report. In the RTC Web UI, go to one of your project areas, then click “Reports > Shared Reports”, then select “Repository” in the “Select view” drop-down. The “Latest Metrics by Namespace” report is in that list.
Note: the versioned file statistics appear in the “FileItem” namespace in this report.
Configuring Compression Parameters
We recommend that you do not change your Jazz server’s settings for content compression. You can view the current settings on your server from the Jazz Team Server Administration web page: click “Advanced Properties” and scroll down to the “com.ibm.team.repository.service.internal.VersionedContentService” section. Several different content services are listed there, each with its own versions of these properties. For example:
|Property|Value|
|Versioned Content Compression – Delta Compression Minimum Ratio|75%|
|Versioned Content Compression – Delta Compression Minimum Size|2048 bytes|
|Versioned Content Compression – Delta Compression On|true|
|Versioned Content Compression – GZIP Compression On|true|
© Copyright 2012 IBM Corporation