File Storage and Content Compression in Jazz SCM

The Jazz SCM system stores versioned file content directly in the Jazz repository database as BLOBs. To conserve space in its content storage tables, Jazz uses several strategies, described below.

Content Sharing

When a new version of a file is checked in to Jazz source control, the system first attempts to locate other files already in the system with identical content. Jazz generates an SHA-256 hash of the file’s contents and uses that hash as a key to locate identical content in the content storage table. If it finds a match, it stores a reference to the existing content (blob) instead of storing another copy.
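The content-sharing idea can be illustrated with a toy, in-memory, content-addressed store. This is a minimal sketch under stated assumptions, not Jazz's implementation: the class and method names are hypothetical, and a real Jazz server keeps its blobs in database tables rather than a Python dictionary. It only shows the principle that identical content hashes to the same SHA-256 key, so a second check-in of the same bytes adds a reference instead of a new blob.

```python
import hashlib
from typing import Dict, List


class ContentStore:
    """Toy content-addressed blob store keyed by SHA-256 (hypothetical, not a Jazz API)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}  # SHA-256 hex digest -> stored content
        self._versions: List[str] = []      # each file version is only a reference (a hash)

    def check_in(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        if key not in self._blobs:
            # No identical content is stored yet, so store a new blob.
            self._blobs[key] = content
        # Either way, the new version refers to the blob by its hash.
        self._versions.append(key)
        return key

    def blob_count(self) -> int:
        return len(self._blobs)

    def version_count(self) -> int:
        return len(self._versions)


if __name__ == "__main__":
    store = ContentStore()
    a = store.check_in(b"public class Foo {}\n")
    b = store.check_in(b"public class Foo {}\n")  # identical content: shared, not copied
    c = store.check_in(b"public class Bar {}\n")
    print(a == b, store.version_count(), store.blob_count())  # True 3 2
```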

Content Compression

If the new file’s content can’t be shared, Jazz attempts to compress it. It uses two compression methods, gzip and delta compression: the system compresses the file with each method in turn, compares the results, and stores whichever result is smallest. If neither method yields significant compression, the file is stored uncompressed. The compression threshold and related options are controlled by server properties, described under Configuring Compression Parameters below.
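The selection step can be sketched in Python. This is a minimal illustration, not Jazz's code, and it rests on several assumptions: the function names and the `MAX_KEPT_FRACTION` constant are hypothetical; delta compression is approximated by deflating the new version with the previous version as a zlib preset dictionary (a swapped-in technique, not necessarily the delta algorithm Jazz uses); and "significant compression" is assumed to mean the best result is at most 75% of the original size.

```python
import gzip
import zlib
from typing import Optional, Tuple

# ASSUMPTION: keep a compressed form only if it is at most this fraction of
# the original size. Jazz's real threshold logic may differ.
MAX_KEPT_FRACTION = 0.75


def gzip_encode(data: bytes) -> bytes:
    return gzip.compress(data)


def delta_encode(data: bytes, base: bytes) -> bytes:
    # Stand-in for delta compression: deflate with the previous version as a
    # preset dictionary, so byte runs shared with `base` become back-references.
    comp = zlib.compressobj(level=9, zdict=base)
    return comp.compress(data) + comp.flush()


def choose_encoding(data: bytes, base: Optional[bytes]) -> Tuple[str, bytes]:
    """Return (method, payload) where method is "gzip", "delta", or "none"."""
    candidates = [("gzip", gzip_encode(data))]
    if base:  # delta needs a non-empty previous version to compare against
        candidates.append(("delta", delta_encode(data, base)))
    # Keep whichever method produced the smallest output.
    method, encoded = min(candidates, key=lambda c: len(c[1]))
    if len(encoded) > MAX_KEPT_FRACTION * len(data):
        return ("none", data)  # not enough savings: store uncompressed
    return (method, encoded)
```

Because deflate's window is 32 KB, this stand-in only finds matches in roughly the last 32 KB of the previous version; a dedicated delta encoder has no such limit.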

Compression and Content Type

The Jazz server applies the space-saving strategies described above to all versioned files regardless of content type, binary files as well as text files. It also applies similar strategies to non-versioned files such as work item attachments, and even to certain kinds of non-file content such as build results, process definitions, and team member photos.

Jazz.net Content Storage Statistics

Because of the nature of the compression algorithms Jazz uses, the space savings on your own Jazz server depend on your file content and can be difficult to predict ahead of time. Some types of content, such as text and source code, tend to compress well, while others do not: for instance, JPEG images and RAR or zip archives, whose contents are already compressed.
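The difference between text and already-compressed content is easy to demonstrate with gzip alone. In this sketch, repetitive generated lines stand in for source code and seeded pseudo-random bytes stand in for JPEG or zip data; both inputs are assumptions chosen for illustration, and real files will land somewhere between these extremes.

```python
import gzip
import random


def space_saved(data: bytes) -> float:
    """Fraction of space saved by gzip; a negative value means the output grew."""
    return 1 - len(gzip.compress(data)) / len(data)


# Repetitive, text-like content (a stand-in for source code).
source_like = b"".join(b"    total += item_%d.size();\n" % (i % 50) for i in range(2000))

# High-entropy bytes (a stand-in for JPEG or zip content, which is already compressed).
rng = random.Random(0)
already_compressed_like = bytes(rng.getrandbits(8) for _ in range(64_000))

print(f"text-like: {space_saved(source_like):.0%}")
print(f"random-like: {space_saved(already_compressed_like):.0%}")
```

High-entropy input has no redundancy left for gzip to remove, so its output is typically a few bytes larger than the input.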

However, the Jazz team’s experience with our Jazz.net server may provide a helpful data point. Here are content storage statistics for the Jazz.net repository as of June 2012:

Jazz.net Server Content Statistics

Number of files under source control: 888,000
Number of file versions: 3,261,000

Content              Uncompressed   Compressed   Compression rate (space saved)
Versioned files      97 GB          45 GB        54%
Attachments          35 GB          22 GB        37%
Build results        54 GB          5 GB         91%
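The compression rates in the table are consistent with measuring space saved, 1 - compressed / uncompressed. A quick check in Python, using only the sizes from the table (the helper name is illustrative):

```python
# Sizes from the Jazz.net table, in GB: (uncompressed, compressed).
stats = {
    "Versioned files": (97, 45),
    "Attachments": (35, 22),
    "Build results": (54, 5),
}


def compression_rate(uncompressed: float, compressed: float) -> float:
    """Share of the original size saved by compression."""
    return 1 - compressed / uncompressed


for name, (before, after) in stats.items():
    print(f"{name}: {compression_rate(before, after):.0%}")

# Combined across the three content types: 186 GB stored as 72 GB.
total_before = sum(b for b, _ in stats.values())
total_after = sum(a for _, a in stats.values())
print(f"All three: {compression_rate(total_before, total_after):.0%}")
```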

You can see these statistics for your own Jazz server in the “Latest Metrics by Namespace” report. In the RTC Web UI, open one of your project areas, click “Reports > Shared Reports”, and select “Repository” in the “Select view” drop-down. The “Latest Metrics by Namespace” report appears in that list.
Note: the versioned file statistics appear in the “FileItem” namespace in this report.

Configuring Compression Parameters

We recommend that you not change your Jazz server’s content compression settings. You can view the current settings from the Jazz Team Server Administration web page: click “Advanced Properties” and scroll down to the “com.ibm.team.repository.service.internal.VersionedContentService” section. (Several content services are listed, each with its own copy of these properties.) For example:

Property                                                          Default value
Versioned Content Compression – Delta Compression Minimum Ratio   75%
Versioned Content Compression – Delta Compression Minimum Size    2048 bytes
Versioned Content Compression – Delta Compression On              true
Versioned Content Compression – GZIP Compression On               true
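To reason about how these four properties might interact, they can be modeled in a few lines of Python. Treat this as illustrative only: the field and function names are hypothetical, and the meanings given to Minimum Size (files smaller than this skip delta compression) and Minimum Ratio (a delta is kept only if it is at most 75% of the original size) are assumptions made for this example, not documented Jazz behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressionSettings:
    """Mirrors the four properties above; field names are hypothetical."""
    delta_min_ratio: float = 0.75  # "Delta Compression Minimum Ratio"
    delta_min_size: int = 2048     # "Delta Compression Minimum Size" (bytes)
    delta_on: bool = True          # "Delta Compression On"
    gzip_on: bool = True           # "GZIP Compression On"


def delta_allowed(settings: CompressionSettings, size: int) -> bool:
    # ASSUMPTION: files below the minimum size skip delta compression entirely.
    return settings.delta_on and size >= settings.delta_min_size


def keep_delta(settings: CompressionSettings, original: int, delta: int) -> bool:
    # ASSUMPTION: a delta is kept only if it is at most delta_min_ratio of the original.
    return delta <= settings.delta_min_ratio * original


defaults = CompressionSettings()
print(delta_allowed(defaults, 1000))        # False: below the 2048-byte minimum
print(delta_allowed(defaults, 10_000))      # True
print(keep_delta(defaults, 10_000, 8_000))  # False: 80% of the original exceeds 75%
```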
