Now that Rational Team Concert 1.0 is available, it’s time to start a series of blog posts that I’ve been planning for a long time. The Jazz team has been self-hosting since the summer of 2006, and our infrastructure and repository have grown steadily since then. I’d like to share with you some facts about our self-hosting environment.
Before we get to the current self-hosting numbers, let’s start from the beginning. A long, long, long time ago when Jazz was only a glimmer in our eyes, we started coding and tracking work using Bugzilla, Cruise Control, and CVS. Our teams had existing experience with these tools, so we stuck with what we knew. As development progressed we got to a point where things started to build, run, and demo. However, on a project the size of Jazz, we couldn’t wait for all the stars to align and self-host on all components at the same time, so we used an iterative approach and started by moving off Cruise Control and onto the Jazz Build component first. Then, as the other components proved themselves worthy, we moved to them one at a time. In a development organization, self-hosting is a big step. In many ways it’s the graduation to the big leagues – it’s big time pressure, lights, camera, action! For me, this was the most stressful part of the entire project; as lead of the Source Control team, I knew that if we made mistakes and lost code or corrupted files, our entire development team could have lost weeks of valuable coding time. All things considered, it actually went pretty well, and here we are 2 years later having migrated our database countless times across model refactorings and big code changes without major data loss. We had some minor data loss, but nothing that stopped the development team as a whole.
Our first self-hosting server was a modest server-class machine running Tomcat and Derby. Since our team was a lot smaller then than it is today, the Team Concert Express-C configuration worked great to get started. The development team maintained the machine, and we often had to run the server in debug mode to diagnose problems.
- CPU: 2 Intel Xeon processors, 3.00 GHz
- Memory: 4 GB
- Disk: 200 GB
- OS: RedHat Enterprise 4
- Application Server: Tomcat
- Database: Derby
This was also around the time I started collecting self-hosting numbers. Here is the very first page I posted on our wiki in 2006, describing some of the data in our repository after a couple of months of self-hosting.
Since 2006 things have really been growing in our repository. For starters, we migrated to DB2 as soon as the team size grew and got big performance and scalability improvements. And as you can see when we compare the August 2006 numbers with today, the repository has grown quite a lot:
- Number of folders/files in the integration stream: 40,008 (up from 11,679)
- Number of files in the repository: 697,868 states of 113,567 files (up from 121,111)
- Number of workspaces and streams: 1,276 (up from 198)
- Number of contributors with workspaces: 188 (up from 24)
- Size of the Jazz integration stream: 665MB (up from 102MB)
- Size of all file content in the repository: 25.5GB (up from 1.9GB)
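To put the comparison in perspective, here is a small sketch that turns the pairs above into growth factors. The figures are copied from the list; the helper name `growth_factor` is made up for illustration, and the file-count line is left out because the list does not say whether the August 2006 figure counts files or file states.

```python
# (metric, August 2006 value, value today), copied from the list above.
# The "files in the repository" line is omitted: it is unclear whether
# the 2006 figure refers to files or to file states.
metrics = [
    ("Folders/files in the integration stream", 11_679, 40_008),
    ("Workspaces and streams", 198, 1_276),
    ("Contributors with workspaces", 24, 188),
    ("Integration stream size (MB)", 102, 665),
    ("All file content (GB)", 1.9, 25.5),
]


def growth_factor(before, after):
    """Return how many times larger `after` is than `before`."""
    return after / before


for name, before, after in metrics:
    print(f"{name}: {growth_factor(before, after):.1f}x")
```

Run as-is, this shows that total file content grew the most (roughly 13x), while the integration stream itself grew by about 3x in item count.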
But it’s not all about SCM artifacts, there are many other things in the repository. So let’s look at what we have today across all the components in the Jazz team’s self-hosting database:
General
- Total size of the repository: 21,074,957 KB (compressed), 49,739,209 KB (uncompressed)
- Users: 12,922
- SCM users: 188
- Number of links between artifacts: 350,785
- Teams: 67
- Iterations: 124
Build
- Build engines: 24
- Build definitions: 47
- Build results: 447
- Build result contributions: 15GB
SCM
- Change sets: 48,337
- Files: 113,567
- Suspended change sets: 640
- Components: 78
- Baselines: 36,707
- Snapshots: 36,451
Work Items
- Work Items: 47,203
- Work Item States: 323,929 (6.9 avg states per work item)
- Work Item Attachments: 12,718 taking 1.841 GB
- Work Item Queries: 2,467
- Categories: 184
- Deliverables: 84
Iteration Planning
- Iteration Plans: 266
- Iteration Plan States: 2,314 (8.7 avg states per plan)
Reports
- Reports: 62
- Report Queries: 138
Dashboards
- Team and personal dashboards: 2,113
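A few of the figures above are derived from others, and they can be checked with a quick sketch. The numbers come straight from the lists; the variable and function names are only illustrative.

```python
def average_states(states, items):
    """Average number of stored states per item."""
    return states / items


# Work items: 323,929 states across 47,203 work items.
work_item_avg = average_states(323_929, 47_203)

# Iteration plans: 2,314 states across 266 plans.
plan_avg = average_states(2_314, 266)

# Repository size in KB, compressed vs. uncompressed.
compressed_kb = 21_074_957
uncompressed_kb = 49_739_209
compression_ratio = uncompressed_kb / compressed_kb

print(f"Work item states per work item: {work_item_avg:.1f}")
print(f"Plan states per iteration plan: {plan_avg:.1f}")
print(f"Uncompressed/compressed size:   {compression_ratio:.2f}")
```

The two averages match the 6.9 and 8.7 quoted in the lists, and the size figures work out to the compressed repository being a bit under half the uncompressed size.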
We’ve also moved off the single server machine to a dual server setup:
- application server node (hosts Rational Team Concert server-side components including Jazz Team Server)
  - CPU: 2 Intel Xeon processors, 3.00 GHz
  - Memory: 8 GB
  - Disk: 400 GB located on an external SAN RAID 5 connected using Fibre Channel
  - OS: RedHat Enterprise 4
  - Application Server: WebSphere Application Server 6.1
  - DBMS: DB2 9.5
- database server node
  - CPU: 2 Intel Xeon processors, 3.00 GHz
  - Memory: 4 GB
  - Disk: 200 GB RAID internal
  - OS: RedHat Enterprise 4
  - Database: DB2 v9.1
So what’s in your repository? You too can get these numbers from your repository by using the pre-defined Repository reports that ship as part of the product. The reports have all the data you need to track what’s stored in your Jazz repository. There are two kinds of reports: the first shows the latest snapshot of the repository, and the second shows historical data.
- Repository Latest Metric: These show the current footprint and counts for all items in the repository. You can filter to show a subset of the components using the parameters section at the top of the report. This can help answer questions such as: how many files do we have in the repository, and what is their current footprint?
- Historical Repository Metrics: These show the historical footprint and counts for all items in the repository. You can filter to show a subset of the components using the parameters section at the top of the report. This can help answer questions such as: how much are we adding to the repository each month or each day?
—
Jean-Michel
Jazz Source Control Team
This is first-hand information for customers and the field team on our own practice with Jazz; it will bring more confidence to the community. Very good!
Cool data, thanks for sharing. Why is there a database on the application tier? Shouldn’t the data be stored on the data tier?
Is there any chance these numbers can be updated to reflect current reality? While the information here is very impressive to customers, I think the latest data would really knock their socks off!
Hi Gary, I’ve got the data but haven’t blogged about it yet. It’s posted on our wiki:
https://jazz.net/wiki/bin/view/Main/PerformanceScalability#Selfhosting_Repository_data_sets
We took a sample on June 24th of 2009.
JM:
Sweet – thanks!
Gary