Jazz Community Blog: Enterprise performance and scalability testing

If you’ve been reading some of our previous blog posts (WAN performance testing using Metronome, self-hosting sizing numbers, repository workspace scalability, scaling to new heights), you’ll know that performance and scalability are very important to our team. Now that we are neck deep in testing Rational Team Concert (RTC) 2.0, it’s time to share how it’s going. Before we get into the details, here’s what we are planning to deliver in the performance and scalability area for RTC 2.0:

  1. Validate that 1000 concurrent developers can be supported by an RTC server. The most interesting part of this work is to highlight the factors and guidance that would allow 2000 or more users, given the appropriate server hardware.
  2. There will be no repository size limit in 2.0, and as a result we want to push the limits in our testing and verify large repository sizes.
  3. Provide server sizing recommendations based on our results.
  4. Provide instructions and a download of our performance test infrastructure so that others can run these tests themselves.

Vertical versus horizontal scaling

The plan for 2.0 is to scale “vertically”, meaning supporting many users on a single server. The server can be split into an application and data tier, but in RTC 2.0 we won’t support clustering of the application servers. We use the term “vertical” to mean a single (possibly two-tiered) server and “horizontal” to mean a clustered environment.

Why can’t we support clustering in 2.0? The short answer is that there are coding patterns on the server that currently make clustering impossible without some design and code rework. Support for full clustering will be delivered in a later release, once these limitations are removed and clustering is fully tested. Having said that, we are very confident we can support a lot of developers on a two-tier server setup, as we’ve already seen on jazz.net, which has over 26,000 contributors and a bit under 100 developers.

As a result, RTC 2.0 is “vertically” focused, so I hope you aren’t afraid of heights!

Testing methodology

It is tempting to just start writing code to automate a typical user load, but the problem is that once the tests are run and some data is collected, how can you validate that it is really simulating 1000 users or more? There are so many usage patterns that it is hard to guess how teams will use RTC.

As a result, our first step was to establish some guidelines on what we think would be a valid load for 1000 users. Without such a model, we really have no idea how to interpret the results of our tests. The model will be both a guide to designing the tests and a validation tool during the test runs. The good news is that we can build a model based on current metrics available from our existing deployments of RTC. Our model includes the following:

  • Which service calls contribute the most to the load of a typical server? This will help us focus on automating the right tests and possibly ignoring others which don’t contribute to the load of the server. We have limited test hardware and focusing on the expensive services is a priority.
  • How much data does a typical team add to the server? How can we generate this data?
  • How many developer users do we really have on Jazz.net and how can this help us calibrate our test model?

Service Calls

As we’ve outlined in the Metronome blog post, we collect both client and server statistics on service calls. This means that we can take snapshots of the services on a “real” installation and collect a bunch of cool numbers. We’ve collected a significant amount of data on our in-house deployment of RTC, and our assumption is that this server’s usage is typical of a very active development team, so we will be using it as the basis for most of our test calibration.

The server counters can be found by pointing your browser at https://<your-rtc-server>/jazz/service/com.ibm.team.repository.service.internal.counters.ICounterContentService/. They are reset automatically when the server is restarted. Here is a sample output, which includes for each call a bunch of interesting numbers such as the amount of data received and sent, and the time spent in each call.
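If you want to track these counters over time instead of eyeballing the page, a simple script can archive snapshots for later comparison. Here is a minimal sketch, assuming Python with the requests library, a hypothetical host name, and that your server accepts basic authentication for this URL (some deployments use form-based login instead); it makes no assumption about the output format and just saves the raw page:

```python
# Minimal sketch: archive a snapshot of the raw counter report.
# Assumptions: the requests library is installed, the server accepts basic
# authentication for this URL, and "your-rtc-server" is replaced with a real host.
import datetime
import requests

COUNTERS_URL = ("https://your-rtc-server/jazz/service/"
                "com.ibm.team.repository.service.internal.counters.ICounterContentService/")

def snapshot_counters(user, password, verify_tls=True):
    """Fetch the counter report and save it to a timestamped file."""
    resp = requests.get(COUNTERS_URL, auth=(user, password), verify=verify_tls)
    resp.raise_for_status()
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    path = f"counters-{stamp}.txt"
    with open(path, "w", encoding="utf-8") as f:
        f.write(resp.text)
    return path

if __name__ == "__main__":
    print("Saved snapshot to", snapshot_counters("admin", "password"))
```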

It’s not all that important that everyone understand what each service does; what is more important is that we compare our test results with what we are seeing in production to validate our load testing with real data. We took several snapshots over the weeks and tried to include all phases of a milestone to gather different usage patterns of the server. For example, at the start of a milestone there is a mad flurry of development to get all the features in and bugs fixed, and teams are creating and editing plans a lot.

Here is the data that we’ve collected and are using for our calibration. This first set of services shows the number of times each service was called over a 62 hour period and a 44 hour period; the first was during a milestone week, the second the week after.

The most popular service is by far IDirectWritingContentService.get(), which is called when a file is fetched from the server. In a 62 hour period there were 6.3 million calls!

Next, let’s review the services which take the longest for each individual call. It is important that our tests take these services into account as they could affect the server load. It’s interesting that of the top 20 services, 11 are Source Control related and 4 are Reports related – this is in line with the components that process the most data on the server.

The last set of numbers is the most interesting and shows the top 20 services that took the most cumulative time on the server. It’s obvious that any performance tests must account for exercising all these calls, since this is where the server spent the most time. Most fascinating is that although in a 62 hour period we fetched over 6.3 million files from the server (IDirectWritingContentService.get), fetching feeds and all those cool collaboration and Team Central sections actually accounted for more cumulative time (IFeedService). So we can’t forget to automate a feeding frenzy!

After reviewing all these service calls and crunching the numbers, we’ve got a testing plan. Our tests will focus mainly on the following very popular service calls:

  • IDirectWritingContentService and IFilesystemService.getFileTreeByVersionable are called during automated builds, when accepting change sets, loading attachments, and loading repository workspaces. These are the heavy hitters in both number of calls and cumulative time. Note that in RTC 2.0 M3D1 we added another service, IVersionableContent.GET(), for files, so consider this as part of the file fetching counts.
  • IFeedService is called when fetching RSS feeds and events from the server. This drives Team Central, the Feeds view, and the dashboards.
  • ITeamBuildService is called during an automated build when status, results, and downloads are uploaded.
  • IScmService.compareWorkspaces/refreshWorkspaces/interpretChanges are called when refreshing the Pending Changes view, browsing change sets, and comparing streams, repository workspaces, and snapshots.
  • IDirectWritingContentService.put() is called when checking in files or uploading work item and process attachments to the server.
  • IQueryService is called when creating and running queries.
  • IReportRestService is called when fetching reports and report parameters in the dashboards and rich client.
  • IScmService.acceptCombined() is called when accepting change sets.
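One simple way to turn these observed frequencies into a simulated user is to pick operations randomly, with weights proportional to the production call counts. The sketch below is generic and hypothetical (the operation stubs and weights are placeholders, not our actual test harness), but it shows the idea:

```python
# Generic sketch: drive a simulated user by sampling operations with weights
# proportional to observed production call counts. The stubs and weights below
# are placeholders; a real harness would invoke the corresponding client APIs.
import random
import time

def fetch_file():      pass  # stands in for IDirectWritingContentService.get()
def fetch_feeds():     pass  # stands in for IFeedService
def refresh_pending(): pass  # stands in for IScmService.compareWorkspaces()
def run_query():       pass  # stands in for IQueryService

# Illustrative relative weights (derive real ones from your counter snapshots).
WORKLOAD = [(fetch_file, 100), (fetch_feeds, 40), (refresh_pending, 10), (run_query, 2)]

def simulate_user(duration_seconds, think_time=1.0):
    """Run a weighted mix of operations and return how often each was invoked."""
    operations, weights = zip(*WORKLOAD)
    counts = {op.__name__: 0 for op in operations}
    end = time.time() + duration_seconds
    while time.time() < end:
        op = random.choices(operations, weights=weights, k=1)[0]
        op()
        counts[op.__name__] += 1
        time.sleep(think_time)  # crude pacing between operations
    return counts
```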

With a little bit of 5th grade math and some spreadsheet wizardry by our very own Dave Schlegel, we take these numbers, extrapolate how much activity we would expect from X more users, and validate against our test runs. As an example, here is the table that shows the count targets normalized to a 750 developer run. For example, after running a 750 developer test for 1 hour, we should be seeing 960K file fetches. This means that when the tests are run, we will validate that we are seeing at least this much activity from the top 20 services. We repeat this process with the other dimensions.
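The extrapolation itself is just linear scaling. A rough sketch of the arithmetic, using the file-fetch numbers above (6.3 million calls in 62 hours from roughly 80 active developers) and assuming load scales with the number of active developers:

```python
# Rough sketch of the linear scaling behind the per-service count targets.
# Assumption: service-call volume scales roughly linearly with active developers.

def hourly_target(observed_calls, observed_hours, observed_devs, target_devs):
    """Normalize an observed call count to calls per hour for target_devs."""
    calls_per_dev_hour = observed_calls / observed_hours / observed_devs
    return calls_per_dev_hour * target_devs

# 6.3M file fetches in 62 hours with ~80 active developers, scaled to 750 developers:
print(f"{hourly_target(6_300_000, 62, 80, 750):,.0f} file fetches/hour")
# prints roughly 952,621 -- in line with the 960K/hour target above
```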

Data sets

Since 2008, we have been tracking database growth of our repository.  Over the last 9 months of development, the uncompressed size of the jazz.net repository has grown by 24.8 GB.  It is important to note that actual storage growth is less than 1 GB per month over this period (due to compression – see actual data below).

  • July 2008
    • 21,074,957 KB (compressed)
    • 49,739,209 KB (uncompressed)
  • March 2009
    • 29,194,827 KB (compressed)
    • 74,586,992 KB (uncompressed)
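For reference, the growth figures quoted above fall directly out of these two snapshots. A quick check (sizes in KB, treating 1 GB as one million KB to match the figures in this post):

```python
# Growth between the July 2008 and March 2009 snapshots (sizes in KB).
uncompressed_growth_gb = (74_586_992 - 49_739_209) / 1e6  # ~24.8 GB of new data
compressed_growth_gb = (29_194_827 - 21_074_957) / 1e6    # ~8.1 GB actually stored
print(f"uncompressed: {uncompressed_growth_gb:.1f} GB, "
      f"compressed: {compressed_growth_gb:.1f} GB "
      f"(~{compressed_growth_gb / 9:.2f} GB/month over 9 months)")
```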

It’s interesting to note that in this time frame we doubled the number of files added to Source Control.

While executing our automated tests, we need to validate that enough data is being added to the repository. We again looked at our jazz.net deployment as an example of an extremely high load scenario, with activities like checking in files to support over 13 languages and the addition of a significant amount of server code in support of the Jazz Foundation project. With this amount of activity in our jazz.net environment, we measured a growth rate of over 50MB/day (uncompressed data) for 80 developers. Our estimate for 1000 developers would be roughly 10x, or 500MB/day, with a target total repository size of 800GB!

As for the individual artifacts, these are some of our current testing targets. That’s not to say we can’t support more, but it’s what we are trying to test.

Artifacts            80 Users    1000 Users
Projects             10          100
Teams                100         1,000
Work Items           80,000      1,000,000
Files                250,000     2,500,000
Change sets          72,000      720,000
Concurrent builds    15          150

Calculating active developers

A key factor in calibrating our expected test results was to calculate how many developers were active on each system we collected stats from. Luckily for us, there is a nifty feature in the Administrators page of the RTC web UI that provides a floating license report.

As shown below, for jazz.net there is a small blip each day where we measured a peak of around 80 active developers. Since our team is a geographically distributed team, we have not seen consistent peak usage rates of 80+ users for long periods during each day. For our calibration purposes, we decided to use 80 developers as the number of active users.

What’s next

We will be posting results of these tests over the next weeks and months, including more details about the hardware and databases we tested, and much more. So go download Beta 1 with the unlimited trial server key and help us scale up RTC!

Jean-Michel Lemieux
Team Concert PMC
Jazz Source Control Lead

2 Comments
  1. Sreenivasan Rajagopal April 9, 2009 @ 10:14 am

    An excellent article on Rational Team Concert performance and scalability, backed up with a solid data set. Congrats, team!

  2. Brian Wolfe May 20, 2009 @ 7:54 am

    Good to see you working on this.
    You will probably want to define your users as role based. In an organization as large as 1000 developers, there will be a large cross section of user types and you may want to define your personas to accommodate the patterns of the roles. Product managers, documentation, dedicated testers, coders, etc. will contribute to the load in different ways. Also, the number of builds being driven and the number of defined dev lines may well be significantly larger.
