Keeping my Jazz Server happy and my users overjoyed

I have been working with a number of customers recently, and most of my work seems to be focused on customers that are looking to rapidly expand their Jazz deployments.  They have realized some dramatic improvements in their software development teams that use the Jazz tools, and now they want to expand this capability to cover more of their software development organization.  To ensure the continued high performance and stability of their Jazz environment, I have pulled together here a few guidelines that organizations should consider following.

Use plans intelligently

This seems like it should be easy, but I have seen this pop up and cause problems.  Some of our customers like to have GIANT plans, with over 500 plan items in them.  These are “roll-up” plans, that show the status of a number of different teams or iterations.  Customers will then see long loading times, and general slow display of these plans.  It takes a long time to render plans this large in the browser, and some browsers seem to be better than others.

I like to suggest that any given plan should have no more than 200 work items in it.  It is very tough to work with a plan larger than that, and I cannot believe that any human can conceptually get their head around the relative priority and status of over 200 work efforts.  Instead, you should divide and conquer.  Use individual plans for each team, and roll up the relative status of each of these plans to a dashboard, in a series of widgets that show the overall health of a number of teams.  Create links to indicate relationships between the plan items.

You can also have large work items at a team level, and then decompose into smaller work items that can actually be worked on at a sub-team level.  That is how we typically do our work on Jazz.net.  Some of our initiatives (like clustering) were cross platform initiatives which we represented with high level plan items (like Epics).  We then broke this down into individual stories that could be worked on in the individual product teams.

Monitor your system

I am frequently amazed at how little quantitative data people have about the performance of their systems.  Jazz deployments are no exception.  Production systems need to be monitored.  You need to have a good feel for how well your systems are performing and how responsive they are, and you need to monitor the application and server logs.  How else can you expect to catch problems and issues while they are small and easy to manage?

A Jazz server needs to be monitored.  You should have some monitoring scripts in place that are able to monitor the following things:

  • CPU utilization (how busy is the CPU?)
  • Memory utilization (how much of the system memory am I using?)
  • Network throughput (how busy is the network?)
  • JVM Heap size (is my JVM starving for memory?)
  • License usage (how many users are on the system?)
  • Repository size (how quickly are our repositories growing?)

Luckily, a Jazz server allows you to get much of this information interactively, via HTTP requests, using cURL, Python, or any decent scripting language.  I have seen some really nice implementations of this type of monitoring being done by some of our customers, and it is impressive.  It does require some commitment of time and effort to do, but it is a necessary piece of any solid Enterprise deployment of Jazz capabilities.  Would you drive your car without a speedometer, gas gauge, oil pressure gauge, and warning lights?  Much like the indicators on an automobile, tracking these metrics over time will give you a much more accurate indication of the relative health of your Jazz deployment.
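
As a rough illustration of the kind of scripted HTTP polling described above, here is a minimal Python sketch.  It only times a single GET request and turns the result into a CSV row, so that the numbers can be tracked over time.  The URL is a placeholder, not a real Jazz endpoint, and authentication is left out entirely; both are assumptions that you would replace with whatever your own server and security setup require.

```python
import csv
import time
import urllib.request
from datetime import datetime, timezone

# Placeholder only: substitute a status or counter URL that your own server exposes.
STATUS_URL = "https://jazz.example.com/ccm/hypothetical-status"


def fetch(url, timeout=10):
    """Return (HTTP status, body length in bytes) for one GET request."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, len(response.read())


def sample(url, fetcher=fetch, clock=time.perf_counter):
    """Time one request and return a row: timestamp, url, status, bytes, seconds."""
    started = clock()
    try:
        status, size = fetcher(url)
    except OSError as error:  # URLError and socket timeouts are OSError subclasses
        status, size = "error: " + type(error).__name__, 0
    elapsed = clock() - started
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return [stamp, url, status, size, round(elapsed, 3)]


def append_row(stream, row):
    """Append one sample to an open CSV stream."""
    csv.writer(stream).writerow(row)
```

Calling sample(STATUS_URL) from a scheduled job every few minutes, and passing the result to append_row with a log file opened in append mode, gives you a response time history that you can chart next to the CPU, memory, and network figures.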

Another useful thing to use to monitor your Jazz deployment is the JazzMon tool.  Read the JazzMon FAQ, and take a look at the Jazz.net forum entries on JazzMon, to get an idea of the types of information that this can provide for you.  If you decide to implement some automated monitoring of your own, it also serves as a good model for how to get this data from your Jazz application servers.

All too often Jazz deployments rely only on user reports that the system is either fantastic, or way too slow.  Conflicting reports on the performance of the Jazz applications can come from people who sit in the same office area.  Who do you believe?  Believe them both.  Some users use the Jazz applications in ways that make performance seem fantastic, doing operations that are relatively less expensive (in terms of system and network resources) than the activities being done by their co-workers.  Others do expensive operations, and often have little idea that what they are doing is putting an excessive load on the system.  Ask users what they were doing when they saw poor performance, and also ask if any of their other network reliant applications were also running slowly.  Those two pieces of information can often allow you to quickly zero in on why certain users are experiencing poor performance.  Couple this with the high level system metrics mentioned above, and isolating the root causes for performance issues becomes much more organized, productive, and successful.

Do your builds intelligently

A good example of an expensive operation is the use of automated builds, or continuous builds.  I am not advocating that you quit doing automated builds or continuous builds.  To the contrary, I think that these can be critical to the smooth operation of a high performing software development team.  You do, however, need to be smart about how you do these builds.  Builds by themselves are an expensive operation: they have to pull multiple objects (your source files) from the repository, and then all of these objects need to be transmitted over the network to the build machine.  If you are doing a build of something like the Android code base, this results in a LOT of network traffic, regardless of the SCM system that you are relying upon.

There are two techniques that should be used to help address this.  The first is to not do “clean” builds when doing continuous integration.  A “clean” build creates an entirely new workspace, and then populates it with the files in a particular baseline.  Retrieving the thousands of files involved in a build causes a lot of file retrievals from the repository, and the subsequent transmission of ALL of these files to the build machine.  This is a VERY expensive operation (in terms of system and network resources).  The more efficient approach is to update an existing workspace, which will only retrieve and transmit the files that have changed since the last build (typically tens or perhaps hundreds of files).
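
To get a feel for the difference between the two approaches, here is a back-of-the-envelope sketch in Python.  All of the numbers (file counts, average file size, builds per day) are made up for illustration and are not measurements from any real deployment.

```python
def transfer_mb(files_fetched, avg_file_kb, builds_per_day):
    """Megabytes pulled from the repository per day for one build definition."""
    return files_fetched * avg_file_kb * builds_per_day / 1024


# Illustrative assumptions only, not measurements.
TOTAL_FILES = 10_000      # files in the workspace (fetched by every clean build)
CHANGED_FILES = 100       # files changed since the previous build
AVG_FILE_KB = 20
BUILDS_PER_DAY = 24

clean = transfer_mb(TOTAL_FILES, AVG_FILE_KB, BUILDS_PER_DAY)
incremental = transfer_mb(CHANGED_FILES, AVG_FILE_KB, BUILDS_PER_DAY)

print(f"clean builds:       {clean:,.1f} MB/day")
print(f"incremental builds: {incremental:,.1f} MB/day")
print(f"reduction factor:   {clean / incremental:.0f}x")
```

With these assumed numbers the clean builds move roughly 4.7 GB a day and the incremental builds under 50 MB.  Plug in your own file counts and build frequency to see what your continuous integration is really asking of the server and the network.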

The second technique is to use an SCM Proxy server.  An SCM Proxy server will cache the contents of files in a proxy server, which is located near the build machines.  These cached contents can then greatly reduce the amount of data transfer involved in builds, and the load on the infrastructure.
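
The idea behind a caching proxy can be shown with a toy example.  This Python sketch is purely conceptual and is not how the Jazz SCM proxy is implemented; the class name, the content identifiers, and the fake repository function are all invented for the illustration.

```python
class CachingProxy:
    """Toy read-through cache keyed by an immutable content identifier."""

    def __init__(self, fetch_from_repository):
        self._fetch = fetch_from_repository
        self._cache = {}
        self.upstream_fetches = 0  # how often we had to go back to the repository

    def get(self, content_id):
        if content_id not in self._cache:
            self._cache[content_id] = self._fetch(content_id)
            self.upstream_fetches += 1
        return self._cache[content_id]


def fake_repository(content_id):
    """Stand-in for the (expensive) trip to the real repository."""
    return f"contents of {content_id}".encode()


proxy = CachingProxy(fake_repository)
for _build in range(3):                      # three builds in a row
    for content_id in ("a1", "b2", "c3"):    # each needs the same three files
        proxy.get(content_id)

print(proxy.upstream_fetches)  # 3 trips to the repository for 9 requests
```

Only the first build pays for the trip to the repository; the later builds are served from the cache that sits next to the build machines.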

Plan server capacity with performance in mind

It always seems like nobody wants to allocate enough resources to the Jazz servers.  These servers represent the infrastructure on which your software development capability depends.  Worrying about standing up another virtual machine that costs $20,000 a year to maintain, while your development teams have a burn rate of that much on a day to day basis, is a complete loss of perspective.  I don’t want you to go and over-allocate hardware, but keep in mind that as you do software development, the size of your repositories is only going to grow.  Your user base and number of stakeholders are also going to grow.  Planning capacity to meet the needs of your teams today, for a solution that will get deployed and be serving users for the next 5 years,  is not productive.  This kind of thinking leads to poor deployment decisions, poor performance and unhappy users.
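
If you need to make this argument to whoever controls the budget, a small calculation can help.  The sketch below uses the two figures from the paragraph above; the productivity loss percentages are assumptions that I picked for illustration, so substitute your own estimates.

```python
VM_COST_PER_YEAR = 20_000     # figure from the text
TEAM_BURN_PER_DAY = 20_000    # figure from the text


def break_even_days(productivity_loss):
    """Working days of slowdown whose cost equals one year of the extra VM."""
    if not 0 < productivity_loss <= 1:
        raise ValueError("productivity_loss must be in (0, 1]")
    return VM_COST_PER_YEAR / (TEAM_BURN_PER_DAY * productivity_loss)


# Assumed slowdowns caused by an undersized server.
for loss in (0.01, 0.05, 0.10):
    days = break_even_days(loss)
    print(f"{loss:.0%} slowdown: the extra VM pays for itself in {days:.0f} working days")
```

Even if an undersized server only costs the team one percent of its productivity, the extra virtual machine has paid for itself well within the year.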

While there is documentation of a Jazz server supporting 1500-2000 concurrent users (see article on How Many Users Will Your Team Concert Server Support?), I find that it is best to go with a lower number.  This testing was done under idealized conditions, in an isolated network, with a dedicated database server.  In addition to this, these repositories were starting from size 0, and the I/O to the database servers and the disks was optimal.  I also factor in the fact that this is the absolute high end for users.  You want to architect your solution so it has ROOM TO GROW.

Maybe an analogy would help here.  You see elevators commonly have a rating on them, something like “Do not exceed 3500 pounds” (about 1600 kg).  This is a max rating for the elevator.  The elevator can actually carry more weight than that, but if you do load on more weight, then you are operating outside of the safe operating zone of your particular elevator.  So if your Jazz workload is anticipated to be 2000 pounds (about 900 kg), then I would like to build you an elevator that is rated to handle 3500 pounds.  Why?  Because once an elevator is in the building, it is tough to replace, and impossible to rebuild.  So I want to make sure that your elevator will handle your currently anticipated load, as well as any reasonable future load.

So what I usually do is figure roughly 300 concurrent developers on a single Jazz application server (not the JTS, that is a special case), and maybe just as many concurrent casual users.  Depending on how your development team is physically located, and how often they do builds and deliveries in the SCM system, you could have a greater number of developers hosted on the server because they just are not all on the server doing Jazz related transactions at the same time.
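
Here is how that rule of thumb might look as a quick sizing calculation in Python.  The 300 per server figures come from the paragraph above; the concurrency fraction and the growth factor are parameters that I have added for illustration, and you will need to estimate them for your own organization.

```python
import math

DEVELOPERS_PER_SERVER = 300   # rule of thumb from the text
CASUAL_PER_SERVER = 300       # "maybe just as many" casual users


def servers_needed(total_developers, total_casual, concurrency=1.0, growth=1.0):
    """Estimate application servers (not counting the JTS).

    concurrency: assumed fraction of users active at the same time.
    growth: assumed multiplier for the user base over the deployment lifetime.
    """
    developers = total_developers * concurrency * growth
    casual = total_casual * concurrency * growth
    return max(
        1,
        math.ceil(developers / DEVELOPERS_PER_SERVER),
        math.ceil(casual / CASUAL_PER_SERVER),
    )


# 1200 developers and 400 casual users, half of them active at once,
# with 50% growth expected while the deployment is in service.
print(servers_needed(1200, 400, concurrency=0.5, growth=1.5))  # 3
```

Treat the result as a starting point for a conversation, not as a guarantee.  The mix of operations that your users perform matters as much as the head count.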

Keep in mind that SCM operations are expensive (they consume more resources, thus fewer users per deployed application server), while work item operations are cheaper (they consume fewer resources, and thus allow more users to be hosted on the application server).  Some of the more expensive operations are builds (since you need to transfer multiple source files in a constant stream), and displaying large plans (since they retrieve multiple work items in a constant stream).  Another performance killer is frequently loaded dashboards with multiple reports on them.  Each time the dashboard is refreshed, ALL of the reports on that dashboard are reloaded.

Plan your deployment for the long term

Many times I see customers rush into deploying their Jazz solutions, without thinking about the long term ramifications of their actions.  They use poorly thought out fully qualified domain names for the server Public URI.  They deploy on undersized hardware “temporarily”.  They start deploying software on server boxes (or virtual machines), with little thought about the proximity of the resources, or the capacity needed.  They don’t do the work that is needed to properly install SSL certificates.  They do not have a plan or strategy for how to scale up, and how to add additional Jazz application servers.

Stop it!!!

You are deploying infrastructure that is going to support your organization’s software development.  This is important, and you need to minimize the risks of losing productivity due to downtime or other problems.  You need to plan ahead, take the time to do things correctly, and make sure that you are addressing the following concerns:

  • Have a plan for how your Jazz infrastructure will grow over time – Make sure that machine names are informative and unique.  Be aware of network and database infrastructure, and the impact that they can have on your deployment.
  • Have a strategy for the deployment of process to new projects – Understand how your Jazz infrastructure is going to support and promote the software development processes that your organization has.  What will you do with projects that want to modify the process, or the information stored in work items?
  • Choose a Public URI – Choose a URI that is both a fully qualified domain name, and conforms to the naming standards of your organization.  Make sure that your Public URI is something that allows for growth in the future (use something like https://jazz.acme.com/ccm01, rather than something like https://zeus.acme.com/ccm).
  • Set up a test environment – You MUST have a test environment to be able to check out patches, new releases, upgrades, new integrations, process changes, and any other changes that you will be making to your production Jazz systems.
  • Deploy with a Reverse Proxy – This is a big one.  It doesn’t take much in terms of processing resources, and all it requires is a little bit of advanced planning.  It gives you the ability to move application servers in the future, the ability to hide port numbers, and all types of other flexibility with your Jazz architecture in the future.
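
Some of the Public URI advice in the list above can be checked mechanically before you commit to a name.  The following Python sketch is my own illustration, and the specific checks are assumptions about what is worth flagging, not an official validation tool.

```python
from urllib.parse import urlsplit


def public_uri_issues(uri):
    """Return a list of structural concerns about a proposed Public URI."""
    parts = urlsplit(uri)
    issues = []
    if parts.scheme != "https":
        issues.append("scheme is not https")
    host = parts.hostname or ""
    if "." not in host:
        issues.append("host is not a fully qualified domain name")
    if parts.port is not None:
        issues.append("explicit port number (a reverse proxy can hide it)")
    if not parts.path.strip("/"):
        issues.append("no context root in the path")
    return issues


print(public_uri_issues("https://jazz.acme.com/ccm01"))  # []
print(public_uri_issues("http://zeus:9443/ccm"))
```

A script like this can only catch structural problems.  It cannot tell you that a host name tied to one particular machine (like zeus.acme.com) will cause you pain later; that part still needs a human to think about how the deployment will grow.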

Pay attention to your JVM Settings

One of the best kept secrets about Jazz deployments is that you can tune the JVM to give yourself optimal performance.  There will be a technote on this coming out soon, but I can cheat and give you some of this information now.  For more details, go and check out the technote.

The JVM settings can have a big impact on your performance.  Right now we are seeing some customer deployments experience “hangs”, or “pauses”, as the JVM does garbage collection.  Garbage collection is just part of the whole JVM experience, so you cannot get away from it.  You can get into the internals of garbage collection if you want, but I want to just put out what our current recommended JVM settings are.  These settings should reduce, or possibly even eliminate, any pauses from garbage collection that is occurring in your JVMs that are hosting the Jazz applications.

First of all, we strongly recommend that you use a 64-bit architecture (that is all we support with the 4.x versions of our Jazz applications), and that you have a MINIMUM of 8GB of RAM for the JTS server.  For WAS, we use a thread pool size of 500 to host Jazz.net, but we suggest that you set this to 200 for your own Jazz servers.

  • ThreadPool size: min:200 / max: 200

Next we get to the JVM settings that you should be using as a starting point for your Jazz applications, for both WAS and Tomcat deployments.  The JVM settings below should be a starting point for you, and based on your performance and usage patterns, you may need to tune them further.

  • Solaris (and even the unsupported MacOS)

-Xmx4g -Xms4g -Xmn512m
-XX:MaxPermSize=768M -XX:ReservedCodeCacheSize=512M -XX:CodeCacheMinimumFreeSpace=2M

  • AIX

-Xmx4g -Xms4g -Xmn512m
-Xgcpolicy:gencon -Xnocompressedrefs

  • Other

-Xmx4g -Xms4g -Xmn512m
-Xgcpolicy:gencon -Xcompressedrefs
-Xgc:preferredHeapBase=0x100000000

Additionally, in clustered deployments, for the deployment manager and load balancer (proxy) we recommend:

-Xgcpolicy:gencon

Note: The setting -Xgc:preferredHeapBase is optional but recommended to avoid native memory outages.  As detailed in http://www-01.ibm.com/support/docview.wss?uid=swg1IZ73156, the JVM may allocate the heap in the low 4 GB of the address space, and the compressed references optimization may then run out of native memory in that region.  Another way to work around the issue is to disable compressed references by using -Xnocompressedrefs in place of -Xcompressedrefs -Xgc:preferredHeapBase=0x100000000.  However, there is a considerable performance impact to using this option, as it increases heap consumption by 70% and reduces performance by 20%.

If you suspect issues with garbage collection, use the following JVM setting:

-verbose:gc

Our teams have seen many customers running this in their production environments with minimal impact on the performance of the Jazz applications. This is often a first step for us when we start debugging issues that we think may be related to garbage collection.
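
If you manage several servers, it can be handy to keep the starting point settings from this section in one place.  This Python sketch simply assembles the option strings listed above by platform; the function name and the platform keys are my own invention, and the values are the starting points from this post, not tuned recommendations for your workload.

```python
# Heap settings shared by every platform in the lists above.
COMMON = ["-Xmx4g", "-Xms4g", "-Xmn512m"]

PLATFORM_OPTIONS = {
    "solaris": [
        "-XX:MaxPermSize=768M",
        "-XX:ReservedCodeCacheSize=512M",
        "-XX:CodeCacheMinimumFreeSpace=2M",
    ],
    "aix": ["-Xgcpolicy:gencon", "-Xnocompressedrefs"],
    "other": [
        "-Xgcpolicy:gencon",
        "-Xcompressedrefs",
        "-Xgc:preferredHeapBase=0x100000000",
    ],
}


def jvm_arguments(platform, verbose_gc=False):
    """Build the JVM argument string; unknown platforms fall back to 'other'."""
    specific = PLATFORM_OPTIONS.get(platform.lower(), PLATFORM_OPTIONS["other"])
    options = COMMON + specific
    if verbose_gc:
        options = options + ["-verbose:gc"]
    return " ".join(options)


print(jvm_arguments("AIX"))
print(jvm_arguments("linux", verbose_gc=True))
```

Keeping the settings in a script like this also gives you a record of what each server was started with, which is useful when you later compare garbage collection behavior across machines.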

Final thoughts

Some of our customers have done all of these things; some have done only a subset of what I mention here.  These are the best practices that I have witnessed over the past 3 years while dealing with our Jazz customers.  Know what your deployment looks like, and have a record of the servers in use and relevant information on those servers.  They should be well understood and documented by your team.  Having this up-to-date documentation will help IBM support teams quickly determine the appropriate actions to take for any problems that you might have in the future.