Whether they are new users or seasoned experts, customers using IBM Jazz products all want the same thing: to use the Jazz products without worrying that their deployment will slow them down, and to be confident that it will keep up with them as they add users and grow. A frequent question we hear, whether from a new administrator setting up Collaborative Lifecycle Management (CLM) for the first time or from an experienced administrator keeping an eye on an existing deployment, is "How many users will my environment support?" In this article, we'll answer that question by drawing on the tests executed by the Rational Performance Engineering team, the hardware and sizing models generated by IBM Global Techline (an internal pre-sales support organization), and our analysis of many customer deployments.
Estimating the number of users that can be supported by a given set of hardware is hard to do precisely, since so much depends on what exactly the users are doing. So, we'll provide estimates for what we consider typical workloads running against typical datasets. We'll also provide information on what most influences the performance of a deployment, so you can evaluate whether these typical workloads can be good predictors for your own deployments.
The information contained in this article is derived in part from data collected from tests performed in IBM test labs, under controlled conditions. Customer results may vary greatly. Sizing does not constitute a performance guarantee. IBM assumes no liability for actual results that differ from any estimates provided by IBM.
Later in this document, we provide estimates for the number of users that typical deployments of the CLM products can support. But to put those estimates into context, we need to discuss the factors that most influence performance, so you can understand whether the estimates will apply to your specific environments. Here are the things we consider when designing our own performance tests:
As a starting point, we assume a server will have at least 16G of total RAM (with 8G dedicated to Java) and 8 logical processors. You may need more memory or more processors, however, depending on your workload (and in some cases, you may be able to use less). Additionally, deployments involving more than 50 active users should be deployed across multiple servers, as described in our recommended deployment topologies for the CLM and SSE products on the jazz.net Deployment wiki.
As a general rule, the number of active users that can be supported by a deployment is most strongly influenced by the number of available processors. Our own performance tests show that the throughput which a deployment can handle (in terms of requests per second) increases linearly as the number of processors increases (up to 24 processors, anyway). As the number of active users increases, the number of requests sent to the server over a given interval increases. Since these requests are handled in parallel inside the application server, you need more CPUs available to handle more parallel work. As the CPU utilization on a server nears 100%, it takes longer to process the transactions, which means that response times degrade.
The reliability of a CLM deployment is most strongly influenced by the available RAM. Without enough memory, the CLM applications can hang or crash. You'll need to allocate enough RAM to satisfy the primary consumers of memory on a CLM server, which are:
The amount of RAM available can also influence performance, in two primary ways:
These factors are behind our recommended deployment topologies for the CLM and SSE products on the jazz.net Deployment wiki. We recommend running each application on its own server, with a separate JTS server. We would additionally recommend no fewer than 8 logical processors per server and 16G of RAM (with 8G dedicated to the JVM heap). Deploying multiple systems of this class allows the CLM workload to be spread out across more processors and memory, and allows each application to run without interference from other co-located applications.
There are exceptions to the 8 processor/16G recommendation, and these will be discussed later. You may need as much as 256G of RAM for large DOORS Next Generation or Lifecycle Query Engine deployments, for example. Your database servers will usually benefit from additional RAM and processors, especially as your deployment grows. Also, if you deploy fewer servers (e.g. combining Rational Team Concert, Rational Quality Manager, and Jazz Team Server on one system), those servers will need to be larger. A 4 processor/8G server will struggle with 50 active users, especially when all of the CLM applications are running on one server. Keep this in mind if you are considering a departmental topology. You may be able to use systems with 4 processors/8G RAM for RTC workloads that involve only work items.
You will notice that the published test reports use hardware (cores and RAM) which is sized generously. A well-designed deployment should be able to handle peak loads and unexpected usage patterns. We customarily size environments so that they can accommodate bursts of up to double the expected normal load. However, we do not recommend assuming that an environment can handle such extreme loads for more than 5% of its standard operational time.
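As a rough illustration of that headroom rule, here is a minimal sketch; the load figures and operational window below are hypothetical placeholders, not recommendations:

```python
# Sizing with burst headroom: design for roughly double the expected steady
# load, and budget sustained bursts to at most ~5% of operational time.
# The numbers below are hypothetical placeholders.
expected_load_requests_per_sec = 15                      # expected normal load
design_capacity = expected_load_requests_per_sec * 2     # allow bursts up to 2x

operational_hours_per_week = 40                          # hypothetical operational window
burst_budget_hours = operational_hours_per_week * 0.05   # ~2 hours per week at extreme load

print(f"Design capacity: ~{design_capacity} requests/sec")
print(f"Sustained-burst budget: ~{burst_budget_hours:.0f} hours/week")
```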
The size of your repository can impact performance, and can influence the hardware requirements needed to deliver good performance. When we talk about repository size, we really mean two distinctly different things:
We recommend that you develop a model of how your users will interact with the system, so that you can have a rough idea of how the size of the deployment will evolve over time. When we do this for our own performance tests, we start by thinking about the user population, and we identify the primary roles which individuals take on as they do their jobs. We then define the operations which each role is likely to perform, and we estimate the frequency at which each operation will occur. From this you can then estimate how many artifacts are likely to be created over time.
For example, you may have a total user base of 1000 people. If half of those people are using Rational Team Concert, and if on average the RTC users create 5 work items per day, then the total number of work items created per day would be 1000 x 0.5 x 5 = 2,500. Over the course of a year (assuming a 5-day work week and 50 working weeks per year), that works out to 625,000 work items per year. You could further refine this model by thinking about the project structure and how those 625,000 work items would be distributed across RTC project areas. You could also think about how long your development iterations are, and then estimate how many work items are managed in a typical iteration, to gauge how big your development plans are likely to be and how much data is likely to surface in dashboard queries.
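A minimal sketch of that calculation, using the hypothetical numbers from the example above:

```python
# Estimate work item growth from a simple usage model (hypothetical numbers
# taken from the example above; substitute your own population and rates).
total_users = 1000              # total user base
rtc_fraction = 0.5              # fraction of users working in Rational Team Concert
work_items_per_user_per_day = 5

work_items_per_day = total_users * rtc_fraction * work_items_per_user_per_day  # 2,500
working_days_per_year = 5 * 50      # 5-day work week, 50 working weeks per year
work_items_per_year = work_items_per_day * working_days_per_year                # 625,000

print(f"{work_items_per_day:,.0f} work items/day, {work_items_per_year:,.0f} per year")
```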
The size and shape of your repository has an influence on several areas:
Both the full-text index and the triple-store use the operating system's file system cache to help optimize queries. The file system cache will keep the indices in memory if there is enough free RAM, but if other applications need RAM, the operating system will shrink the file system cache. You'll get the best performance out of a server with large indices if you have enough RAM to keep most of the indices in memory. As a rule of thumb, look at the size of the indices on disk and estimate the total amount of RAM required as: JVM max heap size + disk size of indices, plus an additional 4G for the overhead of operating system processes.
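Here is a minimal sketch of that rule of thumb; the index size below is a placeholder, so measure the actual size of your indices on disk:

```python
# Rough server RAM estimate for keeping the full-text and triple-store indices
# in the file system cache. The index size below is a placeholder value.
jvm_max_heap_gb = 8          # maximum JVM heap (-Xmx) for the CLM application
index_size_on_disk_gb = 12   # hypothetical combined size of the indices on disk
os_overhead_gb = 4           # allowance for operating system processes

estimated_server_ram_gb = jvm_max_heap_gb + index_size_on_disk_gb + os_overhead_gb
print(f"Estimated RAM needed: {estimated_server_ram_gb} GB")   # 24 GB in this example
```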
Some CLM applications (like the Lifecycle Query Engine) use memory-mapped I/O for the triple-store, instead of the file cache. But the dynamic is similar: you need enough RAM to keep the indices in memory in order to maintain performance.
Database servers also benefit from additional RAM as repositories grow. Most databases use memory to cache information (like DB2's buffer pools), so when you have more RAM, you have the ability to cache more data (which can then speed up SQL queries). Since a single database server is usually shared between the CLM application servers, the memory demands on the database server are a function of the combined load across all CLM application servers. If your CLM deployment will host millions of artifacts across multiple applications like RTC, RQM, and DNG, then you should deploy database servers with 32G or more of memory.
Tuning the JVM can be complex, but we have a few recommendations:
The final topic regarding JVM memory and repository size concerns native memory. An in-depth discussion of how Java uses native memory is available on developerWorks (see http://www.ibm.com/developerworks/library/j-nativememory-linux/). This applies to CLM because the networking support in the application servers uses direct memory buffers (a kind of native memory) to transfer data. Direct memory can be reclaimed by the Java garbage collector, but that reclamation is not very sophisticated. If a Java program asks for direct memory and there isn't enough, the allocation code forces garbage collection to run by calling System.gc. Unfortunately, this is an inefficient mechanism for reclaiming memory, and it can result in a long pause (the JVM stops other processing while doing garbage collection). It can also result in more frequent garbage collection cycles than would be required solely by the usage of heap memory.
One way to improve matters is to use a JVM flag to allocate enough direct memory so that the rate of System.gc calls is reduced (e.g. setting -XX:MaxDirectMemorySize=<size> to a value large enough for your workload).
Other references:
Although it is somewhat obvious, the way in which the user population interacts with a CLM deployment can impact performance. You should have some understanding of what people are going to be doing when considering how to size your servers. The main things to consider are:
As the number of active users goes up, the CPU and memory resources needed to handle the traffic from those users go up. Because our sizing estimates are focused on active users, it is worth exploring what we mean by that in more detail.
When we talk about users, we generally talk about registered or licensed users and a subset of concurrent or active users. Registered users are the users who are licensed and permitted to use a system. For example, a company of 1200 employees may have 800 registered users or even fewer, given that not everyone will need to use the CLM tools. Active (or concurrent) users are the users who are logged in and actually using the system at any given time. A company of 1200 employees may have 800 registered users, but only 200 logged in and actively working at any given time. We call those 200 users concurrent or active, regardless of how intensively they are working.
The number of concurrent users can often be estimated from the number of registered users. In an organization located on one continent or spanning just a few time zones, the number of concurrent users can be estimated to be between 20% and 25% of the total number of registered users. In an organization which spans the globe where registered users work in many different time zones, the number of concurrent users can be estimated to be between 10% and 15% of the total number of registered users.
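As a sketch of that estimate (the registered-user count below is a hypothetical example):

```python
# Estimate concurrent (active) users from registered users, using the
# percentage ranges above. The registered-user count is a hypothetical example.
registered_users = 800

# Organization on one continent or spanning just a few time zones: 20% to 25%
single_region = (int(registered_users * 0.20), int(registered_users * 0.25))  # (160, 200)

# Organization spanning many time zones around the globe: 10% to 15%
global_org = (int(registered_users * 0.10), int(registered_users * 0.15))     # (80, 120)

print(f"Single region: {single_region[0]}-{single_region[1]} concurrent users")
print(f"Global:        {global_org[0]}-{global_org[1]} concurrent users")
```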
The next thing to consider is how busy the active users really are. In our performance testing, we typically assume that each simulated user performs an operation once per minute on average. That may seem like a low rate, but it is an average that tries to account for the times when someone is doing other things (like taking a phone call, attending a meeting, or going to lunch). The activity of any single user is going to vary greatly, so what we try to do is to model the activity of the entire user population. If you assume that there will always be 200 active users, and each active user does something once per minute (60 operations per hour), then that works out to a load on the server of 3 operations per second. Since each operation (like creating a workitem) consists of several HTTP requests, that can be as much as 30 HTTP transactions per second sent to the server for that set of 200 active users.
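The arithmetic behind that example, as a sketch; the requests-per-operation figure is illustrative, chosen to be consistent with the 30-transactions-per-second estimate above:

```python
# Convert an active-user count into an approximate server request rate.
active_users = 200
operations_per_user_per_hour = 60   # one operation per minute, on average
http_requests_per_operation = 10    # illustrative; the real number varies by operation

operations_per_second = active_users * operations_per_user_per_hour / 3600  # ~3.3
http_requests_per_second = operations_per_second * http_requests_per_operation

# The article rounds this to roughly 3 operations/sec and ~30 HTTP transactions/sec.
print(f"~{operations_per_second:.1f} operations/sec, "
      f"~{http_requests_per_second:.0f} HTTP requests/sec")
```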
Trying to model your usage realistically is perhaps the most important step you can take toward sizing your servers properly. It is easy to overestimate the number of active users and how busy they are. Since heavier loads require more powerful hardware, you want to be sure that your usage estimates are not inflated (while still allowing for bursts of activity).
There are some simplifying assumptions you can make that make sizing a bit easier. These are reasonable first approximations which hold in most cases, although of course there are always exceptions.
The first assumption is that if you use the enterprise deployment topologies and place each application on its own server, then you don't need to worry about interactions between the applications. In other words, you can size your RTC server based on your RTC workload, and you can size your RQM server based on your RQM workload, and the total user load across the deployment is additive. You can have 250 active RTC users and 250 active RQM users (for a total of 500 users) as long as RTC and RQM are running on different servers. There is no significant coupling between workloads due to a shared Jazz Team Server or a shared database server. The performance bottlenecks usually form in the application servers.
A corollary is that if you are using a departmental deployment in which the applications are running on a single server, the workloads are coupled, so fewer users per application can be supported. If you could support 400 RTC users on a standalone server, you could only support 200 RTC users on a server hosting both RTC and RQM. In practice, a departmental topology often involves fewer total processors and less overall memory, so you need to lower your user expectations based on that as well. As a rule of thumb, keep the total number of active users for a departmental topology around 50 (and no more than 100).
The next simplifying assumption is that most operations are similar enough that you can treat them as identical. This means you can focus mostly on the number of users and how busy they are, and not worry too much about what exactly they are doing. There are some exceptions to this, which will be identified in the sizing sections.
That just about wraps up the discussion of the factors that influence performance. But before we get to the sizing estimations, there are a few last assumptions and caveats to call out:
Of all the Jazz products, Rational Team Concert (RTC) permits the widest variety of end-user and automated workflows. When estimating an RTC workload, it is important to consider what percentage of the anticipated usage will come from work item users, SCM users, and build agents, or any combination of the three. A description of the representative RTC workload we use in our sizing estimates can be found here.
Rational Team Concert (RTC) has three major workload types which can be used concurrently:
Our representative workloads combine these major workload types as follows:
Given this representative base hardware (configured as a hypervisor):

| Model | Processor | Processor speed |
|---|---|---|
| x3550 M4 | Intel Xeon E5-2670 v2 | 2.5 GHz |

And given a virtual machine (VM) of this size:

| Logical cores | RAM |
|---|---|
| 12 vCPUs | 16 GB or more |

We estimate that this workload will perform comfortably with approximately 400 to 600 concurrent work item users.
There are some exceptions which lower the capacity. These include:
Given this representative base hardware (configured as a hypervisor):

| Model | Processor | Processor speed |
|---|---|---|
| x3550 M4 | Intel Xeon E5-2670 v2 | 2.5 GHz |

And given a virtual machine (VM) of this size:

| Logical cores | RAM |
|---|---|
| 16 vCPUs | 32 GB or more |

We estimate that this workload will perform comfortably with approximately 200 to 300 concurrent work item users and 100 to 200 concurrent SCM users.
The SCM workload we tested places additional demands on the RTC server, so additional memory and processors are needed to maintain performance. Concurrent loading of large workspaces in particular can be expensive. In our testing, we reached 100% CPU utilization on an RTC server with 4 processors when initializing the workspaces for 100 test users (a transfer rate of nearly 3,000,000 files per hour). The response times of work item operations were temporarily degraded during the initialization period. So, we recommend that you use additional processors when combining SCM and work item operations.
Given this representative base hardware (configured as a hypervisor):

| Model | Processor | Processor speed |
|---|---|---|
| x3550 M4 | Intel Xeon E5-2670 v2 | 2.5 GHz |

And given a virtual machine (VM) of this size:

| Logical cores | RAM |
|---|---|
| 24 vCPUs | 48 GB or more |

We estimate that this workload will perform comfortably with approximately 200 to 300 concurrent work item users, 100 to 200 concurrent SCM users, and 100 to 200 concurrent build agents (presuming all clients, servers, and build agents are at the same version level, and that a content caching proxy server is placed between RTC SCM and the build servers).
RDNG will perform best on the Linux platform, with DB2 as the database; there are known performance issues for deployments on Windows (for virtual systems) and on Oracle. Use fast storage (like SSD drives) if you expect to have more than 500,000 artifacts. Repository size and user load are the main factors that impact RDNG performance.
Note that artifacts should be spread out across multiple project areas, with no more than 200,000 artifacts per project area.
RDNG 6.0.2 improved performance by adopting memory-mapped I/O for accessing index data on the DNG server. The maximum user load therefore increased in 6.0.2. However, memory-mapped I/O did not reliably improve performance on virtualized Windows platforms - if you are sizing a Windows 6.0.2 virtual server, please use the 6.0.1 sizing guidance.
Use the following chart as a rough guide to RDNG server sizing for Linux/DB2 deployments.
As a general rule, you should expect that a single DNG server can handle up to 500 concurrent users. Our ongoing scalability tests use up to 1 million artifacts, but we project that 2 million artifacts is the practical limit. To support 500 users and 2 million artifacts, you will need to deploy a large server (24 logical processors, 128G or more of RAM, as well as a fast disk system like an SSD). Also keep in mind that each project area should have no more than 200,000 artifacts.
If you have fewer artifacts, you can reduce the hardware requirements. Smaller repositories will need less memory. Smaller user loads will need fewer processors. However, our minimum recommendation would be a system no smaller than 16G of RAM and 8 logical processors - but that configuration should not go past 100 concurrent users and 500,000 total artifacts.
As the number of artifacts grows beyond 500,000, you will need to add more RAM and use SSD drives. If you want to minimize the risk of performance problems with RDNG, use SSD drives from the start. As the number of artifacts grows past 1 million, consider adding an additional RDNG server and use that for new projects.
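The thresholds above can be summarized as a rough lookup. The sketch below simply encodes the guidance in this section for 6.0.2+ Linux/DB2 deployments; it is a planning aid, not a substitute for the sizing chart:

```python
# Rough RDNG (6.0.2+, Linux/DB2) sizing lookup, encoding the guidance above.
# Treat the output as a starting point for planning, not a guarantee.
def rdng_sizing(concurrent_users: int, total_artifacts: int) -> str:
    if concurrent_users > 500 or total_artifacts > 2_000_000:
        return ("Beyond the practical limits of a single RDNG server; "
                "plan for an additional RDNG server for new projects.")
    if concurrent_users <= 100 and total_artifacts <= 500_000:
        return "Minimum configuration: 8 logical processors and 16G of RAM."
    if total_artifacts > 1_000_000:
        return ("Large server: 24 logical processors, 128G or more of RAM, SSD storage; "
                "consider an additional RDNG server for new projects.")
    return "Add RAM beyond 16G and use SSD storage as artifacts grow past 500,000."

# Example: 300 concurrent users against 1.5 million artifacts
# (remember to keep each project area under 200,000 artifacts)
print(rdng_sizing(300, 1_500_000))
```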
The 6.0.3 release introduced components, which allow for data to be partitioned within a project area. However, components have limited impact on scalability. You are still limited to 200,000 artifacts per project area.
Use the following chart as a rough guide to RDNG server sizing for 6.0.1 Linux/DB2 deployments:
As a general rule, you should expect that a single DNG server can handle up to 400 concurrent users. Our ongoing scalability tests use up to 1 million artifacts, but we project that 2 million artifacts is the practical limit. To support 300-400 users and 2 million artifacts, you will need to deploy a large server (24 logical processors, 128G or more of RAM, as well as a fast disk system like an SSD). If you have fewer artifacts, you can reduce the hardware requirements. Smaller repositories will need less memory. Smaller user loads will need fewer processors. However, our minimum recommendation would be a system no smaller than 16G of RAM and 8 logical processors - but that configuration should not go past 100 concurrent users and 500,000 total artifacts.
As the number of artifacts grows beyond 500,000, you will need to add more RAM and use SSD drives.
A description of the representative RRC workload we use in our sizing estimates can be found here.
More information on sizing and tuning can be found here:
A description of the representative RQM workload we use in our sizing estimates can be found here.
Given this representative base hardware (configured as a hypervisor):

| Model | Processor | Processor speed |
|---|---|---|
| x3550 M4 | Intel Xeon E5-2670 v2 | 2.5 GHz |

And given a virtual machine (VM) of this size:

| Logical cores | RAM |
|---|---|
| 8 vCPUs | 16 GB or more |

We estimate that this workload will perform comfortably with approximately 350 to 500 concurrent RQM users.
The material below appeared in the original CLM Sizing strategy document. I have kept it here for reference, but we have not quantified the impact of most of these factors through testing, so it is difficult to act on this information.
The performance and behavior of a CLM or SSE deployment can be influenced by many factors or dimensions. These dimensions fall into two categories: application and product related (functional dimensions) and environment or topology related (non-functional dimensions). CLM and SSE’s flexibility and performance can be undermined by unintentional functional and non-functional design choices.
It is easy to design an ideal topology which presumes zero-second latency, unlimited bandwidth and ever-increasing storage capacity, but such capabilities are impossible in real deployments where there are known and possibly unexpected constraints and limits. It is important to be mindful of the gamut of possible constraints for the various dimensions of a CLM and SSE environment.
The following dimensions are the areas in which deployment-specific factors may degrade performance. This list is by no means comprehensive; there may be other factors not listed here, and a given factor may have a different effect in different environments.

- Product specific
- Network
- Storage
- Browser clients
- Desktop clients
- Build agents
- Application server
- Database server
- License server
- Authentication server
- Load balancer
- Business continuity
- Users
- Organizational Maturity
A sizing estimate attempts to forecast the number of users that a prescribed hardware configuration may comfortably support, at a maximum response time, a defined interaction rate, and a typical data shape. A sizing document is often created before a product is deployed in order to estimate the anticipated hardware needs for the deployment. Sometimes the sizing document's most immediate goal is to specify hardware for allocation or procurement. Very often the sizing document specifies a maximum or largest anticipated target number of users simply so that the corresponding hardware allocation will not be exceeded and the anticipated hardware needs to be requested only once.
Every CLM sizing document is an estimate: The CLM products are flexible and can be deployed to meet customers’ needs; customer hardware and deployment environments are unique for each customer; customer workload and habits will be unique and can be expected to vary across the products’ deployment lifetime.
A sizing estimate represents a best-guess assessment at a given point in time. As an environment is used and grows, it is natural that the assumptions made to support the sizing estimate may change. After deploying the environment, we strongly recommend monitoring it and re-evaluating the assumptions and conclusions made during the planning phase as part of the deployment's maintenance strategy. It is typical for an environment to grow and mature differently than the sizing estimate anticipated. A standard maintenance and monitoring plan should include revisiting the sizing estimate and adjusting and tuning your deployment appropriately at key milestones, if not yearly at a minimum.