r60 - 2017-04-07 - 09:34:15 - TimFeeney

Known Expensive Scenarios

Authors: TimFeeney
Build basis: The Rational solution for Collaborative Lifecycle Management (CLM) and the Rational solution for systems and software engineering (SSE) v6.0.3.

This page aims to capture user and system scenarios across the ALM portfolio that can potentially drive relatively higher load on a Jazz application. Such scenarios can lead to server debt (such as out-of-memory errors) if run during peak times on systems that don't have sufficient spare resources available. These will be qualified or quantified to make them easier to understand. Where possible, best practices are provided that can minimize or avoid the issue altogether.

This list starts with the assumption that the applications are being run in a topology and on servers that are sized and tuned following our recommendations.

Consider:

  1. User scenarios known to ALWAYS have high demands. These tend to be computationally expensive and/or consume large amounts of data or memory. They can lead to system slowdown and, on resource-constrained servers, have been known to bring down environments. Examples: large BIRT reports, builds, high-volume imports.
  2. User scenarios known to SOMETIMES have high demands. Their resource consumption or computation demands tend to be more reasonable/manageable. With appropriate system resources, system configuration guidance or usage best practices, their impact could be mitigated or avoided. Examples: Plan loading, populating a dashboard.
  3. System scenarios with potential high impact. Examples: ETL jobs, backup, online migration.

The following table summarizes the known expensive scenarios. For each scenario, the table includes a link to a description of the scenario, a unique ID (name) to be used by the applications when starting/stopping these scenarios and for log correlation and a link to known best practices. Note that any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.

Table 1: Summary of Known Expensive Scenarios and related best practices
| Product | Scenario | Scenario ID | Best Practice |
| Rational DOORS Next Generation | Enabling Suspect traceability | DNG_Suspect | link |
| Rational DOORS Next Generation | Running RPE/RRDG reports with large result set | DNG_Report | link |
| Rational DOORS Next Generation | Importing large number of requirements | DNG_Import | link |
| Rational DOORS Next Generation | Exporting large number of requirements | DNG_Export | link |
| Rational DOORS Next Generation | Using view query with large result set | DNG_Query | link |
| Rational DOORS Next Generation | Running DNG ETL jobs to populate data warehouse | DNG_ETL | link |
| Rational Team Concert | Comparing a repository workspace to a stream with which it is extremely out-of-date | RTC_Compare_Workspace | link |
| Rational Team Concert | Annotating an extremely large text file | RTC_Annotate_File | link |
| Rational Team Concert | Importing Microsoft Project plan with large number of tasks | RTC_MSP_Import | link |
| Rational Team Concert | Exporting large number of work items to a Microsoft Project plan | RTC_MSP_Export | link |
| Rational Team Concert | Adding many build result contributions to a build result | RTC_Add_Build_Contribution | link |
| Rational Team Concert | Loading a large plan | RTC_Load_Plan | link |
| Rational Quality Manager | Duplicating test plans with a large test hierarchy | RQM_Duplicate_Test_Plan | link |
| Rational Quality Manager | Bulk archiving and/or deletion of test results | RQM_Bulk_ArchiveDelete | link |
| Jazz Reporting Services (all reporting technologies) | Running BIRT reports based on live data | JRS_BIRT | link |
| Jazz Reporting Services (all reporting technologies) | Running DCC jobs which require high storage and processing power | JRS_DCC | link |
| Jazz Reporting Services (all reporting technologies) | Performing LQE index maintenance | JRS_LQE_Maintenance | link |
| Jazz Reporting Services (all reporting technologies) | Executing high-volume and highly complex queries | JRS_LQE_Query | link |
| Jazz Reporting Services (all reporting technologies) | Refreshing a data source in Report Builder | JRS_Refresh_Data_Source | link |

Table 2: Throttling and Logging by Scenario
| Scenario ID | Throttling[1] | Advanced Logging[2] |
| DNG_Suspect | NA | NA |
| DNG_Report | RDMPublishReportsRunners (3) | Report name, Project Area, Component, Global/Local Configuration, Module/Requirement Set |
| DNG_Import | ReqIF.importThreadPoolSize (1) | ReqIF filename/path, Project Area, Component, Local Configuration |
| DNG_Export | ReqIF limited by # cores on DNG server | ReqIF Definition, Project Area, Component, Global/Local Configuration |
| DNG_Query | query.client.timeout (30 seconds), SPARQL Query abort timeout (5 min), query load management | Query (RQL or Module/View/Filter), Project Area, Component, Global/Local Configuration |
| DNG_ETL | NA | NA |
| RTC_Compare_Workspace | Maximum limit of SCM workspace compare scenarios (0) | Workspace, Stream, Project Area |
| RTC_Annotate_File | 1GB file | Filename/path, file size, workspace, Project Area |
| RTC_MSP_Import | NA | Filename/path, Project Area, Plan name |
| RTC_MSP_Export | NA | Project Area, Plan |
| RTC_Add_Build_Contribution | NA | Build engine, Build definition, Build result label, contribution added, Project Area |
| RTC_Load_Plan | delayed child loading | Plan name, Plan view, Project Area |
| RQM_Duplicate_Test_Plan | NA | Plan name, 'include links' setting, Source/Target Project area, Component, Global/Local configuration |
| RQM_Bulk_ArchiveDelete | NA | Project Area, Component, Configuration, Query context (i.e. description of artifacts to archive/delete) |
| JRS_BIRT | NA | Report name, Report runtime settings (as applicable) Project area, Component, Configuration |
| JRS_DCC | job schedule | NA |
| JRS_LQE_Maintenance | NA | NA |
| JRS_LQE_Query | LQE query limits | Sparql Query executed (from LQE Admin UI), Report Builder Report name (if contains advanced query) |
| JRS_Refresh_Data_Source | metamodel.autorefresh.time (6am), metamodel.autorefresh.repeat.inminutes (720) | NA |

[1] Built-in properties or characteristics that limit the scenario (defaults shown in parentheses); see scenario details for further details.
[2] When advanced logging is enabled for a scenario, this information will be included.

Monitoring Expensive Scenarios

Starting in v6.0.3, the start and stop of the DNG, RTC and RQM expensive scenarios are captured in their respective application logs. If advanced logging is turned on (managed by the new Serviceability tab for each application), then additional information about the scenario occurrence is logged.

Further, new Java MBeans track the occurrence of the scenarios and can be captured by enterprise monitoring tools for further analysis and trending.

Rational DOORS Next Generation

Enabling Suspect traceability

Link validity and suspect links are capabilities used to monitor related information for updates. When dependencies change (such as a customer requirement) we need a way of marking related information as "suspect" so that it can be reviewed and updated as appropriate.

Suspect links and link validity offer very similar functionality but under very different circumstances. Suspect links describe related information within a single context whereas link validity is used when Configuration Management functions are being used.

Because of the approach used for local indexing of suspect links, enabling this functionality adds resource overhead on the server, which you should consider before deploying it. Its performance is a function of the number of artifacts, the number of concurrent users and the rate of change activity. Often it is turned on without understanding what it does, how it will be used or the cost of doing so.

Best Practices for Enabling Suspect traceability

Regular use in production during normal hours should be limited to small deployments, to times when the server is lightly loaded, or to cases where it is necessary; otherwise, you risk driving load on the server.

Once suspect tracking is enabled, a full reindex occurs up front (see Suspicion Indexing). When you enable suspect traceability (in a project that is not enabled for configuration management), an index of change information for all link types, artifact types, and attributes is automatically built. It should not be turned off then on again as that will cause another full reindex. Instead, pause the indexing (via the Suspicion Profile Settings for Requirements Management page).

By default, the index is automatically refreshed with new changes every 60 seconds, but you can change that setting. If you lower the default refresh setting, a greater load is placed on the server. For the refresh setting, use a number that is 30 seconds or higher.

This does not apply to projects with configuration management enabled, in which case link validity, not suspicion profiles, provides the capability to note changes in requirements.

Running RPE/RRDG reports with large result set

Generation of poorly constructed RPE/RRDG reports that include a result set of over 5K requirements can be slow. Concurrent generation of PDF reports for modules including more than 5K requirements should be limited to generating only one PDF at a time. DNG has the advanced property "RDMPublishReportsRunners", which defaults to 3 and will constrain the number of RPE/RRDG reports that can be generated concurrently.

Best Practices for Running RPE/RRDG reports with large result set

Be sure to test reports in advance with near production equivalent data. Where possible, define the report with JRS. Also see the general reporting best practices.

Importing large number of requirements

Importing can be computationally expensive. At the end of an import, when all indexing occurs, the import can block other user activity. Imports of 10K requirements or fewer should be fine. DNG currently limits how many ReqIF imports occur at once through the ReqIF.importThreadPoolSize advanced property (defaults to 1). If the ReqIF import thread limit is reached, you can still submit additional import requests. A status message is displayed: "Waiting for the completion of {N} previously submitted ReqIF Import Tasks."; you can close the import dialog to let the import continue in the background.
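The queueing behavior described above can be illustrated with a fixed-size worker pool. This is a minimal sketch, not DNG's implementation: `run_imports` and `fake_import` are hypothetical names, and only the pool size of 1 comes from the documented default of ReqIF.importThreadPoolSize. Requests submitted while the single worker is busy wait in a queue and run one after another.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

# Mirrors the documented default of ReqIF.importThreadPoolSize.
IMPORT_THREAD_POOL_SIZE = 1


def run_imports(import_names, pool_size=IMPORT_THREAD_POOL_SIZE):
    """Run hypothetical import tasks; return (peak concurrency, completion order)."""
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}
    completed = []

    def fake_import(name):
        # Stand-in for a real import; it only tracks how many tasks overlap.
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        with lock:
            state["running"] -= 1
            completed.append(name)
        return name

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # All requests are accepted immediately; extras wait in the pool's queue.
        futures = [pool.submit(fake_import, name) for name in import_names]
        for future in futures:
            future.result()
    return state["peak"], completed


peak, order = run_imports(["a.reqif", "b.reqif", "c.reqif"])
print(peak, order)
```

With a pool size of 1, no two imports overlap and they complete in submission order, which is why a second user sees the "Waiting for the completion of ..." message rather than an error.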

Best Practices for Importing large number of requirements

For large imports (10K requirements or more), import during off hours or when the system is lightly loaded.

Exporting large number of requirements

Export is less of a problem with CSV/ReqIF exports. Note that the number of concurrent ReqIF exports is limited to the number of virtual cores on the DNG server. Similar to the ReqIF import, when the export limit is reached, the user gets a status message and can let the export continue in the background. Word/PDF exports can be a problem and should be limited to one at a time when exporting more than 10K requirements.

Best Practices for Exporting large number of requirements

For large exports to Word/PDF (10K requirements or more), export during off hours.

Using view query with large result set

When you browse artifacts and modules in DNG, you do so in the context of a view, which uses a query (based on filter settings) to populate its contents. Large view queries resulting in 10K or more requirements can be expensive, especially across multiple folders or when filtered on strings, dates or links. Traceability queries are the more expensive patterns. DNG has the Advanced Property 'query.client.timeout' to limit the run time of view queries; it defaults to 30 seconds. There is a second property, "SPARQL Query abort timeout (in ms)", which defaults to 5 minutes and limits most queries not limited by 'query.client.timeout' (e.g., loading folder structure). Some query scenarios not governed by these timeouts include TRS, Suspect indexing (when the suspect data is deleted upon "untracking" or rebuilding the index), building the type system, and recent feeds (e.g. comments, requirements). As of 6.0.3, DNG includes a query load management mechanism to proactively manage and monitor the load resulting from view queries.

Running DNG ETL jobs to populate data warehouse

This is more of a system-initiated operation and is a function of repository size and amount of change. Generally, for repository sizes >100K-200K it is best to run the ETL jobs off hours. Alternatively, use DCC to populate the data warehouse. This doesn't apply to configuration management enabled projects, which populate the LQE database instead of the data warehouse.

Best Practices for Running DNG ETL jobs to populate data warehouse

For large repository sizes, containing >100K-200K artifacts, it is best to run the ETL jobs off hours.

Rational Team Concert

Comparing a repository workspace to a stream with which it is extremely out-of-date

This could cause server issues if a large number of these types of comparisons happen concurrently. As of 5.0.1, a server property may be set that limits the number of comparisons that can happen at the same time (Maximum limit of SCM workspace compare scenarios). This value defaults to 0 which allows unlimited compares. When limited, the compare operations are queued until other compares finish. The user will see a slower compare or, in the extreme case where the thread waits too long, the compare will fail after 15 minutes (not configurable).
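The limiting behavior described above (0 means unlimited; otherwise compares queue for a free slot and fail if they wait too long) can be sketched with a counting semaphore. This is only an illustration under stated assumptions: `CompareLimiter` is a hypothetical class, not the RTC server's code, and only the default of 0 and the 15-minute wait come from the text.

```python
import threading


class CompareLimiter:
    """Hypothetical limiter: at most `max_concurrent` compares run at once.

    A limit of 0 means unlimited, matching the documented default.
    """

    def __init__(self, max_concurrent=0, wait_timeout_s=15 * 60):
        self._slots = (
            threading.BoundedSemaphore(max_concurrent) if max_concurrent > 0 else None
        )
        self._wait_timeout_s = wait_timeout_s

    def run(self, compare_fn):
        if self._slots is None:
            # Unlimited: run immediately.
            return compare_fn()
        # Queue for a free slot; give up if the wait exceeds the timeout.
        if not self._slots.acquire(timeout=self._wait_timeout_s):
            raise TimeoutError("compare waited too long for a free slot")
        try:
            return compare_fn()
        finally:
            self._slots.release()


limiter = CompareLimiter(max_concurrent=30)
print(limiter.run(lambda: "compare finished"))
```

The design trade-off is the one the text describes: a limit protects the server from a burst of expensive compares, at the cost of slower (or, in the extreme, failed) compares for users who have to wait.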

Best Practices for Comparing a repository workspace to a stream with which it is extremely out-of-date

When a workspace compare is performed, the service IFileSystemService#compareWorkspace will be called. Occurrences of these calls will appear in the active services page of the CCM application. Should there be a large number of calls appearing at once, you may want to limit the number of workspace compares by setting the "Maximum limit of SCM workspace compare scenarios" advanced property to something on the order of 30.

Annotating an extremely large text file

This is a point in time limitation in 6.0.2 that is expected to be addressed in a future release. See OutOfMemoryError when trying to annotate a 2G file. The expected default of 1MB may be adjusted as further testing is performed. The issue has been observed when annotating files with size in the multi-gigabyte range. Since the annotate operation puts all contents, file, history, etc, in memory, large files, especially with large history, can require significant memory.

Importing Microsoft Project plan with large number of tasks

How long an import takes depends on the quantity of items in the plan and their nested structure. For example, an import from an MS Project file containing 2000 tasks could take up to 30 minutes on the first import and 8-10 minutes on subsequent imports, depending on server configuration and load. Consider also the memory demands of an import, which takes approximately 100KB for each task being imported, over and above the memory needed for typical RTC operation. In most cases, import of Microsoft Project plans is an infrequent scenario, generally performed at the start of a project. However, if imports are to be a frequent occurrence, be sure that the server memory allocation has ample spare capacity. Note the numbers provided are based on testing in a non-production environment.
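As a rough worked example of the memory figure above, the following sketch multiplies the task count by the approximate 100KB per task. The function name is hypothetical, and the result is only an estimate of the extra memory on top of normal RTC operation, based on the non-production numbers quoted in the text.

```python
# Approximate per-task memory cost quoted in the text (non-production testing).
KB_PER_TASK = 100


def estimate_import_memory_mb(task_count, kb_per_task=KB_PER_TASK):
    """Estimate the additional server memory (in MB) needed for one import."""
    if task_count < 0:
        raise ValueError("task_count must not be negative")
    return task_count * kb_per_task / 1024


for tasks in (1000, 2000, 10000):
    print(tasks, "tasks ->", round(estimate_import_memory_mb(tasks), 1), "MB")
```

For the 2000-task example in the text, this gives a little under 200MB of additional memory for a single import; concurrent imports would add up.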

Best Practices for Importing Microsoft Project plan with large number of tasks

If your MS Project file contains more than 1000 tasks, we recommend you import or export during off-hours or when the server is lightly loaded.

Exporting large number of work items to a Microsoft Project plan

Similar to an import, export time and load are dependent on the size and complexity of the plan. The impact is primarily to memory on the server.

Adding many build result contributions to a build result

When a large number of contributions (compilation contributions, JUnit tests, downloads, log files, links, work item references, etc.) are included on a build result, the server can spend a lot of time marshalling and unmarshalling the persisted contributions when adding or deleting contributions, because of the way they are stored (a single data structure). At best this is a slow-running operation. However, should there be a large number of concurrent builds performing similar work (adding many build result contributions), the potential for impact to the server increases.

Best Practices for Adding many build result contributions to a build result

Keep build result contributors to a minimum. If you are using an external build tool (e.g. Build Forge or Jenkins) that is integrated with RTC, keep the overlap between build results in both tools to a minimum; where there is overlap, consider storing the results only in the external build tool. Large contributions should be published as links, not as the actual content. For example, download files greater than 10MB should be published using the 'artifactLinkPublisher' task rather than 'artifactFilePublisher'. See Publishing build results and contributions for a full list of tasks, some of which have a 'link' vs 'content' version.
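The size rule above can be expressed as a small helper that a build script might use to decide which publishing task to call. This is a minimal sketch: the function is hypothetical, the two task names and the 10MB threshold are taken from the text, and 1MB is assumed to mean 1024 * 1024 bytes.

```python
# Threshold from the text; 1MB is assumed to be 1024 * 1024 bytes.
LINK_THRESHOLD_BYTES = 10 * 1024 * 1024


def choose_publisher_task(file_size_bytes, threshold=LINK_THRESHOLD_BYTES):
    """Return the name of the publishing task suggested for a download file."""
    if file_size_bytes > threshold:
        # Large files: publish a link so the content is not stored on the build result.
        return "artifactLinkPublisher"
    return "artifactFilePublisher"


print(choose_publisher_task(2 * 1024 * 1024))
print(choose_publisher_task(250 * 1024 * 1024))
```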

Loading a large plan

The RTC plan editor provides users with great flexibility to display plans in customized and flexible configurations. In order to provide rapid display of custom plan configurations, the RTC planning editor must fetch all the details of each work item when loading plans. Consequently, when the scope of a plan includes a large number of work items, loading of such plans can drive server load. We have greatly improved plan loading performance with each release by deferring the loading of out placed "child" work items or by allowing users to turn on and configure server side plan filtering to avoid loading work items that will never be displayed in plans.

Rational Quality Manager

Duplicating test plans with a large test hierarchy.

The impact of duplicating a test hierarchy depends on the number of items and their size, and whether you choose to copy referenced artifacts. A test plan might include multiple child test plans, each with their own test cases and test scripts, resulting in potentially thousands of artifacts, each of which could reference a large amount of content storage. While you can count the number of objects ahead of time, you cannot determine the overall memory size of the selected hierarchy. Because the duplication occurs in a single transaction, it can require a high amount of memory to complete.

See Copying huge Test Plan brings server down with Out of Memory errors

Best Practices for Duplicating test plans with a large test hierarchy.

A best practice is to not do a deep copy and only copy references to test cases, test scripts, etc. Should a deep copy be needed, break down the overall hierarchy duplication into smaller subsets. If that is not possible, it is best to perform the operation when the system is more lightly loaded and/or increase the available system memory.

An even better best practice is to move away from duplication altogether in support of reuse by clone and own. Instead, transition to use of Configuration management in the QM application.

Bulk archiving and/or deletion of test results

If you select more than 10000 artifacts to archive or delete in bulk, the operation can take a long time and might time out. See Web browser-based artifact deletes greater than around 10,000 items will fail to execute.

Best Practices for Bulk archiving and/or deletion of test results

Either select a page of results at a time or ensure that fewer than 10000 artifacts are included in the result set; that is, work with smaller sets of assets.
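If the artifacts to archive or delete are driven from a script or a list of IDs, the "smaller sets" advice amounts to splitting the work into batches that stay below the 10000-artifact mark. A minimal sketch follows; the helper name and the default batch size of 5000 are assumptions chosen for illustration, and the actual archive or delete call is not shown.

```python
# Size at which the text says bulk operations become unreliable.
BULK_LIMIT = 10000


def batches(artifact_ids, batch_size=5000):
    """Yield successive lists of IDs, each smaller than BULK_LIMIT."""
    if not 0 < batch_size < BULK_LIMIT:
        raise ValueError("batch_size must be between 1 and BULK_LIMIT - 1")
    ids = list(artifact_ids)
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


for batch in batches(range(12000)):
    # Placeholder: archive or delete this batch before moving to the next one.
    print(len(batch))
```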

Jazz Reporting Services (all reporting technologies)

Running BIRT reports based on live data

Reports on live data run more slowly than those using the Data Warehouse or LQE, which are optimized for reporting. In addition, custom BIRT reports can be inefficient in their construction or pull large volumes of data, increasing load. Each of the applications has an advanced server property, "Maximum Record Count", that limits the number of rows a report can fetch. Any report passing that limit will fail. The default is -1, which leaves the report unconstrained. The setting should not be used as a solution to badly behaved reports. Rather, it is a way to discover badly behaved reports, because they fail to render when they go past the limit.

Running DCC jobs which require high storage and processing power

Most DCC jobs can run at regular intervals, obtaining a delta from the previous run. However, a few DCC jobs involve a larger amount of data and place higher demands on storage and processing power on the DCC server. Because DCC shares the same data warehouse as the applications, load from these storage- and processing-intensive jobs could also affect the applications. These jobs are:
  • Activity Fact Details (Activity History)
  • Build Fact Details (Build History)
  • File Fact Details (File History)
  • Project Management Fact Details (Project Management History)
  • Quality Management Fact Details (Quality Management History)
  • Requirement Management Fact Details (Requirement Management History)
  • Request Management Fact Details (Request Management History)
  • Task Fact Details (Task History)
  • Jazz Foundation Services - Statistics

Best Practices for Running DCC jobs which require high storage and processing power

Schedule the identified jobs to run during off-hours or when server load is light. (Note: Job names listed are based on v6.0.2; where different, the names for earlier releases are in parentheses.)

Performing LQE index maintenance

Backup, compaction, re-index and addition or removal of data sources can drive load on LQE.

Best Practices for Performing LQE index maintenance

Schedule these scenarios during off-hours or when server usage is light.

Executing high-volume and highly complex queries

High-volume and highly complex queries can put a heavy load on the data source. As indicated above, ensure your reports return only what is necessary for the report consumer.

Report Builder provides an Advanced section where users can edit the queries generated by Report Builder, or write custom SQL (DW) or SPARQL (LQE) queries for a report. Inexperienced users can easily write and run an inefficient or incorrect query that could cause the data source to become unresponsive.

For LQE, you can set properties in LQE's Query Service, such as the result limit (default is 3000 results) and query timeout (default is 60 seconds). LQE limits SPARQL queries based on these settings. (Note that this does not apply to SPARQL queries against metadata.)

Best Practices for Executing high-volume and highly complex queries

LQE offers a simple Query interface in its Administration UI. You can use this interface to run sample queries to discover information and improve your queries.

You can also copy SPARQL from the Advanced section in Report Builder into this interface and make small changes to debug issues. This UI can target different scopes or configurations, or all data. Be aware that queries run from this UI still impact the data source, and are subject to the LQE Query Service limits, as described above. Access to LQE data sources can be restricted.

For more information on improving LQE performance, see Monitoring and managing the performance of Lifecycle Query Engine and Improving Lifecycle Query Engine performance.

Refreshing a data source in Report Builder

Upon initiating a refresh of a data source from the Report Builder admin UI, the data source is queried for the latest metadata. This can increase demand on both LQE and RB servers and impact the performance of other running reports. This is especially true for LQE data sources, where most of the metadata must be queried (most of the data warehouse metadata is hard-coded in Report Builder, which is not possible for LQE). Many factors affect the length of a refresh: the number of project areas, the complexity of their data model (for example, a large number of enumeration values), and the amount of change.

If a refresh is in progress, when a user accesses the Report Builder UI, a message displays and the user must wait until the refresh completes.

To better understand the data flow in/out of LQE, including refresh of data sources, reading from and populating the indices, see LQE Data Flows. Note that the metadata refreshes would be included in the TRS feeds from the applications.

Best Practices for Refreshing a data source in Report Builder

Report Builder refreshes the data sources when the server starts. It also runs a background job twice daily to automatically refresh all data sources. Configure the refresh to run at times of lighter load for your organization, using the following properties in the server/conf/rs/app.properties file:
  • metamodel.autorefresh.time=6\:00AM
  • metamodel.autorefresh.repeat.inminutes=720
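To see which times of day a given pair of values produces, the following sketch expands a start time and a repeat interval into a daily schedule. It assumes the repeat interval is counted from the configured start time and wraps around a 24-hour day; that interpretation is an assumption, not confirmed Report Builder behavior, and the function name is hypothetical. The escaped colon (`\:`) is how the value appears in the properties file.

```python
from datetime import datetime, timedelta


def refresh_times(start_time, repeat_minutes):
    """Return the refresh times within one day as HH:MM strings (24-hour clock)."""
    if repeat_minutes <= 0:
        raise ValueError("repeat_minutes must be positive")
    # Undo the properties-file escaping of ':' before parsing, e.g. "6\:00AM".
    start = datetime.strptime(start_time.replace("\\:", ":"), "%I:%M%p")
    end = start + timedelta(days=1)
    times = []
    current = start
    while current < end:
        times.append(current.strftime("%H:%M"))
        current += timedelta(minutes=repeat_minutes)
    return times


print(refresh_times("6\\:00AM", 720))
print(refresh_times("10\\:00PM", 1440))
```

Under this assumption, the default values shown above correspond to refreshes at 06:00 and 18:00, which matches the "twice daily" description; moving the start time shifts both refreshes together.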

Administrators can refresh individual data sources on demand; be aware of the potential impact and refresh when it is less likely to affect other report users.

