r11 - 2021-12-02 - 17:39:57 - VaughnRokoszYou are here: TWiki >  Deployment Web > DeploymentTroubleshooting > PerformanceTroubleshooting > Db2MustGather

Must-gather information for diagnosing long-running SQL statements on DB2todo.png

Authors: WilliamChatham, HongyanHuo, MadanKumarShanmugam, VaughnRokosz
Build basis: DOORS Next 7.x, ETM, EWM 6.x, 7.x

In order to diagnose a long-running SQL statement on DB2 we need to gather the following:

  • A Db2mon report.
  • The explain plan of any long running queries found in the report.
  • If possible, the explain plan with row actuals for any long-running queries
  • db2support files.

Please be aware that there is some time-sensitivity to collecting the data discussed here. You should gather the data as close to the time that the problem occurs. The detailed information for a SQL statement may not be available in the monitor or in the cursor cache. So you may have to gather this as you recreate the issue.

There is a more general must gather for performance issues for DOORS Next which should be used in conjunction with this guide.

Db2Mon report

The db2mon script is a created on top of the SQL monitoring interfaces provided by DB2 LUW It monitors database activity for a set amount of time ( default = 30 seconds ) and provides various metrics in a detailed report section as the output. The monitor report provides several valuable bits of information, including: *Database throughput *Analysis of processing time *Analysis of time spent waiting by the database *Top SQL statements by execution time

db2mon can be found from db2mon script for monitoring performance (ibm.com).

Running Db2Mon (Linux)

Db2mon needs to be run using the db2 administrator:

  • 1 cd to the directory where the db2mon is installed.
  • 2 Run "db2 connect to DB_Name" where DB_Name is the required database.
  • 3 Run "db2 flush package cache dynamic" To clear the cache.
  • 4 Run "./db2mon.sh DB_Name >> report.out".
  • 5 Repeat /db2mon.sh DB_Name >> report.out" command 3 to 5 times in a row, so that we'll have a collection of a few datapoints gathered together.

Running Db2Mon (Windows)

Download db2mon script tar file for your db2 version, say 'DB2 v11.1' and extract on the system where you will collect.

  • 1 cd to the directory where the db2mon is installed (where 'db2mon.sql' is located).
  • 2 Create a new file 'db2mon.cmd' with the following code.
set DBNAME=%1
db2 -v connect to %DBNAME%
db2 -v create bufferpool db2monbp
db2 -v create user temporary tablespace db2montmptbsp bufferpool db2monbp
db2 +c -tvf db2mon.sql
db2 -v commit work
db2 -v connect reset
db2 -v connect to %DBNAME%
db2 -v drop tablespace db2montmptbsp
db2 -v drop bufferpool db2monbp
db2 -v connect reset
 
  • 3 Open a CMD window and cd to the directory where 'db2mon.cmd' exists
  • 4 Run the following command to set the DB2 command line (DB2CLP) variable: db2cmd -i -w db2clpsetcp
  • 5 Type echo %DB2CLP% to confirm the DB2CLP variable has been set correctly, This should now show: DB20FADE
  • 6 Run the db2mon command: db2mon.cmd > report.out Example: (C:\dbmon\v111>db2mon.cmd RM702 > report.out)
  • 7 In report.out search for "REPORT STARTS HERE" this is where the report starts.

Analysing the report output

In the report.out there are 2 sections to check

  • Currently executing SQL at start of capture/Currently executing SQL at end of capture
  • Top SQL statements by execution time

Currently executing SQL at start of capture / Currently executing SQL at end of capture

Please note that there can be multiple queries captured in this section. Check the "ELAPSED_TIME_SEC", if the time is longer than a few seconds it indicates a possible query issue. The example below shows a query running for 28 seconds and has not yet completed:

================================================================
 Currently executing SQL at end of capture (non-zero metrics only)
================================================================

with mon as (select ts, count, coord_member,application_handle,uow_id,activity_id,elapsed_time_sec,total_cpu_time,rows_read,direct_reads,direct_writes,executable_id,package_name,section_number,active_sorts,active_sorts_top,active_sort_consumers,active_sort_consumers_top,sort_shrheap_allocated,sort_shrheap_top,post_threshold_sorts,post_shrthreshold_sorts,post_threshold_hash_joins,post_shrthreshold_hash_joins,post_threshold_hash_grpbys,post_threshold_olap_funcs,total_act_time,total_act_wait_time,lock_wait_time,pool_read_time,direct_read_time,direct_write_time,fcm_recv_wait_time,fcm_send_wait_time,total_extended_latch_wait_time,log_disk_wait_time,cf_wait_time,reclaim_wait_time,spacemappage_reclaim_wait_time,stmt_text from mon_current_sql_plus_end ) select t.metric "Metric", t.value "Value" from mon, table( values ('COORD_MEMBER',varchar(coord_member)), ('STMT_TEXT', cast(substr(stmt_text,1,120) as varchar(120))), ('EXECUTABLE_ID', 'x'''||hex(executable_id)||''''), ('PACKAGE_NAME', package_name || '    (Section '||cast(section_number as varchar(10))||')'), ('Partition count', varchar(count)), ('APPLICATION_HANDLE',varchar(application_handle) || '  (UOW_ID '||cast(uow_id as varchar(8))||', ACTIVITY_ID '||cast(activity_id as varchar(8))||')' ), ('ELAPSED_TIME_SEC',varchar(elapsed_time_sec)), ('TOTAL_CPU_TIME',varchar(total_cpu_time)), ('ROWS_READ',varchar(rows_read)), ('DIRECT_READS',varchar(direct_reads)), ('DIRECT_WRITES',varchar(direct_writes)), ('ACTIVE_SORTS',varchar(active_sorts)), ('ACTIVE_SORTS_TOP',varchar(active_sorts_top)), ('ACTIVE_SORT_CONSUMERS',varchar(active_sort_consumers)), ('ACTIVE_SORT_CONSUMERS_TOP',varchar(active_sort_consumers_top)), ('SORT_SHRHEAP_ALLOCATED',varchar(sort_shrheap_allocated)), ('SORT_SHRHEAP_TOP',varchar(sort_shrheap_top)), ('POST_THRESHOLD_SORTS',varchar(post_threshold_sorts)), ('POST_SHRTHRESHOLD_SORTS',varchar(post_shrthreshold_sorts)), ('POST_THRESHOLD_HASH_JOINS',varchar(post_threshold_hash_joins)), ('POST_SHRTHRESHOLD_HASH_JOINS',varchar(post_shrthreshold_hash_joins)), ('POST_THRESHOLD_HASH_GRPBYS',varchar(post_threshold_hash_grpbys)), ('POST_THRESHOLD_OLAP_FUNCS',varchar(post_threshold_olap_funcs)), ('Pct wait times', 'Total: '      || cast(case when total_act_time > 0 then smallint( (total_act_wait_time / double(total_act_time)) * 100 ) else 0 end as varchar(5)) || '   Lock: '    || cast(case when total_act_time > 0 then smallint( (lock_wait_time / double(total_act_time)) * 100 ) else 0 end as varchar(5)) || '   Pool rd: ' || cast(case when total_act_time > 0 then smallint( (pool_read_time / double(total_act_time)) * 100 ) else 0 end as varchar(5)) || '   Dir IO: '  || cast(case when total_act_time > 0 then smallint( ((direct_read_time+direct_write_time) / double(total_act_time)) * 100 ) else 0 end as varchar(5)) || '   FCM: '     || cast(case when total_act_time > 0 then smallint( ((fcm_recv_wait_time+fcm_send_wait_time) / double(total_act_time)) * 100 ) else 0 end as varchar(5)) || '   Latch: '   || cast(case when total_act_time > 0 then smallint( (total_extended_latch_wait_time / double(total_act_time)) * 100 ) else 0 end as varchar(5)) || '   Log: '     || cast(case when total_act_time > 0 then smallint( (log_disk_wait_time / double(total_act_time)) * 100 ) else 0 end as varchar(5)) || '   CF: '      || cast(case when total_act_time > 0 then smallint( (cf_wait_time / double(total_act_time)) * 100 ) else 0 end as varchar(5)) || '   Reclaim: ' || cast(case when total_act_time > 0 then smallint( ((reclaim_wait_time+spacemappage_reclaim_wait_time) / double(total_act_time)) * 100 ) else 0 end as varchar(5)) ), (cast(repeat('-',32) as varchar(32)),cast(repeat('-',120) as varchar(120))) ) as t(metric,value) where t.metric = 'COORD_MEMBER' or t.value <> '0' order by elapsed_time_sec desc with UR

Metric                           Value                                                                                                                                                   
-------------------------------- --------------------------------------------------------------------------------------------------------------------------------------------------------
COORD_MEMBER                     0                                                                                                                                                       
STMT_TEXT                        with ctx(ctx_uuid) as ( select CTX_SUB.CONTEXT_UUID from REPOSITORY.CONTEXT_SUBCONTEXTS CTX_SUB union select CTX_MEMB2.C                                
EXECUTABLE_ID                    x'0100000000000000250800000000000000000000020020210403101135761558'                                                                                     
PACKAGE_NAME                     SYSSH200    (Section 1)                                                                                                                                 
Partition count                  1                                                                                                                                                       
APPLICATION_HANDLE               2505  (UOW_ID 6249, ACTIVITY_ID 1)                                                                                                                      
ELAPSED_TIME_SEC                 28                                                                                                                                                      
TOTAL_CPU_TIME                   2320751                                                                                                                                                 
ROWS_READ                        868292                                                                                                                                                  
ACTIVE_SORTS                     1                                                                                                                                                       
ACTIVE_SORTS_TOP                 1                                                                                                                                                       
ACTIVE_SORT_CONSUMERS            5                                                                                                                                                       
ACTIVE_SORT_CONSUMERS_TOP        5                                                                                                                                                       
SORT_SHRHEAP_ALLOCATED           295                                                                                                                                                     
SORT_SHRHEAP_TOP                 1424                                                                                                                                                    
Pct wait times                   Total: 5   Lock: 0   Pool rd: 0   Dir IO: 0   FCM: 0   Latch: 0   Log: 0   CF: 0   Reclaim: 0     

Keep note of the EXECUTABLE_ID "x'0100000000000000250800000000000000000000020020210403101135761558'" as this can be used to obtain the explain plan.

Top SQL statements by execution time

This section covers queries that finished executing and are stored in the package cache. The list is sorted by "COORD_STMT_EXEC_TIME", but we are actually interested in the "AVG_COORD_EXEC_TIME" (milliseconds)

======================================
Top SQL statements by execution time
======================================
select member,  count(*) num_stmts, integer(num_exec_with_metrics) total_exec, sum(coord_stmt_exec_time,

MEMBER  NUM_STMTS   TOTAL_EXEC  COORD_STMT_EXEC_TIME AVG_COORD_EXEC_TIME PCT_COORD_STMT_EXEC_TIME TOTAL_CPU_TIME 
-------- -------------------- ------------------- -------------- ------------ ------------- ----------------------------------------------
0                 1                      4                                         1729                             432.25                                       29.89                       
0                 1                      1                                         2526                           2526.00                                       50.18                        
0                 1                      5                                         1832                             366.40                                       31.04                        

So find the largest "AVG_COORD_EXEC_TIME" 2526.00 in the example above and then navigate to the section “Statement & plan identifiers - top statements by execution time”, this is again sorted in the same order as before so you can find and take note of the EXECUTABLE_ID to gather the explain plan.

Explain plan

An execution plan shows the detailed steps necessary to execute a SQL statement. The steps are expressed as a set of database operators that produce and consume rows.

Get the explain plan by using the EXECUTABLE_ID (do NOT leave out the character x) as gathered from the db2mon report and add it to the command below:

               db2 "CALL EXPLAIN_FROM_SECTION(x'0100000000000000250800000000000000000000020020210403101135761558', 'M', NULL, 0, NULL, ?, ?, ?, ?, ?)" 

Followed by:

               db2exfmt -d <DB_Name> -1 -o /tmp/Explain_test.txt

We can also obtain the bind variables from the queries by running the following command

               db2 "export to output.txt of del messages test select XMLPARSE(DOCUMENT max_coord_stmt_exec_time_args) FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL,x'0100000000000000250800000000000000000000020020210403101135761558',NULL,-1))"

This generates an xml file called output.txt.001.xml which contains the set of parameters used in the query.

Automating the db2mon data gathering

If the DB2 server is installed on Linux, instead of manually running db2mon and collecting EXPLAIN PLANs, an alternative way is provided to automate the whole process. We have a shell script db2mon_auto_v2.sh, which can be found at the bottom of the article, that wraps around db2mon and does post processing. The script will run db2mon multiple times, then generates the EXPLAIN PLANs as well as binding parameter input files for the hotspot queries captured.

Prerequisites

  1. copy the wrapper script db2mon_auto_v2.sh to the same directory that contains db2mon.sh, user to execute the script must have read and write permission to this directory
  2. db2 EXPLAIN tables must be created. if not yet, run the following commands as db2 administrator
                db2 connect to <dbname>
                db2 "CALL SYSPROC.SYSINSTALLOBJECTS('EXPLAIN', 'C', CAST (NULL AS VARCHAR(128)), CAST (NULL AS VARCHAR(128)))"
          
  3. test db2mon.sh and make sure it runs properly alone: ./db2mon.sh > report.out. Look for errors in report.out and fix them prior to running the wrapper script

Usage

Issue command as db2 administrator:
 
./db2mon_auto_v2.sh <dbname> <number_of_db2monCalls>
For example, to run db2mon.sh 5 times against database RMDB:
./db2mon_auto_v2.sh RMDB 5
where number_of_db2monCalls is an integer; we recommend to use a value between 3 and 10

Output files

The script finishes in a few minutes depending on the number_of_db2monCalls specified in the command line. The output files will be generated under a newly created directory called report_db2mon%datetime% . A list of hotspot queries with their EXECUTABLE_IDs and stats will be written to the end of the report file.

db2support files

db2support is a built in DB2 tool that collects all important DB2 configuration information as well as certain system catalogs relevant to the query optimizer. To run the tool, use the following command as db2 administrator

db2support -d <dbname> -cl 0
db2support will compress all information collected in a single zip file.

Getting explain plans with row actuals

The most useful bit of diagnostic information is an explain plan that shows the actual row counts processed by each step. Unfortunately, db2mon_auto only generates explain plans that include row estimates. You need to run through some additional steps to generate an explain plan that includes row actuals. Here's an overview of the process:

  1. Create EXPLAIN tables if you have not already done so.
  2. Identify any slow SQL statements of interest, and create a stand-alone text file for each statement that is compatible with db2batch. You may be able to get SQL statements from product log files. Otherwise, you'll need to use the db2mon_auto output.
  3. Enable the actuals_mon DB2 event monitor.
  4. Run the SQL statement using DB2 batch and generate the explain plan
  5. Optional: Disable the event monitor and drop diagnostic tables.

Creating db2batch-compatible files for slow SQL statements

db2batch is a benchmarking tool that allows you to run SQL statements from a flat file and then collect performance information about the execution.

In order to generate explain plans with row actuals, you'll need a structure similar to the one below:

CALL SYSPROC.WLM_SET_CLIENT_INFO(NULL, NULL, NULL, 'actuals', NULL);
--#BGBLK 1
...PARAM statements that declare values of bind parameters
…query with params …;
--#EOBLK

The call to WLM_SET_CLIENT_INFO creates the association between the event monitor for actuals and the SQL statement to be executed. The SQL statement itself is placed between --#BGBLK 1 and --#EOBLK tags. If the SQL statement contains bind parameters (e.g. question marks in the SQL), then you must define values for each of the bind parameters using --#PARAM directives (these are placed after the #BGBLK directive but before the SQL statement). You can get more information about the format expected by db2batch by referring to the DB2 documentation.

Most SQL statements issued from the ELM products use bind parameters, and it can be tricky to extract out the values of the bind parameters in order to create a db2batch-compatible file. There are a couple of ways to do this.

First, for DOORS Next Generation, you can check the product log files (rm.log). The view service will log slow SQL automatically if a statement runs for longer than a specified threshold (configurable in rm/admin Advanced Properties). The log file will contain the SQL (with PARAM statements for any bind variable as well as the #BGBLK and #EOBLK directorives). You can just copy the relevant section of the log into a file and then add in the WLM_SECT_CLIENT_INFO call at the top. Look for occurences of #BGBLK in rm.log:

ViewThresh.PNG

.....
GenSQL.PNG

You can also enable diagnostic logging in DOORS Next Generation by editing conf/rm/log4j2.xml and adding in the following line:

<Logger name="com.ibm.rdm.fronting.server.rrs.views.execution.sql.internal.run.DirectSqlExecutor" level="DEBUG"/>

The DEBUG level logging will log SQL generated by DNG in db2batch-compatible form, so you can just copy the relevant sections out of rm.log and paste them into a stand-alone file.

Note that the logging applies only to SQL generated by DNG. It will not log slow SQL that is generated by Jazz Foundation or other applications. In that case, you'll need to get the bind parameters from the db2mon_auto output. db2mon_auto generates explain plans without row actuals, but you can extract the SQL statement from the .expln file. You can also get the values of the bind parameters for the longest execution from the XML file generated for the SQL statement.

db2monBinds.PNG

Each bind parameter will have a value in the XML file, similar to this:

<stmt_value_data>_ldLMIFMoEeyE0fLvzO5lDQ</stmt_value_data>

For each entry, you need to manually at a #PARAM directive to your SQL. This can be tedious, so you may want to use a simple script to extract values and generate PARAM statements.

Enabling the actuals_mon event monitor

Run the following commands to set up the DB2 event monitoring for gathering row actuals:

db2 connect to <RMDB>
db2 update database configuration using section_actuals base
db2 create event monitor actuals_mon for activities write to table
db2 set event monitor actuals_mon state 1
db2 "create workload actuals current client_acctng('actuals') collect activity data with details,section"
db2 "grant usage on workload actuals to public"

Running the SQL statement using DB2 batch and generating explain plans

Run the following commands to execute the SQL via db2batch and then generate an explain plan.

To run the SQL:

db2batch -d <RMDB> -f <query>.sql -i complete -o r 0 p 5 -r <query_exe_with_actuals>.out3

Optionally, disable the event monitor (if you aren't planning to generate more explain plans):

db2 set event monitor actuals_mon state 0

Next, get the application id for the statement that was just executed:

db2 "SELECT a.time_completed, Substr(appl_name, 1, 20)   appl_name, Substr(a.appl_id, 1, 28)   appl_id, a.uow_id, a.activity_id, Length(a.section_actuals)  act_len, Substr(s.stmt_text, 1, 50) stmt FROM   activity_actuals_mon a,  activitystmt_actuals_mon s WHERE  a.appl_id = s.appl_id  AND a.uow_id = s.uow_id  AND a.activity_id = s.activity_id"

Next, generate the explain plan. Use the values of ‘*LOCAL.db2inst1.xxxxx’ (don’t leave out the * as a valid character), ‘2’, and ‘1’, from the output of the previous step.

db2 "CALL EXPLAIN_FROM_ACTIVITY( '*LOCAL.db2inst1.210308171806', 3, 1, 'ACTUALS_MON', '', ?, ?, ?, ?, ? )" 

Finally, generate the explain plan. This should include row actuals.

db2exfmt -d <RMDB> -1 -o actual.expln

To verify that you have the row actuals, look in the .expln file and check for the "Rows Actual" entry:

    Rows 
    Rows Actual
    RETURN
    (   1)
    Cost 
    I/O 

Optional: Disabling the event monitor and cleaning up

After you are done generating explain plans, you can optionally disable the event monitor and clean up:

db2 ALTER workload actuals disable; 
db2 DROP workload actuals;
db2 DROP TABLE ACTIVITYMETRICS_ACTUALS_MON;
db2 DROP TABLE ACTIVITYSTMT_ACTUALS_MON;
db2 DROP TABLE ACTIVITYVALS_ACTUALS_MON;
db2 DROP TABLE ACTIVITY_ACTUALS_MON;
db2 DROP TABLE CONTROL_ACTUALS_MON;
db2 drop event monitor actuals_mon;

Other useful Db2 commands:

To gather DB2 parameters and their current values use the following command :

               db2 get database configuration for <dbname>

To monitor Tablespaces, Transaction logs, No of reads/update/writes, Memory used and heap size the Command below will run 12 times (Every 5 minutes), so will generate 1 hour of monitoring.

               db2pd -d <dbname> -tablespaces -logs -tcbstat-transactions -dyn-rep 12 300 > db2pd.mon.out

db2support - Problem analysis and environment collection tool command

               db2support . -d <dbname> -s

Note: These next commands require the actual SQL of the long running query top run:

The Db2batch command captures the actual timings and other metrics as the query you are testing is running:

               db2batch -d dbname -f file.sql > file_sql.out

The Db2 Design Advisor is a tool that can help you significantly improve your workload performance. It identifies all of the objects that are needed to improve the performance of your workload. You can either run the command by passing in the query or by using a File containing the SQL:

                db2advis -d dbname -n schemaname -s "sql_statement" > db2advis.out
or
                db2advis -d dbname -n schemaname -i file.sql > db2advis.out

The db2caem command can be used to execute a poorly performing query and gather formatted explain output that includes section actuals showing the actual number of rows processed by each operator in the access plan.

                db2caem -d dbname -st "sql text goes here"


References:

Related topics: Performance sizing guides and datasheets, Disk benchmarking

External links:

Additional contributors: ShubjitNaik, MadanKumar

Topic attachments
I Attachment Action Size Date Who Comment
Pngpng GenSQL.PNG manage 100.8 K 2021-12-02 - 17:34 VaughnRokosz  
Pngpng ViewThresh.PNG manage 5.8 K 2021-12-02 - 17:34 VaughnRokosz  
Pngpng db2monBinds.PNG manage 129.5 K 2021-12-02 - 17:34 VaughnRokosz  
Shsh db2mon_auto_v2.sh manage 5.6 K 2021-06-15 - 15:49 UnknownUser  
Edit | Attach | Printable | Raw View | Backlinks: Web, All Webs | History: r11 < r10 < r9 < r8 < r7 | More topic actions
 
This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Contributions are governed by our Terms of Use. Please read the following disclaimer.