~~EditAttach~~Printable

r10 - 2021-09-07 - 09:58:52 - PaulEllisYou are here: TWiki >

Deployment Web > DeploymentTroubleshooting > PerformanceTroubleshooting > Db2MustGather

Must-gather information for diagnosing long-running SQL statements on DB2

Authors: WilliamChatham, HongyanHuo, MadanKumarShanmugam
Build basis: DOORS Next 7.x, ETM, EWM 6.x, 7.x

Page contents

In order to diagnose a long-running SQL statement on DB2 we need to gather the following:

A Db2mon report.
The explain plan of any long running queries found in the report.
db2support files.

Please be aware that there is some time-sensitivity to collecting the data discussed here. You should gather the data as close to the time that the problem occurs. The detailed information for a SQL statement may not be available in the monitor or in the cursor cache. So you may have to gather this as you recreate the issue.

There is a more general must gather for performance issues for DOORS Next which should be used in conjunction with this guide.

Db2Mon report

The db2mon script is a created on top of the SQL monitoring interfaces provided by DB2 LUW It monitors database activity for a set amount of time ( default = 30 seconds ) and provides various metrics in a detailed report section as the output. The monitor report provides several valuable bits of information, including: *Database throughput *Analysis of processing time *Analysis of time spent waiting by the database *Top SQL statements by execution time

db2mon can be found from db2mon script for monitoring performance (ibm.com).

Running Db2Mon (Linux)

Db2mon needs to be run using the db2 administrator:

1 cd to the directory where the db2mon is installed.
2 Run "db2 connect to DB_Name" where DB_Name is the required database.
3 Run "db2 flush package cache dynamic" To clear the cache.
4 Run "./db2mon.sh DB_Name >> report.out".
5 Repeat /db2mon.sh DB_Name >> report.out" command 3 to 5 times in a row, so that we'll have a collection of a few datapoints gathered together.

Running Db2Mon (Windows)

Download db2mon script tar file for your db2 version, say 'DB2 v11.1' and extract on the system where you will collect.

1 cd to the directory where the db2mon is installed (where 'db2mon.sql' is located).
2 Create a new file 'db2mon.cmd' with the following code.

set DBNAME=%1
db2 -v connect to %DBNAME%
db2 -v create bufferpool db2monbp
db2 -v create user temporary tablespace db2montmptbsp bufferpool db2monbp
db2 +c -tvf db2mon.sql
db2 -v commit work
db2 -v connect reset
db2 -v connect to %DBNAME%
db2 -v drop tablespace db2montmptbsp
db2 -v drop bufferpool db2monbp
db2 -v connect reset

3 Open a CMD window and cd to the directory where 'db2mon.cmd' exists
4 Run the following command to set the DB2 command line (DB2CLP) variable: db2cmd -i -w db2clpsetcp
5 Type echo %DB2CLP% to confirm the DB2CLP variable has been set correctly, This should now show: DB20FADE
6 Run the db2mon command: db2mon.cmd > report.out Example: (C:\dbmon\v111>db2mon.cmd RM702 > report.out)
7 In report.out search for "REPORT STARTS HERE" this is where the report starts.

Analysing the report output

In the report.out there are 2 sections to check

Currently executing SQL at start of capture/Currently executing SQL at end of capture
Top SQL statements by execution time

Currently executing SQL at start of capture / Currently executing SQL at end of capture

Please note that there can be multiple queries captured in this section. Check the "ELAPSED_TIME_SEC", if the time is longer than a few seconds it indicates a possible query issue. The example below shows a query running for 28 seconds and has not yet completed:

================================================================
 Currently executing SQL at end of capture (non-zero metrics only)
================================================================

with mon as (select ts, count, coord_member,application_handle,uow_id,activity_id,elapsed_time_sec,total_cpu_time,rows_read,direct_reads,direct_writes,executable_id,package_name,section_number,active_sorts,active_sorts_top,active_sort_consumers,active_sort_consumers_top,sort_shrheap_allocated,sort_shrheap_top,post_threshold_sorts,post_shrthreshold_sorts,post_threshold_hash_joins,post_shrthreshold_hash_joins,post_threshold_hash_grpbys,post_threshold_olap_funcs,total_act_time,total_act_wait_time,lock_wait_time,pool_read_time,direct_read_time,direct_write_time,fcm_recv_wait_time,fcm_send_wait_time,total_extended_latch_wait_time,log_disk_wait_time,cf_wait_time,reclaim_wait_time,spacemappage_reclaim_wait_time,stmt_text from mon_current_sql_plus_end ) select t.metric "Metric", t.value "Value" from mon, table( values ('COORD_MEMBER',varchar(coord_member)), ('STMT_TEXT', cast(substr(stmt_text,1,120) as varchar(120))), ('EXECUTABLE_ID', 'x'''||hex(executable_id)||''''), ('PACKAGE_NAME', package_name || '    (Section '||cast(section_number as varchar(10))||')'), ('Partition count', varchar(count)), ('APPLICATION_HANDLE',varchar(application_handle) || '  (UOW_ID '||cast(uow_id as varchar(8))||', ACTIVITY_ID '||cast(activity_id as varchar(8))||')' ), ('ELAPSED_TIME_SEC',varchar(elapsed_time_sec)), ('TOTAL_CPU_TIME',varchar(total_cpu_time)), ('ROWS_READ',varchar(rows_read)), ('DIRECT_READS',varchar(direct_reads)), ('DIRECT_WRITES',varchar(direct_writes)), ('ACTIVE_SORTS',varchar(active_sorts)), ('ACTIVE_SORTS_TOP',varchar(active_sorts_top)), ('ACTIVE_SORT_CONSUMERS',varchar(active_sort_consumers)), ('ACTIVE_SORT_CONSUMERS_TOP',varchar(active_sort_consumers_top)), ('SORT_SHRHEAP_ALLOCATED',varchar(sort_shrheap_allocated)), ('SORT_SHRHEAP_TOP',varchar(sort_shrheap_top)), ('POST_THRESHOLD_SORTS',varchar(post_threshold_sorts)), ('POST_SHRTHRESHOLD_SORTS',varchar(post_shrthreshold_sorts)), ('POST_THRESHOLD_HASH_JOINS',varchar(post_threshold_hash_joins)), ('POST_SHRTHRESHOLD_HASH_JOINS',varchar(post_shrthreshold_hash_joins)), ('POST_THRESHOLD_HASH_GRPBYS',varchar(post_threshold_hash_grpbys)), ('POST_THRESHOLD_OLAP_FUNCS',varchar(post_threshold_olap_funcs)), ('Pct wait times', 'Total: '      || cast(case when total_act_time > 0 then smallint( (total_act_wait_time / double(total_act_time)) * 100 ) else 0 end as varchar(5)) || '   Lock: '    || cast(case when total_act_time > 0 then smallint( (lock_wait_time / double(total_act_time)) * 100 ) else 0 end as varchar(5)) || '   Pool rd: ' || cast(case when total_act_time > 0 then smallint( (pool_read_time / double(total_act_time)) * 100 ) else 0 end as varchar(5)) || '   Dir IO: '  || cast(case when total_act_time > 0 then smallint( ((direct_read_time+direct_write_time) / double(total_act_time)) * 100 ) else 0 end as varchar(5)) || '   FCM: '     || cast(case when total_act_time > 0 then smallint( ((fcm_recv_wait_time+fcm_send_wait_time) / double(total_act_time)) * 100 ) else 0 end as varchar(5)) || '   Latch: '   || cast(case when total_act_time > 0 then smallint( (total_extended_latch_wait_time / double(total_act_time)) * 100 ) else 0 end as varchar(5)) || '   Log: '     || cast(case when total_act_time > 0 then smallint( (log_disk_wait_time / double(total_act_time)) * 100 ) else 0 end as varchar(5)) || '   CF: '      || cast(case when total_act_time > 0 then smallint( (cf_wait_time / double(total_act_time)) * 100 ) else 0 end as varchar(5)) || '   Reclaim: ' || cast(case when total_act_time > 0 then smallint( ((reclaim_wait_time+spacemappage_reclaim_wait_time) / double(total_act_time)) * 100 ) else 0 end as varchar(5)) ), (cast(repeat('-',32) as varchar(32)),cast(repeat('-',120) as varchar(120))) ) as t(metric,value) where t.metric = 'COORD_MEMBER' or t.value <> '0' order by elapsed_time_sec desc with UR

Metric                           Value                                                                                                                                                   
-------------------------------- --------------------------------------------------------------------------------------------------------------------------------------------------------
COORD_MEMBER                     0                                                                                                                                                       
STMT_TEXT                        with ctx(ctx_uuid) as ( select CTX_SUB.CONTEXT_UUID from REPOSITORY.CONTEXT_SUBCONTEXTS CTX_SUB union select CTX_MEMB2.C                                
EXECUTABLE_ID                    x'0100000000000000250800000000000000000000020020210403101135761558'                                                                                     
PACKAGE_NAME                     SYSSH200    (Section 1)                                                                                                                                 
Partition count                  1                                                                                                                                                       
APPLICATION_HANDLE               2505  (UOW_ID 6249, ACTIVITY_ID 1)                                                                                                                      
ELAPSED_TIME_SEC                 28                                                                                                                                                      
TOTAL_CPU_TIME                   2320751                                                                                                                                                 
ROWS_READ                        868292                                                                                                                                                  
ACTIVE_SORTS                     1                                                                                                                                                       
ACTIVE_SORTS_TOP                 1                                                                                                                                                       
ACTIVE_SORT_CONSUMERS            5                                                                                                                                                       
ACTIVE_SORT_CONSUMERS_TOP        5                                                                                                                                                       
SORT_SHRHEAP_ALLOCATED           295                                                                                                                                                     
SORT_SHRHEAP_TOP                 1424                                                                                                                                                    
Pct wait times                   Total: 5   Lock: 0   Pool rd: 0   Dir IO: 0   FCM: 0   Latch: 0   Log: 0   CF: 0   Reclaim: 0

Keep note of the EXECUTABLE_ID "x'0100000000000000250800000000000000000000020020210403101135761558'" as this can be used to obtain the explain plan.

Top SQL statements by execution time

This section covers queries that finished executing and are stored in the package cache. The list is sorted by "COORD_STMT_EXEC_TIME", but we are actually interested in the "AVG_COORD_EXEC_TIME" (milliseconds)

======================================
Top SQL statements by execution time
======================================
select member,  count(*) num_stmts, integer(num_exec_with_metrics) total_exec, sum(coord_stmt_exec_time,

MEMBER  NUM_STMTS   TOTAL_EXEC  COORD_STMT_EXEC_TIME AVG_COORD_EXEC_TIME PCT_COORD_STMT_EXEC_TIME TOTAL_CPU_TIME 
-------- -------------------- ------------------- -------------- ------------ ------------- ----------------------------------------------
0                 1                      4                                         1729                             432.25                                       29.89                       
0                 1                      1                                         2526                           2526.00                                       50.18                        
0                 1                      5                                         1832                             366.40                                       31.04

So find the largest "AVG_COORD_EXEC_TIME" 2526.00 in the example above and then navigate to the section “Statement & plan identifiers - top statements by execution time”, this is again sorted in the same order as before so you can find and take note of the EXECUTABLE_ID to gather the explain plan.

Explain plan

An execution plan shows the detailed steps necessary to execute a SQL statement. The steps are expressed as a set of database operators that produce and consume rows.

Get the explain plan by using the EXECUTABLE_ID (do NOT leave out the character x) as gathered from the db2mon report and add it to the command below:

               db2 "CALL EXPLAIN_FROM_SECTION(x'0100000000000000250800000000000000000000020020210403101135761558', 'M', NULL, 0, NULL, ?, ?, ?, ?, ?)"

Followed by:

               db2exfmt -d <DB_Name> -1 -o /tmp/Explain_test.txt

We can also obtain the bind variables from the queries by running the following command

               db2 "export to output.txt of del messages test select XMLPARSE(DOCUMENT max_coord_stmt_exec_time_args) FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL,x'0100000000000000250800000000000000000000020020210403101135761558',NULL,-1))"

This generates an xml file called output.txt.001.xml which contains the set of parameters used in the query.

Automating the db2mon data gathering

If the DB2 server is installed on Linux, instead of manually running db2mon and collecting EXPLAIN PLANs, an alternative way is provided to automate the whole process. We have a shell script db2mon_auto_v2.sh, which can be found at the bottom of the article, that wraps around db2mon and does post processing. The script will run db2mon multiple times, then generates the EXPLAIN PLANs as well as binding parameter input files for the hotspot queries captured.

Prerequisites

copy the wrapper script db2mon_auto_v2.sh to the same directory that contains db2mon.sh, user to execute the script must have read and write permission to this directory

db2 EXPLAIN tables must be created. if not yet, run the following commands as db2 administrator

            db2 connect to <dbname>
            db2 "CALL SYSPROC.SYSINSTALLOBJECTS('EXPLAIN', 'C', CAST (NULL AS VARCHAR(128)), CAST (NULL AS VARCHAR(128)))"

test db2mon.sh and make sure it runs properly alone: ./db2mon.sh > report.out. Look for errors in report.out and fix them prior to running the wrapper script

Usage

Issue command as db2 administrator:

 
./db2mon_auto_v2.sh <dbname> <number_of_db2monCalls>

For example, to run db2mon.sh 5 times against database RMDB:

./db2mon_auto_v2.sh RMDB 5

where number_of_db2monCalls is an integer; we recommend to use a value between 3 and 10

Output files

The script finishes in a few minutes depending on the number_of_db2monCalls specified in the command line. The output files will be generated under a newly created directory called report_db2mon%datetime% . A list of hotspot queries with their EXECUTABLE_IDs and stats will be written to the end of the report file.

db2support files

db2support is a built in DB2 tool that collects all important DB2 configuration information as well as certain system catalogs relevant to the query optimizer. To run the tool, use the following command as db2 administrator

db2support -d <dbname> -cl 0

db2support will compress all information collected in a single zip file.

Other useful Db2 commands:

To gather DB2 parameters and their current values use the following command :

               db2 get database configuration for <dbname>

To monitor Tablespaces, Transaction logs, No of reads/update/writes, Memory used and heap size the Command below will run 12 times (Every 5 minutes), so will generate 1 hour of monitoring.

               db2pd -d <dbname> -tablespaces -logs -tcbstat-transactions -dyn-rep 12 300 > db2pd.mon.out

db2support - Problem analysis and environment collection tool command

               db2support . -d <dbname> -s

Note: These next commands require the actual SQL of the long running query top run:

The Db2batch command captures the actual timings and other metrics as the query you are testing is running:

               db2batch -d dbname -f file.sql > file_sql.out

The Db2 Design Advisor is a tool that can help you significantly improve your workload performance. It identifies all of the objects that are needed to improve the performance of your workload. You can either run the command by passing in the query or by using a File containing the SQL:

                db2advis -d dbname -n schemaname -s "sql_statement" > db2advis.out

                db2advis -d dbname -n schemaname -i file.sql > db2advis.out

The db2caem command can be used to execute a poorly performing query and gather formatted explain output that includes section actuals showing the actual number of rows processed by each operator in the access plan.

                db2caem -d dbname -st "sql text goes here"

References:

External links:

Additional contributors: ShubjitNaik, MadanKumar

db2mon_auto_v2.sh: db2mon_auto_v2.sh

Attachments

Topic attachments
I	Attachment	Action	Size	Date	Who	Comment
sh	db2mon_auto_v2.sh	manage	5.6 K	2021-06-15 - 15:49	UnknownUser

Deployment

Community information and contribution guidelines

Status icon key:

To do
Under construction
New
Updated
Constant change
None - stable page

Smaller versions of status icons for inline text:

Copyright © by IBM and non-IBM contributing authors. All material on this collaboration platform is the property of the contributing authors.
Contributions are governed by our Terms of Use. Please read the following disclaimer.
Dashboards and work items are no longer publicly available, so some links may be invalid. We now provide similar information through other means. Learn more here.