DXL Optimization

So I recently created a script that goes through a DOORS document and checks certain objects for valid links in another document.  If a valid link is not found, we print out some identifying information to a log file.  The document that is being searched through currently has just over 13k objects in it.

 

At first my script would take ~8.5 minutes to execute.  I found a script on this forum from another user that gave me some stats on which functions were taking the most time to execute, so I made some slight modifications and got the run time down to ~7 minutes.  I'd like to get that down even lower if possible, but not having a great understanding of the underlying workings of DXL, I'm not positive what changes would cause any / the greatest effect towards improving runtime. 

 

So my question is two fold.

  1. What things in general will slow down / speed up a DXL script?  Where should one start when looking to optimize?
  2. What things specifically about my script could be causing it to take so long?

Here's a copy of the script I'm running.  I've also attached a copy in case its easier to digest that way.

 

///////////////////////////////////////////////////////////////////////////////
//
// Neil Ruggiero
// 6/6/2018
//
///////////////////////////////////////////////////////////////////////////////


/////////////////////////////////// GLOBALS //////////////////////////////////
pragma runLim, 0 // Set timeout; enforced by DOORS. 0 = no run limit

//The following should be the only variables you need to change to adapt this
//script for your specific needs
//*********************************CHANGE ME*******************************************************************************
string TargetModuleName =  //Document you're checking links from
string TargetModulePath =  //Where this document lives in DOORS structure
string WritePath =  //Where you want your link outputs
string LogPath =  //Where you want your logging outputs
string OriginDestination =  //Desired CSCI for message origin / destination
string TargetApplicability = //Desired _Mast Applicability in STD
int StartSection = 6 //Which section of IDD messages start appearing.
//******************************** CHANGE ME*******************************************************************************

Stream OutputLog = write(LogPath)
Stream Output = write(WritePath)
Module CurrentModule = current Module
Regexp delimeter = regexp2 "([0-9]+)" //searching on all digits

string examineLetter
string Spare = "Spare" //Filters out unused cells
string MessageName
int CurrentSection = 0
int NonDigitRemoved
int stringLength

//Object number build strings
string TargetObjectNumber
string TargetOrigin
string TargetDestination
/////////////////////////////////// GLOBALS ///////////////////////////////////


///////////////////////////// OBJECT NUMBER VARIABLES /////////////////////////
int Section = 6                    // Appendix A, constant
int SubSect1 = 0                   // varies
int SubSect2 = 0                   // varies
int MsgFormatTable = 2010          // const
int MsgFormatTableIndex = 2        // varies
string MsgFormatTableEndTag = "01" // const [int truncates 01->1 string keeps 01]
int SpecialSectionIndex = 0        // varies in sections 11, 12, 16 of IDD 100852

int MsgDescrTable = 1010           // const
int OriginatorIndex = 3            // varies
int DestinationIndex = 4           // varies
string MsgDescrTableEndTag = "02"  // const [int truncates 01->1 string keeps 01] 
///////////////////////////// OBJECT NUMBER VARIABLES /////////////////////////


///////////////////////////////////////////////////////////////////////////////
//  
// Takes in object number e.g. 6.1.1.2.0-1.0-1
// Strips out all non digit charcaters
// Returns only digits of string e.g. 61120101
///////////////////////////////////////////////////////////////////////////////
string StripNonDigit (string myString) {

    stringLength = length myString
    string buildString = ""
    
    for (i = 0; i < stringLength; i++) {
        //examineLetter = myString[i:i]
        if (delimeter(myString[i:i])) {
            //examineLetter is a digit, append to buildString
            buildString = buildString myString[i:i]
        }
    }

    // determine what section of IDD we're in.  Info used later
    CurrentSection = intOf(buildString[0:0])
    NonDigitRemoved = stringLength - (length buildString)

    return buildString
}


///////////////////////////////////////////////////////////////////////////////
// 
// Creates build strings based upon what section of IDD we're in. The IDD follows
// a set pattern, so we know which Object Numbers contain the information we're
// looking for.
//
///////////////////////////////////////////////////////////////////////////////
void GenerateObjectNumberStrings (void) {

    if (SubSect1 == 11 || SubSect1 == 12 || SubSect1 == 16) {
        TargetObjectNumber = Section"" SubSect1"" SpecialSectionIndex"" SubSect2"" MsgFormatTable"" MsgFormatTableIndex"" MsgFormatTableEndTag
        TargetOrigin = Section"" SubSect1"" SpecialSectionIndex"" SubSect2"" MsgDescrTable OriginatorIndex"" MsgDescrTableEndTag
        TargetDestination = Section"" SubSect1"" SpecialSectionIndex"" SubSect2"" MsgDescrTable DestinationIndex"" MsgDescrTableEndTag
    } else { //Build strings normally
        TargetObjectNumber = Section"" SubSect1"" SubSect2"" MsgFormatTable"" MsgFormatTableIndex"" MsgFormatTableEndTag
        TargetOrigin = Section"" SubSect1"" SubSect2"" MsgDescrTable OriginatorIndex"" MsgDescrTableEndTag
        TargetDestination = Section"" SubSect1"" SubSect2"" MsgDescrTable DestinationIndex"" MsgDescrTableEndTag
    }
}

///////////////////////////////////////////////////////////////////////////////
// 
// This method is used to update object number variables.
///////////////////////////////////////////////////////////////////////////////
void UpdateIndices (void) {

    //new section information always processes the same
    if (NonDigitRemoved == 1) { //We've reached a new section, adjust obj num
        SubSect1++
        SubSect2 = 0
        SpecialSectionIndex = 0
    }

    //New Message section processes differently depending on section format.  Sections
    //11, 12, and 16 have an additional level in the hierarchy before getting to the
    //stuff we care about.
    if (SubSect1 == 11 || SubSect1 == 12 || SubSect1 == 16) {
        //we're in a 'special' section which processes slightly differently
        if (NonDigitRemoved == 2) {
            //reached new header in this section, adjust object number variables
            SpecialSectionIndex++
            SubSect2 = 0
        }
        if (NonDigitRemoved == 3) {
            //We've reached a new message within the section, adjust obj num variables / Origin, Destination
            SubSect2++
            Origin = ""
            Destination = ""
            MsgFormatTableIndex = 2
            MessageName = obj."Object Heading"
        }
    } else {
        //we're in a 'normal' section that doesn't have additional headers in the hierchy
        if (NonDigitRemoved == 2) {
            //We've reached a new message within the section, adjust obj num variables / Origin, Destination
            SubSect2++
            Origin = ""
            Destination = ""
            MsgFormatTableIndex = 2
            MessageName = obj."Object Heading"
        }
    }
}
///////////////////////////////////////////////////////////////////////////////
// This function goes through each message in the IDD and checks for a link
// to STD.  If a link is NOT found, the object number number will be
// printed to a log file.
///////////////////////////////////////////////////////////////////////////////
void ScrapeData(void) {

    //Set up the data.  Declaring all variables here rather than re declaring
    //every loop improves run time
    Link link
    Module tarModule
    Object targetObj
    string ObjNum
    string Origin
    string Destination
    string CurrentObjectNumber
    string CurrentObjectText
    string MastApplicability
    int TargetAbsoluteNum
    bool foundLink
    bool foundBandLink

    for obj in all CurrentModule do { // Loop through each object in the IDD
        ObjNum = number obj
        CurrentObjectNumber = StripNonDigit(ObjNum)
        CurrentObjectText = obj."Object Text"

        GenerateObjectNumberStrings()

        if (CurrentSection >= StartSection) { //First obj # we care about is in StartSection
            //Get Origin and Destination for current message
            if (CurrentObjectNumber == TargetOrigin) {
                Origin = obj."Object Text"
            } else if (CurrentObjectNumber == TargetDestination) {
                Destination = obj."Object Text"
            }

            if (CurrentObjectNumber == TargetObjectNumber) {
                //We're looking at an object we care about
                foundLink = false
                foundBadLink = false
                for link in obj <- "*" do { //Loop through all links in current object
                    
                    //get target link _Mast Applicability information
                    TargetAbsoluteNum = sourceAbsNo(link)
                    tarModule = read((TargetModulePath TargetModuleName), true)
                    targetObj = object(TargetAbsoluteNum, tarModule)

                    if (!null targetObj) {
                        MastApplicability = targetObj."_Mast Applicability"
                    }

                    if (MastApplicability == TargetApplicability) {
                        if (source(link) == TargetModuleName) {
                            //found link!!!
                            foundLink = true
                        }
                    } else {
                        if (source(link) == TargetModuleName) {
                            //found bad link
                            FoundBadLink = true
                        }
                    }
                }

                MsgFormatTableIndex++
                if (foundLink == false) {
                    //We found an object with no proper link
                    if (Origin == OriginDestination || Destination == OriginDestination) { 
                        //Object's desination or origin is Inboard Interface - we care about it
                        if (CurrentObjectText != Spare) {
                            //We don't care about 'spare' fields
                            if (foundBadLink == true) {
                                //found link to STD, but step is not tagged as LPPM
                                Output << "\nLink found, step not tagged as xxx for " MessageName " - " CurrentObjectText
                                Output << "  Section: " Section"" "." SubSect1""
                            } else {
                                //found no link to STD for this object
                                Output << "\nNo link found for " MessageName " - " CurrentObjectText
                                Output << "  Section: " Section"" "." SubSect1""
                            }
                        }
                    }
                }   
            }
            UpdateIndices()
        }
    }
}

void main (void) {
    ScrapeData()
    print "\nScript Finished!"
    print "\nLook at " WritePath " for your results"
}

main()

View DetailsScrapeInfo.dxl

 

In order to cut the run time down I've already moved the initialization of several variables out of high frequency loops.  I've also avoided using a buffer in the RemoveNonDigit() method and string concatenation instead.  These two things improved my runtime by about 1.5 minutes, but I'd love to have a discussion on possible further improvements.


NRuggiero - Thu Jun 21 14:01:29 EDT 2018

Re: DXL Optimization
davidcs - Thu Jun 21 14:16:28 EDT 2018

It looks like you are opening the target modules visible...

tarModule = read((TargetModulePath TargetModuleName), true)

Try not displaying these modules, it should help cut down the execution time...

tarModule = read((TargetModulePath TargetModuleName), false, true)

Re: DXL Optimization
Mathias Mamsch - Fri Jun 22 03:32:32 EDT 2018

Well I find it weird that you really see this kind of improvements by moving variable declarations. You need to be very careful on how accurate your measurements are. A lot of times where you think you find an improvement, you are bound to misinterpret the cause, because performance is very very volatile in DXL.

Let me tell you from experience (and I got a bit here):

- Moving your variable declarations outside your loops will NOT improve performance (except in very very special situations) ! Please don't make this a habit - it makes your code really messy because you never know what function sets what variable and so on.

- In a script like this performance improvements can come from anywhere:

      - a module being open at script start (forgot to close before starting the script)?

      - module loading time decrease due to client internal caching

      - variable network performance

      - DOORS swapping around memory due to the memory requirements of all opened modules

      - hard disk utilization and file system caching (when writing to the log files)

So please be very careful interpreting time measurements in DXL - false assumptions will stay with you until the rest of your developer life and a lot of times make your code bad.

That being said, I do not see a lot of performance critical code here - lets analyze the script. What I can see here are the following operations:

- opening modules (I/O bound, 70-80% of the time your script executes is probably opening link targets)

- locating objects inside a module (not sure, depends on the number of links!)

- writing to a file (I/O bound, probably 10-20% of your time)

- string manipulation and formatting (CPU bound, probably 5% of your time)

As you can see, the performance of your script is 90% I/O bound so you should not be able to save relevant time by moving declarations outside of functions. If you want to really optimize the performance if your script, you should consider the following measures:

- Buffer all file output in a buffer, write it to the files in one statement at the end of the script (as long as you do not produce >200MB log files)

- when opening a link target, avoid doing this twice. Do not use "object" for locating the link target objects. Instead use a "for o in all do " loop to create a Skip lookup for the link target. Use the skip to locate link targets (much faster).

- If you want this script not to fail - regardless of the size of your modules - split your script in 3 stages:

     - 1. Read the current module, extract and cache all necessary information, grab information about all link targets

     - 2. Open the link targets one by one, read and cache all necessary information from the link targets. Close the modules afterwards.

     - 3. Produce the results from the cached informations.

- Replace string concatenation functions by buffers

If you want to try this, I will help you - but I warn you beforehand: the speedups you will get should not be incredible because of the time needed to open the modules should supercede every other operation in the script.

Maybe this helps, regards, Mathias

Re: DXL Optimization
NRuggiero - Fri Jun 22 08:43:15 EDT 2018

davidcs - Thu Jun 21 14:16:28 EDT 2018

It looks like you are opening the target modules visible...

tarModule = read((TargetModulePath TargetModuleName), true)

Try not displaying these modules, it should help cut down the execution time...

tarModule = read((TargetModulePath TargetModuleName), false, true)

So after trying your suggestion this morning, my run time went from ~7 minutes to ~5 seconds.

 

Looks like that was the thing that was REALLY slowing my script down.  Thanks for the tip!

Re: DXL Optimization
NRuggiero - Fri Jun 22 08:53:43 EDT 2018

Mathias Mamsch - Fri Jun 22 03:32:32 EDT 2018

Well I find it weird that you really see this kind of improvements by moving variable declarations. You need to be very careful on how accurate your measurements are. A lot of times where you think you find an improvement, you are bound to misinterpret the cause, because performance is very very volatile in DXL.

Let me tell you from experience (and I got a bit here):

- Moving your variable declarations outside your loops will NOT improve performance (except in very very special situations) ! Please don't make this a habit - it makes your code really messy because you never know what function sets what variable and so on.

- In a script like this performance improvements can come from anywhere:

      - a module being open at script start (forgot to close before starting the script)?

      - module loading time decrease due to client internal caching

      - variable network performance

      - DOORS swapping around memory due to the memory requirements of all opened modules

      - hard disk utilization and file system caching (when writing to the log files)

So please be very careful interpreting time measurements in DXL - false assumptions will stay with you until the rest of your developer life and a lot of times make your code bad.

That being said, I do not see a lot of performance critical code here - lets analyze the script. What I can see here are the following operations:

- opening modules (I/O bound, 70-80% of the time your script executes is probably opening link targets)

- locating objects inside a module (not sure, depends on the number of links!)

- writing to a file (I/O bound, probably 10-20% of your time)

- string manipulation and formatting (CPU bound, probably 5% of your time)

As you can see, the performance of your script is 90% I/O bound so you should not be able to save relevant time by moving declarations outside of functions. If you want to really optimize the performance if your script, you should consider the following measures:

- Buffer all file output in a buffer, write it to the files in one statement at the end of the script (as long as you do not produce >200MB log files)

- when opening a link target, avoid doing this twice. Do not use "object" for locating the link target objects. Instead use a "for o in all do " loop to create a Skip lookup for the link target. Use the skip to locate link targets (much faster).

- If you want this script not to fail - regardless of the size of your modules - split your script in 3 stages:

     - 1. Read the current module, extract and cache all necessary information, grab information about all link targets

     - 2. Open the link targets one by one, read and cache all necessary information from the link targets. Close the modules afterwards.

     - 3. Produce the results from the cached informations.

- Replace string concatenation functions by buffers

If you want to try this, I will help you - but I warn you beforehand: the speedups you will get should not be incredible because of the time needed to open the modules should supercede every other operation in the script.

Maybe this helps, regards, Mathias

Thanks for all the info Mathias!  I definitely think that some of my assumption of what was taking the most time to run in my script was off.  Based off the change I made, which was to open the module but not in visible mode, cut down the run time of my script my ~99%!

 

As you mentioned, it looks like my script was really I/O bound and not CPU bound almost at all.  I'm also going to try writing all my data to a buffer than only writing to file at the end of the script, I imagine that'll shave off even a few more seconds!  I haven't yet worked with Skip lists (this is the first script I've made) so I'll have to read up on that a little too.

 

Thanks again for all the information you've shared!!  I really really appreciate it and will be coming back to this thread often for performance improvements.