Need help speeding up DXL script

I'd appreciate some help in speeding up the attached script, which is used to generate a requirements test coverage report in DOORS 8.2. Note that I am by no means a DXL expert, and this has been patched together from snippets from this forum and the DXL help manual. I run this on each requirements module in the database manually, then copy/paste the DXL output to an Excel spreadsheet for further analysis. I know how to send it to a text file or Excel spreadsheet directly, but that isn't my problem. The script reviews the attributes of each object, traverses each in-link, looks at the attributes of each source object, sets some booleans based on matches, then prints a report of attributes and booleans. It takes about 2 hours to run on some modules which have a large number of objects (low thousands), with 0-100 links per object. Running it on all modules in the DB takes the best part of a day. Any suggestions on speeding up the main loop, and any general comments, would be welcome.

Thanks,

                 Ken.


mcnairk - Thu Jul 24 09:51:04 EDT 2014

Re: Need help speeding up DXL script
a8155058 - Thu Jul 24 12:03:36 EDT 2014

Instead of using regular expressions, use "matches". In the DOORS version you are using, regular expressions leak memory and there is no way to reclaim it. Fortunately, in later DOORS versions delete was overloaded to support Regexp.

Replace the print statements with buffers.

If you need further assistance I will help you fix the script.

Re: Need help speeding up DXL script
mcnairk - Thu Jul 24 12:09:53 EDT 2014

I'll try your suggestions and get back to you if I need additional help. Part of the problem is that I don't know how big the buffers need to be...

Many thanks!

Re: Need help speeding up DXL script
a8155058 - Thu Jul 24 12:25:39 EDT 2014

A DXL buffer resizes automatically, so there is no need to worry about setting its size.

You can declare a buffer in two ways:

Buffer b = create
b += "hello"
delete(b)

or, with an initial size:

Buffer b = create(1000)
b += "hello"
delete(b)


I added you on LinkedIn ( please don't share my identity ) so feel free to email if you have further problems. I don't frequent the forums that much.

Re: Need help speeding up DXL script
a8155058 - Thu Jul 24 13:08:46 EDT 2014

Additional comments:

This is just wrong!

exception_list = exception_list ObjId " \t" objNo "\t" text_MyProject "\t" req_introduced_in_build "\t" text_site "\t" is_info_bool "\t" is_tested_indirectly_bool "\t" is_untestable_bool "\t" has_MyProject_link_bool "\t" has_other_link_bool "\n"

Instead do this

// b is a Buffer declared elsewhere
b += ObjId " \t" objNo "\t" text_MyProject "\t" req_introduced_in_build "\t" text_site "\t" is_info_bool "\t" is_tested_indirectly_bool "\t" is_untestable_bool "\t" has_MyProject_link_bool "\t" has_other_link_bool "\n"

Then to print it out

print b""

 

Also consider

  • opening the modules only once instead of many times
  • closing the open modules at the end
  • if you choose to load the module with "Standard view" you wouldn't need "filtering off" and "showDeletedObjects false"
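
A minimal sketch of the buffer approach described above, assuming the report is built in a loop over the current module. The accessors identifier and number merely stand in for the script's own variables, and the name report is hypothetical:

```dxl
// Sketch only: accumulate one tab-separated row per object in a single Buffer.
Module m = current
Object o
Buffer report = create        // grows automatically as rows are appended

for o in m do
{
    report += identifier(o)   // stand-in for ObjId in the original script
    report += "\t"
    report += number(o)       // stand-in for objNo
    report += "\n"
}

print stringOf(report)        // one print at the end instead of one per object
delete(report)                // release the buffer when finished
```

The point of the design is that string concatenation in a loop copies the growing string on every pass, while appending to a Buffer does not.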


 

Re: Need help speeding up DXL script
llandale - Thu Jul 24 14:52:02 EDT 2014

I like a8155058's response a lot.

Check this out to grasp the realities of concatenating results in a loop:

https://www.ibm.com/developerworks/community/forums/html/topic?id=4ba17f45-2050-4412-82fe-8f9a7457b0af&ps=25

  • Line 77: your "exception_list = exception_list ObjID xxx" statement will without a doubt kill your time, as it eats up all of the available DOORS memory. When you change to a buffer, you might want to break up the concatenation of the various tab'd items.

As for regular expressions.

  • I disagree with the Regexp comment, however. Putting them at the top of the program "wastes" 6 allocated units, and the "matches" function is very slow; I think it creates a Regexp in the background.
  • Use function "regexp2" for version 9.2 onwards.
  • I think you need to escape the periods with a backslash if you are looking for a literal period: "(\\.*ndirectly\\.*)"
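
To illustrate the escaping point, here is a small sketch, assuming the DXL convention that a backslash must itself be doubled inside a string literal. The pattern and sample text are made up for illustration; the Regexp is declared once and reused:

```dxl
// Sketch only: declare the Regexp once, outside any loop, and reuse it.
Regexp literalDot = regexp "v1\\.2"   // "\\." in the DXL string is the regex \. (a literal period)
string sample = "built in v1.2"       // made-up sample text

if (literalDot sample)
{
    print "matched a literal period\n"
}
```

An unescaped "." matches any character, so "v1.2" would also match text such as "v1x2".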

-Louie

Re: Need help speeding up DXL script
a8155058 - Thu Jul 24 15:23:29 EDT 2014

@Louie:

Remember, he is using DOORS 8.2, which has a bad regex engine. Back then Telelogic (I think they were about to be acquired by IBM) decided to change it without any notice. From what I and others experienced, a script using regex ran fine in DOORS 8.1, but the same script took much longer in DOORS 8.2. I would have to look at my notes, but I remember that the memory consumption of the DOORS 8.2 regex grew quadratically and there was no way to reclaim this memory. Even using the "eval_" statement did not help. In DOORS versions after 8.2, the regular expression engine was changed once again when the tool started to get blue-washed by IBM.

I do agree with your regex thoughts in later DOORS versions after 8.2 though.

Re: Need help speeding up DXL script
mcnairk - Fri Jul 25 09:45:44 EDT 2014

I hear "This is just wrong"  a lot...

Just using buffers (I have not looked at the Regexps) seems to be working; time is down from 2 hours to 7 mins (further optimization in progress before I can mark this posting ANSWERED).

However, should I use one buffer for the entire module, or one per requirement (this is what I currently do)?

Secondly, please advise where I'm opening the modules many times. What I see when I run the script is that each source module is opened once as required, then left open. This was deliberate: since I was re-running the script on multiple modules, I left the modules open. Now that the time is down, I can loop through all modules and close them at the end.

MANY thanks for the input from you and others on this forum!

Re: Need help speeding up DXL script
a8155058 - Fri Jul 25 11:45:47 EDT 2014

Regarding the instances of buffers: it is all up to you, so do what makes sense. Just remember to delete them when you are done using them.

To ensure you are only opening the modules once, do something like this:

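// Open the source module of every in-link of an object,
// but only if that module is not already open.
void openInModules(const Object& o)
{
  LinkRef lref
  ModName_ modName
  string str

  for lref in o <- "*" do
  {
    modName = source(lref)   // reference to the module the link comes from
    str = fullName(modName)

    if ( !open module str )  // skip modules that are already open
    {
      // read-only, not displayed; the third argument requests the standard view
      Module m = read(str,false,true)
    }
  }
}

// Example: open the source modules for the current object
Object o = current
openInModules(o)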

 

EDIT:

Reading your message again, I feel having one Buffer for the whole module is sufficient. Constantly declaring and deleting a buffer within a loop is a no-no.
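
One way to avoid creating and deleting a buffer on every pass, sketched under the assumption that a per-object scratch buffer is still wanted: create it once, clear it with setempty at the top of each iteration, and delete it after the loop. The names row and report are hypothetical:

```dxl
// Sketch only: reuse one scratch Buffer instead of create/delete per object.
Module m = current
Object o
Buffer report = create   // whole-module output
Buffer row = create      // scratch buffer, created once and reused

for o in m do
{
    setempty(row)        // clear the contents instead of delete/create
    row += identifier(o)
    row += "\n"
    report += row        // append the finished row to the module buffer
}

print stringOf(report)
delete(row)
delete(report)
```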
 

To see if you are opening the modules more than once (it all depends on the link usage), run your code and then go to Tools -> Manage Open Modules. If the References column (I think that is the column, since I haven't used DOORS in ages) shows more than 1, then you are opening the module more than once.

Re: Need help speeding up DXL script
mcnairk - Fri Jul 25 13:59:09 EDT 2014

@Louie: When you say "break up the concatenation of the various tab'd items", do you mean add each item in turn to the buffer, i.e.:

buf += att1 "\t"
buf += att2 "\t"
...
buf += attn "\n"