Memory leak for identifying OLE objects

Hi all,

I'm facing a memory leak that I cannot locate. When I run my process, the DXL execution sometimes halts and sometimes does not... I do not understand what is really happening.

It should be a simple process: it looks for OLE objects in every requirement of a selected module and replaces each one with a small markup tag. The goal is to reduce the size of the object text (including OLEs) as much as possible before sending it to another process.

For example, if a requirement has an embedded 16 MB OLE, its richTextWithOle(req."Object text") result is about 70,000,000 characters (~70 MB), so I'm trying to reduce it.

In this case, the OLEs I'm looking for are pictures. I'm using regexp for that, but I think I'm doing something wrong because the DXL execution sometimes halts. It does not behave the same way every time, so my assumptions may be wrong.

If an OLE is found in the richTextWithOle result for the requirement, I try to do the following:

- 1: Copy from the beginning of the buffer (which contains the whole richTextWithOle result) up to the beginning of the OLE
        o ResultBuffer += everything from the beginning of the buffer up to the beginning of the OLE

- 2: Look for the end of the OLE (with another regex) in order to replace the whole OLE (70 million chars) with a small markup tag (18 chars)
        o ResultBuffer += <oleFound id="1"/>

- 3: Copy from the end of the OLE to the end of the buffer (assuming NO MORE OLEs inside this requirement)
        o ResultBuffer += TheRest

 

The DXL function is the following:

Regexp reObj = regexp "({\\\\object)|({\\\\pict)"
Regexp reBrace = regexp "[{}]"

string GetRichTextWithReplacedOLEs(Object theObject) {

    string resultAsString = ""
    string markupId = ""

    // Buffer to get the RichTextWithOle
    Buffer withOleBuffer = create()
    withOleBuffer = richTextWithOle theObject."Object Text"

    // Buffer to store results (with markups instead of OLEs)
    Buffer ResultBuffer = create()

    int oleId = 1
    int pos = 0
    int oleStart = 0
    char ch

    int braceLevel
    int bracePos

    while (search(reObj, withOleBuffer, pos)) {

        oleStart = pos + start 0

        // 1- Adding BEFORE the OLE object
        if (oleStart > pos) {
            combine(ResultBuffer, withOleBuffer, pos, oleStart - 1)
        }

        // Look for the closing brace of the OLE object
        braceLevel = 1 // the start of the OLE had one opening brace
        bracePos = oleStart + 1
        while (braceLevel > 0 && search(reBrace, withOleBuffer, bracePos)) {
            ch = withOleBuffer[bracePos + start 0]
            braceLevel += (ch == '{') ? 1 : (-1)
            bracePos += 1 + end 0
        }
        if (braceLevel != 0) {
            error "Invalid RTF!"
        }
        pos = bracePos

        // 2- Adding the OLE object markup
        //print "Found OLE object from " oleStart " to " bracePos "! Replacing...\n"
        markupId = "<ratObject id=\"" oleId "\"/>"
        ResultBuffer += markupId

        oleId++
    }

    // 3- Adding AFTER the OLE objects
    combine(ResultBuffer, withOleBuffer, pos) // add the rest after the last OLE object

    // Setting result as string (from Buffer)
    resultAsString = ResultBuffer ""

    // Disposing buffers
    setempty(withOleBuffer)
    delete(withOleBuffer)
    setempty(ResultBuffer)
    delete(ResultBuffer)

    return resultAsString
}

 

Speed is not the problem, but memory usage increases very fast (according to the Windows Task Manager). I tried to simplify the code as much as possible (removing buffer accesses and so on), and even with only the search() instructions it still halts :(

Maybe I've been "reinventing the wheel". I don't know if there's a simpler way to get the same result, or if I'm not managing the strings/buffers correctly.

Please advise.

Kind regards,

Borja.


Borja López - Tue Dec 16 07:16:09 EST 2014

Re: Memory leak for identifying OLE objects
Borja López - Tue Dec 16 08:09:06 EST 2014

To add to my problem description, I also get this exception from time to time (e.g. when I run the DXL with a module open):

-R-E- DXL: <Line:77> An unexpected error happened: doors.exe caused an EXCEPTION_ACCESS_VIOLATION in module MSVCR80.dll at 0023:73004500, strlen()+0048 byte(s)

LOG details: The thread tried to read from or write to a virtual address for which it does not have the appropriate access.

 

If I check "Next error" in the DXL window, it points to the line

    while (search(reObj, withOleBuffer, pos)) {

I think it's related to the "reObj" Regexp... I tried declaring it as a global and as a local variable, but it still fails.

Any help?

Regards,

Borja.

Re: Memory leak for identifying OLE objects
Mathias Mamsch - Tue Dec 16 10:59:42 EST 2014

Are you sure that the error you are getting is not a simple time-out? Do you have a pragma runLim, 0 at the top of your code? If not, put one there and see if your code still "exits"...

Regarding the exception you are getting: it seems you have been experimenting with tempStringOf or something like that? An access violation in "strlen" on a "virtual address" hints at a corrupt string buffer...

Apart from that, the code you posted is very likely free of memory leaks, apart from the fact that you are returning a string?? You should return the buffer you created, use it, and dispose of it when you are done. Consider using regexp2 instead of regexp (no idea if that will help)... Regards, Mathias
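
A minimal sketch of the two suggestions above: disable the timeout, and return the Buffer so that the caller disposes of it. The function name getTextWithoutOles is hypothetical and the body is only a placeholder; the OLE replacement loop from the original post is left out.

```dxl
pragma runLim, 0    // disable the DXL execution timeout

// Hypothetical helper: builds the result in a Buffer and hands it to the caller.
// The caller owns the returned Buffer and must delete it.
Buffer getTextWithoutOles(Object o) {
    Buffer src = create
    src = richTextWithOle o."Object Text"

    Buffer result = create
    // ... the OLE replacement loop would fill 'result' from 'src' here ...
    result += src       // placeholder only: copies the text unchanged

    delete src          // the source buffer is no longer needed
    return result
}

Object o = current      // assumes a module is open and an object is selected
Buffer b = getTextWithoutOles(o)
int len = length(b)
print "Result length: " len "\n"
delete b                // dispose of the result once it has been used
```

Keeping the result in a Buffer avoids creating one more very large string for each object, which is the concern raised about the string return value.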

 

 

Re: Memory leak for identifying OLE objects
Borja López - Wed Dec 17 04:27:17 EST 2014

1) Yes, I had the "pragma runLim, 0" line at the beginning of the script.

2) You got the point: I tried "tempStringOf" a couple of times to see if it would help (it did not). I rebooted the computer hoping to get rid of the violation exception, but I still get it. I searched the forum a bit, but I cannot find any way to "clear" that virtual address... What could I do?

3) You are absolutely right. I've removed the returned string, and I now work with the Buffer until the end. What does regexp2 improve over the normal regexp?

 

At the moment I'm struggling with the "strlen violation". I put ack calls in the code, and the last ack before the violation exception is:

bool found = search(reObj, withOleBuffer, pos)

Here "withOleBuffer" has a length of 71000000 (so it's not null) and "pos" is 0. So I think this line is not the problem; something must be corrupt, as you said, but I cannot find it.

Any advice?

Thanks in advance :)
 

 

Re: Memory leak for identifying OLE objects
Mathias Mamsch - Wed Dec 17 05:31:55 EST 2014

I can hardly believe that there is something wrong with the "search" perm, although of course it is possible. I really don't know what regexp2 does better than regexp. There are some differences (discussed on the forum, I think); I believe a different regexp engine is used.

You should reduce the code until the fault goes away. What I would do now, if I were you:

  • Write the contents of withOleBuffer to a file (71 MB) ...
  • Make a minimal DXL that
    • loads the text file to a buffer
    • runs only the 'search' loop (with increment of pos) over this buffer

Run this as often as necessary, just to make sure that you have not accidentally discovered a bug in the regexp engine... Then I would start adding code back until the problem reappears.

I think it is more likely that you are causing a corruption elsewhere in your code, which makes it fault at the regexp search call.

Another thing you can try is to run this code:
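
A minimal sketch of such a test script, under a few assumptions: the dump path is hypothetical, readFile is assumed to cope with a file of this size, and match offsets are taken as relative to the search start, as in the original function.

```dxl
pragma runLim, 0                                    // no execution timeout

string dumpPath = "C:\\temp\\withOle.rtf"           // hypothetical path of the dumped buffer

Regexp reObj = regexp "({\\\\object)|({\\\\pict)"   // same pattern as in the original post

Buffer buf = create
buf = readFile dumpPath                             // load the whole dump into the buffer

int pos = 0
int hits = 0
while (search(reObj, buf, pos)) {
    hits++
    pos = pos + 1 + end 0                           // continue right after the current match
}

int len = length(buf)
print "Matches: " hits ", buffer length: " len "\n"
delete buf
```

If this loop alone runs cleanly many times in a row, the regexp engine is an unlikely culprit and the remaining code can be added back step by step.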

https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014530928&ps=25

It is similar to yours in that it loops over the OLE objects. See if you can reproduce the bug with that code as well.

By the way, what DOORS version are you using?

Re: Memory leak for identifying OLE objects
Borja López - Wed Dec 17 05:44:26 EST 2014

Thanks Mathias, I'll try during the morning.

I'm testing with version 9.4 (I also have 9.6 installed, but have not tested with it).

Re: Memory leak for identifying OLE objects
Borja López - Wed Dec 17 06:49:05 EST 2014

One weird thing about this is that the violation exception only occurs after I have opened the module. I mean, when I open DOORS and run my DXL script right away, it works fine. Memory usage is 103 MB, rises to 600-700 MB during execution, and goes back to exactly 103 MB when finished.

But if I open DOORS, open a module (double-click on it), immediately close the module, and then run the DXL script, the violation exception happens. Weird, right? No DXL is run on that module when it is opened.

PS: My DXL opens the module via its location, does its work and closes it again.

PS2: Although it is not that slow overall, one instruction takes most of the time:

Buffer withOleBuffer = create();
withOleBuffer = richTextWithOle theObject."Object Text" 

Is there a more efficient way to get the whole description including OLEs?

Regards,

Borja.

Re: Memory leak for identifying OLE objects
Mathias Mamsch - Wed Dec 17 13:19:45 EST 2014

It is very easy to be misled when looking for an error, so I assume that you can reproduce the error consistently by doing what you described: start a new DOORS instance, open exactly that one module, and close it. Make sure it is really closed (check "Manage Open Modules"). Run your script, which will reopen the module... You should get the error. Shut down DOORS again and restart. Run your DXL immediately without doing anything else. The problem disappears...

If that is really the case, then you might be facing one of several problems, the most probable being an invalid DXL inside the module that corrupts the DXL engine. To really make sure that no DXL is executed when opening the module, you should make a copy of the module, delete any standard view, delete all DXL attributes inside the module and then restart DOORS as a DBAdmin without triggers (--notriggers). Ensure that triggers are inactive. Then check whether you still get the same problem.

If you still have the same problem, maybe the module is corrupt. You should try to copy the object text of the module to a different module using copy objects (double-check that everything has been copied). If you still get the described error, this would be the point, at the latest, where I would contact IBM Service.

Regards, Mathias