Paragraph Style attribute parsing with regular expressions

I've got a problem parsing the Paragraph Style attribute using regular expressions. It is almost certainly due to my limited experience with regular expressions, but I have been unable to find an answer, either in the DOORS forums or on other sites that explain regular expressions.

What I want is to split the Paragraph Style attribute into its individual style definitions, however many there are. The attribute often holds a single style entry, but it can hold one entry per attribute.

For example: <Attributes:bold><Object Text:Body Text><Verification Method:italic>
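To pin down the intended result before any regular expressions are involved, here is a minimal sketch in Python (not DXL) that splits the example value into (attribute, style) pairs using plain string operations. It assumes every entry has the form `<attribute:style>`, that neither part contains `<` or `>`, and that the first `:` separates the two parts; `parse_entries` is a hypothetical helper name.

```python
def parse_entries(value: str) -> list[tuple[str, str]]:
    """Split '<attr:style><attr:style>...' into (attr, style) pairs."""
    pairs = []
    pos = 0
    while True:
        open_idx = value.find("<", pos)
        if open_idx == -1:
            break  # no more entries
        close_idx = value.find(">", open_idx)
        if close_idx == -1:
            break  # unterminated entry: stop rather than guess
        body = value[open_idx + 1:close_idx]
        attr, _, style = body.partition(":")  # split on the first ':' only
        pairs.append((attr, style))
        pos = close_idx + 1  # resume just after this entry's '>'
    return pairs


print(parse_entries("<Attributes:bold><Object Text:Body Text><Verification Method:italic>"))
# [('Attributes', 'bold'), ('Object Text', 'Body Text'), ('Verification Method', 'italic')]
```

Scanning for the next `<` and the next `>` is exactly what the regular expression has to reproduce: each match must stop at the first `>` it meets.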

An example of code that does not parse this correctly:

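pragma runLim, 100000

int count = 0

string str1 = "<Attributes:bold><Object Text:Body Text><Verification Method:italic>"

// Capture group 1 is meant to hold a single <...> entry per iteration
Regexp re1 = regexp2 "(<.*>)"

while (re1 str1) {
    count ++
    print count " " str1[match 1] "\n"
    // Cut off everything up to the end of the match and search again
    str1 = str1[end (1) + 1:]
    print "new string: " str1 "\n"
}
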
The output of this is:
1 <Attributes:bold><Object Text:Body Text><Verification Method:italic>
new string:

Which is not what is intended.
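This is standard greedy-quantifier behaviour and can be reproduced outside DOORS. A minimal sketch using Python's `re` module as a stand-in (assumption: DXL's regexp2 treats `.*` the same way, which is consistent with the output above):

```python
import re

value = "<Attributes:bold><Object Text:Body Text><Verification Method:italic>"

# '.*' is greedy: it runs to the end of the string and then backs off
# only as far as needed to find a '>', i.e. to the LAST '>' in the string.
greedy = re.search(r"(<.*>)", value)
print(greedy.group(1) == value)     # True: one match covers all three entries
print(repr(value[greedy.end(1):]))  # '' : nothing is left, so the loop stops
```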

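If the code is modified to include several instances of '(<.*>)', it only splits the entries correctly when the number of repetitions matches the number of entries; with fewer repetitions, the first group still spans several entries. For example, if the regular expression is changed to

Regexp re1 = regexp2 "(<.*>)(<.*>)"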

The output becomes:
1 <Attributes:bold><Object Text:Body Text>
new string: <Verification Method:italic>
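The reason adding groups changes the split is backtracking: a greedy group gives back only as much as the rest of the pattern needs in order to match. A minimal sketch in Python's `re` module, used as a stand-in for DXL (assumption: the same backtracking behaviour, consistent with the output above):

```python
import re

value = "<Attributes:bold><Object Text:Body Text><Verification Method:italic>"

# Group 1 is greedy but must leave enough for group 2, so it backs off
# only to the last '<'; group 2 gets the final entry.
two = re.search(r"(<.*>)(<.*>)", value)
print(two.group(1))  # <Attributes:bold><Object Text:Body Text>
print(two.group(2))  # <Verification Method:italic>

# With one group per entry, each group is forced onto a single entry.
three = re.search(r"(<.*>)(<.*>)(<.*>)", value)
print(three.groups())
```

This is why repeating the group only helps when the number of entries is known in advance.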

Is there a way of putting the original regular expression into a loop to extract all the styles iteratively? The answer may be no, as the regular expression engine in DOORS appears to be 'greedy', in the sense that a wildcard expression takes as much of the string as it can. Other regular expression implementations (Perl, Java) offer non-greedy options.
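In engines that support it, the non-greedy form answers the looping question directly. A minimal sketch in Python, which has Perl-style lazy quantifiers; whether DXL's regexp2 accepts `*?` is not established here, so treat this as an illustration of the technique rather than DXL code:

```python
import re

value = "<Attributes:bold><Object Text:Body Text><Verification Method:italic>"

# '*?' is the lazy (non-greedy) form: it stops at the FIRST '>'.
lazy = re.compile(r"<.*?>")

entries = []
pos = 0
while True:
    m = lazy.search(value, pos)  # search from 'pos' onwards
    if m is None:
        break
    entries.append(m.group(0))
    pos = m.end()  # continue just after this match

for i, entry in enumerate(entries, start=1):
    print(i, entry)
```

The loop mirrors the original DXL loop, except that each match now covers exactly one entry.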

One more question that the DOORS documentation does not answer, as far as I can see: what is the difference between 'regexp' and 'regexp2'? The DXL Reference online help says that new code should use regexp2 but does not say what the difference is.

System information: Windows XP, DOORS client 9.3.0.2, DOORS server 9.2.0.2

TIA

Jim


bmij - Thu Mar 29 11:21:27 EDT 2012

Re: Paragraph Style attribute parsing with regular expressions
llandale - Thu Mar 29 11:50:57 EDT 2012

Google "Greedy" and "Lazy" "Regular Expressions", or go here:
I think that will result in you doing this:
Regexp re1 = regexp2 "(<.*>)?"

-Louie
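One detail of Perl-style syntax, shown here in Python as a stand-in for DXL: the lazy marker goes directly after the quantifier, as in `<.*?>`. A `?` placed after the closing parenthesis, as in `(<.*>)?`, makes the whole group optional instead, and the `.*` inside it stays greedy.

```python
import re

value = "<Attributes:bold><Object Text:Body Text><Verification Method:italic>"

# '?' after the group: the group becomes optional, '.*' is still greedy.
optional_group = re.match(r"(<.*>)?", value)
print(optional_group.group(1) == value)  # True: still swallows everything

# '?' after the quantifier: '.*?' is lazy and stops at the first '>'.
lazy_group = re.match(r"(<.*?>)", value)
print(lazy_group.group(1))  # <Attributes:bold>
```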

Re: Paragraph Style attribute parsing with regular expressions
Mathias Mamsch - Thu Mar 29 14:23:33 EDT 2012


You should do this:

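Buffer buf = create(); buf = "<a:bc><d:fg><e:hg>"

// Group 1: attribute name, group 2: style. [^>]+ cannot run past a '>',
// so each match stays inside a single <...> entry.
Regexp reg = regexp2 "<([^>]+):([^>]+)>"

int pos = 0

while (search(reg, buf, pos)) {
     // ... process matches...
     // start/end are offsets from 'pos', so add 'pos' to index into buf
     string sAttr = buf[pos+(start 1):pos+(end 1)]
     string sStyle = buf[pos+(start 2):pos+(end 2)]

     print "Attribute: " sAttr " Style: " sStyle "\n"

     // Continue searching just after the closing '>' of this match
     pos += 1 + end 0
}

// Buffers are not freed automatically
delete buf
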
Regards, Mathias

Mathias Mamsch, IT-QBase GmbH, Consultant for Requirement Engineering and D00RS
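For comparison, here is the same approach translated into Python as a sketch. The negated character class `[^>]+` keeps each match inside one entry without needing lazy quantifiers, and the loop restarts the search after the previous match. One difference: the DXL code above adds `pos` to `start`/`end`, which indicates those offsets are relative to the search start in DXL, whereas Python's match positions are already absolute.

```python
import re

buf = "<a:bc><d:fg><e:hg>"

# Same pattern as the DXL example: attribute and style are runs of
# characters that are not '>', separated by a ':'.
reg = re.compile(r"<([^>]+):([^>]+)>")

results = []
pos = 0
while True:
    m = reg.search(buf, pos)
    if m is None:
        break
    attr, style = m.group(1), m.group(2)  # groups come straight from the match
    results.append((attr, style))
    print("Attribute: " + attr + " Style: " + style)
    pos = m.end()  # resume right after the closing '>'
```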

 

Re: Paragraph Style attribute parsing with regular expressions
jbackus - Fri Mar 30 04:29:25 EDT 2012

Mathias,

Thank you for that code. It gives me a good start on what I'm trying to achieve.

The built-in 'Edit Paragraph Style attribute' function does the job most of the time, but when one or two objects (typically headings) need their style changed, the problem is finding the objects with the wrong style. For example, a heading has been demoted from level 3 to level 4: the required style is <Object Heading:Heading 4>, but the actual value is <Object Heading:Heading 3>. I've written a few lines of DXL that check every object, but they do not handle cases where the attribute has multiple entries. Your example will allow me to correct that.

Jim
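The check described above could look roughly like this, sketched in Python rather than DXL. `parse_paragraph_style`, `expected_heading_style` and `has_wrong_heading_style` are hypothetical helpers, and the rule that a level-n heading should carry the style `Heading n` is an assumption taken from the example in the post, not something DOORS defines. In DXL, the parsing step would be the search loop from Mathias's reply.

```python
import re

ENTRY = re.compile(r"<([^>]+):([^>]+)>")


def parse_paragraph_style(value: str) -> dict[str, str]:
    """Map each attribute name to its style, e.g. {'Object Heading': 'Heading 3'}."""
    return {m.group(1): m.group(2) for m in ENTRY.finditer(value)}


def expected_heading_style(level: int) -> str:
    # Assumption: a level-n heading should use the style "Heading n".
    return "Heading " + str(level)


def has_wrong_heading_style(value: str, level: int) -> bool:
    """True if the value has an 'Object Heading' entry that does not match the level."""
    actual = parse_paragraph_style(value).get("Object Heading")
    return actual is not None and actual != expected_heading_style(level)


# A heading demoted from level 3 to level 4 that still carries the old style:
print(has_wrong_heading_style("<Object Heading:Heading 3>", 4))  # True
# Multiple entries are handled because every entry is parsed:
print(has_wrong_heading_style("<Attributes:bold><Object Heading:Heading 4>", 4))  # False
```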

Re: Paragraph Style attribute parsing with regular expressions
bmij - Fri Mar 30 04:51:58 EDT 2012

Thanks. It seems I have two forum user names; I logged in as the wrong one, replied, and then couldn't mark the thread as answered!