Regexp with new lines

Hi,

I want to parse an object's text and put each match into a new attribute.

But when the text contains newlines, my function doesn't work. Any help?

 

/*My current object is literally:
"Version 2 :
Add some links
with "Satisfy" link module" 

I want to have :
"Add some links
with "Satisfy" link module"
in match [3] 
*/

Object curro = current Object
string Text_Version = curro."Object Text"
string Text_Description

Regexp MyText = regexp2 ".*(Version [0-9]*:) [^\\n*](\\\\~| *)*(.*)"
if (MyText Text_Version){
Text_Description = Text_Version [match 3]
}
else Text_Description = "Nothing"

print Text_Version [match 0] "\n" 
print Text_Version [match 1] "\n"
print Text_Version [match 2] "\n"
print Text_Version [match 3] "\n"

/*
match[0] = "ersion 2 :
Add some links
with "Satisfy" link module"

match[1] = "ersion 2 :
Add some links
with "Satisfy" link module"

match [2] = ""

match[3] = ""
*/

What is wrong?

Why is the V of "Version" not captured?


Estebell - Tue Apr 22 08:38:44 EDT 2014

Re: Regexp with new lines
Mathias Mamsch - Tue Apr 22 10:29:29 EDT 2014

I don't quite follow your regular expression. In DXL you can match newlines in several ways, but the "." placeholder does not match a newline:

string sText = "\n"; 

Regexp re1 = regexp "\\\n"  // match
Regexp re2 = regexp "\\n"   // match
Regexp re3 = regexp "\n"    // match
Regexp re4 = regexp "."     // no match

if (re1 sText) print "Match re1\n"; 
if (re2 sText) print "Match re2\n"; 
if (re3 sText) print "Match re3\n"; 
if (re4 sText) print "Match re4\n";

I don't know what the (\\\\~| *) part of your regexp is supposed to match, but the following test at least matches in groups 1 and 3 (note that you have a space between 'Version 2' and the ':'):

string Text_Version = "Version 2 :
Add some links
with \"Satisfy\" link module"

Regexp MyText = regexp2 "(Version [0-9]*[ ]*:)[ \n]*(\\\\~| *)*(.*)"

// Apply the regexp first; match 0..3 refer to the most recent successful match
if (MyText Text_Version) {
    print "Match 0: " Text_Version [match 0] "\n"
    print "Match 1: " Text_Version [match 1] "\n"
    print "Match 2: " Text_Version [match 2] "\n"
    print "Match 3: " Text_Version [match 3] "\n"
}

Regards, Mathias

Re: Regexp with new lines
llandale - Tue Apr 22 13:04:16 EDT 2014

Not really following; but I will say:

  • (.*)   will match any number of characters, not including a new-line (EOL)
  • Other RegExp references to "end of string" generally mean "end of string or next EOL".

Thus, parsing text with EOLs causes problems.  I resolve that with this

  • const string cl_re_strAnyChar = "[" charOf(1) "-" charOf(255)"]"    // any character except null
  • Regexp re = regexp2(whatever (cl_re_strAnyChar)* whatever)

That seems to handle EOLs in the text.

-Louie

Re: Regexp with new lines
Estebell - Wed Apr 23 03:26:18 EDT 2014

Well, I tried your const string cl_re_strAnyChar but it doesn't work...

I've simplified my object text.

// object text : "Version 2 : Add some links with link module" (without any EOL nor special characters)

Object curro = current Object
string Text_Version = curro."Object Text"
const string str_anychar = "["charOf(1)"-"charOf(255)"]"

Regexp MyText = regexp2 "([A-Z a-z 0-9]*:)(str_anychar)*"

if (MyText Text_Version)
{
    print Text_Version [match(0)] "\n"
    print Text_Version [match(1)] "\n"
    print Text_Version [match(2)] "\n"
}

// match [0] = "Version 2 :"
// match [1] = "Version 2 :"
// match [2] = ""

Why are match(0) and match(2) wrong?

I expected match(0) = "Version 2 : Add some links with link module" and match(2) = "Add some links with link module". Even without EOLs and with the const string, the regexp does not match!

Re: Regexp with new lines
Mathias Mamsch - Wed Apr 23 17:14:02 EDT 2014

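Line 7 of your script (the Regexp declaration) is wrong: str_anychar sits inside the string literal, so it is treated as literal text instead of being concatenated. It should read: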

Regexp MyText = regexp2 "([A-Za-z0-9 ]*:)(" str_anychar ")*"

Regards, Mathias

 

Re: Regexp with new lines
Estebell - Thu Apr 24 03:10:53 EDT 2014

Thanks so much!

It works fine!

 

Re: Regexp with new lines
llandale - Thu Apr 24 09:53:01 EDT 2014

Beat my head against the wall yesterday and missed that.  Doh!

However, I think you should move that last asterisk * inside the parens; this gives "match 2" the entire rest of the string.  The way you have it, match 2 is just the last character, in this case "e".

const string str_anychar = "["charOf(1)"-"charOf(255)"]"

Regexp MyText = regexp2 "^([A-Za-z0-9 ]*:)(" str_anychar "*)"

void Test(string in_String)
{
 print "[" in_String "]\n"
 if (MyText in_String)
 {
     print "\t0  [" in_String [match(0)] "]\n"
     print "\t1  [" in_String [match(1)] "]\n"
     print "\t2  [" in_String [match(2)] "]\n"
     print "\t3  [" in_String [match(3)] "]\n"
 }
 else print "\tNo Match\n"
}
Test("Version 2 : Add some links with link module")
Test("Version 2 : Add some: links with link module")
Test("Version 2 : Add some: links \nwith link module")

-Louie

Re: Regexp with new lines
Estebell - Fri Mar 31 09:35:55 EDT 2017

Hello Louie,

I'm continuing in this topic because I have a new problem.

When a string has special characters, my regexp does not work. Here is my code; I don't know why it fails.

It prints "Objectif :\nVérifier que la sortie est capable d'effectuer 50000 man", although intOf "œ" = 156, which is between 1 and 255...

void Test (string s)
{
    string anychar = "[" charOf(1) "-" charOf(255) "]"
    Regexp Text = regexp2 "(Objectif[ ]*:[ ]*\\n)(" anychar "*)"
    if (Text s)
    {
        s = s[match 2] ""
    }
    print s ""
}

Test ("Objectif :\nVérifier que la sortie est capable d'effectuer 50000 manœuvres.")

 

Re: Regexp with new lines
Mathias Mamsch - Fri Mar 31 11:30:37 EDT 2017

DOORS uses UTF-8 internally, which has a much wider set of characters. The char type can hold far more than 255 different characters, but IBM never bothered to correct the charOf and intOf functions.

 

So while your test suggests that intOf "œ" = 156, it is really 339 (see http://www.fileformat.info/info/unicode/char/0153/index.htm) and therefore does not fall within the range 1-255 (which regexp fully respects).

char c = addr_ 339
print c

c = charOf 339
print c

So the easiest way for your specific regexp is to remove the "anychar" part from the regex and simply take the substring after the "end 0" match. If you really need anychar, you need to resort to something like

"[^" charOf(1) "]"

(which assumes that you do not consider charOf(1) a valid character in your text). Hope this helps, Regards, Mathias
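
The two options above could be sketched roughly as follows. This is untested and only illustrative: the names sText, reHeader, reRest, iEnd and sRest are made up for the example, and it assumes that "end 0" returns the index of the last character of the whole match and that a negated class matches newlines as described above. The header before the cut is plain ASCII, so the offset should not be affected by the UTF-8 issue.

```dxl
// Sample text taken from the question above
string sText = "Objectif :\nVérifier que la sortie est capable d'effectuer 50000 manœuvres."

// Option 1: match only the header, then slice off everything after it
Regexp reHeader = regexp2 "Objectif[ ]*:[ ]*\\n"
if (reHeader sText)
{
    int iEnd = end 0                  // assumed: index of the last character of match 0
    string sRest = sText[iEnd + 1:]   // rest of the text, whatever characters it contains
    print sRest "\n"
}

// Option 2: negated class, "any character except charOf(1)"
string anychar = "[^" charOf(1) "]"
Regexp reRest = regexp2 "(Objectif[ ]*:[ ]*\\n)(" anychar "*)"
if (reRest sText)
{
    print sText[match 2] "\n"
}
```

Option 1 avoids character classes entirely, so it does not depend on how charOf and intOf treat characters above 255; option 2 keeps the capture-group style used earlier in the thread.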

 

Re: Regexp with new lines
Estebell - Mon Apr 03 04:28:41 EDT 2017

Well, I need anychar, so I solved it by excluding the ñ character.

Thanks a lot!