Special Characters in Regular Expressions

Greetings, I was hoping that someone in the forum has solved this issue.  My DXL script goes through and puts attribute names in an array.  I then build a second array of regular expressions that combine the attribute names with additional things (operations and semaphores, but that is not important), and use that array to search for patterns.  However, I found that if an attribute name contains a question mark (?) or square brackets ([ or ]), the regular expression is not formed correctly and doesn't find the right pattern.

I did a workaround for the question mark by checking whether the attribute name matched one I knew contained a question mark, and then replacing the regular expression with one in which the question mark is escaped.  For example, for an attribute name of _Req?, the regular expression was to be (<<<)(_Req?)(=)(>>>), but it fails to find <<<_Req?=>>>.  However, when I replace the regular expression with (<<<)(_Req\\?)(=)(>>>), it works.
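
The difference between the two patterns can be tried in isolation with a minimal sketch like the one below. In most regular expression dialects an unescaped ? makes the preceding item optional, and the sketch assumes DXL behaves the same way. The variable names are made up for illustration.

```dxl
// Unescaped: the '?' is treated as an operator (assumed: "previous item is optional"),
// so it does not stand for a literal question mark.
Regexp reRaw = regexp2 "(<<<)(_Req?)(=)(>>>)"

// Escaped: "\\?" in DXL source is the regexp \?, a literal question mark.
Regexp reEscaped = regexp2 "(<<<)(_Req\\?)(=)(>>>)"

string sample = "<<<_Req?=>>>"

if (reRaw sample) {
    print "raw pattern matched\n"
} else {
    print "raw pattern did not match\n"
}

if (reEscaped sample) {
    // match 2 is the second parenthesised group, i.e. the attribute name
    print "escaped pattern matched: " sample[match 2] "\n"
} else {
    print "escaped pattern did not match\n"
}
```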

I can't find anything in the DXL reference manual that explains why the ? won't work in a regular expression, but it doesn't surprise me.

Since I don't have control over what people name their attributes, I would like to make my script adaptable to whatever names they use.  What I am wondering is whether anyone has found a more generic method to handle these characters.

Thanks a bunch!

Greg


GregM_dxler - Tue Aug 20 18:08:42 EDT 2013

Re: Special Characters in Regular Expressions
Martin_Hunter - Wed Aug 21 04:22:02 EDT 2013

Greg,

I believe the question mark is itself a regular expression operator: it can be used to match the first instance in a string (i.e. lazy rather than greedy, which is the default and finds the last instance).

The character should therefore be escaped, as should all regular expression characters found in an attribute name.

I had to add the following function to handle the dot character:

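{code}

// Prefix every unescaped dot in s with a backslash
string prefixDot (string s) {
   // group 1 is a dot that is not already preceded by a backslash
   while (matches ("[^\\\\](\\.)", s)) {
      s = s[0:start 1 - 1] "\\" s[end 1:]
   }
   return s
}

{\code}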

Regards,

Martin

Re: Special Characters in Regular Expressions
GregM_dxler - Wed Aug 21 08:50:09 EDT 2013

Hi Martin,

Thanks for the reply.  The use of the question mark to find the first instance vs the last is an interesting tidbit to know.  I wonder if it would be worthwhile to write a small function that checks each character in the attribute name and escapes it if it is ?, [ or ].  I'm also wondering which other characters that are likely to appear in an attribute name would cause the same kind of issue.

Greg

Re: Special Characters in Regular Expressions
GregM_dxler - Wed Aug 21 09:42:13 EDT 2013

I think I found a simple fix, using the escape command.  Here is the snippet of code that escapes the question mark, left square bracket and right square bracket.  It could be extended to other regular expression symbols as well.  The only other ones I can think of that might appear in attribute names are the left and right parentheses.

{code}

result = escape(sAttribute[iC], '\\', "?[]") // escape any question marks and brackets
reCpTags[iC] = regexp2 "(<<<)(" result ")(" sType ")([^>>>$]*)" // make the regular expression

{\code}

Thanks for the help,

Greg

Re: Special Characters in Regular Expressions
llandale - Wed Aug 21 12:32:13 EDT 2013

This is fairly old code and I cannot guarantee it is complete, but I have the following Regular Expression special characters, which are escaped with the '\\' character when making a generic Regexp search string:

string plc_RegExp_SpecialChars = "\\*+.^$()[]|?"  // Characters in in_StringSearch that must be escaped, since they
          // are legal RegExp key characters.  Backslash MUST be first in string
int    plc_RegExp_SpecialCharsLen = length(plc_RegExp_SpecialChars)
    ...
for (i=0; i<plc_RegExp_SpecialCharsLen; i++)
{  fReplaceChar(plg_RegExp_bufSearch, plc_RegExp_SpecialChars[i] , '\\' plc_RegExp_SpecialChars[i] "")
}

Maybe someone can add other 'special' characters I missed.

-Louie
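
For anyone who wants a self-contained version of this idea, here is a rough sketch that escapes an attribute name in a single pass using a Buffer. The function name escapeRegexpChars and the sample attribute name are invented for illustration, the list of special characters is simply the one from the post above (so it may be incomplete), and the code has not been run against a particular DOORS version.

```dxl
// Hypothetical helper: returns a copy of s in which every regexp special
// character is prefixed with a backslash.
string escapeRegexpChars(string s) {
    // Special characters taken from the list above; possibly incomplete.
    string specials = "\\*+.^$()[]|?"
    Buffer buf = create
    int i, j
    char c
    bool isSpecial

    for (i = 0; i < length(s); i++) {
        c = s[i]
        isSpecial = false
        for (j = 0; j < length(specials); j++) {
            if (specials[j] == c) {
                isSpecial = true
                break
            }
        }
        if (isSpecial) buf += '\\'   // add the escape character first
        buf += c                     // then the original character
    }

    string result = stringOf(buf)
    delete buf
    return result
}

// Usage: build the search pattern from an arbitrary attribute name
string attrName = "_Req?[1]"         // made-up attribute name
string escaped = escapeRegexpChars(attrName)
Regexp re = regexp2 "(<<<)(" escaped ")(=)(>>>)"
print escaped "\n"
```

Because each input character is looked at exactly once, the order of the characters in the specials string does not matter here, unlike a replace-in-place loop where the backslash has to be handled first so that freshly inserted backslashes are not escaped again.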

Re: Special Characters in Regular Expressions
HemlataS - Tue Sep 03 02:43:23 EDT 2013

Maybe the pattern below will help you:

([^;]*)  - This pattern matches any string that does not contain a semicolon

Regards,

Hemlata