Regexp - problem trying to use on telephone number

Hi All,

OK, here is what I have for my regexp:
{cose} const Regexp UL_PARSE_PHONE = regexp2 "2-90-80-9http://-.2-90-90-9http://-.0-90-90-90-90-9.*"
{code}

What I eventually want is a phone number in this form:
Ex. 616-555-1212
For now I will accept a period or a space instead of the dash.
I do not want any non-numeric character where digits should be.
I do not want anything on the end, such as a space, tab, or newline.
This is almost working for me, but I have noticed a few inputs are giving me issues.
If there is a space after one of the dashes, it lets it through.
Spaces on the end get through as well. I do not want either of these allowed.
What am I doing wrong?
I have been tweaking it here and there using different variations. This is the last iteration I have tried. I am hoping somebody can help.
Thanks.
Jerry
SystemAdmin - Thu Aug 04 16:03:39 EDT 2011

Re: Regexp - problem trying to use on telephone number
llandale - Thu Aug 04 17:28:42 EDT 2011

Assuming you did your homework on the restrictions for valid area and city codes, here you go:

string    sRePhone = 
                "^"                           // start of string      -
                "([2-9][0-8][0-9])"           // [match 1] area code  -
                "[-\\. ]"                     // dash-period-or-space -
                "([2-9][0-9][0-9])"           // [match 2] City               -
                "[-\\. ]"                     // dash-period-or-space -
                "([0-9][0-9][0-9][0-9])"      // [match 3] 4-digit phone      -
                "$"                           // end-of-string
Regexp  rePhone = regexp2(sRePhone)
Buffer  bufBuildPhone = create
 
string  BuildGoodPhone(string in_RawPhone)
{     // Build a good US phone formatted string based on the raw string
        // output format is 234-567-8901
        // Raw phone must be almost that, but with space or period instead of the dashes
 
        string  sGoodPhone
 
        if (rePhone in_RawPhone)
        {  bufBuildPhone       = in_RawPhone[match 1]
           bufBuildPhone        += "-"
           bufBuildPhone        += in_RawPhone[match 2]
           bufBuildPhone        += "-"
           bufBuildPhone        += in_RawPhone[match 3]
           sGoodPhone   = stringOf(bufBuildPhone)
        }
        else sGoodPhone = ""
        return(sGoodPhone)
}     // end BuildGoodPhone()    
 
 
 
 
void    TestPhone(string in_RawPhone)
{
        string  GoodPhone = BuildGoodPhone(in_RawPhone)
        print in_RawPhone "\t[" GoodPhone "]\n"
}
 
print "\t" sRePhone "\n"
                // GOOD:
TestPhone("234-567-8901")     // 
TestPhone("234.567-8901")     // uses period
TestPhone("234-567 8901")     // uses space
                // BAD:
TestPhone("234$567 8901")     // $
TestPhone("234..567 8901")    // two periods
TestPhone("234-567 8901 ")    // trailing space
TestPhone("234.567 89011")    // too many digits
TestPhone("134.567 8901")     // area code starts with 1
TestPhone("234.067 8901")     // city code starts with 0
 
delete(bufBuildPhone)


Sorry about the clumsiness but I get far too confused and need to space it out to see it clearly.
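
For anyone who wants to try the same pattern outside DOORS, here is a rough Python sketch of the approach above. It uses Python's `re` module as a stand-in for DXL's Regexp, so small syntax details may differ, and the names `PHONE_RE` and `build_good_phone` are made up for this illustration. One assumption worth flagging: in Python, `$` tolerates a single trailing newline, so the sketch uses `fullmatch` instead of `^...$` to reject it.

```python
import re

# Same pieces as sRePhone, assembled one per line so each can be commented.
# The start/end anchors are supplied by fullmatch() below.
PHONE_RE = re.compile(
    r"([2-9][0-8][0-9])"            # group 1: area code
    r"[-\. ]"                       # dash, period, or space
    r"([2-9][0-9][0-9])"            # group 2: city code
    r"[-\. ]"                       # dash, period, or space
    r"([0-9][0-9][0-9][0-9])"       # group 3: 4-digit number
)

def build_good_phone(raw_phone: str) -> str:
    """Return the phone as 234-567-8901, or "" if raw_phone is not valid."""
    m = PHONE_RE.fullmatch(raw_phone)   # the whole string must match
    if m is None:
        return ""
    return "-".join(m.groups())

if __name__ == "__main__":
    for raw in ["234-567-8901", "234.567-8901", "234-567 8901",
                "234- 567-8901", "234-567 8901 ", "234.567 89011"]:
        print(repr(raw), "->", repr(build_good_phone(raw)))
```

The two cases from the original question (a space after a dash, and a trailing space) both come back as an empty string here, because nothing in the pattern is allowed to absorb an extra character.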

 

  • Louie

 

Re: Regexp - problem trying to use on telephone number
SystemAdmin - Thu Aug 04 19:26:09 EDT 2011

Hi Louie,

Thanks for the reply!
So, do I have to break it up like that?
I am wondering why mine did not work quite the way I wanted.
And yes, I researched the area codes and the whole thing.
I figured I would tackle the US numbers first since they are much easier.
Did mine not work because I did not use the escape sequence, \\, with the period? I tried it both ways. I even dropped the space out of my code example.
Oh, and don't be sorry! It is not clumsy to me. I like to do the same.
I like your example very much! I will have to try it out tomorrow.

I did not show you my other code, mainly because it is garbled up with test code from trying to figure out what was going on.
I have seen other examples that do not use the escape sequence inside the square brackets. This is why I hate IBM's DXL manual: not enough information.
I even tried it with the \\ and I was still getting data I did not want.
If the input does not match the expression, it should be rejected.
And if it matches, it should be OK, but if I had a space after the dash it was still accepted. And I would get trailing spaces too.
But thanks for the good example.
I also did not use the matches. I just tried to use the whole thing.
Most of my numbers are right, so it seemed to work, until the others came up.
Oh well.
Back to the drawing board tomorrow.
Thanks!
Jerry

Re: Regexp - problem trying to use on telephone number
llandale - Fri Aug 05 13:02:20 EDT 2011

If you look at your original post you'll see that the formatting issues make it impossible to reproduce your actual code exactly, so I don't know exactly what was wrong. I'd tease you about using "Preview" but I'm the prime offender there.

No, you don't have to break up the RegExp like I did. I have to break it up since I can juggle fewer balls than most folks, it seems. This is far too much for me:
^([2-9][0-8][0-9])[-\\. ]([2-9][0-9][0-9])[-\\. ]([0-9][0-9][0-9][0-9])$

And breaking it up allows clever comments so I can read it later, and allows it to be more mindfully tweaked. Putting the pattern in a string first and then into the RegExp lets you print the string, which helps debugging.

All the 'symbols' in the RegExp chart in the manual are commands, and must be escaped if you intend to use them literally. So "." means 'any single character except newline' and "\." means 'period character'. Since DXL requires you to escape the escape, it looks like "\\." when coded (but not printed).
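
The difference between "." and "\." can be seen in a small Python sketch. This uses Python's `re` module rather than DXL, so treat it as an illustration of the general regex rule only; the helper name `matches` is made up. The last line shows the same "escape the escape" effect: in an ordinary (non-raw) Python string the backslash has to be doubled too.

```python
import re

ANY_CHAR = re.compile(r"^a.c$")    # unescaped: . matches any one character except newline
LITERAL = re.compile(r"^a\.c$")    # escaped: \. matches only a period

def matches(pattern: re.Pattern, text: str) -> bool:
    """True if the compiled pattern matches the text."""
    return pattern.search(text) is not None

if __name__ == "__main__":
    for text in ["a.c", "abc", "a c"]:
        print(repr(text), matches(ANY_CHAR, text), matches(LITERAL, text))
    # A doubled backslash in a normal string is one backslash in the pattern.
    print("^a\\.c$" == r"^a\.c$")
```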

match 0 is that part of the string that matched the entire regular expression.
Regexp re = regexp2("^abc") // match 0, if matched, will be "abc"
Regexp re = regexp2("^abc.*") // match 0, if matched, will be the entire string (up to an EOL)
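
A quick way to see what "match 0" covers is to print the whole-match text for a few patterns. The sketch below does this with Python's `re` module (not DXL), where `group(0)` plays the role of match 0; the helper `whole_match` is a hypothetical name. It also shows why a trailing ".*" lets extra characters such as trailing spaces through, while an end-of-string anchor rejects them.

```python
import re
from typing import Optional

def whole_match(pattern: str, text: str) -> Optional[str]:
    """Return the text covered by the entire pattern, or None if no match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

if __name__ == "__main__":
    print(repr(whole_match(r"^abc", "abcdef ")))      # only the prefix
    print(repr(whole_match(r"^abc.*", "abcdef ")))    # everything, trailing space included
    print(repr(whole_match(r"^abc.*", "abc 1\nxyz"))) # stops at the end of the line
    print(repr(whole_match(r"^abc$", "abcdef ")))     # anchored: no match
```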

BTW, the following is the RegExp for "any character"; it's 'better' than a period because it includes the NL character, but I don't recall why else.

const string cl_re_strAnyChar = ""
Regexp re = regexp2("^abc" cl_re_strAnyChar"*") // match 0, if matched, will be the entire string
null character (charOf(0)) cannot exist in a 'string'.

  • Louie