DXL Script for a batch find and replace

Hello,

Is there a way to do a batch find and replace for multiple words using a DXL script?

I need to replace every reference term with itself in all caps (e.g. cat --> CAT) in every formal module.  Is it possible to create a DXL script that could automate this?  Would I be able to create a script listing every term and its replacement and then run that script on each module?

Thanks!


cjjohnson - Tue Sep 23 16:00:56 EDT 2014

Re: DXL Script for a batch find and replace
Wolfgang Uhr - Wed Sep 24 04:05:06 EDT 2014

Hi

> I need to replace every reference term with itself just in all caps ( ie. cat --> CAT) in every formal module. Is it possible to create a DXL script that could automate this?

Yes, but it takes about a week or two, and I think it is impossible to post such a script because it needs individual fine-tuning. For example, with a regular expression like "^(.*)([A-Z][A-Z][A-Z]*)(.*)$" you can find all uppercase words of two or more characters. If you catalog these words, you will find a lot of words that need different handling.

So:

  • Find the words
  • Catalog them
  • Develop the rules for different sets of words
  • Make sure that all exceptions are handled correctly
  • And then - only then - perform the transformation

So you are not asking for a script. You are asking for someone to do your job.

> Would I be able to create a script with every term and its replacement and then run the script on each module?

Yes, it would be possible.

Best regards

Wolfgang

Re: DXL Script for a batch find and replace
cjjohnson - Wed Sep 24 11:25:15 EDT 2014

Thanks.  I was asking to see whether it was even feasible.

I was considering creating a spreadsheet with all the terms, one column with the terms and another with the all-caps versions I'd like to replace them with (to avoid hard-coding the terms in DXL), and then reading that file from DXL.  Possibly turning the columns into two separate arrays, stepping through the first to search for each term, and using the array position to pick the all-caps word to replace it with.

i.e.

Term    All Caps
cat     CAT
dog     DOG
bug     BUG
fish    FISH
bird    BIRD

  1. termArray[0] = 'cat'
  2. search for 'cat' in the module
  3. find 'cat'
  4. replace with capArray[0] = 'CAT'
  5. step to the next position in the array

repeat process at termArray[1] = 'dog' and capArray[1] = 'DOG' ... and so on ...

I know it is possible to import an Excel file and create attributes, but is it possible to turn a column in Excel into an array?  Possibly create some temp buffers and new arrays to step through and perform the search and replace?

 

Thanks!

 

Re: DXL Script for a batch find and replace
llandale - Wed Sep 24 14:44:54 EDT 2014

I develop this sort of script in this (inside out) order:

  • Get it to work for some single value
  • Get it to work for some single object
  • Get it to work for all objects in a single module
  • Get it to work for all relevant modules in a folder or project
  • Get it to work with a nice GUI
  • Get it to work with a Batch script.

In any event, this is extremely difficult:

  • You have a serious rich text vs. raw text problem here.  It is VERY difficult to replace raw text inside rich text (that is, replace "cat" with "CAT" while keeping the formatting).  My example presumes you are erasing all rich text, which is a bad presumption.
  • If you are replacing mis-matched strings, then you need to check for substring matches.  e.g. replacing "ABC" with "ABCD" means you want to ignore "ABCD"  and don't want to end up with "ABCDD".
  • You need to decide if case is insensitive: does "Cat" match?  I'll assume no, but this can be overcome as well.
  • You want to match full words:  "catsup" should not match.
  • Imperfectly: a "word" is surrounded by non-alphabetic characters.  This definition is 98% good but 2% imperfect.
  • I think it best if you convert your "from" list into regular expressions, and store them.
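
The "word is surrounded by non-alphabetic characters" rule from the list above can be sketched without regular expressions. This is only an illustration: IsWholeWord is a hypothetical helper, not something from DOORS or from this thread, and it assumes the caller already knows the offsets of a candidate match.

```dxl
// Hypothetical helper: true when in_String[in_Start .. in_End] is bounded
// on both sides by a non-alphabetic character or by the string boundary.
bool IsWholeWord(string in_String, int in_Start, int in_End)
{
    if (in_Start > 0)
    {   // a letter directly before the candidate means it is part of a longer word
        if (isalpha(in_String[in_Start - 1])) return(false)
    }
    if (in_End < length(in_String) - 1)
    {   // a letter directly after the candidate means it is part of a longer word
        if (isalpha(in_String[in_End + 1])) return(false)
    }
    return(true)
}

// "cat" at offsets 2..4 sits between two spaces
if (IsWholeWord("a cat sat", 2, 4)) print "cat is a whole word\n"

// "cat" at offsets 0..2 is followed by the letter "s"
if (!IsWholeWord("catsup", 0, 2)) print "cat is only part of catsup\n"
```

This shares the 2% imperfection mentioned above: digits, hyphens and apostrophes all count as word boundaries.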

I'm thinking you can build a Regexp like this:

  • Regexp  MakeRegexp(string in_String)
  • {  // Build a Regexp that matches in_String only as a whole word.
  •    // Assumes in_String contains no regexp metacharacters.
  •    Buffer bufTemp = create()
  •    Regexp ret_re
  •                 // [1] Stuff to ignore before the string:
  •    bufTemp += "("            // start of "match 1"
  •    bufTemp += "^"            // either [a] start of string
  •    bufTemp += "|"            // .. or
  •    bufTemp += ".*"           // [b].. zero or more any characters
  •    bufTemp += "[^a-zA-Z]+"   // .. followed by one or more non-alpha characters
  •    bufTemp += ")"            // end of "match 1"
  •                // [2] The string we are looking for
  •    bufTemp += "("            // start of "match 2"
  •    bufTemp += in_String      // find the string itself (not a character class)
  •    bufTemp += ")"            // end of "match 2"
  •               // [3] Stuff to ignore after string; but still needs to be parsed.
  •    bufTemp += "("            // start of "match 3"
  •    bufTemp += "[^a-zA-Z]+"   // either [a] one or more non-alpha characters
  •    bufTemp += ".*"           // .. followed by zero or more any characters
  •    bufTemp += "|"            // .. or
  •    bufTemp += "$"            // [b] end of string
  •    bufTemp += ")"            // end of "match 3"
  •    print "MakeRegexp:    [" in_String "]\t[" tempStringOf(bufTemp) "]\n"
  •    ret_re = regexp2(stringOf(bufTemp))   // stringOf, so the pattern outlives the Buffer
  •    delete(bufTemp)
  •    return(ret_re)
  • }   // end MakeRegexp()
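
For reference, the pattern that a builder like this is aiming for can also be written as one string. The sketch below hard-codes the term "cat" and assumes that regexp2 accepts alternation with ^ and $ inside groups; the test sentence is made up.

```dxl
// Assumed target pattern for the term "cat":
//   match 1 = text before the term, match 2 = the term, match 3 = text after it
Regexp reCat = regexp2 "(^|.*[^a-zA-Z]+)(cat)([^a-zA-Z]+.*|$)"
string sTest = "The cat ate the catsup."
int    iFirst, iLast

if (reCat sTest)
{   // start/end return the offsets of the first and last character of a group
    iFirst = start 2
    iLast  = end 2
    print "found [" sTest[match 2] "] at offsets " iFirst " to " iLast "\n"
}
else
{   print "no whole-word match\n"
}
```

Note that "catsup" should not count as a match, because the "cat" inside it is followed by a letter.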

That was pretty painful for me also!  I have to build things this way since for some reason I cannot visualize more than 7 characters at a time.  Anyway, maybe we can read the File:

  • Skip  skpTerms = createString()  // KEY: the "To" string; DATA: the Regexp built from the "From" string
  • for each line in the file
  • {   From = 1st column    // "cat"
  •     reFrom = MakeRegexp(From)
  •     To = 2nd column      // "CAT"
  •     if (length(From) != length(To)) then error   // We can overcome this, but not for now
  •     else put(skpTerms, To, reFrom)
  • }
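
A runnable sketch of the file loop, under loud assumptions: the spreadsheet has been saved as tab-delimited text with one "from<TAB>to" pair per line, the file path is hypothetical, and skpPairs is a made-up name. To keep it self-contained it stores the plain "to" strings; the put() call is where MakeRegexp(sFrom) would go instead.

```dxl
// Assumption: C:\temp\terms.txt is a tab-delimited export of the two columns.
Skip   skpPairs = createString()                    // KEY: "from" term; DATA: "to" term
Regexp rePair   = regexp2 "^([^\t]+)\t([^\t]+)$"    // exactly two tab-separated columns
Stream stmIn    = read "C:\\temp\\terms.txt"
string sLine, sFrom, sTo

while (true)
{
    stmIn >> sLine                       // read the next line
    if (end of stmIn) break
    if (rePair sLine)
    {   sFrom = sLine[match 1]
        sTo   = sLine[match 2]
        if (length(sFrom) != length(sTo))
        {   print "Skipping [" sFrom "]: lengths differ\n"
        }
        else
        {   put(skpPairs, sFrom, sTo)
        }
    }                                    // blank or malformed lines are ignored
}
close stmIn

// Show what was loaded
for sTo in skpPairs do
{   sFrom = (string key skpPairs)
    print sFrom " -> " sTo "\n"
}
```

A header row such as "Term / All Caps" is dropped by the length check here only because its two cells differ in length; a real script should skip it explicitly.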

Now when we want to match something:

  • string DoReplaces(string in_String)
  • {    // Replace all desired strings. Works on plain text only.
  •     Regexp reFrom
  •     string To, Before
  •     int    iStart, iEnd
  •     string ret_String = in_String
  •     for reFrom in skpTerms do
  •     {  To = (string key skpTerms)
  •         if (reFrom To) continue    // "To" itself would match again: skip, or we loop forever
  •         while (reFrom ret_String)
  •         {   // match found; "match 2" is the "from" term
  •             iStart = start 2
  •             iEnd   = end 2
  •             Before = ""
  •             if (iStart > 0) Before = ret_String[0:iStart - 1]   // stuff before "from"
  •             ret_String = Before To ret_String[iEnd + 1:]        // .. "To" .. stuff after "from"
  •         }    // end dealing with this To string
  •     }        // end dealing with all To strings
  •     print "DoReplaces; From/To: [" in_String "]\t[" ret_String "]\n"
  •     return(ret_String)
  • }   // end DoReplaces()
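
Applying this to every object in one module could then look like the sketch below. It assumes DoReplaces() and a populated skpTerms are already defined, that the current module is open for edit, and that losing rich-text formatting in "Object Text" is acceptable (the bad presumption noted earlier).

```dxl
// Sketch only: run against a copy of the module first.
Module modCur = current
Object obj
string sOld, sNew

for obj in modCur do
{
    sOld = obj."Object Text"            // plain-text value; rich-text markup is not included
    sNew = DoReplaces(sOld)
    if (sNew != sOld)
    {   obj."Object Text" = sNew        // overwrites the attribute with plain text
    }
}
```

Writing only when the text actually changed avoids touching, and adding history to, objects that contain none of the terms.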

There surely must be a "Buffer" solution to the above.

I think we have a problem with strings containing EOLs.

Yup, difficult.

-Louie