Remove strikethrough

Hi all,

I have an object which has a mixture of Object Text and OLEs that currently has strikethrough applied.  I'd like to remove the strikethrough but keep all the remaining RTF and obviously the OLE too!

I've managed it for Object Text and Object Headings (i.e. no OLEs) by using RegExp to find the "\\strike" and remove it.  The "\\strike0" is left in the string but that appears (?!?) to be ok and successfully removes the strikethrough whilst leaving the remaining RTF as is.

Where the Object Text is a mixture of text and OLEs, it seems to be more complicated...  It identifies the beginning of the string ok but cuts it short midway through leaving me with an object containing a little text but nothing from the first OLE onwards.

A snapshot of the code:

Module m = current
Object o = current

string s = ""
string Text = ""

// Capture the text before and after a "\strike" code that is not "\strike0"
Regexp StrikeOutText = regexp2 "^(.*)\\\\strike[^0](.*)"

if (oleIsObject o) {
    // Rich text of the object, including the embedded OLE data
    Text = richTextFragment(richTextWithOle(o."Object Text"))
    s = richTextWithOle(o."Object Text")

    if (StrikeOutText Text) {
        print "Match start of line: " Text[match 1] "\n"
        print "Match strikethru: " Text[match 2] "\n"
        print "Match end of line: " Text[match 3] "\n"
        o."Object Text" = richText "{\\rtf1" Text[match 1] Text[match 2] Text[match 3] "}"
    } else {
        print "doesn't match\n"
    }
}

Does anyone have any ideas where I'm going wrong??

Thanks for your help
s.


lumish82 - Tue Dec 16 10:10:57 EST 2014

Re: Remove strikethrough
llandale - Thu Dec 18 18:09:51 EST 2014

RTF tag extraction is VERY tricky.  In its basic form you search for "\strike " (with a trailing space), or for "\strike" followed directly by another valid RTF code, like "\strike\ul".

But wait! 

  • "\\strike " is NOT a valid RTF code and will not add strikethroughs.
    It is plain text: a literal backslash, the word "strike", and a space.
  • "\\\strike " does contain a valid code: a literal backslash followed by "\strike ".
  • "\strikeout" is an invalid RTF code.
  • "\strike\invalid " I don't know what that will do.

You also have the problem of needing to ignore any such \strike that is inside the body of an OLE object.

Yuuuuck, good luck simulating an RTF code parser accurately.
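
The cases above can be sketched as a small scanner. This is only an illustration in Python, not DXL, and not a full RTF parser. It assumes that embedded objects sit in groups opened by "\object" or "\pict" (an assumption about the RTF, not something confirmed in this thread), and the name remove_strike is made up for the example. Rather than deleting "\strike", it rewrites it to "\strike0" so that the delimiter after the code stays intact.

```python
import re

# One RTF token: an escaped symbol (\\, \{, \}), a control word with an
# optional numeric parameter and optional delimiter space, or a brace.
TOKEN = re.compile(r"\\[\\{}]|\\([a-zA-Z]+)(-?\d+)? ?|[{}]")


def remove_strike(rtf, skip_groups=("object", "pict")):
    """Turn every \\strike outside skipped groups into \\strike0."""
    out = []
    depth = 0          # current brace nesting depth
    skip_depth = None  # depth of the group we are skipping, if any
    pos = 0
    for m in TOKEN.finditer(rtf):
        out.append(rtf[pos:m.start()])  # plain text between tokens
        pos = m.end()
        tok = m.group(0)
        if tok == "{":
            depth += 1
        elif tok == "}":
            if skip_depth == depth:
                skip_depth = None       # the skipped group ends here
            depth -= 1
        elif skip_depth is None:
            word, param = m.group(1), m.group(2)
            if word in skip_groups:
                skip_depth = depth      # assumed OLE/picture group
            elif word == "strike" and param != "0":
                # keep the delimiter space if the original had one
                tok = "\\strike0" + (" " if tok.endswith(" ") else "")
        out.append(tok)
    out.append(rtf[pos:])
    return "".join(out)


print(remove_strike(r"{\rtf1 \strike gone\strike0  kept}"))
```

Because "\\" is consumed as one escaped symbol before control words are considered, "\\strike " is left alone while "\\\strike " is treated as a literal backslash followed by a real code. "\strikeout" is read as a different control word and is not touched.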

You could I suppose rebuild the entire string, something like this. 

I think you end up losing some clever markup that is not recognized by the DOORS rt "characteristics".

RichTextParagraph rtp
RichText rt
Buffer bufOut = create
for rtp in RichTextString do
{  for rt in rtp do
   {  // I've forgotten the details here: look at the rt characteristics
      // and copy all of them except strikethrough into bufOut
   }
   // copy all the paragraph (rtp) characteristics
}
// set the attribute from bufOut

You may have better luck using OLE automation: send the entire text to an empty MS Word document, tell Word to select all and remove the strikethrough, then copy the result back and insert it into the object.

Maybe someone has some ideas.

-Louie

Re: Remove strikethrough
llandale - Thu Dec 18 18:14:03 EST 2014

But you asked a question.  This is a crude response, as I'm weak with RegExp.  But I wonder:

  • You need parentheses around the "strike" part of your regexp:
    Regexp StrikeOutText = regexp2 "^(.*)(\\\\strike[^0])(.*)"
  • I think you need to look up "greedy" regular expressions on Wikipedia.  I think you want "non-greedy" but I don't recall the details.
  • There is a problem when the string has embedded EOLs: RegExp tends to stop when it finds one.
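
The greedy and EOL points can be illustrated outside DOORS. The sketch below uses Python's re module, whose syntax and behaviour may differ from the DXL regexp engine, and the sample string is made up. It shows that a greedy first group runs up to the last "\strike", a non-greedy one stops at the first, and that without a "dot matches newline" option the final group stops at the first EOL.

```python
import re

# Hypothetical sample: two struck-out runs separated by a newline.
rtf = ("{\\rtf1 keep \\strike one\\strike0  mid\n"
       "\\strike two\\strike0  end}")

pattern_greedy = r"^(.*)(\\strike[^0])(.*)"
pattern_lazy = r"^(.*?)(\\strike[^0])(.*)"

# DOTALL lets "." match newlines as well.
greedy = re.match(pattern_greedy, rtf, re.DOTALL)
lazy = re.match(pattern_lazy, rtf, re.DOTALL)
# Without DOTALL, "." stops at the first newline.
no_dotall = re.match(pattern_lazy, rtf)

print("greedy group 1:  ", repr(greedy.group(1)))
print("lazy group 1:    ", repr(lazy.group(1)))
print("no DOTALL tail:  ", repr(no_dotall.group(3)))
```

Rebuilding the text from the groups of the last match would drop everything after the first newline, which resembles the truncation described in the original question. Whether that is the actual cause in DOORS is not confirmed here.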

-Louie

Re: Remove strikethrough
lumish82 - Fri Dec 19 06:36:40 EST 2014

Hi Louie, thanks for your reply.

I'll read into greedy regexp and hopefully that will help.  From the quick look I did just now, I think you're right that I ideally want the regexp to be non-greedy.

I'll definitely put a buffer in there too - I keep forgetting that's a more efficient/safer way of doing regexp.  Plus look into the rt.characteristics…

But for now a quick reply to say thanks for giving me something to look into and I'll hopefully be able to post some progress after the New Year.

Thanks, Lumish.

Re: Remove strikethrough
Wolfgang Uhr - Tue Jan 13 06:49:47 EST 2015

Hello

RTF is something like an HTML owned by Microsoft. The structure and its use are similar. You cannot use regular expressions to parse XML or HTML, and for the same reason you cannot use regular expressions for RTF (http://stackoverflow.com/questions/8577060/why-is-it-such-a-bad-idea-to-parse-xml-with-regex).

If you want to manipulate RTF data you have to use something like an "RTF DOM parser" and no, I currently do not know of a usable product.

Best regards

Wolfgang