Remove or replace control characters/linefeed in object text before Output

I have a dxl script that reads and downloads information from DOORS modules. The output consists of object text and attributes as a tab-delimited text file.

Sometimes the contents of the object or attribute has a linefeed or other control character. The linefeed causes problems when I read the tab-delimited text file into Excel; it causes the location of the data records to shift to the next column or adds another line in the middle of the data record.

Is there a way to test for these control characters in the object text or attribute whent the data is read, remove or replace the control character with a space or "~", and then output the data into the tab-delimited text file?

Thank you in advance.

Willie
williamyjk - Thu Nov 12 13:16:34 EST 2009

Re: Remove or replace control characters/linefeed in object text before Output
doors36677 - Thu Nov 12 14:03:14 EST 2009

Probably replaceing the characters is not the optimum way to go -- the best approach is to modify your dxl to output a file that meets the specification for a tab delimited file.

Re: Remove or replace control characters/linefeed in object text before Output
williamyjk - Thu Nov 12 14:23:15 EST 2009

doors36677 - Thu Nov 12 14:03:14 EST 2009
Probably replaceing the characters is not the optimum way to go -- the best approach is to modify your dxl to output a file that meets the specification for a tab delimited file.

The tab delimited text file output is easily read into Excel since it interprets the tabs as a seperate data record (a new column on the row). However, if there are linefeed characters in the object text, that record is broken into two rows instead of one row.

Can you please expand on what it means to "meet the specification for a tab delimited file" in your reply?

Thank you.

Re: Remove or replace control characters/linefeed in object text before Out
kbmurphy - Thu Nov 12 14:41:31 EST 2009

williamyjk - Thu Nov 12 14:23:15 EST 2009
The tab delimited text file output is easily read into Excel since it interprets the tabs as a seperate data record (a new column on the row). However, if there are linefeed characters in the object text, that record is broken into two rows instead of one row.

Can you please expand on what it means to "meet the specification for a tab delimited file" in your reply?

Thank you.

Surround all text being exported to the tab-separated value file with quotation marks.

The CSV/TSV format does not allow line breaks that are not surrounded by quotation marks.

http://en.wikipedia.org/wiki/Comma-separated_values

Re: Remove or replace control characters/linefeed in object text before Out
williamyjk - Thu Nov 12 15:27:38 EST 2009

kbmurphy - Thu Nov 12 14:41:31 EST 2009
Surround all text being exported to the tab-separated value file with quotation marks.

The CSV/TSV format does not allow line breaks that are not surrounded by quotation marks.

http://en.wikipedia.org/wiki/Comma-separated_values

I tried your suggestion by creating a CSV file with a linefeed in one of the records:

1997,Ford,E350,"Go get one now
they are going fast"

When I read this text file into Excel using the "Text Import Wizard", the record with the linefeed is broken into two rows.

Re: Remove or replace control characters/linefeed in object text before Out
doors36677 - Thu Nov 12 15:46:35 EST 2009

williamyjk - Thu Nov 12 15:27:38 EST 2009
I tried your suggestion by creating a CSV file with a linefeed in one of the records:

1997,Ford,E350,"Go get one now
they are going fast"

When I read this text file into Excel using the "Text Import Wizard", the record with the linefeed is broken into two rows.

Don't use the import tool.

Save file with a .csv suffix then double click to open the file.

Re: Remove or replace control characters/linefeed in object text before Output
doors36677 - Thu Nov 12 15:54:56 EST 2009

One other tid bit of info you will need -- with Telelogic switch utf-8 encoding you will have to change your dxl file output to ansi encoding. Excel only recognizes ansi encoding and screws up all the rest.

Re: Remove or replace control characters/linefeed in object text before Out
williamyjk - Thu Nov 12 16:08:18 EST 2009

doors36677 - Thu Nov 12 15:46:35 EST 2009
Don't use the import tool.

Save file with a .csv suffix then double click to open the file.

The linefeed still breaks the record into two rows (try the attached file).
Attachments

attachment_14341324_CVS_Test.cvs

Re: Remove or replace control characters/linefeed in object text before Out
doors36677 - Thu Nov 12 16:08:37 EST 2009

williamyjk - Thu Nov 12 16:08:18 EST 2009
The linefeed still breaks the record into two rows (try the attached file).

You have the wrong extension on the file. cvs is a canvas file csv stands for comma delimited file

after changing to csv -- file works fine on my pc.

Re: Remove or replace control characters/linefeed in object text before Out
williamyjk - Thu Nov 12 17:12:03 EST 2009

doors36677 - Thu Nov 12 16:08:37 EST 2009
You have the wrong extension on the file. cvs is a canvas file csv stands for comma delimited file

after changing to csv -- file works fine on my pc.

I changed the extension to CSV (not CVS) and if there is a linefeed within text surrounded by double quotes, the data is not broken into two lines.

However, if there is a comma within the text surrounded by double quotes, the data is broken into two cells (see attached test file).

Any suggestions?
Attachments

attachment_14347186_CSV_Test2.csv

Re: Remove or replace control characters/linefeed in object text before Output
williamyjk - Thu Nov 12 17:14:13 EST 2009

doors36677 - Thu Nov 12 15:54:56 EST 2009
One other tid bit of info you will need -- with Telelogic switch utf-8 encoding you will have to change your dxl file output to ansi encoding. Excel only recognizes ansi encoding and screws up all the rest.

Is there a dxl command to output as ansi encoding?

Re: Remove or replace control characters/linefeed in object text before Output
doors36677 - Thu Nov 12 17:16:38 EST 2009

williamyjk - Thu Nov 12 17:14:13 EST 2009
Is there a dxl command to output as ansi encoding?

Look in help file for stream options

Re: Remove or replace control characters/linefeed in object text before Out
SystemAdmin - Fri Nov 13 05:22:04 EST 2009

williamyjk - Thu Nov 12 17:12:03 EST 2009
I changed the extension to CSV (not CVS) and if there is a linefeed within text surrounded by double quotes, the data is not broken into two lines.

However, if there is a comma within the text surrounded by double quotes, the data is broken into two cells (see attached test file).

Any suggestions?

Yes, Excel does not seem to like this :-(

That's why I prefere to export to Excel use that for updates etc and if this is to be reimported to DOORS save the Excel document (one worksheet at the time) as tab delimited and import that. DOORS does seem to handle line breaks and delimiters inside quotes just fine.

/Kristian

Re: Remove or replace control characters/linefeed in object text before Out
SystemAdmin - Fri Nov 13 05:38:45 EST 2009

SystemAdmin - Fri Nov 13 05:22:04 EST 2009
Yes, Excel does not seem to like this :-(

That's why I prefere to export to Excel use that for updates etc and if this is to be reimported to DOORS save the Excel document (one worksheet at the time) as tab delimited and import that. DOORS does seem to handle line breaks and delimiters inside quotes just fine.

/Kristian

Yes, Excel does not seem to work like this :-(

Re: Remove or replace control characters/linefeed in object text before Output
llandale - Fri Nov 13 12:12:55 EST 2009

Let me code up something without testing to do character replaces such as you suggest.

void    ReplaceCharCodes(Buffer &bufInput, string Codes, char charReplace)
{       // Replace all instances of any of the characters in string Codes with the single charReplace
 
 
    if (null bufInput or null Codes)    return  // nothing to do
        if (null charReplace)                   return  // I believe inserting null will terminate the buffer
 
        int     LenCode = length(Codes),
                i,                      // Pointer into Codes
                LocNew, LocOld          // Found positions of Code in Buffer
        char    ch
 
 
        for (i=0; i<LenCode; i++)
        {     // For every character in Codes, replace all instances in the Buffer
           ch           = Codes[i]
           LocNew       = 0
           LocOld       = -1
           while(LocNew >= 0)
           {  LocNew = contains(bufInput, Ch, LocOld+1)
              if (LocNew >= 0) bufInput[LocNew] = charReplace
              LocOld    = LocNew
                 print "Loop, i,LocOld, LocNew = " i "\t" LocOld "\t" LocNew "\n"
           }
        }     // end for all Codes to replace
}     // end ReplaceCharCodes(Buffer)
 
Buffer  gl_ReplaceCharCodes = create()   // local to following function:
string  ReplaceCharCodes(string InString, string Codes, char charReplace)
{     // Overloaded, replaces in a string
 
        gl_ReplaceCharCodes = InString
        ReplaceCharCodes(gl_ReplaceCharCodes, Codes, charReplace)
        return(stringOf(gl_ReplaceCharCodes))
}     // end overloaded ReplaceCharCodes(string)
 
// Test:
string Codes  = "\n\t" ascii(212) "\r"
string Before = "ABC\tEFG\n\r" ascii(212) "hij\n"
string After  = ReplaceCharCodes(Before, Codes, '~')
 
print "Before = [[" Before "]] end Before\n"
print "After  = [[" After  "]] end After\n"

 


Its trickier if you want to delete the codes out of the sting. You need to plow through each character in the buffer, copying it to an out buffer if its not one of the codes. You can then delete the input buffer then assign the input to the output, since that forms an alias.

 

 

 

  • Louie

 

 

Re: Remove or replace control characters/linefeed in object text before Output
williamyjk - Tue Nov 17 21:20:57 EST 2009

llandale - Fri Nov 13 12:12:55 EST 2009

Let me code up something without testing to do character replaces such as you suggest.

void    ReplaceCharCodes(Buffer &bufInput, string Codes, char charReplace)
{       // Replace all instances of any of the characters in string Codes with the single charReplace
 
 
    if (null bufInput or null Codes)    return  // nothing to do
        if (null charReplace)                   return  // I believe inserting null will terminate the buffer
 
        int     LenCode = length(Codes),
                i,                      // Pointer into Codes
                LocNew, LocOld          // Found positions of Code in Buffer
        char    ch
 
 
        for (i=0; i<LenCode; i++)
        {     // For every character in Codes, replace all instances in the Buffer
           ch           = Codes[i]
           LocNew       = 0
           LocOld       = -1
           while(LocNew >= 0)
           {  LocNew = contains(bufInput, Ch, LocOld+1)
              if (LocNew >= 0) bufInput[LocNew] = charReplace
              LocOld    = LocNew
                 print "Loop, i,LocOld, LocNew = " i "\t" LocOld "\t" LocNew "\n"
           }
        }     // end for all Codes to replace
}     // end ReplaceCharCodes(Buffer)
 
Buffer  gl_ReplaceCharCodes = create()   // local to following function:
string  ReplaceCharCodes(string InString, string Codes, char charReplace)
{     // Overloaded, replaces in a string
 
        gl_ReplaceCharCodes = InString
        ReplaceCharCodes(gl_ReplaceCharCodes, Codes, charReplace)
        return(stringOf(gl_ReplaceCharCodes))
}     // end overloaded ReplaceCharCodes(string)
 
// Test:
string Codes  = "\n\t" ascii(212) "\r"
string Before = "ABC\tEFG\n\r" ascii(212) "hij\n"
string After  = ReplaceCharCodes(Before, Codes, '~')
 
print "Before = [[" Before "]] end Before\n"
print "After  = [[" After  "]] end After\n"

 


Its trickier if you want to delete the codes out of the sting. You need to plow through each character in the buffer, copying it to an out buffer if its not one of the codes. You can then delete the input buffer then assign the input to the output, since that forms an alias.

 

 

 

  • Louie

 

 

Hi Louie,

The dxl worked excellently.

Thank you.