CCC imported XML UTF-16 line delimiter mangled

Vince Thyng (13723152) | asked Dec 22 '14, 6:16 p.m.
After importing UTF-16 XML files, the line delimiters are not correct. UTF-8 files appear to be fine. In the original file in ClearCase, a new line looks like:
00 0D 00 0A
but the imported version of the file has:
00 0D 0A 00 0D 0A

It looks as though a byte-oriented (UTF-8-style) line delimiter conversion was run against the UTF-16 files, when a UTF-16-aware conversion should have been used.
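
One way this exact byte pattern could arise is a converter that works on single bytes and rewrites every CR, LF, or CRLF to CRLF. This is a minimal Python sketch of that hypothesis, not confirmed importer behavior; the function name `naive_to_crlf` is made up for illustration.

```python
import re


def naive_to_crlf(data: bytes) -> bytes:
    """Byte-wise normalization (assumed behavior, for illustration only):
    treat every CR, LF, or CRLF byte sequence as a line break and emit CRLF."""
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)


# CR LF encoded as UTF-16 big-endian: 00 0D 00 0A
utf16be_newline = "\r\n".encode("utf-16-be")

# The 0D is followed by 00, so it is seen as a lone CR; the 0A is seen as a lone LF.
mangled = naive_to_crlf(utf16be_newline)

print(utf16be_newline.hex(" ").upper())  # 00 0D 00 0A
print(mangled.hex(" ").upper())          # 00 0D 0A 00 0D 0A
```

The output matches the bytes seen after the import, which is consistent with (but does not prove) a single-byte line delimiter conversion having been applied to UTF-16 data.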

I am looking for a better understanding of how this came about. Does the import check whether the file is UTF-16? Which part of the process exactly does this, export or import? I am thinking we could run a byte-oriented (UTF-8) tool against the files, looking for 00 0D 0A 00 0D 0A and shrinking it to 00 0D 00 0A.
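
A minimal Python sketch of such a repair tool, assuming the only damage is the six-byte pattern above and that the files fit in memory. The names `repair_utf16_newlines` and `repair_file` are hypothetical, and the script should be tried on copies first.

```python
MANGLED = bytes.fromhex("00 0D 0A 00 0D 0A")   # what the imported files contain
ORIGINAL = bytes.fromhex("00 0D 00 0A")        # what the ClearCase versions contain


def repair_utf16_newlines(data: bytes) -> bytes:
    """Replace every mangled line delimiter with the original four bytes."""
    return data.replace(MANGLED, ORIGINAL)


def repair_file(path: str) -> int:
    """Repair one file in place and return how many delimiters were fixed."""
    with open(path, "rb") as f:
        data = f.read()
    count = data.count(MANGLED)
    if count:
        with open(path, "wb") as f:
            f.write(repair_utf16_newlines(data))
    return count
```

This only handles CR LF pairs. If a file also contained a bare CR or bare LF, those may have been altered differently and would need their own patterns.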

2 answers

Geoffrey Clemm (30.0k23035) | answered Dec 22 '14, 11:58 p.m.
I'm not aware of any way to specify the encoding during an import.   If that is right, you'll want to import the files without a line-ending change, and then do the line-ending changes with a script after the import is completed.

Vince Thyng commented Dec 23 '14, 12:38 p.m.

I saw some mention of a LINE_DELIMITER custom field you can add to your stream or view in ClearCase for syncs between CC and RTC. Do you know if ccc export ccase performs any line delimiter translation? The change is apparent on the very first versions to arrive in RTC, but it is not there in CC, so I am thinking either export or import performed the change.

Geoffrey Clemm commented Dec 24 '14, 2:20 a.m.

Yes, if you specify that the line delimiter in the RTC database should be different from that in the CC VOB, then the importer will attempt to modify the line endings, so you should declare that the line endings are the same.
To do so, after creating a synchronization stream, but before synchronizing any files (e.g., cancel out of the "select files to synchronize" wizard), right-click on the new sync stream in the ClearCase Synchronized Stream view, and select Properties. Then select "ClearCase Provider Properties", and modify the LINE_DELIMITER_WORKSPACE property so that it matches the LINE_DELIMITER property.

Vince Thyng commented Jan 05 '15, 6:30 p.m.

Since this was a one-time import, does this still apply? I am not sure where the ClearCase Provider Properties would be in the case where I use ccc import ccase just once.

Geoffrey Clemm commented Jan 05 '15, 9:24 p.m.

I haven't used the "ccc export/import ccase" version importer tool, so I've forwarded this question to the version importer dev team for comment.

Yoshio Horiuchi (611) | answered Jan 06 '15, 4:00 a.m.
I suspect this is a file encoding issue.
Please check the Character Encoding property of the files in RTC. If it is not UTF-16LE, you may need to update the exported data to change the file encoding before importing.

To update the file encoding, the update encoding command can be used:
'ccc update ccase encoding -d {data dir} -e UTF-16LE ...'
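
Before choosing the value for -e, it may help to confirm which UTF-16 byte order the files actually use. The 00 0D 00 0A pattern quoted in the question is what a big-endian CR LF looks like, although a hex dump read from an odd offset of a little-endian file could show the same bytes. A minimal Python sketch, assuming the files are XML (so they start with a byte order mark or with '<'); the function name `guess_utf16_byte_order` is hypothetical.

```python
import codecs


def guess_utf16_byte_order(data: bytes) -> str:
    """Guess the UTF-16 byte order from the first bytes of an XML file."""
    # A byte order mark, if present, is the most reliable indicator.
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "UTF-16LE"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "UTF-16BE"
    # No BOM: fall back to how the leading '<' of the XML is laid out.
    if data[:2] == b"<\x00":
        return "UTF-16LE"
    if data[:2] == b"\x00<":
        return "UTF-16BE"
    return "unknown"


# Example: read only the first few bytes of a file (path is a placeholder).
# with open("some_file.xml", "rb") as f:
#     print(guess_utf16_byte_order(f.read(4)))
```

If this reports UTF-16BE for the ClearCase versions, that is worth raising before settling on the encoding to pass to the update command.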

The exporter detects files that have unmappable characters in your system locale, and displays the CRRTC4203W 'unencodable characters' warning. If the warning is displayed, you can find the files to be updated in the logs\unencodableTextFileVersions.txt file.

Vince Thyng commented Jan 07 '15, 3:57 p.m.

There were several unencodable character errors during export. At this point I have already written a tool to correct the files. The files' encoding may still be set incorrectly, though, so I will work with the users to determine exactly how it should be set. Thank you!
