CCC imported XML UTF-16 line delimiter mangled
The original file has the line delimiter:
00 0D 00 0A
but the imported version of the file has:
00 0D 0A 00 0D 0A
It looks as though a byte-oriented (UTF-8 style) line delimiter conversion was run against the UTF-16 files, when a UTF-16-aware conversion should have been used.
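To illustrate the suspicion above, here is a minimal Python sketch of how a byte-oriented normalizer could produce exactly these bytes. The function `naive_to_crlf` is hypothetical and is not taken from the ccc tooling; it only assumes a converter that treats every lone CR or LF byte as a line ending and rewrites it as CR LF, without knowing that each UTF-16 character occupies two bytes.

```python
import re

def naive_to_crlf(data: bytes) -> bytes:
    """Hypothetical byte-oriented normalizer (not the real importer code).

    Rewrites every CRLF, lone CR, or lone LF *byte* as CR LF, ignoring the
    fact that UTF-16 stores each character in two bytes.
    """
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)

# "a", CR, LF, "b" encoded as big-endian UTF-16 without a BOM.
original = "a\r\nb".encode("utf-16-be")
mangled = naive_to_crlf(original)

print(original.hex(" "))  # 00 61 00 0d 00 0a 00 62
print(mangled.hex(" "))   # 00 61 00 0d 0a 00 0d 0a 00 62
```

In the UTF-16 data the 0D and 0A bytes are separated by a 00 byte, so the normalizer sees a lone CR and a lone LF and expands each of them, which turns 00 0D 00 0A into 00 0D 0A 00 0D 0A.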
I am looking for a better understanding of how this came about. Does the import check whether the file is UTF-16? Which part of the process exactly does this, the export or the import? I am thinking we could run a UTF-8 (byte-oriented) tool against the files that looks for 00 0D 0A 00 0D 0A and shrinks it to 00 0D 00 0A.
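A minimal sketch of the repair idea described above, in Python. The names `repair_delimiters` and `repair_file` are made up for illustration, and the byte patterns are the ones quoted in this question. The sketch works on raw bytes because the extra bytes shift the two-byte alignment, so a mangled file may no longer decode as UTF-16. It also assumes the six-byte pattern never occurs as legitimate content, so it should only be run on copies of the files.

```python
from typing import Tuple

# Byte patterns as reported in the question.
MANGLED = bytes.fromhex("00 0D 0A 00 0D 0A")
FIXED = bytes.fromhex("00 0D 00 0A")

def repair_delimiters(data: bytes) -> Tuple[bytes, int]:
    """Return the repaired bytes and the number of delimiters replaced."""
    count = data.count(MANGLED)
    return data.replace(MANGLED, FIXED), count

def repair_file(src: str, dst: str) -> int:
    """Read src as raw bytes, write the repaired copy to dst, return the count."""
    with open(src, "rb") as f:
        data = f.read()
    repaired, count = repair_delimiters(data)
    with open(dst, "wb") as f:
        f.write(repaired)
    return count

# In-memory check: "a", mangled delimiter, "b".
sample = bytes.fromhex("00 61 00 0D 0A 00 0D 0A 00 62")
repaired, n = repair_delimiters(sample)
print(n, repr(repaired.decode("utf-16-be")))
```

Writing to a separate destination file keeps the mangled input available for comparison if the replacement count looks wrong.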
2 answers
Comments
I saw some mention of a LINE_DELIMITER custom field that you can add to your stream or view in ClearCase for syncs between CC and RTC. Do you know whether ccc export ccase performs any line delimiter translation? The change is apparent in the very first versions to arrive in RTC but is not there in CC, so I am thinking either the export or the import performed the change.
Yes, if you specify that the line delimiter in the RTC database should be different from that in the CC VOB, then the importer will attempt to modify the line endings, so you should declare that the line endings are the same.
To do so, after creating a synchronization stream but before synchronizing any files (e.g., cancel out of the "select files to synchronize" wizard), right-click the new sync stream in the ClearCase Synchronized Stream view and select Properties. Then select "ClearCase Provider Properties" and modify the LINE_DELIMITER_WORKSPACE property so that it matches the LINE_DELIMITER property.
Since this was a one-time import, does this still apply? I am not sure where the ClearCase Provider Properties would be when I use ccc import ccase just once.
I haven't used the "ccc export/import ccase" version importer tool, so I've forwarded this question to the version importer dev team for comment.
Please check the Character Encoding property of the files in RTC. If it is not UTF-16LE, you may need to update the exported data to change the file encoding before importing.
To update the file encoding, you can use the update encoding command:
'ccc update ccase encoding -d {data dir} -e UTF-16LE ...'
The exporter detects files that have characters that cannot be mapped in your system locale, and displays the CRRTC4203W 'unencodable characters' warning. If the warning is displayed, you can find the files to be updated in the logs\unencodableTextFileVersions.txt file.
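Before choosing the encoding value to pass to the command above, it may help to confirm which byte order the source files actually use. The following Python sketch is only an illustration and is not part of the ccc tooling. The helper name `sniff_utf16_bom` is hypothetical, and the sketch assumes the files begin with a byte order mark (FF FE for little-endian, FE FF for big-endian). Files without a BOM return None and need to be inspected by other means.

```python
import codecs
from typing import Optional

def sniff_utf16_bom(data: bytes) -> Optional[str]:
    """Guess the UTF-16 byte order from a leading BOM, or return None."""
    # The UTF-32LE BOM (FF FE 00 00) starts with the UTF-16LE BOM, so rule it out first.
    if data.startswith(codecs.BOM_UTF32_LE):
        return None
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE"
    return None

print(sniff_utf16_bom(b"\xff\xfe<\x00?\x00"))  # UTF-16LE
print(sniff_utf16_bom(b"\xfe\xff\x00<\x00?"))  # UTF-16BE
print(sniff_utf16_bom(b"<?xml"))               # None
```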