Why does merging Hebrew source files in Visual Studio add invalid characters?
Using RTC VS Client 3.0.1.3
When I use the VS RTC Compare & Merge tool, I see invalid characters instead of the expected Hebrew characters. When the merge is performed, these invalid characters are written into the source code. That code has been installed in the production environment, so the application now displays invalid characters instead of Hebrew text.
I confirm that the local file encoding is UTF-8.
This happens for all users on all machines that have Hebrew in their source code files.
Is this a bug?
Accepted answer
No. The files being checked in are not being marked as UTF-8.
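To illustrate why an unmarked UTF-8 file comes out garbled, here is a minimal Python sketch (not RTC code) that decodes UTF-8 Hebrew bytes with a different code page. cp1252 is an assumption, chosen only as an example of a non-UTF-8 system encoding; the encoding actually used depends on each machine's locale.

```python
# Hebrew text as it would be saved in a UTF-8 source file without a BOM.
original = "שלום"
raw = original.encode("utf-8")  # 2 bytes per Hebrew letter

# Decoding the same bytes with a single-byte code page (as a tool might do
# when it falls back to the system encoding) yields one wrong character per
# byte; errors="replace" substitutes U+FFFD for bytes cp1252 leaves undefined.
garbled = raw.decode("cp1252", errors="replace")

# Decoding with the correct encoding round-trips cleanly.
restored = raw.decode("utf-8")

print(repr(garbled))
print(restored == original)
```

If the garbled string is then saved back into the file, the original Hebrew text is lost, which matches the symptom of invalid characters reaching production.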
During check-in from the Eclipse client, Eclipse is asked what the encoding is, and Eclipse uses its own heuristic to decide. During check-in from the VS client, a heuristic in the daemon decides:
First, we sniff the contents to see if there is a BOM.
If there is one, we determine the encoding from it.
If there isn't, we look at the magic file.
If the magic file doesn't specify an encoding, we use the system's encoding.
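The three-step order described above can be sketched in Python as follows. This is a hypothetical model, not the daemon's actual code: `guess_encoding` is an illustrative name, and the magic file is modelled as a simple mapping from file extension to encoding, which is an assumption about its contents.

```python
import codecs
import locale
import os

# Checked longest-first: the UTF-32-LE BOM starts with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(path, data, magic_rules=None):
    """Hypothetical sketch of the check-in encoding heuristic."""
    # 1. Sniff the contents for a byte order mark.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. No BOM: consult the magic file (modelled here as a hypothetical
    #    dict mapping a lower-case extension such as ".cs" to an encoding).
    ext = os.path.splitext(path)[1].lower()
    if magic_rules and ext in magic_rules:
        return magic_rules[ext]
    # 3. Fall back to the system's encoding.
    return locale.getpreferredencoding(False)


hebrew = "שלום".encode("utf-8")
print(guess_encoding("Form1.cs", codecs.BOM_UTF8 + hebrew))
print(guess_encoding("Form1.cs", hebrew, {".cs": "utf-8"}))
print(guess_encoding("Form1.cs", hebrew))  # system encoding, machine-dependent
```

The last call shows the failure mode from the question: a UTF-8 file with no BOM and no magic-file entry is attributed the system encoding, which on many Windows machines is not UTF-8.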
I was able to check in a change as UTF-8 from the VS Client:
1 - In the VS Client, select your file, then choose File --> Advanced Save Options
2 - Set the encoding to UTF-8 with signature
3 - Make some changes
4 - Check in
5 - Show history and open the compare editor for this change: the content is no longer garbled
(Note: I did not use the magic file.)
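The steps above fix one file at a time through the Visual Studio UI. For many files, the same signature (a UTF-8 BOM) could be added with a script. This is a minimal sketch, assuming the files are already valid UTF-8; `add_utf8_bom` is a hypothetical helper, not part of RTC or Visual Studio. It works on raw bytes so existing line endings are left untouched.

```python
import codecs
from pathlib import Path


def add_utf8_bom(path):
    """Hypothetical helper: re-save a UTF-8 file "with signature" (BOM).

    Returns True if a BOM was added, False if the file already had one.
    Raises UnicodeDecodeError if the file is not valid UTF-8, so files in
    another encoding are never silently rewritten.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False  # already saved with a signature
    data.decode("utf-8")  # validate only; the bytes themselves are kept as-is
    p.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

After running it on a file, checking the file in should let the daemon's BOM check pick up UTF-8 without relying on the magic file or the system encoding.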
Comments
You should mark your question as answered. :)
Thanks Evan - yes, I know that. We looked into this with Seth.
For some reason, I cannot flag my own questions as answered.
Any hint appreciated.
Thanks
The model of asking and answering your own question isn't a use case that the Forum was designed to handle, given that it breaks the reputation gain model. We could open this as a new enhancement request.