I have an existing DOORS module with some rich text entries that contain symbols such as 'curly' quotes. I'm trying to upgrade a DXL macro that exports a LaTeX source file. The problem is that these high-numbered symbols are not considered "standard UTF-8" by TexMaker's import function (and in any case probably won't be processed by XeLaTeX or other converters). I can't simply use the `UnicodeString` functions in DXL because those break the rest of the rich text, and apparently `charOf(decimal_number_code)` only works over the basic set of characters, i.e. below some numeric code value. For example, `charOf(8217)` should create a right curly single quote, but when I tried code along the lines of `if (charOf(8217) == one_char)` I never got a match. I copied the curly quote from the DOORS module and verified via an online Unicode analyzer that it is definitely Unicode decimal value 8217. So, what am I missing here? I just want to detect any symbol character, identify it correctly, and replace it with, e.g., `\textquoteright` in the output stream.
My overall setup works for lower-numbered characters, since this works:
`thedeg = charOf(176)`
witthoft_carl - Wed Sep 28 07:26:59 EDT 2016
Re: How to find high-number unicode or symbol chars
Hey, you are right: it seems intOf(char) and charOf(int) both apply something like a modulo 256 and therefore cut off anything above that.
Have you tried this instead?
`int i=8217; char c = addr_(i); print c;`
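As a rough illustration of how the suggestion above could be used for the comparison in the original question, here is a minimal sketch. It assumes that a `char` built with `addr_` compares equal to the same character read from a string, which the thread reports but which is not verified here. `isRightSingleQuote` is a hypothetical helper name, not a DOORS built-in.

```dxl
// Sketch only; isRightSingleQuote is a hypothetical helper, not a DOORS built-in.
bool isRightSingleQuote(char candidate) {
    int code = 8217            // U+2019 RIGHT SINGLE QUOTATION MARK
    char rsquo = addr_(code)   // built via addr_, since charOf(8217) reportedly truncates
    return candidate == rsquo
}
```

A call such as `isRightSingleQuote(one_char)` would then stand in for the `charOf(8217) == one_char` test that never matched.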
Re: How to find high-number unicode or symbol chars (in reply to O.Wilkop - Wed Sep 28 08:27:17 EDT 2016)
The problem lies only with the intOf and charOf functions. Internally, char is a UTF-compatible type, so you can use it for concatenation and comparison. See the attached file (UTF-8 encoded). Execute this file as an include to ensure that the encoding is not changed:
#include <c:/temp/utf_handling.dxl>
This is the code inside the file:
pragma encoding, "utf-8"
string s = "ﮚﬞﮚﬞ"
print "IntOf: " (intOf s[1]) "\n"
int code = (addr_ s[1]) int;
print "Code: " code "\n";
char x = s[0];
string sNew = x x x "";
print "Concatenated: " sNew "\n";
As you can see, with "addr_" you can both create a char from its full code and read the full code back.
Regards, Mathias
Attachments: utf_handling.dxl
Re: How to find high-number unicode or symbol chars (in reply to O.Wilkop - Wed Sep 28 08:27:17 EDT 2016)
Thanks, Oliver. That does return the desired character. If I have success running comparison tests, I'll accept your answer.
Re: How to find high-number unicode or symbol chars (in reply to Mathias Mamsch - Wed Sep 28 09:11:12 EDT 2016)
By the way, you can get an automatic conversion from char to int by using an integer reference to a char. This saves you from calling "addr_" every time, which matters in string functions that are called very often and need to be fast:
string s = "ABCDEF";
char c = null;
// An integer reference to c: it always reflects the current value of c as an int.
int &ref = addr_ (&c);
int i;
for (i = 0; i < length s; i++) {
    c = s[i];
    // 67 is the character code of 'C'
    if (ref == 67) print "C found at index " i "\n"
}
Regards, Mathias
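Putting the pieces of this thread together, the replacement asked for in the original question might look roughly like the sketch below. It is untested and rests on several assumptions: that the integer-reference trick works inside a function as it does at top level, that `s[i]` yields whole characters rather than bytes for the rich text in question, and that the three code points shown are the ones of interest. `escapeForLatex` is a hypothetical name, and the LaTeX macros are only examples of what the export might emit.

```dxl
// Sketch only; escapeForLatex is a hypothetical helper, not a DOORS built-in.
string escapeForLatex(string s) {
    Buffer out = create
    char c = null
    int &code = addr_(&c)    // follows c as an int, as in the post above
    int i
    for (i = 0; i < length s; i++) {
        c = s[i]
        if (code == 8217) {
            out += "\\textquoteright{}"   // right single quote
        } else if (code == 8216) {
            out += "\\textquoteleft{}"    // left single quote
        } else if (code == 176) {
            out += "\\textdegree{}"       // degree sign
        } else {
            out += c ""                   // pass every other character through unchanged
        }
    }
    string result = stringOf out
    delete out
    return result
}
```

Comparing against integer codes keeps non-ASCII literals out of the DXL source, so the script does not depend on the file encoding the way the attached utf_handling.dxl does.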