Problems with HTML escaping and non-ASCII characters using the JSON OSLC API


Fabian Zaiser (2316) | asked Aug 09 '16, 10:18 a.m.
edited Aug 09 '16, 11:01 a.m.
I'm using the OSLC API for RTC. Here's what I did:

1. Create a defect with the summary ">". Then retrieve it. The result says that "dcterms:title" is "&gt;". Is this a bug? Shouldn't this be ">"?

2. Use the OSLC query API to search for this issue by 'oslc.where=dcterms:title="&gt;"'. This returns no results. Changing it to 'oslc.where=dcterms:title=">"' works. Is there a reason behind this inconsistency?

3. Doing the same with " " (non-breaking space, &nbsp;, \u00A0) still does not give any results using the query API, even though I HTML-unescaped and Unicode-escaped the character. How can I work around that? Am I doing something wrong?

This is not an artificial problem. There are lots of defects with non-ASCII characters. Has anyone else encountered (and solved) this issue?
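
For reference, here is a minimal Python sketch of the client-side steps described in points 2 and 3: HTML-unescape the title returned by the API, then percent-encode the whole `oslc.where` value as UTF-8. The function name `build_where_clause` is hypothetical, and the sketch assumes the title contains no double quotes. It only shows how the query string is built; whether the server then matches the title is exactly the open question here.

```python
import html
from urllib.parse import quote


def build_where_clause(title_from_api: str) -> str:
    """Build a percent-encoded oslc.where value for an exact title match.

    Hypothetical helper: assumes the title has no double quotes in it.
    """
    # Turn HTML entities such as &gt; or &nbsp; back into plain characters.
    plain_title = html.unescape(title_from_api)
    # Percent-encode everything (UTF-8), including ':', '=' and '"'.
    return quote(f'dcterms:title="{plain_title}"', safe="")


# Append the result to the query URL as "?oslc.where=" + value.
print(build_where_clause("&gt;"))
print(build_where_clause("&nbsp;"))
```

With this encoding the non-breaking space is sent as the two UTF-8 bytes `%C2%A0`, not as a literal `\u00A0` escape.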


One answer



sam detweiler (12.4k6181201) | answered Aug 09 '16, 10:43 a.m.
edited Aug 09 '16, 10:46 a.m.
Escaping data for JSON is a pain.

I use this online tool to figure out my JSON problems:

http://codebeautify.org/jsonvalidator

I typically have to convert the XML-type characters to their escaped values: &gt;, &lt;, &quot;, etc.

online doc for unicode escapes is

Unicode escape sequences

Any character with a character code lower than 65536 can be escaped using the hexadecimal value of its character code, prefixed with \u . (As mentioned before, higher character codes are represented by a pair of surrogate characters.)

Unicode escapes are six characters long. They require exactly four characters following \u . If the hexadecimal character code is only one, two or three characters long, you’ll need to pad it with leading zeroes.

The copyright symbol ( '©' ) has character code 169 , which gives A9 in hexadecimal notation, so you could write it as '\u00A9' . Similarly, '♥' could be written as '\u2665' .

The hexadecimal part of this kind of character escape is case-insensitive; in other words, '\u00a9' and '\u00A9' are equivalent.

You could define Unicode escape syntax using the following regular expression: \\u[a-fA-F0-9]{4} .
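
The rules quoted above can be checked with a short Python sketch. The names `to_unicode_escape` and `UNICODE_ESCAPE` are made up for illustration; the regular expression is the one from the quoted text, and `json.loads` is used only to show that the hex digits are case-insensitive.

```python
import json
import re

# The pattern from the quoted documentation: \u followed by four hex digits.
UNICODE_ESCAPE = re.compile(r"\\u[a-fA-F0-9]{4}")


def to_unicode_escape(ch: str) -> str:
    """Return the six-character \\uXXXX escape for one character below 65536."""
    code = ord(ch)
    if code >= 0x10000:
        # Higher code points need a surrogate pair, which this sketch skips.
        raise ValueError("character needs a surrogate pair")
    # Pad to exactly four hex digits with leading zeroes.
    return f"\\u{code:04X}"


for ch in ("\u00a9", "\u2665", "\u00a0"):
    escaped = to_unicode_escape(ch)
    # Each escape matches the pattern and decodes back to the same character.
    assert UNICODE_ESCAPE.fullmatch(escaped)
    assert json.loads(f'"{escaped}"') == ch
    print(escaped)
```

Note that Python's own `json.dumps` emits lowercase hex digits by default; both forms decode to the same character.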



Comments
Fabian Zaiser commented Aug 09 '16, 10:57 a.m.

Thanks for your answer! Unfortunately, it doesn't solve my problem. My problem is not how to escape Unicode characters but that the query API doesn't work despite escaping (my bullet point 3). (My other problem is being annoyed with the HTML escaping but I can work around that.)
