Jazz Forum Welcome to the Jazz Community Forum Connect and collaborate with IBM Engineering experts and users

Problems with HTML escaping and non-ASCII characters using the JSON OSLC API

I'm using the OSLC API for RTC. Here's what I did:

1. Create a defect with the summary ">", then retrieve it. The result says that "dcterms:title" is "&gt;". Is this a bug? Shouldn't this be ">"?

2. Use the OSLC query API to search for this defect with 'oslc.where=dcterms:title="&gt;"'. This returns no results. Changing it to 'oslc.where=dcterms:title=">"' does find the defect. Is there a reason behind this inconsistency?

3. Doing the same with a non-breaking space (&nbsp;, \u00A0) still gives no results from the query API, even though I HTML-unescaped and Unicode-escaped the character. How can I work around that? Am I doing something wrong?

This is not an artificial problem. There are lots of defects with non-ASCII characters. Has anyone else encountered (and solved) this issue?
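For anyone trying to reproduce this, here is a minimal Python sketch of how such a query URL could be built: HTML-unescape the title as it was returned, wrap it in a quoted literal, and percent-encode the whole `oslc.where` value as UTF-8. The function name `build_title_query` and the base URL are hypothetical, and the backslash escaping of quotes inside the literal is an assumption about the query syntax. This only rules out client-side encoding as the cause; it may well not fix the non-breaking-space case on the server.

```python
import html
from urllib.parse import quote


def build_title_query(base_url, title_as_returned):
    # Undo the HTML escaping seen in the retrieved dcterms:title,
    # e.g. "&gt;" becomes ">" and "&nbsp;" becomes U+00A0.
    title = html.unescape(title_as_returned)
    # Assumption: the string literal uses backslash escapes for
    # backslashes and double quotes.
    literal = title.replace("\\", "\\\\").replace('"', '\\"')
    where = 'dcterms:title="%s"' % literal
    # Percent-encode everything, with non-ASCII characters sent as UTF-8 bytes.
    return base_url + "?oslc.where=" + quote(where, safe="")


# Hypothetical query capability URL, for illustration only.
BASE = "https://example.invalid/ccm/oslc/workitems"
print(build_title_query(BASE, "&gt;"))
print(build_title_query(BASE, "&nbsp;"))
```

With this sketch the non-breaking space goes over the wire as `%C2%A0`, so if the query still returns nothing, the mismatch is probably in how the server stores or compares the title rather than in the request encoding.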





One answer

Escaping data for JSON is a pain.

I use this online tool to figure out my JSON problems:

http://codebeautify.org/jsonvalidator

I typically have to convert the XML-type characters to their escaped values: &gt;, &lt;, &quot;, etc.

The online documentation for Unicode escapes says:

Unicode escape sequences

Any character with a character code lower than 65536 can be escaped using the hexadecimal value of its character code, prefixed with \u . (As mentioned before, higher character codes are represented by a pair of surrogate characters.)

Unicode escapes are six characters long. They require exactly four characters following \u . If the hexadecimal character code is only one, two or three characters long, you’ll need to pad it with leading zeroes.

The copyright symbol ( '©' ) has character code 169 , which gives A9 in hexadecimal notation, so you could write it as '\u00A9' . Similarly, '♥' could be written as '\u2665' .

The hexadecimal part of this kind of character escape is case-insensitive; in other words, '\u00a9' and '\u00A9' are equivalent.

You could define Unicode escape syntax using the following regular expression: \\u[a-fA-F0-9]{4} .
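The rules quoted above can be sketched in a few lines of Python. This is only an illustration of the escape format (four hex digits, zero-padded, surrogate pairs above 65535); the helper name `to_unicode_escapes` is made up for this example and is not part of any RTC or OSLC API.

```python
import re

# Pattern quoted above: a backslash, "u", then exactly four hex digits.
UNICODE_ESCAPE = re.compile(r"\\u[a-fA-F0-9]{4}")


def to_unicode_escapes(text):
    # Replace every non-ASCII character with its \uXXXX escape.
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            # Plain ASCII is left untouched.
            out.append(ch)
        elif code < 0x10000:
            # Character codes below 65536: one escape, zero-padded to four digits.
            out.append("\\u%04X" % code)
        else:
            # Higher codes are written as a pair of surrogate escapes.
            offset = code - 0x10000
            high = 0xD800 + (offset >> 10)
            low = 0xDC00 + (offset & 0x3FF)
            out.append("\\u%04X\\u%04X" % (high, low))
    return "".join(out)


print(to_unicode_escapes("\u00a9"))  # copyright sign
print(to_unicode_escapes("\u2665"))  # heart
print(to_unicode_escapes("\u00a0"))  # non-breaking space
```

The non-breaking space from the question comes out as `\u00A0`, which matches what the asker reports having tried.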



Comments

Thanks for your answer! Unfortunately, it doesn't solve my problem. My problem is not how to escape Unicode characters but that the query API doesn't work despite escaping (my bullet point 3). (My other problem is being annoyed with the HTML escaping but I can work around that.)


Question asked: Aug 09 '16, 10:18 a.m.

Question was seen: 3,790 times

Last updated: Aug 09 '16, 11:01 a.m.
