Problems with HTML escaping and non-ASCII characters using the JSON OSLC API
1. Create a defect with the summary ">", then retrieve it. The result says that "dcterms:title" is "&gt;". Is this a bug? Shouldn't it be ">"?
2. Use the OSLC query API to search for this defect with 'oslc.where=dcterms:title=">"'. This returns no results. Changing it to 'oslc.where=dcterms:title="&gt;"' does return the defect. Is there a reason behind this inconsistency?
3. Doing the same with " " (non-breaking space, &nbsp;, \u00A0) still gives no results from the query API, even though I tried both the HTML-escaped and the Unicode-escaped form of the character. How can I work around that? Am I doing something wrong?
This is not an artificial problem. There are lots of defects with non-ASCII characters. Has anyone else encountered (and solved) this issue?
One answer
I use this online tool to figure out my JSON problems:
http://codebeautify.org/jsonvalidator
I typically have to convert the XML-type characters to their escaped values: &gt;, &lt;, &quot;, etc...
The online doc for Unicode escapes says:

Unicode escape sequences

Any character with a character code lower than 65536 can be escaped using the hexadecimal value of its character code, prefixed with \u. (As mentioned before, higher character codes are represented by a pair of surrogate characters.)

Unicode escapes are six characters long. They require exactly four characters following \u. If the hexadecimal character code is only one, two or three characters long, you'll need to pad it with leading zeroes.

The copyright symbol ('©') has character code 169, which gives A9 in hexadecimal notation, so you could write it as '\u00A9'. Similarly, '♥' could be written as '\u2665'.

The hexadecimal part of this kind of character escape is case-insensitive; in other words, '\u00a9' and '\u00A9' are equivalent.

You could define Unicode escape syntax using the following regular expression: \\u[a-fA-F0-9]{4}.