U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
U+212B ANGSTROM SIGN
U+0041 LATIN CAPITAL LETTER A, followed by U+030A COMBINING RING ABOVE
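The three representations listed above can be checked with a short Java sketch. It assumes the standard `java.text.Normalizer` class; the class name `CanonicalEquivalenceDemo` and the helper `toNfc` are illustrative names, not part of any Jazz API.

```java
import java.text.Normalizer;

public class CanonicalEquivalenceDemo {
    // Returns the NFC form of the given text.
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String precomposed = "\u00C5";  // LATIN CAPITAL LETTER A WITH RING ABOVE
        String angstrom = "\u212B";     // ANGSTROM SIGN
        String decomposed = "A\u030A";  // LATIN CAPITAL LETTER A + COMBINING RING ABOVE

        // Before normalization the strings differ code point by code point.
        System.out.println(precomposed.equals(angstrom));            // false
        System.out.println(precomposed.equals(decomposed));          // false

        // After NFC all three collapse to the same string.
        System.out.println(toNfc(angstrom).equals(precomposed));     // true
        System.out.println(toNfc(decomposed).equals(precomposed));   // true
    }
}
```

A plain `String.equals` comparison is code-point based, which is why the unnormalized forms do not compare equal even though they render identically.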
The sequence U+0066 U+0069 represents the string "fi" (LATIN SMALL LETTER F followed by LATIN SMALL LETTER I), while U+FB01 represents the single character "ﬁ" (LATIN SMALL LIGATURE FI). By converting Unicode text to Normalization Form KC, the second representation is converted to the first, and the information that a ligature was used is lost.
In summary, NFC removes the distinction between equivalent characters, while preserving the distinction between compatible characters or sequences; NFKC removes the distinction between both equivalent and compatible sequences. NFC conversion is not considered lossy, but NFKC conversion is.
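The difference between the two forms can be seen on the ligature example. The following is a minimal sketch using `java.text.Normalizer`; the class name `LigatureNormalizationDemo` and its helper methods are illustrative only.

```java
import java.text.Normalizer;

public class LigatureNormalizationDemo {
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    static String toNfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String ligature = "\uFB01";  // LATIN SMALL LIGATURE FI

        // NFC preserves compatibility characters: the ligature survives.
        System.out.println(toNfc(ligature).equals(ligature));  // true

        // NFKC replaces the ligature with the two letters "fi".
        System.out.println(toNfkc(ligature).equals("fi"));     // true
        System.out.println(toNfkc(ligature).length());         // 2
    }
}
```

Because the NFKC result no longer records that a ligature was present, the original text cannot be recovered from it, which is the sense in which NFKC is lossy.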
Consider the English word "field". In Unicode this can be written as either of the two compatible forms:
U+0066 U+0069 U+0065 U+006C U+0064
U+FB01 U+0065 U+006C U+0064
If the data contains the triple
ex:resource1 ex:property "\uFB01eld" .
then neither of the following queries will match that triple:
SELECT * WHERE { ?resource ex:property "field" }
SELECT * WHERE { ?resource ex:property ?str . FILTER (STR(?str) = "field") }
Normalization of strings is not provided as a standard feature of SPARQL, so it is important to standardize on a normal form for Unicode encoding of literal values in published RDF, and to transform user queries into that same normal form.
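One way to transform a user query into the agreed normal form is to normalize the search term before it is placed in the query text. The sketch below is hypothetical: the class `NormalizedQueryBuilder`, its methods, and the choice of NFKC as the agreed form are assumptions made for illustration, and the `ex:` prefix is taken from the example above without a `PREFIX` declaration. The escaping covers only backslashes and double quotes, so it is not a complete SPARQL literal encoder.

```java
import java.text.Normalizer;

public class NormalizedQueryBuilder {
    // Assumption: published literals use NFKC. Substitute NFC if that is the agreed form.
    static final Normalizer.Form AGREED_FORM = Normalizer.Form.NFKC;

    // Brings a user-supplied term into the agreed normal form.
    static String normalizeTerm(String term) {
        return Normalizer.normalize(term, AGREED_FORM);
    }

    // Minimal escaping for a double-quoted SPARQL string literal (backslash and quote only).
    static String escapeLiteral(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the first query shape shown above from a normalized, escaped term.
    static String buildQuery(String userTerm) {
        String literal = escapeLiteral(normalizeTerm(userTerm));
        return "SELECT * WHERE { ?resource ex:property \"" + literal + "\" }";
    }

    public static void main(String[] args) {
        // Both spellings of "field" produce the same query text.
        System.out.println(buildQuery("field"));
        System.out.println(buildQuery("\uFB01eld"));
    }
}
```

This only helps if the stored literals were normalized to the same form when the RDF was published; a query for "field" still cannot match a stored literal that retains the ligature.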
In Java, a string str is converted to NFC using the method call Normalizer.normalize(str, Normalizer.Form.NFC), or to NFKC using Normalizer.normalize(str, Normalizer.Form.NFKC).
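When many strings are already in the target form, the normalization call can be skipped. The following sketch uses `Normalizer.isNormalized` from `java.text.Normalizer` to test first; the class name `NormalizeIfNeeded` and the method `ensureForm` are illustrative names.

```java
import java.text.Normalizer;

public class NormalizeIfNeeded {
    // Returns str unchanged when it is already in the requested form,
    // otherwise returns its normalized equivalent.
    static String ensureForm(String str, Normalizer.Form form) {
        if (Normalizer.isNormalized(str, form)) {
            return str;
        }
        return Normalizer.normalize(str, form);
    }

    public static void main(String[] args) {
        // Already normalized: returned as is.
        System.out.println(ensureForm("field", Normalizer.Form.NFC));

        // Decomposed A + COMBINING RING ABOVE is composed into a single code point.
        System.out.println(ensureForm("A\u030A", Normalizer.Form.NFC).length());      // 1

        // The ligature is expanded only under NFKC.
        System.out.println(ensureForm("\uFB01eld", Normalizer.Form.NFKC));            // field
    }
}
```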