Best Practice: Writing SPARQL Queries that Compensate for Alternate Types of String Literals
State: Approved
Contact: Nick Crossley
Scope
In RDF 1.0, a literal value could be encoded in three alternative ways. A literal could be either plain or typed, and a plain literal could optionally carry a language tag. These three forms were distinct, even when their string content was identical.
In RDF 1.1, which we use in CLM products today, this was simplified. A plain literal was defined to be of type xsd:string, and a plain literal with a language tag was defined to be of type rdf:langString, so all literals are typed.
For typed or tagged values to be equal, their types or tags must also be equal. These differences might result in unexpected query results unless you take care in writing the query.
Recommendation
The solution is to use the SPARQL STR() function to extract just the string part (the lexical form) of the value. SPARQL also provides the DATATYPE() and LANG() functions for extracting the type and the language tag, for example if you want only English values.
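As a minimal sketch of selecting only English values, the query below filters on the language tag with the standard LANG() and langMatches() functions. The choice of dcterms:title as the property and the variable names ?s and ?title are assumptions made for illustration.

```sparql
prefix dcterms: <http://purl.org/dc/terms/>

select ?s ?title
where {
  ?s dcterms:title ?title .
  # langMatches with the range "en" accepts "en" and regional variants such as "en-GB".
  # Untagged literals have an empty language tag and are excluded.
  filter(langMatches(lang(?title), "en"))
}
```

Using langMatches() rather than lang(?title) = "en" avoids missing values tagged with a regional subtag.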
Examples
To illustrate this, consider the following example data, held in the graph <http://example.com/ex1/>, where the string "title" is represented in three ways:
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@base <http://example.com/ex1/> .
<plain> dcterms:title "title" .
<typed> dcterms:title "title"^^xsd:string .
<english> dcterms:title "title"@en .
This example contains 3 triples, as shown by this query:
select *
where {
  graph <http://example.com/ex1/> {
    ?s ?p ?o .
  }
}
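To see how a particular store classifies each literal, the same query can also project the datatype and language tag of every object. This is a minimal sketch using the standard DATATYPE() and LANG() functions; the variable names ?type and ?tag are chosen here for illustration. On an RDF 1.1 store we would expect xsd:string for both <plain> and <typed>, and rdf:langString with the tag "en" for <english>.

```sparql
select ?s ?o (datatype(?o) as ?type) (lang(?o) as ?tag)
where {
  graph <http://example.com/ex1/> {
    # Same pattern as above; only the projection changes.
    ?s ?p ?o .
  }
}
```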
Now suppose we write a query to find all the resources that have "title" as a property value (in this data, dcterms:title is the only property):
select *
where {
  graph <http://example.com/ex1/> {
    ?s ?p "title" .
  }
}
With RDF 1.0, the result would have been just the triple with the plain literal value, <plain>.
With RDF 1.1, the result includes both the plain literal (<plain>) and the typed literal (<typed>), because both have the type xsd:string, as does the literal in the query, so they match. The language-tagged literal "title"@en has the type rdf:langString and does not match.
A FILTER clause that compares values, instead of matching the literal directly in the triple pattern, returns more triples, even with RDF 1.0:
select *
where {
  graph <http://example.com/ex1/> {
    ?s ?p ?o .
    filter(?o = "title") .
  }
}
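The definition of the SPARQL equals (=) operator does some conversion, so the plain and typed values both match, but the language-tagged value "title"@en is still omitted because its tag differs from that of the literal in the filter.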
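If the language tags in the data are known in advance, one alternative is to compare against each tagged form explicitly. The sketch below assumes that English ("en") is the only tag of interest, which happens to hold for this example data; it does not generalise to values with other or unknown tags.

```sparql
select *
where {
  graph <http://example.com/ex1/> {
    ?s ?p ?o .
    # Each comparison requires the tags to be equal, so every tag must be listed.
    filter(?o = "title" || ?o = "title"@en)
  }
}
```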
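To match all three values, use the SPARQL STR() function, which returns only the lexical form of the literal and ignores its datatype and language tag: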
select *
where {
  graph <http://example.com/ex1/> {
    ?s ?p ?o .
    filter(str(?o) = "title") .
  }
}
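The queries above leave the predicate unbound, so they would match "title" as the value of any property. A minimal sketch that restricts the match to dcterms:title, under the assumption that only titles are of interest, might look like this (the prefix declaration is taken from the example data):

```sparql
prefix dcterms: <http://purl.org/dc/terms/>

select ?s ?o
where {
  graph <http://example.com/ex1/> {
    # Bind the predicate so that only title values are compared.
    ?s dcterms:title ?o .
    # STR() drops the datatype and language tag before the comparison.
    filter(str(?o) = "title")
  }
}
```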