Structured Query Additional Service

This service supports precise, structured queries over the Resource Properties that the server stores for all simple and collection resources.

Motivation

How does a client find resources previously stored on the server? The GET method provides direct retrieval of a resource, but only to a client that has the URI for that resource. What we are talking about here is indirect retrieval of resources - where a client wants to access resources by some means other than with their URI.

Index properties are embedded in (or otherwise buried in) the representation of resources. Since the server mediates all operations that create, update, and delete resources, it is in a position to extract these index properties and put them in an internal index that it maintains. The client can run queries for resources with particular index properties. The server answers these queries efficiently by consulting its internal index. The query can also indicate that some or all of the resource's index properties should be returned in the response, which allows the client to discover something about each of the "hits" without having to make additional requests to retrieve those resources individually.

How properties are extracted from resources is controlled by the server's indexing rules. The server's internal index also records certain system information about each Storage resource (including resources that are not XML-based), such as the resource's content type and last modified date.

The server maintains an up-to-date internal index of the set of index properties for each Storage resource, where the Storage resources are identified by their URI. This includes removing out of date index properties when a resource is deleted or is changed. Indexing is performed on a resource's representation as the resource is operated on. This occurs when a Storage resource is created via a PUT or POST, updated via a PUT, or deleted via DELETE. Note that POSTing to a Collection resource does not result in the Collection resource being re-indexed; indexing of a Collection resource only happens when the Collection resource is explicitly created, deleted, or updated. The updating of the server's internal index and internal representation of the Storage resource MUST occur as an atomic transaction. A client query that runs immediately after a Storage resource has been updated may rely on the server's index accurately reflecting the resource's current set of index properties.
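
The atomicity requirement above can be illustrated with a minimal in-memory sketch. The ResourceIndex class, its method names, and the extract callback are hypothetical and are not defined by this specification; a real server would more likely rely on a database transaction than on a lock.

```python
import threading


class ResourceIndex:
    """Toy stand-in for the server's store plus internal index (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._representations = {}  # URI -> stored representation
        self._properties = {}       # URI -> {property key: value}

    def put(self, uri, representation, extract):
        # Run the (assumed) indexing rules first, then replace the stored
        # representation and its index properties under a single lock, so a
        # query never sees a resource paired with out-of-date properties.
        props = dict(extract(representation))
        with self._lock:
            self._representations[uri] = representation
            self._properties[uri] = props

    def delete(self, uri):
        # Deleting a resource removes its index properties in the same step.
        with self._lock:
            self._representations.pop(uri, None)
            self._properties.pop(uri, None)

    def properties(self, uri):
        with self._lock:
            return dict(self._properties.get(uri, {}))
```

Because put replaces the whole property set for a URI rather than merging into it, properties that no longer appear in an updated representation disappear from the index.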

Only Storage resources (including AtomPub-style collections, entries, and media resources) are covered by the server's index. The server's index does not include Revision resources, User resources, etc.

Specification

The query service is provided in two forms: a simple, conjunctive, URL-encoded form and a more complex database-provided form. The query service URL may be determined from the root service document. Note that a client must have the Reader role to perform a query, and the server must not return any resources that the authenticated user does not have permission to GET.

Method    URL                  Comments
GET/HEAD  {query-uri}          Returns the OpenSearch description document.
GET/HEAD  {query-uri}?{terms}  Executes the simple query, as defined below.
POST      {query-uri}          Executes the complex query, as defined below.
PUT       {query-uri}          Not permitted - returns 405 (Method Not Allowed).
DELETE    {query-uri}          Not permitted - returns 405 (Method Not Allowed).

Notes:

  1. If the {query-string} (the {terms} part of the URL above) is malformed, the server MUST return 400 (Bad Request).
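
As a rough illustration of the method table and note above, a dispatcher might look like the following sketch. The route function and its action names are hypothetical. The 200 status for successful requests and the 405 for methods other than PUT and DELETE are assumptions, and the malformed-query check is deliberately shallow.

```python
def route(method, query_string=""):
    """Return (status, action) for a request sent to {query-uri}."""
    method = method.upper()
    if method in ("GET", "HEAD"):
        if not query_string:
            return 200, "opensearch-description"
        for term in query_string.split("&"):
            # Every term needs a '=' except a bare "properties" term.
            if term != "properties" and "=" not in term:
                return 400, "bad-request"
        return 200, "simple-query"
    if method == "POST":
        return 200, "complex-query"
    # PUT and DELETE are explicitly not permitted; treating every other
    # method the same way is an assumption of this sketch.
    return 405, "method-not-allowed"
```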

URL-Encoded Query Service

The client can run queries for resources that include particular index terms. The server answers these queries efficiently by consulting its internal index. Queries are simple conjunctions of property = value terms. A query term is either:

  • [{t} : ] {k} = {v} where k is a property name, v is a pattern to match against the value of that property, and t is the property's type - matches resources with an index property k with value v; k is either a server-provided property (table below) or an extracted property key; t MUST be "int", "boolean", "date", or "uri"; if the type is omitted, the datatype is a string. For boolean-, date-, and integer-valued properties, v MUST be a literal constant. For string- and uri-valued properties, v MUST be either a literal constant or a pattern consisting of a literal prefix followed by a single '*' character. These patterns match values that start with the prefix. General wildcarding is not supported; '*' characters in other than the last position are taken literally.
  • queryNS = {namespace} where namespace is an XML namespace URI that is used as the implicit qualification for all keys that are simple names (other than the server-provided properties).
  • properties [ = {k}[,{k}]...] where each k is a property name - k is either the name of a server-provided property (other than "resource-modified-since") or an extracted property key. Property keys extracted from XML documents are URIs of the form {namespace}#{element} where namespace is the XML namespace URI and element is the name of an element type in that namespace, or the single '*' character. Simple names, other than server-provided properties, can be implicitly qualified by the queryNS parameter.
  • {query-term} ( '&' {query-term} )+ - 2 or more query terms separated by '&' - matches resources that satisfy every one of the terms
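
The term grammar and matching rules above can be sketched as follows. This is an illustration under simplifying assumptions, not the server's algorithm: the function names are hypothetical, the query is assumed to be already percent-decoded, all indexed values are held as strings and compared textually, and the queryNS and properties terms are left out.

```python
TYPES = ("int", "boolean", "date", "uri")


def parse_term(term):
    """Split one decoded '[t:]k=v' term into (datatype, key, value)."""
    key, sep, value = term.partition("=")
    if not sep or not key:
        raise ValueError(f"malformed query term: {term!r}")
    datatype = "string"  # the default when no type prefix is given
    for t in TYPES:
        if key.startswith(t + ":"):
            datatype, key = t, key[len(t) + 1:]
            break
    return datatype, key, value


def term_matches(term, properties):
    """True if the resource's index properties satisfy a single term."""
    datatype, key, pattern = parse_term(term)
    if key not in properties:
        return False
    actual = properties[key]
    # Only a trailing '*' on string and uri values is a wildcard;
    # a '*' anywhere else is compared literally.
    if datatype in ("string", "uri") and pattern.endswith("*"):
        return actual.startswith(pattern[:-1])
    return actual == pattern


def query_matches(query, properties):
    """Conjunction: every '&'-separated term must match."""
    return all(term_matches(t, properties) for t in query.split("&"))
```

For instance, a resource indexed with genre "rock" and language "english" in the http://music.example.org/schema namespace satisfies a query that names both properties, but not one that asks for genre "pop".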

For simple cases, the query is encoded in the query part of the URI more or less directly. Property names can be embedded directly into the URI, as can the characters '=' and '&'. Property values must be percent-encoded if they include certain characters; all unreserved characters (section 2.3 of RFC 3986, Uniform Resource Identifier (URI): Generic Syntax) can be used directly.

The following table lists the set of server-provided properties. The client MAY use these names in queries. Note that the rdf prefix is bound to the URL http://www.w3.org/1999/02/22-rdf-syntax-ns#, dcterms to http://purl.org/dc/terms/, and ors to http://example.org/xmlns/openservices/properties/v0.6.

Name                         Type        Provider  Comments
rdf:about                    xsd:anyURI  Server    The URI of the resource.
dcterms:format               xsd:string  Server    The MIME content type specified on the last update.
dcterms:contributor          xsd:anyURI  Server    The URI of the user that last modified the resource.
dcterms:modified             xsd:date    Server    The date the resource was last modified.
rdf:type                     xsd:anyURI  Server    XML root element.
ors:resource-modified-since  xsd:date    Server    Resources modified since the given date.
ors:resource-collection      xsd:anyURI  Server    Entries and media resources will have this property. This property will be the URI of the collection in which the resource was created.
ors:resource-entry           xsd:anyURI  Server    Media resources will have this property. This property will be the URI of the entry that was created for this resource when the resource was POSTed to the collection.

Property keys extracted from XML documents are URIs of the form {namespace}#{element} where namespace is the XML namespace URI and element is the name of an element type in that namespace. Note that the '#' is one of the characters that must be percent-encoded (%23).
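
A client might build an encoded query string along the following lines. This is a sketch using Python's urllib.parse.quote; the helper names are hypothetical, and the choice to leave ':', '/', and a '*' in values unescaped is an assumption modeled on the un-encoded examples in this section rather than a rule stated by the service.

```python
from urllib.parse import quote


def encode_term(key, value, datatype=None):
    """Encode one '[t:]k=v' term; '#' in the key becomes %23."""
    prefix = f"{datatype}:" if datatype else ""
    return f"{prefix}{quote(key, safe=':/')}={quote(value, safe=':/*')}"


def encode_query(terms, properties=None):
    """Join (key, value[, datatype]) terms with '&' and append 'properties'."""
    parts = [encode_term(*term) for term in terms]
    if properties:
        names = ",".join(quote(p, safe=":/") for p in properties)
        parts.append("properties=" + names)
    return "&".join(parts)


print(encode_term("http://music.example.org/schema#genre", "pop"))
# http://music.example.org/schema%23genre=pop
```

Characters such as '&' and '=' inside a value are percent-encoded by quote, so they cannot be mistaken for term separators.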

Note that the query term resource-modified-since=2007-11-08T20:00:01Z is roughly equivalent to resource-last-modified >= 2007-11-08T20:00:01Z (were ">=" a legal operator in query terms, which it's not). While the server may store modification timestamps at higher resolution, times passed in Last-Modified headers are in seconds. This means that if one updates a resource at time t1 and gets back a Last-Modified response specifying t2, the relationship between the two is t2=truncate-to-seconds(t1). The query term resource-modified-since=t2 will always match the resource that was just updated. The query term resource-modified-since=t3 where t3=t2+1s will match the resource if it was modified at t3 or later; it will not match the resource if the last time it was updated was between t2 and t3, even if it was updated multiple times in that interval.
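
The relationship between t1, t2, and t3 described above can be checked with a short worked example. The function names are hypothetical, and the sketch assumes the server compares the query value against its stored, higher-resolution timestamp.

```python
from datetime import datetime, timedelta, timezone


def truncate_to_seconds(t):
    """Drop sub-second precision, as a Last-Modified header does."""
    return t.replace(microsecond=0)


def modified_since_matches(stored_modified, since):
    """resource-modified-since=since behaves like stored_modified >= since."""
    return stored_modified >= since


# t1: the (assumed) high-resolution time the server recorded for the update.
t1 = datetime(2007, 11, 8, 20, 0, 1, 250000, tzinfo=timezone.utc)
t2 = truncate_to_seconds(t1)    # what the client sees in Last-Modified
t3 = t2 + timedelta(seconds=1)

print(modified_since_matches(t1, t2))  # True: t2 always matches the update
print(modified_since_matches(t1, t3))  # False: t1 falls between t2 and t3
```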

A query string MAY contain 0 or 1 "properties" query term. This parameter controls what information is returned in the query results. If "properties" is not included, the query results include little more than the URI of the "hits". If "properties" is included, the client is requesting that indexed properties be retrieved for and included with each "hit".

Example {query-string}s (un-encoded):

  • http://purl.org/dc/terms/format=text/plain
  • http://www.w3.org/1999/02/22-rdf-syntax-ns%23about=/jazz/resources/workitems/*
  • http://www.w3.org/1999/02/22-rdf-syntax-ns%23type=http://music.example.org/schema#track
  • http://example.org/xmlns/openservices/properties/v0.6%23resource-modified-since=2007-11-08T20:00:01Z
  • http://music.example.org/schema#genre=pop
  • date:http://music.example.org/schema#release-date=1971-04-30T00:00:01Z
  • uri:http://music.example.org/schema#cover-art=http://music.example.org/cat-1666127636
  • http://music.example.org/schema#genre=rock&http://music.example.org/schema#language=english
  • queryNS=http://music.example.org/schema&genre=rock&language=english
  • queryNS=http://music.example.org/schema&root-element=http://music.example.org/schema#track&properties=title,genre

The client uses GET on a Query Service resource to search for Storage resources whose index properties satisfy the encoded query.

Results Representation

The results format for queries is similar to that for Full-Text Search, except that the relevance:score element is optional in query results. To support large result sets, the server MAY page query results; see the discussion in Collection Resource Storage.

POST-based Query Service

This service also provides the option for a more complex query service where standard query languages may be used against the "RDF Store". This RDF store is seen as a collection of RDF documents where each document is a representation of all index properties for a given resource (see Index Properties). The limitations of the URL-encoded API above are known, and include:

  • Only conjunction is supported between terms.
  • Only equality is supported as an operator for terms.
  • A term may only appear once in a query.

The extended support is intended specifically to address these limitations and is provided by allowing the client to POST a standard query in a language such as XQuery or SPARQL to the same {query-uri} URL as above. The Open Services for Lifecycle Collaboration provider may provide this query support directly or may delegate to an underlying store (for example, in the case of XQuery, an XML-enabled database may actually execute the query). The submitted query must meet the following requirements to be accepted by the query handler:

  • The content-type of the request MUST identify the query language being posted (for example for XQuery this would be application/xquery).
    • POSTing a request with an unknown or unsupported content type will result in 415 (Unsupported Media Type).
    • Any Open Services for Lifecycle Collaboration provider MUST document the supported query languages and associated content types.
  • The Open Services for Lifecycle Collaboration provider MUST document any specific constraints required by their support of a given query language.
    • For example, if the provider supports XQuery they may document any unsupported specific functions or capabilities.
    • Any request which does not meet these constraints MUST return 400 (Bad Request).
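
The validation rules above might be applied in the order shown in this sketch. The validate_post function, the supported mapping, and the is_acceptable callback are hypothetical; the 200 status for an accepted query is an assumption, and each provider decides which languages and constraints actually apply.

```python
def validate_post(content_type, query, supported, is_acceptable):
    """Return the HTTP status a provider might send for a POSTed query.

    supported maps media types to query language names; is_acceptable
    stands in for the provider's documented constraints on a language.
    """
    # Ignore parameters such as "; charset=utf-8" when identifying the type.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type not in supported:
        return 415  # Unsupported Media Type
    if not is_acceptable(supported[media_type], query):
        return 400  # Bad Request
    return 200
```

A provider that supports only XQuery could call this with supported = {"application/xquery": "xquery"} and a callback that rejects whichever functions it has documented as unsupported.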

For example, if a provider supports XQuery, a client could POST the following request:

POST /jazz/services/query HTTP/1.1
Host: example.com
Date: ...
Authorization: ...
Content-Type: application/xquery
Content-Length: ...

xquery
declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace wi = "http://www.jazz.net/xmlns/workitems/";
for $index in collection("RDF_INDEX")/rdf:Description
  where $index/wi:workItemType = "defect" and 
        $index/wi:internalPriority = 1 and 
        $index/wi:internalState != 4
  return $index/fn:string(@rdf:about);

The results format is the same as that described above for URL-encoded queries.

Examples