This page explains how RDF, Jena, and SPARQL relate to CLM (Jazz).
RDF, Jena, SPARQL, and CLM basics
Why do I care about RDF, Jena and SPARQL?
CLM is built around the concepts of the Semantic Web, the web of things. CLM adheres to the
W3C (World Wide Web Consortium) and IETF (Internet Engineering Task Force) recommendations and specifications for data modeling and exchange.
RDF, Jena, and SPARQL tie together the data model, application usage, and data manipulation. Simply put, RDF models the data, Jena is the framework that applications use to work with RDF, and SPARQL is the language used to query and manipulate it.
What is RDF?
RDF (Resource Description Framework) is the basic building block of the Semantic Web as defined by the W3C. Specifically, it is the data model for representing a thing, place, object, time frame, and so on, used to build out a useful repository. The idea is to enable machine understanding of a resource by supplying the metadata needed to describe it in a usable data model. CLM is built around the Semantic Web and so uses RDF as a fundamental concept. A single data model gives applications a common language for exchange.
References:
Wikipedia: RDF, W3C RDF primer, RDF spec, RDF schema, RDF XML grammar, RFC 3870
What is a Triple/Triplestore?
A statement about an object in RDF is expressed as a three-part descriptor (subject-predicate-object) called a triple. A triplestore is the storage mechanism for triples.
The triplestores used for CLM 4.0 are built over commercial relational database engines, which interact using SQL, with SPARQL as the intermediary.
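To make the subject-predicate-object idea concrete, here is a minimal sketch of a toy in-memory triplestore. It is an illustration only and says nothing about how CLM or Jena store data: the `Triple` and `TripleStore` names, the shortened predicate names (`title`, `creator`, `identifier`), and the wildcard matching are all assumptions made for this example.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Toy store that keeps triples in a list (hypothetical, not the CLM design)."""

    def __init__(self) -> None:
        self._triples: List[Triple] = []

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.append(Triple(subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        """Return triples that fit the pattern; None acts as a wildcard."""
        return [
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        ]


store = TripleStore()
artifact = "https://fakeexampleserver.jazz.net:9443/rm/resources/rrc164302"
# Predicate names are shortened for readability; real RDF predicates are URIs.
store.add(artifact, "title", "Mamba Number 3")
store.add(artifact, "creator",
          "https://fakeexampleserver.jazz.net:9443/jts/users/geraldMitchell")
store.add(artifact, "identifier", "12345")

# Ask: "what is the title of this artifact?"
titles = store.match(subject=artifact, predicate="title")
```

Leaving a position as `None` plays roughly the role a variable plays in a query pattern: the store returns every triple that agrees with the positions that were filled in.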
What is Jena?
Jena is a Java framework for the Semantic Web. It is an open source package available from Apache. CLM uses the Jena API to interact with RDF.
References:
Wikipedia: Jena Framework, Apache Jena
What is Jena TDB?
Jazz Foundation Services use the Jena TDB database to store and query over RDF data. TDB is a component of Jena implemented for fast RDF storage.
See the Apache TDB page for more information.
Specifics of the architecture can be found in the [[http://jena.apache.org/documentation/tdb/architecture.html][Jena documentation on TDB]].
What is SPARQL?
SPARQL is the query language for RDF. In CLM it is used to store, retrieve, and manipulate triples over SQL.
Beyond the syntax, the structure of a SPARQL query directly shapes the resulting SQL query, which then finds and returns the result from the relational database only as efficiently as the request and the size and structure of the data tables allow.
References:
w3: SPARQL
NOTE: SPARQL and RDF continue to change and improve; see SPARQL 1.1 for more information.
Assessment of RDF, Jena and SPARQL in CLM
The best way to understand the role of RDF, SPARQL, and Jena in CLM is to follow a typical request for a work item from request to retrieval, and then examine a query to see the similarities and differences.
Symptoms of a problem
Impact / Scope
Timing (When)
Environmental changes
Using semantic web with CLM in DM
Here are some pointers to places where the Semantic Web is visible in DM:
Recommended analysis steps
Reviewing an RDF
Remember that RDF can represent many different things:
Here is a typical RDF file for RRC describing an Artifact.
<rdf:RDF>
<rdf:Description rdf:nodeID="A0">
<j.2:hasAttrDef rdf:resource="https://fakeexampleserver.jazz.net:9443/rm/types/_iIZ9GrygEeGyA4urzO_ojw"/>
<rdf:value rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">12345</rdf:value>
</rdf:Description>
<rdf:Description rdf:nodeID="A1">
<j.2:hasAttrDef rdf:resource="https://fakeexampleserver.jazz.net:9443/rm/types/_iI2pCrygEeGyA4urzO_ojw"/>
<rdf:value>Mamba Number 3</rdf:value>
</rdf:Description>
<rdf:Description rdf:nodeID="A2">
<j.2:hasAttrDef rdf:resource="https://fakeexampleserver.jazz.net:9443/rm/types/_iIs4CrygEeGyA4urzO_ojw"/>
</rdf:Description>
<rdf:Description rdf:nodeID="A3">
<j.2:hasAttrDef rdf:resource="https://fakeexampleserver.jazz.net:9443/rm/types/_iHglOrygEeGyA4urzO_ojw"/>
<rdf:value rdf:resource="https://fakeexampleserver.jazz.net:9443/rm/types/_iF4NirygEeGyA4urzO_ojw#Text"/>
</rdf:Description>
<rdf:Description rdf:about="https://fakeexampleserver.jazz.net:9443/rm/resources/rrc164302">
<j.0:isPartOf rdf:resource="https://fakeexampleserver.jazz.net:9443/rm/resources"/>
<j.4:appId>IPyWED9U4oYSXbs3N9BSQkAiRNvckY42SmP1asx3udY=</j.4:appId>
<j.3:ArtifactFormat rdf:resource="https://fakeexampleserver.jazz.net:9443/rm/types/_iF4NirygEeGyA4urzO_ojw#Text"/>
<j.4:resourceContext rdf:resource="https://fakeexampleserver.jazz.net:9443/jts/process/project-areas/_c0YqUaUnEd-RB7K0TII4TA"/>
<j.4:resourceContextId>_c0YqUaUnEd-RB7K0TII4TA</j.4:resourceContextId>
<j.0:creator rdf:resource="https://fakeexampleserver.jazz.net:9443/jts/users/geraldMitchell"/>
<j.2:hasAttrVal rdf:nodeID="A0"/>
<j.2:ofType rdf:resource="https://fakeexampleserver.jazz.net:9443/rm/types/_JbS4VryiEeGyA4urzO_ojw"/>
<j.4:etag>"_FNyfJb7VEeGro_Tz28rjsQ"</j.4:etag>
<j.0:contributor rdf:resource="https://fakeexampleserver.jazz.net:9443/jts/users/geraldMitchell"/>
<j.0:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-06-25T14:50:18.514Z</j.0:modified>
<j.2:hasAttrVal rdf:nodeID="A2"/>
<j.3:PrimaryText>
hello world
</j.3:PrimaryText>
<j.0:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-06-11T06:55:03.059Z</j.0:created>
<j.0:format>application/rdf+xml</j.0:format>
<j.0:title>Mamba Number 3</j.0:title>
<j.4:resourceLocation rdf:resource="https://fakeexampleserver.jazz.net:9443/jts/storage/com.ibm.rdm.resources/rrc12345"/>
<j.0:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">12345</j.0:identifier>
<j.2:hasAttrVal rdf:nodeID="A1"/>
<j.2:hasAttrVal rdf:nodeID="A3"/>
<j.1:parent rdf:resource="https://fakeexampleserver.jazz.net:9443/rm/folders/_YP9IgHMmEeGcrLHosKj6AA"/>
<rdf:type rdf:resource="http://www.ibm.com/xmlns/rdm/rdf/Artifact"/>
</rdf:Description>
</rdf:RDF>
This is a fairly simple graph (in the mathematical sense) describing a single, simple artifact.
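When reviewing an RDF/XML file like the one above, it can help to flatten it into its triples. The following is a rough sketch using only the Python standard library. It handles just the simple shapes seen in the artifact example (`rdf:Description` elements with `rdf:about` or `rdf:nodeID`, and properties that carry `rdf:resource`, `rdf:nodeID`, or literal text) and is not a complete RDF/XML parser. The sample is trimmed, and the `ex` namespace URI is a placeholder assumed for this example, not the real CLM vocabulary.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Trimmed sample; "http://example.org/ns#" is a placeholder namespace.
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns#">
  <rdf:Description rdf:about="https://fakeexampleserver.jazz.net:9443/rm/resources/rrc164302">
    <ex:title>Mamba Number 3</ex:title>
    <ex:creator rdf:resource="https://fakeexampleserver.jazz.net:9443/jts/users/geraldMitchell"/>
    <ex:hasAttrVal rdf:nodeID="A0"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A0">
    <rdf:value>12345</rdf:value>
  </rdf:Description>
</rdf:RDF>
"""


def node_name(element):
    """Return the URI or blank-node label an element refers to, or None."""
    for attr in ("about", "resource"):
        uri = element.get(f"{{{RDF_NS}}}{attr}")
        if uri is not None:
            return uri
    node_id = element.get(f"{{{RDF_NS}}}nodeID")
    return None if node_id is None else "_:" + node_id


def extract_triples(rdf_xml):
    """Flatten simple RDF/XML into (subject, predicate, object) tuples."""
    triples = []
    for description in ET.fromstring(rdf_xml).findall(f"{{{RDF_NS}}}Description"):
        subject = node_name(description)
        for prop in description:
            # ElementTree tags look like "{namespace}local"; join them into a URI.
            namespace, local = prop.tag[1:].split("}", 1)
            target = node_name(prop)
            obj = target if target is not None else (prop.text or "").strip()
            triples.append((subject, namespace + local, obj))
    return triples


triples = extract_triples(SAMPLE)
for triple in triples:
    print(triple)
```

Blank nodes such as `A0` are printed with a `_:` prefix, which makes it easier to see how an attribute value hangs off the artifact that references it.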
RDF is also used to describe communication channels. Take, for example, the jazz.net root services document:
https://jazz.net/jts/rootservices
<!--
Licensed Materials - Property of IBM
(c) Copyright IBM Corporation 2011. All Rights Reserved.
Note to U.S. Government Users Restricted Rights:
Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
-->
<rdf:Description rdf:about="https://jazz.net/jts/rootservices">
<!--
Root services resource template for applications based on JAF SDK.
Contains required contributions both for applications and for the JTS.
Applications may add additional services, but may only remove services noted as being "JTS only".
Specification is available at https://jazz.net/wiki/bin/view/Main/RootServicesSpec
-->
<!-- Modify to provide a descriptive title for the application -->
<dc:title xml:lang="en">Rational Jazz Team Server</dc:title>
<!-- The following services must be included in both the JTS and applications -->
<jd:discovery rdf:resource="https://jazz.net/jts/discovery"/>
<jd:friends rdf:resource="https://jazz.net/jts/friends"/>
<jd:infocenterRoot rdf:resource="https://jazz.net/jts/../clmhelp"/>
<jd:viewletServiceRoot rdf:resource="https://jazz.net/jts"/>
<jd:viewletWebUIRoot rdf:resource="https://jazz.net/jts"/>
<jfs:oauthDomain>https://jazz.net/jts</jfs:oauthDomain>
<jfs:oauthRealmName>Jazz</jfs:oauthRealmName>
<jfs:oauthAccessTokenUrl rdf:resource="https://jazz.net/jts/oauth-access-token"/>
<jfs:oauthApprovalModuleUrl rdf:resource="https://jazz.net/jts/_ajax-modules/com.ibm.team.repository.AuthorizeOAuth"/>
<jfs:oauthExpireTokenUrl rdf:resource="https://jazz.net/jts/oauth-expire-token"/>
<jfs:oauthRequestConsumerKeyUrl rdf:resource="https://jazz.net/jts/oauth-request-consumer"/>
<jfs:oauthRequestTokenUrl rdf:resource="https://jazz.net/jts/oauth-request-token"/>
<jfs:oauthUserAuthorizationUrl rdf:resource="https://jazz.net/jts/oauth-authorize"/>
<jfs:jauthCheckAuthUrl rdf:resource="https://jazz.net/jts/jauth-check-auth"/>
<jfs:jauthCheckTokenUrl rdf:resource="https://jazz.net/jts/jauth-check-token"/>
<jfs:jauthIssueTokenUrl rdf:resource="https://jazz.net/jts/jauth-issue-token"/>
<jfs:jauthProxyUrl rdf:resource="https://jazz.net/jts/jauth-proxy"/>
<jfs:jauthRevokeTokenUrl rdf:resource="https://jazz.net/jts/jauth-revoke-token"/>
<jfs:jauthSigninUrl rdf:resource="https://jazz.net/jts/jauth-signin"/>
<jfs:baselines rdf:resource="https://jazz.net/jts/baselines"/>
<jfs:bulkOperations rdf:resource="https://jazz.net/jts/bulk"/>
<jfs:changes rdf:resource="https://jazz.net/jts/changes"/>
<jfs:currentUser rdf:resource="https://jazz.net/jts/whoami"/>
<jfs:history rdf:resource="https://jazz.net/jts/history"/>
<jfs:indexing rdf:resource="https://jazz.net/jts/indexing"/>
<jfs:mailer rdf:resource="https://jazz.net/jts/mailer"/>
<jfs:query rdf:resource="https://jazz.net/jts/query"/>
<jfs:search rdf:resource="https://jazz.net/jts/search"/>
<jfs:storage rdf:resource="https://jazz.net/jts/storage"/>
<jfs:users rdf:resource="https://jazz.net/jts/users"/>
<jfs:setupWizardDescriptor rdf:resource="https://jazz.net/jts/service/com.ibm.team.repository.service.internal.setup.ISetupWizardDescriptorService"/>
<jdb:dashboards rdf:resource="https://jazz.net/jts/dashboards"/>
<ju:widgetCatalog rdf:resource="https://jazz.net/jts/jfs/WidgetCatalog"/>
<jp06:processAbout rdf:resource="https://jazz.net/jts/process-about"/>
<jp06:processSecurity rdf:resource="https://jazz.net/jts/process-security"/>
<jp06:processTemplates rdf:resource="https://jazz.net/jts/process/templates"/>
<jp06:projectAreas rdf:resource="https://jazz.net/jts/process/project-areas"/>
<jtp:associations rdf:resource="https://jazz.net/jts/process-authoring/associations"/>
<jtp:defaultPracticeLibraryUrl rdf:resource="https://jazz.net/jts/process-authoring/libraries/shared"/>
<jtp:file rdf:resource="https://jazz.net/jts/process-authoring/file"/>
<jtp:license rdf:resource="https://jazz.net/jts/process-authoring/license"/>
<jtp:practices rdf:resource="https://jazz.net/jts/process-authoring/practices"/>
<jtp:processDescriptions rdf:resource="https://jazz.net/jts/process-authoring/descriptions"/>
<oslc:publisher rdf:resource="https://jazz.net/jts/application-about"/>
<!-- End of services common to JTS and applications -->
<!-- The following services are supported only in the JTS, and should be removed for applications -->
<jfs:urlMappingFeed rdf:resource="https://jazz.net/jts/urlMappings/prefixes"/>
<jfs:serverRenameStatus rdf:resource="https://jazz.net/jts/serverRenameStatus"/>
<!-- End of JTS-only services -->
<!-- Applications may add any services they provide here -->
<!-- The admin Web UI service should be uncommented in applications
<jfs:adminWebUI rdf:resource="https://jazz.net/jts/admin" />
-->
<!-- The registration handler service should be uncommented for application that do not supply their own
<jd:registration rdf:resource="https://jazz.net/jts/service/com.ibm.team.repository.service.internal.setup.IRegistrationHandlerService" />
-->
<!-- End of application-specific services -->
</rdf:Description>
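Because the root services document is plain RDF/XML, a client can discover service endpoints by reading the `rdf:resource` attribute of each child element. The sketch below shows one way to do that with the Python standard library. It works on a trimmed copy of the document rather than fetching it over HTTP, the `list_services` helper is hypothetical, and the `jfs` namespace URI is a placeholder assumed for this example (the real document declares its own namespaces).

```python
import xml.etree.ElementTree as ET

RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"

# Trimmed rootservices sample; "http://example.org/jfs#" is a placeholder namespace.
SAMPLE = """\
<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:jfs="http://example.org/jfs#"
                 rdf:about="https://jazz.net/jts/rootservices">
  <jfs:oauthRealmName>Jazz</jfs:oauthRealmName>
  <jfs:query rdf:resource="https://jazz.net/jts/query"/>
  <jfs:storage rdf:resource="https://jazz.net/jts/storage"/>
  <jfs:users rdf:resource="https://jazz.net/jts/users"/>
</rdf:Description>
"""


def list_services(rootservices_xml):
    """Map each service's local element name to the URL it points at."""
    services = {}
    for element in ET.fromstring(rootservices_xml):
        url = element.get(RDF_RESOURCE)
        if url is not None:  # skip literal-valued entries such as oauthRealmName
            local_name = element.tag.split("}", 1)[-1]
            services[local_name] = url
    return services


services = list_services(SAMPLE)
for name, url in sorted(services.items()):
    print(name, "->", url)
```

Keying on the local element name alone is a simplification: two prefixes could in principle define the same local name, so a more careful client would key on the full namespace-qualified tag.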
Reviewing SPARQL in logs
SPARQL logging can be viewed in the application server log files.
Currently, a SPARQL query is logged only when it takes longer than a particular threshold.
NOTE: Some of the following examples are not actual issues, or are issues from previous releases that have been resolved. They serve as examples only, and the data has been changed to be entirely fictional or implausible.
- An example from jazz.log of a query that took a very long time and returned no results. The delay came from a long wait time, not from the query itself.
2013-12-11 20:09:08,765 [ http-9443-Processor16] WARN sparqlLogger query => DESCRIBE
<https://oldfakeserver.jazz.net:9443/rdm/tags/_0_DvYEL4EeCMmovJqp_Txw>
<https://oldfakeserver.jazz.net:9443/rdm/tags/_wwLDgdsvs-fREeGchs_-xUS28A>
<https://oldfakeserver.jazz.net:9443/rdm/tags/_PQY2UlZOvbs34ofi6KAZg>
<https://oldfakeserver.jazz.net:9443/rdm/tags/_N5XJkEFGGVcdrqyX4JLh5A>
<https://oldfakeserver.jazz.net:9443/rdm/tags/_n19Iw1ZMEeCXx4ofi6KAZg>
<https://oldfakeserver.jazz.net:9443/rdm/tags/_K4_iII_ds_dvsdYcfFi9LJA>
<https://oldfakeserver.jazz.net:9443/rdm/tags/_CzgwwKwqEd-RB7K0TII4TA>
<https://oldfakeserver.jazz.net:9443/rdm/tags/_7-k0oIXzEeCid_twUTmPug>
<https://oldfakeserver.jazz.net:9443/rdm/tags/_iXx1YdoNEeCnZepfr52Nvg>
<https://oldfakeserver.jazz.net:9443/rdm/tags/_7Ob5QSF4CWrqyX4JLh5A>
unscoped, user belongs to => 0 context(s)
execution time => 47.12 sec (wait: 47.064 sec, sparql: 48 ms, match size: 0)
In this example the user would have seen a delay and then no results for the tags. Things to ask:
- What was the user doing, step by step?
- What were the results?
- Is there another way of getting the correct results?
- Examples of things to pay attention to in these logs:
   - Scoping: unscoped queries look through everything. Asking for more information than needed means a slower response and more manual filtering of the results.
   - Execution time: the total time the query spent in the server.
      - wait: the time the query spent waiting rather than working, because the server was busy before or during the query
      - sparql: the time spent on the SPARQL query itself
   - Match size: the number of results returned by the query.
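When many such entries need to be reviewed, the timing line can be split into its fields with a small script. The following is a sketch only: the regular expression is written against the single sample line shown above and assumes the units are always `sec`, `sec`, and `ms` in that order, which may not hold for every release. The `parse_timing` name and the `wait_fraction` field are inventions of this example.

```python
import re

# Pattern modeled on the sample line; other log layouts are not handled.
TIMING_PATTERN = re.compile(
    r"execution time => (?P<total>[\d.]+) sec "
    r"\(wait: (?P<wait>[\d.]+) sec, "
    r"sparql: (?P<sparql>[\d.]+) ms, "
    r"match size: (?P<matches>\d+)\)"
)


def parse_timing(line):
    """Return the timing fields of a sparqlLogger line, or None if absent."""
    found = TIMING_PATTERN.search(line)
    if found is None:
        return None
    total = float(found.group("total"))
    wait = float(found.group("wait"))
    return {
        "total_sec": total,
        "wait_sec": wait,
        "sparql_ms": float(found.group("sparql")),
        "match_size": int(found.group("matches")),
        # Share of the total time spent waiting rather than querying.
        "wait_fraction": wait / total if total else 0.0,
    }


timing = parse_timing(
    "execution time => 47.12 sec (wait: 47.064 sec, sparql: 48 ms, match size: 0)"
)
print(timing)
```

A `wait_fraction` close to 1, as in this sample, suggests the server was busy and the query itself was cheap, so the investigation should focus on what else the server was doing at the time.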
Possible Causes and Solutions
The indices need to be re-indexed
The state of the indices correlates directly with the performance of the application. Essentially, an index is a look-up table that makes the information easier to query and so yields faster responses.
The indices are explained here. You can re-index individual indices or all of them at once, and compact the indices, using the repotools commands.
Related topics: None
External links:
Additional contributors: None