Understanding Jena And SPARQL

Authors: GeraldMitchell
Last updated: 04 Mar 2013
Build Basis: Collaborative Lifecycle Management (CLM) 4.0, Jazz Team Server (JTS) 4.0, Rational Team Concert (RTC) 4.0

This page explains how RDF, Jena, and SPARQL relate to CLM (Jazz).

RDF, Jena, SPARQL, and CLM basics

Why do I care about RDF, Jena and SPARQL?

CLM is built around the concepts of the Semantic Web, the web of data. CLM adheres to the W3C (World Wide Web Consortium) and IETF (Internet Engineering Task Force) recommendations and specifications for modeling and exchange. RDF, Jena, and SPARQL form the basis for the relationship between the data model, application usage, and data manipulation. Simply put, RDF models the data, Jena is the framework the applications use to work with RDF, and SPARQL is the query language used to retrieve and manipulate that data.

What is RDF?

RDF (Resource Description Framework) is the basic component of the Semantic Web as defined by the W3C. Specifically, it is the data model for representing a thing, place, object, time frame, and so on, and is used to build out a useful repository. The idea is to enable machine understanding of a resource by providing the metadata necessary to describe it in a usable data model. CLM is built around the Semantic Web and so uses RDF as a fundamental concept. The single data model provides a common language for exchange between applications.

References: Wikipedia: RDF, W3C RDF Primer, RDF spec, RDF Schema, RDF/XML grammar, RFC 3870

What is a Triple/Triplestore?

A statement about an object in RDF is expressed as a three-part descriptor (subject-predicate-object) called a triple. A triplestore is the storage mechanism for triples. The triplestores used for CLM 4.0 are built over commercial relational database engines, which interact using SQL, with SPARQL as the intermediary.
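As a rough illustration of the subject-predicate-object shape, the sketch below keeps a few triples in a plain Python list and matches them against a pattern, which is the basic operation a SPARQL engine performs. The `match` helper and the in-memory list are hypothetical stand-ins and not how CLM or Jena store data. The subject URI comes from the artifact example later on this page; the Dublin Core namespace used for the predicates is an assumption.

```python
from typing import List, Optional, Tuple

# A triple is (subject, predicate, object).
Triple = Tuple[str, str, str]

ARTIFACT = "https://fakeexampleserver.jazz.net:9443/rm/resources/rrc164302"
DCTERMS = "http://purl.org/dc/terms/"  # assumed namespace for title/creator

# A toy in-memory "triplestore": just a list of triples.
store: List[Triple] = [
    (ARTIFACT, DCTERMS + "title", "Mamba Number 3"),
    (ARTIFACT, DCTERMS + "identifier", "12345"),
    (ARTIFACT, DCTERMS + "creator",
     "https://fakeexampleserver.jazz.net:9443/jts/users/geraldMitchell"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly: SELECT ?title WHERE { <ARTIFACT> dcterms:title ?title }
titles = [obj for (_, _, obj) in match(store, s=ARTIFACT, p=DCTERMS + "title")]
print(titles)
```

Leaving a position as `None` plays the role of a SPARQL variable: every triple is accepted at that position and the caller picks out the value it wants.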

What is Jena?

Jena is a framework for the Semantic Web written in Java. It is an open source package available from Apache. CLM uses the Jena API to interact with RDF.

References: Wikipedia: Jena Framework, Apache Jena

What is Jena TDB?

Jazz Foundation Services use the Jena TDB database to store and query over RDF data. TDB is a component of Jena implemented for fast RDF storage. See the Apache TDB page for more information. Specifics of the architecture can be found in the [[http://jena.apache.org/documentation/tdb/architecture.html][Jena documentation on TDB]].

What is SPARQL?

SPARQL (SPARQL Protocol and RDF Query Language) is the W3C query language for RDF. In CLM, it is used to store, retrieve, and manipulate triples over SQL.

Beyond the syntax, the structure of a SPARQL query has a direct impact on the resulting SQL query. That SQL query can find and return the result from the relational database only as efficiently as the final request and the size and structure of the data tables allow.

References: W3C: SPARQL

NOTE: There are changes and improvements to SPARQL and RDF that are active proposals; see SPARQL 1.1 for more information.

Assessment of RDF, Jena and SPARQL in CLM

The best way to understand the nature of RDF, SPARQL, and Jena in CLM is to look at a typical request for a work item, and see the flow from request to retrieval, and also investigate a query to see the similarities and differences.

Symptoms of a problem

Impact / Scope

Timing (When)

Environmental changes

Using semantic web with CLM in DM

Here are some pointers to places where the Semantic Web has visibility in DM:

Recommended analysis steps

Reviewing an RDF

Remember that RDF can represent many different things:

Here is a typical RDF file for RRC describing an Artifact.


<rdf:RDF>
   <rdf:Description rdf:nodeID="A0">
      <j.2:hasAttrDef rdf:resource="https://fakeexampleserver.jazz.net:9443/rm/types/_iIZ9GrygEeGyA4urzO_ojw"/>
      <rdf:value rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">12345</rdf:value>
   </rdf:Description>
   <rdf:Description rdf:nodeID="A1">
      <j.2:hasAttrDef rdf:resource="https://fakeexampleserver.jazz.net:9443/rm/types/_iI2pCrygEeGyA4urzO_ojw"/>
      <rdf:value>Mamba Number 3</rdf:value>
   </rdf:Description>
   <rdf:Description rdf:nodeID="A2">
      <j.2:hasAttrDef rdf:resource="https://fakeexampleserver.jazz.net:9443/rm/types/_iIs4CrygEeGyA4urzO_ojw"/>
   </rdf:Description>
   <rdf:Description rdf:nodeID="A3">
      <j.2:hasAttrDef rdf:resource="https://fakeexampleserver.jazz.net:9443/rm/types/_iHglOrygEeGyA4urzO_ojw"/>
      <rdf:value rdf:resource="https://fakeexampleserver.jazz.net:9443/rm/types/_iF4NirygEeGyA4urzO_ojw#Text"/>
   </rdf:Description>
   <rdf:Description rdf:about="https://fakeexampleserver.jazz.net:9443/rm/resources/rrc164302">
      <j.0:isPartOf rdf:resource="https://fakeexampleserver.jazz.net:9443/rm/resources"/>
      <j.4:appId>IPyWED9U4oYSXbs3N9BSQkAiRNvckY42SmP1asx3udY=</j.4:appId>
      <j.3:ArtifactFormat rdf:resource="https://fakeexampleserver.jazz.net:9443/rm/types/_iF4NirygEeGyA4urzO_ojw#Text"/>
      <j.4:resourceContext rdf:resource="https://fakeexampleserver.jazz.net:9443/jts/process/project-areas/_c0YqUaUnEd-RB7K0TII4TA"/>
      <j.4:resourceContextId>_c0YqUaUnEd-RB7K0TII4TA</j.4:resourceContextId>
      <j.0:creator rdf:resource="https://fakeexampleserver.jazz.net:9443/jts/users/geraldMitchell"/>
      <j.2:hasAttrVal rdf:nodeID="A0"/>
      <j.2:ofType rdf:resource="https://fakeexampleserver.jazz.net:9443/rm/types/_JbS4VryiEeGyA4urzO_ojw"/>
      <j.4:etag>"_FNyfJb7VEeGro_Tz28rjsQ"</j.4:etag>
      <j.0:contributor rdf:resource="https://fakeexampleserver.jazz.net:9443/jts/users/geraldMitchell"/>
      <j.0:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-06-25T14:50:18.514Z</j.0:modified>
      <j.2:hasAttrVal rdf:nodeID="A2"/>
      <j.3:PrimaryText>

            hello world

      </j.3:PrimaryText>
      <j.0:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-06-11T06:55:03.059Z</j.0:created>
      <j.0:format>application/rdf+xml</j.0:format>
      <j.0:title>Mamba Number 3</j.0:title>
      <j.4:resourceLocation rdf:resource="https://fakeexampleserver.jazz.net:9443/jts/storage/com.ibm.rdm.resources/rrc12345"/>
      <j.0:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">12345</j.0:identifier>
      <j.2:hasAttrVal rdf:nodeID="A1"/>
      <j.2:hasAttrVal rdf:nodeID="A3"/>
      <j.1:parent rdf:resource="https://fakeexampleserver.jazz.net:9443/rm/folders/_YP9IgHMmEeGcrLHosKj6AA"/>
      <rdf:type rdf:resource="http://www.ibm.com/xmlns/rdm/rdf/Artifact"/>
   </rdf:Description>
</rdf:RDF>

This is a fairly simple graph (in the mathematical sense) describing a single, simple artifact. The blank nodes A0 to A3 hold attribute values that the artifact references through hasAttrVal.
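To see how such a document breaks down into triples, the sketch below walks a reduced copy of the artifact with Python's standard `xml.etree.ElementTree`. The listing above omits its namespace declarations, so this copy declares only `rdf` and a `dcterms` prefix; mapping those properties to the Dublin Core terms namespace is an assumption, and `extract_triples` is a hypothetical helper. It handles only the simple shapes seen here, whereas a real RDF parser such as Jena covers the full RDF/XML grammar.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Reduced, well-formed copy of the artifact; the dcterms prefix is an assumption.
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="https://fakeexampleserver.jazz.net:9443/rm/resources/rrc164302">
    <dcterms:title>Mamba Number 3</dcterms:title>
    <dcterms:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">12345</dcterms:identifier>
    <dcterms:creator rdf:resource="https://fakeexampleserver.jazz.net:9443/jts/users/geraldMitchell"/>
    <rdf:type rdf:resource="http://www.ibm.com/xmlns/rdm/rdf/Artifact"/>
  </rdf:Description>
</rdf:RDF>
"""


def extract_triples(rdf_xml):
    """Yield (subject, predicate, object) for each property element.

    Handles only rdf:Description nodes identified by rdf:about or rdf:nodeID,
    with properties that are a literal, an rdf:resource, or an rdf:nodeID.
    """
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about") or "_:" + desc.get(f"{{{RDF_NS}}}nodeID", "")
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; join them into one URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if obj is None:
                node_id = prop.get(f"{{{RDF_NS}}}nodeID")
                obj = "_:" + node_id if node_id else (prop.text or "").strip()
            yield subject, predicate, obj


triples = list(extract_triples(SAMPLE))
for t in triples:
    print(t)
```

Each child element of an `rdf:Description` becomes one triple, with the `rdf:about` URI as the subject and the element name as the predicate.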

RDF is also used to describe the communication channels, that is, the service endpoints an application exposes. Take, for example, the jazz.net root services document at https://jazz.net/jts/rootservices:

<!--
    Licensed Materials - Property of IBM
    (c) Copyright IBM Corporation 2011. All Rights Reserved.
   
    Note to U.S. Government Users Restricted Rights:  
    Use, duplication or disclosure restricted by GSA ADP Schedule 
    Contract with IBM Corp. 
 -->
<rdf:Description rdf:about="https://jazz.net/jts/rootservices">
   <!-- 
        Root services resource template for applications based on JAF SDK.
        Contains required contributions both for applications and for the JTS.
        Applications may add additional services, but may only remove services noted as being "JTS only".
        Specification is available at https://jazz.net/wiki/bin/view/Main/RootServicesSpec
   -->
   <!-- Modify to provide a descriptive title for the application -->
   <dc:title xml:lang="en">Rational Jazz Team Server</dc:title>
   <!-- The following services must be included in both the JTS and applications -->
   <jd:discovery rdf:resource="https://jazz.net/jts/discovery"/>
   <jd:friends rdf:resource="https://jazz.net/jts/friends"/>
   <jd:infocenterRoot rdf:resource="https://jazz.net/jts/../clmhelp"/>
   <jd:viewletServiceRoot rdf:resource="https://jazz.net/jts"/>
   <jd:viewletWebUIRoot rdf:resource="https://jazz.net/jts"/>
   <jfs:oauthDomain>https://jazz.net/jts</jfs:oauthDomain>
   <jfs:oauthRealmName>Jazz</jfs:oauthRealmName>
   <jfs:oauthAccessTokenUrl rdf:resource="https://jazz.net/jts/oauth-access-token"/>
   <jfs:oauthApprovalModuleUrl rdf:resource="https://jazz.net/jts/_ajax-modules/com.ibm.team.repository.AuthorizeOAuth"/>
   <jfs:oauthExpireTokenUrl rdf:resource="https://jazz.net/jts/oauth-expire-token"/>
   <jfs:oauthRequestConsumerKeyUrl rdf:resource="https://jazz.net/jts/oauth-request-consumer"/>
   <jfs:oauthRequestTokenUrl rdf:resource="https://jazz.net/jts/oauth-request-token"/>
   <jfs:oauthUserAuthorizationUrl rdf:resource="https://jazz.net/jts/oauth-authorize"/>
   <jfs:jauthCheckAuthUrl rdf:resource="https://jazz.net/jts/jauth-check-auth"/>
   <jfs:jauthCheckTokenUrl rdf:resource="https://jazz.net/jts/jauth-check-token"/>
   <jfs:jauthIssueTokenUrl rdf:resource="https://jazz.net/jts/jauth-issue-token"/>
   <jfs:jauthProxyUrl rdf:resource="https://jazz.net/jts/jauth-proxy"/>
   <jfs:jauthRevokeTokenUrl rdf:resource="https://jazz.net/jts/jauth-revoke-token"/>
   <jfs:jauthSigninUrl rdf:resource="https://jazz.net/jts/jauth-signin"/>
   <jfs:baselines rdf:resource="https://jazz.net/jts/baselines"/>
   <jfs:bulkOperations rdf:resource="https://jazz.net/jts/bulk"/>
   <jfs:changes rdf:resource="https://jazz.net/jts/changes"/>
   <jfs:currentUser rdf:resource="https://jazz.net/jts/whoami"/>
   <jfs:history rdf:resource="https://jazz.net/jts/history"/>
   <jfs:indexing rdf:resource="https://jazz.net/jts/indexing"/>
   <jfs:mailer rdf:resource="https://jazz.net/jts/mailer"/>
   <jfs:query rdf:resource="https://jazz.net/jts/query"/>
   <jfs:search rdf:resource="https://jazz.net/jts/search"/>
   <jfs:storage rdf:resource="https://jazz.net/jts/storage"/>
   <jfs:users rdf:resource="https://jazz.net/jts/users"/>
   <jfs:setupWizardDescriptor rdf:resource="https://jazz.net/jts/service/com.ibm.team.repository.service.internal.setup.ISetupWizardDescriptorService"/>
   <jdb:dashboards rdf:resource="https://jazz.net/jts/dashboards"/>
   <ju:widgetCatalog rdf:resource="https://jazz.net/jts/jfs/WidgetCatalog"/>
   <jp06:processAbout rdf:resource="https://jazz.net/jts/process-about"/>
   <jp06:processSecurity rdf:resource="https://jazz.net/jts/process-security"/>
   <jp06:processTemplates rdf:resource="https://jazz.net/jts/process/templates"/>
   <jp06:projectAreas rdf:resource="https://jazz.net/jts/process/project-areas"/>
   <jtp:associations rdf:resource="https://jazz.net/jts/process-authoring/associations"/>
   <jtp:defaultPracticeLibraryUrl rdf:resource="https://jazz.net/jts/process-authoring/libraries/shared"/>
   <jtp:file rdf:resource="https://jazz.net/jts/process-authoring/file"/>
   <jtp:license rdf:resource="https://jazz.net/jts/process-authoring/license"/>
   <jtp:practices rdf:resource="https://jazz.net/jts/process-authoring/practices"/>
   <jtp:processDescriptions rdf:resource="https://jazz.net/jts/process-authoring/descriptions"/>
   <oslc:publisher rdf:resource="https://jazz.net/jts/application-about"/>
   <!-- End of services common to JTS and applications -->
   <!-- The following services are supported only in the JTS, and should be removed for applications -->
   <jfs:urlMappingFeed rdf:resource="https://jazz.net/jts/urlMappings/prefixes"/>
   <jfs:serverRenameStatus rdf:resource="https://jazz.net/jts/serverRenameStatus"/>
   <!-- End of JTS-only services -->
   <!-- Applications may add any services they provide here -->
   <!-- The admin Web UI service should be uncommented in applications
     <jfs:adminWebUI                     rdf:resource="https://jazz.net/jts/admin" />
   -->
   <!-- The registration handler service should be uncommented for application that do not supply their own
      <jd:registration                     rdf:resource="https://jazz.net/jts/service/com.ibm.team.repository.service.internal.setup.IRegistrationHandlerService" />
   -->
   <!-- End of application-specific services -->
</rdf:Description>
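A client typically reads this document to discover where each service lives. The sketch below, using only Python's standard library, collects the `rdf:resource` URL of every property into a dictionary keyed by the property's local name. It works on a reduced copy of the listing; the `jfs` namespace URI declared in it is an assumption, since the listing above does not show its namespace declarations, and `service_urls` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Reduced copy of the listing; the jfs namespace URI is an assumption.
ROOTSERVICES = """\
<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:jfs="http://jazz.net/xmlns/prod/jazz/jfs/1.0/"
                 rdf:about="https://jazz.net/jts/rootservices">
  <jfs:oauthRealmName>Jazz</jfs:oauthRealmName>
  <jfs:query rdf:resource="https://jazz.net/jts/query"/>
  <jfs:storage rdf:resource="https://jazz.net/jts/storage"/>
  <jfs:users rdf:resource="https://jazz.net/jts/users"/>
</rdf:Description>
"""


def service_urls(xml_text):
    """Map each property's local name to its rdf:resource URL."""
    root = ET.fromstring(xml_text)
    services = {}
    for child in root:
        url = child.get(f"{{{RDF_NS}}}resource")
        if url is not None:  # skip literal-valued properties such as oauthRealmName
            local_name = child.tag.rsplit("}", 1)[-1]
            services[local_name] = url
    return services


services = service_urls(ROOTSERVICES)
print(services["query"])
```

Keying on the local name keeps the sketch short, but it would collide if two namespaces used the same property name; a fuller version would key on the full predicate URI.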

Reviewing SPARQL in logs

SPARQL logging can be viewed in the application server log files. Currently, a SPARQL query is logged when its execution time exceeds a particular threshold.

NOTE: some of the following examples are not actual issues, or are issues from previous releases that have been resolved. These are used as the basis of examples only, and the data has been changed to be entirely fictional or implausible.

  • querying a lot of information in rdm.log
    2013-02-29 22:21:20,191
        [         http-9443-Processor91] [74123569] 
             WARN
                ng.server.services.query.internal.QueryServiceUtil
                   Describe Query (QueryServiceUtil) invoked with 512 urls by user: 
                      https://oldfakeserver.jazz.net:9443/jazz/users/ABCD0123 
    
    In this example, a Describe query for 512 URLs by the user ABCD0123 is probably excessive, and it may be a good idea to ask the user:
    • what the user was doing, step by step
    • what were the results
    • is there another way of getting the correct results

  • a review in rdm.log
    2011-11-11 11:11:11,111 
       [         http-9443-Processor82] [193452490] 
          ERROR 
             er.services.reviews.internal.ReviewServiceInternal 
                Review service invoked with no artifactURI by user: 
                   https://oldfakeserver.jazz.net:9443/jazz/users/CDEF1234
    
    In this example, the review service was invoked by the user CDEF1234 without an artifact URI, so the review was not for a valid artifact. Things to ask:
    • what the user was doing, step by step
    • what were the results
    • is there another way of getting the correct results

  • a query example in jazz.log of a very long-running query with no results. The delay was caused by a long wait time, not by the query itself.
    2013-12-11 20:09:08,765
       [         http-9443-Processor16]
          WARN 
             sparqlLogger 
                query => DESCRIBE 
                   <https://oldfakeserver.jazz.net:9443/rdm/tags/_0_DvYEL4EeCMmovJqp_Txw> 
                   <https://oldfakeserver.jazz.net:9443/rdm/tags/_wwLDgdsvs-fREeGchs_-xUS28A> 
                   <https://oldfakeserver.jazz.net:9443/rdm/tags/_PQY2UlZOvbs34ofi6KAZg> 
                   <https://oldfakeserver.jazz.net:9443/rdm/tags/_N5XJkEFGGVcdrqyX4JLh5A> 
                   <https://oldfakeserver.jazz.net:9443/rdm/tags/_n19Iw1ZMEeCXx4ofi6KAZg> 
                   <https://oldfakeserver.jazz.net:9443/rdm/tags/_K4_iII_ds_dvsdYcfFi9LJA> 
                   <https://oldfakeserver.jazz.net:9443/rdm/tags/_CzgwwKwqEd-RB7K0TII4TA> 
                   <https://oldfakeserver.jazz.net:9443/rdm/tags/_7-k0oIXzEeCid_twUTmPug> 
                   <https://oldfakeserver.jazz.net:9443/rdm/tags/_iXx1YdoNEeCnZepfr52Nvg> 
                   <https://oldfakeserver.jazz.net:9443/rdm/tags/_7Ob5QSF4CWrqyX4JLh5A>
               unscoped, user belongs to => 0 context(s)
               execution time => 47.12 sec (wait: 47.064 sec, sparql: 48 ms, match size: 0)
    
    In this example, the user would have seen a delay before the response, but no results would have been returned for the tags. Things to ask:
    • what the user was doing, step by step
    • what were the results
    • is there another way of getting the correct results
    • Examples of things to pay attention to in these logs
      • Scoping: unscoped queries look through everything. Asking for more information than needed results in both a slower response and more manual sifting through the results.
      • Execution time: the total time the query spent in the server.
        • wait: the time during which the query was not executing because the server was busy before or during the query
        • sparql: the time spent executing the SPARQL query itself
        • match size: the number of results returned for the query
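When many such entries need reviewing, the execution-time line can be picked apart with a small script. The sketch below assumes the line always has the shape shown in the sparqlLogger example above (total and wait in seconds, sparql in milliseconds); real logs may differ. The helper `parse_execution_line`, its field names, and the 0.9 threshold are hypothetical.

```python
import re

# Pattern for the summary line shown in the sparqlLogger example.
EXEC_RE = re.compile(
    r"execution time => (?P<total>[\d.]+) sec "
    r"\(wait: (?P<wait>[\d.]+) sec, "
    r"sparql: (?P<sparql>[\d.]+) ms, "
    r"match size: (?P<matches>\d+)\)"
)


def parse_execution_line(line):
    """Return the timing fields in seconds, or None if the line does not match."""
    m = EXEC_RE.search(line)
    if m is None:
        return None
    total = float(m.group("total"))
    wait = float(m.group("wait"))
    return {
        "total_sec": total,
        "wait_sec": wait,
        "sparql_sec": float(m.group("sparql")) / 1000.0,
        "match_size": int(m.group("matches")),
        # Share of the total time spent waiting rather than querying.
        "wait_ratio": wait / total if total else 0.0,
    }


line = "execution time => 47.12 sec (wait: 47.064 sec, sparql: 48 ms, match size: 0)"
info = parse_execution_line(line)
if info and info["wait_ratio"] > 0.9:
    print("mostly waiting: %.1f%% of %.2f sec" % (info["wait_ratio"] * 100, info["total_sec"]))
```

A high wait ratio points at server load rather than at the query, while a large sparql time with a large match size points at the query's scope.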

Possible Causes and Solutions

The indices need to be re-indexed

The state of the indices correlates directly with the performance of the application. Essentially, an index is a look-up table that allows the information to be queried more easily and results in faster responses. The indices are explained here. You can re-index individual indices or all of them at once, and compact the indices, using the repotools commands.

Keywords to aid searching

Jena SPARQL query JFS Jazz SQL log analysis RDF triple triplestore DM semantic web Resource Description Framework

For further reference

Related Topics: Deployment Web Home

External Links:


History: r11 - 2013-03-04 - 21:37:08 - Main.geraldmi
 