Introduction to RDF Query with SPARQL
Overview
- What is SPARQL?
- RDF Query Languages Background
- SPARQL Query Language
- RDF Data in Turtle
- Triple and Graph Patterns
- Expressions and Values
- Query Results and Data
- Issues
What Is RDF?
- RDF Graph := a set of RDF Triples
- RDF Triple := a 3-tuple (IRI or Blank Node, IRI, IRI or Blank Node or Literal)
- Blank Node := a graph-scoped identifier
- IRI := an Internationalized Resource Identifier, the internationalized form of a URI
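The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `IRI`, `BlankNode` and `Literal` and the helper `make_triple` are made up for this sketch, and the sample data is invented. It shows that a graph is a plain set, so a repeated triple is stored once, and that each position of a triple only accepts certain kinds of term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str  # only meaningful inside the graph that contains it


@dataclass(frozen=True)
class Literal:
    lexical: str


def make_triple(subject, predicate, obj):
    """Check the position rules of an RDF triple and return a 3-tuple."""
    if not isinstance(subject, (IRI, BlankNode)):
        raise TypeError("subject must be an IRI or a blank node")
    if not isinstance(predicate, IRI):
        raise TypeError("predicate must be an IRI")
    if not isinstance(obj, (IRI, BlankNode, Literal)):
        raise TypeError("object must be an IRI, a blank node or a literal")
    return (subject, predicate, obj)


FOAF = "http://xmlns.com/foaf/0.1/"
person = BlankNode("p1")

# An RDF graph is a set of triples: the duplicate below is stored only once.
graph = {
    make_triple(person, IRI(FOAF + "name"), Literal("Alice")),
    make_triple(person, IRI(FOAF + "name"), Literal("Alice")),
    make_triple(person, IRI(FOAF + "weblog"), IRI("http://example.org/blog")),
}
print(len(graph))
```

Frozen dataclasses are used so that terms are hashable and so that an IRI and a literal with the same text never compare equal.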
What Is SPARQL?
- SPARQL Protocol and RDF Query Language
(you can blame me for the cute name)
- A Query Language ...:
Find names and websites of contributors to PlanetRDF:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?website
FROM <http://planetrdf.com/bloggers.rdf>
WHERE { ?person foaf:weblog ?website ;
foaf:name ?name .
?website a foaf:Document
}
- ... and a Protocol.
http://.../qps?
query-lang=http://www.w3.org/TR/rdf-sparql-query/
&graph-id=http://planetrdf.com/bloggers.rdf
&query=PREFIX foaf: <http://xmlns.com/foaf/0.1/...
- Run this query
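As a rough sketch of the protocol half, the request above is just a URL with three encoded parameters. The parameter names (`query-lang`, `graph-id`, `query`) are taken from the slide; the endpoint address is elided in the slide, so `http://sparql.example.org/qps` below is a hypothetical placeholder, and `build_request_url` is a name invented for this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?website
FROM <http://planetrdf.com/bloggers.rdf>
WHERE { ?person foaf:weblog ?website ;
                foaf:name ?name .
        ?website a foaf:Document
}"""

# Hypothetical endpoint: the real service address is not given in the slide.
ENDPOINT = "http://sparql.example.org/qps"


def build_request_url(endpoint, query, graph_id):
    """Encode the query and its parameters into a GET request URL."""
    params = {
        "query-lang": "http://www.w3.org/TR/rdf-sparql-query/",
        "graph-id": graph_id,
        "query": query,
    }
    return endpoint + "?" + urlencode(params)


url = build_request_url(ENDPOINT, QUERY, "http://planetrdf.com/bloggers.rdf")
print(url)
```

`urlencode` takes care of escaping the angle brackets, spaces and newlines in the query text, which is the part that is easy to get wrong by hand.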
Background of RDF Query Languages
- RDF: W3C RECommendation February 1999
- Revised RECommendations February 2004
- Querying at W3C has been discussed since the QL'98 workshop in December 1998
- Querying XML (XQuery) has been in development since September 1999
- Implementors have led RDF query language development
Why do you need (RDF) query languages?
- Support all of the RDF model
- Knows about RDF graphs and RDF triples
- Can handle RDF's semi-structured data
- Supports operations on RDF graphs
- To enable application development at a higher (above triple) level
- To enable cross-language, cross-platform development
Pre-Existing RDF Query Language Designs
The main RDF query language styles:
- SQL-like: RDQL/Squish, SeRQL, RDFDB QL, RQL, ...
- XPath-like: Versa, RDFPath
- Rules-like: N3QL, Triple, DQL, OWL-QL, ...
- Language-like: Algae2, Fabl, Abeline
- Using XML: XSLT, XPath, XQuery
Most popular by far are the SQL-like languages.
Before SPARQL, RDQL/Squish was the most popular SQL-like language, with multiple
independent implementations.
Querying XML and Querying RDF
Concept | XML | RDF
Model | Document or Tree or Infoset (PSVI) | Set of Triples = RDF Graph
Atomic Units | Elements, Attributes, Text | Triples, URIs, Blank Nodes, Text
Identifiers | Element/Attribute names, QNames, IDs, XPointers / XPaths | URIs
Described by | DTDs, W3C XML Schema, Relax NG, ... | RDF Schema
XSLT / XQuery and RDF
- Not based on RDF terms - graphs, URIs and literals
- XQuery is based on sequences (node sets) and cannot represent a set of triples
- Syntax focused - so more useful for SPARQL XML output (later)
- XQuery is still being developed (1999-present)
Why standardize RDF query now?
given that RDQL (and others) are widely implemented
- Querying RDF has been discussed since 1998
- Implementations of RDQL are similar but have some divergence and extensions
- Recent research on contexts and Named Graphs is not reflected in all current RDF query languages
- XML query solutions exist but do not entirely suit RDF.
So in March 2004 the W3C formed the
RDF Data Access Working Group (DAWG).
It reached Last Call stage in February 2006.
Use Cases for RDF Query
Some of the use cases DAWG recorded were:
- Finding values for partially known graph structures
- Getting information about an identifiable object with unknown properties
- A human friendly syntax for queries for application developers
- Running automated regular queries against RDF graphs
- Querying aggregated RDF graphs
- Running queries constrained with datatype expressions
- Querying a remote RDF server and getting streaming results back
- Allowing alternate solutions to match in queries
- Using local extension functions in a query
- Using an RDF query service with Web Services
Requirements for RDF query
These led to requirements
- from existing languages:
- conjunction (AND) of triple patterns with variable bindings
and constraints
- from use cases:
- graphs, datatypes, extension functions, aggregation,
alternates, descriptions
- from the WG charter:
- a protocol.
no rules language, no cursors, no proofs and no updates [more of this later]
SPARQL - Query Language
- An RDF data access query language
- Data access means reading information, not writing (updates)
- The outline query model is the graph pattern: a data graph with some constants replaced by variable names
- In the style of earlier RDQL-like work, i.e. not rules or path based
SPARQL query syntax
- Make query look like the data with variables substituted
- Turtle / N3 data style chosen (eventually!)
- Generalise RDF triples to be 3-tuples of RDF Terms
- RDF Term := RDF IRI | Blank Node | Literal
- Triple Pattern := 3-tuple of (RDF Term | Variable Name)
- (Consequence: you can write patterns that no legal RDF triple can match, e.g. with a literal as the subject)
plus more abbreviating syntax
Turtle RDF syntax - IRIs and Blank Nodes
- IRIs
- Enclosed in <>: <IRI>
- Relative IRI references are turned into IRIs
- or declare @prefix prefix: <http://....> and write prefix:name, in the style of XML QNames, as a shorthand for the full IRI
- Blank Nodes
- _:name
- or [] for a Blank Node used once
Turtle RDF syntax - RDF Literals
- Literals
"Literal"
"Literal"@language
"""Long literal with
newlines"""
- Datatyped Literals
"lexical form"^^datatype IRI
e.g. "10"^^xsd:integer
Shorthand | Meaning
10 | Decimal integer (xsd:integer)
true | Boolean (xsd:boolean)
2.5 | Decimal (xsd:decimal)
Turtle RDF Syntax - Abbreviations
- Triples separated by .
:a :b :c . :d :e :f .
- Common triple predicate and subject:
:a :b :c ,
:d .
which is the same as:
:a :b :c .
:a :b :d .
- Common triple subject:
:a :b :c ;
:d :e .
which is the same as:
:a :b :c .
:a :d :e .
Turtle RDF Syntax - Abbreviations 2
- Blank node as a subject
:a :b [ :c :d ]
which is the same as:
:a :b _:x .
_:x :c :d .
for blank node _:x
- RDF Collections
:a :b ( :c :d :e :f )
which is short for many many triples :)
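To make "many many triples" concrete, here is a minimal Python sketch of how a collection could be expanded into `rdf:first` / `rdf:rest` triples ending in `rdf:nil`. The function name `expand_collection` and the `_:listN` blank node labels are invented for this example; a real Turtle parser generates its own labels.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def expand_collection(subject, predicate, items):
    """Return the triples that `subject predicate ( items... )` abbreviates."""
    if not items:
        # The empty collection () is just rdf:nil.
        return [(subject, predicate, RDF + "nil")]
    # One fresh blank node per list cell (labels are illustrative only).
    nodes = [f"_:list{i}" for i in range(1, len(items) + 1)]
    triples = [(subject, predicate, nodes[0])]
    for i, (node, item) in enumerate(zip(nodes, items)):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((node, RDF + "first", item))
        triples.append((node, RDF + "rest", rest))
    return triples


triples = expand_collection(":a", ":b", [":c", ":d", ":e", ":f"])
for triple in triples:
    print(triple)
print(len(triples))
```

Under this expansion a collection of n items costs 2n + 1 triples, so the four-item example above stands for nine triples.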
SPARQL Triple Patterns
Add variables to a Turtle RDF graph. Data:
:a :b :c .
:c :d "hello" .
Query: find the ?x in the RDF graph such that this pattern matches:
?x :b :c .
:c :d "hello"
The full SPARQL query:
PREFIX : <http://example.org/stuff#>
SELECT ?x
WHERE {
  ?x :b :c .
  :c :d "hello"
}
{ } is a SPARQL Graph Pattern
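The matching idea can be sketched in plain Python. This is a naive illustration, not how a SPARQL engine is implemented: terms are plain strings, a string starting with `?` is treated as a variable, and `match_triple` and `match_bgp` are names invented for the sketch.

```python
def is_var(term):
    return term.startswith("?")


def match_triple(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable must keep the value it was already bound to.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def match_bgp(patterns, graph):
    """Every triple pattern must match; solutions are variable bindings."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in graph
            if (extended := match_triple(pattern, triple, binding)) is not None
        ]
    return solutions


data = {(":a", ":b", ":c"), (":c", ":d", '"hello"')}
query = [("?x", ":b", ":c"), (":c", ":d", '"hello"')]
print(match_bgp(query, data))  # [{'?x': ':a'}]
```

A pattern without variables, like the second one here, acts as a pure test: it adds no bindings but removes every solution if the triple is absent.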
SPARQL Graph Patterns (GPs)
- Basic: contains only triple patterns, which must all match
- Group: contains GPs that MUST match or the GP fails
- Optional: contains GPs that MAY match
- Union: contains alternate GPs, any or all of which MAY match
All may contain FILTER expressions.
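One way to picture how these pattern kinds combine is as operations on lists of variable bindings. The sketch below is a simplification under that assumption; the function names `join`, `optional` and `union` and the sample bindings are invented for the example.

```python
def compatible(a, b):
    """Two bindings are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)


def join(left, right):
    """Group: both sides MUST match."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


def optional(left, right):
    """Optional: keep the left solution even when the right side does not match."""
    out = []
    for l in left:
        extended = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(extended or [l])
    return out


def union(left, right):
    """Union: solutions from either alternative."""
    return left + right


names = [{"?p": ":a", "?name": "Alice"}, {"?p": ":b", "?name": "Bob"}]
ages = [{"?p": ":a", "?age": 30}]

print(join(names, ages))      # only Alice, who has both a name and an age
print(optional(names, ages))  # Alice with her age, Bob with ?age left unbound
print(union(names, ages))     # all three solutions, unmerged
```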
FILTER Expressions and Values
- Graph patterns can be constrained by FILTER expressions
- Expressions over bound variables returning True, False or error
- Arithmetic and logical operators, functions, type promotion, casting
- Extension functions: my:function
- XSD Datatypes: boolean, integer, decimal, dateTime
FILTER (?x > 3)
FILTER BOUND(?x)
FILTER (?date < "2006-03-01T20:15:00Z"^^xsd:dateTime)
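The "True, False or error" behaviour can be sketched as follows: a solution is kept only when the expression evaluates to true, and an evaluation error removes the solution just as false does. In this simplified Python model, errors are represented by `KeyError` (unbound variable) and `TypeError` (incomparable values); `apply_filter` is a name invented for the example.

```python
def apply_filter(solutions, expression):
    """Keep the solutions for which `expression` is true; errors drop the solution."""
    kept = []
    for binding in solutions:
        try:
            if expression(binding):
                kept.append(binding)
        except (KeyError, TypeError):
            # Unbound variable or type error: treated like a failed filter.
            pass
    return kept


solutions = [{"?x": 5}, {"?x": 2}, {"?y": 9}, {"?x": "abc"}]


def greater_than_3(binding):
    return binding["?x"] > 3      # FILTER (?x > 3)


def bound_x(binding):
    return "?x" in binding        # FILTER BOUND(?x)


print(apply_filter(solutions, greater_than_3))  # [{'?x': 5}]
print(apply_filter(solutions, bound_x))
```

Note that `BOUND` is the one test here that never raises on an unbound variable, which is why it is the usual way to check whether an OPTIONAL part matched.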
Query Results
- Like SQL - a table; a sequence of rows of variable bindings
- Aligns with existing SQL support
- Protocols such as ODBC and JDBC
- SQL Database access APIs such as Perl DBI
- SPARQL has been added inside Oracle and (maybe?) PostgreSQL
- OPTIONAL can give unbound variables. This is not SQL NULL
- SELECT DISTINCT on a per-row basis
- ORDER BY: ascending or descending sort order
- Result sub-sequences: LIMIT and OFFSET (not cursors)
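The result modifiers can be illustrated on a list of bindings. This is a rough sketch with invented names (`apply_modifiers`, the sample rows) and two simplifying assumptions: sorting is on a single variable whose bound values are mutually comparable, and an unbound variable is shown as `None` and sorted before any bound value.

```python
def apply_modifiers(solutions, variables, order_by=None, descending=False,
                    distinct=False, offset=0, limit=None):
    """Apply ORDER BY, projection, DISTINCT, then OFFSET and LIMIT."""
    rows = list(solutions)
    if order_by is not None:
        rows.sort(key=lambda b: (order_by in b, b.get(order_by, "")),
                  reverse=descending)
    # Projection: one table row per solution; unbound variables become None.
    table = [tuple(b.get(v) for v in variables) for b in rows]
    if distinct:
        table = list(dict.fromkeys(table))  # drop duplicate rows, keep order
    end = None if limit is None else offset + limit
    return table[offset:end]


solutions = [
    {"?name": "Carol", "?age": 41},
    {"?name": "Alice"},            # OPTIONAL part did not match: ?age unbound
    {"?name": "Bob", "?age": 30},
    {"?name": "Alice"},
]

page = apply_modifiers(solutions, ["?name", "?age"], order_by="?name",
                       distinct=True, offset=1, limit=2)
print(page)  # [('Bob', 30), ('Carol', 41)]
```

Because LIMIT and OFFSET are just a slice of an ordered sequence, each request recomputes the page and the server keeps no cursor state between requests.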
Data Management
- Specify input graphs with FROM
FROM <http://planetrdf.com/bloggers.rdf>
- Give them names with FROM NAMED
FROM NAMED <http://site1.example.com/foo.rdf>
FROM NAMED <http://site2.example.com/bar.rdf>
- Get the names back with GRAPH ?g GP
WHERE { GRAPH ?g { ?s dc:title "Der Baum"@de } }
- Model is 1 background graph + many Named Graphs
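A minimal Python model of this dataset is one background graph plus a dictionary from graph name to graph. The sketch below shows how `GRAPH ?g` could bind the graph name while the inner pattern is matched inside each named graph. The graph contents are invented sample data, and `match` and `match_in_named_graphs` are hypothetical helper names.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


# 1 background graph + many named graphs (contents are made up for the example).
dataset = {
    "default": set(),
    "named": {
        "http://site1.example.com/foo.rdf": {("_:b1", "dc:title", '"Der Baum"@de')},
        "http://site2.example.com/bar.rdf": {("_:b2", "dc:title", '"The Tree"@en')},
    },
}


def match_in_named_graphs(graph_var, pattern, named):
    """GRAPH ?g { pattern }: try each named graph, binding ?g to its name."""
    solutions = []
    for name, graph in named.items():
        for triple in graph:
            solution = match(pattern, triple, {graph_var: name})
            if solution is not None:
                solutions.append(solution)
    return solutions


solutions = match_in_named_graphs(
    "?g", ("?s", "dc:title", '"Der Baum"@de'), dataset["named"])
print(solutions)
```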
The Rest
- RDF Blank Nodes in a query act as anonymous variables
- ASK: ask if there is any answer, do not return results
- sometimes you want to know there is an answer but don't care what it is
- sometimes you don't know what it is
- DESCRIBE: tell me about a named thing
- CONSTRUCT: build an RDF graph by substituting bindings into
a second graph pattern. A 1-step rule language
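The CONSTRUCT step can be sketched as template substitution: for every solution, replace the variables in the template with their values and collect the resulting triples into a set. In this simplified Python version a template triple that still contains an unbound variable is skipped; the name `construct` and the sample data are invented for the example.

```python
def construct(template, solutions):
    """Build a graph by substituting each solution into the template triples."""
    graph = set()
    for binding in solutions:
        for pattern in template:
            triple = tuple(binding.get(term, term) for term in pattern)
            if any(term.startswith("?") for term in triple):
                continue  # a variable stayed unbound: no triple is produced
            graph.add(triple)
    return graph


solutions = [
    {"?p": ":alice", "?n": '"Alice"'},
    {"?p": ":bob"},  # ?n unbound, e.g. after an OPTIONAL that did not match
]
template = [
    ("?p", "rdfs:label", "?n"),
    ("?p", "a", ":Named"),
]

graph = construct(template, solutions)
for triple in sorted(graph):
    print(triple)
```

This is the sense in which CONSTRUCT behaves like a 1-step rule: the WHERE pattern is the body, the template is the head, and the rule is applied once rather than to a fixpoint.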
Use Every SPARQL Keyword Example
BASE <http://example.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# This is a relative IRI to BASE above
PREFIX ex: <properties/1.0#>
SELECT DISTINCT $person ?name $age
FROM <http://rdf.example.org/personA.rdf>
FROM <http://rdf.example.org/personB.rdf>
WHERE { $person a foaf:Person ;
foaf:name ?name.
OPTIONAL { $person ex:age $age } .
FILTER (!REGEX(?name, "Bob"))
}
ORDER BY ASC(?name) LIMIT 10 OFFSET 20
Rasqal SPARQL examples
SPARQL Status
- At Last Call stage February 2006
- Several mature SPARQL query language implementations: 3+ in Java alone
- Oracle developing support
- Completing shortly
Extending SPARQL Query
- Extension functions:
WHERE { ...
FILTER my:function(?a, ?b, 25)
}
- "Magic" predicates overload meaning:
WHERE { ...
?a xsparql:closeto ?b
}
(only for 2-ary predicates)
- Use conventions of graph patterns to mean something special
- True language extensions: new keywords:
WHERE { ?a CLOSETO(5) ?b }
SPARQL - Protocol
- Services running SPARQL queries over a set of graphs
- A transport protocol for invoking the service
- Based on ideas from earlier protocol work such as Joseki
- Describing the service with Web Service technologies
SPARQL Issues
- No update. Reason: out of charter
- No cursors. Reason: out of charter, not a stateful protocol
- No computed results like
SELECT (?x+?y) WHERE ...
- No aggregate functions like
SELECT COUNT(?x) WHERE ...
- No fulltext search support
- No support for querying RDF collections
- A problem for OWL as many OWL concepts are encoded in rdf:first / rdf:rest lists
- Could be something like
?item IN ?list
- No good design emerged
- Cannot query variable length paths (as Versa can)
- Expression functionality does not include all datatypes
- Missing datatype functions/operators critical for some people