Advanced SPARQL in IMM
Today I would like to profile an advanced SPARQL query, break it down, and explain how each part works.
The Query:
with (
    select ?entityIdIn ?highestTypeOut
    {
        { ?entityIdIn rdf:type ?highestTypeOut. }
        not() {
            ?entityIdIn rdf:type ?hType.
            ?hType rdfs:subClassOf ?highestTypeOut.
        }
    } as :getMostSpecificType
)
SELECT ?title ?rdfType
WHERE
{
    ?entityId ?predicate ?title.
    { :getMostSpecificType(?entityId, ?rdfType) }
    FILTER(:contains(?title, 'Superman'))
}
Overview
In essence, this query searches every entity in the store and returns those that have a property value containing the word “Superman.” It also finds and returns the most specific type of each result. The purpose of this query is simple: to let a user search for a single term and find every matching entity, regardless of its type or schema.
In the results below, a number of different entities are returned: videos, clips, containers, and even resources.
Inference Rules
The “with” clause lets you define custom inference rules inside a SPARQL query. Here it defines a named rule, :getMostSpecificType, that returns the most derived (most specific) type of the supplied entity. A SPARQL variable acts as a constraint when it is bound and as an output when it is unbound, so I have suffixed the rule’s variables with “In” and “Out” to make it clear how each is expected to be used. Note that the results of this rule are not optional: an entity is returned only if it has at least one rdf:type. Calling the rule is simple:
{ :getMostSpecificType(?entityId, ?rdfType) }
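To make the bound/unbound distinction concrete, here is a rough Python model of how a two-argument rule call behaves. It is only an illustration, not IMM code: the function call_rule, the RULE_ROWS table, and the ex: identifiers are all hypothetical. None stands in for an unbound variable.

```python
from typing import Iterator, Optional, Tuple

# Hypothetical rows the rule would produce: (entityIdIn, highestTypeOut).
RULE_ROWS = [
    ("ex:video1", "ex:Video"),
    ("ex:clip7", "ex:Clip"),
]


def call_rule(entity: Optional[str], rdf_type: Optional[str]) -> Iterator[Tuple[str, str]]:
    """Yield the rule rows that are compatible with the arguments.

    None models an unbound variable (the rule fills it in); any other
    value models a bound variable (it acts as a constraint).
    """
    for row_entity, row_type in RULE_ROWS:
        if entity is not None and entity != row_entity:
            continue
        if rdf_type is not None and rdf_type != row_type:
            continue
        yield (row_entity, row_type)


# ?entityId bound, ?rdfType unbound: the rule supplies the type.
bound_entity = list(call_rule("ex:video1", None))

# Both unbound: every row comes back.
all_rows = list(call_rule(None, None))

# An entity with no row yields nothing, so an enclosing query would drop it.
# This mirrors the point that the rule's results are not optional.
missing = list(call_rule("ex:untyped", None))
```

In the query above, ?entityId is already bound by the first triple pattern when the rule is called, so the rule effectively looks up ?rdfType for each matching entity.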
Negation
The idea of negation is simple: the solution is every result of one graph pattern, minus any result that also matches a second graph pattern.
{ ?entityIdIn rdf:type ?highestTypeOut. }
not() {
    ?entityIdIn rdf:type ?hType.
    ?hType rdfs:subClassOf ?highestTypeOut.
}
Expressed in English: “Every type of ?entityIdIn (every triple whose subject is ?entityIdIn and whose predicate is rdf:type), EXCEPT any type that also matches the following graph pattern: ?entityIdIn has some other type, ?hType, and ?hType is a subclass of the type in question.” Whew. Put more simply: “Bind ?highestTypeOut to each rdf:type of the entity that has no derived type among the entity’s other types.” This is an example of using negation to constrain the results of a graph pattern.
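The same logic can be sketched as a set difference in Python. This is a minimal model under stated assumptions, not how IMM evaluates the rule: the triples, the ex: identifiers, and the function most_specific_types are hypothetical, and the entity is assumed to carry its direct type as well as that type’s ancestors.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical store contents as (subject, predicate, object) triples.
TRIPLES = {
    ("ex:clip7", RDF_TYPE, "ex:Clip"),
    ("ex:clip7", RDF_TYPE, "ex:Video"),
    ("ex:clip7", RDF_TYPE, "ex:Entity"),
    ("ex:Clip", SUBCLASS_OF, "ex:Video"),
    ("ex:Video", SUBCLASS_OF, "ex:Entity"),
}


def most_specific_types(entity, triples):
    """Return the entity's types that have no subclass among its own types."""
    # First graph pattern: every type asserted for the entity.
    all_types = {o for (s, p, o) in triples if s == entity and p == RDF_TYPE}
    # Negated graph pattern: any type that some other type of the
    # same entity declares as its superclass.
    has_subtype = {
        o for (s, p, o) in triples
        if p == SUBCLASS_OF and s in all_types
    }
    # Negation: keep the first set minus the second.
    return all_types - has_subtype


result = most_specific_types("ex:clip7", TRIPLES)
```

With this sample data, ex:Video is removed because ex:Clip is a subclass of it, and ex:Entity is removed because ex:Video is a subclass of it, which leaves only ex:Clip.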
Full Text Search
The :contains function provides support for SQL full-text queries in a SPARQL filter. This is the most performant method of querying for specific text. In addition to the simple word matching seen in the query above, the advanced features of full-text search can be used. For example, we can search for different inflections of the word “ship” by using the following:
FILTER(:contains(?title, 'FORMSOF(INFLECTIONAL, \"ship\")'))
This will return results with values such as “ships,” “shipped,” and “shipping.”
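If you generate such filters from application code, a small helper keeps the quoting in one place. The sketch below is hypothetical (inflectional_filter is not part of IMM); it simply reproduces the quoting exactly as it appears in the filter above, and it accepts only plain alphabetic words because this post does not cover how quotes or backslashes inside the search term would need to be escaped.

```python
# Template copied from the filter shown in the post, including the \" quoting.
TEMPLATE = r"""FILTER(:contains({var}, 'FORMSOF(INFLECTIONAL, \"{word}\")'))"""


def inflectional_filter(variable: str, word: str) -> str:
    """Build an inflectional full-text FILTER clause for one word."""
    # Assumption: only SPARQL variables such as ?title are valid targets.
    if not variable.startswith("?"):
        raise ValueError("expected a SPARQL variable such as ?title")
    # Assumption: reject anything but a plain word, since the escaping
    # rules for special characters are not known here.
    if not word.isalpha():
        raise ValueError("expected a single alphabetic word")
    return TEMPLATE.format(var=variable, word=word)


clause = inflectional_filter("?title", "ship")
```

Rejecting unexpected input, rather than trying to escape it, is a deliberately conservative choice for a string that ends up inside a query.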
Other Thoughts
As mentioned in my previous post, the “rdf” and “rdfs” prefixes used in this query are handled automatically by IMM. Additionally, this query uses the ontology graph to determine class derivation. An enterprising developer could use more facts from the ontology to enhance the results of the query, for example by returning the friendly name (rdfs:label) of a type instead of the type’s URI.