The SPARQL 1.1 Entailment Regimes document asserts that it is possible to produce an inconsistent graph and, moreover, that there is a single source of inconsistency, rdf:XMLLiteral:
ex:a ex:b "<"^^rdf:XMLLiteral .
ex:b rdfs:range rdf:XMLLiteral .
The reasoning is that < is not a valid XML fragment and thus "<"^^rdf:XMLLiteral must be interpreted as something that is not in rdfs:Literal (apparently). This seems somewhat arbitrary and complicated, so I have the following questions:

1. Why cannot "<"^^rdf:XMLLiteral simply be interpreted as "<"? It makes sense that it is not an XML literal, but why can it not be a literal at all?
2. Why does this apply only to rdf:XMLLiteral and not, for example, xsd:boolean or other datatypes? There are lots of inconsistencies that can be found if we start validating common datatypes.
3. In RDF 1.1, rdf:XMLLiteral is non-normative. Does it mean that newer interpretations of RDFS are always consistent?

SPARQL 1.1 Entailment Regimes, standardised in March 2013, is based on RDF Semantics from the 2004 standard (which I will refer to as RDF 1.0). In RDF 1.0, RDFS entailment does not require that datatype URIs be interpreted as datatypes, but it assigns a special semantics to rdf:XMLLiteral and to literals that have this datatype URI. Other literals are not constrained in any way by their datatype URIs; therefore xsd:boolean, for instance, does not influence consistency in RDFS entailment. In fact, it is RDF entailment that imposes the special treatment of rdf:XMLLiteral, which carries over to RDFS entailment.
In order to find additional inconsistencies due to datatypes, you have to consider another entailment regime like D-entailment or OWL. In RDF 1.0, D-entailment was defined as an extension of RDFS, so there is no "validating common datatypes" in RDFS. This should answer your second question.
Further, "<"^^rdf:XMLLiteral is an ill-typed XML literal, so it must not be interpreted as an XML value and, by the constraints on RDF entailment, its interpretation must not be of type rdf:XMLLiteral. More formally, the pair (IL("<"^^rdf:XMLLiteral), IS(rdf:XMLLiteral)), composed of the interpretation of the literal "<"^^rdf:XMLLiteral and the interpretation of the URI rdf:XMLLiteral, must not be in the extension IEXT(IS(rdf:type)) of the property rdf:type. Also, ill-typed XML literals must not be equal to any literal value, which necessarily includes the plain literal values (Unicode strings and language-tagged strings), so it cannot denote the string "<". The reason is that we do not want ill-typed literals to denote the same value as some well-formed literal. This should answer your first question.
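To make the clash concrete, here is a minimal Python sketch of the check described above. It is an illustration under stated assumptions, not an RDFS reasoner: the Literal type and the function names are hypothetical, well-formedness is only approximated by parsing the lexical form as element content (the 2004 lexical space additionally requires canonical form, which is not checked), and only a direct rdfs:range rdf:XMLLiteral statement is considered, with no subclass or subproperty closure.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

RDF_XMLLITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"


class Literal(NamedTuple):
    """Hypothetical minimal typed literal: lexical form plus datatype IRI."""
    lexical: str
    datatype: str


def is_well_formed_xml_content(lexical: str) -> bool:
    """Approximation: the lexical form must parse as the content of an element."""
    try:
        ET.fromstring("<r>" + lexical + "</r>")
    except ET.ParseError:
        return False
    return True


def has_direct_xml_clash(triples) -> bool:
    """True if an ill-typed XML literal is the object of a property whose
    range is directly declared to be rdf:XMLLiteral."""
    # Properties p for which the graph states: p rdfs:range rdf:XMLLiteral
    xml_ranged = {s for (s, p, o) in triples
                  if p == RDFS_RANGE and o == RDF_XMLLITERAL}
    return any(
        p in xml_ranged
        and isinstance(o, Literal)
        and o.datatype == RDF_XMLLITERAL
        and not is_well_formed_xml_content(o.lexical)
        for (s, p, o) in triples
    )


graph = [
    ("ex:a", "ex:b", Literal("<", RDF_XMLLITERAL)),
    ("ex:b", RDFS_RANGE, RDF_XMLLITERAL),
]
print(has_direct_xml_clash(graph))  # True
```

The range statement forces the value of the literal into the class rdf:XMLLiteral, while the semantic condition for ill-typed XML literals forbids exactly that, hence the inconsistency.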
In 2014, RDF 1.1 was standardised with an updated semantics. D-entailment is no longer an extension of RDFS entailment. It is the other way around: RDFS entailment is defined with respect to a set D of recognized datatype IRIs. This means that RDFS entailment is no longer a single entailment regime, but a family of entailment regimes, parameterised by D. In its simplest instance, RDFS entailment must only recognise xsd:string and rdf:langString, which means that there can still be inconsistencies, because not all Unicode strings are valid XSD strings. Also, RDF 1.1 changed the interpretation of ill-typed literals. In RDF 1.1 Semantics, ill-typed literals do not denote anything. This means that you cannot even talk about them. As soon as there is an ill-typed literal in an RDF graph, the graph is inconsistent. Therefore:
<s> <p> "\u0000"^^xsd:string .
is inconsistent in RDFS 1.1 entailment regimes. This should answer your third question.
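As a rough illustration of why "\u0000" falls outside xsd:string, the following Python sketch tests a string against the XML 1.0 Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). The function names are hypothetical, and the sketch assumes that the lexical space of xsd:string is exactly the set of sequences of XML 1.0 Char characters.

```python
def is_xml10_char(ch: str) -> bool:
    """True if the single character ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xsd_string_lexical(s: str) -> bool:
    """True if every character of s is an XML 1.0 Char (assumed lexical space)."""
    return all(is_xml10_char(ch) for ch in s)


print(is_xsd_string_lexical("hello\n"))  # True
print(is_xsd_string_lexical("\u0000"))   # False
```

Under that assumption, a graph containing "\u0000"^^xsd:string holds an ill-typed literal, which is what makes it inconsistent in RDF 1.1.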
Regarding your last question, I do not know. However, I do believe, with fairly high confidence, that no existing tool correctly and completely implements RDFS entailment, whether in its 2004 version or its 2014 version.
Interesting. So, if I understand it correctly, graphs with RDF 1.0 entailment could never be inconsistent (as ill-typed literals can mean something), while with RDF 1.1 entailment they could only be inconsistent due to an ill-typed xsd:string, as it cannot mean anything? I was under the impression that RDF 1.1 became more liberal, not less. Also, it seems strings are invalid only because they cannot be encoded in XML, but I thought rdf:langString or any other datatype could fall into that category as well. Is "\u0000"@en valid? RDF 1.1 says that rdf:langString has no ill-typed literals.

Ah, I get the obvious reason now: xsd:string is taken from XML Schema, which refers to the XML 1.0 Char production. I wonder about control characters now; those were disallowed in XML 1.0 but allowed as entities in XML 1.1. However, XML Schema 1.1 makes the choice of XML version implementation-defined. I guess a follow-up question would be whether "\u0001"^^xsd:string is ill-typed or not. RDF 1.1 Semantics is plainly false in that section, as "Such strings cannot be written in an XML-compatible surface syntax." is only true if XML refers to XML 1.0. And it was written 2 years after XSD 1.1!

I find this quite problematic for interoperability with common programming languages, as RDF 1.1 mandates that every literal must have a datatype, and xsd:string is the default one. I had hoped that rdf:PlainLiteral would "save" me, but its value space is also restricted in terms of the XML 1.0 Char production. It seems like "\u0000"@und is the only way to smuggle in a literal such as this (although it butchers BCP 47 in return). Hopefully any Unicode string is valid there.

IS4, the questions in your comments are somewhat too complex to answer in comments. I wouldn't say that RDF 1.1 is more or less liberal than RDF 1.0. Some entailments hold in RDF 1.0 that are not valid in RDF 1.1, and vice versa. Also, what you see as problems are corner cases that are either irrelevant to most use cases or can be addressed without causing problems by deviating from the standards a little in a sensible way. Some standards are not followed too strictly because the corner cases are too rare to be an issue in concrete implementations, and most of the time that is OK.