sparql - Inconsistency in RDFS entailment regime

Antoine Zimmermann 2021-02-15 07:19:15

SPARQL 1.1 Entailment Regimes, standardised in March 2013, is based on the RDF Semantics of the 2004 standard (which I will refer to as RDF 1.0). In RDF 1.0, RDFS entailment does not require datatype URIs to be interpreted as datatypes, but it assigns a special semantics to rdf:XMLLiteral and to literals that have this datatype URI. Other literals are not constrained in any way by their datatype URIs; therefore xsd:boolean, for instance, does not influence consistency under RDFS entailment. In fact, it is RDF entailment that imposes the special treatment of rdf:XMLLiteral, which carries over to RDFS entailment.

In order to find additional inconsistencies due to datatypes, you have to consider another entailment regime like D-entailment or OWL. In RDF 1.0, D-entailment was defined as an extension of RDFS, so there is no "validating common datatypes" in RDFS. This should answer your second question.

Further, "<"^^rdf:XMLLiteral is an ill-typed XML literal, so it must not be interpreted as an XML value and, by the constraints of RDF entailment, its interpretation must not be of type rdf:XMLLiteral. More formally, the pair (IL("<"^^rdf:XMLLiteral), IS(rdf:XMLLiteral)), composed of the interpretation of the literal "<"^^rdf:XMLLiteral and the interpretation of the URI rdf:XMLLiteral, must not be in the extension IEXT(IS(rdf:type)) of the property rdf:type. Also, ill-typed XML literals must not be equal to any literal value, which necessarily includes the plain literal values (Unicode strings and language-tagged strings), so it cannot denote the string "<". The reason is that we don't want ill-typed literals to denote the same value as some well-formed literal. This should answer your first question.

In 2014, RDF 1.1 was standardised with an updated semantics. D-entailment is no longer an extension of RDFS entailment. It is the other way around: RDFS entailment is defined with respect to a set D of recognized datatype IRIs. This means that RDFS entailment is no longer a single entailment regime, but a family of entailment regimes, parameterised by D. In its simplest instance, RDFS entailment is only required to recognise xsd:string and rdf:langString, which means that there can still be inconsistencies, because not all Unicode strings are valid XSD strings. Also, RDF 1.1 changed the interpretation of ill-typed literals. In RDF 1.1 Semantics, ill-typed literals do not denote anything. This means that you cannot even talk about them. As soon as there is an ill-typed literal with a recognized datatype in an RDF graph, the graph is inconsistent. Therefore:

<s>  <p>  "\u0000"^^xsd:string .

is inconsistent in RDFS 1.1 entailment regimes. This should answer your third question.
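
To make the xsd:string case concrete, here is a minimal sketch of the check involved, assuming the value space of xsd:string is restricted to the XML 1.0 Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). The function names are hypothetical and not taken from any RDF library; the sketch only tests individual literals and is not an implementation of RDFS entailment.

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if the single character ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_well_typed_xsd_string(lexical_form: str) -> bool:
    """Hypothetical helper: True if every character of the lexical form is an
    XML 1.0 Char, which is the assumption made here for xsd:string."""
    return all(is_xml10_char(ch) for ch in lexical_form)


# "\u0000" is rejected, so "\u0000"^^xsd:string would be ill-typed under this assumption.
nul_is_well_typed = is_well_typed_xsd_string("\u0000")
# "<" is an ordinary character, so "<"^^xsd:string is well-typed.
lt_is_well_typed = is_well_typed_xsd_string("<")
```

Under this assumption, a graph containing a literal for which the check fails would be inconsistent in any RDF 1.1 RDFS entailment regime, since xsd:string is always among the recognized datatypes.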

Regarding your last question, I do not know. However, I do believe, with fairly high confidence, that no existing tool correctly and completely implements RDFS entailment, whether in its 2004 version or its 2014 version.

IS4 2021-02-13 14:00:41

Interesting. So, if I understand it correctly, graphs with RDF 1.0 entailment could never be inconsistent (as ill-typed literals can mean something), while with RDF 1.1 entailment they could only be inconsistent due to an ill-typed xsd:string as it cannot mean anything? I was under the impression that RDF 1.1 became more liberal, not less. Also it seems strings are invalid only because they cannot be encoded in XML, but I thought rdf:langString or any other datatype could fall into that category as well. Is "\u0000"@en valid? RDF 1.1 says that rdf:langString has no ill-typed literals.

IS4 2021-02-14 13:13:34

Ah, I get the obvious reason now: xsd:string is taken from XML Schema, which refers to the XML 1.0 Char production. I wonder about control characters now; those were disallowed in XML 1.0 but allowed as character references in XML 1.1. However, XML Schema 1.1 makes the choice of XML version implementation-defined. I guess a follow-up question would be whether "\u0001"^^xsd:string is ill-typed or not. RDF 1.1 Semantics is plainly false in that section, as "Such strings cannot be written in an XML-compatible surface syntax." is only true if XML refers to XML 1.0. And it was written 2 years after XSD 1.1!

IS4 2021-02-14 13:36:30

I find this quite problematic for interoperability with common programming languages, as RDF 1.1 mandates that every literal must have a datatype, and xsd:string is the default one. I had hoped that rdf:PlainLiteral would "save" me, but its value space is also restricted in terms of XML 1.0 Char. It seems like "\u0000"@und is the only way to smuggle in a literal such as this (although it butchers BCP 47 in return). Hopefully any Unicode string is valid there.

Antoine Zimmermann 2021-02-14 23:36:39

IS4, the questions in your comments are somewhat complex to answer in comments. I wouldn't say that RDF 1.1 is more or less liberal than RDF 1.0. Some entailments are valid in RDF 1.0 that are not valid in RDF 1.1, and vice versa. Also, what you see as problems are corner cases that are either irrelevant to most use cases, or can be addressed without causing problems by deviating from the standards a little in a sensible way. Some standards are not followed too strictly because the corner cases are too rare to be an issue in concrete implementations, and most of the time, that is OK.
