I have an XSD file containing this element:
<xs:element name="orcid" minOccurs="0" maxOccurs="1">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="https://orcid\.org/[0-9]{4}-[0-9]{4}-[0-9]{4}-\d{3}[\dX]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
I'd like to read the pattern value from that element, so I do:
with open(app.config.get("schema")) as xsd:
    doc = etree.parse(xsd)
data = doc.xpath(ORCID_XPATH, namespaces=doc.getroot().nsmap)[0]
where
ORCID_XPATH = '/xs:element/xs:simpleType/xs:restriction[@base="xs:string"]/xs:pattern/@value'
but as a result I got a string I don't understand:
'[d0-9]{4}-{0,1}[0-9]{3}[0-9xX]{1}'
Could you please explain to me what's happening here?
Your path expression isn't very selective: in particular, it doesn't qualify xs:element with [@name='orcid'], so I suspect you're picking up a different element declaration. Since you're in Python, you're probably using XPath 1.0, which returns every matching node; your code then takes the first one with [0], and nothing warns you that there was more than one.
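To illustrate the difference, here is a minimal sketch. It assumes a made-up schema in which a second declaration (hypothetically named "other", carrying the string you got back) precedes the orcid one; your real schema may look different. It uses the standard library's xml.etree.ElementTree rather than lxml so it runs without extra installs. ElementTree cannot select attributes with /@value, so the sketch finds the xs:pattern element and reads its value attribute instead. With lxml, adding the same [@name="orcid"] predicate to the xs:element step of your XPath string should have the equivalent effect.

```python
import xml.etree.ElementTree as ET

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Hypothetical schema: the "other" declaration is an assumption made for
# illustration; only the "orcid" declaration comes from the question.
SCHEMA = r"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="other">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="[d0-9]{4}-{0,1}[0-9]{3}[0-9xX]{1}"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:element name="orcid">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="https://orcid\.org/[0-9]{4}-[0-9]{4}-[0-9]{4}-\d{3}[\dX]"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>"""

root = ET.fromstring(SCHEMA)

# Unqualified path: matches the xs:pattern of every element declaration.
unqualified = root.findall(
    ".//xs:element/xs:simpleType/xs:restriction/xs:pattern", NS
)
first_value = unqualified[0].get("value")  # first match in document order

# Qualified path: only the declaration named "orcid" can match.
orcid = root.find(
    ".//xs:element[@name='orcid']/xs:simpleType/xs:restriction/xs:pattern", NS
)
orcid_value = orcid.get("value")

print(len(unqualified))
print(first_value)
print(orcid_value)
```

If the unqualified query returns more than one node in your real schema, that would confirm the suspicion above.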