Note: This article is reproduced from stackoverflow.com.
python xml xpath xsd

Reading a regex from an XSD file with XPath: string value

Published on 2020-04-21 11:14:03

I've got an XSD file with the following element:

<xs:element name="orcid" minOccurs="0" maxOccurs="1">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="https://orcid\.org/[0-9]{4}-[0-9]{4}-[0-9]{4}-\d{3}[\dX]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

I'd like to read the pattern from its value attribute, so I do:

from lxml import etree  # doc.xpath() and nsmap are lxml APIs

with open(app.config.get("schema")) as xsd:
    doc = etree.parse(xsd)
    data = doc.xpath(ORCID_XPATH, namespaces=doc.getroot().nsmap)[0]

where

ORCID_XPATH = '/xs:element/xs:simpleType/xs:restriction[@base="xs:string"]/xs:pattern/@value'

but as a result I got a string I don't understand:

'[d0-9]{4}-{0,1}[0-9]{3}[0-9xX]{1}'

Could you please explain to me what's happening here?

Questioner: Malvinka
Michael Kay 2020-02-06 06:51

Your path expression isn't very selective; in particular, it doesn't qualify xs:element with [@name='orcid']. So I suspect you're picking up the pattern from a different element declaration. Since you're in Python, you're probably using XPath 1.0, which typically gives you the first matching node rather than warning you that there's more than one (here, your [0] takes the first match).
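
To illustrate the point about selectivity, here is a minimal sketch. It makes several assumptions: it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra dependencies, it reads the attribute with .get("value") because ElementTree paths cannot select attributes directly, and the "issn" element declaration is hypothetical, invented only to reproduce the situation where another declaration comes first in the schema.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map for the XML Schema namespace.
XS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Hypothetical schema: the "issn" declaration is invented for illustration;
# only the "orcid" pattern is taken from the question.
SCHEMA = r"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="issn">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="[d0-9]{4}-{0,1}[0-9]{3}[0-9xX]{1}"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:element name="orcid">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="https://orcid\.org/[0-9]{4}-[0-9]{4}-[0-9]{4}-\d{3}[\dX]"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
"""

root = ET.fromstring(SCHEMA)
TAIL = "/xs:simpleType/xs:restriction/xs:pattern"

# Unselective path: matches the pattern of every element declaration,
# so taking index 0 silently returns whichever one comes first.
loose = root.findall(".//xs:element" + TAIL, XS)
loose_value = loose[0].get("value")

# Selective path: restricted to the declaration named "orcid".
strict = root.findall(".//xs:element[@name='orcid']" + TAIL, XS)
orcid_value = strict[0].get("value")

print(len(loose), loose_value)
print(len(strict), orcid_value)
```

With lxml, the equivalent change would presumably be to add the [@name='orcid'] predicate to the xs:element step of ORCID_XPATH. Checking the length of the result before indexing with [0] is a cheap way to notice when a path matches more than one node.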