
owl - How to harvest Wikidata for generating an ontology with owlready (Python)?

Posted on 2021-02-11 13:48:08

With a helper such as wikidata_query(...), querying Wikidata from Python is simple. For example,

q = """
SELECT ?item ?itemLabel 
WHERE 
{
  ?item wdt:P279 wd:Q125977.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
"""

wikidata_query(q)

retrieves all subclasses of Q125977 (vector space). However, the result is a plain JSON structure. Instead, I would like an owlready2 ontology (more precisely, only the subclassOf relations).
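To make the "plain JSON structure" concrete, here is a minimal sketch of pulling (id, label) pairs out of such a result. It assumes the helper returns the standard SPARQL 1.1 JSON results layout (`results` / `bindings`), which may not match what a particular wikidata_query(...) implementation gives back. The function name `bindings_to_pairs` and the hand-written `sample` are illustrative only, not live data.

```python
import re


def get_entity_id(iri):
    """Return the last path or fragment segment of an IRI, e.g. 'Q190109'."""
    return re.sub(r".*[#/]", "", iri)


def bindings_to_pairs(result):
    """Turn a SPARQL JSON result into a list of (entity_id, label) tuples."""
    pairs = []
    for binding in result["results"]["bindings"]:
        item_id = get_entity_id(binding["item"]["value"])
        # The label variable may be missing from a binding; fall back to the id.
        label = binding.get("itemLabel", {}).get("value", item_id)
        pairs.append((item_id, label))
    return pairs


# Hand-written sample in the SPARQL 1.1 JSON results layout (assumption, not live data).
sample = {
    "head": {"vars": ["item", "itemLabel"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q190109"},
         "itemLabel": {"type": "literal", "xml:lang": "en", "value": "field"}},
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q726210"},
         "itemLabel": {"type": "literal", "xml:lang": "en", "value": "normed vector space"}},
    ]},
}

print(bindings_to_pairs(sample))
```

Each pair is then enough to create one class and attach its label, which is the part I am asking about.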

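Is there code available that (partially) performs this task, or do I have to write it myself? In the latter case: what would be the smartest approach (e.g. with a customizable depth, and handling multiple inheritance)?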

Questioner
cknoll
horcrux 2021-02-12 00:07:44

The following iterates over your output DataFrame and populates an ontology with the subclass assertions:

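from owlready2 import *
import re
import types  # explicit import: types.new_class() creates the classes below

# wikidata_query(...) is the helper from the question; it is assumed to return a DataFrame here.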

def get_entity_id(iri) :
    return re.sub(r".*[#/\\]", "", iri)

def get_subclass_assertions(wd_superclass_id) :
    q = """
    SELECT ?item ?itemLabel 
    WHERE 
    {
      ?item wdt:P279 wd:%s.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
    """ % wd_superclass_id
    
    ontology = get_ontology("http://www.wikidata.org/entity/")
    with ontology:
        SuperClass = types.new_class(wd_superclass_id, (Thing,))
        assert SuperClass.name == wd_superclass_id
        for _, row in wikidata_query(q).iterrows():
            new_class_id = get_entity_id(row['item'])
            NewClass = types.new_class(new_class_id, (SuperClass,))
            NewClass.label = row['itemLabel']
    
    return ontology

wd_id = "Q125977"

ontology = get_subclass_assertions(wd_id)       # get the ontology
subclasses = list(ontology[wd_id].subclasses()) # now you can access the subclasses

print(subclasses)                               # [entity.Q190109, entity.Q289411, entity.Q305936, entity.Q464794, entity.Q577835, entity.Q726210, entity.Q728435, entity.Q752487, entity.Q766774, entity.Q1000660, entity.Q1393796, entity.Q1455249, entity.Q1503340, entity.Q1777803, entity.Q2133608, entity.Q2344309, entity.Q3058206, entity.Q4131086, entity.Q4131105, entity.Q5156614, entity.Q7642975, entity.Q15834383, entity.Q42797338, entity.Q46996054, entity.Q63793693, entity.Q77595632, entity.Q77604597, entity.Q91474415, entity.Q98929437]
print(list(map(lambda c: c.iri, subclasses)))   # ['http://www.wikidata.org/entity/Q190109', 'http://www.wikidata.org/entity/Q289411', 'http://www.wikidata.org/entity/Q305936', 'http://www.wikidata.org/entity/Q464794', 'http://www.wikidata.org/entity/Q577835', 'http://www.wikidata.org/entity/Q726210', 'http://www.wikidata.org/entity/Q728435', 'http://www.wikidata.org/entity/Q752487', 'http://www.wikidata.org/entity/Q766774', 'http://www.wikidata.org/entity/Q1000660', 'http://www.wikidata.org/entity/Q1393796', 'http://www.wikidata.org/entity/Q1455249', 'http://www.wikidata.org/entity/Q1503340', 'http://www.wikidata.org/entity/Q1777803', 'http://www.wikidata.org/entity/Q2133608', 'http://www.wikidata.org/entity/Q2344309', 'http://www.wikidata.org/entity/Q3058206', 'http://www.wikidata.org/entity/Q4131086', 'http://www.wikidata.org/entity/Q4131105', 'http://www.wikidata.org/entity/Q5156614', 'http://www.wikidata.org/entity/Q7642975', 'http://www.wikidata.org/entity/Q15834383', 'http://www.wikidata.org/entity/Q42797338', 'http://www.wikidata.org/entity/Q46996054', 'http://www.wikidata.org/entity/Q63793693', 'http://www.wikidata.org/entity/Q77595632', 'http://www.wikidata.org/entity/Q77604597', 'http://www.wikidata.org/entity/Q91474415', 'http://www.wikidata.org/entity/Q98929437']
print(list(map(lambda c: c.label, subclasses))) # [['field'], ['sequence space'], ['Lp space'], ['Minkowski spacetime'], ['field extension'], ['normed vector space'], ['linear subspace'], ['dual space'], ['symplectic vector space'], ['algebra over a field'], ['quotient space'], ['topological vector space'], ['ordered vector space'], ['coalgebra'], ['coordinate space'], ['pseudo-Euclidean space'], ['graded vector space'], ['row space'], ['column space'], ['complex vector space'], ['super vector space'], ['matrix space'], ['function vector space'], ['real vector space'], ['seminormed space'], ['real or complex vector space'], ['finite-dimensional vector space'], ['quadratic space'], ['space of linear maps']]
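The function above only covers direct subclasses. For the customizable depth and multiple inheritance asked about in the question, one possible approach is to first collect the edges breadth-first and only then create the classes, parents before children. The sketch below is an assumption-laden outline, not a tested solution: `collect_subclass_edges`, `creation_order`, `build_ontology` and the `fetch_children` callback are hypothetical names, the `toy` hierarchy is invented, and the owlready2 part has not been run against live Wikidata. Cycles in the subclass graph are not handled and would raise `graphlib.CycleError`.

```python
import types
from collections import deque
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def collect_subclass_edges(root_id, fetch_children, max_depth=2):
    """Breadth-first walk down the subclass hierarchy, at most max_depth levels.

    fetch_children(qid) must return an iterable of (child_id, label) pairs.
    Returns (parents, labels); parents maps each class id to the set of its
    direct superclasses seen during the walk.
    """
    parents = {root_id: set()}
    labels = {}
    expanded = set()
    queue = deque([(root_id, 0)])
    while queue:
        qid, depth = queue.popleft()
        if qid in expanded or depth >= max_depth:
            continue
        expanded.add(qid)
        for child_id, label in fetch_children(qid):
            # A child reached through several parents collects all of them.
            parents.setdefault(child_id, set()).add(qid)
            labels[child_id] = label
            queue.append((child_id, depth + 1))
    return parents, labels


def creation_order(parents):
    """List class ids so that every superclass precedes its subclasses."""
    return list(TopologicalSorter(parents).static_order())


def build_ontology(parents, labels, base_iri="http://www.wikidata.org/entity/"):
    """Create one owlready2 class per id, with all collected superclasses."""
    from owlready2 import Thing, get_ontology  # imported lazily on purpose

    order = creation_order(parents)
    position = {qid: i for i, qid in enumerate(order)}
    ontology = get_ontology(base_iri)
    classes = {}
    with ontology:
        for qid in order:
            # Most specific parents first, to avoid the common MRO conflict of
            # listing an ancestor before one of its own descendants.
            ordered = sorted(parents[qid], key=position.get, reverse=True)
            bases = tuple(classes[p] for p in ordered) or (Thing,)
            classes[qid] = types.new_class(qid, bases)
            if qid in labels:
                classes[qid].label = labels[qid]
    return ontology


# Invented toy hierarchy standing in for live query results.
toy = {
    "A": [("B", "b"), ("C", "c")],
    "B": [("D", "d")],
    "C": [("D", "d")],
    "D": [("E", "e")],
}
parents, labels = collect_subclass_edges("A", lambda qid: toy.get(qid, []), max_depth=2)
print(parents)
print(creation_order(parents))
```

For real data, `fetch_children` could wrap the query from get_subclass_assertions and return `(get_entity_id(row['item']), row['itemLabel'])` for each row; `build_ontology(parents, labels)` would then be called on the result. Collecting edges first keeps the network traversal separate from class creation, so a class with several superclasses is created once with all of them.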