searching classes by label in obi

gp
Hello,
I am struggling to use the ontology.search function to find classes by their label. I suspect the problem is related to the parsing of the original OWL file, which I understand is not the preferred format for owlready2.

Anyway, I am trying to find terms in the OBI ontology (see http://www.obofoundry.org/ontology/obi.html).

Once it is loaded, I perform the search as follows:
obi.search(label = '*data set') # trying to find entry "http://purl.obolibrary.org/obo/IAO_0000100"

The search returns [obo.OBI_0000741], which indeed has the label 'topologically preserved clustered data set', but not IAO_0000100 (nor a few other classes that should also match).
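Why both classes are expected to match: the leading '*' stands for any run of characters, including an empty one, so '*data set' should match the bare label "data set" as well as the longer one. A minimal sketch using Python's fnmatch as a stand-in for the search wildcard (this assumes owlready2's '*' behaves like a shell-style glob here; the labels dictionary is just the two labels quoted in this post):

```python
from fnmatch import fnmatchcase

# Labels of the two OBI classes discussed in this thread.
labels = {
    "IAO_0000100": "data set",
    "OBI_0000741": "topologically preserved clustered data set",
}

# '*' matches any run of characters (possibly empty), so '*data set'
# matches every label ending in "data set", including "data set" itself.
pattern = "*data set"
matches = sorted(iri for iri, label in labels.items() if fnmatchcase(label, pattern))
print(matches)  # ['IAO_0000100', 'OBI_0000741']
```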

Looking into the original OWL file, these two entries are encoded slightly differently. We have
    <!-- http://purl.obolibrary.org/obo/IAO_0000100 -->

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/IAO_0000100">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/IAO_0000027"/>
        <obo:IAO_0000111 xml:lang="en">data set</obo:IAO_0000111>
        <obo:IAO_0000112 xml:lang="en">Intensity values in a CEL file or from multiple CEL files comprise a data set (as opposed to the CEL files themselves).</obo:IAO_0000112>
        <obo:IAO_0000114 rdf:resource="http://purl.obolibrary.org/obo/IAO_0000125"/>
        <obo:IAO_0000115 xml:lang="en">A data item that is an aggregate of other data items of the same type that have something in common. Averages and distributions can be determined for data sets.</obo:IAO_0000115>
        <obo:IAO_0000116 xml:lang="en">2009/10/23 Alan Ruttenberg. The intention is that this term represent collections of like data. So this isn&apos;t for, e.g. the whole contents of a cel file, which includes parameters, metadata etc. This is more like java arrays of a certain rather specific type</obo:IAO_0000116>
        <obo:IAO_0000116>2014-05-05: Data sets are aggregates and thus must include two or more data items. We have chosen not to add logical axioms to make this restriction.</obo:IAO_0000116>
        <obo:IAO_0000117 xml:lang="en">person:Allyson Lister</obo:IAO_0000117>
        <obo:IAO_0000117 xml:lang="en">person:Chris Stoeckert</obo:IAO_0000117>
        <obo:IAO_0000119 xml:lang="en">OBI_0000042</obo:IAO_0000119>
        <obo:IAO_0000119 xml:lang="en">group:OBI</obo:IAO_0000119>
        <rdfs:label xml:lang="en">data set</rdfs:label>
    </owl:Class>

and
    <!-- http://purl.obolibrary.org/obo/OBI_0000741 -->

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0000741">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/OBI_0000648"/>
        <obo:IAO_0000111 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">topologically preserved clustered data set</obo:IAO_0000111>
        <obo:IAO_0000112 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">the output data set generated from a self-organizing map.</obo:IAO_0000112>
        <obo:IAO_0000114 rdf:resource="http://purl.obolibrary.org/obo/IAO_0000125"/>
        <obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">A clustered data set in which the topology, i.e. the spatial properties between data points, is preserved from the original input data from which it was derived.</obo:IAO_0000115>
        <obo:IAO_0000117 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">James Malone</obo:IAO_0000117>
        <obo:IAO_0000119 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">PERSON: James Malone</obo:IAO_0000119>
        <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">topologically preserved clustered data set</rdfs:label>
    </owl:Class>

More generally, I cannot find any entry whose label has the form
<rdfs:label xml:lang="en">XXX YYY ZZZ</rdfs:label>
whereas labels of the form
<rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">XXX YYY ZZZ</rdfs:label>
cause no problem.
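To check which labels in a file use which encoding, one option is to inspect the rdfs:label elements directly. A minimal sketch using only Python's standard xml.etree.ElementTree on a trimmed copy of the two classes above (the trimmed snippet and the helper name label_kinds are illustrative, not part of OBI or owlready2):

```python
import xml.etree.ElementTree as ET

# Trimmed RDF/XML keeping only the two rdfs:label elements quoted above.
OWL_SNIPPET = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/IAO_0000100">
    <rdfs:label xml:lang="en">data set</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0000741">
    <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">topologically preserved clustered data set</rdfs:label>
  </owl:Class>
</rdf:RDF>"""

# ElementTree expands prefixed names to {namespace-uri}localname.
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
OWL = "{http://www.w3.org/2002/07/owl#}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # the xml:lang attribute


def label_kinds(xml_text):
    """Map each owl:Class IRI to how its rdfs:label literal is encoded."""
    root = ET.fromstring(xml_text)
    kinds = {}
    for cls in root.iter(OWL + "Class"):
        iri = cls.get(RDF + "about")
        for label in cls.findall(RDFS + "label"):
            lang = label.get(XML_LANG)
            if lang is not None:
                kinds[iri] = "lang:" + lang
            else:
                kinds[iri] = "datatype:" + (label.get(RDF + "datatype") or "none")
    return kinds


for iri, kind in label_kinds(OWL_SNIPPET).items():
    print(iri.rsplit("/", 1)[-1], kind)
# IAO_0000100 lang:en
# OBI_0000741 datatype:http://www.w3.org/2001/XMLSchema#string
```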

Is this a problem with the OWL parser, or am I doing something wrong?
Is there a workaround?

Anyway, thank you so much for this very cool library!
Guillaume

Re: searching classes by label in obi

Jiba
Administrator
Hello,

You're right, there is a problem. Localized strings are treated as a different datatype in RDF, so they are not found properly.
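The explanation can be illustrated with a toy model; this is not owlready2's actual implementation. In RDF 1.1 a language-tagged literal such as "data set"@en has datatype rdf:langString rather than xsd:string, so a search that only compares plain string literals never sees it. The Literal class and both search functions below are hypothetical, and the '*' wildcard is emulated with fnmatch:

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    value: str
    datatype: Optional[str] = None  # e.g. xsd:string
    lang: Optional[str] = None      # e.g. "en" for xml:lang="en"


# Hypothetical in-memory (subject, rdfs:label) store mirroring the two OWL snippets.
LABELS = [
    ("IAO_0000100", Literal("data set", lang="en")),
    ("OBI_0000741", Literal("topologically preserved clustered data set", datatype=XSD_STRING)),
]


def naive_search(pattern):
    """Only compare untagged string literals; language-tagged ones are skipped."""
    return [s for s, lit in LABELS if lit.lang is None and fnmatchcase(lit.value, pattern)]


def lang_aware_search(pattern):
    """Compare the lexical value whatever the datatype or language tag."""
    return [s for s, lit in LABELS if fnmatchcase(lit.value, pattern)]


print(naive_search("*data set"))       # ['OBI_0000741']
print(lang_aware_search("*data set"))  # ['IAO_0000100', 'OBI_0000741']
```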

A simple workaround is to perform two searches, one with a normal string and one with a localized string (here, in English):

from owlready2 import locstr

onto.search(label = '*data set')
onto.search(label = locstr('*data set', "en"))
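The two-search workaround can be wrapped in a helper that merges both result lists without duplicates. The sketch below uses a hypothetical in-memory label store instead of owlready2 so that it runs on its own; with owlready2, the same merge would be applied to the results of the two onto.search() calls above, assuming both return iterables of entities. The names search and search_any_lang are illustrative only:

```python
from fnmatch import fnmatchcase

# Hypothetical label store: (subject, lexical value, language tag or None).
LABELS = [
    ("IAO_0000100", "data set", "en"),
    ("OBI_0000741", "topologically preserved clustered data set", None),
]


def search(pattern, lang=None):
    """Toy stand-in for a search that only matches literals whose language
    tag equals `lang` (None meaning a plain, untagged string)."""
    return [s for s, value, tag in LABELS if tag == lang and fnmatchcase(value, pattern)]


def search_any_lang(pattern, langs=("en",)):
    """Run one plain search plus one search per language tag, then merge
    the results without duplicates, keeping first-seen order."""
    seen = {}
    for lang in (None, *langs):
        for subject in search(pattern, lang):
            seen.setdefault(subject, None)
    return list(seen)


print(search_any_lang("*data set"))  # ['OBI_0000741', 'IAO_0000100']
```

A dict is used as an insertion-ordered set so that results from the plain search come first and a class matched by both searches appears only once.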


But I'm going to fix search() in the development version (what a pity, I've just made a release).

Best regards,
Jiba

Re: searching classes by label in obi

Jiba
Administrator
Actually, I messed up with Java, so I am publishing yet another release (0.8), which includes the fix :)

Re: searching classes by label in obi

gp
What a lightning-fast answer, workaround, and fix!
Thanks a lot for this!

Re: searching classes by label in obi

gp
Just to state the obvious, it now works nicely.

Thank you very much!