Cannot process language strings longer than 2 characters

anthropomorphism
In the (real-world) dataset I'm dealing with, the language strings are "en", "nl-be", "fr-be".

Only the first of these can be accessed as an attribute; the others raise an AttributeError:

File "/path/python3.8/site-packages/owlready2/util.py", line 113, in __getattr__
    if len(attr) != 2: raise AttributeError("'%s' is not a language code (must be 2-char string)!" % attr)
Undocumented workaround: calling LanguageSublist(names, "nl-be") directly works as intended.

For hyphenated language codes, I'd suggest that the desirable behaviour would be attribute access with an underscore (foo.nl_be). This assumes that we're not worried about the semantic difference between nl-be and nl_BE; both get used in the wild, though not usually in the same ontology.
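
A minimal sketch of how such underscore attribute access could behave, written against a stand-in list of (text, lang) pairs rather than Owlready2's own classes. The names attr_to_lang and LangStrings are hypothetical, and accepting 2- or 3-letter primary codes and ignoring case are assumptions, not current Owlready2 behaviour:

```python
import re

# Hypothetical pattern: a 2- or 3-letter primary code, optionally followed by
# underscore-separated subtags (e.g. "en", "grc", "nl_be").
_ATTR_RE = re.compile(r"[A-Za-z]{2,3}(_[A-Za-z0-9]{2,8})*")

def attr_to_lang(attr):
    """Turn an attribute name such as "nl_be" into the tag "nl-be"."""
    if not _ATTR_RE.fullmatch(attr):
        raise AttributeError("'%s' is not a language code!" % attr)
    return attr.replace("_", "-").casefold()

class LangStrings(list):
    """Stand-in for a list of (text, lang) pairs; not an Owlready2 class."""

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails.
        wanted = attr_to_lang(attr)
        return [text for text, lang in self
                if lang.replace("_", "-").casefold() == wanted]
```

Here "nl-be" and "nl_BE" are deliberately treated as the same tag, which matches the assumption stated above.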

It would also be quite nice to have a way of searching for language="nl*", matching "nl-be" as well as "nl" or "nl_nl". This is reasonably easy to implement by subclassing LanguageSublist, though; see below.

(NB: some languages lack two-letter codes; Ancient Greek is "grc" and Scots is "sco", for instance. I don't have an urgent need to process any terminology in these languages, though.)




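# Import paths assumed from the traceback above; adjust for your Owlready2 version.
from owlready2 import locstr
from owlready2.util import LanguageSublist

class WildcardLanguageSublist(LanguageSublist):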
    __slots__ = ["_l", "_lang"]

    def __init__(self, l, lang):
        if lang.endswith("*"):
            list.__init__(self, (str(x) for x in l if isinstance(x, locstr) and x.lang.startswith(lang[:-1])))
        elif lang.endswith("~"):
            # more restrictive; xx~ matches only "xx", "xx_.*", or "xx-.*", but not "xxzzy"
            shorter = lang[:-1]
            # compare the language tag (x.lang), not the string value itself
            list.__init__(self, (str(x) for x in l if isinstance(x, locstr) and (
                                x.lang == shorter
                                or x.lang.startswith(shorter + "_")
                                or x.lang.startswith(shorter + "-")
                            )
                        ))
        else:
            list.__init__(self, (str(x) for x in l if isinstance(x, locstr) and x.lang == lang))
        self._l = l
        self._lang = lang
        self._obj = None
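
The three matching modes above can also be checked in isolation, without Owlready2 installed. This is a minimal sketch of the intended rule as a plain predicate on language tags; lang_matches is a hypothetical helper name, not part of the library:

```python
def lang_matches(tag, pattern):
    """Return True if the language tag `tag` satisfies `pattern`.

    "xx*"  loose prefix match: any tag starting with "xx"
    "xx~"  strict match: "xx" itself, or "xx-..." / "xx_..."
    other  exact equality
    """
    if pattern.endswith("*"):
        return tag.startswith(pattern[:-1])
    if pattern.endswith("~"):
        base = pattern[:-1]
        # str.startswith accepts a tuple of alternative prefixes
        return tag == base or tag.startswith((base + "-", base + "_"))
    return tag == pattern
```

With the tags "nl", "nl-be", "nl_nl" and "nln", the pattern "nl*" accepts all four, while "nl~" rejects "nln".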

Re: Cannot process language strings longer than 2 characters

anthropomorphism
We should probably compare x.lang.casefold() to lang.casefold(), since the difference between "nl-BE" and "nl-be" is not something anyone is likely to care about.
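
A minimal sketch of that case-insensitive comparison, using plain (text, lang) pairs in place of locstr objects. The helper name filter_by_lang is hypothetical, and skipping entries with no language tag is an assumption:

```python
def filter_by_lang(pairs, wanted):
    """Keep the texts whose language tag equals `wanted`, ignoring case."""
    wanted = wanted.casefold()
    return [text for text, lang in pairs
            if lang is not None and lang.casefold() == wanted]
```

For example, both "nl-BE" and "nl-be" are returned when asking for "NL-be".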

Re: Cannot process language strings longer than 2 characters

Jiba
Administrator
Hi,

Thank you for this contribution! I have integrated it into the development version of Owlready in a slightly modified form. You can now do individual.prop.fr_BE and individual.prop.fr_any (the wildcard, corresponding to your strict version).

Jiba