In the (real-world) dataset I'm dealing with, the language strings are "en", "nl-be", "fr-be".
Only the first of these can be accessed as an attribute; the other two raise an AttributeError:
  File "/path/python3.8/site-packages/owlready2/util.py", line 113, in __getattr__
    if len(attr) != 2: raise AttributeError("'%s' is not a language code (must be 2-char string)!" % attr)
Undocumented workaround: If I use LanguageSublist(names, "nl-be") then it works as intended.
For hyphenated language codes, I'd suggest that the desirable behaviour would be attribute access with an underscore (foo.nl_be), since a hyphen isn't valid in a Python attribute name. This assumes that we're not worried about the semantic difference between nl-be and nl_BE; both get used in the wild, though not usually in the same ontology.
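To make the suggestion concrete, here is a rough sketch of the lookup I have in mind. It doesn't touch owlready2 at all: LocStr and LangList are made-up stand-ins for locstr and the language-aware list, and folding underscores and case before comparing is just one possible policy.

```python
class LocStr(str):
    """Hypothetical stand-in for owlready2's locstr: a str carrying a .lang tag."""
    def __new__(cls, value, lang=""):
        obj = super().__new__(cls, value)
        obj.lang = lang
        return obj


def normalise(tag):
    # Treat "nl_BE", "nl-be" and "nl_be" as the same tag (an assumption, see above).
    return tag.replace("_", "-").lower()


class LangList(list):
    """Hypothetical stand-in for the list type that offers .en, .fr, ... access."""
    def __getattr__(self, attr):
        # Only reached when normal attribute lookup fails.
        # Refuse private/dunder names so copy/pickle machinery isn't confused.
        if attr.startswith("_"):
            raise AttributeError(attr)
        wanted = normalise(attr)
        return [str(x) for x in self
                if isinstance(x, LocStr) and normalise(x.lang) == wanted]


names = LangList([
    LocStr("heart", "en"),
    LocStr("hart", "nl-be"),
    LocStr("coeur", "fr-be"),
])

print(names.en)     # ['heart']
print(names.nl_be)  # ['hart']
print(names.fr_be)  # ['coeur']
```

With this policy a name like foo.nl_BE would resolve to the same entries as foo.nl_be, which is only acceptable if the two spellings never need to be told apart within one ontology.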
It would also be quite nice to be able to search for language="nl*", matching "nl-be" as well as "nl" or "nl_nl". This is reasonably easy to implement by subclassing LanguageSublist, though; see the WildcardLanguageSublist class below.
(NB: some languages lack two-letter codes; Ancient Greek is "grc" and Scots is "sco", for instance. I don't have an urgent need to process any terminology in these languages, though.)
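# Import path assumed from the traceback above (owlready2/util.py)
from owlready2.util import LanguageSublist, locstr

class WildcardLanguageSublist(LanguageSublist):
    __slots__ = ["_l", "_lang"]

    def __init__(self, l, lang):
        if lang.endswith("*"):
            # "xx*" matches any language tag that starts with "xx"
            prefix = lang[:-1]
            list.__init__(self, (str(x) for x in l if isinstance(x, locstr) and x.lang.startswith(prefix)))
        elif lang.endswith("~"):
            # more restrictive; xx~ matches only "xx", "xx_.*", or "xx-.*", but not "xxzzy"
            shorter = lang[:-1]
            # compare against x.lang (the language tag), not the string value itself
            list.__init__(self, (str(x) for x in l if isinstance(x, locstr) and (
                x.lang == shorter
                or x.lang.startswith(shorter + "_")
                or x.lang.startswith(shorter + "-")
            )))
        else:
            # no wildcard: exact match on the language tag
            list.__init__(self, (str(x) for x in l if isinstance(x, locstr) and x.lang == lang))
        self._l = l
        self._lang = lang
        self._obj = None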
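
The matching rules themselves can be checked without owlready2 installed. The sketch below pulls them out into a plain predicate; lang_matches is a name I've made up for illustration, and the tag list is just sample data.

```python
def lang_matches(tag, pattern):
    """Return True if a language tag matches a pattern.

    "xx*" matches any tag starting with "xx";
    "xx~" matches "xx" itself or "xx" followed by "_" or "-";
    anything else must match exactly.
    """
    if pattern.endswith("*"):
        return tag.startswith(pattern[:-1])
    if pattern.endswith("~"):
        base = pattern[:-1]
        return (tag == base
                or tag.startswith(base + "_")
                or tag.startswith(base + "-"))
    return tag == pattern


tags = ["en", "nl", "nl-be", "nl_nl", "nln", "fr-be"]

star = [t for t in tags if lang_matches(t, "nl*")]
tilde = [t for t in tags if lang_matches(t, "nl~")]
exact = [t for t in tags if lang_matches(t, "nl-be")]

print(star)   # ['nl', 'nl-be', 'nl_nl', 'nln']
print(tilde)  # ['nl', 'nl-be', 'nl_nl']
print(exact)  # ['nl-be']
```

The "nln" entry shows why the "~" form exists: a bare prefix match would also pick up unrelated three-letter codes that happen to start with the same two letters.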