Hi owlready fans,
I am trying to create a largish number of instances (10K-1M) conforming to an imported ontology. Each instance also has a large number of data properties (~100), the values of which are generated dynamically in Python.
Creating the bare instances is reasonably fast, but when I try to populate them with data properties there seems to be a bottleneck that slows things down quite significantly.
The core of my approach for inserting the data property values is a loop that iterates over instances and sets a value. Like so:
for entity in onto.Entity.instances():
    setattr(entity, prop.name, [value])
Debugging the setattr step shows that owlready does quite a lot of work there (various checks etc.), including writing to sqlite at each iteration. I wonder whether there is another way to mass-insert data properties that would speed things up.
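For reference, here is a minimal sketch of how a loop of this shape could be profiled to see where the time actually goes. It uses plain Python objects so that it runs without owlready; `FakeEntity`, `populate` and `profile_populate` are names I made up for illustration, and the real loop would of course show owlready's own calls in the report instead.

```python
import cProfile
import io
import pstats


class FakeEntity:
    """Hypothetical stand-in for an ontology individual (not an owlready class)."""


def populate(entities, prop_name, value):
    # Same shape as the loop above: one setattr per instance.
    for entity in entities:
        setattr(entity, prop_name, [value])


def profile_populate(entities, prop_name, value, top=10):
    """Run populate() under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.runcall(populate, entities, prop_name, value)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


entities = [FakeEntity() for _ in range(1000)]
report = profile_populate(entities, "has_value", 42)
print(report)
```

With the real ontology, the entries near the top of the cumulative listing should indicate whether the time is spent in the sqlite calls or in the Python-side checks.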
Some thoughts:
- I am not 100% sure that the database write is the actual time-consuming step, but if it is, is it possible to defer the writes and flush them in bulk at the end?
- In the owlready book (chapter 11) there is a section about interrogating the database directly. It includes an example of selecting directly from the db, but I would imagine that adding properties this way while maintaining consistency at the level of OWL/owlready is not quite trivial?
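To make the first point concrete, this is roughly what I mean by deferring the writes, sketched with plain sqlite3 rather than owlready. The table `prop_values`, its columns and all function names are made up for illustration and are not owlready's actual schema or API; the sketch only contrasts committing after every value with committing once at the end.

```python
import sqlite3

# Hypothetical table for illustration only, NOT owlready's actual schema.
SCHEMA = "CREATE TABLE prop_values (entity_id INTEGER, prop TEXT, value TEXT)"
INSERT = "INSERT INTO prop_values VALUES (?, ?, ?)"


def make_rows(n_entities, n_props):
    # One (entity, property, value) triple per data property value.
    return [(e, f"prop_{p}", str(e * p))
            for e in range(n_entities) for p in range(n_props)]


def insert_per_row(conn, rows):
    # Commit after every value: one transaction per row.
    for row in rows:
        conn.execute(INSERT, row)
        conn.commit()


def insert_bulk(conn, rows):
    # Single transaction: all rows are inserted, then committed once on exit.
    with conn:
        conn.executemany(INSERT, rows)


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM prop_values").fetchone()[0]


rows = make_rows(n_entities=100, n_props=10)

slow = sqlite3.connect(":memory:")
slow.execute(SCHEMA)
insert_per_row(slow, rows)

fast = sqlite3.connect(":memory:")
fast.execute(SCHEMA)
insert_bulk(fast, rows)

print(count_rows(slow), count_rows(fast))
```

Both variants end up with the same rows; with an on-disk database file I would expect the single-transaction version to be noticeably faster, but I have not measured this against owlready's own storage.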
Any ideas / suggestions appreciated!