Problem using ontology in multi-process environment

Posted by fabad on
URL: http://owlready.306.s1.nabble.com/Problem-using-ontology-in-multi-process-environment-tp3198.html

Hi, first of all, thanks for owlready2; it's very useful for ontology management in Python!

I had a working Python script that deals with ontologies to make some calculations, and I wanted to improve its performance by parallelizing it. I am using ProcessPoolExecutor to manage the processes. However, my application hung and did not output anything. After some research and debugging, I figured out that a number of errors were being thrown for each job, but they were silenced in the main application.
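(As an aside, I believe this is why the errors were silenced: with ProcessPoolExecutor, an exception raised while running or dispatching a job is not printed automatically; it is stored on the returned Future and only surfaces when you call result() or exception(). A minimal illustration, unrelated to owlready2:)

```python
from concurrent.futures import ProcessPoolExecutor

def boom(x):
    # Deliberately fail inside the worker process
    raise ValueError(f"failed on {x}")

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=1) as executor:
        future = executor.submit(boom, 1)
        # The pool itself prints nothing; the exception is only
        # visible through the Future object.
        print(type(future.exception()).__name__)  # ValueError
```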

It seems that Python tries to serialize the ontology object via pickle in order to send a copy of it to each process, but this fails with the following exception:

"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 239, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: 'NoneType' object is not callable
"""
I've searched on the internet and found that when the object you pass as an argument to the processes is not picklable, a PicklingError is usually thrown. Here, however, no PicklingError is raised: pickling the ontology simply fails with the TypeError above, inside the feeder thread.
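In case it is useful: I suspect the TypeError comes from some unpicklable resource held inside the ontology/world object (as far as I know, owlready2 keeps its quadstore in an SQLite database). The same kind of failure can be reproduced with plain pickle and a hypothetical object holding a sqlite3 connection:

```python
import pickle
import sqlite3

class Holder:
    """Hypothetical stand-in for an object that keeps an open SQLite
    connection, similar to (I assume) owlready2's World/quadstore."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")

try:
    pickle.dumps(Holder())
except TypeError as e:
    print(f"pickling failed: {e}")
```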

Below is a minimal example that reproduces the error. If you run this code, please replace the "ontology_file_path" variable with the path to an ontology on your file system.

I was using owlready2 0.39, but I updated to 0.43 and the error is still there.

import pathlib
from concurrent.futures import ProcessPoolExecutor
from owlready2 import get_ontology
import time

def get_iri(ontology, job_id):
    return f"job {job_id} -> {ontology.iri}"

if __name__ == '__main__':
    ontology_file_path = pathlib.Path("/home/fabad/test_embed_comp/go.owl")
    print("Loading ontology")
    ontology = get_ontology(f"file://{str(ontology_file_path)}").load()
    print("Ontology loaded")
    executor = ProcessPoolExecutor(max_workers=4)
    results = {}
    for i in range(30):
        # the ontology object is pickled here to be copied into the worker
        results[i] = executor.submit(get_iri, ontology, i)

    time.sleep(10)
    for n, future in results.items():
        if future.exception() is not None:
            raise future.exception()

    executor.shutdown(wait=True)

    for n, future in results.items():
        print(f'{future.result()}')
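For what it is worth, the workaround I am currently trying is to not ship the ontology to the workers at all: each worker loads its own copy once, in an initializer, and the jobs only receive a path or ID. This is only a sketch; the sqlite3 connection below is a hypothetical stand-in for the actual get_ontology(...).load() call, whose per-worker load time I have not measured:

```python
import sqlite3
from concurrent.futures import ProcessPoolExecutor

# Per-worker global, set once by the initializer instead of being
# pickled and shipped with every job.
_resource = None

def init_worker(path):
    global _resource
    # Stand-in for: _resource = get_ontology(f"file://{path}").load()
    _resource = sqlite3.connect(path)

def get_value(job_id):
    # Each job uses the worker-local resource; nothing heavy is pickled.
    value = _resource.execute("SELECT 1").fetchone()[0]
    return f"job {job_id} -> {value}"

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4,
                             initializer=init_worker,
                             initargs=(":memory:",)) as executor:
        print(list(executor.map(get_value, range(8))))
```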


Has anyone faced this issue before? Do you have any clue about how to solve it?

Thanks in advance,
Francisco Abad.