Error import umls

Hamico
Hi Jiba,
I get the following error:
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 5439: character maps to <undefined>

How can I solve the problem?
Many thanks.
Best regards,
Hamico

Re: Error import umls

Jiba
Administrator
Hi,

UMLS files are normally encoded in ASCII, so there is no decoding required.

Could you give me the entire traceback, to investigate the problem?

Thank you,
Jiba

Re: Error import umls

Hamico
Hi Jiba,
My full output is the following:

* Owlready2 * Warning: optimized Cython parser module 'owlready2_optimized' is not available, defaulting to slower Python implementation
Importing UMLS from Zip file 2018AB-full/2018ab-1-meta.nlm...
  Parsing 2018AB/META/MRRANK.RRF.gz as MRRANK...
  Parsing 2018AB/META/MRCONSO.RRF.aa.gz as MRCONSO...
Traceback (most recent call last):
  File "C:/Users/developer/PycharmProjects/Prova/Script5.py", line 5, in <module>
    import_umls("umls-2018AB-full.zip", terminologies = ["ICD10", "SNOMEDCT_US", "CUI"])
  File "C:\Users\developer\PycharmProjects\Prova\venv\lib\site-packages\owlready2\pymedtermino2\umls.py", line 682, in import_umls
    remnants[table_name])
  File "C:\Users\developer\PycharmProjects\Prova\venv\lib\site-packages\owlready2\pymedtermino2\umls.py", line 79, in parse_mrconso
    for line in f:
  File "C:\Users\developer\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 5439: character maps to <undefined>

How can I solve the error?
Many thanks.

Hamico

Re: Error import umls

Jiba
Administrator
Hi,

It seems that your computer does not use UTF-8 as default encoding.

I fixed the UMLS import to force UTF-8 encoding in the development version.
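
To illustrate the kind of problem involved, here is a minimal sketch. It is not the actual Owlready2 code; the file name, the sample record, and the helper functions are hypothetical. It writes a small gzipped text file in UTF-8, shows that decoding it as cp1252 (the encoding named in the traceback) fails on byte 0x81, and shows that passing the encoding explicitly avoids depending on the system default.

```python
import gzip
import locale
import os
import tempfile

# Hypothetical record, loosely shaped like a pipe-delimited RRF line.
# "Á" is encoded as bytes C3 81 in UTF-8, and 0x81 is undefined in cp1252.
SAMPLE = "C0000001|SPA|Ácido|\n"

def write_sample(path):
    # Always write the sample as UTF-8, whatever the system default is.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(SAMPLE)

def read_lines(path, encoding):
    # Passing `encoding` explicitly means the system default is never consulted.
    with gzip.open(path, "rt", encoding=encoding) as f:
        return list(f)

# The encoding Python falls back to when none is given (often cp1252 on Windows).
default_encoding = locale.getpreferredencoding(False)

path = os.path.join(tempfile.mkdtemp(), "SAMPLE.RRF.gz")
write_sample(path)

# Decoding as cp1252 reproduces the 'charmap' failure.
try:
    read_lines(path, "cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Forcing UTF-8 reads the file correctly on any platform.
lines = read_lines(path, "utf-8")
```

With the encoding forced, the result no longer depends on the value of `default_encoding` on the machine running the import.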

Could you try again with this new version?

Jiba

Re: Error import umls

Hamico
Hi Jiba,
It works. The file has a size of 2.3 GB.

Many thanks.
Best regards,
Hamico