I tried to convert HPO to FHIR using semsql as an intermediary. However, after about 40 minutes, I gave up and switched to Obographs for speed. I think it took about 10 minutes to convert to a .db, and the rest of the time in my process was just OAK trying to load the DB. Normally semsql is much faster to load than rdflib, but not in this case. I looked and saw that my hpo.db was about 1GB, which is about 10x larger than my hpo.owl. I looked at some of my other conversions, and a 5-10x increase in file size looks normal.
If I'm correct that the issue is not so much OAK performance, but just the file size in general, is there anything we can do to reduce these file sizes? Or maybe it's not so much the size, but the structure that is taking OAK a long time to parse downstream? If this is more of an OAK issue (or both an OAK issue and a semsql issue), I can open up a ticket over there.
Potential causes
One or more of the following may be taking a lot of time:
a. Semsql: File size
b. Semsql: Non-optimal structures for downstream parsing
c. OAK: Not parsing optimally
d. OAK: Spending time doing things that are maybe not needed for my use case
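To help narrow down cause (a), here is a rough sketch of how one could compare the .db size against the source file and see which tables hold the most rows. It uses only the Python standard library (`sqlite3`, `os`) and does not call semsql or OAK. The function names `table_row_counts` and `size_ratio` are hypothetical helpers made up for this illustration, and row counts are only a proxy for on-disk size since they ignore indexes and row width.

```python
import os
import sqlite3


def table_row_counts(db_path):
    """Return (table_name, row_count) pairs for a SQLite file, largest first.

    Only real tables are counted; views are skipped because counting
    rows in a view re-runs its query and can be slow.
    """
    con = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in con.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        counts = [
            (name, con.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0])
            for name in names
        ]
    finally:
        con.close()
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


def size_ratio(db_path, source_path):
    """Return how many times larger the .db file is than the source file."""
    return os.path.getsize(db_path) / os.path.getsize(source_path)


# Example usage, assuming hpo.db and hpo.owl sit in the working directory:
#   print(size_ratio("hpo.db", "hpo.owl"))
#   for name, n in table_row_counts("hpo.db")[:10]:
#       print(name, n)
```

If one or two tables dominate the row counts, that would point toward the file size itself; if the counts look reasonable, the time is more likely going to how the data is queried or loaded downstream.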
joeflack4 changed the title from "File size optimization" to "File size, structure, and downstream performance" on Dec 15, 2022