Read SSSOM #111
We need to figure out a few things if we go to SSSOM (https://github.com/mapping-commons/sssom):
@matentzn tells me that these problems can be handled with off-the-shelf SSSOM.
Assuming MONDO:0018670 is the clique leader (SSSOM 0.9.0, not SSSOM 1.0), an SSSOM file would look something like this:
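A hypothetical sketch of how one clique could be flattened into SSSOM rows, assuming the leader is always the subject_id and every other member becomes an object_id. The predicate (skos:exactMatch), the justification (semapv:UnspecifiedMatching), the member IDs, and the helper name clique_to_rows are illustrative assumptions, not what Babel actually emits.

```python
from typing import Dict, List

# Columns used in this sketch; subject_label/object_label are optional in SSSOM
# and are left empty when no label is known.
COLUMNS = ["subject_id", "subject_label", "predicate_id",
           "object_id", "object_label", "mapping_justification"]


def clique_to_rows(leader: str, members: List[str],
                   labels: Dict[str, str]) -> List[Dict[str, str]]:
    """Emit one row per non-leader member, with the clique leader as subject."""
    rows = []
    for member in members:
        if member == leader:
            continue  # the leader is the subject, so it never maps to itself
        rows.append({
            "subject_id": leader,
            "subject_label": labels.get(leader, ""),
            # Assumption: clique membership is expressed as an exact match.
            "predicate_id": "skos:exactMatch",
            "object_id": member,
            "object_label": labels.get(member, ""),
            "mapping_justification": "semapv:UnspecifiedMatching",
        })
    return rows
```

For a clique ["MONDO:0018670", "HYPO:0000001", "HYPO:0000002"] (the HYPO IDs are placeholders), this yields two rows, both with subject_id MONDO:0018670. Repeating the leader's label on every row is optional, which is the redundancy question raised below.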
There are some features for natively supporting semantic similarity measures (see https://mapping-commons.github.io/sssom/Mapping/), but I don't think …
Thanks! Is it required to repeat the subject labels, categories, etc. when the same subject appears on several rows? If we use the ordering of the rows to carry information, are we abusing the format?
I would keep the redundant label information, but nothing in SSSOM requires you to. I generally like doing so because it makes it easier to combine and merge different mapping sets. I think expecting the row order to mean something is not very reliable. If you wanted to be 100% reliable, you could of course export each clique as a separate SSSOM file; I think this is what Chris does. But that would result in 5000 files. It's an interesting use case. If you could create an identifier for each clique, you could put it into the "other" column. Sorry, maybe SSSOM is not ideal here, but we could consider extending the format to cover this use case (named groups of mappings).
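A minimal sketch of the clique-identifier idea, assuming a hypothetical "clique_id=<id>" convention inside the free-form "other" column (SSSOM does not define this key). Once every row carries its clique ID, rows can be regrouped no matter how they are ordered in the file. The function names tag_clique and group_by_clique are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def tag_clique(rows: List[Dict[str, str]], clique_id: str) -> List[Dict[str, str]]:
    """Return copies of rows with a hypothetical clique_id entry in `other`."""
    tagged = []
    for row in rows:
        new_row = dict(row)
        entry = f"clique_id={clique_id}"
        existing = new_row.get("other", "")
        # Append to any existing pipe-separated content instead of overwriting it.
        new_row["other"] = f"{existing}|{entry}" if existing else entry
        tagged.append(new_row)
    return tagged


def group_by_clique(rows: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Regroup rows by clique_id so that row order carries no meaning."""
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in rows:
        for entry in row.get("other", "").split("|"):
            key, _, value = entry.partition("=")
            if key == "clique_id":
                groups[value].append(row)
                break
    return dict(groups)
```

The trade-off compared with one file per clique is that a single file stays manageable, but every consumer has to know about the "clique_id" convention.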
I suppose we could put a clique ID of some sort in the …
Is the goal here to have a format for storage, not for sending back to clients? In that case, is the ordering a property of the mappings themselves, or a function that NodeNormalizer applies after the fact (i.e., a priority list of prefixes from Biolink)? If it's a property of the mappings themselves, maybe there is a more direct way to express it. The same question applies to the IC value.
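If ordering is a function applied after the fact, it could be as simple as sorting a clique's CURIEs by a prefix priority list. This is a sketch under that assumption; PREFIX_PRIORITY below is a hypothetical, illustrative list, not the actual Biolink ordering for any class.

```python
from typing import List

# Hypothetical priority list; a real one would come from the Biolink Model.
PREFIX_PRIORITY = ["MONDO", "DOID", "OMIM", "ORPHANET", "UMLS"]


def prefix_of(curie: str) -> str:
    """Return the prefix part of a CURIE such as 'MONDO:0018670'."""
    return curie.split(":", 1)[0]


def order_clique(members: List[str], priority: List[str]) -> List[str]:
    """Sort CURIEs by prefix rank; unknown prefixes go last.

    sorted() is stable, so members sharing a prefix keep their input order.
    """
    rank = {p: i for i, p in enumerate(priority)}
    return sorted(members, key=lambda c: rank.get(prefix_of(c), len(priority)))
```

With this approach the stored file would not need to encode order at all; the first element of order_clique(...) would serve as the clique leader.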
I've written a program to convert some of the files in the Babel compendia into SSSOM so we can look at them; they're in my Dropbox. These files appear to pass validation in sssom-py, apart from missing CURIE maps. If everybody's happy with these files, I can run my program on all the Babel compendia (which will probably take 0.5-1 days to run). Some thoughts and questions:
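On the missing CURIE maps: SSSOM/TSV files can carry their metadata as a YAML block whose lines start with "#", including a curie_map. The sketch below is a minimal, stdlib-only writer assuming that layout; the mapping_set_id and license values in any call are placeholders, and the helper names are hypothetical. missing_prefixes reports prefixes that appear in the rows but are not declared in the map.

```python
import csv
import io
from typing import Dict, List, Set

ID_COLUMNS = ("subject_id", "predicate_id", "object_id", "mapping_justification")


def missing_prefixes(rows: List[Dict[str, str]], curie_map: Dict[str, str]) -> Set[str]:
    """Prefixes used in CURIE-valued columns but absent from curie_map."""
    used = {row[col].split(":", 1)[0]
            for row in rows for col in ID_COLUMNS if ":" in row.get(col, "")}
    return used - set(curie_map)


def write_sssom_tsv(rows: List[Dict[str, str]], columns: List[str],
                    curie_map: Dict[str, str], mapping_set_id: str,
                    license_url: str) -> str:
    """Serialize rows as TSV preceded by a '#'-prefixed YAML metadata header."""
    out = io.StringIO()
    out.write("#curie_map:\n")
    for prefix, expansion in sorted(curie_map.items()):
        out.write(f"#  {prefix}: {expansion}\n")
    out.write(f"#mapping_set_id: {mapping_set_id}\n")
    out.write(f"#license: {license_url}\n")
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t",
                            lineterminator="\n", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return out.getvalue()
```

Running missing_prefixes before writing would flag any prefix that still needs an expansion, which is the validation gap described above.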
There's a check in Babel against the Biolink prefixes for each type, so it will potentially write out any prefix listed in the Biolink YAML for that type, and it should not write out anything that isn't in that prefix list.
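The per-type prefix check can be pictured as a simple allow-list filter. This is a sketch, not Babel's code: ALLOWED_PREFIXES is a hypothetical, abbreviated stand-in for the prefixes Babel reads from the Biolink YAML, and filter_by_type is a made-up name.

```python
from typing import Dict, List

# Hypothetical, abbreviated allow-lists keyed by Biolink class.
ALLOWED_PREFIXES: Dict[str, List[str]] = {
    "biolink:Disease": ["MONDO", "DOID", "OMIM", "UMLS"],
    "biolink:Gene": ["NCBIGene", "ENSEMBL", "HGNC"],
}


def filter_by_type(curies: List[str], biolink_type: str) -> List[str]:
    """Keep only CURIEs whose prefix is allowed for the given type.

    Unknown types get an empty allow-list, so nothing is written for them.
    """
    allowed = set(ALLOWED_PREFIXES.get(biolink_type, []))
    return [c for c in curies if c.split(":", 1)[0] in allowed]
```

Because the filter runs before serialization, every prefix that reaches the SSSOM file is one the type's allow-list already knows about, which also bounds what the curie_map must contain.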
Just as we want Babel to write SSSOM, NodeNormalizer (NN) will need to read it.
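On the reading side, a minimal stdlib sketch, assuming the leader-as-subject layout described earlier: skip the "#" metadata lines, parse the TSV body, and rebuild each clique keyed by its leader. read_cliques is a hypothetical name; sssom-py's own parsers would be the more robust choice in production.

```python
import csv
from collections import defaultdict
from typing import Dict, List


def read_cliques(sssom_text: str) -> Dict[str, List[str]]:
    """Return {leader: [leader, member, ...]} from SSSOM/TSV text.

    Assumes each row's subject_id is its clique leader.
    """
    body = [line for line in sssom_text.splitlines() if not line.startswith("#")]
    reader = csv.DictReader(body, delimiter="\t")
    cliques: Dict[str, List[str]] = defaultdict(list)
    for row in reader:
        leader = row["subject_id"]
        if not cliques[leader]:
            cliques[leader].append(leader)  # leader goes first in its clique
        cliques[leader].append(row["object_id"])
    return dict(cliques)
```

Grouping by subject_id means this reader does not depend on row order, so it stays correct even if rows from different cliques are interleaved.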