2022 02 16 meetings

sample-annotator repo updates

I have switched my fork of the NMDC Sample Annotator (turbomam/sample-annotator) to Poetry.

I have also added an example that compares observed rel_to_oxygen values with the values MIxS allows, as a starting point for DataGood.

I'd like to merge this into main now.

Pull Request #48 in microbiomedata/sample-annotator

Highlights:

- Poetry itself must be installed as a system requirement
- run `poetry install` once after switching to this new branch
- dependencies are specified in `pyproject.toml`
- are we ready to publish to PyPI? What metadata should we use? Some was carried forward from `setup.cfg` into `pyproject.toml`
- still need to reinstate some command-line scripts under `[tool.poetry.scripts]` in `pyproject.toml`
  - `sample-util = sample_annotator.sample_utils.main`
  - `goldapi = sample_annotator.clients.gold_api`
- moved non-Poetry configuration files to `pre-poetry/`
- removed a few dependencies
  - pint... I saw several similar-looking options when following the guided `poetry init` process
  - anything related to pipenv
- changed the source for importing `Message`... see below
- requiring Python 3.9
  - test older versions with tox?
- tests pass
- new Poetry-based GitHub Actions pass
- actively working on the Makefile
- refactored `.gitignore` based on
  - https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore
  - the JetBrains.gitignore template in github/gitignore (main branch)
  - checking in selected `.idea` content from PyCharm, like the Black configuration
- did some semi-manual reformatting in the PyCharm default style
- have since agreed with Harshad and Marcin to autoformat with Black on save within the IDE
- which testing framework? How should tests be invoked?
  - pytest
  - unittest
- difference between tests and examples
  - where do inputs and outputs go?
- new `logs` directory
- what documentation framework? I haven't touched any of this:
  - `config/...`
  - `docs`
  - sphinx
- documentation needs to be updated in general, especially for Poetry
  - ABOUT.md
  - CONTRIBUTING.md
  - README.md
- logging best practices?
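On the open logging question, one common pattern is to configure handlers once at the entry point and have every module call `logging.getLogger(__name__)`. The sketch below assumes that log files belong in the new `logs` directory. The `configure_logging` helper and the `<name>.log` filename are hypothetical and are not taken from the repo.

```python
import logging
from pathlib import Path


def configure_logging(log_dir="logs", level=logging.INFO, name="sample_annotator"):
    """Attach a console handler and a file handler (<log_dir>/<name>.log) to one logger.

    Hypothetical helper: call it once from a CLI entry point. Modules inside the
    package then use logging.getLogger(__name__), and their records propagate up
    to this logger.
    """
    log_path = Path(log_dir)
    log_path.mkdir(parents=True, exist_ok=True)  # create logs/ if it is missing
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # guard against duplicate handlers on repeated calls
        fmt = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        for handler in (
            logging.StreamHandler(),  # stderr
            logging.FileHandler(log_path / f"{name}.log"),
        ):
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger
```

Because the handler setup lives in one place, library modules never configure logging themselves. That keeps them quiet when they are imported from tests or notebooks.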

I wrote sample_annotator/clients/biosample_sqlite_client.py. It documents the expected values from Enum: rel_to_oxygen_enum - MIxS, as well as the observed values in biosample_basex_data_good_subset.db 's harmonized_wide_sel_envs.rel_to_oxygen
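The following is a rough illustration of this kind of comparison. It is not the actual contents of biosample_sqlite_client.py. It counts the distinct `rel_to_oxygen` values in `harmonized_wide_sel_envs` that fall outside the MIxS enumeration. The function name is hypothetical, and the enumeration was transcribed by hand from MIxS `rel_to_oxygen_enum`, so check it against the schema version in use.

```python
import sqlite3
from collections import Counter
from typing import AbstractSet

# Hand-transcribed from MIxS rel_to_oxygen_enum (assumption: verify against the
# MIxS release you compare with, e.g. the one behind downloads/mixs6_core.tsv).
MIXS_REL_TO_OXYGEN = frozenset({
    "aerobe", "anaerobe", "facultative", "microaerophilic",
    "microanaerobe", "obligate aerobe", "obligate anaerobe",
})


def unexpected_rel_to_oxygen(conn: sqlite3.Connection, expected: AbstractSet[str]) -> Counter:
    """Count each distinct rel_to_oxygen value that is not in `expected`.

    Matching is exact and case-sensitive, so 'Aerobic' and 'aerobic' are both
    reported as unexpected. NULLs are skipped.
    """
    rows = conn.execute(
        "SELECT rel_to_oxygen, COUNT(*) FROM harmonized_wide_sel_envs "
        "WHERE rel_to_oxygen IS NOT NULL GROUP BY rel_to_oxygen"
    )
    return Counter({value: n for value, n in rows if value not in expected})
```

Against a local copy of the database, `unexpected_rel_to_oxygen(sqlite3.connect("biosample_basex_data_good_subset.db"), MIXS_REL_TO_OXYGEN).most_common(20)` would list the most frequent non-conforming values. Those values are the natural candidates for the repair-or-flag step.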

I had trouble running biosample_sqlite_client.py through a Poetry script wrapper (sqlite_client_cli):

ImportError: cannot import name 'Message' from 'sample_annotator.sample_annotator' (/Users/MAM/Documents/gitrepos/turbomam/sample-annotator/sample_annotator/sample_annotator.py)

But I didn't have any trouble running it directly as

python sample_annotator/clients/biosample_sqlite_client.py ...

So I commented out the import of Message from .sample_annotator in sample_annotator/__init__.py and replaced it with:

from report_model import Message

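rel_to_oxygen_example entry point in the Makefile (recipe lines must start with a tab, and nothing may follow a line-continuation backslash):

rel_to_oxygen_example: downloads/mixs6_core.tsv
	$(RUN) rel_to_oxygen_example \
		--sqlite_path $(biosample_sqlite_file) \
		--mixs_core_path $<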
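For reference, here is a hypothetical argparse-based entry point that accepts the two flags this Makefile target passes. The real script may use click or another CLI library. The function names are illustrative, and the comparison logic itself is left as a placeholder. A `[tool.poetry.scripts]` entry of the form `name = "package.module:main"` would point at a function like `main`.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the two options that the rel_to_oxygen_example Makefile target passes."""
    parser = argparse.ArgumentParser(
        prog="rel_to_oxygen_example",
        description="Compare observed rel_to_oxygen values with the MIxS enumeration.",
    )
    parser.add_argument("--sqlite_path", required=True,
                        help="biosample SQLite database, e.g. biosample_basex_data_good_subset.db")
    parser.add_argument("--mixs_core_path", required=True,
                        help="MIxS core TSV, e.g. downloads/mixs6_core.tsv")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Hypothetical entry point; argv=None makes argparse read sys.argv[1:]."""
    args = build_parser().parse_args(argv)
    # Placeholder: the real example would open args.sqlite_path and read the
    # allowed values from args.mixs_core_path here.
    print(f"sqlite: {args.sqlite_path}  mixs core: {args.mixs_core_path}")
    return 0
```

Accepting an explicit `argv` makes the entry point testable without touching `sys.argv`.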

rel_to_oxygen module

9:00 PT meeting with Huy, Ichchitaa, Mark and Marcin

regrets from Kjiersten

links:

The LBL team have a separate repository for converting NCBI's biosample_set.xml.gz into SQLite databases such as biosample_basex_data_good_subset.db.

The DataGood team can use the SQLite products as their input and do not need to be concerned with the conversion, which takes place in a separate repo.

The SQLite databases are available at https://portal.nersc.gov/project/m3513/biosample

Each developer will have their own local copy of the SQLite database, and these copies will inevitably drift out of sync. That's one of the many reasons why the #1 deliverable is committing code into microbiomedata/sample-annotator: so that LBL people can rerun or extend the transformations on other databases in the future.

LBL people can help think about ways to expose this work to the public through static reports or lightweight web APIs built with Flask or FastAPI.

| Column (from `harmonized_wide_sel_envs` table) | Action |
| --- | --- |
| `rel_to_oxygen` | Replace illegal values with terms from the controlled vocabulary, or flag them as unrepairable. This will require some subject-matter knowledge. |
| `depth`, `temp`... | Break out into value and unit parts with `quantulum3` |
| `env_broad_scale` | Lightweight NER |
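Before reaching for `quantulum3`, a plain regular expression can serve as a baseline. It splits a leading number from a trailing unit string and shows how much of `depth` or `temp` is trivially parseable. This is a swapped-in stdlib technique, not quantulum3. It does no unit normalization, and it will mis-split or reject ranges such as `0-10 m`, units written before the number, and free text. The `split_value_unit` name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Optional sign, then a decimal number with an optional exponent, then whatever
# follows is treated as the unit string (surrounding whitespace trimmed).
_VALUE_UNIT = re.compile(
    r"^\s*(?P<value>[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?)\s*(?P<unit>.*?)\s*$"
)


def split_value_unit(text: str) -> Optional[Tuple[float, Optional[str]]]:
    """Return (value, unit) for strings like '10 m' or '4.5C'.

    The unit is None when nothing follows the number. The function returns None
    when the string does not start with a number.
    """
    match = _VALUE_UNIT.match(text)
    if match is None:
        return None
    unit = match.group("unit") or None
    return float(match.group("value")), unit
```

Values this baseline rejects or mis-splits are the ones worth sending to `quantulum3`. Its `parser.parse()` resolves a unit object for each quantity rather than returning a raw string.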

11:00 PT meeting with Harshad, Marcin and Mark

Most of the notes I took in this meeting have been folded into the bullet points above

What is `utils/flatten.py` supposed to do? It looks buggy.

':' expected @ line 12
Indent expected @ line 12
Unresolved reference 'obj' @ line 13
It looks like line 12 just needs a colon and line 13 just needs indentation, but I have no idea what obj is supposed to be.

why is a static `mixs.json` in this repo?

sample_annotator/__init__.py:

MIXS_SCHEMA = os.path.join(MAIN_SCHEMA_DIR, 'mixs.json')

sample_annotator/metadata/sample_schema.py:

import json
from typing import Dict, Optional

from sample_annotator import MIXS_SCHEMA


class SampleSchema:
    object: Optional[Dict] = None
    slot_dict_by_alias = None

    def load(self, force=False) -> Dict:
        """
        Load the schema from the config folder
        """
        if self.object and not force:
            return self.object
        with open(MIXS_SCHEMA) as stream:
            self.object = json.load(stream)
            return self.object
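If the goal is simply to parse `mixs.json` once and reuse it, `functools.lru_cache` gives the same load-once behavior as the `self.object` check without a mutable class attribute. This sketches an alternative and is not what the repo currently does. `load_schema` is a hypothetical name, and `load_schema.cache_clear()` plays the role of `force=True`.

```python
import json
from functools import lru_cache
from typing import Any, Dict


@lru_cache(maxsize=None)
def load_schema(path: str) -> Dict[str, Any]:
    """Parse the JSON schema at `path` once; later calls with the same path return the cached dict.

    The cached dict is shared between callers, so treat it as read-only.
    """
    with open(path) as stream:
        return json.load(stream)
```

Keying the cache on the path also leaves room for swapping the static file for a freshly downloaded MIxS schema, without changing any callers.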
