2022 02 16 meetings
I have switched my fork, turbomam/sample-annotator (NMDC Sample Annotator), to Poetry. I have also added an example that compares rel_to_oxygen values to MIxS' expectations, as a starting point for DataGood. I'd like to merge this into main now.
Pull Request #48 · microbiomedata/sample-annotator
- the Poetry application must now be installed as a system requirement
- run `poetry install` once after switching to this new branch
  - dependencies are specified in `pyproject.toml`
- are we ready to publish to PyPI? What metadata should we use? Some was carried forward from `setup.cfg` into `pyproject.toml`
- still need to re-instate some command line scripts under `[tool.poetry.scripts]` in `pyproject.toml`
  - `sample-util = sample_annotator.sample_utils.main`
  - `goldapi = sample_annotator.clients.gold_api`
- moved non-Poetry configuration files to `pre-poetry/`
- removed a few dependencies
  - pint ... I saw several similar-looking options when following the `poetry init` guided process
  - anything related to pipenv
- changed the source for importing `Message` ... see below
- requiring Python 3.9
  - test older versions with tox?
- tests pass
- new Poetry-based GitHub Actions pass
- actively working on the `Makefile`
- refactored `.gitignore` based on
  - https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore
  - gitignore/JetBrains.gitignore at main · github/gitignore · GitHub
  - checking in selected `.idea` content from PyCharm, like the Black configuration
- did some semi-manual reformatting in the PyCharm default style
  - have since agreed with Harshad and Marcin to autoformat with Black on save within the IDE
- which testing framework? How should tests be invoked?
  - pytest
  - unittest
- what is the difference between `tests` and `examples`?
  - where do inputs and outputs go?
- new `logs` directory
- what documentation framework? I haven't touched any of this:
  - `config/` ... docs
  - sphinx
- documentation needs to be updated in general, especially for Poetry
  - ABOUT.md
  - CONTRIBUTING.md
  - README.md
- logging best practices?
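On the logging question, one common convention with the standard `logging` module is that library modules only create a named logger with `logging.getLogger(__name__)`, while the command-line entry point configures handlers once. The sketch below assumes log files should land in the new `logs` directory; the helper name `configure_logging` and the file name `sample_annotator.log` are hypothetical, not existing project code.

```python
import logging
from pathlib import Path

# Library modules only ask for a named logger; they never add handlers themselves.
logger = logging.getLogger(__name__)


def configure_logging(log_dir: Path = Path("logs"), level: int = logging.INFO) -> Path:
    """Send log records to a file under log_dir (hypothetical helper).

    Call this once, from the CLI entry point; calling it again would add a
    second handler and duplicate every record.
    """
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / "sample_annotator.log"  # hypothetical file name
    handler = logging.FileHandler(log_file, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(handler)
    return log_file
```

In an entry point, call `configure_logging()` once and then use `logger.info(...)` anywhere; records from every module propagate to the root logger's file handler. Keeping configuration out of library modules means tests and other callers can choose their own handlers.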
I wrote `sample_annotator/clients/biosample_sqlite_client.py`. It documents the expected values from the MIxS `rel_to_oxygen_enum`, as well as the observed values in the `harmonized_wide_sel_envs.rel_to_oxygen` column of `biosample_basex_data_good_subset.db`.
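As an illustration of this kind of check, here is a minimal sketch using the standard-library `sqlite3` module that tallies the distinct observed values of `harmonized_wide_sel_envs.rel_to_oxygen` and marks each one as permissible or not. The permissible values are passed in (for example, read from the MIxS `rel_to_oxygen_enum`) rather than hard-coded. The function name is hypothetical, and this is not necessarily how `biosample_sqlite_client.py` does it.

```python
import sqlite3
from typing import Iterable, List, Tuple


def tally_against_enum(
    conn: sqlite3.Connection,
    permissible: Iterable[str],
    table: str = "harmonized_wide_sel_envs",
    column: str = "rel_to_oxygen",
) -> List[Tuple[str, int, bool]]:
    """Return (observed value, row count, is permissible) for each distinct non-null value."""
    allowed = set(permissible)
    # Table and column names cannot be bound as "?" parameters, so they are
    # written as double-quoted identifiers; only pass trusted names here.
    query = (
        f'SELECT "{column}", COUNT(*) AS n FROM "{table}" '
        f'WHERE "{column}" IS NOT NULL '
        f'GROUP BY "{column}" ORDER BY n DESC, "{column}"'
    )
    # Exact, case-sensitive comparison against the permissible set.
    return [(value, n, value in allowed) for value, n in conn.execute(query)]
```

Because the comparison is exact, a value that differs from a permissible term only by case or spacing is reported as not permissible; whether such values should be repaired automatically is exactly the subject-matter question raised for `rel_to_oxygen`.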
I had trouble running it from a Poetry script wrapper (`sqlite_client_cli`):

ImportError: cannot import name 'Message' from 'sample_annotator.sample_annotator' (/Users/MAM/Documents/gitrepos/turbomam/sample-annotator/sample_annotator/sample_annotator.py)

But I didn't have any trouble running it directly as `python sample_annotator/clients/biosample_sqlite_client.py ...`. So I commented out the `Message` import from `.sample_annotator` in `sample_annotator/__init__.py` and replaced it with `from report_model import Message`.
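For the `[tool.poetry.scripts]` entries that still need to be re-instated, Poetry expects strings of the form `name = "package.module:function"`; the two entries listed in the notes do not yet name a function after a colon, and which function each should call is not stated here. The sketch below resolves such a string roughly the way a generated console-script wrapper does (import the module, then look up the attribute). It also illustrates a standard Python behaviour relevant to any script entry point in this package: importing a dotted module name first imports every parent package, so `sample_annotator/__init__.py` runs before the target module. The function name is hypothetical.

```python
import importlib
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Resolve a 'package.module:function' string to the object it names."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name.strip() or not attr_path.strip():
        raise ValueError(f"expected 'package.module:function', got {spec!r}")
    # import_module imports parent packages first, so e.g.
    # sample_annotator/__init__.py executes before sample_annotator.clients.*
    target: Any = importlib.import_module(module_name.strip())
    # Support dotted attribute paths such as 'module:Class.method'.
    for attr in attr_path.strip().split("."):
        target = getattr(target, attr)
    return target
```

For example, `resolve_entry_point("json:dumps")` returns `json.dumps`, while `resolve_entry_point("sample_annotator.sample_utils.main")` raises `ValueError` because there is no `:function` part.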
`rel_to_oxygen` entry point in the Makefile:

rel_to_oxygen_example: downloads/mixs6_core.tsv
	$(RUN) rel_to_oxygen_example \
		--sqlite_path $(biosample_sqlite_file) \
		--mixs_core_path $<
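On the receiving side, a script invoked by that recipe only needs to accept the two options the Makefile passes (`$<` expands to the first prerequisite, `downloads/mixs6_core.tsv`). A minimal `argparse` sketch follows; only the option names come from the Makefile, while the help texts and the check that both files exist are assumptions, and the actual script may use a different CLI library.

```python
import argparse
import sys
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Option names are copied from the Makefile recipe; help texts are assumptions.
    parser = argparse.ArgumentParser(
        prog="rel_to_oxygen_example",
        description="Compare observed rel_to_oxygen values with MIxS expectations.",
    )
    parser.add_argument("--sqlite_path", type=Path, required=True,
                        help="SQLite file, e.g. biosample_basex_data_good_subset.db")
    parser.add_argument("--mixs_core_path", type=Path, required=True,
                        help="MIxS core TSV, e.g. downloads/mixs6_core.tsv")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Parse options and return a process exit code (0 on success)."""
    args = build_parser().parse_args(argv)
    missing = [p for p in (args.sqlite_path, args.mixs_core_path) if not p.is_file()]
    for path in missing:
        print(f"file not found: {path}", file=sys.stderr)
    if missing:
        return 1
    # Placeholder: load the MIxS TSV and query the SQLite file here.
    return 0
```

Returning an exit code from `main` (instead of calling `sys.exit` inside it) keeps the function easy to test, and a console-script entry can point at `main` directly.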
regrets from Kjiersten
Links:
- LBL DataGood collaboration on biosample annotation · microbiomedata/sample-annotator Wiki · GitHub
- sample-annotator/pyproject.toml at issue-47-poetry · turbomam/sample-annotator · GitHub
- sample-annotator/rel_to_oxygen_example.py at issue-47-poetry · turbomam/sample-annotator · GitHub
- sample-annotator/biosample_sqlite_client.py at issue-47-poetry · turbomam/sample-annotator · GitHub
The LBL team has a separate repository for converting NCBI's `biosample_set.xml.gz` into SQLite databases such as `biosample_basex_data_good_subset.db`. The DataGood team can use these SQLite products as their input and does not need to be concerned with the conversion.
The SQLite databases are available at https://portal.nersc.gov/project/m3513/biosample
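For a first look at one of those downloads, a minimal sketch with the standard-library `sqlite3` module lists each table and its column names, using `sqlite_master` and `PRAGMA table_info`. The helper name is hypothetical; the path is whatever local copy you downloaded.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import Dict, List


def describe_tables(db_path: str) -> Dict[str, List[str]]:
    """Map each table in a SQLite file to its column names."""
    # sqlite3.connect() silently creates an empty database for a missing path,
    # so check that the file exists first.
    if not Path(db_path).is_file():
        raise FileNotFoundError(db_path)
    with closing(sqlite3.connect(db_path)) as conn:
        tables = [
            name
            for (name,) in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        return {
            table: [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
```

Run against a local copy of `biosample_basex_data_good_subset.db`, the result would be expected to include the `harmonized_wide_sel_envs` table referred to in these notes.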
Each developer will have their own local copy of the SQLite database, and these copies will inevitably drift out of sync. That is one of the many reasons why the #1 deliverable is committing code into microbiomedata/sample-annotator: so that LBL people can rerun or extend the transformations on other databases in the future.
LBL people can help think about ways to expose this work to the public, through static reports or lightweight web APIs built with Flask or FastAPI.
| column (from `harmonized_wide_sel_envs` table) | action |
|---|---|
| rel_to_oxygen | replace illegal values with terms from the controlled vocabulary, or flag them as un-repairable. Will require some subject matter knowledge. |
| depth, temp ... | break out into value and unit parts with quantulum3 |
| env_broad_scale | lightweight NER |
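The table proposes quantulum3 for splitting `depth`, `temp`, etc. into value and unit parts. As a dependency-free illustration of the target output shape only (this is a plain regular expression, not quantulum3, and far less capable), the sketch below splits strings like `10 m` into a number and a unit string, and returns `None` for anything it cannot parse confidently, such as ranges, so those rows can be flagged for review. The function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# A number, optional whitespace, then an optional unit that starts with a letter,
# degree sign, percent sign or micro sign and contains no further digits.
_VALUE_UNIT = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*([A-Za-z°%µ][^\d]*?)?\s*$")


def split_value_unit(raw: str) -> Optional[Tuple[float, Optional[str]]]:
    """Split '10 m' into (10.0, 'm'); return None if raw is not a single value."""
    match = _VALUE_UNIT.match(raw)
    if match is None:
        return None
    value, unit = match.groups()
    return float(value), unit
```

For example, `"20 degree Celsius"` becomes `(20.0, "degree Celsius")` and a bare `"12"` becomes `(12.0, None)`, while `"0-10 cm"` returns `None`. Units containing digits (such as `m2`) are also rejected by this pattern, which is one reason a dedicated library like quantulum3 was proposed.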
Most of the notes I took in this meeting have been folded into the bullet points above
What is `utils/flatten.py` supposed to do? It looks buggy:
- ':' expected @ line 12
- Indent expected @ line 12
- Unresolved reference 'obj' @ line 13
- It looks like line 12 just needs a colon and line 13 just needs indentation, but I have no idea what `obj` is supposed to be.
`sample_annotator/__init__.py`:

MIXS_SCHEMA = os.path.join(MAIN_SCHEMA_DIR, 'mixs.json')
`sample_annotator/metadata/sample_schema.py`:

import json
from typing import Dict, Optional

from sample_annotator import MIXS_SCHEMA


class SampleSchema:
    object: Optional[Dict] = None
    slot_dict_by_alias = None

    def load(self, force=False) -> Dict:
        """
        Load the schema from the config folder
        """
        # Return the cached schema unless a reload is forced
        if self.object and not force:
            return self.object
        with open(MIXS_SCHEMA) as stream:
            self.object = json.load(stream)
        return self.object