## v0.3.0 (2025-01-17)
### Build
- build: configure Read the Docs for explicit path to `config.py`

  Update `.readthedocs.yaml` to explicitly specify the path to `config.py`. This ensures proper documentation builds and avoids potential issues with an upcoming deprecation of inferred configuration. (`ee47493`)
### Feature
- feat: make plot writing to file optional in `plot_grounding_rates`

  Make writing plots to file optional in the `plot_grounding_rates` function by introducing a new parameter to control this behavior. This allows for flexible usage, including previewing plots without generating files. (`18d01bb`)
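A minimal sketch of the optional-write pattern described above. The parameter names (`write_to_file`, `path`) are hypothetical, and a stand-in dict takes the place of a real figure object since the actual plotting code isn't shown in this changelog:

```python
from pathlib import Path

def plot_grounding_rates(rates, write_to_file=False, path="grounding_rates.png"):
    """Plot grounding rates per configuration; write to disk only on request.

    `rates` maps configuration name -> grounding rate. `write_to_file`
    and `path` are hypothetical names; the "figure" is a stand-in dict.
    """
    figure = {"bars": sorted(rates.items())}  # stand-in for building the plot
    if write_to_file:
        # Real code would call something like fig.savefig(path) here.
        Path(path).write_text(repr(figure), encoding="utf-8")
    return figure  # returned either way, so callers can preview without a file
```

Returning the figure regardless of the flag is what enables the "preview without generating files" usage the entry mentions.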
- feat: visualize similarity metrics by configuration

  Implement a visualization to assess the accuracy of different OntoGPT configurations relative to a baseline. Use a simple box plot to display and compare configurations. (`b392629`)
- feat: visualize similarity metrics by predicate

  Implement a visualization to assess the accuracy of different OntoGPT configurations relative to a baseline standard for each predicate represented by OntoGPT templates. Use a simple box plot to effectively display and compare similarity metrics across predicate values. (`1bd1184`)
- feat: add logging to `benchmark_against_standard` for better insights

  Add logging capabilities to the `benchmark_against_standard` function to provide insights into the ongoing execution process, especially helpful for this time-consuming operation. (`13b2eb6`)
- feat: visualize grounding rates across OntoGPT configurations

  Implement a visualization to assess the grounding success rates of different OntoGPT configurations. This visualization utilizes a 100% stacked bar chart to compare and contrast the performance of various configurations. (`8fd9962`)
- feat: enhance CURIE expansion with expanded prefix map

  Update the `expand_curie` function to use a significantly larger prefix map, enabling the expansion of a wider range of CURIEs. (`5a09e7e`)
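The shape of a prefix-map-driven `expand_curie` can be sketched as follows. The map here is an illustrative subset using standard OBO IRI patterns, not the package's actual (much larger) map:

```python
# Illustrative subset of a prefix map; the real map is much larger.
PREFIX_MAP = {
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    "ENVO": "http://purl.obolibrary.org/obo/ENVO_",
    "PATO": "http://purl.obolibrary.org/obo/PATO_",
}

def expand_curie(curie, prefix_map=PREFIX_MAP):
    """Expand a CURIE like 'ENVO:00002006' to a full IRI.

    Unknown prefixes are returned unchanged rather than raising, so a
    larger map directly widens the range of CURIEs that expand.
    """
    prefix, local_id = curie.split(":", 1)
    base = prefix_map.get(prefix)
    return base + local_id if base else curie
```

Growing the map is the only change needed to support more vocabularies; the expansion logic itself stays fixed.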
- feat: introduce `temperature` parameter for OntoGPT calls

  Add a `temperature` parameter to OntoGPT calls, allowing users to control the model's behavior and adjust the level of creativity or randomness in the generated output. (`44ac7d6`)
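One way to thread a temperature setting through to a CLI-style OntoGPT call. The flag spelling below is an assumption, not the project's verbatim command; check `ontogpt extract --help` for the options your version supports:

```python
def build_extract_command(template, input_file, model="ollama/llama3", temperature=1.0):
    """Assemble an `ontogpt extract` invocation with a temperature setting.

    NOTE: `--temperature` as a flag name is an assumption about the
    OntoGPT CLI and may differ by version; verify against the CLI help.
    """
    return [
        "ontogpt", "extract",
        "-t", template,
        "-i", input_file,
        "-m", model,
        "--temperature", str(temperature),
    ]

# The list can then be run with subprocess.run(cmd, check=True).
```

Lower temperatures make groundings more repeatable across runs; higher ones trade consistency for recall of less obvious concepts.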
- feat: implement benchmark data collection and testing

  Add functionality to collect and analyze benchmark data, including a dedicated test suite to evaluate this routine.

  We have opted for a baseline comparison method to evaluate the performance of our algorithm across different parameterizations. This approach offers several advantages, including efficiency and interpretability. By directly comparing each parameterization to a fixed baseline, we can quickly assess its relative performance and identify the optimal configuration. While this method may not uncover subtle differences between parameterizations that are both better or both worse than the baseline, it provides a practical and timely solution for our specific goals. (`f003b96`)
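The baseline comparison described above reduces to a simple score delta per parameterization. A sketch, with hypothetical names and illustrative scores (the source does not show its actual metric plumbing):

```python
def relative_performance(scores, baseline="baseline"):
    """Compare each parameterization's mean similarity score to a fixed baseline.

    `scores` maps parameterization name -> mean similarity score.
    Returns name -> score delta; positive means better than baseline.
    """
    ref = scores[baseline]
    return {name: s - ref for name, s in scores.items() if name != baseline}

def best_parameterization(scores, baseline="baseline"):
    """Pick the parameterization with the largest gain over the baseline."""
    deltas = relative_performance(scores, baseline)
    return max(deltas, key=deltas.get)
```

This is where the stated limitation lives: two configurations with similar deltas are ranked only by their distance to the baseline, not compared head-to-head.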
- feat: implement performance metric logging

  Add logging for performance metrics to enable in-depth analysis and optimization.

  - Create a context manager to log metrics of interest (runtime and memory usage).
  - Estimate tokens per LLM call using word count. (`d667b31`)
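The two items above can be sketched with the standard library alone. The names, the use of `tracemalloc` for memory, and the tokens-per-word ratio are assumptions; the source does not specify its implementation:

```python
import logging
import time
import tracemalloc
from contextlib import contextmanager

logger = logging.getLogger(__name__)

@contextmanager
def log_metrics(label):
    """Log runtime and peak memory usage for the enclosed block."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        logger.info("%s: %.3f s, peak memory %.1f KiB", label, elapsed, peak / 1024)

def estimate_tokens(text):
    """Rough token estimate for an LLM call based on word count.

    ~1.3 tokens per English word is a common heuristic, not a measured
    value from the source.
    """
    return int(len(text.split()) * 1.3)
```

Usage is just `with log_metrics("extract call"): run_extraction()`, which keeps the instrumentation out of the measured function's body.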
- feat: implement logging for debugging

  Add logging capabilities to enhance debugging and runtime monitoring. (`864889e`)
- feat: initialize benchmark testing module

  Create a new module to facilitate benchmark testing, allowing for performance evaluation and optimization. (`66843ba`)
### Fix
- fix: correct return logic in `add_predicate_annotations_to_workbook`

  Resolve an issue in the `add_predicate_annotations_to_workbook` function that prevents it from returning the expected results. (`5a49584`)
- fix: handle multiple semicolons in CURIE expansion

  Correct the `expand_curie` function to handle CURIEs containing more than one semicolon, preventing the `ValueError: too many values to unpack` error. (`31d0e9c`)
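The unpacking error described here is the classic symptom of splitting without `maxsplit` when the local identifier itself contains the separator. A sketch of the fix (CURIEs are conventionally colon-delimited; the same change applies whichever delimiter the entry's data uses):

```python
def split_curie(curie, sep=":"):
    """Split a CURIE into (prefix, local id) on the first separator only.

    The failing form, `prefix, local = curie.split(sep)`, raises
    "ValueError: too many values to unpack" whenever the local id
    contains the separator; maxsplit=1 splits only at the first one.
    """
    prefix, local_id = curie.split(sep, 1)
    return prefix, local_id
```

Everything after the first separator is kept intact as the local identifier, so downstream expansion sees the full id.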
- fix: update OntoGPT templates to improve grounding

  Update templates to improve ontology grounding, specifically:

  - Improve template prompts to produce more accurate and precise results.
  - Relax vocabulary branch constraints to enable broader capture of concepts outside of the target branch, due to relevant concepts appearing in multiple branches within the vocabulary. Do this for all templates except `contains_process` and `env_medium`, where concepts are sufficiently constrained to a single branch.

    By doing this we increase our reliance on effective prompts to guide the LLM to extract relevant concepts without extracting irrelevant concepts. The issue of irrelevant concepts may be addressed downstream in an additional post-processing step that trims out these concepts.

    Note vocabulary constraints don't seem to work in vocabularies using the BioPortal API.
  - Replace semantically descriptive labels (e.g., `measurement_type`) in templates with less semantically related labels (e.g., `output`). This change mitigates the risk of the LLM misinterpreting labels as placeholders for extracted values, leading to parsing errors and incorrect results. (`1c79260`)
- fix: correct OntoGPT command construction

  Remove an extra space from the OntoGPT `extract` command construction to prevent potential errors and ensure the command executes as expected. (`31a5ff4`)
- fix: prevent OntoGPT cache-related errors by clearing cache

  Implement a cache-clearing mechanism before each OntoGPT call to mitigate issues where cached results, particularly those without grounded concepts, could lead to processing errors. This ensures that each call to OntoGPT is fresh and produces reliable results. (`d342773`)
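A minimal sketch of the clear-before-call pattern. The function names are hypothetical, and the cache location is deliberately left as a parameter because it depends on how OntoGPT/litellm caching is configured in a given environment:

```python
import shutil
from pathlib import Path

def clear_ontogpt_cache(cache_dir):
    """Remove the cache directory (if present) so the next call starts fresh.

    `cache_dir` is environment-specific; this sketch does not hard-code
    the path the project actually clears.
    """
    shutil.rmtree(Path(cache_dir), ignore_errors=True)

def extract_fresh(run_extract, cache_dir):
    """Clear the cache, then invoke the extraction callable."""
    clear_ontogpt_cache(cache_dir)
    return run_extract()
```

`ignore_errors=True` makes the clear idempotent: a missing cache directory is treated the same as an empty one.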
- fix: add missing parameters to `annotate_workbooks`

  Add missing parameters to the `annotate_workbooks` function to ensure correct argument propagation to its subfunctions. (`9e8570a`)
### Performance
- perf: optimize OntoGPT calls using `ollama_chat`

  Optimize OntoGPT calls by specifying the `ollama_chat` model within the `extract` command, leveraging performance improvements recommended by the `litellm` package. (`2a46e33`)
- perf: enhance OntoGPT grounding with sample size

  Implement a strategy to combine multiple OntoGPT runs for each input to improve the consistency and completeness of concept grounding. This approach addresses the variability inherent in the OntoGPT process, resulting in more reliable and accurate annotations. (`57e6df7`)
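One plausible combining strategy is a union over runs, which the entry's wording suggests but does not spell out; the names and the union choice below are assumptions:

```python
def combine_groundings(runs):
    """Pool grounded concepts from several OntoGPT runs of the same input.

    `runs` is a list of sets of grounded CURIEs, one set per run. A
    concept grounded in any run is kept, smoothing over run-to-run
    variability of the extraction. (Union is one choice; a majority
    vote would trade recall for precision.)
    """
    combined = set()
    for run in runs:
        combined |= run
    return combined

def annotate_with_samples(extract_once, sample_size=3):
    """Call a (hypothetical) single-run extractor `sample_size` times and pool."""
    return combine_groundings([extract_once() for _ in range(sample_size)])
```

Larger sample sizes raise completeness at a linear cost in LLM calls, which is why this lands under Performance rather than Feature.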
### Refactor
- refactor: consolidate OntoGPT workbook annotators into a single function

  Consolidate multiple OntoGPT workbook annotator functions into a single, unified function to improve code maintainability, reduce redundancy, and enhance overall code clarity. (`ed668b1`)
- refactor: remove outdated `add_dataset_annotations_to_workbook` function

  Remove the outdated `add_dataset_annotations_to_workbook` function, as it lacks the necessary granularity for predicate-level categorization of semantic annotations, a crucial aspect of our current annotation model. While alternative approaches exist (e.g., annotating with terms from multiple vocabularies and then categorizing based on branch), the ongoing development and active community support for OntoGPT suggest a more promising long-term solution. (`25f0a8b`)
- refactor: replace print statements with logging

  Replace print statements with logging statements to enable more structured and persistent output. This change provides flexibility for capturing and analyzing runtime information. (`23907c6`)
### Test
- test: create test data for term-set similarity score analysis

  Create a set of test data containing term-set similarity scores for various configurations, enabling unit testing of downstream functions that analyze and interpret these scores. (`513e5e5`)