v0.3.0 (2025-01-17)

Build

  • build: configure Read the Docs for explicit path to config.py

Update .readthedocs.yaml to explicitly specify the path to
config.py. This ensures proper documentation builds and avoids
potential issues with an upcoming deprecation of inferred configuration. (ee47493)

Feature

  • feat: make plot writing to file optional in plot_grounding_rates

Make writing plots to file optional in the plot_grounding_rates
function by introducing a new parameter to control this behavior. This
allows for flexible usage, including previewing plots without generating
files. (18d01bb)
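
The release notes don't name the new parameter, so the sketch below uses a hypothetical write_to_file flag to illustrate the pattern: render the figure to screen for previewing, or to disk only on request.

```python
import matplotlib.pyplot as plt


def plot_grounding_rates(rates: dict, output_path: str | None = None,
                         write_to_file: bool = False):
    """Plot grounding rates per configuration; write to file only on request."""
    fig, ax = plt.subplots()
    ax.bar(list(rates.keys()), list(rates.values()))
    ax.set_ylabel("Grounding rate")
    if write_to_file and output_path is not None:
        fig.savefig(output_path)  # produce a file only when explicitly asked
    else:
        plt.show()  # preview without generating a file
    return fig
```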

  • feat: visualize similarity metrics by configuration

Implement a visualization to assess the accuracy of different OntoGPT
configurations relative to a baseline. Use a simple box plot to display
and compare configurations. (b392629)
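
A minimal sketch of this kind of comparison, using hypothetical column names and scores; pandas' built-in boxplot draws one box per configuration. Grouping on a predicate column instead yields the per-predicate view described in the next entry.

```python
import pandas as pd

# Hypothetical input: one similarity score per extraction, tagged with the
# OntoGPT configuration that produced it.
scores = pd.DataFrame({
    "configuration": ["baseline", "baseline", "config_a",
                      "config_a", "config_b", "config_b"],
    "similarity": [0.91, 0.88, 0.75, 0.80, 0.83, 0.86],
})

# One box per configuration; the spread shows agreement with the baseline.
ax = scores.boxplot(column="similarity", by="configuration")
ax.set_ylabel("Term-set similarity")
ax.figure.suptitle("")  # drop pandas' automatic grouped-boxplot title
ax.figure.savefig("similarity_by_configuration.png")
```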

  • feat: visualize similarity metrics by predicate

Implement a visualization to assess the accuracy of different OntoGPT
configurations relative to a baseline standard for each predicate
represented by OntoGPT templates. Use a simple box plot to
effectively display and compare similarity metrics across predicate
values. (1bd1184)

  • feat: add logging to benchmark_against_standard for better insights

Add logging capabilities to the benchmark_against_standard function to
provide insights into the ongoing execution process, especially helpful
for this time-consuming operation. (13b2eb6)

  • feat: visualize grounding rates across OntoGPT configurations

Implement a visualization to assess the grounding success rates of
different OntoGPT configurations. This visualization utilizes a 100%
stacked bar chart to compare and contrast the performance of various
configurations. (8fd9962)
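
A minimal sketch of a 100% stacked bar chart with hypothetical counts: normalizing each row to percentages makes configurations with different annotation totals directly comparable.

```python
import pandas as pd

# Hypothetical counts of grounded vs. ungrounded annotations per configuration.
counts = pd.DataFrame(
    {"grounded": [40, 55, 62], "ungrounded": [60, 45, 38]},
    index=["config_a", "config_b", "config_c"],
)

# Normalize each row to 100% so bars compare proportions, not totals.
proportions = counts.div(counts.sum(axis=1), axis=0) * 100
ax = proportions.plot(kind="bar", stacked=True)
ax.set_ylabel("Share of annotations (%)")
ax.figure.savefig("grounding_rates.png")
```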

  • feat: enhance CURIE expansion with expanded prefix map

Update the expand_curie function to use a significantly larger
prefix map, enabling the expansion of a wider range of CURIEs. (5a09e7e)
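
The release doesn't say which prefix map was adopted, so the sketch below illustrates the mechanism with a small, hand-picked map; the real map is presumably much larger.

```python
# Illustrative prefix map; the actual map is far more extensive.
PREFIX_MAP = {
    "ENVO": "http://purl.obolibrary.org/obo/ENVO_",
    "ECSO": "http://purl.dataone.org/odo/ECSO_",
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
}


def expand_curie(curie: str) -> str:
    """Expand a CURIE (prefix:local_id) to a full IRI, if the prefix is known."""
    prefix, local_id = curie.split(":", 1)  # maxsplit guards against extra colons
    base = PREFIX_MAP.get(prefix)
    return base + local_id if base else curie
```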

  • feat: introduce temperature parameter for OntoGPT calls

Add a temperature parameter to OntoGPT calls, allowing users to
control the model's behavior and adjust the level of creativity or
randomness in the generated output. (44ac7d6)
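
A sketch of how the parameter might be threaded through to the extract command. The --temperature flag name is an assumption here; check ontogpt extract --help for the exact option.

```python
import subprocess


def run_ontogpt_extract(input_file: str, template: str,
                        temperature: float = 0.1) -> str:
    """Run an OntoGPT extraction with an explicit sampling temperature."""
    cmd = [
        "ontogpt", "extract",
        "-t", template,
        "-i", input_file,
        "--temperature", str(temperature),  # assumed flag name; verify locally
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```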

  • feat: implement benchmark data collection and testing

Add functionality to collect and analyze benchmark data, including a
dedicated test suite to evaluate this routine.

We have opted for a baseline comparison method to evaluate the
performance of our algorithm across different parameterizations. This
approach offers several advantages, including efficiency and
interpretability. By directly comparing each parameterization to a
fixed baseline, we can quickly assess its relative performance and
identify the optimal configuration. While this method may not uncover
subtle differences between parameterizations that are both better (or
both worse) than the baseline, it provides a practical and timely solution for
our specific goals. (f003b96)
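
A minimal sketch of the baseline-comparison idea with hypothetical scores: each parameterization is scored against the fixed baseline, and mean similarity ranks the candidates.

```python
import pandas as pd

# Hypothetical similarity scores of each parameterization against the
# fixed baseline (1.0 = identical term sets).
scores = pd.DataFrame({
    "configuration": ["config_a", "config_a", "config_b", "config_b"],
    "similarity_to_baseline": [0.72, 0.78, 0.85, 0.88],
})

# Rank parameterizations by mean similarity to the baseline; the top row
# is the candidate "optimal" configuration under this criterion.
ranking = (
    scores.groupby("configuration")["similarity_to_baseline"]
    .mean()
    .sort_values(ascending=False)
)
print(ranking)
```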

  • feat: implement performance metric logging

Add logging for performance metrics to enable in-depth analysis and
optimization. A sketch of the approach follows the list below.

  • Create a context manager to log metrics of interest (runtime and
    memory usage).
  • Estimate tokens per LLM call using word count. (d667b31)
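
A minimal sketch of such a context manager, assuming Python's standard time and tracemalloc modules; the 4/3 tokens-per-word ratio is a common rule of thumb, not a value taken from the release.

```python
import logging
import time
import tracemalloc
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def monitor(name: str):
    """Log runtime and peak memory for the enclosed block."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        logger.info("%s: %.2f s, peak memory %.1f MiB",
                    name, elapsed, peak / 2**20)


def estimate_tokens(prompt: str) -> int:
    """Rough token count from word count, as described above."""
    return round(len(prompt.split()) * 4 / 3)  # rule-of-thumb ratio
```
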
  • feat: implement logging for debugging

Add logging capabilities to enhance debugging and runtime monitoring. (864889e)

  • feat: initialize benchmark testing module

Create a new module to facilitate benchmark testing, allowing for
performance evaluation and optimization. (66843ba)

Fix

  • fix: correct return logic in add_predicate_annotations_to_workbook

Resolve an issue in the add_predicate_annotations_to_workbook function
that prevents it from returning the expected results. (5a49584)

  • fix: handle multiple colons in CURIE expansion

Correct the expand_curie function to handle CURIEs containing more than
one colon, preventing the ValueError: too many values to unpack
error. (31d0e9c)
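
The likely shape of the fix, sketched with a hypothetical helper: limiting the split to the first colon keeps local identifiers that themselves contain colons intact.

```python
def split_curie(curie: str) -> tuple[str, str]:
    # Before: prefix, local_id = curie.split(":") raises
    # "ValueError: too many values to unpack" when the local part
    # itself contains a colon. maxsplit=1 splits only on the first one.
    prefix, local_id = curie.split(":", 1)
    return prefix, local_id


assert split_curie("EX:a:b") == ("EX", "a:b")
```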

  • fix: update OntoGPT templates to improve grounding

Update templates to improve ontology grounding, specifically:

  1. Improve template prompts to produce more accurate and precise
    results.

  2. Relax vocabulary branch constraints so that relevant concepts
    appearing outside the target branch can still be captured, since such
    concepts often occur in multiple branches of a vocabulary. Do this for
    all templates except contains_process and env_medium, whose concepts
    are sufficiently constrained to a single branch.

Doing this increases our reliance on effective prompts to guide the
LLM toward relevant concepts without extracting irrelevant ones. Any
irrelevant concepts that slip through may be addressed downstream in an
additional post-processing step that trims them out.

Note that vocabulary branch constraints do not appear to work for
vocabularies accessed through the BioPortal API.

  3. Replace semantically descriptive labels (e.g., measurement_type) in
    templates with less semantically suggestive labels (e.g., output). This
    change mitigates the risk of the LLM misinterpreting labels as
    placeholders for extracted values, leading to parsing errors and
    incorrect results. (1c79260)

  • fix: correct OntoGPT command construction

Remove an extra space from the OntoGPT extract command construction to
prevent potential errors and ensure the command executes as expected. (31a5ff4)
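
The offending code isn't shown in the release, but the sketch below illustrates the general failure mode and a defensive alternative: building commands as argument lists rather than concatenated strings.

```python
import shlex

template, input_file = "env_medium", "notes.txt"

# Bug pattern: string concatenation leaves a double space, which can
# produce an empty argument after naive splitting.
cmd_str = "ontogpt extract " + " -t " + template  # "ontogpt extract  -t ..."

# Safer: keep arguments in a list and let shlex/subprocess handle spacing.
cmd = ["ontogpt", "extract", "-t", template, "-i", input_file]
print(shlex.join(cmd))
```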

  • fix: prevent OntoGPT cache-related errors by clearing cache

Implement a cache-clearing mechanism before each OntoGPT call to
mitigate issues where cached results, particularly those without
grounded concepts, could lead to processing errors. This ensures that
each call to OntoGPT is fresh and produces reliable results. (d342773)
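
A sketch of the mechanism, with a hypothetical cache location; the actual directory depends on how OntoGPT caching is configured in your environment.

```python
import shutil
from pathlib import Path

# Hypothetical cache location; adjust to where OntoGPT caches results
# in your setup.
CACHE_DIR = Path.home() / ".ontogpt" / "cache"


def clear_cache() -> None:
    """Remove cached OntoGPT results so the next call starts fresh."""
    shutil.rmtree(CACHE_DIR, ignore_errors=True)


clear_cache()  # call before each OntoGPT extraction
```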

  • fix: add missing parameters to annotate_workbooks

Add missing parameters to the annotate_workbooks function to ensure
correct argument propagation to its subfunctions. (9e8570a)

Performance

  • perf: optimize OntoGPT calls using ollama_chat

Optimize OntoGPT calls by specifying models via the ollama_chat provider
within the extract command, leveraging the performance improvements
recommended by the litellm package. (2a46e33)
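
A sketch under the assumption that OntoGPT's -m/--model option accepts litellm-style model strings; the model name is illustrative.

```python
import subprocess

# litellm distinguishes "ollama/<model>" (completion API) from
# "ollama_chat/<model>" (chat API) and recommends the latter for better
# responses.
model = "ollama_chat/llama3"

cmd = ["ontogpt", "extract", "-t", "env_medium", "-i", "notes.txt", "-m", model]
subprocess.run(cmd, check=True)
```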

  • perf: enhance OntoGPT grounding with sample size

Implement a strategy to combine multiple OntoGPT runs for each input to
improve the consistency and completeness of concept grounding. This
approach addresses the variability inherent in the OntoGPT process,
resulting in more reliable and accurate annotations. (57e6df7)
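
A minimal sketch of the merging strategy; run_extraction stands in for whatever callable performs a single OntoGPT run and returns grounded CURIEs.

```python
def ground_with_sampling(run_extraction, text: str, n: int = 3) -> set[str]:
    """Union the grounded concept IDs from several OntoGPT runs.

    `run_extraction` is any callable returning a set of grounded CURIEs
    for one run; names here are illustrative.
    """
    grounded: set[str] = set()
    for _ in range(n):
        grounded |= run_extraction(text)  # each run may ground different concepts
    return grounded
```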

Refactor

  • refactor: consolidate OntoGPT workbook annotators into a single function

Consolidate multiple OntoGPT workbook annotator functions into a single,
unified function to improve code maintainability, reduce redundancy, and
enhance overall code clarity. (ed668b1)

  • refactor: remove outdated add_dataset_annotations_to_workbook function

Remove the outdated add_dataset_annotations_to_workbook function, as
it lacks the necessary granularity for predicate-level categorization of
semantic annotations, a crucial aspect of our current annotation model.

While alternative approaches exist (e.g., annotating with terms from
multiple vocabularies and then categorizing based on branch), the
ongoing development and active community support for OntoGPT suggest a
more promising long-term solution. (25f0a8b)

  • refactor: replace print statements with logging

Replace print statements with logging statements to enable more
structured and persistent output. This change provides flexibility for
capturing and analyzing runtime information. (23907c6)
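
A minimal sketch of the swap, using Python's standard logging module.

```python
import logging

logging.basicConfig(
    filename="run.log",  # persists output beyond the console
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger(__name__)

# Before: print(f"Annotated {n} workbooks")
n = 3
logger.info("Annotated %d workbooks", n)  # filterable, timestamped, persistent
```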

Test

  • test: create test data for term-set similarity score analysis

Create a set of test data containing term-set similarity scores for
various configurations, enabling unit testing of downstream functions
that analyze and interpret these scores. (513e5e5)
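
A hypothetical sketch of the fixture's shape; the actual columns and values live in the test data added by this commit.

```python
import pandas as pd

# Hypothetical shape: one term-set similarity score per
# (configuration, predicate) pair, mirroring what the analysis
# functions consume.
test_scores = pd.DataFrame({
    "configuration": ["baseline", "config_a", "config_a", "config_b"],
    "predicate": ["env_medium", "env_medium",
                  "contains_process", "contains_process"],
    "score": [1.0, 0.82, 0.77, 0.90],
})
test_scores.to_csv("similarity_scores.csv", index=False)  # path illustrative
```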