Development #121
Merged
Add missing parameters to the `annotate_workbooks` function to ensure correct argument propagation to its subfunctions.
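For illustration, a minimal sketch of the pass-through pattern; `annotate_workbook`, `temperature`, and `sample_size` are hypothetical names, not the project's actual signature:

```python
def annotate_workbook(workbook, local_model, temperature, sample_size):
    """Hypothetical subfunction that needs the caller's settings."""
    ...


def annotate_workbooks(workbooks, local_model, temperature, sample_size):
    """Forward every caller-supplied argument instead of silently
    falling back to the subfunction's defaults."""
    for workbook in workbooks:
        annotate_workbook(
            workbook,
            local_model=local_model,
            temperature=temperature,   # previously not passed through
            sample_size=sample_size,   # previously not passed through
        )
```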
Implement a cache-clearing mechanism before each OntoGPT call to mitigate issues where cached results, particularly those without grounded concepts, could lead to processing errors. This ensures that each call to OntoGPT is fresh and produces reliable results.
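A minimal sketch of the cache-clearing step, assuming the cache is a single directory whose location is known; the path used here is an assumption, not OntoGPT's documented cache location:

```python
import shutil
from pathlib import Path

# Assumed cache location; the real directory depends on the OntoGPT setup.
ONTOGPT_CACHE = Path.home() / ".ontogpt_cache"


def clear_cache(cache_dir: Path = ONTOGPT_CACHE) -> None:
    """Delete the cache so the next OntoGPT call cannot return a stale
    result that lacks grounded concepts."""
    if cache_dir.exists():
        shutil.rmtree(cache_dir)
```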
Implement a strategy to combine multiple OntoGPT runs for each input to improve the consistency and completeness of concept grounding. This approach addresses the variability inherent in the OntoGPT process, resulting in more reliable and accurate annotations.
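A sketch of one way to merge runs, assuming each run yields a list of annotation dicts keyed by CURIE (the shape is hypothetical):

```python
def combine_runs(runs: list[list[dict]]) -> list[dict]:
    """Union the grounded annotations from several OntoGPT runs on the
    same input, deduplicating by CURIE.  A concept only needs to ground
    in one run to survive, which smooths out run-to-run variability."""
    combined: dict[str, dict] = {}
    for run in runs:
        for annotation in run:
            combined.setdefault(annotation["curie"], annotation)
    return list(combined.values())


# Hypothetical usage: merge three independent runs on the same input.
# results = combine_runs([run_ontogpt(text) for _ in range(3)])
```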
Create a new module to facilitate benchmark testing, allowing for performance evaluation and optimization.
Add logging capabilities to enhance debugging and runtime monitoring.
Add logging for performance metrics to enable in-depth analysis and optimization:
- Create a context manager to log metrics of interest (runtime and memory usage).
- Estimate tokens per LLM call using word count.
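A stdlib-only sketch of such a context manager and token estimate; the real implementation may differ (e.g., it could use `psutil` for memory), and the ~0.75 words-per-token ratio is an assumption:

```python
import logging
import time
import tracemalloc
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def log_metrics(label: str):
    """Log wall-clock runtime and peak memory for the enclosed block."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        logger.info("%s: %.2f s, peak memory %.1f MiB",
                    label, elapsed, peak / 2**20)


def estimate_tokens(prompt: str) -> int:
    """Rough token count from words, assuming ~0.75 words per token."""
    return round(len(prompt.split()) / 0.75)
```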
Replace print statements with logging statements to enable more structured and persistent output. This change provides flexibility for capturing and analyzing runtime information.
Add functionality to collect and analyze benchmark data, including a dedicated test suite to evaluate this routine. We have opted for a baseline comparison method to evaluate the performance of our algorithm across different parameterizations. This approach offers several advantages, including efficiency and interpretability: by directly comparing each parameterization to a fixed baseline, we can quickly assess its relative performance and identify the optimal configuration. While this method may not uncover subtle differences between parameterizations that are both better (or both worse) than the baseline, it provides a practical and timely solution for our specific goals.
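A toy sketch of the baseline comparison, scoring each parameterization by its mean similarity delta against the baseline; the score layout and metric are illustrative:

```python
import statistics


def compare_to_baseline(scores: dict[str, list[float]],
                        baseline: str = "baseline") -> dict[str, float]:
    """Score each parameterization by its mean similarity minus the
    baseline's mean, so positive values beat the baseline."""
    base = statistics.mean(scores[baseline])
    return {
        name: statistics.mean(vals) - base
        for name, vals in scores.items() if name != baseline
    }


# Hypothetical usage with made-up scores:
# deltas = compare_to_baseline({
#     "baseline": [0.61, 0.58], "temp_0": [0.66, 0.70], "temp_1": [0.52, 0.55],
# })
# best = max(deltas, key=deltas.get)
```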
Remove an extra space from the OntoGPT `extract` command construction to prevent potential errors and ensure the command executes as expected.
Optimize OntoGPT calls by specifying the `ollama_chat` model within the `extract` command, leveraging performance improvements recommended by the `litellm` package.
Add a `temperature` parameter to OntoGPT calls, allowing users to control the model's behavior and adjust the level of creativity or randomness in the generated output.
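Taken together, the command-construction changes might look like the sketch below. Building the command as an argument list avoids the stray-space class of bug, and the `ollama_chat/` prefix is litellm's recommended route for Ollama chat models; the exact OntoGPT flag names are assumptions and should be verified against `ontogpt extract --help`:

```python
import subprocess


def build_extract_command(template: str, input_file: str,
                          model: str = "ollama_chat/llama3",
                          temperature: float = 0.1) -> list[str]:
    """Assemble the OntoGPT extract call as an argument list; no manual
    string joining, so a stray space cannot corrupt the command."""
    return [
        "ontogpt", "extract",
        "-t", template,
        "-i", input_file,
        "-m", model,                       # litellm's ollama_chat route
        "--temperature", str(temperature),  # flag name assumed
    ]


# subprocess.run(build_extract_command("env_broad_scale", "abstract.txt"),
#                check=True)
```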
Update templates to improve ontology grounding. Specifically:
1. Improve template prompts to produce more accurate and precise results.
2. Relax vocabulary branch constraints to enable broader capture of concepts outside the target branch, since relevant concepts appear in multiple branches within the vocabulary. Do this for all templates except `contains_process` and `env_medium`, where concepts are sufficiently constrained to a single branch. This increases our reliance on effective prompts to guide the LLM toward relevant concepts without extracting irrelevant ones; irrelevant concepts may be trimmed out downstream in an additional post-processing step. Note that vocabulary constraints don't seem to work for vocabularies served through the BioPortal API.
3. Replace semantically descriptive labels (e.g., `measurement_type`) in templates with less semantically loaded labels (e.g., `output`). This mitigates the risk of the LLM misinterpreting labels as placeholders for extracted values, which leads to parsing errors and incorrect results.
Update the `expand_curie` function to utilize a significantly larger prefix map, enabling the expansion of a wider range of CURIEs.
Correct the `expand_curie` function to handle CURIEs containing more than one colon, preventing the `ValueError: too many values to unpack` error.
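A minimal sketch of the fix, splitting on the first colon only, with the larger prefix map assumed to come from the `prefixmaps` package:

```python
def expand_curie(curie: str, prefix_map: dict[str, str]) -> str:
    """Expand a CURIE to a full IRI.  maxsplit=1 tolerates local
    identifiers that themselves contain colons, which previously raised
    'ValueError: too many values to unpack'."""
    prefix, local_id = curie.split(":", 1)  # was curie.split(":")
    return prefix_map[prefix] + local_id


# A much larger prefix map can be obtained from the `prefixmaps`
# package (assumed dependency), e.g.:
# from prefixmaps import load_converter
# converter = load_converter("merged")
# converter.expand("ENVO:00000015")
```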
Implement a visualization to assess the grounding success rates of different OntoGPT configurations. This visualization utilizes a 100% stacked bar chart to compare and contrast the performance of various configurations.
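A sketch of the 100% stacked bar chart with pandas/matplotlib; the outcome column names are assumptions about how results are tallied:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_grounding_rates(counts: pd.DataFrame) -> None:
    """100% stacked bar chart of grounding outcomes per configuration.
    `counts` has configurations as rows and outcome columns such as
    'grounded' and 'ungrounded' (names illustrative)."""
    shares = counts.div(counts.sum(axis=1), axis=0)  # normalize rows to 1
    shares.plot(kind="bar", stacked=True)
    plt.ylabel("Proportion of terms")
    plt.legend(title="Outcome")
    plt.tight_layout()
    plt.show()


# plot_grounding_rates(pd.DataFrame(
#     {"grounded": [40, 55], "ungrounded": [10, 5]},
#     index=["baseline", "temp_0.1"]))
```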
Add logging capabilities to the `benchmark_against_standard` function to provide insights into the ongoing execution process, especially helpful for this time-consuming operation.
Create a set of test data containing term-set similarity scores for various configurations, enabling unit testing of downstream functions that analyze and interpret these scores.
Implement a visualization to assess the accuracy of different OntoGPT configurations relative to a baseline standard for each predicate represented by OntoGPT templates. Use a simple box plot to effectively display and compare similarity metrics across predicate values.
Implement a visualization to assess the accuracy of different OntoGPT configurations relative to a baseline. Use a simple box plot to display and compare configurations.
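A sketch covering both box-plot views (grouped by predicate or by configuration); the column names are assumptions about the benchmark data's shape:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_similarity_boxes(scores: pd.DataFrame,
                          by: str = "configuration") -> None:
    """Box plots of term-set similarity scores grouped by configuration
    (or, with by='predicate', by template predicate)."""
    scores.boxplot(column="similarity", by=by)
    plt.suptitle("")  # drop pandas' automatic super-title
    plt.ylabel("Term-set similarity to baseline standard")
    plt.tight_layout()
    plt.show()
```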
Make writing plots to file optional in the `plot_grounding_rates` function by introducing a new parameter to control this behavior. This allows for flexible usage, including previewing plots without generating files.
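A minimal sketch of the optional-write pattern; the helper and parameter names are illustrative, not the function's real signature:

```python
import matplotlib.pyplot as plt


def save_or_show(fig: plt.Figure, write_to_file: bool = False,
                 path: str = "plot.png") -> None:
    """Only touch the filesystem when asked; otherwise display the
    figure, supporting previews without generating files."""
    if write_to_file:
        fig.savefig(path)
    else:
        plt.show()
```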
Remove the outdated `add_dataset_annotations_to_workbook` function, as it lacks the necessary granularity for predicate-level categorization of semantic annotations, a crucial aspect of our current annotation model. While alternative approaches exist (e.g., annotating with terms from multiple vocabularies and then categorizing based on branch), the ongoing development and active community support for OntoGPT suggest a more promising long-term solution.
Consolidate multiple OntoGPT workbook annotator functions into a single, unified function to improve code maintainability, reduce redundancy, and enhance overall code clarity.
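A sketch of the consolidation, with the predicate selecting the OntoGPT template and the extraction step injected as a callable; all names are illustrative, not the project's real API:

```python
from typing import Callable

TEMPLATES = {  # hypothetical mapping from predicate to OntoGPT template
    "env_broad_scale": "env_broad_scale",
    "env_medium": "env_medium",
    "contains_process": "contains_process",
}


def add_annotations_to_workbook(
    workbook: dict,
    predicate: str,
    extract: Callable[[dict, str], list[str]],
) -> dict:
    """One entry point replacing the per-predicate annotator functions:
    the predicate picks the template, `extract` does the OntoGPT work."""
    workbook[predicate] = extract(workbook, TEMPLATES[predicate])
    return workbook
```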
Resolve an issue in the `add_predicate_annotations_to_workbook` function that prevents it from returning the expected results.
Update `.readthedocs.yaml` to explicitly specify the path to `config.py`. This ensures proper documentation builds and avoids potential issues with an upcoming deprecation of inferred configuration.
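For reference, the relevant excerpt of a v2 `.readthedocs.yaml` using the `sphinx.configuration` key; the exact path to the configuration file is an assumption:

```yaml
# .readthedocs.yaml (excerpt); configuration path assumed
version: 2
sphinx:
  configuration: docs/config.py
```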