Development #121
Merged
Add missing parameters to the `annotate_workbooks` function to ensure correct argument propagation to its subfunctions.
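For illustration, a minimal sketch of the pass-through pattern; `annotate_workbook`, `temperature`, and `sample_size` are hypothetical names, not the project's actual signature:

```python
def annotate_workbook(workbook, local_model, temperature, sample_size):
    """Hypothetical subfunction that needs the caller's settings."""
    ...


def annotate_workbooks(workbooks, local_model, temperature, sample_size):
    """Forward every caller-supplied argument instead of silently
    falling back to the subfunction's defaults."""
    for workbook in workbooks:
        annotate_workbook(
            workbook,
            local_model=local_model,
            temperature=temperature,   # previously not passed through
            sample_size=sample_size,   # previously not passed through
        )
```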
Implement a cache-clearing mechanism before each OntoGPT call to mitigate issues where cached results, particularly those without grounded concepts, could lead to processing errors. This ensures that each call to OntoGPT is fresh and produces reliable results.
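A minimal sketch of the cache-clearing step, assuming the cache is a single directory whose location is known; the path used here is an assumption, not OntoGPT's documented cache location:

```python
import shutil
from pathlib import Path

# Assumed cache location; the real directory depends on the OntoGPT setup.
ONTOGPT_CACHE = Path.home() / ".ontogpt_cache"


def clear_cache(cache_dir: Path = ONTOGPT_CACHE) -> None:
    """Delete the cache so the next OntoGPT call cannot return a stale
    result that lacks grounded concepts."""
    if cache_dir.exists():
        shutil.rmtree(cache_dir)
```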
Implement a strategy to combine multiple OntoGPT runs for each input to improve the consistency and completeness of concept grounding. This approach addresses the variability inherent in the OntoGPT process, resulting in more reliable and accurate annotations.
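A sketch of one way to merge runs, assuming each run yields a list of annotation dicts keyed by CURIE (the shape is hypothetical):

```python
def combine_runs(runs: list[list[dict]]) -> list[dict]:
    """Union the grounded annotations from several OntoGPT runs on the
    same input, deduplicating by CURIE.  A concept only needs to ground
    in one run to survive, which smooths out run-to-run variability."""
    combined: dict[str, dict] = {}
    for run in runs:
        for annotation in run:
            combined.setdefault(annotation["curie"], annotation)
    return list(combined.values())


# Hypothetical usage: merge three independent runs on the same input.
# results = combine_runs([run_ontogpt(text) for _ in range(3)])
```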
Create a new module to facilitate benchmark testing, allowing for performance evaluation and optimization.
Add logging capabilities to enhance debugging and runtime monitoring.
Add logging for performance metrics to enable in-depth analysis and optimization:
- Create a context manager to log metrics of interest (runtime and memory usage).
- Estimate tokens per LLM call using word count.
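A stdlib-only sketch of such a context manager and token estimate; the real implementation may differ (e.g., it could use `psutil` for memory), and the ~0.75 words-per-token ratio is an assumption:

```python
import logging
import time
import tracemalloc
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def log_metrics(label: str):
    """Log wall-clock runtime and peak memory for the enclosed block."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        logger.info("%s: %.2f s, peak memory %.1f MiB",
                    label, elapsed, peak / 2**20)


def estimate_tokens(prompt: str) -> int:
    """Rough token count from words, assuming ~0.75 words per token."""
    return round(len(prompt.split()) / 0.75)
```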
Replace print statements with logging statements to enable more structured and persistent output. This change provides flexibility for capturing and analyzing runtime information.
Add functionality to collect and analyze benchmark data, including a dedicated test suite to evaluate this routine. We have opted for a baseline comparison method to evaluate the performance of our algorithm across different parameterizations. This approach offers several advantages, including efficiency and interpretability: by directly comparing each parameterization to a fixed baseline, we can quickly assess its relative performance and identify the optimal configuration. While this method may not uncover subtle differences between parameterizations that are both better (or both worse) than the baseline, it provides a practical and timely solution for our specific goals.
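A toy sketch of the baseline comparison, scoring each parameterization by its mean similarity delta against the baseline; the score layout and metric are illustrative:

```python
import statistics


def compare_to_baseline(scores: dict[str, list[float]],
                        baseline: str = "baseline") -> dict[str, float]:
    """Score each parameterization by its mean similarity minus the
    baseline's mean, so positive values beat the baseline."""
    base = statistics.mean(scores[baseline])
    return {
        name: statistics.mean(vals) - base
        for name, vals in scores.items() if name != baseline
    }


# Hypothetical usage with made-up scores:
# deltas = compare_to_baseline({
#     "baseline": [0.61, 0.58], "temp_0": [0.66, 0.70], "temp_1": [0.52, 0.55],
# })
# best = max(deltas, key=deltas.get)
```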
Remove an extra space from the OntoGPT `extract` command construction to prevent potential errors and ensure the command executes as expected.
Optimize OntoGPT calls by specifying the `ollama_chat` model within the `extract` command, leveraging performance improvements recommended by the `litellm` package.
Add a `temperature` parameter to OntoGPT calls, allowing users to control the model's behavior and adjust the level of creativity or randomness in the generated output.
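Taken together, the command-construction changes might look like the sketch below. Building the command as an argument list avoids the stray-space class of bug, and the `ollama_chat/` prefix is litellm's recommended route for Ollama chat models; the exact OntoGPT flag names are assumptions and should be verified against `ontogpt extract --help`:

```python
import subprocess


def build_extract_command(template: str, input_file: str,
                          model: str = "ollama_chat/llama3",
                          temperature: float = 0.1) -> list[str]:
    """Assemble the OntoGPT extract call as an argument list; no manual
    string joining, so a stray space cannot corrupt the command."""
    return [
        "ontogpt", "extract",
        "-t", template,
        "-i", input_file,
        "-m", model,                       # litellm's ollama_chat route
        "--temperature", str(temperature),  # flag name assumed
    ]


# subprocess.run(build_extract_command("env_broad_scale", "abstract.txt"),
#                check=True)
```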
Update templates to improve ontology grounding. Specifically:
1. Improve template prompts to produce more accurate and precise results.
2. Relax vocabulary branch constraints to enable broader capture of concepts outside the target branch, since relevant concepts appear in multiple branches within the vocabulary. Do this for all templates except `contains_process` and `env_medium`, where concepts are sufficiently constrained to a single branch. This increases our reliance on effective prompts to guide the LLM toward relevant concepts without extracting irrelevant ones; irrelevant concepts may be trimmed out downstream in an additional post-processing step. Note that vocabulary constraints don't seem to work for vocabularies served through the BioPortal API.
3. Replace semantically descriptive labels (e.g., `measurement_type`) in templates with less semantically loaded labels (e.g., `output`). This mitigates the risk of the LLM misinterpreting labels as placeholders for extracted values, which leads to parsing errors and incorrect results.
Update the `expand_curie` function to utilize a significantly larger prefix map, enabling the expansion of a wider range of CURIEs.
Correct the `expand_curie` function to handle CURIEs containing more than one colon, preventing the `ValueError: too many values to unpack` error.
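A minimal sketch of the fix, splitting on the first colon only, with the larger prefix map assumed to come from the `prefixmaps` package:

```python
def expand_curie(curie: str, prefix_map: dict[str, str]) -> str:
    """Expand a CURIE to a full IRI.  maxsplit=1 tolerates local
    identifiers that themselves contain colons, which previously raised
    'ValueError: too many values to unpack'."""
    prefix, local_id = curie.split(":", 1)  # was curie.split(":")
    return prefix_map[prefix] + local_id


# A much larger prefix map can be obtained from the `prefixmaps`
# package (assumed dependency), e.g.:
# from prefixmaps import load_converter
# converter = load_converter("merged")
# converter.expand("ENVO:00000015")
```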
Implement a visualization to assess the grounding success rates of different OntoGPT configurations. This visualization utilizes a 100% stacked bar chart to compare and contrast the performance of various configurations.
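A sketch of the 100% stacked bar chart with pandas/matplotlib; the outcome column names are assumptions about how results are tallied:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_grounding_rates(counts: pd.DataFrame) -> None:
    """100% stacked bar chart of grounding outcomes per configuration.
    `counts` has configurations as rows and outcome columns such as
    'grounded' and 'ungrounded' (names illustrative)."""
    shares = counts.div(counts.sum(axis=1), axis=0)  # normalize rows to 1
    shares.plot(kind="bar", stacked=True)
    plt.ylabel("Proportion of terms")
    plt.legend(title="Outcome")
    plt.tight_layout()
    plt.show()


# plot_grounding_rates(pd.DataFrame(
#     {"grounded": [40, 55], "ungrounded": [10, 5]},
#     index=["baseline", "temp_0.1"]))
```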
Add logging capabilities to the `benchmark_against_standard` function to provide insights into the ongoing execution process, especially helpful for this time-consuming operation.
Create a set of test data containing term-set similarity scores for various configurations, enabling unit testing of downstream functions that analyze and interpret these scores.
Implement a visualization to assess the accuracy of different OntoGPT configurations relative to a baseline standard for each predicate represented by OntoGPT templates. Use a simple box plot to effectively display and compare similarity metrics across predicate values.
Implement a visualization to assess the accuracy of different OntoGPT configurations relative to a baseline. Use a simple box plot to display and compare configurations.
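A sketch covering both box-plot views (grouped by predicate or by configuration); the column names are assumptions about the benchmark data's shape:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_similarity_boxes(scores: pd.DataFrame,
                          by: str = "configuration") -> None:
    """Box plots of term-set similarity scores grouped by configuration
    (or, with by='predicate', by template predicate)."""
    scores.boxplot(column="similarity", by=by)
    plt.suptitle("")  # drop pandas' automatic super-title
    plt.ylabel("Term-set similarity to baseline standard")
    plt.tight_layout()
    plt.show()
```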
Make writing plots to file optional in the `plot_grounding_rates` function by introducing a new parameter to control this behavior. This allows for flexible usage, including previewing plots without generating files.
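A minimal sketch of the optional-write pattern; the helper and parameter names are illustrative, not the function's real signature:

```python
import matplotlib.pyplot as plt


def save_or_show(fig: plt.Figure, write_to_file: bool = False,
                 path: str = "plot.png") -> None:
    """Only touch the filesystem when asked; otherwise display the
    figure, supporting previews without generating files."""
    if write_to_file:
        fig.savefig(path)
    else:
        plt.show()
```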
Remove the outdated `add_dataset_annotations_to_workbook` function, as it lacks the necessary granularity for predicate-level categorization of semantic annotations, a crucial aspect of our current annotation model. While alternative approaches exist (e.g., annotating with terms from multiple vocabularies and then categorizing based on branch), the ongoing development and active community support for OntoGPT suggest a more promising long-term solution.
Consolidate multiple OntoGPT workbook annotator functions into a single, unified function to improve code maintainability, reduce redundancy, and enhance overall code clarity.
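A sketch of the consolidation, with the predicate selecting the OntoGPT template and the extraction step injected as a callable; all names are illustrative, not the project's real API:

```python
from typing import Callable

TEMPLATES = {  # hypothetical mapping from predicate to OntoGPT template
    "env_broad_scale": "env_broad_scale",
    "env_medium": "env_medium",
    "contains_process": "contains_process",
}


def add_annotations_to_workbook(
    workbook: dict,
    predicate: str,
    extract: Callable[[dict, str], list[str]],
) -> dict:
    """One entry point replacing the per-predicate annotator functions:
    the predicate picks the template, `extract` does the OntoGPT work."""
    workbook[predicate] = extract(workbook, TEMPLATES[predicate])
    return workbook
```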
Resolve an issue in the `add_predicate_annotations_to_workbook` function that prevents it from returning the expected results.
Update `.readthedocs.yaml` to explicitly specify the path to `config.py`. This ensures proper documentation builds and avoids potential issues with an upcoming deprecation of inferred configuration.
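For reference, the relevant excerpt of a v2 `.readthedocs.yaml` using the `sphinx.configuration` key; the exact path to the configuration file is an assumption:

```yaml
# .readthedocs.yaml (excerpt); configuration path assumed
version: 2
sphinx:
  configuration: docs/config.py
```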