[Evals] Evaluation docs improvements #985

Closed
ssbushi opened this issue Oct 1, 2024 · 3 comments
Labels: devui, docs

ssbushi commented Oct 1, 2024

No description provided.

@ssbushi ssbushi self-assigned this Oct 1, 2024
@ssbushi ssbushi converted this from a draft issue Oct 1, 2024
@ssbushi ssbushi added the docs Improvements or additions to documentation label Oct 1, 2024

ssbushi commented Oct 1, 2024

Autogenerated from Gemini:

This analysis highlights several areas where the documentation for Genkit, particularly around evaluation, could be improved:

* **Clarify how evaluators are standardized.** The docs acknowledge that while evaluation metrics like Faithfulness and Answer Relevance are becoming standardized, their implementations can vary. The documentation should provide more concrete information on this, perhaps by:
    *  Giving specific examples of how implementations can differ.
    *  Offering guidance on choosing the best implementation for different use cases.
    *  Explaining how Genkit handles these variations to ensure consistency.

* **Provide more guidance on quantifying output variables.** The docs mention that users can define custom evaluation metrics, but they should offer more support on how to do this effectively. Consider adding:
    *  Examples of quantifying different types of outputs.
    *  Best practices for designing custom metrics.
    *  A step-by-step guide to implementing custom evaluators (see the sketch after this list).

* **Expand on the scope of pre-defined evaluators.** Users need a clearer understanding of what metrics like "Maliciousness" actually measure. The documentation should:
    *  Provide detailed explanations of each pre-defined metric.
    *  Clarify which RAGAS metrics are included in Genkit.
    *  Offer examples of how these metrics are used in practice.

* **Improve the description of "Maliciousness"**. The current explanation is vague. The documentation should clearly define what constitutes "maliciousness" in the context of LLMs and how the evaluator identifies it.

* **Clarify the analogy to testing.** While the docs liken evaluators to E2E testing, they could be more explicit about how evaluators fit into the development process. This could involve:
    *  Explaining when and how to use evaluators during development.
    *  Providing examples of how evaluators can help identify regressions.
    *  Discussing how evaluators can be integrated into a CI/CD pipeline (see the CLI note after the closing paragraph below).
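
To make the custom-evaluator ask concrete, here is a minimal sketch of the kind of example the docs could include. It assumes Genkit's `defineEvaluator` API; the exact import path and datapoint type have shifted between Genkit releases, and the `myPlugin/regex_match` metric itself is hypothetical, not something Genkit ships:

```ts
import { defineEvaluator, BaseEvalDataPoint } from '@genkit-ai/ai/evaluator';

// Hypothetical heuristic metric: scores 1 if the flow output matches a
// regex supplied in the test case's `reference` field, 0 otherwise.
// A custom metric could instead call an LLM to grade the output, as the
// RAGAS-style evaluators do.
export const regexMatch = defineEvaluator(
  {
    name: 'myPlugin/regex_match', // hypothetical evaluator name
    displayName: 'Regex Match',
    definition: 'Checks whether the flow output matches an expected pattern.',
  },
  async (datapoint: BaseEvalDataPoint) => {
    const output = String(datapoint.output ?? '');
    const pattern = new RegExp(String(datapoint.reference ?? '.*'));
    return {
      testCaseId: datapoint.testCaseId,
      evaluation: {
        score: pattern.test(output) ? 1 : 0,
        details: { reasoning: `Tested output against /${pattern.source}/` },
      },
    };
  }
);
```

A walkthrough built around something this small would cover all three sub-points at once: it quantifies a fuzzy output into a 0/1 score, shows the result shape a custom metric must return, and is short enough to copy and run.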

By addressing these points, the documentation can better support users in understanding and effectively using Genkit's evaluation features.
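
On the CI/CD sub-point specifically, the docs could show the evaluation commands being scripted rather than run interactively, e.g. `genkit eval:flow myFlow --input testInputs.json` to execute a flow over a test set and apply the configured evaluators as a pipeline step. (The command shape follows the Genkit CLI docs; exact names and flags may differ between versions, and `myFlow` / `testInputs.json` are placeholders.)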

Context: https://discord.com/channels/1255578482214305893/1281391213550895124/1282325935038926868


odbol commented Dec 11, 2024

Also, making the example code actually compile would be nice.

@ssbushi ssbushi moved this to In Progress in Genkit Backlog Dec 12, 2024

odbol commented Dec 12, 2024

Made the code compile: https://github.com/firebase/genkit/pull/1497/files

@ssbushi ssbushi closed this as completed Jan 20, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in Genkit Backlog Jan 20, 2025