diff --git a/README.md b/README.md
index 4d25536..8cdf555 100644
--- a/README.md
+++ b/README.md
@@ -37,9 +37,9 @@
- **Modularized Evaluation**: Measure each module in the pipeline with tailored metrics.
-- **Comprehensive Metric Library**: Covers Retrieval-Augmented Generation (RAG), Code Generation, Tool Use, Agent Tool, Classification and a variety of LLM use cases. Mix and match Deterministic, Semantic and LLM-based metrics.
+- **Comprehensive Metric Library**: Covers Retrieval-Augmented Generation (RAG), Code Generation, Agent Tool Use, Classification and a variety of other LLM use cases. Mix and match Deterministic, Semantic and LLM-based metrics.
-- **Leverage User Feedback in Evaluation**: easily build a close-to-human ensemble evaluation pipeline with mathematical guarantees.
+- **Leverage User Feedback in Evaluation**: Easily build a close-to-human ensemble evaluation pipeline with mathematical guarantees.
- **Synthetic Dataset Generation**: Generate large-scale synthetic datasets to test your pipeline.
@@ -51,7 +51,7 @@ This code is provided as a PyPi package. To install it, run the following comman
python3 -m pip install continuous-eval
```
-if you want to install from source
+If you want to install from source:
```bash
git clone https://github.com/relari-ai/continuous-eval.git && cd continuous-eval
@@ -133,11 +133,20 @@ print(metric(**datum))
| Agent Tools | Deterministic | ToolSelectionAccuracy |
+| Custom | | Define your own metrics |
-You can also define your own metrics, you only need to extend the [Metric](continuous_eval/metrics/base.py#23) class implementing the `__call__` method.
+To define your own metrics, you only need to extend the [Metric](continuous_eval/metrics/base.py#L23C7-L23C13) class and implement the `__call__` method.
Optional methods are `batch` (if it is possible to implement optimizations for batch processing) and `aggregate` (to aggregate metric results over multiple samples).
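For illustration, a minimal custom metric might look like the sketch below. The dataset field names (`answer`, `ground_truth_answers`) are placeholders for whatever your data provides, and the exact base-class interface should be checked against `continuous_eval/metrics/base.py`.

```python
from continuous_eval.metrics.base import Metric


class AnswerLengthRatio(Metric):
    """Toy metric: length of the generated answer relative to the reference answer."""

    def __call__(self, answer, ground_truth_answers, **kwargs):
        # Score a single sample and return a dict of named results
        reference = ground_truth_answers[0]
        return {"answer_length_ratio": len(answer) / max(len(reference), 1)}

    def aggregate(self, results):
        # Optional: combine per-sample results into a summary value
        ratios = [r["answer_length_ratio"] for r in results]
        return {"mean_answer_length_ratio": sum(ratios) / len(ratios)}
```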
+## Run evaluation on pipeline modules
+
+Define modules in your pipeline and select corresponding metrics.
+
```python
from continuous_eval.eval import Module, ModuleOutput, Pipeline, Dataset
from continuous_eval.metrics.retrieval import PrecisionRecallF1, RankedRetrievalMetrics
@@ -184,6 +193,7 @@ llm = Module(
)
pipeline = Pipeline([retriever, reranker, llm], dataset=dataset)
+print(pipeline.graph_repr()) # optional: visualize the pipeline
```
Now you can run the evaluation on your pipeline.
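A run over the dataset might then look like the following sketch. The `eval_manager` interface used here (`set_pipeline`, `start_run`, `curr_sample`, `next_sample`) is taken from the project documentation and should be treated as indicative rather than exact.

```python
from continuous_eval.eval.manager import eval_manager

# Assumed runner loop: iterate over the dataset samples, execute your own
# pipeline code for each one, and log every module's output (see below).
eval_manager.set_pipeline(pipeline)
eval_manager.start_run()
while eval_manager.is_running():
    if eval_manager.curr_sample is None:
        break
    question = eval_manager.curr_sample["question"]  # field name depends on your dataset
    # ... run retriever, reranker and llm on `question`, logging each output ...
    eval_manager.next_sample()
```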
@@ -204,7 +214,7 @@ To **log** the results you just need to call the `eval_manager.log` method with
```python
eval_manager.log("answer_generator", response)
```
-the evaluator manager also offers
+The evaluation manager also offers the following (usage sketched below):
- `eval_manager.run_metrics()` to run all the metrics defined in the pipeline
- `eval_manager.run_tests()` to run the tests defined in the pipeline (see the [docs](https://docs.relari.ai) for more details)
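Putting it together, a post-run step could look like this short sketch; whether these calls return results directly or store them on the manager is not specified above, so check the docs before relying on either behaviour.

```python
# After the run completes, compute all metrics and run the pipeline tests
eval_manager.run_metrics()
eval_manager.run_tests()
```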
diff --git a/docs/public/module-level-eval.png b/docs/public/module-level-eval.png
index b3eafe5..5e01e36 100644
Binary files a/docs/public/module-level-eval.png and b/docs/public/module-level-eval.png differ