Releases: zetaalphavector/RAGElo
v0.1.8
New features:
The `Query` object now supports two new methods for easier evaluation of your retrieval pipeline:
- `query.get_runs()` returns a dictionary of TREC-style runs for all the agents that retrieved documents for that query (the mapping is agent_id -> query_id -> document_id -> retrieval_score).
- `query.get_qrels()` returns a TREC-style qrels dictionary with the judgement scores assigned by an Evaluator (the mapping is query_id -> document_id -> relevance).
You can explore how these two methods work in the new example notebook, which uses the `ir-measures` package.
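For instance, here is a minimal sketch of scoring each agent's run with `ir-measures` (the aggregation loop and the nDCG@10 metric are illustrative, and assume the queries have already been judged by an Evaluator):

```python
import ir_measures
from ir_measures import nDCG

# `queries` is assumed to be a list of Query objects that already have
# retrieved documents and Evaluator judgements attached.
qrels = {}  # query_id -> document_id -> relevance
runs = {}   # agent_id -> query_id -> document_id -> retrieval_score
for query in queries:
    qrels.update(query.get_qrels())
    for agent, run in query.get_runs().items():
        runs.setdefault(agent, {}).update(run)

# Score each agent's run against the LLM-generated qrels.
for agent, run in runs.items():
    print(agent, ir_measures.calc_aggregate([nDCG @ 10], qrels, run))
```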
Another addition (by @RodrJ106) is a new `LLMProvider` for Ollama! Now you can also run RAGElo locally, without the need to call an external provider. Thanks!
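A minimal sketch of what this could look like (the `"ollama"` provider name is an assumption based on this release, and the prompt is only illustrative):

```python
from ragelo import get_retrieval_evaluator

# Assumes a local Ollama server is running; "ollama" as the llm_provider
# value is an assumption based on this release.
evaluator = get_retrieval_evaluator(
    "custom_prompt",
    llm_provider="ollama",
    prompt="Query: {q}\nDocument: {d}\nAnswer 1 if the document is relevant to the query, 0 otherwise.",
    query_placeholder="q",
    document_placeholder="d",
)
```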
A potential breaking change is that the `retrieved_docs` and `answers` attributes of the `Query` object are now dictionaries instead of lists (mapping the document id or the agent name, respectively, to the actual object). This was done to better support future changes where RAGElo relies less on CSV files everywhere and instead saves and serializes its internal state as a dictionary, until the user actually asks for an output as a CSV.
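In practice, code that iterated over these attributes as lists needs to switch to dictionary access, roughly like this (a sketch; only the attribute names come from this release):

```python
# Up to 0.1.7, retrieved_docs and answers were lists:
#   for doc in query.retrieved_docs: ...
# From 0.1.8 on, they are dictionaries keyed by document id / agent name:
for doc_id, doc in query.retrieved_docs.items():
    print(doc_id, doc)
for agent, answer in query.answers.items():
    print(agent, answer)
```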
What's Changed
- Add missing f-string to warning. by @RodrJ106 in #38
- Add ollama as new llm provider by @RodrJ106 in #39
- Remove extra domain sentence by @din0s in #40
- Add get_qrels and get_runs for Queries by @ArthurCamara in #41
New Contributors
Full Changelog: 0.1.7...0.1.8
0.1.7
v0.1.6
What's Changed
- Fix issue with RDNAM parsing of answer by @matprst in #32
- docs: update README.md by @eltociear in #33
- Elo Ranker returns dictionary with agents scores by @ArthurCamara in #34
New Contributors
- @matprst made their first contribution in #32
- @eltociear made their first contribution in #33
Full Changelog: 0.1.5...0.1.6
v0.1.5
Adds support for Python >= 3.8
What's Changed
- Support Python 3.8 by @ArthurCamara in #29
Full Changelog: 0.1.3...0.1.5
v0.1.4
Hotfix for Python 3.10
0.1.2
Main changes:
- OpenAI calls are much faster now and can be done in parallel.
- The pairwise answer evaluations are easier to use and more configurable.
- A new PairwiseExpertAnswerEvaluator evaluator was added.
- Added a notebook with examples of using RAGElo as a library.
What's Changed
- Added parallel calls to OpenAI with asyncio by @ArthurCamara in #21
- Change from aiohttp sessions to using OpenAI's Async clients. by @ArthurCamara in #22
- Improve batching by @ArthurCamara in #25
- Refactor pairwise answer eval by @frejonb in #26
- Notebook example by @ArthurCamara in #27
Full Changelog: 0.1.1...0.1.2
v0.1
RAGElo goes 0.1!
In this release, RAGElo as a library was completely revamped, with a much easier-to-use unified interface and simpler commands (`evaluate` and `batch_evaluate`). Now using an Evaluator is as simple as calling `evaluator.evaluate("query", "document")`.
Custom Evaluators and metadata support
Not a fan of the existing evaluators? Now both Retrieval and Answer evaluators support fully custom prompts using the `RetrievalEvaluator.CustomPromptEvaluator` and `AnswerEvaluator.CustomPromptEvaluator`, respectively.
As part of the custom evaluators, RAGElo now also supports custom metadata injection into your prompts! Want to include the current timestamp in your evaluator? Add a `{today_date}` placeholder to the prompt and pass it as metadata to the `evaluate` method:
```python
from ragelo import get_retrieval_evaluator

prompt = """You are a helpful assistant for evaluating the relevance of a retrieved document to a user query.
You should pay extra attention to how **recent** a document is. A document older than 5 years is considered outdated.
The answer should be evaluated according to its recency, truthfulness, and relevance to the user query.
User query: {q}
Retrieved document: {d}
The document has a date of {document_date}.
Today is {today_date}.
WRITE YOUR ANSWER ON A SINGLE LINE AS A JSON OBJECT WITH THE FOLLOWING KEYS:
- "relevance": 0 if the document is irrelevant, 1 if it is relevant.
- "recency": 0 if the document is outdated, 1 if it is recent.
- "truthfulness": 0 if the document is false, 1 if it is true.
- "reasoning": A short explanation of why you think the document is relevant or irrelevant.
"""

evaluator = get_retrieval_evaluator(
    "custom_prompt",  # name of the retrieval evaluator
    llm_provider="openai",  # which LLM provider to use
    prompt=prompt,  # your custom prompt
    query_placeholder="q",  # the placeholder for the query in the prompt
    document_placeholder="d",  # the placeholder for the document in the prompt
    answer_format="multi_field_json",  # the format of the answer: a JSON object with multiple fields
    scoring_keys=["relevance", "recency", "truthfulness", "reasoning"],  # which keys to extract from the answer
)

raw_answer, answer = evaluator.evaluate(
    query="What is the capital of Brazil?",  # the user query
    document="Rio de Janeiro is the capital of Brazil.",  # the retrieved document
    query_metadata={"today_date": "08-04-2024"},  # some metadata for the query
    doc_metadata={"document_date": "04-03-1950"},  # some metadata for the document
)
```
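Here, `raw_answer` holds the LLM's unparsed response, while `answer` should contain the parsed fields listed in `scoring_keys`.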
CLI Interface changes
On the CLI front, each evaluator has its own subprogram now. Instead of calling `ragelo` with a long list of parameters, you can call `ragelo retrieval-evaluator <evaluator>` or `ragelo answer-evaluator <evaluator>` with your preferred evaluator. (We are big fans of `ragelo retrieval-evaluator domain-expert` 😉.)
Other changes:
- Moved from using `dataclasses` to Pydantic's `BaseModel`. The code should support Pydantic >=0.9, but let us know if it doesn't work for you.
- Calling `batch_evaluate` will now return both the existing and the new annotations, instead of only writing the new annotations to a file.
- The interface of `batch_evaluate` is much simplified. Instead of a dictionary of dictionaries, it now requires a list of `Query` objects, and each query has its own list of documents and answers (see the sketch after this list).
- `PairwiseAnswerEvaluator` is much simplified now. `k` is the number of games to generate per query, instead of the grand total.
- Many specific methods were simplified and moved up the class hierarchy. More code sharing and easier to maintain!
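A rough sketch of the new `batch_evaluate` flow (the `Query` import path and constructor arguments are assumptions for illustration; check the library for the exact fields):

```python
from ragelo import get_retrieval_evaluator
from ragelo.types import Query  # import path is an assumption

# Constructor arguments are illustrative and may differ from the actual API.
queries = [
    Query(qid="q1", query="What is the capital of Brazil?"),
    Query(qid="q2", query="Who wrote Dom Casmurro?"),
]
# ... attach each query's retrieved documents (and agent answers) here ...

evaluator = get_retrieval_evaluator(
    "custom_prompt",
    llm_provider="openai",
    prompt="Query: {q}\nDocument: {d}\nAnswer 1 if relevant, 0 otherwise.",
    query_placeholder="q",
    document_placeholder="d",
)
# batch_evaluate returns both the existing and the newly created annotations.
annotations = evaluator.batch_evaluate(queries)
```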
Full Changelog: 0.0.5...0.1.0
v0.0.5
What's Changed
- Major overhaul to the code, by @ArthurCamara in #7:
  - More modular
  - Tests
  - Simpler and more coherent class interface
  - Simpler iterators
  - Updated OpenAI version
Full Changelog: 0.0.3...0.0.5
0.0.3
Added a new document evaluator (domain_expert) and a bunch of bugfixes.
What's Changed
- Adding Domain Expert Evaluator by @ArthurCamara in #5
Full Changelog: 0.0.2...0.0.3
0.0.2
First public release of RAGElo, an LLM-powered annotator for RAG agents using an Elo-style tournament.