Releases: KruxAI/ragbuilder
v0.1.4
What's Changed
New SDK that supports module-wise optimization of the data ingestion, retrieval, and generation modules.
Basic Usage:
from ragbuilder import RAGBuilder
# Initialize and optimize
builder = RAGBuilder.from_source_with_defaults(input_source='data.pdf')
results = builder.optimize()
# Run a query through the complete pipeline
response = results.invoke("What is HNSW?")
# View optimization summary
print(results.summary())
Advanced Configuration
For fine-grained control, you can customize every aspect:
from ragbuilder.config import (
    DataIngestOptionsConfig,
    RetrievalOptionsConfig,
    GenerationOptionsConfig
)
# Configure data ingestion
data_ingest_config = DataIngestOptionsConfig(
    input_source="data.pdf",
    document_loaders=[
        {"type": "pymupdf"},
        {"type": "unstructured"}
    ],
    chunking_strategies=[{
        "type": "RecursiveCharacterTextSplitter",
        "chunker_kwargs": {"separators": ["\n\n", "\n", " ", ""]}
    }],
    chunk_size={"min": 500, "max": 2000, "stepsize": 500},
    embedding_models=[{
        "type": "openai",
        "model_kwargs": {"model": "text-embedding-3-large"}
    }]
)
# Configure retrieval
retrieval_config = RetrievalOptionsConfig(
    retrievers=[
        {
            "type": "vector_similarity",
            "retriever_k": [20],
            "weight": 0.5
        },
        {
            "type": "bm25",
            "retriever_k": [20],
            "weight": 0.5
        }
    ],
    rerankers=[{
        "type": "BAAI/bge-reranker-base"
    }],
    top_k=[3, 5]
)
# Initialize with custom configs
builder = RAGBuilder(
    data_ingest_config=data_ingest_config,
    retrieval_config=retrieval_config
)
# Run the optimization, as in Basic Usage, before reading results
results = builder.optimize()
# Access individual components
vectorstore = results.data_ingest.get_vectorstore()
docs = results.retrieval.invoke("What is RAG?")
answer = results.generation.invoke("What is RAG?")
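To get a feel for how large the search space in the configuration above is, the following standalone sketch expands the chunk_size range and counts the data-ingest combinations. It does not call RAGBuilder. The helper expand_range is hypothetical, and it assumes both "min" and "max" are inclusive, which these notes do not confirm.

```python
from itertools import product


def expand_range(spec):
    """Expand a {"min", "max", "stepsize"} dict into candidate values.

    Assumption: both endpoints are inclusive.
    """
    return list(range(spec["min"], spec["max"] + 1, spec["stepsize"]))


# Values copied from the data-ingest configuration above.
chunk_sizes = expand_range({"min": 500, "max": 2000, "stepsize": 500})
loaders = ["pymupdf", "unstructured"]

# Every (loader, chunk size) pair an exhaustive search would have to try.
data_ingest_grid = list(product(loaders, chunk_sizes))

print(chunk_sizes)            # [500, 1000, 1500, 2000]
print(len(data_ingest_grid))  # 8
```

Only one chunking strategy and one embedding model are listed, so they do not multiply the count. How RAGBuilder actually samples from this space is not described in this release.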
Full Changelog: 0.0.22...v0.1.4
0.0.22
What's Changed
- Enhance DataProcessor Class for Error Handling, Efficiency, and Logging Improvements by @Mefisto04 in #72
- Bug fixes by @aravind10x in #77
New Contributors
- @Akhsuna07 made their first contribution in #75
- @Mefisto04 made their first contribution in #72
Full Changelog: 0.0.21...0.0.22
0.0.21
What's Changed
- VectorDB Support: Qdrant & Weaviate by @ashwinzyx in #73
- Docker-Brew fix by @ashwinzyx in #74
Full Changelog: 0.0.20...0.0.21
0.0.20
What's Changed
- Add chat playground by @aravind10x in #64
New Contributors
- @eltociear made their first contribution in #48
- @FarukhS52 made their first contribution in #60
- @Ruhi14 made their first contribution in #61
Full Changelog: 0.0.18...0.0.20
0.0.18
What's Changed
- Colbert Retriever by @ashwinzyx in #53
- Change re-ranker selection to 1 at a time by @aravind10x in #54
Full Changelog: 0.0.17...0.0.18
0.0.17
RAGBuilder v0.0.17 Release Notes
What's New:
- Support for Re-Rankers:
- ColBERTv2: colbertv2.0
- Cohere: rerank-english-v3.0
- Jina: jina-reranker-v1-base-en
- Cross-Encoder Rerankers:
- Mixedbread-ai/mxbai-rerank-base-v1
- BAAI/bge-reranker-base
- FlashRank: ms-marco-MiniLM-L-12-v2
- RankLLM: GPT-4o
- Data Processing:
- Remove stopwords; strip tags, punctuation, and whitespace; stem text
- Hyperparameter Visualization:
- Track Bayesian Optimization progress and visualize parameter importance.
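The data-processing options listed above can be pictured as a chain of small text filters. The sketch below uses only the Python standard library and is not RAGBuilder's implementation: the function names and the tiny stopword list are illustrative assumptions, and stemming is left out because it needs a real stemmer.

```python
import re
import string

# Tiny illustrative stopword list; real pipelines use a much larger one.
STOPWORDS = {"a", "an", "and", "is", "of", "the", "to"}

# Translation table that maps every ASCII punctuation mark to a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def strip_tags(text):
    """Replace anything that looks like an HTML/XML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def strip_punctuation(text):
    """Replace ASCII punctuation with spaces."""
    return text.translate(_PUNCT_TABLE)


def strip_whitespace(text):
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(text.split())


def remove_stopwords(text):
    """Drop words found in STOPWORDS (case-insensitive)."""
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)


def preprocess(text):
    """Apply the filters in a fixed order."""
    for step in (strip_tags, strip_punctuation, strip_whitespace, remove_stopwords):
        text = step(text)
    return text


cleaned = preprocess("<p>HNSW is   a graph-based index, tuned for  the ANN search.</p>")
print(cleaned)  # HNSW graph based index tuned for ANN search
```

Order matters here: tags are removed before punctuation so that the angle brackets are still present for the tag pattern to match.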
What's Changed:
- Optimization insights by @aravind10x in #50
- Added Five Rerankers by @ashwinzyx in #51
- Data Processing by @ashwinzyx in #49
- Updated setup.py by @ashwinzyx in #52
Full Changelog: 0.0.16...0.0.17
0.0.16
What's New
- 10% sampling for trial runs
- Contextual Retriever from Anthropic
- Optuna integration for more efficient hyperparameter tuning
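One reading of the sampling item above is that each trial runs on a random 10% subset of the data rather than the full set, which makes trials cheaper. The notes do not say what exactly is sampled, so the sketch below is an assumption: sample_documents, its fraction and seed parameters, and the chunk names are all hypothetical.

```python
import random


def sample_documents(docs, fraction=0.10, seed=42):
    """Return a reproducible random subset of docs.

    At least one item is kept so that a trial never runs on empty input.
    """
    if not docs:
        return []
    k = max(1, round(len(docs) * fraction))
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return rng.sample(docs, k)


corpus = [f"chunk-{i}" for i in range(200)]
subset = sample_documents(corpus)
print(len(subset))  # 20
```

Fixing the seed keeps the subset identical across trials, so differences in score come from the configuration being tested and not from a different sample.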
What's Changed
- Sampling by @aravind10x in #45
- Optuna integration for hyperparameter tuning
- Contextual retriever by @ashwinzyx in #44
- Minor: view results while execution is in progress
Full Changelog: 0.0.15...0.0.16
0.0.15
What's New
- Vanilla Graph RAG with Graph Retriever
- Hybrid Graph RAG: Graph Retriever + Vector Retriever
- Neo4J Integration
- Improved Ensemble Retriever
What's Changed
- Update Version 0.0.14 by @ashwinzyx in #35
- GraphRAG by @ashwinzyx in #36
Full Changelog: 0.0.14...0.0.15
0.0.14
What's New
- Top K fix for Contextual Compression
- Improved Ensemble Retriever
What's Changed
- Mixed active content bug fix by @aravind10x in #29
- Top k cc by @ashwinzyx in #31
- EnsembleRetriever Update by @ashwinzyx in #33
- Track synthetic data gen by @ashwinzyx in #34
Full Changelog: 0.0.13...0.0.14
0.0.13
New SOTA Templates
- HYDE
- Hybrid RAG
- Semantic Chunker
- Stepback Prompting
- Query Rewriting
- RRF (Reciprocal Rank Fusion) RAG
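For readers unfamiliar with the last template, Reciprocal Rank Fusion merges several ranked lists by giving each document a score of 1 / (k + rank) in every list where it appears and summing those scores. The sketch below is a generic illustration and not the template's code: the function name, the document IDs, and the choice of k = 60 (a commonly used constant) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs into one list, best first.

    Each document scores 1 / (k + rank) per list it appears in,
    with ranks starting at 1; scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Two hypothetical result lists, e.g. from a vector retriever and BM25.
vector_hits = ["d1", "d2", "d3"]
bm25_hits = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

Documents that appear in both lists (d1 and d3) rise above those found by only one retriever, which is the main appeal of the method: it needs only ranks, not comparable scores.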
What's Changed
- Get Bayesian number of runs as a user input by @aravind10x in #1
- Support for other models by @aravind10x in #3
- Langchain upgrade by @aravind10x in #4
- Milvus Integration by @ashwinzyx in #5
- [Major] Remove dependency on OpenAI by @aravind10x in #6
- Minor updates by @aravind10x in #8
- Minor Changes- Update version, Add Milvus by @ashwinzyx in #9
- Updated GOOGLE_APPLICATION_CREDENTIALS in .env-Sample by @ashwinzyx in #10
- add separator to model/model owner by @ashwinzyx in #12
- [Minor] Model name bug by @aravind10x in #13
- Fix path expansion bug by @aravind10x in #14
- Fix Ollama Library Issue for Docker support by @ashwinzyx in #16
- Update setup.py by @ashwinzyx in #17
- Get rid of model name in filename by @aravind10x in #19
- Overhaul SOTA templates by @aravind10x in #20
- Update setup.py by @ashwinzyx in #21
- Update README.md by @ashwinzyx in #22
- Pr counter tracking by @ashwinzyx in #23
- Bugfix for SOTA templates by @aravind10x in #27
New Contributors
- @aravind10x made their first contribution in #1
- @ashwinzyx made their first contribution in #5
Full Changelog: https://github.com/KruxAI/ragbuilder/commits/0.0.13