arXivRAG is a tool for finding papers in the arXiv database and generating content about them. It uses Retrieval-Augmented Generation (RAG) to help researchers, students, and enthusiasts discover arXiv papers and produce summaries, insights, and analyses of them.
- Retrieval-Augmented Generation: Combines the power of retrieval systems with generative models to enhance the accuracy and relevance of responses.
- arXiv Integration: Directly queries the arXiv repository to fetch and summarize academic papers.
- User-Friendly Interface: Provides an easy-to-use interface for querying and obtaining summaries of scientific papers.
- Customizable: Allows users to customize the retrieval and generation parameters to suit their specific needs.
- Enhanced Search: Advanced search capabilities to quickly find relevant papers.
- Summarization: Automatic generation of concise summaries for arXiv papers.
- Custom Queries: Tailored query support to retrieve specific information from academic papers.
- Real-Time Access: Seamless integration with the arXiv API for real-time data access.
- Citation and Trend Analysis: Analyze citation networks, visualize the impact of papers, and identify emerging research trends based on recent publications and citation patterns.
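As a rough illustration of the arXiv integration listed above, the sketch below builds a search URL for the public arXiv API and parses the Atom feed it returns. It is not taken from the arXivRAG code base: the function names `build_query_url` and `parse_feed` are hypothetical, and `SAMPLE_FEED` is a made-up placeholder entry so the parser can be tried offline.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Public arXiv API endpoint; results come back as an Atom feed.
ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"

# Placeholder feed for offline experiments (not a real paper).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>Placeholder
      Title</title>
    <summary>  Placeholder abstract text.  </summary>
  </entry>
</feed>"""


def build_query_url(terms, max_results=5):
    """Build an arXiv API search URL that matches `terms` in all fields."""
    params = {
        "search_query": f"all:{terms}",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_feed(xml_text):
    """Extract id, title, and abstract from an Atom feed (str or bytes)."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.iter(f"{ATOM}entry"):
        papers.append({
            "id": entry.findtext(f"{ATOM}id", default="").strip(),
            # Collapse the line breaks and indentation the feed puts in text fields.
            "title": " ".join(entry.findtext(f"{ATOM}title", default="").split()),
            "summary": " ".join(entry.findtext(f"{ATOM}summary", default="").split()),
        })
    return papers


if __name__ == "__main__":
    print(build_query_url("retrieval augmented generation", max_results=3))
    print(parse_feed(SAMPLE_FEED))
    # For a live request, something like the following should work:
    #   import urllib.request
    #   with urllib.request.urlopen(build_query_url("rag")) as resp:
    #       papers = parse_feed(resp.read())
```

Keeping URL construction separate from parsing means the parser can be exercised without network access.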
To get started with arXivRAG, follow these steps:
- Clone the repository:
git clone https://github.com/phitrann/arXivRAG.git
cd arXivRAG
- Create a virtual environment (we recommend using conda):
conda create -n arxiv-rag python=3.10
conda activate arxiv-rag
- Install the required dependencies:
pip install -r requirements.txt
To use arXivRAG, follow these steps:
- Run the main script:
python main.py
- Query the system:
- Enter your query related to a scientific paper.
- The system will retrieve relevant papers from arXiv and generate a summary.
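The retrieve-then-generate flow described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not arXivRAG's actual pipeline: it ranks abstracts by bag-of-words cosine similarity instead of using a learned retrieval model, and it stops at building the prompt that a generation model would receive. All function names and the paper dictionary layout (`title`, `summary`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, papers, k=2):
    """Return the k papers whose abstracts are most similar to the query."""
    q = tokenize(query)
    ranked = sorted(
        papers, key=lambda p: cosine(q, tokenize(p["summary"])), reverse=True
    )
    return ranked[:k]


def build_prompt(query, retrieved):
    """Assemble the text a generation model would be asked to complete."""
    context = "\n\n".join(
        f"[{i}] {p['title']}\n{p['summary']}"
        for i, p in enumerate(retrieved, start=1)
    )
    return (
        "Answer using only the papers below.\n\n"
        f"{context}\n\nQuestion: {query}\nSummary:"
    )
```

In a full system the string returned by `build_prompt` would be passed to the configured generation model. That call is left out here because the source does not say which model or client library the project uses.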
You can customize the behavior of arXivRAG by modifying the configuration file config.yaml. Key parameters include:
- retrieval_model: The model used for retrieving relevant papers.
- generation_model: The model used for generating summaries.
- num_retrievals: The number of papers to retrieve for each query.
- max_summary_length: The maximum length of the generated summary.
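To make the four parameters above concrete, here is a minimal sketch of how they might be validated once config.yaml has been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`). The `Config` class, the `load_config` helper, the default values, and the model names in the usage example are all assumptions made for illustration. The source does not state the real defaults, or whether `max_summary_length` counts words, tokens, or characters.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Config:
    retrieval_model: str          # model used for retrieving relevant papers
    generation_model: str         # model used for generating summaries
    num_retrievals: int = 5       # assumed default, not from the project
    max_summary_length: int = 200 # assumed default; unit is unspecified


def load_config(raw):
    """Validate a dict parsed from config.yaml and return a Config."""
    known = {f.name for f in fields(Config)}
    unknown = set(raw) - known
    if unknown:
        # Catch typos in key names early instead of silently ignoring them.
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    cfg = Config(**raw)  # raises TypeError if a required key is missing
    if cfg.num_retrievals < 1:
        raise ValueError("num_retrievals must be at least 1")
    if cfg.max_summary_length < 1:
        raise ValueError("max_summary_length must be at least 1")
    return cfg


if __name__ == "__main__":
    # Placeholder model names; substitute the ones your setup uses.
    cfg = load_config({
        "retrieval_model": "example-retriever",
        "generation_model": "example-generator",
        "num_retrievals": 3,
    })
    print(cfg)
```

Rejecting unknown keys is a design choice: a misspelled parameter fails loudly rather than falling back to a default without notice.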
We welcome contributions from the community! If you have ideas for new features or improvements, feel free to open an issue or submit a pull request.
To submit a pull request, follow these steps:
- Fork the repository.
- Create a new branch:
git checkout -b feature/your-feature-name
- Make your changes, stage them, and commit:
git add -A
git commit -m "Add your commit message"
- Push to the branch:
git push origin feature/your-feature-name
- Create a pull request.
This project is released under the Apache 2.0 license. See the LICENSE file for details.
- Thanks to the contributors of the arXivRAG project.
- Special thanks to the developers of the retrieval and generation models used in this project.