AI-powered chatbot with real-time updates.

Ensure you have the following installed before proceeding:
- Python 3.8 or higher
- Git (for cloning the repository)
- Virtualenv (recommended)
First, clone the repository to your local machine.

```bash
git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name
```

It's good practice to create a virtual environment for your project to isolate dependencies.

```bash
python -m venv venv
```

Activate it on Windows:

```bash
venv\Scripts\activate
```

or on macOS/Linux:

```bash
source venv/bin/activate
```

Install the necessary dependencies specified in the requirements.txt file:

```bash
pip install -r requirements.txt
```
You will need API keys for various integrations used in the project (e.g., Hugging Face, SearXNG). Create a .env file in the root directory of your project and add the necessary environment variables.
Add your API keys in the following format:
```
HF_TOKEN=your_huggingface_api_token
GROQ_API_KEY=your_groq_api_key
MISTRAL_API_KEY=your_mistral_api_key
SEARXNG_KEY=your_searxng_secret_key
```
Once you've installed the dependencies and set up your API keys, run the app using:

```bash
python app.py
```

The app will be hosted locally by default. Open your browser and go to: http://127.0.0.1:7860/
To run your own instance of SearXNG, follow these steps.
Ensure you have Docker and Docker Compose installed on your system. If not, follow their respective installation guides.
Clone the SearXNG Docker repository to your local machine:

```bash
git clone https://github.com/searxng/searxng-docker.git
cd searxng-docker
```

You can configure the SearXNG settings by editing the settings.yml file, adjusting parameters such as the enabled search engines, safe search level, and so on. To edit the settings, run:

```bash
nano settings.yml
```
Very important: add "json" to the formats list so that search results can be requested in JSON format:

```yaml
formats:
  - html
  - json
```
Once the configuration is set, start the SearXNG instance using Docker Compose:

```bash
docker-compose up -d
```

This will start the SearXNG service in detached mode, meaning it will run in the background.

After the instance starts, you can access SearXNG via your browser at: http://localhost:8080
If you're integrating SearXNG with another application, you need to point the application to your local SearXNG instance. Update your .env file with the following:

```
SEARXNG_KEY=http://localhost:8080
```

Now, your application will send search requests to your locally hosted SearXNG instance.
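To sanity-check the integration, the following minimal sketch (not part of the repository; it only uses standard SearXNG query parameters) shows how an application can call the local instance over the JSON API enabled above:

```python
# Hypothetical helper, not from the repository: query a local SearXNG
# instance over the JSON API enabled in settings.yml above.
import requests

SEARXNG_URL = "http://localhost:8080"  # the same URL stored in SEARXNG_KEY

def searxng_search(query, time_range="month", language="en"):
    """Return the raw result list from a local SearXNG instance."""
    response = requests.get(
        f"{SEARXNG_URL}/search",
        params={
            "q": query,
            "format": "json",        # requires "json" under formats in settings.yml
            "time_range": time_range,
            "language": language,
        },
        timeout=10,
    )
    response.raise_for_status()
    return response.json().get("results", [])

if __name__ == "__main__":
    for hit in searxng_search("latest ECB rate decision")[:5]:
        print(hit.get("title"), "-", hit.get("url"))
```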
- Overview
- Core Components
- Key Features
- Architecture
- Main Functions
- API Integration
- Advanced Parameters
This application is a sophisticated web scraper and AI-powered chat interface designed specifically for financial news analysis. It combines web scraping capabilities with multiple large language models (LLMs) to provide intelligent, context-aware responses to user queries about financial information.
- Uses SearXNG as the primary search meta-engine
- Supports multiple search engines (Google, Bing, DuckDuckGo, etc.)
- Implements custom retry mechanisms and timeout handling
- Supports multiple LLM providers: Hugging Face (Mistral-Small-Instruct), Groq (Llama-3.1-70b), Mistral AI (Open-Mistral-Nemo)
- Implements semantic similarity using Sentence Transformers
- PDF processing with PyPDF2
- Web content scraping with Newspaper3k
- BM25 ranking algorithm implementation
- Document deduplication and relevance assessment
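As a rough illustration of the content-processing components above, the sketch below shows how PDF and HTML sources are commonly read with PyPDF2 and Newspaper3k; it is a simplified stand-in rather than the project's actual scraping code:

```python
# Simplified illustration of PDF and HTML content extraction; the project's
# own scraping functions are more involved (retries, timeouts, filtering).
from io import BytesIO

import requests
from PyPDF2 import PdfReader
from newspaper import Article

def scrape_pdf(url):
    """Download a PDF and return its concatenated page text via PyPDF2."""
    raw = requests.get(url, timeout=10).content
    reader = PdfReader(BytesIO(raw))
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def scrape_html(url):
    """Fetch an article and return its main body text via Newspaper3k."""
    article = Article(url)
    article.download()
    article.parse()
    return article.text

def scrape(url):
    """Dispatch on content type, mirroring the PDF/HTML split described here."""
    return scrape_pdf(url) if url.lower().endswith(".pdf") else scrape_html(url)
```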
- Query type determination (knowledge base vs. web search)
- Query rephrasing for optimal search results
- Entity recognition
- Time-aware query modification
- Relevance assessment
- Content summarization
- Semantic similarity comparison
- Document deduplication
- Priority-based content ranking
- Custom retry mechanism
- Rate limiting
- Error handling
- Content filtering and validation
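The custom retry and error-handling behaviour above can be pictured with a small sketch like the following; the decorator and its parameters are illustrative and not the project's actual implementation:

```python
# Illustrative retry-with-backoff decorator; the project's custom retry and
# rate-limiting logic may differ in its details.
import time
import functools

import requests

def with_retries(max_attempts=3, base_delay=1.0):
    """Retry a flaky call with exponential backoff between attempts."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except requests.RequestException:
                    if attempt == max_attempts:
                        raise  # give up after the final attempt
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator

@with_retries(max_attempts=3)
def fetch(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text
```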
- You start by interacting with a Gradio Chat Interface.
- Your query is sent to the Query Analysis (QA) section.
- The system then determines the type of query (DT).
- If it's a type that can use a Knowledge Base, it generates an AI response (KB).
- If it requires web searching, it rephrases the query (QR) for web search.
- The system extracts the entity domain (ED) from the rephrased query.
- The extracted entity domain is sent to the SearXNG Search Engine (SE).
- The search engine returns the search results (SR).
- The search results are processed by web scraping (WS).
- If the content is in PDF format, it is scraped using PDF Scraping (PDF).
- If in HTML format, it's scraped using Newspaper3k Scraping (NEWS).
- Relevant content is summarized (DS) and checked for uniqueness (UC).
- Content is ranked (DR) based on:
- BM25 Scoring (BM): A scoring method to rank documents.
- Semantic Similarity (SS): How similar the content is to the query.
- The scores are combined (CS) to produce a final ranking (FR).
- The final ranking is summarized again (FS) to create a final summary.
- The AI-generated response (KB) and final summary (FS) are combined to form the final response.
- The final response is sent back to the Gradio Chat Interface (UI) for you to see.
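The entry point of this flow is the Gradio chat UI. A stripped-down wiring might look like the sketch below, where `respond` is a placeholder standing in for the pipeline described above (this is not the repository's actual app.py):

```python
# Minimal Gradio chat wiring, standing in for the project's app.py.
import gradio as gr

def respond(message, history):
    # In the real app, query analysis, web search, scraping, ranking,
    # and summarization happen here to produce the final response.
    return f"(placeholder) You asked: {message}"

demo = gr.ChatInterface(fn=respond)

if __name__ == "__main__":
    demo.launch()  # serves on http://127.0.0.1:7860 by default
```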
- BM25: Custom implementation for document ranking
- Search and Scrape Pipeline: Handles query processing, web search, content scraping, document analysis, and content summarization.
- `determine_query_type(query, chat_history, llm_client)`: Determines whether to use the knowledge base or web search based on context.
- `search_and_scrape(query, chat_history, ...)`: Main function for web search and content aggregation.
- `rerank_documents_with_priority(query, documents, entity_domain, ...)`: Hybrid ranking using BM25 and semantic similarity (sketched below).
- `llm_summarize(json_input, model, temperature)`: Generates summaries using the specified LLM and handles citation and formatting.
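To make the hybrid ranking step concrete, here is a minimal sketch combining a BM25 score with embedding-based semantic similarity. It uses the rank_bm25 and sentence-transformers packages for brevity (the project ships its own BM25 implementation), and the model name and 0.5/0.5 weighting are arbitrary examples:

```python
# Illustrative hybrid ranking: BM25 + semantic similarity.
# The project implements BM25 itself; rank_bm25 is used here only for brevity.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

def hybrid_rank(query, documents, bm25_weight=0.5):
    tokenized = [doc.lower().split() for doc in documents]
    bm25_scores = BM25Okapi(tokenized).get_scores(query.lower().split())

    # Model name is an example; the project may use a different one.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    doc_emb = model.encode(documents, convert_to_tensor=True)
    query_emb = model.encode(query, convert_to_tensor=True)
    sim_scores = util.cos_sim(query_emb, doc_emb)[0].tolist()

    # Normalize BM25 to [0, 1] so the two signals are comparable.
    max_bm25 = max(bm25_scores) or 1.0
    combined = [
        bm25_weight * (b / max_bm25) + (1 - bm25_weight) * s
        for b, s in zip(bm25_scores, sim_scores)
    ]
    return sorted(zip(documents, combined), key=lambda x: x[1], reverse=True)
```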
- Required API Keys: Hugging Face, Groq, Mistral, SearXNG
- Environment Variables Setup: Use dotenv to load environment variables
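For reference, loading these variables with python-dotenv looks roughly like this (the variable names match the .env example earlier; the surrounding checks are illustrative):

```python
# Load API keys from the .env file described in the setup section.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

HF_TOKEN = os.getenv("HF_TOKEN")
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
MISTRAL_API_KEY = os.getenv("MISTRAL_API_KEY")
SEARXNG_KEY = os.getenv("SEARXNG_KEY")  # URL of the SearXNG instance

missing = [name for name in ("HF_TOKEN", "GROQ_API_KEY", "MISTRAL_API_KEY", "SEARXNG_KEY")
           if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```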
| Parameter | Description | Range/Options | Default | Usage |
|---|---|---|---|---|
| Number of Results | Number of search results retrieved. | 5 to 20 | 5 | Controls number of links/articles fetched from web searches. |
| Maximum Characters | Limits characters per document processed. | 500 to 10,000 | 3000 | Truncates long documents, focusing on relevant information. |
| Time Range | Specifies the time period for search results. | day, week, month, year | month | Filters results based on recent or historical data. |
| Language Selection | Filters search results by language. | en, fr, es, etc. | en | Retrieves content in a specified language. |
| LLM Temperature | Controls randomness in responses from LLM. | 0.0 to 1.0 | 0.2 | Low values for factual responses; higher for creative ones. |
| Search Engines | Specifies search engines used for scraping. | Google, Bing, DuckDuckGo, etc. | All engines | Choose specific search engines for better or private results. |
| Safe Search Level | Filters explicit/inappropriate content. | 0: No filter, 1: Moderate, 2: Strict | 2 (Strict) | Ensures family-friendly or professional content. |
| Model Selection | Chooses the LLM for summaries or responses. | Mistral, GPT-4, Groq | Varies | Select models based on performance or speed. |
| PDF Processing Toggle | Enables/disables PDF document processing. | True (process) or False (skip) | False | Processes PDFs, useful for reports but may slow down processing. |
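As an illustration of where the LLM Temperature and Model Selection parameters end up, the sketch below passes them to a chat completion call via huggingface_hub's InferenceClient; the model ID is an assumed example and the project's actual client code may differ:

```python
# Illustrative LLM call showing where temperature and model selection plug in.
# The project's actual client setup (Hugging Face / Groq / Mistral) may differ.
import os

from huggingface_hub import InferenceClient

client = InferenceClient(
    model="mistralai/Mistral-Small-Instruct-2409",  # assumed ID for "Mistral-Small-Instruct"
    token=os.getenv("HF_TOKEN"),
)

response = client.chat_completion(
    messages=[{"role": "user", "content": "Summarize today's top financial headlines."}],
    temperature=0.2,   # low temperature -> more factual, deterministic output
    max_tokens=512,
)
print(response.choices[0].message.content)
```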
This project uses Docker and Docker Compose for easy setup and deployment. Make sure the following are installed, then follow these steps to get the application running:
- Docker
- Docker Compose
- Clone this repository to your local machine.
- Navigate to the project directory: `cd path/to/project`
- Create a `.env` file from the example: `cp .env-example .env`
- Open the `.env` file in a text editor and modify the values as needed for your local setup. Pay special attention to:
  - `PORT`: the port on which the application will run (default is 5000)
  - `SEARXNG_KEY`: the URL for your SearXNG instance (default is http://localhost:8080)
- Build and start the Docker containers: `docker-compose up --build`. This command will build the Docker image and start the container; the `--build` flag ensures that the image is rebuilt if there are any changes.
- The application should now be running and accessible at http://localhost:5000 (or whatever port you specified in the `.env` file).
- To stop the application, press `Ctrl+C` in the terminal where docker-compose is running.
- To run the containers in detached mode (in the background): `docker-compose up -d`
- To stop and remove the containers when running in detached mode: `docker-compose down`
If you make changes to the application code or Dockerfile, you'll need to rebuild the Docker image:

```bash
docker-compose up --build
```

If running in detached mode, you can view the logs with:

```bash
docker-compose logs -f
```
- If you encounter any issues, ensure that the required ports are not being used by other applications.
- Check that all necessary environment variables are correctly set in your `.env` file.
- Verify that Docker and Docker Compose are correctly installed and up to date.
For more detailed information on Docker Compose commands, refer to the official Docker Compose documentation.