This project aims to create an interactive RAG (Retrieval-Augmented Generation) system that helps users understand the Indian Budget, built with LLaMA, Pinecone, LangChain, FastAPI, and Streamlit. The tool lets users query budget documents and receive relevant answers.
- Document Loading and Chunking: Loads PDF documents and splits them into smaller chunks for processing.
- Embedding and Indexing: Uses Pinecone and HuggingFace embeddings for efficient document retrieval.
- Query and Answer: Provides a FastAPI endpoint for querying the document index and retrieving answers.
- User Interface: A user-friendly interface built with Streamlit for easy interaction.
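The chunking step in the feature list can be illustrated with a small plain-Python sketch. This is not the project's code: the project presumably relies on a LangChain text splitter, and the function name `chunk_text`, the character-based splitting, and the default sizes below are assumptions chosen only to show the idea of overlapping chunks.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap slightly.

    Hypothetical helper for illustration; sizes are assumed defaults.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, which tends to help retrieval.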
- Python 3.8+
- Pinecone API Key
- Groq API Key
- Clone the repository:
  git clone https://github.com/Ansumanbhujabal/BudgetExpert.git
- Create a virtual environment and activate it:
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
- Install the required packages:
  pip install -r requirements.txt
- Create a .env file with your Pinecone and Groq API keys (note `>>` on the second command, so it appends instead of overwriting the first key):
  echo "PINECONE_API_KEY=your_pinecone_api_key" > .env
  echo "GROQ_API_KEY=your_groq_api_key" >> .env
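Before starting the servers, it can help to confirm that both keys actually made it into the `.env` file. The sketch below is a hypothetical check, not part of the repository: `parse_env_lines` and `missing_keys` are illustrative names, and the parser handles only simple `KEY=value` lines.

```python
from typing import Dict, Iterable, List, Mapping

# The two keys this README asks for.
REQUIRED_KEYS = ("PINECONE_API_KEY", "GROQ_API_KEY")


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Mapping[str, str],
                 required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        absent = missing_keys(parse_env_lines(handle))
    print("Missing keys:", absent if absent else "none")
```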
- Download and Prepare Documents:
  - Extract budget URLs from the internet using Firecrawl.
  - Filter the URLs to include only those posted after 2023.
  - Download the PDFs and manually select the useful ones.
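The filtering step above might look roughly like the following. This is a sketch under assumptions: the `CrawledLink` record and its `posted_on` field are hypothetical (the shape of the Firecrawl output is not described here), the example URLs are placeholders, and "after 2023" is read strictly as a posting year greater than 2023. Adjust `after_year` if 2023 itself should be included.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class CrawledLink:
    """Hypothetical record for one crawled link (assumed shape)."""
    url: str
    posted_on: date


def filter_recent_pdfs(links: Iterable[CrawledLink],
                       after_year: int = 2023) -> List[str]:
    """Keep PDF URLs whose posting year is strictly after `after_year`."""
    return [
        link.url
        for link in links
        if link.posted_on.year > after_year
        and link.url.lower().endswith(".pdf")
    ]


if __name__ == "__main__":
    crawled = [
        CrawledLink("https://example.com/budget_speech.pdf", date(2024, 2, 1)),
        CrawledLink("https://example.com/old_budget.pdf", date(2022, 2, 1)),
        CrawledLink("https://example.com/press_note.html", date(2024, 7, 23)),
    ]
    print(filter_recent_pdfs(crawled))
```

The manual selection of useful PDFs still happens afterwards; the filter only narrows the candidates.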
- Start FastAPI Server:
  uvicorn main:app --reload
- Start Streamlit Interface:
  streamlit run main.py
- FastAPI Endpoint:
  - Endpoint: /query
  - Method: POST
  - Input: { "query": "Explain the allocation for healthcare", "top_k": 2 }
  - Output: { "query": "Explain the allocation for healthcare", "answer": "The healthcare allocation has been increased by 10%..." }
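As a rough illustration of calling the endpoint from Python, the sketch below builds the POST request with the standard library only. The helper names `build_query_request` and `ask` are hypothetical, and the base URL assumes the server is running locally on uvicorn's default address; change it if your server listens elsewhere.

```python
import json
import urllib.request

# Assumed address: uvicorn's default host and port.
DEFAULT_BASE_URL = "http://127.0.0.1:8000"


def build_query_request(query: str, top_k: int = 2,
                        base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /query endpoint."""
    payload = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query: str, top_k: int = 2, base_url: str = DEFAULT_BASE_URL) -> str:
    """Send the query and return the "answer" field of the JSON response."""
    request = build_query_request(query, top_k, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["answer"]


if __name__ == "__main__":
    # Requires the FastAPI server to be running.
    print(ask("Explain the allocation for healthcare"))
```

Separating request construction from sending makes the payload easy to inspect without a live server.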
- Streamlit Interface:
  - Navigate to the Streamlit interface in your browser.
  - Enter your query in the input box and click "Get Answer".
  - The answer will be displayed below the input box.
Contributions are welcome! Please feel free to submit a pull request or open an issue.
This project is licensed under the MIT License. See the LICENSE file for details.
Happy querying!