
🤖 Intelligent Q&A System for My Personal Website

Overview

I built an intelligent Retrieval-Augmented Generation (RAG) system that transforms my static personal website into an interactive knowledge base. Visitors can now ask questions about my projects, experience, and writing, and receive accurate answers sourced directly from the site's content.

🏗️ Architecture & Technology Stack

Core Components

  • LangChain: Orchestrates the RAG pipeline and document processing
  • FAISS: High-performance vector database for semantic search
  • Google Gemini 2.0 Flash Lite: Latest LLM for natural language understanding and generation
  • FastAPI: Modern, fast web framework for the RESTful API
  • Fly.io: Cost-effective deployment with scale-to-zero capabilities
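
At a high level, the pieces fit together as sketched below. This is a simplified view that reuses the helper functions defined in the following sections; the GitHub URL and directory names are placeholders.

# Simplified startup flow (helper functions are defined in the sections below;
# the repository URL and local directory are placeholders)
docs = clone_and_load_repo("https://github.com/<username>/<website-repo>.git", "website_content")
save_vector_store(docs)   # chunk, embed, and persist a FAISS index
qa_chain = setup_rag()    # reload the index and wire it to Gemini via LangChain
# FastAPI exposes qa_chain through the /ask endpoint, deployed on Fly.io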

Data Pipeline

The system automatically syncs with my GitHub repository containing website content:

import os

from git import Repo
from langchain_community.document_loaders import TextLoader

def clone_and_load_repo(github_url, local_dir):
    # Pull the latest content if the repo is already cloned, otherwise clone it
    if os.path.exists(local_dir):
        Repo(local_dir).remotes.origin.pull()
    else:
        Repo.clone_from(github_url, local_dir)

    # Load every Markdown file in the repository as a LangChain document
    documents = []
    for root, _, files in os.walk(local_dir):
        for file in files:
            if file.endswith(".md"):
                loader = TextLoader(os.path.join(root, file))
                documents.extend(loader.load())
    return documents

🔄 RAG Implementation Details

Vector Store Creation

Documents are split into overlapping chunks (1,000 characters with 200 characters of overlap) so retrieval stays efficient while each chunk keeps enough surrounding context:

from langchain_community.vectorstores import FAISS
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

def save_vector_store(docs):
    # Split documents into overlapping chunks so retrieved passages keep local context
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200
    )
    splits = text_splitter.split_documents(docs)

    # Embed the chunks and persist the FAISS index to disk
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    vectorstore = FAISS.from_documents(splits, embeddings)
    vectorstore.save_local("vector_store/faiss_index")

Retrieval Chain Setup

The RAG chain pairs Google's embedding-001 model with Gemini 2.0 Flash Lite to balance answer quality and cost:

from langchain import hub
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.vectorstores import FAISS
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

def setup_rag():
    # Load the persisted FAISS index with the same embedding model used to build it
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    vectorstore = FAISS.load_local(
        "vector_store/faiss_index",
        embeddings,
        allow_dangerous_deserialization=True
    )

    llm = ChatGoogleGenerativeAI(
        model="gemini-2.0-flash-lite-preview-02-05",
        temperature=0.5,
    )

    # Stuff the retrieved chunks into a shared Q&A prompt and wire up the retriever
    retrieval_qa_chat_prompt = hub.pull("langchain-ai/retrieval-qa-chat")
    combine_docs_chain = create_stuff_documents_chain(llm, retrieval_qa_chat_prompt)
    rag_chain = create_retrieval_chain(vectorstore.as_retriever(), combine_docs_chain)

    return rag_chain
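
For reference, the chain can be invoked directly before putting an API in front of it. create_retrieval_chain returns a dict containing the generated answer and the retrieved source documents (the question below is just an illustration):

qa = setup_rag()
result = qa.invoke({"input": "Which vector database does the Q&A system use?"})

print(result["answer"])           # generated answer
for doc in result["context"]:     # retrieved chunks that grounded the answer
    print(doc.metadata["source"])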

🚀 API Design & Deployment

FastAPI Backend

Clean, documented API with proper error handling and CORS configuration:

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()
qa = setup_rag()  # build the RAG chain once at startup

class Query(BaseModel):
    question: str  # the visitor's question

@app.post("/ask")
async def ask(query: Query):
    try:
        # Run the question through the RAG chain and return the answer with its sources
        result = qa.invoke({"input": query.question})
        return {
            "answer": result["answer"],
            "sources": [doc.metadata["source"] for doc in result["context"]],
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
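
For a quick sanity check, the endpoint can be exercised from a small client script; the base URL below assumes a local run and the question is just an example:

import requests

response = requests.post(
    "http://localhost:8000/ask",
    json={"question": "What technologies power the Q&A system?"},
    timeout=30,
)
response.raise_for_status()
payload = response.json()
print(payload["answer"])
print(payload["sources"])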

📊 Technical Highlights

  • Semantic Search: FAISS enables lightning-fast similarity search across embedded content (see the example after this list)
  • Context Preservation: RecursiveCharacterTextSplitter maintains document coherence
  • Latest AI: Gemini 2.0 Flash Lite provides state-of-the-art language understanding
  • Production Ready: Proper error handling, logging, and security measures
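
As a concrete illustration of the semantic search step, the saved FAISS index can be queried on its own; this returns the chunks whose embeddings are closest to the question (the query text is just an example):

from langchain_community.vectorstores import FAISS
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vectorstore = FAISS.load_local(
    "vector_store/faiss_index", embeddings, allow_dangerous_deserialization=True
)

# Top 4 chunks ranked by embedding similarity to the query
hits = vectorstore.similarity_search("Where is the site's Q&A API deployed?", k=4)
for doc in hits:
    print(doc.metadata["source"])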

🔄 Future Enhancements

  • Multi-modal support: Image and video content understanding
  • Conversation memory: Multi-turn dialogue capabilities (a rough sketch follows this list)
  • Cloud-based vector stores: Such as Pinecone
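
For the conversation-memory item, one possible direction (not part of the current system) is LangChain's history-aware retriever, which rewrites follow-up questions into standalone ones before retrieval. The sketch below assumes the llm, vectorstore, and combine_docs_chain objects from setup_rag():

from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Prompt that turns a follow-up like "what about its deployment?" into a standalone question
rephrase_prompt = ChatPromptTemplate.from_messages([
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
    ("human", "Rephrase the question above as a standalone question about the website."),
])

# llm, vectorstore, and combine_docs_chain are assumed to come from setup_rag()
history_aware_retriever = create_history_aware_retriever(
    llm, vectorstore.as_retriever(), rephrase_prompt
)
conversational_chain = create_retrieval_chain(history_aware_retriever, combine_docs_chain)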
