Introduction
The AI News Generator is a personal project I built to explore autonomous LLM agents and multi-step pipelines using LangChain. The goal is simple: given a topic, generate a well-structured blog article that is fact-checked, cited, and exportable.
The system uses LangChain chains to simulate editorial roles — research, validation, writing, and citation. It’s deployed with Streamlit, uses OpenRouter for model access, and Qdrant as a vector memory for validated facts.
Motivation
I wanted to build something that mimics how a research team works:
Researcher → Fact-Checker → Writer → Editor
Rather than relying on a single prompt, the pipeline distributes responsibility across dedicated chains. This improves control, traceability, and explainability, all of which matter for factual writing and reducing hallucinations.
This was also a hands-on way to explore LangChain agents, multi-model inference, and the tradeoffs in real-world deployment (especially with memory backends like ChromaDB vs Qdrant).
Architecture
```mermaid
flowchart LR
    A[User Input: Topic] --> B[Research Chain]
    B --> C[Validation Chain]
    C --> D[Writer Chain]
    D --> E[Citation Chain]
    C --> F[Qdrant Vector Store]
    E --> G[Final Blog Output]
```
Each chain is an independent LangChain LLMChain with its own prompt template and input/output format. Chains communicate only through their intermediate results, which keeps the design modular and testable.
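To make that concrete, here is a minimal sketch of two adjacent stages, assuming the classic LLMChain API mentioned above. The prompts, variable names, and model choice are illustrative, not the project's actual templates:

```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # placeholder; any chat model works

# Stage 1: research. Emits raw facts, one per line.
research_prompt = PromptTemplate(
    input_variables=["topic"],
    template=(
        "You are a researcher. Collect key facts about the topic below.\n"
        "Return one fact per line.\n\nTopic: {topic}"
    ),
)
research_chain = LLMChain(llm=llm, prompt=research_prompt, output_key="raw_facts")

# Stage 2: validation. Consumes exactly what stage 1 produced.
validation_prompt = PromptTemplate(
    input_variables=["raw_facts"],
    template=(
        "You are a fact-checker. Keep only facts that are verifiable and "
        "non-contradictory.\n\nFacts:\n{raw_facts}"
    ),
)
validation_chain = LLMChain(llm=llm, prompt=validation_prompt, output_key="facts")
```

Because each stage only sees the previous stage's output key, any chain can be tested in isolation with a hand-written input.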
Tools & Models
| Component | Description |
|---|---|
| LangChain | Orchestrates multi-step chains with prompt templating |
| Streamlit | Frontend UI for interacting with the pipeline |
| OpenRouter | Model access for Mistral, Gemma, and other hosted LLMs |
| Qdrant | Stores validated facts as retrievable vectors |
| Tavily / Wiki | External search sources for the Research Chain |
| ReportLab / python-docx | Generates PDF and DOCX downloads |
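Before moving on to models: the export row above boils down to a few lines. A minimal sketch of the DOCX path with python-docx and Streamlit's download button (the title and body are placeholders; the ReportLab PDF path follows the same in-memory pattern):

```python
import io

import streamlit as st
from docx import Document  # python-docx

def to_docx_bytes(title: str, body: str) -> bytes:
    """Render the generated article as an in-memory .docx file."""
    doc = Document()
    doc.add_heading(title, level=1)
    for paragraph in body.split("\n\n"):
        doc.add_paragraph(paragraph)
    buffer = io.BytesIO()
    doc.save(buffer)
    return buffer.getvalue()

st.download_button(
    label="Download DOCX",
    data=to_docx_bytes("My Topic", "Generated blog text..."),
    file_name="article.docx",
    mime="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
)
```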
You can switch between models such as Mistral Small 3.1 and Gemma 3 directly from the UI, and additional OpenRouter models are easy to add.
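OpenRouter exposes an OpenAI-compatible endpoint, so in LangChain it can be reached through ChatOpenAI with a custom base URL. A sketch of the model switcher; the model slugs are examples and should be checked against OpenRouter's catalog:

```python
import os

import streamlit as st
from langchain_openai import ChatOpenAI

# Example slugs; verify exact names in OpenRouter's model catalog.
MODELS = {
    "Mistral Small 3.1": "mistralai/mistral-small-3.1-24b-instruct",
    "Gemma 3": "google/gemma-3-27b-it",
}

choice = st.selectbox("Model", list(MODELS.keys()))

# OpenRouter speaks the OpenAI wire protocol, so ChatOpenAI only needs
# a different base URL and an OpenRouter API key.
llm = ChatOpenAI(
    model=MODELS[choice],
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
```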
Workflow
1. User enters a topic
2. Research Chain uses Tavily + Wiki to gather raw data
3. Validation Chain filters and fact-checks the results
4. Validated facts are stored in Qdrant
5. Writer Chain generates a readable blog draft
6. Citation Chain rewrites with inline references
7. Blog is shown on screen and exportable as PDF or DOCX
This flow mimics an editorial workflow, with clear responsibilities per stage.
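Condensed, the whole flow is each stage feeding the next. A sketch using the chains from the Architecture section, where `writer_chain`, `citation_chain`, and `store_facts` are hypothetical stand-ins for the remaining stages:

```python
def run_pipeline(topic: str) -> str:
    """Sketch of the end-to-end flow; each stage consumes the previous output."""
    raw = research_chain.run(topic=topic)        # steps 1-2: gather raw data
    facts = validation_chain.run(raw_facts=raw)  # step 3: fact-check
    store_facts(facts)                           # step 4: persist to Qdrant
    draft = writer_chain.run(facts=facts)        # step 5: draft the blog
    return citation_chain.run(draft=draft)       # step 6: add inline references
```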
Challenges
The hardest part? Deployment + Memory.
Initially, I used CrewAI with ChromaDB: great for local testing, but not deployable on Streamlit Cloud due to Chroma's limitations there. I tried switching CrewAI to Qdrant, but found that CrewAI was still tightly coupled to Chroma internally.
So I rebuilt everything using LangChain + Qdrant, which gave me better deployment stability and memory flexibility. It also made the app easier to extend (e.g., adding new chains or models).
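A sketch of how validated facts land in Qdrant through LangChain. The embedding model and collection name are placeholders, and `:memory:` mode is only for local testing; a hosted Qdrant URL goes in its place for deployment:

```python
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

# Placeholder embedding model; any LangChain Embeddings implementation works.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

facts = [
    "Fact one about the topic...",
    "Fact two about the topic...",
]

# ":memory:" runs an in-process Qdrant for local testing; pass `url=`
# pointing at a hosted instance for deployment.
store = Qdrant.from_texts(
    facts,
    embeddings,
    location=":memory:",
    collection_name="validated_facts",
)

# Later chains can pull relevant facts back out:
hits = store.similarity_search("my topic", k=4)
```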
Other challenges:
- Rate limits on OpenRouter’s free models (see the backoff sketch after this list)
- Handling long context windows
- Designing prompt templates that pass clean outputs between chains
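For the rate limits, one generic mitigation is exponential backoff around each chain call. This sketch is not OpenRouter-specific; the exception type to catch depends on the client library, so it falls back to checking the message:

```python
import time

def with_backoff(call, max_retries=5, base_delay=2.0):
    """Retry a chain call on rate-limit errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:  # narrow this to the client's rate-limit error
            if "429" not in str(exc) or attempt == max_retries - 1:
                raise
            # Wait 2s, 4s, 8s, ... before the next attempt.
            time.sleep(base_delay * (2 ** attempt))

# Usage:
# result = with_backoff(lambda: research_chain.run(topic="quantum computing"))
```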
Key Learnings
- Multi-agent systems are modular but fragile — small bugs in one chain can snowball
- Vector memory is powerful but must be tuned to avoid duplicate retrievals (see the retriever sketch after this list)
- UI/UX in Streamlit can make or break the experience (fun facts while loading = user happiness)
- Model choice matters — some LLMs hallucinate more than others even with the same prompt
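On the duplicate-retrieval point: LangChain retrievers support maximal marginal relevance, which trades pure similarity for diversity and helps when near-identical facts were stored across runs. A sketch against the `store` from the Qdrant example above:

```python
# MMR fetches a wider candidate pool (fetch_k), then re-ranks the top k
# for diversity, suppressing near-duplicate facts.
retriever = store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "fetch_k": 20},
)
docs = retriever.invoke("my topic")
```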
Result
- 🧠 Clean chain architecture (no agent frameworks)
- ✅ Validated facts stored as long-term memory
- 📄 Exportable content for real-world use
- 🧪 Production-ready demo on Streamlit Cloud
Future Plans
- Add document upload + summarization
- Fine-tune prompts per model (Gemma/Mistral perform differently)
- Add historical memory for previous topics via Qdrant filters
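A rough shape of what that last item could look like, assuming each fact were stored with a hypothetical `topic` metadata field (not yet in the project):

```python
from qdrant_client import models

# Hypothetical: if facts carried {"topic": ...} metadata, a Qdrant filter
# could scope retrieval to a previously researched topic.
topic_filter = models.Filter(
    must=[
        models.FieldCondition(
            key="metadata.topic",
            match=models.MatchValue(value="quantum computing"),
        )
    ]
)
hits = store.similarity_search("key findings", k=4, filter=topic_filter)
```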
Related Projects
👉 Also check out:
A similar project built using CrewAI instead of LangChain — leaner agent execution, easier role setup, same great results, minus Streamlit. Check the similar project →
“Let the agents do the research — you just pick the topic.”