Skip to content

charlesl3/warmedge-chatbot

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

173 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

WarmGPT Backend (WarmEdge Chatbot)

Production-ready Retrieval-Augmented Generation (RAG) system for figure skating knowledge.

Overview

WarmGPT converts unstructured skating discussions (Reddit, GoldenSkate, Wiki) into structured knowledge and delivers grounded answers using a retrieval-first pipeline.

Stack - FAISS (vector search) - SentenceTransformers (embeddings) - Intent routing (rule-based) - Modular LLM layer - FastAPI backend - Railway deployment

Frontend (Next.js / Vercel) consumes this API via HTTP.


Architecture

User → Intent Router → FAISS Retrieval → Prompt Builder → LLM → Response

Design principles: - Retrieval-first (no blind LLM calls) - Deterministic search layer - Modular model abstraction - Clear separation of frontend and backend


Key Components

Retrieval

  • Embeddings stored in FAISS
  • Top-k similarity search
  • Metadata stored separately

Intent Routing

  • Greeting
  • Social message
  • Knowledge lookup Reduces unnecessary LLM calls.

LLM Layer

  • Called only after retrieval
  • Provider abstracted and replaceable

How to Run

The best bug-free way is to run from our website: Click here


Run Locally (optional)

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn backend.server:app --reload

Test:

curl -X POST http://localhost:8000/chat -H "Content-Type: application/json" -d '{"message":"hi","history":[]}'

Deployment (Railway)

  • Push to GitHub
  • Connect repo to Railway
  • Set environment variable: HF_API_TOKEN=your_token_here
  • Start command: uvicorn backend.server:app --host 0.0.0.0 --port 8080

My Contribution

  • Designed full RAG architecture
  • Built knowledge distillation pipeline
  • Implemented FAISS retrieval + intent routing
  • Structured prompt builder + LLM abstraction
  • Deployed production backend (Railway)


Additional Features

1. Chip Mode Routing (light transformation layer)

  • Functions: detect_transform_mode(message), classify_query_intent(...)
  • Purpose: allow user to control how the answer is generated, not just what is asked

Behavior:

  • frontend injects Mode: simplify / deeper / drills / diagnose
  • backend detects mode and routes execution accordingly
  • distinguishes:
    • transformation tasks (simplify, deeper)
    • normal RAG queries (drills, diagnose)

Execution:

  • runs before answer_question(...) in /chat
  • branches into:
    • simplify → rewrite path (no RAG)
    • deeper → hybrid path (RAG + merge)
    • others → unchanged RAG pipeline, just a new regular query

If triggered:

  • simplify:
    • rewrites last assistant answer only
  • deeper:
    • combines last answer + new retrieval, then merges
  • drills / diagnose:
    • mapped to how_to / diagnosis intent

Effect:

  • introduces a lightweight “agent-like” routing layer
  • improves consistency (no unnecessary re-retrieval)
  • enables controlled depth expansion without breaking RAG

Agent Logic v1

1. Clarification (gatekeeping)

  • Function: needs_clarification(query, history)
  • Purpose: decide whether the system has enough information to answer

Behavior:

  • detects vague or underspecified queries
  • checks for missing skill level in recommendation questions
  • uses history to avoid unnecessary clarification

Execution:

  • runs after intent classification
  • only triggers when intent == "default"

If triggered:

  • stops pipeline and asks a follow-up question

2. State Construction (user understanding)

  • Function: build_skater_state(query)
  • Purpose: extract structured user information

Includes:

  • skill level (beginner / intermediate / advanced)
  • jump signals (axel, double, etc.)
  • body info (height, weight → categories)
  • experience type (adult vs standard)
  • goal (e.g., equipment recommendation)

Usage:

  • used later in prompt to condition answers
  • provides implicit context when user input is incomplete

3. Intent Classification (routing)

  • Function: classify_query_intent(query, history)
  • Purpose: decide how to process the query

Current intents:

  • how_to → actionable improvement
  • comparison → compare options
  • diagnosis → infer causes from symptoms
  • experience_lookup → general explanation
  • default → fallback

Role:

  • executed before clarification
  • controls downstream behavior and response type

4. Answer Planning (response control)

  • Function: build_answer_plan(query, intent, state, history, clarify)
  • Purpose: determine how the answer should be constructed

Behavior:

  • assigns:
    • mode (coaching / diagnosis / explanation / comparison / standard / clarification)
    • depth (short / medium / detailed)
    • structure (e.g., causes → what to try → drills)
    • context usage flags (use_context, avoid_repetition)

Execution:

  • runs after intent and clarification
  • output is passed into answer_question(...)

Role:

  • controls response style independently of retrieval
  • ensures different query types produce different answer structures

5. Dynamic Retrieval Depth (adaptive k)

  • Function: choose_k(query, intent, state, history)
  • Purpose: adjust retrieval depth based on context

Behavior:

  • increases k for short or vague queries
  • decreases k when strong skill signals exist
  • decreases k when prior context is present
  • varies k by intent

Execution:

  • runs before retrieval
  • determines number of documents retrieved

6. Retrieval Fallback (retry strategy)

  • Location: answer_question()

Behavior:

  • if initial retrieval is weak:
    • generate fallback query
    • perform second retrieval
  • if still weak:
    • trigger clarification

Flow:

  • retrieve → weak → retry → (success → answer) / (fail → clarify)

7. Self-Repair Loop (answer-level retry)

  • Location: /chat endpoint in server.py

Behavior:

  • after generating the first answer:
    • evaluate answer quality (length, vagueness, retrieval strength)
  • if answer is weak:
    • trigger a second pass through full RAG pipeline
  • if second pass improves retrieval:
    • replace original answer
  • otherwise:
    • keep original answer

Flow:

  • retrieve → answer → judge →
    (strong → return) / (weak → retry → compare → return best)

8. Smart Follow-up Layer (LLM-powered continuation)

  • Location: /chat in server.py + helpers in agent.py

Behavior:

  • after final answer (post self-repair):
    • decide if a follow-up is useful based on:
      • intent (how_to / diagnosis / comparison)
      • missing user context (e.g., skill level)
      • weak or repaired retrieval
  • if triggered:
    • build a structured prompt (query + answer + intent + state + recent history)
    • call LLM to generate one short, specific follow-up question
    • append to the answer
  • otherwise:
    • return answer as-is

Constraints:

  • at most one question
  • short (≤ ~20 words)
  • context-aware and skating-specific
  • no generic phrasing
  • does not modify original answer

Flow:

  • retrieve → answer → repair →
    followup_decision → (no → return) / (yes → LLM_generate → append → return)

Agent Logic v2.0

1. Clarification (gatekeeping)

  • Purpose: decide whether the system has enough information to generate a useful answer before running full RAG.

  • Main functions:

    • semantic_clarification_check()
      • thin semantic LLM controller
    • needs_clarification()
      • backend clarification gatekeeper
  • v2 improvements:

    • moved from keyword-based clarification
      • "recommend" in query
    • to semantic clarification understanding
      • thinking about upgrading my boots
  • Trigger examples:

    • missing skating level for equipment recommendations
    • ambiguous short domain terms (loop)
    • vague references (which one?)
  • Usually does NOT clarify:

    • technique/how-to questions
    • factual questions
    • short replies to previous clarification
  • Convergence logic:

    • uses:
      • MAX_CLARIFICATIONS
      • force_answer
    • prevents endless clarification loops
    • forces answering after enough clarification
  • Conversation-state awareness:

    • detects when user is replying to a clarification question
    • activates:
      • force_answer = True
  • Philosophy:

    • clarify only when answer would become:
      • misleading
      • useless
      • badly personalized
    • prefer useful answers over excessive questioning

2. Answer Plan Builder (generation planning)

  • Purpose: build a lightweight answer-generation strategy before LLM generation.

  • Main functions:

    • build_answer_plan()
  • Responsibilities:

    • determine:
      • response structure
      • coaching depth
      • diagnosis framing
      • practical-action emphasis
      • tone softening
  • v2 improvements:

    • moved from static prompt behavior
    • to dynamic answer planning
    • answer structure now adapts to:
      • intent
      • secondary intents
      • user state
      • topic
  • Example plan signals:

    • "what_to_try"
    • "likely_causes"
    • "equipment_note"
    • "confidence_softening"
  • Philosophy:

    • generation should not use:
      • one-prompt-fits-all
    • build an answer strategy first
    • then let the LLM execute the plan

3. Retrieval Query Construction + Fallback + Repair

Retrieval Query Construction

  • Purpose: build the complete immutable retrieval profile before retrieval.

  • Main functions:

    • build_retrieval_profile()
    • maybe_merge_history()
  • Responsibilities:

    • semantic query construction
    • terminology normalization
    • history-aware merging
    • intent-aware enrichment
    • topic-aware enrichment
  • v2 improvements:

    • semantic query mutation now happens ONLY here
    • removed:
      • hidden fallback rewriting
      • repair-time query rewriting
      • scattered intent injection
  • Philosophy:

    • retrieval semantics should freeze before retrieval
    • avoid hidden query mutation later in pipeline

Retrieval Evaluator + Fallback

  • Purpose: evaluate retrieval quality before generation and recover weak retrieval mechanically.

  • Main functions:

    • evaluate_retrieval_quality()
    • apply_fallback_if_needed()
  • Retrieval checks:

    • top score
    • score ambiguity
    • document count
    • retrieval confidence
  • v2 improvements:

    • fallback separated completely from repair
    • fallback now modifies:
      • retrieval strategy only
    • fallback does NOT modify:
      • semantic query meaning
  • Example fallback actions:

    • increase retrieval pool
    • use alternate prebuilt query variants
    • broaden retrieval depth
  • Philosophy:

    • weak retrieval is a retrieval problem
    • recover retrieval before generation
  • Purpose: evaluate generated answers and repair weak answers after generation.

  • Main functions:

    • evaluate_answer_quality()
    • repair_answer()
  • Evaluation checks:

    • vague language
    • weak structure
    • missing practical guidance
    • unsupported reasoning
    • missing semantic focus coverage from the original query
  • Focus term system:

    • semantic focus terms are extracted upstream by the intent controller
    • repair evaluation checks whether generated answers remain anchored to important query concepts
    • examples:
      • adult
      • camel spin
      • blade change
      • outside edge
      • Edea Chorus
    • prevents:
      • generic but fluent answers
      • semantic drift
      • incomplete comparisons
  • v2 improvements:

    • removed:
      • repair-time retrieval reruns
      • repair-time query rewriting
    • added:
      • semantic focus-term coverage checks
    • repair now modifies:
      • generation behavior only
  • Philosophy:

    • repair should improve answer quality
    • repair should preserve semantic alignment with the original user request
    • repair should not secretly rerun retrieval

4. Retrieval Query Construction + Conversational Merge

  • Purpose: build a stable retrieval query before retrieval execution.

  • Main functions:

    • build_retrieval_profile()
    • maybe_merge_history()
    • is_self_contained_query()
    • is_incomplete_continuation()
  • Responsibilities:

    • query normalization
    • intent-aware enrichment
    • topic-aware enrichment
    • conversational context reconstruction
  • v2 improvements:

    • moved from blind history concatenation
    • to semantic merge gating
    • semantic query mutation now happens ONLY here
  • Merge philosophy:

    • merge only when current query depends on previous context
    • avoid unrelated conversational contamination
  • Usually merges:

    • short incomplete replies
      • Jackson
      • only on the right foot
      • that edge
    • context-dependent continuations
  • Usually does NOT merge:

    • self-contained descriptive questions
    • clear topic pivots
    • fully specified equipment/technique questions
  • Previous issue:

    • semantically related but unrelated skating topics could merge
      • loop turn
      • lutz scratching
    • caused retrieval noise and ambiguous retrieval scores
  • Current approach:

    • first checks:
      • semantic completeness
    • then checks:
      • conversational dependency
    • embeddings now act only as:
      • soft merge fallback
  • Philosophy:

    • retrieval should reconstruct conversational intent
    • not blindly concatenate recent messages
    • prioritize retrieval precision over excessive memory inheritance

5. Smart Follow-Up Agent (conversation continuation)

  • Purpose: decide whether the conversation should continue after the main answer, and generate a targeted follow-up only when useful.

  • Main functions:

    • build_followup_decision()
      • backend continuation-state controller
    • build_followup_prompt()
      • strategy-aware follow-up generation prompt
  • v3 improvements:

    • moved from:
      • simple intent-based follow-up triggering
    • to:
      • state-aware continuation reasoning
  • Uses system state from:

    • retrieval evaluation
    • fallback recovery
    • repair results
    • clarification state
    • retrieval merge state
    • confidence estimation
  • Follow-up reasons:

    • retrieval_ambiguity
    • repair_recovery
    • context_continuation
    • missing_user_state
    • progression_coaching
  • Trigger examples:

    • ambiguous skating symptoms
    • repaired but still uncertain answers
    • context-dependent continuation queries
    • missing useful skating state
    • medium-confidence diagnosis/coaching answers
  • Usually does NOT follow up:

    • high-confidence resolved answers
    • clarification-active conversations
    • short factual answers
    • fully resolved retrieval situations
  • Clarification interaction:

    • clarification and follow-up are mutually exclusive
    • prevents stacked questioning behavior
    • avoids conversational instability
  • Conversation-state awareness:

    • uses:
      • used_history_merge
      • retrieval confidence
      • fallback traces
      • repair traces
    • detects unresolved conversational state
  • Philosophy:

    • follow-up should resolve remaining uncertainty
    • continuation should be targeted, not generic
    • avoid:
      • engagement fluff
      • repetitive questions
      • unnecessary continuation pressure

6. Retrieval Strategy Builder (retrieval orchestration)

  • Purpose: build retrieval behavior before RAG retrieval.

  • Main functions:

    • build_retrieval_strategy()
  • Responsibilities:

    • determine:
      • retrieval breadth
      • exploration level
      • retrieval precision
      • retrieval k
  • v2 improvements:

    • moved from:
      • flat intent → k
    • to:
      • semantic retrieval strategy
  • Uses:

    • primary intent
    • secondary intents
    • topic
    • skater state
    • prior context
  • Strategy signals:

    • primary_diagnosis
    • diagnosis_with_equipment
    • comparison_with_equipment_precision
    • strong_skill_signal
    • specific_topic
  • Philosophy:

    • retrieval behavior should depend on:
      • semantic uncertainty
      • topic specificity
      • mixed intent composition
    • not only:
      • one flat intent

WarmGPT V2 - user signup/login features

  • Supabase auth + persistent sessions
  • Persistent skater profiles
  • Profile-aware generation
  • Semantic profile evolution
  • Human-confirmed memory mutation
  • Async backend persistence

Dual-State Architecture

Persistent Profile

Long-term skater identity.

Transient Query State

Per-message inferred skating context.

Key decision: persistent identity and transient conversation context are separated.


Semantic Memory Pipeline

User Message
    ↓
LLM Semantic Detection
    ↓
Strict Validation
    ↓
Frontend Confirmation
    ↓
Async Backend Persistence

Canonical Skill Normalization

"1fl"        → 1F
"double sal" → 2S
"ax-el"      → 1A

Key decision: persistent storage remains canonicalized.


Backend Trust Boundary

Frontend:

  • UI
  • interaction

Backend:

  • semantic authority
  • validation
  • persistence

User contents storage 1 - frequently asked topics

  • Added silent async recurring-topic memory for authenticated users
  • Introduced lightweight LLM-based topic extraction with normalized skating-domain tags
  • Built persistent semantic memory layer using Supabase (user_topic_memory)
  • Separated long-term skating interests from temporary mechanics/symptoms
  • Added soft recurring-topic prompt injection ([RECENT USER FOCUS]) for lightweight personalization
  • Ensured all topic extraction + DB writes run asynchronously after response generation
  • Kept semantic memory invisible to frontend and non-blocking to Q&A latency
  • Cleaned architecture by consolidating active user state into profiles table

Debugging Agents

fixed clarification + follow up question bug

  • Discovered semantic drift caused by parallel agent interactions after introducing jump detection and profile-update agents
  • Found that clarification follow-up replies were sometimes incorrectly treated as standalone intents
  • Observed retrieval and answer planning drifting away from the original unresolved question during clarification flows
  • Identified weak clarification-attachment logic and lack of conversational anchor preservation as core causes
  • Added explicit clarification state tracking instead of heuristic question-based inference
  • Added semantic clarification attachment resolution using LLM reasoning
  • Introduced conversational anchor preservation before downstream retrieval/planning/state extraction
  • Switched downstream routing to use resolved conversational query instead of raw latest user message
  • Prevented jump/profile semantic agents from hijacking clarification follow-up turns into unrelated semantic topics
  • Established lightweight orchestration guidance across parallel semantic agents to stabilize multi-agent conversational behavior
  • Added state-aware sharpening reasoning: WarmGPT now evaluates whether logged skating hours realistically support blade dullness, explains possible mismatches (under-logging, bad sharpening, mounting issues), and softly encourages better session tracking for more accurate equipment awareness.

About

WarmGPT: a figure skating–focused AI assistant powered by a retrieval-augmented knowledge base built from real skater experiences, discussions, and practical insights.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages