← Part of: Building an AI-Powered Local Newsroom

Email Assistant System

Executive Summary

The Email Assistant is an email processing and response generation system that turns the publisher's inbox from a time-consuming bottleneck into an automated workflow. Built on Claude with Playwright browser automation, it handles the complete lifecycle of email communication: triaging 174K+ emails with 4-tier classification, generating context-aware draft responses using few-shot examples drawn from 11,332 historical responses, and executing automated actions such as RSVP form filling and calendar management. The system's adaptive memory architecture learns from every human edit, continuously improving its understanding of communication preferences, sender relationships, and organizational knowledge. For a single-person newsroom, this is the difference between drowning in email and having capacity for journalism.


Section 1: Technical Architecture

System Overview:

The Email Assistant operates as a multi-stage pipeline where emails flow through classification, task extraction, context enrichment, response generation, and human approval before any action is taken. The architecture enforces human-in-the-loop (HITL) gates at critical decision points while automating the mechanical work of drafting and form filling.
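The staged flow with a final approval gate can be sketched as follows. This is a minimal illustration, not the project's orchestration code: `EmailItem`, `classify`, `draft_reply`, and `run_pipeline` are hypothetical names, and the two stage functions are trivial stand-ins for the LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class EmailItem:
    """Hypothetical record carried through the pipeline."""
    sender: str
    body: str
    label: Optional[str] = None
    draft: Optional[str] = None
    history: List[str] = field(default_factory=list)


Stage = Callable[[EmailItem], EmailItem]


def classify(item: EmailItem) -> EmailItem:
    # Stand-in for the LLM triage call: a trailing "?" means a reply is needed.
    item.label = "email" if item.body.rstrip().endswith("?") else "notify"
    return item


def draft_reply(item: EmailItem) -> EmailItem:
    # Stand-in for response generation; only "email" items get a draft.
    if item.label == "email":
        item.draft = f"Hi {item.sender}, thanks for reaching out."
    return item


def run_pipeline(
    item: EmailItem,
    stages: List[Stage],
    approve: Callable[[EmailItem], bool],
) -> str:
    """Run every stage in order, then stop at the human approval gate."""
    for stage in stages:
        item = stage(item)
        item.history.append(stage.__name__)
    if item.draft is None:
        return "no_action"
    # HITL gate: nothing is sent unless the reviewer callback approves it.
    return "sent" if approve(item) else "held"
```

In this sketch the reviewer is just a callback, so the same pipeline can be driven by a CLI prompt in production and by a fixed `lambda` in tests.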

Core Technologies:

| Component | Technology | Purpose |
|---|---|---|
| Email Ingestion | Gmail API + Service Account | Extract emails with streaming attachment upload to Google Drive |
| Classification | Gemini 2.0 Flash Lite | Fast 4-tier triage (email/notify/no/spam) at ~10x throughput |
| Deterministic Filter | Regex + Domain Lists | Pre-filter spam before LLM calls, blocks scam TLDs |
| Response Generation | Claude Sonnet 4 | Draft emails with tool calling for structured outputs |
| Context Retrieval | pgvector HNSW | Semantic search across 3.6M email chunks |
| Browser Automation | Playwright | Form filling, RSVP completion, screenshot capture |
| Task Orchestration | Custom Python | Route tasks to skills, track lifecycle, handle retries |
| Memory System | PostgreSQL + SentenceTransformer | Adaptive learning with 4-level scope hierarchy |

Email Processing Pipeline:

  1. Ingestion: Gmail API extraction via gmail_service_account_extractor_with_dedup.py pulls new emails, computes SHA-256 fingerprints for deduplication, and streams attachments to Google Drive. ICS calendar attachments are automatically parsed into email_events records.

  2. Classification: The triage system operates in two phases. First, deterministic_filter.py applies rule-based spam detection (scam TLDs, known spam domains, newsletter patterns) to block obvious noise before any LLM calls. Remaining emails go to Gemini Flash for 4-tier classification:

     - email: Requires response from publisher
     - notify: Awareness-only (press releases, FYI items)
     - no: Not relevant to newsroom operations
     - spam: Blocked and flagged for filter learning

  3. Entity Extraction: spaCy NER identifies people, organizations, and locations. The participant intelligence system (extract_email_participants.py) builds relationship maps showing who communicates with whom, role classifications (sender/recipient/CC/BCC), and communication frequency patterns across 453K+ participant records.

  4. Context Retrieval: When generating a response, context_enrichment.py retrieves:

     - Sender profile from sender_profiles (organization, role, communication history)
     - Prior thread messages with bidirectional history
     - Adaptive memories from the 4-level scope hierarchy
     - Similar historical responses via few-shot retrieval
     - Relevant RSVP URLs detected in email content

  5. Response Generation: Claude Sonnet 4 receives a comprehensive prompt including writing instructions, sender context, few-shot examples, and learned preferences. The model uses tool calling to produce structured outputs (ResponseEmailDraft, NewEmailDraft, Question, MeetingAssistant). Calendar and RSVP actions are extracted as tasks rather than executed inline.

  6. Human Review: The CLI (email_assistant/cli.py) presents drafts for approval with full context. Reviewers can:

     - Send as-is (immediate feedback: draft worked perfectly)
     - Edit and send (triggers memory extraction from diff)
     - Revise with notes (regenerate with additional instructions)
     - Answer questions (continue multi-turn conversation)
     - Skip or discard
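The first two pipeline steps (fingerprint-based deduplication, then a rule-based pre-filter ahead of any LLM call) can be illustrated with a short sketch. The contents of gmail_service_account_extractor_with_dedup.py and deterministic_filter.py are not shown in this report, so everything here is an assumption: the function names, the fields hashed into the fingerprint, and the placeholder TLD, domain, and newsletter rules.

```python
import hashlib
import re
from typing import Optional, Set

# Placeholder rule data for illustration; not the project's actual lists.
SCAM_TLDS = {".top", ".click"}
SPAM_DOMAINS = {"bulk-offers.example"}
NEWSLETTER_PATTERN = re.compile(r"\bunsubscribe\b", re.IGNORECASE)


def fingerprint(sender: str, subject: str, body: str) -> str:
    """SHA-256 hex digest over normalized fields (field choice is assumed)."""
    normalized = "\n".join(part.strip().lower() for part in (sender, subject, body))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def block_reason(sender: str, body: str) -> Optional[str]:
    """Return the rule that fired, or None if the email should go to the LLM."""
    domain = sender.rsplit("@", 1)[-1].lower()
    if domain in SPAM_DOMAINS:
        return "spam_domain"
    if any(domain.endswith(tld) for tld in SCAM_TLDS):
        return "scam_tld"
    if NEWSLETTER_PATTERN.search(body):
        return "newsletter"
    return None


def triage(sender: str, subject: str, body: str, seen: Set[str]) -> str:
    """Deduplicate first, then apply deterministic rules before any LLM call."""
    fp = fingerprint(sender, subject, body)
    if fp in seen:
        return "duplicate"
    seen.add(fp)
    reason = block_reason(sender, body)
    return f"blocked:{reason}" if reason else "needs_llm"
```

Only emails that come back as `needs_llm` would incur a classification call, which is the cost-saving point of running the deterministic phase first.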

Integration Points:

Key Technical Achievements:


Section 2: Features & Standards

Core Capabilities:

  1. Intelligent Triage: Processes 174K+ emails with 4-tier classification. Deterministic rules handle obvious cases (newsletters, spam TLDs), preserving LLM capacity for nuanced decisions. Backlog mode skips financial and social notifications so that catch-up processing stays efficient.

  2. Voice-Accurate Drafting: Generates responses in the publisher's voice using few-shot learning. The system retrieves successful past responses to similar email types (press inquiries, event invitations, collaboration requests) and includes them in the prompt as style examples.

  3. Task Extraction & Orchestration: Parses emails for actionable items (RSVPs, calendar events, form submissions) and routes them through a skill-based execution system with retry logic and failure escalation.

  4. Automated Form Filling: The SmartFormSkill uses Claude Vision to analyze a web form, compose appropriate values from organization memory, and present a preview for human approval before submission. It captures screenshots as evidence at each stage.

  5. Adaptive Memory: Learns from human edits to improve future drafts. The pipeline parses the edit diff deterministically, applies a quality gate, extracts a memory via LLM, reconciles it with existing memories using AUDN (Add/Update/Delete/Noop) operations, and tracks effectiveness based on subsequent acceptance rates.
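The adaptive memory steps can be sketched in a few functions. This is an illustrative reading, not the project's implementation: `parse_edit`, `passes_quality_gate`, and `reconcile` are hypothetical names, the word-count threshold is an assumed gate, the memory store is a plain dict, and the LLM extraction and effectiveness tracking steps are omitted.

```python
import difflib
from typing import Dict, List, Optional, Tuple

Change = Tuple[str, str, str]  # (operation, text before, text after)


def parse_edit(draft: str, sent: str) -> List[Change]:
    """Deterministically diff the draft against what the human actually sent."""
    a, b = draft.split(), sent.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def passes_quality_gate(changes: List[Change], min_changed_words: int = 2) -> bool:
    """Assumed gate: ignore edits that touch fewer than min_changed_words words."""
    changed = sum(len((before + " " + after).split()) for _, before, after in changes)
    return changed >= min_changed_words


def reconcile(store: Dict[str, str], key: str, text: Optional[str]) -> str:
    """Apply one candidate memory to the store and return the AUDN operation."""
    existing = store.get(key)
    if text is None:
        if existing is None:
            return "NOOP"
        del store[key]
        return "DELETE"
    if existing is None:
        store[key] = text
        return "ADD"
    if existing == text:
        return "NOOP"
    store[key] = text
    return "UPDATE"
```

For example, diffing "Dear Ms. Lee, thank you for your email." against "Hi Dana, thank you for your email." yields a single `replace` change covering the greeting, which a later extraction step could turn into a sender-specific memory.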

Standards & Best Practices:

Evolution:

The Email Assistant began as a simple binary triage system in October 2025, classifying emails as "needs response" or "no action." Within weeks, user feedback drove expansion to 4-tier classification to distinguish "notification-only" emails from true no-action items.

The drafting system shipped in November 2025, initially using static prompts. Few-shot retrieval was added in December after observing inconsistent voice matching. The breakthrough came when historical response data was processed to create a consolidated corpus of 1,417 high-quality examples (reduced from 11,332 via outcome-aware ranking).
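The report does not describe how outcome-aware ranking works. One plausible reading, sketched below purely as an assumption, is that each historical example is scored by its semantic similarity to the incoming email, weighted by how the original draft fared with the reviewer. The outcome labels, their weights, and the `Example` and `top_examples` names are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

# Assumed weights: drafts sent unchanged count most, discarded ones not at all.
OUTCOME_WEIGHT = {"sent_as_is": 1.0, "edited": 0.6, "discarded": 0.0}


@dataclass
class Example:
    text: str
    embedding: Sequence[float]
    outcome: str


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_examples(query: Sequence[float], corpus: List[Example], k: int = 3) -> List[Example]:
    """Rank by similarity times outcome weight and keep the best k positive scores."""
    scored = [
        (cosine(query, ex.embedding) * OUTCOME_WEIGHT.get(ex.outcome, 0.0), ex)
        for ex in corpus
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ex for score, ex in scored[:k] if score > 0]
```

Under this weighting, a slightly less similar example that was sent as-is can outrank a closer match that needed editing, and discarded drafts never reach the prompt.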

January 2026 saw the introduction of the reflection system, an attempt to learn from human edits by sending raw diffs to Claude for preference extraction. Research revealed that this approach degrades over time without verification. The reflection system was replaced in February 2026 with the current adaptive memory architecture, which is based on deterministic feedback parsing and effectiveness tracking.

Browser automation matured through March 2026, evolving from simple RSVP filling to the SmartFormSkill that can handle organizational information forms, surveys, and grant applications by composing appropriate responses from memory.


Section 3: Impact on News Operations

Time Savings:

The Email Assistant eliminates the most repetitive aspects of inbox management:

  - Triage: What previously required scanning every email (2-3 hours/day) is now pre-classified. The reviewer sees actionable items first, notifications separately, and never sees spam.
  - Response drafting: Common email types (press releases, event invitations, collaboration inquiries) get complete draft responses in seconds rather than requiring composition from scratch.
  - Form filling: Event RSVPs that took 5-10 minutes each (navigate, fill fields, submit) are now handled in under a minute with human approval.

Quality Improvements:

AI-assisted drafting produces more consistent communication than human-only workflows:

  - No missed follow-ups: Every actionable email gets a draft response queued
  - Consistent routing: Press releases always get routed to editorial AND pitched for advertising
  - Historical context: The system knows the full communication history with each sender, preventing duplicate asks or inconsistent responses
  - Deadline awareness: Date-aware prompting prevents embarrassing responses to stale emails

Relationship Management:

The sender profile system (sender_profiles with 10K records) provides instant context on any email sender:

  - Organization and role classification
  - Communication frequency and patterns
  - Prior response history
  - Network connections (who else they communicate with)

This context flows into draft generation, ensuring responses acknowledge existing relationships rather than treating every email as a cold contact.

Mission Alignment:

For a single-person newsroom, email management is existential. Without automation, the publisher's time is consumed by administrative correspondence rather than journalism. The Email Assistant inverts this equation: AI handles the mechanical work (classification, drafting, form filling) while humans retain control over final communication decisions.

The system explicitly supports the publisher's business development role: responses to promotional content automatically include advertising pitches alongside editorial routing. This dual-track approach ensures no business opportunity is missed while maintaining editorial credibility.

The adaptive memory system means the assistant gets better over time. Every edit teaches it something new about communication preferences. After six months of operation, the system requires fewer human corrections because it has learned the nuances of when to be formal versus casual, which senders prefer brief responses, and how to handle recurring request types.


Report compiled from 321 commits spanning July 2025 to March 2026.
Primary source files: email_assistant/ (56 Python modules across cli, orchestration, drafting, skills, memory, rsvp)