
SECTION 03 · SUBSYSTEM REPORT

Email Assistant System

200+ emails triaged every day. LLM classification with Gemini Flash, a participant intelligence graph across 2,580+ entities, and Playwright RPA for response automation, with end-to-end vector search over the full inbox archive.

Production · Gemini Flash · Playwright · Vector Search · Published March 2026
200+ Emails triaged daily. LLM classification, participant graph, and automated response pipeline, all local, all searchable.


Executive Summary

The Email Assistant is an email processing and response generation system that turns the publisher's inbox from a time-consuming bottleneck into an automated workflow. Built on Claude AI with Playwright browser automation, it handles the full lifecycle of email communication: triaging 174K+ emails with 4-tier classification, generating context-aware draft responses using few-shot learning from 11,332 historical examples, and executing automated actions such as RSVP form filling and calendar management. Its adaptive memory architecture learns from every human edit, improving its grasp of communication preferences, sender relationships, and organizational knowledge over time. For a single-person newsroom, this is the difference between drowning in email and having capacity for journalism.


Section 1: Technical Architecture

System Overview:

The Email Assistant operates as a multi-stage pipeline where emails flow through classification, task extraction, context enrichment, response generation, and human approval before any action is taken. The architecture enforces human-in-the-loop (HITL) gates at critical decision points while automating the mechanical work of drafting and form filling.
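As a rough illustration of the gating described above, the sketch below drafts a reply only for the "email" tier and records the reviewer's decision before anything could be sent. The names `Draft` and `process`, and the callback signatures, are hypothetical and not taken from the system's code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Draft:
    body: str
    approved: bool = False


def process(
    email_text: str,
    classify: Callable[[str], str],
    generate: Callable[[str], str],
    approve: Callable[[Draft], bool],
) -> Optional[Draft]:
    """Classify, draft, then stop at the human gate."""
    if classify(email_text) != "email":
        return None  # only the "email" tier needs a reply
    draft = Draft(body=generate(email_text))
    # HITL gate: the draft is only marked sendable if the reviewer approves.
    draft.approved = approve(draft)
    return draft


# Stand-ins for the real classifier, generator, and reviewer (all hypothetical).
ok = process("Can you attend?", lambda e: "email", lambda e: "Yes, happy to.", lambda d: True)
held = process("Can you attend?", lambda e: "email", lambda e: "Yes, happy to.", lambda d: False)
skipped = process("Press release", lambda e: "notify", lambda e: "", lambda d: True)
```

Passing the reviewer in as a callback keeps the gate explicit: the caller cannot obtain an approved draft without going through it.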

Core Technologies:

Component | Technology | Purpose
Email Ingestion | Gmail API + Service Account | Extract emails with streaming attachment upload to Google Drive
Classification | Gemini 2.0 Flash Lite | Fast 4-tier triage (email/notify/no/spam) at ~10x throughput
Deterministic Filter | Regex + Domain Lists | Pre-filter spam before LLM calls, blocks scam TLDs
Response Generation | Claude Sonnet 4 | Draft emails with tool calling for structured outputs
Context Retrieval | pgvector HNSW | Semantic search across 3.6M email chunks
Browser Automation | Playwright | Form filling, RSVP completion, screenshot capture
Task Orchestration | Custom Python | Route tasks to skills, track lifecycle, handle retries
Memory System | PostgreSQL + SentenceTransformer | Adaptive learning with 4-level scope hierarchy

Email Processing Pipeline:

  1. Ingestion: Gmail API extraction via gmail_service_account_extractor_with_dedup.py pulls new emails, computes SHA-256 fingerprints for deduplication, and streams attachments to Google Drive. ICS calendar attachments are automatically parsed into email_events records.

  2. Classification: The triage system operates in two phases. First, deterministic_filter.py applies rule-based spam detection (scam TLDs, known spam domains, newsletter patterns) to block obvious noise before any LLM calls. Remaining emails go to Gemini Flash for 4-tier classification:

     • email: Requires response from publisher
     • notify: Awareness-only (press releases, FYI items)
     • no: Not relevant to newsroom operations
     • spam: Blocked and flagged for filter learning

  3. Entity Extraction: spaCy NER identifies people, organizations, and locations. The participant intelligence system (extract_email_participants.py) builds relationship maps showing who communicates with whom, role classifications (sender/recipient/CC/BCC), and communication frequency patterns across 453K+ participant records.

  4. Context Retrieval: When generating a response, context_enrichment.py retrieves:

     • Sender profile from sender_profiles (organization, role, communication history)
     • Prior thread messages with bidirectional history
     • Adaptive memories from the 4-level scope hierarchy
     • Similar historical responses via few-shot retrieval
     • Relevant RSVP URLs detected in email content

  5. Response Generation: Claude Sonnet 4 receives a comprehensive prompt including writing instructions, sender context, few-shot examples, and learned preferences. The model uses tool calling to produce structured outputs (ResponseEmailDraft, NewEmailDraft, Question, MeetingAssistant). Calendar and RSVP actions are extracted as tasks rather than executed inline.

  6. Human Review: The CLI (email_assistant/cli.py) presents drafts for approval with full context. Reviewers can:

     • Send as-is (immediate feedback: draft worked perfectly)
     • Edit and send (triggers memory extraction from diff)
     • Revise with notes (regenerate with additional instructions)
     • Answer questions (continue multi-turn conversation)
     • Skip or discard
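The ingestion and classification steps can be sketched roughly as follows: a SHA-256 fingerprint for deduplication, then a rule pass that defers to an LLM only when no rule fires. This is a minimal illustration, not the contents of gmail_service_account_extractor_with_dedup.py or deterministic_filter.py. The fields that are hashed, the example TLDs, the newsletter regex, the mapping of newsletters to the "no" tier, and the `llm_classify` callback are all assumptions.

```python
import hashlib
import re
from typing import Callable, Optional

# Hypothetical rule data, for illustration only.
SCAM_TLDS = {".xyz", ".top"}
NEWSLETTER_PATTERN = re.compile(r"\bunsubscribe\b", re.IGNORECASE)
TIERS = {"email", "notify", "no", "spam"}


def fingerprint(sender: str, subject: str, body: str) -> str:
    """SHA-256 over normalized fields, so repeated copies hash identically."""
    normalized = "\n".join(part.strip().lower() for part in (sender, subject, body))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deterministic_tier(sender: str, body: str) -> Optional[str]:
    """Return a tier when a rule fires, or None to defer to the LLM."""
    domain = sender.rsplit("@", 1)[-1].lower()
    if any(domain.endswith(tld) for tld in SCAM_TLDS):
        return "spam"
    if NEWSLETTER_PATTERN.search(body):
        return "no"  # assumed mapping for newsletter-style mail
    return None


def triage(sender: str, body: str, llm_classify: Callable[[str], str]) -> str:
    """Phase 1: rules. Phase 2: LLM, only for mail the rules did not settle."""
    tier = deterministic_tier(sender, body)
    if tier is None:
        tier = llm_classify(body)
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return tier
```

Because the rule pass runs first, the LLM callback is never invoked for mail a rule already settled, which is where the savings in LLM calls would come from.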

Integration Points:

  • Data Warehouse: All email metadata, classifications, drafts, and task state persist to PostgreSQL with full audit trails. The ea_thread_triage, ea_email_drafts, and ea_tasks tables track the complete lifecycle.

  • Memory System: The adaptive memory architecture (email_assistant/memory/) integrates with draft generation via ContextEnricher. Memories are scoped to sender, organization, project, or global levels and are retrieved using RRF (Reciprocal Rank Fusion) scoring.

  • Skill Registry: The TaskOrchestrator dispatches extracted tasks to registered skills including RSVPSkill (form filling), SmartFormSkill (intelligent form analysis), CalendarSkill, and DraftingSkill.
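For readers unfamiliar with Reciprocal Rank Fusion, the sketch below shows the standard formula: each item scores the sum of 1 / (k + rank) over every ranked list it appears in. Which rankings the memory system actually fuses is not stated in this report; the two lists here (one semantic, one keyword) and the conventional constant k = 60 are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of ids into one list, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[item_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable result.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical memory ids ranked by two different retrievers.
semantic = ["m2", "m1", "m3"]
keyword = ["m1", "m4", "m2"]
fused = rrf_fuse([semantic, keyword])
# fused == ["m1", "m2", "m4", "m3"]
```

RRF uses only rank positions, so lists whose raw scores are on different scales can be combined without normalization.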

Key Technical Achievements:

  • 10x throughput improvement via deterministic pre-filtering and Gemini Flash migration
  • Adaptive memory with effectiveness tracking - memories that lead to rejected drafts get decayed
  • Two-phase HITL for browser automation - analyze form, show preview, await approval, then execute
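One way to picture the effectiveness tracking mentioned above is a per-memory weight that shrinks after each rejected draft. The sketch below is an assumption about how such decay could work, not the system's actual scheme: the `Memory` fields, the 0.5 decay factor, the 0.1 recovery step, and the 0.2 cutoff are all hypothetical.

```python
from dataclasses import dataclass

DECAY = 0.5       # hypothetical multiplier applied after a rejected draft
RECOVERY = 0.1    # hypothetical bump applied after an accepted draft
MIN_WEIGHT = 0.2  # hypothetical cutoff below which a memory is not retrieved


@dataclass
class Memory:
    text: str
    weight: float = 1.0
    uses: int = 0
    accepted: int = 0


def record_outcome(memory: Memory, draft_accepted: bool) -> None:
    """Update usage counters and weight after a draft that used this memory."""
    memory.uses += 1
    if draft_accepted:
        memory.accepted += 1
        memory.weight = min(1.0, memory.weight + RECOVERY)
    else:
        memory.weight *= DECAY


def is_active(memory: Memory) -> bool:
    """A memory stays retrievable while its weight is at or above the cutoff."""
    return memory.weight >= MIN_WEIGHT
```

With these example constants, a memory that contributes to three rejected drafts in a row drops from 1.0 to 0.125 and falls out of retrieval.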

Section 2: Features & Standards

Core Capabilities:

  1. Intelligent Triage: Processes 174K+ emails with 4-tier classification. Deterministic rules handle obvious cases (newsletters, spam TLDs), preserving LLM capacity for nuanced decisions. Backlog mode skips financial/social notifications for efficient catch-up processing.

  2. Voice-Accurate Drafting: Generates responses in the publisher's voice using few-shot learning. The system retrieves successful past responses to similar email types (press inquiries, event invitations, collaboration requests) and includes them in the prompt as style examples.

  3. Task Extraction & Orchestration: Parses emails for actionable items (RSVPs, calendar events, form submissions) and routes them through a skill-based execution system with retry logic and failure escalation.

  4. Automated Form Filling: The SmartFormSkill uses Claude Vision to analyze any web form, compose appropriate values from organization memory, and present a preview for human approval before submission. It captures screenshots as evidence at each stage.

  5. Adaptive Memory: Learns from human edits to improve future drafts. The pipeline parses the edit diff deterministically, applies a quality gate, extracts a memory via LLM, reconciles it with existing memories using AUDN (Add/Update/Delete/Noop) operations, and tracks effectiveness based on subsequent acceptance rates.
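The AUDN reconciliation step can be illustrated with a small decision function. How the system matches a candidate memory to an existing one is not described in this report; the exact-key lookup below, the `(scope, topic)` key shape, and the convention that a value of `None` retracts a memory are assumptions made for the sketch.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Op(Enum):
    ADD = "add"
    UPDATE = "update"
    DELETE = "delete"
    NOOP = "noop"


# Hypothetical key shape: (scope, topic), e.g. ("sender:pr@city.gov", "greeting").
MemoryKey = Tuple[str, str]


def reconcile(store: Dict[MemoryKey, str], key: MemoryKey, value: Optional[str]) -> Op:
    """Apply one candidate memory to the store and report which operation ran."""
    existing = store.get(key)
    if value is None:  # candidate retracts a preference
        if existing is None:
            return Op.NOOP
        del store[key]
        return Op.DELETE
    if existing is None:
        store[key] = value
        return Op.ADD
    if existing == value:
        return Op.NOOP  # nothing new learned
    store[key] = value
    return Op.UPDATE
```

Returning the operation makes every change to the store auditable, which fits the effectiveness tracking that follows reconciliation.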

Standards & Best Practices:

  • Privacy & Security: Emails remain in PostgreSQL on local infrastructure. Service account authentication isolates Gmail access. Sensitive fields (API keys, service account JSON) live in .env outside version control. Attachment uploads go to organization-owned Google Drive.

  • Tone Matching: Writing instructions encode the publisher's voice: warm, direct, business-development oriented. Anti-patterns are explicitly blocked (no em-dashes, no "I hope this email finds you well", no corporate jargon). Few-shot examples ground generation in proven successful responses.

  • Accuracy Safeguards: The Question tool lets the model ask for information rather than guess. Date awareness prevents embarrassing "looking forward to your event" responses for past events. The quality gate blocks memory extraction from trivial edits (typo fixes, whitespace changes) that would pollute the learned preference corpus.
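A quality gate of the kind described above could be approximated by comparing the draft with the text the reviewer actually sent and ignoring edits that are cosmetic or very small. The sketch below uses Python's standard `difflib`; the normalization rules and the 5% change threshold are assumptions, not the system's actual criteria.

```python
import difflib
import re


def _normalize(text: str) -> str:
    """Collapse whitespace and case so cosmetic changes compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def passes_quality_gate(draft: str, sent: str, min_change: float = 0.05) -> bool:
    """Return True only when the edit is large enough to be worth learning from."""
    a, b = _normalize(draft), _normalize(sent)
    if a == b:
        return False  # whitespace-only or case-only edit
    similarity = difflib.SequenceMatcher(None, a, b).ratio()
    return (1.0 - similarity) >= min_change
```

A one-character typo fix in a sentence-length draft changes well under 5% of the text and is rejected, while a rewritten reply passes and can proceed to memory extraction.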

Evolution:

The Email Assistant began as a simple binary triage system in October 2025, classifying emails as "needs response" or "no action." Within weeks, user feedback drove expansion to 4-tier classification to distinguish "notification-only" emails from true no-action items.

The drafting system shipped in November 2025, initially using static prompts. Few-shot retrieval was added in December after observing inconsistent voice matching. The breakthrough came when historical response data was processed to create a consolidated corpus of 1,417 high-quality examples (reduced from 11,332 via outcome-aware ranking).

January 2026 saw the reflection system - an attempt to learn from human edits by sending raw diffs to Claude for preference extraction. Research revealed this approach degrades over time without verification. The reflection system was replaced in February 2026 with the current adaptive memory architecture based on deterministic feedback parsing and effectiveness tracking.

Browser automation matured through March 2026, evolving from simple RSVP filling to the SmartFormSkill that can handle organizational information forms, surveys, and grant applications by composing appropriate responses from memory.


Section 3: Impact on News Operations

Time Savings:

The Email Assistant eliminates the most repetitive aspects of inbox management:

  • Triage: What previously required scanning every email (2-3 hours/day) is now pre-classified. The reviewer sees actionable items first, notifications separately, and never sees spam.

  • Response drafting: Common email types (press releases, event invitations, collaboration inquiries) get complete draft responses in seconds rather than requiring composition from scratch.

  • Form filling: Event RSVPs that took 5-10 minutes each (navigate, fill fields, submit) are now handled in under a minute with human approval.

Quality Improvements:

AI-assisted drafting produces more consistent communication than human-only workflows:

  • No missed follow-ups: Every actionable email gets a draft response queued

  • Consistent routing: Press releases always get routed to editorial AND pitched for advertising

  • Historical context: The system knows the full communication history with each sender, preventing duplicate asks or inconsistent responses

  • Deadline awareness: Date-aware prompting prevents embarrassing responses to stale emails

Relationship Management:

The sender profile system (sender_profiles with 10K records) provides instant context on any email sender:

  • Organization and role classification

  • Communication frequency and patterns

  • Prior response history

  • Network connections (who else they communicate with)

This context flows into draft generation, ensuring responses acknowledge existing relationships rather than treating every email as a cold contact.

Mission Alignment:

For a single-person newsroom, email management is existential. Without automation, the publisher's time is consumed by administrative correspondence rather than journalism. The Email Assistant inverts this equation: AI handles the mechanical work (classification, drafting, form filling) while humans retain control over final communication decisions.

The system explicitly supports the publisher's business development role - responses to promotional content automatically include advertising pitches alongside editorial routing. This dual-track approach ensures no business opportunity is missed while maintaining editorial credibility.

The adaptive memory system means the assistant gets better over time. Every edit teaches it something new about communication preferences. After six months of operation, the system requires fewer human corrections because it has learned the nuances of when to be formal versus casual, which senders prefer brief responses, and how to handle recurring request types.