Teaching GPT-4 to Slide Into DMs (Professionally)
The Issue
Here's a universal truth in sales: a lead who fills out a form at 2 AM, fueled by career anxiety and caffeine, will not remember you exist by 9 AM. Multiply that by thousands of leads per month. The sales team was drowning -- not in leads, but in the graveyard of leads that went cold 48 hours after first contact.
Manual follow-ups don't scale. Emails have an open rate that hovers somewhere between "spam" and "your aunt's forwarded chain mail." But WhatsApp? WhatsApp messages get read. Every single one. People will ignore a calendar invite from their manager but open a WhatsApp message from an unknown number within minutes. We decided to meet leads where they already live.
The Goal
Build a GPT-4 powered WhatsApp agent that:
- Re-engages cold leads through natural conversation -- not "Dear Sir/Madam" energy
- Extracts structured data (name, role, company, course interest) from freeform chat
- Knows when a human sales rep has taken over and shuts up accordingly
- Supports broadcast outbound messaging for scholarship alerts, session reminders, and no-show follow-ups
- Patches collected data back into the user table so the CRM stays current without manual entry
The Solution
WATI API as the WhatsApp backbone. All message send/receive flows through WATI's API. Inbound messages hit a Fastify webhook (/services/wati), get validated, and are queued into BullMQ for async processing. Outbound broadcasts use WATI's template message API with provider-specific template name mappings.
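The provider-specific template mapping can be sketched as a plain lookup keyed by broadcast type. The broadcast-type keys and template names below are hypothetical stand-ins, not the real mappings:

```typescript
// Hypothetical mapping from internal broadcast types to WATI template names.
// In production these names must match templates approved in the WATI dashboard.
const TEMPLATE_MAP: Record<string, string> = {
  scholarship_alert: "scholarship_alert_v2",
  session_reminder: "session_reminder_v1",
  no_show_followup: "no_show_followup_v1",
};

// Resolve the template to send, failing loudly on an unmapped broadcast type
// so a typo never silently drops a campaign.
function resolveTemplate(broadcastType: string): string {
  const name = TEMPLATE_MAP[broadcastType];
  if (!name) throw new Error(`no WATI template mapped for "${broadcastType}"`);
  return name;
}
```

Failing fast here matters: WATI rejects sends against unknown template names, and it is far easier to catch that at dispatch time than to debug a silent no-op broadcast.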
Structured prompt design, not "be helpful." The GPT-4 agent uses a Role-Goals-Constraints framework baked into the system message. The inbound agent's prompt defines a strict data collection priority order (Name > Course > Role > Company > Experience > Query) and explicit rules: never re-ask confirmed details, know when to stop, refuse off-topic questions. The outbound re-engagement agent follows a Diagnose-Resolve-Convert flow with message chunking rules (max 3 bubbles per phase, 2-3 sentences each). This isn't prompt engineering -- it's prompt architecture.
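A minimal sketch of how a Role-Goals-Constraints system message might be assembled. The section contents are condensed illustrations of the rules described above, not the production prompt:

```typescript
// Assemble a system message from the three sections of the
// Role-Goals-Constraints framework.
interface PromptSpec {
  role: string;
  goals: string[];
  constraints: string[];
}

function buildSystemPrompt(spec: PromptSpec): string {
  return [
    `## Role\n${spec.role}`,
    `## Goals\n${spec.goals.map((g, i) => `${i + 1}. ${g}`).join("\n")}`,
    `## Constraints\n${spec.constraints.map((c) => `- ${c}`).join("\n")}`,
  ].join("\n\n");
}

// Condensed illustration of the inbound agent's spec.
const inboundPrompt = buildSystemPrompt({
  role: "You are a course advisor chatting with a lead on WhatsApp.",
  goals: [
    "Collect, in priority order: Name > Course > Role > Company > Experience > Query.",
  ],
  constraints: [
    "Never re-ask a detail already confirmed.",
    "Stop once all fields are collected or the user disengages.",
    "Refuse off-topic questions politely.",
  ],
});
```

Keeping the spec as data rather than one long string is what makes the "decision tree, not vibes" property maintainable: each rule is a line item you can add, remove, or A/B test independently.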
Zod-validated structured output. The agent's response schema is enforced via zod and OpenAI's withStructuredOutput(). Every response includes both responseToPerson (the message text) and collectedData (a structured object with fields like NAME, CURRENT_ROLE, COURSE_NAME). The model doesn't just chat -- it fills a form while chatting.
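In production the schema lives in zod and is handed to withStructuredOutput(); to keep this sketch dependency-free, the same response shape is mirrored with a hand-rolled type guard (field names follow the description above):

```typescript
// The structured object the model fills while chatting.
interface CollectedData {
  NAME?: string;
  CURRENT_ROLE?: string;
  CURRENT_COMPANY?: string;
  COURSE_NAME?: string;
}

// Every agent turn carries both the message text and the extracted fields.
interface AgentResponse {
  responseToPerson: string;
  collectedData: CollectedData;
}

// Runtime guard mirroring what zod's schema.parse() enforces in production.
function isAgentResponse(x: unknown): x is AgentResponse {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  return (
    typeof r.responseToPerson === "string" &&
    typeof r.collectedData === "object" &&
    r.collectedData !== null
  );
}
```

The dual-field shape is the whole trick: a single model call both produces user-facing text and advances the data collection, with the schema guaranteeing neither half goes missing.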
Operator detection at the webhook layer. When a sales rep manually replies via WATI's dashboard, the webhook payload includes an operatorEmail field. If present, the message is logged but the AI agent is never invoked. This check happens early in the webhook handler -- before the message even hits the queue. Simple, but getting this wrong means the AI argues with your own sales team.
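The check itself fits in a few lines; what matters is where it runs. A sketch of the decision as a pure function called from the webhook handler, before anything is enqueued (field names are illustrative, modeled on the payload described above):

```typescript
// Minimal shape of an inbound WATI webhook payload (illustrative fields).
interface WebhookPayload {
  waId: string;            // sender's WhatsApp number
  text: string;            // message body
  operatorEmail?: string;  // present when a rep replied from the WATI dashboard
}

type Disposition = "enqueue-for-agent" | "log-only";

// Runs in the webhook handler, before anything is pushed to BullMQ.
// A message from a human operator is logged but never reaches the agent.
function disposition(p: WebhookPayload): Disposition {
  return p.operatorEmail ? "log-only" : "enqueue-for-agent";
}
```

Because the function is pure and sits in front of the queue, it is trivially unit-testable and there is no window in which a human-handled message can already be in flight to the agent.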
Conversation context via collected_data persistence. Each bot response is saved to the whatsapp_message table alongside a collectedData JSONB column. Before generating a new response, the system fetches the last bot message's collectedData and injects it into the prompt as previously_collected_data. The agent never asks "what's your name?" twice.
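The merge between the last bot message's collectedData and a fresh extraction can be sketched like this, assuming an empty string or null means "not extracted this turn":

```typescript
type Collected = Record<string, string | null | undefined>;

// Layer the fresh extraction over the previously persisted fields,
// never letting an empty extraction erase a value the user already confirmed.
function mergeCollected(previous: Collected, fresh: Collected): Collected {
  const merged: Collected = { ...previous };
  for (const [key, value] of Object.entries(fresh)) {
    if (value != null && value !== "") merged[key] = value;
  }
  return merged;
}
```

The asymmetry is deliberate: new information wins over old, but absence of information never wins over presence. That single rule is what keeps the agent from re-asking confirmed details.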
Broadcast outbound system. Scholarship notifications, session reminders, and re-engagement templates flow through BullMQ workers. The scholarship engine checks eligibility (quiz scores plus lead ratings), selects the right template, and dispatches on both channels with a 20-minute delay after opportunity creation.
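The eligibility gate and the dispatch delay reduce to a few lines. The thresholds below are hypothetical stand-ins (the real engine's cutoffs aren't stated here), and the comment shows how the 20-minute deferral maps onto BullMQ's job `delay` option:

```typescript
// Hypothetical thresholds -- illustrative only, not the production cutoffs.
const MIN_QUIZ_SCORE = 60;
const MIN_LEAD_RATING = 3;
const DISPATCH_DELAY_MS = 20 * 60 * 1000; // 20 minutes after opportunity creation

interface Lead {
  quizScore: number;
  leadRating: number;
}

// A lead qualifies for the scholarship broadcast only if both signals clear.
function isScholarshipEligible(lead: Lead): boolean {
  return lead.quizScore >= MIN_QUIZ_SCORE && lead.leadRating >= MIN_LEAD_RATING;
}

// With BullMQ, the deferral is the job's `delay` option, e.g.:
// await scholarshipQueue.add("notify", { leadId }, { delay: DISPATCH_DELAY_MS });
```

Pushing the delay into the queue (rather than a setTimeout in the process) means the deferral survives restarts and deploys, which is the point of routing broadcasts through workers in the first place.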
Architecture
Complexities Faced
Structured extraction from casual chat. A user saying "I'm Rahul, been doing backend stuff at Flipkart for 3 years" needs to map to {NAME: "Rahul", CURRENT_ROLE: "backend", CURRENT_COMPANY: "Flipkart"}. The Zod schema enforcement helps, but the prompt had to be iteratively tuned to avoid the model hallucinating company names or inventing roles the user never mentioned.
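Beyond prompt tuning, a cheap post-extraction guard can catch the worst hallucinations. This is an illustrative complement, not the fix the team shipped: drop any extracted value that never appears verbatim in the user's own messages, which catches the model inventing companies or roles out of thin air:

```typescript
// Keep only extracted values that are literally present in the user's text.
// Crude (misses paraphrases like "I write backend code" -> "backend engineer"),
// but it guarantees the model cannot invent a company the user never named.
function groundExtraction(
  extracted: Record<string, string>,
  userText: string,
): Record<string, string> {
  const haystack = userText.toLowerCase();
  const grounded: Record<string, string> = {};
  for (const [field, value] of Object.entries(extracted)) {
    if (haystack.includes(value.toLowerCase())) grounded[field] = value;
  }
  return grounded;
}
```

A guard like this trades recall for precision -- it will drop legitimately paraphrased fields -- so it suits fields where a wrong value is worse than a missing one, like company names written into a CRM.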
Operator detection sounds trivial until it isn't. The first implementation checked operatorEmail inside the message processing function. This meant the message was already queued and partially processed before the AI was told to stand down. Moving the check to the webhook handler -- before enqueuing -- eliminated race conditions where the AI and a human would both reply.
Message threading on WhatsApp. WhatsApp has no native threading like Slack. We added a parent_message_id foreign key on the whatsapp_message table to link each user reply to the outbound template it answers. This is critical for the session attendance cron -- it needs to know "did the mentee reply to this specific reminder?"
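Resolving a reply back to its outbound template is then a parent lookup. A sketch against an in-memory message map (the production version would be a join on parent_message_id; field names here mirror the column names above):

```typescript
// Minimal shape of a row in whatsapp_message (illustrative fields).
interface WhatsappMessage {
  id: number;
  parentMessageId: number | null; // FK to the outbound message this replies to
  templateName?: string;          // set on outbound template messages
}

// Given a reply, find the outbound template it answers. The attendance cron
// uses this to ask "did the mentee reply to this specific reminder?"
function findParentTemplate(
  reply: WhatsappMessage,
  messages: Map<number, WhatsappMessage>,
): string | undefined {
  const parent =
    reply.parentMessageId != null ? messages.get(reply.parentMessageId) : undefined;
  return parent?.templateName;
}
```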
Dummy email accounts. Many leads sign up with auto-generated @dummy.live emails. The patch logic explicitly checks for this pattern before writing, so conversationally extracted data only ever updates these placeholder accounts and never overwrites a real user's profile.
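The guard reduces to a pattern check on the stored email, run before any patch. A sketch assuming the @dummy.live suffix convention described above:

```typescript
// Placeholder accounts carry auto-generated @dummy.live addresses.
// Only these are safe targets for chat-extracted profile patches.
function isDummyEmail(email: string): boolean {
  return /@dummy\.live$/i.test(email.trim());
}
```

Usage: call isDummyEmail(user.email) before applying the collectedData patch; a real address means the profile was filled in by the user and the chat-extracted fields stay in the message log only.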
What I Learned
Prompt engineering for conversational agents is a fundamentally different discipline than prompt engineering for one-shot data extraction. A conversation has state, memory, and the model needs to know when it's done. The Role-Goals-Constraints framework proved far more reliable than free-form instructions -- it gives the model a decision tree, not vibes.
The real complexity in AI-powered messaging isn't the AI. It's the handoff paths: AI to human, broadcast to conversation, extracted data to database. Each transition is a potential failure mode. Build those paths first, add the AI second.
And if you're building on third-party messaging APIs: their webhook will send you things you didn't ask for, in formats you didn't expect, at volumes you didn't plan for. Validate early, queue everything, and never trust a payload you haven't logged.