LLM Prompt Engineering Production

Defensive Prompt Engineering for Production

Battle-tested techniques to make your LLM prompts robust, reliable, and production-ready

September 20, 2024 • 3 min read

Defensive Prompt Engineering for Production

Your prompt works great in testing. Then production happens.

Users input weird edge cases. The LLM hallucinates. Your carefully crafted template breaks. Here’s how to build prompts that survive contact with real users.

The Core Principles

Be explicit, not clever
Constrain the output space
Handle errors gracefully
Test edge cases relentlessly

Technique 1: Structured Outputs

Bad:

Analyze this lead and tell me if they're a good fit.

Good:

Analyze this lead and respond with ONLY valid JSON in this exact format:
{
  "score": <number 0-100>,
  "reasoning": "<string max 200 chars>",
  "risk_factors": ["<string>", ...],
  "recommended_action": "route_to_sales|nurture|disqualify"
}

Requirements:
- score must be 0-100
- reasoning must explain the score
- recommended_action must be one of the three options listed
- If unsure, set score to 50 and explain uncertainty in reasoning

Why this works:

Parser can validate the output
LLM has clear constraints
Edge cases are handled explicitly

Technique 2: Few-Shot Examples

Include examples of tricky cases:

Examples:

Input: "Company: TechCorp, Employees: 5000, Industry: SaaS"
Output: {"score": 85, "reasoning": "Enterprise SaaS company in target segment", ...}

Input: "Company: Mom's Bakery, Employees: 2, Industry: Food"
Output: {"score": 15, "reasoning": "Below minimum company size, wrong industry", ...}

Input: "Company: [INCOMPLETE DATA], Employees: unknown"
Output: {"score": 50, "reasoning": "Insufficient data to score accurately", ...}

Now analyze this lead:
{lead_data}

The third example teaches the model how to handle missing data.

Technique 3: Guardrails

Add explicit safety checks:

Before responding:
1. Verify you have enough information to answer
2. Check if the question is within your scope
3. If you're uncertain, say so explicitly

DO NOT:
- Make up information not in the context
- Speculate about financial data
- Provide legal or medical advice

Technique 4: Output Validation

Never trust LLM output blindly:

def validate_lead_score(response):
    try:
        data = json.loads(response)
        
        # Check required fields
        required = ["score", "reasoning", "recommended_action"]
        if not all(k in data for k in required):
            return None, "Missing required fields"
        
        # Validate types and ranges
        if not 0 <= data["score"] <= 100:
            return None, "Score out of range"
        
        if data["recommended_action"] not in ["route_to_sales", "nurture", "disqualify"]:
            return None, "Invalid action"
        
        return data, None
    except json.JSONDecodeError:
        return None, "Invalid JSON"

If validation fails, retry with a different prompt or use fallback logic.

Technique 5: Versioning and Testing

Treat prompts like code:

/prompts
  /lead_scoring
    v1_baseline.txt
    v2_improved_examples.txt
    v3_structured_output.txt

Run A/B tests:

Version A: 78% accuracy
Version B: 85% accuracy ✅

Deploy B, keep A as fallback.

Real-World Example

Here’s a production prompt I use for email generation:

You are an expert email writer for B2B sales outreach.

TASK: Write a personalized email to {prospect_name} at {company_name}

CONTEXT:
- Their role: {role}
- Company info: {company_info}
- Recent news: {news}

REQUIREMENTS:
1. Subject line: max 50 characters, no clickbait
2. Body: 80-120 words
3. Include ONE specific detail from context
4. End with clear, low-friction CTA
5. Tone: professional but conversational
6. DO NOT mention pricing or demos in first email

OUTPUT FORMAT:
Subject: <string>
Body: <string>

EXAMPLE:
Subject: Quick question about {company}'s automation
Body: Hi {name}, I noticed {company} recently {news_item}. Many companies at your scale struggle with... [continue]

Now write the email:

This prompt:

✅ Has clear constraints
✅ Includes context
✅ Provides an example
✅ Specifies exact output format
✅ Handles edge cases (no pricing in first email)

Testing Your Prompts

Build a test suite:

Happy path: Normal, well-formed inputs
Edge cases: Missing data, extreme values
Adversarial: Trying to break the system
Regression: Past failures

Run this suite on every prompt change.

Key Takeaways

Structure your outputs - Use JSON, XML, or strict formats
Show examples - Especially for edge cases
Add guardrails - Explicit dos and don’ts
Validate outputs - Never trust the LLM blindly
Version and test - Treat prompts like code

Production-grade prompts aren’t clever. They’re defensive, explicit, and boring. That’s what makes them reliable.

Isragel Andres

AI Specialist focused on RAG systems, workflow automation, and AI agents. I build production-ready AI systems with measurable outcomes.

GitHub LinkedIn

The Complete Guide to RAG Evaluation

Learn how to measure and improve your RAG system with practical metrics and evaluation frameworks

RAG LLM

← Back to Blog

Defensive Prompt Engineering for Production

Defensive Prompt Engineering for Production

The Core Principles

Technique 1: Structured Outputs

Technique 2: Few-Shot Examples

Technique 3: Guardrails

Technique 4: Output Validation

Technique 5: Versioning and Testing

Real-World Example

Testing Your Prompts

Key Takeaways

Related Articles

The Complete Guide to RAG Evaluation

Install App