Defensive Prompt Engineering for Production
Battle-tested techniques to make your LLM prompts robust, reliable, and production-ready
Defensive Prompt Engineering for Production
Your prompt works great in testing. Then production happens.
Users input weird edge cases. The LLM hallucinates. Your carefully crafted template breaks. Here’s how to build prompts that survive contact with real users.
The Core Principles
- Be explicit, not clever
- Constrain the output space
- Handle errors gracefully
- Test edge cases relentlessly
Technique 1: Structured Outputs
Bad:
Analyze this lead and tell me if they're a good fit.
Good:
Analyze this lead and respond with ONLY valid JSON in this exact format:
{
"score": <number 0-100>,
"reasoning": "<string max 200 chars>",
"risk_factors": ["<string>", ...],
"recommended_action": "route_to_sales|nurture|disqualify"
}
Requirements:
- score must be 0-100
- reasoning must explain the score
- recommended_action must be one of the three options listed
- If unsure, set score to 50 and explain uncertainty in reasoning
Why this works:
- Parser can validate the output
- LLM has clear constraints
- Edge cases are handled explicitly
Technique 2: Few-Shot Examples
Include examples of tricky cases:
Examples:
Input: "Company: TechCorp, Employees: 5000, Industry: SaaS"
Output: {"score": 85, "reasoning": "Enterprise SaaS company in target segment", ...}
Input: "Company: Mom's Bakery, Employees: 2, Industry: Food"
Output: {"score": 15, "reasoning": "Below minimum company size, wrong industry", ...}
Input: "Company: [INCOMPLETE DATA], Employees: unknown"
Output: {"score": 50, "reasoning": "Insufficient data to score accurately", ...}
Now analyze this lead:
{lead_data}
The third example teaches the model how to handle missing data.
Technique 3: Guardrails
Add explicit safety checks:
Before responding:
1. Verify you have enough information to answer
2. Check if the question is within your scope
3. If you're uncertain, say so explicitly
DO NOT:
- Make up information not in the context
- Speculate about financial data
- Provide legal or medical advice
Technique 4: Output Validation
Never trust LLM output blindly:
def validate_lead_score(response):
try:
data = json.loads(response)
# Check required fields
required = ["score", "reasoning", "recommended_action"]
if not all(k in data for k in required):
return None, "Missing required fields"
# Validate types and ranges
if not 0 <= data["score"] <= 100:
return None, "Score out of range"
if data["recommended_action"] not in ["route_to_sales", "nurture", "disqualify"]:
return None, "Invalid action"
return data, None
except json.JSONDecodeError:
return None, "Invalid JSON"
If validation fails, retry with a different prompt or use fallback logic.
Technique 5: Versioning and Testing
Treat prompts like code:
/prompts
/lead_scoring
v1_baseline.txt
v2_improved_examples.txt
v3_structured_output.txt
Run A/B tests:
- Version A: 78% accuracy
- Version B: 85% accuracy ✅
Deploy B, keep A as fallback.
Real-World Example
Here’s a production prompt I use for email generation:
You are an expert email writer for B2B sales outreach.
TASK: Write a personalized email to {prospect_name} at {company_name}
CONTEXT:
- Their role: {role}
- Company info: {company_info}
- Recent news: {news}
REQUIREMENTS:
1. Subject line: max 50 characters, no clickbait
2. Body: 80-120 words
3. Include ONE specific detail from context
4. End with clear, low-friction CTA
5. Tone: professional but conversational
6. DO NOT mention pricing or demos in first email
OUTPUT FORMAT:
Subject: <string>
Body: <string>
EXAMPLE:
Subject: Quick question about {company}'s automation
Body: Hi {name}, I noticed {company} recently {news_item}. Many companies at your scale struggle with... [continue]
Now write the email:
This prompt:
- ✅ Has clear constraints
- ✅ Includes context
- ✅ Provides an example
- ✅ Specifies exact output format
- ✅ Handles edge cases (no pricing in first email)
Testing Your Prompts
Build a test suite:
- Happy path: Normal, well-formed inputs
- Edge cases: Missing data, extreme values
- Adversarial: Trying to break the system
- Regression: Past failures
Run this suite on every prompt change.
Key Takeaways
- Structure your outputs - Use JSON, XML, or strict formats
- Show examples - Especially for edge cases
- Add guardrails - Explicit dos and don’ts
- Validate outputs - Never trust the LLM blindly
- Version and test - Treat prompts like code
Production-grade prompts aren’t clever. They’re defensive, explicit, and boring. That’s what makes them reliable.
Related Articles
The Complete Guide to RAG Evaluation
Learn how to measure and improve your RAG system with practical metrics and evaluation frameworks