Prompt Engineering for Developers: The 5 Essential Patterns

The 5 prompt engineering patterns: zero-shot, few-shot, CoT, role prompting, and structured output with real TypeScript code.

Contributors: Ivan Garcia Villar

Prerequisites: To follow this post, you need to know what an API is and have made an HTTP call from code before. You don’t need to know what an LLM is, what a token is, or have worked with AI before. I’ll explain those concepts before using them.

SDK setup before the examples. First install the SDK:

npm install @anthropic-ai/sdk

Then initialize the client once:

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY, // environment variable, don't hardcode
});

All snippets in this post assume anthropic is initialized with this block.

The first time I integrated an LLM into real code, I wrote something like “Give me a summary of this text.” The model returned a summary. Fine. But sometimes it was too long, sometimes in English even though the text was in Spanish, and sometimes it added phrases like “Sure! Here’s the summary:” that broke my parser. The problem wasn’t the model. It was me.

Why doesn’t the model “understand” what you’re asking?

Imagine you hand a sticky note to a new intern: “summarize this.” It works if the intern has context, knows what format you expect, and what the summary is for. Without that context, each person interprets “summarize this” differently.

An LLM (large language model: the kind of AI behind ChatGPT and Claude) works similarly. It doesn’t “understand” your intention. It predicts what text is likely to come next based on the text you give it. That means if you change how you write the instruction, you change the result—sometimes dramatically.

The prompt is the technical specification you give the model. A sticky note produces random results. A clear specification produces predictable results.

Prompt engineering patterns are proven ways to write that specification to get the result you need. There are five patterns that show up in almost every real project.

The 5 Patterns You Need to Know

Zero-shot: The Direct Instruction

Zero-shot is the simplest pattern: you write the instruction directly, with no prior examples. The name comes from “zero shots” (zero examples).

It works well for simple, clear tasks: translating text, answering factual questions, generating lists of ideas. When the task is ambiguous or requires very specific formatting, zero-shot often produces inconsistent results.

// Zero-shot: just the instruction, no prior examples
const response = await anthropic.messages.create({
  model: "claude-opus-4-6",  // the model we're using
  max_tokens: 1024,           // response limit in tokens (1 token ≈ 4 characters in English; slightly less in Spanish)
  messages: [
    {
      role: "user",           // who's speaking: us
      content: "Classify this email as urgent or normal: 'The server is down.'"
    }
  ]
});

// The response text is here. In TypeScript, content[0] is a union of
// block types, so narrow to a text block before reading .text:
const block = response.content[0];
if (block.type === "text") {
  console.log(block.text);
}

max_tokens is the maximum number of “text chunks” the model can write in its response. For short responses, 1024 is more than enough.
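If you want a quick sanity check on budgets before a call, the 1 token ≈ 4 characters heuristic can be turned into a helper. This is a rough approximation based on that rule of thumb, not the model’s real tokenizer:

```typescript
// Rough token estimate using the ~4 characters-per-token heuristic above.
// Good enough for budgeting; not the model's actual tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Quick check before a call: will the response budget likely suffice?
function fitsInBudget(expectedResponse: string, maxTokens: number): boolean {
  return estimateTokens(expectedResponse) <= maxTokens;
}
```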

Few-shot: Teaching with Examples

Few-shot adds two or three example pairs (input → output) before the actual task. The model learns the pattern from your examples and applies it to the new case. It’s especially useful when output format matters a lot or when the task has nuances hard to describe with words.

// Few-shot: we give examples before the actual task
const prompt = `Classify the sentiment. Respond only with: positive, negative, or neutral.

Email: "The shipment arrived earlier than expected"
Sentiment: positive

Email: "I've been waiting three days and nobody responds"
Sentiment: negative

Email: "The product works as described"
Sentiment: neutral

Email: "The quality doesn't justify the price"
Sentiment:`;

// The model completes the pattern with the correct sentiment.
// To call the API, pass this string as content in messages[0].content:
// messages: [{ role: "user", content: prompt }]

With three well-chosen examples, the model learns the tone, format, and logic you expect. Without them, it might respond with “The sentiment of this email is negative because…” instead of the simple “negative” your parser needs.
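When the examples live in code, it helps to generate the prompt instead of hand-editing a string. A minimal sketch reusing this section’s Email/Sentiment format (the interface and function names are mine, not part of any SDK):

```typescript
// One labeled example pair for the few-shot prompt.
interface Example {
  input: string;
  output: string;
}

// Builds a few-shot prompt: instruction, example pairs, then the new input,
// ending right where the model should complete the pattern.
function buildFewShotPrompt(
  instruction: string,
  examples: Example[],
  newInput: string
): string {
  const shots = examples
    .map((ex) => `Email: "${ex.input}"\nSentiment: ${ex.output}`)
    .join("\n\n");
  return `${instruction}\n\n${shots}\n\nEmail: "${newInput}"\nSentiment:`;
}
```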

Chain-of-Thought: Thinking Before Answering

Chain-of-Thought (CoT for short) means asking the model to reason step by step before giving the final answer. You activate it by adding phrases like “Think step by step” at the end of the instruction.

Why it works: LLMs make more errors on reasoning problems when they answer directly. By forcing intermediate steps, the model uses its own reasoning as context for the next step and makes fewer mistakes.

// Chain-of-Thought: we force explicit reasoning
const content = `I have 3 boxes. Each box has 4 bags.
Each bag has 6 red marbles and 2 blue marbles.
How many marbles are there in total?

Think step by step and show each operation before giving the final answer.`;

// Without "think step by step", the model might jump to an incorrect answer.
// With it, the reasoning stays visible and errors decrease.
// To call the API, pass this string as content in messages[0].content:
// messages: [{ role: "user", content }]

Sometimes just adding that phrase at the end of any reasoning instruction is enough. You don’t need a long prompt.
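If your code needs just the final number out of the visible reasoning, one option (a convention of my own, not an API feature) is to ask the model to end with a fixed marker like “Final answer: <number>” and parse it:

```typescript
// Extracts the final answer when the prompt asks the model to end with
// "Final answer: <number>". The marker is our own convention.
function parseFinalAnswer(responseText: string): number | null {
  const match = responseText.match(/Final answer:\s*(-?\d+(?:\.\d+)?)/i);
  return match ? Number(match[1]) : null;
}
```

In the marbles prompt above you would add one line: “End with: Final answer: <number>”.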

Role Prompting: Changing the Perspective

Role prompting assigns a role to the model before the task: “You’re a senior developer with 10 years of TypeScript experience.” This changes more than tone: it affects what information the model considers relevant to mention, what it assumes you already know, and what technical depth it uses.

The right place for the role is the system prompt, a special instruction that comes before the conversation and defines the model’s behavior for the entire session. Think of it as the job description you give the intern on day one.

// Role prompting: the role goes in the system prompt, not in the user message
const response = await anthropic.messages.create({
  model: "claude-opus-4-6",
  // system applies to the entire session, not just this message
  system: `You're a senior TypeScript developer with REST API experience.
When you review code, point out bugs first, then performance issues.
Be direct: don't explain what's already obvious to someone with technical experience.`,
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: "Review this function:\n\nfunction getUsers() {\n  return db.query('SELECT * FROM users');\n}"
    }
  ]
});

The difference between a vague role (“you’re an expert”) and a specific one (“you’re a senior TypeScript developer specialized in REST APIs”) is clear in response quality. The more concrete the role, the fewer “it could be X, Y, or Z” responses.

Structured Output: Integrating LLMs into Code Pipelines

Structured Output means asking the model to respond in JSON with an exact schema. It’s the most important pattern for integrating LLMs into real applications, because you need the response to be parseable by your code without manual processing.

// Structured Output: we define the exact schema we expect
const content = `Analyze this comment and return ONLY valid JSON,
with no additional text, with this exact structure:

{
  "sentiment": "positive" | "negative" | "neutral",
  "confidence": number between 0 and 1,
  "topics": array of strings with maximum 3 elements
}

Comment: "The app is slow but the design is very well crafted"`;

const response = await anthropic.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content }]
});

// We parse the response — always inside try/catch.
// content[0] is a union of block types, so check it's a text block first.
const block = response.content[0];
const text = block.type === "text" ? block.text : "";
try {
  const data = JSON.parse(text.trim()); // data.sentiment, data.confidence, data.topics
  console.log(data);
} catch (err) {
  // The model returned something that isn't valid JSON
  console.error("Error parsing JSON:", err, "Response received:", text);
}

Writing “ONLY valid JSON, with no additional text” reduces format deviations. For production there are more robust techniques: the post on Programmatic Tool Calling in TypeScript (where the model can call functions you define instead of just generating text) covers how to use validated schemas at the protocol level.
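Short of tool calling, a tolerant parser helps in practice, because models sometimes prepend text despite the instructions. A pragmatic sketch (the function is mine, not part of the SDK):

```typescript
// Extracts the first JSON object from a response, tolerating a preamble
// the model sometimes adds despite the instructions. A pragmatic fallback,
// not a substitute for tool calling with a validated schema.
function extractJson(text: string): unknown {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end === -1 || end < start) return null;
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    return null; // the slice still wasn't valid JSON
  }
}
```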

The Anatomy of a Perfect Prompt

An effective prompt has five layers. Most developers only use two of them (instruction and task) and wonder why results are inconsistent.

Role is who the model is for this prompt (“You’re a data analyst specializing in e-commerce”). Goes in the system prompt.

Context is background information the model needs but doesn’t have (“Our store sells second-hand products; reviews are usually brief and informal in tone”). Without context, the model makes assumptions.

Instruction is exactly what it needs to do (“Classify each review into one of these categories: product, shipping, customer service”). One instruction per prompt. When you add two, the model usually prioritizes one and neglects the other.

Format is how you want it to respond (“Return only the category, with no explanation”). If you don’t specify it, the model chooses. In production you can’t depend on that choice.

Examples are the input → output pairs from the few-shot pattern. They go at the end, right before the actual input.

Not every prompt needs all five layers. A simple zero-shot can work fine with just instruction and format. But when results are inconsistent, the first thing to check is which layer is missing.
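The layers can also be assembled mechanically, which makes it obvious which one is missing. A sketch using this section’s terminology (the interface is mine, not an SDK type); the role layer is omitted because it belongs in the system prompt:

```typescript
// The layers of an effective prompt, in the order described above.
// Role is excluded: it goes in the system prompt, not in this string.
interface PromptLayers {
  context?: string;
  instruction: string; // exactly one task
  format?: string;
  examples?: string; // few-shot pairs, right before the input
  input: string;
}

// Joins the present layers with blank lines, preserving the order.
function buildPrompt(layers: PromptLayers): string {
  return [layers.context, layers.instruction, layers.format, layers.examples, layers.input]
    .filter((part): part is string => Boolean(part))
    .join("\n\n");
}
```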


When to Use Each Pattern

The choice isn’t arbitrary. Each pattern solves a specific problem.

| Situation | Recommended Pattern | Why | Cost / Limitation |
| --- | --- | --- | --- |
| Simple, clear task (translate, summarize) | Zero-shot | Examples only add tokens without real benefit | Inconsistent on ambiguous tasks or with strict formatting |
| Very specific output format | Few-shot | Examples calibrate format better than any verbal description | Each example adds tokens; on simple tasks the cost exceeds the benefit |
| Reasoning or calculation problem | Chain-of-Thought | Reduces errors by externalizing intermediate steps | Increases latency and tokens; doesn’t help with direct classification |
| Response depends on reader profile | Role prompting | Role calibrates technical level and tone without extra instructions | A vague role (“you’re an expert”) changes nothing; requires specificity |
| LLM inside a code pipeline | Structured Output | Without schema, the parser breaks on any model variation | Fragile to model drift; in production tool calling with validated schema is better |
| Ambiguous task with multiple correct forms | Few-shot + CoT | Examples delimit the space; CoT reduces errors within it | Combination with highest token consumption; reserve for tasks that really need it |

The table gives general guidance, but in practice you need to test. A pattern that works for one model might not work the same for another, even from the same provider. And patterns combine: a typical production prompt has role prompting in the system prompt, few-shot to calibrate format, and structured output for the result.


Advanced Patterns: What Comes Next

Once you master the five base patterns, there are three more worth knowing.

Self-consistency: instead of asking for one answer, you ask for several with high temperature and take the most frequent one. It works well on problems where the model can reach the correct answer through multiple paths. Temperature is the parameter controlling how much variety the model introduces in its responses: high temperature means more variety; low temperature means more determinism. The cost of self-consistency is proportional to the number of responses you generate.
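The voting step of self-consistency is plain code; generating the samples (several API calls with high temperature) is left out here. A minimal sketch:

```typescript
// Majority vote over several sampled answers: the core of self-consistency.
// On a tie, the answer that reached the top count first wins.
function majorityVote(answers: string[]): string | null {
  const counts: Record<string, number> = {};
  let best: string | null = null;
  let bestCount = 0;
  for (const answer of answers) {
    const count = (counts[answer] ?? 0) + 1;
    counts[answer] = count;
    if (count > bestCount) {
      best = answer;
      bestCount = count;
    }
  }
  return best;
}
```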

ReAct (Reason + Act): combines reasoning and tool use in a loop. The model thinks, decides what tool to use, uses it, processes the result, and decides if it needs more information. It’s the foundation of most modern agents. If you want to see tool calling in real code, the post on Programmatic Tool Calling in TypeScript covers it step by step.

Tree of Thought: extends Chain-of-Thought with branching. The model generates multiple “reasoning paths” in parallel, evaluates them, and selects the best. Useful for complex planning problems, though expensive in tokens.

These three patterns are mainly used in agentic systems, where the LLM doesn’t just respond but makes decisions and takes action. To know how to evaluate whether those systems work well in production, the post on how to evaluate AI agents in production covers metrics and concrete setup.

Prompt Engineering in Production

Using prompts in production is different from using them in a prototype. Three problems show up almost always.

Version prompts like code. If you change a prompt in production without logging the change, you’ll lose the ability to know what updated what behavior. Minimum approach: save each version in a dated file (analyze_v2_2026-03.txt) and keep a record of what version is active. Robust approach: treat prompts like any other code artifact—in version control, with tests that run before each change.

Regression testing. When you change a prompt, you need to know if cases that already worked still work. Building a set of expected input/output pairs and running them against the new prompt before deploying is the minimum practice that prevents silent regressions. The post on testing strategy with AI covers how to structure this when the LLM’s output isn’t deterministic.
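A minimal regression runner can look like this sketch. The classifier is injected so tests can stub it; in real use it would wrap the API call (all names here are mine):

```typescript
// One expected input/output pair in the regression set.
interface RegressionCase {
  input: string;
  expected: string;
}

// Runs every case through the classifier and collects the failures.
async function runRegression(
  cases: RegressionCase[],
  classify: (input: string) => Promise<string>
): Promise<{ passed: number; failed: RegressionCase[] }> {
  const failed: RegressionCase[] = [];
  for (const c of cases) {
    const actual = await classify(c.input);
    if (actual.trim() !== c.expected) failed.push(c);
  }
  return { passed: cases.length - failed.length, failed };
}
```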

Monitoring. In production, model behavior can change even if you don’t change the prompt, because providers update models periodically. Logging inputs, outputs, and latencies, and monitoring the distribution of responses over time, lets you detect those changes before users report them.

Common Mistakes

The Vague Prompt

“Summarize this well” is not an instruction. Well for whom? What length? What format? The model fills in the blanks with its own defaults, which almost never match yours. Specify the audience, maximum length, and format. “Summarize in 3 sentences for an executive without technical context. No bullet points.”

Not Specifying Output Format

If you don’t say how you want the response, it can change from call to call. Sometimes the model uses markdown, sometimes plain text, sometimes adds an introduction. In code this is especially problematic because your parser assumes a fixed format and the model doesn’t have to give it to you consistently. Specify the format when output will be processed by code.

Mixing Languages in the Same Prompt

A system prompt in English with instructions in Spanish is a source of inconsistencies. The model tries to infer what language to use for the response. Pick one language for the entire prompt. If business context requires mixing, specify it explicitly: “Respond always in Spanish regardless of the input language.”

Forgetting the Model Has No Memory Between Calls

Each API call is independent. The model doesn’t remember the previous call. This error produces weird bugs: the first call works fine and subsequent ones seem to “forget” what was agreed. It’s not a model bug—you’re asking it questions without the context it needs. If your code makes multiple related calls, you have to include relevant context in each one.
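A common fix is to accumulate the history yourself and send the whole array on each call. A sketch using the same messages shape as the snippets above:

```typescript
// Same shape as the messages array in the earlier snippets.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Appends one completed turn. Returns a new array so earlier snapshots
// of the history stay intact; pass the full result as messages next call.
function appendTurn(
  history: ChatMessage[],
  userContent: string,
  assistantContent: string
): ChatMessage[] {
  return [
    ...history,
    { role: "user", content: userContent },
    { role: "assistant", content: assistantContent },
  ];
}
```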

Putting Too Many Tasks in One Prompt

“Analyze the text, classify sentiment, extract main topics, suggest a response, and evaluate if the user is satisfied” is too much for one prompt. Separate calls with one clear task each produce more reliable results than a single call with five tasks. Quality drops the more things you cram into one instruction.

Implementation Checklist

  • The prompt specifies the model’s role when tone or technical level matters

  • The instruction is in imperative mood and does one thing (“classify”, “extract”, “summarize”)

  • Output format is defined explicitly (JSON, plain text, list, etc.)

  • Prompts are saved as versioned files, not hardcoded in business logic

  • A test set exists with expected input/output pairs to detect regressions

  • JSON responses are validated before parsing (JSON.parse inside try/catch)

  • Response language is specified if input can be multilingual

  • Each call includes necessary context (doesn’t assume memory between calls)

Frequently Asked Questions

What’s the Difference Between a Prompt and a System Prompt?

A prompt is the message you send to the model on each call. A system prompt is a set of instructions that comes before the conversation and defines the model’s general behavior for the entire session: who it is, what it can and can’t do, what format it should respond in. Users interact through prompts; the system prompt defines the rules of the game for the entire interaction.

When Is Few-shot Worth It Instead of Zero-shot?

When output format is very specific or when the task has nuances hard to describe with words. If you spend three attempts adjusting the instruction and output is still inconsistent, add two or three examples. It almost always fixes the problem. If you can describe the task perfectly in a clear instruction, zero-shot is enough and cheaper in tokens.

Does Chain-of-Thought Always Improve Results?

No. On simple classification tasks or factual information retrieval, it adds verbosity without improving accuracy. Its benefit is clear on logical reasoning, math, and step-by-step planning. Use it when the problem requires reasoning, not when it just requires retrieval or classification.

Why Does the Model Sometimes Ignore the JSON Schema I Ask For?

Because LLMs generate text probabilistically, not follow rules deterministically. If the prompt isn’t explicit enough, the model might add text before the JSON, change field names, or include extra fields. Solution in prompts: be very explicit (“Return ONLY the JSON, with no additional text, no markdown, no explanations”). Robust solution in production is to use tool calling in the API, which forces the model to follow the schema at the protocol level.

Do I Need to Master All Patterns Before Integrating an LLM into My Project?

No. Start with zero-shot for simple tasks and add structured output from the beginning if the response needs to be parsed by code. With those two you can build functional applications. You’ll pick up the other patterns as you encounter the specific cases where what you have isn’t enough.