Traditional Chatbot vs LLM: The Differences That Actually Matter

How a chatbot with intent classification works, how one based on an LLM works, and when to choose each approach with concrete examples.

Contributors: Esther Aznar, Ivan Garcia Villar

To follow this post, you need basic Python knowledge. The code examples are short and commented, but it helps to recognize the syntax.

If you’ve ever typed “I want to speak to a person” in a website’s support chat and the bot replied “I don’t understand. Choose an option: 1. Orders 2. Returns”, you already know firsthand what a traditional chatbot is. That rigidity isn’t a design flaw. It’s exactly how they work.

How a traditional chatbot works

A traditional chatbot isn’t just a dictionary of fixed responses. Inside, it uses a classification model: an AI trained to read what the user writes and decide which category it belongs to. Those categories are called intents or “topics”.

For example, you might have these intents:

  • check_balance: the user wants to know how much money they have.

  • block_card: the user wants to block their card.

When the user writes something, the classifier doesn’t search for the exact phrase. It analyzes the message and asks: which intent does this most resemble? Instead of a yes-or-no answer, it returns a confidence score for each intent, something like “this looks 92% like block_card and 4% like check_balance”. If the highest score exceeds the threshold you’ve defined, the chatbot executes that intent’s response. If no intent exceeds it, the chatbot admits it didn’t understand.

# This is a simplification of the actual flow
# In production, classify_intent() calls a trained classifier model

def process_message(message):
    intent, confidence = classify_intent(message)  # classifier model

    if confidence < 0.7:  # confidence threshold
        return "I didn't understand. Choose: 1. Balance  2. Card"

    responses = {
        "check_balance": "Your current balance is €340.",
        "block_card": "Call 900 123 456 to block your card."
    }
    return responses[intent]

Beyond recognizing intents, the chatbot also extracts specific data from the message, usually called entities or slots. If you say “block my Visa card”, the intent is block_card and the extracted value is “Visa”. The system then knows which of your cards you want to block.
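
As a rough illustration of that extraction step, here is a minimal sketch that looks for a known card type in the message. The KNOWN_CARD_TYPES list and the extract_card_type() function are hypothetical names invented for this example; production systems usually rely on a trained entity extractor rather than keyword matching.

```python
import re

# Hypothetical list of card types the bank issues (an assumption for this sketch)
KNOWN_CARD_TYPES = ["visa", "mastercard", "amex"]

def extract_card_type(message):
    """Return the first known card type mentioned in the message, or None."""
    text = message.lower()
    for card_type in KNOWN_CARD_TYPES:
        # \b matches whole words only, so "visa" does not match "advisable"
        if re.search(rf"\b{re.escape(card_type)}\b", text):
            return card_type
    return None

print(extract_card_type("block my Visa card"))  # visa
print(extract_card_type("block my card"))       # None
```

When nothing is found, the chatbot would typically ask a follow-up question such as “Which card do you want to block?”.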

Thanks to the classifier, the chatbot understands “my card isn’t working” and maps it to block_card even though it wasn’t taught that exact phrase. What it does need is enough training examples so the model learns each intent well.
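
To make the idea concrete, here is a toy stand-in for classify_intent(). It is not how a production classifier is built: it simply compares word overlap (cosine similarity over word counts) between the message and a handful of example phrases. The training phrases are invented for this sketch, and the returned score is a similarity, not a calibrated probability.

```python
import math
import re
from collections import Counter

# Hypothetical training examples; a real system needs many more per intent
TRAINING_EXAMPLES = {
    "check_balance": [
        "how much money do I have",
        "what is my balance",
        "show me my account balance",
    ],
    "block_card": [
        "block my card",
        "my card was stolen",
        "I lost my card",
    ],
}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify_intent(message):
    """Return (best_intent, score) for the most similar training example."""
    message_counts = Counter(tokenize(message))
    best_intent, best_score = None, 0.0
    for intent, examples in TRAINING_EXAMPLES.items():
        for example in examples:
            score = cosine(message_counts, Counter(tokenize(example)))
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent, best_score

intent, score = classify_intent("my card isn't working")
print(intent, round(score, 2))  # block_card 0.58
```

The phrase “my card isn't working” is not among the examples, yet it lands on block_card because it shares words with them. Note that its score stays below a 0.7 threshold: with a classifier this crude, either the threshold or the number of training examples would have to change.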

When does this approach work well? When what the user asks for falls within the intents the system expects, even if the wording varies. The chatbot handles these two questions without problems:

  • “What are the store hours?” → intent hours, fixed response.

  • “I want to place an order” → intent new_order, opens the purchase flow.

But if the user writes “I was charged something strange this week and I don’t know if it’s from the subscription or something else”, the chatbot has no intent for that. It can’t reason about ambiguity. It returns “I didn’t understand” and the conversation ends there.

How an LLM-based chatbot works

An LLM (Large Language Model) is a model trained on a very large amount of text. It doesn’t pick responses from a predefined list; instead, it interprets what you write and generates a new response.

The difference is like comparing an employee who only knows 20 memorized answers with one who has studied thousands of documents on the subject and can reason about a question they have never seen, wording the answer differently each time.

To create a chatbot with an LLM, the first thing you do is give it instructions in writing. You tell it what role it has (for example, “you are a bank’s assistant”), how it should speak, and what it cannot do (for example, “don’t make up customer data”). The model reads these instructions before answering any user question.

import anthropic

client = anthropic.Anthropic()  # Requires ANTHROPIC_API_KEY as an environment variable

# The system prompt defines the chatbot's role and limits
SYSTEM_PROMPT = """You are the assistant for Example Bank.
Never make up balances or customer data."""

def respond(question):
    # We send the question to the model along with the instructions
    response = client.messages.create(
        model="claude-opus-4-6",  # For testing, claude-haiku-4-5-20251001 is cheaper
        max_tokens=500,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": question}]
    )
    return response.content[0].text  # The text generated by the model

If the chatbot handles real user data, review legal requirements before sending it to an external provider.

That chatbot responds to “I was charged something strange this week and I don’t know if it’s from the subscription or something else” without any problem. There’s no intent configured for that. The model understands the question and generates a useful response.

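Moreover, an LLM chatbot can take earlier turns of the conversation into account. If the user said “I have two cards” and then asks “which one should I cancel?”, the model can still refer to the two cards. The model itself keeps no state between calls: this works because the application sends the previous messages along with each new question (the respond() function above sends only the latest question, so it would need extending). This “memory” has limits: the model can only read a bounded amount of text per request, so very long conversations have to be trimmed or summarized.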
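
A minimal sketch of how an application might keep that conversation memory: it stores each turn and resends the whole list on every call. The Conversation class and the fake_model() stand-in are hypothetical names for illustration; fake_model() replaces the real model call so the sketch runs without an API key.

```python
class Conversation:
    """Keeps the message history and resends it in full on every turn."""

    def __init__(self, generate_reply):
        # generate_reply: any function that takes the message list and returns text
        self.generate_reply = generate_reply
        self.messages = []

    def ask(self, question):
        # Add the new user turn, then let the model see the whole history
        self.messages.append({"role": "user", "content": question})
        answer = self.generate_reply(self.messages)
        # Store the reply so it is part of the context on the next turn
        self.messages.append({"role": "assistant", "content": answer})
        return answer


# Stand-in for the real model call (an assumption, not a real API)
def fake_model(messages):
    user_turns = [m for m in messages if m["role"] == "user"]
    return f"(reply based on {len(user_turns)} user message(s))"

chat = Conversation(fake_model)
chat.ask("I have two cards")
print(chat.ask("Which one should I cancel?"))
# (reply based on 2 user message(s))
```

With the client from the earlier example, generate_reply would pass the list as messages=... to client.messages.create() and return response.content[0].text.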

The main risk with LLMs is hallucination: the model can generate a response that sounds correct but is made up. If you ask it “how much balance do I have?”, an LLM without access to real data can fabricate a plausible number. That’s why the system prompt in the example includes “never make up balances or customer data”. It doesn’t eliminate the risk entirely, but it reduces it.

When to use each one

Traditional chatbot: Use it when questions always follow the same pattern. For example, a support menu with fixed options: “What do you need? 1. Check balance 2. Block card 3. Hours”.

LLM chatbot: Use it when users can ask in unexpected ways. For example, complex questions or those mixing multiple topics.

Aspect | Traditional chatbot | LLM chatbot
Cost per conversation | Very low | Depends on model and length
Questions off the script | Responds if similar to a known intent, not if completely new | Responds with flexibility
Predictable responses | Yes, always the same fixed response | No, vary by context
Risk of making up data | No (hardcoded responses) | Yes, must be controlled
Maintenance | Add and edit intents | Adjust system prompt and test

Choose traditional chatbot if:

  • Responses are always the same (order status, hours, FAQs).

  • The user follows a menu or predefined steps.

  • You want to keep costs low and responses as predictable as possible.

Choose LLM chatbot if:

  • Users ask varied or complex questions.

  • A problem can be described in many different ways.

  • You need flexibility in responses.

Examples:

  • Simple question: “What time does it close?” → Traditional chatbot. It has a fixed response.

  • Complex question: “I bought a product that doesn’t work well. Can I return it if I only have half the packaging?” → LLM. It’s a specific situation that needs analysis.

Hybrid pattern: You can also combine both. The chatbot tries to respond with intents. If it can’t, it passes the question to the LLM. This is common in production because you save costs: the LLM only steps in when needed.

The downside is you have to maintain two systems in parallel and synchronize when to switch between them.
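
The hybrid pattern can be sketched as a small router. This is an illustration under assumptions, not a reference implementation: route(), FIXED_RESPONSES, fake_classifier() and fake_llm() are hypothetical names, the opening hours are invented, and the two stand-in functions replace the real classifier and the real LLM call.

```python
CONFIDENCE_THRESHOLD = 0.7  # below this, the intent classifier is not trusted

# Hypothetical fixed responses, keyed by intent (the hours are invented)
FIXED_RESPONSES = {
    "hours": "We are open Monday to Friday, 9:00 to 18:00.",
}

def route(message, classify_intent, ask_llm):
    """Try the intent classifier first; fall back to the LLM otherwise."""
    intent, confidence = classify_intent(message)
    if confidence >= CONFIDENCE_THRESHOLD and intent in FIXED_RESPONSES:
        return "intent", FIXED_RESPONSES[intent]
    # Low confidence or unknown intent: the LLM steps in
    return "llm", ask_llm(message)


# Stand-ins so the sketch runs on its own (assumptions, not real models)
def fake_classifier(message):
    if "hours" in message.lower():
        return "hours", 0.95
    return None, 0.0

def fake_llm(message):
    return f"[LLM answer to: {message}]"

print(route("What are the store hours?", fake_classifier, fake_llm))
print(route("I was charged something strange", fake_classifier, fake_llm))
```

Returning which path handled the message makes it easier to measure how often the LLM is actually needed, which is what drives the cost saving.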

If you want to dive deeper into how to write effective instructions for the LLM, the post on prompt engineering for developers covers the patterns most used in real projects.

Frequently Asked Questions

Is an LLM chatbot always better than a traditional one?

No. For closed and repeatable flows, the traditional chatbot is more controllable and cheaper to operate. Adding an LLM where it’s not needed only adds cost and complexity.

Can the LLM make up responses?

Yes, and it’s the main risk. Hallucinations occur when the model generates text that sounds plausible but doesn’t correspond to reality. The way to mitigate it is to be explicit in the system prompt about what data the model can and cannot provide. If you need real accuracy, you have to connect the chatbot to external data sources instead of relying on what the model remembers from its training.
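
One way to connect the chatbot to real data is to look the data up first and include it in the text sent to the model. The sketch below assumes a hypothetical in-memory ACCOUNTS dictionary and a hypothetical build_grounded_prompt() helper; a real system would query the bank’s backend and would still need to verify the model’s output.

```python
# Hypothetical in-memory "database"; a real system would query a backend service
ACCOUNTS = {"customer-001": {"balance_eur": 340.0}}

def build_grounded_prompt(customer_id, question):
    """Build the user message with verified data placed next to the question."""
    account = ACCOUNTS.get(customer_id)
    if account is None:
        facts = "No account data is available for this customer."
    else:
        facts = f"Verified balance: {account['balance_eur']:.2f} EUR."
    return (
        "Answer using only the verified data below. "
        "If the data does not contain the answer, say you don't know.\n"
        f"{facts}\n"
        f"Customer question: {question}"
    )

print(build_grounded_prompt("customer-001", "How much balance do I have?"))
```

The resulting string would be sent as the user message, so the model quotes the looked-up balance instead of guessing one.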

Can I use both approaches together?

Yes. The most common pattern is to use an intent classifier for frequent questions and the LLM as a fallback for everything else. You get speed and low cost for the predictable questions, and flexibility for the rest.

What if the user asks questions in multiple languages?

The traditional chatbot only works well in languages for which you’ve defined example phrases in each intent. The LLM directly understands multiple languages without additional configuration, though it’s good to indicate in the system prompt which language the model should respond in.