Parallelization in AI Agents: Sectioning and Voting step by step
Learn how to run AI tasks in parallel: Sectioning divides the work, Voting repeats for higher reliability. Code guide for junior developers.
Contributors: Esther Aznar
Imagine you need to analyze ten documents. If you process them one after another, you might spend a minute or so staring at the screen. And that’s not fun. The good news is there are two ways to do that work in parallel, and each one serves a different purpose.
The first is called Sectioning: you take the task, divide it into independent parts, and launch them all at once. The second is called Voting: instead of dividing, you repeat the same task several times and go with the result that most agents agree on. One divides the work, the other repeats it to be more confident in the result.
To follow this post you need to know roughly what an AI agent is (basically a program that sends tasks to a language model and does something with the response) and have basic TypeScript knowledge with async/await. If you haven’t seen how chaining calls works step by step yet, start with the post on prompt chaining (breaking a complex task into chained steps) and then come back here.
What does parallelization mean in an agentic system?
Imagine you need to cook three dishes for dinner. If you cook them one after another, it takes the sum of all three. If you heat them all at the same time, it takes as long as the slowest one. That’s parallelization: doing several things at once instead of waiting for one to finish before starting the next.
In an agentic system (a system with several agents working together), parallelization means launching multiple model calls at the same time and waiting for all of them to finish. In TypeScript, each call starts as soon as you invoke its async function, and Promise.all() takes the resulting array of promises and waits for all of them. It resolves with an array of results, in the same order as the input, once the last one finishes (or rejects as soon as any one of them fails).
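```typescript
// task1, task2 and task3 stand for any async function (e.g. a model call) that takes about 5 seconds
// Without parallelization: about 15 seconds in total
async function sequential() {
  const result1 = await task1(); // waits 5s
  const result2 = await task2(); // waits 5s more
  const result3 = await task3(); // waits 5s more
  return [result1, result2, result3];
}

// With Promise.all: about 5 seconds (the time of the slowest one)
async function parallel() {
  const [result1, result2, result3] = await Promise.all([
    task1(), // all three start at the same time
    task2(),
    task3(),
  ]);
  return [result1, result2, result3];
}
```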
The difference in latency shows up from the first tests. But the time saved isn’t always free, and that’s where the choice between Sectioning and Voting comes in.
Sectioning: divide the task, execute in parallel
Sectioning is the pattern you use when a large task can be divided into independent parts. And when I say independent, I mean each part doesn’t need to know what the others are doing to work.
A concrete example: you need to analyze ten contract PDFs to detect problematic clauses. Each contract is independent of the others. You don’t need to read contract 3 to analyze contract 7. You can launch them all at once.
```typescript
// Sectioning: each contract is analyzed in parallel
// callLLM(input, instruction) is a placeholder for your model client; it returns the response text
async function analyzeContracts(contracts: string[]): Promise<string[]> {
  // Convert each contract into a model call (each call starts immediately)
  const analysisPromises = contracts.map((contract) =>
    callLLM(contract, "Detect problematic clauses in this contract")
  );
  // Promise.all waits for all analyses to complete
  const results = await Promise.all(analysisPromises);
  // results[0] = analysis of contract 0, results[1] of contract 1, etc.
  return results;
}
```
If you have ten contracts and each analysis takes eight seconds, sequentially you wait eighty seconds. With Sectioning you wait about eight (the time of the slowest analysis). The total token cost is essentially the same because you’re processing the same amount of text; you’re just doing it all at once instead of one after another.
This pattern already has names in classical programming: Scatter-Gather (scatter tasks and collect results), MapReduce (map an operation over a collection and reduce the results) or Fork-Join (split into parallel branches and merge them at the end), depending on context. The underlying idea is always the same: you distribute the work, each part is done on its own, and at the end you collect all the results. Sectioning does exactly that, except that each “piece of work” is a call to a language model.
When does it make sense to use it? When you have several similar items to process, when each is independent of the others, and when wait time is a real problem.
But what happens when you don’t have a collection of different things, but a single task that needs to work right the first time? That’s where Voting comes in.
Voting: repeat the same task and aggregate the results
Language models don’t always give the same answer. If you ask the same thing several times, you get slightly different answers, and some will be better than others. Voting takes advantage of that variability: you launch the same task N times in parallel and then aggregate the results, most often by going with the answer that most runs agree on.
How do you decide which one wins? It depends on the type of response you expect:
Simple majority: the response that appears most often wins. It works when responses are easy to compare: “positive / negative”, yes or no answers, concrete categories. If out of five runs four say “positive” and one says “negative”, the result is “positive”. Simple.
By weight: not all responses count equally. For example, you can launch one call with low temperature (the model is more consistent and almost always says the same thing) and others with high temperature (more variety), and give more weight to the first one. Temperature is basically how much the model “varies” from one response to the next: a low value makes responses more consistent and repeatable, a high value produces more variety.
LLM-as-Judge: a model reads all the responses and decides which one is best. It’s the most robust method when responses are long texts that can’t be compared with simple logic, where counting how many times the same phrase appears makes no sense. If you want to understand this pattern in detail, there’s a dedicated post: Model as Judge: evaluate your AI agent’s responses.
```typescript
// Voting with simple majority for sentiment classification
// classifyText(text) is a placeholder for a model call that returns "positive" or "negative"
async function classifyWithVoting(text: string, runs: number = 5): Promise<string> {
  // Launch the same classification N times in parallel
  const votes = await Promise.all(
    Array.from({ length: runs }, () => classifyText(text))
  );
  // Count how many times each response appears
  const counts: Record<string, number> = {};
  for (const vote of votes) {
    counts[vote] = (counts[vote] ?? 0) + 1;
  }
  // Return the most frequent response
  // (with two possible labels and an odd number of runs, a tie is impossible)
  return Object.entries(counts).sort((a, b) => b[1] - a[1])[0][0];
}
```
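The function above only covers simple majority. Weighted voting, the second method in the list, can be sketched like this. It’s a minimal example that assumes each run has already been turned into a label plus a weight you chose; the `WeightedVote` type, the `weightedVote` helper and the 2-versus-1 weights are illustrative assumptions, not a standard.

```typescript
// Weighted voting: each vote carries a weight, and the label with the
// highest total weight wins. The weights below are illustrative, not a recommendation.
interface WeightedVote {
  label: string;  // e.g. "positive" or "negative"
  weight: number; // e.g. 2 for a low-temperature run, 1 for the rest
}

function weightedVote(votes: WeightedVote[]): string {
  if (votes.length === 0) {
    throw new Error("weightedVote needs at least one vote");
  }
  // Add up the weight received by each label
  const totals: Record<string, number> = {};
  for (const { label, weight } of votes) {
    totals[label] = (totals[label] ?? 0) + weight;
  }
  // Return the label with the largest total weight
  return Object.entries(totals).sort((a, b) => b[1] - a[1])[0][0];
}

// One low-temperature run (weight 2) plus three high-temperature runs (weight 1).
// A plain count would be a 2-2 tie; the weights break it in favor of "negative".
const decision = weightedVote([
  { label: "negative", weight: 2 }, // low temperature
  { label: "positive", weight: 1 },
  { label: "negative", weight: 1 },
  { label: "positive", weight: 1 },
]);
```

The design choice here is that weights only change how votes are summed; choosing good weights is up to you and your data.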
The most common mistake with Voting is aggregating results without checking that they make sense together. If the five responses are completely different from each other, the winning result might not be reliable even though it “wins by majority”. Before trusting the result, it’s worth reviewing how much the agents agreed.
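One way to make that review systematic is to compute the agreement ratio (winning votes divided by total runs) and refuse to trust a weak majority. A minimal sketch: `majorityWithAgreement`, `decideOrEscalate`, the `"needs_review"` label and the 0.6 threshold are all hypothetical choices you’d tune for your own task.

```typescript
// Returns the most frequent label and the fraction of runs that voted for it
function majorityWithAgreement(votes: string[]): { winner: string; agreement: number } {
  if (votes.length === 0) {
    throw new Error("majorityWithAgreement needs at least one vote");
  }
  const counts: Record<string, number> = {};
  for (const vote of votes) {
    counts[vote] = (counts[vote] ?? 0) + 1;
  }
  const [winner, count] = Object.entries(counts).sort((a, b) => b[1] - a[1])[0];
  return { winner, agreement: count / votes.length };
}

// Assumption: 0.6 is an arbitrary threshold; calibrate it with your own data
const MIN_AGREEMENT = 0.6;

function decideOrEscalate(votes: string[]): string {
  const { winner, agreement } = majorityWithAgreement(votes);
  // Too little consensus: flag it for review instead of trusting the "winner"
  return agreement >= MIN_AGREEMENT ? winner : "needs_review";
}
```

With five runs, four matching votes give an agreement of 0.8 and the result is accepted; a 2-1-1-1 split gives 0.4 and is sent for review.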
When does it make sense to use it? On tasks where precision really matters: support ticket classification, moderation decisions, risk analysis. Not for routine tasks where a single call is usually good enough and multiplying the cost isn’t worth it.
How much does parallelization cost?
Here comes what most tutorials don’t tell you: parallelization isn’t always free.
With Sectioning, the total cost is the same as doing it sequentially. Ten analyses are ten analyses, whether you run them one after another or all at once. You pay the same and gain time. Easy.
With Voting the story changes. If you launch the same task five times, you pay five times. That’s it. The question you always have to ask yourself is: does the improvement in reliability justify that extra cost?
| Pattern | What it improves | What it multiplies |
|---|---|---|
| Sectioning | Total latency (you wait for the slowest call, not the sum) | Nothing (same cost as sequential) |
| Voting (N runs) | Reliability | Cost × N |
For Sectioning, parallelizing is almost always worth it: same bill, less waiting. For Voting, you have to think about it. If your task already works well with a single call in most cases, adding four more runs might not be worth what it costs. But if you’re making decisions with real consequences, such as moderating content, approving transactions or classifying something critical, it might be worth every cent.
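To put numbers on that trade-off before going to production, the arithmetic is simple: cost per call times number of items times runs per item. The sketch below assumes prices quoted in dollars per million input and output tokens; `costPerCall`, `totalCost`, the token counts and the prices are placeholders, not any provider’s real rates.

```typescript
// Hypothetical per-call cost in dollars: token counts and prices are placeholders,
// replace them with your provider's real pricing.
function costPerCall(
  inputTokens: number,
  outputTokens: number,
  inputPricePerMillion: number,
  outputPricePerMillion: number
): number {
  return (inputTokens * inputPricePerMillion + outputTokens * outputPricePerMillion) / 1_000_000;
}

// items: how many independent pieces (Sectioning); runsPerItem: N for Voting, 1 otherwise
function totalCost(items: number, runsPerItem: number, perCall: number): number {
  return items * runsPerItem * perCall;
}

const perCall = costPerCall(4000, 500, 3, 15); // 4,000 input + 500 output tokens

// Sectioning 10 contracts: same total whether you run them in parallel or sequentially
const sectioning = totalCost(10, 1, perCall);
// Voting with 5 runs on those same 10 contracts: five times the bill
const votingFive = totalCost(10, 5, perCall);
```

With these placeholder numbers, one call costs about $0.0195, Sectioning ten contracts costs about $0.195, and Voting with five runs on the same ten contracts costs about $0.975.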
Common mistakes
Parallelizing tasks with dependencies
This is the most common mistake when someone discovers Promise.all() for the first time and gets excited. If step B needs the result of step A, you can’t launch them at the same time: B starts before A has finished, works with an empty or undefined value, and you get garbage output or an outright exception.
The test to know if you can parallelize is quick: can step B start without knowing what step A returned? If the answer is no, they run sequentially. If you still don’t have a clear picture of how to manage steps that depend on each other, the post on prompt chaining covers exactly that case.
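In practice, many workflows mix both: a sequential step whose output feeds several independent ones. A minimal sketch, assuming a hypothetical `callLLM(input, instruction)` client passed in as a parameter and a model that lists one clause per line: step A is awaited first, and only then are the clauses fanned out with Promise.all().

```typescript
// Hypothetical model client: takes an input and an instruction, returns the response text
type LLMCall = (input: string, instruction: string) => Promise<string>;

async function reviewContract(contract: string, callLLM: LLMCall): Promise<string[]> {
  // Step A: must finish first, because step B needs its output
  const clauseList = await callLLM(contract, "List each clause of this contract on its own line");
  const clauses = clauseList
    .split("\n")
    .map((clause) => clause.trim())
    .filter((clause) => clause.length > 0);

  // Step B: each clause is independent of the others, so they run in parallel
  return Promise.all(
    clauses.map((clause) => callLLM(clause, "Is this clause problematic? Explain briefly"))
  );
}
```

Passing `callLLM` as a parameter is only a convenience for the sketch: it lets you swap in your real client or a fake one for testing.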
Using simple majority with free text
Simple majority only works when responses are easy to compare directly. If you ask five agents to write a summary of the same contract, you’ll get five different summaries. None will be identical to the others, so “the most frequent” is basically one picked at random, and that doesn’t improve anything.
For free text, the right method is LLM-as-Judge. Simple majority here just multiplies the cost by the number of runs with no real gain in quality.
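A minimal sketch of the judge step, assuming the candidate summaries were already generated in parallel (for example with Promise.all() over N identical calls) and a hypothetical `callLLM(input, instruction)` client. The prompt lists the evaluation criteria explicitly and asks for a single index; the prompt wording, `buildJudgePrompt`, `pickWithJudge` and the rule of throwing on an unusable answer are illustrative choices, not a fixed recipe.

```typescript
// Hypothetical model client: takes an input and an instruction, returns the response text
type LLMCall = (input: string, instruction: string) => Promise<string>;

// Build a judge prompt with numbered, explicit criteria and numbered candidates
function buildJudgePrompt(candidates: string[], criteria: string[]): string {
  const criteriaText = criteria.map((c, i) => `${i + 1}. ${c}`).join("\n");
  const candidatesText = candidates.map((c, i) => `[${i}]\n${c}`).join("\n\n");
  return [
    "You will compare several candidate summaries of the same document.",
    "Evaluation criteria:",
    criteriaText,
    "Candidates:",
    candidatesText,
    "Reply only with the number of the best candidate.",
  ].join("\n");
}

async function pickWithJudge(
  candidates: string[],
  criteria: string[],
  callLLM: LLMCall
): Promise<string> {
  const answer = await callLLM(buildJudgePrompt(candidates, criteria), "Pick the best candidate");
  const index = Number.parseInt(answer.trim(), 10);
  // Don't silently guess if the judge replies with something unusable
  if (Number.isNaN(index) || index < 0 || index >= candidates.length) {
    throw new Error(`Judge returned an invalid choice: ${answer}`);
  }
  return candidates[index];
}
```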
Launching too many requests at once without limit
Promise.all() with fifty or a hundred simultaneous calls can exceed the API’s rate limits. Providers cap how many calls (or tokens) you can send per second or per minute; once you go over, they answer with a cascade of HTTP 429 (Too Many Requests) errors, and because Promise.all() rejects as soon as one call fails, a single 429 is enough to interrupt the entire process at once.
The solution is to process in batches: instead of launching a hundred calls at once, you launch smaller groups with a small pause between them. It’s not hard to implement, but it’s much better to plan it from the start than to discover it when it’s already in production.
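A minimal batching sketch. `processInBatches` is a hypothetical helper, not a library function, and the right `batchSize` and `pauseMs` depend on your provider’s limits: each group runs in parallel with Promise.all(), and the loop waits for the group plus a short pause before starting the next one.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `worker` over `items` in groups of `batchSize`, pausing `pauseMs` between groups.
// Results keep the same order as `items`.
async function processInBatches<T, R>(
  items: T[],
  worker: (item: T) => Promise<R>,
  batchSize: number,
  pauseMs: number
): Promise<R[]> {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // Calls inside one batch run in parallel
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
    // Pause before the next batch (skipped after the last one)
    if (start + batchSize < items.length) {
      await sleep(pauseMs);
    }
  }
  return results;
}
```

For example, `processInBatches(contracts, (c) => callLLM(c, "Detect problematic clauses in this contract"), 10, 1000)` would send at most ten calls at a time with a one-second pause between groups. Keep in mind that each batch is as slow as its slowest call.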
Implementation checklist
- The tasks I’m going to parallelize are independent of each other (none depends on the result of another)
- I’ve calculated the cost multiplied by the number of runs before going to production
- The aggregation method is compatible with the response type (simple majority only for discrete responses, not for free text)
- I have error handling for when a parallel call fails (I use Promise.allSettled() if I want the others to continue even if one fails)
- I’ve reviewed the API rate limits and process in batches if I launch more than a few dozen calls in parallel
- For Voting with LLM-as-Judge, the judge’s prompt has explicit evaluation criteria
Frequently Asked Questions
What’s the difference between Sectioning and Voting?
Sectioning divides a task into different parts (contract 1, contract 2, contract 3…) and each part is executed once. Voting executes the same task multiple times with the same input and aggregates the results. Sectioning reduces latency without increasing cost. Voting improves reliability, but multiplies the cost by the number of runs.
What happens if one of the parallel calls fails?
Promise.all() rejects as soon as any of the promises rejects: the other calls keep running (promises can’t be cancelled), but their results are discarded. If you need the rest to continue even if one fails, use Promise.allSettled() instead. It waits for every promise, without canceling any, and returns an array with the status of each one (fulfilled with its value, or rejected with its reason), so you can filter the ones that failed and decide what to do with them.
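A minimal sketch of the contract analysis from the Sectioning section rewritten with Promise.allSettled(), assuming the same hypothetical `callLLM(input, instruction)` client, here passed in as a parameter. Successful analyses are collected, and failures keep their index and error so you can retry or report them.

```typescript
// Hypothetical model client: takes an input and an instruction, returns the response text
type LLMCall = (input: string, instruction: string) => Promise<string>;

async function analyzeContractsSettled(contracts: string[], callLLM: LLMCall) {
  // allSettled never rejects: it waits for every call and reports each outcome
  const outcomes = await Promise.allSettled(
    contracts.map((contract) =>
      callLLM(contract, "Detect problematic clauses in this contract")
    )
  );
  const analyses: string[] = [];
  const failed: { index: number; reason: unknown }[] = [];
  outcomes.forEach((outcome, index) => {
    if (outcome.status === "fulfilled") {
      analyses.push(outcome.value);
    } else {
      // Keep track of which contract failed and why, so it can be retried or reported
      failed.push({ index, reason: outcome.reason });
    }
  });
  return { analyses, failed };
}
```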
How many runs should I use with Voting?
With three runs you can already detect a dominant response. With five, the majority is more stable in most cases. Above seven the return is usually low, while the cost keeps rising linearly. For yes/no decisions, an odd number of runs also avoids ties. But calibrate with your own data: if the task is high-risk and the budget allows, more runs may be justified.
Does Sectioning only work for lists of documents?
No. It also works when a task has natural sections that don’t depend on each other. For example, if you need to generate the title, summary, and tags of an article independently, you can generate them in parallel. You don’t need a collection of documents to use Sectioning: it’s enough that the task parts are independent.
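A minimal sketch of that title/summary/tags example, assuming a hypothetical `callLLM(input, instruction)` client and a model that returns tags as a comma-separated line. The three calls don’t need each other’s output, so they go into a single Promise.all().

```typescript
// Hypothetical model client: takes an input and an instruction, returns the response text
type LLMCall = (input: string, instruction: string) => Promise<string>;

interface ArticleMetadata {
  title: string;
  summary: string;
  tags: string[];
}

async function generateMetadata(article: string, callLLM: LLMCall): Promise<ArticleMetadata> {
  // Three independent sections of the same task, launched at the same time
  const [title, summary, rawTags] = await Promise.all([
    callLLM(article, "Write a title for this article"),
    callLLM(article, "Summarize this article in two sentences"),
    callLLM(article, "List 3 to 5 tags for this article, comma-separated"),
  ]);
  // Assumption: the model returns something like "ai, agents, typescript"
  const tags = rawTags
    .split(",")
    .map((tag) => tag.trim())
    .filter((tag) => tag.length > 0);
  return { title, summary, tags };
}
```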
When is it worth not using either one?
When tasks have strong dependencies between them, when the cost is already high and multiplying it isn’t justified, or when response time isn’t critical to your use case. Parallelizing adds complexity: error handling, rate limits, aggregation logic. If there’s no clear gain in latency or reliability, the sequential version is easier to maintain and debug.