What happens when your AI agent double-books a hotel room?

The scenario

Your agent sends a booking request to a hotel API. The network hiccups. The request reaches the server, the room is reserved, a charge is initiated – but the response never makes it back to the agent. From the agent's perspective, the call timed out. From the hotel's perspective, the booking is confirmed.

The agent, seeing a timeout, does what any reasonable retry logic does: it tries again. Same guest, same dates, same room type. This time the response arrives. The confirmation number is different from the first booking, but the agent does not know about the first booking. It presents the second confirmation to the user, who books a flight and considers the trip planned.

Somewhere in the hotel's system, two reservations exist for the same person on the same nights. Two charges have been initiated. Nobody notices until the guest checks in, at which point the front desk has two files and neither side can easily explain why.

This is not a hypothetical. It is a near-inevitable consequence of running an agent that makes transactional API calls over a network that can fail. Networks can always fail. Agents always retry. The question is not whether this happens, but what the infrastructure does about it.

Why "just be careful" does not work

The first instinct is to tell the agent to check whether a booking already exists before making a new one. Query first, then create if nothing found.

This helps in the simple case. It fails in at least two others.

The first failure: the race condition. Between the check and the create, another request can land. This matters more than it sounds in agentic systems, because the same agent might be running multiple parallel sub-tasks, or the user might trigger a second session while the first is still in flight. Two checks can return "no booking found" and both proceed to create.
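The race can be reproduced without any real concurrency. The sketch below is a deterministic illustration only: HotelProvider, find_booking and create_booking are hypothetical stand-ins for a provider API, and the two "sessions" are simply interleaved by hand so that both checks run before either create.

```python
class HotelProvider:
    """Hypothetical in-memory provider; not a real hotel API."""

    def __init__(self):
        self.bookings = []

    def find_booking(self, guest, night):
        return [b for b in self.bookings if b == (guest, night)]

    def create_booking(self, guest, night):
        self.bookings.append((guest, night))
        return len(self.bookings)  # stand-in confirmation number


provider = HotelProvider()

# Both sessions run their "check" step before either reaches "create".
session_a_found = provider.find_booking("alice", "2025-06-01")
session_b_found = provider.find_booking("alice", "2025-06-01")

# Each check came back empty, so each session goes on to create.
if not session_a_found:
    provider.create_booking("alice", "2025-06-01")
if not session_b_found:
    provider.create_booking("alice", "2025-06-01")

duplicate_count = len(provider.find_booking("alice", "2025-06-01"))
print(duplicate_count)  # 2
```

Each session behaved correctly in isolation; the duplicate comes from the gap between the check and the create.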

The second failure: the provider's state may be inconsistent. If the first booking request succeeded on the server but failed before responding, a query might return the reservation in a pending or processing state that the agent code doesn't know how to interpret. Does "pending" mean "retry" or "wait" or "it already worked"? The agent has to guess.

Neither of these problems is about writing careful code. They are structural: the moment you separate the intent to act from the confirmation that it happened, you have created a window in which the action might have succeeded without the actor knowing. Code cannot close that window; the protocol has to.

What idempotency keys actually do

An idempotency key is a unique identifier attached to a request that tells the server: "if you have already processed a request with this identifier, return the same result instead of executing again." The server stores the result of the first execution and replays it on any duplicate.

The agent generates a key before sending the request – a UUID or similar unique value – and includes it in the request header. If the network fails and the agent retries with the same key, the server recognises it and returns the original response without creating a second booking. One key, one reservation, regardless of how many times the request arrives.
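As a minimal sketch of the server side of that contract, assuming an in-memory store and a hypothetical BookingServer class (a real provider would persist responses and handle concurrent requests for the same key):

```python
import uuid


class BookingServer:
    """Hypothetical server that replays stored responses by idempotency key."""

    def __init__(self):
        self.responses = {}    # idempotency key -> stored response
        self.reservations = 0  # number of bookings actually executed

    def book(self, idempotency_key, guest, night):
        # A key we have seen before: replay the stored result, do not execute.
        if idempotency_key in self.responses:
            return self.responses[idempotency_key]
        self.reservations += 1
        response = {
            "confirmation": f"CONF-{self.reservations}",
            "guest": guest,
            "night": night,
        }
        self.responses[idempotency_key] = response
        return response


server = BookingServer()
key = str(uuid.uuid4())

first = server.book(key, "alice", "2025-06-01")   # imagine this response is lost
second = server.book(key, "alice", "2025-06-01")  # retry with the same key

print(first == second, server.reservations)  # True 1
```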

This is not a new idea. Stripe has supported idempotency keys on its API requests for years, and the pattern predates AI agents. But agentic systems make it a requirement rather than a best practice, because agents retry more often, under more varied conditions, and with less human oversight than traditional API clients. An application server that a human developer restarted after a failure is easy to reason about. An agent that retried sixteen times over four minutes while the user's laptop was asleep is not.

The key must be generated before the first attempt

A common mistake: generating the idempotency key on retry instead of before the first attempt. If the key changes on each retry, it provides no protection – every attempt looks like a new request to the server. The key must be fixed for the lifetime of a logical operation, created before anything is sent, and reused on every retry of that same operation.
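On the client side, the rule amounts to creating the key outside the retry loop. The sketch below assumes a hypothetical send callable that raises TimeoutError when the response is lost; call_with_retries and flaky_send are illustrative names, not part of any SDK.

```python
import uuid


def call_with_retries(send, payload, max_attempts=3):
    """Retry a logical operation, reusing one idempotency key throughout."""
    key = str(uuid.uuid4())  # created once, before the first attempt
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(payload, idempotency_key=key)
        except TimeoutError as exc:
            last_error = exc  # same key is reused on the next attempt
    raise last_error


seen_keys = []


def flaky_send(payload, idempotency_key):
    """Hypothetical transport: the first two responses are lost."""
    seen_keys.append(idempotency_key)
    if len(seen_keys) < 3:
        raise TimeoutError("response lost")
    return {"status": "confirmed"}


result = call_with_retries(flaky_send, {"guest": "alice"})
print(result, len(seen_keys), len(set(seen_keys)))
# {'status': 'confirmed'} 3 1
```

Moving the uuid4() call inside the loop would produce the mistake described above: three attempts, three different keys, and nothing for the server to deduplicate on.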

Who pays when it still goes wrong?

Idempotency keys work when both sides implement them correctly. When the provider stores responses and deduplicates by key. When the agent reuses the same key on retries. When the TTL on the stored response is long enough to cover the agent's retry window.
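The TTL condition is the easiest of the three to overlook. A rough sketch, assuming a hypothetical TtlResponseStore with a 60-second TTL and timestamps passed in explicitly so that the behaviour is deterministic:

```python
class TtlResponseStore:
    """Hypothetical store that forgets responses after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self.entries = {}  # key -> (stored_at, response)

    def put(self, key, response, now):
        self.entries[key] = (now, response)

    def get(self, key, now):
        entry = self.entries.get(key)
        if entry is None:
            return None
        stored_at, response = entry
        if now - stored_at > self.ttl_seconds:
            del self.entries[key]  # expired: the key looks new again
            return None
        return response


store = TtlResponseStore(ttl_seconds=60)
store.put("key-1", {"confirmation": "CONF-1"}, now=0)

within_window = store.get("key-1", now=30)   # retry after 30 seconds
after_window = store.get("key-1", now=240)   # retry after four minutes

print(within_window, after_window)  # {'confirmation': 'CONF-1'} None
```

A retry that arrives after the stored response has expired is indistinguishable from a first request, even though the agent reused its key correctly.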

In practice, providers implement idempotency at different levels of rigour, agents are built and modified by different teams, and the window between "request sent" and "retry attempt" can stretch further than expected. When something slips through – when the double booking does happen – the next question is: who is responsible, and can either side prove what actually occurred?

This is where a standard application log falls short. A log entry says "booking request received at 14:23:07." It does not say which mandate authorised it, whether the idempotency key was present, what the agent understood the state to be when it retried, or whether the second booking was a retry of the first or an independent request. Reconstructing the sequence of events from application logs on both sides – with different timestamps, different formats, and no shared ground truth – is slow and unreliable.

Why the audit record has to be tamper-evident

A tamper-evident audit log is one where each record is cryptographically chained to the one before it. To alter or delete a record, you must break the chain – which produces a detectable inconsistency in every record that follows. Neither side can quietly remove an entry after the fact.
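The chaining itself takes only a few lines. This is a minimal sketch, assuming SHA-256 over a JSON serialisation of each record; the record fields and the function names are illustrative, and a production log would also sign entries.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first record


def record_hash(prev_hash, payload):
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain, payload):
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": record_hash(prev_hash, payload),
    })


def verify(chain):
    """Recompute every hash; any edit or mid-chain deletion breaks a link."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != record_hash(prev_hash, record["payload"]):
            return False
        prev_hash = record["hash"]
    return True


chain = []
append(chain, {"event": "booking_requested", "key": "key-1"})
append(chain, {"event": "booking_confirmed", "key": "key-1"})

intact = verify(chain)
chain[0]["payload"]["event"] = "nothing_happened"  # quiet edit after the fact
tampered_detected = not verify(chain)

print(intact, tampered_detected)  # True True
```

A bare chain like this one does not by itself reveal that the most recent records were dropped. Detecting that requires the latest hash to be held or signed by both parties, which the sketch leaves out.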

When a dispute arises, the audit chain answers questions that application logs cannot:

  • Was the idempotency key the same on both requests, or different? If the same: the server should have deduplicated. If different: the agent sent two distinct requests and the question is why.
  • Did the first request produce a mandate-verified response? If yes, the agent had confirmation before it retried. If no, the retry was legitimate and the question shifts to why the server executed both.
  • What was the exact sequence and timing of every call? Not the application logs from two separate systems that need to be reconciled, but a single shared record that both sides signed and neither can alter.
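The first of those questions can be answered mechanically once the entries live in one record. The sketch below is illustrative only: the field names (timestamp, operation, idempotency_key, outcome) are assumptions, not a documented log schema.

```python
from collections import defaultdict


def find_double_executions(entries):
    """Report operations that were executed more than once, and why."""
    executed_by_operation = defaultdict(list)
    for entry in sorted(entries, key=lambda e: e["timestamp"]):
        if entry["outcome"] == "executed":
            executed_by_operation[entry["operation"]].append(entry)

    findings = {}
    for operation, executed in executed_by_operation.items():
        if len(executed) < 2:
            continue
        keys = {e["idempotency_key"] for e in executed}
        if len(keys) == 1:
            findings[operation] = "same key: server should have deduplicated"
        else:
            findings[operation] = "different keys: agent sent distinct requests"
    return findings


entries = [
    {"timestamp": 1, "operation": "alice/2025-06-01",
     "idempotency_key": "k1", "outcome": "executed"},
    {"timestamp": 2, "operation": "alice/2025-06-01",
     "idempotency_key": "k2", "outcome": "executed"},
    {"timestamp": 3, "operation": "bob/2025-06-02",
     "idempotency_key": "k3", "outcome": "executed"},
    {"timestamp": 4, "operation": "bob/2025-06-02",
     "idempotency_key": "k3", "outcome": "deduplicated"},
]

findings = find_double_executions(entries)
print(findings)
# {'alice/2025-06-01': 'different keys: agent sent distinct requests'}
```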

This matters for chargebacks, for customer service, and eventually for the insurance and liability questions that arise when agents handle large or sensitive transactions. "Our logs show X" and "our logs show Y" is a dispute. A tamper-evident chain that both sides can verify is a record.

How this is handled in practice

Sgovr's gateway enforces idempotency at the infrastructure layer. Transactional actions – those with sideEffects: transactional in the provider spec – require an Idempotency-Key header. The gateway stores the response keyed by that value and returns it on any duplicate request within the validity window, without touching the provider again. The deduplication happens before the provider sees the request, which means it works even if the provider does not implement idempotency itself.
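The shape of that enforcement can be sketched as follows. This is not the gateway's actual implementation: the Gateway class, the spec layout, the 400 status and the action names are all assumptions for illustration, and the validity window is omitted for brevity.

```python
class Gateway:
    """Hypothetical gateway that enforces and deduplicates by Idempotency-Key."""

    def __init__(self, spec, provider):
        self.spec = spec          # action name -> {"sideEffects": ...}
        self.provider = provider  # callable(action, payload) -> response
        self.stored = {}          # (action, key) -> stored response

    def invoke(self, action, payload, headers):
        if self.spec[action].get("sideEffects") != "transactional":
            return self.provider(action, payload)

        key = headers.get("Idempotency-Key")
        if not key:
            # Rejected before the provider is contacted (status is assumed).
            return {"status": 400, "error": "Idempotency-Key required"}

        if (action, key) in self.stored:
            return self.stored[(action, key)]  # replay, provider untouched

        response = self.provider(action, payload)
        self.stored[(action, key)] = response
        return response


provider_calls = []


def provider(action, payload):
    provider_calls.append(action)
    return {"status": 200, "confirmation": f"CONF-{len(provider_calls)}"}


spec = {
    "book_room": {"sideEffects": "transactional"},
    "search_rooms": {"sideEffects": "none"},
}
gateway = Gateway(spec, provider)

rejected = gateway.invoke("book_room", {"guest": "alice"}, {})
first = gateway.invoke("book_room", {"guest": "alice"}, {"Idempotency-Key": "k1"})
retry = gateway.invoke("book_room", {"guest": "alice"}, {"Idempotency-Key": "k1"})

print(rejected["status"], first == retry, len(provider_calls))  # 400 True 1
```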

The mandate flow adds a further guard. An IntentMandate is scoped to a specific action and expires in seconds. A retry that arrives after the mandate's TTL is rejected – the agent must sign a fresh mandate, which is a deliberate act that creates a new audit record. An unintended retry of a long-expired mandate cannot silently create a second booking; it produces a clear rejection that the agent and the principal can inspect.
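A simplified view of the expiry check, assuming a hypothetical IntentMandate shape with an action, an issue time and a TTL (the real mandate format and its signature verification are not shown here):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentMandate:
    """Hypothetical fields; only the scoping and expiry idea is illustrated."""
    action: str
    issued_at: float
    ttl_seconds: float


def check_mandate(mandate, action, now):
    if mandate.action != action:
        return "rejected: mandate not scoped to this action"
    if now - mandate.issued_at > mandate.ttl_seconds:
        return "rejected: mandate expired"
    return "accepted"


mandate = IntentMandate(action="book_room", issued_at=0.0, ttl_seconds=30.0)

first_attempt = check_mandate(mandate, "book_room", now=2.0)
late_retry = check_mandate(mandate, "book_room", now=240.0)
wrong_action = check_mandate(mandate, "cancel_room", now=2.0)

print(first_attempt)  # accepted
print(late_retry)     # rejected: mandate expired
print(wrong_action)   # rejected: mandate not scoped to this action
```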

Every invocation – successful, failed, rejected, or deduplicated – writes to a hash-chained audit log. The log entry records the mandate, the idempotency key, the response code, and the exact timing. If a double-booking dispute arises, the chain reconstructs the sequence without either side needing to correlate logs from separate systems.

What the SDK does for you

The TypeScript and Python SDKs generate idempotency keys automatically for transactional calls. The key is created before the first attempt and passed through on every retry of the same operation. The developer does not need to manage key generation or retry logic manually; the infrastructure handles both.

For developers building their own clients, the requirement is explicit: the spec marks transactional actions, the gateway enforces the header, and a request without an idempotency key on a transactional action is rejected before it reaches the provider. The constraint is not advisory; it is structural.

The broader lesson

Network failures, retries, and duplicate requests are not edge cases in agentic systems. They are the default operating condition. Agents run over flaky connections, in long-running sessions, with retry logic that is often correct in isolation but creates new problems when combined with stateful providers.

The answer is not to write more careful agents. It is to use infrastructure that makes the failure modes explicit and handles them at the protocol layer, before they reach the application. Idempotency at the gateway. Mandate TTLs that bound the retry window. A tamper-evident audit chain that both sides can verify independently.

When the network hiccups, the double booking should be impossible by construction – not because the agent was careful, but because the infrastructure made it so. And when a dispute arises anyway, the answer should come from a shared, verifiable record, not from two application logs that may not agree.

The full mandate and payment flow is covered in "How AI agents can book hotels without scraping". For the credential model behind mandate signing and why API keys do not carry the same guarantees, see "Why API keys are the wrong auth primitive for AI agents".