Creative Codes
Automation · June 2, 2026 · 10 min read

Building AI Agents in n8n: Tools, Memory, and Production Patterns

n8n's AI Agent node changed what's possible in visual automation. Here's how we build production AI agents in n8n: tool calls, memory strategies, and the patterns that hold up under real load.

Muhammad Hassan

Founder, Creative Codes. 8 years on backends; last 3 deep on AI agents, RAG pipelines, and production scraping. Python, LangGraph, Playwright, n8n, FastAPI.

n8n's AI Agent node makes it possible to build autonomous agents in a visual editor, but the patterns that hold up in production are not the ones most tutorials cover. Tutorials usually stop at connecting an LLM and asking it a question. This post covers what comes after that: tool calls, memory management, error handling, and the architecture decisions that matter when the agent runs daily in production.

What n8n's AI nodes actually are

n8n's AI nodes are a native integration of the LangChain framework into the visual workflow editor. The main nodes:

  • AI Agent: the orchestrator. Takes a prompt, decides which tools to call, processes the results, and produces an output.
  • Chat Model: connects to an LLM provider (OpenAI, Anthropic, Ollama, etc.). The AI Agent calls this to generate responses and decide on tool use.
  • Tool nodes: HTTP Request, Code, Workflow (sub-workflows), Vector Store, and others. The agent calls these as needed to answer queries or complete tasks.
  • Memory nodes: Window Buffer, Postgres, Redis, or Zep. They control how much conversation history the agent retains between messages.

The AI Agent node implements a ReAct-style loop (Reason + Act): the LLM reasons about what to do, calls a tool, observes the result, reasons again, and continues until it has a final answer. This is essentially the loop you'd build manually in Python with LangChain, except that here it's wired together visually.
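
To make the loop concrete, here is a minimal sketch in plain JavaScript. It is not n8n's or LangChain's implementation: `runAgent`, the `model` callback, and the `tools` map are hypothetical stand-ins, and the model is stubbed so the example runs without an API key.

```javascript
// Minimal ReAct-style loop. `model` and `tools` are hypothetical stand-ins,
// not n8n or LangChain APIs.
function runAgent(model, tools, task, maxIterations = 5) {
  const scratchpad = []; // accumulated tool calls and their observations
  for (let i = 0; i < maxIterations; i++) {
    // Reason: the model sees the task plus everything observed so far.
    const step = model(task, scratchpad);
    if (step.type === 'final') {
      return { answer: step.answer, iterations: i + 1 };
    }
    // Act: look up and run the requested tool.
    const tool = tools[step.tool];
    const observation = tool ? tool(step.input) : `Unknown tool: ${step.tool}`;
    // Observe: feed the result back into the next reasoning step.
    scratchpad.push({ tool: step.tool, input: step.input, observation });
  }
  // The iteration cap is what stops a model that never produces a final answer.
  throw new Error(`No final answer after ${maxIterations} iterations`);
}

// Stub model: calls the lookup tool once, then answers with what it observed.
const stubModel = (task, scratchpad) =>
  scratchpad.length === 0
    ? { type: 'tool', tool: 'lookup', input: task }
    : { type: 'final', answer: scratchpad[0].observation };

const result = runAgent(stubModel, { lookup: (q) => `result for ${q}` }, 'acme');
console.log(result);
```

The iteration cap plays the same role as the agent node's maximum-iterations setting: without it, a model that keeps requesting tools would loop forever.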

Setting up a basic agent

The minimal working setup:

  1. Trigger (Webhook, Schedule, or manual)
  2. AI Agent node (connected to Chat Model + at least one Tool)
  3. Chat Model node (e.g., OpenAI GPT-4o or Anthropic claude-sonnet-4-6)

text
Webhook Trigger
    ↓
AI Agent
    ├── Chat Model: GPT-4o
    ├── Tool: HTTP Request (fetch external data)
    └── Tool: Code (custom logic)

The system prompt on the AI Agent node is where you define the agent's behavior. Be specific. "You are a helpful assistant" produces generic behavior. "You are a data extraction agent. When given a company name, use the search tool to find their latest funding round, extract the amount and date, and return structured JSON with keys: company, amount_usd, date, round_type." produces useful behavior.
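
A specific prompt also gives you something to check. The sketch below validates the agent's reply against the keys named in that prompt. The function name `parseFundingOutput` is hypothetical, and the rule that `amount_usd` must be a number is an assumption, not something the prompt states.

```javascript
// Keys taken from the example system prompt above.
const REQUIRED_KEYS = ['company', 'amount_usd', 'date', 'round_type'];

// Parse the agent's raw text reply and check its shape.
// Returns { ok: true, data } or { ok: false, error }.
function parseFundingOutput(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return { ok: false, error: `Invalid JSON: ${err.message}` };
  }
  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    return { ok: false, error: 'Expected a JSON object' };
  }
  const missing = REQUIRED_KEYS.filter((key) => !(key in data));
  if (missing.length > 0) {
    return { ok: false, error: `Missing keys: ${missing.join(', ')}` };
  }
  // Assumption: amounts are numeric, not strings like "$12M".
  if (typeof data.amount_usd !== 'number') {
    return { ok: false, error: 'amount_usd must be a number' };
  }
  return { ok: true, data };
}

const sample = '{"company":"Acme","amount_usd":12000000,"date":"2026-01-15","round_type":"Series A"}';
console.log(parseFundingOutput(sample).ok);
```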

Tool calls: three patterns we use

Pattern 1: HTTP Request as tool

The HTTP Request node, when attached to an AI Agent as a tool, allows the agent to make API calls. The agent decides when to call it and what parameters to pass.

This is useful when the agent needs to fetch real-time data: stock prices, weather, a CRM lookup, a search API. The agent constructs the request based on the task context, makes the call, and incorporates the result into its reasoning.

Key setup: define the tool's input schema in the HTTP Request node's "Tool" section. Be explicit about what the agent should pass (e.g., query as a string, limit as a number). Vague schemas produce inconsistent tool calls.
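
As an illustration of what "explicit" means, here is a hypothetical input schema for a search tool, written in JSON Schema style, with a small argument check. The exact fields n8n exposes for tool inputs vary by node and version, so `searchToolSchema` and `checkToolArgs` are illustrative names only, not n8n configuration.

```javascript
// Hypothetical input schema for a search tool (JSON Schema style).
const searchToolSchema = {
  type: 'object',
  properties: {
    query: { type: 'string', description: 'Search terms, e.g. a company name' },
    limit: { type: 'number', description: 'Maximum number of results to return' },
  },
  required: ['query'],
};

// Minimal check of tool arguments: required keys, unknown keys, primitive types.
function checkToolArgs(schema, args) {
  const errors = [];
  for (const key of schema.required) {
    if (!(key in args)) errors.push(`missing ${key}`);
  }
  for (const [key, value] of Object.entries(args)) {
    const spec = schema.properties[key];
    if (!spec) {
      errors.push(`unexpected ${key}`);
    } else if (typeof value !== spec.type) {
      errors.push(`${key} should be ${spec.type}`);
    }
  }
  return errors;
}

console.log(checkToolArgs(searchToolSchema, { query: 'acme funding', limit: 5 }));
```

Short, concrete descriptions on each field are what the model actually reads when deciding what to pass.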

Pattern 2: Sub-workflow as tool

Any n8n workflow can be exposed as a tool. Attach a "Call n8n Workflow" node to the AI Agent as a tool; when the agent calls it, the sub-workflow runs and returns an output, which the agent then uses in its reasoning.

This is the pattern we use most in production. Why: complex operations (database lookups, file processing, multi-step transformations) live in separate workflows where they're testable and maintainable independently. The AI Agent stays clean — it just knows "I can call this workflow with these inputs."

A real example: an agent that processes inbound lead emails. The agent calls a sub-workflow to look up the sender in the CRM, another to check if the domain is a known competitor, and a third to draft a categorized response. The agent doesn't need to know the CRM's API shape. That lives in the sub-workflow.

Pattern 3: Code node as tool

For custom logic that doesn't fit a pre-built tool — complex string parsing, custom scoring formulas, local data transformations — attach a Code node as a tool. Write JavaScript (or Python) directly.

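The agent calls the Code node when it needs that logic. The code receives the agent's input as $input.first().json, does the work, and returns a JSON object. How the input is exposed and what the node must return can differ between n8n versions, and between a Code node inside a sub-workflow and the dedicated code tool node, so check the node's documentation for your version.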

javascript
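// Example: score a lead based on rules (maximum score: 100)
const lead = $input.first().json;

// Each rule adds a fixed number of points when it matches.
const score = (
  (lead.company_size > 100 ? 30 : 10) +   // larger companies score higher
  (lead.industry === 'fintech' ? 25 : 0) + // target industry
  (lead.has_existing_pipeline ? 20 : 0) +  // already in the pipeline
  (lead.budget_mentioned ? 25 : 0)         // budget came up in the conversation
);

// Above 60 is hot, 31 to 60 is warm, 30 or below is cold.
return { score, tier: score > 60 ? 'hot' : score > 30 ? 'warm' : 'cold' };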

Memory: the part most tutorials skip

An AI Agent without memory treats every execution as the first conversation. For many automation workflows, that's fine. For anything involving multi-turn conversation, context accumulation, or state across sessions, it's not.

Window Buffer Memory (default)

The simplest option. Keeps the last N messages in the workflow's execution context. Works for single-session conversations. If the workflow restarts or you're running many parallel sessions, there's no persistence — each run starts fresh.

Use this for: one-shot automations, single email processing, batch jobs where each item is independent.

Postgres Chat History

Persists conversation history to a Postgres table keyed by session ID. The AI Agent node passes a sessionId to the memory node, and it loads/saves the relevant messages.

text
Session ID: {{ $('Webhook').item.json.user_id }}

This is what we use for customer-facing chatbots and support agents. Each user has a persistent conversation history. The agent remembers what was discussed last week.

Key consideration: the history grows indefinitely without cleanup. Run a periodic job to delete sessions older than N days or truncate to the last N messages per session.
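
The cleanup rules can be sketched as a pure function. In practice this would more likely run as a scheduled SQL DELETE; the logic is shown in JavaScript here for consistency with the rest of the post. The row shape (`sessionId`, `createdAt`, `message`) is an assumption, so check your actual chat history table, which may not have a timestamp column by default. This version applies both rules together: it drops messages older than the cutoff, then keeps the newest N per session.

```javascript
// Rows are assumed to look like { sessionId, createdAt: Date, message }.
function pruneHistory(rows, { maxAgeDays, maxMessages, now = new Date() }) {
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  const bySession = new Map();
  for (const row of rows) {
    if (row.createdAt.getTime() < cutoff) continue; // older than N days: drop
    if (!bySession.has(row.sessionId)) bySession.set(row.sessionId, []);
    bySession.get(row.sessionId).push(row);
  }
  const kept = [];
  for (const messages of bySession.values()) {
    messages.sort((a, b) => a.createdAt - b.createdAt); // oldest first
    // Keep only the newest maxMessages entries of each session.
    kept.push(...messages.slice(Math.max(messages.length - maxMessages, 0)));
  }
  return kept;
}

const history = [
  { sessionId: 's1', createdAt: new Date('2026-04-01T00:00:00Z'), message: 'a' },
  { sessionId: 's1', createdAt: new Date('2026-05-30T00:00:00Z'), message: 'b' },
  { sessionId: 's1', createdAt: new Date('2026-05-31T00:00:00Z'), message: 'c' },
  { sessionId: 's1', createdAt: new Date('2026-06-01T00:00:00Z'), message: 'd' },
  { sessionId: 's2', createdAt: new Date('2026-06-01T00:00:00Z'), message: 'e' },
];
const pruned = pruneHistory(history, {
  maxAgeDays: 30,
  maxMessages: 2,
  now: new Date('2026-06-02T00:00:00Z'),
});
console.log(pruned.map((row) => row.message));
```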

Redis Session Memory

Same as Postgres but with built-in TTL. Set the TTL to 3600 (seconds) and sessions expire automatically after an hour of inactivity. Good for transient interactions where you want context during a session but don't need long-term persistence.

A production example: content monitoring agent

Here's a workflow we built for a client that monitors competitor blog posts and summarizes changes daily:

text
Schedule Trigger (daily at 8am)
    ↓
AI Agent
    ├── Chat Model: GPT-4o
    ├── Tool: HTTP Request → fetch competitor RSS feed
    ├── Tool: Sub-workflow → check if URL was already processed (Postgres lookup)
    ├── Tool: Sub-workflow → extract article content (Playwright scrape)
    └── Tool: Code node → format output as structured JSON
    ↓
Filter: only new articles
    ↓
Postgres → store processed articles
    ↓
HTTP Request → POST summary to Slack

The agent's system prompt:

You are a content monitoring agent. Your job is to process competitor blog posts. For each new article, use the fetch tool to get the RSS feed, filter for posts published in the last 24 hours, check if each URL has already been processed, and for new posts, extract the full content and produce a structured summary with: title, url, key_points (list of 3), competitive_relevance (low/medium/high), and recommended_action.

The agent handles the conditional logic (new vs already processed) using its tools and reasoning. No IF nodes needed in the main workflow. The workflow stays clean.

Production patterns that actually matter

Token limit management

AI Agent workflows can hit token limits in two ways: the system prompt + tool schemas exceed the model's context window, or accumulated memory fills the context.

Monitor token usage via the execution log. If you're seeing truncated responses or context window errors, reduce system prompt length (be more concise), shorten tool descriptions (agents don't need essays to understand what a tool does), or reduce the memory window size.

For workflows processing long documents: chunk the document before passing it to the agent. Pass one chunk at a time rather than the full text.
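
A minimal character-based chunker, as a sketch. The function name, the default sizes, and the use of characters rather than tokens are all assumptions; a real pipeline might split on paragraphs or count tokens with the model's tokenizer.

```javascript
// Split text into chunks of at most maxChars characters, with `overlap`
// characters repeated between neighbours so context is not cut mid-thought.
function chunkText(text, maxChars = 8000, overlap = 200) {
  if (overlap >= maxChars) {
    throw new Error('overlap must be smaller than maxChars');
  }
  const chunks = [];
  for (let start = 0; start < text.length; start += maxChars - overlap) {
    chunks.push(text.slice(start, start + maxChars));
    if (start + maxChars >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

console.log(chunkText('abcdefghij', 4, 1));
```

Each chunk can then be passed to the agent as a separate item, with the per-chunk results merged in a later step.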

Rate limiting

If your workflow fires hundreds of times per hour, you'll hit LLM API rate limits. n8n doesn't have a built-in rate limiter, but you can implement one with a Redis counter + Wait node: check the count before the AI Agent call, increment on execution, wait if the limit is reached.
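
The counter logic looks roughly like this. The sketch uses an in-memory Map in place of Redis so it runs on its own; `FixedWindowLimiter` is a hypothetical name, and in a real workflow the count would live in a Redis key with an expiry, with the returned delay fed to the Wait node.

```javascript
// Fixed-window rate limiter. An in-memory Map stands in for Redis here.
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;       // max calls allowed per window
    this.windowMs = windowMs; // window length in milliseconds
    this.counters = new Map(); // window index -> calls made in that window
  }

  // Returns 0 if the call may proceed now, otherwise the milliseconds
  // to wait until the next window opens.
  acquire(nowMs = Date.now()) {
    const windowIndex = Math.floor(nowMs / this.windowMs);
    const count = this.counters.get(windowIndex) || 0;
    if (count >= this.limit) {
      return (windowIndex + 1) * this.windowMs - nowMs;
    }
    // Forget earlier windows so the map stays small (Redis would expire the key).
    this.counters.clear();
    this.counters.set(windowIndex, count + 1);
    return 0;
  }
}

const limiter = new FixedWindowLimiter(2, 1000);
console.log(limiter.acquire(0), limiter.acquire(100), limiter.acquire(200));
```

A fixed window is the simplest option; it allows short bursts at window boundaries, which is usually acceptable for staying under a provider's per-minute limit.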

Error handling in AI workflows

AI Agent nodes fail in ways regular nodes don't: the model might return malformed JSON, tool calls might fail mid-reasoning, or the agent might get stuck in a reasoning loop.

Set a "Max iterations" limit on the AI Agent node (5 to 10 is enough for most tasks). This prevents infinite loops.

Add an error workflow that catches AI Agent failures and logs: the input that triggered the failure, the iteration count reached, and the last tool call attempted. This is the only way to debug why an agent failed on a specific input.
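
A sketch of the record such an error workflow might write. All field names are hypothetical, and how you obtain the iteration count and tool call history depends on what your n8n version exposes in the execution data.

```javascript
// Build one log record for a failed agent run.
// `toolCalls` is assumed to be an array of { tool, input } in call order.
function buildFailureRecord({ input, iterations, toolCalls, error, now = new Date() }) {
  const lastCall = toolCalls.length > 0 ? toolCalls[toolCalls.length - 1] : null;
  return {
    failed_at: now.toISOString(),
    input,                          // the input that triggered the failure
    iterations_reached: iterations, // how far the loop got
    last_tool: lastCall ? lastCall.tool : null,
    last_tool_input: lastCall ? lastCall.input : null,
    error_message: error instanceof Error ? error.message : String(error),
  };
}

const record = buildFailureRecord({
  input: { email_id: 'hypothetical-123' },
  iterations: 10,
  toolCalls: [
    { tool: 'crm_lookup', input: { email: 'a@example.com' } },
    { tool: 'draft_reply', input: { tone: 'formal' } },
  ],
  error: new Error('Max iterations reached'),
  now: new Date('2026-06-02T08:00:00Z'),
});
console.log(record);
```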

Cost monitoring

Add a Code node after every AI Agent call that logs: model used, estimated input tokens, estimated output tokens, workflow name, execution timestamp. Write to an ai_costs table in Postgres. Run a weekly query to see which workflows are consuming the most tokens. This turns a surprise billing event into a predictable line item.
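
A sketch of what that logging step could compute. The four-characters-per-token ratio is a rough heuristic for English text, not a real tokenizer, and the column names are hypothetical. If the provider's response includes actual usage counts, prefer those over estimates.

```javascript
// Rough estimate: about 4 characters per token for English text (assumption).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Build one row for the ai_costs table. Column names are illustrative.
function buildCostRow({ model, workflow, prompt, completion, now = new Date() }) {
  return {
    model,
    workflow_name: workflow,
    est_input_tokens: estimateTokens(prompt),
    est_output_tokens: estimateTokens(completion),
    executed_at: now.toISOString(),
  };
}

const costRow = buildCostRow({
  model: 'gpt-4o',
  workflow: 'content-monitor',
  prompt: 'Summarize the following article for a competitive analysis.',
  completion: 'The article announces a new pricing tier.',
  now: new Date('2026-06-02T08:00:00Z'),
});
console.log(costRow);
```

Storing token counts rather than dollar amounts keeps the table valid when provider pricing changes; the weekly query can apply current prices at read time.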

When n8n AI agents aren't enough

n8n's AI Agent node is a single-agent, synchronous execution model. It works well for tasks where one agent, with a set of tools, can complete the work in one run.

Where it falls short:

  • Multi-agent coordination: if you need agents reviewing each other's work, debating, or specializing across parallel tasks, you need LangGraph or a similar multi-agent framework. n8n doesn't support agent-to-agent communication natively.
  • Long-running stateful tasks: agents that need to pause, resume, and maintain complex state over hours are better handled in a dedicated agent framework with proper checkpointing.
  • High-throughput parallel execution: if you need 50 agents running simultaneously across 50 inputs, n8n's concurrency model adds overhead. A Python service with async execution will be faster.

For RAG-powered agents where retrieval is the main tool, n8n's Vector Store node works well at moderate scale. For production RAG with hybrid search, cross-encoder reranking, and strict accuracy requirements, see RAG Pipelines in Production — the architecture there benefits from a dedicated Python service rather than n8n nodes.


If you're building AI agents in n8n and hitting the limits of tutorials, let's talk about the architecture.

Related: Building Production n8n Workflows: Architecture, Error Handling, Deployment
