Haystack Tutorial (TypeScript): testing agents locally for advanced developers

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to run and test a Haystack agent locally in TypeScript without wiring it straight into production infrastructure. You’ll build a small harness that exercises tool calls, captures outputs, and gives you a repeatable way to debug agent behavior before shipping it.

What You'll Need

  • Node.js 18+ and npm
  • A TypeScript project with tsconfig.json
  • @haystack/core installed
  • An OpenAI API key set as OPENAI_API_KEY
  • Optional: dotenv if you want to load environment variables from a .env file
  • A terminal that can run ts-node, tsx, or compiled Node output

Step-by-Step

  1. Start by creating a minimal TypeScript project and installing the Haystack package. Keep this local-first so you can iterate on prompts, tools, and failure cases without deploying anything.
mkdir haystack-agent-local-test
cd haystack-agent-local-test
npm init -y
npm install @haystack/core dotenv
npm install -D typescript tsx @types/node
npx tsc --init --module nodenext --target es2022 --moduleResolution nodenext --strict
  2. Add your environment variable loader and define a small config file for local runs. This keeps the API key out of your code and makes test runs reproducible.
// src/env.ts
import "dotenv/config";

const openAIApiKey = process.env.OPENAI_API_KEY;

if (!openAIApiKey) {
  throw new Error("OPENAI_API_KEY is required");
}

// Narrowed to a plain string here, so callers don't need non-null assertions.
export const config = { openAIApiKey };
  3. Create a simple tool the agent can call during tests. For local agent testing, use deterministic tools first; that lets you validate orchestration before introducing flaky external dependencies. A standalone check of the tool follows the code below.
// src/tools.ts
import { Tool } from "@haystack/core";

export const orderStatusTool = new Tool({
  name: "order_status",
  description: "Returns the status of an order by order ID",
  parameters: {
    type: "object",
    properties: {
      orderId: { type: "string" },
    },
    required: ["orderId"],
    additionalProperties: false,
  },
  execute: async ({ orderId }: { orderId: string }) => {
    const statuses: Record<string, string> = {
      "1001": "shipped",
      "1002": "processing",
      "1003": "delivered",
    };

    return { orderId, status: statuses[orderId] ?? "not_found" };
  },
});
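Before involving the model, it helps to sanity-check the tool in isolation. The sketch below assumes the Tool instance exposes the execute function passed to its constructor; if your version of @haystack/core doesn't, export the handler as a standalone function and test that directly. The file name is just a choice for this example.
// src/check-tool.ts
// Deterministic check of the tool logic with no model in the loop.
// Assumes the Tool instance exposes the execute function passed in above.
import { orderStatusTool } from "./tools";

const known = await orderStatusTool.execute({ orderId: "1001" });
console.assert(known.status === "shipped", "expected 1001 to be shipped");

const missing = await orderStatusTool.execute({ orderId: "9999" });
console.assert(missing.status === "not_found", "expected unknown orders to be not_found");

console.log("tool checks passed");
Run it with npx tsx src/check-tool.ts. If these assertions pass, any failure in the later steps points at orchestration rather than the tool itself.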
  4. Wire up an agent runner that calls the model with the tool attached. The goal here is not to build your final app; it’s to create a local harness that answers one question well: did the agent choose the right tool and return the right result?
// src/run-agent.ts
import { OpenAIChatGenerator } from "@haystack/core";
import { orderStatusTool } from "./tools";
import { config } from "./env";

const generator = new OpenAIChatGenerator({
  apiKey: config.openAIApiKey,
  model: "gpt-4o-mini",
});

const messages = [
  {
    role: "system",
    content:
      "You are a support assistant. Use tools when the user asks about order status.",
  },
  {
    role: "user",
    content: "Check order 1002 and tell me its status.",
  },
];

const response = await generator.generate({
  messages,
  tools: [orderStatusTool],
});

console.log(JSON.stringify(response, null, 2));
  5. Add a tiny test harness so you can run multiple cases locally and inspect behavior fast. In practice, this is where most agent bugs show up: wrong tool selection, missing parameters, or hallucinated answers when a tool should have been used.
// src/test-harness.ts
import { OpenAIChatGenerator } from "@haystack/core";
import { orderStatusTool } from "./tools";
import { config } from "./env";

const generator = new OpenAIChatGenerator({
  apiKey: config.openAIApiKey,
  model: "gpt-4o-mini",
});

const cases = [
  "Check order 1001.",
  "What is the status of order 9999?",
];

for (const input of cases) {
  const result = await generator.generate({
    messages: [
      { role: "system", content: "Use tools for any order lookup." },
      { role: "user", content: input },
    ],
    tools: [orderStatusTool],
  });

  console.log("\nINPUT:", input);
  console.log("OUTPUT:", JSON.stringify(result, null, 2));
}
  6. Run it locally and keep the output under version control as part of your regression workflow. Set "type": "module" in package.json so Node accepts the top-level await these scripts use, then add the run scripts. Once this harness exists, you can swap in more tools, add edge-case prompts, and compare outputs after every change; the sketch after the scripts shows one way to persist those outputs.
{
  "type": "module",
  "scripts": {
    "dev": "tsx src/run-agent.ts",
    "test:harness": "tsx src/test-harness.ts"
  }
}
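To make the version-control part concrete, write each harness run to a stable path so a plain git diff surfaces regressions. A minimal sketch; the runs/ directory and file names are choices for this example, not anything Haystack prescribes:
// src/record-run.ts
// Persist harness output to a fixed path so git diff shows regressions
// between prompt or model changes.
import { mkdirSync, writeFileSync } from "node:fs";

export function recordRun(results: unknown[]): void {
  mkdirSync("runs", { recursive: true });
  writeFileSync("runs/latest.json", JSON.stringify(results, null, 2) + "\n");
}
In the harness, collect each case's input and output into an array inside the loop, call recordRun at the end, and commit runs/latest.json alongside every prompt change.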

Testing It

Run npm run dev first and confirm you get a structured response back instead of an exception. Then run npm run test:harness and check whether the model uses order_status for both cases.

For a good local test setup, verify these behaviors:

  • The tool gets called when the prompt clearly requires it
  • Unknown orders return not_found
  • The response format stays stable across repeated runs (see the sketch below)

If the model starts answering directly without using the tool, tighten the system message or make the tool description more explicit.
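The stability bullet above is easy to check mechanically: run the same prompt several times and compare the shape of what comes back. Since generated text naturally varies between runs, the sketch below compares only the response's top-level keys; it assumes the response is a plain object and nothing else about its fields.
// src/check-stability.ts
// Run the same prompt repeatedly and compare the top-level response shape.
import { OpenAIChatGenerator } from "@haystack/core";
import { orderStatusTool } from "./tools";
import { config } from "./env";

const generator = new OpenAIChatGenerator({
  apiKey: config.openAIApiKey,
  model: "gpt-4o-mini",
});

const shapes: string[] = [];

for (let i = 0; i < 3; i++) {
  const result = await generator.generate({
    messages: [
      { role: "system", content: "Use tools for any order lookup." },
      { role: "user", content: "Check order 1001." },
    ],
    tools: [orderStatusTool],
  });

  // Compare structure, not content: generated text differs run to run.
  shapes.push(Object.keys(result as Record<string, unknown>).sort().join(","));
}

console.log(
  shapes.every((s) => s === shapes[0])
    ? "response shape stable across runs"
    : "response shape diverged; inspect the raw outputs"
);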

Next Steps

  • Add more deterministic tools, like customer lookup or policy validation, and test them with the same harness.
  • Wrap these runs in Vitest so you can turn prompt regressions into failing tests (a minimal test sketch follows this list).
  • Capture tool-call traces to compare model behavior across prompt changes and model upgrades.
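For the Vitest idea, here is a minimal sketch of one regression test. The assertions scan the serialized response for the tool name and the stub's deterministic status rather than reading specific fields, since the exact response shape depends on your installed version; install vitest with npm install -D vitest and run it with npx vitest run.
// src/harness.test.ts
// One regression case: the model should route this lookup through the tool.
import { expect, it } from "vitest";
import { OpenAIChatGenerator } from "@haystack/core";
import { orderStatusTool } from "./tools";
import { config } from "./env";

const generator = new OpenAIChatGenerator({
  apiKey: config.openAIApiKey,
  model: "gpt-4o-mini",
});

it("routes order lookups through order_status", async () => {
  const result = await generator.generate({
    messages: [
      { role: "system", content: "Use tools for any order lookup." },
      { role: "user", content: "Check order 1001." },
    ],
    tools: [orderStatusTool],
  });

  // Shape-agnostic assertions: the serialized response should mention the
  // tool name and the deterministic status the stub returns for order 1001.
  const serialized = JSON.stringify(result);
  expect(serialized).toContain("order_status");
  expect(serialized).toContain("shipped");
}, 30_000);
A prompt change that stops the model from calling the tool now shows up as a failing test instead of a silent behavior shift.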

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

