Connect to other AI Agents & Applications
This cookbook helps you connect your Moveworks Assistant to external AI agents, LLMs, and AI-powered applications. It covers choosing the right integration approach, managing conversation context across turns, and handling asynchronous APIs that require polling. Before diving in, it’s important to understand the trade-offs of different integration approaches so you can choose the right architecture for your use case.
Not all integration patterns are created equal. When connecting to external systems, the approach you choose has a significant impact on reliability, controllability, and user experience. Below is a stack-ranked guide from most to least recommended. Before building an agent-to-agent plugin, make sure you understand these trade-offs.
Use Moveworks plugins with HTTP Actions to call external APIs directly.
This is the most robust approach. By wrapping an external API call inside a Moveworks plugin, you retain full control over:
If you’re trying to connect to a Foundation Model like GPT, try our built-in plugin: QuickGPT.
MCP allows external tools and data sources to be exposed to an AI agent through a standardized protocol.
While functional, MCP introduces trade-offs compared to native API integration:
MCP can be appropriate when a vendor only exposes their capabilities through MCP and does not offer a REST API.
Direct agent-to-agent communication, where your Moveworks agent delegates work to another autonomous agent, is the least recommended approach.
The core issue is that our reasoning engine has no context of the other agent’s working memory. When Moveworks’ reasoning engine delegates to another agent, it has no visibility into that agent’s internal capabilities, tool inventory, or decision-making logic. It’s sending a request into a black box and hoping for the best.
Think of it this way: imagine you need to ask a colleague for help, but you have no idea what they’re actually capable of. They have dozens of specialized skills and tools, but none of those are explained to you up front. You just send a message and hope they figure out which of their many capabilities to apply. That’s the experience from the reasoning engine’s perspective — it can’t make an informed decision about what to delegate because it doesn’t understand the other agent’s strengths, limitations, or how it will process the request.
This makes it extremely difficult for the reasoning engine to select that agent reliably, and only for the requests it is actually suited to handle.
If you must connect agent to agent, we outline the recommended approaches below.
There are three ways that you can manage context, each with its own pros and cons.
This approach lets the Agentic Reasoning Engine manage conversation context for you. The reasoning engine tracks the conversation history and decides what context to pass to your external API on each turn. This is the fastest way to get started: no thread tracking or database is needed.
Your plugin will look something like this:
For the easiest implementation, we recommend the following high-level approach.
Create a Conversation Process with an action activity for your agent’s API. This is the core of your plugin — it defines the flow that calls the external API and returns the response. Start by creating the process and adding an action activity that points to the HTTP Action you’ll configure in the next step.

Set up an HTTP action to call the external agent’s API. Here’s an example using the Anthropic API:
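As a rough illustration in Python rather than Agent Studio configuration, the sketch below builds the kind of request such an HTTP Action would send to the Anthropic Messages API. The function name `build_anthropic_request` and its parameters are hypothetical, and the model name is left to the caller; treat it as a sketch of the request shape, not as the plugin's actual setup.

```python
import json

# Public endpoint of the Anthropic Messages API.
ANTHROPIC_URL = "https://api.anthropic.com/v1/messages"


def build_anthropic_request(api_key, user_message, model, max_tokens=1024):
    """Return (url, headers, body) for one stateless Messages API call.

    Hypothetical helper: in Agent Studio these values would live in the
    HTTP Action's URL, headers, and body fields instead.
    """
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    body = {
        "model": model,            # supplied by the caller
        "max_tokens": max_tokens,  # required by the Messages API
        "messages": [{"role": "user", "content": user_message}],
    }
    return ANTHROPIC_URL, headers, json.dumps(body)
```

The helper only assembles the request, so it can be inspected without network access; the same three pieces map onto the URL, headers, and body of an HTTP Action.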
Map the slots to the action activity in your conversation process. Pass the slots into the API call using DSL:
Some external APIs keep track of the conversation thread for you — you send a thread_id with each request and the API maintains the full message history on its side. Examples include OpenAI’s Assistants API, where the API stores all messages in a thread and you simply reference the thread ID on subsequent calls.
The key design pattern is to make the thread_id slot optional so that it sends null on the first turn (when no thread exists yet) and carries the returned thread ID forward on subsequent turns.
Create a slot for the thread ID with the following configuration:
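Setting the inference policy to “Always Infer” means the reasoning engine fills the slot from conversation context instead of asking the user. On the first turn, when no thread exists yet, the slot is null.
This can be handled directly in a Conversation Process, with no compound action or switch needed. The thread_id slot is either truthy (a thread ID is present) or falsy (null), so you can use DSL to conditionally pass it to the external API.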
Set up a single HTTP Action that accepts both the thread_id and the user message. The API should always return a thread_id in the response so it can be carried forward.
If the external API doesn’t automatically generate a new thread when thread_id is null, add an action step or logic in your compound action to create a new thread first, then pass the resulting ID to the main API call.
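The create-then-send fallback described above can be sketched in Python as follows. `FakeThreadApi`, `create_thread`, `post_message`, and `send_turn` are all hypothetical stand-ins for whatever thread-aware API you are calling; the point is only the control flow around a null thread_id.

```python
from typing import Callable, Optional


class FakeThreadApi:
    """In-memory stand-in for an external API that stores threads."""

    def __init__(self):
        self.threads = {}

    def create_thread(self) -> str:
        thread_id = f"thread_{len(self.threads) + 1}"
        self.threads[thread_id] = []
        return thread_id

    def post_message(self, thread_id: str, message: str) -> str:
        self.threads[thread_id].append(message)
        return f"reply to: {message}"


def send_turn(
    message: str,
    thread_id: Optional[str],
    create_thread: Callable[[], str],
    post_message: Callable[[str, str], str],
) -> dict:
    """Send one turn, creating a thread first if none exists yet."""
    if not thread_id:  # null/empty on the first turn
        thread_id = create_thread()
    reply = post_message(thread_id, message)
    # Always return the thread_id so it can be carried into the next turn.
    return {"thread_id": thread_id, "reply": reply}
```

Returning the thread_id alongside the reply on every turn is what lets the next turn reuse the same thread.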
In your Conversation Process, wire the slots directly to the action activity. The input mapping uses DSL — since thread_id is falsy on the first turn, you can pass it as-is:
The action activity returns the response to the user. The thread_id from the API response is now part of the conversation context, so on the next turn the reasoning engine will automatically infer it into the slot.
Important: Make sure the thread_id is visible in the response output shown to the conversation. This is what allows the reasoning engine to pick it up as context on the next turn and infer it into the slot automatically. If the API returns it but it’s not surfaced in the process output, the reasoning engine won’t have it available to infer.
Many external agents and LLMs don’t offer an API that keeps track of the thread for you, which means every API call is stateless — the external system has no memory of prior turns. You can solve this by creating your own thread tracking mechanism using a ServiceNow table (or any database accessible via API).
In your ServiceNow instance, navigate to System Definition > Tables and create a new custom table. A recommended setup:
Table name example: u_agent_thread_log | Label: Agent Thread Log
Set the u_conversation_history column to a max length of 65000 (the ServiceNow string max) or use a multi-line text field. For very long conversations, consider a strategy to trim older messages and keep only the most recent N turns.
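One possible trimming strategy, sketched in Python under the assumption that the history is stored as a JSON array of role/content messages and that one turn is a user message plus an assistant message. The function name `trim_history` and its defaults are illustrative only.

```python
import json


def trim_history(history_json, max_turns=10, max_chars=65000):
    """Keep the most recent turns and stay under a character budget.

    Assumes history_json is a JSON array of {"role", "content"} messages.
    """
    if max_turns <= 0:
        raise ValueError("max_turns must be positive")
    messages = json.loads(history_json)
    # Keep only the last N turns (two messages per turn).
    messages = messages[-2 * max_turns:]
    trimmed = json.dumps(messages)
    # If the serialized history is still too long, drop the oldest
    # message repeatedly, always keeping at least one.
    while len(trimmed) > max_chars and len(messages) > 1:
        messages = messages[1:]
        trimmed = json.dumps(messages)
    return trimmed
```

Trimming by both turn count and serialized length guards against a few very long messages overflowing the column even when the turn count is small.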
Create a Before Insert business rule on u_agent_thread_log to automatically generate a unique u_thread_id when a new record is created. This way, your compound action only needs to POST the u_user_id and the first message — the thread ID is generated server-side.
You need three operations, which you can accomplish via the standard Table API or a Scripted REST API:
Option A: Use the standard Table API
Option B: Create a Scripted REST API for cleaner endpoints and built-in logic (e.g., auto-trimming old messages, validating JSON structure). This is recommended if you want to encapsulate the history-append logic server-side rather than in your compound action.
Create three HTTP Actions in Agent Studio, one for each operation:
POST to create a new record. Send the user’s first message as the initial u_conversation_history value (e.g., [{"role": "user", "content": "..."}]). Returns the sys_id and u_thread_id.
GET to retrieve the conversation history by u_thread_id. Returns the u_conversation_history JSON string.
PATCH to update the record with the latest user message and assistant response appended to the history.
How the flow works:
First turn: the compound action creates a new record, and the u_thread_id is returned to the reasoning engine.
Subsequent turns: the compound action receives the u_thread_id (collected as a slot with inference policy set to auto-infer). It calls Get_Thread_Action to retrieve history, constructs the full message array, calls the external LLM API, then calls Update_Thread_Action to append the new exchange.
Define thread_id as a slot with an inference policy set to automatically infer from context, so the reasoning engine carries it forward across turns without asking the user.
This approach gives you full context continuity with any stateless API, and the conversation history lives in a system you control.
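The create, get, and update operations and the per-turn flow can be simulated in Python as below. `ThreadStore` is a hypothetical in-memory stand-in for the u_agent_thread_log table, `call_llm` stands in for the stateless external API, and the server-side business rule is approximated with `uuid`; none of this is Agent Studio or ServiceNow syntax.

```python
import json
import uuid


class ThreadStore:
    """In-memory stand-in for the u_agent_thread_log table."""

    def __init__(self):
        self._rows = {}

    def create_thread(self, user_id, first_message):
        thread_id = uuid.uuid4().hex  # generated server-side in the real table
        history = [{"role": "user", "content": first_message}]
        self._rows[thread_id] = {
            "u_user_id": user_id,
            "u_conversation_history": json.dumps(history),
        }
        return thread_id

    def get_history(self, thread_id):
        return json.loads(self._rows[thread_id]["u_conversation_history"])

    def update_history(self, thread_id, history):
        self._rows[thread_id]["u_conversation_history"] = json.dumps(history)


def handle_turn(store, call_llm, user_id, message, thread_id=None):
    """Run one turn: load or create history, call the LLM, save the exchange."""
    if thread_id is None:
        # First turn: the create operation stores the first user message.
        thread_id = store.create_thread(user_id, message)
        history = store.get_history(thread_id)
    else:
        # Later turns: fetch stored history and add the new user message.
        history = store.get_history(thread_id)
        history.append({"role": "user", "content": message})
    reply = call_llm(history)  # stateless API receives the full message array
    history.append({"role": "assistant", "content": reply})
    store.update_history(thread_id, history)
    return {"u_thread_id": thread_id, "reply": reply}
```

Because the full history is rebuilt from the store on each call, the external API can remain completely stateless.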
Some external agents and APIs don’t return results immediately. Instead, they accept a request, return a job or task ID, and require you to poll for the result. You can handle this pattern in Agent Studio using a compound action with chained action steps and delay_config to space out polling attempts.
The pattern: Submit, wait, and poll with stacking intervals
Rather than polling aggressively (which wastes API calls and may hit rate limits) or waiting too long (which degrades user experience), use a stacking wait strategy that starts short and gets progressively longer:
Submit the request and capture the job_id or task_id from the response.
Set delay_config on the next action step to pause, then call the status endpoint.
Use a switch to check the status. If the job is still processing, proceed to a second polling step with a longer delay.
Adjust the polling intervals based on the expected response time of your external system. For APIs that typically respond in under a minute, you might use 5s -> 15s -> 1m. For long-running jobs, consider 1m -> 5m -> 15m. Set the final poll to match the upper bound of the response time of the system you are connecting to.
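The stacking wait strategy can be expressed as a short Python loop. This is a sketch of the logic only, not compound action syntax: `check_status` is a hypothetical callable for the status endpoint, and the status values "processing" and "timed_out" are assumptions about the external API.

```python
import time


def poll_with_stacking_delays(check_status, job_id, delays=(5, 15, 60), sleep=time.sleep):
    """Poll a job with progressively longer waits between attempts.

    check_status(job_id) is assumed to return a dict with a "status" key.
    """
    for delay in delays:
        sleep(delay)  # wait before each poll, short at first, longer later
        result = check_status(job_id)
        if result["status"] != "processing":
            return result  # finished (successfully or not)
    # Every interval was used and the job is still running.
    return {"status": "timed_out", "job_id": job_id}
```

The number of entries in `delays` fixes the number of polling steps, mirroring a compound action with one polling step per interval.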
LLM providers charge based on the number of tokens processed (both input prompt and output generation). Long conversations or large documents can become expensive quickly.
Best Practices:
Use the max_tokens parameter in your API calls to cap the length of the response and prevent unexpectedly large (and expensive) outputs.
Standard public LLM APIs may use your prompt data to train their models. Sending Personally Identifiable Information (PII) or sensitive company data is a significant risk.
Best Practices:
Triggering reliability can vary depending on the use case and on how broad the subject matter of the triggering utterances is. Below are some options to optimize your LLM plugins:
Include a system message (or an equivalent field) in your API request body. This pre-prompts the LLM with its role or instructions (e.g., “You are an expert at rewriting text to be more professional”). The user then only needs to provide the core input, making the interaction much smoother.
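A minimal Python sketch of a request body that carries a system message, assuming a chat-style API that accepts a `messages` array with `system` and `user` roles. Field names vary by provider (some take the system prompt as a separate top-level field), and `build_chat_body` is a hypothetical helper.

```python
def build_chat_body(user_input, system_prompt, model, max_tokens=512):
    """Build a chat-style request body with a fixed system message.

    Assumes the target API accepts a "system" role inside "messages".
    """
    return {
        "model": model,
        "max_tokens": max_tokens,  # caps response length and cost
        "messages": [
            {"role": "system", "content": system_prompt},  # fixed instructions
            {"role": "user", "content": user_input},       # core input only
        ],
    }
```

Keeping the system prompt fixed in the action means the user's utterance only has to supply the text to be processed.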