Structured Data Analysis
The Problem: LLMs Can’t Do Math at Scale
LLMs are powerful reasoners, but they have a fundamental limitation: they struggle with mathematical and analytical tasks. A classic example is asking a model “How many r’s are in strawberry?” - it confidently answers two, when the correct answer is three. Without code execution, the model is guessing at character counts rather than actually counting. Ask it to verify with code, and it gets it right every time.
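Verifying with code turns the guess into a deterministic computation; a minimal sketch of the one-liner the model writes:

```python
# Count occurrences of "r" deterministically instead of guessing.
word = "strawberry"
r_count = word.count("r")
print(r_count)  # prints 3
```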
Now scale that problem up. If an LLM can’t reliably count letters in a single word, imagine asking it to compute an NPS score across 2,000 survey responses, or total ARR by industry across 500 accounts. It will summarize from a tiny sample, hallucinate totals, or silently truncate the data. The results look plausible but aren’t trustworthy. The Reasoning Engine needs a different tool for the job.
The Solution: Code Interpreter
The answer is to let the Reasoning Engine do what it's good at: understanding the question and writing code, then executing that code deterministically against the full dataset. Instead of the Reasoning Engine trying to reason over 500 ARR values, it writes a Python script that computes the exact total and hands the result back.
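A sketch of the kind of script this produces for the ARR example. The field names (`industry`, `annual_recurring_revenue`) and the three sample records are illustrative, not from a real plugin:

```python
# Exact ARR total per industry, computed over every record
# rather than summarized from a sample.
from collections import defaultdict

accounts = [
    {"industry": "Retail", "annual_recurring_revenue": 120_000},
    {"industry": "Telecom", "annual_recurring_revenue": 340_000},
    {"industry": "Retail", "annual_recurring_revenue": 80_000},
]

arr_by_industry = defaultdict(int)
for account in accounts:
    arr_by_industry[account["industry"]] += account["annual_recurring_revenue"]

print(dict(arr_by_industry))  # {'Retail': 200000, 'Telecom': 340000}
```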
This is called the Code Interpreter, and it’s a built-in skill of the Reasoning Engine. In the Moveworks platform, this capability is called Structured Data Analysis (SDA).
What is SDA?
SDA is the Reasoning Engine’s Code Interpreter capability. When a plugin returns structured data that’s too large for inline summarization, the Reasoning Engine automatically writes and executes Python code against the full dataset - performing aggregations, groupings, filters, charts, and more - then synthesizes the result into a final user-facing message.
SDA activates automatically. After your plugin’s final action completes, the output is checked against the 7K token limit. If the output exceeds that limit and is valid structured JSON, SDA takes over. There’s nothing to enable, wire up, or configure.
Consider a plugin that returns 2,247 NPS survey responses. Without SDA, the Reasoning Engine sees only a handful of records that fit within its context window. It produces a surface level summary with no actual NPS calculation, no regional breakdown, no theme frequency counts. With SDA, the same plugin and same dataset produces a complete statistical analysis: overall NPS score, promoter/passive/detractor splits by region, comment theme extraction with frequencies, and actionable signals - all computed from the full 2,247 records.
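The NPS calculation itself is simple once the full dataset is in reach. A minimal sketch using the standard definition (promoters score 9-10, detractors 0-6, NPS = % promoters minus % detractors); the `score` field name and the eight sample responses are illustrative:

```python
# Standard NPS over the full response set, not a sample.
responses = [{"score": s} for s in [10, 9, 8, 6, 10, 3, 9, 7]]

promoters = sum(1 for r in responses if r["score"] >= 9)
detractors = sum(1 for r in responses if r["score"] <= 6)
total = len(responses)

nps = round(100 * (promoters - detractors) / total)
print(nps)  # prints 25
```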
SDA triggers based on output size alone. There’s no configuration needed. Your only job is to return well-structured JSON. This guide covers how to do that effectively.
How SDA Works
Data Flow Architecture
Every plugin response follows one of three paths based on its size:
- ≤ 7K tokens: The Reasoning Engine summarizes the data. No code execution involved.
- > 7K tokens: The output routes to SDA. SDA samples a snippet of the dataset to understand its structure, writes Python code against the full data, executes it, and hands the result back to the Reasoning Engine for the final user-facing message.
- > 100MB: Hard failure - the entire plugin response is rejected and the user sees an error.
The key implication: your plugin doesn’t decide whether SDA runs. The size of the final output decides. Two conditions must both be true for SDA to activate:
- Output exceeds 7K tokens. Below this threshold, the Reasoning Engine handles everything with reasoning alone.
- Output is valid structured JSON. SDA will not activate on raw text, semi-structured content, or data embedded inside a string field (e.g., "data": "{\"accounts\": [...]}"). If your source API returns stringified JSON, parse it in your output mapper before returning it.
Critical Limits
- 7K tokens - the activation threshold. Outputs at or below this size are summarized by the Reasoning Engine directly; anything larger routes to SDA.
- 100MB - the hard ceiling. Plugin responses above this size are rejected outright and the user sees an error.
Designing Your Output
SDA can only work with what it receives. It doesn't know what your plugin could return; it only sees what it did return, plus any metadata you attached. That makes output design the highest-leverage area for improving SDA quality.
Include All Data SDA Needs in the Final Output
SDA receives only the output of your plugin’s action activities. It can’t see intermediate results inside compound actions, it can’t call APIs, and it can’t ask for more data. If a field isn’t in the final output, SDA will either skip it or produce an incorrect analysis.
This matters most for multi-step plugins. If your process fetches accounts from one API and enrichment data from another, make sure the final action merges both into a single output. Use MAP() to combine data from multiple actions:
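The exact Data Mapper syntax depends on your platform version; a hedged sketch of the shape such a mapper might take, using the action names from this example (everything inside the MAP() body is illustrative, not verified syntax):

```
{
  "accounts": MAP(data.raw_user_list, {
    "name": item.name,
    "region": item.region,
    "industry": data.company_context[item.company_id].industry
  })
}
```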
In this example, data.raw_user_list comes from one action and data.company_context comes from another. The final output merges both so SDA has everything it needs in a single flat list.
Use Friendly Field Names
SDA uses field names as its primary signal for understanding what each column represents. A name like churn_rate is self-documenting; ChurnRate__c forces the model to infer meaning from a system-generated suffix. Friendly names also improve the user experience - these names surface in messages and charts.
Flatten Nested Structures
SDA works best with tabular data - flat rows and columns that map cleanly to a dataframe. Deeply nested JSON forces SDA to write extra code to unpack the structure before it can even start the analysis, and that extra complexity increases the chance of errors.
Bad - Nested with cryptic API field names:
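A hedged illustration of that shape, using a hypothetical CRM-style payload (the `__r`/`__c` names and values are invented to show the pattern):

```json
{
  "records": [
    {
      "Account__r": {
        "Name": "Acme Corp",
        "Metrics__r": {
          "ChurnRate__c": 12.5,
          "ARR__c": 340000
        }
      }
    }
  ]
}
```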
Good - Flat with friendly names:
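The same hypothetical record flattened, with each value promoted to a self-documenting top-level field:

```json
{
  "accounts": [
    {
      "account_name": "Acme Corp",
      "churn_rate": 12.5,
      "annual_recurring_revenue": 340000
    }
  ]
}
```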
Complete Output Mapper Example
Here’s a production-ready output mapper that ties together everything above - friendly names, flat structure, and display_instructions_for_model for baseline analysis rules:
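A hedged sketch of what that mapper might look like. The MAP() shape and the source field names (`Name`, `ChurnRate__c`, `ARR__c`) are illustrative; adapt both to your API and your Data Mapper version:

```
{
  "accounts": MAP(data.api_response.records, {
    "account_name": item.Name,
    "industry": item.Industry,
    "churn_rate": item.ChurnRate__c,
    "annual_recurring_revenue": item.ARR__c
  }),
  "display_instructions_for_model": "annual_recurring_revenue is in USD. churn_rate is a percentage from 0-100. Group by territory when comparing regions."
}
```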
Steering SDA with Instructions
You steer SDA's behavior at two levels: the plugin name and description (which give SDA domain context when it writes code) and the display_instructions_for_model field (which steers SDA on how to interpret the data once it has it).
Name and Describe Your Plugin Effectively
SDA uses your plugin’s name and description as context when writing code against your data. A descriptive name like NPS_Score_Analysis immediately tells SDA what domain it’s working in and what kinds of calculations make sense. A vague name like excel_data_plugin gives it nothing to work with.
The description matters even more. When your description lists the fields your plugin returns, SDA can map the user's question to the right columns and choose appropriate analysis methods. If your description says "Returns insights on NPS score with account owner, internal scoring, customer, and region," SDA knows to look for score fields, group by region, and segment by customer. If your description just says "Returns data from excel," SDA lacks critical context.
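Side by side, with the good wording extended hypothetically to cover the churn fields a plugin like this might also return:

```
Bad:  "Returns data from excel"

Good: "Returns insights on NPS score with account owner, internal
       scoring, customer, and region. Also includes churn_rate and
       annual_recurring_revenue per account."
```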
The bad example may fail a question like “Which accounts have the highest churn rate?” - nothing in the description signals that churn data is available.
Use display_instructions_for_model for Baseline Rules
Once the Reasoning Engine routes to your plugin, SDA decides what code to write based on two inputs: the user’s question and the display_instructions_for_model field attached to your data. Think of the user’s question as what to compute and the instructions as how to interpret the data while computing it.
The distinction matters. If you put "Calculate total ARR by industry" in your instructions, SDA might execute that calculation even when the user asks a completely different question. Instructions should define baseline rules that apply regardless of the query, not specific analyses.
Stick to rules that should hold true for every query against this dataset:
- Units and formats - "annual_recurring_revenue is USD", "churn_rate is 0-100%"
- Default groupings - "Group by territory when comparing regions"
- Business rules - "Flag churn_rate > 15 with renewal in 90 days as at-risk"
What NOT to Put in Instructions
Don’t use instructions to dictate specific analyses. That’s the user’s job - their question tells SDA what to compute. Instructions that overlap with user intent create conflicts, and SDA may ignore one or both.
Analysis Instructions Are Not Guaranteed
Instructions that tell the model how to analyze (e.g., "fuzzy match Telecam to Telecom") are unreliable. SDA may skip them entirely, or worse, apply them in contexts where they don’t make sense. Fix data quality issues upstream in your output mapper or API call, not in instructions.
Size Optimization
Your plugin’s response has a hard ceiling (100MB), and every field you return eats into that budget. But size isn’t just about avoiding failures. Smaller, focused datasets also produce better analysis. When SDA has 15 fields to work with instead of 50, it writes simpler code with fewer opportunities for error.
Optimize at three levels: pre-filter at the API, prune the schema, and post-filter with DSL.
A 500-row dataset with 30 fields is dramatically larger than the same 500 rows with 10 fields. Aggressive pruning can be the difference between an output that fits the size limits and one that doesn't, and it can shave seconds off your plugin's run time.
Pre-filtering at the API Level
The cheapest data is data you never fetch. Use plugin inputs to narrow the query before hitting the external API. You can define multiple filter inputs on the same plugin and mark them as optional, so the plugin works broadly but can be scoped when the user provides constraints:
- Date ranges - start_date and end_date to bound time-series queries
- Current user - assigned_to the requesting user, so each person sees only their records
- Numeric thresholds - e.g., only accounts with more than 500 employees
- Optional inputs - mark all of the above as optional so the plugin still works when the user doesn't specify a filter
Schema Pruning with MAP()
MAP() lets you project only the fields SDA needs from the raw API response. Drop system metadata, internal IDs, audit timestamps, and long text blobs. If SDA won’t reference a field in a calculation or summary, it shouldn’t be in the output.
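In Python terms, the projection amounts to keeping a whitelist of keys per record; a small sketch with illustrative field names:

```python
# Keep only the fields the analysis will reference; drop system
# metadata, internal IDs, and audit timestamps.
KEEP = {"account_name", "churn_rate", "annual_recurring_revenue"}

raw_records = [
    {
        "account_name": "Acme Corp",
        "churn_rate": 12.5,
        "annual_recurring_revenue": 340000,
        "SystemModstamp": "2024-01-01T00:00:00Z",  # audit noise
        "OwnerId": "005xx0000012345",              # internal ID
    },
]

pruned = [{k: v for k, v in record.items() if k in KEEP}
          for record in raw_records]
print(pruned[0])
```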
Post-filtering with FILTER() and other Data Mapper Syntax
When the source API doesn’t support server-side filtering, use DSL and Data Mapper expressions to remove records inside your plugin’s process before the output reaches SDA:
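A hedged sketch of the shape such an expression might take; the exact FILTER() signature depends on your Data Mapper version, and the field name is illustrative:

```
{
  "accounts": FILTER(data.api_response.records, item.employee_count > 500)
}
```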
Debugging and Validation
After launching your plugin, verify SDA is working correctly:
- Look for the “Running analysis” indicator. When SDA activates, the chat shows a progress indicator during code execution. If you don’t see it, your output is likely under the 7K token threshold or isn’t valid structured JSON.
- Check the code citation. SDA generates a reference labeled “Analysis Code” alongside its response. Expand it to see the exact Python code that ran - including how it parsed your fields, what aggregations it performed, and whether it interpreted your data correctly. This is your primary debugging tool.
- Test across question types. The same dataset should support different analyses depending on the user’s question. Ask for totals, groupings, trends, and filters to confirm SDA handles your schema correctly across all of them.