Reasoning Engine Upgrade Program

Control how your AI Assistant receives Reasoning Engine upgrades

As research labs develop frontier large language models, Moveworks is committed to bringing frontier intelligence to your AI assistant as fast as possible.

The Reasoning Engine Upgrade Program gives every Moveworks tenant a predictable upgrade track for new versions of the reasoning engine.

Upgrade Programs

Frontier Reasoning Engine

Receive the latest reasoning engine models as soon as they are available.

Best for:

  • Trailblazing customers who want users to experience frontier intelligence and are comfortable with pre-GA quality
  • All customer dev/sandbox tenants

Standard Reasoning Engine

Receive the latest reasoning engine models once they are ready for GA.

Best for:

  • Most production tenants for enterprise customers

Basic Reasoning Engine

Migrated to the latest reasoning engine models at the deprecation deadline, typically 2–4 weeks after the Standard release.

Best for:

  • Organizations with more extended change management programs.

Limited Support Warning

Note: If you experience reasoning quality issues with a “Basic Reasoning Engine” model version that is behind the “Standard” release, our support team won’t be able to help.

We focus all of our ML services for prompt tuning and enhancements on the latest version of the Reasoning Engine.

Selecting an Upgrade Program

If you own multiple tenants, you can mix and match programs across them. For example, a customer might put their sandbox on Frontier and their production tenant on Basic.

You can change your program version at any time to access different models. Simply go to Moveworks Setup > Core Platform > AI Assistant > Advanced Settings > Conversation Settings.

Once you pick your program, the change will be reflected for your users within five minutes.

The current models for each upgrade program are:

| Data Center | Frontier | Standard | Basic    |
| ----------- | -------- | -------- | -------- |
| US West     | GPT-5.2  | GPT-5.2  | GPT-5.1* |
| Europe      | GPT-5.2  | GPT-5.2  | GPT-5.1* |
| Canada      | GPT-5.2  | GPT-5.2  | GPT-5.1* |
| Australia   | GPT-5.2  | GPT-5.2  | GPT-5.1* |
| GovCloud    | GPT-5.2  | GPT-5.2  | GPT-4.1* |

*This is the default, though it is subject to change based on customer needs. Reach out to your CSM to confirm the exact model version.

Model Behavior Changelog

We describe the improved (and regressed) behaviors of these models as we load them into an agentic harness (ours being the Agentic Reasoning Engine). Generally, newer models exhibit greater performance, stronger instruction following, and a better ability to achieve the user’s goal.

We optimize our harness to work with frontier LLMs so you continue to get great performance.

Read the launch blog

(+) More comprehensive & accurate tool calling. GPT-5.2 is more comprehensive in the set of tools it uses to help answer user questions. As a result, it provides more helpful answers and is less likely to need to hand off users to internal help channels.

(+) Better Enterprise Search. GPT-5.2 writes more concise and relevant search queries when using Enterprise Search, so the assistant finds what you’re looking for more efficiently, with less noise in the search process.

(+) Stronger Tool Call Post-Processing. After GPT-5.2 calls a tool or plugin, it does a better job of writing code to analyze and restructure the results.

(+) Fewer Hallucinations. GPT-5.2 does a better job of discerning which capabilities it does and doesn’t have, and hallucinates less when conversing with the user.

(-) Verbose tool responses & reasoning traces. When your Assistant executes a tool, GPT-5.2 is more likely to explain how it interpreted the results, sharing details about how your tool is structured rather than simply answering the question.

Tip

You can course-correct verbose tool responses by influencing the Reasoning Engine through display_instructions_for_model (docs).
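As a rough sketch of what such an instruction could look like, assuming a YAML-style plugin configuration: everything here except the display_instructions_for_model field name is hypothetical and illustrative, not the product’s actual schema.

```yaml
# Hypothetical plugin configuration sketch. Only the
# display_instructions_for_model field comes from the tip above;
# the surrounding structure and field names are illustrative.
plugin:
  name: lookup_ticket_status        # hypothetical tool name
  display_instructions_for_model: >
    Answer the user's question directly from the tool results.
    Do not describe the structure of the tool's response or
    narrate intermediate reasoning steps.
```

The idea is to state, in plain language, what the model should and should not surface to the user after the tool runs; consult the linked docs for the exact location and syntax in your tenant.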

Read the launch blog

(+) Improved planning & tool calling. GPT-5.1 is capable of longer-running tasks than its GPT-4 series counterparts. As a result, we’ve found it can preserve accuracy while taking on work that spans ~20% more tool calls.

(-) Tool Calling Accuracy. GPT-5.1 had a higher tendency to call the wrong tools, or to call them with incorrect arguments, hurting the likelihood of getting work done.

(-) Latency. GPT-5.1 introduced significantly more latency into the reasoning engine than non-reasoning alternatives like GPT-4.1.

(-) Hallucinations. GPT-5.1 had a stronger tendency to hallucinate capabilities and pretend it can do things that it can’t.

FAQ

  • Can I release a Reasoning Engine model version to a subset of my users?
    • No. Upgrades apply to the entire tenant. Customers who need targeted testing should use a sandbox environment.
  • Are there pricing or packaging restrictions on the program version?
    • No. You’re eligible for any version.
  • How much testing time do I get before the Basic deadline?
    • The Frontier and Basic milestones are generally kept ~1 month apart so you have sufficient time for testing.