Method: every claim tracked, reviewed every 30–90 days, marked Holding, Partial, or Not holding. Drafted by Claude; signed off by Peter.
AM-121 · pub 2 May 2026 · rev 2 May 2026 · read 21 min · in Implementation

AI in IT operations: what is actually shipping in 2026, and what the savings really look like

Deep dive into the AI-in-IT-ops market in mid-2026: ServiceNow Now Assist, Microsoft Copilot, AIOps platforms, and the gap between vendor pitch and audited reality. What is actually shipping, what is failing, and what the staff-reduction numbers honestly look like when you trace them to primary sources.

Holding · reviewed 2 May 2026 · next +62d

If you run IT operations for a mid-market or enterprise organisation in 2026, you have probably been pitched some version of the following: AI is going to absorb 30 to 50 percent of your L1 ticket volume, your AIOps stack is going to cut MTTR in half, and your service-desk headcount can come down by a quarter inside eighteen months. The pitch lands because the vendor case studies are real, the demos work, and the AI products in this category have actually shipped. None of that is vapor.

The harder question, and the one that matters when the budget is real, is what happens when you trace each of those numbers back to a primary source. Some hold up. Many do not. And the gap between vendor-pitch numbers and audited customer results is the entire editorial story of AI in IT operations as of mid-2026.

This piece walks the landscape of what is actually live, where the wins are independently verifiable, where the pitfalls have produced documented walk-backs, and what the staff-reduction story honestly looks like when you read it out of Forrester research, Gartner press releases, and the actual SEC filings of the platform incumbents rather than the marketing decks built on top of them.

What is live in the IT operations AI market in mid-2026

Five product categories matter. Each one has shipped real software, has named paying customers, and is being evaluated against real budgets right now.

The first and largest is ServiceNow Now Assist, which entered general availability with the Vancouver platform release in September 2023, expanded materially in the Washington DC release of March 2024, and shipped its Pro tier in the Xanadu release of September 2024. In April 2026, ServiceNow restructured Now Assist into a three-tier packaging model, going from Assistive AI (summarisation and generation) through Task Automation (discrete workflows end-to-end) to Full Role Automation (autonomous workflows with minimal human oversight). The new ESM Suite bundle is targeted at organisations with 1,000 to 5,000 employees, with implementation timelines collapsed from “six months or more” to “about 30 days” through what ServiceNow calls an implementation agent. John Aisien, ServiceNow’s SVP of forward-deployed engineering, summarised the strategy on the record: “AI is now infused in every package that we offer to our addressable market.”

Per the latest IDC market data quoted in trade press from April 2026, ServiceNow has 8,600 ITSM customers and roughly 40 percent of the ITSM software market, with six times the share of its next two competitors (BMC Helix and Atlassian) combined. Whatever else is true about AI in IT operations, the platform-incumbent picture is concentrated.

The second is the Moveworks acquisition, which ServiceNow announced in March 2025 and closed nine months later. The 10-Q ServiceNow filed for the quarter ended 31 Mar 2026 discloses the actual close on 15 Dec 2025 and the actual purchase consideration of 2.4 billion dollars (1.467 billion in stock, 905 million cash, 31 million in loan settlement, 4 million in stock-based compensation). The originally announced number that still circulates in the trade press is 2.85 billion. The 450-million-dollar gap is not reported anywhere prominently because the press has continued repeating the announced figure rather than the closed one. This is a small thing on its own and a useful sanity check on the entire category: even the headline numbers in the AI-in-IT-ops market are softer than they appear when you compare them to the audited filings.
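The reconciliation above can be checked in a few lines. This is a minimal sketch that uses only the figures quoted in the paragraph, in millions of dollars; the variable names are illustrative and are not taken from the filing.

```python
# Figures in millions of US dollars, as quoted in the paragraph above.
# Key names are illustrative labels, not line items copied from the 10-Q.
components = {
    "stock": 1467,
    "cash": 905,
    "loan_settlement": 31,
    "stock_based_compensation": 4,
}
announced = 2850  # headline figure from the March 2025 announcement

closed = sum(components.values())  # total of the four disclosed components
gap = announced - closed           # difference between announced and closed

print(f"closed consideration: {closed} million")
print(f"gap vs announced:     {gap} million")
```

The four components sum to 2,407 million, so the unrounded gap is 443 million; the 450-million figure follows from rounding the closed consideration to 2.4 billion.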

The third is Microsoft 365 Copilot in IT and operations contexts. This category includes Copilot for Service, Copilot in the Power Platform, the IT-helpdesk plug-ins, and the cross-government deployments that have, importantly, generated the cleanest independent dataset available anywhere on AI productivity in operational work. The UK Government Digital Service published the findings of its 20,000-user M365 Copilot trial in June 2025, covering the Q4 2024 evaluation period across multiple departments. The follow-on rollout to HMRC’s 28,000 staff was announced in April 2026. James Mitton, HMRC’s chief AI officer, was on the record at the Think AI for Government event in London.

The fourth is the AIOps and observability stack: Datadog Bits AI, Dynatrace Davis (now Dynatrace Intelligence as of January 2026, with domain-specific agents for SRE, dev, and security teams), Splunk AI Assistant inside the Cisco-acquired Splunk product line, PagerDuty AIOps, and BMC HelixGPT. Datadog’s Q3 FY25 10-Q discloses that its “AI-native cohort” of customers contributed approximately eight percentage points of year-over-year revenue growth for the quarter ended 30 Sep 2025, with the company’s filing language flagging this concentration as a risk factor (these customers may “in the future optimize their usage”) rather than purely as an opportunity. That is a more honest piece of reporting than the AIOps category typically generates.

The fifth, much smaller, is Salesforce Agentforce IT Service, which Marc Benioff launched and pitched aggressively on the FY26 Q4 earnings call as a direct ServiceNow competitor. Six months after launch, Agentforce IT had roughly 200 customer signups out of Salesforce’s 150,000-customer base. Bill McDermott responded at the Citizens Technology Conference on 2 Mar 2026 that the actual ServiceNow ARR loss to Salesforce was 42,000 dollars against ServiceNow’s 13.2-billion-dollar FY25 revenue. The Salesforce-versus-ServiceNow ITSM war is real noise and tiny dollars; it is editorially worth mentioning mostly because it tells you what the platform incumbents care about.

Forrester’s principal analyst on the space, Charles Betz, framed the head-to-head structurally: “Salesforce is betting that engagement and AI-driven interaction become the primary organizing layer, and that deeper IT models can be reconstructed as needed. ServiceNow is betting that AI makes control planes more important, not less, because poorly governed autonomy is a real enterprise risk.” Both bets are coherent. Only one of them currently has 8,600 paying customers and 40 percent of the market.

What is actually working: the audited wins

The wins, when you separate them from the vendor decks, group into three patterns.

Pattern one: cross-cutting productivity gains at the individual-task level. This is the cleanest category because it is the easiest to measure and the easiest to audit. The UK Government Digital Service trial is the strongest evidence point: 20,000 government employees across multiple departments, a Q4 2024 evaluation period, methodology fully disclosed, and a headline finding that participants saved an average of 26 minutes a day when using M365 Copilot. Over 70 percent of users agreed it reduced time spent searching for information and on routine tasks. The report is candid about limits (“complex, nuanced, or data-heavy aspects of work” were where the value dropped off; security and sensitive-data handling concerns persisted). This is what an audited productivity gain looks like, and it is the floor estimate, not the ceiling.
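To turn the 26-minutes-a-day figure into something a budget model can use, it helps to annualise it. The sketch below does that under one loudly stated assumption: 220 working days per user per year, which is a placeholder and not a number from the trial report. Hours recovered are not the same thing as cash saved, so treat the result as a capacity estimate only.

```python
def annual_hours_saved(minutes_per_day: float, working_days: int) -> float:
    """Convert a per-day time saving into hours per user per year."""
    return minutes_per_day * working_days / 60


MINUTES_PER_DAY = 26  # headline finding quoted above
WORKING_DAYS = 220    # ASSUMPTION: placeholder, not from the trial report

hours = annual_hours_saved(MINUTES_PER_DAY, WORKING_DAYS)
print(f"{hours:.1f} hours per user per year")
```

Swapping in your own organisation's working-day count changes the result linearly, which makes the sensitivity of the estimate easy to see.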

Pattern two: case-summary and routing time at the L1 service desk. BT’s pilot of ServiceNow Now Assist documented a 55 percent reduction in case-summary writing time and a 35 percent reduction in average case-resolution time. Hena Jalil, BT’s managing director and business CIO, was on the record. The caveat she put on the record is the part most coverage skips: “We have that process at the moment because, as we’re building confidence, we do need that validation. There are certain things that we want to capture that we don’t want an agent to change. We’re doing random checks at the other end as well.” Translation: the 35 percent figure is from a pilot with active human oversight and random sampling on the back end, not from a steady-state production deployment with the AI in the driver’s seat.

Pattern three: vendor self-reports inside vendor-controlled environments. ServiceNow publicly reports that 90 percent of targeted L1 ticket volume is handled autonomously inside its own help desk, with a 99 percent resolution rate within those categories (network at 46 percent, software at 43 percent, hardware at 11 percent of the ticket-type mix). Nenshad Bardoliwalla, ServiceNow’s group VP for AI products, was on the record, and his framing of why this number is so much higher than typical customer outcomes is itself the most editorially honest thing ServiceNow has said in this category: “How does it know it got the right answer? Because the outcome is measurable inside the same platform. Did the ticket resolve? Did the workflow complete? Did the approval get the right sign-off? ServiceNow closes the loop in a way that a standalone LLM sitting on top of a SharePoint folder simply cannot.” And, critically, in the same article: “documentation inside real-world help desks traditionally has been poor to non-existent.”

Which is to say: ServiceNow on ServiceNow with two decades of structured workflow data is the absolute upper bound of what is possible. It is not a customer-deployment benchmark. CIOs reading “90 percent L1 deflection” and budgeting against it without the documentation-quality caveat are budgeting against a number that does not generalise to their environment.

The financial expression of all this on the vendor side is real and audited. Datadog’s eight-percentage-points growth contribution from the AI-native cohort is one example. ServiceNow’s CFO Gina Mastantuono confirmed on the Q2 2025 earnings call that the company is on track for 100 million dollars in AI-powered headcount savings, reinvested into sales and engineering rather than dropping to the margin line. Q2 2025 revenue was 3.2 billion (+22.5 percent year over year) and net income was 385 million (+47 percent). The aggregate ServiceNow remaining performance obligation at 31 Mar 2026 was 27.7 billion dollars, up 25 percent year over year, with 630 customers above 5 million dollars in annual contract value (up from 516 a year prior). The platform-incumbent AI thesis is paying out at the vendor level. Whether it is paying out at the customer level is a separate question with a more uneven answer.

What is failing: the named pitfalls

Three failure modes show up repeatedly in the on-record evidence.

Failure mode one: multi-step agent reliability. The Salesforce AI Research team published CRMArena-Pro on arXiv in May 2025: an evaluation suite for LLM agents across enterprise customer-relationship and IT-adjacent tasks. The headline finding is uncomfortable for the vendor: LLM agents achieved roughly 58 percent success on tasks that can be completed in a single step, and that success rate dropped to 35 percent when the task required multiple sequential steps. Carnegie Mellon researchers reached the same range independently, with multi-step success rates of 30 to 35 percent. This is the gating constraint on every “AI handles your L2 incident triage end to end” pitch. It is also the structural reason Gartner predicts that more than 40 percent of agentic AI projects will be cancelled by end-2027.
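One way to build intuition for why success rates fall as steps are added is to treat each step as an independent trial with the same success probability. That is a simplifying assumption of this sketch, not the methodology of CRMArena-Pro or the Carnegie Mellon work, and real agent failures are unlikely to be independent.

```python
def chained_success(per_step: float, steps: int) -> float:
    """Probability that every step succeeds.

    ASSUMPTION: steps are independent and share one success probability.
    This is an intuition aid, not how the cited benchmarks were scored.
    """
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be between 0 and 1")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return per_step ** steps


SINGLE_STEP = 0.58  # single-step success rate quoted above

for steps in (1, 2, 3, 5):
    print(f"{steps} step(s): {chained_success(SINGLE_STEP, steps):.1%}")
```

Under this toy model, two chained steps at 58 percent already land at about 34 percent, which is in the same range as the multi-step figure reported above. The point is only that modest per-step error compounds quickly over a long workflow.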

Anushree Verma, Gartner’s senior director analyst on the press release: “Most agentic AI projects right now are early stage experiments or proof of concepts that are mostly driven by hype and are often misapplied. This can blind organisations to the real cost and complexity of deploying AI agents at scale, stalling projects from moving into production.” From Gartner’s January 2025 webinar poll of 3,412 attendees: only 19 percent had made significant investments in agentic AI, 42 percent were “conservative,” 8 percent had made none, and 31 percent were wait-and-see or unsure. From Gartner’s follow-on October 2025 survey of 360 IT application leaders at organisations with 250 or more FTEs: only 15 percent were considering, piloting, or deploying fully autonomous agents; 19 percent had high or complete trust in their vendor’s hallucination protection; 74 percent considered agents a new attack vector.

If you are budgeting against “the AI handles L3 root cause”, you are budgeting against a capability that, per the most authoritative survey we have, is being considered, piloted, or deployed by 15 percent of surveyed organisations, with only 19 percent placing high trust in their vendor’s hallucination protection.

Failure mode two: the Klarna walk-back, generalised. Klarna’s February 2024 press release is still live on the company’s website as of May 2026: “the AI assistant has had 2.3 million conversations, two-thirds of Klarna’s customer service chats… it is doing the equivalent work of 700 full-time agents… it is more accurate in errand resolution, leading to a 25 percent drop in repeat inquiries.” Sebastian Siemiatkowski’s quote from that release: “This AI breakthrough in customer interaction means superior experiences for our customers at better prices, more interesting challenges for our employees, and better returns for our investors.” The H1 2024 financials doubled down, with Siemiatkowski projecting Klarna would shrink from approximately 3,800 to 2,000 employees: “Not only can we do more with less, but we can do much more with less.”

The walk-back came in May 2025. Siemiatkowski told Bloomberg on 9 May 2025 (covered concurrently by Fortune) that Klarna had gone too far on the AI substitution. The verbatim quotes that Bloomberg pulled and that Fortune reproduced: “From a brand perspective, a company perspective, I just think it’s so critical that you are clear to your customer that there will always be a human if you want,” and “Really, investing in the quality of human support is the way of the future for us.” Klarna began recruiting customer-service agents in an Uber-style freelance arrangement, targeting students and rural workers, paying from 400 Swedish krona (about 41 dollars) per shift. The original Klarna press release with the 700-agent claim is still up on the company’s site as of May 2026. The company’s recent practice is materially more cautious. Both facts are true at the same time, and the pattern is the one most enterprise IT leaders should be watching for: the case-study claim outlives the case-study reality by a long margin.

Failure mode three: AI agents as a new attack surface. Michael Bargury, CTO of Zenity, demonstrated zero-click attacks at RSAC 2026 against Cursor, Salesforce agents, and ChatGPT (with Google Drive exfiltration). His framing was the editorially useful one: “AI is just gullible. We are trying to shift the mindset from prompt injection (because it is a very technical term) and convince people that this is actually just persuasion. I’m just persuading the AI agent that it should do something else.” Read the Zenity demo against the 74 percent of Gartner’s surveyed IT app leaders who consider agents a new attack vector. The two numbers describe the same problem from opposite ends.

A fourth, smaller failure mode worth noting: AI-platform pricing is unstable in 2026. GitHub’s April 2026 fix to a token-counting bug caused subscription allowances to “rapidly exhaust,” and Anthropic has taken steps to discourage Copilot Pro+ usage during peak demand. Operators baking multi-year savings cases against quietly-shifting service definitions are taking on a class of risk most procurement teams have not yet learned to underwrite.

What L1, L2, and L3 actually look like in mid-2026

The realistic split, after the audited evidence and the named-failure cases, is roughly the following.

L1: real, measurable, but smaller than the headline. Password resets, basic provisioning, status lookups, standard ticket routing, knowledge-article suggestion, and case summarisation are working in production at named customer sites. The realistic deflection range outside vendor-internal conditions is 20 to 40 percent, not 90 percent. The UK Government’s 26-minutes-per-day-per-user finding is the floor; BT’s 55 percent reduction in case-summary time and 35 percent reduction in case-resolution time are the upper end of pilot disclosures. The vendor self-report ceiling (ServiceNow on ServiceNow at 90 percent) requires conditions almost no enterprise customer can replicate without a multi-year ITSM data-quality programme first.

L2: AI-assisted triage is helping; full auto-triage is rare. Average-handle-time reductions in the 15 to 30 percent range are showing up in pilot disclosures with named customers. The typical pattern is human-in-the-loop with the AI suggesting routing, summarisation, and known-issue matching, and the L2 engineer accepting or overriding. Multi-step agent reliability at 35 percent (CRMArena-Pro) is the structural reason this stays human-in-the-loop. Promising direction; not yet a story you can write a headcount-reduction case against.

L3: assist-mode only for almost everyone. Closed-loop AI auto-remediation in production is, per Gartner’s October 2025 survey, being considered or piloted by roughly 15 percent of large enterprises. The remaining 85 percent are using AI for log summarisation, runbook suggestion, hypothesis generation in incident review, and post-incident write-up assistance. None of those produce defensible L3 headcount reductions on a 12-to-18-month horizon. The vendors that pitch otherwise are pitching against the Gartner data.

The honest read for a CIO budget is: L1 is a productivity story, L2 is an AHT story, L3 is a learning story. Only the first one supports a hard cost case. The other two compound, and they compound slowly.

What the staff-reduction numbers actually look like

This is the part most CIOs need to read carefully because the gap between the pitch and the audited reality is the largest in this category.

The cleanest primary source is the Forrester forecast published 13 Jan 2026 by VP and principal analyst J.P. Gownder: 6.1 percent of US jobs lost by 2030 due to AI and automation, equating to 10.4 million jobs. The number is real. The framing Gownder put around it is the editorially load-bearing part: “Every week, we speak to clients telling some version of the following story: ‘Our CEO said we’re laying off 20 percent of staff and replacing them with AI: how do we do that?’ When we ask if they have a mature, vetted AI app ready to fill in those jobs, nine out of 10 times, the answer is no, and they haven’t even started. So most of the layoffs are financially driven and AI is just the scapegoat, at least today.”

Forrester also forecasts that AI will “strongly influence” 20 percent of jobs, which is roughly 3.3 times the 6.1 percent it expects to be replaced. The augmentation story is the bigger one. The replacement story is the louder one.

The corroborating Gartner data is more direct. The September 2025 Gartner survey of Fortune 500 customer-support functions (covered by The Register) found that none of the surveyed Fortune 500 companies predict replacing all support staff with bots by 2028; only 11 percent had reduced headcount as a result of AI; 54 percent plan to maintain staffing and use AI to boost engagement quality; 22 percent have stopped backfilling departures; and 32 percent are hiring more staff with specialised AI-related skills. Emily Potosky, Gartner’s senior director of research, on the record: “Let’s say you have the most advanced AI agent in the world, and you have all of the infrastructure stuff set up to be able to deploy the technology. Customers are still going to have issues that need to be handled by human agents. If you’ve been a victim of credit card fraud and you need to talk to someone, you probably want to talk to a human who can provide you with that reassurance.”

Forrester’s Predictions 2026 report adds the clincher: half of AI-attributed layoffs are likely to be reversed, 55 percent of employers regret laying off workers because of AI, 57 percent of those in charge of AI investment expect AI to increase headcount, and only 15 percent expect a decrease. The rehires Forrester forecasts will largely be “lower-wage human workers, offshore or at lower salary.”

The named-CEO data points fit the pattern. Marc Benioff told the Logan Bartlett Show before Labor Day 2025: “I’ve reduced [support headcount] from 9,000 heads to about 5,000 because I need less heads,” with half of customer conversations now conducted by AI. Allison Kirkby of BT told the Financial Times in June 2025 that the previously announced plan to slash up to 55,000 jobs by 2030 “did not reflect the full potential of AI” and “there may be an opportunity for BT to be even smaller by the end of the decade.” Arvind Krishna of IBM estimated in 2023 that up to 30 percent of IBM’s back-office jobs (around 7,800 roles) could be replaced by AI, yet IBM’s April 2025 announcement of 150 billion dollars in US operational investment over five years points more loudly toward augmentation than replacement. Luis von Ahn of Duolingo said the company would “gradually stop using contractors to do work that AI can handle,” with “small hits on quality” an acceptable trade. The subsequent partial reversal is part of the same pattern Forrester is predicting at scale.

The structural read of all this is consistent. Direct enterprise headcount reductions attributable to AI in IT operations are real but small (Gartner’s 11 percent of Fortune 500), often regretted (Forrester’s 55 percent), and frequently reversed (Forrester’s “half”). Where AI is producing material cost reduction in IT operations is in the third-party / contractor / BPO line. BT’s framing in the ServiceNow case study was “cut use of third-party support staff” and “we don’t have to outsource as much.” That is the cost line that moves first, and it moves measurably. The direct-employee FTE line follows much more slowly, much more controversially, and much more often gets reversed.

For a CIO planning a budget defence, the honest framing is: AI in IT operations changes your BPO contract first, your contractor spend second, and your headcount third. If your business case is built on the third one, the audited evidence does not support it. If your business case is built on the first one, it probably does.

The auditability and lock-in axis

There is one structural argument worth foregrounding because it is simultaneously the most honest pitch ServiceNow makes and the most honest concession ServiceNow makes.

The Bardoliwalla framing again: “the outcome is measurable inside the same platform. Did the ticket resolve? Did the workflow complete? Did the approval get the right sign-off? ServiceNow closes the loop in a way that a standalone LLM sitting on top of a SharePoint folder simply cannot.”

Read in one direction, this is the strongest argument in the category for picking platform-incumbent AI over best-of-breed AI overlays. Audit-trail integrity, closed-loop measurement, and end-to-end workflow accountability are real enterprise requirements, and they are genuinely easier to deliver inside a single workflow platform than across a federated AI stack. Forrester’s Charles Betz made the same argument from the outside: “ServiceNow is betting that AI makes control planes more important, not less, because poorly governed autonomy is a real enterprise risk.” That bet is also why ServiceNow has 8,600 ITSM customers and 40 percent of the market and Salesforce’s competing Agentforce IT has 200 customers six months in.

Read in the other direction, the same argument is the lock-in argument. The platform that owns the workflow data, the audit trail, and the AI agents on top of it is the platform that can extract the largest renewal price. ServiceNow’s Q1 FY26 RPO of 27.7 billion dollars and 25 percent year-over-year growth says this is already happening. The April 2026 three-tier Pro/Pro Plus restructure is the pricing expression of it. The enterprise AI procurement question that matters is not “does this AI work” (it does) but “what does this do to my five-year platform commitment, and at what point does the lock-in become the cost story it is supposed to be solving.”

Both readings are simultaneously true. CIOs treating either one as the whole story are misreading the deal.

What this implies for an IT operations strategy in 2026

A short prescription, grounded in what the evidence supports.

Invest where the wins are audited. Knowledge-article generation, ticket summarisation, L1 routing, status-query deflection, and the cross-cutting M365 Copilot productivity gains are real and measurable. Set the budget against the UK Gov 26-minutes-per-day floor, not against vendor 90-percent-deflection ceilings. Plan for 20 to 40 percent realistic L1 deflection at named-customer pilot grade, not the upper-bound vendor self-report.
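The gap between the planning range and the vendor ceiling is easier to see with numbers attached. This is a minimal sketch: the 20, 40, and 90 percent rates come from the text above, while the monthly ticket volume is a hypothetical placeholder that you should replace with your own service-desk data.

```python
def deflected_tickets(monthly_l1_tickets: int, deflection_rate: float) -> int:
    """Number of L1 tickets deflected per month at a given rate."""
    if not 0.0 <= deflection_rate <= 1.0:
        raise ValueError("deflection_rate must be between 0 and 1")
    return round(monthly_l1_tickets * deflection_rate)


MONTHLY_L1_TICKETS = 10_000  # HYPOTHETICAL volume, for illustration only

# Rates quoted in the article; the labels are illustrative.
scenarios = {
    "planning_low": 0.20,
    "planning_high": 0.40,
    "vendor_self_report": 0.90,
}

results = {
    name: deflected_tickets(MONTHLY_L1_TICKETS, rate)
    for name, rate in scenarios.items()
}

for name, count in results.items():
    print(f"{name:>20}: {count:,} tickets/month")
```

A business case sized on the vendor self-report assumes more than twice the deflection of the top of the planning range, which is the difference a budget would have to absorb if the ceiling does not generalise.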

Stay assist-mode at L2 and L3. Multi-step agent reliability at 35 percent is the structural ceiling on full autonomy in 2026. Pitch your governance, audit, and runbook approval workflows around human-in-the-loop. Revisit when the CRMArena-Pro-class benchmarks publish a multi-step success rate above 60 percent. They have not yet.

Underwrite the BPO line, not the FTE line. The cost saving that lands cleanly is on third-party support, contractor spend, and outsourced L1. The direct-FTE saving is small, controversial, and often reversed. If your CFO is asking you to cut 25 percent of your service-desk headcount in eighteen months, the honest answer is that no one in your peer group has done it cleanly and the ones who tried (Klarna’s headline case) walked it back inside a year.

Treat agentic AI as a five-year proof, not an eighteen-month rollout. Gartner’s prediction that more than 40 percent of agentic projects will be cancelled by end-2027 is the planning constraint. Pilot programmes with explicit kill criteria are appropriate. Multi-year platform commitments tied to agentic deliverables are not.

Negotiate the lock-in axis explicitly. ServiceNow Pro/Pro Plus, Microsoft Copilot for Service, and the Datadog AI-native pricing model all carry implicit five-year commitments that compound. Get the renewal terms, the price-protection windows, and the data-portability provisions on the table at year one, not at year three.

The honest read of the AI-in-IT-operations market in mid-2026 is that the products work, the productivity wins are real, the staff-reduction story is mostly a financial story dressed up in AI clothing, and the auditability gain is also the lock-in risk. Those four facts are simultaneously true. CIOs who hold all four in mind will make better five-year platform decisions than the ones who pick a single number from a single vendor pitch and run a budget against it.


Correction log

  1. 2 May 2026: Klarna walk-back primary-source upgrade. Added Siemiatkowski verbatim quotes via Bloomberg-cited-by-Fortune (9 May 2025) and the Uber-style freelance hiring detail via Entrepreneur. Closes the highest-priority evidence gap from the source dossier.
