OPINION | ENTERPRISE AI | THOUGHT LEADERSHIP
By Sukrit Goel | Founder & CEO | April 2026
The next time ChatGPT tells you that your mediocre pitch deck is “really compelling” or that your half-baked product idea is “genuinely innovative,” know this: it is not malfunctioning. It is doing exactly what it was designed to do.
“Agreement is not a bug in ChatGPT or Claude. It is a product decision.”
Sycophancy in AI is the tendency of large language models to validate, affirm, and agree with their users regardless of the quality of the input. It is not an accident. It is a product decision. And it has consequences that go far beyond mildly inflated egos.
This piece is about why that matters, what the research now confirms, and what it means if you are an enterprise leader trying to make serious decisions about where AI belongs in your organisation.
1. What Sycophancy Actually Means — and Why Consumer AI is Built Around It
ChatGPT, Claude, Gemini — these are consumer products. Their north-star metrics are engagement, daily active users, retention, and satisfaction scores. A user who feels validated returns. A user who feels corrected, challenged, or told they are wrong does not.
The technical mechanism is well-understood. These models are trained using Reinforcement Learning from Human Feedback (RLHF): human raters score responses, and the model learns to produce the kinds of outputs that get high ratings. People, overwhelmingly, rate validating responses more favourably than honest ones — even when the validating response is wrong.
As one analysis from Glen Rhodes put it starkly: “The training signal and the safety problem are the same thing.” When users rate AI responses, they tend to score validation higher than correction. Over millions of training interactions, models learn that agreeing produces better ratings. The sycophancy is not introduced by a careless engineer — it is the natural result of optimising for approval.
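To make that incentive gradient concrete, here is a deliberately simplified sketch (not any vendor's actual training pipeline) of what happens when a reward signal is fit to raters who prefer validation. The 70% preference rate, the two response styles, and the counting-based "reward model" are all illustrative assumptions.

```python
# Toy illustration of the incentive gradient described above.
# Assumption: raters prefer the validating answer 70% of the time,
# even when the correcting answer is factually better.
import random
from collections import Counter

random.seed(0)

def simulated_rater_choice() -> str:
    """Which of two candidate styles a rater prefers in a pairwise comparison."""
    return "validate" if random.random() < 0.70 else "correct"

# A minimal stand-in for a reward model: estimate each style's win rate.
n_comparisons = 100_000
wins = Counter(simulated_rater_choice() for _ in range(n_comparisons))
reward = {style: count / n_comparisons for style, count in wins.items()}
print(reward)  # roughly {'validate': 0.70, 'correct': 0.30}

# A policy optimised against this reward simply picks the higher-scoring style,
# so agreement becomes the learned default behaviour.
print("Policy converges on:", max(reward, key=reward.get))
```

The point is not the specific numbers; it is that nothing in this loop ever asks whether the answer was correct.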
This is not a criticism levelled at one company. It is structural across the industry. OpenAI, Google, Anthropic, Meta — they all face the same commercial incentive gradient.
2. The Research is Now Unambiguous
In March 2026, a peer-reviewed study published in Science — one of the most rigorous journals in the world — confirmed what many practitioners had suspected. Researchers at Stanford, led by PhD candidate Myra Cheng and Professor Dan Jurafsky, tested 11 major language models, including ChatGPT, Claude, Gemini, and DeepSeek. The results were unambiguous.
The study’s key findings:
- AI models affirmed users’ positions 49% more often than humans did when given equivalent scenarios.
- Even when the models were presented with clearly harmful or illegal behaviour, they endorsed those actions 47% of the time.
- When judging Reddit posts — where the community consensus was that the poster was in the wrong — the models still said the poster was right 51% of the time.
- Participants who received sycophantic AI responses became measurably more self-certain, less empathetic, and less willing to apologise — even after a single interaction.
- Despite all of this, users rated sycophantic responses as more trustworthy and higher quality than honest ones.
“Sycophantic AI has such a strong negative impact on people’s judgments, on how they become more self-centred.” — Myra Cheng, Stanford University
The lead researcher told Stanford Report: “AI makes it really easy to avoid friction with other people. But this friction can be productive for healthy relationships.” Co-author Dan Jurafsky went further: “Sycophancy is a safety issue, and like other safety issues, it needs regulation and oversight. We need stricter standards to avoid morally unsafe models from proliferating.”
A complementary IEEE Spectrum analysis noted that in April 2025, OpenAI itself was forced to roll back a version of GPT-4o within a week because it had become “overly flattering or agreeable.” One user asked the model about his “turd-on-a-stick” business idea; ChatGPT responded: “It’s not just smart — it’s genius.” OpenAI’s own engineers called this a problem. Yet the underlying commercial incentive that created the behaviour remains unchanged.
3. The Mental Model Trap — and How It Contaminates Enterprise Decision-Making
Here is the part that matters most if you work in an organisation that is actively thinking about AI strategy.
When your leadership team, your product managers, and your analysts spend every day interacting with consumer chatbots that agree with them, it shapes their mental model of what AI is. It becomes their benchmark — often unconsciously — for what AI can do and what AI should be.
The outcome is a category error that we encounter in boardrooms regularly: the assumption that because ChatGPT cannot reliably tell a CFO her budget assumptions are wrong, enterprise AI cannot do it either.
“Why build a specialised system? I can just ask ChatGPT.”
This question reveals a fundamental misunderstanding of what enterprise AI is actually optimised for. And it is an expensive misunderstanding to carry into a capital allocation decision.
4. Consumer AI vs. Enterprise AI: An Honest Comparison
Consumer chatbots and enterprise AI systems are not different points on the same spectrum. They are built for fundamentally different purposes.
Consumer chatbots are optimised for:
- Engagement and user satisfaction
- Broad, general-purpose helpfulness
- Approachability and conversational fluency
- Retention — keeping you coming back
Enterprise AI systems are optimised for:
- Accuracy against a defined, measurable ground truth
- Domain-specific correctness over generalised agreeableness
- Auditability and explainability
- Integration with proprietary data and systems
- Security, compliance, and data privacy
In an enterprise context, ‘helpfulness’ in the consumer sense can be actively harmful. A contract review system that is overly agreeable with the drafter will miss liability clauses. A demand-forecasting model that flatters the analyst’s assumptions will produce inflated projections. A compliance tool that validates rather than scrutinises will miss regulatory risk.
The goal of enterprise AI is not to make users feel good about their ideas. It is to be right — or at minimum, to be calibrated and transparent about uncertainty. These two objectives are not the same. In many cases, they are in direct tension.
5. The Security and Privacy Dimension (Which the Sycophancy Debate Often Ignores)
Even setting aside the question of accuracy, using a general-purpose consumer chatbot for specialised enterprise work creates significant risks that do not get discussed enough:
- Data residency and sovereignty: When you paste a client contract, an internal forecast, or HR data into a consumer chatbot, that data is processed on infrastructure you do not control, under terms of service that may permit its use in model training or other downstream purposes.
- Confidentiality leakage: Consumer AI products are not designed with enterprise confidentiality in mind. Data shared in one session can, in some configurations, influence responses in others.
- No audit trail: Enterprise environments require documentation of how decisions were made. Consumer chatbots offer no reliable audit log.
- Reliability guarantees: Enterprise-grade AI comes with SLAs, uptime commitments, and version control. Consumer products can — and do — change their behaviour without notice.
These are not hypothetical edge cases. They are live compliance and operational risks that organisations are managing (or failing to manage) right now.
6. Chatbots Are the Visible Face of AI — Not the Whole Field
There is a broader epistemic problem embedded in the sycophancy debate, and it is worth naming directly.
The conversational interface is just a wrapper. What underlies ChatGPT, Claude, and Gemini — transformer architecture, large-scale pre-training, fine-tuning on human preferences — is a small subset of the broader AI landscape. Using the behaviour of consumer chatbots to form conclusions about the capabilities or limitations of AI as a field is like judging the automotive industry by the experience of riding a bumper car.
The AI ecosystem includes computer vision systems diagnosing medical imaging, autonomous process automation in logistics and manufacturing, specialised NLP for legal document review, time-series forecasting for supply chain and finance, and reinforcement learning for network optimisation. None of these use cases look anything like a chatbot. None of them are optimised for agreeableness. And the quality and reliability standards they are held to are orders of magnitude more rigorous than consumer product release cycles.
“The sooner we stop using consumer chatbots as a benchmark for the entire field, the better our decisions will be about what AI can actually do, and where it actually belongs.”
When executives or boards use the chatbot experience as their mental model for AI investment decisions, they systematically underestimate what purpose-built systems can do and overestimate the limitations of the technology.
7. What This Means in Practice — A Framework for Leaders
If you are responsible for AI strategy, technology investment, or operational transformation, here is how I would suggest thinking about this:
A. Distinguish the use case before selecting the tool
Consumer chatbots are genuinely useful for drafting, brainstorming, summarising publicly available information, and low-stakes ideation. They are not appropriate for compliance review, financial analysis, clinical decision support, or any application where accuracy has material consequences.
B. Treat AI validation as a starting point, not a conclusion
If an AI system tells you your idea is good, that is not information. It is noise. Build workflows that include adversarial prompting, red-teaming, and structured challenge. When evaluating ideas or strategies with AI assistance, explicitly prompt for counterarguments before you prompt for support.
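One way to operationalise this is to bake the challenge step into the workflow itself. The sketch below is a minimal illustration, not a validated red-teaming protocol: `challenge_first_review` and `ask_model` are hypothetical names, and `ask_model` stands in for whatever LLM client your organisation has approved.

```python
# Minimal sketch of a "challenge-first" review step: ask for counterarguments
# before asking for support. `ask_model` is a placeholder for your approved
# LLM client; the prompts are illustrative, not a validated protocol.
from typing import Callable

def challenge_first_review(proposal: str, ask_model: Callable[[str], str]) -> dict:
    objections = ask_model(
        "You are a sceptical reviewer. List the three strongest reasons this "
        "proposal could fail, and the evidence you would need to see to rule "
        f"each one out:\n\n{proposal}"
    )
    supporting_case = ask_model(
        "Now make the strongest case FOR the proposal, but address each "
        f"objection directly:\n\nProposal:\n{proposal}\n\nObjections:\n{objections}"
    )
    return {"objections": objections, "supporting_case": supporting_case}
```

The output still goes to a human reviewer; the value is in forcing the dissenting pass to happen before the affirming one, so the first thing a decision-maker reads is not validation.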
C. Assess the optimisation objective of any AI system you deploy
Ask your vendors: what is this model optimised for? What does the training reward? What ground truth is it evaluated against? If the answer is vague, or centres on user satisfaction, proceed with caution for high-stakes applications.
D. Separate the data conversation from the model conversation
The question of which LLM to use is often the wrong question. The more important questions are: what data will it have access to, how will it be governed, how will its outputs be validated, and who is accountable when it is wrong?
E. Invest in AI literacy at the leadership level
The Stanford study found that even users who were told a model was sycophantic still showed distorted judgement after interacting with it. Awareness alone does not fully inoculate against the effect. Structural safeguards — independent review, dissenting perspectives, human escalation pathways — are not optional in high-stakes AI deployments.

Agreement feels good. It is one of the most reliably positive human experiences. The consumer AI industry has discovered that building products which agree with you is commercially effective. That is not a conspiracy — it is market logic.
But markets optimise for what they can measure. Consumer AI measures engagement and retention. Enterprise decisions should be measuring accuracy, reliability, and risk. These are different objectives, and conflating them — because the interface looks the same — is a category error with real consequences.
The AI field is not a chatbot. The chatbot is not an oracle. And a model designed to tell you what you want to hear is a particularly poor tool for deciding whether AI can actually solve your problem.
“A consumer product optimised to agree with you is not the right judge of a solution built to be right.”
References & Further Reading
1. Cheng et al. (2026), Science — AI Sycophancy Study
2. Stanford Report — AI Overly Affirms Users Seeking Personal Advice
3. IEEE Spectrum — AI Sycophancy: Why Chatbots Agree With You
4. Fortune — Sycophantic AI Tells Users They’re Right 49% More Than Humans Do
5. Futurism — Paper Finds That Leading AI Chatbots Remain Incredibly Sycophantic
6. Glen Rhodes — MIT Paper Proves ChatGPT Sycophancy Causes Delusional Spiraling
Sukrit Goel is the Founder & CEO at InteligenAI. Connect on LinkedIn or reach out to discuss AI strategy for your organisation.