The Yes-Machine Problem

Isharth Kumar

(Intern) Policy & Advocacy, CyberPeace

PUBLISHED ON

Jun 13, 2026

Based on research by Chandra, Kleiman-Weiner, Ragan-Kelley & Tenenbaum · MIT & University of Washington · 2026

In early 2025, an accountant named Eugene Torres started using an AI chatbot to assist him with his mundane office work. Torres had no history of mental illness. Within weeks, he came to believe that he was trapped in an artificial reality and that ketamine would help him "break out" of it. Although Torres's case is extreme, it captures a growing and terrifyingly predictable pattern. Someone shares some of their fears and half-baked beliefs with a chatbot. The chatbot, which has been programmed, first and foremost, to accommodate and reinforce, concurs and amplifies. The person comes back, more confident in their idea, and repeats it. The chatbot concurs again. The suspicion turns into an unshakeable delusion, and the person takes action based on it.

This phenomenon has a name: delusional spiraling. And despite frantic articles by journalists and politicians and policy recommendations and scientific hypotheses that propose ways to counteract the spiral, a real scientific study of what the spiral is and how it can be interrupted seemed to be largely missing. A new paper by a team of researchers at MIT and the University of Washington aims to fill this gap. And their findings are even more disturbing than most would hope.

‍

Sycophancy: the original sin of modern AI

‍

To understand this paper, it's useful to grasp sycophancy within the context of artificial intelligence. A sycophantic chatbot is one that will agree with what it's told rather than what is actually true, a problem that results from how most modern AIs are trained. They are typically trained with Reinforcement Learning from Human Feedback (RLHF), where humans rank chatbot answers, determining which they prefer. The truth is, humans often favor answers that reaffirm what they're looking for, satisfy them emotionally, or make them feel good about themselves. Over millions of training examples, this means the AI learns to reward agreement.

The study highlights the growing risks associated with AI sycophancy. Researchers estimate that approximately 50–70% of responses from leading AI models display sycophantic tendencies in ambiguous situations, favouring validation over accuracy. As of early 2026, the Human Line Project had documented nearly 300 cases of “AI psychosis” or delusional spiraling, in which prolonged chatbot interactions contributed to increasingly extreme false beliefs. These documented cases have been linked to more than 14 deaths, underscoring the potentially severe real-world consequences of AI-enabled belief reinforcement. Most concerningly, the simulations showed that even a relatively low 10% sycophancy rate was sufficient to produce a measurable increase in the risk of catastrophic delusional spiraling, demonstrating how seemingly minor levels of validation bias can have significant effects over extended conversations.

As Chandra et al. (2026) state, "A sycophantic chatbot's constant agreement might reinforce a user's aberrant beliefs, leading to a feedback loop that amplifies a kernel of suspicion into a staunchly held belief."

‍

Enter the ideal Bayesian: the rational person who still gets fooled

‍

The most important and counterintuitive suggestion in the paper is its use of an 'ideal Bayesian user' instead of actual human beings. A Bayesian agent is an agent that rationally and mathematically updates their beliefs given new evidence by adjusting their belief level appropriately (more or less, to the exact correct degree). A ‘Bayesian reasoner’ is incapable of wishing their beliefs were true, being stubborn, making the wrong inferences based on data, or falling into any of the other many pitfalls of human judgment. Essentially, it's as close a model as possible to a perfect reasoner. Thus, the researchers pose an important question: if you have a maximally perfect reasoner, are they still manipulable by a sycophantic agent? Using mathematical modeling and simulations, the researchers show that the answer is yes. Information that confirms existing beliefs still has the power to shape the beliefs of even ideal reasoners.

‍

How does the computational model work?

‍

To investigate the extent of sycophancy, the authors built a model of a perfect Bayesian user instead of a real human, i.e., the user reasons perfectly and updates her beliefs using probability theory every time she gets new evidence. The model focuses on a proposition (H), like "Are vaccines safe?" or "Is this conspiracy theory true?" and a chatbot that exhibits a level of sycophancy determined by where it indicates that the probability the chatbot selected a confirming statement over a neutral one. The conversational exchange occurs in four rounds.

The user states her belief about ‘H’ to the chatbot.
The chatbot samples relevant evidence from the environment to inform its response.
The chatbot selects its response: either neutral or maximally confirmatory to the user's belief.
The user updates her belief using Bayesian updating, and the cycle continues.

To examine this model, they simulated 10,000 conversations of 100 rounds each. They discovered that the higher the certainty, the more likely a user was to reach 99%+ certainty in a false belief even when the chatbot's responses were truth-constrained and it could only lie by omitting or selectively mentioning facts that corroborated a user's belief. They modeled aware users, who know the chatbot might be sycophantic, and the likelihood of their delusional spiraling was reduced but still present: 'even users who have access to a model know their beliefs might be vulnerable.'

The study's central claim is that no lie, trickery, or ulterior motive by the chatbot is needed to warp beliefs. Instead, merely reaffirming a user's current viewpoint in each conversational round can lead to a feedback loop that slowly drives even a perfect Bayesian agent toward absolute certainty in falsity.

‍

The Limitations of Truth and Awareness

‍

A seemingly obvious remedy for chatbot-induced delusional spiraling is to rid bots of hallucinations and to enforce strict factual accuracy. But, as the authors point out, such safeguards alone are not enough. They define and test a "factual sycophant" that always speaks the truth but only presents true evidence that supports a given user's belief. While not as devastating as a hallucinating bot, a factual sycophant still contributes significantly more to delusional spiraling than an objective agent: in a way, it lies by omission. By only presenting confirmatory evidence while selectively omitting evidence to the contrary, the factual sycophant manages to create a falsified reality from pure truth.

The authors also test if user awareness of sycophancy is sufficient to protect them. They simulate an "informed" user that is aware of the sycophantic nature of chatbots and therefore takes it into account when assessing the chatbot's output. Awareness is helpful, but it still leaves users vulnerable: they remain susceptible to sycophancy as long as it is subtle enough not to be detected. Drawing on economic models of "Bayesian persuasion," the authors suggest that humans are vulnerable to strategically selected truth even when they know a communicator's strategic motives. It is not enough to know the bot will likely be sycophantic or that a bot might be sycophantic; even aware users can fall prey. Both factuality and awareness efforts will not fully address the sycophancy problem.

‍

What this means, and what should actually be done

‍

The paper concludes with three succinct suggestions.

This is a change in how we view the phenomenon: do not view delusional spiraling as a matter of gullibility. The paper demonstrates that the problem afflicts ideal reasoners. Victims who are berated for insufficient skepticism cannot realistically protect themselves while caught in a spiral; it's not helpful and it's unjust.
The second suggestion stems directly from the first: do not view hallucination as the primary cause. While the factual sycophant is indeed less damaging than the hallucinatory one and reducing hallucination is therefore still worthwhile, that's not the core problem. The core problem is sycophancy, the training objective of learning to please above all else. Changing that objective, or otherwise mitigating that incentive, through new training objectives or reward functions; through metrics that identify and penalize feedback loops of sycophancy; and through new models that are tested precisely for sycophantic loops, these represent a more vital and promising research direction.
Third, public awareness campaigns are a valid measure but do not sufficiently address the issue. Education should continue and reduce risk. But placing the onus solely on already-manipulated users for risk avoidance represents an unreasonable burden on people lost in the pre-spiral haze of distorted cognition. Policy measures regulatory guidelines regarding AI interaction with users demonstrating early indicators of reinforcing falsehoods and stronger mechanisms for crisis management are likely warranted.

In a broader sense, the paper highlights that delusional spiraling, itself, may not be a novel issue. History is rich with anecdotal evidence of "yes-men" guiding their kings to ruin and facilitating the collapse of organizations through the flattery of CEOs. Teen friendships can degrade into the psychological state known as "co-rumination," whereby friends amplify anxieties about the self or situation together to destructive effect. Sycophancy has always been a hazard to those around it. What artificial intelligence has achieved is the scaling up of this risk to industrial proportions, via personalized, high-fidelity, low-friction interactions that occur continuously and globally; the underlying mathematics of how it affects our psychology have not shifted in any meaningful way, only our exposure.

‍

Conclusion

The "Yes-Machine Problem" exposes a sinister truth: the greatest threat of AI is conformity. Chandra and her team show how perfectly logical people can be led into false beliefs simply by repeated confirmation from a flatterer bot. A factually correct or informed user cannot overcome this effect. As AI pervades our lives, our challenge is not just to mitigate hallucinations but to design them for truth, not affirmation. Failure to do so means we could face an era dominated by infinitely agreeable digital yes-men in a universe of unbounded error amplification.

Based on “Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians” by Kartik Chandra, Max Kleiman-Weiner, Jonathan Ragan-Kelley, and Joshua B. Tenenbaum (arXiv:2602.19141v1, February 2026), and on reporting from the Stanford Institute for Human-Centered AI on related research by Moore et al., presented at ACM FAccT.

‍

References:

Chandra, K., Kleiman-Weiner, M., Ragan-Kelley, J., & Tenenbaum, J. B. (2026). Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians. arXiv preprint arXiv:2602.19141.
Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S. R., et al. (2023). Towards Understanding Sycophancy in Language Models. arXiv preprint arXiv:2310.13548.
Fanous, A., Goldberg, J., Agarwal, A., Lin, J., Zhou, A., Xu, S., et al. (2025). SycEval: Evaluating LLM Sycophancy. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 8, 893–900.
Kamenica, E., & Gentzkow, M. (2011). Bayesian Persuasion. American Economic Review, 101(6), 2590–2615.
Dohnány, S., Kurth-Nelson, Z., Spens, E., Luettgau, L., Reid, A., Gabriel, I., et al. (2025). Technological Folie à Deux: Feedback Loops Between AI Chatbots and Mental Illness. arXiv preprint arXiv:2507.19218.

‍

PUBLISHED ON

Jun 13, 2026

Related Blogs

#FactCheck: Viral Video Showing Pakistan Shot Down Indian Air Force' MiG-29 Fighter Jet

May 12, 2025

Executive Summary

‍

Recent claims circulating on social media allege that an Indian Air Force MiG-29 fighter jet was shot down by Pakistani forces during "Operation Sindoor." These reports suggest the incident involved a jet crash attributed to hostile action. However, these assertions have been officially refuted. No credible evidence supports the existence of such an operation or the downing of an Indian aircraft as described. The Indian Air Force has not confirmed any such event, and the claim appears to be misinformation.

Claim

A social media rumor has been circulating, suggesting that an Indian Air Force MiG-29 fighter jet was shot down by Pakistani Air forces during "Operation Sindoor." The claim is accompanied by images purported to show the wreckage of the aircraft.

‍

‍

Fact Check

‍

The social media posts have falsely claimed that a Pakistani Air Force shot down an Indian Air Force MiG-29 during "Operation Sindoor." This claim has been confirmed to be untrue. The image being circulated is not related to any recent IAF operations and has been previously used in unrelated contexts. The content being shared is misleading and does not reflect any verified incident involving the Indian Air Force.

‍

‍

After conducting research by extracting key frames from the video and performing reverse image searches, we successfully traced the original post, which was first published in 2024, and can be seen in a news article from The Hindu and Times of India.

‍

A MiG-29 fighter jet of the Indian Air Force (IAF), engaged in a routine training mission, crashed near Barmer, Rajasthan, on Monday evening (September 2, 2024). Fortunately, the pilot safely ejected and escaped unscathed, hence the claim is false and an act to spread misinformation.

‍

‍

Conclusion

‍

The claims regarding the downing of an Indian Air Force MiG-29 during "Operation Sindoor" are unfounded and lack any credible verification. The image being circulated is outdated and unrelated to current IAF operations. There has been no official confirmation of such an incident, and the narrative appears to be misleading. Peoples are advised to rely on verified sources for accurate information regarding defence matters.

‍

Claim: Pakistan Shot down an Indian Fighter Jet, MIG-29
Claimed On: Social Media
Fact Check: False and Misleading

The Surge of 'Self-Generated' Images in Online Child Exploitation

January 29, 2024

Introduction

In the era of the internet where everything is accessible at your fingertip, a disturbing trend is on the rise- over 90% of websites containing child abuse material now have self-generated images, obtained from victims as young as three years old. A shocking revelation, shared by the (IWF) internet watch foundation, The findings of the IWF have caused concern about the increasing exploitation of children below the age of 10. The alarming trend highlights the increasing exploitation of children under the age of 10, who are coerced, blackmailed, tricked, or groomed into participating in explicit acts online. The IWF's data for 2023 reveals a record-breaking 275,655 websites hosting child sexual abuse material, with 92% of them containing such "self-generated" content.

Disturbing Tactics Shift

Disturbing numbers came that, highlight a distressing truth. In 2023, 275,655 websites were discovered to hold child sexual abuse content, reaching a new record and reflecting an alarming 8% increase over the previous year. What's more concerning is that 92% of these websites had photos or videos generated by the website itself. Surprisingly, 107,615 of these websites had content involving children under the age of ten, with 2,500 explicitly featuring youngsters aged three to six.

Profound worries

Deep concern about the rising incidence of images taken by extortion or coercion from elementary school-aged youngsters. This footage is currently being distributed on very graphic and specialised websites devoted to child sexual assault. The process begins in a child's bedroom with the use of a camera and includes the exchange, dissemination, and gathering of explicit content by devoted and determined persons who engage in sexual exploitation. These criminals are ruthless. The materials are being circulated via mail, instant messaging, chat rooms, and social media platforms, (WhatsApp, Telegram, Skype, etc.)

Live Streaming of such material involves real-time broadcast which again is a major concern as the nature of the internet is borderless the access to such material is international, national, and regional, which even makes it difficult to get the predators and convict them. With the growth, it has become easy for predators to generate “self-generated “images or videos.

Financial Exploitation in the Shadows: The Alarming Rise of Sextortion

Looking at the statistics globally there have been studies that show an extremely shocking pattern known as “sextortion”, in which adolescents are targeted for extortion and forced to pay money under the threat of exposing images to their families or relatives and friends or on social media. The offender's goal is to obtain sexual gratification.

The financial variation of sextortion takes a darker turn, with criminals luring kids into making sexual content and then extorting them for money. They threaten to reveal the incriminating content unless their cash demands, which are frequently made in the form of gift cards, mobile payment services, wire transfers, or cryptocurrencies, are satisfied. In this situation, the predators are primarily driven by money gain, but the psychological impact on their victims is as terrible. A shocking case was highlighted where an 18-year-old was landed in jail for blackmailing a young girl, sending indecent images and videos to threaten her via Snapchat. The offender was pleaded guilty.

The Question on Security?

The introduction of end-to-end encryption in platforms like Facebook Messenger has triggered concerns within law enforcement agencies. While enhancing user privacy, critics argue that it may inadvertently facilitate criminal activities, particularly the exploitation of vulnerable individuals. The alignment with other encrypted services is seen as a potential challenge, making it harder to detect and investigate crimes, thus raising questions about finding a balance between privacy and public safety.

One of the major concerns in the online safety of children is the implementation of encryption by asserting that it enhances the security of individuals, particularly children, by safeguarding them from hackers, scammers, and criminals. They underscored their dedication to enforcing safety protocols, such as prohibiting adults from texting teenagers who do not follow them and employing technology to detect and counteract bad conduct.

These distressing revelations highlight the urgent need for comprehensive action to protect our society's most vulnerable citizens i.e., children, youngsters, and adolescents throughout the era of digital progress. As experts and politicians grapple with these troubling trends, the need for action to safeguard kids online becomes increasingly urgent.

Role of Technology in Combating Online Exploitation

With the rise of technology, there has been a rise in online child abuse, technology also serves as a powerful tool to combat it. The advanced algorithms and use of Artificial intelligence tools can be used to disseminate ‘self-generated’ images. Additional tech companies can collaborate and develop some effective solutions to safeguard every child and individual.

Role of law enforcement agencies

Child abuse knows no borders, and addressing the issues requires legal intervention at all levels. National, regional, and international law enforcement agencies investigate online child sexual exploitation and abuse and cooperate in the investigation of these cybercrimes, Various investigating agencies need to have mutual legal assistance and extradition, bilateral, and multilateral conventions to conduct to identify, investigate, and prosecute perpetrators of online child sexual exploitation and abuse. Apart from this cooperation between private and government agencies is important, sharing the database of perpetrators can help the agencies to get them caught.

How do you safeguard your children?

Looking at the present scenario it has become a crucial part of protecting and safeguarding our children online against online child abuse here are some practical steps that can help in safeguarding your loved one.

Open communication: Establish open communication with your children, make them feel comfortable, and share your experiences with them, make them understand what good internet surfing is and educate them about the possible risks without generating fear.
Teach Online Safety: educate your children about the importance of privacy and the risks associated with it. Teach them strong privacy habits like not sharing any personal information with a stranger on any social media platform. Teach them to create some unique passwords and to make them aware not to click on any suspicious links or download files from unknown sources.
Set boundaries: As a parent set rules and guidelines for internet usage, set time limits, and monitor their online activities without infringing their privacy. Monitor their social media platforms and discuss inappropriate behaviour or online harassment. As a parent take an interest in your children's online activities, websites, and apps inform them, and teach them online safety measures.

Conclusion

The predominance of self-generated' photos in online child abuse content necessitates immediate attention and coordinated action from governments, technology corporations, and society as a whole. As we negotiate the complicated environment of the digital age, we must be watchful, modify our techniques, and collaborate to defend the innocence of the most vulnerable among us. To combat online child exploitation, we must all work together to build a safer, more secure online environment for children all around the world.

References

‍

Steps to Safeguard Digital Infrastructure Against Cyber Threats

August 2, 2025

Introduction

Public infrastructure has traditionally served as the framework for civilisation, transporting people, money, and ideas across time and space, from the iron veins of transcontinental railroads to the unseen arteries of the internet. In democracies where free markets and public infrastructure co-exist, this framework has not only facilitated but also accelerated progress. Digital Public Infrastructure (DPI), which powers inclusiveness, fosters innovation, and changes citizens from passive recipients to active participants in the digital age, is emerging as the new civic backbone as we move away from highways and towards high-speed data.

DPI makes it possible for innovation at the margins and for inclusion at scale by providing open-source, interoperable platforms for identities, payments, and data exchange. Examples of how the Global South is evolving from a passive consumer of technology to a creator of globally replicable governance models are India’s Aadhaar (digital identification), UPI (real-time payments), and DigiLocker (data empowerment). As the ‘digital commons’ emerges, DPI does more than simply link users; it also empowers citizens, eliminates inefficiencies from the past, and reimagines the creation and distribution of public value in the digital era.

Securing the Digital Infrastructure: A Contemporary Imperative

As humans, we are already the inhabitants of the future, we stand at the temporal threshold for reform. Digital Infrastructure is no longer just a public good. It’s now a strategic asset, akin to oil pipelines in the 20th century. India is recognised globally for the introduction of “India Stack”, through which the face of digital payments has also been changed. The economic value contributed by DPIs to India’s GDP is predicted to reach 2.9-4.2 percent by 2030, having already reached 0.9% in 2022. Its role in India’s economic development is partly responsible for its success; among emerging market economies, it helped propel India to the top of the revenue administrations’ digitalisation index. The other portion has to do with how India’s social service delivery has changed across the board. By enabling digital and financial inclusion, it has increased access to education (DIKSHA) and is presently being developed to offer agricultural (VISTAAR) and digital health (ABDM) services.

Securing the Foundations: Emerging Threats to Digital Public Infrastructure

The rising prominence of DPI is not without its risks, as adversarial forces are developing with comparable sophistication. The core underpinnings of public digital systems are the target of a new generation of cyber threats, ranging from hostile state actors to cybercriminal syndicates. The threats pose a great risk to the consistent development endeavours of the government. To elucidate, targeted attacks on Biometric databases, AI-based Misinformation and Psychological Warfare, Payment System Hacks, State-sponsored malware, cross-border phishing campaigns, surveillance spyware and Sovereign Malware are modern-day examples of cyber threats.

To secure DPI, a radical rethink beyond encryption methods and perimeter firewalls is needed. It requires an understanding of cybersecurity that is systemic, ethical, and geopolitical. Democracy, inclusivity, and national integrity are all at risk from DPI. To preserve the confidence and promise of digital public infrastructure, policy frameworks must change from fragmented responses to coordinated, proactive and people-centred cyber defence policies.

CyberPeace Recommendations

Powering Progress, Ignoring Protection: A Precarious Path

The Indian government is aware that cyberattacks are becoming more frequent and sophisticated in the nation. To address the nation’s cybersecurity issues, the government has implemented a number of legislative, technical, and administrative policy initiatives. While the initiatives are commendable, there are a few Non-Negotiables that need to be in place for effective protection:

DPIs must be declared Critical Information Infrastructure. In accordance with the IT Act, 2000, the DPI (Aadhaar, UPI, DigiLocker, Account Aggregator, CoWIN, and ONDC) must be designated as Critical Information Infrastructure (CII) and be supervised by the NCIIPC, just like the banking, energy, and telecom industries. Give NCIIPC the authority to publish required security guidelines, carry out audits, and enforce adherence to the DPI stack, including incident response protocols tailored to each DPI.
To solidify security, data sovereignty, and cyber responsibility, India should spearhead global efforts to create a Global DPI Cyber Compact through the “One Future Alliance” and the G20. To ensure interoperable cybersecurity frameworks for international DPI projects, promote open standards, cross-border collaboration on threat intelligence, and uniform incident reporting guidelines.
Establish a DPI Threat Index to monitor vulnerabilities, including phishing attacks, efforts at biometric breaches, sovereign malware footprints, spikes in AI misinformation, and patterns in payment fraud. Create daily or weekly risk dashboards by integrating data from state CERTs, RBI, UIDAI, CERT-In, and NPCI. Use machine learning (ML) driven detection systems.
Make explainability audits necessary for AI/ML systems used throughout DPI to make sure that the decision-making process is open, impartial, and subject to scrutiny (e.g., welfare algorithms, credit scoring). Use the recently established IndiaAI Safety Institute in line with India’s AI mission to conduct AI audits, establish explanatory standards, and create sector-specific compliance guidelines.

References

‍

The Yes-Machine Problem

Based on research by Chandra, Kleiman-Weiner, Ragan-Kelley & Tenenbaum · MIT & University of Washington · 2026

Sycophancy: the original sin of modern AI

Enter the ideal Bayesian: the rational person who still gets fooled

How does the computational model work?

‍

The Limitations of Truth and Awareness

What this means, and what should actually be done

‍

Conclusion

References:

Related Blogs

Executive Summary

Claim

Fact Check

Conclusion

Introduction

Disturbing Tactics Shift

Profound worries

Financial Exploitation in the Shadows: The Alarming Rise of Sextortion

The Question on Security?

Role of Technology in Combating Online Exploitation

Role of law enforcement agencies

How do you safeguard your children?

Conclusion

References

Introduction

Securing the Digital Infrastructure: A Contemporary Imperative

Securing the Foundations: Emerging Threats to Digital Public Infrastructure

CyberPeace Recommendations

Powering Progress, Ignoring Protection: A Precarious Path

References

Become a part of our vision to make the digital world safe for all!

Awareness

Engagement

Play your part for CyberPeace