What happened when an AI Agent hacked another AI chatbot?

Autonomous AI agents operate at machine speed. When deployed for offensive security testing, they discover and exploit legacy software flaws significantly faster than human operators. A recent breach of a major enterprise AI platform demonstrates that traditional security perimeters are no longer sufficient to protect modern AI architectures.

In March 2026, the security testing startup CodeWall directed an autonomous AI agent at McKinsey & Company’s internal generative AI platform, Lilli. Within two hours, the agent achieved full read and write access to the platform’s production database.

The agent did not utilize a novel, AI-specific exploit. Instead, it mapped the public API documentation, identified 22 unauthenticated endpoints, and executed a classic SQL injection. While the system properly parameterized the input values of user queries, it concatenated JSON keys directly into the SQL statements. Traditional security scanners and web application firewalls often miss JSON key reflection. The autonomous agent recognized the vulnerability from database error messages and ran blind iterations to extract the data structure.

The exposure was significant. The agent accessed 46.5 million internal chat messages, 728,000 files, and 57,000 user accounts. Most critically, the agent gained write access to the database that stored the system prompts for the AI platform.

This incident is not an indictment of McKinsey, which operates a dedicated AI division and maintains strict security protocols. Instead, it highlights a systemic vulnerability. Most enterprise AI stacks share this exact architecture, meaning many organizations are currently exposed to the same fundamental risks.

5 Essential Security Controls for Enterprise AI

If your organization deploys AI systems in production, standard application security is required but incomplete. You must implement these five specific controls.

1. Treat system prompts as production code

Engineering teams frequently store AI system prompts in standard configuration tables with no access controls, version history, or integrity monitoring. Prompts must be treated as critical infrastructure. You need strict version control, write access control lists (ACLs), and continuous change auditing. Any unauthorized modification to a prompt must trigger an immediate security alert and automatically revert to a verified baseline.

2. Protect your RAG pipeline from adversarial input

Retrieval-Augmented Generation (RAG) systems ingest external data to provide accurate context to the language model. This data typically includes emails, PDFs, and uploaded documents. If you do not sanitize this incoming data, you expose the model to prompt injection attacks hidden within the text. For example, a document containing hidden instructions to ignore guardrails will be processed by the embedding model. You must isolate and sanitize all untrusted content before it reaches your vector database.

3. Enforce row-level access controls at retrieval

Many RAG implementations chunk and store enterprise documents without maintaining the original document-level permissions. If the retrieval system does not verify the user’s identity before querying the database, the AI can surface restricted information to unauthorized employees.

Data separation is a strict functional requirement. For instance, in hospital management software like our product, Inteliya, an administrative employee querying the AI for schedule optimization must never be able to retrieve a patient’s confidential medical records simply because the RAG system ingested the entire database. Access control policies must be enforced at the exact moment of retrieval.

4. Remove public access to AI API documentation

The CodeWall agent initiated its attack because the target platform had over 200 publicly documented API endpoints. You must map your AI application’s attack surface exactly as an adversary would. Inventory every endpoint and remove public access to your API documentation. Ensure that every path touching a database, a vector store, or an AI model requires strict authentication protocols.

5. Implement continuous AI-specific testing

Annual penetration tests using standard vulnerability scanners like OWASP ZAP are inadequate for AI platforms. AI-native offensive tools chain vulnerabilities autonomously. They can escalate from an exposed API to a SQL injection, and finally to a prompt layer compromise, in a matter of hours. Security testing must be continuous and must specifically evaluate AI vectors, including prompt manipulation and RAG data exfiltration.

Enforcing deterministic security in enterprises

The deployment of enterprise AI is accelerating rapidly. As organizations integrate AI into their operational workflows, the regulatory requirements are becoming stricter. The Digital Personal Data Protection (DPDP) Act mandates precise controls over how personal data is processed, accessed, and stored.

Organizations cannot rely on probabilistic models to enforce security boundaries. Prompting an LLM to “keep data confidential” is a suggestion, not a control. Security restrictions—defining which files an AI can read and which APIs it can call—must be hardcoded deterministically into the application logic.

Conclusion

The recent two-hour compromise of an enterprise AI platform proves that known legacy vulnerabilities have severe new consequences when connected to generative AI. AI systems consolidate access to vast amounts of proprietary data and internal services. Securing these systems requires moving beyond standard perimeter defenses and implementing strict, continuous access controls at the prompt, retrieval, and API levels. 

Do not leave your AI infrastructure exposed to basic vulnerabilities. Contact our team to audit and secure your enterprise AI deployments today.

Please note that CodeWall is a security startup that reported this breach. It sells offensive AI security tools. Consequently, this report also functions as a demonstration of their product capabilities. While the SQL injection vulnerability and the attack architecture are highly credible and highlight a critical industry-wide flaw, the full scope and impact of this specific data breach have not yet been independently verified.

Leave a Comment

Your email address will not be published. Required fields are marked *