AI Red Teaming for M365 Copilot & Bing Chat Applications

These applications fall into two distinct layers — each requiring different red teaming, and both are required.

Layer 1: Base Model Red Teaming (GPT-4 / Underlying LLM)

For both Bing Chat and M365 Copilot, Microsoft’s AI Red Team red teams the underlying GPT-4 or Phi model independently before it is integrated into any product. This layer focuses on:

Intrinsic model safety and alignment
Harmful content capability scoping (what the model can produce)
Jailbreak susceptibility at the raw model level
Multi-turn manipulation (Crescendo, Skeleton Key, CCA techniques)

This is done once at the model level and feeds safety improvements back into the model training and fine-tuning pipeline.

Layer 2: Application-Level Red Teaming

This is where M365 Copilot and Bing Chat diverge significantly from a generic chatbot — and where the most critical, unique red teaming is required.

For Bing Chat (now Microsoft Copilot web)

Bing Chat’s red teaming scope encompasses the entire search-grounded experience, not just the chat interface. Key focus areas:

Attack Vector	What Is Tested
Indirect prompt injection via web content	Malicious web pages injecting instructions into search-grounded responses
Grounded hallucination / misinformation	AI citing poisoned or adversarial web sources as authoritative
Jailbreaks through search context	Using retrieved documents to smuggle unsafe content
Persona manipulation	Attempts to get Copilot to adopt harmful personas via multi-turn dialogue
RAI (Responsible AI) harms	Bias, hate speech, self-harm content triggered via search topics

Before Bing Chat launched, Microsoft ran hundreds of hours of adversarial probing by dedicated RAI experts — on top of the base GPT-4 red teaming — specifically targeting search-grounded failure modes.

For M365 Copilot (Teams, Word, Outlook, SharePoint)

M365 Copilot is significantly more complex and requires Tier 2–3 red teaming (Standard → Deep Agentic) because it has read/write access to sensitive enterprise data and can take actions — not just answer questions.

Unique risk surface of M365 Copilot:

Risk Category	Description	Red Teaming Focus
Data oversharing / over-retrieval	Copilot accesses everything a user can access via Microsoft Graph — surfacing files users forgot existed	Test cross-document data leakage; verify permission boundary enforcement
Indirect prompt injection (EchoLeak / CVE-2025-32711)	Zero-click exploit: malicious content hidden in a SharePoint doc or email injects instructions into Copilot's context	Inject adversarial content into SharePoint, OneDrive, Teams messages, emails; verify Copilot doesn't execute injected instructions
Sensitive data exfiltration	Attackers embed hidden instructions in documents to have Copilot summarize and transmit sensitive data to an attacker-controlled endpoint	Test document-grounded exfiltration chains
Cross-user data bleed	Copilot responses leaking data between users in the same tenant	Verify tenant isolation and session boundaries
Privilege escalation via Copilot	Using Copilot to discover misconfigured SharePoint permissions or excessive access	Red team Copilot as a "living off the land" discovery tool for an attacker with initial access

For Copilot Studio Agents (Custom M365-Embedded Agents)

Copilot Studio agents require Tier 3: Deep Agentic Red Teaming — the highest non-regulated tier — because they can act with real write permissions on enterprise systems.

Documented real-world attacks (Tenable, Datadog — 2025):

A Copilot Studio travel-booking agent was coerced via prompt injection to reveal payment card records and set booking prices to $0 by abusing its update action
CoPhish attack: Exploiting Copilot Studio demo pages and OAuth login flows to harvest OAuth tokens and achieve tenant compromise

Required red teaming for Copilot Studio agents:

Test Category	What to Test
Agent misconfiguration	Agents shared org-wide without authentication, overprivileged connectors, agents persisting after owner departs
Prompt injection via connector data	Inject adversarial instructions into SharePoint, Dataverse, Exchange, or external API responses that the agent reads
Unauthorized action execution	Can the agent be tricked into write, delete, or update operations beyond its intended scope?
OAuth/token abuse	Test whether agent OAuth grants can be harvested or misused via CoPhish-style flows
Shadow AI agent discovery	Identify agents built by business teams outside security review that have excessive Graph permissions
Human-in-the-loop bypass	Verify that high-risk actions require confirmation and cannot be bypassed via conversation manipulation

Tooling recommended for Copilot Studio red teaming:

Azure AI Foundry AI Red Teaming Agent — directly integrates with Copilot Studio for automated safety scans
PyRIT — for multi-turn attack strategy execution (Crescendo, TAP)
Microsoft Defender Advanced Hunting — for detecting agent misconfigurations post-deployment via Community Hunting Queries
Manual adversarial testing — always required for agentic write-action scenarios; automation alone is insufficient

Summary: Recommended Tiers by Application Type

Application	Red Teaming Tier	Key Unique Risks
Bing Chat / Copilot web	Tier 2 (Standard)	Indirect injection via web content, search-grounded misinformation
M365 Copilot (read-only)	Tier 2 (Standard)	Data oversharing, EchoLeak-style injection via documents/email
M365 Copilot (with plugins/connectors)	Tier 2–3	Exfiltration chains, cross-tenant data bleed, OAuth misuse
Copilot Studio agents (read/write)	Tier 3 (Deep Agentic)	Action abuse, connector prompt injection, shadow agents, CoPhish
Copilot Studio agents (regulated data — healthcare, finance)	Tier 4	Full adversarial + regulatory conformity

The core principle Microsoft applies: the more real-world action a Copilot application can take and the more sensitive data it can access, the deeper the red teaming must go — from passive content safety testing all the way to full agentic adversarial simulation.
⁂

References

¶ Discussion

Comments are powered by Giscus / GitHub Discussions. They appear here once configured — see Configure Giscus in the project README and update GISCUS in src/consts.ts.