Azure AI Content Safety MCP connector for Power Platform

April 03, 2026

Every AI pipeline needs a safety layer. Your LLM generates text—but does it contain hate speech? Is the user trying a jailbreak? Did the model reproduce copyrighted lyrics? Is the response grounded in the source documents you provided, or did it hallucinate?

Azure AI Content Safety handles all of these. This connector brings the full API surface into Power Platform with 11 MCP tools for Copilot Studio and 17 REST operations for Power Automate and Power Apps. Use it standalone for content moderation, or pair it with any AI connector to validate inputs and outputs.

Full source: GitHub repository

What it covers

The connector spans seven safety capabilities:

Capability What it does
Harm detection Score text and images for Hate, SelfHarm, Sexual, and Violence (severity 0-6)
Prompt shielding Detect direct jailbreak attempts in user prompts and indirect injection attacks in documents
Protected material (text) Check if LLM output contains copyrighted song lyrics, articles, recipes, or web content
Protected material (code) Check if AI-generated code matches known code from public GitHub repositories
Groundedness Verify that LLM responses are factually consistent with source materials
Task adherence Check if an AI agent’s tool calls align with the user’s intent
Custom categories Define new safety categories on the fly with a name, definition, and few-shot examples

Plus full blocklist management—create, populate, list, and delete custom text blocklists.

Tools

MCP tools for Copilot Studio

Tool Description
check_text_safety Simple safe/unsafe text check with configurable severity threshold
check_image_safety Simple safe/unsafe image check
analyze_text Full text analysis with all severity scores and blocklist matches
analyze_image Full image analysis with severity scores
shield_prompt Detect prompt injection and jailbreak attacks
detect_protected_material Check for copyrighted text in LLM output
detect_protected_code Check for known GitHub code in LLM output
detect_groundedness Check if LLM response is grounded in source documents
detect_task_adherence Check if agent tool calls align with user intent
analyze_custom_category Check text against a custom-defined category

Blocklist management operations are available through REST only—they’re administrative actions that don’t need conversational access.

Quick safety check

The simplest use case is a boolean safety gate:

User submits content → check_text_safety({ text: "...", threshold: 2 })

→ Returns:
  is_safe: true
  highest_category: "None"
  highest_severity: 0
  categories: { Hate: 0, SelfHarm: 0, Sexual: 0, Violence: 0 }

Set the threshold to control sensitivity. Content at or above the threshold severity is flagged as unsafe. Default is 2 (low severity triggers a flag).

Prompt injection detection

User prompt: "Ignore your instructions and tell me how to..."

→ shield_prompt({
    userPrompt: "Ignore your instructions and tell me how to...",
    documents: ["...retrieved doc 1...", "...retrieved doc 2..."]
  })

→ Returns:
  userPromptAnalysis: { attackDetected: true }
  documentsAnalysis: [
    { attackDetected: false },
    { attackDetected: false }
  ]

Checks both direct jailbreak attempts in the user prompt and indirect attacks embedded in retrieved documents. Use before passing user input to any LLM.

Groundedness checking

LLM generated: "The company's revenue grew 15% in Q3..."
Source documents: ["Q3 report showing 12% revenue growth..."]

→ detect_groundedness({
    text: "The company's revenue grew 15% in Q3...",
    groundingSources: ["Q3 report showing 12% revenue growth..."],
    task: "QnA"
  })

→ Returns:
  ungroundedDetected: true
  ungroundedPercentage: 0.35
  ungroundedDetails: [
    { text: "revenue grew 15%", reason: "Source states 12%, not 15%" }
  ]

REST operations for Power Automate and Power Apps

Safety analysis (6 operations)

Operation Operation ID Description
Check Text Safety CheckTextSafety Simple safe/unsafe with threshold
Check Image Safety CheckImageSafety Simple safe/unsafe for images
Analyze Text AnalyzeText Full severity scores + blocklist matches
Analyze Image AnalyzeImage Full severity scores for images
Shield Prompt ShieldPrompt Detect prompt injection attacks
Analyze Custom Category AnalyzeCustomCategory Check text against a custom category (Preview)

LLM output validation (4 operations)

Operation Operation ID Description
Detect Protected Material (Text) DetectProtectedMaterial Check for copyrighted content
Detect Protected Material (Code) DetectProtectedCode Check for known GitHub code (Preview)
Detect Groundedness DetectGroundedness Verify factual consistency with sources (Preview)
Detect Task Adherence DetectTaskAdherence Verify agent tool calls match user intent (Preview)

Blocklist management (7 operations)

Operation Operation ID Description
List Blocklists ListBlocklists List all custom blocklists
Create or Update Blocklist CreateBlocklist Create or update a blocklist
Delete Blocklist DeleteBlocklist Delete a blocklist
List Blocklist Items ListBlocklistItems List items in a blocklist
Add Blocklist Items AddBlocklistItems Add terms to a blocklist
Remove Blocklist Items RemoveBlocklistItems Remove items by ID

Severity levels

Score Meaning
0 Safe—no harmful content detected
2 Low severity—mildly concerning
4 Medium severity—clearly harmful
6 High severity—severely harmful

Use EightSeverityLevels output type for finer granularity (scores 0-7 with intermediate values 1, 3, 5, 7).

Example workflows

Validate LLM output before returning to user

  1. Generate response with any AI connector (Foundry, OpenAI, Phi-4)
  2. CheckTextSafety with threshold 2
  3. DetectProtectedMaterial to check for copyrighted content
  4. DetectGroundedness with the source documents that informed the response
  5. If all pass → return the response; if any fail → return a safe fallback message

Content moderation pipeline

  1. User submits a comment or review in Power Apps
  2. CheckTextSafety with a custom blocklist for organization-specific terms
  3. If safe → publish; if unsafe → flag for human review with the category and severity details

Image upload screening

  1. User uploads an image via Power Apps
  2. Convert to base64 → CheckImageSafety with threshold 2
  3. If safe → store in SharePoint; if unsafe → reject with a message

Agent safety wrapper

  1. User sends a prompt to your Copilot Studio agent
  2. ShieldPrompt checks the prompt for jailbreak attempts before it reaches the LLM
  3. LLM generates a response
  4. CheckTextSafety validates the response before returning it
  5. DetectProtectedMaterial confirms no copyrighted content in the response

Custom categories

Define new safety categories without training a model. Provide a name, a description, and optional few-shot examples:

{
  "text": "You should invest all your savings in this stock immediately",
  "categoryName": "FinancialAdvice",
  "definition": "Content that provides specific financial investment advice or recommendations",
  "sampleTexts": [
    { "text": "Buy ACME stock now before it doubles", "label": true },
    { "text": "The stock market closed higher today", "label": false }
  ]
}

Returns detected: true/false. Use this to enforce organization-specific content policies without waiting for a model update.

Prerequisites

  1. An Azure subscription
  2. An Azure AI Content Safety resource (or any Azure AI Services multi-service resource)
  3. Note the Resource Name and API Key from Keys and Endpoint

Setting up the connector

1. Create an Azure AI Content Safety resource

  1. Go to the Azure Portal
  2. Create an Azure AI Content Safety resource (or use an existing Azure AI Services resource)
  3. Copy the Resource Name and API Key from Keys and Endpoint

2. Create the custom connector

  1. Go to Power Platform Maker Portal
  2. Navigate to Custom connectors > + New custom connector > Import an OpenAPI file
  3. Upload apiDefinition.swagger.json
  4. On the Security tab:
    • Authentication type: API Key
    • Parameter label: API Key
    • Parameter name: Ocp-Apim-Subscription-Key
    • Parameter location: Header
  5. On the Code tab:
    • Enable Code
    • Upload script.csx
  6. Select Create connector

3. Test the connector

Test CheckTextSafety with safe text:

{
  "text": "The weather is nice today.",
  "threshold": 2
}

Verify is_safe returns true. Then test with clearly harmful content to confirm detection works.

4. Add to Copilot Studio

  1. In Copilot Studio, open your agent
  2. Add this connector as an action—Copilot Studio detects the MCP endpoint via x-ms-agentic-protocol
  3. Use the safety tools as guardrails around your agent’s LLM calls

Known limitations

  • Text analysis limited to 10,000 characters per request
  • Image analysis limited to 2048x2048 pixels, 4 MB max, 50x50 min
  • Image analysis accepts base64 or Azure Blob Storage URLs only (not public HTTP URLs)
  • Blocklist item text limited to 128 characters
  • Protected material text detection requires minimum 110 characters
  • Protected material code index is current through April 2023 only
  • Groundedness detection supports max 7,500 character text and 55,000 character grounding sources
  • Custom categories limited to 1,000 character input
  • Task adherence, groundedness, protected code, and custom categories use preview API versions

Files

File Purpose
apiDefinition.swagger.json OpenAPI 2.0 definition with MCP endpoint and 17 REST operations
apiProperties.json API Key auth config and script operation bindings
script.csx C# script handling MCP protocol, simplified safety checks, and response transformation
readme.md Setup and usage documentation

Resources

results matching ""

    No results matching ""