Microsoft Foundry MAI Speech MCP connector for Power Platform
April 06, 2026
MAI-Transcribe-1 launched April 2, 2026, as part of Microsoft's three-model MAI announcement alongside MAI-Voice-1 and MAI-Image-2. It delivers state-of-the-art speech-to-text accuracy (3.9% average word error rate), batch transcription 2.5x faster than Azure Fast, and first place on the FLEURS benchmark in 11 core languages among the top 25 most-used languages.
Built for messy, real-world audio environments—not clean studio recordings. Meeting rooms with background chatter, phone calls with connection noise, field recordings with wind and traffic.
This connector brings MAI-Transcribe-1 and Whisper into Power Platform with two MCP tools for Copilot Studio and three REST operations for Power Automate and Power Apps. A borrowed chat completion operation lets agents summarize, analyze, or discuss transcription results within a conversation.
Full source: GitHub repository
The models
| Model | Accuracy | Languages | Pricing |
|---|---|---|---|
| MAI-Transcribe-1 | 3.9% avg WER, #1 on FLEURS in 11 core languages | 25+ | $0.36 per hour of audio |
| Whisper | Varies by deployment | 50+ | Varies by deployment |
Both models support the same operations—transcription and translation—through separate REST endpoints that take a deployment ID parameter. Select the model at deployment time.
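Because only the deployment ID in the path changes, switching models is a one-parameter change. A minimal sketch of how the audio endpoint URLs are composed — the resource name is a placeholder, and the `api-version` value is an assumption (check the versions your resource supports):

```python
# Compose the transcription/translation endpoint URL for a deployment.
RESOURCE = "my-foundry-resource"  # placeholder: your Foundry resource name
API_VERSION = "2024-06-01"        # assumed; verify against your resource

def audio_endpoint(deployment_id: str, operation: str) -> str:
    """operation is 'transcriptions' or 'translations'."""
    return (
        f"https://{RESOURCE}.openai.azure.com"
        f"/openai/deployments/{deployment_id}/audio/{operation}"
        f"?api-version={API_VERSION}"
    )
```

Calling `audio_endpoint("MAI-Transcribe-1", "transcriptions")` versus `audio_endpoint("whisper", "transcriptions")` is the entire difference between the two models at request time.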
Tools
MCP tools for Copilot Studio
| Tool | Description |
|---|---|
| `transcribe_audio` | Transcribe audio from a URL to text, with optional deployment ID, language, and response format |
| `chat_completion` | Send a chat message to summarize or discuss transcriptions (borrowed from parent Foundry connector) |
How it works
```
User: "Transcribe the recording from today's standup meeting"
      [provides audio URL]

1. Orchestrator calls transcribe_audio({
     audio_url: "https://storage.blob.core.windows.net/.../standup.mp3",
     language: "en",
     response_format: "verbose_json"
   })

   → Returns:
     text: "Good morning everyone. Let's start with updates..."
     language: "en"
     duration: 847.5
     segments: [{ start: 0.0, end: 3.2, text: "Good morning everyone." }, ...]

User: "Summarize that meeting and pull out the action items"

2. Orchestrator calls chat_completion({
     prompt: "Summarize this transcript and list action items:
              Good morning everyone. Let's start with updates...",
     system_prompt: "You are a meeting notes assistant"
   })

   → Returns structured meeting summary with action items
```
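On the wire, MCP tool invocations are JSON-RPC 2.0 `tools/call` requests. A hedged sketch of what the orchestrator's first call above might look like as a payload — argument names follow the tool description, and the URL is a placeholder:

```python
import json

# Illustrative JSON-RPC 2.0 request for the transcribe_audio MCP tool.
call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "transcribe_audio",
        "arguments": {
            "audio_url": "https://example.com/standup.mp3",  # placeholder URL
            "language": "en",
            "response_format": "verbose_json",
        },
    },
}
payload = json.dumps(call)
```

Copilot Studio's orchestrator builds and sends this for you; the shape matters only if you are debugging the MCP traffic in `script.csx`.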
REST operations for Power Automate and Power Apps
| Operation | Operation ID | Method | Path |
|---|---|---|---|
| Transcribe Audio | `TranscribeAudio` | POST | `/openai/deployments/{deployment_id}/audio/transcriptions` |
| Translate Audio to English | `TranslateAudio` | POST | `/openai/deployments/{deployment_id}/audio/translations` |
| Chat Completion | `ChatCompletion` | POST | `/models/chat/completions` |
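The two audio operations take the file and parameters as a multipart/form-data upload. A stdlib-only sketch of how such a body can be assembled — the actual HTTP send, `api-key` header, and query string are omitted, and a real client would usually lean on an HTTP library instead:

```python
import io
import uuid

def build_multipart(fields: dict, file_name: str, file_bytes: bytes):
    """Assemble a multipart/form-data body: one part per form field,
    plus a final binary part for the audio file."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode())
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        (
            f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
    )
    buf.write(file_bytes + b"\r\n")
    buf.write(f"--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"

body, content_type = build_multipart(
    {"language": "en", "response_format": "verbose_json"},
    "standup.mp3",
    b"...audio bytes...",  # placeholder; read real bytes from your file
)
```

This mirrors what the connector's `script.csx` does when it downloads the audio and forwards it as multipart form data.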
Parameter reference
| Operation | Parameter | Type | Default | Required |
|---|---|---|---|---|
| Transcribe Audio | `deployment_id` | string | — | Yes |
| Transcribe Audio | `file` | file | — | Yes |
| Transcribe Audio | `language` | string | — | No |
| Transcribe Audio | `prompt` | string | — | No |
| Transcribe Audio | `response_format` | enum | — | No |
| Transcribe Audio | `temperature` | float | 0 | No |
| Translate Audio | `deployment_id` | string | — | Yes |
| Translate Audio | `file` | file | — | Yes |
| Translate Audio | `prompt` | string | — | No |
| Translate Audio | `response_format` | enum | — | No |
| Translate Audio | `temperature` | float | 0 | No |
| Chat Completion | `messages` | array | — | Yes |
| Chat Completion | `temperature` | float | 0.7 | No |
| Chat Completion | `max_tokens` | int | 4096 | No |
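For Chat Completion, the `messages` array follows the familiar role/content shape. An illustrative request body using the defaults from the table above (the message text is a placeholder):

```python
# Example Chat Completion request body; temperature and max_tokens
# shown at their documented defaults.
chat_body = {
    "messages": [
        {"role": "system", "content": "You are a meeting notes assistant"},
        {"role": "user", "content": "Summarize this transcript and list action items: ..."},
    ],
    "temperature": 0.7,
    "max_tokens": 4096,
}
```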
Response format options
| Format | Output | Use case |
|---|---|---|
| `json` | Simple JSON with text | Quick text extraction |
| `text` | Plain text only | Downstream text processing |
| `verbose_json` | JSON with timestamps and segments | Meeting minutes, indexed search |
| `srt` | SubRip subtitle format | Video subtitles |
| `vtt` | WebVTT subtitle format | Web video captions |
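If you already have a `verbose_json` result, you can also derive subtitles yourself rather than re-transcribing with `srt`. A sketch that converts timestamped segments (the shape shown in the walkthrough above) into SubRip text:

```python
def to_srt(segments) -> str:
    """Convert verbose_json-style segments to SubRip (SRT) text."""
    def stamp(t: float) -> str:
        h, rem = divmod(int(t), 3600)
        m, s = divmod(rem, 60)
        ms = int(round((t - int(t)) * 1000))
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    lines = []
    for i, seg in enumerate(segments, start=1):
        lines.append(str(i))                                      # cue number
        lines.append(f"{stamp(seg['start'])} --> {stamp(seg['end'])}")  # cue timing
        lines.append(seg["text"].strip())                         # cue text
        lines.append("")                                          # blank separator
    return "\n".join(lines)

srt = to_srt([{"start": 0.0, "end": 3.2, "text": "Good morning everyone."}])
```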
Transcription response
| Field | Description |
|---|---|
| `text` | The transcribed or translated text |
| `language` | Detected or specified language |
| `duration` | Audio duration in seconds |
| `segments` | Timestamped segments (`verbose_json` only) |
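These fields can be consumed directly in a flow or script. A small example parsing a response of this shape — the sample values are illustrative, not real model output:

```python
import json

# A stand-in for the JSON body returned by TranscribeAudio (verbose_json).
sample = json.dumps({
    "text": "Good morning everyone. Let's start with updates...",
    "language": "en",
    "duration": 847.5,
    "segments": [{"start": 0.0, "end": 3.2, "text": "Good morning everyone."}],
})

resp = json.loads(sample)
minutes = resp["duration"] / 60  # duration arrives in seconds
```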
Use cases
Meeting transcription and minutes: Transcribe a meeting recording, then use chat_completion to extract action items, decisions, and key discussion points. The verbose_json format provides timestamps for linking notes back to specific moments.
Call recording analysis and compliance: Process customer service or sales calls through TranscribeAudio, then analyze for compliance keywords, sentiment, or escalation triggers using the chat completion tool.
Multilingual content transcription: MAI-Transcribe-1 supports 25+ languages with auto-detection. Transcribe content in the original language or use TranslateAudio to translate any supported language into English text.
Subtitle generation: Use the srt or vtt response format to generate ready-to-use subtitle files for video content. Drop the output directly into your video publishing workflow.
Voice memo processing: Build a Power Automate flow that watches a SharePoint folder for audio uploads, transcribes each file, and stores the text alongside the original recording.
Podcast and media transcription: Process long-form audio content for searchable archives, show notes, or accessibility compliance.
Prerequisites
- An Azure subscription with access to Microsoft Foundry
- Deploy a speech-to-text model from the Foundry Model Catalog:
- MAI-Transcribe-1 — for state-of-the-art transcription accuracy
- Whisper — for broad language support
- Note the Resource Name (for example, `my-foundry-resource` from `https://my-foundry-resource.openai.azure.com`) and the API Key from the deployment
Setting up the connector
1. Deploy a speech model in Microsoft Foundry
- Go to the Foundry Model Catalog
- Search for MAI-Transcribe-1 or Whisper and deploy the model
- Copy the Resource Name and API Key from the deployment page
2. Create the custom connector
- Go to Power Platform Maker Portal
- Navigate to Custom connectors > + New custom connector > Import an OpenAPI file
   - Upload `apiDefinition.swagger.json`
   - On the Security tab:
     - Authentication type: API Key
     - Parameter label: API Key
     - Parameter name: `api-key`
     - Parameter location: Header
   - On the Code tab:
     - Enable Code
     - Upload `script.csx`
- Select Create connector
3. Create a connection
- Select Test > + New connection
- Enter your Resource Name and API Key
- Select Create connection
4. Test the connector
Test TranscribeAudio by uploading a short audio file (mp3, wav, or m4a) and setting the deployment ID to your model name (for example, MAI-Transcribe-1).
5. Add to Copilot Studio
- In Copilot Studio, open your agent
- Add this connector as an action—Copilot Studio detects the MCP endpoint via `x-ms-agentic-protocol`
- Test with prompts like “Transcribe this audio file” or “What was discussed in this recording?”
The MCP transcribe_audio tool downloads audio from a URL, so provide publicly accessible links or SAS URLs for files in Azure Storage.
Known limitations
- Audio files must be 25 MB or smaller
- Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm
- For files larger than 25 MB, use the Azure Speech batch transcription API instead
- The MCP `transcribe_audio` tool downloads audio from the provided URL—files must be publicly accessible or use a SAS URL
- The `chat_completion` tool is borrowed from the parent Microsoft Foundry connector and requires a chat-capable model deployed to the same resource
- The connector uses two different host patterns: `.openai.azure.com` for audio operations and `.services.ai.azure.com` for chat and MCP
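The dual-host limitation above is easy to trip over when calling the API directly instead of through the connector. A sketch of the routing rule, using a placeholder resource name:

```python
RESOURCE = "my-foundry-resource"  # placeholder: your Foundry resource name

def host_for(operation_id: str) -> str:
    """Pick the host suffix by operation, mirroring the connector's
    dual dynamic-host policy: audio on .openai.azure.com, chat/MCP
    on .services.ai.azure.com."""
    audio_ops = {"TranscribeAudio", "TranslateAudio"}
    suffix = ".openai.azure.com" if operation_id in audio_ops else ".services.ai.azure.com"
    return f"https://{RESOURCE}{suffix}"
```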
Files
| File | Purpose |
|---|---|
| `apiDefinition.swagger.json` | OpenAPI 2.0 definition with MCP endpoint and 3 REST operations |
| `apiProperties.json` | API Key auth config, dual dynamic host URL policies, and script operation bindings |
| `script.csx` | C# script handling MCP protocol, audio file download and multipart forwarding, chat completion, and Application Insights telemetry |
| `readme.md` | Setup and usage documentation |