Microsoft Foundry MAI Voice MCP connector for Power Platform
April 07, 2026
A flow triggers when a new blog post publishes. It generates an audio narration, uploads the MP3 to SharePoint, and links it in the post metadata. No recording studio, no human narrator, no manual steps.
MAI-Voice-1 launched April 2, 2026, as part of Microsoft’s three-model MAI announcement alongside MAI-Transcribe-1 and MAI-Image-2. It generates 60 seconds of audio in 1 second with emotional range, speaker identity preservation, and custom voice cloning from just a few seconds of sample audio. Already powering Copilot Audio Expressions and Copilot Podcasts.
This connector brings MAI-Voice-1 and Azure Neural TTS voices into Power Platform with three MCP tools for Copilot Studio and three REST operations for Power Automate and Power Apps. A borrowed chat completion tool helps agents generate SSML markup and select voices within a conversation.
Full source: GitHub repository
The model
| Model | Speed | Languages | Pricing |
|---|---|---|---|
| MAI-Voice-1 | 60 seconds of audio in 1 second | Multiple | $22 per 1M characters |
| Azure Neural TTS | Standard | 100+ locales | Varies by region |
MAI-Voice-1 delivers top-tier voice generation with nuance, emotional range, and expression that preserves speaker identity across long-form content. Custom voice cloning requires separate enrollment through Azure Speech Studio.
Tools
MCP tools for Copilot Studio
| Tool | Description |
|---|---|
| `synthesize_speech` | Convert text to speech audio with voice, language, and format selection |
| `list_voices` | Get available voices with optional locale filtering |
| `chat_completion` | Generate SSML, suggest voices, or discuss speech parameters (borrowed from the parent Foundry connector) |
How it works
User: "Read this announcement in a warm, professional voice"
1. Orchestrator calls list_voices({
locale_filter: "en-US"
})
→ Returns available voices with styles:
[
{ shortName: "en-US-JennyNeural", gender: "Female",
styles: ["cheerful", "sad", "angry", "excited", ...] },
{ shortName: "en-US-GuyNeural", gender: "Male",
styles: ["newscast", "angry", "cheerful", ...] },
...
]
2. Orchestrator calls synthesize_speech({
text: "We're excited to announce...",
voice: "en-US-JennyNeural",
language: "en-US",
output_format: "audio-48khz-192kbitrate-mono-mp3"
})
→ Returns: { status: "success", audio_size_bytes: 48320,
voice: "en-US-JennyNeural" }
User: "Generate SSML with emphasis on the product name"
3. Orchestrator calls chat_completion({
prompt: "Write SSML that emphasizes 'Power Platform'
in this sentence: Power Platform now supports
voice-enabled agents.",
system_prompt: "You are an SSML expert"
})
→ Returns SSML with prosody and emphasis tags
REST operations for Power Automate and Power Apps
| Operation | Operation ID | Method | Path |
|---|---|---|---|
| Synthesize Speech | `SynthesizeSpeech` | POST | `/cognitiveservices/v1` |
| List Available Voices | `ListVoices` | GET | `/cognitiveservices/voices/list` |
| Chat Completion | `ChatCompletion` | POST | `/models/chat/completions` |
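To see what these operations look like on the wire, here is a minimal sketch of a direct Synthesize Speech call in Python. The region and key are placeholders, and the `api-key` header name is taken from this connector's security configuration; a raw Azure Speech endpoint typically expects `Ocp-Apim-Subscription-Key` instead, so treat the header as an assumption outside the connector.

```python
import requests

# Assumptions: REGION and API_KEY are placeholders; the api-key header name
# comes from this connector's security configuration. A direct call to the
# Azure Speech endpoint typically uses Ocp-Apim-Subscription-Key instead.
REGION = "eastus2"
API_KEY = "<your-api-key>"

ssml = (
    "<speak version='1.0' xml:lang='en-US'>"
    "<voice xml:lang='en-US' name='en-US-JennyNeural'>"
    "Hello from the MAI Voice connector."
    "</voice></speak>"
)

resp = requests.post(
    f"https://{REGION}.tts.speech.microsoft.com/cognitiveservices/v1",
    headers={
        "api-key": API_KEY,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "audio-24khz-96kbitrate-mono-mp3",
    },
    data=ssml.encode("utf-8"),
)
resp.raise_for_status()

# The response body is the raw MP3 audio.
with open("narration.mp3", "wb") as f:
    f.write(resp.content)
```

Through the connector, the same request is just the SynthesizeSpeech action with the SSML as the body and the format header as a parameter.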
Parameter reference
| Operation | Parameter | Type | Default | Required |
|---|---|---|---|---|
| Synthesize Speech | `X-Microsoft-OutputFormat` | enum | `audio-24khz-96kbitrate-mono-mp3` | Yes |
| Synthesize Speech | `body` (SSML) | string | — | Yes |
| Chat Completion | `messages` | array | — | Yes |
| Chat Completion | `temperature` | float | 0.7 | No |
| Chat Completion | `max_tokens` | int | 4096 | No |
Audio output formats
| Format | Quality | Use case |
|---|---|---|
| `audio-24khz-96kbitrate-mono-mp3` | General purpose | Default; good balance of quality and size |
| `audio-48khz-192kbitrate-mono-mp3` | High quality | Podcasts, professional narration |
| `riff-24khz-16bit-mono-pcm` | Uncompressed WAV | Post-processing, editing |
| `ogg-24khz-16bit-mono-opus` | Web optimized | Browser playback, streaming |
SSML input
The Synthesize Speech operation takes SSML (Speech Synthesis Markup Language) as input, giving you control over voice selection, language, prosody, emphasis, breaks, and speaking styles:
```xml
<speak version='1.0' xml:lang='en-US'>
  <voice xml:lang='en-US' name='en-US-JennyNeural'>
    <prosody rate="medium" pitch="default">
      Welcome to today's update.
      <break time="500ms"/>
      Here's what's new.
    </prosody>
  </voice>
</speak>
```
Use the chat_completion tool or operation to have an LLM generate SSML from plain text with the right voice, pacing, and emphasis for your content.
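As a sketch of that pattern, the request below sends the connector's Chat Completion payload shape (messages, temperature, max_tokens) to the chat path. The resource host and the exact response shape are assumptions; the connector routes chat traffic to a `.services.ai.azure.com` host, as noted under Known limitations.

```python
import requests

# Assumptions: the endpoint host is a placeholder for whatever your Foundry
# resource exposes; the messages/temperature/max_tokens fields match the
# connector's Chat Completion parameter reference above.
ENDPOINT = "https://<your-resource>.services.ai.azure.com/models/chat/completions"
API_KEY = "<your-api-key>"

body = {
    "messages": [
        {"role": "system", "content": "You are an SSML expert"},
        {
            "role": "user",
            "content": (
                "Convert this to SSML for en-US-JennyNeural, with natural "
                "pacing and emphasis on 'Power Platform': Power Platform "
                "now supports voice-enabled agents."
            ),
        },
    ],
    "temperature": 0.7,
    "max_tokens": 4096,
}

resp = requests.post(ENDPOINT, headers={"api-key": API_KEY}, json=body)
resp.raise_for_status()

# Assumes the standard chat-completions response shape: the generated SSML
# sits in the first choice's message content.
ssml = resp.json()["choices"][0]["message"]["content"]
print(ssml)
```

The generated SSML can then be passed straight to Synthesize Speech as the request body.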
List Voices response
Each voice includes:
| Field | Description |
|---|---|
| `ShortName` | Voice identifier for SSML (for example, `en-US-JennyNeural`) |
| `DisplayName` | Human-readable name |
| `Gender` | Male or Female |
| `Locale` | Language and region code |
| `StyleList` | Supported speaking styles (cheerful, sad, newscast, etc.) |
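To see those fields in use, here is a minimal sketch that fetches the voice list and filters by Locale and StyleList, mirroring what the `list_voices` tool does with its `locale_filter` parameter. Region and key are placeholders, with the same `api-key` header assumption as in the earlier sketch.

```python
import requests

# Assumptions: placeholder region/key; api-key is the header name from the
# connector's security configuration (a direct Azure Speech call typically
# uses Ocp-Apim-Subscription-Key instead).
REGION = "eastus2"
API_KEY = "<your-api-key>"

resp = requests.get(
    f"https://{REGION}.tts.speech.microsoft.com/cognitiveservices/voices/list",
    headers={"api-key": API_KEY},
)
resp.raise_for_status()

# Keep only en-US voices that support the "cheerful" speaking style.
# StyleList is absent on voices with no styles, hence the .get() default.
voices = [
    v for v in resp.json()
    if v["Locale"] == "en-US" and "cheerful" in v.get("StyleList", [])
]
for v in voices:
    print(v["ShortName"], v["Gender"], v.get("StyleList", []))
```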
Use cases
Automated narration pipeline: Build a Power Automate flow that watches for new content in SharePoint, synthesizes audio narration, and stores the MP3 alongside the original document. Training materials, policy updates, and announcements get audio versions automatically.
Voice agent responses: Give your Copilot Studio agent a voice. Use synthesize_speech to convert the agent’s text responses to audio for phone-based or accessibility-first scenarios.
Multilingual voice output: List voices filtered by locale, select the right voice for the target language, and synthesize speech—all within the same flow. Support content in dozens of languages without maintaining separate voice configurations.
Podcast and audio content generation: Use high-quality output formats (audio-48khz-192kbitrate-mono-mp3) for professional-grade audio. Combine with MAI-Transcribe-1 for a full audio content pipeline—transcribe interviews, edit the text, re-synthesize as polished narration.
Branded voice experiences: Clone a specific voice through Azure Speech Studio, then use this connector to generate branded audio content at scale. Product announcements, customer communications, and training materials all sound consistent.
Accessibility: Convert text content to speech for users who prefer or need audio. Reports, dashboards, notifications—anything text-based can get an audio version through a simple flow.
Prerequisites
- An Azure subscription with access to Microsoft Foundry
- Speech Services enabled on your Foundry resource (MAI-Voice-1 is available through the Foundry Model Catalog)
- The Resource Name, API Key, and Region (for example, `eastus2`) from the Azure portal
Setting up the connector
1. Enable Speech Services
- Go to Microsoft Foundry
- Ensure Speech Services are enabled on your resource
- Copy the Resource Name, API Key, and Region from the deployment page
2. Create the custom connector
- Go to Power Platform Maker Portal
- Navigate to Custom connectors > + New custom connector > Import an OpenAPI file
- Upload `apiDefinition.swagger.json`
- On the Security tab:
  - Authentication type: API Key
  - Parameter label: API Key
  - Parameter name: `api-key`
  - Parameter location: Header
- On the Code tab:
  - Enable Code
  - Upload `script.csx`
- Select Create connector
3. Create a connection
- Select Test > + New connection
- Enter your Resource Name, API Key, and Region
- Select Create connection
4. Test the connector
Test `ListVoices` first to see available voices. Then test `SynthesizeSpeech` with simple SSML:
```xml
<speak version='1.0' xml:lang='en-US'>
  <voice xml:lang='en-US' name='en-US-JennyNeural'>
    Hello, this is a test of the MAI Voice connector.
  </voice>
</speak>
```
5. Add to Copilot Studio
- In Copilot Studio, open your agent
- Add this connector as an action; Copilot Studio detects the MCP endpoint via `x-ms-agentic-protocol`
- Test with prompts like “List available English voices” or “Convert this text to speech”
Known limitations
- Audio output is capped at 10 minutes per request
- SSML body length is limited by the Speech Services API
- Custom voice cloning requires separate enrollment through Azure Speech Studio
- The MCP `synthesize_speech` tool generates audio but cannot return binary data directly to the agent; use the REST `SynthesizeSpeech` operation to retrieve audio files
- The connector uses two different host patterns: `{region}.tts.speech.microsoft.com` for speech operations and a `.services.ai.azure.com` host for chat and MCP
- The connection requires three parameters (Resource Name, API Key, Region), unlike other Foundry connectors that need only two
Files
| File | Purpose |
|---|---|
| `apiDefinition.swagger.json` | OpenAPI 2.0 definition with the MCP endpoint and three REST operations |
| `apiProperties.json` | API Key auth config, region-based dynamic host URL policies, and script operation bindings |
| `script.csx` | C# script handling the MCP protocol, SSML generation, voice listing with locale filtering, and host routing |
| `readme.md` | Setup and usage documentation |