microsoft speech to text

Dictate your documents in Word

Dictation lets you use speech-to-text to author content in Microsoft 365 with a microphone and reliable internet connection. It's a quick and easy way to get your thoughts out, create drafts or outlines, and capture notes. 


Start speaking to see text appear on the screen.

The dictation feature is only available to Microsoft 365 subscribers.

How to use dictation


Tip: You can also start dictation with the keyboard shortcut ⌥ (Option) + F1.


Learn more about using dictation in Word on the web and mobile

Dictate your documents in Word for the web

Dictate your documents in Word Mobile

What can I say?

In addition to dictating your content, you can speak commands to add punctuation, navigate around the page, and enter special characters.

You can see the commands in any supported language by going to Available languages. These are the commands for English.

Punctuation

.

,

?

!

new line

's

:

;

" "

-

...

' '

( )

[ ]

{ }

Navigation and Selection

Creating lists

Adding comments

Dictation commands

*

\

/

|

`

_

§

&

@

©

®

°

^

Mathematics

%

#

+

-

x

±

÷

=

< >

$

£

¥

Emoji/faces

:)

:(

;)

<3

Available languages

Select from the list below to see commands available in each of the supported languages.

  • Arabic (Bahrain)
  • Arabic (Egypt)
  • Arabic (Saudi Arabia)
  • Croatian (Croatia)
  • Gujarati (India)
  • Hebrew (Israel)
  • Hungarian (Hungary)
  • Irish (Ireland)
  • Marathi (India)
  • Polish (Poland)
  • Romanian (Romania)
  • Russian (Russia)
  • Slovenian (Slovenia)
  • Tamil (India)
  • Telugu (India)
  • Thai (Thailand)
  • Vietnamese (Vietnam)

More Information

Spoken languages supported

By default, Dictation is set to your document language in Microsoft 365.

We are actively working to improve these languages and add more locales and languages.

Supported Languages

Chinese (China)

English (Australia)

English (Canada)

English (India)

English (United Kingdom)

English (United States)

French (Canada)

French (France)

German (Germany)

Italian (Italy)

Portuguese (Brazil)

Spanish (Spain)

Spanish (Mexico)

Preview languages *

Chinese (Traditional, Hong Kong)

Chinese (Taiwan)

Dutch (Netherlands)

English (New Zealand)

Norwegian (Bokmål)

Portuguese (Portugal)

Swedish (Sweden)

Turkish (Turkey)

* Preview Languages may have lower accuracy or limited punctuation support.

Dictation settings

Click on the gear icon to see the available settings.


Spoken Language: View and change languages in the drop-down

Microphone: View and change your microphone

Auto Punctuation: Toggle the checkmark on or off, if it's available for the language chosen

Profanity filter: Mask potentially sensitive phrases with ***
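As a rough illustration of what the profanity filter setting does, the sketch below masks listed words with ***. The word list and the `mask_profanity` helper are hypothetical; the vocabulary and matching rules of the actual filter are not described on this page.

```python
import re


def mask_profanity(text, blocked_words):
    """Replace each blocked word in `text` with '***' (case-insensitive).

    `blocked_words` is a hypothetical, caller-supplied list; the real
    filter's word list is not documented here.
    """
    if not blocked_words:
        return text
    # Match whole words only, so "class" is not masked because of "ass".
    alternatives = "|".join(re.escape(word) for word in blocked_words)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub("***", text)


if __name__ == "__main__":
    print(mask_profanity("That darn printer, DARN it", ["darn"]))
```

Whole-word matching is a design choice in this sketch; it keeps harmless words that merely contain a blocked string from being masked.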

Tips for using Dictation

Saying “delete” by itself removes the last word or punctuation before the cursor.

Saying “delete that” removes the last spoken utterance.

You can bold, italicize, underline, or strike through a word or phrase. For example, dictate “review by tomorrow at 5PM”, then say “bold tomorrow”. The result is "review by tomorrow at 5PM" with the word "tomorrow" in bold.

Try phrases like “bold last word” or “underline last sentence.”

Saying “add comment look at this tomorrow” inserts a new comment with the text “Look at this tomorrow” inside it.

Saying “add comment” by itself creates a blank comment box where you can type a comment.

To resume dictation, please use the keyboard shortcut ALT + `  or press the Mic icon in the floating dictation menu.

Markings may appear under words with alternates we may have misheard.

If the marked word is already correct, you can select Ignore.
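The difference between “delete” and “delete that” in the tips above can be modelled with a small buffer that remembers utterance boundaries. This is only a sketch of the described behavior, not how Word implements it; the `DictationBuffer` class is hypothetical and treats punctuation as part of the adjacent word.

```python
class DictationBuffer:
    """Toy model of dictated text that keeps track of utterance boundaries."""

    def __init__(self):
        self._utterances = []  # each entry is the list of words in one utterance

    def speak(self, utterance):
        """Handle one spoken utterance, which may be a command or plain text."""
        command = utterance.strip().lower()
        if command == "delete":
            self._delete_last_word()
        elif command == "delete that":
            self._delete_last_utterance()
        else:
            words = utterance.split()
            if words:
                self._utterances.append(words)

    def _delete_last_word(self):
        if self._utterances:
            self._utterances[-1].pop()
            if not self._utterances[-1]:
                self._utterances.pop()  # drop an utterance that became empty

    def _delete_last_utterance(self):
        if self._utterances:
            self._utterances.pop()

    @property
    def text(self):
        return " ".join(word for words in self._utterances for word in words)


if __name__ == "__main__":
    buffer = DictationBuffer()
    for spoken in ["review by tomorrow", "at five", "delete"]:
        buffer.speak(spoken)
    print(buffer.text)
```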


This service does not store your audio data or transcribed text.

Your speech utterances will be sent to Microsoft and used only to provide you with text results.

For more information about experiences that analyze your content, see Connected Experiences in Microsoft 365.

Troubleshooting

Can't find the dictate button

If you can't see the button to start dictation:

Make sure you're signed in with an active Microsoft 365 subscription

Dictate is not available in Office 2016 or 2019 for Windows without Microsoft 365

Make sure you have Windows 10 or above

Dictate button is grayed out

If you see the dictate button is grayed out:

Make sure the note is not in a Read-Only state.

Microphone doesn't have access

If you see "We don’t have access to your microphone":

Make sure no other application or web page is using the microphone and try again

Refresh, click on Dictate, and give permission for the browser to access the microphone

Microphone isn't working

If you see "There is a problem with your microphone" or "We can’t detect your microphone":

Make sure the microphone is plugged in

Test the microphone to make sure it's working

Check the microphone settings in Control Panel

Also see How to set up and test microphones in Windows

On a Surface running Windows 10: Adjust microphone settings

Dictation can't hear you

If you see "Dictation can't hear you" or if nothing appears on the screen as you dictate:

Make sure your microphone is not muted

Adjust the input level of your microphone

Move to a quieter location

If using a built-in mic, consider trying again with a headset or external mic

Accuracy issues or missed words

If you see a lot of incorrect words being output or missed words:

Make sure you're on a fast and reliable internet connection

Avoid or eliminate background noise that may interfere with your voice

Try speaking more deliberately

Check to see if the microphone you are using needs to be upgraded


What is the Speech service?


The Speech service provides speech to text and text to speech capabilities with a Speech resource. You can transcribe speech to text with high accuracy, produce natural-sounding text to speech voices, translate spoken audio, and use speaker recognition during conversations.


Create custom voices, add specific words to your base vocabulary, or build your own models. Run Speech anywhere, in the cloud or at the edge in containers. It's easy to speech enable your applications, tools, and devices with the Speech CLI, Speech SDK, and REST APIs.

Speech is available for many languages, regions, and price points.

Speech scenarios

Common scenarios for speech include:

  • Captioning: Learn how to synchronize captions with your input audio, apply profanity filters, get partial results, apply customizations, and identify spoken languages for multilingual scenarios.
  • Audio Content Creation: You can use neural voices to make interactions with chatbots and voice assistants more natural and engaging, convert digital texts such as e-books into audiobooks, and enhance in-car navigation systems.
  • Call Center: Transcribe calls in real time or process a batch of calls, redact personally identifying information, and extract insights such as sentiment to help with your call center use case.
  • Language learning: Provide pronunciation assessment feedback to language learners, support real-time transcription for remote learning conversations, and read aloud teaching materials with neural voices.
  • Voice assistants: Create natural, humanlike conversational interfaces for your applications and experiences. The voice assistant feature provides fast, reliable interaction between a device and an assistant implementation.

Microsoft uses Speech for many scenarios, such as captioning in Teams, dictation in Office 365, and Read Aloud in the Microsoft Edge browser.


Speech capabilities

These sections summarize Speech features with links for more information.

Speech to text

Use speech to text to transcribe audio into text, either in real time or asynchronously with batch transcription.

You can try real-time speech to text in Speech Studio without signing up or writing any code.

Convert audio to text from a range of sources, including microphones, audio files, and blob storage. Use speaker diarization to determine who said what and when. Get readable transcripts with automatic formatting and punctuation.
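To make "who said what and when" concrete, the sketch below merges consecutive recognized segments from the same speaker into turns. The `Segment` structure and its field names are hypothetical and do not reflect the actual shape of Speech service results.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized phrase; a hypothetical shape, not the service's schema."""
    speaker: str
    start: float  # seconds from the start of the audio
    end: float
    text: str


def merge_turns(segments):
    """Join consecutive segments by the same speaker into single turns."""
    turns = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            turns[-1] = Segment(
                last.speaker, last.start, max(last.end, seg.end),
                f"{last.text} {seg.text}",
            )
        else:
            turns.append(Segment(seg.speaker, seg.start, seg.end, seg.text))
    return turns


if __name__ == "__main__":
    sample = [
        Segment("Guest-1", 0.0, 1.5, "Hello,"),
        Segment("Guest-1", 1.6, 3.0, "thanks for calling."),
        Segment("Guest-2", 3.2, 4.0, "Hi there."),
    ]
    for turn in merge_turns(sample):
        print(f"[{turn.start:.1f}-{turn.end:.1f}] {turn.speaker}: {turn.text}")
```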

The base model might not be sufficient if the audio contains ambient noise or includes numerous industry and domain-specific jargon. In these cases, you can create and train custom speech models with acoustic, language, and pronunciation data. Custom speech models are private and can offer a competitive advantage.

Real-time speech to text

With real-time speech to text, the audio is transcribed as speech is recognized from a microphone or file. Use real-time speech to text for applications that need to transcribe audio in real time, such as:

  • Transcriptions, captions, or subtitles for live meetings
  • Diarization
  • Pronunciation assessment
  • Contact center agent assist
  • Voice agents

Fast transcription API (Preview)

The fast transcription API transcribes audio files synchronously and returns results much faster than real time. Use fast transcription when you need the transcript of an audio recording as quickly as possible with predictable latency, such as:

  • Quick audio or video transcription, subtitles, and edit.
  • Video translation

Fast transcription API is only available via the speech to text REST API version 2024-05-15-preview.

To get started with fast transcription, see Use the fast transcription API (preview).

Batch transcription

Batch transcription is used to transcribe a large amount of audio in storage. You can point to audio files with a shared access signature (SAS) URI and asynchronously receive transcription results. Use batch transcription for applications that need to transcribe audio in bulk such as:

  • Transcriptions, captions, or subtitles for prerecorded audio
  • Contact center post-call analytics
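A minimal sketch of preparing a batch job from SAS URIs follows. It assumes a JSON request body with `contentUrls`, `locale`, and `displayName` fields; those field names are an assumption to verify against the batch transcription REST API reference. The storage URLs are placeholders, and the sketch only builds the body without sending a request.

```python
import json
from urllib.parse import urlparse


def build_batch_request(audio_uris, locale, display_name):
    """Return a JSON body describing a batch transcription job.

    Field names are assumed, not confirmed by this page; check them
    against the REST API reference before use.
    """
    for uri in audio_uris:
        parts = urlparse(uri)
        if parts.scheme != "https" or not parts.netloc:
            raise ValueError(f"expected an https URI, got: {uri!r}")
    return json.dumps(
        {
            "contentUrls": list(audio_uris),
            "locale": locale,
            "displayName": display_name,
        },
        indent=2,
    )


if __name__ == "__main__":
    # Placeholder SAS URIs; real ones carry a signed query string.
    uris = [
        "https://example.blob.core.windows.net/audio/call-001.wav?sig=PLACEHOLDER",
        "https://example.blob.core.windows.net/audio/call-002.wav?sig=PLACEHOLDER",
    ]
    print(build_batch_request(uris, "en-US", "post-call analytics demo"))
```

Because results arrive asynchronously, a real client would submit this body and then poll for completion; that part is omitted here.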

Text to speech

With text to speech, you can convert input text into humanlike synthesized speech. Use neural voices, which are humanlike voices powered by deep neural networks. Use the Speech Synthesis Markup Language (SSML) to fine-tune the pitch, pronunciation, speaking rate, volume, and more.

  • Prebuilt neural voice: Highly natural out-of-the-box voices. Check the prebuilt neural voice samples in the Voice Gallery and determine the right voice for your business needs.
  • Custom neural voice: Besides the prebuilt neural voices that come out of the box, you can also create a custom neural voice that is recognizable and unique to your brand or product. Custom neural voices are private and can offer a competitive advantage. Samples of custom neural voices are also available.
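To show what tuning pitch, speaking rate, and volume with SSML might look like, the sketch below builds a small SSML document using Python's standard library. The `build_ssml` helper is hypothetical, and the voice name in the usage example is only an illustration; check the voice list for names that are valid for your resource.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize SSML elements without a namespace prefix.
ET.register_namespace("", SSML_NS)


def build_ssml(text, voice, lang="en-US", rate="medium", pitch="medium",
               volume="medium"):
    """Wrap `text` in speak/voice/prosody elements and return SSML as a string."""
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    voice_el = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice})
    prosody = ET.SubElement(
        voice_el,
        f"{{{SSML_NS}}}prosody",
        {"rate": rate, "pitch": pitch, "volume": volume},
    )
    prosody.text = text  # ElementTree escapes &, <, and > on output
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    # The voice name is an example, not a recommendation.
    print(build_ssml("Terms & conditions apply.", voice="en-US-JennyNeural",
                     rate="slow", pitch="high"))
```

Building the markup with an XML library rather than string concatenation keeps special characters in the input text from producing malformed SSML.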

Speech translation

Speech translation enables real-time, multilingual translation of speech to your applications, tools, and devices. Use this feature for speech to speech and speech to text translation.

Language identification

Language identification is used to identify languages spoken in audio when compared against a list of supported languages. Use language identification by itself, with speech to text recognition, or with speech translation.

Speaker recognition

Speaker recognition provides algorithms that verify and identify speakers by their unique voice characteristics. Speaker recognition is used to answer the question, "Who is speaking?".

Pronunciation assessment

Pronunciation assessment evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of spoken audio. With pronunciation assessment, language learners can practice, get instant feedback, and improve their pronunciation so that they can speak and present with confidence.

Intent recognition

Use speech to text with conversational language understanding to derive user intents from transcribed speech and act on voice commands.

Delivery and presence

You can deploy Azure AI Speech features in the cloud or on-premises.

With containers, you can bring the service closer to your data for compliance, security, or other operational reasons.

Speech service deployment in sovereign clouds is available for some government entities and their partners. For example, the Azure Government cloud is available to US government entities and their partners. Microsoft Azure operated by 21Vianet cloud is available to organizations with a business presence in China. For more information, see sovereign clouds.


Use Speech in your application

Speech Studio is a set of UI-based tools for building and integrating features from the Azure AI Speech service in your applications. You create projects in Speech Studio by using a no-code approach, and then reference those assets in your applications by using the Speech SDK, the Speech CLI, or the REST APIs.

The Speech CLI is a command-line tool for using the Speech service without having to write any code. Most features in the Speech SDK are available in the Speech CLI, and some advanced features and customizations are simplified in the Speech CLI.

The Speech SDK exposes many of the Speech service capabilities you can use to develop speech-enabled applications. The Speech SDK is available in many programming languages and across all platforms.

In some cases, you can't or shouldn't use the Speech SDK. In those cases, you can use REST APIs to access the Speech service. For example, use the REST APIs for batch transcription and speaker recognition.

Get started

We offer quickstarts in many popular programming languages. Each quickstart is designed to teach you basic design patterns and have you running code in less than 10 minutes. See the following list for the quickstart for each feature:

  • Speech to text quickstart
  • Text to speech quickstart
  • Speech translation quickstart

Code samples

Sample code for the Speech service is available on GitHub. These samples cover common scenarios like reading audio from a file or stream, continuous and single-shot recognition, and working with custom models. Use these links to view SDK and REST samples:

  • Speech to text, text to speech, and speech translation samples (SDK)
  • Batch transcription samples (REST)
  • Text to speech samples (REST)
  • Voice assistant samples (SDK)

Responsible AI

An AI system includes not only the technology, but also the people who use it, the people who are affected by it, and the environment in which it's deployed. Read the transparency notes to learn about responsible AI use and deployment in your systems.

  • Transparency note and use cases
  • Characteristics and limitations
  • Integration and responsible use
  • Data, privacy, and security

Pronunciation Assessment

Custom neural voice

  • Limited access
  • Responsible deployment of synthetic speech
  • Disclosure of voice talent
  • Disclosure of design guidelines
  • Disclosure of design patterns
  • Code of conduct

Speaker Recognition

  • General guidelines
  • Get started with speech to text
  • Get started with text to speech


Azure AI Speech

A managed service offering industry-leading speech capabilities such as speech-to-text, text-to-speech, speech translation, and speaker recognition.

Quickly develop high-quality voice-enabled apps

Build voice-enabled generative AI apps confidently and quickly with Azure AI Speech. Transcribe speech to text with high accuracy, produce natural-sounding text-to-speech voices, translate spoken audio, and use speaker recognition during conversations. Build faster with pre-built and customizable AI models in Azure AI Studio.


Industry-leading quality

Get state-of-the-art speech to text, lifelike text to speech, and award-winning speaker recognition.


Compliant and secure

Your data stays yours—your speech input is not logged during processing.


Customizable voices and models

Create custom voices, add specific words to your base vocabulary, or build your own models.


Flexible deployment

Run Speech anywhere, in the cloud or at the edge in containers.


Convert speech to text

Quickly and accurately transcribe audio in more than 100 languages and variants. Gain customer insights with call center transcription, improve experiences with voice-enabled assistants, capture key discussions in meetings and more.


Give your app a voice

Use text to speech to create apps and services that speak conversationally. Create natural-sounding audio content, improve accessibility with read-aloud functionality, and create custom voice assistants.

Translate speech in real time

Translate audio from more than 30 languages and customize translations for your organization's specific terms—all in your preferred programming language.


Verify and recognize speakers

Confirm a person's identity or recognize who's speaking in a meeting by adding speaker verification and identification to your app.


Activate your assistant or IoT device with a custom keyword

Create a custom keyword for IoT devices and voice-enabled assistants to set your brand apart—making it more personal, personable, and secure.


Add voice commands for hands-free scenarios

Build a touchless, voice-first experience to improve safety and support back-to-work scenarios.

Comprehensive security and compliance, built in

Microsoft invests more than USD1 billion annually on cybersecurity research and development.


We employ more than 3,500 security experts who are dedicated to data security and privacy.


Flexible pricing gives you the power and control you need

Pay for only what you use, with no upfront costs. With Speech, pay as you go based on:

  • The number of hours of audio you transcribe or translate for speech to text and speech translation.
  • The number of characters you convert to audio for text to speech
  • The number of transactions for Speaker Recognition
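The three pay-as-you-go meters above can be combined into a simple estimate. The sketch below uses made-up unit prices and assumes per-hour, per-million-character, and per-thousand-transaction billing units; actual rates and units depend on region and tier and should be taken from the pricing page.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices, for illustration only."""
    per_audio_hour: Decimal
    per_million_characters: Decimal
    per_thousand_transactions: Decimal


def estimate_cost(rates, audio_hours, characters, transactions):
    """Sum the three usage-based charges described in the list above."""
    speech_to_text = rates.per_audio_hour * Decimal(str(audio_hours))
    text_to_speech = (
        rates.per_million_characters * Decimal(characters) / Decimal(1_000_000)
    )
    speaker_recognition = (
        rates.per_thousand_transactions * Decimal(transactions) / Decimal(1_000)
    )
    return speech_to_text + text_to_speech + speaker_recognition


if __name__ == "__main__":
    # Placeholder prices, not real Azure rates.
    rates = Rates(Decimal("1.00"), Decimal("10.00"), Decimal("5.00"))
    print(estimate_cost(rates, audio_hours=10, characters=500_000,
                        transactions=2_000))
```

`Decimal` is used instead of floats so that currency arithmetic stays exact.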

Get started with an Azure free account

Start free. Get USD200 credit to use within 30 days. While you have your credit, get free amounts of many of our most popular services, plus free amounts of 55+ other services that are always free.

After your credit, move to pay as you go to keep building with the same free services. Pay only if you use more than your free monthly amounts.

Trusted by companies of all sizes


AT&T delights customers with immersive experiences

AT&T is showcasing its 5G network with an immersive experience that allows customers to talk directly to Bugs Bunny.*

*LOONEY TUNES and all related characters and elements © & ™ Warner Bros. Entertainment Inc. (s21)


Progressive brings Flo directly to customers

Progressive used Custom Neural Voice to build a natural-sounding, virtual version of Flo to help customers with everything from getting a free car insurance quote to general insurance questions.


KPMG streamlines call transcription

KPMG uses Speech to Text to transcribe and catalog thousands of calls, reducing compliance costs for its clients by as much as 80 percent.


Motorola helps first responders access vital data

Motorola Solutions helps first responders in the field access vital information with a voice-first virtual assistant.


Speech documentation and resources

Get started with AI Speech

Browse the documentation

Take the Microsoft Learn Speech course

Explore popular developer resources

Check out our sample code and SDKs

Build speech models quickly with Speech Studio

Stack Overflow

Start building with AI Services
