It's FOSS

eSpeak: Text To Speech Tool For Linux

Abhishek Prakash

eSpeak is a command line tool for Linux that converts text to speech. This compact speech synthesizer supports English and many other languages. It is written in C.

eSpeak reads text from standard input or from an input file. The generated voice, however, is nowhere close to a human voice, but it is still a compact and handy tool if you want to use it in your projects.

Some of the main features of eSpeak are:

  • Speaks text from a file or from stdin
  • Shared library version to be used by other programs
  • SAPI5 version for Windows, so it can be used with screen-readers and other programs that support the Windows SAPI5 interface
  • Ported to other platforms, including Android and Mac OS X
  • Several voice characteristics to choose from
  • Speech output can be saved as a .WAV file
  • SSML (Speech Synthesis Markup Language) is partially supported, along with HTML
  • Uses a “formant synthesis” method. This allows many languages to be provided in a small size.
  • Tiny in size: the complete program with language support, etc. is under 2 MB.
  • Can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
  • Development tools are available for producing and tuning phoneme data
  • Supports several languages; however, in many cases these are initial drafts and need more work

Install eSpeak

To install eSpeak on an Ubuntu-based system, run this command in a terminal:

sudo apt install espeak

eSpeak is an old tool, and I presume it is also available in the repositories of other Linux distributions such as Fedora. You can install it easily with the respective package manager.

In the case of Arch Linux, the repositories carry espeak-ng instead, which is described in a later section.

To use eSpeak, enter espeak in the terminal. It then waits for input, and you can start typing your text. When you press Enter, you will hear the text you entered.

You can keep adding lines of text to hear them. Press Ctrl+C to close the program.
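For scripts, the same stdin behavior can be driven from Python. The sketch below assumes the espeak binary is on your PATH and uses its --stdin option so the whole input is read at once; the speak() helper is a hypothetical name, not part of eSpeak.

```python
import shutil
import subprocess


def speak(text: str, binary: str = "espeak") -> bool:
    """Pipe text to eSpeak's standard input and wait until it has been spoken.

    Returns False (without raising) when the binary is not installed.
    """
    exe = shutil.which(binary)
    if exe is None:
        return False
    # --stdin makes espeak read all of its text from standard input
    subprocess.run([exe, "--stdin"], input=text, text=True, check=True)
    return True
```

Calling speak("Hello\nWorld") speaks both lines and returns True, or returns False if eSpeak isn't installed.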

espeak in terminal

There are several other options available. You can browse through them in the program's help output by running espeak --help.

espeak help section explaining the usages

GUI Version: espeakedit

If you prefer the GUI version over the command line, you can install espeakedit which provides a GTK front end to eSpeak.

Install espeakedit with:

sudo apt install espeakedit

Once it is installed, you need to copy the data in /usr/lib/x86_64-linux-gnu/espeak-data/ to your home directory. To do this, open a terminal and run:

cp -r /usr/lib/x86_64-linux-gnu/espeak-data/ ~

Once done, you can open the espeakedit application. It looks like this:

espeak edit gui app

Enter the text in the field provided and press Speak to start. You can also save the output as a .WAV file and listen to it later.

The interface is straightforward and easy to use. You can explore the submenus and functions all by yourself.

A New Tool: eSpeak NG

eSpeak NG is a compact open-source text-to-speech synthesizer based on the eSpeak engine created by Jonathan Duddington.

It offers the features of eSpeak and is in active development. The project also provides a separate espeak-ng-data package to avoid conflicts with the espeak-data package offered by the eSpeak project.

To install it on Ubuntu, run:

sudo apt install espeak-ng

The new eSpeak NG project is a significant departure from the eSpeak project, aiming to clean up the existing codebase, add new features, and add to and improve the supported languages.

Also, it is important to note that espeakedit GUI is not part of this new project.

Some of the notable features:

  • Uses the same command-line options as espeak with several additions.
  • Provides new functionality such as specifying the output audio device name to use.
  • Has been ported to other platforms, including Solaris and Mac OSX.
  • Includes different voices whose characteristics can be altered.
  • Available as a command-line program for Linux and Windows to speak text from a file or from stdin.
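Because eSpeak NG accepts the same command-line options as espeak, a script can prefer espeak-ng and fall back to espeak. This is a minimal Python sketch under that assumption; pick_engine() and build_command() are hypothetical helpers, and the which parameter exists only so the lookup can be tested without either program installed.

```python
import shutil
from typing import Callable, List, Optional


def pick_engine(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the path of espeak-ng if installed, else espeak, else None."""
    for name in ("espeak-ng", "espeak"):
        path = which(name)
        if path:
            return path
    return None


def build_command(engine: str, text: str, voice: str = "en") -> List[str]:
    """Build a command line that works with either binary (-v selects the voice)."""
    return [engine, "-v", voice, text]
```

With eSpeak NG installed, subprocess.run(build_command(pick_engine(), "Hello"), check=True) speaks through it; otherwise the same call uses classic espeak.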
  • Available as a shared library version for use by other programs.

Wrapping Up

On It’s FOSS, we use Play.ht to provide audio formats of selected articles. The espeak tools are not as good as the professional AI tools.

However, if you want something basic and free to be used in your project, you can give it a try.


The Linux Code

An In-Depth Guide to Open Source Text-to-Speech Engines for Linux

This comprehensive guide explores the top open source text-to-speech (TTS) engines available for Linux. Converting text into lifelike speech is useful for accessibility, delivering information via voice interfaces, learning pronunciation, and more. We’ll cover the capabilities of leading Linux TTS tools, their installation, and plenty of usage examples.

Introduction to Text-to-Speech

Text-to-speech (TTS) is the artificial production of human speech from written text. TTS engines ingest text, process it through natural language pipelines, and output synthesized audio speech. The quality of TTS systems is determined by how natural and humanlike the generated voices sound.

TTS has many practical use cases:

  • Improving accessibility for vision-impaired users
  • Reading text aloud when eyes-free use is needed, such as while driving
  • Delivering information over voice interfaces or phone systems
  • Assisting with learning languages and proper pronunciation
  • Converting documents to audiobook format
  • Adding speech output to applications by leveraging TTS APIs

High-quality voices require sophisticated deep learning algorithms. Most modern TTS engines utilize machine learning trained on huge datasets of recorded human speech.

In this guide, we’ll focus on open source command line utilities for performing TTS on Linux. Let's look at some of the best options.

eSpeak – Lightweight Open Source TTS

eSpeak is an open source text-to-speech engine created by Jonathan Duddington; it grew out of a program he began writing for Acorn RISC OS computers in 1995. It supports over 70 languages and accents and is highly configurable for adjusting speech parameters.

eSpeak is lightweight and designed to be portable across many systems. Because it is open source (GPLv3 license), it is packaged by many Linux distributions. The voices tend to sound robotic, but the speech is clear and works well.

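To install on Debian/Ubuntu:

sudo apt install espeak

On Arch Linux, whose repositories carry eSpeak NG instead:

sudo pacman -S espeak-ng

Basic usage is simple. To speak a piece of text:

espeak "Hello from eSpeak"

To read a file aloud:

espeak -f file.txt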

Let's go through some ways to customize and control eSpeak's voices.

To list all available voices:

espeak --voices

This prints a table summarizing each voice's language, dialect, and identifier.
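If you want the voice list inside a program, the table can be parsed. The following sketch assumes whitespace-separated columns in the header order Pty, Language, Age/Gender, VoiceName, File, which is how eSpeak formats it; parse_voices() is a hypothetical helper, and the exact column contents can differ between eSpeak and eSpeak NG.

```python
from typing import Dict, List


def parse_voices(output: str) -> List[Dict[str, str]]:
    """Parse the table printed by `espeak --voices` into dictionaries.

    Assumes whitespace-separated columns: Pty, Language, Age/Gender,
    VoiceName, File; any extra columns (Other Languages) are ignored.
    """
    voices = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the header row and any blank or truncated lines
        if len(fields) < 5 or fields[0] == "Pty":
            continue
        voices.append(
            {
                "language": fields[1],
                "gender": fields[2],
                "name": fields[3],
                "file": fields[4],
            }
        )
    return voices
```

For example, parse_voices(subprocess.run(["espeak", "--voices"], capture_output=True, text=True).stdout) returns one dictionary per installed voice.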

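For example, to set the voice to US English:

espeak -v en-us "Hello from eSpeak"

Adjust the speech rate, in words per minute, with the -s flag:

espeak -s 140 "Hello from eSpeak"

The pitch (0 to 99, default 50) can be adjusted with -p:

espeak -p 70 "Hello from eSpeak"

To save audio output to a file, use -w:

espeak -w output.wav "Hello from eSpeak"

This saves a WAV audio file that can be played in media players. eSpeak itself only writes WAV; to get .mp3 or .ogg files, convert the output with a separate encoder such as ffmpeg.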
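The flags above can be combined in one call. The sketch below builds an argument list for Python's subprocess module, assuming eSpeak's documented defaults of 175 words per minute and pitch 50; espeak_args() is a hypothetical helper, and it only checks the pitch range and a positive speed, leaving other validation to eSpeak.

```python
from typing import List, Optional


def espeak_args(
    text: str,
    voice: str = "en-us",
    speed: int = 175,
    pitch: int = 50,
    wav_path: Optional[str] = None,
) -> List[str]:
    """Build an espeak command line from the options described above.

    -v voice, -s speed in words per minute, -p pitch (0-99), -w WAV output file.
    """
    if not 0 <= pitch <= 99:
        raise ValueError("pitch must be between 0 and 99")
    if speed <= 0:
        raise ValueError("speed must be positive")
    args = ["espeak", "-v", voice, "-s", str(speed), "-p", str(pitch)]
    if wav_path is not None:
        # -w writes a WAV file instead of playing through the speakers
        args += ["-w", wav_path]
    args.append(text)
    return args
```

For example, subprocess.run(espeak_args("Hello", speed=140, wav_path="hello.wav"), check=True) writes hello.wav at a slower rate.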

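In addition to these common uses, eSpeak provides phoneme support for precise pronunciation: espeak -x prints the phoneme mnemonics for the input text, and phoneme mnemonics enclosed in [[ ]] can be given directly as input.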
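To capture a phoneme transcription programmatically, run eSpeak with -q (no audio) and -x, and read its standard output. This Python sketch assumes espeak is on PATH; the --ipa switch, which prints IPA instead of eSpeak's own mnemonics, is only present in newer eSpeak releases and in eSpeak NG. Both helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def phoneme_args(text: str, ipa: bool = False) -> List[str]:
    """Build the command: -q suppresses audio, -x prints mnemonics, --ipa prints IPA."""
    return ["espeak", "-q", "--ipa" if ipa else "-x", text]


def phonemes(text: str, ipa: bool = False) -> Optional[str]:
    """Return eSpeak's phoneme transcription of text, or None if espeak is missing."""
    if shutil.which("espeak") is None:
        return None
    result = subprocess.run(
        phoneme_args(text, ipa), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```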

eSpeak also exposes a C API (declared in speak_lib.h) for integrating TTS directly into applications. It can be used directly from C and C++, and through bindings from Python and other languages.
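As a rough illustration of the C API, the sketch below calls libespeak through Python's ctypes. It assumes the library can be found as espeak-ng or espeak, and that the constant values copied from speak_lib.h (AUDIO_OUTPUT_SYNCH_PLAYBACK = 3, POS_CHARACTER = 1, espeakCHARS_UTF8 = 1) match your installed headers, so check them before relying on this. load_espeak() and say() are hypothetical helpers.

```python
import ctypes
import ctypes.util
from typing import Optional, Sequence

# Values copied from speak_lib.h; verify them against your installed header
AUDIO_OUTPUT_SYNCH_PLAYBACK = 3
POS_CHARACTER = 1
espeakCHARS_UTF8 = 1


def load_espeak(names: Sequence[str] = ("espeak-ng", "espeak")) -> Optional[ctypes.CDLL]:
    """Load the first eSpeak shared library that can be found, or return None."""
    for name in names:
        path = ctypes.util.find_library(name)
        if path:
            return ctypes.CDLL(path)
    return None


def say(lib: ctypes.CDLL, text: str) -> None:
    """Speak text and block until playback finishes, using the eSpeak C API."""
    lib.espeak_Initialize.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_char_p, ctypes.c_int]
    lib.espeak_Synth.argtypes = [
        ctypes.c_char_p,                # text
        ctypes.c_size_t,                # size of the text buffer in bytes
        ctypes.c_uint,                  # start position
        ctypes.c_int,                   # position type (POS_CHARACTER)
        ctypes.c_uint,                  # end position (0 = no limit)
        ctypes.c_uint,                  # flags (character encoding)
        ctypes.POINTER(ctypes.c_uint),  # unique identifier (may be NULL)
        ctypes.c_void_p,                # user data (may be NULL)
    ]
    # espeak_Initialize returns the sample rate in Hz, or -1 on failure
    if lib.espeak_Initialize(AUDIO_OUTPUT_SYNCH_PLAYBACK, 0, None, 0) == -1:
        raise RuntimeError("espeak_Initialize failed")
    data = text.encode("utf-8") + b"\0"
    # espeak_Synth returns EE_OK (0) on success
    if lib.espeak_Synth(data, len(data), 0, POS_CHARACTER, 0, espeakCHARS_UTF8, None, None) != 0:
        raise RuntimeError("espeak_Synth failed")
    lib.espeak_Synchronize()  # wait until all queued speech has been spoken
```

Usage: lib = load_espeak(), then, if lib is not None, say(lib, "Hello from the C API").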

Overall, eSpeak provides a capable open source text-to-speech system on Linux. The voices aren't as human-sounding as some commercial options, but it's free, customizable, lightweight, and easy to use.

Festival – Framework for Building TTS Voices

Festival is another leading open source text-to-speech system originally developed at the University of Edinburgh and released in 1997.

Festival utilizes a modular framework for building synthetic voices. It comes packaged with several English voices and support for Spanish, Welsh, and other languages. Festival is well-suited for research and education purposes.

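Install Festival using your Linux distribution's package manager, for example on Debian/Ubuntu:

sudo apt install festival

Some example usage, speaking a line of text and then a whole file:

echo "Hello from Festival" | festival --tts
festival --tts file.txt

Festival includes an interactive shell for experimenting with speech synthesis, which allows modifying parameters on the fly. Start it with festival and enter Scheme commands such as:

(SayText "Hello world")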
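The two command-line entry points can be scripted as well. This Python sketch assumes the festival and text2wave commands from the Festival package are on PATH and that both read text from standard input; festival_say() and its helpers are hypothetical names.

```python
import shutil
import subprocess
from typing import List, Optional


def festival_tts_args() -> List[str]:
    """festival --tts reads plain text (here from stdin) and speaks it."""
    return ["festival", "--tts"]


def text2wave_args(wav_path: str) -> List[str]:
    """text2wave, shipped with Festival, reads text on stdin and writes a WAV file."""
    return ["text2wave", "-o", wav_path]


def festival_say(text: str, wav_path: Optional[str] = None) -> bool:
    """Speak text with Festival, or save it to wav_path if one is given."""
    args = festival_tts_args() if wav_path is None else text2wave_args(wav_path)
    if shutil.which(args[0]) is None:
        return False
    subprocess.run(args, input=text, text=True, check=True)
    return True
```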

Festival is complemented by FestVox, a related project from Carnegie Mellon University that provides tools for building new synthetic voices and languages for Festival.

For basic usage, Festival's text-to-speech is clear, but its default voices sound robotic. The option to build custom voices is useful for research; however, modern TTS technology has surpassed Festival's voice quality.

Pico TTS – Optimized Small Footprint Engine

Pico TTS (SVOX Pico) is an open source, small-footprint text-to-speech engine that is well suited to embedded Linux.

The engine itself is written in C. It was developed by SVOX and released as part of the Android Open Source Project under the Apache License 2.0, and several Linux distributions package it.

Install on Debian/Ubuntu (the package lives in Debian's non-free section and Ubuntu's multiverse repository):

sudo apt install libttspico-utils

Pico TTS supports English (US and UK), Spanish, French, German, and Italian voices. Since it's designed for small systems, the quality is surprisingly good for its small resource requirements.

To synthesize text and save it as a WAV file:

pico2wave -l en-US -w output.wav "Hello from Pico"

Here -l specifies the language code like en-US for US English.
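A small wrapper can catch typos in the language code before pico2wave runs. This Python sketch assumes the six language codes shipped with the SVOX Pico data (en-US, en-GB, de-DE, es-ES, fr-FR, it-IT) and requires a .wav file name as a precaution, since pico2wave only produces WAV output; pico2wave_args() is a hypothetical helper.

```python
from typing import List

# Language codes accepted by pico2wave's -l option
PICO_LANGUAGES = {"en-US", "en-GB", "de-DE", "es-ES", "fr-FR", "it-IT"}


def pico2wave_args(text: str, wav_path: str, lang: str = "en-US") -> List[str]:
    """Build a pico2wave command line after checking the language and file name."""
    if lang not in PICO_LANGUAGES:
        raise ValueError(f"unsupported language: {lang}")
    if not wav_path.lower().endswith(".wav"):
        # pico2wave only produces WAV output, so insist on a .wav file name
        raise ValueError("output file must end in .wav")
    return ["pico2wave", "-l", lang, "-w", wav_path, text]
```

Run the result with subprocess.run(pico2wave_args("Hello", "hello.wav"), check=True), then play hello.wav with any audio player.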

Pico TTS doesn't play audio or stream it to stdout directly; pico2wave only writes WAV files. That works well for offline usage, and you can play the result with any audio player.

In summary, Pico TTS provides a capable text-to-speech engine optimized for embedded Linux applications like the Raspberry Pi. For desktop use, other options might be higher quality. But as a small footprint engine, Pico TTS works quite well.

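gTTS – Leveraging Google's TTS API

gTTS (Google Text-to-Speech) is a Python library and command line tool that interfaces with Google Translate's text-to-speech API. It's an easy way to use Google's voices, but keep in mind that synthesis runs on Google's servers: gTTS needs an internet connection, and only the gTTS client itself is open source.

gTTS can be installed with pip:

pip install gTTS

Some Linux distributions also package it, so check your distribution's package manager as well.

Basic usage:

gtts-cli "Hello from gTTS" --output hello.mp3

This saves the synthesized audio to an MP3 file.

To convert a text file (gTTS saves audio rather than playing it directly):

gtts-cli --file file.txt --output file.mp3

gTTS supports dozens of languages. To print all the available language codes:

gtts-cli --all

To choose a language, pass its code with --lang, for example English:

gtts-cli --lang en "Hello" --output hello.mp3

gTTS is an ideal way to use Google's industry-leading text-to-speech engine from the Linux command line. The audio sounds human and is highly intelligible.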
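From Python, the library form avoids shelling out. This sketch assumes gTTS is installed with pip and that the machine can reach Google's service; save_mp3() is a hypothetical helper around the gTTS class and its save() method.

```python
def save_mp3(text: str, path: str, lang: str = "en") -> None:
    """Synthesize text with gTTS and save it as an MP3 file (needs internet access)."""
    if not text.strip():
        raise ValueError("nothing to synthesize")
    # Imported lazily so this module loads even where gTTS isn't installed
    from gtts import gTTS

    gTTS(text=text, lang=lang).save(path)
```

For example, save_mp3("Hello from gTTS", "hello.mp3") writes the same file as the first gtts-cli command above.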

Comparing Voice Quality Between TTS Engines

There are noticeable differences in audio quality between the text-to-speech solutions we covered. Let's do a quick comparison.

eSpeak sounds robotic because it relies on formant synthesis instead of deep learning, and Festival's stock voices, built with older concatenative (diphone) techniques, also sound clearly synthetic. eSpeak voices tend to be clearer than Festival's.

Pico TTS delivers good quality given its tiny resource footprint. The voices aren't perfectly human-sounding, but they are quite intelligible.

gTTS provides by far the most natural-sounding audio, since it uses the voices behind Google's online text-to-speech service. The quality difference is very noticeable.

For the best-sounding voices, gTTS is recommended. But the fully offline open source engines like eSpeak work well enough for many use cases, especially considering they're free and need no network connection.

Additional Tips and Tricks

Here are some additional tips for getting the most out of Linux text-to-speech engines:

  • Adjust speech rate, pitch, and volume to customize the voice
  • Use phoneme support for precise pronunciation of texts
  • Output audio to a file instead of directly to speakers
  • Pipe audio to media players like mplayer for enhanced controls
  • Chain multiple engines together for more options
  • Install alternative voices and languages
  • Look for engines that specialize in other languages, such as Chinese or Russian
  • Integrate speech synthesis directly into your own apps with provided APIs
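The tip about piping audio to a media player can be reproduced in Python with two chained processes. This sketch uses eSpeak's --stdout option and, as an assumed substitute for mplayer, ALSA's aplay, which reads a WAV stream from standard input; speak_via_player() is a hypothetical helper, and the engine and player arguments let you swap in other commands.

```python
import shutil
import subprocess
from typing import Sequence


def speak_via_player(
    text: str,
    engine: str = "espeak",
    player: Sequence[str] = ("aplay", "-q"),
) -> bool:
    """Equivalent of `espeak --stdout "text" | aplay -q`; False if a program is missing."""
    if shutil.which(engine) is None or shutil.which(player[0]) is None:
        return False
    # --stdout makes eSpeak write a WAV stream instead of playing it
    producer = subprocess.Popen([engine, "--stdout", text], stdout=subprocess.PIPE)
    consumer = subprocess.Popen(list(player), stdin=producer.stdout)
    producer.stdout.close()  # so eSpeak gets SIGPIPE if the player exits early
    consumer.wait()
    producer.wait()
    return consumer.returncode == 0
```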

And some troubleshooting advice:

  • If there is no audio, check that the speakers are not muted and the volume is up
  • Install any audio codec packages your system requires
  • Try a different TTS engine if a specific one has issues
  • Read the error output to diagnose problems
  • Consult the documentation and the project's GitHub issues page

With a bit of tweaking, the open source text-to-speech engines provide plenty of options for your Linux projects.

Leveraging TTS Engines in Shell Scripts

One useful application of text-to-speech on Linux is scripting batch conversions of text files. A short bash script can loop over the .txt files in a directory and run eSpeak on each one, using -f to read the file and -w to save the output as a .wav file.
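The batch loop might look like this in Python. It is a sketch that assumes espeak is on PATH; convert_directory() is a hypothetical helper, and its run parameter exists so the loop can be tested without eSpeak (by default each command runs through subprocess.run).

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(list(cmd), check=True)


def convert_directory(
    directory: str, run: Callable[[Sequence[str]], object] = run_checked
) -> List[Path]:
    """Convert every .txt file in directory into a .wav file beside it using espeak."""
    outputs: List[Path] = []
    for txt in sorted(Path(directory).glob("*.txt")):
        wav = txt.with_suffix(".wav")
        # -f reads the text from the file, -w writes the audio to a WAV file
        run(["espeak", "-f", str(txt), "-w", str(wav)])
        outputs.append(wav)
    return outputs
```

Calling convert_directory("notes") creates notes/a.wav next to notes/a.txt, and so on, and returns the list of WAV paths.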

Scripts like this provide an easy way to automate batch text-to-speech conversions and workflows.

Appendix: Quick Reference of Engines

Engine   | Languages | Synthesis method                        | License                 | Notes
eSpeak   | 70+       | Formant synthesis                       | GPLv3                   | Robotic voices, versatile, lightweight
Festival | Multiple  | Concatenative (diphone, unit selection) | Custom (permissive)     | Framework for building voices
Pico TTS | 5         | HMM-based statistical parametric        | Apache 2.0              | Small footprint, good quality
gTTS     | Many      | Google's online TTS service             | MIT (client library)    | Most natural-sounding voices; needs internet

This guide covered several excellent open source text-to-speech utilities for Linux. eSpeak and Festival are classic options that work reasonably well. Pico TTS is great for embedded devices. gTTS provides the best-sounding human voices by leveraging Google's technology.

The installation process, basic usage, and customization options were explained for each text-to-speech engine. TTS enables many exciting applications on the Linux command line and within scripts or apps.

To learn more about the capabilities of each text-to-speech engine, be sure to consult the official project documentation. Their source repositories also contain useful code samples to get started.

With the power of text-to-speech, Linux can talk back to you! Converting text to natural sounding speech opens many possibilities.


Best Text to Speech Software for Linux in 2024


Whether you’re a tech enthusiast seeking to add voice capabilities to your projects, a content creator aiming to reach a broader audience, or an accessibility advocate championing inclusivity, Linux offers a treasure trove of TTS tools to amplify your impact.

In this blog, we embark on an odyssey through the finest TTS software available for Linux, unveiling their unique features, uncovering hidden gems, and equipping you with the knowledge to harness the full potential of speech synthesis on your Linux system. From command-line wizards to intuitive graphical interfaces, prepare to be captivated by the versatility and ingenuity of TTS software designed expressly for the Linux community.


Table of Contents

  • Setting Up Text to Speech on Linux
  • Configuration for Enabling TTS Functionality on Linux Distributions
  • Troubleshooting Common Installation Issues
  • Tips for Enhancing Voice Quality and Naturalness
  • Integrating TTS with Applications


Setting Up Text to Speech on Linux

Before installing TTS software on Linux, it is important to ensure that your system meets the necessary prerequisites. These may include a working internet connection, administrative privileges for software installation, and compatibility with the chosen TTS engine or library. Additionally, it is advisable to review the documentation and system requirements provided by the TTS software developers to ensure compatibility with your Linux distribution.

Here is a step by step guide to installing TTS engines and libraries on your Linux system:

1.  Research and select TTS software:  Explore available TTS engines and libraries compatible with your Linux system. Popular choices include eSpeak, Acapela, and Cepstral.

2.  Install TTS software:  Utilize your package manager (e.g., apt for Debian-based distributions, yum for CentOS/RHEL-based distributions) to install the chosen TTS software. For example, on Debian-based systems, you can use the command sudo apt install <package-name> to install TTS packages.

3.  Verify installation:  After installation, verify that the TTS software is successfully installed by checking for the presence of executable binaries and configuration files.
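The verification step can be scripted by looking up each command on PATH. This Python sketch uses shutil.which; the default command names are examples taken from engines covered in these guides, and find_tts_binaries() is a hypothetical helper.

```python
import shutil
from typing import Dict, Iterable, Optional

# Example command names; adjust to the engines you actually installed
DEFAULT_COMMANDS = ("espeak-ng", "espeak", "festival", "pico2wave", "gtts-cli")


def find_tts_binaries(names: Iterable[str] = DEFAULT_COMMANDS) -> Dict[str, Optional[str]]:
    """Map each TTS command name to its full path, or None if it isn't on PATH."""
    return {name: shutil.which(name) for name in names}
```

Printing find_tts_binaries() shows at a glance which engines are installed and where.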

Configuration for Enabling TTS Functionality on Linux Distributions

1.  Configure TTS settings:  Access the configuration files or settings panel of your Linux distribution to enable TTS functionality. This may involve specifying default TTS voices, adjusting speech rate and pitch settings, and configuring audio output devices.

2.  Test TTS functionality:  Use command-line tools or TTS-enabled applications to test the newly installed TTS software. Generate sample speech output to ensure that the TTS engine is functioning correctly and producing intelligible speech.

Troubleshooting Common Installation Issues

Dependency errors:  Resolve dependency errors by installing missing packages or libraries required by the TTS software.

Configuration errors:  Double-check configuration settings and file permissions to ensure proper integration of the TTS software with your Linux distribution.

Audio output issues:  Troubleshoot audio output issues by verifying sound card configurations and checking system audio settings.

Customizing TTS Voices on Linux

Customizing text to speech voices on Linux adds a layer of personalization and adaptability to speech synthesis technology, allowing users to tailor their auditory experiences to their preferences and requirements. 

Here are a few methods to customize TTS voices on Linux:

1.  Voice selection: Begin by selecting TTS voices that align with your preferences and intended use cases. Experiment with different voices to identify those that best suit your needs and resonate with your audience.

2.  Voice modulation:  Some TTS engines offer options to modify voice parameters such as pitch, speed, and intonation. Adjusting these parameters can enhance voice clarity and naturalness, resulting in more engaging speech output.

Tips for Enhancing Voice Quality and Naturalness

1.  Use high-quality audio samples:  When building custom voices, use high-quality audio recordings to capture the nuances of natural speech patterns and inflections.

2.  Incorporate pronunciation rules:  Ensure that TTS engines adhere to proper pronunciation rules and phonetic transcription guidelines to improve speech intelligibility and accuracy.

3.  Fine-tune prosody and emphasis:  Adjust prosodic features such as emphasis, rhythm, and stress to convey meaning and emotion effectively in synthesized speech.

Integrating TTS with Applications

1.  Speech Synthesis Markup Language (SSML):  SSML provides a standardized markup language for controlling aspects of speech synthesis, such as pronunciation, emphasis, and prosody. TTS engines that support SSML enable developers to fine-tune speech output according to specific requirements.

2.  Text to speech APIs:  Many TTS engines offer APIs that allow developers to programmatically generate speech output from text input. These APIs typically provide a straightforward interface for sending text data to the TTS engine and receiving synthesized speech in return.

3.  Web Speech API:  The Web Speech API, through its SpeechSynthesis interface, enables web developers to incorporate TTS functionality into web applications. Supported by modern web browsers, this API allows developers to create accessible and interactive web experiences with synthesized speech.
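As a small example of SSML, the sketch below builds a document with pauses between sentences and passes it to eSpeak, whose -m option interprets SSML markup (eSpeak's SSML support is partial). It assumes espeak is on PATH; ssml_document() and espeak_ssml_args() are hypothetical helpers, and escape() from Python's standard library protects characters such as & and <.

```python
from typing import List
from xml.sax.saxutils import escape


def ssml_document(sentences: List[str], pause_ms: int = 500) -> str:
    """Wrap sentences in a minimal SSML document with a pause between them."""
    # escape() turns & < > into entities so the markup stays well-formed
    parts = [escape(s) for s in sentences]
    body = f'<break time="{pause_ms}ms"/>'.join(parts)
    return f"<speak>{body}</speak>"


def espeak_ssml_args(ssml: str) -> List[str]:
    """-m tells espeak to interpret SSML/HTML markup instead of reading the tags aloud."""
    return ["espeak", "-m", ssml]
```

For example, subprocess.run(espeak_ssml_args(ssml_document(["Hello", "World"])), check=True) speaks both words with a half-second pause between them.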

Five Linux Text to Speech Software

Here are five of the top TTS programs for Linux:

Acapela is a popular TTS product known for high-quality, natural-sounding speech synthesis in over 30 languages with 120 voices. Although it primarily targets other platforms, including Windows and macOS, Acapela offers solutions for Linux users as well. Its Linux version provides a wide range of voices in multiple languages, enabling users to create engaging audio content, assistive technologies, and interactive applications.

Acapela’s TTS engine integrates seamlessly with Linux environments, offering advanced customization options for voice modulation, pronunciation, and prosody. With its extensive language support and robust performance, Acapela stands out as a versatile TTS solution for Linux users seeking premium speech synthesis capabilities.

Speechelo is a user-friendly TTS tool designed to simplify generating high-quality speech output from text. While primarily marketed toward content creators and marketers, Speechelo can be used on Linux systems through web-based interfaces and desktop applications. Its unique selling point is its ability to create lifelike voiceovers with natural intonation and emotion, enhancing the engagement and impact of audiovisual content.

It offers over 30 human-sounding voices in more than 24 languages. Speechelo’s intuitive interface and diverse range of voice options make it a popular choice among Linux users seeking efficient, professional-grade TTS for multimedia projects, e-learning modules, and promotional materials.

Cepstral is a versatile TTS engine recognized for its voice quality and smooth integration with Linux distributions. Unlike some other TTS solutions, Cepstral uses proprietary speech synthesis technology that delivers clear, expressive, human-like speech across various applications and platforms. Users can try Cepstral’s voices for free online.

With six distinct U.S. English voices and additional options for UK English, Spanish, French, Italian, and German, Cepstral caters to diverse linguistic needs. Notably, Cepstral’s natural-sounding voices are SAPI 5 compliant. Its lightweight and efficient development tools make it well suited to resource-constrained Linux environments, ensuring good performance without compromising voice quality or customization capabilities.

eSpeak is an open-source TTS engine for Linux, Windows, and other platforms, and eSpeak NG is its actively maintained successor. As a compact and lightweight solution, eSpeak NG offers basic speech synthesis with support for 50+ languages and their pronunciation rules. Its simplicity and ease of use make it an attractive choice for Linux users who want a straightforward TTS solution for basic text to speech conversion tasks, command-line utilities, and accessibility features.

While eSpeak may not offer the advanced customization options or premium voice quality of some commercial TTS engines, its open-source nature and extensive language support make it a valuable addition to the Linux software ecosystem. 

Festival is a comprehensive TTS system developed by the University of Edinburgh, offering extensive support for Linux and other Unix-based platforms. Festival distinguishes itself with its modular architecture and flexible design, allowing users to customize and extend its functionality through a variety of plugins, language models, and voice synthesis techniques. It currently supports five languages (British English, American English, Spanish, Czech and Italian) with many languages in the prototype mode.

With its powerful scripting capabilities and extensive documentation, Festival is well-suited for advanced users, researchers, and developers seeking to explore the depths of speech synthesis technology on Linux. Despite its steep learning curve, Festival remains a popular choice among Linux enthusiasts and academics for its robustness, extensibility, and support for cutting-edge research in TTS and natural language processing.

Text to speech technology plays a pivotal role in enhancing accessibility and usability for users with disabilities within the Linux ecosystem. TTS accessibility features empower individuals with visual impairments, learning disabilities, and motor impairments to access digital content, navigate user interfaces, and engage with technology more independently. By providing auditory feedback and alternative modes of interaction, TTS enables inclusivity and equal access to information and communication tools for all users.

Ongoing developments and advancements in TTS for Linux signify a commitment to improving speech synthesis capabilities and expanding the range of applications and use cases. From advancements in voice quality and naturalness to innovations in multilingual support and domain-specific applications, TTS technology continues to evolve to meet the diverse needs and preferences of users across different contexts and environments.


What is text to speech (TTS) software for Linux?

Text to speech (TTS) for Linux refers to software applications or libraries designed to convert written text into spoken words on the Linux operating system. These tools enable users to listen to text-based content such as documents, web pages, or e-books instead of reading them, providing accessibility options for individuals with visual impairments and enhancing user experiences in various applications.

How does TTS software work on Linux platforms?

TTS software on Linux processes textual input using algorithms that analyze linguistic elements. It then generates corresponding speech signals, which are played through audio devices so that users can hear the synthesized speech. Users can install whichever TTS tool suits them from their Linux distribution's repositories.

How to convert text to speech in Linux?

To convert text to speech in Linux, users can install and configure TTS software, then use command-line tools or integrate TTS functionality into applications to generate speech output from text input. The speech synthesizer speaks the text in English or another supported language, as selected by the user; it does not translate the text.

Can I customize the voice in TTS on Linux?

Yes, many TTS software options for Linux offer voice customization features. Users can often modify parameters such as pitch, speed, and intonation to tailor the voice to their preferences. 

Which Linux distributions support TTS software?

Most mainstream Linux distributions support TTS software, including Ubuntu, Debian, Fedora, CentOS, and Arch Linux, among others. Users can install TTS software packages from their distribution’s package repositories.

What file formats does TTS software on Linux support?

TTS software on Linux typically accepts plain text files (.txt) as input, and some engines also handle markup such as HTML or XML-based SSML. Many tools can also save the synthesized speech as an audio file instead of playing it.

Can TTS software handle multiple languages on Linux?

Yes, many TTS software options for Linux support multiple languages and offer a diverse selection of voices in various languages and dialects. Some platforms also pair TTS with real-time translation.

Is there support for real-time TTS on Linux?

Yes, some TTS software options for Linux offer real-time synthesis capabilities, enabling immediate conversion of text input into speech output with minimal latency. It depends on the text to speech software. 

What are the accessibility features of TTS on Linux?

TTS on Linux enhances accessibility by providing auditory feedback, enabling users with visual impairments or reading difficulties to access digital content, navigate interfaces, and interact with applications effectively. It also powers features such as screen readers and voice-driven interfaces, further improving accessibility for users with disabilities. Many tools can also save speech as audio files, such as WAV or MP3, in the supported languages through command line options.


Top 15 Open Source Speech Recognition/TTS/STT Systems

Published on: July 30, 2024 Last updated on: August 1, 2024

A speech-to-text (STT) system, sometimes called automatic speech recognition (ASR), does what its name implies: it transforms spoken words into textual data that can be used later for any purpose.

A text-to-speech (TTS) system, conversely, generates audio from textual data and files. You give it the text, and it generates the corresponding speech audio.

Both technologies are extremely useful.

They can be used for a lot of applications such as the automation of transcription, writing articles using sound only or creating audiobooks, enabling a complicated analysis of information using the generated textual files… and a lot of other things.

In the past, proprietary software and libraries dominated speech-to-text and text-to-speech technologies. Open source speech recognition alternatives either didn’t exist or existed with severe limitations and no community around them.

This is changing: today there are a lot of open source speech tools and libraries that you can use right now.

They have grown even faster recently, thanks to the trend of AI and generative models.

Table of Contents:

  • What is a Speech Library?
  • What is an Open Source STT/TTS Library?
  • What Are the Benefits of Using Open Source STT/TTS Software?
  • 3. Flashlight ASR (formerly wav2letter++)
  • 4. PaddleSpeech (formerly DeepSpeech2)
  • 9. StyleTTS2
  • 10. Coqui TTS
  • 11. GPT-SoVITS
  • 12. VALL-E X
  • 13. Amphion
  • 14. EmotiVoice
  • What is the best open source speech recognition system?
  • Why did you not mention the DeepSpeech project by Mozilla?
  • Why did you remove OpenSeq2Seq from your list?
  • Some other speech models are not mentioned in your article
  • How about you compare the performance of these models?

What is a Speech Library?

A speech library is the software engine responsible for transforming voice to text or vice versa, and it is not meant to be used by end users.

Developers will first have to adopt these libraries and use them to create computer programs that can enable speech recognition for users.

Some of them come with pre-trained models that recognize speech in a given language and generate the corresponding text, while others ship only the engine without any trained model, so developers have to build the training models themselves.

This can be a complex task, as it requires a deep understanding of machine learning and data handling.

You can think of them as the underlying engines of speech recognition programs.

If you are an ordinary user looking for speech recognition or audio generation for text, then none of these will be suitable for you, as they are meant for development use only.

What is an open source STT/TTS library?

The difference between proprietary and open source speech recognition is that an open source speech library must be licensed under one of the known open source licenses, such as GPL, MIT and others.

Microsoft, NVIDIA and IBM, for example, offer their own speech recognition toolkits to developers, but these are not open source, simply because they are not released under an open source license.

Check the license of the speech-to-text library you are interested in: if it is an open source license as identified by the OSI, then it is an open source library.

What are the benefits of using open source STT/TTS software?

Mainly, you get few or no restrictions on the commercial use of your application, as open source speech libraries allow you to use them for whatever use case you need.

Also, most – if not all – open source speech toolkits on the market are free of charge, saving you a lot of money compared to proprietary ones.

So instead of using proprietary speech services and paying for each minute of voice you convert to text, or paying a recurring monthly subscription, you can use the open source alternatives without limits or anyone’s permission.

Top Open Source STT/TTS Systems

In this article, we’ll look at a number of these speech systems, their pros and cons, and when they can be used.

Some of these open source libraries can be used for STT, and some of them can only be used for TTS. Others can be used for both, and we will mention the capabilities of each one so that you can easily choose.

We made sure to select only working, still-maintained and useful software for this list. You can review our criteria for listicle articles on FOSS Post to understand the basis for our selections. Remember that we only cover open source software on FOSS Post that follows the OSI definition and uses an OSI-approved license. The ranking is random and does not reflect our rating of the software.

1. Kaldi

Kaldi is open source speech recognition (STT) software written in C++ and released under the Apache License 2.0.

It works on Windows, macOS and Linux. Its development started back in 2009.

Kaldi’s main advantage over some other speech recognition software is that it is extensible and modular: the community provides tons of third-party modules that you can use for your tasks.

Kaldi also supports deep neural networks and offers excellent documentation on its website. While the code is mainly written in C++, it is “wrapped” by Bash and Python scripts.

So if you are just looking for basic speech-to-text conversion, you’ll find it easy to accomplish via either Python or Bash. You may also wish to check out Kaldi Active Grammar, a pre-built Python engine with ready-to-use English models.

Learn more about Kaldi speech recognition from its official website.

2. Julius

Julius is probably one of the oldest speech recognition (STT) programs ever: its development started in 1991 at Kyoto University, and in 2005 it became an independent project.

Many open source applications use it as their engine (KDE Simon, for example).

Julius’ main features include real-time STT processing, low memory usage (less than 64 MB for 20,000 words), N-best/word-graph output, the ability to work as a server unit and a lot more.

This software was mainly built for academic and research purposes. It is written in C, and works on Linux, Windows, macOS and even Android (on smartphones).

Currently, it supports only English and Japanese.

The software is probably available in your Linux distribution’s repository; just search for the julius package in your package manager.

You can access Julius source code from GitHub.

3. Flashlight ASR (formerly wav2letter++)

If you are looking for something modern, this one is worth considering.

Flashlight ASR is open source speech recognition software released by Facebook’s AI Research team. It is written in C++ and released under the MIT license.

Facebook described it as “the fastest state-of-the-art speech recognition system available” as of 2018.

The tool is designed to be optimized for performance out of the box.

Facebook’s machine learning library Flashlight is the underlying core of Flashlight ASR. You must first train a model for your desired language before you can run the speech recognition process.

No pre-built support for any language (including English) is available. It’s just a machine-learning-driven tool to convert speech to text. So you will have to train and build your own models.

You can learn more about it from the following link.

4. PaddleSpeech (formerly DeepSpeech2)

Researchers at the Chinese giant Baidu are also working on their own speech recognition and text-to-speech toolkit, called PaddleSpeech.

The speech toolkit is built on the PaddlePaddle deep learning framework, and provides many features such as:

  • Speech-to-text (ASR) support.
  • Text-to-speech support.
  • State-of-the-art performance in audio transcription; it even won the NAACL 2022 Best Demo Award.
  • Many pre-trained models, mainly for English and Chinese.

You can train models with the engine for any language you desire.

PaddleSpeech’s source code is written in Python, so it should be easy to get familiar with if that’s the language you use.

5. Vosk

Vosk is one of the newer open source speech recognition systems; its development started in 2020.

Unlike other systems on this list, Vosk is ready to use right after installation, as it supports more than 20 languages (English, German, French, Turkish…) with portable pre-trained models already available.

Vosk offers small models (around 100 MB in size) that are suitable for general tasks and lightweight devices, and larger models (up to 1.5 GB in size) for better performance and results.

It also works on Raspberry Pi, iOS and Android devices, and provides a streaming API that allows you to connect to it to do your speech recognition tasks online.

Vosk has bindings for Java, Python, JavaScript, C# and NodeJS.
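As a rough illustration of the Python binding, the sketch below transcribes a mono 16-bit PCM WAV file with Vosk’s Model and KaldiRecognizer classes. It assumes pip install vosk and that Vosk can fetch a small model by language code with Model(lang="en-us"); you can pass a local model directory to Model() instead. The helper names transcribe_wav() and collect_text() are hypothetical.

```python
import json
import wave


def collect_text(results: list[str]) -> str:
    """Join the "text" fields of Vosk's JSON result strings into one transcript."""
    parts = (json.loads(r).get("text", "") for r in results)
    return " ".join(p for p in parts if p)


def transcribe_wav(path: str, lang: str = "en-us") -> str:
    """Transcribe a mono 16-bit PCM WAV file with Vosk (requires `pip install vosk`)."""
    # Imported lazily so collect_text() can be used without Vosk installed.
    from vosk import KaldiRecognizer, Model

    model = Model(lang=lang)  # downloads and caches a small model for this language
    results = []
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        rec = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            if rec.AcceptWaveform(data):  # True once a complete utterance is recognized
                results.append(rec.Result())
        results.append(rec.FinalResult())  # flush whatever is still buffered
    return collect_text(results)
```

Usage would be something like print(transcribe_wav("speech.wav")), where speech.wav is a placeholder file name. Reading the audio in small chunks mirrors how the streaming API is meant to be fed.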

Learn more about Vosk from its official website.

6. Athena

Athena is an end-to-end speech recognition (ASR) engine.

It is written in Python on top of TensorFlow and licensed under the Apache 2.0 license. It supports unsupervised pre-training and multi-GPU training on one or multiple machines.

It has large models available for both English and Chinese.

Visit the Athena source code.

7. ESPnet

Written in Python on top of PyTorch, ESPnet can be used for both speech recognition (ASR/STT) and text-to-speech (TTS) tasks.

It follows the Kaldi style for data processing, so migrating from Kaldi to ESPnet is relatively easy.

The main selling point of ESPnet is the state-of-the-art performance it achieves in many benchmarks, and its support for other language processing tasks such as machine translation (MT) and speech translation (ST).

The library is licensed under the Apache 2.0 license.

You can access ESPnet from the following link.

8. Whisper

Whisper is one of the newest speech recognition toolkits on this list.

It was developed by OpenAI, the company behind ChatGPT.

The main selling point of Whisper is that it was not trained on a narrow dataset for a few specific languages; instead, it was trained on a large and diverse multilingual dataset, so it can be used with many languages.

It was trained on 680,000 hours of audio, about one-third of which is non-English.

It supports multilingual speech recognition, speech translation into English and language identification; it does not do text-to-speech. OpenAI claims that Whisper makes 50% fewer errors than other models when evaluated across many diverse datasets.
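For a feel of the Python API, here is a minimal sketch using the openai-whisper package (assumed installed with pip install openai-whisper, with ffmpeg available on the system). whisper.load_model() and model.transcribe() are the package’s documented entry points; the run_whisper() wrapper and its task check are illustrative additions, not part of Whisper.

```python
# Whisper's two decoding tasks; neither of them produces audio.
VALID_TASKS = {"transcribe", "translate"}


def run_whisper(audio_path: str, model_name: str = "base", task: str = "transcribe") -> str:
    """Return the text Whisper produces for an audio file.

    task="transcribe" keeps the spoken language; task="translate" outputs English.
    """
    if task not in VALID_TASKS:
        raise ValueError(f"unsupported task {task!r}; expected one of {sorted(VALID_TASKS)}")
    import whisper  # lazy import: the package and its model weights are large

    model = whisper.load_model(model_name)  # e.g. "tiny", "base", "small", "medium"
    result = model.transcribe(audio_path, task=task)
    return result["text"].strip()
```

For example, run_whisper("interview.mp3", model_name="small", task="translate") would return an English translation of a (hypothetical) non-English recording. Larger models are more accurate but slower and need more memory.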

Learn more about Whisper from its official website.

9. StyleTTS2

StyleTTS2 is also one of the newest libraries on this list; it was released in mid-November 2023.

It combines style diffusion and adversarial training with large speech language models (SLMs) to achieve better results than the previous generation of models.

The authors published the model along with a research paper, in which they make the following claim about their work:

This work achieves the first human-level TTS synthesis on both single and multispeaker datasets, showcasing the potential of style diffusion and adversarial training with large SLMs.

It is written in Python, and has some Jupyter notebooks shipped with it to demonstrate how to use it. The model is licensed under the MIT license.

There is an online demo where you can see different benchmarks of the model: https://styletts2.github.io/

10. Coqui TTS

Coqui TTS is a deep learning toolkit designed for Text-to-Speech (TTS) generation, implemented primarily in Python. It is licensed under the MPL 2.0 license.

The software leverages several advanced libraries and frameworks such as PyTorch to facilitate high-performance model training and inference. Notably, Coqui TTS supports multiple architectures including Tacotron2, Glow-TTS, FastSpeech variants, and various vocoder models like MelGAN and WaveRNN.

This modular design allows users not only to utilize pre-trained models available in many languages but also offers tools for fine-tuning existing models or developing new ones tailored to specific needs.

The main features of Coqui TTS include efficient multi-speaker support that enables the synthesis of voices from different speakers using shared datasets while maintaining distinct vocal characteristics.

It also offers voice cloning through YourTTS integration and real-time streaming with low latency (<200 ms), making it suitable for both academic research and production environments that require scalable solutions.
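The snippet below is a minimal sketch of synthesizing a WAV file through Coqui TTS’s Python API (the TTS class from TTS.api and its tts_to_file() method), assuming pip install TTS. Model names follow the type/language/dataset/model pattern shown by tts --list_models; the LJSpeech Tacotron2 model and the model_id() helper are illustrative choices only.

```python
def model_id(language: str, dataset: str, architecture: str, kind: str = "tts_models") -> str:
    """Build a Coqui model name such as "tts_models/en/ljspeech/tacotron2-DDC"."""
    return "/".join([kind, language, dataset, architecture])


def synthesize(text: str, out_path: str = "output.wav",
               model_name: str = model_id("en", "ljspeech", "tacotron2-DDC")) -> str:
    """Synthesize `text` into a WAV file with a pre-trained Coqui model."""
    from TTS.api import TTS  # lazy import; the first call downloads the model weights

    tts = TTS(model_name=model_name)
    tts.tts_to_file(text=text, file_path=out_path)
    return out_path
```

Switching to another pre-trained voice is then only a matter of passing a different model name; multi-speaker and voice-cloning models take extra arguments (such as a speaker or reference audio) that are not shown here.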

https://github.com/coqui-ai/TTS

11. GPT-SoVITS

GPT-SoVITS is a tool designed for few-shot voice conversion and text-to-speech (TTS), developed primarily in Python and licensed under the MIT license.

One of its main features is that only one minute of voice samples is needed to fine-tune a model effectively. It also supports zero-shot synthesis from a five-second audio sample, as well as cross-lingual support for languages such as English, Japanese and Chinese.

In other words, training a model on your own voice is much easier with this library than with most others.

Additionally, it provides functionalities for enhanced emotional control over generated speech and allows customization through various pre-trained models available.

https://github.com/RVC-Boss/GPT-SoVITS

12. VALL-E X

VALL-E X is an open-source implementation of Microsoft’s VALL-E X zero-shot text-to-speech (TTS) model, primarily developed in Python and licensed under the MIT license.

The library allows cloning voices with just a short audio sample while maintaining high-quality speech synthesis across multiple languages including English, Chinese, and Japanese.

It also has advanced functionalities like emotion control during speech generation and accent manipulation when synthesizing different language prompts.

Users can also experiment with voice cloning by providing minimal recordings alongside transcripts or allowing the system’s integrated Whisper model to generate transcriptions automatically from input audio files.

VALL-E X’s performance is very close to the state of the art in its category.

https://github.com/Plachtaa/VALL-E-X

13. Amphion

Amphion is an open-source toolkit designed for audio, music, and speech generation.

It is licensed under the MIT license and primarily developed in Python, with supporting components in Jupyter notebooks and shell scripts.

For text-to-speech (TTS) tasks, it implements various model architectures such as FastSpeech2, VITS, VALL-E and NaturalSpeech2.

One of Amphion’s standout features is its visualizations, which help users understand what the models are doing during TTS and audio generation tasks; this makes it very good software for educational and academic purposes.

Additionally, it comes with a large dataset called “Emilia”, which contains more than 100,000 hours of speech recordings in 6 languages, including English, that can be used for training models.

https://github.com/open-mmlab/Amphion

14. EmotiVoice

EmotiVoice is an open-source text-to-speech (TTS) engine primarily developed in Python, utilizing libraries such as PyTorch and various audio processing tools. It is licensed under the Apache 2.0 license.

It supports both English and Chinese and offers more than 2,000 unique voices to choose from.

The software supports emotional synthesis, allowing the generation of speech that conveys a wide range of emotions like happiness, sadness, anger, and excitement. This functionality enhances user engagement by providing more expressive voice outputs compared to traditional TTS systems.

Unlike most software on our list, it also includes a user-friendly web interface for running and managing the model.

It also ships with scripting capabilities suitable for batch-processing tasks.

https://github.com/netease-youdao/EmotiVoice

15. Piper

Piper is a fast, local neural text-to-speech (TTS) system designed for embedded devices such as Raspberry Pi.

The software is primarily written in C++ and licensed under the MIT license, but it can also be installed as a Python library with pip.

Piper supports various voice models trained with VITS technology, enabling it to produce high-quality speech synthesis across multiple languages, including English, Spanish, French and German, among others.

You can listen to its sample demos in all supported languages from the following URL: https://rhasspy.github.io/piper-samples/

One of Piper’s standout features is its ability to stream audio output in real time while synthesizing speech from the input text. Additionally, when using multi-speaker models, you can select a different speaker at runtime.

Piper is well suited for a home assistant running on a Raspberry Pi, and this should be considered its main advantage; other models in this article may not be able to run on such devices.
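Piper’s command-line tool reads the text to speak on standard input and takes flags such as --model (the voice file), --output_file, --output-raw for streaming raw audio to stdout, and --speaker for multi-speaker voices. The Python sketch below only assembles that command and runs it with subprocess; it assumes the piper executable is on your PATH, and en_US-lessac-medium.onnx is just an example voice file name. The piper_command() and speak_to_file() helpers are hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def piper_command(model: str, output_file: Optional[str] = None,
                  speaker: Optional[int] = None, raw: bool = False) -> list[str]:
    """Build a Piper command line; the text itself is sent on stdin."""
    cmd = ["piper", "--model", model]
    if speaker is not None:
        cmd += ["--speaker", str(speaker)]  # only meaningful for multi-speaker voices
    if raw:
        cmd.append("--output-raw")  # stream raw 16-bit PCM to stdout while synthesizing
    elif output_file:
        cmd += ["--output_file", output_file]
    return cmd


def speak_to_file(text: str, model: str, output_file: str,
                  speaker: Optional[int] = None) -> None:
    """Synthesize `text` into a WAV file using the Piper executable."""
    if shutil.which("piper") is None:
        raise FileNotFoundError("piper executable not found on PATH")
    subprocess.run(piper_command(model, output_file, speaker),
                   input=text, text=True, check=True)
```

With raw=True you would pipe Piper’s stdout into an audio player (for example aplay) instead of writing a file, which is how the real-time streaming mode is typically used.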

https://github.com/rhasspy/piper

What is the best open source speech recognition system?

If you are building a small application that you want to be portable everywhere, then Vosk or Piper are your best options, as they are compatible with Python, support a lot of languages and can work on low-resource devices such as the Raspberry Pi.

Vosk also provides both large and small models to fit your needs.

If, however, you want to train and build your own models for much more complex tasks, then PaddleSpeech, Whisper, GPT-SoVITS, EmotiVoice or VALL-E X should be more than enough for your needs, as they are the most modern state-of-the-art toolkits.

Julius and Kaldi have traditionally been widely cited in the academic literature, but they are “boring” and lack the conveniences and new features of the other libraries.

So pick the one that best fits your own needs and requirements.

Frequently Asked Questions (FAQs)

Here are some frequent questions that we get asked about this article along with their answers:

Why did you not mention the DeepSpeech project by Mozilla?

DeepSpeech by Mozilla was abandoned many years ago and it is no longer under active development.

We recommend using other open-source models on this page that are still maintained.

Why did you remove OpenSeq2Seq from your list?

Just like DeepSpeech by Mozilla, OpenSeq2Seq from NVIDIA is no longer under active development and was abandoned many years ago.

Try using other models in our list.

Some other speech models are not mentioned in your article.

Please review the listicle criteria mentioned earlier to understand why we made our choices. We may have missed a few, but all of those mentioned are indeed among the top ones on the market at the time of writing.

You are always welcome to leave us a comment about an addition that you think should be made to this article.

How about you compare the performance of these models?

That would make a nice research paper project or PhD thesis.

However, this is only a short listicle to help you get started with speech recognition and text-to-speech, and it cannot carry the weight of such a project.

Setting up these models and trying them with real data may take a lot of time, and it’s up to you as a developer to choose the best one that fits your needs.

The speech recognition and TTS category is starting to become mainly driven by open source technologies, a situation that seemed to be very far-fetched a few years ago.

Today’s open source speech recognition software is modern and bleeding-edge, and you can use it for any purpose instead of depending on Microsoft’s or IBM’s toolkits.

If you have any other recommendations for this list, or comments in general, we’d love to hear them below.


With a B.Sc and M.Sc in Computer Science & Engineering, Hanny brings more than a decade of experience with Linux and open-source software. He has developed Linux distributions, desktop programs, web applications and much more, all of which attracted tens of thousands of users over many years. He also maintains other open-source related platforms to promote open source in his local communities.

Hanny is the founder of FOSS Post.


Originally published on July 30, 2024, Last Updated on August 1, 2024 by M.Hanny Sabbagh

LinuxGUI.com


High Quality Text to Speech Software – Best TTS for Linux

The best text-to-speech software for Linux provides high-quality, natural-sounding speech. Among TTS voices, unit selection voices should sound less robotic than HSMM-based ones, so try those first.

What is the best text to speech program in Linux?

If you are looking for a good TTS solution on Linux to help you proofread nearly everything you write (without one, it is easy to miss many mistakes), one good option is Cepstral.

Cepstral, a paid TTS program for Linux, can speak any text it is given with whatever voice you choose. Cepstral is constantly building new synthetic voices and says it can find or build the right one for any application.

At the time this article was written, Cepstral had reached version 6, which added these features:

  • Natural prosody and smart pronunciation
  • Enhanced audio (22 kHz)
  • New voices (Alejandra and Charlie), and 20% more source material for Allison
  • Superb OS integration

Languages supported by Cepstral:

US English, German, UK English, Americas Spanish, Canadian French, and Italian

How to Install Cepstral Text to Speech Program in Linux

To install Cepstral on Linux, follow these steps:

  • Download the installer file from the official Cepstral site.
  • Extract the file, e.g. tar -xvzf Cepstral_Allison_x86-64-linux_5.1.0.tar.gz
  • Change to the extracted directory, e.g. cd Cepstral_Allison_x86-64-linux_5.1.0
  • Run the install script with elevated privileges: sudo sh install.sh
  • Enter your activation key (Cepstral’s site explains how to obtain and activate a key).

Another Good Free TTS for Linux

This option is available if you use Google Chrome as your browser, because the text-to-speech functionality is provided by a Chrome extension. Here is what I did to get natural-sounding speech for PDF and text files for free (other solutions either don’t sound natural or are paid services):

  • Install the SpeakIt! extension in Chrome or Chromium.
  • Drag and drop your PDF or text file (*.txt) into the browser.
  • Highlight some text, right-click and select SpeakIt! from the context menu, or click the extension’s icon next to the address bar to listen to natural-sounding text-to-speech.

There are also ways to open other files, such as .doc files, in Chrome and do the same, and other Chrome extensions for viewing PDF files may suit you better. Besides, you can upload all kinds of texts to Google Drive and have SpeakIt! read them for you. If you want this extension to read a document, you may need to convert it into a plain text file with a .txt extension and drag and drop it into Google Chrome. You also need an internet connection.

Text to Speech for Linux: Unveiling Top Solutions for Voice Synthesis

Text-to-speech (TTS) technology on Linux allows users to convert written text into spoken words. This functionality is not only useful for the visually impaired but also benefits those who prefer auditory learning or require hands-free computing. Several TTS tools are available for Linux, each offering varying features to cater to diverse needs. Popular among them is eSpeak, a compact open-source program that provides a straightforward command-line interface for speech synthesis.

The landscape of text-to-speech for Linux encompasses a range of applications, from simple, lightweight programs to more complex systems with natural-sounding voices. Related open source projects such as CMUSphinx tackle the reverse task, speech recognition, using models trained on different languages. Accessibility and customization are focal points in the development of Linux TTS tools, as many of them are open source and can be modified to meet user-specific requirements.

While TTS technology continues to evolve, Linux users have access to a number of options for integrating speech into their computing experience. Implementations vary from simple command-line interfaces to more sophisticated GUI-based applications, ensuring there is a solution suitable for different skill levels and use cases. Through these applications, Linux upholds its commitment to inclusivity and adaptability in the realm of digital accessibility.

Linux Text to Speech Basics

In the realm of Linux computing, text to speech tools are essential for converting written text into audible speech. They are widely used for their accessibility benefits and in applications where spoken output is preferable, especially when they offer high-quality, natural-sounding voices.

Understanding Speech Synthesis

Speech synthesis, commonly referred to as text to speech, is the artificial production of human speech. The process begins with text analysis, during which the input text is converted into a linguistic structure. Then, during the synthesis phase, this structure is transformed into the audible waveform that we hear as speech. Each TTS system uses its own algorithms and technologies to accomplish this complex task and make the output sound as natural as possible. Even so, the default voice of many engines still leaves much to be desired, sounding as robotic and unnatural as older synthesized voices such as Microsoft Sam.
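To make the first phase more concrete, the toy sketch below implements a tiny piece of a text-analysis front end: it expands a few abbreviations and single digits and splits the input into word tokens that a synthesis back end would then turn into sound. The abbreviation and digit tables are hypothetical and deliberately small; real engines such as eSpeak or Festival use far richer rules and pronunciation dictionaries.

```python
import re

# Hypothetical, deliberately tiny normalization tables. Real engines resolve
# ambiguous cases such as "St." (street or saint) from the surrounding context.
ABBREVIATIONS = {"dr": "doctor", "mr": "mister", "st": "street"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def analyze(text: str) -> list[str]:
    """Toy text-analysis phase: turn raw text into a list of speakable words."""
    tokens = []
    for raw in re.findall(r"[A-Za-z]+|\d", text):  # letter runs or single digits; punctuation is dropped
        word = raw.lower()
        if word.isdigit():
            tokens.append(DIGITS[int(word)])  # multi-digit numbers are read digit by digit
        else:
            tokens.append(ABBREVIATIONS.get(word, word))
    return tokens
```

For example, analyze("Dr. Lee lives at 2 Elm St.") yields ["doctor", "lee", "lives", "at", "two", "elm", "street"]. The synthesis phase, which maps such tokens to phonemes and then to a waveform, is not shown.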

TTS Engines for Linux

Linux users have access to a variety of TTS engines. High-quality speech voices are crucial for different use cases, such as adding voice instructions to videos or seeking natural and comforting voices for reading text. eSpeak is a compact, open-source TTS engine known for its simplicity and support for multiple languages. It operates via command line and can be easily integrated with different applications. Another example is Festival, which offers a framework for building speech synthesis systems and is known for its versatility in producing custom voices. Some Text-to-speech tools offer additional features like:

  • Adjusting pitch and speed
  • Controlling word gaps

For those seeking more advanced commercial solutions, engines like Cepstral provide a more natural voice quality for professional applications. It’s important to select a TTS engine that balances functionality with system resource requirements, as some engines may be more resource-intensive than others.

Implementation and Usage

Adopting text-to-speech technology on Linux systems can be streamlined by understanding the appropriate tools and their implementation within applications. Users can also convert text to audio files for various purposes, such as creating podcasts or embedding audio. Users have access to various command line and GUI tools, ensuring versatility across different use cases.

Installing TTS Software

To get started, one must install Text-to-speech software. On many Linux distributions this involves package managers like apt for Ubuntu or pacman for Arch Linux. For instance, eSpeak, a compact and open-source TTS program, can be installed using the command sudo apt-get install espeak on Ubuntu-based distributions.

Command Line TTS Tools

From the command line, eSpeak can speak text from files or from standard input. It supports English among other languages and is invoked using commands like espeak "Your text goes here". Advanced usage includes adjusting the pitch and speed and saving the output to an audio file, with flags like -p for pitch (0 to 99), -s for speed (in words per minute) and -w for writing to a WAV file.
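If you drive eSpeak from a script, it helps to build the command line in one place. The Python sketch below assembles an espeak invocation with the -p, -s and -w flags mentioned above, plus -v (voice) and -g (extra word gap, in units of 10 ms at the default speed), and only runs it when the program is installed. The defaults of 50 for pitch and 175 words per minute match eSpeak’s documented defaults; the espeak_command() and say() helpers are hypothetical, so check espeak --help on your system for details.

```python
import shutil
import subprocess
from typing import Optional


def espeak_command(text: str, voice: str = "en", pitch: int = 50, speed: int = 175,
                   word_gap: int = 0, wav_file: Optional[str] = None,
                   program: str = "espeak") -> list[str]:
    """Build an eSpeak argument list; no shell quoting is needed with subprocess."""
    if not 0 <= pitch <= 99:
        raise ValueError("pitch must be between 0 and 99")
    cmd = [program, "-v", voice, "-p", str(pitch), "-s", str(speed)]
    if word_gap:
        cmd += ["-g", str(word_gap)]  # extra pause between words, 10 ms units at default speed
    if wav_file:
        cmd += ["-w", wav_file]  # write a WAV file instead of playing the audio
    cmd.append(text)
    return cmd


def say(text: str, **options) -> bool:
    """Run eSpeak if it is installed; return False when the program is missing."""
    cmd = espeak_command(text, **options)
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True
```

For instance, say("Hello from Python", program="espeak-ng", speed=150, wav_file="hello.wav") writes a slower-paced WAV file with eSpeak NG, which accepts the same options.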

For a deep learning approach to Text-to-speech, coqui-ai/TTS offers a toolkit suitable for both research and production environments. This toolkit often requires additional steps for installation, such as working with Python virtual environments and installing dependencies.

Text-to-speech in Applications

Integrating TTS into applications can enhance the accessibility and functionality of software. For example, gosling serves as a wrapper around Google's Cloud Text-to-Speech API, allowing for natural-sounding speech synthesis through simple terminal commands after installation and setup. It shows how modern TTS technology can be leveraged even within Linux terminal environments.

eSpeak NG – A Text To Speech Synthesizer For Linux

This guide explains what eSpeak NG is, how to install it in Linux and how to convert text to speech using eSpeak NG in Linux.

What is eSpeak NG?

eSpeak NG is a command line, multi-lingual software speech synthesizer for English and many other languages. We can convert text to speech using eSpeak NG in Linux and Unix-like systems. eSpeak NG is an updated version of the eSpeak engine created by Jonathan Duddington.

You can use eSpeak NG to listen to blogs and news sites, and to convert text files to speech for visually impaired people. eSpeak NG includes different voices, whose characteristics can be altered.

eSpeak NG is a cross-platform application that supports Android, Linux, Mac OS and Windows. It is a free, open source program written in the C programming language. The source code of the eSpeak NG project is hosted on GitHub.

How Does eSpeak NG Work?

eSpeak NG reads the given text aloud for you. It can speak text either from standard input or from a file, so you can pass the phrase to speak directly to eSpeak NG, or save the text in a file and pass that file as input. The speech is played through the default sound device.

You can also save the output to a WAV file instead of speaking it directly (convert it with another tool if you need MP3). The resulting file can be played in any media player, such as VLC or SMPlayer. eSpeak NG can also translate text into phoneme codes.

Supported languages

eSpeak NG does text to speech synthesis for 100+ languages and accents, including Afrikaans, Albanian, Aragonese, Armenian, Bulgarian, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Farsi, Finnish, French, Georgian, German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Kannada, Kurdish, Latvian, Lithuanian, Lojban, Macedonian, Malaysian, Malayalam, Mandarin, Nepalese, Norwegian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, Swedish, Tamil, Telugu, Turkish, Vietnamese, Welsh and more. Some languages are supported better than others.

Install eSpeak NG in Linux

eSpeak NG is packaged for popular Linux operating systems, so you can install it using your distribution’s default package manager.

To install eSpeak NG on Arch Linux, EndeavourOS and Manjaro Linux, run:

sudo pacman -S espeak-ng

On Debian, Ubuntu and derivatives such as Linux Mint and Pop OS, run:

sudo apt install espeak-ng

On Fedora, CentOS, AlmaLinux and Rocky Linux, run:

sudo dnf install espeak-ng

Convert text to speech using eSpeak NG

eSpeak NG is fully compatible with its predecessor eSpeak. In fact, eSpeak NG uses the same command line options as eSpeak, with several additional functionalities. Let us see a few examples.

1. Speak a phrase aloud using eSpeak NG:

espeak-ng "Your text goes here"

Alternatively, you can use the echo command to pipe the phrase to eSpeak NG as input:

echo "Your text goes here" | espeak-ng

eSpeak NG will read aloud the given string through the default sound device.

2. As stated earlier, eSpeak NG can also read aloud the contents of a file, using the -f option:

espeak-ng -f file.txt

3. Read text input from standard input instead of a file:

espeak-ng --stdin

Type the text to speak and hit the ENTER key. To exit, press CTRL+C.

4. If you want to save the output to a WAV audio file rather than speaking it directly, use the -w flag:

espeak-ng -w output.wav "Your text goes here"

5. eSpeak NG can also print the phonemes of a text.
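Once -w has produced a WAV file, Python’s standard wave module can confirm what was written. The wav_info() helper below is an illustrative sketch (not part of eSpeak NG) that reports the sample rate, channel count, sample width and duration of a PCM WAV file.

```python
import wave


def wav_info(path: str) -> dict:
    """Return basic properties of a PCM WAV file, e.g. one written with the -w flag."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        frames = wf.getnframes()
        return {
            "sample_rate": rate,                      # samples per second
            "channels": wf.getnchannels(),            # 1 means mono
            "sample_width_bytes": wf.getsampwidth(),  # 2 means 16-bit samples
            "duration_s": frames / rate,              # frames divided by frames per second
        }
```

Calling print(wav_info("output.wav")) after the command above is a quick way to check that the file is not empty before handing it to a media player or another tool.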

The following command speaks the word "ostechnix" and prints the phonemes that were spoken:

espeak-ng -x ostechnix

6. eSpeak NG supports several different voices. To list all voices supported by eSpeak NG, run:

You can also list all voices that speak a specific language, for example English (en), like below:
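The voice list is a plain-text table, so it is easy to consume from a script. The sketch below runs espeak-ng with the --voices option and parses each row into a small record. It assumes the column layout printed by recent eSpeak NG releases (Pty, Language, Age/Gender, VoiceName, File, Other Languages) with no spaces inside the first five fields; parse_voices(), list_voices() and the Voice record are hypothetical names, and the exact listing may differ on your system.

```python
import shutil
import subprocess
from typing import NamedTuple, Optional


class Voice(NamedTuple):
    priority: int
    language: str
    age_gender: str
    name: str
    file: str


def parse_voices(listing: str) -> list[Voice]:
    """Parse the table printed by `espeak-ng --voices`, skipping the header row."""
    voices = []
    for line in listing.splitlines()[1:]:
        fields = line.split(maxsplit=5)  # the last column ("Other Languages") may contain spaces
        if len(fields) < 5:
            continue
        voices.append(Voice(int(fields[0]), *fields[1:5]))
    return voices


def list_voices(language: Optional[str] = None) -> list[Voice]:
    """Return installed voices, optionally filtered by language (e.g. "en")."""
    if shutil.which("espeak-ng") is None:
        return []
    arg = f"--voices={language}" if language else "--voices"
    result = subprocess.run(["espeak-ng", arg], capture_output=True, text=True, check=True)
    return parse_voices(result.stdout)
```

The name field of each record is what you would pass to the -v option when picking a different voice.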

7. eSpeak NG will speak the given text using the default English voice. If you want to use a different voice, run:

8. For more details about eSpeak NG, refer to the man page:

Gespeaker - A GTK front-end to eSpeak

Gespeaker is a GTK+ text-to-speech front-end for eSpeak and MBROLA. It can read text aloud in many languages, and you can adjust settings such as voice, pitch, volume and speed.

To install Gespeaker in Debian, Ubuntu and its derivatives, run:

Once installed, launch Gespeaker from menu or application launcher. The default interface of Gespeaker will look like below:

Gespeaker interface

Using Gespeaker is easy: enter the text to speak and click the Play button. That's it!

You can choose the language and the voice (male or female) from the Base settings tab, and adjust the pitch, volume, speed and delay from the Advanced settings section.

  • eSpeak NG GitHub Repository
  • Gespeaker GitHub Repository


Senthilkumar Palani (aka SK) is the Founder and Editor in chief of OSTechNix. He is a Linux/Unix enthusiast and FOSS supporter. He lives in Tamilnadu, India.


10 Best Open-source Speech Recognition Tools for Linux

Mehedi Hasan

In modern times, speech is a popular and smart way to interact with electronic devices. As we know, many open source speech recognition tools are available on different platforms. Since the technology first appeared, the ability of machines to understand the human voice has steadily improved, which has drawn many more professionals to the field than before. The technology has now advanced far enough to be useful to ordinary people.

Open-source Speech Recognition Tools for Linux

Open source voice recognition tools are not as widely available on Linux as the everyday software we use. After long research, we found some well-featured applications, each with a short description. Let’s have a look!

1. Kaldi

Kaldi is speech recognition software that was started as part of a project at Johns Hopkins University. This toolkit has an extensible design and is written in C++. It provides a flexible, comfortable environment for its users, with many extensions to enhance Kaldi’s power.

kaldi-Open Source Speech Recognition

Noteworthy Features

  • A free and flexible open source voice recognition application under the Apache license.
  • Runs on multiple platforms, including GNU/Linux, BSD, and Microsoft Windows.
  • Provides support for installing and configuring the application on your system.
  • Besides speech recognition, it also supports deep neural networks and linear transforms.

2. CMUSphinx

CMUSphinx is a group of feature-rich systems with several pre-built packages related to speech recognition. It is an open-source program developed at Carnegie Mellon University. This speaker-independent recognition tool is available in several languages, including French, English, German, and Dutch.

cmusphinx- open source voice recognition

  • It is an easy-to-use and fast speech recognition system with a user-friendly interface. 
  • Has a flexible design and remains efficient even on low-resource platforms.
  • Provides acoustic model training tools through its Sphinxtrain package. 
  • Helps to perform different types of tasks through its helpful packages, including keyword spotting, pronunciation evaluation, alignment, and more. 
  • It is a cross-platform tool that supports both Windows and Linux systems.

3. DeepSpeech

DeepSpeech is a free, open source speech recognition engine from Mozilla that converts your speech to text. To run the DeepSpeech project on your device, you will need Python 3. It also needs Git Large File Storage (Git LFS), a Git extension used to version large files.

  • DeepSpeech uses the TensorFlow framework to make speech-to-text conversion easier.
  • It supports NVIDIA GPUs, which enable faster inference.
  • You can use DeepSpeech inference in three ways: the Python package, the Node.js package, or the command-line client.
  • Each time you want to run DeepSpeech on your system, you’ll need to activate its Python virtual environment.
  • This application needs a Linux or Mac environment to run.
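The virtual-environment requirement and the command-line client can be combined in a small wrapper. This is a hedged sketch, assuming DeepSpeech 0.9.3 (the project's last release) was installed with pip into a venv at $HOME/deepspeech-venv and that the matching deepspeech-0.9.3-models.pbmm and deepspeech-0.9.3-models.scorer files were downloaded from the project's GitHub releases; the paths, variable names and the function name transcribe are hypothetical. The project is no longer maintained, so prebuilt wheels may not exist for recent Python versions.

```shell
# Hypothetical wrapper around the DeepSpeech command-line client.
# Assumed one-time setup (not performed here):
#   python3 -m venv "$HOME/deepspeech-venv"
#   "$HOME/deepspeech-venv/bin/pip" install deepspeech
#   plus the 0.9.3 .pbmm and .scorer model files in the current directory.
DS_VENV="${DS_VENV:-$HOME/deepspeech-venv}"
DS_MODEL="${DS_MODEL:-deepspeech-0.9.3-models.pbmm}"
DS_SCORER="${DS_SCORER:-deepspeech-0.9.3-models.scorer}"

# transcribe AUDIO.wav : print the recognized text for one audio file.
transcribe() {
    audio="$1"
    if [ ! -r "$audio" ]; then
        echo "cannot read audio file: $audio" >&2
        return 1
    fi
    # Calling the venv's own deepspeech script has the same effect as activating
    # the venv for this one command; DEEPSPEECH overrides it (e.g. with a stub).
    cmd="${DEEPSPEECH:-}"
    if [ -z "$cmd" ]; then
        if [ -x "$DS_VENV/bin/deepspeech" ]; then
            cmd="$DS_VENV/bin/deepspeech"
        else
            cmd=deepspeech
        fi
    fi
    "$cmd" --model "$DS_MODEL" --scorer "$DS_SCORER" --audio "$audio"
}
```

Running the venv's script directly avoids sourcing bin/activate into your shell, which is a design choice of this sketch; activating the venv by hand and calling deepspeech works just as well.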

4. Wav2Letter++

Wav2Letter++ is a modern and popular speech recognition tool developed by the Facebook AI Research team. It is another open source program, released under the BSD license. This very fast voice recognition software is written in C++ and comes with many features. It provides language modeling, machine translation, speech synthesis, and more in a flexible environment.

  • It has an active community on platforms such as Facebook and Google Groups to assist its users worldwide.
  • Wav2Letter++ is a fast and flexible toolkit that uses the ArrayFire tensor library for maximum efficiency.
  • Its high-performance framework supports successful research and model tuning.
  • It provides complete documentation through its tutorial sections.
  • You will find detailed recipes for WSJ, TIMIT, and LibriSpeech in the recipes folder.

5. Julius

Julius is a comparatively old open source voice recognition program developed by Lee Akinobu. It is written in the C programming language by the developers of Kawahara Lab, Kyoto University. It is a high-performance speech recognition application with a large vocabulary, and it works with both English and Japanese. It can be a great choice for academic and research purposes.

julius

  • Julius is a highly configurable application that can set different search parameters to tune its performance. 
  • This tool is based on a 2-pass strategy, which provides you with real-time and high-quality performance. 
  • It is a cross-platform project that runs on Linux, BSD, Windows, and Android Systems. 
  • Integrated with Julian, a grammar-based recognition parser. 
  • Besides supporting rule-based grammar, it provides Word graph output, Confidence scoring, GMM-based input rejection, and many more facilities.

6. Simon

Simon is modern, easy-to-use speech recognition software developed by Peter Grasch. It is another open source program under the GNU General Public License. You are free to use Simon on both Linux and Windows systems, and it can work with any language you want.

simon-Open Source Speech Recognition

  • Simon provides the facility to do various arithmetic operations using its voice-controlled calculator.
  • Compatible with Skype and other popular VoIP programs, making it easy to communicate with friends and relatives.
  • It lets users watch slide shows and videos, listen to music, and more with simple voice commands.
  • Also, it is an essential tool for reading newspapers and surfing the internet.

7. Mycroft

Mycroft is an easy-to-use open source voice assistant that converts voice to text. Written in Python, it is regarded as one of the most popular Linux speech recognition tools today. You can use it in a science project or an enterprise software application, or as a practical assistant that tells you the time, date, weather, and more.

  • Integrated with the most popular social media and professional platforms, including Facebook, GitHub, LinkedIn, and more.
  • You can run this application on different software and hardware platforms, from a desktop to a Raspberry Pi.
  • Besides being a smart voice assistant, it provides the facility of audio recording, machine learning, software library, and more. 
  • It lets users convert the natural language to machine-readable data through Adapt, an intent parser of Mycroft.

8. OpenMindSpeech

OpenMindSpeech is one of the essential Linux speech recognition tools; it aims to convert your speech to text for free. It is part of the Open Mind Initiative and is aimed especially at developers. Before getting its present name, the program was known as VoiceControl, SpeechInput, and FreeSpeech.

  • It uses the Overflow environment for voice recognition operations to keep complex applications flexible.
  • Open Mind Speech is mostly compatible with Linux and UNIX-based platforms.
  • Speech data can be collected over the internet from e-citizens, who contribute raw data.

9. SpeechControl

SpeechControl is a free speech recognition application that is suitable for any Ubuntu distro. It comes with a graphical user interface based on Qt. Though it is still in its early development stage, you can use it for your project.

  • Speech Control is an open source program under the General Public License (GPL). 
  • It aims to work as a virtual assistant that provides repetitive task guidance to execute the process smoothly. 
  • It is mostly suitable for Linux-based platforms.
  • Also, it provides easy-to-understand user documentation with project details.

10. Deepspeech.pytorch

Deepspeech.pytorch is another notable open source speech recognition application: an implementation of DeepSpeech2 for PyTorch. It contains a set of powerful networks based on the DeepSpeech2 architecture. With many helpful resources, it is one of the essential Linux speech recognition tools for research and project development.

  • Supports noise augmentation, which improves robustness by adding noise when audio is loaded.
  • Provides a basic server script to which you can send POST requests.
  • Supports downloading several datasets, including TEDLIUM, AN4, Voxforge, and LibriSpeech.
  • Lets you add noise to the training data through noise injection.
  • Supports Visdom and TensorBoard for visualizing training.

Finishing Thoughts

So, we have reached the finishing point on open source speech recognition tools for Linux. I hope you got comprehensive information regarding this topic. The above-mentioned applications are free, easy to use, and ready to be a part of your academic or personal project.

Which one do you prefer most? If you have any other choices, then don’t hesitate to let us know. Please do share this article with your community if you find it helpful. Till then, have a nice time. Thanks!


I don't understand a lot of this GitHub stuff, I just need a deb

i just want to talk to my computer

I frequently make live videos (usually streamed on Instagram or Facebook) and I would like to know if there is software that can automatically transcribe what I say in these videos, like YouTube does automatically for subtitles. Can anyone help? Thanks

I’m searching for a simple speech recognition tool to create a variable that selects audio files to play for a blind person. This lady only wants to listen to a Bible version called The Message Bible. Unfortunately it isn’t available in a form that doesn’t require the user to respond to visual selections. I envision a simple command line script triggered by a variable created by her voice when she says something like “Go to the book of Psalms, chapter 23.” (Since Psalms is indexed by Psalm, they would be inside folders marked as chapters.)


Copyright © 2024. All Rights Reserved. Ubuntu is a registered trademark of Canonical Ltd .


How to Convert Text to Speech on Linux

Text-to-speech (TTS) is the process of transforming written text into spoken words by means of computer technology. Just imagine a computer that reads a book to you: that is, quite literally, what TTS makes possible. In short, TTS gives a machine an electronic voice that can read aloud any text you provide to it.

Benefits of text-to-speech on Linux

  • Accessibility and choice: text-to-speech makes written content available by voice, and free, open-source TTS tools on Linux are a good choice compared to proprietary software.

Common use cases for text-to-speech applications

  • Accessibility Tools: TTS powers screen readers, which are used, among others, by people who are blind or visually impaired.
  • Audiobooks and Podcasts: Turn text articles into audio files and publish informative or entertaining audiobooks and podcasts.
  • Language Learning: TTS helps learners practice pronunciation and listening skills.
  • Content Creation: Streamers, YouTubers and other creators use TTS for voice-overs and for narration in their videos.
  • Customer Service: Automated phone systems and chatbots use TTS to give spoken responses to customer queries.

Available TTS engines and tools

1. Installing eSpeak

eSpeak is straightforward to install and use:

  • Open your terminal.
  • Update your package list:
  • Install eSpeak:

eSpeak was already installed successfully on my system.

2. Converting Text to Speech

  • Open a terminal and type:

That works well: when I press Enter, the computer speaks the text I typed.

  • To read text from a file:

3. Installing Festival

Festival offers more natural-sounding voices and supports multiple languages:

  • Install Festival:

The first time you run the command, it installs Festival on your system; if you run it again, it reports that Festival is already installed.

Converting Text to Speech

When I enter that command, the system pronounces the text clearly.
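Festival's command-line usage for these steps typically follows the patterns in this sketch: festival --tts speaks text read from standard input or from a file, and the text2wave script that ships with Festival saves a text file as a WAV. The function names are hypothetical, and the FESTIVAL and TEXT2WAVE variables exist only so stubs can be substituted.

```shell
# Binaries default to the real Festival tools; override only to substitute stubs.
FESTIVAL="${FESTIVAL:-festival}"
TEXT2WAVE="${TEXT2WAVE:-text2wave}"

# Speak a phrase: festival --tts reads the text from standard input.
fest_say() {
    printf '%s\n' "$*" | "$FESTIVAL" --tts
}

# Speak the contents of a text file.
fest_say_file() {
    [ -r "$1" ] || { echo "cannot read: $1" >&2; return 1; }
    "$FESTIVAL" --tts "$1"
}

# Save a text file as a WAV file: text2wave INPUT -o OUTPUT
fest_to_wav() {
    [ -r "$1" ] || { echo "cannot read: $1" >&2; return 1; }
    "$TEXT2WAVE" "$1" -o "$2"
}
```

For example, fest_say "Hello from Festival" speaks immediately, while fest_to_wav notes.txt notes.wav produces an audio file you can keep.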

With eSpeak and Festival, you can add a voice to your Linux computer! This is a valuable tool for accessibility and a fun way to interact with your machine. Both engines are free and open-source, so why not give them a try?

Convert Text to Speech on Linux – FAQs

Which TTS engine is best for my needs?

eSpeak: Best for lightweight, simple applications. Festival: Good for more natural-sounding voices and language support.

Can I use TTS offline?

eSpeak and Festival can be used offline after installation.

Are there any costs associated with these TTS engines?

eSpeak and Festival are free and open-source.


MEDevel.com: Open-source for Healthcare, and Education

15 Open-source Text To Speech TTS Apps and Libraries

Hazem Abbas


What is text-to-speech?

Text-to-speech (TTS), or speech synthesis, is the artificial production of human-sounding speech from text: the system recognizes the words and generates spoken output.

The first general English text-to-speech system was introduced in 1968 by Noriko Umeda et al. at the Electrotechnical Laboratory in Japan.

In 1961, physicist John Larry Kelly Jr. and his colleague Louis Gerstman used an IBM 704 computer to synthesize speech, one of the most prominent events in the history of Bell Labs.


What are the benefits of TTS?

OpenTTS: Open Text to Speech Server

The primary beneficiaries of this technology are people with visual and reading impairments, who were also its first users.

Nowadays, many YouTube channels use this technology to reduce editing work and increase their output.

In many modern operating systems, text-to-speech is a built-in accessibility feature that assists people who cannot easily read on-screen text.

About this list

In this article we offer our collection of free, open-source text-to-speech (TTS) and speech synthesis apps. You can also find an updated list of open-source web-based TTS apps and services.

1- MARY TTS

MARY TTS is an open-source, multilingual text-to-speech synthesis system written in pure Java. It is available for Windows, Linux, and macOS.

MARY TTS is released under the LGPL-3.0 License.


2- Kaldi

Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. The source code is available on GitHub. Kaldi can run on Windows, Linux, and macOS. It can also run on Android and PowerPC, and with WebAssembly.

3- OpenTTS

OpenTTS is a free, open-source Open Text to Speech Server written in Python. It is released under the MIT License. It supports several languages, comes with an easy-to-use interface, and works with numerous alternative TTS libraries.

Supported languages: English (27), German (7), French (3), Spanish (2), Dutch (4), Russian (3), Swedish (1), Italian (2), Swahili (1), Finnish, Korean, Japanese, Chinese, and more.


4- eSpeak

eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows. It supports several languages and comes with dozens of useful features, which make it the ideal choice for many users.


Supported languages

Afrikaans, Albanian, Aragonese, Armenian, Bulgarian, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Farsi, Finnish, French, Georgian, German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Kannada, Kurdish, Latvian, Lithuanian, Lojban, Macedonian, Malaysian, Malayalam, Mandarin, Nepalese, Norwegian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, Swedish, Tamil, Turkish, Vietnamese, Welsh.

5- Text To Speech Converter

This open-source project lets you convert any text into speech easily by copying and pasting the text into its simple interface. It is written in C# and runs only on Windows for now.


6- ONLINE TTS

ONLINE TTS is a simple HTML/JavaScript project that turns your English text into speech. It features simple shortcuts and a clean user interface.


7- Flite

Flite is a small, fast run-time synthesis library suitable for embedded systems and servers. The core Flite library was developed by Alan W Black (mostly in his so-called spare time) while he was employed at the Language Technologies Institute at Carnegie Mellon University. Flite supports Windows, Linux, macOS, Android, FreeBSD, and several other systems.


8- Julius

Julius is an open-source large vocabulary continuous speech recognition engine.

It is high-performance, small-footprint large vocabulary continuous speech recognition (LVCSR) decoder software for speech-related researchers and developers. It is based on word N-grams and context-dependent HMMs.


9- Athena

Athena is an open-source implementation of a sequence-to-sequence based speech processing engine.

Athena features

  • Hybrid Attention/CTC based end-to-end ASR
  • Speech-Transformer
  • Unsupervised pre-training
  • Multi-GPU training on one machine or across multiple machines with Horovod
  • End-to-end Tacotron2 based TTS with support for multi-speaker and GST
  • Transformer based TTS and FastSpeech
  • WFST creation and WFST-based decoding
  • Deployment with TensorFlow C++


10- ESPnet: end-to-end speech processing toolkit

ESPnet is an end-to-end speech processing toolkit that mainly focuses on end-to-end speech recognition and end-to-end text-to-speech.

It is a developer-friendly application that can be integrated into web projects. Developers can also install it using Docker.


11- Voice Builder

Voice Builder is an open source text-to-speech (TTS) voice building tool that focuses on simplicity, flexibility, and collaboration. Our tool allows anyone with basic computer skills to run voice training experiments and listen to the resulting synthesized voice.

The Voice Builder project is written using JavaScript and released under the Apache-2.0 License.


12- Coqui TTS

Coqui TTS is a library for advanced text-to-speech generation. It is built on the latest research and was designed to achieve the best trade-off among ease of training, speed and quality.


13- Mozilla TTS

Mozilla TTS is a library for advanced text-to-speech generation. It is built on the latest research and was designed to achieve the best trade-off among ease of training, speed and quality.


14- Mycroft Mimic

Mycroft is an open-source voice assistant system. Mimic is the built-in TTS library created by Mycroft team.


15- Free TTS


If you know any other open-source TTS application, toolkit, or library that we didn't mention here, let us know.


Natural Sounding Text to Speech?

I am looking for some easy to install text to speech software for Ubuntu that sounds natural. I've installed Festival , Gespeaker , etc., but nothing sounds very natural. All very synthetic and hard to understand.

Any recommendations out there?

  • software-recommendation
  • text-to-speech

Jorge Castro

  • 1 Possible duplicate of How can I install and use text-to-speech software? –  Organic Addict Commented Dec 5, 2015 at 20:24

17 Answers

Svox pico2wave.

A very minimalistic TTS that, to my mind, sounds better than espeak or mbrola. Some information here.

I don't understand why pico2wave is so rarely discussed compared to espeak or mbrola. It's small, but sounds really good (natural). Without modification you'll hear a natural-sounding female voice.

AND ... unlike Mbrola, it recognizes units and speaks them the right way! For example:

  • 2°C → two degrees
  • 2m → two meters
  • 2kg → two kilograms

After installation I use it in a script:

Then run it with the desired text:

or read the contents of an entire file:

That's all to have a lightweight, stable working TTS on Ubuntu.
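A script along the lines this answer describes might look like the following sketch (it is not the author's script). It assumes pico2wave from the libttspico-utils package and aplay for playback: pico2wave only writes WAV files, so the function synthesizes into a temporary file, plays it, and removes it. The function name pico_say and the PICO_LANG variable are hypothetical; -l selects the language (for example en-US, en-GB or de-DE) and -w names the output file.

```shell
# Binaries default to the real tools; override only to substitute stubs.
PICO2WAVE="${PICO2WAVE:-pico2wave}"
PLAYER="${PLAYER:-aplay}"

# pico_say TEXT... : synthesize TEXT to a temporary WAV file and play it.
# The body runs in a subshell ( ... ) so the EXIT trap only cleans up this call.
pico_say() (
    dir=$(mktemp -d) || exit 1
    wav="$dir/speech.wav"
    trap 'rm -rf "$dir"' EXIT
    "$PICO2WAVE" -l "${PICO_LANG:-en-US}" -w "$wav" "$*" && "$PLAYER" "$wav"
)
```

For example, pico_say "It is 2°C outside" speaks a sentence, and pico_say "$(cat notes.txt)" reads a whole file, though very long files can hit the command-line length limit mentioned in the comments.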

jkoop

  • 2 As far as I can see, it only uses cli parameters as input. Is there any way I can get pico2wave to read text from a filename? –  Carlos Eugenio Thompson Pinzón Commented Feb 15, 2014 at 17:42
  • 17 pico2wave is in package libttspico-utils in recent versions of ubuntu. @CarlosEugenioThompsonPinzón cat <filename> | xargs -I foo -0 pico2wave -w blah.wav foo –  naught101 Commented Mar 11, 2014 at 9:11
  • 3 @CarlosEugenioThompsonPinzón pico2wave -w a.wav "$(<input.txt)" =). Agree that this CLI interface is bad design: unlike the huge majority of CLIs, and possible to reach the OS max CLI arg length. –  Ciro Santilli OurBigBook.com Commented Apr 13, 2014 at 9:44
  • 1 @Koen I don't know! :-) Like any other problem, try to produce a minimal example, e.g. using echo {1..1000} –  Ciro Santilli OurBigBook.com Commented Jun 22, 2015 at 9:48
  • 1 @user49557 We're not supposed to hijack others' questions, so maybe you can create a new question, explaining what exactly you installed, and what it is that went wrong, and then I can always try and help you (no guarantees, though, I'm not an expert :P) –  Koen Commented Jun 25, 2015 at 12:24

Pico and espeak are fun and easy to get to work, but they're not all that good. The default Festival voices are also not that good. However, Festival is a scheme-based speech framework, where a number of researchers have built much better plug-in voices. You can easily surpass the pico2wave quality on stock Ubuntu, because one of those voices is available as a ready-made package.

To make Festival sound natural, here's what to do:

You can do it from the command line by using -b (or --batch ) and putting each command into single quotes:

You can get other quite good voices from the Nitech repository, but installing them is finicky, and the default paths changed so the file name references in the bundled scheme files may need to be manually edited to work on stock Ubuntu.
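The Festival steps in this answer can be run non-interactively with -b (batch mode), one quoted Scheme expression per argument. A hedged sketch, assuming the festvox-us-slt-hts package is installed (it provides the voice_cmu_us_slt_arctic_hts voice used in the comments below); the function names are hypothetical. Because the text is embedded in a Scheme string, backslashes and double quotes are escaped first.

```shell
# FESTIVAL defaults to festival; override only to substitute a stub.
FESTIVAL="${FESTIVAL:-festival}"
FEST_VOICE="${FEST_VOICE:-voice_cmu_us_slt_arctic_hts}"

# Escape backslashes and double quotes so the text is a valid Scheme string.
scheme_quote() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# say_slt TEXT... : select the HTS voice, then speak TEXT with SayText.
say_slt() {
    text=$(scheme_quote "$*")
    "$FESTIVAL" -b "($FEST_VOICE)" "(SayText \"$text\")"
}
```

For example, say_slt 'He said "hello"' runs festival -b '(voice_cmu_us_slt_arctic_hts)' '(SayText "He said \"hello\"")'.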

Jon Watte

  • 4 Btw, in Ubuntu 16.04, this package seems to be missing. You can download and install the deb from Debian and it will work fine: packages.debian.org/sid/all/festvox-us-slt-hts/download sudo dpkg -i Downloads/festvox-us-slt-hts_0.2010.10.25-2_all.deb –  Jon Watte Commented Aug 20, 2017 at 2:48
  • 1 OP asked for natural sounding TTS. Festival is still quite robotic. –  Nav Commented Mar 27, 2022 at 12:55
  • 1 Small command to read content of clipboard from bash: echo "(SayText \"$(xclip -selection clipboard -o)\")" | festival '(voice_cmu_us_slt_arctic_hts)' --pipe –  Olle Härstedt Commented Nov 5, 2022 at 23:41
  • 1 @jjxtra The manual page is at linux.die.net/man/1/festival and documents the command. You can run in --server mode. Or you can use the festival command language to synthesize an utterance and save it to disk. See also cstr.ed.ac.uk/projects/festival/manual/festival_7.html –  Jon Watte Commented Feb 17, 2023 at 17:04
  • 1 An example on how to read a text file, and being able to pause it, would be nice, too. –  Olle Härstedt Commented Nov 3, 2023 at 9:10

I believe I've found the best free TTS software: a Google Chrome extension called "SpeakIt". For me on Ubuntu it only works in the Chrome browser; it doesn't work with Chromium for some reason. SpeakIt comes with two female voices, which both sound very realistic compared to everything else out there. At least four more male and female voices are listed as Chrome extensions if you search the Chrome Web Store for "TTS".

Usage: on a website, highlight the text you want read, then either right-click and choose "SpeakIt" or click the SpeakIt icon docked in the Chrome toolbar.

Firefox users also have two options. Within Firefox addons, do a search for TTS and you should find "Click Speak" and also "Text to Voice". The voices are not as good as the Chrome SpeakIt voices, but are definitely usable.

The SpeakIt extension uses iSpeech technology, and for $20 a year the site can convert text to MP3 audio files. You can input text, URLs, RSS feeds, and documents such as TXT, DOC, and PDF, and output to MP3. You can make podcasts, embed audio, etc. Here is a link, and a sample of their audio (don't know how long the link will last).

Pablo Bianchi's user avatar

  • 4 Unfortunately none of the browser options work for PDF files. Have you come across one that does? I'd like to be able to select paragraphs to read from a PDF (i.e. not have to paste bits to terminal or other) –  James Owers Commented May 7, 2016 at 18:05
  • 1 this extension works for me on chromium 50.0.2661.94 using Debian 8.4 and its great! i especially like the english female voice. my only complaint is that it pauses for too long on commas. –  mulllhausen Commented Jun 28, 2016 at 21:56
  • It often mispronounces words and also takes time to send the text to a separate server rather then just using your own system. –  Goddard Commented Mar 4, 2017 at 6:25
  • Link is broken. –  842Mono Commented Feb 28, 2021 at 1:33
  • output is terrible compared to voicerss - very mechanical –  Michael Commented Nov 12, 2021 at 17:10

Simple Google™ TTS

Update from the project page (2016): This project is currently unmaintained and will remain so for the foreseeable future.

Because of the lack of a better alternative, I wrote a bash script that interfaces with a Perl script by Michal Fapso to provide TTS via Google Translate. From the project description:

The intention is to provide an easy to use interface to text-to-speech output via Google's speech synthesis system. A fallback option using pico2wave automatically provides TTS synthesis in case no Internet connection is found. As it stands, the wrapper supports reading from standard input, plain text files and the X selection (highlighted text).

The main features are:

  • online TTS synthesis via Google translate
  • offline TTS synthesis via pico2wave
  • supports a variety of different languages
  • can read from CLI, text files and highlighted text
  • supports reading highlighted text with fixed formatting (e.g. PDF files)
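Reading highlighted text "with fixed formatting" means coping with the hard line break at the end of every printed line, which otherwise makes TTS engines pause mid-sentence. The snippet below is not the project's code; it is a minimal sketch, assuming GNU sed, of one way to rejoin such text before handing it to any TTS engine. unwrap_text is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rejoin hard-wrapped text (e.g. copied from a PDF).
# 1. sed: while a line ends in "-", append the next line and delete "-\n",
#    re-gluing words hyphenated across lines (GNU sed syntax).
# 2. paste: join the remaining lines with single spaces.
# 3. tr: squeeze runs of spaces left over from trailing blanks.
unwrap_text() {
  sed -e ':a' -e '/-$/{N;s/-\n//;ba' -e '}' | paste -sd ' ' - | tr -s ' '
}
```

For example, xclip -o | unwrap_text | espeak-ng reads the current selection. The trade-off of this simple rule is that a compound word legitimately hyphenated at a line end loses its hyphen.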

Installation and usage are documented on the project page.

I'd be glad if you gave it a try. Bug reports and any other feedback are welcome!

– Glutanimate

  • This has to be one of the coolest projects I've ever seen. Just wow. 😲 – user525989, Nov 30, 2016 at 21:25
  • This is no longer being maintained. – Goddard, Mar 4, 2017 at 18:52

A fast, local neural text-to-speech system. Check the project site for installation, downloading a voice, and usage.
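The description above matches Piper (github.com/rhasspy/piper), which also comes up further down this page; its README pipes text on stdin and selects the voice and output file with --model and --output_file. A minimal wrapper sketch, assuming the piper binary is on PATH; piper_to_wav, PIPER_MODEL and the default model path are hypothetical placeholders:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the Piper CLI: text on stdin, WAV file out.
# PIPER_MODEL and the default path are placeholders for wherever you saved
# the .onnx voice (Piper also expects the matching .onnx.json next to it).
piper_to_wav() {
  local out="${1:?usage: piper_to_wav OUTPUT.wav, text on stdin}"
  local model="${PIPER_MODEL:-$HOME/piper-voices/en_US-lessac-medium.onnx}"
  if ! command -v piper >/dev/null 2>&1; then
    echo "piper is not on PATH" >&2
    return 127
  fi
  piper --model "$model" --output_file "$out"
}
```

Usage: echo 'Hello from Piper.' | piper_to_wav hello.wav && aplay hello.wav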

gTTS, Google Text-to-Speech

gTTS is a Python library and CLI tool to interface with Google Translate's text-to-speech API. It writes spoken MP3 data to a file, to a file-like object (bytestring) for further audio manipulation, or to stdout.

Cons: CLI only. It needs to be online, as it sends requests to Google's public endpoint.

Documentation and more examples

Some were already mentioned:

Coqui.ai TTS. Installation:

Mimic. Installation:

Mimic 3. Installation of the plugin:

eSpeak + Gespeaker (GUI) (Gespeaker source code)

Cons: old and ugly

  • Google Translate, ImTranslator, Dictionary, TTS by Smart Link Corporation

Chromium/Brave/Chrome

  • Text to speech that brings productivity

tacotron and mimic2, based on the Google paper

  • I found piper to be the best. I use this script for a "speak selected text" feature: medium.com/@IanEdington/… – IanEdington, Mar 11 at 2:11

I have looked high and low for high-quality text to speech for Ubuntu. There is none. My vocal cords are paralyzed, so I needed TTS to add voice instructions to my Ubuntu videos. You can get commercial high-quality Linux text-to-speech software here. It's just really expensive. I ended up buying Natural Reader for Windows (it doesn't work in Ubuntu under Wine) for $40. Maybe later I will get the Linux one.

– Joe Steiger

  • Dude, there is! I was using it just last week; there are at least 5 or 6, and I can't for the life of me find any of them now. Gotta love our community. – mchid, Dec 21, 2015 at 10:53
  • TextAloud has instructions for making their product work under Wine; see nextup.com/forum/viewtopic.php?t=3349. I believe that Cepstral has a Linux port too. I have not been able to get my favorite software, Balabolka, to work. I have Windows 10 installed mostly for TTS processing. MS David is good and similar to Cepstral David; the former is free if you have Windows 10. – Bhikkhu Subhuti, Jun 19, 2016 at 11:34

I have been researching the best-sounding and most easily tuned text-to-speech voices. Below is a list of what I thought were the top 5 products, in order of sound quality. Most of the websites associated with these products have an interactive demo that lets you make your own determination.

  • AT&T Natural voices
  • CereProc Voices

– Jim

  • Are these available for Linux? I don't think so. – Mehdi Khademloo, Dec 3, 2016 at 0:36

Combine SVOX tools (pico) with LibreOffice:

SVOX (pico) tools are easy to install and bring good-quality voices to Ubuntu. Install it (on Ubuntu the package is libttspico-utils):

You can use LibreOffice in combination with the SVOX (pico) tools by installing the "Read Text" extension, which gives you a "GUI" for this excellent TTS software:

Set up the Read Text extension's options with Tools - Add-ons - Read selection.... Use /usr/bin/python as the external program. Select a command-line option that includes the token (PICO_READ_TEXT_PY); you may want to experiment with several of them.

Now you only have to select some text in LO Writer, Calc, Impress or Draw and click the icon added to the toolbar (a happy face with a balloon).
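pico2wave itself cannot play sound; it only writes a .wav file given with -w, in the language given with -l. A minimal bash sketch for using it straight from a terminal, assuming libttspico-utils and alsa-utils (for aplay) are installed. pico_lang, pico_say and the PICO_LANG variable are hypothetical names, and the six language codes are the ones the pico voice data ships with:

```shell
#!/usr/bin/env bash
# Hypothetical helpers around pico2wave (package libttspico-utils).

# Map a locale such as fr_FR.UTF-8 to a pico2wave language code,
# falling back to en-US for languages pico does not ship.
pico_lang() {
  local l="${1%%.*}"   # fr_FR.UTF-8 -> fr_FR
  l="${l/_/-}"         # fr_FR       -> fr-FR
  case $l in
    de-DE|en-GB|en-US|es-ES|fr-FR|it-IT) printf '%s\n' "$l" ;;
    *) printf '%s\n' en-US ;;
  esac
}

# Speak the arguments: render to a temporary .wav, play it with aplay,
# then clean up. The language follows $LANG unless PICO_LANG is set.
pico_say() {
  local dir rc
  dir=$(mktemp -d) || return 1
  pico2wave -l "${PICO_LANG:-$(pico_lang "${LANG:-en_US}")}" -w "$dir/speech.wav" "$*" &&
    aplay -q "$dir/speech.wav"
  rc=$?
  rm -rf "$dir"
  return "$rc"
}
```

Usage: pico_say 'Hello from pico', or PICO_LANG=de-DE pico_say 'Guten Tag'.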

– leoperbo

I find the Nitech HTS voices for festival very natural and comforting compared with any other voices I have heard. See this link on how to set up Nitech and other voices with festival. I have not found a good GUI I can use to configure those voices, but setting them via festival.scm still works. That post is very old, so you may want to find the actual installation directory using the "locate festival" command.

– razor

  • Seems to be very good. Found demos here: cstr.ed.ac.uk/projects/festival/onlinedemo.html – Iacchus, Aug 21, 2014 at 8:32
  • Yes, the Nitech voices are head and shoulders above other Festival voices (except the CMU voices, which are also very good). Too bad they're hard to install. There is one good CMU voice that has a default package in Ubuntu: it's called cmu_us_slt_arctic_hts and comes in the package festvox-us-slt-hts. It is much better than pico or espeak! – Jon Watte, Apr 25, 2017 at 19:23

Here is what I did to get natural-sounding speech for PDF and other text files (other solutions are either not natural or are paid services). This is actually a workaround using Chromium or Chrome, but it works quickly and easily.

  • Install the SpeakIt! extension in Chrome or Chromium.
  • Install PDF Viewer if you're using Chromium (Chrome already has a free PDF viewer) and check the 'Allow in incognito' and 'Allow access to file URLs' options in Chromium's extension settings.
  • Drag and drop your PDF into the browser.
  • Now highlight some text, right-click and select SpeakIt! to listen to natural text-to-speech.

There are also ways to open other files such as .doc and .txt in Chrome and do the same. Other Chrome extensions can view PDF files too; check whether one fits you better. You can also upload all kinds of text to Google Drive and use SpeakIt! to read it for you. Another extension called 'Speak text' works the same way and has natural speech.

– Pouya Sanooei

  • Could you elaborate on how to make SpeakIt read PDF files saved in Google Drive? – Marco Lackovic, Sep 24, 2014 at 15:12

When searching for a better TTS engine to use with the new Firefox 49 narration mode, I found pico TTS (SVOX), my favorite TTS engine.

How to change the default speech synthesis engine system-wide?

People at Arch Linux pointed me in the right direction:

Uncomment the module you like and make it the default in the speech-dispatcher settings:

Restart the daemon:

BUT, when starting Firefox again, nothing happens. According to the link above (Arch forum posts #10 and #16) it works with festival (I did not try that), but speech-dispatcher's pico module does not list any available voices. It won't run.
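One way to narrow this down is to check whether speech-dispatcher itself lists the module and its voices, independently of Firefox. spd-say can list the output modules (-O), select one (-o) and list the voices the selected module reports (-L). A minimal sketch; spd_check_module is a hypothetical name, it assumes -O prints one module name per line, and the module name to pass (for example pico-generic) depends on how it is named in your speechd.conf:

```shell
#!/usr/bin/env bash
# Hypothetical check: is a speech-dispatcher output module available,
# and which voices does it report? Assumes `spd-say -O` prints one module
# name per line (after a header line, which never matches a module name).
spd_check_module() {
  local module="${1:?usage: spd_check_module MODULE}"
  if ! spd-say -O 2>/dev/null | grep -qxF -- "$module"; then
    echo "speech-dispatcher does not list module '$module'" >&2
    return 1
  fi
  spd-say -o "$module" -L
}
```

Usage: spd_check_module pico-generic && spd-say -o pico-generic 'Hello'. An empty voice list here would match the symptom described above.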

Any idea out there would be highly appreciated ;-)

– apos

My favorite text-to-speech program is called Magic English, but like Natural Reader, mentioned by Joe Steiger, it is a Windows program and I'm not sure whether it will run under Wine.

AT&T Natural Voices is available online as a demo, but that's more of a workaround than a solution...

– SouthwindCG

For that I built Intelligent Speaker, an extension for Google Chrome. It can read pages even without a selection (when text detection is correct).

– Vitaly Zdanevich

  • Much better than SpeakIt! for me, thanks. – Vahid Pazirandeh, Feb 25, 2022 at 20:51

Pico, mbrola, cmu, festival and flite all SUCK in 2017 (they were amazing in the 90s). AT&T Natural Voices (which is fantastic) isn't Linux-compatible and it's not free, so we use Google.

– Jonathan

  • This is a duplicate of Glutanimate's answer (the author of that project). Also: "Status update: This project is currently unmaintained and will remain so for the foreseeable future." He suggests some alternatives. – Pablo Bianchi, Feb 21, 2019 at 17:59
  • This project is currently unmaintained and will remain so for the foreseeable future. This script and many others like it rely on an unofficial API that has recently become increasingly difficult to support. As Google continues to lock down access to their TTS interface I see no choice other than to suspend maintaining this script for the time being. – erwin, Jul 19, 2021 at 10:48
  • @PabloBianchi his answer did not have the install code in it. – Jonathan, Jul 26, 2021 at 22:10

Verbify-TTS

Yes! I ran into the exact same problem you are describing. I created a custom TTS that I have been using myself for almost two years now, and I open-sourced it a year ago. It works offline and for free, using a high-quality AI-based voice. You can use it everywhere: Firefox, PDF readers, Chrome, LibreOffice, etc. It supports both Ubuntu and Windows.

Feel free to have a look; I just created a video tutorial with installation steps and a demo: https://youtu.be/hb1ZVwUcPCU

Download link and project page: https://github.com/MattePalte/Verbify-TTS

Feel free to leave a comment or open an issue to discuss new ideas, problems or constructive criticism.

I hope it helps.

– Matteo Paltenghi

On Linux systems, you can dump the X selection (the text you have selected on screen with the mouse) to a text file, then read it with a TTS engine (currently I use gTTS, the Google Translate Python script):

Bind this script to a key, for example the right menu key, and every time you select some text in any program (Firefox, Thunderbird, LibreOffice Writer, a PDF reader, or even a terminal) you will hear the text.

PS: you can also add the --slow option to gtts-cli.
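The script itself did not survive here. A minimal bash sketch of the same idea, assuming xclip, gtts-cli (installed with the gTTS Python package) and mpv are available; gtts-cli reads its text from stdin when given -, accepts --lang and --slow, and writes MP3 to stdout when no --output is given. speak_selection and the GTTS_LANG/GTTS_SLOW variables are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical "speak selection" helper: read the X primary selection
# (the text highlighted with the mouse), synthesize it with Google TTS via
# gtts-cli, and play the MP3 stream with mpv. Needs an Internet connection.
speak_selection() {
  local text
  text=$(xclip -o -selection primary 2>/dev/null) || return 1
  [ -n "$text" ] || return 0   # nothing selected
  printf '%s' "$text" |
    gtts-cli --lang "${GTTS_LANG:-en}" ${GTTS_SLOW:+--slow} - |
    mpv --really-quiet -
}
```

Put a call to speak_selection in a small script and bind that script to a key as described above; set GTTS_SLOW=1 for the slower reading speed.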

– asashnov

Comparison table of offline free CLI software

I think what we need at this point is a big summary table:

Tool Sounds remotely natural Output to file Multilingual Tested on
(libttspico-utils 1.0+git20130326-14) y. Some weird distortions, but reasonable. y 24.04
idiap/coqui-ai-TTS 0.24.1 + Tacotron2 y. Output is randomly different each time. Most words are awesome. Punctuation timing is off. Sometimes it goes completely crazy and it is hilarious. 24.04
y. Not amazing, but OK. Slight voice distortion and punctuation off. n 24.04
(speech-dispatcher 0.12.0) n n 24.04
(gnustep-gui-runtime 0.30.0) n n n 24.04
1.48.15 n 24.04
2.5.0 n n 24.04
n 24.04
1.51 n 24.04
24.04
tortoise-tts 3.0.0 24.04

Empty cell means "unknown, untested".

My quick test strings are:

  • en : "Hello, my name is John Smith. What is your name?"
  • fr : "Bonjour, je m'appelle Jean Jacques. Tu t'appelles comment?"

"Remotely natural" is of course extremely subjective, and will suffer from the AI goalposts continually moving as things evolve and we get used to better systems. For now, I'd consider it something along the lines of "good enough for an informal video voiceover".

Previously mentioned at: https://askubuntu.com/a/1466489/52975

On Ubuntu 24.04 in a clean virtualenv running:

fails with:

ERROR: Cannot install piper-tts==1.1.0 and piper-tts==1.2.0 because these package versions have conflicting dependencies.

bug report: https://github.com/rhasspy/piper/issues/509

On Ubuntu 24.04:

idiap/coqui-ai-TTS

https://github.com/idiap/coqui-ai-TTS

The first time you call it, it installs the necessary model automatically.

Sound takes 5-10 s to start coming out on each invocation, which is unacceptable for frequent short sentences.

The default model seems to be Tacotron2: https://github.com/NVIDIA/tacotron2, but you can select other models from the CLI.
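Selecting a model explicitly avoids depending on whatever the default happens to be. A hedged sketch: --model_name, --text and --out_path are options of the upstream Coqui tts command, and tts_models/en/ljspeech/tacotron2-DDC is assumed here to be the Tacotron2 model used by default; confirm both with tts --help and tts --list_models on your install. tts_to_wav and TTS_MODEL are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the Coqui `tts` CLI (idiap/coqui-ai-TTS fork).
# The default model name is an assumption; list real ones with: tts --list_models
tts_to_wav() {
  local text="${1:?usage: tts_to_wav TEXT [OUT.wav]}"
  local out="${2:-tts-out.wav}"
  tts --model_name "${TTS_MODEL:-tts_models/en/ljspeech/tacotron2-DDC}" \
      --text "$text" --out_path "$out"
}
```

Usage: tts_to_wav 'Hello, my name is John Smith.' john.wav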

coqui-ai/TTS

Previously mentioned at: https://askubuntu.com/a/1447599/52975

It does not support Python 3.12 (Ubuntu 24.04): pip install TTS fails. Report: https://github.com/coqui-ai/TTS/issues/3257. A collaborator at https://github.com/coqui-ai/TTS/issues/3257#issuecomment-2096792618 says to use idiap/coqui-ai-TTS instead.

Based on the similarity of the READMEs, it seems to be a fork of https://github.com/mozilla/TTS

festival + festvox-us-slt-hts

Mentioned at: https://askubuntu.com/a/908889/52975. Tested on Ubuntu 24.04:

tortoise-tts

https://github.com/neonbjb/tortoise-tts

No easy CLI instructions:

  • https://speechbrain.github.io/
  • https://github.com/suno-ai/bark

Bibliography:

  • How to text-to-speech output using command-line?
  • https://www.reddit.com/r/MachineLearning/comments/12kjof5/d_what_is_the_best_open_source_text_to_speech/
  • https://www.reddit.com/r/software/comments/176asxr/best_open_source_texttospeech_available/
  • https://www.reddit.com/r/opensource/comments/19cguhx/i_am_looking_for_tts_software/
  • https://www.reddit.com/r/LocalLLaMA/comments/1dtzfte/best_tts_model_right_now_that_i_can_self_host/

– Ciro Santilli OurBigBook.com


The Linux Portal Site


Mimic 3 – neural Text to Speech (TTS) engine

Mimic 3 is a neural text to speech engine that can run locally, even on low-end hardware like the Raspberry Pi 4. The software speaks over 25 languages with over 100 pre-trained voices. Mimic 3 uses VITS, a “Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech”.

Mimic 3 is free and open source software.

Let’s take you through the installation steps first before demonstrating the software.

Installation

We tested the software on Ubuntu 22.10. We prefer installing software from source code, although there are packages available for Ubuntu/Debian.

We first install the python3.10-venv package. The venv module supports creating lightweight “virtual environments”, each with their own independent set of Python packages.

$ sudo apt install python3.10-venv

Next, clone the GitHub repository with the command:

$ git clone https://github.com/MycroftAI/mimic3

Change into the newly created mimic3 directory.

$ cd mimic3

Run the install.sh script:

$ ./install.sh

This script downloads and installs all the necessary Python dependencies in a virtual environment.

There’s also a pre-built Docker image available for Intel/AMD CPUs and 32/64-bit ARM. The software can also be installed with pip, a cross-platform package manager.
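After install.sh finishes, the mimic3 command lives in the virtual environment the script created. A hedged sketch of a first run: the .venv directory name, the --voice option, the en_UK/apope_low voice key and the writing of WAV audio to stdout are assumptions based on the Mimic 3 documentation, so check them with mimic3 --help and mimic3 --voices. mimic3_to_wav is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: synthesize text with Mimic 3 into a WAV file.
# Assumptions: mimic3 accepts --voice and writes WAV data to stdout
# when stdout is redirected to a file.
mimic3_to_wav() {
  local text="${1:?usage: mimic3_to_wav TEXT OUT.wav [VOICE]}"
  local out="${2:?usage: mimic3_to_wav TEXT OUT.wav [VOICE]}"
  local voice="${3:-en_UK/apope_low}"
  mimic3 --voice "$voice" "$text" > "$out"
}
```

Usage, from inside the mimic3 directory: source .venv/bin/activate, then mimic3_to_wav 'Hello world' hello.wav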


Ed*

I am impressed by this. Thank you


eSpeak: an open source text-to-speech program for Linux

eSpeak Speech Synthesizer is an open source speech synthesizer for Windows, Mac and Linux-based OSes. It lets you listen to text in multiple languages. The speech is clear, and besides English, text can easily be listened to in many other languages.

eSpeak does text to speech synthesis for the following languages, some better than others. Afrikaans, Albanian, Armenian, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Finnish, French, German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Italian, Kurdish, Latvian, Lojban, Macedonian, Mandarin, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, Swedish, Tamil, Turkish, Vietnamese, Welsh.

You can download espeak from the official download page.

How to use espeak

1. Choose your voice language

2. Speak the words specified on the command line

This is the default usage.

3. Speak your document

4. Generate a voice file from a text document

# espeak -f mydocument.txt -w myaudio.wav
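The commands for steps 1 to 3 did not survive on this page. A hedged sketch of all four steps using espeak's standard options (-v voice, -f text file, -w WAV output); espeak-ng accepts the same options, so the helpers prefer it when it is installed. espeak_bin, espeak_say and doc_to_wav are hypothetical names, and the voices and file names are only examples:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the four steps. They prefer espeak-ng (the
# maintained fork, same options) and fall back to the original espeak.
espeak_bin() {
  command -v espeak-ng 2>/dev/null || command -v espeak 2>/dev/null
}

# Steps 1-3: speak words, or a document with -f FILE, in a chosen voice.
espeak_say() {
  local bin voice="${1:?usage: espeak_say VOICE TEXT (or VOICE -f FILE)}"
  shift
  bin=$(espeak_bin) || { echo "espeak-ng/espeak not installed" >&2; return 127; }
  "$bin" -v "$voice" "$@"
}

# Step 4: render a text file to WAV (mydocument.txt -> mydocument.wav).
doc_to_wav() {
  local bin in="${1:?usage: doc_to_wav FILE.txt [VOICE]}"
  bin=$(espeak_bin) || { echo "espeak-ng/espeak not installed" >&2; return 127; }
  "$bin" -v "${2:-en}" -f "$in" -w "${in%.*}.wav"
}
```

Usage: espeak_say fr 'Bonjour' (steps 1 and 2), espeak_say en -f mydocument.txt (step 3), doc_to_wav mydocument.txt (step 4, writes mydocument.wav).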


eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows.  

eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.

eSpeak converts text to phonemes with pitch and length information.

I regularly use eSpeak to listen to blogs and news sites. I prefer the sound through a domestic stereo system rather than small computer speakers, which can sound rather harsh. The eSpeak speech synthesizer supports several languages; however, in many cases these are initial drafts and need more work to improve them. Assistance from native speakers is welcome for these, or for other new languages. Please contact me if you want to help.

eSpeak does text to speech synthesis for the following languages, some better than others.

Afrikaans, Albanian, Aragonese, Armenian, Bulgarian, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Farsi, Finnish, French, Georgian, German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Kannada, Kurdish, Latvian, Lithuanian, Lojban, Macedonian, Malaysian, Malayalam, Mandarin, Nepalese, Norwegian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, Swedish, Tamil, Turkish, Vietnamese, Welsh.

A GUI program is used to prepare and compile phoneme data. It is now available for download. Documentation is currently sparse, but if you want to use it to add or improve language support, let me know.

eSpeak was originally written for Acorn/RISC_OS computers starting in 1995. This version is an enhancement and re-write, including a relaxation of the original memory and processing power constraints, and with support for additional languages.


Best Text to Speech Software for Linux of 2024

Find and compare the best text to speech software for linux in 2024.


Use the comparison tool below to compare the top Text to Speech software for Linux on the market. You can filter results by user reviews, pricing, features, platform, region, support options, integrations, and more.

Top Pick

TextSpeech Pro

Digital Future

CreateAIvoiceovers

The Seaplace Group, LLC

NOLA AUTOMATION

Intelligent Speaker

Arria NLG Studio

Capti Voice

Acapela TTS

Acapela Group

Acapela Cloud

ReadSpeaker


Text to Speech synthesis software


There are, in principle, many free software alternatives for converting text to speech on Linux, but in practice there are just two, and they are rather poor compared to proprietary alternatives. They can be used to make the computer read text aloud in very artificial-sounding voices.

  • 1 The free software alternatives for converting Text to Speech on Linux
  • 2 HOWTO use Mimic
  • 3 HOWTO use espeak-ng
  • 3.1 Adding MBROLA voices: don't bother
  • 4 HOWTO use festival
  • 5 HOWTO use flite
  • 6 Proprietary alternatives

The free software alternatives for converting Text to Speech on Linux

Program     Version      Example voice(s)
espeak-ng   v1.50        default
festival    2.5.0        default
flite       1.3 (2005)   default
mimic       v1.3.0.1     ap, slt

The practically usable alternatives for converting text to speech using free software on GNU/Linux desktop and laptop machines are:

  • mimic from Mycroft, forked off an early version of the flite software, is the best choice if you are only interested in the English language.
  • festival is actively developed and it works fine, but it is not great and it does not sound as good as mimic. festival may be the better choice for non-English languages. Festival is developed at the University of Edinburgh in Britain. The project was dead for many years, which is why some GNU/Linux distributions still ship an ancient version from 2004 even though there have been several releases since the project was somewhat revived in 2017. If echo 'hello' | festival --tts results in a Segmentation fault (core dumped), it is likely because your distribution gave you an outdated version.

mimic and festival are not what you could call "natural-sounding". They do produce acceptable and, more importantly, understandable results, even though both sound very artificial.

There are several other alternatives, but they are not very good and, in most cases, not usable. Many web pages, notably older pages and pages made by people who just cut and pasted from older pages, will recommend the following programs:

  • The flite project is not dead; there was a release (2.5.1) in July 2020. You can acquire the source from github.com/festvox/flite and compile it yourself if you want to try a newer flite version. Why GNU/Linux distributions ship an ancient 2005 version is unclear.
  • There's a Java alternative called freetts which was last updated in 2009. Good luck getting that one working. We tried and gave up after wasting too much time on it.
  • espeak, last updated in 2014, is another widely recommended alternative that isn't usable on modern distributions. Its fork, espeak-ng, is actively developed and quite usable.
  • espeak-ng (espeak next generation) can be used, but it doesn't sound very good. All the distributions have a working version available in their repositories, so there's that.
  • There is also a GNU project for voice synthesis called gnuspeech. It was last updated in 2015. You can view the code at git.savannah.gnu.org: gnuspeech, and you may be able to get it to compile if you have a lot of patience and are willing to change the code so it compiles against modern libraries. Getting it to work is not easy and it isn't very good.

GNU/Linux systems have a layer between applications with text-to-speech features and the programs that provide these features, called speech-dispatcher. speech-dispatcher can be configured to use any of the above-mentioned programs.

HOWTO use Mimic

A video explaining the four essential freedoms software must have to qualify as free software, made in kdenlive using mimic -voice slt to create the audio.

mimic from Mycroft is available as a package called mimic on most GNU/Linux distributions. It is a pure command-line tool; there is no GUI. Using it is straightforward:

mimic -t "Hello world" makes it say "Hello world".

-f filename.txt makes it read a text file. Adding -o output.wav makes mimic write the voice output to a .wav formatted audio file.

This is what mimic -t 'Hello, this is a test of the emergency broadcasting system' -o mimic-test.wav ; oggenc mimic-test.wav sounds like:

The mimic package comes with several built-in voices. There is also support for voice files. One voice file comes pre-installed in /usr/share/mimic/voices. There are no additional voice files available on the mimic website at mimic.mycroft.ai/, but the GitHub page at https://github.com/MycroftAI/mimic1 has some flitevox files in a voices/ folder that are not included in the packages distributions ship.

The internal voices in mimic can be used by passing the -voice option. The available built-in voices can be listed with mimic -lv.

This will, when using mimic v1.3.0, output: Voices available: ap slt slt_hts kal awb kal16 rms awb_time

The slt and slt_hts voices are female voices. Here is a test of slt made using:

mimic -t 'Hello, this is a test of the emergency broadcasting system' -voice slt -o mimic-slt-test.wav

  • ap, awb, kal and rms are male voices. awb is probably British. kal is probably a drunk. rms does not sound anything like Richard Stallman.
  • slt and slt_hts are female voices.
  • awb_time and kal16 seem to be broken; using them does not produce any understandable output.

Run mimic --help to see all the available command-line options.
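To compare the built-in voices by ear, the list printed by mimic -lv can be fed back into mimic. A minimal sketch, assuming the single "Voices available:" line shown above (both stdout and stderr are read, to be safe) and skipping awb_time and kal16, which the list above describes as broken. list_mimic_voices and render_all_voices are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: turn `mimic -lv` output into one voice per line,
# then render a test sentence with every usable voice.
list_mimic_voices() {
  sed -n 's/^Voices available: *//p' | tr -s ' ' '\n' | sed '/^$/d' |
    grep -vx -e awb_time -e kal16   # reported above as broken
}

render_all_voices() {
  local text="${1:-Hello, this is a test of the emergency broadcasting system}"
  local voice
  mimic -lv 2>&1 | list_mimic_voices | while read -r voice; do
    # </dev/null keeps mimic from consuming the voice list on stdin
    mimic -t "$text" -voice "$voice" -o "mimic-$voice.wav" </dev/null
  done
}
```

Running render_all_voices in an empty directory leaves one mimic-VOICE.wav file per voice.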

HOWTO use espeak-ng

espeak-ng is a command-line tool which, like most command-line tools, accepts piped input. It will happily turn any piped input, whether it's a file you cat or text you echo, into spoken audio. Example:

echo 'Hello, this is a test of the emergency broadcasting system' | espeak-ng

This is what it sounds like (twice):

espeak-ng does have quite a lot of options for "enhancing" the audio. You can set things like speed, pause between words and amplitude, and there are several different voices available. So you can play around with it, but don't expect "professional" results no matter what you do.

The most interesting options to try with espeak-ng are espeak-ng --voices and espeak-ng --voices=mb, which list all the available voices for the default and the MBROLA voice synthesizers respectively. The list for --voices is long and looks like this:

(That's just 3 lines picked at random; espeak-ng outputs a much longer list.)

These voices can then be used with the -v option. So, to make it say something with the Norwegian voice, you could do:

echo 'Nei takk ikke fiskeboller' | espeak-ng -v gmq/nb

espeak-ng is developed at github.com/espeak-ng/espeak-ng/ .
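The speed, word-gap and amplitude settings mentioned above correspond to espeak-ng's -s (speed in words per minute, default 175), -g (extra pause between words, in units of 10 ms) and -a (amplitude, 0 to 200, default 100) options, while -w writes a WAV file instead of playing the audio. A small sketch; espeak_tuned_wav is a hypothetical name, and the values 5 and 150 are arbitrary starting points:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read stdin with espeak-ng using tuned settings and
# save the result as a WAV file.
#   $1 output file, $2 voice (default en), $3 speed in wpm (default 150)
espeak_tuned_wav() {
  local out="${1:?usage: espeak_tuned_wav OUT.wav [VOICE] [WPM]}"
  espeak-ng --stdin -v "${2:-en}" -s "${3:-150}" -g 5 -a 150 -w "$out"
}
```

Usage: echo 'Nei takk ikke fiskeboller' | espeak_tuned_wav fiskeboller.wav gmq/nb 140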

Adding MBROLA voices: don't bother

espeak-ng supports using MBROLA as a back-end. The list of MBROLA-supported voices can be generated with espeak-ng --voices=mb and it will look similar to the regular voices. However, using them will only work if you have the mbrola binary installed. It is non-free and not available in distributions. You can download and install it from http://tcts.fpms.ac.be/synthesis/mbrola.html if you want to. It is not worth the trouble. The voices available to it are different from espeak-ng's stock voices, but they are not better. If anything, they sound worse.

The espeak-ng manual page lists a lot more options. But as said, it won't sound great no matter what you do.

HOWTO use festival

festival will say whatever is piped to it if you have a working version and you add the --tts option, for example echo 'hello' | festival --tts

You can also pipe files to festival and have them read, for example cat myfile.txt | festival --tts

Many GNU/Linux distributions ship wildly outdated versions of festival. You may find that the version your distribution includes segfaults and exits when you try to use it. If that's the case, you can acquire the source code from github.com/festvox/festival and compile it yourself.
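festival --tts plays the audio directly. To save speech to a file instead, festival ships a text2wave script whose -eval option runs a Scheme expression, such as a voice selection, before synthesis. A minimal sketch, assuming text2wave is installed with festival and the cmu_us_slt_arctic_hts voice from the festvox-us-slt-hts package is present; festival_to_wav and FESTIVAL_VOICE are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: render a text file to WAV with festival's text2wave,
# selecting a voice first via -eval. Assumes festvox-us-slt-hts is installed.
festival_to_wav() {
  local in="${1:?usage: festival_to_wav FILE.txt [OUT.wav]}"
  local out="${2:-${in%.*}.wav}"
  text2wave -eval "(${FESTIVAL_VOICE:-voice_cmu_us_slt_arctic_hts})" "$in" -o "$out"
}
```

Usage: festival_to_wav notes.txt writes notes.wav.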

HOWTO use flite

All the GNU/Linux distributions ship flite 1.3 from 2005, for some reason we can't begin to imagine. There are several newer releases available; v2.5.1 was released in July 2020.

The text you want flite to say can be specified with -t.

flite 1.3 will not produce any audio, or anything else, if you tell it to say something with -t. It does support file output, and that works.

Giving it an output file with -o, for example -o flite-1.3-test.wav, will produce a flite-1.3-test.wav file you can play with aplay or mpv.
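Because flite 1.3 stays silent with -t but does write files, one workaround is to synthesize to a temporary WAV file and play that. A minimal sketch; flite_say is a hypothetical name, and aplay comes from alsa-utils:

```shell
#!/usr/bin/env bash
# Hypothetical workaround for flite builds that stay silent with -t alone:
# synthesize to a temporary WAV file, play it with aplay, then clean up.
flite_say() {
  local dir rc
  dir=$(mktemp -d) || return 1
  flite -t "$*" -o "$dir/say.wav" && aplay -q "$dir/say.wav"
  rc=$?
  rm -rf "$dir"
  return "$rc"
}
```

Usage: flite_say 'Hello, this is a test of the emergency broadcasting system'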

You will want to compile and install a recent version (source at github.com/festvox/flite ) if you want to use flite because the version Linux distributions ship is typically wildly outdated and outright horrible.

Proprietary alternatives

Amazon Polly is the best proprietary alternative if you want text-to-speech functionality in a non-free software project. It is a botnet text-to-speech cloud service operated by the very evil American Amazon corporation. Stallman would absolutely not approve. baby WOGUE uses it to make YouTube videos about free software. You can check that channel out to get an idea of how Amazon Polly sounds. It is better than mimic and espeak-ng for practical purposes and worth looking into if you think evil proprietary software tied to cloud services is acceptable when there is no superb free alternative. You could check out AWS: Getting Started with Amazon Polly if you are interested. Most of the Android "apps" for text-to-speech use the Amazon Polly API.

Read Aloud: A Text to Speech Voice Reader is a plug-in for the Mozilla Firefox web browser which lets you do text-to-speech in that web browser using server-side services. The "standard" voices available are all generated using Google services. A Google account is required to use some of the "premium" voices. There are also many other "premium" voices available that use other third party services. You need to buy a subscription in order to use those voices.

Natural Reader is a plug-in for the Chrome and Chromium web browsers which lets you do text-to-speech in those browsers using a server-side service.

Read Aloud and Natural Reader are both decent alternatives if you want something read aloud. The obvious downsides with those are that a) they are limited to in-browser text-to-speech only and b) they use proprietary cloud services to do the actual text to speech synthesis. Everything you ask them to read is sent to the cloud.


Quick summary: all four of these programs suck. The output quality is terrible.

This page was last edited on 2 March 2021, at 19:54.

Linux.org


Text-to-Speech Software

  • Thread starter Deleted member 161260
  • Start date Jul 31, 2023

Deleted member 161260

  • Jul 31, 2023

Dear sirs and ladies, please forgive me if this is the wrong forum. Does anyone have any knowledge of good offline text-to-speech software? Thank you, and good day to you, sirs and madams.

Condobloke

Well-Known Member

Festival...? https://www.linux.org/threads/troubleshooting-festival-progam-on-linux.46118/#post-200504  

APTI

  • Sep 12, 2023

kibasnowpaw


Active member.

  • Sep 13, 2023

My primary concern with Text-to-Speech (TTS) on Linux is the audio quality. In my opinion, it doesn't sound very natural. That's why I prefer online platforms like NaturalReaders and PlayHT, which produce audio that closely resembles human speech. However, if you're not particular about the sound quality and just want the text to be read aloud, then "Festival" could be a suitable choice for you.  

APTI said: Festival uses a default voice that sounds very human.


APTI said: Yes, Festival is great. It's not easy to configure different voices, but the default one is fine. I use it often.

This link is about Festival and some good details that help to know before you install it. https://www.linux.org/threads/troubleshooting-festival-progam-on-linux.46118/  

kibasnowpaw said: I haven't used it in a long while, but from what I recall, this is how it sounded, or at least something similar to it. This is one of the AI voices Play.ht uses, and it's why I use it a lot.
APTI said: Perhaps because I compare it to things like espeak, which sounds like Stephen Hawking, I have a better opinion of it. It is true that it sounds like a Vulcan speaking, but I find it to be good, non-emotional human speech. I use it on Fedora 34 through 38.

MikeWalsh

  • Sep 14, 2023

Super Moderator

MikeWalsh said: If anybody wants me to, I can bundle TextAloud! and the AT&T voices up into a tarball & make it available. @KGIII, would staff here be okay with that? I don't know what site policy is with regard to private cloud-hosting a/cs.

I think Mike Walsh completely missed the purpose here. He is recommending Windows software and using Wine, which are things many of us are against. Then recommending online, web-based solutions again, I feel, misses the point. Most of us looking for speech-to-text or text-to-speech are developing, and online solutions only work for a limited number of things and require you to be online. My opinion is that we are generally looking for a self-contained, NON-WINDOWS solution that works without needing to be connected to the internet. While the internet is nice, and I use it and take full advantage of it, it can go out, and it does every day. That's not something I would want to rely on as a developer.

  • Sep 16, 2023

Deleted member 108694

  • Sep 17, 2023

There is eSpeak as well (https://espeak.sourceforge.net/), but it has not been updated in quite some time.

  • Sep 18, 2023
MikeWalsh said: @APTI :- How? No specific mention was made that it MUST be Linux-only. Anyway, I wasn't "recommending" anything. I was merely detailing what I myself used.

Nah. See, to me, that's an archaic attitude I've never been able to comprehend. I don't pretend to be a "purist". It's an indisputable fact that for some stuff, Windows software just IS better. It's also undeniable that for many other things, Linux will knock spots off, and run rings around Windows.

I don't particularly care where I source my software. I run a small number of Windows apps, alongside a LOT of Linux stuff. In some cases it's because it's the best app for the job, OR it's because I got so used to using it under Windows. Sometimes, I've never been able to find a Linux equivalent that will do what I want in quite the same way; in most cases, I'm more than happy with the way the Linux equivalent does the job. Etc, etc.....

I switched to Linux when I did (in 2014) not because of any particular anti-Windows grievances, but because I was just fed up with it. I'd been using that platform from 1989 right through to 2014; that's a quarter of a century. I didn't have an outstandingly positive experience with Windows, but I wouldn't describe it as an especially negative one, either. It was simply the thing in the background that let me run my programs (I wasn't at all 'tech-savvy' in those days). After 25 years, I was more than ready for something different, so I decided to take a look at Linux..... .....where I've been ever since.

~~~~~~~~~~~~~~~~~~~~~~

It's also a fact that with 32GB of RAM and over 5TB of storage, I'm not short of the necessary resources. My set-up is NOT typical of the average Linux user, I'll grant you, but at the same time I was NOT "recommending" anything. The OP was asking about offline text-to-speech software.......so I mentioned what my set-up consisted of. That's all. Sorry if you disagreed with what I posted about.

Not my intention to "offend" anyone here, but.......the last time I looked, even WINE itself IS Linux software.

Mike.
  • Oct 3, 2023
Alexzee said: Very good video, thank you. Is there text-to-speech software for Linux that you can put in training mode?
  • Oct 5, 2023
kibasnowpaw said: Greetings fellow tech enthusiasts,

I've been meandering through the intricate alleys of Text-to-Speech (TTS) technology, particularly in the Linux environment. It's a fascinating yet, at times, exasperating expedition, given the current state of affairs. Let me unravel my findings and concerns in detail.

For those unacquainted, TTS technology translates on-screen text into spoken word. It's a godsend for individuals like me who find audio content more digestible, or those in need of assistive technologies.

My deep dive into this world began with the Windows environment, where I encountered the Heather22 US English Voice during the era of Text Aloud 2 or 3. A brief on Heather22: This voice model was renowned for its fluidity, realism, and the uncanny ability to mimic human intonation. It was a breakthrough that set a precedent for TTS quality, at least in my esteemed opinion.

Fast forward to my foray into Linux, and it appears the landscape isn't as lush. While some advocates are singing praises, my experience, to put it mildly, has been starkly contrasting. The voices I've encountered are somewhat robotic, lacking the nuanced human touch that Heather22 so effortlessly rendered.

My attempt to port Heather22 to Linux, utilizing Wine (a compatibility layer for running Windows applications on Linux), met with insurmountable technical barricades. It appears Wine is not yet sophisticated enough to emulate the intricate architecture and file dependencies required to operationalize Heather22 on Linux.

I've found solace, albeit temporary, in online TTS platforms like https://www.naturalreaders.com. However, the dependency on internet connectivity and the occasional latency issues make it a less than perfect solution.

So, what's the crux of the issue? The Linux TTS ecosystem, for all its merits, is yet to reach the zenith of voice quality and realism that's not just a luxury but a necessity for individuals reliant on auditory content. The disparity is not just audible but backed by tangible data, accentuating a need for accelerated advancements in this domain.

I'm not dismissing the efforts of Linux developers. But, in a world where auditory content is ascending the hierarchy of content consumption, the exigency for a refined, human-like TTS on Linux is not just desirable, but imperative. If you've navigated this terrain and discovered hidden gems or workarounds, your insights would be invaluable. The quest for auditory perfection continues, albeit with a mix of skepticism and anticipation.

I'm not entirely certain what you're referring to when you say 'put in training mode.' Could you please clarify?


Speech To Speech: an effort for an open-sourced and modular GPT4-o

huggingface/speech-to-speech


📖 Quick Index

  • Docker Server approach
  • Server/Client approach
  • Local approach
  • Model parameters
  • Generation parameters
  • Notable parameters

This repository implements a speech-to-speech cascaded pipeline with consecutive parts:

  • Voice Activity Detection (VAD): Silero VAD v5
  • Speech to Text (STT): Whisper checkpoints (including distilled versions)
  • Language Model (LM): Any instruct model available on the Hugging Face Hub! 🤗
  • Text to Speech (TTS): Parler-TTS 🤗

The pipeline aims to provide a fully open and modular approach, leveraging models available on the Transformers library via the Hugging Face hub. The level of modularity intended for each part is as follows:

  • VAD: Uses the implementation from Silero's repo.
  • STT: Uses Whisper models exclusively; however, any Whisper checkpoint can be used, enabling options like Distil-Whisper and French Distil-Whisper.
  • LM: This part is fully modular and can be changed by simply modifying the Hugging Face hub model ID. Users need to select an instruct model since the usage here involves interacting with it.
  • TTS: The mini architecture of Parler-TTS is standard, but different checkpoints, including fine-tuned multilingual checkpoints, can be used.

The code is designed to facilitate easy modification. Each component is implemented as a class and can be re-implemented to match specific needs.
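
To illustrate that design, here is a minimal Python sketch of a cascade built from swappable stage classes. It is not the repository's code: Stage, Pipeline and the four stub stages are hypothetical names, and the stubs merely stand in for Silero VAD, Whisper, an instruct LM and Parler-TTS so the example runs without any models. Stages are chained as generators, so a stage can emit several outputs per input, much as a streaming TTS emits audio in chunks.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, Iterator, Sequence


class Stage(ABC):
    """One swappable part of the cascade (hypothetical interface)."""

    @abstractmethod
    def process(self, item: Any) -> Iterator[Any]:
        """Consume one input and yield zero or more outputs."""


class Pipeline:
    """Feeds the output stream of each stage into the next one."""

    def __init__(self, *stages: Stage) -> None:
        self.stages = stages

    def run(self, inputs: Iterable[Any]) -> Iterator[Any]:
        stream: Iterable[Any] = inputs
        for stage in self.stages:
            stream = self._apply(stage, stream)
        return iter(stream)

    @staticmethod
    def _apply(stage: Stage, stream: Iterable[Any]) -> Iterator[Any]:
        for item in stream:
            yield from stage.process(item)


class ThresholdVAD(Stage):
    """Stand-in for a VAD: passes on audio chunks whose peak reaches `thresh`."""

    def __init__(self, thresh: float) -> None:
        self.thresh = thresh

    def process(self, chunk: Sequence[float]) -> Iterator[Sequence[float]]:
        if chunk and max(abs(s) for s in chunk) >= self.thresh:
            yield chunk


class StubSTT(Stage):
    """Stand-in for Whisper: 'transcribes' by reporting the chunk length."""

    def process(self, chunk: Sequence[float]) -> Iterator[str]:
        yield f"{len(chunk)} samples of speech"


class EchoLM(Stage):
    """Stand-in for an instruct language model."""

    def process(self, text: str) -> Iterator[str]:
        yield f"You said: {text}"


class StubTTS(Stage):
    """Stand-in for Parler-TTS: streams the reply back word by word."""

    def process(self, text: str) -> Iterator[str]:
        yield from text.split()
```

Running Pipeline(ThresholdVAD(0.5), StubSTT(), EchoLM(), StubTTS()).run([[0.0, 0.1], [0.9, -0.2, 0.3]]) drops the quiet first chunk and streams the words of "You said: 3 samples of speech". Swapping a component means writing another Stage subclass without touching the others. A production pipeline would more likely run the stages concurrently; this sketch keeps everything in one thread for clarity.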

Clone the repository:

Install the required dependencies using uv :

The pipeline can be run in two ways:

  • Server/Client approach: Models run on a server, and audio input/output are streamed from a client.
  • Local approach: Runs locally.

Docker Server

Install the NVIDIA Container Toolkit:

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Start the Docker container:

docker compose up

Server/Client Approach

To run the pipeline on the server:

Then run the client locally to handle sending microphone input and receiving generated audio:

Running on Mac

To run on a Mac, we recommend setting the flag --local_mac_optimal_settings:

You can also pass --device mps to place all the models on the mps device. The local Mac optimal settings set the mode to local, as explained above, and change the models to:

  • LightningWhisperMLX

Recommended usage with CUDA

Leverage Torch Compile for Whisper and Parler-TTS:

For the moment, compile modes that capture CUDA Graphs (reduce-overhead, max-autotune) are not compatible with streaming Parler-TTS.

Command-line Usage

Model parameters

model_name, torch_dtype, and device are exposed for each part that leverages the Transformers implementations: Speech to Text, Language Model, and Text to Speech. Specify the targeted pipeline part with the corresponding prefix:

  • stt (Speech to Text)
  • lm (Language Model)
  • tts (Text to Speech)

For example:

Generation Parameters

Other generation parameters of the model's generate method can be set using the part's prefix + _gen_, e.g., --stt_gen_max_new_tokens 128. These parameters can be added to the pipeline part's arguments class if not already exposed (see LanguageModelHandlerArguments for example).
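
The prefix convention amounts to splitting each option name into a pipeline part and a parameter name. The following Python sketch shows that routing on a plain dictionary of already-parsed options. route_options and its model/gen grouping are hypothetical and are not the repository's argument classes (such as LanguageModelHandlerArguments); option names without an stt_, lm_ or tts_ prefix are left for other handlers.

```python
from typing import Any, Dict

PARTS = ("stt", "lm", "tts")  # Speech to Text, Language Model, Text to Speech


def route_options(options: Dict[str, Any]) -> Dict[str, Dict[str, Dict[str, Any]]]:
    """Group flat options such as 'stt_gen_max_new_tokens' by pipeline part.

    '<part>_gen_<name>' goes to that part's generate() keyword arguments;
    any other '<part>_<name>' (model_name, torch_dtype, device, ...) goes to
    the part's model settings. Options without a known prefix are skipped.
    """
    routed: Dict[str, Dict[str, Dict[str, Any]]] = {
        part: {"model": {}, "gen": {}} for part in PARTS
    }
    for name, value in options.items():
        part, sep, rest = name.partition("_")
        if not sep or part not in routed:
            continue  # e.g. 'thresh' or 'init_chat_role' belong elsewhere
        if rest.startswith("gen_"):
            routed[part]["gen"][rest[len("gen_"):]] = value
        else:
            routed[part]["model"][rest] = value
    return routed
```

With {"stt_gen_max_new_tokens": 128, "lm_device": "cuda"}, the STT part receives {"max_new_tokens": 128} as generation arguments and the LM part receives {"device": "cuda"} as a model setting. The gen dictionary could then be unpacked into a Transformers generate call as keyword arguments.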

Notable Parameters

VAD parameters

  • --thresh: Threshold value to trigger voice activity detection.
  • --min_speech_ms: Minimum duration of detected voice activity to be considered speech.
  • --min_silence_ms: Minimum length of silence intervals for segmenting speech, balancing sentence cutting and latency reduction.
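
To see how these three parameters interact, here is a minimal Python sketch that turns per-frame speech probabilities (the kind of score a model such as Silero VAD produces) into speech segments. It is an illustration under stated assumptions, not the repository's implementation: the function name, the fixed frame_ms frame length and the millisecond bookkeeping are hypothetical. A frame counts as speech when its probability reaches thresh; an open segment is closed once silence has lasted min_silence_ms, and it is kept only if it is at least min_speech_ms long.

```python
from typing import List, Optional, Sequence, Tuple


def segment_speech(
    probs: Sequence[float],
    frame_ms: int,
    thresh: float,
    min_speech_ms: int,
    min_silence_ms: int,
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) speech segments from per-frame speech probabilities."""
    segments: List[Tuple[int, int]] = []

    def close(start_frame: int, end_frame: int) -> None:
        # Keep the segment only if it lasted long enough to count as speech.
        if (end_frame - start_frame) * frame_ms >= min_speech_ms:
            segments.append((start_frame * frame_ms, end_frame * frame_ms))

    start: Optional[int] = None  # first frame of the open segment, if any
    silence_run = 0              # consecutive non-speech frames inside that segment
    for i, p in enumerate(probs):
        if p >= thresh:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run * frame_ms >= min_silence_ms:
                # The pause is long enough: end the segment where the silence began.
                close(start, i - silence_run + 1)
                start, silence_run = None, 0
    if start is not None:
        # Flush a segment still open at the end, trimming trailing silence.
        close(start, len(probs) - silence_run)
    return segments
```

Raising min_silence_ms lets short pauses stay inside one segment (fewer sentence cuts, but the pipeline waits longer before handing audio to STT), while lowering it closes segments sooner. min_speech_ms filters out short blips such as clicks or coughs.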

Language Model

  • --init_chat_role: Defaults to None. Sets the initial role in the chat template, if applicable. Refer to the model's card to set this value (e.g. for Phi-3-mini-4k-instruct you have to set --init_chat_role system).
  • --init_chat_prompt: Defaults to "You are a helpful AI assistant." Required when setting --init_chat_role.
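
In chat-template terms, the initial role and prompt become the first message of the conversation. Below is a minimal Python sketch, assuming the common list-of-{"role", "content"} message format. The Chat class is hypothetical and is not the repository's code; it enforces the rule above that a prompt must accompany an initial role.

```python
from typing import Dict, List, Optional

DEFAULT_PROMPT = "You are a helpful AI assistant."


class Chat:
    """Minimal chat history in the role/content format used by chat templates."""

    def __init__(self, init_chat_role: Optional[str] = None,
                 init_chat_prompt: str = DEFAULT_PROMPT) -> None:
        self.init_message: Optional[Dict[str, str]] = None
        if init_chat_role is not None:
            if not init_chat_prompt:
                raise ValueError("init_chat_prompt is required when init_chat_role is set")
            # The initial message always stays at the front of the history.
            self.init_message = {"role": init_chat_role, "content": init_chat_prompt}
        self.turns: List[Dict[str, str]] = []

    def add(self, role: str, content: str) -> None:
        """Append one user or assistant turn."""
        self.turns.append({"role": role, "content": content})

    def to_list(self) -> List[Dict[str, str]]:
        """Return the messages in order, initial message first if one was set."""
        prefix = [self.init_message] if self.init_message else []
        return prefix + self.turns
```

With Hugging Face Transformers, a list like this is what tokenizer.apply_chat_template takes as its conversation. Whether a model accepts a system role at all depends on its template, which is why the model card decides the value of --init_chat_role.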

Text to Speech

--description: Sets the description for the Parler-TTS generated voice. Defaults to: "A female speaker with a slightly low-pitched voice delivers her words quite expressively, in a very confined sounding environment with clear audio quality. She speaks very fast."

--play_steps_s: Specifies the duration of the first chunk sent during streaming output from Parler-TTS, impacting readiness and decoding steps.


Enhancing User Experience with Voice-Activated Speech-to-Text Software


Have you ever wondered how modern businesses can keep up with the communication demands of today’s fast-paced world? As technology evolves and user expectations rise, the need for more accessible and efficient tools has never been greater. How can we ensure that our communication methods are effective and inclusive?

Voice-activated Speech-to-Text (STT) software offers a powerful solution to these challenges. By converting spoken words into text in real-time, this innovative technology enhances accessibility, improves productivity, and transforms how we interact with digital platforms. 

This article will cover the benefits of using Voice-activated Speech-to-Text (STT) software to enhance user experience in your contact center or customer-facing department.

What is Voice-Activated Speech-to-Text Software?

Voice-activated Speech-to-Text (STT) software is a technology that converts spoken language into written text in real time. This software leverages advanced algorithms and machine learning models to recognize and transcribe spoken words with high accuracy. Users can interact with devices or applications simply by speaking, eliminating the need for manual typing or input.

Key Features of Speech-to-Text Software

  • Real-time transcription: Instantly converts spoken words into text, allowing users to see their speech transcribed as they talk.
  • Security and privacy: Ensures that voice data is protected through encryption and compliance with data protection regulations, providing users with peace of mind.
  • Multi-language support: Supports multiple languages, making the software accessible to a global audience and useful in multilingual environments.
  • High accuracy: Advanced algorithms ensure that speech is transcribed with minimal errors, even in challenging audio environments.

How To Enhance User Experience with Voice-Activated Speech-to-Text Software

As technology continues to evolve, enhancing user experience remains a top priority for developers and businesses alike. Voice-activated Speech-to-Text (STT) software is a prime example of a tool that can significantly improve how users interact with digital platforms. 

By addressing key aspects such as accessibility, productivity, and overall satisfaction, voice-activated Speech-to-Text is transforming the way people engage with technology.

Accessibility

Voice-activated Speech-to-Text software plays a crucial role in making digital interactions more accessible, particularly for people with disabilities. For individuals with hearing impairments, STT can provide real-time transcriptions of spoken content, allowing them to follow conversations, presentations, or videos without missing important information. 

Additionally, for those with physical challenges that limit their ability to type or use traditional input devices, voice-activated Speech-to-Text offers a hands-free alternative, enabling them to interact with technology more independently. This technology breaks down barriers, making digital communication more inclusive for everyone.

Productivity

In professional environments where time is of the essence, such as contact centers or fast-paced office settings, voice-activated Speech-to-Text can significantly boost productivity. By instantly converting speech into text, this software eliminates manual typing, allowing employees to focus on more critical tasks. 

For example, in contact centers, agents can use STT to quickly transcribe customer conversations, ensuring accurate records and faster inquiry resolution. The ability to dictate notes, emails, or reports also saves time, making workflows more efficient and reducing the likelihood of errors.

User satisfaction

User satisfaction is greatly enhanced when interactions with technology are seamless and frustration-free. Voice-activated Speech-to-Text contributes to this by providing a smooth, hands-free interface that simplifies communication. Users no longer need to struggle with typing or navigating complex menus; they can simply speak, and the software does the rest. 

This ease of use makes tasks quicker and reduces cognitive load, allowing users to engage with technology more comfortably and confidently. The result is a more satisfying and enjoyable user experience in personal or professional settings.

Which Industries Benefit From Using Voice-Activated STT


Voice-activated Speech-to-Text (STT) technology is making a significant impact across various industries, revolutionizing how businesses and professionals communicate and operate. Here are some examples of industries where this technology is proving to be particularly transformative:

Customer service

In the customer service industry, voice-activated STT is a game-changer. Contact centers, in particular, benefit from the ability to transcribe customer interactions in real-time. This capability ensures accurate records of conversations, which can be used for quality assurance, training, and compliance purposes. 

Additionally, voice commands allow agents to navigate through systems or retrieve information hands-free, leading to faster response times and improved customer satisfaction. The use of STT also enables more efficient handling of customer inquiries, as agents can focus on solving problems rather than on manual data entry.

Education

In education, voice-activated STT technology is enhancing accessibility and learning outcomes. For students with disabilities, such as those with hearing impairments or learning challenges, STT provides real-time transcriptions of lectures and classroom discussions, making educational content more accessible. 

Furthermore, teachers can use STT to create instant transcripts of their lessons, which can be shared with students for review and study. This technology also supports language learning by allowing students to practice pronunciation and receive immediate feedback through text conversion.

Healthcare

The healthcare industry is another sector where voice-activated STT is making a substantial impact. Medical professionals often need to document patient interactions, transcribe notes, and update medical records quickly and accurately. 

STT technology streamlines these processes by allowing doctors and nurses to dictate notes directly into electronic health record (EHR) systems, saving time and reducing the risk of errors associated with manual data entry. 

Additionally, STT enables hands-free operation of devices, which is crucial in sterile environments like operating rooms. The ability to transcribe spoken medical information in real-time also facilitates better communication and coordination among healthcare teams.

Legal services

In the legal industry, voice-activated STT is transforming how legal professionals handle documentation and case management. Lawyers and paralegals can use STT to transcribe interviews, depositions, and courtroom proceedings accurately and efficiently. This technology saves time and ensures that legal records are thorough and precise.

Additionally, STT allows legal professionals to quickly search through large volumes of transcribed text to find relevant information, making it easier to prepare for cases and manage legal documents.

Media and content creation

Voice-activated STT is also making waves in the media and content creation industries. Journalists, writers, and content creators can use STT to transcribe interviews, speeches, and meetings, streamlining the content production process. 

This technology also enables content creators to dictate articles, scripts, and social media posts, speeding up the writing process and reducing the physical strain associated with long hours of typing. The ability to produce content more quickly and accurately gives media professionals a competitive edge in a fast-paced industry.

Krisp’s Role in Innovating Speech-to-Text Technology

Krisp has emerged as a leading innovator in the field of Speech-to-Text (STT) technology, consistently pushing the boundaries of what is possible in voice recognition and transcription. 

With a focus on delivering exceptional accuracy, speed, and user-friendly integration, Krisp’s STT solution is designed to meet business needs.

Advanced accuracy and speed

Krisp’s Speech-to-Text technology is built on state-of-the-art algorithms and machine learning models that ensure high accuracy in transcriptions. By continuously refining its AI, Krisp has achieved a level of precision that minimizes errors, even in challenging audio environments. 

Whether it’s transcribing a fast-paced conversation or deciphering speech with heavy accents, Krisp’s STT solution delivers reliable results swiftly.

Seamless integration with existing tools

Understanding the importance of compatibility, Krisp has developed its Speech-to-Text to be easily integrated into a wide range of applications. Whether it’s being used in contact centers, transcription services, or virtual assistants, Krisp’s STT technology can be seamlessly embedded into existing workflows, enhancing productivity without disrupting operations.

Commitment to privacy and security

Krisp is also committed to ensuring that its Speech-to-Text technology upholds the highest privacy and security standards. 

Recognizing the sensitivity of voice data, Krisp uses private clouds to store call transcripts. You choose the cloud during setup, and each transcript is automatically uploaded to your chosen location with less than 1 second of latency.

Krisp’s dedication to innovation in Speech-to-Text technology is evident in the advanced features and customizable solutions it offers. By focusing on accuracy, ease of integration, and security, Krisp is not only enhancing user experience but also setting new standards in the industry.


Key Takeaways 

  • Voice-activated Speech-to-Text (STT) technology is transforming communication by providing real-time transcription, voice command recognition, and multi-language support.
  • Accessibility is greatly enhanced through STT, making digital interactions more inclusive for individuals with disabilities, such as hearing impairments or physical challenges.
  • Productivity is significantly boosted in professional environments, especially in contact centers, where quick and accurate communication is essential.
  • User satisfaction improves with the use of STT, offering a hands-free, seamless interface that reduces frustration and enhances the overall user experience.
  • Industries benefiting from STT include customer service, education, healthcare, legal services, and media/content creation, all experiencing improved efficiency, accessibility, and communication.

Frequently asked questions

Who benefits from speech-to-text software? Speech-to-text software benefits a wide range of users, including individuals with disabilities (such as those with hearing impairments or physical challenges), professionals in high-demand environments (like customer service agents, healthcare providers, and legal professionals), and anyone looking to improve productivity by dictating rather than typing. It’s also valuable for students, educators, content creators, and businesses needing accurate transcription of spoken content.

How does voice activated software help? Voice-activated software helps by allowing users to interact with devices and applications using spoken commands, reducing the need for manual input like typing or clicking. This technology increases accessibility for those with physical limitations, enhances productivity by speeding up tasks, and provides a hands-free, seamless experience that can be particularly useful in fast-paced or multi-tasking environments.

What does speech-to-text software do? Speech-to-text software converts spoken language into written text in real-time. It transcribes spoken words, phrases, and sentences into digital text that can be used for various purposes, such as creating documents, sending messages, or inputting data. This software is often used for accessibility, documentation, and improving communication efficiency.

What are the advantages of speech-to-text? The advantages of speech-to-text include increased productivity, allowing faster transcription and reducing the need for manual typing. It also enhances accessibility, enabling those with physical or cognitive disabilities to interact with technology more easily. Speech-to-text improves accuracy in documentation, supports multilingual communication, and offers a more natural way of interacting with digital devices.


The Linux User’s Ultimate Guide to Text Editors


Basic text editors seem like they should be among the least interesting of Linux utilities when, in reality, they are some of the most critical. These simple and fundamental tools are essential to system configuration and other text-based tasks.

This article explains the importance of text editors for Linux users and demonstrates basic editing tasks using two of the most common editors: vim and nano. By the end of this piece, you will be able to create, edit, save and close text documents using both editors.

It’s essential to practice these skills. You should follow along with the examples in the text on your own Linux system or create a lab computer for these activities. You may need to install vim or nano on the system, though at least one of them is usually available on most distributions. Review this article on Linux commands to ensure you’re comfortable entering information at the command line.

If you need to add vim or nano to a Debian-based distribution, type:

sudo apt install vim
sudo apt install nano

To install vim or nano on a Red Hat-based distribution, type:

sudo dnf install vim
sudo dnf install nano

Both vim and nano are written in standard documentation using all lowercase characters.

Note: It is a poor security practice to log on to a Linux system as the root (administrator) user. Most systems force you to log on as a regular user and then use the sudo (short for “super user do”) command to elevate your privileges. You may be prompted for your password when using sudo . You probably mainly need sudo when editing system configuration files that are normally reserved for access by the root user.

Terminal-based Linux text editors must provide an alternative way to issue commands because they don’t have a graphical user interface (GUI). Many Linux deployments avoid the GUI to maintain speed, simplicity and stability. Therefore, these editors don’t include a menu where you can use a mouse to select Save or Exit.

There are two common approaches:

  • Modes: Users switch the editor between modes. The keyboard either enters text or accepts commands, depending on the current mode.
  • Meta keys: Users press one or more meta keys to enter commands. Meta keys perform special functions when combined with another key; common examples are Ctrl and Alt.

The editor must have a way of differentiating between text you’re trying to write into the file and commands you’re trying to issue, such as save or copy/paste. This concept is important, especially if you’re used to GUI-based editors like Windows Notepad or macOS TextEdit.

You may be familiar with some common keyboard shortcuts like Ctrl+S to save or Ctrl+P to print. These are examples of using meta keys.

Why Are Text Editors So Important?

Text editors are standard tools on the system. Since Linux and Linux applications receive their primary settings and options from configuration files, managing these configuration files is clearly essential. If an administrator wants to change how a service like the Apache web server functions, they must edit the Apache configuration file.

Here are a few common tasks for text editors:

  • Edit configuration files that control system actions and services.
  • Edit configuration files for applications like web servers or databases.
  • Create scripts and programming files to automate tasks.

Text editors are lightweight applications that consume few system resources. Some, such as vim, are highly customizable, helping you optimize them for tasks like writing Python code or authoring longer documents.

Common Linux Text Editors

Many text editors are available for Linux, so I’ll just cover two of the most common.

  • vim: A powerful and flexible editor with a steep learning curve.
  • nano: An intuitive and simple editor that may be limited for more complex projects.

Vim used to be the default editor for most Linux distributions, though these days, many distros rely on nano instead. Vim is highly configurable and customizable with plug-ins. In fact, it may be overpowered for such basic tasks as changing a line in a configuration file from no to yes, while nano is perfect for those sorts of tasks.

Other important but less common editors include Emacs (a favorite of many developers) and gedit, a basic text editor for Linux distributions with a graphical user interface.

The vim Text Editor

The name “vim” stands for “vi improved.” It is an improved clone of an older Unix editor called vi (pronounced “vee-eye”). It’s been a standard Linux application for decades, and with good reason. It’s highly configurable, very customizable, fast and efficient. It is not, however, the simplest application to learn.

Many resources exist for learning vim basics. Linux training courses nearly always cover it, many tutorials address it and plenty of online forums discuss tweaks and modifications to it. The official vim website includes documentation, too. Finally, the program itself has a built-in tutorial to walk you through its essential features.

While looking at vim’s documentation, check out its unique licensing mechanism, too.

Vim uses modes to change how users interact with the program. Pressing a key on the keyboard has a different effect depending on the mode.

The primary modes to be aware of are listed below:

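  • Command mode: Pressing keys issues commands to vim, such as moving the cursor, deleting text or saving, rather than typing into the file. Vim opens in this mode.
  • Execute mode: Think of this as a subset of Command mode. Issue additional commands to vim by typing the : character before the command.
  • Insert mode: Pressing keys inserts text in the file. This is how you add, edit or remove text.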

Modes are less complex than they seem at first. I think of them as different ways of using the keyboard. When in Command mode, you’ll use the keyboard to manage the file, such as saving changes. When in Insert mode, you’ll use the keyboard to manage the text in the file, such as adding data.

The two most basic keys to know are lowercase i and Esc. Vim opens in Command mode. Lowercase i switches from Command mode to Insert mode. The Esc key switches from Insert mode back to Command mode. When in doubt, press the Esc key; then you’ll know you’re in Command mode.
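To make the idea of modes concrete, here is a toy model in Python. It is a hypothetical illustration, not how vim is implemented: the function run_keys and its tiny command set (only i, x and Esc) are assumptions made for this sketch. It shows that the same key, here x, deletes text in Command mode but is typed into the file in Insert mode.

```python
# Toy model of modal editing, loosely inspired by vim's Command and Insert modes.
# Hypothetical sketch: this is NOT vim's implementation. It only shows that the
# same key does different things depending on the current mode.

ESC = "\x1b"  # the Esc key


def run_keys(keys: list[str]) -> tuple[str, str]:
    """Feed keystrokes to the toy editor; return (final buffer, final mode)."""
    mode = "command"  # like vim, the toy editor starts in Command mode
    buffer = ""
    for key in keys:
        if mode == "command":
            if key == "i":
                mode = "insert"  # i switches to Insert mode
            elif key == "x":
                # x deletes the character under the cursor; this toy has no
                # cursor movement, so that is always the last character
                buffer = buffer[:-1]
            # any other key is an unsupported command and is ignored
        else:  # Insert mode
            if key == ESC:
                mode = "command"  # Esc returns to Command mode
            else:
                buffer += key  # every other key is typed into the file
    return buffer, mode
```

For example, run_keys(list("ihi") + [ESC, "x"]) types “hi”, returns to Command mode and deletes one character, leaving ("h", "command"), while run_keys(["i", "x"]) simply types an x.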

Basic Document Management With vim

Vim offers a truly vast number of options. Many new Linux users find themselves overwhelmed by its extensibility and features. However, there are really only four essential vim skills you must learn immediately. Once you master these, you can explore additional vim capabilities.

The four essential tasks are:

  • Create or open a file.
  • Edit the file.
  • Save your changes.
  • Close the file.

To create a file, simply type vim and the name of the new file. Vim opens automatically with a blank document. You can open an existing file the same way. For example, to create or open a file named linux-basics.txt in your home directory, type the following commands:

cd
vim linux-basics.txt

Remember to use tab completion to autofill filenames. This trick makes you quicker and helps eliminate typos.

Vim opens in Command mode, meaning that if you press a key on the keyboard, you are giving vim a command. You’ll need to switch to Insert mode to edit the file.

Press the i key to enter Insert mode. Vim should display an INSERT message in the lower left corner. Note that other keys exist to put you in Insert mode, too. These variations place the cursor in different locations. For now, use the lowercase i key.

If you press keyboard keys now, you’ll enter text into the document. Once you’re in Insert mode, add the following text to your document:

Linux is a powerful and flexible open-source operating system.

Great! You’ve edited the file by entering some text. Next, you need to save your changes. There are several ways of doing this in vim, but for now, press the Esc key to return to Command mode, then type :w and press Enter (the w stands for “write the file to disk,” that is, save). The : key puts vim in Execute mode, offering additional ways to enter commands.

After saving your document, you can close the vim editor and return to the Linux command prompt. To do so, press Esc to ensure you’re in Command mode, then type :q and press Enter to quit vim.

By the way — you could have combined the write and quit steps by typing :wq (“write then quit”), but I wanted to demonstrate them as separate steps.

Type the following command to check that your file contains the expected text (remember that Linux is case-sensitive):

cat linux-basics.txt

You should see the sentence you added to the file.

Use the vim linux-basics.txt command to open the file again. Enter Insert mode with the i key and add more text to your file, such as the following sentence:

There are many Linux distributions, such as Ubuntu and Fedora.

Press Esc to return to Command mode, then save your changes and exit vim by typing :wq and pressing Enter.

You can use the arrow keys on your keyboard to move the cursor up, down, left and right through the text.

One of the first places you may get hung up in vim is when you wish to exit a file without saving the changes. Vim displays an error when you attempt this, saying, “No write since last change.” To exit the file without saving changes, use Esc to enter Command mode and type the :q! combination.

Review and repeat these steps until you are comfortable with them. If you’re looking at other vim tutorials or documentation, you may see many additional (and very useful) options, but without a firm understanding of these four basic tasks, vim gets confusing in a hurry.

Additional vim Tricks

There are several ways to enter Insert mode. These depend on the position of your cursor, so use the arrow keys to place your cursor at the desired location in the file, then use one of these keys to enter Insert mode and begin entering text.

  • i: Insert text before the cursor.
  • I: Insert text before the first non-blank character of the line.
  • o: Start a new line below the cursor and insert text.
  • O: Start a new line above the cursor and insert text.

These options assume you’re in Command mode. Use the arrow keys to move the cursor to where you want. Here are some other ways of navigating inside the file:

  • gg or [[: Jump to the top of the file.
  • G or ]]: Jump to the bottom of the file.
  • 22G: Jump to line 22 of the file.

One of my favorite settings is to cause vim to display line numbers within a file.

  • :set number: Display line numbers along the left side of the file.

Manage text in Command mode using the following commands:

  • x: Delete the character under the cursor.
  • dw: Delete from the cursor to the start of the next word.
  • dd: Delete the line the cursor is on.
  • 3dd: Delete three lines, starting with the line the cursor is on.
  • 0: Jump the cursor to the beginning of the line.
  • $: Jump the cursor to the end of the line.

Linux configuration files or program code can have hundreds or even thousands of lines. One helpful option is searching for a particular keyword or string of text. Use the / character followed by the text you want to search for. Be sure you’re in Command mode for this. If you want to search for the string “disabled” (representing a disabled or off setting), type the following command and press Enter:

/disabled

Use the n and N keys to move forward or backward through the results.

Vim terminology is a bit different from what you might be used to. Yank is the vim term for copy, delete doubles as cut, and put is the word for paste.

  • yy: Yank the current line.
  • 4yy: Yank the current line and the following three lines (for a total of four).
  • dd: Cut (or delete) the current line.
  • 4dd: Cut the current line and the following three lines.
  • p: Put (paste) the yanked or cut text after the cursor; whole lines are put below the current line.
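The relationship between yank, delete and put can be modeled with a single clipboard-like register. The sketch below is a simplified, hypothetical model (the LineBuffer class is invented for illustration): it handles only whole-line operations and one unnamed register, whereas real vim has many registers and characterwise variants.

```python
# Toy model of vim's linewise yank (yy), delete/cut (dd) and put (p).
# Hypothetical sketch: one unnamed register, whole lines only.


class LineBuffer:
    def __init__(self, lines: list[str]) -> None:
        self.lines = list(lines)
        self.cursor = 0                  # index of the current line
        self.register: list[str] = []   # stands in for vim's unnamed register

    def yank(self, count: int = 1) -> None:
        """Like yy / 4yy: copy count lines starting at the cursor."""
        self.register = self.lines[self.cursor:self.cursor + count]

    def delete(self, count: int = 1) -> None:
        """Like dd / 4dd: remove count lines but keep them in the register (cut)."""
        self.register = self.lines[self.cursor:self.cursor + count]
        del self.lines[self.cursor:self.cursor + count]
        # keep the cursor on a valid line after deleting
        self.cursor = min(self.cursor, max(len(self.lines) - 1, 0))

    def put(self) -> None:
        """Like p with whole lines: paste the register below the current line."""
        insert_at = self.cursor + 1 if self.lines else 0
        self.lines[insert_at:insert_at] = self.register
        self.cursor = insert_at          # cursor moves to the first pasted line
```

The familiar ddp sequence, which swaps the current line with the one below it, behaves the same way in the toy: starting from ["one", "two", "three"], delete() followed by put() leaves ["two", "one", "three"].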

Configure Additional vim Options

Vim installs with a common set of defaults most people find useful. You can customize it to fit your needs by using a vim configuration file named .vimrc . The file does not exist by default, so you must create it. Be sure to do so in your home directory. Note that the first character of the file name is a dot.

Begin by moving to your home directory with cd, and then create the .vimrc file:

cd
vim .vimrc

Press the i key to place vim in Insert mode.

Add whatever custom configurations you prefer. Here are a few common examples, including comment fields to explain them:

" Display line numbers
set number
" Show tab characters four columns wide
set tabstop=4
" Highlight search results
set hlsearch

Switch to Command mode with Esc , then type :wq to save your changes and quit vim.

Note that the .vimrc file uses the " character to mark comments rather than the more common #.

Vim includes many configuration settings. Search online for the interesting and useful ways vim users have customized the tool over the years. For example, many Python developers use vim as their preferred integrated development environment (IDE). Use guides like Vim and Python – A Match Made in Heaven to customize your .vimrc file for Python.

Vim relies on plug-ins to manage many additional custom features, extending the program’s usefulness. For example, the NERDTree plug-in displays your file structure within a vim window so you can see your entire project.

The nano Text Editor

Nano is simpler and less confusing than vim, though it is also less feature-rich and extensible. Still, it’s a great solution for quick configuration file edits or for authoring short documents. And the menu at the bottom of the nano interface means you don’t have to memorize a bunch of odd keystrokes.

Nano works with meta keys, mainly the Ctrl key. It opens ready for typing, meaning that if you press a key on the keyboard, it enters text into the file. Hold down the Ctrl key and press another key to give nano instructions, such as “save.” Nano uses the ^ character to represent the Ctrl meta key, so if you see ^X, it means Ctrl+X.

Many Linux distributions include nano by default, though you can install it if it’s not already part of your favorite distro. Nano’s homepage includes documentation, FAQs and shortcuts.

Basic Document Management With nano

In the discussion of vim above, I showed four basic tasks: Create/open a file, edit the file, save changes and exit. These fundamental tasks apply to nano, too (and really, they apply to any text editor).

Create a new file or open an existing one in your home directory by typing these commands:

cd
nano linux-basics.txt

The file opens, showing the text you entered with vim. Note the menu at the bottom, which displays some standard nano functions. (Others are available but not shown.)

Use the arrow keys to move below the existing lines of text and type the following information:

Two common Linux text editors are vim and nano.

To save your changes, press Ctrl+S. Alternatively, press Ctrl+O to “write out” your changes (this works like “Save As” in other programs); nano shows the current file name, so you can just press Enter. Once the file is saved, quitting nano is the final step: press Ctrl+X to exit. The editor will prompt you if you’ve forgotten to save changes.

Practice these steps a few times. They are the same steps you learned above with vim. You should master these four basic steps for both editors.

Additional nano Tricks

Like vim and other editors, nano offers many basic options. Here are several you will find particularly useful.

  • Ctrl+A: Jump to the start of the current line.
  • Ctrl+E: Jump to the end of the current line.
  • Ctrl+K: Delete (cut) the current line.
  • Alt+U: Undo the most recent change.
  • Ctrl+W: Search for a string of text. You will be prompted to enter the search string.
  • Alt+R: Search for a string and replace it. You will be prompted for the search string and then for the replacement text.

Nano has a straightforward method for cutting or copying text and pasting it elsewhere. It functions by marking the start and end of the text you want to copy and then specifying where to paste it.

Start marking the text by placing your cursor at the beginning of the desired text, then press Alt+A . Use the arrow keys to move the cursor to the end of the text you want to work with. The text between the two points will be highlighted. You can either cut or copy it.

  • Alt+6 copies the text.
  • Ctrl+K cuts the text.

Now that the text is in the buffer (on the “clipboard”), move the cursor to the point where you want the content pasted. Press Ctrl+U to paste it.

Configure Additional nano Options

Nano offers useful customizations and additional settings. Many of these are handy for development work and managing system configuration files. As with vim, you can set permanent customizations in a configuration file in your home directory. Use nano to create and open a file named .nanorc . (Note the “dot” at the start, marking this as a hidden file.)

Here are a few sample entries. Put these on separate lines. Use the # character to mark comments (explanations) for each entry, as seen below:

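# Display line numbers
set linenumbers
# Set the tab width to four columns
set tabsize 4
# Indent new lines to match the previous line
set autoindent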

Head over to the official nanorc page for more ideas and options.

Graphical Text Editors

You may work at a standard Linux workstation with a graphical user interface (GUI). If so, you probably won’t want to drop to the terminal to write text files. Various GUI-based editors exist; one of the most common is gedit.

This menu-driven text editor is similar to macOS TextEdit or Windows Notepad. Use your mouse to select options from the menus at the top of the interface. Many familiar keyboard shortcuts function in gedit, too.

Other GUI text editors exist and might be useful, depending on your needs. Here are a few:

  • Kate: Robust KDE-based editor good for coding.
  • Leafpad: Lightweight editor that emphasizes simplicity.
  • Sublime Text: Developer coding platform that is not open source but still popular in the Linux community.
  • VS Code: Microsoft’s cross-platform, extensible coding solution.

Note that text editors and word processors are not the same. Word processors have far more features oriented toward large and complex documents. They typically embed hidden formatting instructions within the text, which would break configuration files and program code. Word processors are a different tool with a different job than text editors. One example of a word processor is LibreOffice Writer.

Add Linux Editors to macOS and Windows

Multiple versions of vim exist for macOS and Windows, too. Adding vim to your daily-use computer (even if it’s not Linux) is a handy way to practice your editing skills.

Nano was included with older macOS versions. You can add it to your current macOS version using a package manager like Homebrew . Various nano versions exist for Windows, too.

I don’t find nano to be as powerful or extensible as vim. Or, to phrase that another way, nano is simpler and less confusing than vim. I really don’t believe one is better than the other, but they are both useful for different things. I find nano handy for very quick and basic configuration file edits, such as allowing or denying root login over SSH by setting a value to yes or no in the /etc/ssh/sshd_config file. I prefer vim for longer configuration files, where I need to search for various settings. I also use vim periodically to write more substantial documents, like this tutorial. Because I’ve used vim for a long time, I’m more comfortable with it, so it is my go-to editor. (I even use it on my Mac!)

I recommend you get comfortable with opening and editing files using both vim and nano. That skill will serve you well on nearly any Linux distribution you come across. Practice with both whenever you need to generate or edit some basic text!

Mistral-NeMo-Minitron 8B Foundation Model Delivers Unparalleled Accuracy

Last month, NVIDIA and Mistral AI unveiled Mistral NeMo 12B, a leading state-of-the-art large language model (LLM). Mistral NeMo 12B consistently outperforms similarly sized models on a wide range of benchmarks.

Today, we announce Mistral-NeMo-Minitron 8B, one of the most advanced open-access models in its size class. This model consistently delivers leading accuracy on nine popular benchmarks. The Mistral-NeMo-Minitron 8B base model was obtained by width-pruning the Mistral NeMo 12B base model, followed by a light retraining process using knowledge distillation. This is a successful recipe that NVIDIA originally proposed in the paper Compact Language Models via Pruning and Knowledge Distillation. It has been proven time and again with the NVIDIA Minitron 8B and 4B and Llama-3.1-Minitron 4B models.

Model | Training tokens | Wino-Grande (5-shot) | ARC Challenge (25-shot) | MMLU (5-shot) | HellaSwag (10-shot) | GSM8K (5-shot) | TruthfulQA (0-shot) | XLSum en 20% (3-shot) | MBPP (0-shot) | HumanEval (0-shot)
Llama 3.1 8B | 15T | 77.27 | 57.94 | 65.28 | 81.80 | 48.60 | 45.06 | 30.05 | 42.27 | 24.76
Gemma 7B | 6T | 78 | 61 | 64 | 82 | 50 | 45 | 17 | 39 | 32
Mistral-NeMo-Minitron 8B | 380B | | | | | | | | |
Mistral NeMo 12B | N/A | 82.24 | 65.10 | 68.99 | 85.16 | 56.41 | 49.79 | 33.43 | 42.63 | 23.78

Overview of model pruning and distillation 

Model pruning is the process of making a model smaller and leaner, either by dropping layers (depth pruning) or by dropping neurons, attention heads, and embedding channels (width pruning). Pruning is often accompanied by some amount of retraining to recover accuracy.

Model distillation is a technique used to transfer knowledge from a large, complex model, often called the teacher model, to a smaller, simpler student model. The goal is to create a more efficient model that retains much of the predictive power of the original, larger model while being faster and less resource-intensive to run. Herein, we employ distillation as a light retraining procedure after pruning, on a dataset much smaller than that used in model training from scratch.
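The difference between the two pruning axes is easiest to see on weight shapes. The following is a hypothetical toy in plain Python, not the Minitron or NeMo implementation: each layer is a weight matrix stored as a list of rows, and the helpers depth_prune and width_prune are invented for illustration. Width pruning is shown only for MLP-style output neurons; pruning embedding channels or attention heads follows the same idea but touches more matrices.

```python
# Toy illustration of depth pruning vs. width pruning on a stack of dense layers.
# Hypothetical sketch: each layer is a weight matrix stored as a list of rows,
# with shape (out_features, in_features).

Matrix = list[list[float]]


def zeros(out_features: int, in_features: int) -> Matrix:
    """Build an all-zero weight matrix (only the shapes matter here)."""
    return [[0.0] * in_features for _ in range(out_features)]


def shapes(layers: list[Matrix]) -> list[tuple[int, int]]:
    return [(len(w), len(w[0])) for w in layers]


def depth_prune(layers: list[Matrix], drop: set[int]) -> list[Matrix]:
    """Depth pruning: remove whole layers by index.

    Only safe for layers that map a dimension onto itself (as transformer
    blocks do along the residual stream), so neighboring shapes still match.
    """
    return [w for i, w in enumerate(layers) if i not in drop]


def width_prune(layers: list[Matrix], layer: int, keep: list[int]) -> list[Matrix]:
    """Width pruning: keep only the listed output neurons of one layer.

    Dropping neuron j removes row j of this layer's weights and column j of
    the next layer's weights, because that input no longer exists.
    """
    pruned = [[row[:] for row in w] for w in layers]  # work on a copy
    pruned[layer] = [pruned[layer][j] for j in keep]
    if layer + 1 < len(pruned):
        pruned[layer + 1] = [[row[j] for j in keep] for row in pruned[layer + 1]]
    return pruned
```

With layers = [zeros(6, 4), zeros(6, 6), zeros(4, 6)], dropping the square middle layer gives shapes [(6, 4), (4, 6)], while keeping neurons [0, 2, 5] of the first layer gives [(3, 4), (6, 3), (4, 6)]: every layer survives, but the model gets narrower.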
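A widely used form of the distillation objective, from the classic formulation by Hinton et al., compares the teacher's and student's output distributions with the Kullback-Leibler divergence. The sketch below is that generic formulation in plain Python, not necessarily the exact loss used for Mistral-NeMo-Minitron; the default temperature of 2.0 and the T² scaling are assumptions borrowed from the classic recipe.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: list[float], student_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T**2.

    The student is pushed to reproduce the teacher's whole output
    distribution, not just its top prediction.
    """
    p = softmax(teacher_logits, temperature)  # teacher: the target
    q = softmax(student_logits, temperature)  # student: being trained
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

Identical logits give a loss of zero, and because softmax ignores a constant shift, so do logits that differ only by a constant; any real disagreement in the shape of the distributions gives a positive loss.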

Iterative pruning and distillation is an approach where, starting from a single pretrained model, multiple progressively smaller models can be obtained. For example, a 15B model can be pruned and distilled to obtain an 8B model, which in turn serves as a starting point for pruning and distilling a 4B model, and so on. 

The combination of model pruning followed by light retraining through distillation has been found to be an effective and cost-efficient approach to training a family of models. For each additional model, just 100-400 billion tokens are used for retraining, a reduction of more than 40x compared to training from scratch. As a result, the compute cost savings for training a family of models (12B, 8B, and 4B) are up to 1.95x compared to training all models from scratch.

The lessons from extensive ablation studies have been summarized into 10 best practices for structured weight pruning combined with knowledge distillation. We found that width pruning consistently outperforms depth pruning and, most importantly, that pruned and distilled models outperform models trained from scratch in quality.
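Where savings like these come from can be seen with a back-of-the-envelope calculation. The sketch below is a hypothetical cost model, not NVIDIA's accounting: it assumes training costs about 6 × parameters × tokens FLOPs, that the teacher's forward pass during distillation costs about 2 × parameters × tokens, that pruning and teacher fine-tuning are free, and that each from-scratch model would see 15T tokens (the budget listed for Llama 3.1 8B in the table above). The exact ratio depends entirely on these assumptions.

```python
# Back-of-the-envelope compute comparison: training a model family from scratch
# vs. training only the largest model and deriving the rest by pruning + distillation.
# Hypothetical assumptions (not NVIDIA's cost accounting):
#   * training costs ~6 * params * tokens FLOPs (forward + backward pass)
#   * the teacher's forward pass during distillation costs ~2 * params * tokens
#   * pruning itself and teacher fine-tuning are treated as free
# Parameters and tokens are both in billions; the units cancel in the ratio.


def scratch_cost(sizes: list[float], tokens: float) -> float:
    """Every model in the family is trained from scratch on `tokens`."""
    return sum(6 * n * tokens for n in sizes)


def prune_distill_cost(sizes: list[float], scratch_tokens: float,
                       distill_tokens: float) -> float:
    """Largest model from scratch; each smaller model is pruned from the
    previous one and retrained by distillation on `distill_tokens`."""
    cost = 6 * sizes[0] * scratch_tokens
    for teacher, student in zip(sizes, sizes[1:]):
        cost += 6 * student * distill_tokens + 2 * teacher * distill_tokens
    return cost


family = [12, 8, 4]  # billions of parameters, largest first
speedup = scratch_cost(family, 15_000) / prune_distill_cost(family, 15_000, 380)
print(f"compute savings: {speedup:.2f}x")  # about 1.92x under these assumptions
```

The iterative chain (12B teaches 8B, 8B teaches 4B) mirrors the iterative pruning and distillation described above. Almost all of the remaining cost is the single from-scratch run, which is why the savings approach 2x for a three-model family.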

Mistral-NeMo-Minitron 8B

Following our best practices, we width-pruned the Mistral NeMo 12B model to obtain an 8B target model. This section details the steps and parameters used to obtain the Mistral-NeMo-Minitron 8B base model, as well as its performance.

Teacher fine-tuning

To correct for the distribution shift between the original dataset the model was trained on and the dataset used for distillation, we first fine-tuned the unpruned Mistral NeMo 12B model on our dataset using 127B tokens. Experiments showed that, without correcting for this distribution shift, the teacher provides suboptimal guidance on the dataset during distillation.

Width-only pruning

Given our goal of obtaining the strongest 8B model possible, we proceeded with width-only pruning. We pruned both the embedding (hidden) and MLP intermediate dimensions along the width axis to compress Mistral NeMo 12B. Specifically, we computed importance scores for each attention head, embedding channel, and MLP hidden dimension using the activation-based strategy. Following importance estimation, we:

  • Pruned the MLP intermediate dimension from 14336 to 11520
  • Pruned the hidden size from 5120 to 4096
  • Retained the attention head count and number of layers
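Activation-based importance can be sketched in a few lines. The version below is a hypothetical illustration: it scores each MLP neuron by its mean absolute activation over a small calibration set and keeps the top-scoring ones. The aggregation used for Mistral-NeMo-Minitron may differ, and the helper names are invented for this example. For the real model, the target would be 11520 of the 14336 MLP intermediate neurons.

```python
# Hypothetical sketch of activation-based importance for MLP neurons.
# Assumption: a neuron's score is its mean absolute activation over a small
# calibration set; the exact aggregation used for Minitron may differ.


def neuron_importance(activations: list[list[float]]) -> list[float]:
    """activations[s][j] is the output of neuron j on calibration sample s."""
    num_samples = len(activations)
    num_neurons = len(activations[0])
    return [sum(abs(sample[j]) for sample in activations) / num_samples
            for j in range(num_neurons)]


def neurons_to_keep(importance: list[float], target: int) -> list[int]:
    """Indices of the `target` highest-scoring neurons, in their original order."""
    ranked = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
    return sorted(ranked[:target])


# Toy calibration activations: 2 samples x 4 neurons.
acts = [[0.1, -2.0, 0.0, 0.5],
        [0.3, 1.0, 0.0, -0.7]]
scores = neuron_importance(acts)   # roughly [0.2, 1.5, 0.0, 0.6]
print(neurons_to_keep(scores, 2))  # [1, 3]
```

Returning the kept indices in their original order makes it straightforward to slice the corresponding rows and columns out of the weight matrices afterwards.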

Distillation parameters

We distilled the model with peak learning rate=1e-4, minimum learning rate=4.5e-7, linear warm up of 60 steps, cosine decay schedule, and a global batch size of 768 using 380 billion tokens (the same dataset used in teacher fine-tuning).
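The learning-rate schedule described above can be written down directly. This is a minimal sketch assuming warmup starts from zero and the cosine decay ends exactly at the last step; the sequence length of 8192 used to turn the 380-billion-token budget into a step count is hypothetical, since it is not stated here.

```python
import math

PEAK_LR = 1e-4
MIN_LR = 4.5e-7
WARMUP_STEPS = 60


def learning_rate(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay down to MIN_LR at total_steps."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS  # assumption: warmup starts from 0
    progress = min((step - WARMUP_STEPS) / (total_steps - WARMUP_STEPS), 1.0)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))


def steps_for_tokens(tokens: int, global_batch: int, seq_len: int) -> int:
    """Optimizer steps needed to consume `tokens`, rounded up."""
    tokens_per_step = global_batch * seq_len
    return -(-tokens // tokens_per_step)


SEQ_LEN = 8192  # hypothetical: the sequence length is not stated in the article
total = steps_for_tokens(380_000_000_000, 768, SEQ_LEN)
print(total)  # 60400
```

Under these assumptions, the 60-step warmup is a tiny fraction of roughly 60,000 optimizer steps, so nearly the whole run follows the cosine curve from 1e-4 down to 4.5e-7.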

Mistral-NeMo-Minitron 8B provides class-leading accuracy and consistently outperforms recently introduced state-of-the-art models of similar size. Mistral-NeMo-Minitron 8B is our first work on the distillation of the Mistral NeMo 12B model and provides strong support for our structured weight pruning combined with knowledge distillation best practices. Further work distilling and obtaining even smaller and more accurate models is planned. The technique implementation will be gradually rolled out in the NVIDIA NeMo framework for generative AI.

To learn more, check out these resources:

  • LLM Pruning and Distillation in Practice: The Minitron Approach
  • Compact Language Models via Pruning and Knowledge Distillation  
  • NVlabs/Minitron GitHub repo 
  • Mistral-NeMo-Minitron 8B base model on Hugging Face

Acknowledgments

This work would not have been possible without contributions from many people at NVIDIA. To mention a few of them:

  • Foundation model: Sharath Turuvekere Sreenivas, Saurav Muralidharan, Raviraj Joshi, Marcin Chochowski, Pavlo Molchanov, Mostofa Patwary, Daniel Korzekwa, Ashwath Aithal, Mohammad Shoeybi, Bryan Catanzaro, and Jan Kautz
  • Alignment: Ameya Sunil Mahabaleshwarkar, Hayley Ross, Brandon Rowlett, Oluwatobi Olabiyi, Shizhe Diao, and Yoshi Suhara
  • Datasets: Sanjeev Satheesh, Jupinder Parmar, Shengyang Sun, Jiaqi Zeng, Zhilin Wang, Yi Dong, Zihan Liu, Rajarshi Roy, Wei Ping, Makesh Narsimhan Sreedhar, and Oleksii Kuchaiev
  • TensorRT-LLM: Bobby Chen, James Shen and Chenhan Yu
  • Hugging Face support: Ao Tang, Yoshi Suhara, and Greg Heinrich

Related resources

  • GTC session: Generative AI Theater: Mixtral of Experts Explained
  • GTC session: Exploring Foundation Models: The Pillars of AI Advancement
  • GTC session: Mistral AI: Frontier AI in Your Hands
  • NGC Containers: Mistral-7B-Instruct-v0.3
  • SDK: NeMo Megatron
  • SDK: NeMo LLM Service

IMAGES

  1. How to Install eSpeak Text to Speech Software on Ubuntu 20.04
  2. Text to Speech: A Look at the Qt Speech Module
  3. How to Convert Text to Speech on Linux: 12 Steps (with Pictures)
  4. How To: Text to speech in linux terminal
  5. Speech Recognition to Text in Linux, Ubuntu using Google Docs
  6. Text To Speech On Linux With Festival

COMMENTS

  1. eSpeak: Text To Speech Tool For Linux

    eSpeak: Text To Speech Tool For Linux. eSpeak is a command line tool for Linux that converts text to speech. This compact speech synthesizer provides support for English and many other languages. It is written in C. eSpeak reads the text from the standard input or input file. The voice generated, however, is nowhere close to a human voice.

  2. An In-Depth Guide to Open Source Text-to-Speech Engines for Linux

    This comprehensive guide explores the top open source text-to-speech (TTS) engines available for Linux. Converting text into lifelike speech is useful for accessibility, delivering information via voice interfaces, learning pronunciation, and more. We'll cover the capabilities of leading Linux TTS tools, their installation, and plenty of usage examples. Introduction to Text-to-Speech Text-to ...

  3. Best Text to Speech Software for Linux in 2024

    Here is a step by step guide to installing TTS engines and libraries on your Linux system: 1. Research and select a TTS software: Explore available TTS engines and libraries compatible with your Linux system. Popular choices include eSpeak, Acapella, and Cepstral. 2.

  4. 13 Best Free Linux Speech Recognition Tools

    TensorFlow implementation of Baidu's DeepSpeech architecture. Julius. Two-pass large vocabulary continuous speech recognition engine. OpenSeq2Seq. TensorFlow-based toolkit for sequence-to-sequence models. CMUSphinx. Speech recognition system for mobile and server applications. Eesen. End-to-End Speech Recognition.

  5. 7 Best Open Source Text-to-Speech (TTS) Engines

    The 7 Best Open Source Text-to-Speech (TTS) Engines. Here are some well-known open-source TTS engines: 1. MaryTTS (Multimodal Interaction Architecture) A flexible, modular architecture for building TTS systems, including a voice-building tool for generating new voices from recorded audio data.

  6. Top 15 Open Source Speech Recognition/TTS/STT/ Systems

    A text-to-speech (TTS) system, on the contrary, is a method to generate audio from textual data and files. You basically give it the text, and it generates the corresponding speech audio for it. ... Hanny brings more than a decade of experience with Linux and open-source software. He has developed Linux distributions, desktop programs, web ...

  7. eSpeak NG Text-to-Speech

    The eSpeak NG is a compact open source software text-to-speech synthesizer for Linux, Windows, Android and other operating systems. It supports more than 100 languages and accents. It is based on the eSpeak engine created by Jonathan Duddington. eSpeak NG uses a "formant synthesis" method. This allows many languages to be provided in a small size.

  8. High Quality Text to Speech Software

    Cepstral, paid TTS software for Linux, can speak any text it is given in whatever voice you choose. Cepstral is building new synthetic voices for Text-to-Speech (TTS) every day, and can find or build the right one for any application. Note that Cepstral is a non-free program for Linux that costs $40.

  9. Text to Speech for Linux: Unveiling Top Solutions for Voice Synthesis

    Text-to-speech (TTS) technology on Linux allows users to convert written text into spoken words. This functionality is not only useful for the visually impaired but also benefits those who prefer auditory learning or require hands-free computing. Several TTS tools are available for Linux, each offering varying features to cater to diverse needs.

  10. Convert Text To Speech Using eSpeak NG In Linux

    Type the word to speak and hit ENTER key. To exit, press CTRL+C. 4. If you want to save output to a WAV audio file, rather than speaking it directly, use -w flag: $ espeak-ng -w audio.wav "I use Arch, BTW". 5. eSpeak can also print the phonemes of a text.

  11. 10 Best Open-source Speech Recognition Tools for Linux

    7. Mycroft. Mycroft has an easy-to-use open source voice assistant that converts voice to text. It is regarded as one of the most popular Linux speech recognition tools in modern times, written in Python. It allows users to make the best use of this tool in a science project or enterprise software application.

  12. Best Text to Speech Software for Linux

    Digital Future. TextSpeech Pro is a professional text-to-speech software product, proudly awarded "the best text to speech software in the world". Synthesize text-to-speech from any document format (text, Microsoft Word, PDF, Microsoft Excel, RTF, etc) using a variety of voices and languages.

  13. How to Convert Text to Speech on Linux

    Benefits of text-to-speech on Linux. Accessibility: Text-to-speech (TTS) is the best friend for choice compared to proprietary software. Common use cases for text-to-speech applications. Accessibility Tools: TTS, in short, is an artificial intelligence feature that has a role in screen readers that is used, ...

  14. How to Install eSpeak Text to Speech Software on Ubuntu 20.04

    eSpeak command can be used to convert text into speech. You can give any text file as an input or enter the text on the terminal for conversion. Let's speak the line "Hi this is a sample" and record it to the sample.wav audio file. espeak "Hi this is a sample" -w sample.wav -g 60 -p 70 -s 100 -v en-us. Here, -w parameter specifies the ...

  15. 15 Open-source Text To Speech TTS Apps and Libraries

    10- ESPnet: end-to-end speech processing toolkit. ESPnet is an end-to-end speech processing toolkit, mainly focusing on end-to-end speech recognition and end-to-end text-to-speech. It is a developer-friendly application that can be integrated into web projects. Developers can also install it using Docker.

  16. software recommendation

    gTTS, Google Text-to-Speech. gTTS is a Python library and CLI tool to interface with Google Translate's text-to-speech API. It writes spoken MP3 data to a file, a file-like object (bytestring) for further audio manipulation, or stdout. Cons: no GUI (CLI and library only). It needs to be online, as it sends requests to Google's public endpoint.

  17. Mimic 3

    Mimic 3 is a neural text to speech engine that can run locally, even on low-end hardware like the Raspberry Pi 4. The software speaks over 25 languages with over 100 pre-trained voices. Mimic 3 uses VITS, a "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech". Mimic 3 is free and open source software.

  18. eSpeak- A text to speech opensource software for Linux

    eSpeak Speech Synthesizer is an open source speech synthesizer for Windows, Mac and Linux-based operating systems. It provides the option of listening to text in multiple languages. The speech is clear, and text can easily be listened to in English or in any of the other available languages. eSpeak does text to speech synthesis ...

  19. eSpeak: Speech Synthesizer

    The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings. eSpeak is available as: A command line program (Linux and Windows) to speak text from a file or from stdin.
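    Both input modes can be driven from a script. This is a minimal Python sketch, assuming `espeak` is installed (`espeak-ng` accepts the same two options): `-f` names a text file to read, and `--stdin` reads the text from standard input, which `subprocess.run(..., input=...)` supplies. The helper names and the file name `notes.txt` are hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def input_command(text_file: Optional[str] = None, program: str = "espeak") -> list[str]:
    """Return the arguments for reading text from a file (-f) or from stdin (--stdin)."""
    if text_file is not None:
        return [program, "-f", text_file]
    return [program, "--stdin"]


def speak_from_stdin(text: str, program: str = "espeak") -> None:
    """Pipe text into the synthesizer's standard input so it is spoken aloud."""
    subprocess.run(input_command(None, program), input=text, text=True, check=True)


if __name__ == "__main__":
    print(input_command("notes.txt"))  # file mode
    print(input_command())             # stdin mode
    if shutil.which("espeak"):
        speak_from_stdin("Reading from standard input.\n")
```

    Reading from stdin lets eSpeak sit at the end of a shell-style pipeline, while `-f` suits longer documents already saved to disk.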

  20. Top Text to Speech Software for Linux in 2024

    Wavel.ai. Wavel is an AI Dubbing Studio which personalizes videos at scale. Wavel uses the power of artificial intelligence and human intelligence to create realistic speech, text, and text-to-text solutions. Wavel is the only tool to generate the most precise localized files with over 99% accuracy and a record ...

  21. Text to Speech synthesis software

    The practically usable alternatives for converting text to speech using free software on GNU/Linux desktop and laptop machines are: mimic from Mycroft, forked off an early version of the flite software, is the best choice if you are only interested in the English language. festival is actively developed and it works fine but it is not great and ...

  22. Text-to-Speech Software

    Greetings fellow tech enthusiasts, I've been meandering through the intricate alleys of Text-to-Speech (TTS) technology, particularly in the Linux environment. It's a fascinating yet, at times, exasperating expedition, given the current state of affairs. Let me unravel my findings and concerns in detail.

  23. GitHub

    This repository implements a speech-to-speech cascaded pipeline with consecutive parts: Voice Activity Detection (VAD): Silero VAD v5; Speech to Text (STT): Whisper checkpoints (including distilled versions); Language Model (LM): any instruct model available on the Hugging Face Hub! 🤗; Text to Speech (TTS): Parler-TTS 🤗
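    In a cascade like this, each stage's output feeds the next: VAD finds speech, STT transcribes it, the LM writes a reply, and TTS voices that reply. The Python sketch below shows only that control flow. The stage functions are hypothetical toy stand-ins, not the repository's code and not the Silero, Whisper, or Parler-TTS APIs; in a real system each would wrap the corresponding model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Cascade:
    """Speech-to-speech pipeline: VAD -> STT -> LM -> TTS, run in order."""
    vad: Callable[[list[float]], Optional[list[float]]]  # audio -> speech segment, or None
    stt: Callable[[list[float]], str]                     # speech -> transcript
    lm: Callable[[str], str]                              # transcript -> reply text
    tts: Callable[[str], list[float]]                     # reply text -> audio samples

    def process(self, audio: list[float]) -> Optional[list[float]]:
        speech = self.vad(audio)
        if speech is None:  # no voice activity, so there is nothing to answer
            return None
        transcript = self.stt(speech)
        reply = self.lm(transcript)
        return self.tts(reply)


# Hypothetical toy stages so the sketch runs without any models.
def toy_vad(audio: list[float]) -> Optional[list[float]]:
    # Keep samples above a fixed amplitude threshold; nothing left means silence.
    voiced = [s for s in audio if abs(s) > 0.1]
    return voiced or None


def toy_stt(speech: list[float]) -> str:
    return f"heard {len(speech)} samples"


def toy_lm(text: str) -> str:
    return text.upper()


def toy_tts(text: str) -> list[float]:
    # One "sample" per character, standing in for synthesized audio.
    return [float(ord(c)) for c in text]


if __name__ == "__main__":
    pipeline = Cascade(toy_vad, toy_stt, toy_lm, toy_tts)
    print(pipeline.process([0.0, 0.5, -0.3, 0.05]))
    print(pipeline.process([0.0, 0.01]))  # silence: prints None
```

    Because every stage is just a callable, any one of them can be swapped for a different model without touching the rest of the pipeline, which is the main appeal of the cascaded design.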

  24. User Experience with Voice-Activated Speech-to-Text Software

    Speech-to-text software converts spoken language into written text in real-time. It transcribes spoken words, phrases, and sentences into digital text that can be used for various purposes, such as creating documents, sending messages, or inputting data. This software is often used for accessibility, documentation, and improving communication ...

  25. The Linux User's Ultimate Guide to Text Editors

    Use the arrow keys to move the cursor to the end of the text you want to work with. The text between the two points will be highlighted. You can either cut or copy it. Alt+6 copies the text. Ctrl+K cuts the text. Now that the text is in the buffer (on the "clipboard"), move the cursor to the point where you want the content pasted.

  26. Mistral-NeMo-Minitron 8B Foundation Model Delivers Unparalleled

    Last month, NVIDIA and Mistral AI unveiled Mistral NeMo 12B, a leading state-of-the-art large language model (LLM). Mistral NeMo 12B consistently outperforms similarly sized models on a wide range of benchmarks. Today, we announce Mistral-NeMo-Minitron 8B, one of the most advanced open-access models in its size class.