ElevenLabs: the most realistic speech generator

Admin · November 7, 2024, 9:28pm

ElevenLabs is designed to convert text into speech using AI voices.

Credentials

Scade provides credentials for this service by default. However, if you already have an ElevenLabs account, you can use your own credentials here.

Text

This is the content you want ElevenLabs to convert to speech. It can be any written text that you would like to hear spoken aloud.

Voice

Lets you choose a specific voice to read your text aloud, providing options for different genders, tones, accents, and styles.
If you enter your own credentials, the “Is Custom Voice ID” toggle will appear, allowing you to enter the ID of a custom voice.

Stability

This setting controls how consistent the voice sounds. A higher stability value (up to 1.0) will make the voice more predictable and steady, while a lower value can add more natural variations to make it sound less robotic and more like real speech.

Similarity boost

This controls how closely the generated voice tries to match the original voice. A higher similarity boost (up to 1.0) means it will try harder to mimic the original speaker, while a lower boost allows for more flexibility in the voice.

Style

This controls how strongly the tone or emotion of the original speaker comes through. A higher value amplifies the original speaker’s style more strongly in the generated speech.

Use speaker boost

When enabled, this can enhance the clarity or presence of the speaker’s voice, making it stand out more in the audio.
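
To make the four voice settings above concrete, here is a minimal Python sketch of how they could be grouped into one settings object. The field names (stability, similarity_boost, style, use_speaker_boost) and the default values are assumptions modelled on the public ElevenLabs API, not something the Scade node is confirmed to use internally. The sketch also assumes that Style uses the same 0.0 to 1.0 range as the other two sliders.

```python
def build_voice_settings(stability=0.5, similarity_boost=0.75, style=0.0,
                         use_speaker_boost=True):
    """Return a voice-settings dict, rejecting sliders outside 0.0 to 1.0.

    Field names and defaults are assumptions for illustration only.
    """
    sliders = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
    }
    for name, value in sliders.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    # The boost is a simple on/off switch, so it is stored as a boolean.
    return {**sliders, "use_speaker_boost": bool(use_speaker_boost)}


# A lower stability for more variation, a higher similarity boost
# to stay close to the original voice.
settings = build_voice_settings(stability=0.3, similarity_boost=0.9)
```

Validating the range up front gives a clear error message instead of a rejected request later on.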

Optimize streaming latency

This helps reduce delay when streaming speech output, making the response time faster. It’s useful for real-time applications where you want the speech to be heard immediately.

Custom output format

You can choose the format in which the audio is saved or streamed. For example, you may want MP3, WAV, or another audio file type, depending on your needs.
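
For readers curious how the streaming latency and output format options might travel with a request, here is a rough sketch that builds a request URL. The endpoint, the parameter names (optimize_streaming_latency, output_format), and the example value mp3_44100_128 are assumptions based on the public ElevenLabs API. "my-voice-id" is a made-up placeholder, and the Scade node may build its request differently.

```python
from urllib.parse import urlencode

# Assumed endpoint, shown for illustration only.
BASE_URL = "https://api.elevenlabs.io/v1/text-to-speech"


def build_request_url(voice_id, optimize_streaming_latency=None, output_format=None):
    """Build the request URL, adding only the options that were actually set."""
    params = {}
    if optimize_streaming_latency is not None:
        params["optimize_streaming_latency"] = optimize_streaming_latency
    if output_format is not None:
        params["output_format"] = output_format
    query = urlencode(params)
    return f"{BASE_URL}/{voice_id}" + (f"?{query}" if query else "")


# "my-voice-id" is a hypothetical placeholder, not a real voice.
url = build_request_url("my-voice-id", optimize_streaming_latency=2,
                        output_format="mp3_44100_128")
```

Leaving an option unset keeps it out of the URL entirely, so the service falls back to its own default.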

Pronunciation dictionary locators

This allows you to specify a dictionary that provides custom pronunciations for specific words. If the AI struggles with pronouncing special names or uncommon words, you can use this to guide it.
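
As an illustration, a locator is essentially a reference to a dictionary that already exists in your account. The sketch below assumes each locator is a small object with a dictionary ID and an optional version ID. The key names (pronunciation_dictionary_id, version_id) follow the public ElevenLabs API as far as I know, and the IDs are made-up placeholders.

```python
def build_locators(dictionaries):
    """Turn (dictionary_id, version_id) pairs into a list of locator dicts.

    The key names are assumptions; version_id may be None to omit it.
    """
    locators = []
    for dictionary_id, version_id in dictionaries:
        if not dictionary_id:
            raise ValueError("each locator needs a dictionary ID")
        locator = {"pronunciation_dictionary_id": dictionary_id}
        if version_id:
            locator["version_id"] = version_id
        locators.append(locator)
    return locators


# Hypothetical IDs, for illustration only.
locators = build_locators([("dict_names", "v1"), ("dict_brands", None)])
```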

Timeout in seconds

The maximum time the node will wait before giving up on a request. For example, if the text-to-speech process takes too long, the request will fail after this period.

Max retries

This specifies the number of times the node will retry your request if something goes wrong. If the first request fails, the node will repeat it up to this many times before giving up.
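
The way a timeout and a retry count work together can be sketched in a few lines of Python. This is only a model of the behaviour described above, not the node's actual implementation: the helper name call_with_retries, the optional pause between attempts, and the choice to retry on any error are all assumptions.

```python
import time


def call_with_retries(send, timeout_seconds=30.0, max_retries=2, backoff_seconds=0.0):
    """Call send(timeout_seconds) once, then retry up to max_retries more times.

    `send` is any function that performs the request and raises on failure
    (for example, when the timeout is exceeded).
    """
    attempts = 1 + max(0, max_retries)
    last_error = None
    for attempt in range(attempts):
        try:
            return send(timeout_seconds)
        except Exception as error:  # assumption: every failure is retried
            last_error = error
            if attempt < attempts - 1 and backoff_seconds > 0:
                time.sleep(backoff_seconds)
    # Every attempt failed, so surface the most recent error.
    raise last_error
```

With max_retries set to 2, a request is attempted at most three times in total: the first try plus two retries.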

Additional headers

These are extra pieces of information you can send along with your request, often used for things like security or content type. Most users won’t need to worry about this unless they’re doing advanced API work.

Additional query parameters

These are extra filters or options you can pass in the URL when making your request. They are used to fine-tune what you’re asking the API to do.

Additional body parameters

These are additional settings or data points you send along with the text if you want more control over how the speech is generated. Include them in the main body of your request.
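
To show where the three kinds of additional parameters end up, here is a small sketch that assembles a request from its parts. It is an illustration under several assumptions: the xi-api-key header name, the enable_logging query option, and the model_id body field are taken from the public ElevenLabs API, while "X-Trace" and "YOUR_API_KEY" are made-up placeholders. It also assumes that extra values are merged on top of the defaults, which may not match how the node behaves.

```python
def assemble_request(text, api_key, extra_headers=None, extra_query=None,
                     extra_body=None):
    """Combine the base request with optional extra headers, query and body values."""
    # Headers carry metadata such as authentication and content type.
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    headers.update(extra_headers or {})
    # Query parameters are appended to the URL.
    query = dict(extra_query or {})
    # Body parameters are sent alongside the text itself.
    body = {"text": text}
    body.update(extra_body or {})
    return {"headers": headers, "query": query, "json": body}


request = assemble_request(
    "Hello from Scade",
    api_key="YOUR_API_KEY",  # placeholder, not a real key
    extra_headers={"X-Trace": "1"},
    extra_query={"enable_logging": "false"},
    extra_body={"model_id": "eleven_multilingual_v2"},
)
```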

This is a simplified overview of how each of these parameters works in the ElevenLabs node, helping you customize how your text is turned into speech.

How to use the ElevenLabs node on Scade

Begin by creating a text input field in the start node and a file input field in the end node. If you’re not sure how to do this, please refer to the Getting started guide.

This node includes pre-filled fields, allowing you to try it out immediately.

When you use your own credentials, a toggle and field for adding a custom voice will appear. Simply enter the ID of your desired voice from your account and proceed with configuring the node.

Next, enter your text in the start node’s text field, connect it to the text parameter of the ElevenLabs node, and link the node’s file output to the result field of your end node.