ElevenLabs: the most realistic speech generator

ElevenLabs is designed to convert text into speech using AI voices.

Credentials

  • Scade provides credentials for the service. However, if you already have an ElevenLabs account, you can enter your own credentials here.

Text

  • This is the content you want ElevenLabs to convert to speech. It can be any written text that you would like to hear spoken aloud.

Voice

  • Lets you choose a specific voice to read your text aloud, providing options for different genders, tones, accents, and styles.
  • If you enter your own credentials, the “Is Custom Voice ID” toggle will appear, allowing you to enter the ID of a custom voice.

Stability

  • This setting controls how consistent the voice sounds. A higher stability value (up to 1.0) will make the voice more predictable and steady, while a lower value can add more natural variations to make it sound less robotic and more like real speech.

Similarity boost

  • This controls how closely the generated voice tries to match the original voice. A higher similarity boost (up to 1.0) means it will try harder to mimic the original speaker, while a lower boost allows for more flexibility in the voice.

Style

  • This controls how strongly the original speaker’s style (tone and emotion) comes through. A higher value amplifies that style more strongly in the generated speech.

Use speaker boost

  • When enabled, this can enhance the clarity or presence of the speaker’s voice, making it stand out more in the audio.
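
To make the four voice settings above (stability, similarity boost, style, speaker boost) more concrete, here is a minimal Python sketch that groups them into one object and rejects out-of-range values. The field names mirror the `voice_settings` object of the public ElevenLabs API as far as we know; the `VoiceSettings` class, its default values, and the 0.0 to 1.0 range for style are assumptions made for illustration, not a description of how the Scade node is implemented.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container for the voice settings described above."""

    stability: float = 0.5          # illustrative default, not an official one
    similarity_boost: float = 0.75  # illustrative default, not an official one
    style: float = 0.0              # assumed to share the 0.0-1.0 range
    use_speaker_boost: bool = True

    def __post_init__(self) -> None:
        # Reject values outside the 0.0-1.0 range for the numeric settings.
        for name in ("stability", "similarity_boost", "style"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")

    def to_payload(self) -> dict:
        # Plain dict that could be sent as the "voice_settings" part of a request body.
        return asdict(self)


# A lower stability for more variation, a higher similarity boost to stay close
# to the original voice.
settings = VoiceSettings(stability=0.3, similarity_boost=0.8)
payload = settings.to_payload()
```

Validating early like this gives a clear error message before any request is sent, rather than a failed request later on.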

Optimize streaming latency

  • This helps reduce delay when streaming speech output, making the response time faster. It’s useful for real-time applications where you want the speech to be heard immediately.

Custom output format

  • You can choose the format in which the audio is saved or streamed. For example, you may want MP3, WAV, or another audio format, depending on your needs.
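
For readers calling the API directly, the streaming latency and output format options typically travel as URL query parameters. The sketch below assumes the public ElevenLabs text-to-speech endpoint, the parameter names `optimize_streaming_latency` and `output_format`, a latency range of 0 to 4, and a format string such as `mp3_44100_128`; treat all of these as assumptions to check against the current API reference. `build_tts_url` and `VOICE_ID` are hypothetical names used only for illustration.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed base URL of the public text-to-speech endpoint.
BASE_URL = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_url(
    voice_id: str,
    output_format: Optional[str] = None,
    optimize_streaming_latency: Optional[int] = None,
) -> str:
    """Build the request URL, adding only the options that were set."""
    params = {}
    if output_format is not None:
        params["output_format"] = output_format
    if optimize_streaming_latency is not None:
        # Assumed valid range: 0 (no optimization) to 4 (most aggressive).
        if not 0 <= optimize_streaming_latency <= 4:
            raise ValueError("optimize_streaming_latency must be between 0 and 4")
        params["optimize_streaming_latency"] = optimize_streaming_latency
    url = f"{BASE_URL}/{voice_id}"
    return f"{url}?{urlencode(params)}" if params else url


# "VOICE_ID" is a placeholder, not a real voice.
url = build_tts_url("VOICE_ID", output_format="mp3_44100_128", optimize_streaming_latency=2)
```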

Pronunciation dictionary locators

  • This allows you to specify a dictionary that provides custom pronunciations for specific words. If the AI struggles with pronouncing special names or uncommon words, you can use this to guide it.
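
As a rough illustration, a locator can be thought of as a small record that points to one dictionary and one version of it. The sketch below assumes the key names `pronunciation_dictionary_id` and `version_id` used by the public ElevenLabs API; the helper `make_locators` and the IDs shown are placeholders, and the exact format the Scade field expects may differ.

```python
from typing import Dict, Iterable, List, Tuple


def make_locators(pairs: Iterable[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (dictionary_id, version_id) pairs into locator records."""
    locators = []
    for dictionary_id, version_id in pairs:
        if not dictionary_id or not version_id:
            raise ValueError("both a dictionary ID and a version ID are required")
        locators.append(
            {
                "pronunciation_dictionary_id": dictionary_id,
                "version_id": version_id,
            }
        )
    return locators


# Placeholder IDs; real ones come from your own ElevenLabs account.
locators = make_locators([("DICTIONARY_ID", "VERSION_ID")])
```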

Timeout in seconds

  • The maximum time the node will wait before giving up on a request. For example, if the text-to-speech process takes too long, the request will fail after this period.

Max retries

  • The number of times the node will retry your request if something goes wrong. If the first request fails, the node repeats it up to this many times before giving up.
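
The way a timeout and a retry count usually work together can be sketched in a few lines of Python. This is a generic illustration, not the node's actual implementation: `call_with_retries`, `send`, and `backoff_seconds` are hypothetical names, and the sketch assumes that "max retries" counts the attempts made after the first failure, so the total number of attempts is one more than the retry count.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    send: Callable[[float], T],
    timeout_seconds: float,
    max_retries: int,
    backoff_seconds: float = 0.0,
) -> T:
    """Call send(timeout_seconds), retrying up to max_retries times on failure."""
    if max_retries < 0:
        raise ValueError("max_retries must be zero or greater")
    last_error: Optional[Exception] = None
    # One initial attempt plus max_retries further attempts.
    for attempt in range(max_retries + 1):
        try:
            # send is expected to raise (e.g. TimeoutError) if it takes too long.
            return send(timeout_seconds)
        except Exception as exc:
            last_error = exc
            # Optionally pause before the next attempt, but not after the last one.
            if attempt < max_retries and backoff_seconds > 0:
                time.sleep(backoff_seconds)
    # Every attempt failed: surface the most recent error.
    assert last_error is not None
    raise last_error
```

With `max_retries=2`, a request that fails twice and then succeeds still returns a result; with `max_retries=1`, the same request gives up after the second failure.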

Additional headers

  • These are extra pieces of information you can send along with your request, often used for things like security or content type. Most users won’t need to worry about this unless they’re doing advanced API work.

Additional query parameters

  • Extra filters or options you can pass in the URL when making your request. These are used to fine-tune what you’re asking the API to do.

Additional body parameters

  • These are additional settings or data points sent along with the text if you want more control over how the speech is generated. They are included in the main body of the request.
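
To show where the three kinds of additional parameters end up, here is a small Python sketch that assembles a request description without sending it. It assumes the public ElevenLabs endpoint and its `xi-api-key` header; `build_request` is a hypothetical helper, `VOICE_ID` and `API_KEY` are placeholders, and the rule that extra values override the defaults is an assumption about merge order, not documented node behavior.

```python
from typing import Any, Dict, Optional


def build_request(
    text: str,
    voice_id: str,
    api_key: str,
    extra_headers: Optional[Dict[str, str]] = None,
    extra_query: Optional[Dict[str, Any]] = None,
    extra_body: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Describe a text-to-speech request as a plain dict (nothing is sent)."""
    # Headers: metadata about the request, such as authentication.
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    headers.update(extra_headers or {})

    # Query parameters: options appended to the URL after "?".
    params = dict(extra_query or {})

    # Body parameters: data sent alongside the text itself.
    body: Dict[str, Any] = {"text": text}
    body.update(extra_body or {})

    return {
        "method": "POST",
        "url": f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        "headers": headers,
        "params": params,
        "json": body,
    }


request = build_request(
    "Hello from Scade",
    "VOICE_ID",
    "API_KEY",
    extra_query={"output_format": "mp3_44100_128"},     # example value
    extra_body={"model_id": "eleven_multilingual_v2"},  # example value
)
```

The returned keys happen to match the keyword arguments of common HTTP clients, so the dict could be passed on to one, but that step is left out here on purpose.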

This is a simplified overview of how each of these parameters works in the ElevenLabs node, to help you customize how your text is turned into speech.

How to use the ElevenLabs node on Scade

Begin by creating a text input field in the Start node and a file input field in the End node. If you’re not sure how to do this, please refer to the Getting started guide.

This node includes pre-filled fields, allowing you to try it out immediately.

When you use your own credentials, a toggle and field for adding a custom voice will appear. Simply enter the ID of your desired voice from your account and proceed with configuring the node.

Next, enter your text in the Start node’s text field, connect that field to the text parameter of the ElevenLabs node, and link the node’s file output to the result field of your End node.