Text-to-Speech (TTS)

The Text-to-Speech Layer is the last stage in the /audio-chat pipeline. After generating and optionally refining the answer, Billx-Agent uses ElevenLabs to convert that text into a natural-sounding voice response.


🎯 What It Does

  • Accepts a text string (e.g. "Top 5 products are...")

  • Sends it to the ElevenLabs TTS API

  • Receives an audio file in return

  • Encodes the audio in Base64 format

  • Includes the audio in the API response so the client can play it
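
The steps above can be sketched roughly as follows. This is a minimal illustration, not Billx-Agent's actual implementation: the helper names (`buildTtsRequest`, `textToSpeechBase64`), the `apiKey` and `voiceId` options, and the injectable `fetchImpl` parameter are assumptions made for the example. The endpoint URL and `xi-api-key` header follow ElevenLabs' public REST API.

```javascript
// Illustrative sketch only; names below are hypothetical, not Billx-Agent internals.
const ELEVENLABS_URL = "https://api.elevenlabs.io/v1/text-to-speech";

// Build the HTTP request for the ElevenLabs text-to-speech endpoint.
function buildTtsRequest(text, { apiKey, voiceId }) {
  return {
    url: `${ELEVENLABS_URL}/${voiceId}`,
    init: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey,
        "Content-Type": "application/json",
        Accept: "audio/mpeg",
      },
      body: JSON.stringify({ text }),
    },
  };
}

// Send the text, read the returned audio bytes, and encode them as Base64.
async function textToSpeechBase64(text, options, fetchImpl = fetch) {
  const { url, init } = buildTtsRequest(text, options);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error(`TTS request failed with status ${response.status}`);
  }
  const audioBytes = Buffer.from(await response.arrayBuffer());
  return audioBytes.toString("base64");
}
```

The returned string is what would be placed in the audio_content field of the API response. Passing `fetchImpl` explicitly makes the function easy to exercise with a stub instead of a live API call.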


🔉 Where It's Used

  • Primarily in POST /audio-chat

  • Can also be used standalone via POST /tts
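
For the standalone route, a client call might look like the sketch below. The request shape is an assumption: this page does not document the /tts body, so the `text` field, the `baseUrl` parameter, and the `requestSpeech` helper are hypothetical. Only the audio_content field name is taken from the response example on this page.

```javascript
// Hypothetical client helper: assumes POST /tts takes { text } and returns { audio_content }.
async function requestSpeech(baseUrl, text, fetchImpl = fetch) {
  const response = await fetchImpl(`${baseUrl}/tts`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }), // assumed request shape
  });
  if (!response.ok) {
    throw new Error(`/tts request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.audio_content; // Base64-encoded audio
}
```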


🔁 Audio Response Example

{
  "refined_answer": "The top 3 selling products are A, B, and C.",
  "audio_content": "<base64-encoded-audio>"
}

You can play the audio in your frontend using JavaScript like this:

// `data` is the parsed JSON response from /audio-chat or /tts
const audio = new Audio(`data:audio/mpeg;base64,${data.audio_content}`);
audio.play();
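
For longer clips a data URI can get large. One alternative is to decode the Base64 string into a Blob and play it from an object URL. The sketch below is a minimal example: `base64ToAudioBlob` and `playAudioContent` are illustrative helper names, and the `audio/mpeg` type assumes the default MP3 output.

```javascript
// Decode a Base64 string into a Blob that audio players can consume.
function base64ToAudioBlob(base64, mimeType = "audio/mpeg") {
  const binary = atob(base64); // one character per decoded byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i += 1) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new Blob([bytes], { type: mimeType });
}

// Browser-only: play the audio and release the object URL when playback ends.
function playAudioContent(base64) {
  const url = URL.createObjectURL(base64ToAudioBlob(base64));
  const audio = new Audio(url);
  audio.addEventListener("ended", () => URL.revokeObjectURL(url));
  return audio.play(); // returns a Promise that rejects if playback is blocked
}
```

Because `play()` returns a Promise, callers can attach a `.catch` handler to fall back to showing the text answer when the browser blocks autoplay.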

⚙️ Output Format

  • Default format: MP3 (via ElevenLabs)

  • Returned as a Base64 string in the audio_content field of the API response

  • Once decoded, playable in web, mobile, and desktop audio players


🎙️ Voice Customization (Advanced)

If supported by your ElevenLabs plan, you can:

  • Choose different voices

  • Adjust speech rate and pitch

  • Localize for different languages (future-ready)
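
As a rough illustration of what such customization could look like at the request level, the sketch below builds a JSON body for ElevenLabs' text-to-speech endpoint. The helper name `buildTtsBody` and all numeric values are assumptions chosen for the example. `model_id` and the `voice_settings` fields `stability`, `similarity_boost`, and `speed` come from ElevenLabs' public API, but which of them are honored depends on the model and plan, so check the ElevenLabs API reference before relying on them. Pitch is not shown, since the sketch is limited to those fields.

```javascript
// Hypothetical helper: builds a JSON body for ElevenLabs' text-to-speech endpoint.
function buildTtsBody(text, { modelId, stability, similarityBoost, speed } = {}) {
  const voiceSettings = {};
  if (stability !== undefined) voiceSettings.stability = stability;
  if (similarityBoost !== undefined) voiceSettings.similarity_boost = similarityBoost;
  if (speed !== undefined) voiceSettings.speed = speed;

  const body = { text };
  if (modelId !== undefined) body.model_id = modelId;
  if (Object.keys(voiceSettings).length > 0) body.voice_settings = voiceSettings;
  return body;
}

// The voice itself is selected by the voice ID in the request URL, not in the body.
const body = buildTtsBody("Top 5 products are...", {
  modelId: "eleven_multilingual_v2", // example model ID
  stability: 0.5,                    // example values only
  similarityBoost: 0.75,
  speed: 1.1,
});
```

Omitted options are left out of the body entirely, so the API's own defaults apply.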


📌 TTS is optional. If your users prefer reading results, the client can simply ignore the audio_content field.

