Voice Input

Voice queries are a powerful feature of Billx-Agent, enabling users to speak naturally and receive database results without typing. To ensure accurate transcription and reliable responses, follow these best practices.


✅ DO

✔️ Speak Clearly and Naturally

  • Use a steady speaking pace

  • Avoid trailing off or mumbling

  • Speak as if you’re giving a command, like:

    “Show me top 10 products by sales in the last month”


✔️ Use Concise Voice Prompts

  • Ideal voice prompts are under 15 seconds

  • Long, multi-part queries can reduce transcription accuracy

🎯 Good: “How many new users signed up this week?”

⚠️ Risky: “Give me users by signup date but only if they purchased in June and are from Canada or the US and they cancelled later”


✔️ Use High-Quality Audio

  • Preferred formats: .mp3 or .wav (mono)

  • Sample rate: 16 kHz or higher

  • Limit background noise and echo

📱 On mobile? Use the built-in mic and speak close to the device.
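
The audio guidelines above can be checked locally before a clip is uploaded. The following is a minimal sketch using only Python's standard `wave` module; it covers .wav files only, and the function names `check_wav` and `make_silent_wav` are illustrative, not part of Billx-Agent. The thresholds come from the guidance on this page (mono, 16 kHz or higher, under 15 seconds).

```python
import io
import wave

MAX_SECONDS = 15.0        # from the "under 15 seconds" guideline
MIN_SAMPLE_RATE = 16_000  # from the "16 kHz or higher" guideline


def check_wav(source):
    """Return a list of problems found in a WAV file (empty list means OK).

    `source` may be a file path or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    if duration >= MAX_SECONDS:
        problems.append(f"clip is {duration:.1f} s; keep prompts under {MAX_SECONDS:.0f} s")
    return problems


def make_silent_wav(seconds, rate, channels=1):
    """Build an in-memory 16-bit PCM WAV of silence, for trying out check_wav."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(seconds * rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(check_wav(make_silent_wav(2, 16_000)))             # no problems expected
    print(check_wav(make_silent_wav(1, 8_000, channels=2)))  # stereo and low sample rate
```

A check like this only inspects the file header; it cannot detect background noise or echo, so listening to a sample recording is still worthwhile.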


✔️ Test Voice Equivalents of Text Prompts

Make sure your spoken prompts follow the same structure as effective text prompts:

Voice: “Orders over $1,000 in March 2024”

Text equivalent: "Show orders over $1000 from March 2024"


❌ AVOID

❌ Using Filler Words

Avoid words like:

  • “Uh...”, “Okay so like...”, “I guess I want to maybe...”

Filler words confuse the transcription and make the generated SQL less accurate.


❌ Uploading Non-Speech Audio

  • Do not upload music, multi-speaker podcasts, or unclear voice recordings

  • The STT engine is optimized for single-speaker, query-style input


❌ Giving Update or Delete Commands

❌ “Remove all the failed transactions from the last quarter”

❌ “Delete accounts that are inactive”

Voice input is treated as read-only intent and will not generate destructive SQL.
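
A client application may want to warn users before they record a prompt that will be rejected. The sketch below is one way to flag such prompts on the client side; it is not how Billx-Agent itself detects intent, and the verb list and the `looks_like_write_request` name are assumptions made for illustration.

```python
import re

# Hypothetical verb list; the service's actual intent rules are not documented here.
WRITE_VERBS = ("delete", "remove", "drop", "update", "insert", "truncate", "alter")
_WRITE_PATTERN = re.compile(r"\b(" + "|".join(WRITE_VERBS) + r")\b", re.IGNORECASE)


def looks_like_write_request(prompt: str) -> bool:
    """Return True if the prompt contains a verb that implies modifying data."""
    return _WRITE_PATTERN.search(prompt) is not None


if __name__ == "__main__":
    for prompt in (
        "Delete accounts that are inactive",
        "How many new users signed up this week?",
    ):
        print(prompt, "->", looks_like_write_request(prompt))
```

Whole-word matching keeps a read-only prompt such as “Show accounts updated last week” from being flagged, at the cost of missing less common phrasings.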


🔁 Voice + Text Option

You can also send a text override along with your audio via /audio-chat, as a fallback in case transcription fails:

{
  "text": "Top 5 customers by purchase value last year"
}
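
As a rough illustration, a client could attach the override to the same request that carries the audio. The sketch below builds such a request with Python's standard library only. Only the /audio-chat path and the `text` field come from this page; the multipart encoding, the `audio` part name, the base URL, and the `build_audio_chat_request` helper are assumptions, so check your deployment's API reference for the actual request format.

```python
import urllib.request
import uuid


def build_audio_chat_request(base_url, audio_bytes, filename, text=None):
    """Build (but do not send) a multipart POST for the /audio-chat endpoint.

    Assumptions: the audio travels in a form part named "audio" and the
    override in a form field named "text". Neither is confirmed by this page.
    """
    boundary = uuid.uuid4().hex
    parts = []
    if text is not None:
        # Optional text override, used if speech transcription fails.
        parts.append(
            f'--{boundary}\r\n'
            'Content-Disposition: form-data; name="text"\r\n\r\n'
            f'{text}\r\n'.encode("utf-8")
        )
    content_type = "audio/wav" if filename.lower().endswith(".wav") else "audio/mpeg"
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        f'Content-Type: {content_type}\r\n\r\n'.encode("utf-8")
        + audio_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio-chat",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder bytes and a placeholder host, for illustration only.
    request = build_audio_chat_request(
        "http://localhost:8000",
        b"placeholder audio bytes",
        "query.wav",
        text="Top 5 customers by purchase value last year",
    )
    print(request.get_method(), request.full_url)
```

The request can then be sent with `urllib.request.urlopen(request)`. Building it separately from sending makes the payload easy to inspect while the field names are being confirmed.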

🧠 Voice commands are powerful. With the right phrasing and audio quality, your users can get spoken answers from live data in seconds.
