Input Handler (Text / Voice)

The Input Handler is the first processing layer in Billx-Agent’s pipeline. It accepts user input in either text or audio form and prepares it for the AI Query Engine.


🎯 What It Does

  • Accepts input via:

    • POST /chat: Text prompt

    • POST /audio-chat: Voice or text

  • Automatically detects the input type:

    • If audio is provided, it uses ElevenLabs Speech-to-Text (STT) to transcribe it.

    • If text is provided directly, transcription is skipped.
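
The detection rule above can be sketched as a small dispatch function. This is a minimal illustration, not Billx-Agent's actual code: `resolve_prompt` and `fake_transcribe` are hypothetical names, and the ElevenLabs STT call is replaced by an injected callable so the sketch runs without network access.

```python
from typing import Callable, Optional


def resolve_prompt(
    text: Optional[str],
    audio: Optional[bytes],
    transcribe: Callable[[bytes], str],
) -> str:
    """Return the prompt text, transcribing only when audio is supplied."""
    if audio:
        # Audio present: run speech-to-text (ElevenLabs STT in Billx-Agent;
        # here an injected stand-in).
        return transcribe(audio)
    if text is not None and text.strip():
        # Text supplied directly: transcription is skipped.
        return text
    # Assumed behavior: the source does not say how empty requests are handled.
    raise ValueError("request must contain either text or audio")


def fake_transcribe(audio: bytes) -> str:
    """Hypothetical stand-in for the real STT service."""
    return "Top 5 products by sales"


print(resolve_prompt("Top 5 products by sales", None, fake_transcribe))
print(resolve_prompt(None, b"\x00\x01", fake_transcribe))
```

Passing the transcriber in as a parameter keeps the text path free of any STT dependency and makes the branch easy to test.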


🧼 Pre-Processing

Before passing the prompt to the AI, the input handler performs:

  • Whitespace and formatting cleanup

  • Punctuation standardization

  • Validation to ensure it's a supported query format
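
The three steps above might look like the following sketch. The source names the steps but not their rules, so everything specific here is an assumption: the `preprocess` name, the smart-quote mapping, the collapsing of repeated punctuation, and the `MAX_PROMPT_CHARS` limit are all hypothetical.

```python
import re

# Assumed mapping: curly quotes and en dashes to their ASCII equivalents.
_PUNCT_MAP = str.maketrans({
    "\u201c": '"', "\u201d": '"',
    "\u2018": "'", "\u2019": "'",
    "\u2013": "-",
})
MAX_PROMPT_CHARS = 500  # assumed limit, not taken from the source


def preprocess(prompt: str) -> str:
    """Clean a raw prompt and reject clearly unusable input."""
    # Whitespace cleanup: collapse runs of whitespace and trim the ends.
    cleaned = re.sub(r"\s+", " ", prompt).strip()
    # Punctuation standardization: normalize quotes, collapse repeats ("??" -> "?").
    cleaned = cleaned.translate(_PUNCT_MAP)
    cleaned = re.sub(r"([!?.,])\1+", r"\1", cleaned)
    # Validation: a stand-in for the "supported query format" check.
    if not cleaned:
        raise ValueError("prompt is empty")
    if len(cleaned) > MAX_PROMPT_CHARS:
        raise ValueError("prompt is too long")
    if not any(ch.isalnum() for ch in cleaned):
        raise ValueError("prompt contains no words")
    return cleaned


print(preprocess("  Top   5 products\n by sales??  "))
```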


🎤 Audio Input Flow

  1. User uploads a .wav or .mp3 audio file

  2. ElevenLabs STT converts speech → text

  3. Transcribed prompt is treated like a regular /chat input
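
The flow above can be sketched end to end as follows. This is an illustration under stated assumptions: `handle_audio_upload`, `demo_transcribe`, and `demo_chat` are hypothetical names, the extension check is an assumed way of enforcing the .wav/.mp3 restriction, and the real ElevenLabs STT request is replaced by an injected callable.

```python
from pathlib import Path
from typing import Callable

SUPPORTED_AUDIO = {".wav", ".mp3"}  # formats named in step 1


def handle_audio_upload(
    filename: str,
    data: bytes,
    transcribe: Callable[[bytes], str],
    handle_chat: Callable[[str], dict],
) -> dict:
    """Validate the upload, transcribe it, then reuse the text path."""
    # Step 1: accept only supported file types (case-insensitive).
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_AUDIO:
        raise ValueError(f"unsupported audio format: {suffix or 'none'}")
    # Step 2: speech -> text via the injected transcriber.
    prompt = transcribe(data)
    # Step 3: hand the transcript to the same handler a /chat request uses.
    return handle_chat(prompt)


def demo_transcribe(data: bytes) -> str:
    """Hypothetical stand-in for the STT service."""
    return "Top 5 products by sales"


def demo_chat(prompt: str) -> dict:
    """Hypothetical stand-in for the /chat handler."""
    return {"prompt": prompt}


print(handle_audio_upload("query.MP3", b"\x00", demo_transcribe, demo_chat))
```

Routing the transcript through the same handler as text input is what keeps the two paths behaviorally identical after step 2.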


✏️ Example Input Flow

| Input Type | Example | Process |
|---|---|---|
| Text | "Top 5 products by sales" | Sent directly to AI for SQL generation |
| Audio | Spoken query: same as above | Transcribed → cleaned → forwarded |


✔ Because both paths end in the same cleaned text prompt, clients can switch between text and voice input with no extra code on the client side.
