
uhlive.stream.recognition

The stream recognition API SDK for voice bots.

Stream for voicebots (also called Stream Human to Bots, or Stream H2B) is a set of APIs enabling clients to build interactions between a human end-user and a bot: for example, an Interactive Voice Response (IVR) system on the phone, or a voicebot within an app.

For an overview of the concepts, protocols and workflow, see the higher level documentation and more specifically the Websocket H2B protocol reference.

The protocol is message-based and uses websockets as transport. You are free to use whatever websocket client library you like to communicate with the API, and to use our SDK to encode/decode the messages.

Quickstart

First, retrieve a one-time access token with the Auth API.

Then use that token to build an authenticated URL, open a websocket connection to it with the websocket client library of your choice, and instantiate a Recognizer to make requests, generate audio stream messages and decode responses.

As the API is asynchronous, streaming the audio and reading the returned events should be done in two different threads/tasks.

from uhlive.stream.recognition import *

stream_h2b_url, stream_h2b_headers = build_connection_request(token)
recognizer = Recognizer()

Now you can connect and interact with the API:

Synchronous example:

import websocket as ws

socket = ws.create_connection(stream_h2b_url, header=stream_h2b_headers)
socket.send(recognizer.open())
# Check if successful
reply = recognizer.receive(socket.recv())
assert isinstance(reply, Opened), f"Expected Opened, got {reply}"
# start streaming the user's voice in another thread
streamer_thread_handle = stream_mic(socket, recognizer)
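The `stream_mic` helper above is not part of the SDK. Here is a minimal sketch of what it could look like, assuming an audio `source` object with a blocking `read(n)` method that yields raw 16-bit 8kHz linear PCM; the helper name, chunk size and pacing are illustrative:

```python
import threading
import time


def stream_mic(socket, recognizer, source, chunk_size=1600):
    """Stream audio to the API from a background thread (illustrative).

    `source` is any object with a `read(n)` method returning raw
    linear PCM audio at 8kHz, e.g. a microphone or wave-file wrapper.
    """
    def pump():
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            # Audio chunks are sent as binary frames built by the Recognizer
            socket.send_binary(recognizer.send_audio_chunk(chunk))
            # 1600 bytes of 16-bit 8kHz audio is 100ms: pace at real time
            time.sleep(0.1)

    handle = threading.Thread(target=pump, daemon=True)
    handle.start()
    return handle
```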

Asynchronous example:

import asyncio

from aiohttp import ClientSession


async def main():
    async with ClientSession() as session:
        async with session.ws_connect(stream_h2b_url, headers=stream_h2b_headers) as socket:

            # Open a session
            # Commands are sent as text frames
            await socket.send_str(recognizer.open())
            # Check if successful
            msg = await socket.receive()
            reply = recognizer.receive(msg.data)
            assert isinstance(reply, Opened), f"Expected Opened, got {reply}"
            # start streaming the user's voice in another task
            streamer_task_handle = asyncio.create_task(stream(socket, recognizer))
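Likewise, the `stream` coroutine passed to `asyncio.create_task` is not provided by the SDK. A minimal sketch, assuming the same kind of audio `source` with a `read(n)` method (names and pacing are illustrative):

```python
import asyncio


async def stream(socket, recognizer, source, chunk_size=1600):
    """Stream audio chunks to the API as binary frames (illustrative)."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        # aiohttp websockets send binary payloads with send_bytes()
        await socket.send_bytes(recognizer.send_audio_chunk(chunk))
        # 1600 bytes of 16-bit 8kHz audio is 100ms: pace at real time
        await asyncio.sleep(0.1)
```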

As you can see, the I/O is cleanly decoupled from the protocol handling: the Recognizer object is only used to create the messages to send to the API and to decode the received messages as Event objects.

See the complete examples in the source distribution.

ProtocolError

Bases: RuntimeError

Exception raised when a Recognizer method is not available in the current state.

Recognizer

Recognizer()

The connection state machine.

Use this class to decode received frames as Events or to make command frames by calling the appropriate methods. If you call a method that is not appropriate in the current protocol state, a ProtocolError is raised.

channel_id property

channel_id: str

The current session ID.

open

open(custom_id: str = '', channel_id: str = '', session_id: str = '', audio_codec: str = 'linear') -> str

Open a new H2B session.

Parameters:

  • custom_id (str, default: '' ) –

    is any reference of yours that you want to appear in the logs and invoice reports; for example your client id.

  • channel_id (str, default: '' ) –

    when provided, it'll be used as a prefix for the actual channel ID generated by the server.

  • session_id (str, default: '' ) –

    you can set an alternate channel identifier to appear alongside the channel_id in the logs and in the Dev Console, making it easier to reconcile your client logs with the server logs.

  • audio_codec (str, default: 'linear' ) –

    the speech audio codec of the audio data:

    • "linear": (default) linear 16 bit SLE raw PCM audio at 8khz;
    • "g711a": G711 a-law audio at 8khz;
    • "g711u": G711 μ-law audio at 8khz.

Returns:

  • str

    A websocket text message to send to the server.

Raises:

  • ProtocolError –

    if this method is not available in the current protocol state.

send_audio_chunk

send_audio_chunk(chunk: bytes) -> bytes

Build an audio chunk frame for streaming.

Returns:

  • bytes

    A websocket binary message to send to the server.

Raises:

  • ProtocolError –

    if this method is not available in the current protocol state.

set_params

set_params(**params: Any) -> str

Set default ASR parameters for the session.

See the parameter list and the parameter visual explanations for an explanation of the different parameters available.

Returns:

  • str

    A websocket text message to send to the server.
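A sketch of a session-configuration step, assuming an already-open session as in the Quickstart; the parameter names below are illustrative placeholders, see the parameter list for the real ones:

```python
def configure_session(socket, recognizer):
    """Send default ASR parameters and return the server's reply (sketch)."""
    socket.send(recognizer.set_params(
        no_input_timeout=5000,         # illustrative parameter name
        speech_complete_timeout=800,   # illustrative parameter name
    ))
    # The server should answer with a ParamsSet event (or an error event)
    return recognizer.receive(socket.recv())
```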

Raises:

  • ProtocolError –

    if this method is not available in the current protocol state.

get_params

get_params() -> str

Retrieve the default values for the ASR parameters.

Returns:

  • str

    A websocket text message to send to the server.

Raises:

  • ProtocolError –

    if this method is not available in the current protocol state.

define_grammar

define_grammar(builtin: str, alias: str) -> str

Define a grammar alias for a parameterized builtin.

Parameters:

  • builtin (str) –

    the builtin URI to alias, including the query string, but without the "builtin:" prefix

  • alias (str) –

    the alias, without the "session:" prefix.

Returns:

  • str

    A websocket text message to send to the server.
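A sketch, assuming an open session; the builtin URI and its query string are illustrative placeholders (see the grammar reference for real ones):

```python
def define_custom_grammar(socket, recognizer):
    """Alias a parameterized builtin grammar for later RECOGNIZE calls (sketch)."""
    socket.send(recognizer.define_grammar(
        "speech/spelling/mixed?regex=[a-z][0-9]{3}",  # illustrative builtin + query, no "builtin:" prefix
        "zipcode",                                    # referenced later as "session:zipcode"
    ))
    # The server should answer with a GrammarDefined event
    return recognizer.receive(socket.recv())
```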

Raises:

  • ProtocolError –

    if this method is not available in the current protocol state.

recognize

recognize(*grammars: str, start_timers: bool = True, recognition_mode: str = 'normal', **params: Any) -> str

Start a recognition process.

This method takes grammar URIs as positional arguments, including the builtin: or session: prefixes to distinguish between builtin grammars and custom aliases.

Other Parameters:

  • start_timers (bool) –

    whether to start the input timers immediately (default True); if False, they can be started later with start_input_timers().

  • recognition_mode (str) –

    default is "normal"

  • **params (Any) –

    any other ASR parameter (no client side defaults).

Returns:

  • str

    A websocket text message to send to the server.
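A sketch, again assuming an open session; the grammar URIs are illustrative:

```python
def start_recognition(socket, recognizer):
    """Start a recognition against builtin and aliased grammars (sketch)."""
    socket.send(recognizer.recognize(
        "builtin:speech/boolean",  # illustrative builtin grammar URI
        "session:zipcode",         # an alias previously created with define_grammar
        recognition_mode="normal",
        start_timers=True,
    ))
    # The server should answer with a RecognitionInProgress event
    return recognizer.receive(socket.recv())
```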

Raises:

  • ProtocolError –

    if this method is not available in the current protocol state.

close

close() -> str

Close the current session.

Returns:

  • str

    A websocket text message to send to the server.

Raises:

  • ProtocolError –

    if this method is not available in the current protocol state.

start_input_timers

start_input_timers() -> str

If the input timers were not started by the RECOGNIZE command, start them now.

Returns:

  • str

    A websocket text message to send to the server.

Raises:

  • ProtocolError –

    if this method is not available in the current protocol state.

stop

stop() -> str

Stop the ongoing recognition process.

Returns:

  • str

    A websocket text message to send to the server.

Raises:

  • ProtocolError –

    if this method is not available in the current protocol state.

receive

receive(data: Union[str, bytes]) -> Event

Decode a received text frame.

The server always replies with text frames.

Returns:

  • Event

    The appropriate Event subclass.

Closed

Closed(data: Dict[str, Any])

Bases: Event

The session is closed.

CompletionCause

Bases: Enum

The set of possible completion causes.

See all possible values.

DefaultParams

DefaultParams(data: Dict[str, Any])

Bases: Event

All the parameters and their values are in the headers property.

Event

Event(data: Dict[str, Any])

Base class of all the events.

request_id property

request_id: int

The ID of the request this event responds to.

channel_id property

channel_id: str

The channel ID.

headers property

headers: Dict[str, Any]

The response headers.

See also the header description.

completion_cause property

completion_cause: Optional[CompletionCause]

The response CompletionCause.

completion_reason property

completion_reason: Optional[str]

The completion message.

body property

body: Optional[RecogResult]

The content of the Event is a RecogResult if it is a RecognitionComplete event.

GrammarDefined

GrammarDefined(data: Dict[str, Any])

Bases: Event

The DefineGrammar command has been processed.

InputTimersStarted

InputTimersStarted(data: Dict[str, Any])

Bases: Event

The Input Timers are started.

Interpretation

Interpretation(data: Dict[str, Any])

The interpretation part of a recognition result.

confidence property

confidence: float

The confidence of the interpretation.

type property

type: str

The type of the Interpretation is given by the builtin grammar URI.

value property

value: Dict[str, Any]

The structured interpreted value.

The type/schema of the value is given by the self.type property.

See the Grammar reference documentation.

InvalidParamValue

InvalidParamValue(data: Dict[str, Any])

Bases: Event

The server received a request to set an invalid value for a parameter.

MethodFailed

MethodFailed(data: Dict[str, Any])

Bases: Event

The server was unable to complete the command.

MethodNotAllowed

MethodNotAllowed(data: Dict[str, Any])

Bases: Event

The command is not allowed in this state.

MethodNotValid

MethodNotValid(data: Dict[str, Any])

Bases: Event

The server received an invalid command.

MissingParam

MissingParam(data: Dict[str, Any])

Bases: Event

The command is missing some mandatory parameter.

Opened

Opened(data: Dict[str, Any])

Bases: Event

Session opened on the server.

ParamsSet

ParamsSet(data: Dict[str, Any])

Bases: Event

The default parameters were set.

RecognitionComplete

RecognitionComplete(data: Dict[str, Any])

Bases: Event

The ASR recognition is complete.

RecognitionInProgress

RecognitionInProgress(data: Dict[str, Any])

Bases: Event

The ASR recognition is started.

RecogResult

RecogResult(data: dict)

When a recognition completes, this describes the result.

asr property

asr: Optional[Transcript]

The ASR part of the result (transcription result).

nlu property

nlu: Optional[Interpretation]

The NLU part of the result (interpretation).

grammar_uri property

grammar_uri: str

The grammar that matched, as it was given to the RECOGNIZE command.
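The properties above can be combined to flatten a result, e.g. for logging. A sketch using only the documented attributes (the helper name is illustrative):

```python
def summarize(result):
    """Flatten a RecogResult-like object into a plain dict (sketch)."""
    summary = {"grammar": result.grammar_uri}
    if result.asr is not None:  # transcription part, may be absent
        summary["transcript"] = result.asr.transcript
        summary["confidence"] = result.asr.confidence
    if result.nlu is not None:  # interpretation part, may be absent
        summary["type"] = result.nlu.type
        summary["value"] = result.nlu.value
    return summary
```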

StartOfInput

StartOfInput(data: Dict[str, Any])

Bases: Event

In normal recognition mode, this event is emitted when speech is detected.

Stopped

Stopped(data: Dict[str, Any])

Bases: Event

The ASR recognition has been stopped at the client's request.

Transcript

Transcript(data: Dict[str, Any])

The Transcript part of a recognition result.

transcript property

transcript: str

The raw ASR output.

confidence property

confidence: float

The ASR transcription confidence.

start property

start: datetime

Start of speech.

end property

end: datetime

End of speech.

build_connection_request

build_connection_request(token) -> Tuple[str, dict]

Build an authenticated URL and the headers needed to connect to the H2B service.