uhlive.stream.recognition

The stream recognition API SDK for voice bots.

Stream for voicebots, or Stream Human to Bots, or Stream H2B is a set of API enabling clients to build interaction between a human end-user and a bot, for example to create Interactive Voice Response (IVR) on the phone, or a voicebot within an app.

For an overview of the concepts, protocols and workflow, see the higher level documenation and more specifically the Websocket H2B protocol reference.

The protocol is messages based and uses websockets as transport. You are free to use whatever websocket client library you like to communicate with the API, and use our SDK to encode/decode the messages.

Quickstart

First retrieve a one time access token with the Auth API.

Then use that token to build an authenticated URL, open a websocket connection to it with the websocket client library of your choice and instanciate a Recognizer to make request, generate audio stream messages and decode responses.

As the API is asynchronous, streaming the audio and reading the returned events should be done in two different threads/tasks.

from uhlive.stream.recognition import *

stream_h2b_url, stream_h2b_headers = build_connection_request(token)
recognizer = Recognizer()

Now you can connect and interact with the API:

Synchronous example:

import websocket as ws

socket = ws.create_connection(stream_h2b_url, header=stream_h2b_headers)
socket.send(recognizer.open())
# Check if successful
reply = recognizer.receive(socket.recv())
assert isinstance(reply, Opened), f"Expected Opened, got {reply}"
# start streaming the user's voice in another thread
streamer_thread_handle = stream_mic(socket, recognizer)

Asynchronous example:

from aiohttp import ClientSession

    async with ClientSession() as session:
        async with session.ws_connect(stream_h2b_url, header=stream_h2b_headers) as socket:

            # Open a session
            # Commands are sent as text frames
            await socket.send_str(recognizer.open())
            # Check if successful
            msg = await socket.receive()
            reply = recognizer.receive(msg.data)
            assert isinstance(reply, Opened), f"Expected Opened, got {reply}"
            # start streaming the user's voice in another task
            streamer_task_handle = asyncio.create_task(stream(socket, recognizer))

As you can see, the I/O is cleanly decoupled from the protocol handling: the Recognizer object is only used to create the messages to send to the API and to decode the received messages as Event objects.

See the complete examples in the source distribution.

ProtocolError

Bases: RuntimeError

Exception raised when a Recognizer method is not available in the current state.

Recognizer

Recognizer()

The connection state machine.

Use this class to decode received frames as Events or to make command frames by calling the appropriate methods. If you call a method that is not appropriate in the current protocol state, a ProtocolError is raised.

channel_id `property`

channel_id: str

The current session ID.

open

open(custom_id: str = '', channel_id: str = '', session_id: str = '', audio_codec: str = 'linear') -> str

Open a new H2B session.

Parameters:

custom_id (str, default: '' ) –

is any reference of yours that you want to appear in the logs and invoice reports; for example your client id.
channel_id (str, default: '' ) –

when provided, it'll be used as a prefix for the actual channel ID generated by the server.
session_id (str, default: '' ) –

you can set an alternate channel identifier to appear along side the channel_id in the logs and in the Dev Console, to make it easier to reconcile your client logs with the server logs.
audio_codec (str, default: 'linear' ) –
the speech audio codec of the audio data:
- "linear": (default) linear 16 bit SLE raw PCM audio at 8khz;
- "g711a": G711 a-law audio at 8khz;
- "g711u": G711 μ-law audio at 8khz.

Returns:

str –

A websocket text message to send to the server.

Raises:

ProtocolError –

if a session is already open.

send_audio_chunk

send_audio_chunk(chunk: bytes) -> bytes

Build an audio chunk frame for streaming.

Returns:

bytes –

A websocket binary message to send to the server.

Raises:

ProtocolError –

if no session opened.

set_params

set_params(**params: Any) -> str

Set default ASR parameters for the session.

See the parameter list and the parameter visual explanations for an explanation of the different parameters available.

Returns:

str –

A websocket text message to send to the server.

Raises:

ProtocolError –

if no session opened.

get_params

get_params() -> str

Retrieve the default values for the ASR parameters.

Returns:

str –

A websocket text message to send to the server.

Raises:

ProtocolError –

if no session opened.

define_grammar

define_grammar(builtin: str, alias: str) -> str

Define a grammar alias for a parameterized builtin.

Parameters:

builtin (str) –

the builtin URI to alias, including the query string, but without the "builtin:" prefix
alias (str) –

the alias, without the "session:" prefix.

Returns:

str –

A websocket text message to send to the server.

Raises:

ProtocolError –

if no session opened.

recognize

recognize(*grammars: str, start_timers: bool = True, recognition_mode: str = 'normal', **params: Any) -> str

Start a recognition process.

This method takes grammar URIs as positional arguments, including the builtin: or session: prefixes to make the difference between builtin grammars and custom aliases.

Other Parameters:

start_timers (bool) –

default True
recognition_mode (str) –

default is "normal"
**params (Any) –

any other ASR parameter (no client side defaults).

Returns:

str –

A websocket text message to send to the server.

Raises:

ProtocolError –

if no session opened.

close

close() -> str

Close the current session.

Returns:

str –

A websocket text message to send to the server.

Raises:

ProtocolError –

if no session opened.

start_input_timers

start_input_timers() -> str

If the input timers were not started by the RECOGNIZE command, starts them now.

Returns:

str –

A websocket text message to send to the server.

Raises:

ProtocolError –

if no on-going recognition process

stop

stop() -> str

Stop ongoing recognition process

Returns:

str –

A websocket text message to send to the server.

Raises:

ProtocolError –

if no on-going recognition process

receive

receive(data: Union[str, bytes]) -> Event

Decode received text frame.

The server always replies with text frames.

Returns:

Event –

The appropriate Event subclass.

Closed

Closed(data: Dict[str, Any])

Bases: Event

The session is closed.

request_id `property`

request_id: int

The request ID that event responds to.

channel_id `property`

channel_id: str

The channel ID.

headers `property`

headers: Dict[str, Any]

The response headers.

See also the header description.

completion_cause `property`

completion_cause: Optional[CompletionCause]

The response CompletionCause.

completion_reason `property`

completion_reason: Optional[str]

The completion message.

body `property`

body: Optional[RecogResult]

The content of the Event is a RecogResult if it is a RecognitionComplete event.

CompletionCause

Bases: Enum

The set of possible completion causes.

See all possible values.

DefaultParams

DefaultParams(data: Dict[str, Any])

Bases: Event

All the parameters and their values are in the headers property

request_id `property`

request_id: int

The request ID that event responds to.

channel_id `property`

channel_id: str

The channel ID.

headers `property`

headers: Dict[str, Any]

The response headers.

See also the header description.

completion_cause `property`

completion_cause: Optional[CompletionCause]

The response CompletionCause.

completion_reason `property`

completion_reason: Optional[str]

The completion message.

body `property`

body: Optional[RecogResult]

The content of the Event is a RecogResult if it is a RecognitionComplete event.

Event

Event(data: Dict[str, Any])

Base class of all the events

request_id `property`

request_id: int

The request ID that event responds to.

channel_id `property`

channel_id: str

The channel ID.

headers `property`

headers: Dict[str, Any]

The response headers.

See also the header description.

completion_cause `property`

completion_cause: Optional[CompletionCause]

The response CompletionCause.

completion_reason `property`

completion_reason: Optional[str]

The completion message.

body `property`

body: Optional[RecogResult]

The content of the Event is a RecogResult if it is a RecognitionComplete event.

GrammarDefined

GrammarDefined(data: Dict[str, Any])

Bases: Event

The DefineGrammar command has been processed.

request_id `property`

request_id: int

The request ID that event responds to.

channel_id `property`

channel_id: str

The channel ID.

headers `property`

headers: Dict[str, Any]

The response headers.

See also the header description.

completion_cause `property`

completion_cause: Optional[CompletionCause]

The response CompletionCause.

completion_reason `property`

completion_reason: Optional[str]

The completion message.

body `property`

body: Optional[RecogResult]

The content of the Event is a RecogResult if it is a RecognitionComplete event.

InputTimersStarted

InputTimersStarted(data: Dict[str, Any])

Bases: Event

The Input Timers are started.

request_id `property`

request_id: int

The request ID that event responds to.

channel_id `property`

channel_id: str

The channel ID.

headers `property`

headers: Dict[str, Any]

The response headers.

See also the header description.

completion_cause `property`

completion_cause: Optional[CompletionCause]

The response CompletionCause.

completion_reason `property`

completion_reason: Optional[str]

The completion message.

body `property`

body: Optional[RecogResult]

The content of the Event is a RecogResult if it is a RecognitionComplete event.

Interpretation

Interpretation(data: Dict[str, Any])

The interpretation part of a recognition result

confidence `property`

confidence: float

The confidence of the interpretation.

type `property`

type: str

The type of the Interpretation is given by the builtin grammar URI

value `property`

value: Dict[str, Any]

The structured interpreted value.

The type/schema of the value is given by the self.type property.

See the Grammar reference documentaiton.

InvalidParamValue

InvalidParamValue(data: Dict[str, Any])

Bases: Event

The server received a request to set an invalid value for a parameter.

request_id `property`

request_id: int

The request ID that event responds to.

channel_id `property`

channel_id: str

The channel ID.

headers `property`

headers: Dict[str, Any]

The response headers.

See also the header description.

completion_cause `property`

completion_cause: Optional[CompletionCause]

The response CompletionCause.

completion_reason `property`

completion_reason: Optional[str]

The completion message.

body `property`

body: Optional[RecogResult]

The content of the Event is a RecogResult if it is a RecognitionComplete event.

MethodFailed

MethodFailed(data: Dict[str, Any])

Bases: Event

The server was unable to complete the command.

request_id `property`

request_id: int

The request ID that event responds to.

channel_id `property`

channel_id: str

The channel ID.

headers `property`

headers: Dict[str, Any]

The response headers.

See also the header description.

completion_cause `property`

completion_cause: Optional[CompletionCause]

The response CompletionCause.

completion_reason `property`

completion_reason: Optional[str]

The completion message.

body `property`

body: Optional[RecogResult]

The content of the Event is a RecogResult if it is a RecognitionComplete event.

MethodNotAllowed

MethodNotAllowed(data: Dict[str, Any])

Bases: Event

The command is not allowed in this state.

request_id `property`

request_id: int

The request ID that event responds to.

channel_id `property`

channel_id: str

The channel ID.

headers `property`

headers: Dict[str, Any]

The response headers.

See also the header description.

completion_cause `property`

completion_cause: Optional[CompletionCause]

The response CompletionCause.

completion_reason `property`

completion_reason: Optional[str]

The completion message.

body `property`

body: Optional[RecogResult]

The content of the Event is a RecogResult if it is a RecognitionComplete event.

MethodNotValid

MethodNotValid(data: Dict[str, Any])

Bases: Event

The server received an invalid command.

request_id `property`

request_id: int

The request ID that event responds to.

channel_id `property`

channel_id: str

The channel ID.

headers `property`

headers: Dict[str, Any]

The response headers.

See also the header description.

completion_cause `property`

completion_cause: Optional[CompletionCause]

The response CompletionCause.

completion_reason `property`

completion_reason: Optional[str]

The completion message.

body `property`

body: Optional[RecogResult]

The content of the Event is a RecogResult if it is a RecognitionComplete event.

MissingParam

MissingParam(data: Dict[str, Any])

Bases: Event

The command is missings some mandatory parameter.

request_id `property`

request_id: int

The request ID that event responds to.

channel_id `property`

channel_id: str

The channel ID.

headers `property`

headers: Dict[str, Any]

The response headers.

See also the header description.

completion_cause `property`

completion_cause: Optional[CompletionCause]

The response CompletionCause.

completion_reason `property`

completion_reason: Optional[str]

The completion message.

body `property`

body: Optional[RecogResult]

The content of the Event is a RecogResult if it is a RecognitionComplete event.

Opened

Opened(data: Dict[str, Any])

Bases: Event

Session opened on the server

request_id `property`

request_id: int

The request ID that event responds to.

channel_id `property`

channel_id: str

The channel ID.

headers `property`

headers: Dict[str, Any]

The response headers.

See also the header description.

completion_cause `property`

completion_cause: Optional[CompletionCause]

The response CompletionCause.

completion_reason `property`

completion_reason: Optional[str]

The completion message.

body `property`

body: Optional[RecogResult]

The content of the Event is a RecogResult if it is a RecognitionComplete event.

ParamsSet

ParamsSet(data: Dict[str, Any])

Bases: Event

The default parameters were set.

request_id `property`

request_id: int

The request ID that event responds to.

channel_id `property`

channel_id: str

The channel ID.

headers `property`

headers: Dict[str, Any]

The response headers.

See also the header description.

completion_cause `property`

completion_cause: Optional[CompletionCause]

The response CompletionCause.

completion_reason `property`

completion_reason: Optional[str]

The completion message.

body `property`

body: Optional[RecogResult]

The content of the Event is a RecogResult if it is a RecognitionComplete event.

RecognitionComplete

RecognitionComplete(data: Dict[str, Any])

Bases: Event

The ASR recognition is complete.

request_id `property`

request_id: int

The request ID that event responds to.

channel_id `property`

channel_id: str

The channel ID.

headers `property`

headers: Dict[str, Any]

The response headers.

See also the header description.

completion_cause `property`

completion_cause: Optional[CompletionCause]

The response CompletionCause.

completion_reason `property`

completion_reason: Optional[str]

The completion message.

body `property`

body: Optional[RecogResult]

The content of the Event is a RecogResult if it is a RecognitionComplete event.

RecognitionInProgress

RecognitionInProgress(data: Dict[str, Any])

Bases: Event

The ASR recognition is started.

request_id `property`

request_id: int

The request ID that event responds to.

channel_id `property`

channel_id: str

The channel ID.

headers `property`

headers: Dict[str, Any]

The response headers.

See also the header description.

completion_cause `property`

completion_cause: Optional[CompletionCause]

The response CompletionCause.

completion_reason `property`

completion_reason: Optional[str]

The completion message.

body `property`

body: Optional[RecogResult]

The content of the Event is a RecogResult if it is a RecognitionComplete event.

RecogResult

RecogResult(data: dict)

When a recognition completes, this describe the result

asr `property`

asr: Optional[Transcript]

The ASR part of the result (transcription result)

nlu `property`

nlu: Optional[Interpretation]

The NLU part of the result (interpretation)

grammar_uri `property`

grammar_uri: str

The grammar that matched, as it was given to the RECOGNIZE command

alternatives `property`

alternatives: List[RecogResult]

if N-bests were requested, the additional results besides the best one are there.

StartOfInput

StartOfInput(data: Dict[str, Any])

Bases: Event

In normal recognition mode, this event is emitted when speech is detected.

request_id `property`

request_id: int

The request ID that event responds to.

channel_id `property`

channel_id: str

The channel ID.

headers `property`

headers: Dict[str, Any]

The response headers.

See also the header description.

completion_cause `property`

completion_cause: Optional[CompletionCause]

The response CompletionCause.

completion_reason `property`

completion_reason: Optional[str]

The completion message.

body `property`

body: Optional[RecogResult]

The content of the Event is a RecogResult if it is a RecognitionComplete event.

Stopped

Stopped(data: Dict[str, Any])

Bases: Event

The ASR recognition has been stopped on the client request.

request_id `property`

request_id: int

The request ID that event responds to.

channel_id `property`

channel_id: str

The channel ID.

headers `property`

headers: Dict[str, Any]

The response headers.

See also the header description.

completion_cause `property`

completion_cause: Optional[CompletionCause]

The response CompletionCause.

completion_reason `property`

completion_reason: Optional[str]

The completion message.

body `property`

body: Optional[RecogResult]

The content of the Event is a RecogResult if it is a RecognitionComplete event.

Transcript

Transcript(data: Dict[str, Any])

The Transcript part of a recognition result

transcript `property`

transcript: str

The raw ASR output.

phones `property`

phones: str

May contain the phone transcription, or empty if phones not activated.

confidence `property`

confidence: float

The ASR transcription confidence.

start `property`

start: datetime

Start of speech.

end `property`

end: datetime

End of speech.

build_connection_request

build_connection_request(token) -> Tuple[str, dict]

Make an authenticated URL and header to connect to the H2B Service.

uhlive.stream.recognition

Quickstart

ProtocolError

Recognizer

channel_id property

open

send_audio_chunk

set_params

get_params

define_grammar

recognize

close

start_input_timers

stop

receive

Closed

request_id property

channel_id property

headers property

completion_cause property

completion_reason property

body property

CompletionCause

DefaultParams

request_id property

channel_id property

headers property

completion_cause property

completion_reason property

body property

Event

request_id property

channel_id property

headers property

completion_cause property

completion_reason property

body property

GrammarDefined

request_id property

channel_id property

headers property

completion_cause property

completion_reason property

body property

InputTimersStarted

request_id property

channel_id property

headers property

completion_cause property

completion_reason property

body property

Interpretation

confidence property

type property

value property

InvalidParamValue

request_id property

channel_id property

headers property

completion_cause property

completion_reason property

body property

MethodFailed

request_id property

channel_id property

headers property

completion_cause property

completion_reason property

body property

MethodNotAllowed

request_id property

channel_id property

headers property

completion_cause property

completion_reason property

body property

MethodNotValid

request_id property

channel_id property

headers property

channel_id `property`

request_id `property`

channel_id `property`

headers `property`

completion_cause `property`

completion_reason `property`

body `property`

request_id `property`

channel_id `property`

headers `property`

completion_cause `property`

completion_reason `property`

body `property`

request_id `property`

channel_id `property`

headers `property`

completion_cause `property`

completion_reason `property`

body `property`

request_id `property`

channel_id `property`

headers `property`

completion_cause `property`

completion_reason `property`

body `property`

request_id `property`

channel_id `property`

headers `property`

completion_cause `property`

completion_reason `property`

body `property`

confidence `property`

type `property`

value `property`

request_id `property`

channel_id `property`

headers `property`

completion_cause `property`

completion_reason `property`

body `property`

request_id `property`

channel_id `property`

headers `property`

completion_cause `property`

completion_reason `property`

body `property`

request_id `property`

channel_id `property`

headers `property`

completion_cause `property`

completion_reason `property`

body `property`

request_id `property`

channel_id `property`

headers `property`

completion_cause `property`

completion_reason `property`

body `property`

request_id `property`

channel_id `property`

headers `property`

completion_cause `property`

completion_reason `property`

body `property`

request_id `property`

channel_id `property`

headers `property`

completion_cause `property`

completion_reason `property`

body `property`

request_id `property`

channel_id `property`

headers `property`

completion_cause `property`

completion_reason `property`

body `property`

request_id `property`

channel_id `property`

headers `property`

completion_cause `property`