uhlive.stream.recognition
The stream recognition API SDK for voice bots.
Stream for voicebots (also called Stream Human to Bots, or Stream H2B) is a set of APIs enabling clients to build interactions between a human end-user and a bot, for example to create an Interactive Voice Response (IVR) system on the phone, or a voicebot within an app.
For an overview of the concepts, protocols and workflow, see the higher level documentation, and more specifically the Websocket H2B protocol reference.
The protocol is message based and uses websockets as transport. You are free to use whatever websocket client library you like to communicate with the API, and use our SDK to encode and decode the messages.
Quickstart
First, retrieve a one-time access token with the Auth API.
Then use that token to build an authenticated URL, open a websocket connection to it with the websocket client library of your choice, and instantiate a Recognizer to make requests, generate audio stream messages and decode responses.
As the API is asynchronous, streaming the audio and reading the returned events should be done in two different threads/tasks.
from uhlive.stream.recognition import *
stream_h2b_url, stream_h2b_headers = build_connection_request(token)
recognizer = Recognizer()
Now you can connect and interact with the API:
Synchronous example:
import websocket as ws
socket = ws.create_connection(stream_h2b_url, header=stream_h2b_headers)
socket.send(recognizer.open())
# Check if successful
reply = recognizer.receive(socket.recv())
assert isinstance(reply, Opened), f"Expected Opened, got {reply}"
# start streaming the user's voice in another thread
streamer_thread_handle = stream_mic(socket, recognizer)
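The stream_mic helper above is not part of the SDK; a minimal sketch, assuming 16-bit linear PCM at 8 kHz and reading from a raw audio file in place of a real microphone, could look like this (the chunk size and pacing are illustrative assumptions):

```python
import threading
import time

CHUNK = 960  # 60 ms of 16-bit linear PCM at 8 kHz (hypothetical chunk size)

def stream_mic(socket, recognizer, source="audio.raw"):
    """Hypothetical helper: stream raw PCM audio in a background thread.

    Reads from a raw PCM file instead of a real microphone to keep
    the sketch dependency-free.
    """
    def run():
        with open(source, "rb") as f:
            while True:
                chunk = f.read(CHUNK)
                if not chunk:
                    break
                # send_audio_chunk wraps the raw bytes in a binary frame
                socket.send(recognizer.send_audio_chunk(chunk))
                time.sleep(0.06)  # pace the stream roughly in real time

    handle = threading.Thread(target=run, daemon=True)
    handle.start()
    return handle
```

Running the reader in its own thread keeps the main thread free to consume events from the socket, as recommended above.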
Asynchronous example:
import asyncio

from aiohttp import ClientSession

async with ClientSession() as session:
    async with session.ws_connect(stream_h2b_url, headers=stream_h2b_headers) as socket:
        # Open a session
        # Commands are sent as text frames
        await socket.send_str(recognizer.open())
        # Check if successful
        msg = await socket.receive()
        reply = recognizer.receive(msg.data)
        assert isinstance(reply, Opened), f"Expected Opened, got {reply}"
        # start streaming the user's voice in another task
        streamer_task_handle = asyncio.create_task(stream(socket, recognizer))
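Similarly, the stream coroutine referenced above is not provided by the SDK; a sketch under the same assumptions (16-bit linear PCM at 8 kHz, read from a raw file rather than a live microphone) might be:

```python
import asyncio

CHUNK = 960  # 60 ms of 16-bit linear PCM at 8 kHz (hypothetical chunk size)

async def stream(socket, recognizer, source="audio.raw"):
    """Hypothetical helper: stream raw PCM audio as binary websocket frames."""
    with open(source, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            # Unlike commands, audio chunks are sent as binary frames
            await socket.send_bytes(recognizer.send_audio_chunk(chunk))
            await asyncio.sleep(0.06)  # pace the stream roughly in real time
```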
As you can see, the I/O is cleanly decoupled from the protocol handling: the Recognizer
object is only used
to create the messages to send to the API and to decode the received messages as Event
objects.
See the complete examples in the source distribution.
ProtocolError
Bases: RuntimeError
Exception raised when a Recognizer method is not available in the current state.
Recognizer
The connection state machine.
Use this class to decode received frames as Events or to make command frames by calling the appropriate methods.
If you call a method that is not appropriate in the current protocol
state, a ProtocolError
is raised.
open
open(custom_id: str = '', channel_id: str = '', session_id: str = '', audio_codec: str = 'linear') -> str
Open a new H2B session.
Parameters:

- custom_id (str, default: '') – any reference of yours that you want to appear in the logs and invoice reports; for example your client id.
- channel_id (str, default: '') – when provided, it'll be used as a prefix for the actual channel ID generated by the server.
- session_id (str, default: '') – you can set an alternate identifier to appear alongside the channel_id in the logs and in the Dev Console, to make it easier to reconcile your client logs with the server logs.
- audio_codec (str, default: 'linear') – the speech audio codec of the audio data:
    - "linear": (default) linear 16 bit SLE raw PCM audio at 8 kHz;
    - "g711a": G711 a-law audio at 8 kHz;
    - "g711u": G711 μ-law audio at 8 kHz.

Returns:

- str – A websocket text message to send to the server.

Raises:

- ProtocolError – if a session is already open.
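For instance, opening a session with explicit options might look like this sketch (the identifier values are illustrative placeholders, not real account data):

```python
def open_session(socket, recognizer):
    # Hypothetical identifiers; audio will be streamed as G711 a-law
    frame = recognizer.open(
        custom_id="acme-billing-42",  # appears in logs and invoice reports
        session_id="call-0001",       # for reconciling client and server logs
        audio_codec="g711a",
    )
    socket.send(frame)
```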
send_audio_chunk
Build an audio chunk frame for streaming.
Returns:

- bytes – A websocket binary message to send to the server.

Raises:

- ProtocolError – if no session is open.
set_params
Set default ASR parameters for the session.
See the parameter list and the parameter visual explanations for an explanation of the different parameters available.
Returns:

- str – A websocket text message to send to the server.

Raises:

- ProtocolError – if no session is open.
get_params
Retrieve the default values for the ASR parameters.
Returns:

- str – A websocket text message to send to the server.

Raises:

- ProtocolError – if no session is open.
define_grammar
Define a grammar alias for a parameterized builtin.
Parameters:

- builtin (str) – the builtin URI to alias, including the query string, but without the "builtin:" prefix.
- alias (str) – the alias, without the "session:" prefix.

Returns:

- str – A websocket text message to send to the server.

Raises:

- ProtocolError – if no session is open.
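As a sketch, aliasing a parameterized builtin and then recognizing against it could look like this (the builtin URI and its query string are hypothetical, not taken from the actual grammar catalog):

```python
def ask_pin(socket, recognizer):
    # Alias the hypothetical builtin "digits?length=4" as "pin".
    # Note: no "builtin:" prefix on the URI, no "session:" prefix on the alias.
    socket.send(recognizer.define_grammar("digits?length=4", "pin"))
    # Reference the alias later with the "session:" prefix
    socket.send(recognizer.recognize("session:pin"))
```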
recognize
recognize(*grammars: str, start_timers: bool = True, recognition_mode: str = 'normal', **params: Any) -> str
Start a recognition process.
This method takes grammar URIs as positional arguments, including the builtin: or session: prefix, to distinguish between builtin grammars and custom aliases.
Other Parameters:

- start_timers (bool) – default True.
- recognition_mode (str) – default is "normal".
- **params (Any) – any other ASR parameter (no client side defaults).

Returns:

- str – A websocket text message to send to the server.

Raises:

- ProtocolError – if no session is open.
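A sketch of a RECOGNIZE request with an explicit mode and per-request parameters; the grammar URI and the no_input_timeout parameter name are assumptions for illustration, not guaranteed entries of the actual catalog:

```python
def start_recognition(socket, recognizer):
    frame = recognizer.recognize(
        "builtin:speech/boolean",   # hypothetical builtin grammar URI
        start_timers=False,         # start input timers later, e.g. after the prompt
        recognition_mode="normal",
        no_input_timeout=5000,      # hypothetical ASR parameter, for this request only
    )
    socket.send(frame)
```

Passing start_timers=False defers the input timers until a later start_input_timers command, which is useful when a prompt is still playing.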
close
Close the current session.
Returns:

- str – A websocket text message to send to the server.

Raises:

- ProtocolError – if no session is open.
start_input_timers
If the input timers were not started by the RECOGNIZE command, starts them now.
Returns:

- str – A websocket text message to send to the server.

Raises:

- ProtocolError – if there is no ongoing recognition process.
stop
Stop the ongoing recognition process.

Returns:

- str – A websocket text message to send to the server.

Raises:

- ProtocolError – if there is no ongoing recognition process.
receive
Decode received text frame.
The server always replies with text frames.
Returns:

- Event – The appropriate Event subclass.
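Putting receive to work, a minimal event loop that waits for the outcome of a recognition might look like this sketch; event classes are matched by name only so the fragment stays self-contained, but real code would use isinstance against the imported Event subclasses:

```python
def wait_for_result(socket, recognizer):
    """Sketch: consume events until the recognition completes or is stopped."""
    while True:
        event = recognizer.receive(socket.recv())
        name = type(event).__name__  # in real code, prefer isinstance checks
        if name == "StartOfInput":
            # The caller started speaking: a voicebot would cut its prompt here
            continue
        if name in ("RecognitionComplete", "Stopped"):
            return event
        # Other events (e.g. error events) are ignored in this sketch
```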
CompletionCause
DefaultParams
Bases: Event
All the parameters and their values are in the headers property.
Event
Base class of all the events
body
property
The content of the Event is a RecogResult if it is a RecognitionComplete event.
GrammarDefined
Bases: Event
The DefineGrammar command has been processed.
InputTimersStarted
Bases: Event
The Input Timers are started.
Interpretation
The interpretation part of a recognition result
value
property
The structured interpreted value. The type/schema of the value is given by the self.type property.
See the Grammar reference documentation.
InvalidParamValue
Bases: Event
The server received a request to set an invalid value for a parameter.
MethodFailed
Bases: Event
The server was unable to complete the command.
MethodNotAllowed
Bases: Event
The command is not allowed in this state.
MethodNotValid
Bases: Event
The server received an invalid command.
MissingParam
Bases: Event
The command is missing some mandatory parameter.
RecognitionComplete
Bases: Event
The ASR recognition is complete.
RecognitionInProgress
Bases: Event
The ASR recognition is started.
RecogResult
When a recognition completes, this describes the result.
StartOfInput
Bases: Event
In normal recognition mode, this event is emitted when speech is detected.
Stopped
Bases: Event
The ASR recognition has been stopped on the client request.