Controlling Call Flow¶
Connection Routines allow you to easily implement complex call flows that may involve playing audio, gathering voice responses, responding to DTMF tones and more.
This guide walks through implementing a simple interactive voice response (IVR) system. When the user calls in, we will play a welcome message and wait for the user to press a key to select a menu option. If the user presses “0” the system will play a prerecorded message. If the user presses “1”, the system will ask them to say the name of the party they wish to reach and call the appropriate number.
Answering calls¶
When an end user calls one of your phone numbers, Sift will consult the Application associated with that phone number to decide what to do with the call. The Application object may contain a callback URL, to which Sift sends an HTTP request every time there is a new incoming call. The response to this request should contain a routine, which is a JSON-formatted list of actions that dictate what should happen in the call. If the routine never changes for an Application, you can set a default routine for the Application, in which case the callback URL is optional. See Setting up an Application for more details on setting up Applications.
For our IVR system, we would like to play a sound file welcoming the user, and then prompt them to press a key to choose an option from a menu. The routine would look like this:
[
{
"name": "Play",
"sound_url": "http://myserver.com/sounds/welcome.wav"
},
{
"name": "GetDigits",
"count": 1,
"timeout_seconds": 50,
"prompt_url": "http://myserver.com/sounds/main_menu.wav",
"on_done": "http://myserver.com/callback/digits"
}
]
Let’s take a closer look at this routine. Each object in the list has a “name” attribute, which specifies what type of command it is. Different commands accept different parameters based on what they do.
The first command in the list is a Play command, which plays an audio file to
the other end of the connection. When the sound is done playing, the routine moves
to the next command. In our case, we have provided a sound_url, which should be
a URL accessible from the Internet that points to a .wav or .mp3 audio file.
Responding to DTMF tones¶
The second command in our routine is the GetDigits command. It will play a sound and wait for the user to press a certain number of keys on their phone.
We have specified a count value of 1, which means that we will only wait for a single
digit before continuing. The prompt_url attribute gives the URL of an audio file to play when
the command runs. The user can enter digits while the audio is playing, so this sound is
useful for announcing menu options.
We want to take an action that depends on the key the user presses, so the
GetDigits command is the last one in the routine. Luckily, we can specify a new
routine for a connection at any time. We use the on_done parameter of the GetDigits
command to send the digits to our server via an HTTP POST request.
We will respond to the request with a new routine based on which key the user pressed.
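The routine above is plain JSON, so it can also be assembled in code. The following Python sketch builds the same two commands with small helper functions. The helper names (play, get_digits, welcome_routine) and the BASE_URL constant are illustrative assumptions and are not part of the Sift API; only the command names and parameters come from the routine shown earlier.

```python
import json

# Assumption: the server from the examples in this guide; substitute your own.
BASE_URL = "http://myserver.com"


def play(sound_url):
    """Build a Play command for the given audio file URL."""
    return {"name": "Play", "sound_url": sound_url}


def get_digits(count, prompt_url, on_done, timeout_seconds=50):
    """Build a GetDigits command that waits for `count` key presses."""
    return {
        "name": "GetDigits",
        "count": count,
        "timeout_seconds": timeout_seconds,
        "prompt_url": prompt_url,
        "on_done": on_done,
    }


def welcome_routine():
    """Return the welcome routine: play a greeting, then wait for one digit."""
    return [
        play(f"{BASE_URL}/sounds/welcome.wav"),
        get_digits(
            count=1,
            prompt_url=f"{BASE_URL}/sounds/main_menu.wav",
            on_done=f"{BASE_URL}/callback/digits",
        ),
    ]


if __name__ == "__main__":
    # Print the routine as JSON, in the same shape as the example above.
    print(json.dumps(welcome_routine(), indent=2))
```

Building commands through helpers like these keeps parameter names in one place, which may help if the same command is reused in several routines.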
The on_done callback¶
After the user enters the requested number of digits, Sift will send an HTTP POST
request to the URL we specified in the on_done parameter. The exact format of this
POST is defined here. An example request might look
something like this:
{
"account_id": "372718353dcf4d16",
"connection_id": "6f5704748865267a",
"digits": "1"
}
In this case, the user has pressed the “1” key. For our simple IVR, we will have only two menu options: the “0” key plays a prerecorded message and the “1” key routes the call based on voice input. The server code for responding to the on_done callback will vary depending on the server language and framework of your application.
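As one possible shape for that server code, here is a framework-independent Python sketch of the dispatch logic. It assumes the POST body is JSON as in the example request, and that the reply wraps the new commands in a "routine" property, as in the response shown in the next section. The function name, the prerecorded.wav file, and the choice to replay the menu on an unexpected key are assumptions made for illustration.

```python
import json

# Assumption: the server from the examples in this guide; substitute your own.
BASE_URL = "http://myserver.com"


def handle_digits_callback(raw_body):
    """Return a JSON response containing a new routine for the pressed key."""
    body = json.loads(raw_body)
    digits = body.get("digits", "")

    if digits == "0":
        # Menu option 0: play a prerecorded message (hypothetical file name).
        routine = [
            {"name": "Play", "sound_url": f"{BASE_URL}/sounds/prerecorded.wav"}
        ]
    elif digits == "1":
        # Menu option 1: ask the caller to say who they want to reach.
        routine = [
            {
                "name": "GetMultipleChoice",
                "choices": ["alice", "bob", "megatron"],
                "prompt_url": f"{BASE_URL}/sounds/who_to_call.wav",
                "on_done": f"{BASE_URL}/callback/partyname",
            }
        ]
    else:
        # Assumption: any other key (or no input) replays the main menu.
        routine = [
            {
                "name": "GetDigits",
                "count": 1,
                "timeout_seconds": 50,
                "prompt_url": f"{BASE_URL}/sounds/main_menu.wav",
                "on_done": f"{BASE_URL}/callback/digits",
            }
        ]

    return json.dumps({"routine": routine})


if __name__ == "__main__":
    example = '{"account_id": "372718353dcf4d16", "connection_id": "6f5704748865267a", "digits": "1"}'
    print(handle_digits_callback(example))
```

In a real application this function would be called from whatever route your web framework maps to the on_done URL, with the returned string sent back as the HTTP response body.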
Collecting voice input¶
One of the most powerful features of the Sift API is the ability to respond to voice input on any call, in real-time.
When the user presses the “1” key, we would like to ask the user to state the name of the party they are trying to reach. In our digits callback handler, we will check if the value of “digits” in the POST body is equal to “1” and if so, we respond with the following JSON object:
{
"routine": [
{
"name": "GetMultipleChoice",
"choices": ["alice", "bob", "megatron"],
"prompt_url": "http://myserver.com/sounds/who_to_call.wav",
"on_done": "http://myserver.com/callback/partyname"
}
]
}
The routine parameter of the response object specifies a new list of commands to
execute on the connection.
The first and only command in our new routine is a GetMultipleChoice command, which
waits for the user to say a word or phrase from a restricted list of options.
The GetMultipleChoice command is one of several Connection commands that are able to
respond to the end user’s live voice.
The structure of the command is very similar to the GetDigits command we used
earlier: we provide a prompt_url to play some audio to the user, and a callback to
receive the user input. We must also include a choices property to tell Sift the valid
responses that we expect the user to say.
Accuracy Tip
Choosing from a list of available responses is generally an easier task for machines than transcribing free-form speech, so it is a good idea to use GetMultipleChoice or one of the restricted response commands whenever possible. This will minimize the chance that we mishear what the user said.
Receiving the callback¶
Once the user has made their choice, Sift will send an HTTP POST request to the URL
we provided in the on_done attribute. The format of this request is fully documented
here. An example request might look like this:
{
"account_id": "372718353dcf4d16",
"connection_id": "6f5704748865267a",
"choice": "alice"
}
The most important property of the request object is the choice field, which will contain
one of the strings we provided in our list of choices, or the empty string if the user did not say any
of the valid choices. Keep in mind that if the user says something else, the choice field
will not contain the transcript of what they said.
Calling another endpoint¶
The last step in our simple IVR is to transfer the user to the right party. We can use the
choice value from the GetMultipleChoice callback request to determine how to route the call.
On the application server, suppose we have some mapping between names and phone numbers. In the
callback handler, we would look up the right phone number and then dynamically construct the
JSON response.
For example, if the number for “alice” in our directory is (555) 223-1212, then our response to the request above would be:
{
"routine": [
{
"name": "CallPhone",
"phone_number": "+15552231212",
"event_callback": "http://myserver.com/callback/callevent"
}
]
}
Unsurprisingly, the CallPhone command will place a call to Alice’s phone. Our original caller will hear a ringback tone. When Alice answers, she’ll be connected with the caller.
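The lookup and response construction described above could be sketched in Python as follows. The DIRECTORY mapping, the to_e164 helper, and the decision to repeat the voice prompt when the choice is empty or unknown are all assumptions for illustration. Bob's number is made up, and the helper only handles US-style numbers.

```python
import json
import re

# Assumption: the server from the examples in this guide; substitute your own.
BASE_URL = "http://myserver.com"

# Hypothetical directory. Alice's number is from the example above;
# Bob's number is invented for illustration.
DIRECTORY = {
    "alice": "(555) 223-1212",
    "bob": "(555) 223-1313",
}


def to_e164(number):
    """Convert a US-style number such as "(555) 223-1212" to "+15552231212"."""
    digits = re.sub(r"\D", "", number)  # drop everything except digits
    if len(digits) == 10:
        return "+1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    raise ValueError(f"unsupported phone number: {number!r}")


def handle_choice_callback(raw_body):
    """Return a JSON response that routes the call based on the spoken choice."""
    body = json.loads(raw_body)
    choice = body.get("choice", "")
    number = DIRECTORY.get(choice)

    if number is None:
        # Empty or unknown choice. Assumption: ask the caller again.
        routine = [
            {
                "name": "GetMultipleChoice",
                "choices": list(DIRECTORY),
                "prompt_url": f"{BASE_URL}/sounds/who_to_call.wav",
                "on_done": f"{BASE_URL}/callback/partyname",
            }
        ]
    else:
        routine = [
            {
                "name": "CallPhone",
                "phone_number": to_e164(number),
                "event_callback": f"{BASE_URL}/callback/callevent",
            }
        ]

    return json.dumps({"routine": routine})


if __name__ == "__main__":
    print(handle_choice_callback('{"choice": "alice"}'))
```

Normalizing numbers at lookup time means the directory can store them in whatever format is convenient, while the routine always receives the "+1..." form used in the example response.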
Collecting other entities from speech¶
The Sift API provides built-in functions for extracting common types of entities from
speech. These functions can provide higher accuracy than general transcription, and they
automatically handle conversion between transcribed words and expected value types.
All of the following functions take the same parameters as the GetMultipleChoice
command discussed above, excluding the choices field.
GetYesOrNo¶
Gets a “yes” or “no” response from the user. Provides the response as a boolean value. See GetYesOrNo.
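Since GetYesOrNo takes the same parameters as GetMultipleChoice minus the choices field, a routine using it might be built like the Python sketch below. The helper name, the confirm.wav prompt, and the /callback/confirm URL are hypothetical. The format of the resulting callback request is not shown here, so consult the GetYesOrNo reference before writing the handler.

```python
import json


def get_yes_or_no(prompt_url, on_done):
    """Build a GetYesOrNo command; note there is no choices field."""
    return {
        "name": "GetYesOrNo",
        "prompt_url": prompt_url,
        "on_done": on_done,
    }


if __name__ == "__main__":
    # Hypothetical prompt file and callback URL.
    routine = [
        get_yes_or_no(
            prompt_url="http://myserver.com/sounds/confirm.wav",
            on_done="http://myserver.com/callback/confirm",
        )
    ]
    print(json.dumps({"routine": routine}, indent=2))
```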