Controlling Call Flow

Connection Routines allow you to easily implement complex call flows that may involve playing audio, gathering voice responses, responding to DTMF tones and more.

This guide walks through implementing a simple interactive voice response (IVR) system. When the user calls in, we will play a welcome message and wait for the user to press a key to select a menu option. If the user presses “0”, the system will play a prerecorded message. If the user presses “1”, the system will ask them to say the name of the party they wish to reach and call the appropriate number.

Answering calls

When an end user calls one of your phone numbers, Sift will consult the Application associated with that phone number to decide what to do with the call. The Application object may contain a callback URL, to which Sift sends an HTTP request every time there is a new incoming call. Your server's response to this request should contain a routine, which is a JSON-formatted list of actions that dictate what should happen in the call. If the routine never changes for an Application, you can set a default routine for the Application, in which case the callback URL is optional. See Setting up an Application for more details on setting up Applications.

For our IVR system, we would like to play a sound file welcoming the user, and then prompt them to press a key to choose an option from a menu. The routine would look like this:

[
    {
        "name": "Play",
        "sound_url": "http://myserver.com/sounds/welcome.wav"
    },
    {
        "name": "GetDigits",
        "count": 1,
        "timeout_seconds": 50,
        "prompt_url": "http://myserver.com/sounds/main_menu.wav",
        "on_done": "http://myserver.com/callback/digits"
    }
]

Let’s take a closer look at this routine. Each object in the list has a “name” attribute, which specifies what type of command it is. Different commands accept different parameters based on what they do.

The first command in the list is a Play command, which plays an audio file to the other end of the connection. When the sound is done playing, the routine moves to the next command. In our case, we have provided a sound_url, which should be a URL accessible from the Internet that points to a .wav or .mp3 audio file.

Responding to DTMF tones

The second command in our routine is the GetDigits command. It will play a sound and wait for the user to press a certain number of keys on their phone.

We have specified a count value of 1, which means that we will only wait for a single digit before continuing. The prompt_url attribute gives the URL of an audio file to play when the command runs. The user can enter digits while the audio is playing, so this sound is useful for announcing menu options.

We want to take an action depending on the key that the user presses, so the GetDigits command is the last one in the routine. Luckily, we can specify a new routine for a connection at any time. We use the on_done parameter of the GetDigits command to send the digits to our server via an HTTP POST request. We will respond to the request with a new routine based on what key the user pressed.
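On a server that answers the incoming-call request, the routine above could be assembled in code rather than stored as a static file. The following is a minimal Python sketch, not part of the Sift API: the function name build_welcome_routine and the base_url parameter are hypothetical, and only the command fields are taken from the routine shown earlier.

```python
import json


def build_welcome_routine(base_url="http://myserver.com"):
    """Return the welcome routine as a list of command dicts.

    base_url is a hypothetical parameter so the same code can serve
    different environments; the field names mirror the routine above.
    """
    return [
        # Play the welcome message, then move on to the next command.
        {"name": "Play", "sound_url": f"{base_url}/sounds/welcome.wav"},
        # Announce the menu and wait for a single key press.
        {
            "name": "GetDigits",
            "count": 1,
            "timeout_seconds": 50,
            "prompt_url": f"{base_url}/sounds/main_menu.wav",
            "on_done": f"{base_url}/callback/digits",
        },
    ]


if __name__ == "__main__":
    # Serialize the routine to JSON for the HTTP response body.
    print(json.dumps(build_welcome_routine(), indent=4))
```

Building the routine from a function keeps the sound and callback URLs in one place, which helps if the server address changes.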

The on_done callback

After the user enters the requested number of digits, Sift will send an HTTP POST request to the URL we specified in the on_done parameter. The exact format of this POST is defined here. An example request might look something like this:

{
    "account_id": "372718353dcf4d16",
    "connection_id": "6f5704748865267a",
    "digits": "1"
}

In this case, the user has pressed the “1” key. For our simple IVR, we will have only two menu options: the “0” key plays a prerecorded message and the “1” key routes their call based on voice input. The server code for responding to the on_done callback will vary depending on the server language and framework of your application.
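Whatever framework you use, the first step is usually to decode the POST body and map the pressed key to a menu option. Here is a framework-independent Python sketch of that step. The names MENU_OPTIONS and parse_menu_selection, and the option labels, are hypothetical; only the "digits" field comes from the example request above.

```python
import json

# Hypothetical labels for the two menu options described in this guide.
MENU_OPTIONS = {
    "0": "play_message",
    "1": "route_by_voice",
}


def parse_menu_selection(body):
    """Decode an on_done POST body and return the selected menu option.

    Returns None when the "digits" field is missing or does not match
    a known option, so the caller can decide how to recover.
    """
    payload = json.loads(body)
    return MENU_OPTIONS.get(payload.get("digits", ""))


if __name__ == "__main__":
    example = '{"account_id": "372718353dcf4d16", "connection_id": "6f5704748865267a", "digits": "1"}'
    print(parse_menu_selection(example))
```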

Collecting voice input

One of the most powerful features of the Sift API is the ability to respond to voice input on any call, in real time.

When the user presses the “1” key, we would like to ask the user to state the name of the party they are trying to reach. In our digits callback handler, we will check if the value of “digits” in the POST body is equal to “1” and if so, we respond with the following JSON object:

{
    "routine": [
        {
            "name": "GetMultipleChoice",
            "choices": ["alice", "bob", "megatron"],
            "prompt_url": "http://myserver.com/sounds/who_to_call.wav",
            "on_done": "http://myserver.com/callback/partyname"
        }
    ]
}

The routine parameter of the response object specifies a new list of commands to execute on the connection.

The first and only command in our new routine is a GetMultipleChoice command, which waits for the user to say a word or phrase from a restricted list of options. The GetMultipleChoice command is one of several Connection commands that are able to respond to the end user’s live voice.

The structure of the command is very similar to the GetDigits command we used earlier: we provide a prompt_url to play some audio to the user, and a callback to receive the user input. We must also include a choices property to tell Sift the valid responses that we expect the user to say.
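Putting the two menu options together, a digits callback handler might choose its response as in the Python sketch below. This is an illustration under assumptions rather than a definitive implementation: the function name response_for_digits, the message.wav sound URL for the “0” option, and the choice to replay the menu on any other key are all hypothetical. The GetMultipleChoice routine matches the JSON shown above.

```python
import json

BASE_URL = "http://myserver.com"  # server address used throughout this guide


def response_for_digits(digits):
    """Return the response body (as a dict) for a digits callback."""
    if digits == "1":
        # Ask the caller to say the name of the party they want to reach.
        return {
            "routine": [
                {
                    "name": "GetMultipleChoice",
                    "choices": ["alice", "bob", "megatron"],
                    "prompt_url": f"{BASE_URL}/sounds/who_to_call.wav",
                    "on_done": f"{BASE_URL}/callback/partyname",
                }
            ]
        }
    if digits == "0":
        # Play the prerecorded message (hypothetical file name).
        return {
            "routine": [
                {"name": "Play", "sound_url": f"{BASE_URL}/sounds/message.wav"}
            ]
        }
    # Assumption: on any other key (or no input), replay the main menu.
    return {
        "routine": [
            {
                "name": "GetDigits",
                "count": 1,
                "timeout_seconds": 50,
                "prompt_url": f"{BASE_URL}/sounds/main_menu.wav",
                "on_done": f"{BASE_URL}/callback/digits",
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(response_for_digits("1"), indent=4))
```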

Accuracy Tip

Choosing from a list of available responses is generally an easier task for machines than transcribing free-form speech, so it is a good idea to use GetMultipleChoice or one of the restricted response commands whenever possible. This will minimize the chance that we mishear what the user said.

Receiving the callback

Once the user has made their choice, Sift will send an HTTP POST request to the URL we provided in the on_done attribute. The format of this request is fully documented here. An example request might look like this:

{
    "account_id": "372718353dcf4d16",
    "connection_id": "6f5704748865267a",
    "choice": "alice"
}

The most important property of the request object is the choice field, which will contain one of the strings we provided in our list of choices, or the empty string if the user did not say any of the valid choices. Keep in mind that if the user says something else, the choice field will not contain the transcript of what they said.
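Because the choice field can be an empty string, the callback handler should check it before routing the call. The Python sketch below shows one way to do that. The function name handle_choice and the decision to ask again when nothing matched are assumptions, not behavior prescribed by the Sift API.

```python
CHOICES = ["alice", "bob", "megatron"]
BASE_URL = "http://myserver.com"


def handle_choice(payload):
    """Inspect a GetMultipleChoice callback payload.

    Returns a (choice, retry_response) pair. Exactly one of the two is
    None: choice is set when the caller said a valid option, and
    retry_response is set when we should ask again.
    """
    choice = payload.get("choice", "")
    if choice in CHOICES:
        return choice, None
    # Empty string: the caller did not say any of the valid choices.
    # Assumption: we simply repeat the same prompt.
    retry_response = {
        "routine": [
            {
                "name": "GetMultipleChoice",
                "choices": CHOICES,
                "prompt_url": f"{BASE_URL}/sounds/who_to_call.wav",
                "on_done": f"{BASE_URL}/callback/partyname",
            }
        ]
    }
    return None, retry_response


if __name__ == "__main__":
    print(handle_choice({"choice": "alice"}))
    print(handle_choice({"choice": ""}))
```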

Calling another endpoint

The last step in our simple IVR is to transfer the user to the right party. We can use the choice value from the GetMultipleChoice callback request to determine how to route the call. On the application server, suppose we have some mapping between names and phone numbers. In the callback handler, we would look up the right phone number and then dynamically construct the JSON response.

For example, if the number for “alice” in our directory is (555) 223-1212, then our response to the request above would be:

{
    "routine": [
        {
            "name": "CallPhone",
            "phone_number": "+15552231212",
            "event_callback": "http://myserver.com/callback/callevent"
        }
    ]
}

Unsurprisingly, the CallPhone command will place a call to Alice’s phone. Our original caller will hear a ringback tone. When Alice answers, she’ll be connected with the caller.
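The lookup and response construction described above could be sketched in Python as follows. The DIRECTORY mapping, the helper names to_e164_us and call_routine_for, and the assumption that stored numbers are US numbers are all hypothetical. Only the entry for “alice” and the CallPhone fields come from this guide.

```python
import re

# Hypothetical directory; only the entry for "alice" appears in this guide.
DIRECTORY = {
    "alice": "(555) 223-1212",
}


def to_e164_us(number):
    """Convert a US-formatted number such as "(555) 223-1212" to "+15552231212".

    Assumes a US number; raises ValueError for anything else.
    """
    digits = re.sub(r"\D", "", number)  # keep digits only
    if len(digits) == 10:
        digits = "1" + digits  # add the US country code
    if len(digits) != 11 or not digits.startswith("1"):
        raise ValueError(f"not a US phone number: {number!r}")
    return "+" + digits


def call_routine_for(choice):
    """Return a CallPhone response for a directory name, or None if unknown."""
    number = DIRECTORY.get(choice)
    if number is None:
        return None
    return {
        "routine": [
            {
                "name": "CallPhone",
                "phone_number": to_e164_us(number),
                "event_callback": "http://myserver.com/callback/callevent",
            }
        ]
    }


if __name__ == "__main__":
    print(call_routine_for("alice"))
```

Returning None for an unknown name leaves the recovery strategy, such as asking the caller again, to the handler that calls this function.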

Collecting other entities from speech

The Sift API provides built-in functions for extracting common types of entities from speech. These functions can provide higher accuracy than general transcription, and they automatically handle conversion between transcribed words and expected value types. All of the following functions take the same parameters as the GetMultipleChoice command discussed above, excluding the choices field.

GetYesOrNo

Gets a “yes” or “no” response from the user. Provides the response as a boolean value. See GetYesOrNo.

GetDate

Gets a date from the speaker and provides the result as an ISO 8601-formatted date string. Useful for extracting birth dates or scheduling calendar events. See GetDate.

GetNumber

Gets a number from the speaker and provides it as an integer value. See GetNumber.

GetName

Gets a person’s full name from the speaker. The result is provided as a string. See GetName.
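Since these commands share the GetMultipleChoice parameters minus the choices field, a single helper can build any of them. The Python sketch below illustrates the idea. The helper name entity_command and the example URLs are hypothetical, and the format of each command's callback request is not shown here, so consult the linked reference for each command.

```python
# The four entity commands listed above.
ENTITY_COMMANDS = {"GetYesOrNo", "GetDate", "GetNumber", "GetName"}


def entity_command(name, prompt_url, on_done):
    """Build one entity-extraction command as a dict.

    Takes the same parameters as GetMultipleChoice, without choices.
    """
    if name not in ENTITY_COMMANDS:
        raise ValueError(f"unknown entity command: {name!r}")
    return {"name": name, "prompt_url": prompt_url, "on_done": on_done}


if __name__ == "__main__":
    # Hypothetical prompt and callback URLs for a yes/no confirmation.
    response = {
        "routine": [
            entity_command(
                "GetYesOrNo",
                "http://myserver.com/sounds/confirm.wav",
                "http://myserver.com/callback/confirm",
            )
        ]
    }
    print(response)
```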