Writing Hosted Scripts¶
The Sift API provides a wide array of speech and telephony capabilities that make it possible to build sophisticated voice applications. However, many applications require running a server that is always ready to handle messages from the Sift API servers.
For large, permanent applications this may be a minor concern, but for prototypes, experiments, and smaller applications it is a significant overhead. You may also simply want to create your application quickly without setting up and maintaining a server.
Hosted scripts are small programs written in JavaScript that are uploaded to the Gridspace servers and control the behavior of your communications applications. They offer all the power of the Sift REST API with less complexity and boilerplate code.
Hosted scripts may be stored in files on your local system and uploaded via a REST call, or composed in the browser using the online script editor. We recommend using the online editor when iterating on a new application, since it allows you to quickly update your script and view debug output and call logs.
Hello world¶
Our first hosted script application will simply answer an inbound call and use text-to-speech software to say “hello world” to the caller. The code for this script is:
gs.onIncomingCall = function(connection) {
    connection.say("Hello world");
    connection.hangUp();
};
You can try running this script at https://api.gridspace.com/scripts/try#helloworld.
Each hosted script must define at least one of the special entry point functions,
gs.onIncomingCall and gs.onStart. Since we want our script to run whenever
someone calls our phone number, we define gs.onIncomingCall
only.
The first and only parameter to gs.onIncomingCall
is a Connection object, which
represents the audio connection between the caller and your application. The connection object
defines a number of methods for sending audio, listening for spoken language, and bridging multiple
connections together.
We call the Connection.say function, which takes a string argument and speaks the string to the other side of the connection. The function will wait for the machine to finish speaking before returning.
Finally, we call the Connection.hangUp function to terminate the connection. Calling
any more functions on the connection after calling hangUp
will throw an error.
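To experiment with this control flow outside the hosted environment, the script can be run against a stand-in connection. The following is only a sketch under stated assumptions: `makeFakeConnection` and its `log` array are hypothetical helpers that are not part of the hosted scripts API, and the local `gs` object merely mimics the global that the hosted environment provides. It shows the order of the calls and the error raised when a method is used after hangUp.

```javascript
// Hypothetical stand-in for the hosted environment's global `gs` object.
var gs = {};

// Hypothetical fake Connection that records calls instead of sending audio.
function makeFakeConnection() {
    var hungUp = false;
    var log = [];
    function ensureOpen() {
        if (hungUp) {
            throw new Error("Connection already hung up");
        }
    }
    return {
        log: log,
        say: function (text) {
            ensureOpen();
            log.push("say: " + text);
        },
        hangUp: function () {
            ensureOpen();
            hungUp = true;
            log.push("hangUp");
        }
    };
}

// The hello world script from above, unchanged.
gs.onIncomingCall = function (connection) {
    connection.say("Hello world");
    connection.hangUp();
};

// Simulate one inbound call and print the recorded calls.
var fake = makeFakeConnection();
gs.onIncomingCall(fake);
console.log(fake.log.join(", "));

// Using the fake connection after hangUp raises an error, mirroring the rule above.
var errorAfterHangUp = null;
try {
    fake.say("Anyone there?");
} catch (err) {
    errorAfterHangUp = err.message;
}
console.log(errorAfterHangUp);
```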
Note
If you are experienced with programming JavaScript in the browser, you may be concerned
that the call to say
above can block for a few seconds while the computer speaks. In a
web application, this would block the UI thread and make the page unresponsive. However,
this is not a problem in the Gridspace hosted scripts environment. Each instance of a script
runs in its own independent JavaScript context, so a blocking script will not prevent
your application from running multiple script instances at the same time. Consequently, most hosted
script API calls block until their task is complete, which allows your scripts to be
written in a simple sequential style without excessive callback functions.
Starting a conversation¶
Let’s try connecting multiple participants together so they can talk to one another. We can accomplish this either by joining a connection to a conference or by calling another phone number directly.
Conferences¶
To join a conference, we use the Connection.joinConference function:
gs.onIncomingCall = function(connection) {
    connection.joinConference('Conference');
};
The only required parameter is a string that uniquely
identifies the name of the conference room to join. Any other connection that receives a call
to joinConference
with the same string will be connected to the same conference room and
can speak to all other active connections in the room. You can also connect as a ghost,
in which case the connection can only hear the conversation and will not produce any sound.
This is useful if you are implementing an application where a QA agent will be monitoring calls
silently.
connection.joinConference('Conference', { ghost: true });
The hosted scripts API uses the convention that all required parameters are provided as the first arguments to the function. Optional parameters are provided by passing a plain object with key-value pairs as the final parameter.
By default, all conversations are recorded. You can disable recording by setting the
doRecord
parameter to false.
connection.joinConference('Conference', { doRecord: false });
Keep in mind, however, that if you disable recording, you will not be able to transcribe or process the conversation after it ends.
Phone calls¶
You can also initiate a phone call directly from a connection:
gs.onIncomingCall = function(connection) {
    connection.callPhone('+15552231452');
};
When the Connection.callPhone function is called, the connection it is called on will hear a ringback tone until the other end picks up. The function blocks until the call fails or one of the parties ends the call.
You can specify how long to wait for the outbound call to be answered before giving up
using the maxRings
parameter. In the following example, we wait for three rings and
if nobody answers, we say sorry to the caller.
gs.onIncomingCall = function(connection) {
    var conversation = connection.callPhone('+15552231452', {maxRings: 3});
    if (conversation.status == 'failed') {
        connection.say("Sorry I am not available");
    }
};
We check whether or not the call was successful by checking the status
property on the object
returned from callPhone
. If maxRings
is exceeded without the call being answered,
the status
property will be “failed”.
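The status check can also drive a simple fallback, where several numbers are tried in order. The sketch below is illustrative only: `callFirstAvailable` and `makeFakeConnection` are hypothetical helpers, the second phone number is a placeholder, and the "finished" status used by the fake is an assumption about what an answered call reports. Only the "failed" value is taken from the description above.

```javascript
// Hypothetical helper: dial each number in turn until one is answered.
// Relies on callPhone returning an object whose `status` is "failed"
// when the call is not answered, as described above.
function callFirstAvailable(connection, numbers, maxRings) {
    for (var i = 0; i < numbers.length; i++) {
        var conversation = connection.callPhone(numbers[i], { maxRings: maxRings });
        if (conversation.status !== "failed") {
            return numbers[i];
        }
    }
    connection.say("Sorry, nobody is available right now");
    return null;
}

// Stand-in connection for trying the helper locally; not part of the API.
function makeFakeConnection(answeringNumbers) {
    var spoken = [];
    return {
        spoken: spoken,
        callPhone: function (number, options) {
            var answered = answeringNumbers.indexOf(number) !== -1;
            // "finished" is an assumed status for an answered, completed call.
            return { status: answered ? "finished" : "failed" };
        },
        say: function (text) {
            spoken.push(text);
        }
    };
}

var numbers = ["+15552231452", "+15552231453"];
var reached = callFirstAvailable(makeFakeConnection(["+15552231453"]), numbers, 3);
var nobodyHome = makeFakeConnection([]);
var missed = callFirstAvailable(nobodyHome, numbers, 3);
console.log(reached, missed, nobodyHome.spoken.length);
```

In a hosted script, the helper would be called from gs.onIncomingCall with the real connection object in place of the fake.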
Playing sounds¶
You can play a sound to the user by calling the Connection.play function:
gs.onIncomingCall = function(connection) {
    connection.play("http://apicdn.gridspace.com/examples/assets/alert.wav");
};
The parameter to this function is an HTTP or HTTPS URL that points to a sound file that will be played over the connection. This file must be a valid .wav or .mp3 file.
We also recommend that you use mono audio files, since the connection to the user is also mono. This reduces file size and also makes sure the file sounds the same to the user as when you play the file on your desktop computer.
Getting input from the user¶
Many applications need to interact directly with the end user. The hosted scripts API can respond to voice input as well as phone key presses.
Key presses¶
Traditionally, phone systems use Dual-tone multi-frequency signaling (DTMF) as the main user input mechanism. When the user presses a key on their phone’s dial pad, a signal is sent and detected by the phone system on the other end of the call.
You can collect a sequence of key presses from the user with the Connection.getDigits command.
gs.onIncomingCall = function(connection) {
    var digits = connection.getDigits(3, {promptText: 'Press three keys'});
    connection.say('You pressed ' + digits);
};
In this example, we ask the user to press three keys on their dial pad and say the digits
back to them. getDigits
will return a string representing the keys
that were pressed.
Normally, you will also want to provide some kind of prompt to the user telling them to press a key.
You can use either the promptText
or the promptUrl
parameter to play a sound to the user
while waiting for them to press the expected number of keys.
The promptText
parameter is a string that will be said to the user, similar to the say
command we encountered earlier. The promptUrl
is a sound file that will be played to the
user, similar to the play
function.
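Since getDigits returns a plain string, the result can be validated before it is used. The following sketch retries the prompt a limited number of times. It assumes that getDigits may return fewer keys than requested (or no string at all) when the caller stops pressing, which this guide does not confirm. `askForCode` and `makeFakeConnection` are hypothetical helpers, not part of the API.

```javascript
// Hypothetical helper: ask for a fixed-length numeric code, retrying a few times.
// Keys such as "*" and "#" are rejected because only 0-9 are accepted.
function askForCode(connection, length, maxAttempts) {
    var pattern = new RegExp("^[0-9]{" + length + "}$");
    for (var attempt = 0; attempt < maxAttempts; attempt++) {
        var digits = connection.getDigits(length, {
            promptText: "Please enter your " + length + " digit code"
        });
        if (typeof digits === "string" && pattern.test(digits)) {
            return digits;
        }
        connection.say("Sorry, that was not a valid code");
    }
    return null;
}

// Stand-in connection that replays canned key presses; not part of the API.
function makeFakeConnection(responses) {
    var queue = responses.slice();
    return {
        getDigits: function (count, options) {
            return queue.shift();
        },
        say: function (text) {}
    };
}

var accepted = askForCode(makeFakeConnection(["12", "4#7", "905"]), 3, 3);
var rejected = askForCode(makeFakeConnection(["1", "2"]), 3, 2);
console.log(accepted, rejected);
```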
Voice responses¶
Although touch tone interfaces are simple and familiar, they are very limited in the type of information they can collect and cumbersome to use. Luckily, you can make use of all the powerful voice recognition tools in the Sift API from within hosted scripts.
The Connection.getFreeResponse function gets a spoken voice response from the user and then returns their transcribed words as a string as soon as they stop speaking.
gs.onIncomingCall = function(connection) {
    var response = connection.getFreeResponse({
        promptText: "What is your favorite ice cream?"
    });
    console.log(response);
};
As with the Connection.getDigits command, we can provide a prompt sound to the user as either text or an audio file URL.
You can help the speech recognition system give more accurate results if you know something
about the type of responses that you will get. For example, in the script above, we know that the
user is probably going to say something about ice cream. We can give a hint to
Gridspace’s transcription software by passing a list of words to the hintWords
parameter.
gs.onIncomingCall = function(connection) {
    var response = connection.getFreeResponse({
        promptText: "What is your favorite ice cream?",
        hintWords: ["sherbet", "rocky", "road", "neapolitan", "vanilla", "fudge"]
    });
    console.log(response);
};
This is especially useful for words that are unusual or sound similar to other words. If you were running this application repeatedly and noticed that “cherry” was often getting mistaken for “ferry”, you could add “cherry” to the hint words to get better results.
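Because getFreeResponse returns the transcript as a string, your script is responsible for picking out the parts it cares about. A minimal sketch of that step follows. `findFlavors` is a hypothetical helper that uses a naive case-insensitive substring match, so it would also match a flavor name embedded in a longer word.

```javascript
// Hypothetical helper: return the known flavors mentioned in a transcript.
function findFlavors(transcript, flavors) {
    var text = String(transcript).toLowerCase();
    return flavors.filter(function (flavor) {
        return text.indexOf(flavor.toLowerCase()) !== -1;
    });
}

var flavors = ["rocky road", "vanilla", "cherry"];
var found = findFlavors("I guess I like Rocky Road and cherry", flavors);
console.log(found);
```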
Formatted speech¶
Often, you'll be interested in gathering some information from the user rather than transcribing their exact words. The API offers many functions that make it easier to extract relevant information from speech and return it in an easy-to-parse format.
For instance, if the user is selecting one choice from a fixed set of options, you can use the
Connection.getMultipleChoice function instead of the more general
getFreeResponse
.
gs.onIncomingCall = function(connection) {
    var response = connection.getMultipleChoice(
        ['chocolate', 'vanilla', 'strawberry'],
        {
            promptText: "Would you prefer chocolate, vanilla, or strawberry ice cream?"
        }
    );
    if (response == 'chocolate') {
        // response is a plain string, so speak through the connection.
        connection.say('You selected chocolate');
    }
};
Using getMultipleChoice
has two benefits. First, it tells the API which words to listen for,
reducing the chance that it will mishear the user. Second, it will return exactly one of the
options you provide so you can use direct string comparison to check which option was
said, rather than having to search for each option within the transcript. If the user doesn’t
say anything, or they say something that does not match any of the provided options,
getMultipleChoice
will return null
.
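The null return value makes it straightforward to repeat a question when nothing usable was heard. The sketch below assumes only the behavior described above. `askUntilAnswered` and `makeFakeConnection` are hypothetical helpers, not part of the API.

```javascript
// Hypothetical helper: repeat the question until one of the choices is heard,
// giving up after maxAttempts tries.
function askUntilAnswered(connection, choices, promptText, maxAttempts) {
    for (var attempt = 0; attempt < maxAttempts; attempt++) {
        var response = connection.getMultipleChoice(choices, { promptText: promptText });
        if (response !== null) {
            return response;
        }
        connection.say("Sorry, I did not catch that");
    }
    return null;
}

// Stand-in connection that replays canned answers; not part of the API.
function makeFakeConnection(answers) {
    var queue = answers.slice();
    var apologies = [];
    return {
        apologies: apologies,
        getMultipleChoice: function (choices, options) {
            return queue.shift();
        },
        say: function (text) {
            apologies.push(text);
        }
    };
}

var choices = ["chocolate", "vanilla", "strawberry"];
var patient = makeFakeConnection([null, "vanilla"]);
var picked = askUntilAnswered(patient, choices, "Which flavor would you like?", 3);
var silent = askUntilAnswered(makeFakeConnection([null, null]), choices, "Which flavor would you like?", 2);
console.log(picked, silent, patient.apologies.length);
```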
To retrieve a numeric value like a quantity or account number, use the Connection.getNumber function:
gs.onIncomingCall = function(connection) {
    var digits = connection.getNumber({promptText: "What is your account number?"});
    // digits now contains a string of digits like "01648".
};
Like the getDigits
function above, this function returns a string of digits, this time
collected from speech rather than the keypad. The getNumber
function will work equally
well whether the user says each digit one at a time like “six oh five one” or
a single number like “six thousand fifty one”. Mixing the two is also ok, as in “sixty fifty one”,
as long as the result is unambiguous.
See the table below for a full list of the formatted voice response functions available on the Connection object.
Function | Return type | Extracts |
---|---|---|
getFreeResponse | String | General speech |
getNumber | String of digits | Numeric values |
getMultipleChoice | String | One of a fixed set of choices |
getDate | ISO 8601 date string | Dates |
getName | String | A person’s name or names |
getYesOrNo | Boolean | A negative or affirmative response |
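Because getDate returns an ISO 8601 date string, its result can be handled with standard JavaScript date functions. The sketch below assumes the string is a date-only value such as "2024-03-01". The exact form returned by the API is not specified in this guide, and `daysBetween` is a hypothetical helper.

```javascript
// Hypothetical helper: whole days between two ISO 8601 calendar dates.
function daysBetween(fromIso, toIso) {
    var msPerDay = 24 * 60 * 60 * 1000;
    // Date-only ISO strings are parsed as UTC midnight, so daylight saving
    // changes do not affect the difference.
    return Math.round((Date.parse(toIso) - Date.parse(fromIso)) / msPerDay);
}

var wait = daysBetween("2024-02-27", "2024-03-01");
console.log(wait);
```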
Key presses or voice responses¶
You can also let users make a selection with either their voice or the keypad. This is especially useful for situations when the caller is not an English speaker or is in a high-noise environment. In this example, we use the Connection.getMultipleChoiceOrDigits function to let the user navigate a menu system by voice or keypad:
const CHOICES = ["store hours", "locations"];
gs.onIncomingCall = function(connection) {
    var result = connection.getMultipleChoiceOrDigits(
        CHOICES, 1, {promptUrl: 'http://mysite.com/choices.mp3'}
    );
    if (result.digits == '0' || result.choice == 'store hours') {
        connection.play('http://mysite.com/storehours.mp3');
    } else if (result.digits == '1' || result.choice == 'locations') {
        connection.play('http://mysite.com/locations.mp3');
    }
};
Instead of returning a string like the getMultipleChoice
function, getMultipleChoiceOrDigits returns a
result object which has both a digits
and a choice
attribute.
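For longer menus, it can be convenient to collapse the two attributes into a single value before branching. The sketch below is one way to do that. `resolveChoice` is a hypothetical helper, and it assumes that key "0" selects the first entry, "1" the second, and so on, as in the example above. How the unused attribute is represented (null, empty, or missing) is not specified in this guide, so the helper treats all of these the same.

```javascript
var CHOICES = ["store hours", "locations"];

// Hypothetical helper: map a result with `digits` and `choice` attributes
// to one entry of `choices`, or null if neither attribute identifies one.
function resolveChoice(result, choices) {
    if (!result) {
        return null;
    }
    if (result.choice && choices.indexOf(result.choice) !== -1) {
        return result.choice;
    }
    if (typeof result.digits === "string" && /^[0-9]$/.test(result.digits)) {
        var index = parseInt(result.digits, 10);
        if (index < choices.length) {
            return choices[index];
        }
    }
    return null;
}

var byKey = resolveChoice({ digits: "1", choice: null }, CHOICES);
var byVoice = resolveChoice({ digits: null, choice: "store hours" }, CHOICES);
var invalid = resolveChoice({ digits: "7", choice: null }, CHOICES);
console.log(byKey, byVoice, invalid);
```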
Advanced inputs with Scanners¶
For more sophisticated speech parsing, the API provides a tool called a Scanner. Scanners use a powerful query language that allows you to capture words and phrases while allowing for variation in phrasing or word order. Read the Parsing Language With Scanners guide for full documentation on scanners.
You can use scanners in your hosted scripts with Connection.getScannedResponse, which works just like other input functions:
gs.onIncomingCall = function(connection) {
    var result = connection.getScannedResponse("'{number}' then ('minutes' or 'hours')", {
        promptText: "When should we call you back?"
    });
};
The return value is a Scanner result object, or null
if the user’s response
does not match the scanner query.
Monitoring live calls¶
Any call that you facilitate using hosted scripts can be monitored in real-time. This enables you to create intelligent agents that react to what is being said on the call. Some things you might do with live calls are:
- Send an alert when a caller mentions a specific product.
- Connect someone to the call via voice command.
- Create a live stream of transcripts from all active calls.
Conversation objects¶
Some functions of the Connection object create an API resource called a Conversation object. Conversations track the state of a communication between one or more parties and store metadata about the interaction. During a call, the corresponding Conversation object will be in the “in-progress” state. As soon as the conversation ends, the Conversation object moves to the “finished” state.
Any Connection function that generates a Conversation accepts a set of optional parameters
that allow you to transcribe and analyze the Conversation in real-time or after the fact. We will
be using the joinConference
function in the following examples, but the same parameters can be
used in any of the following calls:
Live transcripts¶
To receive live transcripts, you must provide an onTranscript callback to the
function that is initiating the conversation. If you are creating a conference
call, you would add the onTranscript
callback like this:
gs.onIncomingCall = function(connection) {
    connection.joinConference('Conference', {
        onTranscript: function(transcript, conversation) {
            console.log(transcript);
        }
    });
};
Now any time that a participant in the conversation finishes a statement, the provided function will be called with two arguments: a transcript and a Conversation object.
The transcript object is a string-like object with some additional attributes,
including the speaker and the timestamp. The transcript also exposes the hasPhrase
helper function, which can be used to search for a particular word or words.
gs.onIncomingCall = function(connection) {
    connection.joinConference('Conference', {
        onTranscript: function(transcript, conversation) {
            if (transcript.hasPhrase("password")) {
                conversation.playToAll("http://apicdn.gridspace.com/examples/assets/alert.wav");
            }
        }
    });
};
The above example will play an alert sound to everyone in the conversation if someone says the word “password”.
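When several phrases need to be watched, the callback can be generated from a list. The following is a sketch rather than a prescribed pattern. `makePhraseWatcher` and `makeFakeTranscript` are hypothetical helpers, and the fake's hasPhrase is a naive case-insensitive substring check that only approximates whatever matching the real transcript object performs.

```javascript
// Hypothetical helper: build an onTranscript callback from a list of phrases.
function makePhraseWatcher(phrases, onMatch) {
    return function (transcript, conversation) {
        phrases.forEach(function (phrase) {
            if (transcript.hasPhrase(phrase)) {
                onMatch(phrase, transcript, conversation);
            }
        });
    };
}

// Stand-in transcript for local experiments; the real object comes from the API.
function makeFakeTranscript(text) {
    return {
        toString: function () {
            return text;
        },
        hasPhrase: function (phrase) {
            return text.toLowerCase().indexOf(phrase.toLowerCase()) !== -1;
        }
    };
}

var matches = [];
var watcher = makePhraseWatcher(["password", "credit card"], function (phrase) {
    matches.push(phrase);
});
watcher(makeFakeTranscript("My password is on a sticky note"), null);
watcher(makeFakeTranscript("What a nice day"), null);
console.log(matches);
```

In a hosted script, the generated function would be passed as the onTranscript option, for example to joinConference.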
Live topics¶
You can request a periodic update with the current topics of a conversation by supplying the onTopics parameter.
gs.onIncomingCall = function(connection) {
    connection.joinConference('Conference', {
        onTopics: function(topics, conversation) {
            console.log(topics);
        }
    });
};
The provided callback function will be invoked periodically with a list of topic strings. Some examples of topics are: “cars”, “Star Wars”, and “sales team”.
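Since the callback receives a fresh list on each update, a script that wants an overall picture has to accumulate the lists itself. A minimal sketch follows, assuming that variables captured by the callback persist for the lifetime of the script instance. `makeTopicTally` is a hypothetical helper, not part of the API.

```javascript
// Hypothetical helper: count how often each topic appears across updates.
function makeTopicTally() {
    // Object.create(null) avoids collisions with inherited keys such as "constructor".
    var counts = Object.create(null);
    return {
        counts: counts,
        onTopics: function (topics, conversation) {
            topics.forEach(function (topic) {
                counts[topic] = (counts[topic] || 0) + 1;
            });
        }
    };
}

var tally = makeTopicTally();
tally.onTopics(["cars", "sales team"], null);
tally.onTopics(["cars", "Star Wars"], null);
console.log(tally.counts);
```

In a hosted script, `tally.onTopics` would be passed as the onTopics option.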