Recording Calls
The Sift API allows you to record the voice of each participant in a call individually, as well as the full conversation.
Call recording is available for point-to-point calls initiated via the CallPhone and CallClient commands, as well as for conferences created with JoinConference. In this example, we will set up recording on a conference call.
Enabling recording
There are many ways to initiate a call through Sift. For conference calls, it is usually most convenient to have participants connect via a dial-in number. The Receiving an incoming call guide covers how to set up a Sift phone number and associate it with an Application.
As before, we will implement an on_incoming callback URL using Flask.
import requests  # used further down to fetch the finished conversation
from flask import Flask, request, jsonify

app = Flask(__name__)


@app.route('/my_conference', methods=['POST'])
def my_conference():
    """Connects to the conference."""
    response = {
        'routine': [
            {
                'name': 'JoinConference',
                'conference_name': 'my_conference',
                'do_record': True,
                'event_callback': 'http://myserver.com/conference_event'
            }
        ]
    }
    return jsonify(response)
The callback simply connects all incoming calls to a conference room named “my_conference”. Since we have set the do_record property to True, the Sift API will generate individual recordings for each participant, as well as a unified conference recording. We have also provided the event_callback parameter so that our server will be notified when the conference ends and our recordings are ready.
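If several conference rooms share the same setup, the response body can be produced by a small helper. The following is only a sketch: the function name `join_conference_routine` and its `record` argument are hypothetical, and it assumes that do_record accepts a plain boolean, as the example above suggests.

```python
def join_conference_routine(conference_name, event_callback, record=True):
    """Build the response body that places a caller into a conference.

    The keys mirror the JoinConference routine shown above; the helper
    itself is illustrative and not part of the Sift API.
    """
    return {
        'routine': [
            {
                'name': 'JoinConference',
                'conference_name': conference_name,
                'do_record': record,
                'event_callback': event_callback,
            }
        ]
    }
```

Inside the Flask view, the result can be passed straight to jsonify.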
Recordings are available immediately after the call ends. The Sift API notifies us when the conference ends via the event_callback URI, which we have implemented below.
@app.route('/conference_event', methods=['POST'])
def conference_event():
    """Gets the recording when the conference ends."""
    event = request.get_json()
    if event['type'] == 'ended':
        conversation_id = event['conversation_id']
        # requests needs a full URL including the scheme.
        response = requests.get('https://api.gridspace.com/v0/conversations/' + conversation_id)
        conversation = response.json()
    # Acknowledge the event so that Flask returns a valid response.
    return '', 204
We first check the event type. Several kinds of events may be sent during the conference, including live transcripts and topics. We are interested in the Conversation Ended Event. Once we know we have the right event, we can use the conversation_id property to get the conversation from the Retrieve a conversation endpoint.
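Because the same callback URL receives every kind of event, it can help to isolate the filtering step in small functions that are easy to test. This is a minimal sketch, assuming the event payload carries the type and conversation_id keys used above. The helper names and the `API_ROOT` constant (including its `https://` scheme) are assumptions made for illustration, not part of the Sift API.

```python
API_ROOT = 'https://api.gridspace.com/v0'  # assumed base URL and scheme


def ended_conversation_id(event):
    """Return the conversation ID for a Conversation Ended event, else None."""
    if event.get('type') != 'ended':
        # Live transcripts, topics, and any other events are ignored here.
        return None
    return event.get('conversation_id')


def conversation_url(conversation_id):
    """Build the URL for the Retrieve a conversation endpoint."""
    return API_ROOT + '/conversations/' + conversation_id
```

The callback can then call ended_conversation_id(event) and only issue the HTTP request when the result is not None.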
Full conversation recording
We can get a recording of the full conversation from the Conversation object returned by the Sift REST API.
import os
from urllib.request import urlretrieve
wav_url = conversation['audio']['wav_file']
urlretrieve(wav_url, os.path.expanduser('~/conversation.wav'))
Here we use Python's built-in urllib library to download the .wav recording to our home directory. The audio is also available in a compressed .mp3 format via the mp3_file property.
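Choosing between the two formats and naming the local file can be wrapped in a pair of helpers. The sketch below assumes only the wav_file and mp3_file properties described above. The function names, and the choice to name the file after the conversation ID, are hypothetical.

```python
import os


def recording_url(audio, fmt='wav'):
    """Return the download URL for the requested format ('wav' or 'mp3')."""
    if fmt not in ('wav', 'mp3'):
        raise ValueError('unsupported format: ' + fmt)
    return audio[fmt + '_file']


def local_recording_path(conversation_id, fmt='wav', directory='~'):
    """Build a path such as ~/<conversation_id>.wav, expanding '~'."""
    filename = conversation_id + '.' + fmt
    return os.path.join(os.path.expanduser(directory), filename)
```

The two results can be handed to urllib.request.urlretrieve as the URL and the target file name, respectively.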
Single-speaker recordings
In some scenarios, it is appropriate to record some participants in a conversation but not others. For instance, a salesperson may want to record their own voice during a call to review their sales pitch, but may want to avoid recording the voice of the customer for privacy reasons. The Sift API allows you to retrieve recordings for each individual channel of a call to assist with such scenarios.
Individual party recordings are also useful for improving intelligibility. If many speakers are talking at once, or one side of a connection has loud background noise, the listener can isolate a particular speaker to get a cleaner signal.
from urllib.request import urlretrieve

for channel in conversation['channels']:
    if channel['from'] == '+15558881188':
        # This is the recording we are interested in.
        # urlretrieve needs a target file name, not a directory.
        urlretrieve(channel['audio']['wav_file'], '/path/to/file.wav')
Each connection that joins the conference adds a new channel object to the channels property of the Conversation object. Each channel object stores the isolated recording from that connection, as well as information about the connection itself. In this example, we examine the from property of each channel in the conversation to find the phone number of the participant we are interested in hearing.
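The lookup can also be written as a function that returns every matching recording rather than acting on each one inside the loop. This is a sketch under the assumption that channels have the from and audio properties shown above; the name `channel_recordings` is hypothetical.

```python
def channel_recordings(conversation, phone_number):
    """Return the .wav URLs of every channel whose 'from' matches phone_number."""
    return [
        channel['audio']['wav_file']
        for channel in conversation.get('channels', [])
        if channel.get('from') == phone_number
    ]
```

Returning a list keeps the caller in control of where and whether each file is downloaded, and it copes with a participant who appears in more than one channel.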
Note
In some circumstances the same connection can generate multiple channels in the same Conversation. For instance, if a connection leaves a conference and then re-joins some time later, there will be two channels in the final Conversation with the same connection_id property but different start_ms values. This ensures that each channel is a single contiguous block of audio.
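To work with all the audio from one connection, the channels can be grouped by connection_id and ordered by start_ms. The following is a minimal sketch that relies only on those two properties; the function name `channels_by_connection` is an assumption for illustration.

```python
from collections import defaultdict


def channels_by_connection(conversation):
    """Group channels by connection_id, each group ordered by start_ms."""
    groups = defaultdict(list)
    for channel in conversation.get('channels', []):
        groups[channel['connection_id']].append(channel)
    for segments in groups.values():
        # Earliest segment first, so re-joins appear in chronological order.
        segments.sort(key=lambda channel: channel['start_ms'])
    return dict(groups)
```

A connection that left and re-joined then shows up as a list with more than one entry.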
Audio quality metrics
Recording quality strongly impacts the accuracy of Sift’s transcription and analysis capabilities. The API provides two measures of audio quality for each audio recording: signal-to-noise ratio (SNR) and reverberation time (rt60). These values are available in the audio properties of the Conversation and Channel objects. Both are stored as real numbers.
if conversation['audio']['snr'] < 3.0:
    print('The conversation was very noisy')
if conversation['audio']['rt60'] > 5.0:
    print('The conversation had lots of echo')
A high snr value is desirable; lower values indicate that there was a lot of background noise on the recording. A low rt60 value is desirable, while a high value indicates that there was a lot of echo in the recording environment. Using a speakerphone in a large room or speaking far away from the microphone can increase the rt60 value.
Both metrics are also available on each channel, which is useful for detecting which end of a call is the source of quality degradation.
for channel in conversation['channels']:
    if channel['audio']['snr'] < 3.0:
        print('connection ' + channel['connection_id'] + ' was very noisy')
    if channel['audio']['rt60'] > 5.0:
        print('connection ' + channel['connection_id'] + ' was very echoey')
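The same checks can be collected into reusable functions, for example to find the channel most likely responsible for a noisy call. This sketch reuses the 3.0 and 5.0 thresholds from the examples above, which are illustrative values rather than documented limits; the names `quality_issues` and `noisiest_channel` are hypothetical.

```python
MIN_SNR = 3.0   # illustrative threshold taken from the examples above
MAX_RT60 = 5.0  # illustrative threshold taken from the examples above


def quality_issues(audio):
    """Return a list of problems found in one 'audio' property."""
    issues = []
    if audio['snr'] < MIN_SNR:
        issues.append('noisy')
    if audio['rt60'] > MAX_RT60:
        issues.append('echoey')
    return issues


def noisiest_channel(conversation):
    """Return the channel with the lowest SNR, or None if there are no channels."""
    channels = conversation.get('channels', [])
    if not channels:
        return None
    return min(channels, key=lambda channel: channel['audio']['snr'])
```

quality_issues accepts the audio property of either a Conversation or a Channel, since both expose the same two metrics.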