Yandex SpeechKit

Speech analytics based on Yandex SpeechKit ML technologies with Kazakh language support

Yandex SpeechKit supports 16 languages, including Russian, Kazakh, and Uzbek. To train recognition models and improve speech recognition quality, Yandex Cloud developers use thousands of hours of audio for each language.

SpeechKit speech ML models can be deployed on your infrastructure. We offer both hybrid deployments and options where 100% of traffic is processed within your environment.

The Yandex SpeechKit service provides high-quality recognition of Kazakh speech, as well as mixed Kazakh-Russian speech.

Since 2023, Sanatel Consulting has been a Yandex partner in speech technologies.

A demo version of the solution is available via this link.

Speech analytics demo based on Yandex SpeechKit

Applications of Speech Analytics

Speech analytics is useful for any customer-focused business, and it is especially effective in sectors with a high volume of calls: banks, insurance companies, online stores, medical centers, delivery services, and various call centers.

  • Script compliance monitoring – ensuring adherence to sales scripts, regulations, and phone communication standards.
  • Dissatisfaction analysis – identifying cases of customer dissatisfaction.
  • Needs identification – detecting and organizing customer needs.
  • Data accumulation – collecting historical communication data for future use, such as analytics by new criteria or training an AI bot.

Speech analytics presentation slides based on Yandex SpeechKit:

PDF presentation on speech analytics based on Yandex SpeechKit

Why Speech Analytics is Needed

Speech analytics enables monitoring of 100% of phone calls and automatic evaluation of employee performance, script compliance, and call standards. It also helps identify dissatisfied customers and reasons for customer churn.

Using speech analytics helps to:

  • Reduce labor and operational costs;
  • Shorten onboarding time for new employees;
  • Increase sales volume;
  • Proactively respond to dissatisfied clients;
  • Improve customer communication and loyalty.

How Speech Analytics Works

The “Speech Analytics” solution is deployed on the client’s server or in the cloud. It integrates with the client’s telephony system, retrieves voice recordings from the sales or contact center departments, transcribes the speech into text, and analyzes the text for script adherence, negative sentiment, and prohibited or filler words. It then generates reports, calculates employee ratings, and provides individual and departmental analytics.

  • Telephony integration – retrieves recordings and transcribes them into text.
  • Text analysis – searches the transcripts using dictionaries and script fragments.
  • Reporting – calculates operator ratings, generates reports and charts for individuals or teams.

Video overview of speech analytics based on Yandex SpeechKit:

Quantitative Analysis

Quantitative text analysis focuses on numerical data and statistics to evaluate text by measurable characteristics. The main goal is to objectively measure certain elements of text, such as words and phrases, without interpreting their meaning.

The following symbols can be used to search for words and phrases:

  1. "*" – matches any word ending within the phrase.
  2. "<>" – allows the words on either side to appear in either order.
  3. "/" – separates alternative words within the phrase.

For example, the pattern "добрый<>день/вечер" (in English, "good<>day/evening") searches for four phrase variations:

  1. добрый день
  2. добрый вечер
  3. день добрый
  4. вечер добрый
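
The expansion shown above can be sketched in a few lines of Python. This is only an illustration of the "<>" and "/" symbols for a pattern shaped like the example (one "<>" with alternatives on either side). The function name `expand_pattern` is hypothetical, the "*" symbol is not handled, and the product's actual matching engine may work differently.

```python
from itertools import product


def expand_pattern(pattern: str) -> list[str]:
    """Expand "<>" (word swap) and "/" (alternatives) into concrete phrases."""
    # Each side of "<>" is a slot; "/" separates the alternatives within a slot.
    slots = [part.split("/") for part in pattern.split("<>")]
    orders = [slots]
    if len(slots) == 2:
        # "<>" also allows the two slots to appear in the opposite order.
        orders.append(slots[::-1])
    phrases = []
    for order in orders:
        for words in product(*order):
            phrases.append(" ".join(words))
    return phrases


for phrase in expand_pattern("добрый<>день/вечер"):
    print(phrase)
```

Expanding a pattern into explicit phrases keeps the search itself a plain substring or token comparison, which is easy to audit when a dictionary match needs to be explained to a supervisor.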

Semantic Analysis

Semantic analysis using GPT-based classifiers enables a deeper understanding and structuring of textual data by identifying topics, patterns, and categories. The GPT model, trained on vast datasets, classifies text into specified categories such as “Product novelty,” “Creating urgency,” and “Summary.”

To train a GPT classifier, a dataset of “correct” reference phrases is assembled. It should contain at least 100 examples, and preferably as many as 10,000. For instance, to train a GPT classifier on “Creating urgency,” the following phrases are selected from collected dialogues:

[
  "This product is in high demand. Are you sure you can't come soon? It might be out of stock tomorrow.",
  "Loan conditions change every month. You might miss a good deal if you delay the purchase.",
  "If we don't place the order for you soon, you'll have to wait for the next shipment, which takes a long time."
]

The trained GPT classifier then evaluates new dialogues and highlights those that are semantically similar to the reference phrases.


Video – GPT classifiers in the sales department:

Sections of the Analytics System

Speech Transcription Section

Listen to telephony voice recordings alongside the dialogue transcript. Clicking a line of dialogue jumps to the corresponding timestamp in the audio.

Matches found using dictionaries and script validation results are highlighted in the dialogue text.

Call statistics: speech rate, pauses, whether the manager interrupted the client.
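
As a rough illustration of how such statistics could be derived from a transcript with timestamps, consider the sketch below. Everything in it is an assumption made for the example: the `Segment` structure, the "manager" and "client" speaker labels, and the 2-second pause threshold are hypothetical and do not describe the product's actual data model or formulas.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # "manager" or "client" (hypothetical labels)
    start: float  # seconds from the beginning of the call
    end: float
    text: str


def call_statistics(segments: list[Segment], min_pause: float = 2.0) -> dict:
    """Compute manager speech rate, long pauses, and manager interruptions."""
    ordered = sorted(segments, key=lambda s: s.start)
    manager = [s for s in ordered if s.speaker == "manager"]
    words = sum(len(s.text.split()) for s in manager)
    talk_time = sum(s.end - s.start for s in manager)
    pauses = 0
    interruptions = 0
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur.start - prev.end
        if gap >= min_pause:
            pauses += 1  # silence of at least min_pause seconds
        # The manager starts speaking before the client has finished.
        if gap < 0 and prev.speaker == "client" and cur.speaker == "manager":
            interruptions += 1
    return {
        "manager_words_per_minute": words / talk_time * 60 if talk_time else 0.0,
        "pauses": pauses,
        "manager_interruptions": interruptions,
    }
```

With stereo recordings, overlapping timestamps between the two channels are what make an interruption detectable in this way.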

The section can be integrated with the customer's CRM system so that users can navigate directly from a call to the corresponding lead or deal.


Dictionaries Section

Dictionary management: add or remove words. Upload word lists to dictionaries from a text file.


Scripts Section

Fragment management: edit, add, or delete words in fragments.

Script management: edit scripts, add or remove fragments. Configure script parameters and assign scripts to operator groups or departments.


Reports Section

“Summary Metrics” report – dialogue analysis over a period, general statistics.


“Average Score by Dictionaries” report – evaluates managers over a period, comparing dictionary scores.


“Script Execution by Employees” report – analyzes script adherence over time, compares manager ratings.


“Fragment Statistics” report – analyzes detected script fragments in dialogues. Ranks most common matches.


“STT Billing” report – analyzes speech-to-text recognition costs.

Other Section

Telegram Notifications – logs of Telegram alerts about matches from critical dictionaries, e.g. the “Complaint” dictionary.

Users and roles – system users and their roles, access rights to sections, and editing permissions.

System service log: speech recognition logs and technical error messages.

Integration with Telephony

The system needs to receive call data from the telephony system (date, time, department, direction, etc.), as well as the audio file of the call recording. Audio recordings should be in stereo format, with each side of the conversation recorded in its own stereo channel (left or right).
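
To illustrate why stereo recordings make speaker attribution straightforward, the sketch below splits a stereo file into two mono files, one per side of the conversation. It assumes 16-bit PCM WAV input and uses only the Python standard library. The function name `split_stereo` is hypothetical, and recordings in other formats (such as MP3) would first need to be decoded with a separate tool.

```python
import wave
from array import array


def split_stereo(src_path: str, left_path: str, right_path: str) -> None:
    """Write the left and right channels of a 16-bit stereo WAV to two mono files."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM audio")
        rate = src.getframerate()
        samples = array("h", src.readframes(src.getnframes()))
    # Stereo frames are interleaved: left, right, left, right, ...
    for path, channel in ((left_path, samples[0::2]), (right_path, samples[1::2])):
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(channel.tobytes())
```

Which channel carries the client and which carries the manager depends on how the telephony system is configured, so that mapping has to be confirmed for each installation.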

If audio recordings are in mono (for example, recorded from workplace microphones or audio badges), the system separates speakers by voice during speech-to-text recognition. The accuracy of this separation is around 70–80%, however, and depends on the quality of the microphones used for recording.

The following telephony integration options are available:

Data Retrieval

Option 1. Call data is encoded in the filename of the call recording. The number of parameters in the filename must be the same for incoming and outgoing calls, and the delimiter between parameters must not appear within the parameters themselves.

For example:

in-87771112233-Sergey_Mikhailov-Tatyana_Prokhorova-101-20230726-093151-2914299-1690356110.6593244.mp3

In this example, the call recording filename consists of nine parameters separated by hyphens:

  1. in or out – indicator of an incoming or outgoing call;
  2. 87771112233 – client’s phone number;
  3. Sergey_Mikhailov – client’s name;
  4. Tatyana_Prokhorova – manager’s name;
  5. 101 – manager’s department ID;
  6. 20230726 – call date;
  7. 093151 – call time;
  8. 2914299 – CRM deal (lead) ID;
  9. 1690356110.6593244 – call ID in the telephony system.
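
A filename laid out this way could be parsed roughly as follows. The `CallInfo` class and `parse_recording_name` function are hypothetical names for this sketch. It also assumes, based only on the example above, that underscores stand in for spaces in names and that the date and time use the YYYYMMDD and HHMMSS layouts.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CallInfo:
    direction: str
    client_phone: str
    client_name: str
    manager_name: str
    department_id: str
    started_at: datetime
    deal_id: str
    call_id: str


def parse_recording_name(filename: str, delimiter: str = "-") -> CallInfo:
    """Parse call parameters from a recording filename laid out as in the example."""
    # Drop only the extension; the call ID contains a dot of its own.
    stem = filename.rsplit(".", 1)[0]
    parts = stem.split(delimiter)
    if len(parts) != 9:
        raise ValueError(f"expected 9 parameters, got {len(parts)}: {filename!r}")
    direction, phone, client, manager, dept, date, time, deal_id, call_id = parts
    if direction not in ("in", "out"):
        raise ValueError(f"unknown direction: {direction!r}")
    return CallInfo(
        direction=direction,
        client_phone=phone,
        client_name=client.replace("_", " "),    # assumption: "_" encodes a space
        manager_name=manager.replace("_", " "),
        department_id=dept,
        started_at=datetime.strptime(date + time, "%Y%m%d%H%M%S"),  # assumed layout
        deal_id=deal_id,
        call_id=call_id,
    )


info = parse_recording_name(
    "in-87771112233-Sergey_Mikhailov-Tatyana_Prokhorova-101-"
    "20230726-093151-2914299-1690356110.6593244.mp3"
)
print(info.manager_name, info.started_at, info.call_id)
```

The strict check on the number of fields reflects the requirement that the delimiter never appears inside a parameter: a name containing a hyphen would shift every later field, so rejecting the file is safer than guessing.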

Option 2. Provide direct read-only access to the telephony database. The analytics system retrieves call data for the past hour or day.

Option 3. Implement a web service (REST + JSON) connected to the telephony system. It is recommended to create two methods in such a web service. The first method provides a list of all calls for a specific period (e.g., the past hour or day from the request time). The list should include a unique call ID, such as Uniqueid. The second method provides detailed call information in response to a request by call ID. Using this integration, the analytics system periodically (e.g., every 10 minutes or every hour) calls the first method, retrieves the list of recent calls for the period, and determines which calls have already been uploaded to the system and which have not. Then, the analytics system uses the second method to obtain detailed call information by call ID.
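
The polling cycle described in Option 3 can be sketched as below. The two web-service methods are passed in as plain functions so that the example stays independent of any particular URL or HTTP library. The names `sync_new_calls`, `list_calls`, and `get_call` are hypothetical, and in a real integration they would wrap HTTP requests to endpoints defined by the telephony-side service.

```python
from typing import Callable, Iterable


def sync_new_calls(
    list_calls: Callable[[], Iterable[str]],
    get_call: Callable[[str], dict],
    known_ids: set[str],
) -> list[dict]:
    """Fetch details for calls that have not been uploaded yet."""
    new_calls = []
    for call_id in list_calls():  # first method: IDs of calls for the recent period
        if call_id in known_ids:
            continue  # already uploaded during an earlier poll
        new_calls.append(get_call(call_id))  # second method: details by call ID
        known_ids.add(call_id)
    return new_calls


# In-memory stand-ins for the two web-service methods (illustration only).
recent = ["1690356110.6593244", "1690356200.1000001"]
known: set[str] = {"1690356110.6593244"}
fetched = sync_new_calls(lambda: recent, lambda cid: {"call_id": cid}, known)
print(fetched)
```

Because the list method may return overlapping periods on successive polls, tracking the IDs already uploaded is what keeps a call from being processed twice.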

Retrieving the Audio File

Option 1. FTP access to the folder containing call recordings. The analytics system connects via FTP to the audio recordings folder and finds the file by matching the call ID in the filename. The main folder may be organized into subfolders by month and day.

Option 2. If the speech analytics server is located within the same local network as the telephony server, the telephony recordings folder can be mounted on the analytics server. In this case, the analytics system can access the audio recordings as regular files in its file system.

Option 3. Implement a web service (REST + JSON) connected to the telephony system for retrieving audio files via a direct link using GET or POST requests. The file ID to be retrieved should be passed in the request URL or body.
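
For the mounted-folder variant (Option 2), locating a recording by call ID might look like the sketch below. The function name `find_recording` is hypothetical. The search walks all subfolders, which also covers a layout organized by month and day, and it assumes that the call ID appears verbatim in the filename.

```python
from pathlib import Path
from typing import Optional


def find_recording(root: str, call_id: str) -> Optional[Path]:
    """Return the first file under root whose name contains call_id, or None."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and call_id in path.name:
            return path
    return None
```

On large archives a full recursive scan can be slow, so narrowing the search to the subfolder for the call's date, when the folder layout allows it, is a reasonable refinement.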

Future Prospects After Implementing Speech Analytics

Speech analytics accumulates data from company-client communications. In the future, the large volume of collected data can be used to train the company’s voice bot.

The Yandex SpeechKit module can be used not only for speech recognition but also for speech synthesis when implementing an intelligent bot within the company.