Make your apps understand speech with Microsoft Cognitive Services – part 1

April 10, 2017

Microsoft Cognitive Services are a new set of cloud APIs that can be used from practically any platform (Android apps, iOS apps, Windows apps, websites, etc.) to add capabilities such as natural language processing, speech, vision and knowledge exploration. As with much of artificial intelligence, these are tasks that are easy for humans but difficult for computers. With Cognitive Services, you can put the results of Microsoft Research, backed by powerful cloud computing, to work in your own applications.

Bing Speech API

The Bing Speech API offers two capabilities:

  • Converting speech audio to text (speech recognition)
  • Converting text to speech audio (speech synthesis)

This blog post describes how to build an app that converts speech audio to text.

The Bing Speech API webpage lets you try out speech recognition directly in the browser.

Bing Speech API test

Getting an API key

To use the Microsoft Cognitive Services APIs, you need an API key, which you request from the Azure Portal. Click New -> Intelligence & Analytics -> Cognitive Services APIs.

Get Cognitive Services API key

Choose API type: Bing Speech API. Also enter an account name and complete the remaining fields. You can select the free pricing tier F0.

Create Bing Speech API key

The account should be created within a few minutes. Open it from the All resources list and select Keys. Either of the two keys will work, so copy one of them.
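Every request to the API has to carry this key. With the REST API, the key is usually exchanged for a short-lived access token first. The Python sketch below builds that exchange request without sending it. The token endpoint URL and the `Ocp-Apim-Subscription-Key` header name are assumptions based on the 2017-era Bing Speech REST documentation, the assumption that the token comes back as plain text likewise, and `BING_SPEECH_KEY` is a hypothetical environment variable name used to keep the key out of source code.

```python
import os
import urllib.request

# Assumption: token endpoint as described in the 2017-era Bing Speech REST docs.
TOKEN_URL = "https://api.cognitive.microsoft.com/sts/v1.0/issueToken"


def load_api_key(env_var: str = "BING_SPEECH_KEY") -> str:
    """Read the API key from an environment variable (hypothetical name)."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} to one of the keys from the Azure Portal")
    return key


def build_token_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST that exchanges the key for an access token."""
    return urllib.request.Request(
        TOKEN_URL,
        data=b"",  # empty body; the key travels in a header (assumed header name)
        headers={"Ocp-Apim-Subscription-Key": api_key},
        method="POST",
    )


def fetch_token(api_key: str) -> str:
    """Send the token request; the body is assumed to be the token as plain text."""
    with urllib.request.urlopen(build_token_request(api_key), timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Keeping the key in an environment variable rather than in code means it is not accidentally committed along with a sample project.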

Get Cognitive Services API keys

Accessing the API

There are three ways of accessing the Bing Speech API:

  • REST API. A standard REST API that can be called from any platform. However, it returns only the final recognition result, with no partial (intermediate) results while the user is speaking. Documentation for the REST API and a sample in C# are available.
  • Client Library. Available for C#, JavaScript, Android and iOS. Unlike the REST API, it can deliver partial results while audio is still being recognized. The Client Library is available for download.
  • Service Library. Intended mainly for backend services and available only for C#. It is also available for download.
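To make the "final results only" behavior of the REST API concrete, the Python sketch below builds a single recognition request that uploads a complete WAV recording, and parses the single JSON answer. The endpoint, the `language` and `format` query parameters, the header values (16 kHz PCM WAV audio) and the `RecognitionStatus`/`DisplayText` response fields are all assumptions taken from the 2017-era Bing Speech REST documentation; verify them before relying on them. `token` is an access token obtained with your API key.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumptions: endpoint, query parameters and header values follow the
# 2017-era Bing Speech REST documentation; verify before relying on them.
RECOGNITION_URL = (
    "https://speech.platform.bing.com/speech/recognition/"
    "interactive/cognitiveservices/v1"
)


def build_recognition_request(token: str, wav_bytes: bytes,
                              language: str = "en-US") -> urllib.request.Request:
    """Build (but do not send) a POST carrying a complete WAV recording."""
    query = urllib.parse.urlencode({"language": language, "format": "simple"})
    return urllib.request.Request(
        f"{RECOGNITION_URL}?{query}",
        data=wav_bytes,  # the whole utterance is uploaded in one request
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "audio/wav; codec=audio/pcm; samplerate=16000",
            "Accept": "application/json",
        },
        method="POST",
    )


def parse_simple_result(body: bytes) -> Optional[str]:
    """Return the recognized text, or None if recognition did not succeed.

    The REST API answers once per request, so this single final result is
    all there is: no partial hypotheses arrive while the user is speaking.
    """
    result = json.loads(body)
    if result.get("RecognitionStatus") != "Success":
        return None
    return result.get("DisplayText")
```

Sending the request would be `urllib.request.urlopen(request)` followed by `parse_simple_result(response.read())`. Because nothing comes back until the whole recording has been processed, an app that wants to show text as the user speaks is better served by the Client Library.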

My advice is to start by downloading one of the samples, which are hosted on GitHub. Here is the C# sample:

Github Bing Speech C# sample

Once downloaded, the sample can be opened, compiled and run in Visual Studio. You will need to enter the API key you copied from the Azure Portal.

Bing Speech API sample 1

The file MainWindow.xaml.cs shows how the library is used. The interesting part starts in the StartButton_Click method, and the methods CreateMicrophoneRecoClient and CreateMicrophoneRecoClientWithIntent are also worth reading. The “intent” functionality, however, requires additional configuration, and I will come back to it in the next post.
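Unlike the REST API, the client library reports results through events while audio is streaming: partial hypotheses first, then a final result. The Python sketch below is a hypothetical stand-in for that pattern, not the client library's API: the `RecognitionEvents` class, its `emit_partial`/`emit_final` methods and the `simulate` helper are invented for illustration, so that handler logic like the sample's can be tried without a microphone.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RecognitionEvents:
    """Hypothetical event hub mimicking partial/final recognition callbacks."""
    on_partial: List[Callable[[str], None]] = field(default_factory=list)
    on_final: List[Callable[[str], None]] = field(default_factory=list)

    def emit_partial(self, text: str) -> None:
        for handler in self.on_partial:
            handler(text)

    def emit_final(self, text: str) -> None:
        for handler in self.on_final:
            handler(text)


def simulate(events: RecognitionEvents, hypotheses: List[str], final: str) -> None:
    """Replay a recognition: growing partial hypotheses, then one final result."""
    for text in hypotheses:
        events.emit_partial(text)
    events.emit_final(final)


# Example handlers: show partials as feedback, act only on the final text.
log: List[str] = []
events = RecognitionEvents()
events.on_partial.append(lambda text: log.append(f"partial: {text}"))
events.on_final.append(lambda text: log.append(f"final: {text}"))
simulate(events, ["what's", "what's the weather"], "What's the weather?")
```

The design point is that partial results are only good for live feedback, since they may still change; anything the app acts on should come from the final result.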

Samples for other platforms

Similar samples are available for iOS, Android and JavaScript.

Continue reading

Converting speech to text is useful, but it is even more useful if the API can deduce what the speaker actually wants and identify the key words in each phrase. This is the subject of my next blog post:
Make your apps understand speech with Microsoft Cognitive Services – part 2.
