Speech-to-text conversion in iOS

There are many speech-to-text conversion techniques available, and each one has its pros and cons. I tried several of them and found the Google Speech API the most convenient: it is efficient and quite easy to use, although its documentation is insufficient.

I found that the API accepts FLAC-encoded audio files, but the FLAC format is not supported in iOS. The approach that finally worked was to send an audio file encoded as linear PCM with a sample rate of 16000 Hz.

The configuration for audio is:

Channels       : 1

Sample Rate    : 16000

Precision      : 16-bit

Sample Encoding: 16-bit Signed Integer PCM
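
As a quick sanity check on this configuration, the sketch below derives the raw data rate and a Content-Type value from the same three numbers. The constant and function names are hypothetical and only for illustration. The "audio/l16; rate=..." form is the one used for the request header later in this post.

```objective-c
#import <Foundation/Foundation.h>

// Values taken from the configuration listed above.
static const int kSpeechSampleRate = 16000; // samples per second
static const int kSpeechBitDepth   = 16;    // bits per sample
static const int kSpeechChannels   = 1;     // mono

// Raw PCM data rate: samples/second * bytes/sample * channels.
static int SpeechBytesPerSecond(void) {
    return kSpeechSampleRate * (kSpeechBitDepth / 8) * kSpeechChannels;
}

// Content-Type string built from the same sample rate, so the header
// and the recording cannot drift apart.
static NSString *SpeechContentType(void) {
    return [NSString stringWithFormat:@"audio/l16; rate=%d", kSpeechSampleRate];
}
```

With these values, one second of audio is 32000 bytes of sample data, so a ten-second clip is roughly 320 KB before any file header.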


The settings dictionary for the audio recorder is:

NSDictionary *recordSettings = [NSDictionary dictionaryWithObjectsAndKeys:
    [NSNumber numberWithInt:AVAudioQualityMax],      AVEncoderAudioQualityKey,
    [NSNumber numberWithInt:16],                     AVLinearPCMBitDepthKey,   // 16-bit precision (for linear PCM this is the bit depth, not AVEncoderBitRateKey)
    [NSNumber numberWithInt:1],                      AVNumberOfChannelsKey,    // mono
    [NSNumber numberWithFloat:16000],                AVSampleRateKey,          // 16000 Hz
    [NSNumber numberWithInt:kAudioFormatLinearPCM],  AVFormatIDKey,
    nil];
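
For completeness, here is a minimal sketch of how settings like these might be handed to an AVAudioRecorder and how the finished recording could be read back as NSData. The helper names (SpeechRecordSettings, MakeSpeechRecorder, LoadRecording) are hypothetical. Expressing the 16-bit precision through AVLinearPCMBitDepthKey, and adding AVLinearPCMIsFloatKey for signed-integer samples, is my reading of the configuration above rather than something the API documentation confirms.

```objective-c
#import <AVFoundation/AVFoundation.h>

// Settings for mono, 16000 Hz, 16-bit signed-integer linear PCM.
static NSDictionary *SpeechRecordSettings(void) {
    return @{
        AVFormatIDKey:          @(kAudioFormatLinearPCM),
        AVSampleRateKey:        @16000.0,
        AVNumberOfChannelsKey:  @1,
        AVLinearPCMBitDepthKey: @16,
        AVLinearPCMIsFloatKey:  @NO,   // integer samples, not floating point
    };
}

// Hypothetical helper: creates a recorder that writes to fileURL.
// Returns nil and fills *error if the recorder cannot be created.
static AVAudioRecorder *MakeSpeechRecorder(NSURL *fileURL, NSError **error) {
    return [[AVAudioRecorder alloc] initWithURL:fileURL
                                       settings:SpeechRecordSettings()
                                          error:error];
}

// Hypothetical helper: loads the finished recording for upload.
static NSData *LoadRecording(NSURL *fileURL) {
    return [NSData dataWithContentsOfURL:fileURL];
}
```

In an app you would typically activate an AVAudioSession with a recording category first, call record on the recorder, call stop when the user finishes, and only then load the file.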



Then pass the recorded file to the API as NSData. The most important detail here is the Content-Type header: it must match the audio file configuration.

NSString *urlString = [NSString stringWithFormat:@"https://www.google.com/speech-api/v2/recognize?xjerr=1&client=chromium&lang=en-US&key=%@", GOOGLE_API_KEY]; // GOOGLE_API_KEY is obtained from Google API Access
NSURL *url = [NSURL URLWithString:urlString];

NSMutableURLRequest *request = [[NSMutableURLRequest alloc] init];
[request setHTTPMethod:@"POST"];
[request setHTTPBody:byteData]; // byteData: the recorded audio file as NSData
[request addValue:@"audio/l16; rate=16000" forHTTPHeaderField:@"Content-Type"]; // must match the recording settings
[request setURL:url];
[request setTimeoutInterval:15];

NSURLResponse *response = nil;
NSError *error = nil;
NSData *data = [NSURLConnection sendSynchronousRequest:request returningResponse:&response error:&error]; // data holds the transcribed result
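
To get the text out of the returned data, something like the following sketch may help. It assumes the response body consists of one or more newline-separated JSON objects shaped like {"result":[{"alternative":[{"transcript":"..."}]}]}, possibly preceded by an empty {"result":[]} line. That shape is an assumption based on what I have seen returned, not on any documentation, so treat the key names as unverified. FirstTranscript is a hypothetical helper name.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first transcript found in the response
// body, or nil if there is none. Assumes newline-separated JSON objects.
static NSString *FirstTranscript(NSData *data) {
    if (data.length == 0) return nil;
    NSString *body = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
    NSArray *lines = [body componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]];
    for (NSString *line in lines) {
        NSData *lineData = [line dataUsingEncoding:NSUTF8StringEncoding];
        if (lineData.length == 0) continue; // skip blank lines

        id json = [NSJSONSerialization JSONObjectWithData:lineData options:0 error:NULL];
        if (![json isKindOfClass:[NSDictionary class]]) continue;
        NSDictionary *dict = json;

        NSArray *results = dict[@"result"];
        if (![results isKindOfClass:[NSArray class]]) continue;
        for (NSDictionary *result in results) {
            if (![result isKindOfClass:[NSDictionary class]]) continue;
            NSArray *alternatives = result[@"alternative"];
            if (![alternatives isKindOfClass:[NSArray class]]) continue;
            for (NSDictionary *alternative in alternatives) {
                if (![alternative isKindOfClass:[NSDictionary class]]) continue;
                NSString *transcript = alternative[@"transcript"];
                if ([transcript isKindOfClass:[NSString class]]) return transcript;
            }
        }
    }
    return nil;
}
```

After the request above completes, NSString *text = FirstTranscript(data); would give the recognized text, or nil when the request failed or nothing was recognized. Check error first in that case.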


