There are many speech-to-text conversion techniques available, and each has its pros and cons. I tried several approaches and found the Google Speech API to be the most convenient: it is efficient and quite easy to use, although its documentation is insufficient.
I found that the API accepts FLAC-encoded audio files, but iOS does not support the FLAC format. The approach that finally worked was to use an audio file encoded as linear PCM with a sample rate of 16000 Hz.
The audio configuration is:
Channels: 1
Sample Rate: 16000 Hz
Precision: 16-bit
Sample Encoding: 16-bit Signed Integer PCM
The settings dictionary for the audio recorder is:
NSDictionary *recordSettings = [NSDictionary dictionaryWithObjectsAndKeys:
    [NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,       // linear PCM encoding
    [NSNumber numberWithFloat:16000], AVSampleRateKey,                   // 16000 Hz sample rate
    [NSNumber numberWithInt:1], AVNumberOfChannelsKey,                   // mono
    [NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,                 // 16-bit samples, matching the precision above
    [NSNumber numberWithInt:AVAudioQualityMax], AVEncoderAudioQualityKey,
    nil];
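The settings dictionary above only describes the audio format; it still has to be handed to a recorder. A minimal sketch of that step follows, assuming AVAudioRecorder is used and that the audio session and microphone permission are already set up. The helper names StartRecording and StopRecordingAndLoadData are hypothetical, and the container format of the file (chosen by the extension of fileURL) is left to the caller.

```objc
#import <AVFoundation/AVFoundation.h>

// Hypothetical helper: creates a recorder that writes to fileURL using the
// given settings dictionary and starts recording. Returns nil on failure.
static AVAudioRecorder *StartRecording(NSURL *fileURL,
                                       NSDictionary *recordSettings,
                                       NSError **outError) {
    AVAudioRecorder *recorder =
        [[AVAudioRecorder alloc] initWithURL:fileURL
                                    settings:recordSettings
                                       error:outError];
    if (recorder == nil) {
        return nil; // *outError describes why the recorder could not be created
    }
    // Both calls return NO if the recorder cannot be prepared or started.
    if (![recorder prepareToRecord] || ![recorder record]) {
        return nil;
    }
    return recorder;
}

// Hypothetical helper: stops the recorder and loads the recorded file
// into memory so it can be used as the HTTP body of the request.
static NSData *StopRecordingAndLoadData(AVAudioRecorder *recorder) {
    [recorder stop];
    return [NSData dataWithContentsOfURL:recorder.url];
}
```

The NSData returned by the second helper is what would be sent to the API as the request body.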
Then pass the recorded file to the API as NSData. The most important detail here is the Content-Type header: it must match the audio file configuration.
NSString *urlString = [NSString stringWithFormat:@"https://www.google.com/speech-api/v2/recognize?xjerr=1&client=chromium&lang=en-US&key=%@", GOOGLE_API_KEY]; // GOOGLE_API_KEY is obtained from Google API Access
NSURL *url = [NSURL URLWithString:urlString];
NSMutableURLRequest *request = [[NSMutableURLRequest alloc] init];
[request setURL:url];
[request setHTTPMethod:@"POST"];
[request setHTTPBody:byteData]; // byteData: the recorded audio file as NSData
[request addValue:@"audio/l16; rate=16000" forHTTPHeaderField:@"Content-Type"]; // must match the recording settings
[request setTimeoutInterval:15];
NSURLResponse *response = nil;
NSError *error = nil;
NSData *data = [NSURLConnection sendSynchronousRequest:request returningResponse:&response error:&error]; // data holds the transcribed result
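The returned data still has to be turned into text. The sketch below shows one way to do that with NSJSONSerialization. The response shape is an assumption, not something taken from official documentation: it assumes the body contains one JSON object per line, each shaped like {"result":[{"alternative":[{"transcript":"..."}]}]}, and that some lines may carry an empty result array. The function name FirstTranscriptFromResponse is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first transcript found in the response
// body, or nil if none is present.
// ASSUMPTION: one JSON object per line, shaped like
// {"result":[{"alternative":[{"transcript":"..."}]}]}.
static NSString *FirstTranscriptFromResponse(NSData *data) {
    if (data.length == 0) {
        return nil;
    }
    NSString *body = [[NSString alloc] initWithData:data
                                           encoding:NSUTF8StringEncoding];
    if (body == nil) {
        return nil; // body was not valid UTF-8
    }
    NSArray *lines = [body componentsSeparatedByCharactersInSet:
                         [NSCharacterSet newlineCharacterSet]];
    for (NSString *line in lines) {
        NSData *lineData = [line dataUsingEncoding:NSUTF8StringEncoding];
        if (lineData.length == 0) {
            continue; // skip blank lines
        }
        id json = [NSJSONSerialization JSONObjectWithData:lineData
                                                  options:0
                                                    error:NULL];
        if (![json isKindOfClass:[NSDictionary class]]) {
            continue; // not a JSON object (or not JSON at all)
        }
        id results = [(NSDictionary *)json objectForKey:@"result"];
        if (![results isKindOfClass:[NSArray class]]) {
            continue;
        }
        for (id result in (NSArray *)results) {
            if (![result isKindOfClass:[NSDictionary class]]) {
                continue;
            }
            id alternatives = [(NSDictionary *)result objectForKey:@"alternative"];
            if (![alternatives isKindOfClass:[NSArray class]]) {
                continue;
            }
            for (id alternative in (NSArray *)alternatives) {
                if (![alternative isKindOfClass:[NSDictionary class]]) {
                    continue;
                }
                id transcript = [(NSDictionary *)alternative objectForKey:@"transcript"];
                if ([transcript isKindOfClass:[NSString class]]) {
                    return transcript;
                }
            }
        }
    }
    return nil;
}
```

Calling FirstTranscriptFromResponse(data) after the request completes would give the recognized text, or nil when the request failed or nothing was recognized. Every level is type-checked because the format is assumed rather than documented, so an unexpected response yields nil instead of a crash.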