Make the first API call
This section describes the order of SpeechToText (STT) API calls and gives example scenarios that can be implemented with the API. The SpeechToText SDK offers SERVER, ON_DEVICE, and HYBRID modes; if the app does not set a mode, HYBRID is the default.
API call order
Initialize SDK
To start the SpeechToText SDK, create an instance with the desired mode and call the initialize API.
- Kotlin
enum class Mode {
    HYBRID,   // Uses on-device speech recognition as needed when the network is not connected.
    SERVER,   // Uses server speech recognition; works only when the network is available.
    ON_DEVICE // Uses on-device speech recognition.
}
val speechToText = SpeechToText(context, Mode.HYBRID) // Mode must be set
speechToText.initialize()
- Java
public enum Mode {
    HYBRID,   // Uses on-device speech recognition as needed when the network is not connected.
    SERVER,   // Uses server speech recognition; works only when the network is available.
    ON_DEVICE // Uses on-device speech recognition.
}
SpeechToText speechToText = new SpeechToText(context, Mode.HYBRID); // Mode must be set
speechToText.initialize();
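The mode comments above can be read as a small decision table. The following is a minimal sketch of that reading, not the SDK's implementation: the `Engine` enum and `ModeResolver` class are hypothetical names introduced for illustration, `Mode` is redeclared locally so the block compiles on its own, and it assumes that HYBRID prefers the server whenever the network is available.

```java
// Illustrative stand-ins only; none of these types come from the SpeechToText SDK.
enum Mode { HYBRID, SERVER, ON_DEVICE }

// Hypothetical: which recognizer a request would end up using.
enum Engine { SERVER, ON_DEVICE, UNAVAILABLE }

final class ModeResolver {
    private ModeResolver() {}

    /** Returns the engine implied by the mode comments for a given network state. */
    static Engine resolve(Mode mode, boolean networkAvailable) {
        switch (mode) {
            case ON_DEVICE:
                // On-device recognition does not depend on the network.
                return Engine.ON_DEVICE;
            case SERVER:
                // Server recognition works only when the network is available.
                return networkAvailable ? Engine.SERVER : Engine.UNAVAILABLE;
            case HYBRID:
            default:
                // Assumption: HYBRID uses the server when it can, on-device otherwise.
                return networkAvailable ? Engine.SERVER : Engine.ON_DEVICE;
        }
    }
}
```

For example, `ModeResolver.resolve(Mode.HYBRID, false)` yields `Engine.ON_DEVICE`, while the same call with `Mode.SERVER` yields `Engine.UNAVAILABLE`.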
Add Listener
Before registering the SpeechToText app, call the addListener API to register the SpeechToText event callback.
- Kotlin
speechToText.addListener(object : ResultListener {
    override fun onUpdated(stt: String, completed: Boolean) {
        // stt : recognized text result
        // completed : true once recognition of the user's speech has completed.
    }
    override fun onUpdatedEpdData(on: Long, off: Long) {
        // on : buffer size, in bytes, at the start point of the user's speech.
        // off : buffer size, in bytes, at the end point of the user's speech.
    }
    override fun onStartedRecognition() {
        // Notifies that the user's speech has started.
    }
    override fun onEndedRecognition() {
        // Notifies that recognition of the user's speech has ended.
    }
    override fun onError() {
        // Notifies that an error occurred during voice recognition.
    }
    override fun onReady() {
        // Notifies that the user can now speak into the microphone.
    }
})
- Java
speechToText.addListener(new ResultListener() {
    @Override
    public void onUpdated(String stt, boolean completed) {
        // stt : recognized text result
        // completed : true once recognition of the user's speech has completed.
    }
    @Override
    public void onUpdatedEpdData(Long on, Long off) {
        // on : buffer size, in bytes, at the start point of the user's speech.
        // off : buffer size, in bytes, at the end point of the user's speech.
    }
    @Override
    public void onStartedRecognition() {
        // Notifies that the user's speech has started.
    }
    @Override
    public void onEndedRecognition() {
        // Notifies that recognition of the user's speech has ended.
    }
    @Override
    public void onError() {
        // Notifies that an error occurred during voice recognition.
    }
    @Override
    public void onReady() {
        // Notifies that the user can now speak into the microphone.
    }
});
Register App
Call the registerApp API to register the SpeechToText app with the SpeechToText server. In ON_DEVICE mode, this call is not required.
- Kotlin
val clientID = "your_client_id"
val clientSecret = "your_client_secret"
speechToText.registerApp(clientID, clientSecret, object : OnServerConnectionListener {
    override fun onConnected() {
        // The app can receive events once registration with the Speech server is complete.
    }
    override fun onFailed(msg: String) {
        // If registration with the Speech server fails, the app receives an event with an error message.
    }
})
- Java
String clientID = "your_client_id";
String clientSecret = "your_client_secret";
speechToText.registerApp(clientID, clientSecret, new OnServerConnectionListener() {
    @Override
    public void onConnected() {
        // The app can receive events once registration with the Speech server is complete.
    }
    @Override
    public void onFailed(String msg) {
        // If registration with the Speech server fails, the app receives an event with an error message.
    }
});
Remove Listener
Before exiting the SpeechToText app, call the removeListener API to release the previously registered event callback.
- Kotlin
// listener : the ResultListener instance to release
speechToText.removeListener(listener)
- Java
// listener : the ResultListener instance to release
speechToText.removeListener(listener);
Release SDK
Call the release API to release the SpeechToText SDK resources.
- Kotlin
speechToText.release()
- Java
speechToText.release();
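The call order described in this section (initialize, addListener, registerApp, removeListener, release) can be checked with a small state tracker. This is only a sketch for reasoning about the sequence: `CallOrderTracker` and its `Step` enum are hypothetical and not part of the SDK, and the rule that ON_DEVICE mode may skip registerApp is taken from the Register App description.

```java
/** Hypothetical helper that tracks the documented call order; not an SDK class. */
final class CallOrderTracker {
    enum Step { CREATED, INITIALIZED, LISTENING, REGISTERED, LISTENER_REMOVED, RELEASED }

    private final boolean onDevice;
    private Step step = Step.CREATED;

    /** @param onDevice true when the SDK would run in ON_DEVICE mode */
    CallOrderTracker(boolean onDevice) {
        this.onDevice = onDevice;
    }

    void initialize()  { advance(Step.CREATED, Step.INITIALIZED, "initialize"); }
    void addListener() { advance(Step.INITIALIZED, Step.LISTENING, "addListener"); }
    void registerApp() { advance(Step.LISTENING, Step.REGISTERED, "registerApp"); }

    void removeListener() {
        // ON_DEVICE mode may skip registerApp, so LISTENING is also a valid predecessor there.
        Step expected = (onDevice && step == Step.LISTENING) ? Step.LISTENING : Step.REGISTERED;
        advance(expected, Step.LISTENER_REMOVED, "removeListener");
    }

    void release() { advance(Step.LISTENER_REMOVED, Step.RELEASED, "release"); }

    Step current() { return step; }

    // Moves to the next step, or fails loudly if the call arrived out of order.
    private void advance(Step expected, Step next, String call) {
        if (step != expected) {
            throw new IllegalStateException(
                    call + " called in state " + step + ", expected " + expected);
        }
        step = next;
    }
}
```

A wrapper like this could sit next to the real SDK calls in a debug build so that out-of-order calls fail early; in server or hybrid mode, calling `removeListener()` before `registerApp()` throws an `IllegalStateException`.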
Examples of API usage scenarios
1. Start/Stop SpeechToText voice input
The following example starts and stops SpeechToText voice input.
- Kotlin
// Start voice input.
speechToText.request()
speechToText.addListener(object : ResultListener {
    override fun onUpdated(stt: String, completed: Boolean) {
        // Called repeatedly as recognition progresses:
        // e.g. no
        // e.g. hello
        // e.g. hello
        // e.g. hello
        // e.g. hello. (completed = true)
    }
    override fun onUpdatedEpdData(on: Long, off: Long) {
        // e.g. on(0) off(3200)
    }
    override fun onStartedRecognition() {
    }
    override fun onEndedRecognition() {
    }
    override fun onError() {
    }
})
// Stop voice input.
speechToText.stop()
- Java
// Start voice input.
speechToText.request();
speechToText.addListener(new ResultListener() {
    @Override
    public void onUpdated(String stt, boolean completed) {
        // Called repeatedly as recognition progresses:
        // e.g. no
        // e.g. hello
        // e.g. hello
        // e.g. hello
        // e.g. hello. (completed = true)
    }
    @Override
    public void onUpdatedEpdData(Long on, Long off) {
        // e.g. on(0) off(3200)
    }
    @Override
    public void onStartedRecognition() {
    }
    @Override
    public void onEndedRecognition() {
    }
    @Override
    public void onError() {
    }
});
// Stop voice input.
speechToText.stop();
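Because onUpdated delivers several partial results before the completed one, an app usually needs to separate the live partial text from the final text. The following sketch shows one way to do that. `TextUpdateHandler` is a hypothetical single-method stand-in for the onUpdated callback so the block compiles without the SDK, and `TranscriptCollector` is an illustrative name, not an SDK class.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for the onUpdated callback; the SDK's ResultListener has more methods. */
interface TextUpdateHandler {
    void onUpdated(String stt, boolean completed);
}

/** Keeps the latest partial text and stores each utterance once it is marked completed. */
final class TranscriptCollector implements TextUpdateHandler {
    private final List<String> finalized = new ArrayList<>();
    private String latestPartial = "";

    @Override
    public void onUpdated(String stt, boolean completed) {
        if (completed) {
            // Final result for this utterance: store it and clear the partial text.
            finalized.add(stt);
            latestPartial = "";
        } else {
            // Partial result: replace the previous partial instead of appending to it.
            latestPartial = stt;
        }
    }

    /** Text to show while the user is still speaking. */
    String latestPartial() {
        return latestPartial;
    }

    /** Utterances that were delivered with completed == true, in arrival order. */
    List<String> finalized() {
        return finalized;
    }
}
```

In an app, the body of the real onUpdated override could forward its two arguments to such a collector, then render `latestPartial()` in the UI and act on `finalized()`.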
2. Sending audio data
The following example sends audio data. The data can be sent in a single call, or in chunks at intervals of at least 100 ms. When sending data in chunks, call the completeAudioSend API at the end of the transmission.
- Kotlin
- To send the data in a single call
context.assets.open("test_audio.pcm").readBytes().also {
    speechToText.sendAudio(it)
}
- To send the data in multiple chunks
val BYTES_IN_100_MS = 3200 // chunk size used in this example
var offset = 0
val testPCM = context.assets.open("call_test_audio.pcm").readBytes()
while (offset < testPCM.size) {
    val end = (offset + BYTES_IN_100_MS).coerceAtMost(testPCM.size)
    val chunk = testPCM.copyOfRange(offset, end)
    offset += BYTES_IN_100_MS
    speechToText.sendAudio(chunk)
    Thread.sleep(100L)
}
- After sending all data, call the completeAudioSend API.
speechToText.completeAudioSend()
- Java
- To send the data in a single call
// open() and readAllBytes() throw IOException; handle or declare it in the enclosing method.
byte[] data = context.getAssets().open("test_audio.pcm").readAllBytes();
speechToText.sendAudio(data);
- To send the data in multiple chunks
final int BYTES_IN_100_MS = 3200; // chunk size used in this example
int offset = 0;
byte[] testPCM = context.getAssets().open("call_test_audio.pcm").readAllBytes();
while (offset < testPCM.length) {
    int end = Math.min(offset + BYTES_IN_100_MS, testPCM.length);
    byte[] chunk = Arrays.copyOfRange(testPCM, offset, end); // java.util.Arrays
    offset += BYTES_IN_100_MS;
    speechToText.sendAudio(chunk);
    Thread.sleep(100L); // throws InterruptedException
}
- After sending all data, call the completeAudioSend API.
speechToText.completeAudioSend();
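The chunking loop above can be factored into a helper that is testable without the SDK. The sketch below also shows where a value like 3200 could come from: under the assumption of 16 kHz, 16-bit, mono PCM (an assumption made here, since this section does not state the audio format), 100 ms of audio is 3200 bytes. `PcmChunker` and both of its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for splitting raw PCM into fixed-size chunks; not an SDK class. */
final class PcmChunker {
    private PcmChunker() {}

    /** Bytes of PCM audio that cover intervalMs at the given format. */
    static int bytesPerInterval(int sampleRateHz, int bytesPerSample, int channels, int intervalMs) {
        return sampleRateHz * bytesPerSample * channels * intervalMs / 1000;
    }

    /** Splits pcm into chunks of chunkSize bytes; the last chunk may be shorter. */
    static List<byte[]> split(byte[] pcm, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < pcm.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, pcm.length);
            chunks.add(Arrays.copyOfRange(pcm, offset, end));
        }
        return chunks;
    }
}
```

With the assumed format, `PcmChunker.bytesPerInterval(16000, 2, 1, 100)` evaluates to 3200, and splitting a 7000-byte buffer with that chunk size produces chunks of 3200, 3200, and 600 bytes. Each chunk would then be passed to sendAudio with a 100 ms pause between calls, followed by completeAudioSend.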