Alexa Client Implementation on Loomo
The content of this page is provided by Daniel Schneegast. All rights reserved.
Project Documentation
In general, the Alexa integration is basically the same as developing an Android app with Alexa functionality on Loomo's Android device (built in at the top of the unit). However, you have to take care of two things: 1. Use the correct API level (22) for building your app. 2. Import the latest Loomo SDK dependencies, which you can find here: Segway Robotics Developer IDE Setup
The general documentation on the Amazon Developer Website is pretty good, but there are no sample code snippets for any voice interaction; you only see the JSON files.
Create a test App with the option to use LWA (Login with Amazon) libraries
A detailed description of this step can be found here: Amazon Developer Install SDK for Android Documentation
Notes:
Step 3: Amazon's documentation about the key extraction process for your local keys is not completely correct. It is easier to use the built-in Gradle signingReport task, which you can find in the Gradle pane of Android Studio under Run Configurations.
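The same signing report can also be produced from the command line; assuming a standard Android Studio project with the Gradle wrapper, run from the project root:

```shell
# Prints the signing certificate fingerprints (MD5, SHA-1, SHA-256)
# for every build variant; copy the debug SHA-1/MD5 into the
# Amazon developer console security profile.
./gradlew signingReport
```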
Step 5: It is not necessary to use an ImageButton with different sizes, colours, and formats. It is easier to use a standard Button or, for recording purposes, an OnTouchListener.
Enhance this app for using AVS (Amazon Voice Services)
Detailed documentation about this topic: Amazon Developer Documentation – Authorize on Product
You have to set up a PRODUCT_ID and a PRODUCT_DSN (Device Serial Number). These two strings can be chosen freely, but you have to create both of them.
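As a sketch of what this can look like (the class name, the constant names, and the example value are placeholders of mine, not anything Amazon prescribes), assuming the DSN is generated once per device:

```java
import java.util.UUID;

// Illustrative only: both strings are free-form; the Product ID just has to
// match what you registered in the Amazon developer console.
public class AvsDeviceConfig {

    // Placeholder value; must equal the Product ID from the developer console.
    public static final String PRODUCT_ID = "loomo_alexa_test";

    // Any unique string works as Device Serial Number; a random UUID is one option.
    public static String newProductDsn() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        System.out.println("PRODUCT_ID  = " + PRODUCT_ID);
        System.out.println("PRODUCT_DSN = " + newProductDsn());
    }
}
```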
More information about AVS can be found here: Amazon Developer – Get started with AVS
Implementation of a “Raw Audio Recorder”
To communicate with Alexa it is necessary to send a raw audio stream to the Alexa backend, so we need a raw audio recorder. The major difference from the integrated Android recording options, which are much easier to handle, is the file compression: we cannot send MP3 to AVS because it is compressed audio. The implementation can be found below:
package tudresden.loomo_tud_v1;

/**
 * Created by danielschneegast on 26.02.18.
 */

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import android.util.Log;

/**
 * Records raw audio using AudioRecord and stores it into a byte array as
 *
 * - signed
 * - 16-bit
 * - little endian
 * - mono
 * - 16kHz (recommended, but a different sample rate can be specified in the constructor)
 *
 * For example, the corresponding arecord settings are
 * arecord --file-type raw --format=S16_LE --channels 1 --rate 16000
 *
 * TODO: maybe use: ByteArrayOutputStream
 *
 * @author Kaarel Kaljurand
 */
public class RawAudioRecorder {

    private static final String LOG_TAG = RawAudioRecorder.class.getName();

    private static final int DEFAULT_AUDIO_SOURCE = MediaRecorder.AudioSource.VOICE_RECOGNITION;
    private static final int DEFAULT_SAMPLE_RATE = 16000;
    private static final int RESOLUTION = AudioFormat.ENCODING_PCM_16BIT;
    private static final short RESOLUTION_IN_BYTES = 2;
    // Number of channels (MONO = 1, STEREO = 2)
    private static final short CHANNELS = 1;

    public enum State {
        // recorder is ready, but not yet recording
        READY,
        // recorder recording
        RECORDING,
        // error occurred, reconstruction needed
        ERROR,
        // recorder stopped
        STOPPED
    }

    private AudioRecord mRecorder = null;
    private double mAvgEnergy = 0;
    private final int mSampleRate;
    private final int mOneSec;

    // Recorder state
    private State mState;

    // Buffer size
    private int mBufferSize;

    // Number of frames written to byte array on each output
    private int mFramePeriod;

    // The complete space into which the recording is written.
    // Its maximum length is about:
    // 2 (bytes) * 1 (channels) * 30 (max rec time in seconds) * 44100 (times per second) = 2 646 000 bytes
    // but typically is:
    // 2 (bytes) * 1 (channels) * 20 (max rec time in seconds) * 16000 (times per second) = 640 000 bytes
    private final byte[] mRecording;

    // TODO: use: mRecording.length instead
    private int mRecordedLength = 0;

    // The number of bytes the client has already consumed
    private int mConsumedLength = 0;

    // Buffer for output
    private byte[] mBuffer;

    /**
     * Instantiates a new recorder and sets the state to INITIALIZING.
     * In case of errors, no exception is thrown, but the state is set to ERROR.
     *
     * Android docs say: 44100Hz is currently the only rate that is guaranteed to work on all devices,
     * but other rates such as 22050, 16000, and 11025 may work on some devices.
     *
     * @param audioSource Identifier of the audio source (e.g. microphone)
     * @param sampleRate Sample rate (e.g. 16000)
     */
    public RawAudioRecorder(int audioSource, int sampleRate) {
        mSampleRate = sampleRate;
        // E.g. 1 second of 16kHz 16-bit mono audio takes 32000 bytes.
        mOneSec = RESOLUTION_IN_BYTES * CHANNELS * mSampleRate;
        // TODO: replace 35 with the max length of the recording (as specified in the settings)
        mRecording = new byte[mOneSec * 35];
        try {
            setBufferSizeAndFramePeriod();
            mRecorder = new AudioRecord(audioSource, mSampleRate, AudioFormat.CHANNEL_IN_MONO, RESOLUTION, mBufferSize);
            if (getAudioRecordState() != AudioRecord.STATE_INITIALIZED) {
                throw new Exception("AudioRecord initialization failed");
            }
            mBuffer = new byte[mFramePeriod * RESOLUTION_IN_BYTES * CHANNELS];
            setState(State.READY);
        } catch (Exception e) {
            handleError();
            if (e.getMessage() == null) {
                Log.e(LOG_TAG, "Unknown error occurred while initializing recording");
            } else {
                Log.e(LOG_TAG, e.getMessage());
            }
        }
    }

    public RawAudioRecorder(int sampleRate) {
        this(DEFAULT_AUDIO_SOURCE, sampleRate);
    }

    public RawAudioRecorder() {
        this(DEFAULT_AUDIO_SOURCE, DEFAULT_SAMPLE_RATE);
    }

    private int read(AudioRecord recorder) {
        // public int read (byte[] audioData, int offsetInBytes, int sizeInBytes)
        int numberOfBytes = recorder.read(mBuffer, 0, mBuffer.length); // Fill buffer
        // Some error checking
        if (numberOfBytes == AudioRecord.ERROR_INVALID_OPERATION) {
            Log.e(LOG_TAG, "The AudioRecord object was not properly initialized");
            return -1;
        } else if (numberOfBytes == AudioRecord.ERROR_BAD_VALUE) {
            Log.e(LOG_TAG, "The parameters do not resolve to valid data and indexes.");
            return -2;
        } else if (numberOfBytes > mBuffer.length) {
            Log.e(LOG_TAG, "Read more bytes than is buffer length:" + numberOfBytes + ": " + mBuffer.length);
            return -3;
        } else if (numberOfBytes == 0) {
            Log.e(LOG_TAG, "Read zero bytes");
            return -4;
        }
        // Everything seems to be OK, adding the buffer to the recording.
        add(mBuffer);
        return 0;
    }

    private void setBufferSizeAndFramePeriod() {
        int minBufferSizeInBytes = AudioRecord.getMinBufferSize(mSampleRate, AudioFormat.CHANNEL_IN_MONO, RESOLUTION);
        if (minBufferSizeInBytes == AudioRecord.ERROR_BAD_VALUE) {
            throw new IllegalArgumentException("AudioRecord.getMinBufferSize: parameters not supported by hardware");
        } else if (minBufferSizeInBytes == AudioRecord.ERROR) {
            Log.e(LOG_TAG, "AudioRecord.getMinBufferSize: unable to query hardware for output properties");
            // Fallback: roughly 120 ms of audio. Note: the pasted version wrote
            // mSampleRate * (120 / 1000) * ..., which is always 0 because of
            // integer division; reordered here to avoid that.
            minBufferSizeInBytes = mSampleRate * 120 / 1000 * RESOLUTION_IN_BYTES * CHANNELS;
        }
        mBufferSize = 2 * minBufferSizeInBytes;
        mFramePeriod = mBufferSize / (2 * RESOLUTION_IN_BYTES * CHANNELS);
        Log.i(LOG_TAG, "AudioRecord buffer size: " + mBufferSize + ", min size = " + minBufferSizeInBytes);
    }

    /**
     * @return recorder state
     */
    public State getState() {
        return mState;
    }

    private void setState(State state) {
        mState = state;
    }

    /**
     * @return bytes that have been recorded since the beginning
     */
    public byte[] getCompleteRecording() {
        return getCurrentRecording(0);
    }

    /**
     * @return bytes that have been recorded since the beginning, with wav-header
     */
    public byte[] getCompleteRecordingAsWav() {
        return getRecordingAsWav(getCompleteRecording(), mSampleRate);
    }

    public static byte[] getRecordingAsWav(byte[] pcm, int sampleRate) {
        int headerLen = 44;
        int byteRate = sampleRate * RESOLUTION_IN_BYTES; // mSampleRate*(16/8)*1 ???
        int totalAudioLen = pcm.length;
        int totalDataLen = totalAudioLen + headerLen;
        byte[] header = new byte[headerLen];
        header[0] = 'R'; header[1] = 'I'; header[2] = 'F'; header[3] = 'F'; // RIFF/WAVE header
        header[4] = (byte) (totalDataLen & 0xff);
        header[5] = (byte) ((totalDataLen >> 8) & 0xff);
        header[6] = (byte) ((totalDataLen >> 16) & 0xff);
        header[7] = (byte) ((totalDataLen >> 24) & 0xff);
        header[8] = 'W'; header[9] = 'A'; header[10] = 'V'; header[11] = 'E';
        header[12] = 'f'; header[13] = 'm'; header[14] = 't'; header[15] = ' '; // 'fmt ' chunk
        header[16] = 16; header[17] = 0; header[18] = 0; header[19] = 0; // 4 bytes: size of 'fmt ' chunk
        header[20] = 1; header[21] = 0; // format = 1 (PCM)
        header[22] = (byte) CHANNELS; header[23] = 0;
        header[24] = (byte) (sampleRate & 0xff);
        header[25] = (byte) ((sampleRate >> 8) & 0xff);
        header[26] = (byte) ((sampleRate >> 16) & 0xff);
        header[27] = (byte) ((sampleRate >> 24) & 0xff);
        header[28] = (byte) (byteRate & 0xff);
        header[29] = (byte) ((byteRate >> 8) & 0xff);
        header[30] = (byte) ((byteRate >> 16) & 0xff);
        header[31] = (byte) ((byteRate >> 24) & 0xff);
        // Block align = channels * bytes per sample. The pasted version used
        // (2 * 16 / 8) = 4, which is wrong for mono 16-bit audio.
        header[32] = (byte) (CHANNELS * RESOLUTION_IN_BYTES);
        header[33] = 0;
        header[34] = (byte) (8 * RESOLUTION_IN_BYTES); // bits per sample
        header[35] = 0;
        header[36] = 'd'; header[37] = 'a'; header[38] = 't'; header[39] = 'a';
        header[40] = (byte) (totalAudioLen & 0xff);
        header[41] = (byte) ((totalAudioLen >> 8) & 0xff);
        header[42] = (byte) ((totalAudioLen >> 16) & 0xff);
        header[43] = (byte) ((totalAudioLen >> 24) & 0xff);
        byte[] wav = new byte[header.length + pcm.length];
        System.arraycopy(header, 0, wav, 0, header.length);
        System.arraycopy(pcm, 0, wav, header.length, pcm.length);
        return wav;
    }

    /**
     * @return bytes that have been recorded since this method was last called
     */
    public synchronized byte[] consumeRecording() {
        byte[] bytes = getCurrentRecording(mConsumedLength);
        Log.i(LOG_TAG, "Copied from: " + mConsumedLength + ": " + bytes.length + " bytes");
        mConsumedLength = mRecordedLength;
        return bytes;
    }

    /**
     * Returns the recorded bytes since the last call, and resets the recording.
     * @return bytes that have been recorded since this method was last called
     */
    public synchronized byte[] consumeRecordingAndTruncate() {
        byte[] bytes = getCurrentRecording(mConsumedLength);
        Log.i(LOG_TAG, "Copied from position: " + mConsumedLength + ": " + bytes.length + " bytes");
        mRecordedLength = 0;
        mConsumedLength = mRecordedLength;
        return bytes;
    }

    private byte[] getCurrentRecording(int startPos) {
        int len = getLength() - startPos;
        byte[] bytes = new byte[len];
        System.arraycopy(mRecording, startPos, bytes, 0, len);
        return bytes;
    }

    public int getLength() {
        return mRecordedLength;
    }

    /**
     * @return true iff a speech-ending pause has occurred at the end of the recorded data
     */
    public boolean isPausing() {
        double pauseScore = getPauseScore();
        Log.i(LOG_TAG, "Pause score: " + pauseScore);
        return pauseScore > 7;
    }

    /**
     * @return volume indicator that shows the average volume of the last read buffer
     */
    public float getRmsdb() {
        long sumOfSquares = getRms(mRecordedLength, mBuffer.length);
        double rootMeanSquare = Math.sqrt(sumOfSquares / (mBuffer.length / 2));
        if (rootMeanSquare > 1) {
            // TODO: why 10?
            return (float) (10 * Math.log10(rootMeanSquare));
        }
        return 0;
    }

    /**
     * In order to calculate if the user has stopped speaking we take the
     * data from the last second of the recording, map it to a number
     * and compare this number to the numbers obtained previously. We
     * return a confidence score (0-INF) of a longer pause having occurred in the
     * speech input.
     *
     * TODO: base the implementation on some well-known technique.
     *
     * @return positive value which the caller can use to determine if there is a pause
     */
    private double getPauseScore() {
        long t2 = getRms(mRecordedLength, mOneSec);
        if (t2 == 0) {
            return 0;
        }
        double t = mAvgEnergy / t2;
        mAvgEnergy = (2 * mAvgEnergy + t2) / 3;
        return t;
    }

    /**
     * Sum of squares of the samples in the last `span` bytes before `end`.
     * Note: this helper was missing from the pasted version (it is referenced
     * by getRmsdb() and getPauseScore()); restored here from Kaarel Kaljurand's
     * original RawAudioRecorder so the class compiles.
     */
    private long getRms(int end, int span) {
        int begin = end - span;
        if (begin < 0) {
            begin = 0;
        }
        // Make sure begin is even so we start on a sample boundary.
        if (0 != (begin % 2)) {
            begin++;
        }
        long sum = 0;
        for (int i = begin; i + 1 < end; i += 2) {
            short curSample = getShort(mRecording[i], mRecording[i + 1]);
            sum += curSample * curSample;
        }
        return sum;
    }

    /**
     * Stops the recording (if needed) and releases the resources.
     * The object can no longer be used and the reference should be
     * set to null after a call to release().
     */
    public synchronized void release() {
        if (mRecorder != null) {
            if (mRecorder.getRecordingState() == AudioRecord.RECORDSTATE_RECORDING) {
                stop();
            }
            mRecorder.release();
            mRecorder = null;
        }
    }

    /**
     * Starts the recording, and sets the state to RECORDING.
     */
    public void start() {
        if (getAudioRecordState() == AudioRecord.STATE_INITIALIZED) {
            mRecorder.startRecording();
            if (mRecorder.getRecordingState() == AudioRecord.RECORDSTATE_RECORDING) {
                setState(State.RECORDING);
                new Thread() {
                    public void run() {
                        while (mRecorder != null && mRecorder.getRecordingState() == AudioRecord.RECORDSTATE_RECORDING) {
                            int status = read(mRecorder);
                            if (status < 0) {
                                handleError();
                                break;
                            }
                        }
                    }
                }.start();
            } else {
                Log.e(LOG_TAG, "startRecording() failed");
                handleError();
            }
        } else {
            Log.e(LOG_TAG, "start() called on illegal state");
            handleError();
        }
    }

    /**
     * Stops the recording, and sets the state to STOPPED.
     * If stopping fails then sets the state to ERROR.
     */
    public void stop() {
        // We check the underlying AudioRecord state trying to avoid IllegalStateException.
        // If it still occurs then we catch it.
        if (getAudioRecordState() == AudioRecord.STATE_INITIALIZED
                && mRecorder.getRecordingState() == AudioRecord.RECORDSTATE_RECORDING) {
            try {
                mRecorder.stop();
                setState(State.STOPPED);
            } catch (IllegalStateException e) {
                Log.e(LOG_TAG, "native stop() called in illegal state: " + e.getMessage());
                handleError();
            }
        } else {
            Log.e(LOG_TAG, "stop() called in illegal state");
            handleError();
        }
    }

    /**
     * Copy the given byte array into the total recording array.
     *
     * The total recording array has been pre-allocated (e.g. for 35 seconds of audio).
     * If it gets full then the recording is stopped.
     *
     * @param buffer audio buffer
     */
    private void add(byte[] buffer) {
        if (mRecording.length >= mRecordedLength + buffer.length) {
            // arraycopy(Object src, int srcPos, Object dest, int destPos, int length)
            System.arraycopy(buffer, 0, mRecording, mRecordedLength, buffer.length);
            mRecordedLength += buffer.length;
        } else {
            // This also happens on the emulator for some reason
            Log.e(LOG_TAG, "Recorder buffer overflow: " + mRecordedLength);
            release();
        }
    }

    /*
     * Converts two bytes to a short, assuming that the 2nd byte is
     * more significant (LITTLE_ENDIAN format).
     *
     * Note: the low byte is masked with 0xff; without the mask its sign bit
     * would be extended and corrupt the result.
     */
    private static short getShort(byte argB1, byte argB2) {
        return (short) ((argB1 & 0xff) | (argB2 << 8));
    }

    private void handleError() {
        setState(State.ERROR);
        release();
    }

    private int getAudioRecordState() {
        if (mRecorder == null) {
            return AudioRecord.STATE_UNINITIALIZED;
        }
        return mRecorder.getState();
    }
}
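As a quick, Android-free sanity check of the size arithmetic used above (one second of 16 kHz 16-bit mono audio is 32 000 bytes, and the WAV file adds a 44-byte header), here is a small standalone sketch; the class name `WavMath` is mine:

```java
// Standalone check of the PCM/WAV size arithmetic used by RawAudioRecorder;
// no Android classes involved.
public class WavMath {
    static final int SAMPLE_RATE = 16000;
    static final int BYTES_PER_SAMPLE = 2; // 16-bit resolution
    static final int CHANNELS = 1;         // mono

    // Bytes needed for one second of audio: 2 * 1 * 16000 = 32000.
    static int bytesPerSecond() {
        return BYTES_PER_SAMPLE * CHANNELS * SAMPLE_RATE;
    }

    // Byte rate field of the 'fmt ' chunk (same value for this format).
    static int byteRate() {
        return SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS;
    }

    // Total WAV file size: PCM payload plus the 44-byte canonical header.
    static int wavFileSize(int pcmLength) {
        return pcmLength + 44;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerSecond());                    // 32000
        System.out.println(wavFileSize(20 * bytesPerSecond()));  // 640044
    }
}
```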
Integrate a Library for stable communication
Using this library it is possible to communicate with Alexa if you follow the README instructions: GitHub Repo
API Documentation for this repository: Github.io Willblaschko API Documentation
After testing the LWA and AVS projects described above, please delete the LWA jar file and any related entries from your application-level build.gradle file! If you forget this, you will get a multidex error from the Java build.
Furthermore, you need to write your own getRequestCallback() function and add a few lines to the checkQueue() function for correct functionality.
Source code of getRequestCallback():
public AsyncCallback<AvsResponse, Exception> getRequestCallback() {
    AsyncCallback<AvsResponse, Exception> requestCallback = new AsyncCallback<AvsResponse, Exception>() {
        @Override
        public void start() {
        }

        @Override
        public void success(AvsResponse result) {
            Log.i(TAG, "Voice Success");
            handleResponse(result);
        }

        @Override
        public void failure(Exception error) {
        }

        @Override
        public void complete() {
        }
    };
    return requestCallback;
}
Change Alexa’s language
Changing Alexa’s language is not supported by the GitHub repo mentioned above, but it can be implemented manually. Supported languages are English (US, IN, AUS, GB), German, and Japanese. Note: these are not the only languages Alexa supports in general, but they are the only ones you can choose via your AVS JSON requests.
The complete documentation for the “SettingsUpdated” method can be found here: Amazon AVS SettingsInterface
package tudresden.loomo_tud_v1;

import java.util.UUID;

/**
 * Created by danielschneegast on 19.03.18.
 */
public class JSONinputDE {

    private String jSONString = "{\n" +
            "  \"event\": {\n" +
            "    \"header\": {\n" +
            "      \"namespace\": \"Settings\",\n" +
            "      \"name\": \"SettingsUpdated\",\n" +
            "      \"messageId\": \"18f2ff20-824a-4823-bbf9-5e7d4975fdf4\"\n" +
            "    },\n" +
            "    \"payload\": {\n" +
            "      \"settings\": [\n" +
            "        {\n" +
            "          \"value\": \"de-DE\",\n" +
            "          \"key\": \"locale\"\n" +
            "        }\n" +
            "      ]\n" +
            "    }\n" +
            "  },\n" +
            "  \"context\": []\n" +
            "}";

    // For testing purposes.
    // Only works once, the messageId has to be unique.
    public String getjSONString() {
        return stringFirst + messageID + lastPart;
        // old
        // return jSONString;
    }

    private String stringFirst = "{\n" +
            "  \"event\": {\n" +
            "    \"header\": {\n" +
            "      \"namespace\": \"Settings\",\n" +
            "      \"name\": \"SettingsUpdated\",\n" +
            "      \"messageId\": \"";

    private String messageID = UUID.randomUUID().toString();

    private String lastPart = "\"\n" +
            "    },\n" +
            "    \"payload\": {\n" +
            "      \"settings\": [\n" +
            "        {\n" +
            "          \"value\": \"de-DE\",\n" +
            "          \"key\": \"locale\"\n" +
            "        }\n" +
            "      ]\n" +
            "    }\n" +
            "  },\n" +
            "  \"context\": []\n" +
            "}";
}
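The same message can also be built a bit more compactly. This sketch (pure Java; the class name LocaleMessage is mine) generates a fresh messageId on every call, which avoids the only-works-once problem noted above, and takes the locale as a parameter so the other supported locales can be requested too:

```java
import java.util.UUID;

// Sketch: build the AVS SettingsUpdated event with a unique messageId each
// time. The JSON layout mirrors the hand-written string in JSONinputDE.
public class LocaleMessage {

    public static String settingsUpdated(String locale) {
        return "{"
                + "\"event\":{"
                + "\"header\":{"
                + "\"namespace\":\"Settings\","
                + "\"name\":\"SettingsUpdated\","
                + "\"messageId\":\"" + UUID.randomUUID() + "\""
                + "},"
                + "\"payload\":{\"settings\":[{\"key\":\"locale\",\"value\":\"" + locale + "\"}]}"
                + "},"
                + "\"context\":[]"
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(settingsUpdated("de-DE"));
    }
}
```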
Received Payload from AVS
If you receive an AVSItem (JSON response) from the AVS backend, it contains a header and a payload. The header itself contains, for instance, the following information:
header = {Directive$Header@5943}: namespace, name, messageId, dialogRequestId
The structure of the payload looks like this:
payload = {Directive$Payload@5944} (its debugger object id is one higher than the header's): audioItem, code, description, format, mute, playBehavior, scheduledTime, timeoutInMilliseconds, token, type, url, volume (plus the internal ART fields shadow$_klass_ and shadow$_monitor_, which can be ignored)
For more information about the structure click here: structure http2 request
Alexa Skills
General usage
For further capabilities of the Amazon Voice Service you can use Alexa Skills. To enable these Skills, go to the developer console: Developer Amazon, click on “Skills”, and create a new skill. The video tutorial gives a brief introduction to creating your first skill. For more information please visit this page: Alexa Skills Kit
Further links: here you can find a very good walkthrough of Alexa Skill creation with AWS: YoutubeLink
Usage of Skills with bespoken.io instead of AWS
Normally, Amazon's intention is that you use AWS for any skill interaction (you need to enter a link to a web service where the Lambda function is stored). As an endpoint you can use the recommended AWS Lambda ARN or enter your own HTTPS link. (It is not necessary to install an SSL certificate on your local instance if you only use your skill for testing purposes. The certificate is only needed if you want to publish your skill; in that case you have to take care of several technical issues, for instance the rejection of unsecured HTTP requests.)
To avoid AWS, you can deploy your own Node.js server locally using bespoken.io:
1. Install a Node package manager on your system if you don’t already have one.
2. Start the bespoken installation following this guideline: bespoken.io installation
3. Now install Amazon’s sample skill into the same folder where you installed the bespoken.io project, using the following CLI command: $ git clone https://github.com/alexa/skill-sample-nodejs-hello-world (on macOS the bespoken.io project is stored here: /Users//.bst/)
4. Now copy the skill you created online (developer console, “Interaction Model” – “JSON Editor”) to your local bespoken.io server. The online skill is a JSON file, but you need a JavaScript file; to convert it easily, it is common to use https://skillinator.io/
5. Insert this code into the following file: /Users//.bst/skill-sample-nodejs-hello-world/src/index.js (Be careful: in the …./.bst/ directory you can find more index.js files; only use the top-level one.)
6. Now you can start your local bespoken.io server with the following command: $ bst proxy lambda index.js
For more information, please have a look at the documentation: bespoken.io documentation
7. When you execute the command from step 6, the CLI directly shows the link you need to copy into the Alexa Skills Kit console as the “HTTPS endpoint”.
8. That’s all. Now you can test your skill.
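Collected in one place, the CLI side of the steps above looks roughly like this (the ~/.bst path assumes the macOS default location mentioned above):

```shell
# Fetch Amazon's hello-world sample skill next to the bespoken project
cd ~/.bst
git clone https://github.com/alexa/skill-sample-nodejs-hello-world

# After pasting the skillinator.io output into src/index.js,
# start the local proxy; it prints the HTTPS endpoint URL to copy
# into the Alexa Skills Kit console.
cd skill-sample-nodejs-hello-world/src
bst proxy lambda index.js
```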
Project Skill for Loomo
For the university project, I’ve added a skill with the following communication structure:
User: “Drive Skill”
Answer: “Welcome to Daniels Drive Skill. You can say drive forward to move Loomo.”
User: “drive forward”
Answer: “Loomo drives now. Skill ends automatically.”
The last answer includes a card response. For testing purposes I tried to read this card from the response. Unfortunately, this information is not stored in the payload of the AVS response but is sent separately.