This post explains how to build an Alexa skill for the new Echo Show device using its new video capabilities. When you say `Alexa, ask dropbox player to play ABC`, the skill searches for ABC in the linked “Alexa” folder and plays the remote Dropbox video on the Echo device. You can use this logic with any storage service, for example S3, but we decided to use Dropbox because it is easier for non-developers to use and doesn’t have any data transfer costs.
When I started this blog post, I was planning to stream video directly from YouTube to the Alexa device; however, it seems that a skill that does this would go against YouTube’s terms of service. Note: we do not suggest going against YouTube’s terms; however, if you are interested in learning more about such a skill, you can check it out here.
Once I discarded YouTube streaming, I decided to play media files from Dropbox.
For testing, if you don’t have an Echo, the Amazon Developer portal lets you play with and test skills. However, if you are playing audio or video, you will see that the portal is not quite there yet.
I also tried using Alexa Pi but, while I was able to play with several skills and stream audio easily, streaming video needed a real Echo Show device capable of working with all the new graphical features: lists, images, audio/video players and touch selection events.
Let’s start with the high-level flow diagram of the skill to be developed:
When a user speaks to Alexa, the Alexa service sends a POST request to the skill’s endpoint, which is a web app or Lambda function. The request body contains the parameters the endpoint needs to locate the video file and build a JSON response for the Alexa device. The endpoint will send something like the following back to Alexa:
"title": "Title for Sample Video",
"subtitle": "Secondary Title for Sample Video"
Response example using the video directive
The Alexa service interprets this message, sent from the web service or Lambda function, and in this case the Echo Show plays sample-video-1.mp4.
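To make the shape of that message more concrete, here is a minimal Python sketch of a response body carrying a single VideoApp.Launch directive. The function name build_video_response and the example URL are my own placeholders, not code from the skill’s repository, and the exact envelope should be checked against the Alexa documentation.

```python
def build_video_response(video_url, title, subtitle):
    """Return an Alexa response body that launches one video.

    Hypothetical helper for illustration; not taken from the skill's repo.
    """
    return {
        "version": "1.0",
        "response": {
            # shouldEndSession is deliberately left out here; to my knowledge
            # it should not be set together with VideoApp.Launch.
            "directives": [
                {
                    "type": "VideoApp.Launch",
                    "videoItem": {
                        "source": video_url,
                        "metadata": {
                            "title": title,
                            "subtitle": subtitle,
                        },
                    },
                }
            ],
        },
    }
```

Calling build_video_response("https://www.example.com/video/sample-video-1.mp4", "Title for Sample Video", "Secondary Title for Sample Video") would produce a body whose metadata matches the two fields shown above.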
Alexa Skill Configuration
For this skill, you need an Amazon device registered with your Amazon developer account. Once that is done, create a custom skill on the Amazon developer site.
First, go to the Alexa tab, click Get Started under the Alexa Skills Kit menu, and then click Add a New Skill.
After that, you can create the skill with any name and “dropbox player” as the invocation name. You will also enable the Audio, Video and Template global fields in our Alexa custom skill.
Once saved, press Next and set up the interaction model and the configuration of the skill. In the interaction model you will use the built-in intents plus NumberIntent (searches for a specific track number in the folder), SearchIntent (searches the folder for a filename containing specific text) and DemoIntent (plays a demo video). For detailed information about the model used and the steps, go to the repository here.
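As a rough illustration of how those three intents could be dispatched, here is a small Python sketch. It is not the repository’s code: the slot names "number" and "query", the action labels, and the route_intent function itself are assumptions I made up for the example.

```python
def route_intent(request):
    """Map the "request" part of an Alexa POST body to an (action, argument) pair.

    Slot names and action labels are hypothetical; use the ones defined in
    your own interaction model.
    """
    if request.get("type") == "LaunchRequest":
        return ("launch", None)
    if request.get("type") != "IntentRequest":
        return ("ignore", None)

    intent = request.get("intent", {})
    name = intent.get("name")
    slots = intent.get("slots", {})

    if name == "NumberIntent":
        # Slot values arrive as strings, e.g. "2"
        value = slots.get("number", {}).get("value")
        if value is not None and value.isdigit():
            return ("play_track", int(value))
        return ("help", None)
    if name == "SearchIntent":
        value = slots.get("query", {}).get("value")
        return ("search", value) if value else ("help", None)
    if name == "DemoIntent":
        return ("demo", None)
    if name in ("AMAZON.StopIntent", "AMAZON.CancelIntent"):
        return ("stop", None)
    return ("help", None)
```

Returning a plain tuple keeps the routing separate from the code that talks to Dropbox, which makes it easy to test without a device.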
In the configuration, you tell the Alexa skill where the Lambda function is by giving it the Lambda ARN. For that, you first have to create a Lambda function.
Lambda Function Configuration
To create the Lambda function, log in to the AWS portal, click Services in the left corner and go to Lambda under Compute. Then click Create a function and fill in the function name and role. If you don’t have an existing role, you can create a default one and click Create function. After that, fill in the environment variables with your Dropbox token (which you can obtain by creating a Dropbox developer app, more info here) and media bucket (the name of a publicly available bucket), and then configure the Basic settings with 320 MB of memory and a 30 s timeout.
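If you write your own function, it helps to read those two environment variables in one place and fail early when one is missing. The sketch below assumes the variables are called DROPBOX_TOKEN and MEDIA_BUCKET; those names are my guess for illustration, so use whatever names the function code you deploy actually expects.

```python
import os


def load_config(env=None):
    """Read the skill settings from environment variables.

    DROPBOX_TOKEN and MEDIA_BUCKET are hypothetical variable names.
    Pass a dict as env to test this without touching os.environ.
    """
    env = os.environ if env is None else env
    required = ("DROPBOX_TOKEN", "MEDIA_BUCKET")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "dropbox_token": env["DROPBOX_TOKEN"],
        "media_bucket": env["MEDIA_BUCKET"],
    }
```

Raising at startup gives a clear error in the Lambda logs instead of a failed Dropbox call later on.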
Now you only need the function code, which contains all the logic for performing the API requests and building the responses to the Alexa service. The code can be downloaded from the following GitHub repository. If you don’t want to change anything there, you can upload the .ZIP file directly to the Lambda function.
Important Notes about the lambda function code and the Alexa skill:
- It requires an Alexa/ folder in your Dropbox account with at least one media file.
- I discovered that you can’t include more than one directive in the same response when you are using an AudioPlayer.Play directive with long-form audio or a VideoApp.Launch directive. It won’t work.
- It is important to note the supportsDisplay function. It is needed to detect whether an Alexa device has a screen or not.
- For very detailed information on how to create custom skills, you can go here.
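To illustrate the two middle notes, here is a Python sketch of a screen check in the spirit of supportsDisplay, plus a helper that always returns exactly one media directive. It is not the repository’s implementation. I am assuming the request lists the device’s capabilities under context.System.device.supportedInterfaces, and the fallback to audio on devices without a screen is my own choice for the example.

```python
def supports_display(event):
    """Return True if the requesting device reports a Display interface."""
    interfaces = (
        event.get("context", {})
        .get("System", {})
        .get("device", {})
        .get("supportedInterfaces", {})
    )
    return "Display" in interfaces


def media_directives(event, media_url, token):
    """Return a list holding exactly one media directive.

    Hypothetical helper: video on screen devices, audio otherwise.
    """
    if supports_display(event):
        directive = {
            "type": "VideoApp.Launch",
            "videoItem": {"source": media_url},
        }
    else:
        directive = {
            "type": "AudioPlayer.Play",
            "playBehavior": "REPLACE_ALL",
            "audioItem": {
                "stream": {
                    "token": token,
                    "url": media_url,
                    "offsetInMilliseconds": 0,
                }
            },
        }
    # Never combine these with other directives in the same response.
    return [directive]
```

Keeping the choice in one function makes it harder to accidentally attach a second directive, such as a display template, to a media response.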
For testing, go back to the Amazon developer site (where we started creating the skill) and paste the ARN of the Lambda function in the endpoint field, under the Configuration tab. Click Save and you will be able to test. Some functionality, like video or long audio, does not work in the Service Simulator, so the best way to test is with an Echo Show, saying: Alexa, launch dropbox player. But if you don’t have an Echo Show handy, you can simulate some visual interactions or single requests.
From here the user will be able to say: track 2 or play atletico, and the corresponding video will launch.
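The lookup behind “track 2” and “play atletico” can be sketched in a few lines of Python. This is only an illustration of the idea, working on a plain list of filenames rather than the Dropbox API; the function names, the sort order and the list of accepted extensions are all assumptions of mine.

```python
MEDIA_EXTENSIONS = (".mp4", ".mp3")  # assumed set of playable file types


def list_media(filenames):
    """Keep only media files, sorted by name so track numbers are stable."""
    media = [f for f in filenames if f.lower().endswith(MEDIA_EXTENSIONS)]
    return sorted(media, key=str.lower)


def find_by_number(filenames, number):
    """Return the file at 1-based position number, or None if out of range."""
    media = list_media(filenames)
    if 1 <= number <= len(media):
        return media[number - 1]
    return None


def find_by_text(filenames, query):
    """Return the first media file whose name contains query, ignoring case."""
    needle = query.strip().lower()
    for name in list_media(filenames):
        if needle in name.lower():
            return name
    return None
```

Both lookups return None when nothing matches, so the caller can answer with a spoken “not found” message instead of a media directive.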
To stop or exit the app, just say Alexa stop or Alexa exit.
In conclusion, although it is great to be able to build video skills, Amazon is making its point clear. For them, Alexa is voice first and the visual display is limited. Don’t expect to customize the Alexa display as we do in web development. But if you don’t need a high degree of visual customization, it is easy to develop basic skills, there is good documentation and, overall, Alexa skills can be great for extending current web apps and reaching people who use Amazon devices.