I was working from the WebRTC.ventures Testing & QA headquarters in Panama City, Panama during the Super Bowl this year. I went to a sports bar filled with Panamanians watching the Chiefs win a close game over the 49ers.
In the US, the Super Bowl has a media dynamic different from other sporting events because the commercials (and the halftime show!) get as much or even more hype than the game itself. Watching from Panama, I did not see the same commercials that US viewers saw. So I went online to watch them the next day.
Google’s “Javier in Frame” Commercial
Google’s “Javier in Frame” commercial was particularly moving and a fantastic example of the power that AI can bring to video to improve accessibility for blind and low-vision individuals.
The commercial follows Javier, a low-vision individual. Like so many people, Javier wants to document his day-to-day life, but it’s hard for him to determine what is “in frame” when he takes a selfie. The ad highlights the Guided Frame feature on Google Pixel phones, which “uses a combination of audio cues, high-contrast animations and haptic (tactile) feedback to help people who are blind and low-vision take selfies and group photos.”
We watch Javier take selfies while the phone gives him an audio cue, “one face in frame,” to confirm that his face is in the photo. Later, he is joined by his girlfriend for a selfie, and the phone confirms “two faces in frame.” In a heartwarming final scene, it confirms “three faces in frame” as they welcome a new member to their family.
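For those of us who build video applications, the pattern behind this feature is worth noting. Here is a minimal sketch of a Guided Frame-style audio cue in a browser app. The countFaces helper is hypothetical, standing in for whatever face-detection model you have available; the speech output uses the standard Web Speech API.

```typescript
// Minimal sketch of a Guided Frame-style audio cue in a browser app.
// `countFaces` is a hypothetical helper backed by any face-detection model
// (a cloud API or an on-device model); speechSynthesis is the standard
// Web Speech API available in most browsers.
async function announceFacesInFrame(
  frame: ImageBitmap,
  countFaces: (frame: ImageBitmap) => Promise<number>
): Promise<void> {
  const count = await countFaces(frame);
  const phrase =
    count === 0
      ? "no faces in frame"
      : count === 1
        ? "one face in frame"
        : `${count} faces in frame`;
  // Speak the cue aloud so a blind or low-vision user can line up the shot.
  speechSynthesis.speak(new SpeechSynthesisUtterance(phrase));
}
```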
This is a powerful example of how AI can enhance life for people with disabilities. Sighted individuals like myself take it for granted that we can use our phones to take selfies as we enjoy life. The commercial illustrates quite clearly that AI can help people with visual impairments document similar moments in their lives, too.
I also recommend checking out Google’s beautiful behind-the-scenes video about how director Adam Morse made the commercial. Adam describes himself as “a filmmaker who happens to be blind.” The true story behind the camera is just as inspiring as the fictional story in front of it. I loved his quote: “Sight and Vision are two different things … our vision is our imagination.”
Accessibility in Communication Applications
This brought me back to our January 2024 episode of WebRTC Live, when we talked about Accessibility in Communication Applications. Accessibility in video applications goes beyond providing closed captions and sign language interpretation. With the emergence of AI and new tech innovations, we now have a unique chance to further improve accessibility and enrich the user experience for individuals with disabilities.
I had three guests for this panel discussion, each with a unique perspective on accessibility in software. Bryan Bashin is the Vice Chair of Be My Eyes, a mobile app that connects blind and low-vision individuals with volunteers from all around the globe who help them through video calls and AI. Georg Tschare is the CEO of Sign Time GmbH, makers of software that translates text into 3D animated sign language reproduced by digital avatars. Aleks Smechov is the Co-founder and CEO of Wordcab, a conversation intelligence suite that leverages AI to transcribe speech and generate qualitative insights.
Bryan is blind, so his personal experience and perspectives in the context of our conversation were particularly interesting. I also experienced the importance of accessibility firsthand when he and I had a prep call to make sure that his screen reader could navigate our webinar broadcasting tool. Although the tool was not completely accessible, he was still able to join us for a fascinating discussion. (This matched up with Bryan’s comments in the episode about software in general!)
During the episode, we played a clip that Bryan sent us, another great example of how AI can improve the lives of people with disabilities. It shows a blind influencer named Lucy Edwards shopping in a Chinese supermarket with an AI-powered version of the Be My Eyes app, which helps her identify and read labels on a product she is purchasing, and even provides recipe advice on how to use it. Watch the (sponsored) video on TikTok.
I really enjoyed the absolute delight that Lucy shared in this video about an experience that many of us take for granted, similar to the selfies portrayed in the Google Super Bowl ad.
The ‘world of the word’ is now the ‘world of the picture’
Accessibility in web application development is generally not up to par. I know from projects we’ve worked on how important standards like Section 508 are to accessibility. For example, choosing contrasting colors in your website design is very important for users with color blindness, who may not be able to distinguish between certain shades of color.
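As a concrete illustration of that contrast guidance, here is a small sketch of the contrast-ratio check defined by WCAG 2.x. The function names are mine, but the constants come from the specification’s definition of relative luminance, and level AA requires a ratio of at least 4.5:1 for normal-size text.

```typescript
// WCAG 2.x relative luminance of an sRGB color (channels 0-255).
function relativeLuminance([r, g, b]: [number, number, number]): number {
  const [lr, lg, lb] = [r, g, b].map((c) => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : ((s + 0.055) / 1.055) ** 2.4;
  });
  return 0.2126 * lr + 0.7152 * lg + 0.0722 * lb;
}

// Contrast ratio between foreground and background colors.
function contrastRatio(
  fg: [number, number, number],
  bg: [number, number, number]
): number {
  const [brighter, darker] = [relativeLuminance(fg), relativeLuminance(bg)]
    .sort((a, b) => b - a);
  return (brighter + 0.05) / (darker + 0.05);
}

// Example: dark gray (#444444) text on a white background passes AA.
console.log(contrastRatio([68, 68, 68], [255, 255, 255]).toFixed(2)); // ~9.74
```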
The same can be said for accessibility in communications applications. Bryan made an especially thought-provoking comment about the video-based world we are entering. It really drove home why it’s so vital to intentionally include and design appropriate solutions for blind and low-vision people. Bryan explained,
“Think about the new technology where everything is just screens and whiteboards and sharing this and that. It will be a massive shift from the world of the word to the world of the picture. And what that means for blind people, unless we are intentional about it, is ostracism, social distance, being there, but not really being there, not being able to know the cues as we start having a cultural shift that’s dialogue that’s largely visual.”
At a company like WebRTC.ventures that specializes in building live video applications, we and our clients generally take it for granted that our users are fully sighted. But accessibility is important for telehealth visits, EdTech applications, and even entertainment events like those we might build for live streaming and interactive broadcasting solutions.
How can we improve accessibility for the blind community in a visual medium like video?
The answer will vary depending on the use case and the specific implementation. AI and LLMs will undoubtedly be a key theme in the solutions we build, and I find that very exciting. AI is allowing us to move well beyond the “funny hats” era of meeting tools, when nothing particularly useful was accomplished.
Now, using services like Amazon Rekognition from AWS, we can build face detection and object detection into live or recorded video workflows. Amazon Rekognition can be used for tasks that help all of your users, such as content moderation to detect unsafe or inappropriate images during a video call, as well as larger data analysis tasks like labeling videos after recording based on their content. It can also support use cases like banking, where identity verification and “Know Your Customer” checks are required.
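To make that concrete, here is a minimal sketch of frame-level content moderation using the AWS SDK for JavaScript v3. Capturing frames from the live video track and configuring AWS credentials are assumed and omitted here, and the region and confidence threshold are placeholders.

```typescript
import {
  RekognitionClient,
  DetectModerationLabelsCommand,
} from "@aws-sdk/client-rekognition";

const rekognition = new RekognitionClient({ region: "us-east-1" });

// Check a single captured video frame (JPEG/PNG bytes) for unsafe content.
async function moderateFrame(frameBytes: Uint8Array): Promise<string[]> {
  const { ModerationLabels = [] } = await rekognition.send(
    new DetectModerationLabelsCommand({
      Image: { Bytes: frameBytes },
      MinConfidence: 80, // only report labels Rekognition is fairly sure about
    })
  );
  return ModerationLabels.map((label) => label.Name ?? "");
}
```

In a live call, you might run a check like this on a sampled frame every few seconds rather than on every frame, to keep costs and latency reasonable.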
For low-vision individuals, Amazon Rekognition can be used to detect and describe objects in a video, to identify how many people are in the meeting room we just joined, or to help a blind user determine whether they are facing the camera in their meeting tool. We can also use Amazon Rekognition to detect and read text in the video, providing more meaningful context to blind users who are listening to the audio.
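Sketching that accessibility idea with the same SDK, and assuming we already have a captured frame as image bytes, a short scene description could be assembled from the face and text detection results and then read aloud with text-to-speech:

```typescript
import {
  RekognitionClient,
  DetectFacesCommand,
  DetectTextCommand,
} from "@aws-sdk/client-rekognition";

const rekognition = new RekognitionClient({ region: "us-east-1" });

// Describe a meeting-room frame for a blind participant: how many faces
// are visible, plus any legible text (name plates, slides, whiteboards).
async function describeFrame(frameBytes: Uint8Array): Promise<string> {
  const faces = await rekognition.send(
    new DetectFacesCommand({ Image: { Bytes: frameBytes } })
  );
  const text = await rekognition.send(
    new DetectTextCommand({ Image: { Bytes: frameBytes } })
  );

  const faceCount = faces.FaceDetails?.length ?? 0;
  const lines = (text.TextDetections ?? [])
    .filter((d) => d.Type === "LINE")
    .map((d) => d.DetectedText)
    .join("; ");

  return (
    `${faceCount} face(s) in frame.` +
    (lines ? ` Visible text: ${lines}` : "")
  );
}
```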
I have one more video to help inspire you to consider how to improve the experiences of blind or low-vision people in your video application. One of the use cases I particularly like in our work is live streaming applications for things such as live sports. While it’s a bit older and not really a technology video, it’s still worth checking out this video from the English football club Liverpool about the experience of a blind fan on game day.
Mike Kearney is a visually impaired Liverpool fan who attends games in person with his cousin, Stephen. Throughout this video of Mike’s day at a Liverpool match, you see Stephen narrating what he sees for Mike. From game action to meeting star player Mo Salah, Stephen gives Mike a “play-by-play” of their experience so that Mike can still enjoy the game day environment. It’s a touching display that makes even a blue-hearted Chelsea fan like myself want to sing the Liverpool anthem, “You’ll Never Walk Alone.”
Not every blind person is fortunate enough to have a cousin like Stephen who will narrate their daily experiences, or who has the time to sit through work meetings with a low-vision individual and help navigate and narrate what their video conferencing tool does. But this is one area where AI can help and make a big impact, much as the Google Super Bowl ad reminded me of the impact AI can make on selfies. Imagine AI being used to help you navigate and understand the visual live stream of a sporting event, providing the type of assistance that hearing-impaired users might get from an AI service that provides real-time captioning of a live broadcast.
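The captioning half of that idea is already practical today. As a rough sketch, assuming a pipeline that yields raw 16 kHz PCM audio chunks from the broadcast (not shown here), Amazon Transcribe’s streaming API can produce live captions along these lines:

```typescript
import {
  TranscribeStreamingClient,
  StartStreamTranscriptionCommand,
} from "@aws-sdk/client-transcribe-streaming";

const client = new TranscribeStreamingClient({ region: "us-east-1" });

// pcmChunks: an async iterable of raw 16 kHz PCM audio from the broadcast.
async function captionLiveAudio(pcmChunks: AsyncIterable<Uint8Array>) {
  const response = await client.send(
    new StartStreamTranscriptionCommand({
      LanguageCode: "en-US",
      MediaSampleRateHertz: 16000,
      MediaEncoding: "pcm",
      AudioStream: (async function* () {
        for await (const chunk of pcmChunks) {
          yield { AudioEvent: { AudioChunk: chunk } };
        }
      })(),
    })
  );

  // Emit each finalized caption line as Transcribe produces it.
  for await (const event of response.TranscriptResultStream ?? []) {
    for (const result of event.TranscriptEvent?.Transcript?.Results ?? []) {
      if (!result.IsPartial && result.Alternatives?.[0]?.Transcript) {
        console.log(result.Alternatives[0].Transcript);
      }
    }
  }
}
```

A vision model narrating the same stream for blind fans could feed its descriptions through an identical delivery path.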
All of us live in bubbles where it’s easy to forget the experiences of others. Many of us working in the WebRTC space live in a bubble of sighted people, and it’s easy for us to forget about the experience of those without the same advantages. By improving our development practices and by integrating AI into our video applications through tools like Amazon Rekognition, we can provide a better experience for all of our users.
Would you like to add AI into your video application to improve its accessibility or to add additional innovative functionality? Our team of WebRTC experts can do that, in addition to building, testing, deploying and managing your video or audio communication applications. Contact us today to learn more!