GPT-4's new capabilities power a 'virtual volunteer' for the visually impaired
OpenAI has introduced the world to its latest powerful AI model, GPT-4, and, refreshingly, the first partnership built on its new capabilities is aimed at helping people with visual impairments. Be My Eyes, which lets blind and low-vision folks ask sighted people to describe what their phone sees, is getting a "virtual volunteer" that offers AI-powered help at any time.
We've written about Be My Eyes plenty of times since it was started in 2015, and of course the rise of computer vision and other tools has figured prominently in its story of helping the visually impaired more easily navigate everyday life. But the app itself can only do so much, and a core feature was always being able to get a helping hand from a volunteer, who could look through your phone's camera view and give detailed descriptions or instructions.
The new version of the app is the first to integrate GPT-4's multimodal capability, which is to say its ability to not just chat intelligibly, but to inspect and understand images it's given:
Users can send images via the app to an AI-powered Virtual Volunteer, which will answer any question about that image and provide instantaneous visual assistance for a wide variety of tasks.
For example, if a user sends a picture of the inside of their refrigerator, the Virtual Volunteer will not only be able to correctly identify what’s in it, but also extrapolate and analyze what can be prepared with those ingredients. The tool can also then offer a number of recipes for those ingredients and send a step-by-step guide on how to make them.
But the video accompanying the description is more illuminating. In it, Be My Eyes user Lucy shows the app helping her with a bunch of things live. If you're not familiar with the rapid-fire patois of a screen reader you may miss some of the dialogue, but she has it describe the look of a dress, identify a plant, read a map, translate a label, direct her to a certain treadmill at the gym and tell her which buttons to push at a vending machine. (You can watch the video below.)
It's a very concise demonstration of how unfriendly much of our urban and commercial infrastructure is for people with vision issues. And it also shows how useful GPT-4's multimodal chat can be in the right circumstances.
No doubt human volunteers will continue to be instrumental for users of the Be My Eyes app — there's no replacing them, only raising the bar for when they're needed (and indeed they can be summoned immediately if the AI response isn't good enough).
As an example, the AI helpfully suggests at the gym that "the available machines are the ones without people on them." Thanks! As OpenAI co-founder Sam Altman said today, the capabilities are more impressive at first blush than after you've been using the model for a while, but we must also be careful of looking this gift horse in the mouth too closely.
The team at Be My Eyes is working closely with OpenAI and with its own community to define and guide the Virtual Volunteer's capabilities as development continues.
Right now the feature is in closed beta among a "small subset" of Be My Eyes users, which will be expanded over the coming weeks. "We hope to make the Virtual Volunteer broadly available in the coming months," the team writes. "Just like our existing volunteer service, this tool is free for all blind and low-vision community members using the Be My Eyes app."
Considering how quickly ChatGPT was co-opted into providing services for corporate SaaS platforms and other rather prosaic applications, it's heartening to see this new model immediately put to work helping people. You can read more about GPT-4 here.