Microsoft Word and PowerPoint will use AI to automatically write photo descriptions

Jordan Novet @jordannovet December 2, 2016 2:48 PM

An automatically generated caption in PowerPoint.

Image Credit: Screenshot

Microsoft today said that starting in early 2017, its Word and PowerPoint applications will be able to automatically generate descriptions of photos that users add to documents. Office 365 subscribers will see this first in Word and PowerPoint for Windows PCs.

Ordinarily, if you drop a photo into PowerPoint, you can type out an “Alt Text” title and description for the photo. But not everyone does that when they’re making slide decks. Then, when a blind person opens the slide deck, they aren’t able to understand what’s going on in the picture, which could make the slide or the entire deck more difficult to fully grasp.

Microsoft wants to change that. So it has chosen to automate the process of making Alt Text for photos, drawing on its Cognitive Services Computer Vision application programming interface (API). “Through machine learning, this service will keep improving as more people use it, saving you significant time to make media-rich presentations accessible,” John Jendrezak, accessibility lead and partner director of program management for Microsoft’s Office Engineering team, wrote in a blog post. To use this feature, you’ll simply have to right-click on a photo and select “Automatic Alt Text.”

The technology uses a type of artificial intelligence (AI) called deep learning to recognize objects in photos and then figure out the best words to explain the photo in its entirety. From there, operating systems’ screen readers can read the captions aloud.
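
To make the last step concrete, here is a minimal sketch of how an application might turn a captioning service's output into Alt Text. The response shape (a `description` object holding `captions`, each with `text` and `confidence`), the `pick_alt_text` helper, and the 0.5 threshold are assumptions for illustration; Microsoft has not described how Office chooses the caption it shows.

```python
from typing import Optional


def pick_alt_text(response: dict, min_confidence: float = 0.5) -> Optional[str]:
    """Return the highest-confidence caption, or None if nothing is confident enough.

    `response` is assumed to look like:
    {"description": {"captions": [{"text": "...", "confidence": 0.9}, ...]}}
    This shape is hypothetical and used only for illustration.
    """
    captions = response.get("description", {}).get("captions", [])
    if not captions:
        return None
    # Choose the candidate the model is most sure about.
    best = max(captions, key=lambda c: c.get("confidence", 0.0))
    # Leave the Alt Text empty rather than show a low-confidence guess.
    if best.get("confidence", 0.0) < min_confidence:
        return None
    return best["text"]


# Hypothetical service output for a single photo.
sample = {
    "description": {
        "captions": [
            {"text": "a dog sitting on a couch", "confidence": 0.92},
            {"text": "a brown animal indoors", "confidence": 0.41},
        ]
    }
}

alt_text = pick_alt_text(sample)
```

Returning `None` for a low-confidence result is one possible design choice: it leaves room for the author to write the description by hand instead of shipping a misleading one to a screen reader.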

Deep learning generally involves training artificial neural networks on lots of data, such as photos, and then getting the neural networks to make inferences about new data. This is a method that has caught on at Apple, Facebook, Google, and Twitter, as well as at Microsoft.

In fact, earlier this year Facebook did something similar to what Microsoft is doing. It started automatically generating captions for photos that people share, so that when blind people scroll through the News Feed on iOS, the built-in VoiceOver screen reader can read those captions aloud. As a result, blind users can get a sense of what is in the pictures, in addition to the text that people include with them and the comments that other users make.

Twitter recently made it possible for users to manually write captions for images they post. While Twitter does do deep learning research, it hasn’t offered to automatically generate captions for images.

Also in today’s blog post, Jendrezak noted that Microsoft will be merging the separate fields for title and description. That way, he wrote, “you have no confusion about where to enter alt-text.” Currently, to add that information to an image, you have to open the Format pane, choose the Size & Properties tab (in PowerPoint — if you’re using Word, the tab is called Layout & Properties), and select the Alt Text dropdown.
