Facebook has deployed a machine learning system that can read the text in images and videos as well as understand the context of the text and the image together. Named Rosetta, the system is already live and is extracting text from almost one billion images and video frames posted on Facebook and Instagram daily, according to the social media giant. The AI system is meant to keep an eye out for inappropriate or harmful content on Facebook-owned platforms, as well as streamline photo search and improve accessibility for visually impaired users via screen readers.
Facebook's new AI system scans text in images of every kind, from meme captions to street signs and restaurant menus. Facebook and Instagram did use text recognition systems earlier, but these were of limited help for several reasons.
"Taking into account the sheer volume of photos shared each day on Facebook and Instagram, the number of languages supported on our global platform, and the variations of the text, the problem of understanding text in images is quite different from those solved by traditional optical character recognition (OCR) systems, which recognize the characters but don't understand the context of the associated image," Facebook said in a blog post.
Rosetta, on the other hand, extracts text from images and videos and feeds it into a text recognition model that has been trained on classifiers to understand the context of the text and the image together, the blog post further said. This happens in two stages: detection and recognition. In the first stage, Rosetta detects rectangular regions that potentially contain text. In the second, a text recognition system recognises the text in the detected regions and transcribes it into machine-readable form.
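The two-stage flow described above can be illustrated with a small Python sketch. This is not Facebook's code: the "image" here is just a grid of characters, and `Box`, `detect_regions`, `recognize` and `extract_text` are hypothetical names made up for illustration. Toy string handling stands in for the trained detection and recognition models, so only the shape of the pipeline carries over.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Toy stand-in for an image: a list of strings, where any
# non-space character counts as "ink". Real systems work on pixels.
Image = List[str]


@dataclass(frozen=True)
class Box:
    """Rectangular region; bottom and right are exclusive."""
    top: int
    left: int
    bottom: int
    right: int


def detect_regions(image: Image) -> List[Box]:
    """Stage 1 (toy): one box per block of consecutive non-blank rows."""
    boxes: List[Box] = []
    start = None
    # A trailing blank sentinel row closes a block that reaches the last row.
    for r, row in enumerate(image + [""]):
        if row.strip():
            if start is None:
                start = r
        elif start is not None:
            rows = image[start:r]
            left = min(len(line) - len(line.lstrip()) for line in rows)
            right = max(len(line.rstrip()) for line in rows)
            boxes.append(Box(start, left, r, right))
            start = None
    return boxes


def recognize(image: Image, box: Box) -> str:
    """Stage 2 (toy): transcribe the cropped region into a plain string."""
    lines = [image[r][box.left:box.right].strip()
             for r in range(box.top, box.bottom)]
    return " ".join(lines)


def extract_text(
    image: Image,
    detector: Callable[[Image], List[Box]] = detect_regions,
    recognizer: Callable[[Image, Box], str] = recognize,
) -> List[Tuple[Box, str]]:
    """Run detection first, then recognition on each detected region."""
    return [(box, recognizer(image, box)) for box in detector(image)]


if __name__ == "__main__":
    sample = [
        "            ",
        "  NO ENTRY  ",
        "            ",
        "     MENU   ",
        "     TODAY  ",
    ]
    for box, text in extract_text(sample):
        print(box, text)
```

Keeping the two stages as separate functions means either one could be swapped out without touching the other, which is the main point of the sketch.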
With an AI that can understand context, Facebook can more readily tell whether a meme is meant as a joke or is offensive. For now, Facebook is using Rosetta to make its photo search results more relevant and to automatically detect content that violates its hate-speech policy. The social media platform also plans to use Rosetta to determine which content should appear in a user's News Feed.
"The rapid growth of videos as a way to share content, the need to support many more languages, and the increasing number of ways in which people share content make text extraction from images and videos an exciting challenge that helps push the frontiers of computer vision research and applications," Facebook said in its blog post, referring to its future plans for Rosetta.
Edited by Vivek Punj