Image segmentation is an important aspect of computer vision used in fields ranging from filmmaking, scientific imaging, and photo editing to medicine. According to Stanford University, it is the process of dividing an image into regions based on pixel characteristics, in order to identify objects and make the image simpler and more efficient to analyse.
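The idea can be illustrated with a deliberately simple sketch. This is a hypothetical example, not Meta's method: it thresholds pixel intensity and groups connected above-threshold pixels into labelled regions.

```python
import numpy as np

# Toy greyscale "image": bright objects on a dark background.
image = np.array([
    [0, 0, 9, 9, 0],
    [0, 0, 9, 9, 0],
    [0, 0, 0, 0, 0],
    [8, 8, 0, 0, 0],
    [8, 8, 0, 0, 7],
])

def segment_by_threshold(img, threshold=5):
    """Label each connected region of above-threshold pixels.

    Returns an array where background is 0 and each object
    gets a distinct positive integer label.
    """
    mask = img > threshold
    labels = np.zeros(img.shape, dtype=int)
    next_label = 1
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            if mask[r, c] and labels[r, c] == 0:
                # Flood-fill this connected component (4-connectivity).
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    if (0 <= y < img.shape[0] and 0 <= x < img.shape[1]
                            and mask[y, x] and labels[y, x] == 0):
                        labels[y, x] = next_label
                        stack += [(y + 1, x), (y - 1, x),
                                  (y, x + 1), (y, x - 1)]
                next_label += 1
    return labels

labels = segment_by_threshold(image)
print(labels.max())  # three separate bright regions -> 3
```

Real segmentation models replace the hand-picked threshold with learned per-pixel predictions, but the output is the same kind of region map.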
Earlier this week, to “democratise segmentation,” Facebook owner Meta announced an artificial intelligence model that can pick out objects within images, along with a dataset of image annotations.
In a blog post, Meta introduced the Segment Anything Model, or SAM, to enable diverse applications and promote further research into foundation models for computer vision. It also released the Segment Anything 1-Billion mask dataset (SA-1B), which it claims is the largest-ever segmentation dataset made available for research purposes. SAM is available under a permissive open license.
SAM can identify objects in images and videos even when those objects were not part of its training data. Using the model, objects can be selected by clicking on them or by writing text prompts. For instance, if you type the word “bird”, the tool will draw boxes around each of the birds in a photo.
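The “bird” example ends with a box drawn around every detected object. As a hypothetical sketch of that final step (not SAM's actual API), the following computes a bounding box for each object in a segmentation label map:

```python
import numpy as np

# Toy label map, as a segmentation model might produce:
# 0 = background, each detected object has a distinct positive label.
labels = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 2],
    [0, 0, 2, 2],
])

def bounding_boxes(label_map):
    """Return a (top, left, bottom, right) box for each labelled object."""
    boxes = {}
    for label in np.unique(label_map):
        if label == 0:
            continue  # skip background
        ys, xs = np.nonzero(label_map == label)
        boxes[int(label)] = tuple(
            int(v) for v in (ys.min(), xs.min(), ys.max(), xs.max())
        )
    return boxes

print(bounding_boxes(labels))  # {1: (0, 0, 1, 1), 2: (1, 2, 2, 3)}
```

In an interactive tool, these boxes are what gets drawn over the photo for each object matching the prompt.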
Meta has used a similar technology internally for activities such as tagging photos or determining which posts to recommend to users of Facebook and Instagram, according to a Reuters report. SAM is aimed at broadening access to such technology. Through SAM, Meta aims to reduce the need for task-specific modelling expertise, training compute, and custom data annotation for image segmentation, according to the Meta blog post.
SAM could become part of larger AI systems for a more general, multimodal understanding of the world, such as understanding both the visual and text content of a webpage.
In the virtual reality domain, the model could make it possible to select an object based on a user’s gaze and then “lift” it into 3D. For content creators, SAM could improve creative applications such as extracting image regions for collages or video editing. It could also aid the scientific study of natural occurrences on Earth or even in space, for example by localising animals or objects to study and track in video, according to the blog post.
According to the Reuters report, the SAM model and dataset will be available for download under a non-commercial license, and users uploading their images to an accompanying prototype must likewise agree to use it only for research purposes.