This paper addresses the laborious, time-consuming and error-prone process of generating ground truth data for instance segmentation of fish in their natural habitat. We propose using the Segment Anything Model (SAM), which supports zero-shot inference, to build the segmentation masks automatically. This significantly reduces dataset creation time and improves scalability to larger datasets. Experiments with You Only Look Once (YOLO) show only marginal performance differences between our approach, in which segmentation masks are created without human intervention, and standard training on a fully human-labeled dataset. These results underscore the effectiveness of the automated workflow discussed herein, which substantially reduces dataset creation time, particularly in demanding underwater scenarios.
Authors: Xènia Rovira Coll, Antoni Burguera Burguera
In International Workshop on Marine Technologies (MARTECH), Palma (Spain), 2024.
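The workflow above needs a glue step between SAM and YOLO: each automatically generated mask must be written out as a YOLO training label. The sketch below is a minimal, hypothetical illustration of that step, not the authors' code. It assumes the Ultralytics-style segmentation label format (one line per instance: a class index followed by polygon vertices normalized to [0, 1]). It also assumes that each SAM mask has already been turned into a pixel-coordinate polygon, for example by contour tracing, which is not shown. The function name `polygon_to_yolo_seg` is illustrative only.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def polygon_to_yolo_seg(class_id: int, polygon: Sequence[Point],
                        width: int, height: int) -> str:
    """Format one instance polygon (pixel coordinates) as a YOLO segmentation label line.

    Hypothetical helper: assumes the polygon was already extracted from a
    SAM-generated binary mask (e.g. by contour tracing, not shown here).
    """
    if width <= 0 or height <= 0:
        raise ValueError("image size must be positive")
    if len(polygon) < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    coords = []
    for x, y in polygon:
        # Normalize by image size and clamp to [0, 1], since the label
        # format uses relative coordinates.
        nx = min(max(x / width, 0.0), 1.0)
        ny = min(max(y / height, 0.0), 1.0)
        coords.append(f"{nx:.6f} {ny:.6f}")
    # Class index first, then the flattened x y pairs.
    return f"{class_id} " + " ".join(coords)
```

For example, `polygon_to_yolo_seg(0, [(0, 0), (100, 0), (100, 50)], 200, 100)` returns `"0 0.000000 0.000000 0.500000 0.000000 0.500000 0.500000"`. Writing one such line per detected fish into a `.txt` file that shares its image's base name would yield a label set produced without manual annotation, under the format assumption stated above.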