This paper presents an end-to-end Neural Network (NN) to estimate the overlap between two scenes observed by an underwater robot equipped with a bottom-looking camera. This information is extremely valuable for performing visual loop detection in Simultaneous Localization and Mapping (SLAM). Contrary to other existing approaches, this study does not depend on handcrafted features or similarity metrics, but jointly optimizes the image description and the loop detection by means of a Siamese NN architecture.
Twelve different configurations have been experimentally tested using large, balanced datasets synthetically generated from real data. These experiments demonstrate the ability of our proposal to properly estimate the overlap, with precision, recall, and fall-out close to 95%, 98%, and 5%, respectively, and execution times close to 0.7 ms per loop on a standard laptop computer. The source code of this proposal is publicly available.
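To illustrate the core idea of a Siamese architecture for overlap estimation, the sketch below passes two images through the same shared-weight encoder and feeds the combined descriptors to a regression head that outputs an overlap score in (0, 1). This is a minimal, hypothetical NumPy forward pass, not the paper's actual network: the encoder, feature dimensions, and head are illustrative placeholders.

```python
import numpy as np

def encode(img, w_enc):
    # Hypothetical shared encoder: a single linear projection with a
    # tanh nonlinearity stands in for the paper's CNN branch.
    return np.tanh(img.reshape(img.shape[0], -1) @ w_enc)

def siamese_overlap(img_a, img_b, w_enc, w_head):
    # Both branches use the SAME weights w_enc (the Siamese idea), so
    # the image description and the comparison are optimized jointly.
    f_a = encode(img_a, w_enc)
    f_b = encode(img_b, w_enc)
    # Combine the two descriptors for the regression head.
    joint = np.concatenate([f_a, f_b, np.abs(f_a - f_b)], axis=1)
    logits = joint @ w_head
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid: overlap in (0, 1)

rng = np.random.default_rng(0)
H = W = 16                                   # toy image size, not the paper's
w_enc = rng.normal(scale=0.1, size=(H * W, 32))
w_head = rng.normal(scale=0.1, size=(3 * 32, 1))
batch_a = rng.normal(size=(4, H, W))         # 4 image pairs
batch_b = rng.normal(size=(4, H, W))
overlap = siamese_overlap(batch_a, batch_b, w_enc, w_head)
print(overlap.shape)                         # one overlap score per pair
```

In a trained system, thresholding this score (or comparing it against the known overlap labels of the synthetic dataset) yields the loop/non-loop decision that the precision, recall, and fall-out figures above measure.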