CAD and image processing software tools play a key role in the RAS-assisted inspection loop and therefore their development is a main objective of ROBINS. The software is expected to:
- Devise a 3D numeric model of the confined space subject to inspection by means of image processing algorithms capable of combining 2D pictures, and/or by means of meshing algorithms capable of building textured meshes from point clouds and photographs
- Provide virtual tours of the space subject to inspection. The user should be able to examine details of interest closely by moving through the 3D virtual space and orienting the viewpoint as needed, always with a detailed rendering of the observed surface that is consistent with the viewpoint
- Provide the possibility to add hotspots and/or associate additional information to selected parts of the 3D model (augmented virtual reality model)
- Identify critical or suspect areas from the analysis of visual data acquired during the inspection and highlight such areas in order to provide valuable guidance to the surveyor
The ROBINS project also aims at integrating image-processing algorithms specifically developed for the recognition of ship hull’s critical or suspect areas in the software dedicated to virtual tours, thus creating a unified environment for virtual inspection. Dedicated software tools and algorithms for image processing will enhance the possibility to effectively and easily identify critical or suspect areas in inspected spaces.
3D reconstruction of real-world objects has been an important research area for decades in the computer vision and photogrammetry communities. Accurate surface reconstruction has become a necessity for a variety of mapping, modelling, and monitoring applications (e.g. thickness measurement, crack detection, coating condition assessment, evaluation of mechanical damage).
The most fundamental step in surface reconstruction is the generation of a 3D point cloud, that is, a large collection of points placed in a three-dimensional coordinate system. There are two main methods of obtaining point clouds: 3D scanning and photogrammetry. 3D scanning techniques can be further subdivided into laser scanning and structured light scanning. Spatial data are obtained by moving the laser head or the structured light cameras relative to the object being scanned, which yields point clouds of the object directly (or of sections of the object if it is too large). With photogrammetry techniques, 3D point clouds are generated from a large set of images by matching 2D points and edges, which are then transformed into 3D data by forward ray intersection. The following steps are required to extract point cloud data from a set of photos:
- Feature recognition – each photograph is analyzed and key features are identified, which are invariant to scale and rotation and may be potentially used to align the photos.
- Feature matching – features have to be matched between photos.
- Alignment of cameras – the coordinates of the cameras relative to each other and to the recognized features are calculated by minimizing, over all cameras, the error between the distances measured on the images and the expected distances. The minimization is usually performed with the Levenberg-Marquardt algorithm, and the whole procedure is known as bundle adjustment.
- Construction of dense point cloud – once the cameras are aligned and distances between key features are known, construction of the dense point cloud can begin. This step is the most computationally intensive and can be performed using forward ray tracing.
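The forward ray intersection mentioned above can be illustrated with a minimal sketch. It assumes that two cameras are already aligned, so that each matched feature defines a viewing ray given by a camera centre and a direction in the global coordinate system. Because noisy rays rarely intersect exactly, the sketch returns the midpoint of the shortest segment joining them. The function name and the midpoint criterion are illustrative assumptions, not the method used in ROBINS.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _along(origin, direction, t):
    return tuple(o + t * d for o, d in zip(origin, direction))


def triangulate_midpoint(c1, d1, c2, d2):
    """Return the 3D point closest to two viewing rays.

    c1, c2: camera centres; d1, d2: ray directions (need not be unit length).
    Hypothetical helper for illustration only.
    """
    w = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; no unique intersection")
    # Ray parameters of the mutually closest points on each ray
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    p1 = _along(c1, d1, t1)
    p2 = _along(c2, d2, t2)
    # Midpoint of the shortest segment between the two rays
    return tuple((x + y) / 2.0 for x, y in zip(p1, p2))
```

For example, two cameras at (0, 0, 0) and (1, 0, 0) that both observe the same feature along directions (0.5, 0, 2) and (-0.5, 0, 2) recover the point (0.5, 0, 2). In a dense reconstruction this computation would be repeated for a very large number of matched pixels, which is why that step dominates the run time.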


At the final stage, the reconstructed 3D model is textured by finding the best images for each triangle and adjusting colors. Since both the 3D TIN (triangulated irregular network) model and the whole set of captured images are registered within the global coordinate system, creating the texture map is essentially a problem of combining texture fragments. Individual texture fragments are obtained by mapping (“backprojecting”) input images onto the generated 3D model. Multiple views produce multiple texture fragments, whose domains are different parts of the TIN model. Overlapping fragments may differ photometrically, due to different lighting, camera settings, or non-Lambertian object surfaces (e.g. glass windows, water basins, or structures of polished metal), and geometrically, due to model imprecision and/or imperfect registration. The texturing process should therefore generate texture fragments carefully in order to minimize visible seams, and should also perform post-processing to remove the remaining photometric differences between fragments.
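One simple way to think about "finding the best images for each triangle" is to prefer the camera that looks at the triangle most frontally. The sketch below scores each camera by the cosine of the angle between the triangle normal and the direction from the triangle centroid to the camera centre. This criterion, the function name, and the decision to ignore occlusion and image resolution are assumptions made for illustration; the source does not state which criterion is actually used.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(v):
    norm = math.sqrt(_dot(v, v))
    if norm == 0.0:
        raise ValueError("zero-length vector")
    return tuple(x / norm for x in v)


def best_view(triangle, camera_centers):
    """Return the index of the camera facing the triangle most frontally.

    triangle: three 3D vertices in counter-clockwise order (outward normal).
    Returns None if every camera sees the back face. Hypothetical helper.
    """
    v0, v1, v2 = triangle
    normal = _unit(_cross(_sub(v1, v0), _sub(v2, v0)))
    centroid = tuple(sum(c) / 3.0 for c in zip(v0, v1, v2))
    best_index, best_score = None, 0.0
    for index, center in enumerate(camera_centers):
        # Cosine of the angle between the normal and the viewing direction
        score = _dot(normal, _unit(_sub(center, centroid)))
        if score > best_score:
            best_index, best_score = index, score
    return best_index
```

A practical texturing stage would combine such a score with visibility tests and would then blend neighbouring fragments to hide seams, as described above.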
Accurate reconstructed 3D models of industrial environments are required for many purposes, such as maintenance, documentation, training, and monitoring. Much of the current research focuses on applying Virtual and Augmented Reality to provide various services in industrial environments. Accurate, georeferenced, photorealistic 3D models of an active construction site are an important tool for impact assessment, decision-making, and project monitoring. The relatively cheap, flexible and general 3D model acquisition process is widely used for reverse engineering and rapid prototyping. Built-in tools make it possible to measure distances, areas and volumes, or to perform more sophisticated metric analysis on point clouds or meshes.
In summary, photogrammetry is a valuable and attractive approach in many applications, with the major advantage of being low-cost, portable, flexible and able to deliver highly detailed geometries and textures at the same time. However, many objects are problematic for image-based 3D modeling techniques (unstructured, monochrome, translucent, reflective, and/or self-resembling surfaces). Additional constraints arise from lighting conditions, which play a key role in the production of high-quality models. Problematic image capture conditions result in a high level of noise and more topological errors in the final mesh models. New, improved and robust algorithms should therefore be developed to deal with complex industrial sites.
The UIB team has been developing several methods for the detection of coating breakdown and corrosion (CBC) in images taken from vessels during inspection operations. The methods that have been developed are based on Deep Convolutional Neural Networks (DCNN), either for object detection or for image semantic segmentation. More precisely, three different approaches have been considered and developed:
- Regression of oriented bounding box parameters for defect detection. This is a two-stage detector trained to detect bounding boxes containing CBC. The first stage is a modified version of the Single-Shot Multibox Detector (SSD), in which an image pyramid-based approach is adopted to handle detection at different scales and so capture minor details if needed. The output of this stage is a collection of bounding boxes containing the defects sought. The second stage implements a lightweight CNN that regresses the parameters of the oriented bounding boxes that best fit the defects lying inside the unoriented bounding boxes produced by the first stage. Unoriented and oriented bounding boxes resulting from the detector for some images can be found next.
- Fully supervised semantic segmentation. In this case, we have adopted a Fully Convolutional Neural Network trained for CBC detection using up to four different loss functions, with the aim of making the network capable of detecting small defects: the Focal Loss, the Dice Loss, and the standard Softmax and binary Cross Entropy losses. Examples of results for this approach, after training with the Dice Loss (the best-performing loss function), can be found next.
- Weakly supervised semantic segmentation. With the aim of reducing the cost of image annotations for training, here we propose a weakly supervised segmentation approach based on U-Net, in which, similarly to Attention U-Net (AUN), we embed Attention Gates specifically modified for this application. We make use of coloured scribbles for the different classes as weak annotations. By means of a superpixel segmentation stage we generate a training pseudo-mask on the basis of the intersection between the scribbles and the superpixels. Both the weak annotations and the pseudo-masks are used by the loss function during training. The loss function has been specifically devised to counteract the possibly incorrect labelling of certain pixels due to the vagueness of the weak annotations and of the pseudo-mask generation process. To this end, it comprises three terms, namely a partial cross-entropy term, a novel Centroid Loss term, and a regularization term based on the mean squared error. All three terms are jointly optimized using an end-to-end learning model. Some detection results for this approach can be found next.
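To make the role of the loss functions more concrete, the following sketch implements the textbook binary forms of the Dice Loss and the Focal Loss on flat lists of per-pixel probabilities. It is a minimal illustration in plain Python, not the UIB implementation: the function names, the default values of `gamma` and `alpha`, and the smoothing constants are assumptions.

```python
import math


def dice_loss(probs, targets, eps=1e-6):
    """Dice loss: 1 - 2|P∩G| / (|P| + |G|), for binary masks.

    probs: predicted foreground probabilities in [0, 1], one per pixel.
    targets: ground-truth labels (0 or 1), one per pixel.
    """
    intersection = sum(p * t for p, t in zip(probs, targets))
    total = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss, averaged over pixels.

    The factor (1 - p_t) ** gamma down-weights pixels that are already
    classified confidently, so hard pixels dominate the gradient.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)     # avoid log(0)
        p_t = p if t == 1 else 1.0 - p      # probability of the true class
        a_t = alpha if t == 1 else 1.0 - alpha
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

Both losses are commonly chosen when the foreground class (here, small defects) covers only a small fraction of the image: the Dice Loss measures overlap rather than per-pixel accuracy, and the Focal Loss reduces the contribution of the many easy background pixels.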
Automatic defect detection is a continuously advancing research topic that still lacks an effective market implementation in the shipping industry. Recently, autonomous inspection of industrial, transport and building infrastructure has received a boost from the synergy of robotics and computer vision. A first approach to rust detection on vessels was developed within the MINOAS and INCASS frameworks.
The automation of corrosion and crack detection via non-contact and non-destructive techniques, instead of electrochemical methods, is a complex research problem. Typical methods perform image analysis on RGB data from metal surfaces. The methods employed fall into two main groups: those based on automated detection (e.g. in the wavelet domain, thresholding, spectral band combinations and analysis, image segmentation, boundary or shape analysis), and those based on image classification procedures.
ROBINS will advance the state of the art in research and bring productive cutting-edge technologies closer to the shipping market via the robotic integration of advanced defect detection and recording. Novel platforms of integrated lightweight sensors will be built, specifically oriented to defect detection. These platforms will take advantage of the modular structure of the proposed robots, so that they can be adapted to each case.
Simple image feature extraction may not be enough to detect damage in ship structures, owing to complex lighting conditions and the great diversity of other conditions in data obtained from different robots. In this scenario, manual definition of features for defect representation is not feasible. Digital image-based defect detection should therefore be built on a highly robust machine-learning approach such as convolutional neural networks (CNN), which have already shown good results in object classification in photographs and look promising for industrial inspection applications. In contrast to manually designed image processing solutions, deep CNNs automatically generate powerful features through hierarchical learning strategies from massive amounts of training data, with a minimum of human interaction or expert process knowledge.
Development of deep learning-based approaches for corrosion detection by means of bounding box regression.
Development of deep learning-based approaches for corrosion detection by means of semantic segmentation.
The overall architecture of the ROBINS software is designed considering the use cases and workflow assumed for the inspection of the ship using RAS.
The major use cases identified as elements of the workflow are:
- Preparation of the data for inspection (general ship data, CAD model),
- Registration of the inspection data (videos, photos, thickness measurements, etc.) in the data model,
- Analysis of the processed inspection results (point clouds, meshes) and recording the findings as annotations,
- Review of the survey results.
The ROBINS software is designed with a modular structure supporting different layouts of hardware, use cases, and user roles. It consists of:
- Authoring tool: a desktop application aimed at collecting, organizing, managing and presenting the data comprising the Survey project,
- Inspection tool: a limited version of the application providing functionality to review the inspection data, including notes and annotations added by the Surveyor,
- Web interface: a client-server application that provides access to the Inspection tool executed remotely on a server from the End user’s web browser,
- Data model: a component managing organization and storage of the information collected within a Survey project,
- Processing tools: a framework for executing data processing tasks separately from the main application, and a set of command-line applications implementing data processing tasks for the reconstruction of 3D models from videos and for their analysis.

The architecture of the ROBINS software assumes a configuration in which end-user applications and processing tools can be executed on different computers in the network (client-server approach).

The application has a user interface tailored to processing the data collected during the inspection of ships.

The structure of the data model is defined in an open way, relying on standard file formats. Different data items are stored as separate files so that the data model remains accessible and usable even without the ROBINS software.
The data are organized by ship spaces assumed to be inspected separately (ship holds, compartments etc.), with the data relevant to each space being composed of:
- CAD models,
- Videos with accompanying data such as IMU sensors data and / or positioning information,
- Images (extracted from videos or taken by camera),
- UTM (ultrasonic thickness measurement) data,
- Other digital documents (photographs, drawings, etc.) input by the user,
- Annotated markers,
- Reconstructed point clouds,
- Reconstructed textured mesh models.
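The per-space organization listed above can be sketched as a small in-memory index that maps each data category to the relative paths of its files. The category folder names, the class name `SpaceData`, and the `<space>/<category>/<file>` layout are hypothetical choices made for this illustration; the source only states that items are stored as separate files in standard formats.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Hypothetical folder names, one per data category listed in the text
CATEGORIES = (
    "cad", "videos", "images", "utm",
    "documents", "annotations", "point_clouds", "meshes",
)


@dataclass
class SpaceData:
    """Index of the files that belong to one ship space (e.g. a hold)."""
    name: str
    items: dict = field(default_factory=lambda: {c: [] for c in CATEGORIES})

    def add(self, category, filename):
        """Register a file and return its path relative to the project root."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        path = PurePosixPath(self.name) / category / filename
        self.items[category].append(path)
        return path
```

For instance, `SpaceData("cargo_hold_1").add("point_clouds", "scan_001.ply")` yields the relative path `cargo_hold_1/point_clouds/scan_001.ply`. Keeping every item as an ordinary file in a predictable location is what allows the data to remain usable without the ROBINS software.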
Creation of the model starts from the data input by the user in the Authoring tool. The user then runs a pipeline of processing tools on the videos or images, generating final data such as point clouds and textured meshes.
Processing tools are designed for execution on a separate computer (server) or a cluster, with the possibility of distributing the reconstruction tasks in parallel for efficient execution. The ROBINS server component manages a queue of processing tasks and distributes them among the processing nodes automatically. The Authoring tool communicates with the server when necessary to submit a new task, query its status, or obtain results.
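The interaction between the Authoring tool and the server (submit a task, query its status, obtain results) can be illustrated with a minimal in-memory sketch. It assumes a first-in, first-out queue and one task per processing node at a time. The class name `TaskServer`, its methods, and the status strings are hypothetical, and networking is left out entirely.

```python
from collections import deque
from itertools import count


class TaskServer:
    """Toy model of a server that queues tasks and assigns them to idle nodes."""

    def __init__(self, node_names):
        self._queue = deque()            # ids of tasks waiting for a node
        self._idle = deque(node_names)   # nodes with nothing to do
        self._tasks = {}                 # task id -> task record
        self._ids = count(1)

    def submit(self, command):
        """Register a new task and return its id."""
        task_id = next(self._ids)
        self._tasks[task_id] = {
            "command": command, "status": "queued",
            "node": None, "result": None,
        }
        self._queue.append(task_id)
        self._dispatch()
        return task_id

    def _dispatch(self):
        # Hand queued tasks to idle nodes in submission order
        while self._queue and self._idle:
            task_id = self._queue.popleft()
            node = self._idle.popleft()
            self._tasks[task_id].update(status="running", node=node)

    def complete(self, task_id, result):
        """Called when a node finishes; frees the node for the next task."""
        task = self._tasks[task_id]
        if task["status"] != "running":
            raise ValueError("task is not running")
        task.update(status="done", result=result)
        self._idle.append(task["node"])
        self._dispatch()

    def status(self, task_id):
        return self._tasks[task_id]["status"]

    def result(self, task_id):
        return self._tasks[task_id]["result"]
```

With two nodes and three submitted tasks, the first two tasks start running immediately and the third stays queued until one of the nodes reports completion. A real deployment would add persistence, failure handling and a network protocol on top of this basic cycle.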