Efficient 3D Model Object Retrieval with Disentangled Representations
Author: Luis Armando Pérez Rey
Promotor(s): Prof.dr. J. Lukkien / Dr. M.J. Holenderski / Dr. D.S. Jarnikov
University: TU/e
Year of publication: 2024
Link to repository: Link to thesis
It’s late at night and you’re exhausted, but you can’t stop sifting through endless pages of products in search of the ideal decoration you have in mind for your home.
Sure, you could have used your browser’s image search to find it, but despite countless attempts, none of the images you provided have helped at all.
The same situation can happen to game developers, architects, designers, or anyone trying to find a specific 3D model of an object on an online platform for use in projects such as video games,
augmented reality applications, or architectural models. In this scenario, the person searching might already have a 3D model of an object, called the query object, that is similar to the one they are
looking for. Image search can be used to find the required 3D content; to do so, multiple views of the available object need to be acquired. These views can be combined to better capture the
properties of the query object and support the search. But this raises a question: how many views, and which ones, should be gathered to actually find the content needed?
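To make the idea of combining views concrete, here is a minimal sketch of multi-view retrieval: each view is embedded into a feature vector, the vectors are averaged into one descriptor, and the database is ranked by distance to that descriptor. All names here (`embed_view`, `multi_view_descriptor`, `retrieve`) are hypothetical illustrations, not the thesis implementation; the "network" is a deterministic stand-in.

```python
import zlib

import numpy as np


def embed_view(view):
    # Hypothetical stand-in for a neural network that maps a rendered
    # view (an image array) to a feature vector. A CRC of the pixel
    # bytes seeds the generator so the embedding is deterministic.
    seed = zlib.crc32(view.tobytes())
    rng = np.random.default_rng(seed)
    return rng.normal(size=8)


def multi_view_descriptor(views):
    # Fuse per-view embeddings into one descriptor by averaging,
    # a simple way to combine information from multiple viewpoints.
    feats = np.stack([embed_view(v) for v in views])
    return feats.mean(axis=0)


def retrieve(query_views, database):
    # Rank database entries by Euclidean distance to the query descriptor.
    q = multi_view_descriptor(query_views)
    dists = {name: np.linalg.norm(q - desc) for name, desc in database.items()}
    return sorted(dists, key=dists.get)
```

In practice the database descriptors would be precomputed offline, so a query only costs one forward pass per gathered view plus a nearest-neighbour search.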
Some previous approaches to this problem gather views of the query object by placing a virtual camera at multiple locations around it. This can be inefficient: some views carry redundant information because they look similar to views already acquired, while others are taken from camera positions that do not reveal any distinguishing properties of the object (imagine looking at a sofa from below:
you can’t really make out its color, its shape, and so on).
In his research, Luis worked towards efficient algorithms that select fewer, more representative views of a query object for finding content
in databases of 3D models. The approach uses neural networks to understand the properties of objects
and match them with existing 3D content. Three main steps were taken towards the development of these algorithms.
- Understanding Spatial Relationships: The first step was to teach the neural network to capture the geometry of the views gathered from an object. By doing so, the network can grasp how different perspectives relate to each other.
- Recognizing Rotations: The second step was to train the network to recognize rotations of the object between multiple views. This skill allows the network to understand how changes to an object’s orientation affect the views.
- Selective View Selection: The last step was to develop a method that guides the network in selecting the rotations to apply to an object to generate the most representative views. By analyzing the initial image, the network determines which additional views would provide the most information, which helps reduce the number of views needed to find 3D content in a database.
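The selection step above can be sketched as a greedy rule: given the views gathered so far, pick the candidate viewpoint whose embedding is farthest from everything already seen, i.e. the least redundant one. This is only an illustrative proxy for "most informative", assuming embeddings of similar views lie close together; it is not the thesis method itself.

```python
import numpy as np


def select_next_view(current_embeddings, candidate_embeddings):
    # Greedy pick: return the index of the candidate view whose
    # embedding lies farthest from every view gathered so far.
    gathered = np.stack(current_embeddings)
    best_idx, best_score = -1, -np.inf
    for i, cand in enumerate(candidate_embeddings):
        # Distance to the closest already-gathered view: a proxy for
        # how much new information this viewpoint would contribute.
        score = np.min(np.linalg.norm(gathered - cand, axis=1))
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx
```

Repeating this rule until the retrieval ranking stabilizes would yield a small, non-redundant set of views instead of a fixed grid of camera positions.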
In his thesis, Luis developed new methods to efficiently infer the most representative views of an object by capturing
properties of 3D models with neural networks. These approaches address key aspects of object identification and may
contribute to improved representations of objects in the future, enhancing the retrieval of 3D models.