Advanced School for Computing and Imaging (ASCI)

Efficient 3D Model Object Retrieval with Disentangled Representations

Author : Luis Armando Pérez Rey
Promotor(s) : Prof.dr. J. Lukkien / Dr. M.J. Holenderski / Dr. D.S. Jarnikov
University : TU/e
Year of publication : 2024
Link to repository : Link to thesis

It’s late at night and you are tired, but you can’t stop sifting through endless pages of products in search of the ideal decoration you have in mind for your home.
Sure, you could have used your browser’s image search to find it but, despite countless attempts, the images you provided haven’t helped at all.
The same situation can arise for game developers, architects, designers, or anybody who tries to find a specific 3D model of an object online to use in projects such as video games,
augmented reality applications, or architectural models. In this scenario, the person searching might have a 3D model of an object, called the query object, that is similar to the one they are
searching for. Image search can be used to find the 3D content needed, but this requires acquiring multiple views of the available object. These views can be combined to better capture the
properties of the query object and help with the search. But this raises a question: how many views, and which ones, should be gathered to actually find the content needed?

Some previous approaches to this problem gather views of the query object by placing a virtual camera at multiple locations around it. This can be inefficient: some views carry redundant information because they look similar to views that have already been acquired, while others are taken from camera positions that reveal few of the object’s properties (imagine looking at a sofa from below:
you can’t really tell its color, its shape, etc.).
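
The redundancy problem can be illustrated with a small sketch. It is not the method from the thesis: it assumes each view has already been turned into a feature vector (here, hand-written toy numbers), and it uses a hypothetical helper, `drop_redundant_views`, with an arbitrary similarity threshold of 0.95 to skip any view that looks too much like one already kept.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_redundant_views(view_features, threshold=0.95):
    """Return the indices of views to keep.

    A view is skipped when its similarity to any already-kept view
    reaches the threshold (an assumed, illustrative value).
    """
    kept = []
    for i, feat in enumerate(view_features):
        if all(cosine_similarity(feat, view_features[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy feature vectors for three views of the same object (made-up numbers).
views = [
    [1.0, 0.0, 0.0],    # front view
    [0.99, 0.05, 0.0],  # almost the same as the front view
    [0.0, 1.0, 0.0],    # side view
]
kept = drop_redundant_views(views)
print(kept)
```

With these toy vectors, the second view is dropped because it is nearly identical to the first, which leaves the front and side views.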

In his research, Luis worked on efficient algorithms that select fewer views of a query object while better representing what
a user is searching for in databases of 3D models. The approach uses neural networks to understand the properties of objects
and match them with existing 3D content. The algorithms were developed in three main steps.

  1. Understanding Spatial Relationships: The first step was to teach the neural network to capture the geometry of the views gathered from an object.
    By doing so, the network can grasp how different perspectives relate to each other.
  2. Recognizing Rotations: The second step was to train the network to recognize rotations of the object between multiple views.
    This skill allows the network to understand how changes to an object’s orientation affect the views.
  3. View Selection: The last step was to develop a method that guides the network in selecting
    the rotations to apply to an object to generate the most representative views. By analyzing the initial image, the network determines
    which additional views would provide the most information, which can help reduce the number of views needed to find 3D content in a database.
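
The idea behind the third step can be sketched in a simplified form. The thesis uses a trained neural network to decide which rotation to apply; the sketch below swaps in a plain variance heuristic as a stand-in. It assumes a handful of candidate database objects whose view features at a few rotation angles are already known (toy numbers), and a hypothetical function, `most_informative_rotation`, that picks the angle at which the candidates differ most.

```python
def most_informative_rotation(candidate_views):
    """Pick the rotation angle at which the candidate objects differ most.

    candidate_views maps an object name to a dict of
    {rotation angle in degrees: feature vector}. All objects are assumed
    to share the same angles and feature dimension.
    """
    angles = sorted(next(iter(candidate_views.values())).keys())

    def spread(angle):
        # Mean squared distance of the candidates' features to their mean.
        feats = [views[angle] for views in candidate_views.values()]
        dim = len(feats[0])
        mean = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
        total = sum((f[d] - mean[d]) ** 2 for f in feats for d in range(dim))
        return total / len(feats)

    return max(angles, key=spread)


# Two toy candidates: easy to tell apart from the front (0 degrees),
# harder from the side (90), identical from below (180).
candidates = {
    "red_sofa": {0: [1.0, 0.0], 90: [0.8, 0.2], 180: [0.1, 0.1]},
    "blue_sofa": {0: [0.0, 1.0], 90: [0.6, 0.4], 180: [0.1, 0.1]},
}
best_angle = most_informative_rotation(candidates)
print(best_angle)
```

Here the front view (0 degrees) is chosen, since the view from below is the same for both candidates and would not help to tell them apart.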

In his thesis, Luis developed new methods to efficiently infer the most representative views of an object, using neural networks
to capture the properties of 3D models. These approaches address key aspects of object identification and may
contribute to improved object representations in the future, enhancing the identification of 3D models.