Estimating the three-dimensional location of an object is one of the most important issues to be addressed in the field of computer vision. In situations where the end goal is to build automated solutions capable of detecting and recognizing objects from photographs, new models and algorithms that perform exceptionally well are needed. It is possible that estimating the 3D position of an item from a single 2D image is a difficult challenge because the single image lacks information that is critical to the task. The investigation focused on a particular task of computing the three-dimensional location of a soccer ball. Ball nets and temporal nets are two examples of deep learning models, and this thesis outlines a strategy that is able to tackle this problem and is based on these models. The former uses a deep convolutional neural network to extract meaningful features from images, while the latter uses temporal information to arrive at more accurate predictions. Both of these methods aim to improve computer vision. Compared to other existing computer vision algorithms, our approach achieves a lower mean absolute error across a variety of conditions and setups. A whole new data-driven pipeline has been developed to process the movies and extract three-dimensional information about an item. In the realm of computer vision, one of the most important things to discuss is the process of estimating the three-dimensional location of an object. In situations where the end goal is to build automated solutions capable of detecting and recognizing objects from photographs, new models and algorithms that perform exceptionally well are needed. It is possible that estimating 3D space is a difficult challenge because single 2D photographs provide only limited information that is important for the task.
This work is licensed under a Creative Commons Attribution 4.0 International License.
You may also start an advanced similarity search for this article.