Anna Vorontsova

Data Scientist / AI Researcher, Computer Vision

About Me

I am a research scientist at Samsung Research, Spatial AI. I hold an M.Sc. in Data Science and a bachelor's degree in Applied Mathematics, both from HSE University. Over 4.5 years at Samsung Research, I have worked on a wide range of 2D and 3D computer vision tasks. Overall, I have almost 6 years of industrial and research experience, with a focus on computer vision throughout my career. I have co-authored a number of research papers accepted to top-tier conferences and have hands-on experience with various deep learning models (CNNs, RNNs) and frameworks (PyTorch, TensorFlow).

Experience

Samsung Research

AI Researcher, 2D/3D Computer Vision

- Developed state-of-the-art algorithms for 2D and 3D computer vision tasks: SLAM, visual and sensor-based localization, 3D reconstruction of indoor scenes, depth estimation, object segmentation, and 2D/3D object detection.
- Formulated scientific hypotheses and conducted experiments to test them.
- Co-authored 16 academic papers, including papers accepted to top-tier CV and robotics conferences such as CVPR, ECCV, WACV, and IROS.
- Recognized as an Outstanding Reviewer at the NeurIPS 2022 Datasets and Benchmarks track.
- Hold international patents on technical inventions.
- Developed demos and PoCs on visual odometry, visual indoor navigation, and fruit and vegetable weight measurement based on RGB-D data.
- Collected, labeled, and prepared data for prototyping and research purposes: visual navigation, 3D reconstruction of indoor scenes, and visual analytics for retail.
- Mastered all kinds of writing: academic manuscripts, annual reports, patents, tasks for data annotators, documentation, and internal guides.

Rambler&Co

Research Intern / Junior Data Scientist, Computer Vision

- Contributed to a project on cinema visitor monitoring based on video surveillance data.
- Developed algorithms based on deep neural networks: segmentation, classification, detection, and tracking.
- Collected, labeled, and prepared training data.
- Conducted experiments and presented the results as reports and slides.

What started as a small toy project run by a single intern (me) proved successful enough to convince top management to create a computer vision department, mostly to develop and maintain the cinema monitoring system. The deployed solution was used to collect statistics in over 700 cinema halls across Russia.

Education

HSE University

Sep 2018 - June 2020

Master of Data Science

Completed courses: Bayesian Networks, Functional Analysis, Convex Optimization, Autonomous Driving
Thesis: Visual Odometry with Ego-motion Sampling
GPA: 4.5 (8.68 / 10)

Yandex School of Data Analysis

Sep 2018 - June 2020

Data Science, Advanced track

HSE University

Sep 2014 - June 2018

Bachelor of Applied Mathematics, Machine Learning and Applications track

Completed courses: Machine Learning, Deep Learning, Statistical Learning Theory, NLP, Computer Vision, Reinforcement Learning, Bayesian ML, Advanced Algorithms and Data Structures, Probability Theory and Statistics
Thesis: Person Re-identification Based on Visual Attributes
GPA: 4.69 (8.1 / 10)

Projects & Publications

Top-Down Beats Bottom-Up in 3D Instance Segmentation

2024 Winter Conference on Applications of Computer Vision (WACV)

M. Kolodiazhnyi, D. Rukhovich, A. Vorontsova, A. Konushin

Most 3D instance segmentation methods are bottom-up and typically include resource-exhaustive post-processing. We address 3D instance segmentation with TD3D: a pioneering cluster-free, fully convolutional approach trained end-to-end. It is the first top-down method to outperform bottom-up approaches in the 3D domain, demonstrating outstanding accuracy while running up to 2.6x faster at inference than state-of-the-art grouping-based methods.

Neural Global Illumination for Inverse Rendering

2023 International Conference on Image Processing (ICIP)

N. Patakin, D. Senushkin, A. Vorontsova, A. Konushin

We present the first neural inverse rendering approach capable of processing inter-reflections. We formulate a novel neural global illumination model, which estimates both direct environment light and indirect light as a surface light field, and build a Monte Carlo differentiable rendering framework. Our framework effectively handles complex lighting effects and facilitates the end-to-end reconstruction of physically-based spatially-varying materials.
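For context, the split into direct and indirect light can be written through the rendering equation; this is standard notation for the general idea, not the paper's exact formulation:

```latex
% Outgoing radiance at surface point x: incoming light is split into
% a direct environment term and an indirect term, the latter modeled
% as a surface light field in the approach described above.
L_o(\mathbf{x}, \omega_o) =
  \int_{\Omega} f_r(\mathbf{x}, \omega_i, \omega_o)\,
  \bigl[ L_{\mathrm{dir}}(\omega_i) + L_{\mathrm{ind}}(\mathbf{x}, \omega_i) \bigr]\,
  (\omega_i \cdot \mathbf{n})\, \mathrm{d}\omega_i
```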

TR3D: Towards Real-Time Indoor 3D Object Detection

2023 International Conference on Image Processing (ICIP)

D. Rukhovich, A. Vorontsova, A. Konushin

We introduce a fast, fully convolutional 3D object detection model trained end-to-end that achieves state-of-the-art results on standard benchmarks. Moreover, to take advantage of both point cloud and RGB inputs, we propose an early fusion of 2D and 3D features. This versatile and efficient fusion module can make a conventional 3D object detection method multimodal, thereby improving its detection accuracy.
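For illustration, a minimal sketch of what such an early 2D-3D fusion step could look like: each 3D point is projected into the image with the camera intrinsics, image features are sampled at the projection, and the result is concatenated with the point features. All names and shapes are illustrative, not taken from the TR3D codebase.

```python
import torch
import torch.nn.functional as F

def fuse_2d_3d(points, point_feats, image_feats, intrinsics):
    """points: (N, 3) camera-frame XYZ; point_feats: (N, C3);
    image_feats: (C2, H, W); intrinsics: (3, 3). Returns (N, C3 + C2)."""
    # Pinhole projection: u = fx * x / z + cx, v = fy * y / z + cy.
    uvw = points @ intrinsics.T
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)
    # Normalize pixel coordinates to [-1, 1] as expected by grid_sample.
    _, h, w = image_feats.shape
    grid = torch.stack([uv[:, 0] / (w - 1), uv[:, 1] / (h - 1)], dim=-1) * 2 - 1
    sampled = F.grid_sample(image_feats[None], grid.view(1, 1, -1, 2),
                            align_corners=True)      # (1, C2, 1, N)
    # Early fusion: concatenate sampled image features with point features.
    return torch.cat([point_feats, sampled[0, :, 0].T], dim=-1)
```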

Contour-based Interactive Segmentation

2023 International Joint Conference on Artificial Intelligence (IJCAI)

P. Popenova, D. Galeev, A. Vorontsova, A. Konushin

Interactive segmentation can be used to speed up and simplify image editing and labeling. Most approaches use clicks, which might be inconvenient when selecting small objects. We present a first-in-class contour-based interactive segmentation approach and demonstrate that a single contour provides the same accuracy as multiple clicks, thus reducing the number of interactions.
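For illustration, one plausible way to feed a user-drawn contour to a network, analogous to how click-based methods encode clicks as extra input channels, is to rasterize the contour into an additional channel. This is a hypothetical sketch, not the paper's implementation:

```python
import numpy as np
import cv2

def add_contour_channel(image, contour):
    """image: (H, W, 3) uint8 RGB; contour: (K, 2) int32 polygon vertices.
    Returns an (H, W, 4) array usable as network input."""
    # Rasterize the closed user contour into a binary mask.
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [contour.reshape(-1, 1, 2)], 255)
    # Stack the mask as a fourth channel alongside the RGB image.
    return np.dstack([image, mask])
```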

FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection

2022 European Conference on Computer Vision (ECCV)

D. Rukhovich, A. Vorontsova, A. Konushin

FCAF3D is a first-in-class fully convolutional anchor-free indoor 3D object detection method. FCAF3D can handle large-scale scenes with minimal runtime through a single feed-forward pass. Moreover, we propose a novel parametrization of oriented bounding boxes that consistently improves detection accuracy. State-of-the-art on ScanNet, SUN RGB-D, and S3DIS datasets.
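For illustration, one way to handle the inherent ambiguity of oriented boxes, where (w, l, θ) and (l, w, θ + π/2) describe the same box, is to regress the doubled angle weighted by the log aspect ratio. The sketch below captures this idea; it is close in spirit to, but not guaranteed to match, the parametrization from the paper:

```python
import math

def encode(w, l, theta):
    # Doubling the angle folds the pi-periodic symmetry; the log aspect
    # ratio vanishes for square boxes, where the heading is undefined.
    r = math.log(w / l)
    return r * math.sin(2 * theta), r * math.cos(2 * theta), w * l

def decode(p, q, area):
    theta = 0.5 * math.atan2(p, q)       # heading up to the pi/2 symmetry
    ratio = math.exp(math.hypot(p, q))   # w / l, with w >= l by convention
    w = math.sqrt(area * ratio)
    l = math.sqrt(area / ratio)
    return w, l, theta
```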

Floorplan-Aware Camera Poses Refinement

2022 International Conference on Intelligent Robots and Systems (IROS)

A. Sokolova, F. Nikitin, A. Vorontsova, A. Konushin

A technical floorplan depicts walls, partitions, and doors, making it a valuable source of information about the overall scene structure. We propose a novel floorplan-aware 3D reconstruction algorithm that extends bundle adjustment, and show that using a floorplan improves 3D reconstruction quality on the Redwood dataset and our self-captured data.
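As a toy illustration of the idea, a bundle-adjustment objective could be extended with a term penalizing the horizontal distance between points assigned to walls and the nearest floorplan wall segment. Names and weighting are hypothetical; the paper's actual formulation differs in details:

```python
import numpy as np

def ba_cost(reproj_residuals, wall_points, wall_segments, lam=0.1):
    """reproj_residuals: (R,) standard reprojection residuals;
    wall_points: (P, 2) horizontal coordinates of wall-assigned 3D points;
    wall_segments: list of ((x1, y1), (x2, y2)) floorplan walls."""
    def dist_to_segment(p, a, b):
        a, b = np.asarray(a, float), np.asarray(b, float)
        t = np.clip(np.dot(p - a, b - a) / np.dot(b - a, b - a), 0.0, 1.0)
        return np.linalg.norm(p - (a + t * (b - a)))

    # Penalize each wall point by its distance to the closest wall segment.
    floor_term = sum(
        min(dist_to_segment(p, a, b) for a, b in wall_segments) ** 2
        for p in wall_points)
    return np.sum(reproj_residuals ** 2) + lam * floor_term
```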

ImVoxelNet: Image to Voxels Projection for Monocular and Multi-view General-purpose 3D Object Detection

2022 Winter Conference on Applications of Computer Vision (WACV)

D. Rukhovich, A. Vorontsova, A. Konushin

ImVoxelNet is a fully convolutional 3D object detection method that operates in monocular and multi-view modes. ImVoxelNet takes an arbitrary number of RGB images with camera poses as inputs. General-purpose: state-of-the-art on outdoor (KITTI and nuScenes) and indoor (SUN RGB-D and ScanNet) datasets.
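For illustration, a minimal sketch of projecting multi-view image features into a shared voxel volume: every voxel center is projected into each view, and valid samples are averaged. Tensor names and shapes are illustrative only, not ImVoxelNet's actual code:

```python
import torch

def images_to_voxels(feats, intrinsics, extrinsics, centers):
    """feats: (V, C, H, W) per-view features; intrinsics: (V, 3, 3);
    extrinsics: (V, 4, 4) world-to-camera; centers: (M, 3) voxel centers.
    Returns (M, C) per-voxel features averaged over the views."""
    m = centers.shape[0]
    homog = torch.cat([centers, torch.ones(m, 1)], dim=-1)   # (M, 4)
    acc = torch.zeros(m, feats.shape[1])
    cnt = torch.zeros(m, 1)
    for f, k, e in zip(feats, intrinsics, extrinsics):
        cam = (e @ homog.T).T[:, :3]                 # camera-frame XYZ
        uvw = (k @ cam.T).T
        z = uvw[:, 2]
        uv = uvw[:, :2] / z.clamp(min=1e-6).unsqueeze(-1)
        h, w = f.shape[1:]
        valid = ((z > 0) & (uv[:, 0] >= 0) & (uv[:, 0] <= w - 1)
                 & (uv[:, 1] >= 0) & (uv[:, 1] <= h - 1))
        u, v = uv[valid, 0].long(), uv[valid, 1].long()
        acc[valid] += f[:, v, u].T                   # nearest-neighbor sample
        cnt[valid] += 1
    return acc / cnt.clamp(min=1)                    # average over views
```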

Single-Stage 3D Geometry-Preserving Depth Estimation Model Training on Dataset Mixtures with Uncalibrated Stereo Data

2022 Conference on Computer Vision and Pattern Recognition (CVPR)

N. Patakin, A. Vorontsova, M. Artemyev, A. Konushin

GP2 is a General-Purpose and Geometry-Preserving training scheme for single-view depth estimation models. GP2 allows training on a mixture of a small amount of geometrically correct depth data and voluminous uncalibrated stereo data. State-of-the-art results in general-purpose geometry-preserving single-view depth estimation.
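A hedged sketch of what such mixed supervision could look like: samples with geometrically correct depth get a direct loss, while uncalibrated stereo samples are first aligned with a per-image least-squares scale and shift, since their absolute scale is unknown. This illustrates the general recipe, not GP2's exact losses:

```python
import torch

def mixed_depth_loss(pred, gt, is_calibrated):
    """pred, gt: (B, H, W) depth maps; is_calibrated: (B,) bool flags."""
    losses = []
    for p, g, ok in zip(pred, gt, is_calibrated):
        if ok:
            # Metric ground truth: penalize the depth error directly.
            losses.append((p - g).abs().mean())
        else:
            # Stereo-derived depth: solve min_{s,t} ||s * g + t - p||^2
            # and compare against the aligned ground truth.
            x = torch.stack([g.flatten(), torch.ones_like(g.flatten())], -1)
            sol = torch.linalg.lstsq(x, p.flatten().unsqueeze(-1)).solution
            losses.append((p - (g * sol[0] + sol[1])).abs().mean())
    return torch.stack(losses).mean()
```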

DISCOMAN: Dataset of Indoor Scenes for Odometry, Mapping and Navigation

2019 International Conference on Intelligent Robots and Systems (IROS)

P. Kirsanov, A. Gaskarov, F. Konokhov, K. Sofiiuk, A. Vorontsova, I. Slinko, D. Zhukov, S. Bykov, O. Barinova, A. Konushin

A synthetic dataset for training and benchmarking semantic SLAM. Contains 200 sequences of 3000-5000 frames (RGB images generated using physically-based rendering, depth, IMU) and ground truth occupancy grids. In addition, we establish baseline results for SLAM, mapping, semantic and panoptic segmentation on our dataset.

Measuring Robustness of Visual SLAM

2019 International Conference on Machine Vision Applications (MVA)

D. Prokhorov, D. Zhukov, O. Barinova, A. Vorontsova, A. Konushin

A feasibility study of RGB-D SLAM. We extensively evaluate the popular ORB-SLAM2 on several benchmarks, perform a statistical analysis of the results, and find correlations between metric values and trajectory attributes. While the accuracy is high, robustness remains an issue.

Scene Motion Decomposition for Learnable Visual Odometry

SEMNAV 2019: CVPR'19 Workshop on Deep Learning for Visual Navigation

I. Slinko, A. Vorontsova, F. Konokhov, O. Barinova, A. Konushin

Instead of ego-motion estimation, we address the dual problem of estimating the motion of a scene w.r.t. a static camera. Using optical flow and depth, we calculate the 6DoF motion of each scene point and create motion maps, each addressing a single degree of freedom. Such a decomposition improves accuracy over naively stacking depth and optical flow.
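For illustration, a rough sketch of building per-pixel motion maps from depth and optical flow: each pixel is unprojected at time t, its flow-displaced match is unprojected at time t+1, and the difference gives the translational motion maps (the full method covers all six DoF). Names are illustrative:

```python
import numpy as np

def translation_maps(depth_t, depth_t1, flow, k):
    """depth_t, depth_t1: (H, W); flow: (H, W, 2) forward optical flow;
    k: (3, 3) camera intrinsics. Returns (H, W, 3) per-pixel 3D motion."""
    h, w = depth_t.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    fx, fy, cx, cy = k[0, 0], k[1, 1], k[0, 2], k[1, 2]

    def unproject(uu, vv, z):
        return np.stack([(uu - cx) * z / fx, (vv - cy) * z / fy, z], axis=-1)

    p0 = unproject(u, v, depth_t)                   # 3D points at time t
    u1, v1 = u + flow[..., 0], v + flow[..., 1]     # matches at time t + 1
    z1 = depth_t1[np.clip(v1, 0, h - 1).astype(int),
                  np.clip(u1, 0, w - 1).astype(int)]
    p1 = unproject(u1, v1, z1)
    return p1 - p0   # per-pixel motion along X, Y, Z
```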

Skills