I am a Deep Learning Expert at NEURA Robotics. I received an M.Sc. in Data Science and a bachelor's degree in Applied Mathematics from one of the best Russian universities. Before that, I worked as a Research Scientist at Samsung Research for 5 years. Overall, I have over 6 years of industrial and research experience, focusing on 2D and 3D computer vision throughout my career. I have co-authored 15+ research papers accepted to top-tier conferences, prepared a number of technical patents, and gained hands-on experience with various deep learning models (CNN, RNN, Transformer) and frameworks (PyTorch, TensorFlow).
Solved various 2D and 3D computer vision tasks in robotic scenarios. Adapted existing methods and developed new ones for 3D reconstruction, object segmentation, and antipodal and suction grasp generation. Generated data for training and benchmarking the developed methods. Contributed to documentation on AI Safety; wrote customer documentation and internal guides.
Developed state-of-the-art algorithms addressing 2D and 3D computer vision tasks: SLAM, visual and sensor-based localization, 3D reconstruction of indoor scenes, depth estimation, object segmentation, and 2D and 3D object detection. Formulated scientific hypotheses and conducted experiments to validate them. Wrote a number of academic papers accepted to top-tier CV and robotics conferences such as CVPR, ECCV, WACV, and IROS; overall, contributed to 16 papers. Outstanding Reviewer at the NeurIPS 2022 Datasets and Benchmarks track. Hold international patents on technical inventions. Developed demos and PoCs on visual odometry, visual indoor navigation, and fruit and vegetable weight measurement from RGB-D data. Collected, labeled, and prepared data for prototyping and research purposes: visual navigation, 3D reconstruction of indoor scenes, and visual analytics for retail. Mastered all kinds of writing: academic manuscripts, annual reports, patents, tasks for data annotators, documentation, and internal guides.
Contributed to a project on cinema visitor monitoring based on video surveillance data. Developed algorithms based on deep neural networks (segmentation, classification, detection, tracking). Collected, labeled, and prepared training data. Conducted experiments and presented the results in the form of reports and slides. What started as a small toy project run by one intern (me) proved so successful that it convinced top management to create a computer vision department, mostly to develop and maintain the cinema monitoring system. The implemented solution was used to collect statistics in over 700 cinema halls in Russia.
Completed courses: Bayesian Networks, Functional Analysis, Convex Optimization, Autonomous Driving
Thesis: Visual Odometry with Ego-motion Sampling
GPA: 4.5 (8.68 / 10)
Completed courses: Machine Learning, Deep Learning, Statistical Learning Theory, NLP, Computer Vision, Reinforcement Learning, Bayesian ML, Advanced Algorithms and Data Structures, Probability Theory and Statistics
Thesis: Person Re-identification Based on Visual Attributes
GPA: 4.69 (8.1 / 10)
We present OneFormer3D, a unified, simple, and effective model jointly solving semantic, instance, and panoptic segmentation of 3D point clouds. The model is trained end-to-end in a single run with panoptic annotations, and achieves top performance on all three tasks simultaneously, thereby setting a new state-of-the-art in several 3D segmentation benchmarks.
We conducted a user study of clicking patterns and found that the standard assumption made in the common evaluation strategy may not hold, which makes the accuracy and robustness of existing methods questionable. We propose a novel evaluation strategy providing a more comprehensive analysis of a model's performance. In addition, we introduce a novel benchmark for measuring the robustness of interactive segmentation and report the results of an extensive evaluation of numerous models.
Selfies captured from a short distance might look unnatural due to heavy distortions and improper posing. We propose SUPER, a novel method of correcting distortions and adjusting head poses in selfies. SUPER combines generative and rendering approaches to ensure correct geometry while preserving identity.
We propose FAWN, a modification of truncated signed distance function (TSDF) reconstruction methods. FAWN takes the standard scene structure into account by detecting walls and floor in a scene and penalizing their normals for deviating from the horizontal and vertical directions, respectively. We add FAWN to state-of-the-art TSDF reconstruction methods and demonstrate a quality gain on a number of indoor benchmarks.
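For illustration, here is a minimal sketch of what such a structural normal penalty could look like; the tensor names, the mask convention, and the choice of the up axis are assumptions made for this example, not the exact formulation used in the paper.

```python
import torch

def structural_normal_penalty(normals, wall_mask, floor_mask):
    """Penalize wall normals deviating from horizontal and floor normals
    deviating from vertical (illustrative sketch, not the exact paper loss).

    normals:    (N, 3) unit surface normals of scene points
    wall_mask:  (N,) boolean mask of points detected as walls
    floor_mask: (N,) boolean mask of points detected as floor
    """
    up = normals.new_tensor([0.0, 0.0, 1.0])  # assume z is the up axis
    loss = normals.new_zeros(())

    if wall_mask.any():
        # Wall normals should be horizontal, i.e. orthogonal to the up axis.
        loss = loss + (normals[wall_mask] @ up).abs().mean()
    if floor_mask.any():
        # Floor normals should be vertical, i.e. aligned with the up axis.
        loss = loss + (1.0 - (normals[floor_mask] @ up).abs()).mean()
    return loss
```

A term of this kind would typically be added to the main TSDF reconstruction loss with a small weight.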
Single-view depth estimation methods cannot guarantee consistency throughout a sequence of frames. Minimizing the discrepancy across multiple views takes hours, making such methods infeasible in practice. Our MeDEA takes RGB frames with camera parameters and outputs temporally consistent depth maps orders of magnitude faster than previous test-time optimization approaches. MeDEA sets a new state-of-the-art on indoor benchmarks and handles smartphone-captured data.
Most 3D instance segmentation methods are bottom-up and typically include resource-exhaustive post-processing. We address 3D instance segmentation with TD3D: a pioneering cluster-free, fully-convolutional approach trained end-to-end. This is the first top-down method outperforming bottom-up approaches in the 3D domain. It demonstrates outstanding accuracy while being up to 2.6x faster at inference than the current state-of-the-art grouping-based approaches.
We present the first neural inverse rendering approach capable of processing inter-reflections. We formulate a novel neural global illumination model, which estimates both direct environment light and indirect light as a surface light field, and build a Monte Carlo differentiable rendering framework. Our framework effectively handles complex lighting effects and facilitates the end-to-end reconstruction of physically-based spatially-varying materials.
We introduce a fast fully-convolutional 3D object detection model trained end-to-end that achieves state-of-the-art results on standard benchmarks. Moreover, to take advantage of both point cloud and RGB inputs, we propose an early fusion of 2D and 3D features. The versatile and efficient fusion module can be applied to make a conventional 3D object detection method multimodal, thereby improving its detection accuracy.
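As a rough illustration of the early-fusion idea, the sketch below projects 3D points onto the image plane, samples 2D backbone features at the projections, and concatenates them with the per-point features; the function name, feature shapes, and the pinhole projection convention are assumptions for this example rather than the exact module from the paper.

```python
import torch
import torch.nn.functional as F

def fuse_2d_3d_features(points, point_feats, image_feats, intrinsics):
    """Early fusion sketch: attach 2D image features to 3D points.

    points:      (N, 3) point coordinates in the camera frame
    point_feats: (N, C3) per-point features from the 3D backbone
    image_feats: (1, C2, H, W) feature map from the 2D backbone
    intrinsics:  (3, 3) pinhole camera matrix
    """
    # Project points onto the image plane (simple pinhole model).
    uvw = points @ intrinsics.T                     # (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)    # (N, 2) pixel coords

    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    _, _, h, w = image_feats.shape
    grid = torch.stack([uv[:, 0] / (w - 1), uv[:, 1] / (h - 1)], dim=-1) * 2 - 1
    grid = grid.view(1, -1, 1, 2)                   # (1, N, 1, 2)

    # Bilinearly sample 2D features at the projected locations.
    sampled = F.grid_sample(image_feats, grid, align_corners=True)
    sampled = sampled.view(image_feats.shape[1], -1).T   # (N, C2)

    # Concatenate 2D and 3D features for the detection head.
    return torch.cat([point_feats, sampled], dim=-1)     # (N, C3 + C2)
```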
Interactive segmentation can be used to speed up and simplify image editing and labeling. Most approaches use clicks, which might be inconvenient when selecting small objects. We present a first-in-class contour-based interactive segmentation approach and demonstrate that a single contour provides the same accuracy as multiple clicks, thus reducing the number of interactions.
FCAF3D is a first-in-class fully convolutional anchor-free indoor 3D object detection method. FCAF3D can handle large-scale scenes with minimal runtime through a single feed-forward pass. Moreover, we propose a novel parametrization of oriented bounding boxes that consistently improves detection accuracy. State-of-the-art on ScanNet, SUN RGB-D, and S3DIS datasets.
A technical floorplan depicts walls, partitions, and doors, making it a valuable source of information about the general scene structure. We propose a novel floorplan-aware 3D reconstruction algorithm that extends bundle adjustment and show that using a floorplan improves 3D reconstruction quality on the Redwood dataset and our self-captured data.
ImVoxelNet is a fully convolutional 3D object detection method that operates in monocular and multi-view modes. ImVoxelNet takes an arbitrary number of RGB images with camera poses as inputs. General-purpose: state-of-the-art on outdoor (KITTI and nuScenes) and indoor (SUN RGB-D and ScanNet) datasets.
GP2 is a General-Purpose and Geometry-Preserving training scheme for single-view depth estimation models. GP2 allows training on a mixture of a small amount of geometrically correct depth data and voluminous stereo data. State-of-the-art results in general-purpose geometry-preserving single-view depth estimation.
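Below is a minimal sketch of the kind of mixed-supervision training step such a scheme implies; the specific loss choices (a scale-invariant loss for geometrically correct depth, a scale-and-shift-invariant loss for stereo data), the weighting, and all names are assumptions for illustration, not the exact losses from the paper.

```python
import torch

def scale_invariant_loss(pred, gt):
    """Scale-invariant log-depth loss, used here for the geometrically
    correct depth data in this illustrative sketch."""
    d = torch.log(pred.clamp(min=1e-6)) - torch.log(gt.clamp(min=1e-6))
    return (d ** 2).mean() - d.mean() ** 2

def shift_scale_invariant_loss(pred, gt):
    """Compare prediction and ground truth after normalizing both up to
    scale and shift; a common choice for stereo-derived (up-to-scale) depth."""
    pred_n = (pred - pred.median()) / (pred - pred.median()).abs().mean().clamp(min=1e-6)
    gt_n = (gt - gt.median()) / (gt - gt.median()).abs().mean().clamp(min=1e-6)
    return (pred_n - gt_n).abs().mean()

def mixed_training_step(model, geometric_batch, stereo_batch, alpha=0.5):
    """One hypothetical mixed step: a small geometrically correct batch
    anchors the geometry, a large stereo batch provides visual diversity."""
    img_g, depth_g = geometric_batch
    img_s, depth_s = stereo_batch
    return scale_invariant_loss(model(img_g), depth_g) \
        + alpha * shift_scale_invariant_loss(model(img_s), depth_s)
```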
A synthetic dataset for training and benchmarking semantic SLAM. Contains 200 sequences of 3000-5000 frames (RGB images generated using physically-based rendering, depth, IMU) and ground truth occupancy grids. In addition, we establish baseline results for SLAM, mapping, semantic and panoptic segmentation on our dataset.
A feasibility study of RGB-D SLAM. We extensively evaluate the popular ORB-SLAM2 on several benchmarks, perform a statistical analysis of the results, and find correlations between the metric values and the attributes of the trajectories. While the accuracy is high, robustness remains an issue.
Instead of ego-motion estimation, we address the dual problem of estimating the motion of a scene w.r.t. a static camera. Using optical flow and depth, we calculate the motion of each point of a scene in terms of 6DoF and create motion maps, each one addressing a single degree of freedom. Such a decomposition improves accuracy over naive stacking of depth and optical flow.
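To give a flavor of the underlying computation, here is a simplified sketch that back-projects pixels using depth, follows optical flow to the next frame, and obtains per-pixel 3D motion; the function name, the pinhole convention, and the reduction to plain 3D translation (rather than the full per-point 6DoF decomposition described above) are simplifications made for this example.

```python
import numpy as np

def per_pixel_scene_motion(depth0, depth1, flow, intrinsics):
    """Back-project pixels with depth at frame t, follow optical flow to
    frame t+1, and back-project again to get per-pixel 3D motion vectors
    (a simplified translational sketch, not the full 6DoF decomposition).

    depth0, depth1: (H, W) depth maps at frames t and t+1
    flow:           (H, W, 2) optical flow from frame t to t+1 (dx, dy)
    intrinsics:     (3, 3) pinhole camera matrix
    """
    h, w = depth0.shape
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]

    # Pixel grid at frame t and its flow-displaced positions at frame t+1.
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    u1, v1 = u + flow[..., 0], v + flow[..., 1]

    # Sample depth at frame t+1 at the displaced (rounded) positions.
    ui = np.clip(np.round(u1).astype(int), 0, w - 1)
    vi = np.clip(np.round(v1).astype(int), 0, h - 1)
    z1 = depth1[vi, ui]

    # Back-project both frames to 3D in the static camera frame.
    p0 = np.stack([(u - cx) * depth0 / fx, (v - cy) * depth0 / fy, depth0], axis=-1)
    p1 = np.stack([(u1 - cx) * z1 / fx, (v1 - cy) * z1 / fy, z1], axis=-1)

    # Per-pixel 3D motion of the scene w.r.t. the static camera.
    return p1 - p0
```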