
Visual Localization is one of the key enabling technologies for autonomous driving and augmented reality. High quality datasets with accurate 6 Degree-of-Freedom (DoF) reference poses are the foundation for benchmarking and improving existing methods. Traditionally, reference poses have been obtained via Structure-from-Motion (SfM). However, SfM itself relies on local features which are prone to fail when images were taken under different conditions, e.g., day/night changes. At the same time, manually annotating feature correspondences is not scalable and potentially inaccurate. In this work, we propose a semi-automated approach to generate reference poses based on feature matching between renderings of a 3D model and real images via learned features. Given an initial pose estimate, our approach iteratively refines the pose based on feature matches against a rendering of the model from the current pose estimate. We significantly improve the nighttime reference poses of the popular Aachen Day–Night dataset, showing that state-of-the-art visual localization methods perform better (up to 47%) than predicted by the original reference poses. We extend the dataset with new nighttime test images, provide uncertainty estimates for our new reference poses, and introduce a new evaluation criterion. We will make our reference poses and our framework publicly available upon publication.
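For illustration, the following Python sketch outlines one plausible form of such a render-and-refine loop; it is not the actual implementation. The helpers `render` (returning an image and a per-pixel 3D point map from the model) and `extract_and_match` (any learned feature matcher) are hypothetical placeholders; only the PnP step uses real OpenCV calls.

```python
import numpy as np
import cv2

def refine_pose(query_img, model, K, pose_init, n_iters=5):
    """Iteratively refine a camera pose by matching the query image
    against renderings of the scene model from the current estimate."""
    pose = pose_init  # e.g., (rvec, tvec) in OpenCV convention
    for _ in range(n_iters):
        # Render the model from the current pose; keep per-pixel 3D
        # coordinates so that 2D matches can be lifted to 3D points.
        rendering, xyz_map = render(model, pose, K)  # hypothetical helper

        # Match features between the real image and the rendering.
        pts_query, pts_render = extract_and_match(query_img, rendering)  # hypothetical helper
        if len(pts_query) < 6:
            break  # too few matches to re-estimate the pose reliably

        # Lift the rendering-side matches to 3D via the XYZ buffer.
        pts_3d = np.asarray(
            [xyz_map[int(v), int(u)] for u, v in pts_render], dtype=np.float64)

        # Re-estimate the pose from 2D-3D correspondences with PnP + RANSAC.
        ok, rvec, tvec, _ = cv2.solvePnPRansac(
            pts_3d, np.asarray(pts_query, dtype=np.float64), K, None,
            reprojectionError=3.0)
        if not ok:
            break
        pose = (rvec, tvec)
    return pose
```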


Visual localization is the problem of estimating the camera pose, i.e., the position and orientation from which an image was taken, with respect to a known scene. It is a core component of many interesting applications such as self-driving cars (Heng et al. 2019) and other autonomous robots such as drones (Lim et al. 2012), as well as for augmented and virtual reality systems (Castle et al. 2008). Similar to other areas in computer vision, the availability of benchmark datasets such as that of Shotton et al. (2017) has served as a main driving force for research.
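To make the notion of a 6 DoF pose concrete, the snippet below recovers position and orientation from 2D-3D correspondences with a standard PnP solver. This is an illustrative sketch on synthetic placeholder data, not part of any benchmark; only standard OpenCV calls are used.

```python
import numpy as np
import cv2

# Synthetic correspondences: project random 3D scene points with a known
# pose to obtain matching pixel locations (placeholder data only).
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])            # pinhole camera intrinsics
rvec_gt = np.array([0.1, -0.2, 0.05])   # ground-truth rotation (axis-angle)
tvec_gt = np.array([0.3, -0.1, 4.0])    # ground-truth translation
pts_3d = np.random.uniform(-1.0, 1.0, (50, 3))
pts_2d, _ = cv2.projectPoints(pts_3d, rvec_gt, tvec_gt, K, None)

# Estimating the 6 DoF pose from such 2D-3D matches is the core
# computation in structure-based visual localization; RANSAC makes the
# estimate robust to outlier correspondences.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts_3d, pts_2d, K, None)
R, _ = cv2.Rodrigues(rvec)              # orientation as a 3x3 rotation matrix
position = (-R.T @ tvec).ravel()        # camera center in world coordinates
```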

