Waymo, the Alphabet subsidiary that hopes to someday pepper roads with self-driving taxis, today pulled back the curtain on a portion of the data used to train the algorithms underpinning its cars: the Waymo Open Dataset. Waymo principal scientist Dragomir Anguelov claims it’s the largest multimodal sensor corpus for autonomous driving released to date.

“[W]e are inviting the research community to join us with the [debut] of the Waymo Open Dataset, [which is composed] of high-resolution sensor data collected by Waymo self-driving vehicles,” wrote Anguelov in a blog post published this morning. “Data is a critical ingredient for machine learning … [and] this rich and diverse set of real-world experiences has helped our engineers and researchers develop Waymo’s self-driving technology and innovative models and algorithms.”

The Waymo Open Dataset contains data collected over the course of the millions of miles Waymo’s cars have driven in Phoenix, Kirkland, Mountain View, and San Francisco, and it covers a wide variety of urban and suburban environments during day and night, dawn and dusk, and sunshine and rain. Samples are divided into 1,000 driving segments, each of which captures 20 seconds of continuous driving at 10 Hz (200 frames per segment, or 200,000 frames in total) through the sensors affixed to every Waymo car. These include five custom-designed lidars (which bounce light off of objects to map them three-dimensionally) and five front- and side-facing cameras.
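The announcement doesn’t spell out the on-disk format, but Waymo’s public Python tooling (the waymo-open-dataset package) distributes each segment as a TFRecord file of Frame protocol buffers. As a minimal sketch under that assumption (the filename below is a placeholder), reading one segment and counting its frames looks roughly like this:

```python
import tensorflow as tf
from waymo_open_dataset import dataset_pb2

# Placeholder path: one TFRecord file holds one 20-second driving segment.
segment_path = "segment-XXXX.tfrecord"

num_frames = 0
for record in tf.data.TFRecordDataset(segment_path, compression_type=""):
    frame = dataset_pb2.Frame()          # one synchronized sensor snapshot
    frame.ParseFromString(record.numpy())
    num_frames += 1

print(num_frames)  # expect about 20 s * 10 Hz = 200 frames per segment
```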

The corpus additionally includes labeled lidar frames and camera images annotated with vehicles, pedestrians, cyclists, and signs, for a total of 12 million 3D labels and 1.2 million 2D annotations. Waymo says the camera and lidar frames have been synchronized by its in-house 3D perception models, which fuse data from multiple sources, obviating the need for manual alignment.
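Those labels ride along with each frame. Assuming the proto schema published with the dataset, in which 3D boxes sit in a laser_labels field and classes follow the Label.Type enum, a hedged sketch of tallying one segment’s 3D labels by class:

```python
from collections import Counter

import tensorflow as tf
from waymo_open_dataset import dataset_pb2, label_pb2

# Human-readable names for the object classes in the public label schema.
TYPE_NAMES = {
    label_pb2.Label.TYPE_VEHICLE: "vehicle",
    label_pb2.Label.TYPE_PEDESTRIAN: "pedestrian",
    label_pb2.Label.TYPE_SIGN: "sign",
    label_pb2.Label.TYPE_CYCLIST: "cyclist",
}

counts = Counter()
# Placeholder filename, as in the sketch above.
for record in tf.data.TFRecordDataset("segment-XXXX.tfrecord"):
    frame = dataset_pb2.Frame()
    frame.ParseFromString(record.numpy())
    for label in frame.laser_labels:  # 3D bounding boxes from the lidars
        counts[TYPE_NAMES.get(label.type, "unknown")] += 1

print(dict(counts))
```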

“Waymo designs our entire self-driving system — including hardware and software — to work seamlessly together, which includes choice of sensor placement and high-quality temporal synchronization,” wrote Anguelov. “This data has the potential to help researchers make advances in 2D and 3D perception and make progress on areas such as domain adaptation, scene understanding, and behavior prediction. We hope that the research community will generate more exciting directions with our data that will not only help to make self-driving vehicles more capable, but also impact other related fields and applications, such as computer vision and robotics.”

The launch of Waymo’s enormous dataset comes after Lyft revealed its own open source corpus for autonomous vehicle development. In addition to over 55,000 human-labeled 3D annotated frames of traffic agents, it contains bitstreams from seven cameras and up to three lidar sensors, plus a drivable surface map and an underlying HD spatial semantic map that includes over 4,000 lane segments.