How to Get High-Quality Annotations for Training Data: The Easy Way & The Hard Way

“This is awesome, but wow are you kidding me?” is the reaction we had after reading the fantastic research paper, “Quality Assessment for Crowdsourced Object Annotations.” It’s a 2011 paper so it’s not exactly new, but it came up in our recent general research and we dug right in.

The paper outlines a collection of strategies, called “scoring functions,” that machine learning practitioners can use to assess the quality of spatial object annotations acquired through crowdsourcing. Because quality with traditional crowdsourcing providers is notoriously spotty, the researchers set out to build tools that speed up the QA process by ranking annotations by quality, so that usable ones can be separated from unusable ones quickly. The strategies include edge detection, Bayesian matting, and object proposals, and they are compared against two baseline quality measures: the number of control points in an annotation and the annotation's size.
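To make the two baseline measures concrete, here is a minimal Python sketch of how they might be computed for polygon annotations. This is our own illustration, not code from the paper: the function names, the polygon representation (a list of (x, y) vertices), the normalization by image area, and the idea that a higher score ranks first are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def control_point_score(polygon: Polygon) -> int:
    """Baseline 1 (assumed form): the number of vertices the annotator clicked."""
    return len(polygon)


def polygon_area(polygon: Polygon) -> float:
    """Area of a simple polygon via the shoelace formula."""
    n = len(polygon)
    if n < 3:
        return 0.0  # fewer than three points encloses no area
    twice_area = 0.0
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def size_score(polygon: Polygon, image_width: int, image_height: int) -> float:
    """Baseline 2 (assumed form): fraction of the image covered by the annotation."""
    return polygon_area(polygon) / float(image_width * image_height)


def rank_annotations(
    annotations: List[Polygon], score_fn: Callable[[Polygon], float]
) -> List[Polygon]:
    """Sort annotations from highest to lowest score (hypothetical helper)."""
    return sorted(annotations, key=score_fn, reverse=True)


# Two hypothetical annotations of the same object in a 100 x 100 image.
coarse = [(0, 0), (10, 0), (10, 10), (0, 10)]
detailed = [(0, 3), (3, 0), (7, 0), (10, 3), (10, 7), (7, 10), (3, 10), (0, 7)]

by_points = rank_annotations([coarse, detailed], control_point_score)
by_size = rank_annotations([coarse, detailed], lambda p: size_score(p, 100, 100))
```

Note that the two baselines disagree here: the more carefully traced outline wins on control points, while the coarse box wins on size. Cheap measures like these only hint at quality, which is presumably why the paper goes on to image-aware scoring functions.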

The researchers were successful in their quest: the new annotation scoring functions outperformed the baseline ones, providing practitioners with better QA methods for training data generation. Great job, you crazy-smart people. But doesn’t it all sound like a crapload of work? It is!

These are wonderful improvements, but isn’t it unacceptable that the process of annotating data requires that much improvement in the first place? Isn’t it frustrating to have to do so much work on top of a platform you’re relying on to make your life easier?

Now yes, things have changed in the past five years, including QA methods. But the lack of consistent quality in crowdsourced annotations has not changed. We hear it from our reformed customers all the time, the ones who were previously using a crowdsourcing provider and struggling with sub-par annotations. They were having to run tasks multiple times, update the instructions over and over, etc.

If this sounds familiar, you need to consider a Training Data as a Service (TDaaS) solution. In Mighty AI’s case, all the QA is baked in, and the methods are continuously updated and refined with no grunt work on your part. In fact, the whole annotation process is taken completely off your plate. Sound good? Connect with us to learn more.

Image credit: John Mark Arnold, via CC0 1.0

Note: Prior to January 10, 2017, Mighty AI was known as Spare5. While Spare5 remains the name of our consumer brand and application, we’ve relaunched our business-customer side as Mighty AI, which also serves as the parent company under which Spare5 now lives. Some posts have been updated with the new company name to reduce confusion.