A Data Scientist’s Top 3 Tips About Training Data

People frequently tell us they learned a lot listening to Mighty AI Principal Data Scientist Angie Hugeback’s interview with Sam Charrington on his “This Week in Machine Learning & AI” podcast.

We’re not surprised. The AI industry continues to shift, but a conversation grounded in data science and statistics principles stays relevant.

So we’re opening the vault to resurface Angie’s secrets about training data for computer vision and natural language models. Here’s a sneak peek:

  1. Generating high-quality, accurate training datasets is job number one in scaling AI models. And anyone who has experience in the field knows that your model is only as good as your training data (and the humans training it).
  2. Your training data only represents a subset of the space in which you expect your model to function. You need good coverage, and your training examples should represent the inputs your model will actually see as well as possible if you want the most accurate results.
  3. Quality in computer vision and natural language models is essential, but believe it or not, you can get away with lower-quality training data in certain situations. Wanna know what they are? Then skip on over to the full episode. (While you’re at it, treat your ears to several informative interviews about AI and machine learning. We guarantee you’ll learn a ton!)
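
To make the second tip a little more concrete, here is a minimal Python sketch of one way to spot coverage gaps. It is our own illustration, not something from the episode: the function name `coverage_gaps`, the 5% threshold, and the scene labels are all hypothetical. It simply flags any category you expect in production that makes up too small a share of your training set.

```python
from collections import Counter


def coverage_gaps(train_labels, expected_labels, min_share=0.05):
    """Return expected categories whose share of the training set is below min_share.

    The 5% default is an arbitrary, illustrative threshold.
    """
    counts = Counter(train_labels)
    total = len(train_labels)
    gaps = {}
    for label in expected_labels:
        # Counter returns 0 for labels that never appear in the training set.
        share = counts[label] / total if total else 0.0
        if share < min_share:
            gaps[label] = share
    return gaps


# Hypothetical scene labels for a driving dataset.
train = ["day"] * 90 + ["night"] * 8 + ["rain"] * 2
print(coverage_gaps(train, ["day", "night", "rain", "snow"]))
```

In this made-up example, "rain" is flagged as underrepresented and "snow" as missing entirely. A real coverage check would likely look at more than label counts (lighting, geography, sensor type, and so on), but the idea is the same: compare what you have against what you expect to see.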

Hungry for more? Check back next week to get a list of our favorite podcasts in this space.

Note: Before January 10, 2017, Mighty AI was known as Spare5. We haven’t edited this podcast episode to reflect our new name.

 

image credit: Corey Blaz via Unsplash

Mighty AI