
The AI Glossary: A Data Scientist’s No-Fluff Explanations for Key AI Concepts

As a data scientist at an AI company, my colleagues and I are as tired of the hyperbole and conflicting information in the space as you are, friend. It seems like everyone’s got their own definition for the AI buzzword du jour, and it’s leading to a lot of contradictions and confusion—and that’s not helpful for anyone.

There have been a few noble attempts from academics, tech journalists, other AI companies, and fellow data scientists at simplifying industry concepts and laying some groundwork on key terms for us all to agree on. But I’ve found them either still too marketing-y or so rambling they leave your head spinning. We can do better.

Below are my fluff-free explanations of popular AI terms. I hope you find them helpful.

Artificial Intelligence (AI)
Artificial General Intelligence (AGI) or Strong AI
Artificial Superintelligence (ASI)
Machine Learning (ML)
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Classification
Regression, Interpolation, Extrapolation
Overfitting
Clustering or Cluster Analysis
Neural Network
Deep Learning
Convolutional Neural Network (CNN)
Recurrent Neural Network (RNN)
Computer Vision (CV)
Image Segmentation
Object Detection
Semantic Segmentation
Superpixel Segmentation or Oversegmentation
Natural Language Processing (NLP)
Machine Translation
Named Entity Recognition (NER)
Optical Character Recognition (OCR)
Sentiment Analysis
Natural Language Understanding (NLU)

Artificial Intelligence (AI)

“Artificial intelligence” is actually a somewhat nebulous term. Formally, it is the study of “intelligent agents”: any devices that take in information from the outside world and act on that information to succeed at some kind of task. A piece of software that reads in digital images and performs facial recognition, for example, is an intelligent agent. But it’s a very limited one: a system like that, which does one and only one task based on a large corpus of example data, is more often described as “machine learning” than “artificial intelligence.”

The distinction between AI and ML is somewhat arbitrary, largely for historical reasons. Early AI researchers in the 1950s believed that they were only a few years away from creating a computer system that could fully mimic human thought—a general-purpose system that could learn to do anything a human could, but with the speed and accuracy of a machine. They were spectacularly wrong. The problem of creating such an “artificial general intelligence” or “strong AI” remains unsolved to this day; despite decades of research, it is still purely speculative.

On the other hand, particularly since the 1980s there has been remarkable advancement in “weak AI,” systems designed to perform well on specific, dedicated tasks like our facial recognition example above. These approaches to “weak AI,” driven by large amounts of data and computational statistics and probability, have become widely known as “machine learning” to distinguish them from the previous (and ongoing) work on artificial general intelligence. But the term “artificial intelligence” today can still refer to either the strong or weak versions, making “machine learning” a subset of “artificial intelligence” work.

Artificial General Intelligence (AGI) or Strong AI

A term for a hypothetical computer system able to learn, reason, and solve novel problems as well as a human can, or possibly better. Not limited to one specific task, an AGI would be a true machine intelligence, capable of original thought. Nothing like this exists in the world today, and although there is a broad sense that such a system should be possible, despite decades of research nobody knows how to even begin building one.

Artificial Superintelligence (ASI)

The strongest form of strong AI is the concept of a “superintelligence,” an artificial general intelligence that can not only duplicate human thought, but improve on it. This is the classic AI of science fiction. When you see thinkers speculating on the possible existential threat of artificial intelligence, or the Singularity, artificial superintelligence is what they have in mind. It is important to note, however, that this notion is pure speculation. Although many researchers are convinced that a “vanilla” artificial general intelligence is possible, there is no consensus that superintelligence is possible, or even what the concept would precisely mean.

Machine Learning (ML)

A subset of artificial intelligence work, machine learning is more narrowly focused on computer systems optimized to perform specific tasks, fed by large amounts of example data to “learn” from, using methods from computational statistics and probability theory. There is no attempt in machine learning to create systems that mimic the workings of human intelligence; rather, these are exercises in applied mathematics. Typically machine learning systems “learn” by performing mathematical optimization to minimize a problem-specific, user-specified measure of error across a large set of training examples for the task in question.

All of the big advances we’ve heard about in practical artificial intelligence in recent years have been examples of machine learning, such as image and speech recognition. There are many different architectures and techniques used in machine learning; neural networks have been especially prominent lately. Machine learning tasks are often categorized as “supervised learning,” “unsupervised learning,” or “reinforcement learning” depending on how they use their input training data.
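
To make “optimization to minimize a measure of error” concrete, here’s a minimal sketch in plain NumPy, fitting a line to noisy points by gradient descent on the mean squared error. The data is made up for illustration.

```python
import numpy as np

# Toy training data: y is roughly 2x + 1 plus noise (made up for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=100)

# Model: y_hat = w * x + b. "Learning" = adjusting w and b to minimize
# the mean squared error over the training examples.
w, b = 0.0, 0.0
learning_rate = 0.01
for _ in range(1000):
    y_hat = w * x + b
    error = y_hat - y
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")  # should land close to 2 and 1
```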

Supervised Learning

A form of machine learning in which, for every input, there is one correct output that the system is being trained to predict. Before training, every example in the training set has to be annotated with this correct output, typically by human beings. From this human-labeled training data, the algorithm finds a mathematical way to generalize the patterns it contains and predict what the output ought to be on novel examples that no human has labeled. Classifiers are classic examples of supervised learning.

Unsupervised Learning

A form of machine learning in which there are no pre-existing labels or outputs defined on the input training data; the system instead “learns” whatever patterns, clusters, or regularities it can extract from the training data. Clustering algorithms are classic examples of unsupervised learning. Another is the 2012 Google Brain project, which was fed millions of frames from YouTube videos without any labeling or annotation and, by finding common patterns across them, learned to recognize cat faces.

Reinforcement Learning

Reinforcement learning is a form of machine learning where the system interacts with a changing, dynamic environment and is presented with (positive and negative) feedback as it takes actions in response to this environment. There is no predefined notion of a “correct” response to a given stimulus, but there are notions of “better” or “worse” ones that can be specified mathematically in some way. Reinforcement learning is often used to train machine learning systems to play video games, or drive cars. The DeepMind system that learned to play Atari video games used reinforcement learning.
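
To show the shape of the idea, here’s a minimal sketch of tabular Q-learning, one classic reinforcement learning algorithm, on a made-up toy environment:

```python
import numpy as np

# Toy environment (made up for illustration): a corridor of 5 cells.
# The agent starts in cell 0 and gets a reward of +1 only on reaching
# the rightmost cell; every other move earns nothing.
N_STATES, N_ACTIONS = 5, 2           # actions: 0 = left, 1 = right
alpha, gamma, epsilon = 0.1, 0.9, 0.3
Q = np.zeros((N_STATES, N_ACTIONS))  # learned value of each (state, action)
rng = np.random.default_rng(0)

for episode in range(300):
    state = 0
    while state != N_STATES - 1:
        # Epsilon-greedy: usually take the best-known action, sometimes explore.
        if rng.random() < epsilon:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(Q[state]))
        next_state = max(0, state - 1) if action == 0 else state + 1
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # The Q-learning update: nudge the estimate toward the reward plus
        # the discounted value of the best action from the next state.
        Q[state, action] += alpha * (
            reward + gamma * Q[next_state].max() - Q[state, action]
        )
        state = next_state

print(Q)  # the "move right" column should dominate for states 0-3
```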

Classification

A kind of supervised learning task where the goal is to assign one or more labels to each input from a fixed, pre-defined set. All the examples in the training set must be labeled by humans before the system can be trained. In image classification, for example, the inputs are digital images, and the labels are the names of various objects that appear in these images (“cat”, “car”, “person”, etc.). To train a classifier, we need to not only label our data, but first define the set of labels we will use. The examples for different labels need to be distinguishable, and each label must have a reasonable number of example occurrences in our training set. Classifier training generally works best if the different labels are roughly “balanced,” that is, all have roughly the same number of examples. Popular machine learning systems for classification include neural networks, support vector machines, and random forests.
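
Here’s what that looks like in practice: a minimal sketch using scikit-learn (assuming it’s installed) and its bundled, human-labeled iris dataset of 150 flowers, each assigned one of three species labels.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y  # keep labels balanced
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)          # learn from the labeled examples
print(clf.score(X_test, y_test))   # accuracy on examples it has never seen
```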

Regression, Interpolation, Extrapolation

Another kind of supervised learning task. In regression problems we are attempting to approximate some real-valued mathematical function based on a training set of inputs and outputs. For example, if we are studying the mean temperature across some region of the Earth over time, and we have measured this mean temperature at some finite number of times, we can create a regression model of temperature as a function of time based on these data points to predict what the temperature might be between two of our measurements (“interpolation”) or what the temperature might be at future times (“extrapolation”).
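
A minimal sketch of that temperature example, using NumPy’s polynomial fitting on made-up measurements:

```python
import numpy as np

# Made-up mean-temperature measurements (degrees C) at years 0..9.
years = np.arange(10)
temps = np.array([14.1, 14.3, 14.2, 14.5, 14.4,
                  14.6, 14.8, 14.7, 15.0, 15.1])

# Fit a simple linear regression model: temp ~ slope * year + intercept.
model = np.poly1d(np.polyfit(years, temps, deg=1))

print(model(4.5))   # interpolation: a point between two measurements
print(model(15))    # extrapolation: a future year (treat with caution!)
```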

Overfitting

A problem that can occur in supervised learning tasks where the system learns patterns in the training data that are too specific or are there only by coincidence, so that it performs extremely well on examples it has been trained on but loses its ability to generalize and performs very poorly on anything new. Overfitting can be caused by an overly complicated model, a limited training set without enough diversity, or weaknesses in the training process itself.
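
Here’s a minimal sketch of overfitting in action, using NumPy polynomial fits on a small, noisy, made-up dataset:

```python
import numpy as np

# Ten noisy samples of a sine curve for training, ten more for testing.
rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=10)
x_test = np.linspace(0.05, 0.95, 10)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, size=10)

for degree in (3, 9):
    model = np.poly1d(np.polyfit(x_train, y_train, degree))
    train_err = np.mean((model(x_train) - y_train) ** 2)
    test_err = np.mean((model(x_test) - y_test) ** 2)
    print(f"degree {degree}: train error {train_err:.4f}, "
          f"test error {test_err:.4f}")

# The degree-9 polynomial threads through all 10 training points
# (near-zero training error) but typically does worse on the held-out
# test points: it has memorized the noise instead of the pattern.
```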

Clustering or Cluster Analysis

An unsupervised learning task where the goal is to analyze a large number of examples and group them into “clusters” of examples that are all similar in some way. This is a very general sort of task, and there is never one “right” answer. There isn’t even any universal definition of what a “cluster” is!
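
A minimal sketch using scikit-learn’s k-means implementation on made-up 2-D points. Note that we have to choose the number of clusters ourselves; nothing in the unlabeled data dictates the “right” k.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Made-up 2-D points scattered around three different centers.
points = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(50, 2)),
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)
print(kmeans.cluster_centers_)  # should land near (0,0), (5,5), (0,5)
print(kmeans.labels_[:10])      # cluster assignment for each point
```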

Neural Network

A particular kind of algorithm or architecture used in machine learning. Loosely inspired by the structure of the brain, a neural network consists of some number of discrete elements called “artificial neurons” connected to one another in various ways, where the strengths of these connections can be varied to optimize the network’s performance on the task in question. Although inspired by the brain, it is very, very important to keep in mind that artificial neural networks absolutely do not work the same way the human brain does! The similarities are often wildly overstated in the popular press.

Neurons in a neural network are organized into layers, where the output of one layer becomes the input to the next, until the final output is produced at the last layer. Neural networks can be “shallow” or “deep,” depending on how many layers they have. The basic “feed-forward” neural network architecture has no memory; it treats every input as an independent event, without consideration for sequence or timing.
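
A minimal sketch of a small feed-forward network, assuming PyTorch is installed:

```python
import torch
import torch.nn as nn

# Two layers of "neurons"; the learnable weights in each Linear layer
# are the connection strengths between one layer and the next.
model = nn.Sequential(
    nn.Linear(4, 16),   # input (4 features) -> 16 hidden neurons
    nn.ReLU(),          # nonlinearity applied at each hidden neuron
    nn.Linear(16, 3),   # hidden layer -> 3 output neurons
)

x = torch.randn(8, 4)   # a batch of 8 made-up inputs, 4 features each
print(model(x).shape)   # torch.Size([8, 3]); no memory between inputs
```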

Deep Learning

An approach to machine learning using “deep” neural networks, with many layers of neurons between the input and the output. Because of the huge number of connections between neurons in a deep network, deep learning can extract very subtle and complex patterns from large amounts of training data. The current state of the art in both image and speech recognition uses deep learning. Deep learning is computationally intensive, so training a deep learning system in a reasonable amount of time generally requires special hardware. Interestingly, the GPUs used for gaming are also well-suited for deep learning, and the low cost and wide availability of consumer-grade GPUs has been a major driver of deep learning’s rise since the field’s resurgence around 2006.

Convolutional Neural Network (CNN)

A special neural network architecture especially useful for processing image and speech data. The difference between a plain feed-forward network and a convolutional network lies primarily in the mathematical processing: convolutional networks apply an operation known as convolution, sliding the same small learned filters across every position of the input, which lets them pick out complex, extended features wherever they occur in space or time. Like feed-forward networks, however, they treat each input separately, without a memory.
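
A minimal sketch of the convolution operation itself, again assuming PyTorch:

```python
import torch
import torch.nn as nn

# 16 learnable 3x3 filters, each slid across every position of the image.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

image = torch.randn(1, 3, 64, 64)   # one made-up 64x64 RGB image
features = conv(image)
print(features.shape)  # torch.Size([1, 16, 64, 64]): 16 feature maps,
                       # one per filter, each covering the whole image
```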

Recurrent Neural Network (RNN)

A neural network architecture that maintains some kind of state or memory from one input example to the next, making it especially well-suited for sequential data like text. The output for a given input depends not just on that input, but also on the inputs that came before it. There are many different recurrent architectures, but the most important today is the Long Short-Term Memory (LSTM) network. Recurrent layers can also be combined with convolutional ones.
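
A minimal LSTM sketch, again assuming PyTorch:

```python
import torch
import torch.nn as nn

# The LSTM's hidden state carries information from earlier items in the
# sequence forward to later ones.
lstm = nn.LSTM(input_size=10, hidden_size=32, batch_first=True)

sequence = torch.randn(1, 20, 10)   # one made-up sequence of 20 steps
outputs, (h_n, c_n) = lstm(sequence)
print(outputs.shape)  # torch.Size([1, 20, 32]): one output per step,
                      # each depending on all the steps before it
```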

Computer Vision (CV)

The application of machine learning to tasks involving digital images or video, such as identifying or tracking objects through a video sequence, or segmenting images into distinct objects. Convolutional neural networks are a powerful new tool widely used in computer vision.

Image Segmentation

A kind of CV task where the goal is to take a digital image and partition its pixels into regions corresponding to the image contents, such as the distinct objects appearing in the image and their visible boundaries.

Object Detection

A CV task closely related to image segmentation, where the goal is to identify one or more distinct foreground objects in a digital image and localize each one within the image, typically by generating a bounding box around it.

Semantic Segmentation

A complex form of image segmentation that involves labeling every single pixel in an image with one of potentially hundreds of category labels describing what kind of object the pixel is a part of. For a semantic segmentation task in autonomous driving, the images might be street scenes and the labels might be things like “pavement,” “car,” “pedestrian,” “curb,” “snow,” etc.

Superpixel Segmentation or Oversegmentation

A kind of image segmentation where the goal is not to segment an image into the distinguishable high-level objects, but into smaller, finer “superpixel” regions that are approximately uniform in color and texture. In a good superpixel segmentation, each superpixel is contained entirely within one of the high-level objects, so that superpixels respect the natural object boundaries in the image. Superpixel segmentations are often used as stepping-stones to help build a semantic segmentation, and are often done using unsupervised learning techniques, as in the SLIC superpixel algorithm.
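
A minimal sketch using the SLIC implementation in scikit-image (assuming it’s installed), run on one of its bundled sample images:

```python
import numpy as np
from skimage import data, segmentation

image = data.astronaut()  # a built-in sample RGB image
# Oversegment into roughly 200 regions of uniform color and texture.
segments = segmentation.slic(image, n_segments=200, compactness=10)

print(segments.shape)            # same height and width as the image
print(np.unique(segments).size)  # number of superpixels actually produced
```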

Natural Language Processing (NLP)

The application of AI to tasks involving human language, both written and spoken. NLP tasks can include both computer parsing of input natural language, and computer generation of naturalistic outputs in human language. Recurrent neural networks have become an important tool in this area recently. Chatbots and voice control are applications of NLP.

Machine Translation

Automatic translation between human languages, as in Google Translate or the real-time speech translation available via Skype Translator. This is an application of NLP that has been in the news lately. Historically this has been considered a fiendishly difficult problem to fully solve, but the state of the art in this area has advanced considerably in the last few years using deep recurrent neural networks, trained on large corpora of text that have previously been human-translated.

Named Entity Recognition (NER)

A kind of NLP task where the objective is to take sentences in some human language as input and automatically identify which words refer to proper nouns (individual persons, places, company or brand names, etc.).
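
A minimal sketch using spaCy, assuming it and its small English model are installed (via python -m spacy download en_core_web_sm):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Tim Cook announced new Apple products in San Francisco.")

# Print each recognized entity with its predicted type.
for ent in doc.ents:
    print(ent.text, ent.label_)

# Expected output along the lines of:
#   Tim Cook PERSON
#   Apple ORG
#   San Francisco GPE
```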

Optical Character Recognition (OCR)

OCR lies at the intersection of CV and NLP: the goal is to take an image containing text and transcribe the text itself. Scanning books into digital form and reading off the digits when you deposit a check at an ATM are both OCR tasks.
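
A minimal sketch using the pytesseract wrapper around the Tesseract OCR engine (both assumed installed, along with Pillow); the filename here is hypothetical:

```python
from PIL import Image
import pytesseract

# "page.png" is a hypothetical scanned image of a page of text.
text = pytesseract.image_to_string(Image.open("page.png"))
print(text)
```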

Sentiment Analysis

Another popular NLP task, sentiment analysis involves taking variably-sized chunks of human-generated text as inputs, and automatically determining whether the views being expressed are overall positive, negative, or neutral. This can be a very subjective and context-specific task, but also one of immediate value to advertisers and marketers. Beyond positive/negative/neutral, other forms of sentiment analysis can involve classifying text as objective vs. subjective, or attempting to classify the emotional state of the author in more detail. One difficulty in accurately evaluating approaches to sentiment analysis is that even human experts don’t always agree about the sentiment of a given piece of text!
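
A minimal sketch of the supervised-learning shape of the task, using scikit-learn on a tiny set of made-up labeled sentences. Real systems train on thousands of examples, but the structure is the same.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Made-up, human-labeled training sentences.
texts = [
    "I loved this product, works great",
    "Absolutely fantastic experience",
    "Terrible quality, broke after a day",
    "Worst purchase I have ever made",
    "Pretty good overall, would recommend",
    "Awful customer service, very disappointed",
]
labels = ["positive", "positive", "negative",
          "negative", "positive", "negative"]

# Turn each text into word-frequency features, then fit a classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["great quality, I recommend it"]))  # likely 'positive'
```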

Natural Language Understanding (NLU)

A subfield of NLP, natural language understanding is the quest to build machines with true reading comprehension, so that humans can communicate with them in natural human language and the machine can respond appropriately. To qualify as “natural language understanding” rather than just NLP, a system should be very general rather than domain-specific, pushing this more into the realm of strong AI than ordinary ML. IBM’s Watson system, for example, uses ML for text classification, making it an example of NLP, but it is generally not considered an example of NLU: it doesn’t attempt to “understand” questions, it simply proceeds statistically. It is widely believed that perfecting NLU would require a “strong AI” system, which remains entirely hypothetical.
