The Machine Learning Dictionary

Here are some definitions for commonly used terms/technologies in machine learning. I'll try to update and improve this page with new entries over time.

Apache Spark A distributed computing framework for large-scale data processing and machine learning.
Artificial Neural Networks Machine learning algorithms inspired by biological neural networks.
Backpropagation An algorithm for training neural networks in which errors are propagated backwards through the network.
Big Data Data which is too large to process easily on a single machine, typically on the order of terabytes or more. The term can also refer to machine learning and other types of analysis on data of this scale.
Classification A machine learning problem in which each observation is assigned to one of two or more classes.
Clustering The process of grouping observations that are similar according to a particular criterion.
Cython A Python-like language which compiles to C, used to give C-like performance to Python code.
Cross Validation A method for evaluating the performance of a learning algorithm by repeatedly training on one part of the data and testing on the remainder. Particularly useful for small datasets.
Data Science A field covering machine learning, data cleaning and preparation, and data analysis techniques such as visualisation.
Deep Learning A class of machine learning algorithms which use artificial neural networks with many layers.
Face Detection The problem of determining whether an image contains a face, and where.
Face Recognition The problem of identifying whose face appears in an image.
Feature Extraction The process of finding relevant features in a set of data.
Gradient Descent An optimisation method which can find a local minimum of a function by repeatedly stepping in the direction of the negative gradient.
Hyperparameter A parameter of a machine learning algorithm which is set by the user rather than learned from the data.
k-nearest Neighbours An algorithm which makes a prediction for an observation based on the k nearest observations in the training set.
Kaggle A platform which hosts data science competitions.
Linear Algebra A field of mathematics concerning linear mappings between vector spaces. Essential to machine learning.
Machine Learning Algorithms which improve their performance with experience. A computational branch of statistics.
Model Selection The process of choosing between candidate models, including hyperparameter settings, for a machine learning algorithm.
Natural Language Processing A field of computer science concerned with the analysis of natural (human) languages.
NumPy A Python array/matrix library.
OpenCV A computer vision library in C++ with bindings for Python.
Optimisation The branch of mathematics concerned with finding the minimum or maximum of a function. Essential to many machine learning algorithms.
Pandas The Python Data Analysis library.
Principal Components Analysis A classic feature extraction algorithm based on projection onto a lower-dimensional subspace.
Python A high-level programming language, popular for machine learning applications.
Regression A machine learning problem involving the prediction of a real-valued scalar or vector.
Singular Value Decomposition A well-known matrix factorisation method.
Scikit-learn A library for Machine Learning in Python.
SciPy A Python library for scientific computing.
Statistics A branch of mathematics concerned with finding useful patterns in data.
Stochastic Gradient Descent A fast variant of gradient descent which estimates the gradient from a random sample of the data at each step. Commonly used in deep learning.
Tensor A multidimensional array.
TensorFlow A deep learning library developed by Google.
Test Set A set of examples/observations used for evaluating the prediction performance of an algorithm.
Theano A tensor manipulation library for Python which can run code on the GPU.
Training Set A set of examples/observations used for training a machine learning algorithm.
Validation Set A set of examples/observations used for tuning the hyperparameters of an algorithm during training.
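
To make a few of the entries above more concrete (Gradient Descent, k-nearest Neighbours, and Training Set/Test Set), here is a minimal sketch in plain Python. The function names, default values and toy data are assumptions made up for illustration; they are not taken from any particular library, and a real project would more likely use something like Scikit-learn.

```python
import random


def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Follow the negative gradient from x0 for a fixed number of steps.

    `grad` is a function returning the derivative at x (hypothetical helper).
    """
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest
    training points. `train` is a list of (feature_vector, label) pairs.
    Ties between labels are broken arbitrarily in this sketch.
    """
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    nearest = sorted(train, key=lambda pair: squared_distance(pair[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


def train_test_split(data, test_fraction=0.25, seed=0):
    """Shuffle a copy of `data` and split it into (training set, test set)."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3); the result is close to 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)

# Toy two-class data: class "a" near the origin, class "b" near (5, 5).
points = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
          ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
prediction = knn_predict(points, (0.5, 0.5), k=3)  # "a"

# Hold out a quarter of the observations for evaluation.
training_set, test_set = train_test_split(list(range(8)))
```

The test set is kept apart from the training set so that prediction performance is measured on observations the algorithm has not seen.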