Universidad EAFIT

“The field of Machine Learning seeks to answer the question: How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?” (Tom M. Mitchell, 2006)

As the definition above shows, Machine Learning is a broad field: it encompasses a variety of statistical techniques, computer science algorithms and heuristics.

All of these try, in one way or another, to answer that question. Within the Machine Learning field, two more specific areas are the focus of this work:


  • Classification tasks, where we have features or measurements as inputs to our model and produce the class or category that they belong to. A simple example is a model that tells you whether an apple is ripe or not (categories) based on its weight, color and density (measurements).
  • Pattern recognition methods, where the focus is on finding patterns or regularities in data. An example is finding unusual transactions in bank accounts.
These fields have many applications across engineering, medicine, biology, psychology, economics and other areas, and they comprise many methods and techniques. Publications in the area grew from around 152 in 1988 to 8494 in 2013 (source: Scopus), so the area is clearly current and growing rapidly.
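
As a rough illustration of the unusual-transaction example, the sketch below flags amounts that sit far from the median of an account's history. The modified z-score rule, the 3.5 threshold, the function name flag_unusual and the sample amounts are all assumptions made for illustration; the text does not prescribe a particular method.

```python
from statistics import median


def flag_unusual(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so a single extreme value does not distort the
    baseline it is compared against.
    """
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        # All amounts are (nearly) identical: nothing stands out.
        return []
    return [a for a in amounts if 0.6745 * abs(a - med) / mad > threshold]


# Hypothetical transaction amounts for one account.
transactions = [42.0, 38.5, 51.0, 45.2, 40.1, 47.3, 980.0, 44.8]
unusual = flag_unusual(transactions)
print(unusual)
```

A rule like this only captures one kind of regularity (typical amount); real pattern recognition methods look at many features at once.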

A task of this nature usually consists of three steps. The first and most crucial is feature generation and feature selection, where you preprocess the raw data to extract the features that your classifier will use. The objective is to obtain features that clearly differentiate your classes and bring out the patterns in the data.

The second step is selecting a classifier suitable for the task: given the characteristics of the features and the goal, which methods are best suited? Finally, there is the training and validation step, where the method learns from the data and its performance is then evaluated. These steps can be summed up in three questions: Which will my inputs be? What method should I use? How does the method perform?
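
The three steps can be sketched on the apple example from earlier. This is a minimal sketch under stated assumptions: the measurements are invented, the min-max scaling stands in for feature generation, and the nearest-centroid rule is a deliberately simple stand-in classifier, not one of the methods this work explores. All function and variable names are hypothetical.

```python
# Hypothetical raw data: (weight in g, redness 0-1, density in g/cm^3, label).
raw = [
    (150.0, 0.90, 0.80, "ripe"),
    (160.0, 0.85, 0.82, "ripe"),
    (155.0, 0.80, 0.79, "ripe"),
    (165.0, 0.95, 0.81, "ripe"),
    (120.0, 0.30, 0.90, "unripe"),
    (125.0, 0.20, 0.92, "unripe"),
    (130.0, 0.35, 0.88, "unripe"),
    (118.0, 0.25, 0.91, "unripe"),
]


# Step 1: feature generation. Rescale each measurement using the range
# seen in the training data, so no single unit dominates the distances.
def make_scaler(rows):
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    def scale(row):
        return tuple(
            (v - lo) / ((hi - lo) or 1.0) for v, lo, hi in zip(row, lows, highs)
        )

    return scale


# Step 2: the chosen method. One centroid (mean feature vector) per class;
# a new sample gets the label of the closest centroid.
def fit_centroids(features, labels):
    centroids = {}
    for label in set(labels):
        members = [f for f, l in zip(features, labels) if l == label]
        centroids[label] = tuple(sum(col) / len(col) for col in zip(*members))
    return centroids


def predict(centroids, feature):
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(centroids[label], feature))


# Step 3: training and validation. Hold out one apple per class.
train_rows = raw[0:3] + raw[4:7]
valid_rows = [raw[3], raw[7]]

scale = make_scaler([r[:3] for r in train_rows])
centroids = fit_centroids(
    [scale(r[:3]) for r in train_rows], [r[3] for r in train_rows]
)

predictions = [predict(centroids, scale(r[:3])) for r in valid_rows]
correct = sum(p == r[3] for p, r in zip(predictions, valid_rows))
accuracy = correct / len(valid_rows)
print(predictions, accuracy)
```

The scaler is fitted on the training rows only, so the held-out apples play no part in choosing the features or the centroids; that separation is what makes the validation score meaningful.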

The questions above are not always easy to answer, and a lot of work can go into each one; nonetheless, they will be addressed throughout the following chapters.

While machine learning offers many methods, not all of them perform well on every problem, so it is important to establish which of them work well for a given case. The methods that will be explored here are: Linear Discriminant Analysis (LDA), Perceptron, Support Vector Machine (SVM), Naive Bayesian Classifier, Bayesian Classifier, Gaussian Mixture Model (GMM), Artificial Neural Networks (ANN), K-means clustering, Fuzzy C-means and Classification Trees. Each of these methods will be briefly introduced, but the goal is not to dwell on each technique; rather, it is to contextualize them with three main applications in speech processing, biology and business strategies.

Last modified: 01/11/2016 9:51