This is the code repository for Mastering Machine Learning with scikit-learn - Second Edition, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.
This book examines a variety of machine learning models including k-nearest neighbors, logistic regression, naive Bayes, k-means, decision trees, and artificial neural networks. It discusses data preprocessing, hyperparameter optimization, and ensemble methods. You will build systems that classify documents, recognize images, detect ads, and more. You will learn to use scikit-learn’s API to extract features from categorical variables, text, and images; evaluate model performance; and develop an intuition for how to improve your model’s performance.
All of the code is organized into folders. Each folder is named after the chapter it accompanies, for example, Chapter02.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The package is named sklearn because scikit-learn is not a valid Python package name."
The code will look like the following:
# In[1]:
import sklearn
sklearn.__version__
# Out[1]:
'0.18.1'
The examples in this book require Python >= 2.7 or >= 3.3 and pip, the PyPA-recommended tool for installing Python packages. The examples are intended to be executed in a Jupyter notebook or an IPython interpreter. Chapter 1, The Fundamentals of Machine Learning, shows how to install scikit-learn 0.18.1, its dependencies, and other libraries on Ubuntu, Mac OS, and Windows.
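Before opening the notebooks, it can help to confirm that the interpreter meets the requirements above. The following is a minimal sketch, not part of the book's code: the helper names `python_supported` and `installed_version` are hypothetical, and the check treats "Python >= 2.7 or >= 3.3" as "2.7 or later within Python 2, or 3.3 or later otherwise", which is an assumption about how the requirement is meant to be read.

```python
from __future__ import print_function

import sys


def python_supported(version_info=sys.version_info):
    """Return True if the version is 2.7+ (within Python 2) or 3.3+.

    Hypothetical helper; the interpretation of the requirement is an assumption.
    """
    version = tuple(version_info[:2])
    if version[0] == 2:
        return version >= (2, 7)
    return version >= (3, 3)


def installed_version(name):
    """Return the package's __version__ string, "unknown" if it has none,
    or None if the package cannot be imported."""
    try:
        module = __import__(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


print("Python version supported:", python_supported())

# The package is imported as sklearn, not scikit-learn.
sklearn_version = installed_version("sklearn")
if sklearn_version is None:
    print("scikit-learn is not installed; see Chapter 1 for installation steps.")
else:
    print("scikit-learn version:", sklearn_version)
```

The book's examples were written against scikit-learn 0.18.1, so a different reported version may behave differently in places.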