Applied supervised learning techniques on data collected for the U.S. census to help CharityML (a fictitious charity organization) identify people most likely to donate to their cause.

sanjeevai/Finding_Donors_For_CharityML

Machine Learning Engineer Nanodegree

Supervised Learning

Project: Finding Donors for CharityML

Project Overview

This is the first project in the ML Basics Nanodegree and Data Scientist Nanodegree programs. In this project, we applied supervised learning techniques to data collected for the U.S. census to help CharityML (a fictitious charity organization) identify the people most likely to donate to their cause.

We first explored the data to learn how the census data is recorded. Next, we applied a series of transformations and preprocessing techniques to get the data into a workable format. We then evaluated several supervised learners of our choice on the data and considered which was best suited for the solution. Afterwards, we optimized the selected model. Finally, we explored the chosen model and its predictions under the hood to see how well it performs given the data it receives.
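
The preprocessing step can be sketched as follows. This is a minimal illustration, not the project's actual code: the toy rows and the column names (`age`, `capital-gain`, `capital-loss`, `hours-per-week`, `workclass`) are assumptions, and the three transformations shown (a log transform on skewed features, min-max scaling, and one-hot encoding) are common choices for census-style data rather than a confirmed record of what was done here. It targets Python 3 with current pandas and scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy rows standing in for the census data; column names are assumptions.
raw = pd.DataFrame({
    "age": [39.0, 50.0, 38.0, 53.0],
    "capital-gain": [2174.0, 0.0, 0.0, 14084.0],
    "capital-loss": [0.0, 0.0, 0.0, 0.0],
    "hours-per-week": [40.0, 13.0, 40.0, 50.0],
    "workclass": ["State-gov", "Self-emp-not-inc", "Private", "Private"],
})

# 1. Compress heavily skewed monetary features with log(1 + x) so zeros stay finite.
skewed = ["capital-gain", "capital-loss"]
features = raw.copy()
features[skewed] = np.log1p(features[skewed])

# 2. Rescale numerical features to [0, 1] so no feature dominates by magnitude alone.
numerical = ["age", "capital-gain", "capital-loss", "hours-per-week"]
features[numerical] = MinMaxScaler().fit_transform(features[numerical])

# 3. One-hot encode the categorical feature into indicator columns.
features = pd.get_dummies(features, columns=["workclass"])

print(features.columns.tolist())
```

Scaling after the log transform keeps the large capital-gain values from swamping the other numerical features, while one-hot encoding lets learners that expect numeric input consume the categorical columns.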

Project Highlights

This project is designed to get you acquainted with the many supervised learning algorithms available in sklearn, and to provide a method for evaluating how each model works and performs on a particular type of data.

In machine learning, it is important to understand exactly when a particular algorithm should be used and when it should be avoided. By completing this project, you will learn:

  • How to identify when preprocessing is needed, and how to apply it.
  • How to establish a benchmark for a solution to the problem.
  • What each of several supervised learning algorithms accomplishes given a specific dataset.
  • How to investigate whether a candidate solution model is adequate for the problem.
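
One way to establish a benchmark is a naive predictor that labels every person as earning more than $50,000, then scores that guess. The sketch below is plain Python 3 (true division is assumed, so it will not give correct results under Python 2.7 integer division) and uses toy labels. Choosing the F-beta score with beta = 0.5 is an assumption: it weights precision above recall, which fits the goal of not wasting letters on unlikely donors.

```python
def naive_benchmark(y_true, beta=0.5):
    """Score a naive predictor that labels every person as a likely donor (1).

    Returns (accuracy, precision, recall, f_beta).
    """
    y_pred = [1] * len(y_true)  # predict "earns more than $50,000" for everyone
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    b2 = beta ** 2
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_beta


# Toy labels: 1 = earns more than $50,000, 0 = otherwise.
labels = [1, 0, 0, 1, 0, 0, 0, 0]
accuracy, precision, recall, f_beta = naive_benchmark(labels)
print(accuracy, precision, recall, f_beta)
```

Because the naive predictor never misses a positive, its recall is always 1.0 and its precision equals the share of positives, so any candidate model should beat its F-beta score to be worth using.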

Description

CharityML is a fictitious charity organization located in the heart of Silicon Valley that was established to provide financial support for people eager to learn machine learning. After sending nearly 32,000 letters to people in the community, CharityML determined that every donation it received came from someone who was making more than $50,000 annually.
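
Because every donation came from someone earning more than $50,000, the task can be framed as binary classification on income. A minimal sketch of encoding that target, assuming the income column stores the strings `>50K` and `<=50K` (an assumption about the data format, with toy values):

```python
import pandas as pd

# Hypothetical income column; the exact string labels are an assumption.
income_raw = pd.Series([">50K", "<=50K", "<=50K", ">50K", "<=50K"])

# Map the target to 1 (likely donor: earns more than $50,000) and 0 otherwise.
income = (income_raw == ">50K").astype(int)

# Share of positive examples, useful context for judging any benchmark.
positive_rate = income.mean()
print(income.tolist(), positive_rate)
```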

To expand their potential donor base, CharityML has decided to send letters to residents of California, but only to those most likely to donate to the charity. With nearly 15 million working Californians, CharityML has brought us on board to help build an algorithm that best identifies potential donors and reduces the overhead cost of sending mail. Your goal will be to evaluate and optimize several different supervised learners to determine which algorithm will provide the highest donation yield while also reducing the total number of letters being sent.
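
The evaluate-then-optimize workflow can be sketched with scikit-learn. This is a hedged illustration, not the project's code: the data is synthetic (`make_classification` stands in for the preprocessed census features), the three learners and their parameter grids are illustrative choices, and F-beta with beta = 0.5 is an assumed metric. It uses Python 3 and the modern `sklearn.model_selection` module; in scikit-learn 0.17, the version listed below, `train_test_split` lived in `sklearn.cross_validation` and `GridSearchCV` in `sklearn.grid_search`.

```python
from time import perf_counter

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the preprocessed census features (not the real data).
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.75, 0.25], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Candidate learners; this particular trio is illustrative only.
learners = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
}

# 1. Evaluate: fit each learner, record training time and F0.5 on the held-out split.
results = {}
for name, clf in learners.items():
    start = perf_counter()
    clf.fit(X_train, y_train)
    elapsed = perf_counter() - start
    score = fbeta_score(y_test, clf.predict(X_test), beta=0.5)
    results[name] = (score, elapsed)
    print(f"{name}: F0.5={score:.3f}, train time={elapsed:.3f}s")

# 2. Optimize: grid-search hyperparameters of the best-scoring learner with 5-fold CV.
best_name = max(results, key=lambda n: results[n][0])
param_grids = {
    "logistic_regression": {"C": [0.1, 1.0, 10.0]},
    "random_forest": {"max_depth": [4, 8, None], "min_samples_leaf": [1, 5]},
    "adaboost": {"n_estimators": [50, 100], "learning_rate": [0.5, 1.0]},
}
grid = GridSearchCV(
    learners[best_name],
    param_grids[best_name],
    scoring=make_scorer(fbeta_score, beta=0.5),
    cv=5,
)
grid.fit(X_train, y_train)
tuned_score = fbeta_score(y_test, grid.best_estimator_.predict(X_test), beta=0.5)
print(best_name, grid.best_params_, f"tuned F0.5={tuned_score:.3f}")
```

Scoring the grid search with the same F0.5 metric used for the initial comparison keeps the tuning objective aligned with the goal of sending fewer, better-targeted letters; recording training time alongside the score helps weigh accuracy against cost when scaling to millions of residents.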

Software and Libraries

This project uses the following software and Python libraries:

  • Python 2.7
  • NumPy
  • pandas
  • scikit-learn (v0.17)
  • matplotlib
