Skip to content

Latest commit

 

History

History
7 lines (4 loc) · 822 Bytes

README.md

File metadata and controls

7 lines (4 loc) · 822 Bytes

This is my code for an in-class Kaggle Competetion. The competition was modified form a loan-delay-detection competition by erasing the column names and adding random noise as new features.

For preprocessing, I divide the features into categorical and numerical and implement the null values with either most frequent values (for categorical features) or mean values (for numerical features). I then dummy encoded the categorical features and standardize the the numerical features.

For feature engineering, I emplement backward elemination on a fine tuning random forest model. PCA was not quite useful in this case.

My final best model is a two-level stacking model. I used Extra Trees, AdaBoost, Gradient Boost and Random Forest as first level classifier, and used another Random Forest as second level classifier.