This project aims to predict housing prices based on various features using a Decision Tree Regressor. By analyzing real estate data, we can provide price estimates for houses based on input parameters such as bedroom ratio, population level, and median income.
Pandas: For data manipulation and analysis.
NumPy: For numerical computations.
Scikit-learn: For implementing the Decision Tree Regressor.
Flask: For creating the web application.
Matplotlib & Seaborn: For creating visualizations during the data exploration phase.
Loading the housing data from housing.csv.
Handling missing values by dropping rows with missing data.
Applying log transformations to skewed features like total_rooms, total_bedrooms, population, and households.
Converting the categorical feature ocean_proximity into binary indicator columns using one-hot encoding.
Splitting the data into training and testing sets.
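The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `make_sample_housing` is a hypothetical helper that generates synthetic rows in place of housing.csv, the `+ 1` inside the logarithm and the 80/20 split ratio are assumptions, and only column names mentioned in this README are used.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def make_sample_housing(n=200, seed=0):
    """Build a small synthetic stand-in for housing.csv (hypothetical values)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "total_rooms": rng.integers(100, 5000, n).astype(float),
        "total_bedrooms": rng.integers(20, 1000, n).astype(float),
        "population": rng.integers(50, 3000, n).astype(float),
        "households": rng.integers(10, 900, n).astype(float),
        "median_income": rng.uniform(0.5, 15.0, n),
        "ocean_proximity": rng.choice(["INLAND", "NEAR BAY", "NEAR OCEAN"], n),
        "median_house_value": rng.uniform(50_000, 500_000, n),
    })
    # Simulate a few missing values so that dropna() has something to remove.
    df.loc[df.index[:5], "total_bedrooms"] = np.nan
    return df


def preprocess(df):
    """Drop incomplete rows, log-transform skewed columns, one-hot encode."""
    df = df.dropna().copy()
    for col in ["total_rooms", "total_bedrooms", "population", "households"]:
        df[col] = np.log(df[col] + 1)  # assumption: +1 keeps zero counts finite
    dummies = pd.get_dummies(df["ocean_proximity"], dtype=int)
    return df.drop(columns="ocean_proximity").join(dummies)


# With the real data this would start from: pd.read_csv("housing.csv")
data = preprocess(make_sample_housing())
X = data.drop(columns="median_house_value")
y = data["median_house_value"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # assumed split ratio
)
```

Fixing `random_state` keeps the split reproducible between runs, which makes later model comparisons easier to interpret.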
Plotting histograms to visualize the distribution of various features.
Using heatmaps to visualize correlations between features and the target variable (median_house_value).
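A rough sketch of the exploration step, assuming synthetic data in place of housing.csv. The generated values, the colormap, and the figure sizes are illustrative choices, not taken from the project.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (hypothetical values, README column names).
rng = np.random.default_rng(0)
n = 200
income = rng.uniform(0.5, 15.0, n)
df = pd.DataFrame({
    "total_rooms": rng.lognormal(7.5, 0.8, n),   # right-skewed on purpose
    "population": rng.lognormal(7.0, 0.7, n),    # right-skewed on purpose
    "median_income": income,
    "median_house_value": 40_000 * income + rng.normal(0, 20_000, n),
})

# One histogram per numeric column to inspect the distributions.
axes = df.hist(bins=30, figsize=(10, 6))

# Pairwise correlations, including those with the target column.
corr = df.corr()
fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, cmap="YlGnBu", ax=ax)
```

In an interactive session, `plt.show()` would display both figures. The row or column for median_house_value in the heatmap is the one to read for feature-to-target correlation.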
Applying log transformations to reduce skewness in features.
Creating new features such as bedroom_ratio (total_bedrooms / total_rooms) and household_rooms (total_rooms / households).
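The two derived features follow directly from the formulas given above. The sketch below uses a hypothetical helper, `add_ratio_features`, and made-up sample values. Whether the project computes the ratios before or after the log transformation is not stated here, so the example works on raw counts.

```python
import pandas as pd


def add_ratio_features(df):
    """Return a copy of df with bedroom_ratio and household_rooms added."""
    out = df.copy()
    out["bedroom_ratio"] = out["total_bedrooms"] / out["total_rooms"]
    out["household_rooms"] = out["total_rooms"] / out["households"]
    return out


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "total_rooms": [1000.0, 2000.0],
    "total_bedrooms": [200.0, 500.0],
    "households": [250.0, 400.0],
})
features = add_ratio_features(sample)
```

Here the first row has a bedroom_ratio of 200 / 1000 = 0.2 and a household_rooms value of 1000 / 250 = 4.0.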
Training a Decision Tree Regressor on the preprocessed data.
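A minimal training sketch, assuming synthetic features that stand in for the preprocessed data. The three columns mirror the inputs named in the introduction; the data-generating formula, the split ratio, and the choice of RMSE and R² as metrics are assumptions made for the example.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the preprocessed feature matrix (hypothetical values).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(0.1, 0.4, 500),    # bedroom_ratio
    rng.uniform(3.0, 10.0, 500),   # log-transformed population
    rng.uniform(0.5, 15.0, 500),   # median_income
])
y = 40_000 * X[:, 2] + rng.normal(0, 20_000, 500)  # made-up price relation

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = DecisionTreeRegressor(random_state=42)
model.fit(X_train, y_train)

# Evaluate on held-out data rather than on the training set.
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
r2 = model.score(X_test, y_test)
print(f"test RMSE: {rmse:.0f}, test R^2: {r2:.3f}")
```

Comparing `model.score(X_train, y_train)` with the test score gives a quick indication of how much the tree has memorized the training data.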
Creating a Flask web application with an input form to collect user inputs.
Making predictions based on user inputs and displaying the predicted house price.
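The web application could look roughly like the sketch below. It is not the project's app.py: the route, the inline template, the `FEATURES` list, and the stand-in model trained on synthetic data are all assumptions chosen to keep the example self-contained.

```python
import numpy as np
from flask import Flask, render_template_string, request
from sklearn.tree import DecisionTreeRegressor

# Input fields, taken from the parameters named in the introduction.
FEATURES = ["bedroom_ratio", "population", "median_income"]

# Stand-in model trained on synthetic data (hypothetical values).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(0.1, 0.4, 200),
    rng.uniform(3.0, 10.0, 200),
    rng.uniform(0.5, 15.0, 200),
])
y = 40_000 * X[:, 2] + rng.normal(0, 20_000, 200)
model = DecisionTreeRegressor(random_state=0).fit(X, y)

FORM = """
<form method="post">
  {% for name in features %}
  <label>{{ name }}
    <input name="{{ name }}" type="number" step="any" required>
  </label><br>
  {% endfor %}
  <button type="submit">Predict</button>
</form>
{% if price is not none %}
<p>Predicted price: {{ "%.2f"|format(price) }}</p>
{% endif %}
"""

app = Flask(__name__)


@app.route("/", methods=["GET", "POST"])
def index():
    """Show the form; on POST, also show the model's prediction."""
    price = None
    if request.method == "POST":
        values = [float(request.form[name]) for name in FEATURES]
        price = float(model.predict([values])[0])
    return render_template_string(FORM, features=FEATURES, price=price)

# To serve locally, call app.run(debug=True) from a script entry point.
```

A real application would likely load a saved model instead of training at import time, and would validate the form values before converting them to floats.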
Python 3.x
Required Python libraries: pandas, numpy, scikit-learn, flask, matplotlib, seaborn
Installation
cd Real-Estate-Price-Prediction
python app.py
Open a web browser and go to the HTTP address shown in the terminal to access the application.
Enter the required input parameters and submit the form to get the predicted house price.
Implement additional machine learning models for improved predictions.
Enhance the web application with more interactive features and visualizations.
Integrate more detailed data for better model accuracy.
The model might be prone to overfitting; investigate the use of L1 or L2 regularization.
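One note on the overfitting item: L1 and L2 penalties act on the coefficients of models such as linear regression, so they do not apply directly to a decision tree. For trees, scikit-learn exposes parameters such as `max_depth`, `min_samples_leaf`, and `ccp_alpha` that limit complexity. The sketch below compares an unconstrained tree with a constrained one on synthetic data; the data and the specific parameter values are assumptions, not tuned settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Noisy synthetic data (hypothetical values) so that overfitting is visible.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 15.0, (600, 1))
y = 40_000 * X[:, 0] + rng.normal(0, 60_000, 600)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Unconstrained tree: grows until every training point is fit.
full = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# Constrained tree: depth and leaf size are capped (illustrative values).
limited = DecisionTreeRegressor(
    max_depth=4, min_samples_leaf=10, random_state=0
).fit(X_train, y_train)

for name, tree in [("unconstrained", full), ("constrained", limited)]:
    train_r2 = tree.score(X_train, y_train)
    test_r2 = tree.score(X_test, y_test)
    print(f"{name}: depth={tree.get_depth()}, "
          f"train R^2={train_r2:.3f}, test R^2={test_r2:.3f}")
```

A large gap between training and test R² is the usual sign of overfitting. In practice the constraint values would be chosen by cross-validation, for example with `GridSearchCV`.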
Skewed graph:
Corrected graph:
Blue is less expensive; consider the blank sector below red as the ocean:
Heatmap after combining bedroom features: