This repository contains our final project for the Data Science course in the Business Intelligence bachelor's degree program at UABC. The project analyzes how startup survival relates to funding, industry sector, and other key indicators, using the "Big Startup Success/Fail Dataset from Crunchbase" available on Kaggle.
Using statistical analysis and machine learning techniques, we investigated:
- The relationship between funding thresholds and startup survival
- Industry sector impact on success rates
- The influence of funding rounds on company longevity
The project includes data preprocessing, exploratory analysis, statistical inference, and machine learning models (logistic regression and K-means clustering), all implemented in Python.
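As a rough illustration of the funding-threshold question listed above, the sketch below compares survival rates for startups below and at or above a funding cutoff. It is not the code used in the project: the records are toy values made up for the example, the field names `funding_total_usd` and `status` are assumptions about the dataset's columns, and both the 1,000,000 USD cutoff and the rule "survived means status is not `closed`" are arbitrary choices for demonstration.

```python
from collections import defaultdict

# Toy records for illustration only; these are NOT rows from the Crunchbase dataset.
# The field names are assumptions about how the real columns might be called.
startups = [
    {"name": "A", "funding_total_usd": 250_000, "status": "closed"},
    {"name": "B", "funding_total_usd": 5_000_000, "status": "operating"},
    {"name": "C", "funding_total_usd": 800_000, "status": "operating"},
    {"name": "D", "funding_total_usd": 12_000_000, "status": "acquired"},
    {"name": "E", "funding_total_usd": 400_000, "status": "closed"},
    {"name": "F", "funding_total_usd": 2_000_000, "status": "closed"},
]


def survival_rate_by_threshold(records, threshold_usd):
    """Return the share of surviving startups below and at/above a funding cutoff.

    A startup counts as "survived" when its status is anything other than
    "closed" (an assumption made for this sketch). Groups with no records
    are simply absent from the result.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [survived, total]
    for record in records:
        if record["funding_total_usd"] >= threshold_usd:
            group = "at_or_above"
        else:
            group = "below"
        counts[group][1] += 1
        if record["status"] != "closed":
            counts[group][0] += 1
    return {group: survived / total for group, (survived, total) in counts.items()}


rates = survival_rate_by_threshold(startups, threshold_usd=1_000_000)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} survived")
```

A gap between the two rates in a comparison like this would only be descriptive; deciding whether it is meaningful is the job of the statistical inference and modeling steps mentioned above.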
The dataset used in this project is available at: https://www.kaggle.com/datasets/yanmaksi/big-startup-secsees-fail-dataset-from-crunchbase
- Pedro David Guevara Rodríguez
- Brandon Alan López Ruiz
- David Godina Ramos
My co-authors are welcome to use this repository to showcase their work to others. Feel free to fork, share, or reference this project as needed.
Special thanks to Dr. Karina Caro for her supervision and guidance throughout this course. Her teaching and mentorship were instrumental in the successful completion of this data science project.
Feel free to use this repository to explore our analysis and findings. The full draft paper (in Spanish) is included in this repository.