This project involves cleaning and analyzing employee attendance data using Excel and Python. The goal is to standardize data formats, perform ad hoc analysis, and generate insights to understand employee attendance patterns.
-
Date Standardization: Converted date values to the format YYYY-MM-DD and extracted month names and day types.
-
Employee ID Cleanup: Removed extra characters from employee IDs, such as '@'.
-
Name Standardization: Capitalized names to title case.
-
Status Mapping: Mapped status values to abbreviations:
-
Work From Office (WFO)
-
Work From Home (WFH)
-
Birthday Leave (BL)
-
Menstrual Leave (ML)
-
Paid Leave (PL)
-
Sick Leave (SL)
-
Weekly Off (WO)
- Duplicate Removal: Identified and removed duplicate records.
-
Distinct Employee Count: Calculated the total count of distinct employee names.
-
WFH Percentage in May: Computed the work-from-home (WFH %) percentage for May.
-
Attendance Analysis for June: Identified the day of the week with the highest attendance percentage in June.
-
WFH Percentage in April: Determined the number of employees with a WFH percentage greater than 10% in April.
-
Excel: Data cleaning and standardization.
-
Python: Ad hoc analysis and computations.
- Data Preparation:
- Load the dataset into Excel for cleaning and standardization.
- Ad Hoc Analysis:
- Use Python scripts to perform the specified analyses and generate insights.
- Review Results:
- Interpret the results to understand attendance patterns and make data-driven decisions.
This project demonstrates effective data cleaning and analysis techniques to provide actionable insights into employee attendance patterns.