Welcome to the UC Davis Data Science Club Projects and Workshops repository! This repository showcases various projects and workshops related to data science that I participated in and co-led during my tenure with the Davis Data Science Club.
I first joined the credit card fraud project as a regular member, under the leadership of Ru Han Wang. I later became a club officer, serving as education lead, and in that role co-led workshops that taught members about various aspects of data science.
Feel free to explore the projects and workshops in their respective folders to see the data science techniques and methodologies taught and applied within the club.
-
File:
Credit Card Fraud Project/Credit Card Fraud Project.ipynb
-
Description: Collaborative effort on a credit card fraud detection project, led by Ru Han Wang. The project aimed to identify fraudulent transactions using machine learning techniques.
- Loads and preprocesses transaction data from a pickled DataFrame.
- Handles missing values by imputing them with column means.
- Visualizes the distribution of categorical features such as 'DeviceType' to examine how they relate to fraud.
- Applies a Decision Tree classifier to predict fraudulent transactions and evaluates it using accuracy, ROC AUC, a confusion matrix, and a classification report.
- Visualizes ROC and Precision-Recall curves to assess classifier performance.
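The notebook's core steps (mean imputation, a decision tree, and the evaluation metrics listed above) can be sketched with pandas and scikit-learn. This is a minimal illustration under assumptions, not the notebook's code: the column names `isFraud` and `TransactionAmt`, the tree depth, the 75/25 split, and the synthetic demo data are all hypothetical. Computing the imputation means on the training split only is a choice made here to avoid leaking test information; the notebook may handle this differently.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_recall_curve,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_and_evaluate(df: pd.DataFrame, label_col: str = "isFraud", seed: int = 0) -> dict:
    """Mean-impute numeric features, fit a decision tree, and collect evaluation results."""
    # Keep numeric columns only; categorical columns would need encoding first.
    X = df.drop(columns=[label_col]).select_dtypes(include="number")
    y = df[label_col]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    # Fill missing values with column means from the training split only,
    # so the test split does not influence preprocessing.
    train_means = X_train.mean()
    X_train = X_train.fillna(train_means)
    X_test = X_test.fillna(train_means)

    model = DecisionTreeClassifier(max_depth=5, random_state=seed)  # depth is an assumed value
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1 (fraud)

    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_score),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "report": classification_report(y_test, y_pred, zero_division=0),
        "roc_curve": roc_curve(y_test, y_score),  # (fpr, tpr, thresholds)
        "pr_curve": precision_recall_curve(y_test, y_score),  # (precision, recall, thresholds)
    }


if __name__ == "__main__":
    # Synthetic stand-in data (hypothetical); the notebook instead loads a pickled
    # DataFrame, e.g. with pd.read_pickle and its own file path.
    rng = np.random.default_rng(0)
    n = 1000
    demo = pd.DataFrame({
        "TransactionAmt": rng.exponential(100, n),
        "card_score": rng.normal(size=n),
    })
    demo["isFraud"] = (demo["card_score"] + rng.normal(scale=0.5, size=n) > 1.5).astype(int)
    demo.loc[rng.choice(n, 50, replace=False), "TransactionAmt"] = np.nan  # inject missing values

    results = train_and_evaluate(demo)
    print(f"accuracy={results['accuracy']:.3f}  auc={results['auc']:.3f}")
    print(results["confusion_matrix"])
    print(results["report"])
```

The `roc_curve` and `pr_curve` tuples can be passed to matplotlib's `plot` to draw the two curves; scikit-learn's `RocCurveDisplay` and `PrecisionRecallDisplay` are an alternative.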
-
File:
Credit Card Fraud Project/Credit Card Fraud Presentation.pptx
-
Description: Presentation slides, shown to the club, explaining the project's findings and methodology.
- Discusses data visualizations, feature engineering (e.g., frequency encoding), and model accuracy improvements.
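Frequency encoding replaces each category with how often it occurs in the training data, turning a categorical column into a single numeric feature. A minimal pandas sketch, assuming a single categorical column; the function name `frequency_encode` and the choice to map unseen or missing categories to 0 are illustrative assumptions, not details from the slides.

```python
import pandas as pd


def frequency_encode(train_col: pd.Series, apply_col: pd.Series, normalize: bool = True) -> pd.Series:
    """Replace each category in apply_col with its count (or share) in train_col.

    Categories absent from train_col, and missing values, are mapped to 0.
    """
    # value_counts drops NaN/None by default, so missing values get no frequency.
    freq = train_col.value_counts(normalize=normalize)
    return apply_col.map(freq).fillna(0)


if __name__ == "__main__":
    devices = pd.Series(["desktop", "mobile", "desktop", "desktop", None])  # hypothetical values
    print(frequency_encode(devices, devices, normalize=False).tolist())
    # prints [3.0, 1.0, 3.0, 3.0, 0.0]
```

Fitting the frequencies on the training data and reusing them for test data keeps the encoding consistent between the two splits.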
-
File:
Workshops/Data Visualization Workshop/Data Visualization Workshop.ipynb
-
Description: Workshop on data visualization techniques in R and Python, co-led with Apoorva Hooda and Steven Ha.
- Covers ggplot2, matplotlib, and seaborn for creating effective visualizations.
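As a small illustration of the matplotlib side of this material, the sketch below draws a histogram of a numeric variable next to a bar chart of category counts. The random data and the function name `plot_overview` are hypothetical; seaborn (`histplot`, `countplot`) and ggplot2 (`geom_histogram`, `geom_bar`) offer higher-level equivalents.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_overview(values, categories):
    """Draw a histogram of a numeric variable next to a bar chart of category counts."""
    labels, counts = np.unique(categories, return_counts=True)

    fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

    ax_hist.hist(values, bins=20, edgecolor="black")
    ax_hist.set_title("Distribution of values")
    ax_hist.set_xlabel("value")
    ax_hist.set_ylabel("count")

    ax_bar.bar(labels, counts)
    ax_bar.set_title("Observations per category")
    ax_bar.set_xlabel("category")
    ax_bar.set_ylabel("count")

    fig.tight_layout()
    return fig


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fig = plot_overview(rng.normal(loc=50, scale=10, size=500), rng.choice(["A", "B", "C"], size=500))
    fig.savefig("overview.png", dpi=100)
```

Returning the `Figure` instead of calling `plt.show()` lets a notebook display it and a script save it.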
-
File:
Workshops/Data Visualization Workshop/Data Visualization Workshop.pptx
-
Description: Presentation slides emphasizing the importance of data visualization and demonstrating various plotting libraries.
-
File:
Workshops/ML Project Workshops/ML Project Workshop 1.ipynb
-
Description: First part of a machine learning project focused on fire detection, co-led with Apoorva Hooda and Steven Ha.
- Includes data loading, cleaning, and initial exploratory data analysis.
-
File:
Workshops/ML Project Workshops/ML Project Workshop 1.pptx
-
Description: Presentation slides covering project introduction, environment setup, and exploratory data analysis.
-
File:
Workshops/ML Project Workshops/ML Project Workshop 2.ipynb
-
Description: Second part of the fire detection machine learning project, continuing from Workshop 1. Co-led with Apoorva Hooda and Steven Ha.
- Covers splitting the data into training and test sets, comparing machine learning algorithms, and evaluating model accuracy.
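A train/test split followed by an accuracy comparison across several algorithms can be sketched with scikit-learn. The fire detection features are not described here, so this sketch uses `make_classification` as a synthetic stand-in, and the four models and their settings are illustrative choices rather than the ones used in the workshop.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, test_size=0.2, seed=42):
    """Fit several classifiers on one train/test split and return test accuracy for each."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    models = {
        # Scale-sensitive models get a StandardScaler that is fitted on the training data only.
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "k_nearest_neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    # Synthetic stand-in for the fire detection data (hypothetical).
    X, y = make_classification(n_samples=1000, n_features=8, n_informative=5, random_state=0)
    ranked = sorted(compare_models(X, y).items(), key=lambda kv: kv[1], reverse=True)
    for name, acc in ranked:
        print(f"{name:22s} {acc:.3f}")
```

Scoring every model on the same held-out split keeps the comparison fair; cross-validation would give a less noisy estimate at the cost of more training runs.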
-
File:
Workshops/ML Project Workshops/ML Project Workshop 2.pptx
-
Description: Presentation slides discussing types of machine learning algorithms, model evaluation metrics, and techniques for improving models.
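As a worked example of common evaluation metrics, the function below computes accuracy, precision, recall, and F1 from confusion-matrix counts using their standard definitions. The counts in the usage example are made up; they show how accuracy can look strong on imbalanced data while recall lags. Returning 0.0 when a denominator is zero is a convention chosen here.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: of the predicted positives, how many were correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the actual positives, how many were found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    metrics = classification_metrics(tp=40, fp=10, fn=20, tn=930)
    print({k: round(v, 3) for k, v in metrics.items()})
    # prints {'accuracy': 0.97, 'precision': 0.8, 'recall': 0.667, 'f1': 0.727}
```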
-
File:
Workshops/R Workshop/R Workshop.ipynb
-
Description: Introduction to R workshop, co-led with Cindy Chen and Aditya Seth.
- Covers R basics; data exploration, cleaning, and manipulation; and visualization with ggplot2.
-
File:
Workshops/R Workshop/R Workshop.pptx
-
Description: Presentation slides providing an overview of R, code walkthroughs, and an introduction to ggplot2.
-
File:
Workshops/SQL Workshops/SQL Workshop.pptx
-
Description: Introduction to SQL workshop, co-led with Cindy Chen and Aditya Seth.
- Covers setting up a SQL workspace, writing basic queries, accessing and modifying data, and using joins.
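The difference between an inner join and a left join can be shown with Python's built-in `sqlite3` module and an in-memory database, so the example needs no server. SQLite is chosen here only to keep the sketch self-contained and may not be the engine used in the workshop; the `members` and `attendance` tables and their rows are hypothetical.

```python
import sqlite3


def demo_joins():
    """Build two tiny in-memory tables and run an INNER JOIN and a LEFT JOIN."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE attendance (member_id INTEGER, workshop TEXT);
        INSERT INTO members VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
        INSERT INTO attendance VALUES (1, 'SQL'), (1, 'R'), (2, 'SQL');
        """
    )
    # INNER JOIN keeps only members with at least one matching attendance row.
    inner = conn.execute(
        """
        SELECT m.name, a.workshop
        FROM members AS m
        INNER JOIN attendance AS a ON a.member_id = m.id
        ORDER BY m.name, a.workshop
        """
    ).fetchall()
    # LEFT JOIN keeps every member; workshop is NULL (None in Python) when nothing matches.
    left = conn.execute(
        """
        SELECT m.name, a.workshop
        FROM members AS m
        LEFT JOIN attendance AS a ON a.member_id = m.id
        ORDER BY m.name, a.workshop
        """
    ).fetchall()
    conn.close()
    return inner, left


if __name__ == "__main__":
    inner, left = demo_joins()
    print(inner)  # [('Ana', 'R'), ('Ana', 'SQL'), ('Ben', 'SQL')]
    print(left)   # [('Ana', 'R'), ('Ana', 'SQL'), ('Ben', 'SQL'), ('Cy', None)]
```

Cy has no attendance rows, so Cy disappears from the inner join but survives the left join with a NULL workshop.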
-
File:
Workshops/SQL Workshops/SQL Workshop 2.pptx
-
Description: Advanced SQL workshop, co-led with Apoorva Hooda and Steven Ha.
- Reviews SQL basics, continues the exploration of joins, and introduces analytic (window) functions.
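Analytic (window) functions compute a value for each row from a set of related rows defined by an `OVER` clause. Unlike `GROUP BY`, they keep every input row. The sketch below uses Python's `sqlite3` module (window functions require SQLite 3.25 or later) with a hypothetical `sales` table to rank rows within each group and compute a running total. The workshop's own examples and SQL engine may differ.

```python
import sqlite3


def demo_window_functions():
    """Rank rows within groups and compute a running total with SQL window functions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER);
        INSERT INTO sales VALUES
            ('East', 1, 100), ('East', 2, 150), ('East', 3, 120),
            ('West', 1, 200), ('West', 2, 180);
        """
    )
    rows = conn.execute(
        """
        SELECT region,
               month,
               amount,
               -- rank months within each region by amount, highest first
               RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank,
               -- cumulative sum of amount within each region, in month order
               SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
        FROM sales
        ORDER BY region, month
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo_window_functions():
        print(row)
    # ('East', 1, 100, 3, 100)
    # ('East', 2, 150, 1, 250)
    # ('East', 3, 120, 2, 370)
    # ('West', 1, 200, 1, 200)
    # ('West', 2, 180, 2, 380)
```

`PARTITION BY` restarts the calculation for each region, and the `ORDER BY` inside `OVER` determines both the ranking order and the order in which the running total accumulates.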