simtsc/categorical-data-in-python

Categorical Data Analysis in Python

Chapter 1 - Understand categorical data

  • Lesson 1.1 - Introduction to variables
    • A learning objective: Learn about variable types: categorical (nominal, ordinal), interval, ratio.
  • Lesson 1.2 - Summary Statistics
    • A learning objective: Frequencies, proportions, data types
  • Lesson 1.3 - Visual exploration
    • A learning objective: Visualize data via bar chart, pairplot (seaborn).
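
The frequency and proportion summaries from Lesson 1.2 can be sketched with pandas as follows. This is a minimal illustration, not the course's own material: the toy data and the column names `size` and `color` are made up for the example.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "size": ["small", "large", "medium", "small", "small", "medium"],
    "color": ["red", "blue", "red", "red", "blue", "blue"],
})

# Ordinal variable: categories carry an explicit order.
df["size"] = pd.Categorical(
    df["size"], categories=["small", "medium", "large"], ordered=True
)
# Nominal variable: categories without any order.
df["color"] = df["color"].astype("category")

# Absolute frequencies and proportions, listed in category order.
freq = df["size"].value_counts().sort_index()
prop = df["size"].value_counts(normalize=True).sort_index()

print(df.dtypes)
print(freq)
print(prop)
```

Calling `freq.plot(kind="bar")` would turn the same counts into the bar chart mentioned in Lesson 1.3, provided matplotlib is installed.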

Chapter 2 - Taking a closer look

  • Lesson 2.1 - Contingency tables
    • A learning objective: create a contingency table in pandas, collapse larger groups into smaller (['baby', 'toddler', 'child', 'adolescent', 'young adult', 'adult', 'senior'] -> ['young', 'old'])
  • Lesson 2.2 - Measures of Agreement
    • A learning objective: Cohen's Kappa; Use statsmodels.stats.inter_rater.cohens_kappa or implement function
  • Lesson 2.3 - Correlation
    • A learning objective: Use the Point-Biserial Correlation Coefficient to measure the relationship between a binary categorical variable and a numerical variable, and the Phi Correlation Coefficient to measure the relationship between two binary categorical variables. Use Spearman's rank-order coefficient and Kendall's Tau for ordinal variables. Use scipy.stats.pointbiserialr, scipy.stats.spearmanr, scipy.stats.kendalltau. For Phi either create a function or use sklearn.metrics.matthews_corrcoef.
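
The techniques in this chapter can be sketched in a few lines. The example below is only an illustration under stated assumptions: the toy data is invented, the split of age groups into 'young' and 'old' is a guess (the outline does not say where the cut lies), and `cohens_kappa` is a hand-written helper for the "implement function" option, not the statsmodels function. For ordinal data it uses `scipy.stats.spearmanr` as the rank-order coefficient alongside `scipy.stats.kendalltau`.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy data; names and values are illustrative only.
df = pd.DataFrame({
    "age_group": ["baby", "toddler", "child", "adolescent",
                  "young adult", "adult", "senior", "adult"],
    "likes_music": ["yes", "yes", "no", "yes", "no", "no", "yes", "no"],
})

# Assumed split: which groups count as 'young' is our choice, not the source's.
young = {"baby", "toddler", "child", "adolescent", "young adult"}
df["age_binary"] = df["age_group"].map(lambda g: "young" if g in young else "old")

# Lesson 2.1: contingency table of the collapsed groups.
contingency = pd.crosstab(df["age_binary"], df["likes_music"])


def cohens_kappa(rater_a, rater_b):
    """Hand-written Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    a, b = pd.Series(rater_a), pd.Series(rater_b)
    labels = sorted(set(a) | set(b))
    # Square agreement table, including labels only one rater used.
    table = pd.crosstab(a, b).reindex(index=labels, columns=labels, fill_value=0)
    counts = table.to_numpy()
    n = counts.sum()
    p_observed = np.trace(counts) / n
    p_expected = (counts.sum(axis=1) * counts.sum(axis=0)).sum() / n**2
    return (p_observed - p_expected) / (1 - p_expected)


# Lesson 2.2: agreement between two hypothetical raters.
kappa = cohens_kappa(["y", "y", "n", "n"], ["y", "n", "n", "n"])

# Lesson 2.3: binary vs. numerical, then two ordinal rankings.
r_pb, p_pb = stats.pointbiserialr([0, 0, 0, 1, 1, 1], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
tau, p_tau = stats.kendalltau([1, 2, 3, 4, 5], [1, 3, 2, 4, 5])
rho, p_rho = stats.spearmanr([1, 2, 3, 4, 5], [1, 3, 2, 4, 5])

print(contingency)
print(kappa, r_pb, tau, rho)
```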

Chapter 3 - Hypothesis testing

  • Lesson 3.1 - Chi-Square Distribution/ Pearson's Chi-Square Test
    • A learning objective: Learn about the distribution, calculate critical values, perform 3 flavours of Chi-Square tests: test for independence, test for equality of proportions, test of goodness of fit; use scipy.stats.chisquare and scipy.stats.chi2_contingency
  • Lesson 3.2 - Fisher's Exact Test
    • A learning objective: use scipy.stats.fisher_exact
  • Lesson 3.3 - ANOVA
    • A learning objective: use scipy.stats.f_oneway
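
A minimal sketch of the SciPy calls named in this chapter follows. All counts and measurements are invented for illustration, and the 5% significance level is an assumption of the example rather than something the outline prescribes.

```python
from scipy import stats

# Critical value of the chi-square distribution (1 degree of freedom, alpha = 0.05).
critical = stats.chi2.ppf(0.95, df=1)

# Test for independence on a hypothetical 2x2 contingency table.
table = [[10, 20], [20, 10]]
chi2_stat, chi2_p, dof, expected = stats.chi2_contingency(table)

# Goodness of fit: observed counts against equal expected frequencies (the default).
gof_stat, gof_p = stats.chisquare([18, 22, 20])

# Fisher's exact test, usually preferred when cell counts are small.
odds_ratio, fisher_p = stats.fisher_exact([[8, 2], [1, 5]])

# One-way ANOVA: numerical outcome compared across three categorical groups.
f_stat, anova_p = stats.f_oneway([1, 2, 3], [2, 3, 4], [5, 6, 7])

print(critical, chi2_stat, chi2_p, dof)
print(gof_stat, gof_p)
print(odds_ratio, fisher_p)
print(f_stat, anova_p)
```

Note that `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default; pass `correction=False` to obtain the uncorrected statistic.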

Chapter 4 - Use case Simpson's Paradox

  • Lesson 4.1 - Problem description
    • A learning objective: Get data from CSV, take a quick look at the data, create categories
  • Lesson 4.2 - Understand and test data
    • A learning objective: Test for correlation and significance, combine several groups, create visualizations
  • Lesson 4.3 - Draw conclusion
    • A learning objective: Observe and understand Simpson's paradox: reversal of trend in combined group vs. looking at groups individually
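
The reversal described in Lesson 4.3 can be reproduced with a handful of numbers. The counts below are invented purely to trigger the effect and are not the course's CSV data; the names `group`, `treatment`, `successes` and `trials` are hypothetical.

```python
import pandas as pd

# Hypothetical counts, chosen so that the paradox appears.
df = pd.DataFrame({
    "group": ["G1", "G1", "G2", "G2"],
    "treatment": ["A", "B", "A", "B"],
    "successes": [9, 80, 30, 2],
    "trials": [10, 100, 100, 10],
})

# Success rate of each treatment within each group.
per_group = (
    df.assign(rate=df["successes"] / df["trials"])
      .pivot(index="group", columns="treatment", values="rate")
)

# Success rate of each treatment after combining the groups.
totals = df.groupby("treatment")[["successes", "trials"]].sum()
overall = totals["successes"] / totals["trials"]

print(per_group)
print(overall)
```

Treatment A has the higher rate inside both G1 and G2, yet B has the higher rate once the groups are pooled, because A was mostly applied in the group with the low baseline rate.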
