Course Projects

Course projects provide hands-on experience applying the methods and tools covered in this course to real-world problems. Students work on either replication projects or Kaggle competitions, developing practical skills in causal inference, data analysis, and reproducible research. All projects use the repository template, which provides the standard project structure, GitHub Actions configuration, and setup instructions.

Project Types

Replication Projects: Students reproduce key results from published research articles, critically assess the quality of the original analysis, and contribute independent extensions such as robustness checks or alternative specifications.

Kaggle Projects: Students participate in data science competitions on Kaggle, document the competition context and evaluation metrics, and develop solution strategies through iterative experimentation and parameter tuning.

Example Project

Lindo et al. (2010): Academic Probation and Student Outcomes

In this replication project, Annica Gehlen reproduces Lindo et al. [LSO10], who use a regression discontinuity design to estimate the effects of academic probation on student outcomes. The analysis shows how negative incentives influence performance and dropout decisions at a large Canadian university.

Frequently Asked Questions

Why are the projects public? Transparency and reproducibility are core values in research, and we want to learn from each other.

What does the typical workflow look like? See the GitHub Workflow guide for the standard workflow from initial setup through submission, including branching, pull requests, and reproducibility requirements.

What is the scope of replication? For replication projects, focus on reproducing the core results and main findings of the original paper. You do not need to replicate every table, figure, or robustness check; prioritize the central analyses that support the paper’s key conclusions.

Where can I find research data? Some journals offer data supplements on their websites. Useful compilations include the Harvard Dataverse, MDRC Public-Use Files, UC Irvine Machine Learning Repository, and Google Dataset Search.

Do we get to present our projects at the end of the course? Yes. If you would like to present your project and receive feedback, make sure to reach out.