Reproducibility gets discussed like a virtue statement, but most teams encounter it as a practical problem: can anyone rerun this project after the analyst is away, the data are updated, or a reviewer asks for one more table? If the answer is no, the project is carrying hidden risk.

What makes a project reproducible is rarely one advanced tool. It is a sequence of disciplined choices about project structure, script design, validation, documentation, and release habits that reduce that risk before it becomes expensive.

1. Freeze the Project Structure Early

The first step is to define a stable folder structure and keep it stable:

  • raw_data/
  • data/clean/
  • scripts/
  • outputs/
  • docs/

This is not only about tidiness. It is about making dependencies legible. Another analyst should be able to tell where data enter the workflow, where transformations happen, where outputs are produced, and where documentation sits.

The most important rule is to leave raw data untouched. All cleaning, merging, and variable construction should happen through scripts.
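This separation can be enforced in the scripts themselves: read from the raw folder, write only to the clean folder. A minimal sketch, assuming a hypothetical `survey.csv` file and a trivial cleaning rule; the file names and rule are placeholders, not a prescribed layout:

```python
import csv
from pathlib import Path

RAW = Path("raw_data")      # treated as read-only: no script writes here
CLEAN = Path("data/clean")  # every derived file lands here instead

def clean_survey(raw_name="survey.csv", clean_name="survey_clean.csv"):
    """Read a raw file, apply cleaning rules, and write the result to
    data/clean/ so the original is never modified."""
    CLEAN.mkdir(parents=True, exist_ok=True)
    with open(RAW / raw_name, newline="") as f:
        rows = list(csv.DictReader(f))
    # Example rule: trim whitespace and drop rows with no ID.
    cleaned = [{k: v.strip() for k, v in r.items()}
               for r in rows if r.get("id", "").strip()]
    if not cleaned:
        raise ValueError("no usable rows after cleaning")
    with open(CLEAN / clean_name, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(cleaned[0]))
        writer.writeheader()
        writer.writerows(cleaned)
    return CLEAN / clean_name
```

Because the raw file is opened only for reading, rerunning the script is always safe: the same input produces the same clean output.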

2. Make the Pipeline Executable End to End

A reproducible project should have a clear script sequence from input to output. A typical structure might be:

  1. ingest and inspect raw data
  2. clean and standardize modules
  3. construct the analysis dataset
  4. estimate models
  5. export tables and figures

It should be possible to rerun this chain from a clean environment without hidden manual steps. If an analyst has to remember a spreadsheet edit, a copied file, or a machine-specific adjustment that is not documented, the project is not yet reproducible.
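One way to make the chain explicit is a small runner that executes each stage in order, so no step lives only in someone's memory. A sketch, assuming hypothetical script names under `scripts/`:

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical stage names; substitute the scripts your project actually uses.
PIPELINE = [
    "scripts/01_clean.py",
    "scripts/02_construct.py",
    "scripts/03_analysis.py",
    "scripts/04_export.py",
]

def run_pipeline():
    """Run every stage in order; a missing file or a failing stage
    stops the chain immediately instead of producing stale outputs."""
    for script in PIPELINE:
        if not Path(script).exists():
            raise FileNotFoundError(f"missing stage: {script}")
        print(f"running {script}")
        subprocess.run([sys.executable, script], check=True)
```

Calling `run_pipeline()` from a clean checkout is then the practical test: if it fails, some manual step has not yet been scripted.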

3. Keep Scripts Modular and Named for Their Purpose

Large monolithic scripts are hard to review and harder to debug. Smaller scripts are easier to inspect because each one has a narrow purpose.

A simple naming pattern helps:

  • 01_clean
  • 02_construct
  • 03_analysis
  • 04_export

Modularity also reduces risk during revision. If a bug is found in a merge step, the analyst can revise the construction script without unnecessarily disturbing the export logic or the modeling code.

4. Add Validation Gates Before Modeling

Projects become more reproducible when data problems are caught at the same point every time. That means building validation checks into the workflow before estimation begins.

Useful gates often include:

  • duplicate ID checks
  • missingness summaries
  • invalid range checks
  • merge diagnostics
  • balance or composition checks for key samples

These checks should be scripted, not remembered. A good reproducible workflow does not rely on an analyst noticing problems informally each time.
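A few of these gates can be sketched as one function that fails loudly. The column names, value ranges, and thresholds here are placeholders to be replaced with project-specific rules:

```python
def validate(rows, id_col="id", value_col="age", lo=0, hi=120):
    """Scripted validation gates: raise on problems instead of relying
    on an analyst noticing them informally."""
    # Gate 1: duplicate ID check.
    ids = [r[id_col] for r in rows]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        raise ValueError(f"duplicate IDs: {duplicates}")

    # Gate 2: missingness summary (reported, not fatal).
    missing = sum(1 for r in rows if r.get(value_col) in (None, ""))
    print(f"{value_col}: {missing}/{len(rows)} missing")

    # Gate 3: invalid range check on non-missing values.
    out_of_range = [r[id_col] for r in rows
                    if r.get(value_col) not in (None, "")
                    and not lo <= float(r[value_col]) <= hi]
    if out_of_range:
        raise ValueError(f"{value_col} outside [{lo}, {hi}] for IDs: {out_of_range}")
```

Because the gates run at the same point in every rerun, a data problem surfaces the first time anyone executes the pipeline, not weeks later in a reviewer's comment.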

5. Version Control Code, Metadata, and Decisions

Git helps not only because it stores code history, but because it makes change visible. Reproducibility improves when code, documentation, and metadata are versioned together.

At minimum, version control should include:

  • scripts
  • codebooks and variable maps
  • output templates
  • readme files
  • notes on major analytical decisions

Restricted or identifying data should not be committed, but the logic used to produce results absolutely should be.

6. Document Assumptions, Not Just Commands

A reproducible project is still hard to interpret if no one knows why key variables were defined a certain way or why specific exclusions were applied. Documentation should therefore record assumptions, not just file paths.

For important results, note:

  • outcome definitions
  • key explanatory variables
  • exclusion rules
  • sample restrictions
  • specification logic where relevant

This level of documentation is especially important in collaborative projects. Another analyst may be able to rerun the code without understanding the reasoning behind it; reproducibility is strongest when both are possible.
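One lightweight way to keep assumptions with the code is a small machine-readable record that every run can print. The entries below are invented examples; the point is the habit, not the format:

```python
# Hypothetical entries illustrating the kinds of assumptions worth recording;
# real projects would substitute their own definitions.
ANALYSIS_NOTES = {
    "outcome": "employed: 1 if respondent reported any paid work last week",
    "key_regressor": "training: participated in the 2023 program",
    "exclusions": "dropped respondents under 18 and duplicate interviews",
    "sample": "restricted to regions with complete survey coverage",
    "specification": "OLS with region and month fixed effects",
}

def describe(notes=ANALYSIS_NOTES):
    """Print the recorded assumptions so they appear in every run log."""
    for key, value in notes.items():
        print(f"{key}: {value}")
```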

7. Freeze Outputs for Each Release

Projects often become unstable because outputs keep changing without clear release discipline. A reproducible workflow should make it possible to identify which scripts produced which tables or figures at a given stage.

This means:

  • naming outputs systematically
  • separating draft outputs from release-ready outputs
  • documenting when a release was generated
  • avoiding silent overwrites of externally shared deliverables

Freezing outputs does not mean they can never change. It means that changes are tracked rather than silently replacing earlier versions.
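A release step can enforce this discipline directly: copy draft outputs into a tagged folder and refuse to overwrite anything already released. A sketch, assuming an `outputs/draft` to `outputs/release` layout as in the structure above:

```python
import shutil
from datetime import date
from pathlib import Path

def release(src, tag=None):
    """Copy a draft output into a dated release folder, refusing to
    silently overwrite anything already shared externally."""
    tag = tag or date.today().isoformat()  # default tag: release date
    dest = Path("outputs/release") / tag / Path(src).name
    if dest.exists():
        raise FileExistsError(f"{dest} already released; use a new tag")
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return dest
```

Each tag then answers the question "which scripts produced this table, and when": the release folder is frozen, and any change requires a new tag.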

8. Write Short but Complete Project Documentation

A strong README.md does not need to be long. It needs to answer practical questions:

  • What is the project about?
  • What files are required to run it?
  • In what order should scripts run?
  • What outputs should appear if the pipeline succeeds?
  • Who maintains the project?

Short documentation that is accurate is far more useful than longer documentation that drifts out of date.

9. Build for Handover, Not Only for Yourself

One of the best tests of reproducibility is whether another analyst could take over the project next month. That requires removing or centralizing local assumptions:

  • no hard-coded personal file paths
  • no undocumented temporary workarounds
  • no ambiguous script order
  • no hidden machine-specific dependencies
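Hard-coded personal paths are the most common offender, and the fix is mechanical: resolve everything from a single root. A sketch, where the `PROJECT_ROOT` environment variable is an assumed convention rather than a standard:

```python
import os
from pathlib import Path

# One environment variable per machine replaces personal paths like
# C:/Users/alice/project scattered through the scripts. Falls back to
# the current directory so a fresh checkout still runs.
ROOT = Path(os.environ.get("PROJECT_ROOT", ".")).resolve()

def project_path(*parts):
    """Build every path from the shared root so scripts run unchanged
    on any machine."""
    return ROOT.joinpath(*parts)
```

A new analyst then sets one variable instead of editing paths in every script.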

A project that only its creator can run is not reproducible in the practical sense that matters for research teams.

A Minimal Checklist Before Calling a Project Reproducible

Before closing a project or circulating results, ask:

  1. Can the main output be regenerated from scripts alone?
  2. Are raw data left untouched?
  3. Are cleaning and construction rules explicit?
  4. Are validation gates scripted?
  5. Are documentation and code versioned together?
  6. Could another analyst understand the project structure quickly?

If several of these answers are no, the project may still be analyzable, but it is not yet robust enough for easy review or handover.

The Real Benchmark

The strongest reproducibility standard is simple: another researcher should be able to regenerate the main table or figure, understand the key assumptions, and trace how the final sample was built without needing undocumented help from the original analyst.

When that happens, reproducibility stops being a slogan and becomes a working standard for quality control, collaboration, and credible reporting.