Reproducibility is a quality standard, not an optional extra. A reproducible workflow lets you rerun the full project, verify results, and update outputs without manual reconstruction.

This checklist is designed for applied research projects using survey, administrative, or mixed data sources.

1. Lock the Project Structure

Create a clear folder layout at the start:

  • raw_data/
  • data/clean/
  • scripts/
  • outputs/
  • docs/

Never edit raw files directly. Treat them as read-only, and make every transformation in a script so it can be rerun and reviewed.
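The layout above can be created by a short setup script. What follows is a minimal sketch, not a prescribed tool: the function names create_layout and protect_raw_files are hypothetical, and removing write permission is only one possible way to discourage accidental edits to raw files (on Windows, chmod only toggles the read-only flag).

```python
import stat
from pathlib import Path

# Folder names taken from the checklist above.
PROJECT_DIRS = ["raw_data", "data/clean", "scripts", "outputs", "docs"]


def create_layout(root):
    """Create the project folders under root; existing folders are left alone."""
    root = Path(root)
    created = []
    for rel in PROJECT_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def protect_raw_files(root):
    """Remove write permission from every file under raw_data/ (hypothetical helper)."""
    raw_dir = Path(root) / "raw_data"
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    protected = []
    for path in sorted(raw_dir.rglob("*")):
        if path.is_file():
            path.chmod(path.stat().st_mode & ~write_bits)
            protected.append(path)
    return protected
```

Calling create_layout(".") once at project start and protect_raw_files(".") after each raw data delivery would be one way to apply it.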

2. Make the Pipeline Executable End-to-End

Your project should run in a predictable sequence:

  1. ingest and clean
  2. construct analysis dataset
  3. estimate models
  4. export tables and figures

A single command, or one clearly documented sequence of commands, should rebuild every key output from the raw data.
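One way to get a single entry point is a small runner that executes the step scripts in order and stops at the first failure. This is a sketch under assumptions: it supposes the steps are Python files with two-digit prefixes (for example 01_clean.py) in one folder, and the names find_steps and run_pipeline are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def find_steps(scripts_dir):
    """Return step scripts named like 01_clean.py, sorted into execution order."""
    return sorted(Path(scripts_dir).glob("[0-9][0-9]_*.py"))


def run_pipeline(scripts_dir):
    """Run each step with the current Python interpreter; stop on the first failure."""
    completed = []
    for script in find_steps(scripts_dir):
        # check=True raises CalledProcessError if the script exits with a non-zero code,
        # so later steps never run on top of a failed earlier step.
        subprocess.run([sys.executable, str(script)], check=True)
        completed.append(script.name)
    return completed
```

Files without a numeric prefix (shared helper modules, for instance) are ignored by the glob pattern, so only the pipeline steps are executed.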

3. Keep Steps Modular and Named Clearly

Use numbered scripts for readability and debugging:

  • 01_clean
  • 02_construct
  • 03_analysis
  • 04_export

Small modular scripts are easier to review and less fragile than one large script.

4. Version Control Code and Metadata

Track in Git:

  • scripts
  • codebooks and variable maps
  • output templates
  • documentation

Do not commit restricted data or direct identifiers.

5. Record Assumptions Explicitly

For each major result, document:

  • outcome definition
  • key explanatory variables
  • exclusion rules
  • model specification rationale

If assumptions are not documented, interpretation becomes hard to defend.

6. Add Validation Gates Before Modeling

Run checks before estimation:

  • missingness summaries
  • duplicate IDs
  • invalid ranges
  • merge diagnostics
  • balance checks on key covariates

Early validation catches structural problems that can invalidate later results.
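Several of these checks can be expressed as a function that returns a list of problems, so the pipeline can refuse to continue when the list is not empty. The sketch below uses only the standard library and assumes the data is a list of dictionaries; validate_rows, merge_diagnostics, and the field names in the usage example are hypothetical. Balance checks are left out because they depend on the study design.

```python
from collections import Counter


def validate_rows(rows, id_field, required_fields, ranges):
    """Return human-readable problems; an empty list means the gate passes."""
    problems = []

    # Duplicate IDs: any identifier that appears more than once.
    counts = Counter(row.get(id_field) for row in rows)
    duplicates = sorted(str(key) for key, n in counts.items() if n > 1)
    if duplicates:
        problems.append(f"duplicate {id_field}: {', '.join(duplicates)}")

    # Missingness: count None or empty-string values per required field.
    for field in required_fields:
        n_missing = sum(1 for row in rows if row.get(field) in (None, ""))
        if n_missing:
            problems.append(f"{field}: {n_missing} missing of {len(rows)}")

    # Invalid ranges: non-missing values outside the inclusive [low, high] interval.
    for field, (low, high) in ranges.items():
        n_bad = sum(
            1
            for row in rows
            if row.get(field) not in (None, "") and not (low <= row[field] <= high)
        )
        if n_bad:
            problems.append(f"{field}: {n_bad} outside [{low}, {high}]")

    return problems


def merge_diagnostics(left_ids, right_ids):
    """Count IDs that match across two sources and IDs found in only one of them."""
    left, right = set(left_ids), set(right_ids)
    return {
        "matched": len(left & right),
        "left_only": len(left - right),
        "right_only": len(right - left),
    }
```

A possible use before estimation, with made-up records:

```python
rows = [
    {"id": 1, "age": 34, "income": 50000},
    {"id": 2, "age": -5, "income": None},
    {"id": 2, "age": 40, "income": 42000},
]
problems = validate_rows(rows, "id", ["age", "income"], {"age": (0, 120)})
if problems:
    print("Validation failed:", *problems, sep="\n  ")
```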

7. Freeze Deliverables for Each Release

When producing a final version:

  • tag the commit
  • archive exact output files
  • log software versions and package dependencies
  • record run date and environment notes

This prevents “drift” between draft and final outputs.
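Part of this freeze can be automated by writing a small manifest next to the archived outputs. The sketch below is one possible shape, assuming outputs live in a single folder; build_manifest and the manifest keys are hypothetical, and tagging the commit remains a separate step in Git.

```python
import hashlib
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large outputs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(outputs_dir, packages):
    """Collect run date, environment details, package versions, and output hashes."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment

    outputs_dir = Path(outputs_dir)
    files = {
        path.relative_to(outputs_dir).as_posix(): sha256_of(path)
        for path in sorted(outputs_dir.rglob("*"))
        if path.is_file()
    }
    return {
        "run_date_utc": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
        "outputs": files,
    }
```

Saving the returned dictionary with json.dump alongside the release makes it possible to confirm later that an output file is byte-for-byte the one that was delivered.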

8. Write Minimal but Complete Documentation

A strong README.md should explain:

  • project objective
  • data inputs
  • run instructions
  • expected outputs
  • contact point or maintainer

Short, accurate documentation is better than long, outdated documentation.

9. Build for Handover

Assume someone else must run your code next month:

  • remove local machine-specific paths
  • avoid hidden manual steps
  • keep parameter choices centralized

A project is not fully reproducible until another person can run it successfully.
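The first and third points can be handled together by one configuration object that every script imports. This is a sketch only: ProjectConfig, load_config, the PROJECT_ROOT environment variable, and the example parameters (min_age, random_seed) are all hypothetical names chosen for illustration.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProjectConfig:
    """Single place for paths and analysis parameters (values are placeholders)."""

    root: Path
    min_age: int = 18            # hypothetical exclusion rule
    random_seed: int = 20240101  # hypothetical seed for any resampling step

    @property
    def raw_dir(self) -> Path:
        return self.root / "raw_data"

    @property
    def clean_dir(self) -> Path:
        return self.root / "data" / "clean"

    @property
    def outputs_dir(self) -> Path:
        return self.root / "outputs"


def load_config(env=None) -> ProjectConfig:
    """Resolve the project root from PROJECT_ROOT, falling back to the working directory."""
    env = os.environ if env is None else env
    root = Path(env.get("PROJECT_ROOT", Path.cwd()))
    return ProjectConfig(root=root)
```

Because every path is derived from root, no script needs a hard-coded location such as a personal home directory, and frozen=True prevents a step from silently changing a parameter mid-run.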

Final Test

Ask one direct question: can the main table or figure be regenerated from scripts and documentation alone? If not, the workflow needs another revision.

Reproducibility improves accuracy, collaboration, and policy credibility.