Reproducibility is a quality standard, not an optional extra. A reproducible workflow lets you rerun the full project, verify results, and update outputs without manual reconstruction.
This checklist is designed for applied research projects using survey, administrative, or mixed data sources.
1. Lock the Project Structure
Create a clear folder layout at the start:
- raw_data/
- data/clean/
- scripts/
- outputs/
- docs/
Never edit raw files directly. All transformations should happen through scripts.
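As a minimal sketch, the layout can be created by a short script, and raw files can be made read-only so that accidental edits fail loudly. The folder names below are one reading of the layout above, and the function names `create_layout` and `protect_raw_files` are hypothetical, not part of any library.

```python
import stat
from pathlib import Path

# Assumed folder names, based on the layout listed above; adjust to your project.
FOLDERS = ["raw_data", "data/clean", "scripts", "outputs", "docs"]


def create_layout(root: Path) -> list[Path]:
    """Create the project folders under root and return their paths."""
    created = []
    for name in FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def protect_raw_files(raw_dir: Path) -> int:
    """Remove write permission from every file under raw_dir; return the file count."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    count = 0
    for path in raw_dir.rglob("*"):
        if path.is_file():
            path.chmod(path.stat().st_mode & ~write_bits)
            count += 1
    return count
```

Running `protect_raw_files` once after the raw data arrives is usually enough. Read-only permissions are a guard against accidents rather than a security control.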
2. Make the Pipeline Executable End-to-End
Your project should run in a predictable sequence:
- ingest and clean
- construct analysis dataset
- estimate models
- export tables and figures
One command (or one clearly documented sequence) should rebuild key outputs.
3. Keep Steps Modular and Named Clearly
Use numbered scripts for readability and debugging:
- 01_clean
- 02_construct
- 03_analysis
- 04_export
Small modular scripts are easier to review and less fragile than one large script.
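One way to combine the single-command rebuild with numbered scripts is a small runner that executes each stage in order and stops at the first failure. This is a sketch under assumptions: the `.py` extension, the `STAGES` list, and the function names are illustrative, and a project written in R or Stata would call a different interpreter.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical file names: the stems follow the numbered steps above,
# the .py extension is an assumption.
STAGES = ["01_clean.py", "02_construct.py", "03_analysis.py", "04_export.py"]


def build_commands(scripts_dir: Path, stages=STAGES) -> list[list[str]]:
    """Return one interpreter command per stage, in run order."""
    return [[sys.executable, str(scripts_dir / name)] for name in stages]


def run_pipeline(scripts_dir: Path, stages=STAGES) -> None:
    """Run each stage in order; a failing stage aborts the rest."""
    for command in build_commands(scripts_dir, stages):
        print("Running", command[-1])
        # check=True raises CalledProcessError when a stage exits non-zero,
        # so later stages never run on incomplete inputs.
        subprocess.run(command, check=True)
```

Calling `run_pipeline(Path("scripts"))` from a top-level script then serves as the one documented command. Passing a shorter `stages` list reruns only part of the sequence while debugging.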
4. Version Control Code and Metadata
Track in Git:
- scripts
- codebooks and variable maps
- output templates
- documentation
Do not commit restricted data or direct identifiers.
5. Record Assumptions Explicitly
For each major result, document:
- outcome definition
- key explanatory variables
- exclusion rules
- model specification rationale
If assumptions are not documented, interpretation becomes hard to defend.
6. Add Validation Gates Before Modeling
Run checks before estimation:
- missingness summaries
- duplicate IDs
- invalid ranges
- merge diagnostics
- key balance checks
Early validation catches structural problems that can invalidate later results.
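Several of the checks above can be sketched in plain Python over a list of row dictionaries. The function names, the column names passed in, and the thresholds are all illustrative assumptions. Balance checks are omitted because they depend on the study design.

```python
from collections import Counter


def missing_share(rows, column):
    """Share of rows where column is absent or None."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row.get(column) is None) / len(rows)


def duplicate_ids(rows, id_column):
    """Sorted list of ID values that appear more than once."""
    counts = Counter(row[id_column] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def out_of_range(rows, column, low, high):
    """Rows whose non-missing value falls outside [low, high]."""
    return [
        row for row in rows
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]


def unmatched_keys(left_rows, right_rows, key):
    """Merge diagnostic: key values present on only one side."""
    left = {row[key] for row in left_rows}
    right = {row[key] for row in right_rows}
    return {"left_only": sorted(left - right), "right_only": sorted(right - left)}


def run_gates(rows, id_column, ranges, max_missing):
    """Collect failure messages; an empty list means every gate passed."""
    failures = []
    duplicates = duplicate_ids(rows, id_column)
    if duplicates:
        failures.append(f"duplicate IDs: {duplicates}")
    for column, (low, high) in ranges.items():
        share = missing_share(rows, column)
        if share > max_missing:
            failures.append(f"{column}: {share:.0%} missing")
        if out_of_range(rows, column, low, high):
            failures.append(f"{column}: values outside [{low}, {high}]")
    return failures
```

A modeling script can call `run_gates` first and refuse to continue when the returned list is not empty, which turns the checklist into an enforced gate rather than a manual habit.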
7. Freeze Deliverables for Each Release
When producing a final version:
- tag the commit
- archive exact output files
- log software versions and package dependencies
- record run date and environment notes
This prevents “drift” between draft and final outputs.
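Logging versions and run details can be automated. The sketch below assumes a Python-based project and writes a small JSON record; the function names and the output file name are hypothetical, and the package list is whatever your project actually depends on.

```python
import json
import platform
from datetime import datetime, timezone
from importlib import metadata


def build_run_log(packages):
    """Collect run date, interpreter, platform, and installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap instead of failing, so a missing package is visible.
            versions[name] = None
    return {
        "run_date_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


def write_run_log(path, packages):
    """Write the run log as JSON and return it."""
    log = build_run_log(packages)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(log, handle, indent=2, sort_keys=True)
    return log
```

For example, `write_run_log("outputs/run_log.json", ["pandas", "statsmodels"])` could be the last step of a release run, with the resulting file archived alongside the tagged outputs.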
8. Write Minimal but Complete Documentation
A strong README.md should explain:
- project objective
- data inputs
- run instructions
- expected outputs
- contact point or maintainer
Short, accurate documentation is better than long, outdated documentation.
9. Build for Handover
Assume someone else must run your code next month:
- remove local machine-specific paths
- avoid hidden manual steps
- keep parameter choices centralized
A project is not fully reproducible until another person can run it successfully.
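A small configuration module is one way to avoid machine-specific paths and scattered parameters. In this sketch, the `PROJECT_ROOT` environment variable, the `ProjectConfig` class, the folder names, and the two example parameters are all assumptions chosen for illustration.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProjectConfig:
    """Single place for paths and analysis parameters."""
    root: Path
    # Hypothetical parameters; replace with the choices your analysis makes.
    min_age: int = 18
    random_seed: int = 20240101

    @property
    def raw_dir(self) -> Path:
        return self.root / "raw_data"

    @property
    def clean_dir(self) -> Path:
        return self.root / "data" / "clean"

    @property
    def output_dir(self) -> Path:
        return self.root / "outputs"


def load_config() -> ProjectConfig:
    """Resolve the project root from PROJECT_ROOT, else the working directory."""
    root = Path(os.environ.get("PROJECT_ROOT", Path.cwd()))
    return ProjectConfig(root=root.resolve())
```

Every script then imports `load_config()` instead of hard-coding a path such as a home directory. The frozen dataclass makes it an error to change a parameter partway through a run.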
Final Test
Ask one direct question: can the main table or figure be regenerated from scripts and documentation alone? If not, the workflow needs another revision.
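A stricter, optional version of this test compares a fresh rebuild against the archived release file by file. The sketch below uses SHA-256 checksums; the function names are hypothetical. Byte-identical output is a stronger requirement than the question above, and some formats (for example figures that embed a timestamp) may differ even when the results are the same.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_outputs(archived_dir: Path, rebuilt_dir: Path) -> dict:
    """Map each archived file that is missing or changed in the rebuild to its status."""
    problems = {}
    for archived in sorted(archived_dir.rglob("*")):
        if not archived.is_file():
            continue
        relative = archived.relative_to(archived_dir)
        rebuilt = rebuilt_dir / relative
        if not rebuilt.is_file():
            problems[relative.as_posix()] = "missing"
        elif file_sha256(rebuilt) != file_sha256(archived):
            problems[relative.as_posix()] = "changed"
    return problems
```

An empty dictionary means every archived file was regenerated exactly. Any entry is a prompt to investigate, not proof of an error.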
Reproducibility improves accuracy, collaboration, and policy credibility.