A Reproducibility and Reference Material
A.1 Purpose of this appendix
This appendix collects supporting material that does not belong in the main lesson flow but is essential for good GWAS practice.
The main lessons focus on concepts and workflows.
This appendix focuses on reproducibility, conventions, and references that learners can return to as needed.
A.2 Reproducibility principles
All analyses in this guide aim to be:
- reproducible on a fresh system
- explicit about inputs and assumptions
- transparent about decisions that affect results
Key practices include:
- fixed random seeds where applicable
- explicit software versions
- saving intermediate results when appropriate
- separating exploratory work from reported results
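As a minimal sketch of the fixed-seed practice, the example below seeds a NumPy random generator before shuffling a phenotype vector, so that the same permutation is produced on every run. The seed value, the function name permute_phenotype, and the toy phenotype values are illustrative assumptions, not part of the guide's own code.

```python
import numpy as np

SEED = 20240101  # hypothetical value; any fixed integer works


def permute_phenotype(phenotype, seed):
    """Return a shuffled copy of the phenotype using a seeded generator."""
    # A fresh generator per call keeps the result independent of global state.
    rng = np.random.default_rng(seed)
    return rng.permutation(phenotype)


# Toy phenotype values, made up for illustration only.
phenotype = np.array([0.2, 1.5, -0.3, 0.9, 2.1])

first = permute_phenotype(phenotype, SEED)
second = permute_phenotype(phenotype, SEED)

# Same seed, same input: the two permutations are identical.
print(np.array_equal(first, second))
```

Passing the seed as an argument, rather than setting it once globally, makes it easier to record which seed produced which reported result.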
A.3 Software environment
A typical environment for this guide includes:
- Python (3.10+)
- NumPy, pandas
- matplotlib
- statsmodels
- domain-specific tools introduced later in the guide
Exact versions may evolve over time.
When results matter, always record the environment used.
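One way to record the environment is to capture the Python version and the installed versions of the listed packages with the standard library. This is a sketch under stated assumptions: the function name describe_environment and the choice to print JSON are illustrative, and packages that are not installed are recorded as None rather than raising an error.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Collect interpreter, platform, and package versions into a dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly instead of failing.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


env = describe_environment(["numpy", "pandas", "matplotlib", "statsmodels"])
print(json.dumps(env, indent=2))
```

Saving this output next to the results of an analysis gives a later reader enough information to rebuild a comparable environment.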
A.4 Data conventions used in this guide
Throughout the guide, we use consistent conventions:
- samples are identified by a unique sample_id
- genotypes are coded additively as 0, 1, or 2
- missing values are represented explicitly (for example, as NaN) rather than by a numeric placeholder
- phenotypes and covariates are stored in tidy tables
These conventions make it easier to reason about models and results.
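The conventions above can be illustrated with two small pandas tables and a simple check. Everything here is a hypothetical sketch: the variant column names (rs_example_1, rs_example_2), the trait and age columns, the toy values, and the helper check_conventions are assumptions made for illustration, and NaN is used as one possible explicit missing-value marker.

```python
import numpy as np
import pandas as pd

# Genotypes: one row per sample, one column per variant, coded 0/1/2.
genotypes = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3", "S4"],
    "rs_example_1": [0, 1, 2, np.nan],
    "rs_example_2": [2, np.nan, 0, 1],
})

# Phenotypes and covariates: one row per sample, one column per variable.
phenotypes = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3", "S4"],
    "trait": [1.2, 0.7, np.nan, 2.4],
    "age": [34, 51, 47, 29],
})


def check_conventions(genotypes, id_col="sample_id"):
    """Return a list of convention violations (empty if none are found)."""
    problems = []
    if genotypes[id_col].duplicated().any():
        problems.append("duplicate sample_id values")
    dosage = genotypes.drop(columns=[id_col])
    # Every entry must be 0, 1, 2, or an explicit missing value.
    valid = dosage.isin([0, 1, 2]) | dosage.isna()
    if not valid.all().all():
        problems.append("genotype values outside {0, 1, 2, NaN}")
    return problems


problems = check_conventions(genotypes)

# validate="one_to_one" raises if sample_id is not unique in either table.
merged = genotypes.merge(phenotypes, on="sample_id", validate="one_to_one")
print(problems, merged.shape)
```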
A.5 On thresholds and defaults
Many GWAS steps involve thresholds:
- missingness cutoffs
- minor allele frequency filters
- significance thresholds
Defaults shown in examples are illustrative, not universal.
Always consider:
- study design
- sample size
- population structure
- downstream goals
There is no single correct set of thresholds.
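To make the role of thresholds concrete, the sketch below applies a missingness cutoff and a minor allele frequency filter to a small genotype matrix. The function name variant_qc, its default values, and the toy matrix are illustrative assumptions; the defaults are placeholders and should be chosen per study, as discussed above.

```python
import numpy as np


def variant_qc(genotypes, max_missing=0.05, min_maf=0.01):
    """Flag variants passing missingness and MAF filters.

    genotypes: samples x variants array coded 0/1/2, with NaN for missing.
    Thresholds are illustrative defaults, not recommendations.
    """
    g = np.asarray(genotypes, dtype=float)
    missing_rate = np.isnan(g).mean(axis=0)
    called = (~np.isnan(g)).sum(axis=0)   # non-missing calls per variant
    allele_sum = np.nansum(g, axis=0)     # counted alleles per variant
    with np.errstate(invalid="ignore", divide="ignore"):
        freq = allele_sum / (2 * called)  # NaN if a variant has no calls
    maf = np.minimum(freq, 1 - freq)
    keep = (missing_rate <= max_missing) & (maf >= min_maf)
    return keep, missing_rate, maf


# Toy matrix: 4 samples x 4 variants, made up for illustration.
G = np.array([
    [0, 1, 0, np.nan],
    [1, 2, 0, np.nan],
    [0, 1, 0, 1],
    [2, 0, 0, np.nan],
])

keep, missing_rate, maf = variant_qc(G, max_missing=0.25, min_maf=0.05)
print(keep)
```

Keeping the thresholds as named parameters, rather than hard-coded constants, makes each choice visible in the analysis script and easy to report.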
A.6 Reporting GWAS results
When reporting GWAS findings:
- distinguish statistical significance from biological relevance
- report effect sizes with their uncertainty (standard errors or confidence intervals), not only p-values
- describe quality control and model choices clearly
- avoid overinterpretation of single-study signals
Clear reporting is as important as correct analysis.
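As an illustration of reporting an effect size together with its uncertainty, the sketch below turns an effect estimate and its standard error into a confidence interval and a two-sided p-value. It assumes a normal (Wald) approximation, which may not suit small samples or rare variants; the function name summarize_association and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def summarize_association(beta, se, level=0.95):
    """Summarize one association under a normal (Wald) approximation."""
    # Critical value for a two-sided interval at the requested level.
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    z = beta / se
    # Two-sided p-value; erfc stays accurate for very small tail areas.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "beta": beta,
        "se": se,
        "ci_low": beta - z_crit * se,
        "ci_high": beta + z_crit * se,
        "p_value": p_value,
    }


# Made-up estimate and standard error, for illustration only.
result = summarize_association(beta=0.12, se=0.03)
print(result)
```

Reporting the interval alongside the p-value lets readers judge both the direction and the plausible magnitude of the effect.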