Lesson 02: GWAS Study Design and Traits
2.1 Why study design matters
In GWAS, study design decisions shape everything that follows.
Many design flaws, such as an ambiguous trait definition or poorly matched controls, cannot be fixed later with better models or stricter thresholds.
Before any genotypes are analyzed, you must be clear about:
- the trait you are studying
- how that trait is measured
- who is included in the study
- which variables may confound the association
GWAS works best when these questions are addressed explicitly and early.
2.2 Types of traits in GWAS
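Traits analyzed in GWAS generally fall into two broad categories:
- binary traits, such as disease status, where each individual is classified as a case or a control
- quantitative traits, such as height or blood pressure, measured on a continuous scale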
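The two trait types are usually tested with different per-variant regression models: linear regression for quantitative traits and logistic regression for binary traits. A sketch of both, assuming additive genotype coding g_i ∈ {0, 1, 2} (the number of effect alleles carried by individual i) and K covariates c_{ik}; additive coding is the most common choice but not the only one:

```latex
\begin{aligned}
\text{Quantitative trait:}\quad & y_i = \beta_0 + \beta_g g_i + \sum_{k=1}^{K} \gamma_k c_{ik} + \varepsilon_i \\
\text{Binary trait:}\quad & \log \frac{\Pr(y_i = 1)}{1 - \Pr(y_i = 1)} = \beta_0 + \beta_g g_i + \sum_{k=1}^{K} \gamma_k c_{ik}
\end{aligned}
```

In both models the association test asks whether β_g differs from zero. For the linear model β_g is the per-allele change in the trait mean; for the logistic model it is the per-allele change in the log-odds of being a case.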
2.3 Phenotype definition
A phenotype is not just a column in a table.
It is an operational definition of the biological or clinical concept you want to study.
Good phenotype definitions are:
- precise
- reproducible
- consistent across samples
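Ambiguous or noisy phenotypes reduce power, and phenotype errors that differ systematically between groups (for example between cohorts or measurement sites) can create false associations.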
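One way to make a phenotype definition precise and reproducible is to write it down as an explicit rule that every sample passes through. The sketch below uses a hypothetical diabetes phenotype: the `Participant` fields and the `define_diabetes_status` helper are invented for illustration, and the 7.0 mmol/L fasting glucose cut-off mirrors a common diagnostic criterion but is used here only as an example, not as a validated case definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # Hypothetical record layout; field names are illustrative only.
    participant_id: str
    fasting_glucose_mmol_l: Optional[float]  # None if not measured
    on_diabetes_medication: bool
    self_reported_diabetes: bool


GLUCOSE_CASE_THRESHOLD = 7.0  # mmol/L; illustrative cut-off


def define_diabetes_status(p: Participant) -> Optional[int]:
    """Return 1 (case), 0 (control), or None (exclude) by a fixed rule."""
    if p.on_diabetes_medication:
        return 1  # treated individuals count as cases
    if p.fasting_glucose_mmol_l is None:
        return None  # no measurement: exclude rather than guess
    if p.fasting_glucose_mmol_l >= GLUCOSE_CASE_THRESHOLD:
        return 1
    if p.self_reported_diabetes:
        return None  # conflicting evidence: exclude instead of forcing a label
    return 0
```

Because every sample is classified by the same function, the definition can be applied identically across cohorts, and exclusions are explicit rather than silent.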
2.4 Case-control studies
In case-control designs, individuals are grouped by disease status.
Key considerations include:
- how cases are defined
- how controls are selected
- whether controls are representative of the same population
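Imbalances between cases and controls, for example in ancestry or genotyping batch, can produce allele-frequency differences that appear as genetic associations even when the variant has no effect on the disease.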
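A simple first check for this kind of imbalance is to tabulate the fraction of cases within each genotyping batch or study site. The sketch below uses invented records and a hypothetical helper name, `case_fraction_by_batch`; a real analysis would also compare ancestry and other covariates between cases and controls.

```python
from collections import defaultdict


def case_fraction_by_batch(records):
    """Fraction of cases within each batch.

    `records` is an iterable of (status, batch) pairs, with status
    1 = case and 0 = control.
    """
    counts = defaultdict(lambda: [0, 0])  # batch -> [cases, total]
    for status, batch in records:
        counts[batch][0] += status
        counts[batch][1] += 1
    return {batch: cases / total for batch, (cases, total) in counts.items()}


# Invented example: most cases were genotyped in batch "A".
records = [(1, "A")] * 80 + [(0, "A")] * 20 + [(1, "B")] * 20 + [(0, "B")] * 80
print(case_fraction_by_batch(records))  # {'A': 0.8, 'B': 0.2}
```

When case fractions differ this much between batches, any batch-specific genotyping artifact is confounded with case status and can surface as an apparent association.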
2.5 Quantitative trait studies
For quantitative traits, attention should be paid to:
- measurement scale
- outliers
- transformation or normalization
Trait distributions that strongly deviate from model assumptions, for example through heavy skew or extreme outliers, can distort association results.
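One widely used option for a skewed trait is the rank-based inverse normal transformation, which replaces each value with the normal quantile of its rank. A minimal sketch using only Python's standard library, assuming Blom's offset (c = 3/8) and average ranks for ties; the function name is illustrative. Note that the transformed trait is on a different scale, so effect sizes are no longer in the original units.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3 / 8):
    """Rank-based inverse normal transformation with offset c (Blom: 3/8)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # (r - c) / (n - 2c + 1) maps ranks into the open interval (0, 1).
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# The outlier 50.0 becomes just the largest of four evenly spaced quantiles.
print(rank_inverse_normal([1.0, 2.0, 3.0, 50.0]))
```

Because only ranks are used, an extreme outlier cannot pull the transformed values further than the largest normal quantile for that sample size.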
2.6 Covariates and confounders
Covariates are variables included in the association model alongside genotype to account for trait variation and confounding that do not come from the variant being tested.
Common covariates include:
- age
- sex
- batch or study site
- ancestry components
Omitting a confounder, meaning a variable associated with both genotype and trait (such as ancestry), can lead to spurious associations. Including unnecessary covariates can reduce power.
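The effect of leaving out a confounder can be seen with ordinary least squares on a toy dataset. The sketch below is a minimal pure-Python illustration, not how GWAS software fits models at scale; the `ols` helper and the data are invented for this example, with genotyping batch playing the role of the confounder.

```python
def ols(y, X):
    """Least-squares coefficients for y ~ X via the normal equations.

    X is a list of rows; each row must already contain an intercept column.
    Solves (X'X) b = X'y by Gauss-Jordan elimination with partial pivoting,
    which is adequate for a few covariates but not for genome-wide use.
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                for k in range(col, p + 1):
                    A[r][k] -= factor * A[col][k]
    return [A[i][p] / A[i][i] for i in range(p)]


# Toy data (invented): genotype frequency differs between two batches,
# and the trait depends only on batch, never on genotype.
genotype = [0, 0, 1, 1, 1, 2, 2, 2]
batch = [0, 0, 0, 0, 1, 1, 1, 1]
trait = [2.0 * b for b in batch]

naive = ols(trait, [[1.0, g] for g in genotype])
adjusted = ols(trait, [[1.0, g, b] for g, b in zip(genotype, batch)])
print(round(naive[1], 3))  # 1.026: spurious genotype "effect"
print(adjusted[1])  # close to 0 once batch is a covariate
```

Without the batch term, the genotype coefficient absorbs the batch difference in the trait; with it, the genotype coefficient drops to essentially zero.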
2.7 Sample size and power
Most variants identified by GWAS have small effect sizes, so large samples are needed to detect them reliably.
Power depends on:
- sample size
- allele frequency
- effect size
- significance threshold
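Underpowered studies miss many true associations (false negatives), and the associations they do report tend to have inflated effect estimates that often fail to replicate.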
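A rough way to see how these four factors interact is the standard non-centrality approximation for a quantitative trait. The sketch below assumes an additive model, a trait standardized to unit variance, and the conventional genome-wide threshold of 5 × 10⁻⁸; the function name and parameters are illustrative. Binary traits need case-control-specific formulas that also account for disease prevalence and the case fraction.

```python
from math import sqrt
from statistics import NormalDist


def gwas_power(n, maf, beta, alpha=5e-8):
    """Approximate power of a 1-df association test for a quantitative trait.

    Assumes an additive model on a trait with variance 1: a variant with
    minor allele frequency `maf` and per-allele effect `beta` (in SD units)
    explains q2 = 2 * maf * (1 - maf) * beta**2 of the trait variance.
    The test statistic is treated as N(sqrt(ncp), 1) with ncp = n*q2/(1-q2).
    """
    z = NormalDist()
    q2 = 2 * maf * (1 - maf) * beta ** 2
    ncp = n * q2 / (1 - q2)
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    shift = sqrt(ncp)
    # Probability that |Z + shift| exceeds the threshold.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Same variant (MAF 0.3, 0.05 SD per allele) at increasing sample sizes.
for n in (10_000, 50_000, 200_000):
    print(n, round(gwas_power(n, maf=0.3, beta=0.05), 3))
```

Under these assumptions, the same small-effect variant is almost never detected at 10,000 samples but is detected most of the time at 50,000, which is why GWAS sample sizes have grown so large.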
2.8 Key takeaways
- Study design decisions precede all analysis steps
- Trait definition affects power and interpretability
- Binary and quantitative traits require different models
- Covariates help control confounding but must be chosen carefully
- Sample size strongly influences GWAS success
Continue to → Lesson 03: Genotype and Phenotype Data Structures