Lesson 2: GWAS Study Design and Traits

2.1 Why study design matters

In GWAS, study design decisions shape everything that follows.
Poor design cannot be fixed later with better models or stricter thresholds.

Before any genotypes are analyzed, you must be clear about:

  • the trait you are studying
  • how that trait is measured
  • who is included in the study
  • which variables may confound the association

GWAS works best when these questions are addressed explicitly and early.

2.2 Types of traits in GWAS

Traits analyzed in GWAS generally fall into two broad categories.

2.2.1 Binary traits

Binary traits describe the presence or absence of a condition.

Examples include:

  • disease status (case vs control)
  • treatment response (responder vs non-responder)

Binary traits are commonly analyzed using logistic regression models.
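
To make the logistic model concrete, here is a minimal, dependency-free Python sketch for a single variant. It assumes additive genotype coding (0, 1 or 2 copies of the tested allele), no covariates, and simulated data. The function name `logistic_snp_test` and all simulation parameters are illustrative assumptions. Real analyses would use an established GWAS tool or statistics library rather than this hand-rolled Newton-Raphson fit.

```python
import math
import random


def logistic_snp_test(genotypes, status, n_iter=25):
    """Fit logit P(case) = b0 + b1 * g by Newton-Raphson.

    genotypes: allele dosages coded 0, 1 or 2 (additive model).
    status: 1 for cases, 0 for controls.
    Returns (b1, standard error of b1, per-allele odds ratio).
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        # Score vector and Fisher information matrix X^T W X for (b0, b1).
        s0 = s1 = 0.0
        i00 = i01 = i11 = 0.0
        for g, y in zip(genotypes, status):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
            w = p * (1.0 - p)
            s0 += y - p
            s1 += (y - p) * g
            i00 += w
            i01 += w * g
            i11 += w * g * g
        det = i00 * i11 - i01 * i01
        # Newton step: (b0, b1) += inverse(information) @ score
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det
    # Var(b1) is the (1, 1) element of the inverse information matrix.
    se_b1 = math.sqrt(i00 / det)
    return b1, se_b1, math.exp(b1)


# Hypothetical simulation: allele frequency 0.3, true log odds ratio 0.4.
random.seed(1)
genotypes, status = [], []
for _ in range(5000):
    g = sum(random.random() < 0.3 for _ in range(2))  # two alleles drawn independently
    p_case = 1.0 / (1.0 + math.exp(-(-1.0 + 0.4 * g)))
    genotypes.append(g)
    status.append(1 if random.random() < p_case else 0)

beta, se, odds_ratio = logistic_snp_test(genotypes, status)
print(f"log OR = {beta:.3f}, SE = {se:.3f}, OR = {odds_ratio:.2f}, z = {beta / se:.1f}")
```

The exponentiated slope is the per-allele odds ratio, which is the effect size usually reported for binary traits.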

2.2.2 Quantitative traits

Quantitative traits take continuous values.

Examples include:

  • height
  • blood pressure
  • gene expression levels

These traits are typically analyzed using linear regression models.
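
A comparable sketch for a quantitative trait fits ordinary least squares of the trait on allele dosage and returns the slope, its standard error and the t-statistic. Again this assumes additive coding and no covariates. `linear_snp_test` and the simulated effect size are hypothetical.

```python
import math
import random


def linear_snp_test(genotypes, trait):
    """OLS of trait on allele dosage (0/1/2) with an intercept.

    Returns (beta, standard error of beta, t-statistic).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    # Residual variance with n - 2 degrees of freedom (intercept and slope).
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(genotypes, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se, beta / se


# Hypothetical simulation: allele frequency 0.25, effect 0.2 trait units per allele.
random.seed(2)
genotypes = [sum(random.random() < 0.25 for _ in range(2)) for _ in range(2000)]
trait = [0.2 * g + random.gauss(0.0, 1.0) for g in genotypes]

beta, se, t = linear_snp_test(genotypes, trait)
print(f"beta = {beta:.3f}, SE = {se:.3f}, t = {t:.1f}")
```

Here the slope is the expected change in the trait per copy of the tested allele.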

2.3 Phenotype definition

A phenotype is not just a column in a table.
It is an operational definition of the biological or clinical concept you want to study.

Good phenotype definitions are:

  • precise
  • reproducible
  • consistent across samples

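Ambiguous or noisy phenotypes dilute true genetic effects and reduce power. If measurement quality or case definitions differ systematically between groups of samples (for example, between study sites), they can also create spurious associations.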
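
One way to keep a definition precise and reproducible is to write it down as an explicit rule that anyone can rerun on the same data. The sketch below encodes a hypothetical hypertension definition. The blood-pressure thresholds, the medication rule and the choice to exclude borderline readings are illustrative assumptions, not a recommended clinical standard.

```python
from typing import Optional

# Hypothetical operational definition; thresholds are illustrative assumptions.
SBP_CASE, DBP_CASE = 140, 90        # mmHg: at or above either -> case
SBP_CONTROL, DBP_CONTROL = 130, 85  # mmHg: below both (and untreated) -> control


def classify_hypertension(systolic: Optional[float], diastolic: Optional[float],
                          on_bp_medication: bool) -> Optional[str]:
    """Return 'case', 'control', or None (excluded as missing or borderline)."""
    if on_bp_medication:
        return "case"  # treated individuals count as cases regardless of readings
    if systolic is None or diastolic is None:
        return None    # the rule cannot be applied without both readings
    if systolic >= SBP_CASE or diastolic >= DBP_CASE:
        return "case"
    if systolic < SBP_CONTROL and diastolic < DBP_CONTROL:
        return "control"
    return None        # borderline readings are excluded rather than guessed


for reading in [(150, 85, False), (120, 80, False), (135, 82, False), (None, 80, False)]:
    print(reading, classify_hypertension(*reading))
```

Excluding borderline individuals trades some sample size for a cleaner contrast between cases and controls; whether that trade is worthwhile depends on the trait.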

2.4 Case-control studies

In case-control designs, individuals are grouped by disease status.

Key considerations include:

  • how cases are defined
  • how controls are selected
  • whether controls are representative of the same population

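Systematic differences between cases and controls that are unrelated to disease, such as differences in ancestry, recruitment source, or genotyping batch, can introduce bias that mimics genetic association.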
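
This kind of bias can be illustrated with expected allele counts from two hypothetical populations that differ both in allele frequency and in the proportion of cases. The variant has no effect on disease within either population, yet pooling them produces a strong case-control difference. The population sizes, frequencies and case counts are invented for illustration, and the test is a plain Pearson allelic chi-square without continuity correction.

```python
def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 allele-count table."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    numerator = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    denominator = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
                   * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    return numerator / denominator


def expected_counts(n_cases, n_controls, alt_freq):
    """Expected allele counts when genotype is independent of disease status."""
    case_alleles, ctrl_alleles = 2 * n_cases, 2 * n_controls
    return (case_alleles * alt_freq, case_alleles * (1 - alt_freq),
            ctrl_alleles * alt_freq, ctrl_alleles * (1 - alt_freq))


# Hypothetical populations: (cases, controls, alt-allele frequency).
populations = {"pop1": (200, 800, 0.2), "pop2": (800, 200, 0.6)}

# Within each population, cases and controls have the same allele frequency.
within = {name: allelic_chi2(*expected_counts(*params))
          for name, params in populations.items()}

# Pooling both populations into a single case-control table.
tables = [expected_counts(*params) for params in populations.values()]
pooled = [sum(column) for column in zip(*tables)]
pooled_chi2 = allelic_chi2(*pooled)

case_freq = pooled[0] / (pooled[0] + pooled[1])
ctrl_freq = pooled[2] / (pooled[2] + pooled[3])
for name, value in within.items():
    print(f"{name}: chi2 = {value:.3f}")
print(f"pooled allele freq: cases {case_freq:.2f}, controls {ctrl_freq:.2f}")
print(f"pooled chi2 = {pooled_chi2:.1f}")
```

With these numbers the pooled statistic is 240 on 1 degree of freedom, far beyond the usual genome-wide threshold, while each within-population statistic is zero. The apparent association comes entirely from cases being drawn mostly from one population and controls from the other.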

2.5 Quantitative trait studies

For quantitative traits, attention should be paid to:

  • measurement scale
  • outliers
  • transformation or normalization

Trait distributions that strongly deviate from model assumptions can distort association results.
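
One widely used option for the transformation step is a rank-based inverse normal transform, which replaces each value by a normal quantile of its rank so that skew and extreme outliers no longer dominate the fit. The sketch below uses Blom's offset (c = 3/8) and averages ranks for ties; both are common conventions but are assumptions here, and other choices (for example a log transform) may suit a given trait better. Effects estimated on the transformed trait are no longer on the original measurement scale.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j (0-based) -> mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_inverse_normal(values, c=3 / 8):
    """Map ranks to standard normal quantiles using (rank - c) / (n - 2c + 1)."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in average_ranks(values)]


skewed = [1.2, 0.8, 1.1, 0.9, 1.0, 15.0]  # hypothetical trait values with one outlier
print([round(z, 3) for z in rank_inverse_normal(skewed)])
```

The outlier still receives the largest transformed value, but it is now only one rank step above its neighbour instead of dominating the regression.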

2.6 Covariates and confounders

Covariates are variables included in the model to control for non-genetic effects.

Common covariates include:

  • age
  • sex
  • batch or study site
  • ancestry components (for example, genetic principal components)

Failing to include relevant covariates can lead to spurious associations. Including unnecessary covariates can reduce power.
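
For linear regression, adjusting for a categorical covariate such as batch gives the same genotype effect as first subtracting each batch's mean from both the genotype and the trait and then regressing one on the other (the Frisch-Waugh-Lovell theorem). The toy data below are hypothetical and constructed so that batch drives both variables: the unadjusted slope is 2.5, while the batch-adjusted slope is 0. This equivalence is specific to least squares; logistic models need the covariates included in the fit itself.

```python
from collections import defaultdict


def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def center_within(values, groups):
    """Subtract each group's mean; equivalent to regressing out group indicators."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


# Hypothetical data: batch B has higher dosages AND a higher trait mean,
# but within each batch genotype and trait are unrelated.
batch = ["A", "A", "A", "A", "B", "B", "B", "B"]
genotypes = [0, 0, 1, 1, 1, 1, 2, 2]
trait = [1.0, 3.0, 1.0, 3.0, 6.0, 8.0, 6.0, 8.0]

unadjusted = slope(genotypes, trait)
adjusted = slope(center_within(genotypes, batch), center_within(trait, batch))
print(unadjusted, adjusted)  # 2.5 0.0
```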

2.7 Sample size and power

Most variants detected by GWAS have small individual effect sizes.
Adequate sample size is therefore critical.

Power depends on:

  • sample size
  • allele frequency
  • effect size
  • significance threshold (commonly 5 × 10⁻⁸ for genome-wide significance)

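Underpowered studies are prone to false negatives, and the associations they do detect tend to have overestimated effect sizes (the winner's curse) and often fail to replicate.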
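
The four factors listed above can be combined in a rough power calculation for a quantitative trait. This sketch assumes an additive model, a standardized trait, Hardy-Weinberg proportions and no covariates. It uses the common approximation in which the 1-df test statistic is chi-square with non-centrality N·q²/(1 - q²), where q² = 2p(1 - p)β² is the trait variance explained by a variant with allele frequency p and per-allele effect β (in trait standard deviations). `approx_power` is a hypothetical helper; dedicated power calculators handle binary traits and other designs.

```python
from statistics import NormalDist


def approx_power(n, maf, beta_sd, alpha=5e-8):
    """Approximate power of a 1-df additive association test for a quantitative trait."""
    q2 = 2 * maf * (1 - maf) * beta_sd ** 2     # fraction of trait variance explained
    ncp = n * q2 / (1 - q2)                     # non-centrality of the chi-square statistic
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value (about 5.45)
    shift = ncp ** 0.5
    # A 1-df non-central chi-square is (Z + shift)^2, so power = P(|Z + shift| > z_crit).
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


for n in (1_000, 5_000, 10_000, 50_000):
    print(n, round(approx_power(n, maf=0.3, beta_sd=0.1), 3))
```

Because the non-centrality grows roughly in proportion to N·2p(1 - p)·β², halving the effect size requires about four times as many samples to keep the same power.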

2.8 Key takeaways

  • Study design decisions precede all analysis steps
  • Trait definition affects power and interpretability
  • Binary and quantitative traits require different regression models (logistic vs linear)
  • Covariates help control confounding but must be chosen carefully
  • Sample size strongly influences GWAS success