Lesson 1 Introduction to GWAS

1.1 What is a genome-wide association study?

A genome-wide association study (GWAS) is a statistical framework for identifying associations between traits of interest and genetic variants distributed across the genome.

In a typical GWAS, millions of genetic variants (usually single-nucleotide polymorphisms, SNPs) are tested individually for association with a phenotype, such as:

  • a disease status (case–control)
  • a quantitative trait (e.g. height, blood pressure)
  • a molecular or clinical measurement

The output of a GWAS is not a diagnosis or a causal claim, but a set of statistical associations that must be interpreted carefully.

1.2 What GWAS can — and cannot — tell you

1.2.1 What GWAS can do

GWAS can:

  • identify genomic regions associated with a trait
  • highlight variants that are statistically enriched among cases or correlated with trait values
  • generate hypotheses about biological mechanisms
  • guide downstream analyses (fine-mapping, functional follow-up, polygenic scores)

1.2.2 What GWAS cannot do

GWAS alone cannot:

  • prove causality
  • pinpoint the exact causal variant in most cases
  • explain the full genetic architecture of complex traits
  • replace biological validation or experimental follow-up

1.3 The core idea behind GWAS

At its heart, GWAS asks a simple question repeatedly:

Is this genetic variant associated with the trait, after accounting for known confounders?

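This question is asked variant by variant, across the genome, using a statistical model (typically linear regression for quantitative traits and logistic regression for case–control traits) that relates phenotype values to genotype values while adjusting for covariates.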
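
The single-variant question can be sketched in code. The example below is a minimal illustration, not a production method: it assumes a quantitative trait, genotypes coded as 0, 1, or 2 copies of an allele (additive coding), and ordinary least squares with one covariate. The function names (solve_linear_system, fit_single_variant) and the toy data are hypothetical and chosen only for this lesson.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix (e.g. a monomorphic variant)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_single_variant(genotypes, phenotype, covariates):
    """Fit phenotype ~ intercept + genotype + covariates by least squares.

    Returns (beta, se, z) for the genotype term only.
    """
    rows = [[1.0, float(g)] + [float(c) for c in cov]
            for g, cov in zip(genotypes, covariates)]
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, phenotype)) for i in range(p)]
    coef = solve_linear_system(xtx, xty)
    residuals = [y - sum(b * v for b, v in zip(coef, r))
                 for r, y in zip(rows, phenotype)]
    sigma2 = sum(e * e for e in residuals) / (n - p)  # residual variance
    # Variance of the genotype coefficient is sigma2 * [(X'X)^-1] at index (1, 1);
    # obtain that entry by solving X'X v = e_1.
    unit = [0.0] * p
    unit[1] = 1.0
    inv_col = solve_linear_system(xtx, unit)
    se = math.sqrt(sigma2 * inv_col[1])
    beta = coef[1]
    return beta, se, beta / se


# Toy data (made up for illustration): 12 samples, one covariate.
genotypes = [0, 1, 2] * 4
covariates = [[c] for c in [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]]
noise = [0.1 if i % 2 == 0 else -0.1 for i in range(12)]
phenotype = [2.0 + 0.5 * g + 1.0 * cov[0] + e
             for g, cov, e in zip(genotypes, covariates, noise)]

beta, se, z = fit_single_variant(genotypes, phenotype, covariates)
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f}")
```

In a real GWAS this fit is repeated for every variant, usually with several covariates (for example age, sex, and ancestry principal components) and with dedicated software rather than hand-written code.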

Because this question is asked millions of times, issues such as multiple testing, population structure, and relatedness become central concerns rather than technical details.
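
To see why multiple testing dominates, a short calculation helps. The sketch below assumes, purely for illustration, one million independent tests in which no variant is truly associated. Real variants are correlated through linkage disequilibrium, so the number of effectively independent tests is itself an approximation. The function names are hypothetical.

```python
def expected_false_positives(threshold, n_null_tests):
    """Expected number of null tests with p-value below the threshold."""
    return threshold * n_null_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate near alpha."""
    return alpha / n_tests


n_tests = 1_000_000  # assumed number of independent tests

# At the usual single-test threshold, tens of thousands of null variants pass.
print(expected_false_positives(0.05, n_tests))

# Dividing alpha by the number of tests gives a far stricter per-test threshold.
print(bonferroni_threshold(0.05, n_tests))
```

Under these assumptions the corrected threshold matches the conventional genome-wide significance level of 5 × 10^-8.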

1.4 GWAS as a workflow, not a single test

GWAS is a multi-stage workflow, typically involving:

  1. Study design and phenotype definition
  2. Genotype data preparation and formatting
  3. Quality control (samples and variants)
  4. Assessment of population structure and relatedness
  5. Association testing with appropriate models
  6. Interpretation, visualization, and reporting

Each stage influences the validity of the final results.
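
As one illustration of how an early stage shapes later ones, the sketch below applies two common variant-level filters from stage 3: call rate and minor allele frequency (MAF). It assumes genotypes coded as 0, 1, or 2 copies of the alternate allele, with None for a missing call. The thresholds (0.98 and 0.01) and the function name variant_qc are illustrative assumptions, not recommendations from this lesson.

```python
def variant_qc(genotypes, min_call_rate=0.98, min_maf=0.01):
    """Summarize one variant and report whether it passes both filters."""
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if not called:
        return {"call_rate": 0.0, "maf": 0.0, "passes": False}
    # Each sample carries two alleles, so divide by twice the called samples.
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)
    passes = call_rate >= min_call_rate and maf >= min_maf
    return {"call_rate": call_rate, "maf": maf, "passes": passes}


# Toy variants (made up for illustration), 10 samples each.
well_typed = [0, 1, 1, 2, 0, 1, 0, 0, 1, 0]
with_missing = [0, 1, None, 2, 0, 1, 0, 0, 1, 0]
monomorphic = [0] * 10

for name, g in [("well_typed", well_typed),
                ("with_missing", with_missing),
                ("monomorphic", monomorphic)]:
    print(name, variant_qc(g))
```

A variant that fails here never reaches association testing, which is one concrete way that decisions made during quality control determine what the final results can show.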

1.5 Why GWAS requires careful reasoning

GWAS operates in a setting where effect sizes are often small, confounders are subtle, and false positives are easy to generate.

Success depends less on memorizing commands and more on understanding assumptions, making defensible decisions, and interpreting results conservatively.

1.6 Key takeaways

  • GWAS identifies associations, not causes
  • The analysis is a pipeline, not a single test
  • Early decisions shape downstream validity
  • Interpretation is as important as computation