Q&A 3 How do you efficiently load and store GWAS data files in R?

3.1 Explanation

GWAS datasets can be large, especially the genotype file in .ped format, which stores six leading sample columns followed by two allele columns per SNP. Re-parsing this text file in every session is time-consuming, so it’s best to:

Load it once using readr::read_table() for speed and consistency.
Save the loaded object as a compressed .rds file for fast future access.
Load supporting metadata files (.map, .fam) and phenotype data using consistent column names.

All files are stored in the data/ directory and share the sativas413 prefix (for example, sativas413.ped and sativas413_phenotypes.txt).
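To make the .ped layout concrete, here is a minimal sketch that writes a made-up three-sample, two-SNP file to a temporary location and reads it back. The file contents and the names toy_lines, toy_path, toy_ped, and n_snps are illustrative assumptions, not part of the real dataset.

```r
library(readr)

# Hypothetical toy .ped: 6 leading columns (FID, IID, PID, MID, sex, phenotype)
# followed by two allele columns per SNP (here 2 SNPs, so 4 allele columns).
toy_lines <- c(
  "1 S1 0 0 0 -9 A A T T",
  "2 S2 0 0 0 -9 A G T C",
  "3 S3 0 0 0 -9 G G C C"
)
toy_path <- tempfile(fileext = ".ped")
writeLines(toy_lines, toy_path)

# Read every column as character so allele codes are kept as text.
toy_ped <- read_table(toy_path, col_names = FALSE,
                      col_types = cols(.default = col_character()))

# Number of SNPs implied by the column count.
n_snps <- (ncol(toy_ped) - 6) / 2
n_snps
```

Applied to the real data, the same arithmetic gives a quick sanity check: (ncol(ped_data) - 6) / 2 should equal nrow(map_data), because the .map file has one row per SNP.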

3.2 R Code

library(tidyverse)

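# Step 1: Load genotype data using readr
# Read all columns as character so allele codes such as "T" are not guessed as logical values
# (this also keeps the six leading sample columns as text)
ped_data <- read_table("data/sativas413.ped", col_names = FALSE,
                       col_types = cols(.default = col_character()))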

# Step 2: Save as compressed RDS file (write_rds() does not compress unless asked)
write_rds(ped_data, file = "data/sativas413.rds", compress = "gz")

# Step 3: Load metadata files
map_data <- read_table("data/sativas413.map", 
                       col_names = c("chr", "snp_id", "gen_dist", "bp_pos"), 
                       show_col_types = FALSE)

fam_data <- read_table("data/sativas413.fam", 
                       col_names = c("FID", "IID", "PID", "MID", "sex", "phenotype"), 
                       show_col_types = FALSE)

phenotype_data <- read_tsv("data/sativas413_phenotypes.txt", show_col_types = FALSE)
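On later runs, the genotype data can be read back from the .rds file instead of re-parsing the .ped text. The following is a minimal sketch of a load-or-cache pattern; the helper name load_ped_cached() is hypothetical, and it is demonstrated on a tiny made-up file in a temporary directory rather than on the real data.

```r
library(readr)

# Hypothetical helper: read the cached .rds if it exists, otherwise parse the
# .ped file once and write the cache for next time.
load_ped_cached <- function(ped_path, rds_path) {
  if (file.exists(rds_path)) {
    return(read_rds(rds_path))
  }
  ped <- read_table(ped_path, col_names = FALSE,
                    col_types = cols(.default = col_character()))
  write_rds(ped, rds_path, compress = "gz")
  ped
}

# Demonstration on a made-up two-sample, one-SNP file.
demo_ped <- tempfile(fileext = ".ped")
demo_rds <- tempfile(fileext = ".rds")
writeLines(c("1 S1 0 0 0 -9 A A", "2 S2 0 0 0 -9 A G"), demo_ped)

first_load  <- load_ped_cached(demo_ped, demo_rds)  # parses the text file, writes the cache
second_load <- load_ped_cached(demo_ped, demo_rds)  # reads the .rds cache
identical(dim(first_load), dim(second_load))
```

For the files used here, the call would be load_ped_cached("data/sativas413.ped", "data/sativas413.rds"). If the .ped file ever changes, delete the .rds file so the cache is rebuilt.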

✅ Takeaway: For large genotype files, load once using readr, save as .rds, and always use clear column names when importing metadata and phenotype files to streamline analysis and ensure reproducibility.