Q&A 3 How do you efficiently load and store GWAS data files in R?
3.1 Explanation
GWAS datasets can be large, especially the genotype file in .ped format, which stores two allele columns per SNP for every sample. Repeatedly parsing this text file is time-consuming, so it's best to:
- Load it once using readr::read_table() for speed and consistency.
- Save the loaded object as a compressed .rds file for fast future access.
- Load supporting metadata files (.map, .fam) and phenotype data using consistent column names.
All files are stored in the data/ directory and share the prefix sativas413 (for example, sativas413.ped and sativas413_phenotypes.txt).
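Because every file shares one directory and one prefix, the paths can be built in a single place so a later rename touches only one line. This is a sketch: `gwas_paths()` is a hypothetical helper, and the file names are assumed to match those used in the R code of this section (sativas413.ped, .map, .fam, .rds and sativas413_phenotypes.txt).

```r
# Hypothetical helper: build every input path from one directory and prefix
gwas_paths <- function(dir = "data", prefix = "sativas413") {
  list(
    ped        = file.path(dir, paste0(prefix, ".ped")),
    map        = file.path(dir, paste0(prefix, ".map")),
    fam        = file.path(dir, paste0(prefix, ".fam")),
    rds        = file.path(dir, paste0(prefix, ".rds")),
    phenotypes = file.path(dir, paste0(prefix, "_phenotypes.txt"))
  )
}

paths <- gwas_paths()
paths$ped
#> [1] "data/sativas413.ped"
```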
3.2 R Code
library(tidyverse)
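# Step 1: Load genotype data using readr (.ped files have no header row)
ped_data <- read_table("data/sativas413.ped", col_names = FALSE, show_col_types = FALSE)
# Step 2: Save as a gzip-compressed RDS file (write_rds() does not compress by default)
write_rds(ped_data, file = "data/sativas413.rds", compress = "gz")
# In later sessions, reload the cached object instead of re-parsing the .ped file:
# ped_data <- read_rds("data/sativas413.rds")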
# Step 3: Load metadata files
map_data <- read_table("data/sativas413.map",
                       col_names = c("chr", "snp_id", "gen_dist", "bp_pos"),
                       show_col_types = FALSE)
fam_data <- read_table("data/sativas413.fam",
                       col_names = c("FID", "IID", "PID", "MID", "sex", "phenotype"),
                       show_col_types = FALSE)
phenotype_data <- read_tsv("data/sativas413_phenotypes.txt", show_col_types = FALSE)
Takeaway: For large genotype files, load once using readr, save as .rds, and always use clear column names when importing metadata and phenotype files to streamline analysis and ensure reproducibility.
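With the files loaded, a quick cross-check can catch mismatched inputs early. In the standard PLINK layout, each .ped row has six identifier columns (the same fields as a .fam line) followed by two allele columns per SNP. So the .ped table should have one row per .fam line, and (ncol - 6) / 2 should equal the number of .map rows. The sketch below assumes that layout; `check_plink_dims()` is a hypothetical name, and small toy tibbles stand in for `ped_data`, `map_data` and `fam_data`.

```r
library(tibble)

# Hypothetical helper: confirm that .ped, .map and .fam describe the same data.
# Assumes standard PLINK layout: 6 leading columns, then 2 allele columns per SNP.
check_plink_dims <- function(ped, map, fam) {
  n_snp_ped <- (ncol(ped) - 6) / 2
  if (n_snp_ped != nrow(map)) {
    stop(sprintf(".ped implies %g SNPs but .map lists %d", n_snp_ped, nrow(map)))
  }
  if (nrow(ped) != nrow(fam)) {
    stop(sprintf(".ped has %d samples but .fam lists %d", nrow(ped), nrow(fam)))
  }
  # Return the dimensions invisibly so the call can be used as a silent assertion
  invisible(c(samples = nrow(ped), snps = nrow(map)))
}

# Toy data: 2 samples x 2 SNPs (columns named X1..X10, as read_table() would)
ped <- tibble(X1 = c("F1", "F2"), X2 = c("I1", "I2"), X3 = 0, X4 = 0,
              X5 = c(1, 2), X6 = -9,
              X7 = c("A", "A"), X8 = c("A", "C"), X9 = c("G", "G"), X10 = c("G", "T"))
map <- tibble(chr = c(1, 1), snp_id = c("snp1", "snp2"),
              gen_dist = 0, bp_pos = c(1000, 2000))
fam <- ped[, 1:6]
names(fam) <- c("FID", "IID", "PID", "MID", "sex", "phenotype")

check_plink_dims(ped, map, fam)
```

On the real objects, the equivalent call would be `check_plink_dims(ped_data, map_data, fam_data)`.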