Q&A 4 How do you inspect the structure and contents of GWAS input files in R?
4.1 Explanation
Before analysis, it's essential to preview each file to check:
- Row and column dimensions
- Sample ID consistency
- General format of genotype and phenotype data
This ensures everything is aligned before tidying or merging.
4.2 R Code
# Load genotype file (if needed); read_rds() comes from the readr package
library(readr)
ped_data <- read_rds("data/sativas413.rds")
# Check size of each file
dim(ped_data) # Genotype matrix
[1] 413 73808
[1] 36901 4
[1] 413 6
[1] 413 38
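The reported dimensions can be cross-checked against each other. The sketch below assumes the genotype matrix follows the standard PED layout (6 leading sample columns, then 2 allele columns per SNP) and that the 36901 × 4 output belongs to the SNP map; the variable names are illustrative and not from the original code.

```r
# Dimensions copied from the dim() output above
n_ped_cols <- 73808  # columns in the genotype matrix
n_snps     <- 36901  # rows in the (assumed) SNP map

# Assumption: PED layout = 6 sample-information columns + 2 alleles per SNP
expected_ped_cols <- 6 + 2 * n_snps

# Does the genotype matrix have the expected width?
dims_agree <- expected_ped_cols == n_ped_cols
dims_agree  # TRUE
```

If this comparison came back FALSE, a likely cause would be a map file and genotype file that come from different marker sets.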
# Preview first few rows and columns
head(ped_data[, 1:5]) # First 5 columns of the genotype file
# A tibble: 6 × 5
X1 X2 X3 X4 X5
<chr> <dbl> <dbl> <dbl> <dbl>
1 081215-A05 1 0 0 -9
2 081215-A06 3 0 0 -9
3 081215-A07 4 0 0 -9
4 081215-A08 5 0 0 -9
5 090414-A09 6 0 0 -9
6 090414-A10 7 0 0 -9
# A tibble: 6 × 4
chr snp_id gen_dist bp_pos
<dbl> <chr> <dbl> <dbl>
1 1 id1000001 0 13147
2 1 id1000003 0 73192
3 1 id1000005 0 74969
4 1 id1000007 0 75852
5 1 id1000008 0 75953
6 1 id1000011 0 91016
# A tibble: 6 × 5
FID IID PID MID sex
<chr> <dbl> <dbl> <dbl> <dbl>
1 081215-A05 1 0 0 -9
2 081215-A06 3 0 0 -9
3 081215-A07 4 0 0 -9
4 081215-A08 5 0 0 -9
5 090414-A09 6 0 0 -9
6 090414-A10 7 0 0 -9
# A tibble: 6 × 5
HybID NSFTVID `Flowering time at Arkansas` `Flowering time at Faridpur`
<chr> <dbl> <dbl> <dbl>
1 081215-A05 1 75.1 64
2 081215-A06 3 89.5 66
3 081215-A07 4 94.5 67
4 081215-A08 5 87.5 70
5 090414-A09 6 89.1 73
6 090414-A10 7 105 NA
# ℹ 1 more variable: `Flowering time at Aberdeen` <dbl>
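The previews suggest that the sample IDs in the FID and HybID columns line up row by row, but six rows cannot confirm this for all 413 samples. A minimal sketch of an explicit check follows, using small stand-in tables; `fam_toy` and `pheno_toy` are hypothetical objects, and only the column names FID, IID, HybID, and NSFTVID are taken from the output above.

```r
# Toy stand-ins for the sample-information and phenotype tables
# (hypothetical objects; substitute the real data frames)
fam_toy <- data.frame(
  FID = c("081215-A05", "081215-A06", "081215-A07"),
  IID = c(1, 3, 4),
  stringsAsFactors = FALSE
)
pheno_toy <- data.frame(
  HybID   = c("081215-A05", "081215-A06", "081215-A07"),
  NSFTVID = c(1, 3, 4),
  stringsAsFactors = FALSE
)

# Same number of samples in both tables?
same_n <- nrow(fam_toy) == nrow(pheno_toy)

# Same IDs in the same row order?
same_order <- identical(fam_toy$FID, pheno_toy$HybID)

# IDs present in one table but missing from the other
missing_in_pheno <- setdiff(fam_toy$FID, pheno_toy$HybID)
missing_in_fam   <- setdiff(pheno_toy$HybID, fam_toy$FID)

same_n
same_order
missing_in_pheno
missing_in_fam
```

With real data, a non-empty `setdiff()` result names exactly which samples need attention before any merge, which is more informative than a single TRUE or FALSE.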
Takeaway: Use dim() and head() to quickly check file structure and confirm that samples and traits align before transforming or analyzing GWAS data.