Q&A 4 How do you inspect the structure and contents of GWAS input files in R?

4.1 Explanation

Before analysis, it’s essential to preview each file to check:

  • Row and column dimensions
  • Sample ID consistency
  • General format of genotype and phenotype data

These checks confirm that the files describe the same samples, in compatible formats, before any tidying or merging.
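The sample ID check in the list above can be sketched in base R. This is a minimal illustration, not part of the original workflow: `compare_ids()` is a hypothetical helper, and `fam_toy` and `pheno_toy` are small made-up stand-ins that only borrow the `FID` and `HybID` column names from the real files.

```r
# Toy stand-ins for fam_data and phenotype_data (illustrative only).
# The second table lists the same samples in a different order.
fam_toy <- data.frame(
  FID = c("081215-A05", "081215-A06", "081215-A07"),
  stringsAsFactors = FALSE
)
pheno_toy <- data.frame(
  HybID = c("081215-A05", "081215-A07", "081215-A06"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarise how two ID vectors relate.
compare_ids <- function(a, b) {
  list(
    same_set   = setequal(a, b),    # same IDs, ignoring order
    same_order = identical(a, b),   # same IDs in the same positions
    only_in_a  = setdiff(a, b),     # IDs missing from b
    only_in_b  = setdiff(b, a),     # IDs missing from a
    duplicated = unique(c(a[duplicated(a)], b[duplicated(b)]))
  )
}

res <- compare_ids(fam_toy$FID, pheno_toy$HybID)
str(res)
```

On the real data, the same call would presumably be `compare_ids(fam_data$FID, phenotype_data$HybID)`. A `same_set` of `TRUE` with `same_order` of `FALSE` would suggest joining by ID rather than binding columns by position.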

4.2 R Code

# Load genotype file (if needed); read_rds() comes from the readr package
library(readr)
ped_data <- read_rds("data/sativas413.rds")

# Check size of each file
dim(ped_data)         # Genotype matrix
[1]   413 73808
dim(map_data)         # SNP metadata
[1] 36901     4
dim(fam_data)         # Sample info
[1] 413   6
dim(phenotype_data)   # Trait info
[1] 413  38
# Preview first few rows and columns
head(ped_data[, 1:5])        # First 5 PED columns (sample metadata; alleles start later)
# A tibble: 6 × 5
  X1            X2    X3    X4    X5
  <chr>      <dbl> <dbl> <dbl> <dbl>
1 081215-A05     1     0     0    -9
2 081215-A06     3     0     0    -9
3 081215-A07     4     0     0    -9
4 081215-A08     5     0     0    -9
5 090414-A09     6     0     0    -9
6 090414-A10     7     0     0    -9
head(map_data)
# A tibble: 6 × 4
    chr snp_id    gen_dist bp_pos
  <dbl> <chr>        <dbl>  <dbl>
1     1 id1000001        0  13147
2     1 id1000003        0  73192
3     1 id1000005        0  74969
4     1 id1000007        0  75852
5     1 id1000008        0  75953
6     1 id1000011        0  91016
head(fam_data[, 1:5])
# A tibble: 6 × 5
  FID          IID   PID   MID   sex
  <chr>      <dbl> <dbl> <dbl> <dbl>
1 081215-A05     1     0     0    -9
2 081215-A06     3     0     0    -9
3 081215-A07     4     0     0    -9
4 081215-A08     5     0     0    -9
5 090414-A09     6     0     0    -9
6 090414-A10     7     0     0    -9
head(phenotype_data[, 1:5])
# A tibble: 6 × 5
  HybID      NSFTVID `Flowering time at Arkansas` `Flowering time at Faridpur`
  <chr>        <dbl>                        <dbl>                        <dbl>
1 081215-A05       1                         75.1                           64
2 081215-A06       3                         89.5                           66
3 081215-A07       4                         94.5                           67
4 081215-A08       5                         87.5                           70
5 090414-A09       6                         89.1                           73
6 090414-A10       7                        105                             NA
# ℹ 1 more variable: `Flowering time at Aberdeen` <dbl>
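The dimensions printed above can also be cross-checked arithmetically. The sketch below assumes the standard PLINK PED layout: 6 leading sample columns followed by 2 allele columns per SNP. The numbers are copied from the `dim()` output, and the variable names are illustrative only.

```r
# Dimensions copied from the dim() output above.
n_ped_cols <- 73808L   # columns in ped_data
n_snps     <- 36901L   # rows in map_data

# Assumption: standard PLINK PED layout, i.e. 6 leading columns
# followed by 2 allele columns per SNP.
n_leading <- 6L
expected_ped_cols <- n_leading + 2L * n_snps

expected_ped_cols                 # 73808
expected_ped_cols == n_ped_cols   # TRUE

# With the real objects loaded, the same checks would read:
# stopifnot(ncol(ped_data) == 6 + 2 * nrow(map_data))
# stopifnot(nrow(ped_data) == nrow(fam_data))
```

If the PED and MAP files were out of sync (for example, after filtering SNPs in only one of them), the comparison would return `FALSE` and flag the mismatch before any merging.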

✅ Takeaway: Use dim() and head() to quickly check file structure and confirm that samples and traits align before transforming or analyzing GWAS data.