Overview
The Wellcome Trust Case Control Consortium (WTCCC) was established with the aim of harnessing the power of newly available genotyping technologies to improve our understanding of the aetiological basis of several major causes of global disease. Over the last year the consortium has gathered genotype data for up to 500,000 sites of genome sequence variation (single nucleotide polymorphisms or SNPs) in samples ascertained for the disease phenotypes listed in Table 1. Analysis of the genome-wide association data generated has led to the identification of many SNPs and genes showing evidence of association with disease susceptibility, some of which will be followed up in future studies (Nature. 2007;447;661-78). In addition, the Consortium has gained important insights into the technical, analytical, methodological and biological aspects of genome-wide association analysis.
Study Design
The core of the study comprised an analysis of 2,000 samples from each of seven diseases (type 1 diabetes, type 2 diabetes, coronary heart disease, hypertension, bipolar disorder, rheumatoid arthritis and Crohn's disease). For each disease, the case samples have been ascertained from sites widely distributed across Great Britain, allowing us to obtain considerable efficiencies by comparing each of these case populations to a common set of 3,000 nationally-ascertained controls also from England, Scotland and Wales. These controls come from two sources: 1,500 are representative samples from the 1958 British Birth Cohort and 1,500 are blood donors recruited by the three national UK Blood Services. One of the questions that the WTCCC study has addressed relates to the relative merits of these alternative strategies for the generation of representative population cohorts.
Genotyping for this "main" case-control study was conducted by Affymetrix using the commercial Affymetrix 500K chip. In total, 17,000 samples were typed for 500,000 SNPs as part of this study.
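The 17,000 figure follows directly from the design described above, and a short tally makes the saving from a shared control set concrete. This is only an illustrative sketch: the function names are invented here, and the "separate controls" alternative is a hypothetical comparison, not a design the Consortium describes.

```python
# Figures taken from the study design described above.
DISEASES = 7
CASES_PER_DISEASE = 2_000
SHARED_CONTROLS = 3_000  # 1,500 from the 1958 Birth Cohort + 1,500 blood donors


def main_study_samples(diseases=DISEASES, cases=CASES_PER_DISEASE,
                       controls=SHARED_CONTROLS):
    """Total samples when every disease is compared to one common control set."""
    return diseases * cases + controls


def separate_controls_samples(diseases=DISEASES, cases=CASES_PER_DISEASE,
                              controls=SHARED_CONTROLS):
    """Hypothetical alternative: each disease recruits its own control set."""
    return diseases * (cases + controls)


if __name__ == "__main__":
    print(main_study_samples())         # 17000
    print(separate_controls_samples())  # 35000
```

Under these assumptions, sharing controls roughly halves the number of samples that need to be genotyped.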
Additional studies
There are two additional components to the study.
First, the WTCCC award is part-funding a study of host resistance to infectious diseases in African populations. The same approach has been used to type 2,000 cases of tuberculosis (TB) and 2,000 cases of malaria, as well as 2,000 shared controls. As well as addressing diseases of major global significance, and extending WTCCC coverage into the area of infectious disease, the inclusion of samples of African origin has obvious benefits with respect to methodological aspects of genome-wide association analysis. This part of the study has also received substantial additional funding from the Wellcome Trust and the Gates Foundation through the MalariaGen initiative, and the Wellcome Trust Sanger Institute (WTSI).
Second, the WTCCC has, for four additional diseases (autoimmune thyroid disease, breast cancer, ankylosing spondylitis, multiple sclerosis), completed an analysis of 15,000 SNPs designed to represent a large proportion of the known non-synonymous coding SNPs across the genome. This analysis has been performed at the WTSI using a custom Infinium chip (Illumina).
| Disease | Co-Principal Applicants | Cohort Abbreviation |
|---|---|---|
| Disease cohorts | | |
| Type 1 diabetes | John Todd & David Clayton | T1D |
| Type 2 diabetes | Mark McCarthy & Andrew Hattersley | T2D |
| Crohn's disease | Miles Parkes & Chris Mathew | CD |
| Breast cancer | Michael Stratton & Nazneen Rahman | BC |
| Coronary heart disease | Alistair Hall & Nilesh Samani | CHD |
| Hypertension | Mark Caulfield & Martin Farrall | HT |
| Bipolar disorder | Nick Craddock | BD |
| Rheumatoid arthritis | Jane Worthington | RA |
| Multiple sclerosis | Alastair Compston | MS |
| Ankylosing spondylitis | Matthew Brown | AS |
| Autoimmune thyroid disease | Stephen Gough | ATD |
| Malaria | Dominic Kwiatkowski | ML |
| Tuberculosis | Adrian Hill, Melanie Newport & Giorgio Sirugo | TB |
| Control cohorts | | |
| 1958 Birth Cohort | Marcus Pembrey, David Strachan & Peter Shepherd | 58C |
| UK Blood Service | Willem Ouwehand | UKBS |
Logistics
The Wellcome Trust Sanger Institute (WTSI) and the JDRF/WT Diabetes and Inflammation Laboratory (DIL), Cambridge established an operation to import anonymised DNA samples from all cohorts and subject them to quality control (QC). Sample QC consisted of quantification (PicoGreen method), tests for degradation on agarose gels and genotyping of 20 SNP markers (Sequenom platform). Where quantities of native DNA were limited (as with the African samples), whole genome amplification was undertaken at Geneservice Ltd.
For each anonymised DNA sample, genotype data, case/control status, broad geographical information, gender and age group (10-year intervals) are stored in a central ORACLE database at the WTSI.
WTSI and the DIL coordinated sample shipment as well as initial receipt, QC/QA and storage of genotypic data. Statistical analysis (testing individual markers and/or haplotypes for association with the disease phenotypes) was carried out centrally by the Data Analysis Group, chaired by Professor David Clayton at the DIL and Professor Lon Cardon at the Wellcome Trust Centre for Human Genetics (Oxford).
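As an illustration of what testing an individual marker for association can look like, the sketch below runs a Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts in cases and controls. It is a generic textbook test, not necessarily the method used by the Data Analysis Group, and the function name and the allele counts in the usage example are invented for illustration.

```python
import math


def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are the two alleles of one SNP.
    Returns (statistic, p_value).
    """
    table = [[case_a, case_b], [control_a, control_b]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_a + control_a, case_b + control_b]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one allele")

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: 2,000 cases (4,000 alleles) and
    # 3,000 controls (6,000 alleles) at a single SNP.
    stat, p = allelic_chi_square(1400, 2600, 1800, 4200)
    print(f"chi-square = {stat:.2f}, p = {p:.2e}")
```

In a genome-wide scan a test like this would be repeated for each of the roughly 500,000 SNPs, so any significance threshold has to account for the number of tests performed.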
Population sub-structure
It has been known for some time that geographical population structure (i.e. differences in allele frequencies in different geographical regions) and geographical variation in disease prevalence can lead to false positive, and false negative, results in population-based disease association studies. For studies of this size, it has been shown recently that population structure within the British Caucasian population can result in poorly calibrated tests of association. In the statistical analysis, geographical subregion information was used to assess the extent and nature of any population structure present in Great Britain, and to advise on design strategies and analysis methods that efficiently and accurately allow for this. The results of this analysis are described in full in the WTCCC paper.
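One widely used way to check whether association tests are well calibrated is the genomic inflation factor: the median of the observed 1-df chi-square statistics divided by the median expected under the null (about 0.455). Values well above 1 suggest inflation, for which population structure is one possible cause. The sketch below is a generic illustration on simulated statistics. It does not reproduce the Consortium's analysis, and the function name and the 1.2 inflation applied in the example are assumptions.

```python
import random
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Median observed 1-df chi-square statistic over its null expectation."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated null statistics: squares of standard normal draws.
    null_stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(20_000)]
    # Hypothetical inflation: every statistic scaled up by 20%.
    inflated_stats = [1.2 * s for s in null_stats]

    print(f"null lambda:     {genomic_inflation_factor(null_stats):.3f}")
    print(f"inflated lambda: {genomic_inflation_factor(inflated_stats):.3f}")
```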
Data release
The genotypic data for the control samples (1958 British Birth Cohort and UK Blood Service) and for the seven diseases analysed in the main study are now available to qualified researchers. Summary genotype statistics for these collections are available directly from the website. Access to the individual-level genotype data and summary genotype statistics is by application to the Consortium Data Access Committee (CDAC), with approval subject to a Data Access Agreement. For further details, see Access to WTCCC genotype data and samples.