Wellcome Trust Case Control Consortium



Study Design | Additional studies | Logistics | Population sub-structure | Data release

The Wellcome Trust Case Control Consortium (WTCCC) was established to harness the power of newly available genotyping technologies to improve our understanding of the aetiological basis of several major causes of global disease. The consortium has gathered genotype data for up to 500,000 sites of genome sequence variation (single nucleotide polymorphisms, or SNPs) in samples ascertained for the disease phenotypes listed in Table 1. Analysis of the genome-wide association data generated has led to the identification of many SNPs and genes showing evidence of association with disease susceptibility, some of which will be followed up in future studies (Nature 2007;447:661-78). In addition, the Consortium has gained important insights into the technical, analytical, methodological and biological aspects of genome-wide association analysis.

Study Design

The core of the study comprised an analysis of 2,000 samples for each of seven diseases (type 1 diabetes, type 2 diabetes, coronary heart disease, hypertension, bipolar disorder, rheumatoid arthritis and Crohn's disease). For each disease, the case samples were ascertained from sites widely distributed across Great Britain. This allowed us to obtain considerable efficiencies by comparing each case population with a common set of 3,000 nationally ascertained controls, also from England, Scotland and Wales. These controls come from two sources: 1,500 are representative samples from the 1958 British Birth Cohort and 1,500 are blood donors recruited by the three national UK Blood Services. One question the WTCCC study has addressed is the relative merit of these two strategies for generating representative population cohorts.

Genotyping for this "main" case-control study was conducted by Affymetrix using the commercial Affymetrix 500K chip. In total, 17,000 samples (2,000 cases for each of the seven diseases plus the 3,000 shared controls) were typed for 500,000 SNPs.
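The efficiency gained from the shared control set can be illustrated with simple arithmetic. The sketch below counts samples under the design described above and under a hypothetical alternative, assumed here purely for comparison, in which each disease recruits its own 3,000 controls. The constant and function names are illustrative and do not come from the Consortium.

```python
# Sample counts taken from the study description above.
N_DISEASES = 7
CASES_PER_DISEASE = 2_000
CONTROLS = 3_000


def samples_with_shared_controls() -> int:
    """All case cohorts are compared against one common control set."""
    return N_DISEASES * CASES_PER_DISEASE + CONTROLS


def samples_with_separate_controls() -> int:
    """Hypothetical alternative: every disease recruits its own control set."""
    return N_DISEASES * (CASES_PER_DISEASE + CONTROLS)


if __name__ == "__main__":
    print("shared controls:  ", samples_with_shared_controls())
    print("separate controls:", samples_with_separate_controls())
```

Under these assumptions the shared design needs 17,000 samples, matching the total typed in the main study, against 35,000 for the hypothetical design with separate controls.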


Additional studies

There are two additional components to the study.

First, the WTCCC award is part-funding a study of host resistance to infectious diseases in African populations. The same approach has been used to type 2,000 cases of tuberculosis (TB) and 2,000 cases of malaria, as well as 2,000 shared controls. As well as addressing diseases of major global significance and extending WTCCC coverage into the area of infectious disease, the inclusion of samples of African origin benefits the methodological aspects of genome-wide association analysis. This part of the study has also received substantial additional funding from the Wellcome Trust and the Gates Foundation, through the MalariaGEN initiative, and from the Wellcome Trust Sanger Institute (WTSI).

Second, for four additional diseases (autoimmune thyroid disease, breast cancer, ankylosing spondylitis and multiple sclerosis), the WTCCC has completed an analysis of 15,000 SNPs designed to represent a large proportion of the known non-synonymous coding SNPs across the genome. This analysis was performed at the WTSI using a custom Infinium chip (Illumina).

Table 1: Disease samples

Disease | Co-Principal Applicants | Cohort Abbreviation

Disease cohorts
Type 1 diabetes | John Todd & David Clayton | T1D
Type 2 diabetes | Mark McCarthy & Andrew Hattersley | T2D
Crohn's disease | Miles Parkes & Chris Mathew | CD
Breast cancer | Michael Stratton & Nazneen Rahman | BC
Coronary heart disease | Alistair Hall & Nilesh Samani | CHD
Hypertension | Mark Caulfield & Martin Farrall | HT
Bipolar disorder | Nick Craddock | BD
Rheumatoid arthritis | Jane Worthington | RA
Multiple sclerosis | Alastair Compston | MS
Ankylosing spondylitis | Matthew Brown | AS
Autoimmune thyroid disease | Stephen Gough | ATD
Malaria | Dominic Kwiatkowski | ML
Tuberculosis | Adrian Hill, Melanie Newport & Giorgio Sirugo | TB

Control cohorts
1958 Birth Cohort | Marcus Pembrey, David Strachan & Peter Shepherd | 58C
UK Blood Service | Willem Ouwehand | UKBS



Logistics

The Wellcome Trust Sanger Institute (WTSI) and the JDRF/WT Diabetes and Inflammation Laboratory (DIL), Cambridge, established an operation to import and quality-control (QC) anonymised DNA samples from all cohorts. Sample QC consisted of quantification (PicoGreen method), tests for degradation on agarose gels and genotyping of 20 SNP markers (Sequenom platform). Where quantities of native DNA were limited (as with the African samples), whole genome amplification was undertaken at Geneservice Ltd.

For each anonymised DNA sample, genotype data, case/control status, broad geographical information, gender and age group (10-year intervals) are stored in a central Oracle database at the WTSI.

The WTSI and the DIL coordinated sample shipment as well as the initial receipt, QC/QA and storage of genotypic data. Statistical analysis (testing individual markers and/or haplotypes for association with the disease phenotypes) was carried out centrally by the Data Analysis Group, chaired by Professor David Clayton at the DIL and Professor Lon Cardon at the Wellcome Trust Centre for Human Genetics (Oxford).


Population sub-structure

It has been known for some time that geographical population structure (i.e. differences in allele frequencies between geographical regions), together with geographical variation in disease prevalence, can lead to false-positive and false-negative results in population-based disease association studies. For studies of this size, it has been shown recently that population structure within the British Caucasian population can result in poorly calibrated tests of association. In the statistical analysis, geographical subregion information was used to assess the extent and nature of any population structure present in Great Britain, and to advise on design strategies and analysis methods that allow for it efficiently and accurately. The results of this analysis are described in full in the WTCCC paper.
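A small worked example shows how population structure can distort an association test. The sketch below uses invented allele counts for two hypothetical regions (labelled north and south). Within each region the allele frequency is identical in cases and controls, but both the allele frequency and the case:control ratio differ between the regions. It applies a plain Pearson chi-square test to 2x2 allele-count tables. This is an illustration only, not the analysis method used by the Consortium.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (1 d.f., no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denominator


# Invented allele counts: (minor, major) alleles in cases, then in controls.
# north: 1,500 cases and 500 controls, minor allele frequency 0.30 in both.
# south: 500 cases and 1,500 controls, minor allele frequency 0.10 in both.
REGIONS = {
    "north": ((900, 2100), (300, 700)),
    "south": ((100, 900), (300, 2700)),
}


def stratum_statistics() -> dict:
    """Chi-square statistic computed separately within each region."""
    return {
        name: chi_square_2x2(*cases, *controls)
        for name, (cases, controls) in REGIONS.items()
    }


def pooled_statistic() -> float:
    """Chi-square statistic after pooling both regions into one table."""
    case_minor = sum(cases[0] for cases, _ in REGIONS.values())
    case_major = sum(cases[1] for cases, _ in REGIONS.values())
    ctrl_minor = sum(controls[0] for _, controls in REGIONS.values())
    ctrl_major = sum(controls[1] for _, controls in REGIONS.values())
    return chi_square_2x2(case_minor, case_major, ctrl_minor, ctrl_major)


if __name__ == "__main__":
    print("within regions:", stratum_statistics())
    print("pooled:", pooled_statistic())
```

Within each region the statistic is 0, yet the pooled table gives a statistic of 125, far above the 3.84 threshold for nominal significance at the 5% level with one degree of freedom. The apparent association arises entirely from the regional differences in these made-up counts.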


Data release

Genotype data for the control samples (1958 British Birth Cohort and UK Blood Service) and for the seven diseases analysed in the main study are now available to qualified researchers. Summary genotype statistics for these collections are available directly from the website. Access to the individual-level genotype data and summary genotype statistics is by application to the Consortium Data Access Committee (CDAC), and approval is subject to a Data Access Agreement. For further details, see Access to WTCCC genotype data and samples.