Frequently Asked Questions

I have some questions regarding the various forms and agreements. Whom should I contact?

For questions about the access procedure and data use policies, please contact . For queries regarding the web site and the genotyping data, please fill in the . New queries should preferably be sent via this method rather than directly to a member of WTCCC staff. For information regarding the European Genotype Archive, please contact ega-admin@ebi.ac.uk.

How long will it take to process my application?

This will vary depending on the timing of the committee meetings, but we aim to process the application within two months.

I wish to access the data as part of a group of collaborators. Will each collaborator need to make a separate application?

A single application can be submitted, but the full contact details for each collaborator must be provided. If more than one Institution is involved, a separate signed Data Access Agreement must be submitted for each Institution.

I work in the group of another researcher who has been granted access to the data. May I also have access?

If you are under the direct supervision of the approved user, it is not necessary for you to make a separate application. Your supervisor must alert the CDAC that you will be viewing the data, by email to . If you are not under the direct supervision of the approved user, it will be necessary for you to make a new application for access to the data.

I have already been granted access to genotype data for the controls. May I have access to genotype data for the cases too? What do I need to do?

Users previously granted access to the controls through the CDAC may have access to case data, without a separate application, by e-mailing . You may be asked to sign the most recent version of the Data Access Agreement. See Access to WTCCC genotype data and samples.

I have confirmation from the Wellcome Trust that I am an "approved user". How may I access the data?

The data from WTCCC phase I (as published in Nature and Nature Genetics, 2007) are now available from the European Genotype Archive, http://www.ebi.ac.uk/ega. For further information, please contact ega-admin@ebi.ac.uk.

Are phenotypic data available for the disease samples?

The WTCCC has limited phenotype data on the disease samples: disease status, age, sex and broad geographical region within Britain. Access to additional phenotype data must be arranged directly with the relevant principal investigator. The principal investigators for each disease group are provided in the Overview page on the web site. For the 1958 Birth cohort controls, access is by application to the 1958 Oversight Committee. Further details can be found at http://www2.le.ac.uk/projects/birthcohort/oversight-committee.

Which genotype calling methods did the consortium use in its analysis?

The analysis in the consortium papers used genotypes derived using Chiamo (Affymetrix 500K) and GenCall (Infinium 15K). WTCCC2 genotypes were called using Chiamo (Affymetrix) and Illuminus (Illumina).

What score thresholds should be used in selecting no calls for individual genotypes?

We recommend discarding genotypes with probability < 0.9 (Chiamo), score > 0.5 (BRLMM) or score < 0.15 (GenCall). Exclusion lists (indicating poorly performing assays and samples) and filtered data (with such data removed) are also available.
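
As an illustration only, the thresholds above could be applied in Python along the following lines. The names DISCARD_RULES and is_no_call are hypothetical and are not part of any WTCCC software; only the three threshold values are taken from the answer above.

```python
import operator

# Thresholds quoted in the FAQ answer. The structure and names here are
# illustrative assumptions, not part of any WTCCC tool.
DISCARD_RULES = {
    "chiamo": (operator.lt, 0.9),    # discard if probability < 0.9
    "brlmm": (operator.gt, 0.5),     # discard if score > 0.5
    "gencall": (operator.lt, 0.15),  # discard if score < 0.15
}


def is_no_call(caller, value):
    """Return True if a genotype with this score should be treated as a no call."""
    compare, threshold = DISCARD_RULES[caller.lower()]
    return compare(value, threshold)
```

Note that the direction of the comparison differs between callers: a low value is poor for Chiamo and GenCall, whereas a high value is poor for BRLMM. A call such as is_no_call("Chiamo", 0.85) would flag the genotype for removal.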

What are CEL files?

CEL files contain the ‘raw’ probe intensities from the Affymetrix chips. It is from these data that genotype calls are derived, using algorithms such as BRLMM and Chiamo.

Are the CEL files available to download?

CEL files from WTCCC phase I are now available from the European Genotype Archive. For further information, please contact ega-admin@ebi.ac.uk.

Have you got software to open CEL files? Why are they so big?

Affymetrix provides tools and libraries to manipulate the various data files, such as the Affymetrix Power Tools API written in C++. This code is available, under GPL, via the Affymetrix web site.

Could you provide files in a format for use by plink?

Please see Data formats for descriptions of the formats currently exported. It is not our intention to provide data in too many different formats, mainly due to the size of the data: we anticipate bioinformaticians will have sufficient expertise in making the relevant conversions. Where this is not the case, the Sanger team may be able to help.

What software should I use to manipulate the data I have downloaded?

See also the previous question. The data are provided 'as is' in formats deemed to be appropriate. Users are expected to be able to handle the data they download. It should be stressed that some of these formats are designed to be processed computationally rather than read by eye or opened with, for example, standard office packages.

Is it possible to automate the downloading of files from the site?

The authentication system stores information in a cookie. You could automate downloading with utilities such as "wget" and "curl". Once logged in, a command such as
wget --load-cookies COOKIES_FILE URL
should work. Clearly, you will still have to do this once per file (and get the list of URLs from the web page).
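
The same idea can be scripted so that a list of URLs is fetched in one run. The following is a minimal sketch in Python, not a supported tool: it assumes the cookie has been exported to a Netscape-format cookies file (the format that wget --load-cookies reads, with the header line Python's http.cookiejar expects), and the function names, the URL list and the fallback file name "download" are hypothetical. It has not been tested against the site.

```python
import http.cookiejar
import os
import shutil
import urllib.parse
import urllib.request


def local_name(url):
    """Derive a local file name from the last path component of the URL."""
    path = urllib.parse.urlparse(url).path
    return os.path.basename(path) or "download"


def build_opener(cookies_file):
    """Return an opener that sends the cookies stored in cookies_file.

    Assumes a Netscape-format cookies file.
    """
    jar = http.cookiejar.MozillaCookieJar()
    jar.load(cookies_file, ignore_discard=True, ignore_expires=True)
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def download_all(urls, cookies_file, dest_dir="."):
    """Fetch each URL in turn, reusing the login cookie for every request."""
    opener = build_opener(cookies_file)
    for url in urls:
        target = os.path.join(dest_dir, local_name(url))
        # Stream to disk rather than reading the whole file into memory,
        # since the data files can be large.
        with opener.open(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
```

With the URLs copied from the web page into a Python list, a call such as download_all(urls, "COOKIES_FILE") would fetch them one after another.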

How do I download the data using ftp or sftp?

The individual level genotype data are currently only available via the web interface. There are no immediate plans to provide an alternative.

Where has the "Data Access" link gone?

All data access is now provided through the European Genotype Archive, http://www.ebi.ac.uk/ega.