1000 Genomes Project Mirror

What does this mirror contain?

Between 2010 and 2011, the 1000 Genomes Project Consortium sequenced 2,504 human samples from 26 populations at 4X coverage to provide a global reference and a comprehensive resource on human genetic variation. The goal of the project is to find most genetic variants that have frequencies of at least 1% in the populations studied. In total, the dataset consists of some 260 terabytes of data in more than 250,000 publicly accessible files. This mirror is a complete copy of the data sets created by the 1000 Genomes Project, provided for use by Australian researchers.

How is the data arranged?

The data is organised as a complete mirror of the original site. However, only a proportion of the files is available for immediate access at any one time. Which files those are depends on when each file was last requested: recently accessed files are more likely to be available immediately.
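Because a file that has not been requested recently may not be ready on the first attempt, a client script may want to retry with increasing waits. The following is a minimal sketch only. It assumes that a request for a file that is not yet available fails with an error rather than simply taking longer to respond; this page does not say how the server actually behaves. The function name, its parameters and the delay values are all hypothetical.

```python
import time


def fetch_with_retry(fetch, attempts=5, base_delay=30.0, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting longer after each failure.

    fetch      -- zero-argument callable that returns the data or raises OSError
    attempts   -- maximum number of tries before giving up (assumed value)
    base_delay -- seconds to wait after the first failure; doubles each time
    sleep      -- injectable so the waiting can be skipped when testing
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of tries: surface the last error to the caller
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
```

Here `fetch` could wrap a call such as `urllib.request.urlopen(url).read()`, since `urllib.error.URLError` is a subclass of `OSError`. If the server instead holds the connection open until the file is ready, a long timeout is likely to be a better fit than retries.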

I'm interested in accessing this data. What's the best way of going about it?

There are several ways of accessing the data, and which one you choose will depend on your use case. You can survey all of the files in the complete mirror via HTTP in your web browser here. All of these files are also available via anonymous FTP here, which would probably suit a heavy user better than HTTP access.
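For scripted use, anonymous FTP access might look like the sketch below, written with Python's standard `ftplib` module. The host name and paths are placeholders, since the mirror's actual address is not given in this text, and the helper function names are illustrative only.

```python
from ftplib import FTP

# Placeholder values: substitute the mirror's real FTP host and directory.
MIRROR_HOST = "ftp.example.org"
MIRROR_ROOT = "/"


def list_directory(ftp, path):
    """Return the entry names in `path` using an already-connected session."""
    return ftp.nlst(path)


def download_file(ftp, remote_path, local_path):
    """Stream one remote file to disk in binary mode."""
    with open(local_path, "wb") as handle:
        ftp.retrbinary(f"RETR {remote_path}", handle.write)


def connect_anonymous(host=MIRROR_HOST):
    """Open a session and log in anonymously (login() with no arguments)."""
    ftp = FTP(host)
    ftp.login()
    return ftp
```

A typical session would call `connect_anonymous()`, pass the result to `list_directory(ftp, MIRROR_ROOT)` to browse, call `download_file` for each file of interest, and finish with `ftp.quit()`.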

The project is supported by:
Research Computing Centre, UQ
QCIF - the Queensland Cyber Infrastructure Foundation

The server was funded in part by the Research Data Services - Life Sciences (Genomics) 1.2 Project.