The software and data provided by these sites can be used for in-silico experiments. Do you want to predict what impact a particular sequence variant will have on the expression of a gene and the properties of its protein product? Do you want to find the somatic changes reported in particular cancers?
Basic Local Alignment Search Tool (BLAST)
The databases and programs linked through BLAST can be used to find areas of sequence similarity for nucleic acids and proteins.
Databases exist for many species including humans.
BLAST can be used to calculate how well matched sequences are and to infer evolutionary relationships between sequences.
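One simple measure of how well two sequences match is percent identity, which BLAST reports alongside its alignment scores and E-values. The sketch below is not BLAST's algorithm. It assumes the two sequences have already been aligned, with "-" marking gaps, and only counts matching columns. The function name percent_identity is made up for this illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns in which both sequences carry the same residue.

    Assumes the inputs are already aligned (equal length, '-' for gaps).
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    # Keep every column except those that are gaps in both sequences.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0

    # A column matches only if the residues are equal and neither is a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy sequences, invented for the example.
    print(round(percent_identity("ACGT-A", "ACGTTA"), 1))
```

A gap opposite a residue counts as a mismatch here. Other conventions exist, so identity values from different tools are not always directly comparable.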
This software can be used to interrogate the possible functional significance of non-synonymous SNPs; i.e. variants that alter the protein-coding sequence.
Predictions are made at the molecular level about the possible effects of sequence variation on phenotype.
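To make the term non-synonymous concrete, the sketch below translates a reference codon and a variant codon with the standard genetic code and reports whether the encoded amino acid changes. It only illustrates the definition and does not attempt to predict functional significance, which requires the kind of software described above. The function name classify_codon_change and its output labels are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Label a codon substitution by its effect on the encoded amino acid."""
    ref = ref_codon.upper().replace("U", "T")
    alt = alt_codon.upper().replace("U", "T")
    if ref not in CODON_TABLE or alt not in CODON_TABLE:
        raise ValueError("codons must be three letters from A, C, G, T (or U)")

    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    if ref_aa == alt_aa:
        return "synonymous"   # protein sequence unchanged
    if alt_aa == "*":
        return "nonsense"     # non-synonymous: introduces a stop codon
    if ref_aa == "*":
        return "stop-lost"    # non-synonymous: removes a stop codon
    return "missense"         # non-synonymous: one amino acid replaced by another


if __name__ == "__main__":
    print(classify_codon_change("GAA", "GAG"))  # both encode glutamate
    print(classify_codon_change("GAG", "GTG"))  # glutamate to valine
```

Every label other than "synonymous" corresponds to a non-synonymous change in the sense used above.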
The Encyclopedia of DNA Elements (ENCODE)
The ENCODE consortium is compiling a list of functional elements within the genome. Regulatory elements may act at the RNA level or at the level of the protein. Regulatory mechanisms include the differential binding of transcription factors and variations in chromatin state.
The data are searchable and freely available.
Regulome Database (DB)
This database collates and annotates information about SNPs that affect known or putative regulatory elements within the human genome.
The database is searchable by chromosomal region, SNP ID or individual nucleotide defined by human genome reference sequence co-ordinates.
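The three kinds of query can be told apart by their shape. The sketch below sorts a query string into SNP ID, chromosomal region or single position before any lookup is made. The coordinate formats it accepts ("chr1:1000-2000" and "chr1:1000") are assumptions chosen for illustration and are not necessarily the input syntax the database itself expects. The function name classify_query is likewise hypothetical.

```python
import re

# Assumed query shapes; the real database may accept different formats.
CHROM = r"(?:chr)?(\d{1,2}|X|Y|M|MT)"
RSID = re.compile(r"^rs\d+$", re.IGNORECASE)
REGION = re.compile(rf"^{CHROM}:(\d+)-(\d+)$", re.IGNORECASE)
POSITION = re.compile(rf"^{CHROM}:(\d+)$", re.IGNORECASE)


def classify_query(query: str):
    """Return (kind, value) for an SNP ID, a region or a single position."""
    text = query.strip().replace(",", "")  # tolerate thousands separators

    if RSID.match(text):
        return ("snp_id", text.lower())

    match = REGION.match(text)
    if match:
        chrom, start, end = match.group(1).upper(), int(match.group(2)), int(match.group(3))
        if start > end:
            raise ValueError("region start must not exceed region end")
        return ("region", (chrom, start, end))

    match = POSITION.match(text)
    if match:
        return ("position", (match.group(1).upper(), int(match.group(2))))

    raise ValueError(f"unrecognised query: {query!r}")


if __name__ == "__main__":
    # Made-up identifiers and coordinates, used only to show the three shapes.
    for example in ("rs123", "chr1:1,000-2,000", "X:500"):
        print(classify_query(example))
```

Coordinates only make sense relative to a specific build of the human genome reference sequence, so a real query should also record which build was used.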
The Cancer Genome Atlas (TCGA) Portal
The United States’ National Cancer Institute supports TCGA to produce and store high level sequence data from cancer genomes. The TCGA portal allows researchers to search, download and analyse data sets. There are directories for open and controlled access to genome data.
The International Cancer Genome Consortium (ICGC)
The ICGC publishes data from cancer genetic studies and provides tools for the interrogation of the data. Access to some controlled data requires an application to the ICGC.
The ICGC's stated goal is “to obtain a comprehensive description of genomic, transcriptomic and epigenomic changes in 50 different tumor types and/or subtypes which are of clinical and societal importance across the globe”.
SNPedia and Promethease
SNPedia is a wiki about human genome variants that collates data from peer-reviewed publications about the phenotypic impact of individual SNPs.
These data are used in linked software, Promethease, which will build a personal report about sequence variants from an individual’s DNA/genotype results, e.g. results derived from a direct-to-consumer DNA testing service. The reports present probabilistic data about associations with given phenotypes.