Data sources included in VarCards

Part one: variant-level implication

Allele frequency
dbSNP, gnomAD, ExAC, 1000Genomes, ESP, Kaviar, HRC, CG69
dbSNP The Single Nucleotide Polymorphism database (dbSNP) is a public-domain archive for a broad collection of simple genetic polymorphisms. https://www.ncbi.nlm.nih.gov/snp
gnomAD The Genome Aggregation Database (gnomAD) is a coalition of investigators seeking to aggregate and harmonize exome and genome sequencing data from a variety of large-scale sequencing projects, and to make summary data available for the wider scientific community. The data set spans 123,136 exomes and 15,496 genomes from unrelated individuals sequenced as part of various disease-specific and population genetic studies. http://gnomad.broadinstitute.org/
ExAC The Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a variety of large-scale sequencing projects, and to make summary data available for the wider scientific community. The data set spans 60,706 unrelated individuals sequenced as part of various disease-specific and population genetic studies. http://exac.broadinstitute.org/
1000Genomes The 1000 Genomes Project was the first project to sequence the genomes of a large number of people to provide a comprehensive resource on human genetic variation. In the final phase of the project, data from 2,504 samples from 26 populations (grouped as African, Admixed American, East Asian, European, and South Asian) were combined to allow highly accurate assignment of the genotypes in each sample at all the variant sites the project discovered. http://www.1000genomes.org/
ESP The NHLBI GO Exome Sequencing Project (ESP) dataset comes from the ESP and its ongoing studies, which produced and provided exome variant calls for comparison. The current Exome Variant Server (EVS) data release (ESP6500SI-V2) is taken from 6,503 samples drawn from multiple ESP cohorts and represents all of the ESP exome variant data. http://evs.gs.washington.edu/EVS/
Kaviar Kaviar is a compilation of SNVs, indels, and complex variants observed in humans, designed to facilitate testing for the novelty and frequency of observed variants. Kaviar contains 162 million SNV sites (including 25M not in dbSNP) and incorporates data from 35 projects encompassing 77,781 individuals (13.2K whole genome, 64.6K exome). http://db.systemsbiology.net/kaviar/
HRC The Haplotype Reference Consortium (HRC) panel is used for genotype imputation and phasing in other cohorts, typically genome-wide association studies (GWAS), where genotypes are available from genome-wide SNP microarrays. It contains haplotypes from individuals of predominantly European ancestry, although the HRC also includes the 1000 Genomes Project data. The first release consists of 64,976 haplotypes at 39,235,157 SNPs, all with an estimated minor allele count greater than 5. http://www.haplotype-reference-consortium.org
CG69 This Complete Genomics dataset includes 69 DNA samples sequenced using the company's Standard Sequencing Service, which includes whole genome sequencing, mapping of the resulting reads to a human reference genome, comprehensive detection of variations, scoring, and informative annotation. http://www.completegenomics.com/public-data/69-Genomes/
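The allele frequency sources above are typically combined into a rarity filter. The Python sketch below assumes per-database alternate allele counts (AC) and total called alleles (AN); the database names in the usage line are placeholders, not real VarCards field names. It computes AF = AC / AN, the minor allele count used in HRC-style filters (MAC > 5), and a "below 1% in every database that reports the variant" check, matching the rare (<1%) cutoff mentioned for M-CAP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlleleCounts:
    """Alternate allele count (AC) and total number of called alleles (AN) at one site."""
    ac: int
    an: int


def allele_frequency(c: AlleleCounts) -> float:
    """AF = AC / AN; returns 0.0 when no alleles were called."""
    return c.ac / c.an if c.an > 0 else 0.0


def minor_allele_count(c: AlleleCounts) -> int:
    """At a biallelic site the minor allele is whichever of ref/alt is rarer."""
    return min(c.ac, c.an - c.ac)


def is_rare(freqs: dict[str, Optional[float]], cutoff: float = 0.01) -> bool:
    """True if the variant is below `cutoff` in every database that reports it.

    Databases without a record (None) are skipped, so a variant absent from
    all of them is treated as rare."""
    observed = [f for f in freqs.values() if f is not None]
    return max(observed, default=0.0) < cutoff


site = AlleleCounts(ac=3, an=2000)
print(allele_frequency(site), minor_allele_count(site))  # 0.0015 3
# Hypothetical per-database frequencies for one variant.
print(is_rare({"gnomAD": 0.0015, "ExAC": None, "1000Genomes": 0.004}))  # True
```

Treating "absent from a database" as rare is a design choice; a stricter filter could instead require the variant to be observed with adequate coverage before calling it rare.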
Missense prediction
SIFT, PolyPhen2 HDIV, PolyPhen2 HVAR, LRT, MutationTaster, MutationAssessor, FATHMM, PROVEAN, MetaSVM, MetaLR, VEST, M-CAP, CADD, GERP++, DANN, fathmm-MKL, Eigen, GenoCanyon, fitCons, PhyloP, PhastCons, SiPhy, REVEL, dbNSFP
SIFT SIFT predicts whether an amino acid substitution affects protein function. SIFT prediction is based on the degree of conservation of amino acid residues in sequence alignments derived from closely related sequences, collected through PSI-BLAST. SIFT can be applied to naturally occurring nonsynonymous polymorphisms or laboratory-induced missense mutations. http://sift.jcvi.org
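The idea behind conservation-based prediction can be illustrated with a toy scorer for a single alignment column. This is a simplified sketch, not SIFT's algorithm: it uses a flat pseudocount instead of the more elaborate scheme SIFT uses, and only borrows the commonly cited 0.05 cutoff (scaled scores below it are called deleterious). Each residue's estimated frequency is divided by that of the most frequent residue, so the best-tolerated amino acid scores 1.0.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def scaled_probabilities(column: str, pseudocount: float = 0.5) -> dict[str, float]:
    """Toy conservation score for one alignment column (one residue per sequence).

    Frequencies get a flat pseudocount so unseen residues are not exactly zero,
    then every value is divided by the largest, so the top residue scores 1.0."""
    counts = Counter(aa for aa in column.upper() if aa in AMINO_ACIDS)  # gaps ignored
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    probs = {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
    top = max(probs.values())
    return {aa: p / top for aa, p in probs.items()}


def predict(column: str, substitute: str, cutoff: float = 0.05) -> str:
    score = scaled_probabilities(column)[substitute]
    return "deleterious" if score < cutoff else "tolerated"


column = "L" * 19 + "I"          # hypothetical, highly conserved leucine position
print(predict(column, "I"))      # tolerated   (1.5 / 19.5 ≈ 0.077)
print(predict(column, "W"))      # deleterious (0.5 / 19.5 ≈ 0.026)
```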
PolyPhen2_HDIV PolyPhen-2 is a tool that predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. HumDiv-trained PolyPhen-2 is used for evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data, where even mildly deleterious alleles must be treated as damaging. http://genetics.bwh.harvard.edu/pph2
PolyPhen2_HVAR PolyPhen-2 is a tool that predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. HumVar-trained PolyPhen-2 is used for diagnosing Mendelian diseases, which requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. http://genetics.bwh.harvard.edu/pph2
LRT A likelihood ratio test (LRT) can accurately identify a subset of deleterious mutations that disrupt highly conserved amino acids within protein-coding sequences, which are likely to be unconditionally deleterious. http://www.genetics.wustl.edu/jflab/lrt_query.html
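The general shape of a likelihood ratio test can be shown with a deliberately simple model. In this sketch, which is not the codon-based model the LRT tool actually fits, the number of substitutions observed at a site is assumed to be Poisson-distributed; the null hypothesis fixes the mean at an assumed neutral expectation, the alternative sets it to its maximum-likelihood estimate (the observed count), and twice the log-likelihood difference is compared with a chi-square distribution with one degree of freedom. The test as written is two-sided: it flags an excess as well as a deficit of substitutions, whereas a constraint test cares only about the deficit.

```python
import math


def poisson_lrt(observed: int, expected_neutral: float) -> tuple[float, float]:
    """Return (LR, p) for H0: mean = expected_neutral vs H1: mean = observed (the MLE).

    For a Poisson count k, logL(lam) = k*ln(lam) - lam - ln(k!), so
    LR = 2 * (logL(k) - logL(lam0)) = 2 * (k*ln(k/lam0) - (k - lam0))."""
    k, lam0 = observed, expected_neutral
    term = k * math.log(k / lam0) if k > 0 else 0.0  # k*ln(k) -> 0 as k -> 0
    lr = max(0.0, 2.0 * (term - (k - lam0)))         # clamp tiny negative rounding
    # Chi-square(1) survival function: P(X > lr) = erfc(sqrt(lr / 2)).
    p = math.erfc(math.sqrt(lr / 2.0))
    return lr, p


# Hypothetical site: no substitutions where five were expected under neutrality.
lr, p = poisson_lrt(observed=0, expected_neutral=5.0)
print(round(lr, 3), round(p, 5))  # 10.0 0.00157
```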
MutationTaster MutationTaster employs a Bayes classifier to predict the disease potential of an alteration. The classifier is fed with the outcomes of all tests and the features of the alteration, and calculates the probability that the alteration is either a disease mutation or a harmless polymorphism. http://www.mutationtaster.org
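The Bayes step can be sketched as a naive Bayes posterior; treating the model as naive Bayes is an assumption about its exact form. The feature likelihoods and the 0.5 prior below are made-up placeholders, not MutationTaster's trained values, and features are assumed independent given the class.

```python
import math


def posterior_disease(likelihoods: list[tuple[float, float]], prior_disease: float = 0.5) -> float:
    """P(disease | features) under naive Bayes.

    `likelihoods` holds one (P(feature | disease), P(feature | polymorphism)) pair
    per observed feature. Log space avoids underflow when there are many features."""
    log_d = math.log(prior_disease)
    log_p = math.log(1.0 - prior_disease)
    for p_given_disease, p_given_poly in likelihoods:
        log_d += math.log(p_given_disease)
        log_p += math.log(p_given_poly)
    diff = log_d - log_p
    # Numerically stable logistic of the log posterior odds.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


# Two hypothetical features with invented class-conditional probabilities.
print(posterior_disease([(0.8, 0.2), (0.6, 0.3)]))  # 0.24 / (0.24 + 0.03) ≈ 0.889
```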
MutationAssessor MutationAssessor predicts the functional impact of amino-acid substitutions in proteins, such as mutations discovered in cancer or missense polymorphisms. The functional impact is assessed based on evolutionary conservation of the affected amino acid in protein homologs. http://mutationassessor.org
FATHMM Functional Analysis through Hidden Markov Models (FATHMM) is specifically designed for non-synonymous single nucleotide variants (nsSNVs). http://fathmm.biocompute.org.uk
PROVEAN Protein Variation Effect Analyzer (PROVEAN) is a software tool which predicts whether an amino acid substitution or indel has an impact on the biological function of a protein. It is useful for filtering sequence variants to identify nonsynonymous or indel variants that are predicted to be functionally important. http://provean.jcvi.org/
MetaSVM MetaSVM is an ensemble scoring method for deleterious missense mutations. It uses a support vector machine to integrate nine deleteriousness prediction scores and the maximum minor allele frequency for a more accurate and comprehensive evaluation of the deleteriousness of missense mutations. https://www.ncbi.nlm.nih.gov/pubmed/25552646
MetaLR MetaLR is a logistic-regression ensemble scoring method for deleterious missense mutations. It achieved the highest discriminative power compared with eighteen existing deleteriousness prediction scores, demonstrating the value of combining information from multiple complementary approaches. https://www.ncbi.nlm.nih.gov/pubmed/25552646
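A fitted logistic-regression ensemble turns several component scores into one probability-like value, as in the minimal sketch below. The input names, weights and intercept are hypothetical placeholders, not the published MetaLR coefficients, and inputs are assumed to be pre-rescaled to comparable ranges.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ensemble_score(scores: dict[str, float], weights: dict[str, float], intercept: float) -> float:
    """sigmoid(intercept + sum of weight * score) over the weighted inputs.

    A missing input raises KeyError rather than being silently imputed."""
    z = intercept + sum(w * scores[name] for name, w in weights.items())
    return sigmoid(z)


# Hypothetical weights; "sift_inverted" stands for 1 - SIFT so that higher = more damaging.
weights = {"polyphen2": 2.0, "sift_inverted": 1.5, "max_maf": -40.0}
variant = {"polyphen2": 0.98, "sift_inverted": 0.99, "max_maf": 0.0001}
print(round(ensemble_score(variant, weights, intercept=-2.5), 3))
```

In practice the weights would be fitted on labelled pathogenic and benign variants; the negative weight on allele frequency reflects the idea that common variants are less likely to be deleterious.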
VEST 3.0 The Variant Effect Scoring Tool (VEST) 3.0 is a machine learning method that predicts the functional significance of missense mutations observed through genome sequencing, allowing mutations to be prioritized in subsequent functional studies, based on the probability that they impair protein activity. http://wiki.chasmsoftware.org
M-CAP M-CAP is a pathogenicity classifier for rare missense variants in the human genome that is tuned to the high sensitivity required in the clinic. By combining previous pathogenicity scores (including SIFT, PolyPhen-2 and CADD) with novel features and a powerful model, its authors report the best classifier at all thresholds, reducing a typical exome/genome list of rare (<1%) missense variants of uncertain significance (VUS) from 300 to 120 while misclassifying no more than 5% of known pathogenic variants as benign. http://bejerano.stanford.edu/MCAP
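Tuning a classifier for clinical sensitivity amounts to choosing the score cutoff that still catches a fixed fraction (here 95%) of known pathogenic variants. The sketch below picks the largest such cutoff from a list of scores for known pathogenic variants; the scores are invented, and the function does not reproduce M-CAP's own threshold.

```python
import math


def cutoff_for_sensitivity(pathogenic_scores: list[float], sensitivity: float = 0.95) -> float:
    """Largest cutoff t such that calling `score >= t` pathogenic still flags at
    least `sensitivity` of the known pathogenic variants."""
    if not pathogenic_scores or not 0.0 < sensitivity <= 1.0:
        raise ValueError("need scores and 0 < sensitivity <= 1")
    ranked = sorted(pathogenic_scores, reverse=True)
    # At least ceil(sensitivity * n) variants must sit at or above the cutoff;
    # the small epsilon stops float error (e.g. 0.95 * 20) from bumping ceil up.
    k = max(1, math.ceil(sensitivity * len(ranked) - 1e-9))
    return ranked[k - 1]


scores = [i / 20 for i in range(1, 21)]    # 20 hypothetical pathogenic-variant scores
t = cutoff_for_sensitivity(scores)
print(t, sum(s >= t for s in scores))      # 0.1 19
```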
CADD Combined Annotation Dependent Depletion (CADD) is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletion (indel) variants in the human genome. It is a framework that integrates multiple annotations into one metric by contrasting variants that survived natural selection with simulated mutations. http://cadd.gs.washington.edu/
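CADD also reports raw scores on a PHRED-like scale obtained by ranking each variant against all possible single-nucleotide substitutions in the reference genome: a scaled score of 10 marks the top 10% most deleterious, 20 the top 1%. The sketch below applies a transform of that form, -10 * log10(rank / N), to a made-up list of raw scores instead of the genome-wide reference set.

```python
import math


def phred_scale(raw_scores: list[float]) -> list[float]:
    """PHRED-like rescaling: -10 * log10(rank / N), where rank 1 is the highest
    (most deleterious) raw score among N variants. Tied scores share the best
    rank of their group."""
    n = len(raw_scores)
    first_rank: dict[float, int] = {}
    for rank, score in enumerate(sorted(raw_scores, reverse=True), start=1):
        first_rank.setdefault(score, rank)
    return [-10.0 * math.log10(first_rank[s] / n) for s in raw_scores]


raw = [float(x) for x in range(100)]   # 100 hypothetical raw scores
scaled = phred_scale(raw)
print(round(scaled[99], 6), round(scaled[90], 6))  # 20.0 10.0
```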
GERP++ GERP++ is a new tool that uses maximum likelihood evolutionary rate estimation for position-specific scoring and, in contrast to previous bottom-up methods, a novel dynamic programming approach to subsequently define constrained elements. GERP++ evaluates a richer set of candidate element breakpoints and ranks them based on statistical significance, eliminating the need for biased heuristic extension techniques. http://mendel.stanford.edu/SidowLab/downloads/gerp/index.html
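GERP++ scores each position by its "rejected substitutions" (RS): substitutions expected under neutral evolution minus those observed, so positive values indicate constraint. To illustrate a dynamic-programming pass over such scores, the sketch below uses Kadane's maximum-subarray algorithm to find the single contiguous run of sites with the largest summed RS. This is a simplified stand-in, not GERP++'s procedure, which evaluates many candidate breakpoints and ranks elements by significance; the rates are made-up numbers.

```python
def rs_scores(neutral: list[float], observed: list[float]) -> list[float]:
    """Per-site rejected substitutions: expected (neutral) minus observed."""
    return [n - o for n, o in zip(neutral, observed)]


def best_segment(rs: list[float]) -> tuple[int, int, float]:
    """(start, end_exclusive, total) of the contiguous run with the largest RS sum.

    Kadane's dynamic program: the best run ending at site i either extends the
    best run ending at i - 1 (if that sum is positive) or starts afresh at i.
    Returns (0, 0, 0.0) when every site has RS <= 0."""
    best = (0, 0, 0.0)
    start, running = 0, 0.0
    for i, score in enumerate(rs):
        if running <= 0:
            start, running = i, score
        else:
            running += score
        if running > best[2]:
            best = (start, i + 1, running)
    return best


# Hypothetical neutral and observed substitution rates for five sites.
rs = rs_scores(neutral=[2.0] * 5, observed=[3.0, 0.0, 0.5, 4.0, 1.0])
print(rs)                 # [-1.0, 2.0, 1.5, -2.0, 1.0]
print(best_segment(rs))   # (1, 3, 3.5)
```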
DANN DANN is a deep learning approach for annotating the pathogenicity of whole-genome genetic variants. DANN uses the same feature set and training data as CADD to train a deep neural network (DNN). DNNs can capture non-linear relationships among features and are better suited than SVMs for problems with a large number of samples and features. https://cbcl.ics.uci.edu/public_data/DANN/
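fathmm-MKL fathmm-MKL extends FATHMM, which predicts the functional effects of protein missense mutations by combining sequence conservation within hidden Markov models (HMMs), representing the alignment of homologous sequences and conserved protein domains, with "pathogenicity weights" representing the overall tolerance of the protein/domain to mutations. fathmm-MKL uses multiple kernel learning to integrate functional annotations (including ENCODE data) with conservation, so that both coding and non-coding single nucleotide variants can be scored. http://fathmm.biocompute.org.uk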
Eigen Eigen is a spectral approach to the functional annotation of genetic variants in coding and noncoding regions. Eigen makes use of a variety of functional annotations in both coding and noncoding regions (such as made available by the ENCODE and Roadmap Epigenomics projects), and combines them into one single measure of functional importance. http://www.columbia.edu/~ii2135/eigen.html
GenoCanyon GenoCanyon is a statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data. It is a whole-genome functional annotation approach based on unsupervised statistical learning that integrates genomic conservation measures and biochemical annotation data to predict the functional potential at each nucleotide. http://genocanyon.med.yale.edu/
fitCons The fitness consequences of functional annotation (fitCons) score integrates functional assays (such as ChIP-seq) with selective pressure inferred using the INSIGHT method. The result is a score ρ in the range [0.0, 1.0] that indicates the fraction of genomic positions evincing a particular pattern (or "fingerprint") of functional assay results that are under selective pressure. http://compgen.cshl.edu/fitCons/
PhyloP PhyloP scores measure evolutionary conservation at individual alignment sites. They are useful for evaluating signatures of selection at particular nucleotides or classes of nucleotides (e.g., third codon positions, or first positions of miRNA target sites). http://compgen.bscb.cornell.edu/phast
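PhyloP scores are conventionally reported as a signed -log10 p-value: positive for sites evolving more slowly than the neutral model (conservation) and negative for faster-than-neutral sites (acceleration). The conversion is sketched below; computing the p-value itself requires fitting a phylogenetic model and is assumed to be done elsewhere.

```python
import math


def signed_conservation_score(p_value: float, slower_than_neutral: bool) -> float:
    """-log10(p), positive for conservation and negative for acceleration."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    magnitude = -math.log10(p_value)
    return magnitude if slower_than_neutral else -magnitude


print(round(signed_conservation_score(0.001, True), 6))    # 3.0
print(round(signed_conservation_score(0.01, False), 6))    # -2.0
```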
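PhastCons phastCons is a program in PHAST, a freely available software package for comparative and evolutionary genomics that consists of about half a dozen major programs plus more than a dozen utilities for manipulating sequence alignments, phylogenetic trees, and genomic annotations. phastCons uses a phylogenetic hidden Markov model to estimate the probability that each nucleotide belongs to a conserved element. http://compgen.cshl.edu/phast/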
SiPhy SiPhy is an approach that takes advantage of deeply sequenced clades to identify evolutionary selection by uncovering not only signatures of rate-based conservation but also substitution patterns characteristic of sequence undergoing natural selection. http://portals.broadinstitute.org/genome_bio/siphy/
REVEL REVEL is a new ensemble method for predicting the pathogenicity of missense variants based on a combination of scores from 13 individual tools: MutPred, FATHMM v2.3, VEST 3.0, PolyPhen-2, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP++, SiPhy, phyloP, and phastCons. REVEL was trained using recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. https://sites.google.com/site/revelgenomics/
dbNSFP The purpose of the dbNSFP is to provide a one-stop resource for functional predictions and annotations for human nonsynonymous single-nucleotide variants (nsSNVs) and splice-site variants (ssSNVs), and to facilitate the steps of filtering and prioritizing SNVs from a large list of SNVs discovered in an exome-sequencing study. http://sites.google.com/site/jpopgen/dbNSFP
Disease-related
InterVar, ClinVar, denovo-db, COSMIC, ICGC, GWAS Catalog
InterVar InterVar is a bioinformatics software tool for the clinical interpretation of genetic variants according to the ACMG/AMP 2015 guidelines. The input to InterVar is an annotated file generated by ANNOVAR, and the output is a classification of each variant as 'Benign', 'Likely benign', 'Uncertain significance', 'Likely pathogenic' or 'Pathogenic', together with the detailed evidence codes. http://wintervar.wglab.org/
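The last step of an ACMG/AMP classification combines the triggered evidence codes using the 2015 guideline's combining rules. The sketch below encodes those rules as read here; it is not InterVar's code, it ignores unrecognized codes, and it returns 'Uncertain significance' whenever both pathogenic-side and benign-side rules are met (a conservative choice; tools differ on, for example, whether BA1 overrides other evidence).

```python
from collections import Counter


def acmg_classify(evidence: list[str]) -> str:
    """Combine ACMG/AMP 2015 evidence codes (e.g. "PVS1", "PM2", "BP4") into
    one of the five classes. Unrecognized codes are counted but never used."""
    # Strip the trailing number to get the strength level: "PVS1" -> "PVS".
    c = Counter(code.upper().rstrip("0123456789") for code in evidence)
    pvs, ps, pm, pp = c["PVS"], c["PS"], c["PM"], c["PP"]
    ba, bs, bp = c["BA"], c["BS"], c["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm == 1 and pp == 1) or pp >= 2))
        or ps >= 2
        or (ps == 1 and (pm >= 3 or (pm == 2 and pp >= 2) or (pm == 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm == 1)
        or (ps == 1 and 1 <= pm <= 2)
        or (ps == 1 and pp >= 2)
        or pm >= 3
        or (pm == 2 and pp >= 2)
        or (pm == 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs == 1 and bp >= 1) or bp >= 2

    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"   # contradictory evidence
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"


print(acmg_classify(["PVS1", "PS3"]))   # Pathogenic
print(acmg_classify(["PVS1", "PM2"]))   # Likely pathogenic
print(acmg_classify(["BS1", "BP4"]))    # Likely benign
print(acmg_classify(["PM2"]))           # Uncertain significance
```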
ClinVar ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. The database includes germline and somatic variants of any size, type or genomic location. Interpretations are submitted by clinical testing laboratories, research laboratories, locus-specific databases, OMIM, GeneReviews, UniProt, expert panels and practice guidelines. https://www.ncbi.nlm.nih.gov/clinvar
denovo-db denovo-db is a collection of germline de novo variants identified in the human genome. As of July 2016, denovo-db contained 40 different studies and 32,991 de novo variants from 23,098 trios. Database features include basic variant information (chromosome location, change, type); detailed annotation at the transcript and protein levels; severity scores; frequency; validation status; and, most importantly, the phenotype of the individual with the variant. http://denovo-db.gs.washington.edu/denovo-db/
COSMIC COSMIC is designed to store and display somatic mutation information and related details and contains information relating to human cancers. There are two types of data in COSMIC: expert manual curation data and systematic screen data. The information in COSMIC is curated by expert scientists, primarily by scrutinizing large numbers of scientific publications. http://cancer.sanger.ac.uk/cosmic
ICGC The International Cancer Genome Consortium (ICGC) generates comprehensive catalogues of genomic abnormalities (somatic mutations, abnormal expression of genes, and epigenetic modifications) in tumors from 50 different cancer types and/or subtypes of clinical and societal importance across the globe, and makes the data available to the entire research community as rapidly as possible, with minimal restrictions, to accelerate research into the causes and control of cancer. https://icgc.org
GWAS Catalog The GWAS Catalog is a quality-controlled, manually curated, literature-derived collection of all published genome-wide association studies assaying at least 100,000 SNPs and all SNP-trait associations with p-values < 1.0 × 10^-5. The Catalog also publishes the iconic GWAS diagram of all SNP-trait associations with p-values ≤ 5.0 × 10^-8, mapped onto the human genome by chromosomal location and displayed on the human karyotype. http://www.ebi.ac.uk/gwas/
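The GWAS Catalog's two p-value thresholds and its 100,000-SNP study requirement can be applied to association records as in the following sketch. The record layout and the placeholder identifiers are hypothetical; the study requirement is checked per record for simplicity.

```python
from typing import NamedTuple

CATALOG_MAX_P = 1.0e-5       # inclusion: p < 1.0 x 10^-5
DIAGRAM_MAX_P = 5.0e-8       # GWAS diagram: p <= 5.0 x 10^-8
MIN_SNPS_ASSAYED = 100_000   # study must assay at least this many SNPs


class Association(NamedTuple):
    snp_id: str
    trait: str
    p_value: float
    snps_assayed: int   # SNPs assayed by the study reporting the association


def catalog_tier(a: Association) -> str:
    if a.snps_assayed < MIN_SNPS_ASSAYED or a.p_value >= CATALOG_MAX_P:
        return "not included"
    if a.p_value <= DIAGRAM_MAX_P:
        return "included, on diagram"
    return "included"


records = [  # hypothetical placeholder records
    Association("snp_A", "trait_X", 3e-9, 500_000),
    Association("snp_B", "trait_X", 2e-6, 500_000),
    Association("snp_C", "trait_Y", 3e-9, 50_000),
]
for r in records:
    print(r.snp_id, catalog_tier(r))
# snp_A included, on diagram
# snp_B included
# snp_C not included
```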
Other data
RefSeq, InterPro, Segmental duplication
RefSeq RefSeq is a comprehensive, integrated, non-redundant, well-annotated set of reference sequences, including genomic, transcript, and protein sequences. https://www.ncbi.nlm.nih.gov/refseq/
InterPro InterPro is a resource that provides functional analysis of protein sequences by classifying them into families and predicting the presence of domains and important sites. To classify proteins in this way, InterPro uses predictive models, known as signatures, provided by several different databases (referred to as member databases) that make up the InterPro consortium. It combines signatures from multiple, diverse databases into a single searchable resource, reducing redundancy and helping users interpret their sequence analysis results. http://www.ebi.ac.uk/interpro/
Segmental duplication Segmental duplications are long stretches of duplicated genomic sequence (> 90% identity and > 1 kb in length). This resource identifies them with a method that detects identity between long stretches of genomic sequence despite the presence of high-copy repeats and large insertions/deletions. http://humanparalogy.gs.washington.edu/
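The two thresholds that define a segmental duplication (>90% identity over >1 kb) can be checked on a pairwise alignment as sketched below. This is an illustration only: it assumes the alignment already exists (finding duplications genome-wide is the hard part) and, as one possible convention, computes identity and length over ungapped columns, so large indels do not dilute the identity.

```python
def ungapped_columns(a: str, b: str) -> list[tuple[str, str]]:
    """Aligned column pairs where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]


def percent_identity(a: str, b: str) -> float:
    cols = ungapped_columns(a, b)
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def is_segmental_duplication(a: str, b: str, min_identity: float = 90.0, min_length: int = 1000) -> bool:
    return len(ungapped_columns(a, b)) > min_length and percent_identity(a, b) > min_identity


copy1 = "A" * 1200                       # hypothetical aligned copies
copy2 = "A" * 1100 + "C" * 100           # 1,100 of 1,200 columns identical
print(round(percent_identity(copy1, copy2), 2), is_segmental_duplication(copy1, copy2))
# 91.67 True
```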

Part two: gene-level implication

Basic information
UniProtKB, HomoloGene, Ensembl, NCBI Gene
UniProtKB The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data. http://www.uniprot.org/uniprot/
HomoloGene HomoloGene is an automated system for constructing putative homology groups from the complete gene sets of a wide range of eukaryotic species. https://www.ncbi.nlm.nih.gov/homologene
Ensembl Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function and collects disease data. Ensembl tools include BLAST, BLAT, BioMart and the Variant Effect Predictor (VEP) for all supported species. http://asia.ensembl.org/
NCBI Gene Gene integrates information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes, and links to genome-, phenotype-, and locus-specific resources worldwide. https://www.ncbi.nlm.nih.gov/gene/
Genic intolerance
RVIS, LoFtool, heptanucleotide context intolerance score
RVIS Residual Variation Intolerance Score (RVIS) is a gene-based score intended to help in the interpretation of human sequence data. The intolerance score in its current form is based upon allele frequency data as represented in whole exome sequence data from the NHLBI-ESP6500 data set. The score is designed to rank genes in terms of whether they have more or less common functional genetic variation relative to the genome-wide expectation given the amount of apparently neutral variation the gene has. A gene with a positive score has more common functional variation, and a gene with a negative score has less and is referred to as "intolerant". By convention, all genes are ranked from most intolerant to least intolerant. http://genic-intolerance.org/
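The "genome-wide expectation" behind RVIS comes from regressing, across genes, the number of common functional variants on the total number of variants; a gene's score is its residual from that fit. The published score uses studentized residuals computed from ESP6500 data; the sketch below uses the simpler residual / residual-standard-deviation scaling on made-up counts for five hypothetical genes.

```python
import math


def rvis_like(total: list[float], common_functional: list[float]) -> list[float]:
    """Fit common_functional ~ a + b * total by ordinary least squares across genes
    and return each gene's residual divided by the residual standard deviation.
    Negative values = fewer common functional variants than expected (intolerant)."""
    n = len(total)
    if n < 3:
        raise ValueError("need at least three genes")
    mx, my = sum(total) / n, sum(common_functional) / n
    sxx = sum((x - mx) ** 2 for x in total)
    sxy = sum((x - mx) * (y - my) for x, y in zip(total, common_functional))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(total, common_functional)]
    sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))  # two fitted parameters
    return [r / sd for r in residuals]


# Hypothetical counts: total variants and common functional variants per gene.
scores = rvis_like([10, 20, 30, 40, 50], [2, 4, 1, 8, 10])
ranking = sorted(range(len(scores)), key=scores.__getitem__)  # most intolerant first
print(ranking[0], round(scores[2], 3))  # 2 -1.549
```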
LoFtool LoFtool is a novel gene intolerance ranking system that may help rank genes of interest based on their loss-of-function (LoF) intolerance and tissue expression. It accurately predicts genome-wide de novo haploinsufficient mutations and could help in the search for genetic causes of rare Mendelian diseases. Moreover, its brain expression enrichment, coupled with a ROC AUC of 0.86 in detecting neurodevelopmental disorder genes, makes LoFtool an attractive method for investigating complex brain diseases with strong genetic effects. https://www.ncbi.nlm.nih.gov/pubmed/27563026
heptanucleotide context intolerance score Varun Aggarwala and Benjamin F Voight applied 7-mer coding substitution probabilities to develop an intolerance score quantifying the difference between the expected and observed numbers of functional variants at a gene, with higher scores consistent with functional constraint. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4811712/
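The expected-versus-observed comparison can be sketched as follows. Under the simplifying assumption that each possible functional substitution occurs independently with its 7-mer context probability, a gene's expected count is the sum of those probabilities; the score here is a Poisson-style z-score, (expected - observed) / sqrt(expected), which is an illustrative choice rather than the published formula. The probability table and contexts are invented.

```python
import math

# Hypothetical probabilities keyed by (7-mer with the mutated base in the centre, alt base).
SUBSTITUTION_PROB = {
    ("AACGTTA", "T"): 0.9,
    ("CTCGAGG", "A"): 0.7,
    ("GGATCCA", "G"): 0.4,
}


def expected_functional(sites: list[tuple[str, str]], prob: dict[tuple[str, str], float]) -> float:
    """Expected number of functional variants: sum of per-site substitution probabilities."""
    for kmer, _alt in sites:
        if len(kmer) != 7:
            raise ValueError(f"expected a 7-mer, got {kmer!r}")
    return sum(prob[site] for site in sites)


def intolerance_z(expected: float, observed: int) -> float:
    """Positive when a gene carries fewer functional variants than its contexts predict."""
    return (expected - observed) / math.sqrt(expected)


exp_count = expected_functional(list(SUBSTITUTION_PROB), SUBSTITUTION_PROB)
print(round(exp_count, 2), round(intolerance_z(exp_count, observed=0), 3))  # 2.0 1.414
```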
Gene function
UniProtKB, Gene Ontology, InterPro, InBio Map™, NCBI BioSystems
UniProtKB The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data. http://www.uniprot.org/uniprot/
Gene Ontology The Gene Ontology (GO) project is a major bioinformatics initiative to develop a computational representation of our evolving knowledge of how genes encode biological functions at the molecular, cellular and tissue system levels. The project has developed formal ontologies that represent over 40,000 biological concepts, and are constantly being revised to reflect new discoveries. To date, these concepts have been used to "annotate" gene functions based on experiments reported in over 100,000 peer-reviewed scientific papers. http://geneontology.org/
InterPro InterPro is a resource that provides functional analysis of protein sequences by classifying them into families and predicting the presence of domains and important sites. To classify proteins in this way, InterPro uses predictive models, known as signatures, provided by several different databases (referred to as member databases) that make up the InterPro consortium. It combines signatures from multiple, diverse databases into a single searchable resource, reducing redundancy and helping users interpret their sequence analysis results. http://www.ebi.ac.uk/interpro/
InBio Map InBio Map is a high coverage, high quality, convenient and transparent platform for investigating and visualizing protein-protein interactions. https://www.intomics.com/inbio/map/#home
NCBI BioSystems The NCBI BioSystems Database provides integrated access to biological systems and their component genes, proteins, and small molecules, as well as literature describing those biosystems and other related data throughout Entrez. https://www.ncbi.nlm.nih.gov/biosystems
Disease-related
OMIM, MGI, ClinVar, HPO
OMIM OMIM (Online Mendelian Inheritance in Man) is a comprehensive, authoritative compendium of human genes and genetic phenotypes. It contains information on all known Mendelian disorders and over 15,000 genes, and focuses on the relationship between phenotype and genotype. https://omim.org/
MGI MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease. http://www.informatics.jax.org/
ClinVar ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. The database includes germline and somatic variants of any size, type or genomic location. Interpretations are submitted by clinical testing laboratories, research laboratories, locus-specific databases, OMIM, GeneReviews, UniProt, expert panels and practice guidelines. https://www.ncbi.nlm.nih.gov/clinvar
HPO The Human Phenotype Ontology (HPO) provides a standardized vocabulary of phenotypic abnormalities encountered in human disease. Each term in the HPO describes a phenotypic abnormality, such as atrial septal defect. The HPO is currently being developed using the medical literature, Orphanet, DECIPHER, and OMIM. HPO currently contains approximately 11,000 terms (still growing) and over 115,000 annotations to hereditary diseases. The HPO also provides a large set of HPO annotations to approximately 4000 common diseases. http://human-phenotype-ontology.github.io/
Gene expression
UniProtKB, GTEx, The Human Protein Atlas
UniProtKB The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data. http://www.uniprot.org/uniprot/
GTEx The Genotype-Tissue Expression (GTEx) project provides the scientific community with a resource to study human gene expression and regulation and their relationship to genetic variation. The project collects and analyzes multiple human tissues from donors who are also densely genotyped, to assess genetic variation within their genomes. By analyzing global RNA expression within individual tissues and treating the expression levels of genes as quantitative traits, variations in gene expression that are highly correlated with genetic variation can be identified as expression quantitative trait loci (eQTLs). http://www.gtexportal.org/
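Treating expression as a quantitative trait means, in its simplest form, regressing a gene's expression on genotype dosage (0, 1 or 2 copies of the alternate allele) across individuals. The sketch below does that for one variant-gene pair with invented data; production eQTL pipelines such as GTEx's also normalize expression, adjust for covariates and correct for testing many pairs, none of which is shown here.

```python
import math


def eqtl_fit(dosage: list[int], expression: list[float]) -> tuple[float, float]:
    """Least-squares slope of expression on alternate-allele dosage (0, 1, 2),
    i.e. the estimated change in expression per allele, plus Pearson's r."""
    n = len(dosage)
    mx, my = sum(dosage) / n, sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    syy = sum((y - my) ** 2 for y in expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expression))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Six hypothetical individuals: genotype dosage and normalized expression.
slope, r = eqtl_fit([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
print(round(slope, 3), round(r, 3))  # 1.0 0.993
```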
The Human Protein Atlas The Human Protein Atlas contains information for a large majority of all human protein-coding genes regarding the expression and localization of the corresponding proteins based on both RNA and protein data. The atlas consists of three subparts: cell, normal tissue, and cancer, each containing images and data based on antibody-based proteomics and transcriptomics. The tissue atlas contains information on 44 different human tissues and organs, with annotation data for altogether 76 different cell types. http://www.proteinatlas.org/
Target drug
DGIdb
DGIdb The Drug-Gene Interaction database (DGIdb) mines existing resources that generate hypotheses about how mutated genes might be targeted therapeutically or prioritized for drug development. It provides an interface for searching lists of genes against a compendium of drug-gene interactions and potentially 'druggable' genes. It integrates data from 13 primary sources that cover disease-relevant human genes, drugs, drug-gene interactions and potential druggability. Currently, DGIdb contains 14,144 drug-gene interactions involving 2,611 genes and 6,307 drugs, and in addition it includes 6,761 genes belonging to one or more of 39 potentially druggable gene categories. A total of 7,668 unique genes have either known or potential druggability. http://dgidb.genome.wustl.edu/