Program 2 – A National Approach to Data Federation and Analysis

The widespread adoption of genomic technologies brings unique challenges around technical and data infrastructure. If the information in a single gene is one chapter of a book, then sequencing a genome means handling a book of 25,000 chapters.

Genomic medicine needs high performance computing, data storage infrastructure and services. It also requires local and national networking infrastructure to enable access to, and data exchange with, large internationally federated databases.

In this research program, we are developing recommendations for national guidelines and piloting infrastructure for a scalable, shared, standardised national data repository of clinical genomics information.

This system will support genomics in both clinical and research environments, helping to realise the full potential of genomic medicine in routine healthcare. However, to draw clinical benefit from the genomic information generated, there needs to be an evidence-based relationship between the subset of genes leading to a particular trait (the genotype) and the physical characteristics, symptoms or other physical manifestations of that genotype (the phenotype). This is why we refer to both ‘genomic’ and ‘phenomic’ information.

In order to achieve this, three key challenges will need to be resolved:

  1. A national genotype-phenotype database
  2. Standardisation of genomic and phenomic data
  3. Scalable computing and storage infrastructure

Aspects of these three challenges have already been met or are being addressed by different groups across Australia. A fundamental tenet of the Australian Genomics Health Alliance (AGHA) is to avoid duplication of effort and investment, and to leverage and unite this existing infrastructure through evaluation, standardisation and support.

We propose to pilot the development of world-class genotype-phenotype databases and interfaces that also link to international data sharing initiatives.

This will be done through:

  • Detailed mapping of current practices in each state and best practice internationally
  • Establishing agreed-upon standards and guidelines across diagnostic laboratories for storing phenomic and genomic data, variant curation and classification, and reporting of patient results
  • Adoption of national standards for data quality and pathology reports
  • A federated secure data sharing system that leverages existing institutional, state and federal infrastructure investments
  • Standards and protocols for governance and sharing of genomic data for clinical and research use, nationally and internationally
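
As an illustration only, the idea of a federated system can be sketched as a query that is sent to every data holder, with each holder returning a yes/no answer instead of its records. The `Node` class, the laboratory names and the variant strings below are all hypothetical, and the sketch does not use the interface of any real data sharing service.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical data holder (e.g. a laboratory) that keeps its records local."""
    name: str
    variants: set = field(default_factory=set)

    def has_variant(self, variant: str) -> bool:
        # Only a yes/no answer leaves the node, never the underlying records.
        return variant in self.variants


def federated_query(nodes, variant):
    """Ask every node the same question and collect the yes/no answers."""
    return {node.name: node.has_variant(variant) for node in nodes}


# Made-up nodes and variants, for illustration only.
nodes = [
    Node("lab_a", {"chr1:12345:A>G", "chr2:555:C>T"}),
    Node("lab_b", {"chr2:555:C>T"}),
    Node("lab_c"),
]

answers = federated_query(nodes, "chr2:555:C>T")
print(answers)
```

In this sketch the data never moves: each node answers from its own store, which is one way a shared system could build on existing institutional infrastructure. A real system would also need authentication, access control and audit logging.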

Identification of a variant in a gene (genotype) cannot influence medical management if we are uncertain of its impact on phenotype or disease. Variants of Unknown Significance (VUS) are frequently reported in pathology test results – we know there is a change to the genome, but we are not sure what this change means.

By collecting more evidence from different patients and linking a genotype with an associated phenotype, a causative relationship can be established. For example, we may learn that a specific change in the genome (a variant) causes, or increases the risk of, a specific disease.

The aggregation and sharing of genomic and phenomic data in a secure and access-controlled environment is the best way to establish this genotype-phenotype (gen-phen) relationship, and is essential to the efficient and optimal delivery of genomic medicine.
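
To make the idea of pooling evidence concrete, the following is a minimal sketch that counts, for each variant and phenotype pair, how many carriers were observed and how many of them were affected. The variant names, the phenotype term and the observations are invented for illustration, and the `summarise` function is hypothetical. Counts like these are only one input: on their own they do not establish that a variant causes a disease.

```python
from collections import defaultdict

# Made-up, de-identified observations: (variant, phenotype term, affected?)
observations = [
    ("GENE1:c.100A>G", "cardiomyopathy", True),
    ("GENE1:c.100A>G", "cardiomyopathy", True),
    ("GENE1:c.100A>G", "cardiomyopathy", False),
    ("GENE2:c.7C>T", "cardiomyopathy", False),
]


def summarise(observations):
    """Count carriers and affected carriers for each (variant, phenotype) pair."""
    counts = defaultdict(lambda: {"carriers": 0, "affected": 0})
    for variant, phenotype, affected in observations:
        entry = counts[(variant, phenotype)]
        entry["carriers"] += 1
        if affected:
            entry["affected"] += 1
    return dict(counts)


summary = summarise(observations)
for (variant, phenotype), c in summary.items():
    print(f"{variant} / {phenotype}: {c['affected']} of {c['carriers']} carriers affected")
```

The more patients contribute to each count, the more confidence can be placed in the association, which is why aggregation across laboratories matters.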

The accuracy of genetic testing (and its continued development) is also dependent on access to high-quality, annotated genotype and phenotype data.

Five projects of work have been identified as part of Program 2 that will underpin the delivery of genomic medicine:

  1. Clinical variant classification – developing systems and standards that allow a specific genetic alteration to be given the same definition and the same medical interpretation nationally.
  2. Genotype-phenotype national data resource – establishing a national database of clinical-grade genomic and phenomic data that will be used for storing, retrieving and analysing phenotype-variant associations.
  3. Accurate phenotype information – federating a common system across Australia for recording medical information like symptoms, demographics and phenotypes of patients to enable these to be shared anonymously and cross-referenced with the genotype data generated.
  4. A common framework for the evaluation of individual production analysis pipelines – genomic sequencing produces masses of data, often hundreds of genes in panels, or thousands of genes in Whole Exome Sequencing or Whole Genome Sequencing. Manual analysis or curation of this data would be impossibly difficult and time consuming, so bioinformatics pipelines are designed to process the data for quality and for reliability of the changes identified, and often to provide information on the variants and the possible implications of these genetic changes. AGHA intends to establish common standards for these pipelines to ensure the data contributing to medical reports is consistent, reliable and accurate, Australia-wide.
  5. Data sharing and archiving – piloting solutions for a federated data repository to enable the storage, sharing, retrieving and archiving of the genetic/genomic data produced in diagnostic laboratories for clinical and research access.
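
One way the pipeline evaluation described in item 4 could work, shown here only as a sketch, is to run each pipeline on a sample whose variants are already known and compare the calls with that reference set. The `evaluate_calls` function, the pipeline names and the variant strings are hypothetical, and this is not presented as the framework AGHA will adopt.

```python
def evaluate_calls(called, reference):
    """Compare a pipeline's variant calls with a reference set of known variants."""
    called, reference = set(called), set(reference)
    true_pos = len(called & reference)    # called and present in the reference
    false_pos = len(called - reference)   # called but not in the reference
    false_neg = len(reference - called)   # in the reference but missed
    precision = true_pos / len(called) if called else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "precision": precision,
        "recall": recall,
    }


# Made-up reference variants and pipeline outputs, for illustration only.
reference = {"chr1:100:A>G", "chr1:200:C>T", "chr2:300:G>A", "chr3:400:T>C"}
pipeline_x = {"chr1:100:A>G", "chr1:200:C>T", "chr2:300:G>A", "chr5:900:A>T"}
pipeline_y = {"chr1:100:A>G", "chr1:200:C>T"}

result_x = evaluate_calls(pipeline_x, reference)
result_y = evaluate_calls(pipeline_y, reference)
print("pipeline_x", result_x)
print("pipeline_y", result_y)
```

Reporting the same measures for every pipeline, on the same reference sample, makes results from different laboratories directly comparable.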

Data tools

See our Data Tools page to access the AGHA Matchmaker Exchange / Patient Archive and Australian Beacon.