Searching for Gene-Environment Interactions in Cancer: Biorepository Support for the New York Cancer Project

Abstract

The last decade has witnessed remarkable success in defining the genetic basis of a large number of single-gene disorders, as well as the identification of certain high-penetrance genes that underlie a proportion of more common illnesses, such as BRCA-1 in breast cancer. However, the promise of genetic analysis is now widely acknowledged to extend into the understanding of almost all common human diseases, including cardiovascular disease, autoimmunity, neurodegenerative disorders and most forms of cancer. In this context, it is likely that large numbers of different genes contribute to disease risk, in different combinations, and probably only in the setting of certain environmental risk factors such as diet, smoking, chemical exposures or lack of physical activity, to name only a few. The recent identification of genes controlling longevity in yeast emphasizes the potential power of genetic analysis to shed light on both normal and abnormal biology. Thus, obtaining genetic information on populations of normal subjects is also of great interest to the scientific community.

In order to study these questions in human populations, extremely large sample sizes are required to detect the generally modest and interacting effects of individual genes. In addition, each individual needs to provide detailed information about environmental exposures as well as the presence of disease or other phenotypes of interest. Ideally, one should obtain such information before the development of disease in order to more reliably document exposures. In addition, by enrolling subjects before the development of disease, one can study a large number of different disease outcomes using one study population.

This approach, known as a cohort study, is the study design of the New York Cancer Project.

The New York Cancer Project was launched in November of 1998 with a $12 million pilot grant from the City of New York. The goal of this project is to enroll up to 300,000 New Yorkers in a long-term cohort study of cancer risk, with an emphasis on gene-gene and gene-environment interactions. This ambitious project was initiated by the Academic Medicine Development Corporation (AMDeC), in collaboration with New York area researchers and institutions. AMDeC is a unique non-profit organization whose mission is to support biomedical research in New York State through collaboration with 36 New York academic health centers, research institutions and medical schools. AMDeC's President, Dr. Maria Mitchell, has been instrumental in fostering a number of initiatives to increase the ability of these institutions to pursue large collaborative projects, many of which are focused on human genetics and functional genomics. The New York Cancer Project itself is headed by Dr. Tom Rohan, Chairman of the Department of Epidemiology at the Albert Einstein College of Medicine.

In addition to the recruitment and characterization of study subjects, a major challenge for large population studies is the proper preparation and storage of biological materials, particularly DNA and RNA, in a biorepository. Broadly, a biorepository is a biological specimen bank that stores one or more types of biological specimens under conditions that permit rapid retrieval and optimum stability. Robotic systems are highly desirable in this context, in order to eliminate the errors associated with human handling of vast numbers of visually identical samples, and to provide continuous monitoring of sample quality and storage conditions.

The need for such biorepository facilities is clear when one considers the increasing number of large-scale projects under way that require storage of DNA and other biological materials (see boxed inset). The New York Cancer Project has approached this problem by developing a novel robotic system in collaboration with the Medical Automation Research Center (MARC) at the University of Virginia. The overall design of this system is the product of over a year of planning by the authors. Robin A. Felder, Ph.D., is the Director of MARC, and Peter K. Gregersen, M.D., serves as the Director of the NYCP Biorepository. Other participants in the system development include Robert Lundsten, Assistant Director of the NYCP Biorepository, Sean Graves, Ph.D., Director of MARC's robotics division, and Theodore E. Mifflin, Ph.D., Director of MARC's molecular automation division.

The NYCP Biorepository facility is located at the North Shore University Hospital in Manhasset, New York. This facility currently receives 50–100 blood specimens per day from NYCP recruitment sites. After a streamlined process of manual sample preparation, DNA, plasma and RNA derived from these samples are handled robotically for specimen aliquoting and storage. Extracted DNA, plasma and sources of RNA will be automatically quantitated by the robotic system so that aliquots of similar concentration and volume can be stored in individually retrievable containers for later studies. For short to intermediate term storage, DNA samples will be placed in an automated refrigerator (Mol Bank, Gira, France) for convenient access by investigators. In addition, samples will be archived in long-term storage systems (Revco Technologies Inc., Asheville, NC) to assure availability of specimens for many years into the future. It is anticipated that utilization will grow substantially in the later years, when longitudinal follow-up of the study population reveals the emergence of diseases and phenotypes of interest.
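The step of bringing quantitated samples to a similar concentration and volume can be illustrated with the standard dilution relation C1·V1 = C2·V2. The sketch below is only an illustration under stated assumptions: the function name `plan_aliquot`, the `AliquotPlan` type, and the example target of 50 ng/µL in 100 µL are hypothetical and are not taken from the NYCP system, whose actual targets and software are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AliquotPlan:
    """Volumes to pipette for one normalized aliquot (hypothetical type)."""
    stock_ul: float    # volume of quantitated stock DNA, in microliters
    diluent_ul: float  # volume of buffer added to reach the target volume


def plan_aliquot(stock_ng_per_ul: float,
                 target_ng_per_ul: float,
                 target_volume_ul: float) -> AliquotPlan:
    """Apply C1*V1 = C2*V2 to find how much stock and diluent to combine.

    All parameter values are illustrative assumptions, not NYCP settings.
    """
    if min(stock_ng_per_ul, target_ng_per_ul, target_volume_ul) <= 0:
        raise ValueError("concentrations and volume must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        # Dilution cannot raise the concentration of a sample.
        raise ValueError("stock is more dilute than the target concentration")
    stock_ul = target_ng_per_ul * target_volume_ul / stock_ng_per_ul
    return AliquotPlan(stock_ul=stock_ul,
                       diluent_ul=target_volume_ul - stock_ul)


if __name__ == "__main__":
    # Hypothetical sample measured at 200 ng/uL, normalized to 50 ng/uL in 100 uL.
    print(plan_aliquot(200.0, 50.0, 100.0))
```

A sample that measures below the target concentration is rejected rather than silently stored at a different concentration; a real system might instead flag it for re-extraction or store it with its measured value.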

This section compiled by Catherine Piche, University of Virginia

DNA Sciences (Mountain View, CA) <http://www.DNA.com>

DNA Sciences' Gene Trust Project is a repository of patient specimens and health information. All participants in the repository are volunteers who register on the DNA Sciences website, answer questions regarding their medical histories, and arrange a time and place for DNA Sciences to obtain a blood sample.

Genset (Paris, France) <http://www.genxy.com>

Genset has a proprietary library, NetGene™, that contains more than 56,000 human genes, as well as 5 Prime sequences and genomic sequences. An isolated subset of NetGene™ is SignalTag™, a gene library of over 3,000 gene sequences which is specialized for secreted proteins. Genset has ongoing projects and collaborative research agreements on isolating genes linked to schizophrenia, Alzheimer's disease, Type II diabetes, and bipolar disorder.

DeCODE Genetics (Reykjavik, Iceland) <http://www.decode.com>

The deCODE Combined Data Processing system (DCDP) integrates genetic data with phenotypic and genealogical information from two additional sources. DeCODE Genetics has a twelve-year license to the Icelandic Health Sector Database (IHSD), which contains anonymized patient records data for the population of Iceland. The DCDP cross-references this database with a genealogical database linking Iceland's 275,000 inhabitants in family trees and showing living Icelanders suffering from certain diseases.

Gemini Genomics (Cambridge, UK) <http://www.gemini-genomics.com>

Gemini has clinical data on a variety of different population groups, including identical and non-identical twins, sibling pairs in disease-affected families, and “founder” populations. The data on pairs of twins are part of an eight-year-old study linking physiological information gained through comprehensive clinical studies with genetic information. The “founder” study focuses on the population of Newfoundland and Labrador, Canada, which consists of 550,000 people largely descended from a small population of British, Scottish, and Irish settlers. The prevalence of certain diseases in this population should lead to easier identification of the genes linked to those diseases.

Estonian Genome Center Foundation (Tartu, Estonia) <http://genomics.ee>

Estonia has begun the five-year Estonian Genome Project to collect DNA and personal data for almost all of its population of 1.4 million. The Estonian population has been geographically and culturally localized, if not isolated, for the past 5,000–6,000 years. As of May 2000, over 102,000 marker loci, or single nucleotide polymorphisms, had been placed in the database, with a goal of 300,000 SNPs being entered by 2001.

Genomics Collaborative (Cambridge, MA) <http://getDNA.com>

Genomics Collaborative has a network of 250 doctors who have indicated their willingness to gather both medical records, including family histories and reactions to pharmaceutical treatments, and specimens from their patients under informed consent. Furthermore, the company has signed an agreement to obtain DNA extracts from over 3 million tissue biopsies a year, which Ameripath (a surgical pathologist network) gathers from its pathologists. Genomics Collaborative expects to have more than 500,000 samples by the end of 2003, and it is capable of storing up to 1 million samples.

United States Armed Forces DNA Identification Laboratory <http://www.afip.org/homes/oafme/dna/afdil.html>

Since 1993 the US Armed Forces Institute of Pathology has collected DNA samples, which are used to augment standard procedures for remains identification. These samples are collected from both active duty personnel and selected high-risk reserve personnel. As of 1995 the repository held 1.15 million samples.

One of the key features of the NYCP Biorepository is the ability to provide biorepository personnel with rapid random access to hundreds of thousands of DNA samples stored in the Molbank. These samples can be aliquoted and supplied in 96-well formats for PCR analysis by the system itself, or by collaborating investigators. Since large numbers of SNPs and other polymorphisms are being defined in the human genome, it is likely that tens or even hundreds of thousands of analyses may be done on some samples. The relatively large amount of DNA collected by the NYCP on each individual (> 1 mg), and the low requirements for PCR-based analyses (1 ng or less), make this feasible. Additional information about the New York Cancer Project and AMDeC can be found at www.amdec.org.
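The feasibility argument above is simple arithmetic, and a short calculation makes the margin explicit. The sketch below uses only the two figures quoted in the text (more than 1 mg of DNA per individual, 1 ng or less per PCR-based analysis); the function name and the `usable_fraction` parameter are hypothetical additions that allow for DNA lost to handling or held in reserve.

```python
NG_PER_MG = 1_000_000  # 1 mg = 1,000,000 ng


def max_pcr_analyses(dna_collected_mg: float,
                     dna_per_assay_ng: float,
                     usable_fraction: float = 1.0) -> int:
    """Upper bound on the number of analyses one individual's DNA supports.

    usable_fraction is an illustrative assumption (not from the article)
    for the share of collected DNA actually available for assays.
    """
    if dna_collected_mg <= 0 or dna_per_assay_ng <= 0:
        raise ValueError("DNA amounts must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    usable_ng = dna_collected_mg * NG_PER_MG * usable_fraction
    return int(usable_ng // dna_per_assay_ng)


if __name__ == "__main__":
    # Figures from the text: 1 mg collected, 1 ng consumed per analysis.
    print(max_pcr_analyses(1.0, 1.0))
    # Same figures, assuming only half of the DNA is ever used for assays.
    print(max_pcr_analyses(1.0, 1.0, usable_fraction=0.5))
```

With the quoted figures the bound is on the order of a million analyses per individual, so even with only half of the DNA usable it stays above the hundreds of thousands of analyses anticipated for some samples.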

The cover image is a computer-generated illustration that includes equipment from a variety of laboratory automation companies. The automation systems represented are the Genesis, Spectrafluor and Molbank from Tecan AG; the Carousel, Regripper, and Robot Arm (with controller) from CRS; the OmniGrid from GeneMachines; a balance from Sartorius; and a Microplate Sealer from ABgene.

The cover image was generated by Elle Kovarikova, Medical Automation Research Center (MARC), University of Virginia.