Joint Genome Institute: Summer 2017-2018 Cohort
Antonio Gonzalez
Major: Biological Sciences Home City: Contact: Faculty Mentor: Dr. Anna Lipzen
Comparison Analysis of HiSeq & NovaSeq
Antonio Gonzalez1,2, Matthew J. Blow, PhD2, and Anna Lipzen, PhD2; School of Natural Sciences, University of California, Merced1; Computational Analysis Group, DOE Joint Genome Institute2
DNA resequencing involves sequencing the genomic DNA of an organism to identify sequence variants relative to a related organism with a sequenced reference genome. The most recent generation of high-throughput sequencing technologies has provided unprecedented opportunities for functional genomic research. At the Department of Energy Joint Genome Institute (JGI), over 100 trillion bases of DNA are sequenced annually on Illumina’s HiSeq 2500 System. Illumina’s newest iteration, the NovaSeq 6000 System, was recently installed at JGI with plans to eventually replace the HiSeq System as the main sequencing platform. This report evaluates the use of the NovaSeq System for resequencing projects by comparing the two platforms on an E. coli sample. We sequenced the same DNA sample on the established technology (Illumina HiSeq) and on the new NovaSeq System, compared both datasets to the reference genome, and identified sequence variants. At the individual read level, NovaSeq data contained more mismatches per base, reflecting a higher error rate. However, the consensus data from the two platforms were identical, confirming that NovaSeq is suitable for high-throughput resequencing projects.
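The contrast between read-level error and consensus-level agreement can be illustrated with a minimal Python sketch. It is not the analysis pipeline used in the study: the reference string, the toy reads, and the function names `mismatch_rate` and `consensus` are all hypothetical, and reads are assumed to be aligned without insertions or deletions.

```python
from collections import Counter

def mismatch_rate(reference, reads):
    """Fraction of aligned read bases that differ from the reference."""
    mismatches = total = 0
    for start, seq in reads:
        for offset, base in enumerate(seq):
            total += 1
            if base != reference[start + offset]:
                mismatches += 1
    return mismatches / total if total else 0.0

def consensus(reference, reads):
    """Majority-vote base at each position; uncovered positions become 'N'."""
    piles = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            piles[start + offset][base] += 1
    return "".join(p.most_common(1)[0][0] if p else "N" for p in piles)

# Toy data (made up): each read is (alignment start, read sequence).
reference = "ACGTACGT"
clean_reads = [(0, "ACGTACGT"), (0, "ACGTACGT"), (0, "ACGTACGT")]
noisy_reads = [(0, "ACGTACGT"), (0, "ACCTACGT"), (0, "ACGTACGA")]

print(mismatch_rate(reference, clean_reads))
print(mismatch_rate(reference, noisy_reads))
print(consensus(reference, clean_reads) == consensus(reference, noisy_reads))
```

In this toy case the noisy reads carry isolated errors at different positions, so the majority vote at each position still recovers the same consensus sequence as the clean reads.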
Mahrukh Mujeeb
Major: Biological Sciences Home City: Contact: Faculty Mentor: Dr. Tatiparthi Reddy
Quality Control and Metadata Management in Genomes Online Database (GOLD)
Mahrukh Mujeeb1,2, Supratim Mukherjee, PhD1, and Tatiparthi Reddy, PhD1; DOE Joint Genome Institute, Lawrence Berkeley National Laboratory1; School of Natural Sciences, University of California, Merced2
The Genomes OnLine Database (GOLD) is a manually curated database that captures metadata for genome and metagenome sequencing projects from around the world. GOLD is currently one of the largest such repositories worldwide. All projects in GOLD are organized according to a four-level classification system: Study, Organism (for isolates) or Biosample (for environmental samples), Sequencing Project, and Analysis Project. Currently, GOLD provides information for 32 598 Studies, 295 254 Organisms, 36 216 Biosamples, 164 380 Sequencing Projects, and 144 757 Analysis Projects. Data in GOLD come from three sources: internal projects of the Joint Genome Institute (JGI), external users, and public database resources such as the National Center for Biotechnology Information (NCBI). GOLD implements standardized metagenome sample naming, which involves curating projects entered by external users and imported projects according to GOLD nomenclature standards. This internship project focused on metadata management and quality control in GOLD. The specific goals were identifying genome publications and associating them with GOLD sequencing projects, capturing metadata from publications, and curating and managing geolocation information. The work relied on the GOLD database together with Google Maps, PubMed, and two journals, Standards in Genomic Sciences and Genome Announcements. Geolocations were curated for approximately 12 000 biosamples and 7 000 organisms, and 350 genome publications were associated with sequencing projects. This work is important because it aids comparative analysis and hypothesis testing.
Sai Prabhakar
Major: Biological Sciences Home City: Contact: Faculty Mentor: Dr. Axel Visel
Genome-wide Identification of Photoperiod-Dependent Bacterial Plant Colonization Genes
Sai Prabhakar1,2, Benjamin Cole, PhD1, and Axel Visel, PhD1; DOE Joint Genome Institute, Lawrence Berkeley National Laboratory1; School of Natural Sciences, University of California, Merced2
Plants have different lifestyles depending on the latitude of their habitat. Plants that have evolved to grow in northern or southern latitudes tend to be sensitive to day length, as this tends to be a good predictor of environmental features associated with seasonality. Since metabolites exuded from roots are essential to sustain bacterial colonization of the root, and photoperiod alters the allocation of starches and other metabolites, we hypothesize that photoperiod will significantly alter the ability of bacteria to colonize roots and change the functional significance of colonization-associated genes. To test these hypotheses, we colonized Arabidopsis seedlings with a transposon mutagenesis library of P. simiae WCS417r under short-day and long-day conditions, and harvested bacteria from colonized roots after 7 days. We will compare overall colonization levels under short and long days, as well as the fitness of each insertion mutant strain under these conditions. Should a difference arise between the two conditions, we will be able to identify candidate colonization genes in P. simiae based on either their genetic or their metabolic effect.
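The abstract does not state how mutant fitness is scored. A minimal Python sketch of one common choice for transposon library experiments, the log2 ratio of a mutant's relative abundance after colonization to its relative abundance in the input library, is given below as an assumption. The function name `log2_fitness`, the pseudocount, the mutant labels, and all read counts are hypothetical.

```python
import math

def log2_fitness(input_counts, output_counts, pseudocount=1.0):
    """Log2 ratio of each mutant's relative abundance after vs. before selection.

    A pseudocount is added to every mutant so that mutants with zero reads
    after selection still receive a finite (strongly negative) score.
    """
    n_mutants = len(input_counts)
    in_total = sum(input_counts.values()) + pseudocount * n_mutants
    out_total = sum(output_counts.get(m, 0) for m in input_counts) + pseudocount * n_mutants
    fitness = {}
    for mutant, n_in in input_counts.items():
        rel_in = (n_in + pseudocount) / in_total
        rel_out = (output_counts.get(mutant, 0) + pseudocount) / out_total
        fitness[mutant] = math.log2(rel_out / rel_in)
    return fitness

# Made-up read counts per insertion mutant (labels are placeholders).
library_input = {"mutantA": 100, "mutantB": 100}
short_day_roots = {"mutantA": 100, "mutantB": 100}
long_day_roots = {"mutantA": 175, "mutantB": 25}

short_fit = log2_fitness(library_input, short_day_roots)
long_fit = log2_fitness(library_input, long_day_roots)

# A photoperiod-dependent gene would show a large difference between conditions.
difference = {m: long_fit[m] - short_fit[m] for m in library_input}
print(difference)
```

Under this scoring, a mutant whose fitness is near zero in one photoperiod but strongly negative in the other would point to a gene whose contribution to colonization depends on day length.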
Brenda Yu
Major: Biological Sciences Home City: Contact: Faculty Mentor:
Salinity and Wetland Restoration Alter Soil Microbial Phylogenetic and Functional Diversity
Brenda Yu1,2, Wyatt H. Hartman, PhD1, and Susannah G. Tringe, PhD1; DOE Joint Genome Institute, Lawrence Berkeley National Laboratory1; School of Natural Sciences, University of California, Merced2
Wetlands cover about 9% of Earth’s land surface area and store around 35% of global terrestrial carbon. In the San Francisco Bay and Delta, efforts to restore converted wetlands have been motivated by their potential to store carbon, although harmful emissions of methane (CH4) can result in this habitat serving as a greenhouse gas source instead of a sink. Studying microbial diversity across historic and restored sites will increase the understanding of carbon cycling factors by uncovering associations with biogeochemistry. Finding predictive relationships between diversity and salinity will explain how stress can be a selective force in microbial composition, revealing indicators of resilience and adaptability. We hypothesized that phylogenetic and functional gene diversity would be lower in restored wetlands than in natural wetlands, and lower in sites of high salinity than in sites of low salinity. Sixteen sites spanning a range of salinities (0-62 ppt) and restoration status (historic or restored) were sampled throughout the San Francisco Bay-Sacramento Delta region. Alpha diversity of 168 samples was plotted against the corresponding salinity measurements. Microbial diversity decreased with increasing salinity, and restored wetlands were less diverse than historic wetlands of comparable salinity. The study contributes a more comprehensive understanding of the relationships between microbial diversity and environmental factors.
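The abstract does not say which alpha diversity metric was used. As one illustration, the Shannon index, a widely used alpha diversity measure, can be computed per sample and paired with salinity, as in the Python sketch below. The taxon counts and salinity values are made up for illustration, and the function name `shannon_index` is hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Made-up samples: (salinity in ppt, counts per taxon).
samples = [
    (2.0, [25, 25, 25, 25]),   # even community
    (30.0, [70, 20, 10, 0]),   # one taxon starts to dominate
    (60.0, [97, 3, 0, 0]),     # strongly dominated community
]

diversities = [(salinity, shannon_index(counts)) for salinity, counts in samples]
for salinity, h in diversities:
    print(f"salinity={salinity:5.1f} ppt  shannon={h:.3f}")
```

In these toy samples the index falls as a single taxon comes to dominate, which is the kind of pattern a plot of diversity against salinity would reveal.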