Leveraging “agrigenomics” for crop improvement

Melaku Gedil (m.gedil@cgiar.org) and Ismail Rabbi
M. Gedil, Head, Bioscience Center; I. Rabbi, Postdoctoral Fellow (Molecular Genetics), IITA, Ibadan, Nigeria

Harnessing state-of-the-art genomics technologies
Biosciences are having a steadily growing impact on agriculture, and the potential of “omics” technologies to alleviate the many constraints on agricultural production is rapidly becoming a reality. This shift is driven by next-generation DNA sequencing and genotyping technologies, high-throughput (HTP) metabolomics and transcriptomics, informatics, and decision-making tools. Together with rapidly evolving bio-computational tools, these technologies are accelerating the discovery of genes and closely linked molecular markers underlying important traits. The result is a rapid accumulation of the genomic resources needed to devise efficient and effective breeding strategies for the faster development of varieties of choice.

Researchers in IITA's Bioscience Center. Photo by L. Kumar.
State-of-the-art technologies, including next-generation sequencing (NGS) for genome and transcriptome analysis and genotyping-by-sequencing (GBS), are being adopted in R4D programs at IITA. Examples include NGS through outsourcing and multi-partner collaboration; RNAseq for HTP expression studies in cassava; Illumina’s GoldenGate assay for HTP single nucleotide polymorphism (SNP) genotyping in cassava, soybean, and maize; and GBS in maize and cassava. Data generated by these techniques are being applied in marker-assisted recurrent selection (MARS) for drought-tolerant maize and in genomic selection (GS) for high-yielding, disease-resistant cassava.

Development of an integrated molecular breeding platform
The new technologies, however, are very data-intensive and demand advanced computational and communication technologies and infrastructure for data acquisition, analysis, and management. For the effective integration of genomics technologies in our breeding schemes, we are building capacity (connectivity to the internet, the necessary hardware/software, and skilled personpower) to acquire, store, and analyze terabytes of data.

The Generation Challenge Program (GCP) of the CGIAR is developing an integrated breeding platform (IBP) to build a comprehensive and integrated crop information system that links molecular, phenotypic, and pedigree data. The maize version of the International Crop Information System (ICIS), dubbed the International Maize Information System (IMIS), has been expanded to include all pedigrees of IITA maize under the Drought Tolerant Maize for Africa (DTMA) project. Its capacity for storing molecular data is limited, however, and we are now generating data sets of hundreds of thousands of markers per line that require different storage solutions. The GCP is consulting with other initiatives such as iPlant and DArT and is working collaboratively to create solutions for several use cases, including the DTMA, Tropical Legumes (TL)-I, and TL-II projects. In the IBP initiative, IITA is the lead crop center hosting the main web-accessible databases for cassava, cowpea, yam, and soybean. The form and functionality of the databases are still a work in progress, although current versions of ICIS are already being applied to cassava, yam, and cowpea.

In view of the IBP initiative, we are developing a bioinformatics capacity to (a) manage the newly generated genomic resources of IITA’s research crops, particularly those clonally propagated, (b) use the genomic resources in the public sector for soybean and maize, (c) use comparative genomics techniques for other African orphan crops of high importance, such as cassava, yam, and cowpea, and (d) create a bioinformatics center of excellence to train and provide access for African research scientists.

HTP phenotyping and informatics support tools
The increasing affordability of the NGS technologies has shifted critical consideration from genotyping to phenotyping. According to leading experts, it is now cheaper to genotype than to phenotype a plant. Quality phenotypic data are essential for the interpretation and use of the deluge of genomic data to identify the changes in DNA sequences that influence important traits. The fact that priority agronomic traits are complex and polygenic and interact with the environment necessitates conducting extensive and precise multi-environment evaluations of candidate breeding materials (over several years and in several locations). Therefore, there is a need to invest in precision phenotyping of traits and data capture (from electronic sample tracking to non-invasive HTP) through the use of hand-held devices such as barcode readers and near-infrared spectroscopy. Efforts are being made to develop rapid and accurate phenotyping protocols to integrate with genomic tools in establishing breeding schemes at IITA.

A wide array of techniques and tools is being deployed to associate molecular markers with desirable phenotypic traits. Associated markers can be used to accelerate germplasm enhancement in three ways: marker-assisted backcrossing for the introgression of disease resistance and other simple traits, which reduces the need to evaluate breeding materials in the field; MARS for rapid-cycle population improvement in bi-parental crosses based on genomic estimated breeding values; and GS, in which a model developed with a training population is used to select untested samples.
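The GS idea, in which a model trained on a genotyped and phenotyped population is used to score untested samples, can be illustrated with a minimal Python sketch. It assumes ridge regression on marker scores coded 0, 1, or 2 as the prediction model, which is one common choice and not necessarily the method used at IITA. The function names, the penalty value, and the simulated data are all hypothetical.

```python
import numpy as np


def fit_ridge_marker_effects(genotypes, phenotypes, penalty=1.0):
    """Estimate marker effects by ridge regression on centred marker scores."""
    X = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotypes, dtype=float)
    marker_means = X.mean(axis=0)
    trait_mean = y.mean()
    Xc = X - marker_means
    # Solve (Xc'Xc + penalty * I) b = Xc'(y - mean); the penalty keeps the
    # system solvable when there are more markers than training lines.
    lhs = Xc.T @ Xc + penalty * np.eye(X.shape[1])
    effects = np.linalg.solve(lhs, Xc.T @ (y - trait_mean))
    return marker_means, trait_mean, effects


def predict_breeding_values(genotypes, marker_means, trait_mean, effects):
    """Predict breeding values for lines that have genotypes but no phenotypes."""
    X = np.asarray(genotypes, dtype=float)
    return trait_mean + (X - marker_means) @ effects


# Simulated (hypothetical) data: 60 training lines, 5 untested candidates.
rng = np.random.default_rng(0)
n_train, n_candidates, n_markers = 60, 5, 200
true_effects = rng.normal(0.0, 0.1, n_markers)
train_geno = rng.integers(0, 3, size=(n_train, n_markers))
train_pheno = train_geno @ true_effects + rng.normal(0.0, 0.5, n_train)
candidate_geno = rng.integers(0, 3, size=(n_candidates, n_markers))

model = fit_ridge_marker_effects(train_geno, train_pheno, penalty=50.0)
gebv = predict_breeding_values(candidate_geno, *model)
ranking = np.argsort(gebv)[::-1]  # candidate indices, best predicted first
print("Candidates ranked by predicted value:", ranking.tolist())
```

In practice the penalty would be tuned, for example by cross-validation within the training population, before any candidates are selected.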

Our efforts to harness the unparalleled scientific progress in the fields of genomics and bioinformatics are expected to find solutions to the recalcitrant problems confronting small-holder farmers in sub-Saharan Africa.

Genomic tools for improving African crops

Melaku Gedil, m.gedil@cgiar.org

The increase in genomic techniques in the past few decades has thrown the doors of research wide open to agricultural scientists. Conventional breeding has been augmented by various innovative molecular marker-aided techniques. The first wave of molecular marker technology introduced biochemical markers (isozymes and allozymes).

Digital imaging and microscopy: a tool for research and training. Photo by O. Adebayo, IITA

These quickly gave way to the first-generation DNA-level markers such as Restriction Fragment Length Polymorphism (RFLP), Random Amplified Polymorphic DNA (RAPD), Amplified Fragment Length Polymorphism (AFLP), and simple sequence repeat (SSR), all mouthfuls to the layperson. Those that lend themselves well to automation and multiplexing (the simultaneous use of more than one set of primers in the reaction mix) prevailed because of their cost-effectiveness.

Advances in sequencing technology enhanced the use of DNA sequence-based markers such as SSR and single nucleotide polymorphism (SNP), allowing the development of automated, high-throughput (high-output) genotyping platforms. Within a decade, the cost of genotyping has declined dramatically, and various techniques have been developed that allow flexibility under different circumstances. This has underscored the feasibility of molecular breeding.

New tools
Some of the new molecular biology tools used at IITA include molecular markers for marker-assisted breeding, resistance gene analogs (RGA), Targeting Induced Local Lesions In Genomes (Tilling), DNA chips, application of DArT markers, and bioinformatics.

At IITA, the development of new genomic tools for molecular breeding and gene discovery is under way for the mandate crops. For instance, new markers have been identified in silico (computationally) from cassava Expressed Sequence Tags, and hundreds of markers have been validated using a diverse panel of cultivated cassava varieties. After filtering with various criteria, over a hundred new markers were developed that are useful for fingerprinting and other molecular genetic applications.
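As an illustration of what in silico marker identification can look like, the Python sketch below scans a sequence for simple sequence repeats (SSRs). The text does not say which marker type was mined from the cassava ESTs, so the choice of SSRs is an assumption, as are the function name, the repeat threshold, and the example sequence.

```python
import re


def find_ssrs(sequence, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect repeats of 2-6 bp motifs.

    Positions are 0-based. The lazy quantifier makes the regex try the
    shortest motif first, so "AGAGAGAGAG" is reported as AG x 5.
    """
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip single-base runs such as AAAAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Hypothetical EST fragment containing an (AG)6 and a (GAT)5 repeat.
est = "TTGC" + "AG" * 6 + "CCA" + "GAT" * 5 + "ACG"
for start, motif, count in find_ssrs(est):
    print(f"{motif} x {count} at position {start}")
```

Candidate repeats found this way would still need primer design and validation on real germplasm, as described for the cassava panel above.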

The rapid accumulation of genome sequence data led to the development of an array of functional genomics tools that are being used to understand the complex pathways involved in host plant–pathogen interaction. The RGA technique has applications in cloning, profiling, and host–pathogen interaction.

Photomicroscopy of transformed material, Biotech Lab, IITA. Photo by IITA

The RGA technique was used at IITA to assess DNA sequence variation in several elite cassava clones, yielding several novel sequences, some of which were similar to previously reported RGAs. This information is expected to facilitate the identification of gene-targeted markers for molecular breeding and gene discovery in cassava.

Another new tool is Tilling, a popular technique of reverse genetics for detecting mutations in a target gene, followed by the assignment of phenotypes to the gene sequence. It rapidly gained popularity because it is suitable for automation and for screening thousands of samples. Besides being a non-GMO approach for broadening the genetic base, it provides tools for developing markers for marker-assisted breeding for traits that are cumbersome and expensive to measure.

Tilling work to discover induced and natural mutations in cassava was geared towards specific traits that are intractable (not easily managed or manipulated) using conventional methods. Adapting the technique to other IITA mandate crops such as yam, banana, and cowpea entails selecting the target tissue or organ for mutation and selecting similar or different target genes. Crops such as maize and soybean have numerous germplasm resources that can be easily adopted and adapted.

Knowledge of the nucleotide sequences of the target genes is a prerequisite for Tilling. The major IITA mandate crops (cassava, yam, and banana) have very limited genomic resources. To date, nucleotide sequence information for only a few genes, largely chloroplast genes, can be found in Entrez Gene. Investigations in the past decade resulted in the cloning and characterization of expressed cassava genes involved in starch, cyanogenic glucoside, and carotenoid biosynthesis. However, even in the absence of a nucleotide sequence for the gene of interest, comparative genomics has been successfully used to identify candidate genes. The completion of the genome sequence of poplar and, more recently, of castor bean is expected to provide useful genetic tools for identifying candidate genes in cassava. In addition, the ongoing cassava genome sequencing is expected to be completed soon, opening a new avenue of research in functional postgenomic studies such as Tilling.

A genome-wide 14K DNA chip for cassava (left) and a scan showing 14,000 different genes (right). Photo by IITA

DNA chips have become popular tools for gene discovery and for diagnostics. They also provide a reverse genetics tool for identifying gene-targeted markers for molecular breeding. A genome-wide DNA microarray for cassava with ~14,000 probes has been developed at IITA. This is the most comprehensive DNA chip for cassava available to date. The microarray has been used for transcriptome analysis of cassava, and candidate genes that are differentially expressed after virus infection have been identified.
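One simple way to flag differentially expressed genes from microarray intensities is to compare the log2 ratio of infected to control signal against a cutoff. The Python sketch below shows that calculation. It is only an illustration: the gene names, intensities, pseudocount, and two-fold cutoff are hypothetical, and a real analysis would also use replicates, normalization, and a statistical test.

```python
import math


def log2_fold_changes(control, infected, pseudocount=1.0):
    """Log2 ratio of infected to control intensity for genes present in both."""
    return {
        gene: math.log2((infected[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
        if gene in infected
    }


def differentially_expressed(control, infected, threshold=1.0):
    """Keep genes whose absolute log2 fold change meets the threshold."""
    changes = log2_fold_changes(control, infected)
    return {gene: lfc for gene, lfc in changes.items() if abs(lfc) >= threshold}


# Hypothetical probe intensities before and after virus infection.
control = {"gene_1": 99.0, "gene_2": 199.0, "gene_3": 399.0}
infected = {"gene_1": 399.0, "gene_2": 209.0, "gene_3": 99.0}

candidates = differentially_expressed(control, infected)
for gene, lfc in sorted(candidates.items()):
    direction = "up" if lfc > 0 else "down"
    print(f"{gene}: log2 fold change {lfc:+.2f} ({direction})")
```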

A cassava DArT chip with 735 polymorphic markers was used to fingerprint a diverse cassava population comprising genotypes from Africa, Latin America, Asia, and breeder lines maintained at IITA. Overall reproducibility of the marker set was very high and average call rate was 97%. DArT markers provide reliable and high throughput molecular information for managing biodiversity in germplasm collections and make rapid genome profiles possible for quantitative trait loci (QTL) mapping.
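The call rate quoted above is the proportion of samples for which a marker yields a score. A minimal Python sketch of the calculation follows, assuming markers are scored 1 (present), 0 (absent), or `None` (no call). The marker names and scores are hypothetical.

```python
def call_rate(scores):
    """Fraction of samples with a non-missing score for one marker."""
    scores = list(scores)
    if not scores:
        raise ValueError("no scores supplied")
    called = sum(1 for score in scores if score is not None)
    return called / len(scores)


def average_call_rate(score_matrix):
    """Mean of the per-marker call rates across all markers."""
    rates = [call_rate(scores) for scores in score_matrix.values()]
    return sum(rates) / len(rates)


# Hypothetical scores for three markers across four genotypes.
score_matrix = {
    "marker_1": [1, 0, 1, None],
    "marker_2": [1, 1, 0, 0],
    "marker_3": [None, None, 1, 0],
}

for marker, scores in score_matrix.items():
    print(f"{marker}: call rate {call_rate(scores):.0%}")
print(f"Average call rate: {average_call_rate(score_matrix):.0%}")
```

Markers with a low call rate are usually filtered out before fingerprinting or mapping, so the per-marker values are as useful as the average.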

Advances in bioscience technologies such as sequencing, synthesis, imaging, and various other nanoscale assays have dramatically increased the volume of biological data, which, in turn, has spurred the concurrent growth of bioinformatics tools. Bioinformatics is broadly defined as the application of computer technology to the storage, retrieval, and analysis of large amounts of biological information.

The major areas of high-end bioinformatics include the development of databases and algorithms for analyzing and annotating data from various microarray platforms, high-density oligonucleotide chips, a variety of mass spectrometry methods, and diverse new-generation sequencing platforms. However, most life scientists turn to the Internet for end-user web tools and resources (software packages). Numerous institutions in the West provide a myriad of biological data resources and services, including expert-curated databases of nucleic acid and protein sequences; data and text mining tools; genome and transcriptome analysis; protein and other macromolecular structure analysis; networks, pathways, and systems biology; and evolutionary analysis tools.

The major resources in the public domain, however, are peer-reviewed, up-to-date, web-accessible databases and web tools (analysis software packages). These resources typically provide an advanced query interface.

User accessing virtual knowledge repository. Photo by J. Oliver

The explosive growth of web sites means that users must distinguish between inaccurate personal web sites and reliable resources maintained by a consortium of investigators and/or a legitimate institution. The journal Nucleic Acids Research publishes an annual collection of molecular biology databases and a bioinformatics links directory. The most recent update of the molecular biology database collection features over 1000 databases, over 300 of which are on plants, whereas the latest Bioinformatics Links Directory published by the same journal lists over 1200 links.

Another outstanding issue in the use of online bioinformatics tools is that, as the number of such web resources grows astronomically, even learning how to use each interface is becoming cumbersome. This has prompted the need for one-stop, gateway-type tools for integrated querying (e.g., BioMart; OBRC from the University of Pittsburgh; and Bioclipse).

One of the advances in bioinformatics is the availability of programming and scripting languages and toolkits (Perl, BioPerl, Python, and Java) for automating complex but routine steps, such as searching, retrieving, and parsing (resolving into component parts and examining) search results. While a variety of commercial integrated analysis packages are available, the cost of initial installation and maintenance can be prohibitive. Developing our capacity for such routine end-user applications is vital to the support of our molecular biology work.
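As a small example of the routine parsing such scripts automate, the Python sketch below reads sequence-search results and keeps the best hit for each query. It assumes the results are in the 12-column tab-delimited format produced by BLAST's tabular output option, which the text does not specify. The function names and the sample hits are hypothetical.

```python
import csv
import io

# Column order of the standard 12-field BLAST tabular output.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_blast_tabular(handle):
    """Yield one dict per hit, converting the numeric fields used below."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST_COLUMNS, row))
        for field in ("pident", "evalue", "bitscore"):
            hit[field] = float(hit[field])
        yield hit


def best_hit_per_query(hits):
    """Keep the hit with the highest bit score for each query sequence."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Hypothetical search results for two EST queries.
SAMPLE = (
    "est_001\tgene_A\t98.5\t200\t3\t0\t1\t200\t50\t249\t1e-100\t360\n"
    "est_001\tgene_B\t85.0\t180\t27\t0\t5\t184\t10\t189\t2e-40\t150\n"
    "est_002\tgene_C\t91.2\t150\t13\t0\t1\t150\t1\t150\t5e-55\t210\n"
)

best = best_hit_per_query(parse_blast_tabular(io.StringIO(SAMPLE)))
for query, hit in best.items():
    print(f"{query}: {hit['sseqid']} (bit score {hit['bitscore']:.0f})")
```

The same two functions work on a results file by passing an open file handle instead of the in-memory sample.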

African researchers working on well-studied crops such as rice, wheat, maize, and soybean will have the best genomic resources at their fingertips, provided that they have an Internet connection. To take advantage of publicly accessible web resources, including the variety of databases, online software, publications, and multimedia learning materials, African scientists and students need institutional support and considerable internal and external funding. As in other fields of science, bioinformatics lags in sub-Saharan Africa (SSA), partly because of poor or nonexistent Internet connections. A fast, high-bandwidth Internet connection is the key to successful online research.

Research in molecular biology is slowly gaining ground in Africa. Any molecular biology research needs to be augmented by a bioinformatics database and online tools.

There is no shortage of available tools for agricultural research or agricultural information and database management. The challenge is in finding the best ones or combinations that suit institutional needs, resources, or preferences.

IITA will continue to use suitable and affordable conventional and new genomic tools to undertake research on its mandate crops.