BioTechnologia (Pozn), v.103(1); 2022

CRISPR/Cas9 in plant biotechnology: applications and challenges

CRISPR – clustered regularly interspaced short palindromic repeats

Cas9 – CRISPR-associated protein 9

GMO – genetically modified organism

PBT – plant biotechnology

SSN – sequence-specific nucleases

TALENs – transcription activator-like effector nucleases

ZFNs – Zinc finger nucleases

The application of plant biotechnology to enhance beneficial traits in crops is now indispensable because of food insecurity driven by an increasing global population and climate change. The recently developed Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) system allows for a simpler and more precise method of gene editing, which is now preferred over Zinc Finger Nucleases (ZFNs) and Transcription Activator-like Effector Nucleases (TALENs). In this review, recent progress in utilizing CRISPR/Cas9-mediated gene editing in plants to enhance certain traits in beneficial crops, including rice, soybean, and oilseed rape, is discussed. In addition, novel methods of applying the CRISPR/Cas9 system in live cell imaging are reviewed. Despite all these applications, the existing delivery methods of CRISPR/Cas9 fail to provide consistent results and are inefficient for in planta transformation. Hence, research should focus on improving current delivery methods or developing novel ones to facilitate CRISPR/Cas9-based gene editing studies. Strict regulations on the sale and commercial growth of gene-edited crops have restricted further efforts to apply CRISPR/Cas9 technology in plant species. Therefore, a shift in public viewpoint toward gene editing would help to propel scientific progress.

Introduction

With the rising demand for food security due to ever-increasing population growth, coupled with the looming threat of climate change (Haque et al., 2018; United Nations, 2019), the urgency to develop reliable and efficient methods to secure steady and sufficient nutrition for the global population is higher than ever. Hence, the role and application of plant biotechnology in engineering plants to suit global agricultural demands are now indispensable. Plant biotechnology (PBT), in essence, comprises the set of scientific methods and techniques used to identify and manipulate plant genes in order to develop desired traits or specific products in plants (Kalia, 2018). By using the methods available in PBT, beneficial traits of crops can be expressed and amplified, while undesirable traits and components such as allergens in rice, peanuts, or soybeans can be eliminated (Fuchs and Mackey, 2003; Barh and Azevedo, 2018).

Sequence-Specific Nucleases (SSNs) such as Zinc Finger Nucleases (ZFNs), Transcription Activator-like Effector Nucleases (TALENs), and the more recently developed Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) are among the advanced methods that have allowed a less sporadic and more precise means of genetic modification in plants (Baltes et al., 2014; Fauser et al., 2014; Endo et al., 2016; Li et al., 2016; Sun et al., 2016a; Sun et al., 2016b). A detailed comparison of these three methods was provided by Sun et al. (2016a). The CRISPR/Cas9 system is generally preferred over the other methods because of its precision, efficiency, simplicity, and cost-effectiveness, and it has therefore gained attention in the genome editing community (Haque et al., 2018; Wang et al., 2018). Since the discovery of the first CRISPR locus by Ishino et al. (1987) and the pioneering extensive study on the CRISPR/Cas system by Jansen et al. (2002), the technology has been further studied for genome editing of various organisms.

In essence, CRISPR/Cas9 technology exploits the adaptive immune system of the bacterium Streptococcus pyogenes, together with the DNA repair machinery of the host cell, to modify genetic sequences or edit the genome of the targeted organism. This is achieved by constructing a single guide RNA (sgRNA) specific to the target DNA sequence, which forms a complex with the Cas9 protein and thereby initiates specific double-stranded breaks in the target DNA, as shown in Fig. 1 (Costa et al., 2017). The double-stranded breaks enable further gene editing, as shown in Fig. 1D and Fig. 1E. Many studies have described the mechanism of the CRISPR/Cas9 system, and these have been extensively reviewed (Sander and Joung, 2014; Westra et al., 2014; Bortesi and Fischer, 2015; Ma et al., 2015; Ma et al., 2016; Musunuru, 2017; Adhikari and Poudel, 2020). Hence, to follow suit, this review discusses 1) the recent advances in utilizing the CRISPR/Cas9 system in a diverse range of plant species to enhance crops and to facilitate plant cell imaging, 2) the current challenges regarding the delivery of CRISPR/Cas9 reagents into plant cells, and 3) the regulatory systems for gene-edited crops compared to those for genetically modified organisms (GMOs).

Fig. 1. Mechanism of CRISPR/Cas9 gene editing. A) The constructed target-specific single guide RNA (sgRNA) forms a complex with the Cas9 protein; B) The CRISPR/Cas9 complex binds to the target DNA; C) The CRISPR/Cas9 complex cleaves the target DNA at specific sequences, leading to further gene editing; D) Gene knock-in through homology-directed repair (HDR); E) Gene knock-out through non-homologous end joining (NHEJ)
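
The targeting step summarized in panels A to C can be illustrated with a short Python sketch. It is not taken from the reviewed studies. It assumes the widely described behavior of S. pyogenes Cas9: a 20-nucleotide protospacer immediately followed by a 5'-NGG-3' protospacer-adjacent motif (PAM), with the blunt cut expected about 3 bp upstream of the PAM. The function names, the example sequence, and the default protospacer length are hypothetical choices made for illustration only.

```python
import re


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_spcas9_sites(seq: str, protospacer_len: int = 20):
    """List candidate SpCas9 target sites on both strands.

    Assumption (not from the review): a site is `protospacer_len` bases
    followed directly by an NGG PAM. Coordinates are 0-based and local to
    the strand being scanned (the reverse strand is scanned as its own
    5'->3' string).
    """
    seq = seq.upper()
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # A lookahead is used so that overlapping candidate sites are all reported.
        pattern = r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len
        for m in re.finditer(pattern, s):
            sites.append({
                "strand": strand,
                "start": m.start(),
                "protospacer": m.group(1),
                "pam": m.group(2),
                # Expected blunt cut position: 3 bp upstream of the PAM.
                "cut_index": m.start() + protospacer_len - 3,
            })
    return sites


# Hypothetical demo sequence: 4 nt flank + 20 nt protospacer + TGG PAM + 4 nt flank.
demo = "TTTT" + "GACGTTACCGGATCAAGCTA" + "TGG" + "TTTT"
for site in find_spcas9_sites(demo):
    print(site)
```

In practice, guide design tools also score candidate sites for off-target matches elsewhere in the genome; this sketch only enumerates positions that satisfy the PAM requirement.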

Applications of CRISPR/Cas9 genome editing in plant species

Certain phenotypes or traits that are expressed by plants, or in this case crops, can be tweaked and adjusted through the manipulation of their genes. In doing so, the expected outcome would be to produce an enhanced version of the crop, which can be beneficial to the general population from certain aspects. The precision of CRISPR/Cas9 technology ensures a highly reliable method in genome editing that does not randomly produce unforeseen alterations elsewhere in the genome (Schiml et al., 2016 ). The efforts in trying to apply CRISPR/Cas9 genome editing in plants have been widespread since the discovery of the technology. Prior to applying the genome editing technology to crops, much of the research was conducted on Arabidopsis thaliana as a model plant organism because of its convenience and usefulness in genetic experiments (Koornneef and Meinke, 2010 ; Lee et al., 2018 ).

For example, A. thaliana was used as a model plant in implementing a sequential transformation method, which improved CRISPR gene targeting (Miki et al., 2018). The efficiency of the pKAMA-ITACHI Red vector in CRISPR/Cas9 editing was also first investigated in A. thaliana in a study by Tsutsui and Higashiyama (2017) involving genes such as PDS3, AG, and DUO1. After the initial validation in A. thaliana, the potential of the technology is being further explored in other plant species. Some examples of plants and crops that have been successfully manipulated using CRISPR/Cas9 technology in recent years are outlined in Table 1.

Table 1. Examples of successful genome editing of plant species

Improvement on quality of crops

One of the more impactful applications of CRISPR/Cas9 technology from the sustainability aspect is the ability of the genome editing tool to enhance the quality of agricultural products. Rice, a major food source for the global population (Fukagawa and Ziska, 2019), was first successfully manipulated using CRISPR/Cas9 technology by Miao et al. (2013), who demonstrated the possibility of applying the system for targeted mutations in rice. Since this finding, many efforts have been made to elucidate the functions of individual genes and to observe the effect of gene alterations in rice, in the hope of applying the findings practically. An example is a study by Guo et al. (2020), who used CRISPR/Cas9 to both overexpress and knock out the OsProDH gene in rice. The OsProDH gene encodes a mitochondrial enzyme, proline dehydrogenase, which is responsible for the degradation of proline in rice. Proline plays a significant role in protecting plants from various biotic and abiotic stresses by inducing diverse physiological responses and by scavenging reactive oxygen species (ROS) (Hayat et al., 2012). Mutation of OsProDH in rice resulted in the accumulation of proline, which in turn led to lower levels of ROS (Guo et al., 2020). Hence, by manipulating the OsProDH gene, and thereby the metabolism of proline, higher thermotolerance could be conferred on rice (Guo et al., 2020).

On the other hand, salt-tolerant rice can be potentially achieved through the manipulation of the OsNAC45 gene (Yu et al., 2018 ; Zhang et al., 2020b ). Through the regulation of several other plant stress response genes ( OsCYP89G1 , OsDREB1F , OsEREBP2 , OsERF104 , OsPM1 , OsSAMDC2 , OsSIK1 ), OsNAC45 may be significant in regulating abscisic acid signal responses in rice, which could be the key to produce rice with increased salt tolerance (Zhang et al., 2020b ). In another study, CRISPR/Cas9 was used to elucidate the role of polygalacturonase in regulating the cell wall immune response through the gene OsPG1 (Cao et al., 2021 ). This not only deepens understanding of the role of cell wall integrity in plant immune response but also highlights the potential to exploit the cell wall physiology in conferring bacterial resistance. Other research studies used CRISPR/Cas9 in a similar manner with the aim of producing an observable effect either through gene overexpression or gene knockout in rice. Through these methods, various genes have been identified and successfully manipulated, for example, genes responsible for traits such as pigment (anthocyanin) content (Zheng et al., 2019 ; Hu et al., 2020 ), resistance to disease (bacterial blight and blast disease) (Zhou et al., 2015 ; Wang et al., 2016 ; Kim et al., 2019 ), and grain length (Li et al., 2020a; Usman et al., 2021 ). These findings prove that CRISPR/Cas9 technology is undoubtedly effective in manipulating traits in rice, and it is expected that these methods can be applied to generate more resilient, robust, and nutritious rice, which can drive global sustainability.

CRISPR/Cas9-based mutagenesis has also been successfully performed on other impactful plant species with significant mutation efficiency. The first application of CRISPR/Cas9 gene editing in soybean was conducted by Jacobs et al. (2015) where gene knockout was performed on the green fluorescent protein (GFP) gene. This pioneer work kickstarted numerous efforts in applying CRISPR/Cas9 gene editing in soybean. Han et al. ( 2019 ) utilized CRISPR/Cas9 to induce a targeted mutation in the E1 gene in controlling soybean flowering and found that the truncation of the E1 protein prevented the inhibition of the GmFT2a/5a gene, increased its expression, and led to an earlier flowering time under long-day (LD) conditions. This transformation led to the development of a photo-insensitive soybean variant, which is potentially suitable for the introduction of soybean in higher latitudes (Han et al., 2019 ).

Similarly, a study conducted by Cai et al. (2020) demonstrated the role of the GmFT2a/5a gene in regulating flowering time and yield in soybean under different photoperiods by comparing double knockouts and overexpression of the gene using CRISPR/Cas9 technology. These findings collectively established the involvement of certain soybean genes that may contribute to its adaptability to different environments and conditions. Furthermore, the GmFT2a/5a double-knockout mutants were found to produce a significantly higher number of pods and seeds per plant than the wild-type plant, despite having a longer flowering time (Cai et al., 2020). In addition, the overexpression of GmPRR37 was found to lengthen flowering time under LD conditions and was involved in downregulating the aforementioned GmFT2a/5a, which promotes flowering, and in upregulating GmFT1a, which inhibits flowering, thereby contributing to the regional adaptability of soybean (Wang et al., 2020). From these results, soybean variants with higher productivity can be bred and adapted to more diverse environments. Triple knockouts of GmF3H1, GmF3H2, and GmFNSII-1 were effectively performed using a multiplex CRISPR/Cas9 system in soybean and resulted in an increase in isoflavone content within the plants that at the same time conferred enhanced resistance to the soybean mosaic virus (SMV) (Zhang et al., 2020a). Several genome edits in soybean were successfully inherited by subsequent generations (Han et al., 2019; Zhang et al., 2020a), indicating that selective breeding of CRISPR/Cas9-edited soybean could potentially generate beneficial novel crop variants. However, the inheritance of CRISPR/Cas9 mutations requires further study, as the efficiency of its occurrence is still rather sporadic.

Oilseed rape

Oilseed rape (Brassica napus), also known as rapeseed, is another impactful crop that is notable for the production of edible oils (Cartea et al., 2019). Successful CRISPR/Cas9-mediated mutagenesis of rapeseed was first reported by Yang et al. (2017), who tested 12 genes from four gene families (BnaA9.RGA, BnaC9.RGA, BnaA6.RGA, and BnaC7.RGA from the BnaRGA family; BnaA9.FUL, BnaC2.FUL, and BnaC7.FUL from the BnaFUL family; and BnaA2.DA2.1, BnaA2.DA2.2, BnaC6.DA2, BnaC5.DA1, and BnaA6.DA1 from the BnaDA2 and BnaDA1 families). Stable inheritance of the induced mutations by the progeny was observed in the study, indicating the effectiveness of CRISPR/Cas9 in producing an enhanced variant of oilseed rape (Yang et al., 2017). Following this study, Jiang et al. (2018) identified the role of the BnaSDG8.A and BnaSDG8.C genes in promoting the expression of histone 3 lysine 36 (H3K36) methyltransferase, which influences floral transition in oilseed rape, and mutated these genes to produce an early flowering phenotype. In addition, silencing the BnSFAR4 and BnSFAR5 genes through a CRISPR/Cas9-mediated double gene knockout could increase the seed oil content (SOC) in oilseed rape without affecting seed germination, vigor, and oil mobilization, as demonstrated by Karunarathna et al. (2020). In another study, CRISPR/Cas9-mediated cytosine base editing (CBE) was used to mutate the BnALS1 gene by introducing a C to T conversion at a specific region (Wu et al., 2020). This mutation produced a mutant oilseed rape that could resist tribenuron-methyl, a herbicide commonly used against weeds (Wu et al., 2020). Hence, the development of herbicide resistance in oilseed rape will help farmers in weed management. Taken together, these findings help to drive the productivity and to simplify the management of the oilseed rape crop.
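
The C to T conversion introduced by cytosine base editing can be pictured with a minimal Python sketch. It is an illustration under stated assumptions and does not reproduce the method of Wu et al. (2020). It assumes that every cytosine inside a fixed activity window of the protospacer is converted to thymine. The window of positions 4 to 8 (1-based, counted from the PAM-distal end) is an approximate range often quoted for some cytosine base editors and is used here only as an assumed default. The function name and the example sequence are hypothetical, and the sequence is not the BnALS1 target.

```python
def cytosine_base_edit(protospacer: str, window=(4, 8)) -> str:
    """Return the protospacer with every C inside the editing window changed to T.

    Assumptions (not from the review): positions are 1-based and inclusive,
    counted from the PAM-distal end, and conversion inside the window is
    treated as complete. Real editors convert only a fraction of the
    cytosines, and the window differs between editor variants.
    """
    start, end = window
    edited = []
    for pos, base in enumerate(protospacer.upper(), start=1):
        if base == "C" and start <= pos <= end:
            edited.append("T")  # C to T conversion inside the window
        else:
            edited.append(base)  # bases outside the window are left unchanged
    return "".join(edited)


# Hypothetical 20-nt protospacer used purely for demonstration.
original = "GACCCATCGGATCAAGCTAG"
print(original)
print(cytosine_base_edit(original))
```

Because no double-stranded break or donor template is involved in this model, the edit is a simple substitution; whether it changes the encoded amino acid depends on the reading frame of the real target, which this sketch does not consider.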

Other crop species

Currently, CRISPR/Cas9 genome editing has been demonstrated to be successful in a number of influential crops such as maize (Liu et al., 2020; Li et al., 2020b), wheat (Hayta et al., 2019; Liu et al., 2020), and apples (Pompili et al., 2020), with a relatively high transformation efficiency (Haque et al., 2018; Adhikari and Poudel, 2020). The sequencing of novel plant genomes has widened the applications of CRISPR/Cas9 genome editing by allowing a higher number of genes to be tested in various plant species. CRISPR/Cas9 was recently reported to be effective in knocking out the phytoene desaturase gene in muskmelon (CmPDS), in the first reported study to apply CRISPR/Cas9 genome editing to the species (Hooghvorst et al., 2019). The same PDS gene was also successfully knocked out to produce an albino phenotype in pioneering CRISPR/Cas9 genome editing studies on watermelon and apples (Nishitani et al., 2016; Tian et al., 2017). However, the rate of inheritance by subsequent generations of transgenic plants could not be investigated through PDS gene knockout, as the albino variants had low in vitro survival rates (Hooghvorst et al., 2019); hence, other genes should be targeted to determine the rate of inheritance of mutations in these plant species.

Targeted mutagenesis in sweet orange was achieved by Jia and Nian ( 2014 ), where a novel tool for delivering the CRISPR/Cas9 reagents was developed for citrus plants through the Xcc-facilitated agroinfiltration, and involved the use of Xanthomonas citri subsp. citri (Xcc) to infect the citrus plant. Knockout of the CsWRKY22 gene in Wanjincheng orange using CRISPR/Cas9 genome editing exhibited enhanced resistance toward citrus canker, a destructive disease in citrus plants caused by Xcc, thereby further establishing the efficacy of CRISPR/Cas9 technology in citrus (Wang et al., 2019 ). Similar enhancement of disease resistance was observed in apples where the successful CRISPR/Cas9-mediated gene knockout of MdDIPM4 conferred increased resistance to Erwinia amylovora , a bacterium that causes fire blight disease in apples (Pompili et al., 2020 ). Pompili et al. ( 2020 ) could successfully clear CRISPR/Cas9 reagents from the genome by using T-DNA removal, which reduced the chances of occurrence of unnecessary or off-target mutations. As a conclusion, CRISPR/Cas9 technology can be applied to a diverse range of plant species and can produce a multitude of effects expressed by the plants. It is expected that the benefits of CRISPR/Cas9-edited crops and products would be able to reach the consumers. This, however, comes with its own set of challenges, one of which will be discussed in the later sections.

Live cell CRISPR imaging

Conventional cellular imaging methods applied in subnuclear dynamics studies such as fluorescence in situ hybridization (FISH) (Langer-Safer et al., 1982 ; Schwarzacher and Heslop-Harrison, 1994 ; Wu et al., 2019 ) are limited by the need of cellular fixation and the heat denaturation step that influence chromatin structure and organization, consequently impeding temporal studies in plant cells (Kozubek et al., 2000 ; Boettiger et al., 2016 ; Dreissig et al., 2017 ). Live cell imaging in plants allows spatiotemporal organization of chromatin to be studied in greater detail, which may potentially deepen the understanding of various gene expression patterns. Novel approaches in live cellular imaging tend to use Zinc Fingers (ZFs) or Transcription Activator-like Effectors (TALEs), which are proteins that can be programmed to bind to specific DNA sequences (Qin et al., 2017 ; Wu et al., 2019 ). Even though ZFs and TALEs are more flexible than FISH, there are technical challenges that one has to face as complicated processes are involved in constructing a large array of ZFs and TALEs proteins (Qin et al., 2017 ) and in constructing their expression vectors capable of targeting multiple DNA sequences (Wu et al., 2019 ). The necessity of re-engineering TALEs in targeting to a new gene sequence is also time-consuming and labor-intensive (Khosravi et al., 2020 ).

In view of the limitations of ZFs, TALEs, and FISH, researchers are utilizing the CRISPR/Cas system to achieve a live cell imaging method with greater flexibility and to overcome the limitations of visualizing non-repetitive regions (Dreissig et al., 2017). In this most recent approach, the nuclease-deficient dead Cas9 (dCas9), which was shown to bind DNA specifically without altering it (Qi et al., 2013; Dominguez et al., 2016), is combined with a fluorescent protein (FP) to visualize telomeric repeats in live leaf cells of Nicotiana benthamiana. The study demonstrated the usefulness of this method for observing DNA-protein interactions in live plant cells (Dreissig et al., 2017). Telomere repeats in Nicotiana tabacum were also successfully labeled by transiently expressing dCas9-FP, mediated by an Agrobacterium vector (Fujimoto and Matsunaga, 2017). A protocol for live plant cell imaging using CRISPR/Cas9 from S. pyogenes and Staphylococcus aureus was developed by Khosravi et al. (2020), where a telomere-specific guide RNA was used to target the telomeric sequences in N. benthamiana. Based on these initial findings, the CRISPR/Cas9 imaging system shows potential for further development in visualizing gene sequences with low repetition or low abundance. Simple and reliable imaging of chromatin spatiotemporal organization would also ease further research on gene expression at various stages of the plant cell cycle. dCas9 can also be applied in gene expression inhibition, transcriptional regulation, and gene promoter activation, and for monitoring spatiotemporal patterns of gene expression in plants (Bikard and Marraffini, 2013; Yang, 2015; Arora and Narula, 2017). This shows that studies on a single system may yield outcomes that can be applied to multiple areas of interest. The potential of the CRISPR/Cas9 system has barely been explored, and more is yet to come.

Challenges in applying CRISPR/Cas9 technology in plants

As a relatively novel toolbox for genome editing, there are certainly some obstacles to be resolved when trying to apply CRISPR/Cas9 technology in plants. First, before any manipulation can be performed on the genome, the specific gene responsible for the intended function must be identified to enable precise editing. Despite the efforts conducted to sequence the genomes of many relevant plant species, there is still insufficient knowledge on the function of sequenced genes within the plants’ genome, which impedes efforts in precision editing to produce intended effects (Haque et al., 2018 ; Adhikari and Poudel, 2020 ). Fortunately, by conducting Genome-Wide Association Studies (GWAS), gene functions can be effectively predicted with accuracy, which can drive further research on necessary manipulations in plants. For instance, Zheng et al. ( 2019 ) discovered the genes OsC1 and OsRb that are involved in regulating anthocyanins in rice leaf. This enabled Hu et al. ( 2020 ) to further use the CRISPR/Cas9 technology in manipulating anthocyanin levels in rice. A similar approach was also undertaken to study the RDP1 gene of A. thaliana (Tsuchimatsu et al., 2020 ). Just as how GWAS can propel CRISPR/Cas9 plant editing, CRISPR/Cas9 technology is also used as an alternative method for cross population validation (Alseekh et al., 2021 ), such as to validate GWAS findings in rice ( Oryza sativa ) (Lu et al., 2017 ; Meng et al., 2017 ) and maize ( Zea mays ) (Liu et al., 2020 ). This provides an insight into the importance of establishing the causal relationships and interactions between genes that can further drive the development of CRISPR/Cas9 technology in plants (Yin et al., 2017 ).

Delivery and disposal of CRISPR/Cas9 reagents in plants

The delivery of the necessary CRISPR/Cas9 components into the intended cells remains a challenge to its application in plant and animal cells alike, especially in an in vivo setting (Li et al., 2015). Agrobacterium-mediated delivery using A. tumefaciens or A. rhizogenes is a commonly used method for plant transformation in various species (Ron et al., 2014; Mikami et al., 2015; Budiani et al., 2018; Hooghvorst et al., 2019; Mao et al., 2019; Pompili et al., 2020; Li et al., 2020b). Despite its popularity, there is still a degree of uncertainty when utilizing this method, as its success depends on the choice of the plasmid and the cultivar used (Mangena et al., 2017). Various studies have reported that the A. rhizogenes-mediated transformation system could have been the cause of the low transformation efficiency observed in soybean (Li et al., 2019; Bai et al., 2020; Zhang et al., 2020a), rice (Butt et al., 2017; Usman et al., 2021), and tomato (Ron et al., 2014) genome editing. Varying culture conditions can also influence the infection and regeneration rates of the Agrobacterium-infected explants, which affects the reproducibility of the results obtained (Hamada et al., 2018), as observed in soybean (Li et al., 2017; Hada, 2018; Mangena, 2018), clover (Trifolium subterraneum L.) (Rojo, 2021), and cassava (Nyaboga et al., 2015). This further demonstrates the inconsistent transformation efficiency of Agrobacterium-mediated delivery. In addition, while a high degree of success was observed in A. thaliana, the feasibility of Agrobacterium-mediated transformation in other plant species such as soybean (Mangena et al., 2017), melon (Hooghvorst et al., 2019), and wheat (Zhang et al., 2018) is still questionable, as the regeneration of transgenic plants would require the use of explant-derived calluses (Mao et al., 2019). Hence, further studies are required to enhance the Agrobacterium-mediated delivery method and increase its transformation efficiency, its effectiveness in diverse plant species, and its success in in planta transformation.

An alternative to the Agrobacterium -mediated delivery system is biolistic delivery (Carter and Shieh, 2015 ). Biolistic delivery is the direct delivery of DNA material into plant cells, where DNA is coated onto heavy metal particles such as gold or tungsten (Baltes et al., 2017 ). As the DNA-coated metal particles penetrate and get trapped inside plant cells, DNA can dissociate from the particles and become integrated into the host genome (Baltes et al., 2017 ). Although recent success in inducing in planta genome manipulation was observed in wheat ( Triticum aestivum L.), the mutation efficiency that was reported using the biolistic method remains very low, less than 6% of samples being mutated and less than 2% of samples with the mutations inherited (Hamada et al., 2018 ).

Another alternative involves the use of viral vectors as a delivery system for the CRISPR/Cas9 components. A study by Ma et al. ( 2020 ) utilizing the sonchus yellow net rhabdovirus (SYNV) to infect tobacco plants reported relatively high mutation efficiency with minimal costs, but the disadvantage of using viral vectors lies in the range of infectivity of the proposed virus. Nonetheless, reverse genetic tools can aid in expanding the range of infectivity for other rhabdoviruses (Ma et al., 2020 ). Hence, in planta genome editing using CRISPR/Cas9 is currently limited by the availability of effective delivery systems, and further studies and development of conventional and novel delivery methods would contribute to efficient research of CRISPR/Cas9 in plants.

CRISPR/Cas9-edited crop regulation

The ultimate goal of developing novel methods and innovations in applying CRISPR/Cas9 technology in PBT is to enhance the quality of life of consumers through the production of improved plants or crops. Gene-edited organisms, such as those edited using CRISPR/Cas9 technology, involve mutagenesis of their genomes through deletions, substitutions, or insertions of base pairs, while GMOs involve the introduction of foreign genetic material or a transgene into the organism that may or may not be integrated into the genome (Callaway, 2018). Despite this fundamental difference, gene-edited organisms are often governed by the same set of rules and regulations as GMOs in many countries (El-Mounadi et al., 2020). For instance, the Court of Justice of the European Union (CJEU) ruled in 2018 that gene-edited crops are not exempt from the laws and regulations governing GM crops (Callaway, 2018; Confédération paysanne and others v. Premier ministre and Ministre de l’Agriculture de l’Agroalimentaire et de la Forêt, 2018). This implies that the high hurdles that were put in place for developing GM crops also apply to CRISPR/Cas9-edited crops, which may drive funding and investment away from future research on CRISPR/Cas9 as a viable plant breeding technology. The EU’s unchanging definition of GMOs as “not naturally altered” has further affected the public perception of CRISPR technology and genetic modification as a whole (Plan and Eede, 2010). The road to gaining public confidence in the safety, efficacy, and benefits of GMOs is already riddled with social, economic, and legal challenges (Zimny et al., 2019). However, shifting the public perspective toward gene technology is the key to triggering much-needed changes across the board.

In contrast to the EU, the US Department of Agriculture (USDA) ruled out regulation of genome-edited plants, provided their production does not involve plant pests (USDA, 2018). In addition to highlighting the safety of genome-edited plants and the lack of risks involved, this ruling would promote further progress in the development of the technology (Hoffman, 2021). The first genome-edited crop allowed to bypass USDA regulations was a CRISPR/Cas9-edited white button mushroom resistant to browning (Waltz, 2016). The USDA has also been continuously funding research involving CRISPR-edited plants such as rice (O. sativa) (Lee et al., 2019), pennycress (Thlaspi arvense L.) (Jarvis et al., 2021), and cocoa (Theobroma cacao) (Fister et al., 2018). Fitting modern technological approaches into regulations that were designed for older technology cannot be the way forward. Instead, laws and regulations require modernization to keep up with the transformative power of innovation. Hence, rather than treating old GMO regulations as an umbrella that is expected to cover new and upcoming technologies such as CRISPR, regulations need to be amended as necessary.

However, despite periodic updates to GMO regulations and the development of novel guidelines, Malaysia has yet to approve the commercial growth of genome-edited crops (Singh et al., 2019). Similar to the EU, Malaysia’s regulatory system classifies genome-edited crops as GMOs; hence, it would be difficult for any such plant or crop to gain approval under the system (El-Mounadi et al., 2020). Although Malaysia is relatively reserved in approving the propagation of gene-edited crops in the open field, it has allowed more than 30 cases of import of transgenic products, albeit solely for consumption or processing, in addition to approving confined field tests of transgenic plants such as rubber and papaya (Singh et al., 2019).

To be fair, crops produced by CRISPR/Cas9 gene editing and other gene editing methods utilized globally challenge the conventional perspectives and definition of gene modification and GMOs. Hence, there is no doubt that the regulatory bodies worldwide are still adapting to the rapid development of this technology. Therefore, despite legal hurdles, researchers, investors, and consumers alike should retain their interests in the development and research of more beneficial crops so that the supply would be able to cope with the rise in food demand.

Conclusions

CRISPR/Cas9 has received much attention in recent years as a revolutionary technology for genetically manipulating organisms to suit our demands. While initial research and development studies were focused on animal cell lines, the utilization of CRISPR/Cas9 gene editing has now expanded to include a diverse range of plant species, specifically beneficial and important crops. Through the enhancement of agricultural crops, agricultural and nutritional demands are expected to be met in an effort to improve the global quality of life. It has also been shown that the potential of CRISPR/Cas9 is not limited to the improvement of phenotypic traits, as this technology can also be used in live plant cell imaging to facilitate scientific research. Additional methods of exploiting CRISPR/Cas9 may emerge in the coming years, and such developments should be anticipated in this fast-paced era of scientific innovation. Therefore, scientific progress should not be discouraged or impeded by outdated regulatory systems. These issues, coupled with low public acceptance and valuation of GMOs and CRISPR in general (Shew et al., 2018), indirectly influence the availability of funding for further research. However, with patience and collaborative efforts from the scientific community in sharing knowledge and presenting advances in the practical aspects of science, a shift in public perspective toward not just CRISPR/Cas9 but gene editing as a whole would help to rapidly propel scientific progress in genome editing.

This review did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Conflict of interest

The authors declare that there is no conflict of interest.

Acknowledgements

The authors wish to thank Prof. Hoe I. Ling of Columbia University (New York, USA) for his editorial input.

  • Adhikari P., Poudel M. (2020) CRISPR-Cas9 in agriculture: Approaches, applications, future perspectives, and associated challenges. Malays. J. Halal Res. 3(1): 6–16. 10.2478/mjhr-2020-0002
  • Alseekh S., Kostova D., Bulut M., Fernie A.R. (2021) Genome-wide association studies: assessing trait characteristics in model and crop plants. Cell. Mol. Life Sci. 78(15): 5743–5754. 10.1007/S00018-021-03868-W
  • Arora L., Narula A. (2017) Gene editing and crop improvement using CRISPR-cas9 system. Front. Plant Sci. 8: 1932. 10.3389/fpls.2017.01932
  • Bai M., Yuan J., Kuang H., Gong P., Li S., Zhang Z., Liu B., Sun J., Yang M., Yang L., et al. (2020) Generation of a multiplex mutagenesis population via pooled CRISPR-Cas9 in soya bean. Plant Biotechnol. J. 18(3): 721–731. 10.1111/PBI.13239
  • Baltes N.J., Gil-Humanes J., Cermak T., Atkins P.A., Voytas D.F. (2014) DNA replicons for plant genome engineering. Plant Cell. 26(1): 151–163. 10.1105/TPC.113.119792
  • Baltes N.J., Gil-Humanes J., Voytas D.F. (2017) Genome engineering and agriculture: opportunities and challenges. Chapter 1. [In:] Progress in molecular biology and translational science. Vol. 149. Ed. Weeks D.P., Yang B., San Diego: Academic Press: 1–26.
  • Barh D., Azevedo V. (2018) Omics technologies and bio-engineering. Volume 2: towards improving quality of life. London: Elsevier.
  • Bikard D., Marraffini L.A. (2013) Control of gene expression by CRISPR-Cas systems. F1000Prime Rep. 5: 47. 10.12703/P5-47
  • Boettiger A.N., Bintu B., Moffitt J.R., Wang S., Beliveau B.J., Fudenberg G., Imakaev M., Mirny L.A., Wu C.T., Zhuang X. (2016) Super-resolution imaging reveals distinct chromatin folding for different epigenetic states. Nature. 529(7586): 418–422. 10.1038/nature16496
  • Bortesi L., Fischer R. (2015) The CRISPR/Cas9 system for plant genome editing and beyond. Biotechnol. Adv. 33(1): 41–52. 10.1016/j.biotechadv.2014.12.006
  • Budiani A., Putranto R.A., Riyadi I., Sumaryono, Minarsih H., Faizah R. (2018) Transformation of oil palm calli using CRISPR/Cas9 System: Toward genome editing of oil palm. IOP Conf. Ser. Earth Environ. Sci. 183: 12003.
  • Butt H., Eid A., Ali Z., Atia M.A.M., Mokhtar M.M., Hassan N., Lee C.M., Bao G., Mahfouz M.M. (2017) Efficient CRISPR/Cas9-mediated genome editing using a chimeric single-guide RNA molecule. Front. Plant Sci. 8: 1441. 10.3389/FPLS.2017.01441
  • Cai Y., Wang L., Chen L., Wu T., Liu L., Sun S., Wu C., Yao W., Jiang B., Yuan S., et al. (2020) Mutagenesis of GmFT2a and GmFT5a mediated by CRISPR/Cas9 contributes for expanding the regional adaptability of soybean. Plant Biotechnol. J. 18(1): 298–309. 10.1111/pbi.13199
  • Callaway E. (2018) CRISPR plants now subject to tough GM laws in European Union. Nature. 560(7716): 16. 10.1038/d41586-018-05814-6
  • Cao Y., Zhang Y., Chen Y., Yu N., Liaqat S., Wu W., Chen D., Cheng S., Wei X., Cao L., et al. (2021) OsPG1 encodes a polygalacturonase that determines cell wall architecture and affects resistance to bacterial blight pathogen in rice. Rice. 14(1): 1–15. 10.1186/S12284-021-00478-9
  • Cartea E., Haro-Bailón A. de, Padilla G., Obregón-Cano S., Rio-Celestino M.D., Ordás A. (2019) Seed oil quality of Brassica napus and Brassica rapa germplasm from Northwestern Spain. Foods. 8(8): 292. 10.3390/FOODS8080292
  • Carter M., Shieh J. (2015) Gene delivery strategies. Chapter 11. [In:] Guide to Research Techniques in Neuroscience (Second Edition). London: Academic Press: 239–252.
  • Confédération paysanne and others v. Premier ministre and Ministre de l’Agriculture de l’Agroalimentaire et de la Forêt. (2018) Judgment of the Court of Justice of the European Union in the case C-528/16. Luxembourg. [accessed 2021 Sep 14]. https://curia.europa.eu/jcms/upload/docs/application/pdf/2018-07/cp180111en.pdf
  • Costa J.R., Bejcek B.E., McGee J.E., Fogel A.I., Brimacombe K.R., Ketteler R. (2017) Genome editing using engineered nucleases and their use in genomic screening. [In:] Assay guidance manual. Ed. Markossian S., Grossman A., Brimacombe K., Bethesda: Eli Lilly & Company and the National Center for Advancing Translational Sciences.
  • Dominguez A.A., Lim W.A., Qi L.S. (2016) Beyond editing: repurposing CRISPR-Cas9 for precision genome regulation and interrogation. Nat. Rev. Mol. Cell Biol. 17(1): 5–15. 10.1038/nrm.2015.2
  • Dreissig S., Schiml S., Schindele P., Weiss O., Rutten T., Schubert V., Gladilin E., Mette M.F., Puchta H., Houben A. (2017) Live-cell CRISPR imaging in plants reveals dynamic telomere movements. Plant J. 91(4): 565–573. 10.1111/tpj.13601
  • El-Mounadi K., Morales-Floriano M.L., Garcia-Ruiz H. (2020) Principles, applications, and biosafety of plant genome editing using CRISPR-Cas9. Front. Plant Sci. 11: 56. 10.3389/FPLS.2020.00056
  • Endo M., Mikami M., Toki S. (2016) Biallelic gene targeting in rice. Plant Physiol. 170(2): 667–677. 10.1104/PP.15.01663
  • Fauser F., Schiml S., Puchta H. (2014) Both CRISPR/Cas-based nucleases and nickases can be used efficiently for genome engineering in Arabidopsis thaliana. Plant J. 79(2): 348–359. 10.1111/TPJ.12554
  • Fister A.S., Landherr L., Maximova S.N., Guiltinan M.J. (2018) Transient expression of CRISPR/Cas9 machinery targeting TcNPR3 enhances defense response in Theobroma cacao. Front. Plant Sci. 9: 268. 10.3389/FPLS.2018.00268
  • Fuchs R.L., Mackey M.A. (2003) Genetically modified foods. [In:] Encyclopedia of food sciences and nutrition, Second Edition. Ed. Caballero B. Elsevier Science: 2876–2882.
  • Fujimoto S., Matsunaga S. (2017) Visualization of chromatin loci with transiently expressed CRISPR/Cas9 in plants. Cytologia. 82(5): 559–562. 10.1508/cytologia.82.559
  • Fukagawa N.K., Ziska L.H. (2019) Rice: importance for global nutrition. J. Nutr. Sci. Vitaminol. 65(Supplement): S2–S3. 10.3177/JNSV.65.S2
  • Guo M., Zhang X., Liu J., Hou L., Liu H., Zhao X. (2020) OsProDH negatively regulates thermotolerance in rice by modulating proline metabolism and reactive oxygen species scavenging. Rice. 13(1): 1–5. 10.1186/s12284-020-00422-3
  • Hada A., Krishnan V., Mohamed Jaabir M.S., Kumari A., Jolly M., Praveen S., Sachdev A. (2018) Improved Agrobacterium tumefaciens-mediated transformation of soybean [Glycine max (L.) Merr.] following optimization of culture conditions and mechanical techniques. Vitr Cell Dev Biol-Plant. 54(6): 672–688. 10.1007/S11627-018-9944-8
  • Hamada H., Liu Y., Nagira Y., Miki R., Taoka N., Imai R. (2018) Biolistic-delivery-based transient CRISPR/Cas9 expression enables in planta genome editing in wheat. Sci. Rep. 8(1): 14422. 10.1038/s41598-018-32714-6
  • Han J., Guo B., Guo Y., Zhang B., Wang X., Qiu L.J. (2019) Creation of early flowering germplasm of soybean by CRISPR/Cas9 technology. Front. Plant Sci. 10: 1446. 10.3389/fpls.2019.01446
  • Haque E., Taniguchi H., Hassan M.M., Bhowmik P., Karim M.R., Śmiech M., Zhao K., Rahman M., Islam T. (2018) Application of CRISPR/Cas9 genome editing technology for the improvement of crops cultivated in tropical climates: Recent progress, prospects, and challenges. Front. Plant Sci. 9: 617. 10.3389/fpls.2018.00617
  • Hayat S., Hayat Q., Alyemeni M.N., Wani A.S., Pichtel J., Ahmad A. (2012) Role of proline under changing environments: A review. Plant Signal. Behav. 7(11): 1456–1466. 10.4161/PSB.21949
  • Hayta S., Smedley M.A., Demir S.U., Blundell R., Hinchliffe A., Atkinson N., Harwood W.A. (2019) An efficient and reproducible Agrobacterium-mediated transformation method for hexaploid wheat (Triticum aestivum L.). Plant Meth. 15(1): 1–15. 10.1186/S13007-019-0503-Z
  • Hoffman N.E. (2021) Revisions to USDA biotechnology regulations: the SECURE rule. Proc. Natl. Acad. Sci. USA. 118(22): e2004841118. 10.1073/PNAS.2004841118
  • Hooghvorst I., López-Cristoffanini C., Nogués S. (2019) Efficient knockout of phytoene desaturase gene using CRISPR/Cas9 in melon. Sci. Rep. 9(1): 1–7. 10.1038/s41598-019-53710-4
  • Hu W., Zhou T., Han Z., Tan C., Xing Y. (2020) Dominant complementary interaction between OsC1 and two tightly linked genes, Rb1 and Rb2, controls the purple leaf sheath in rice. Theor. Appl. Genet. 133(9): 2555–2566. 10.1007/s00122-020-03617-w
  • Ishino Y., Shinagawa H., Makino K., Amemura M., Nakatura A. (1987) Nucleotide sequence of the iap gene, responsible for alkaline phosphatase isoenzyme conversion in Escherichia coli, and identification of the gene product. J. Bacteriol. 169(12): 5429–5433. 10.1128/jb.169.12.5429-5433.1987
  • Jansen R., van Embden J.D.A., Gaastra W., Schouls L.M. (2002) Identification of genes that are associated with DNA repeats in prokaryotes. Mol. Microbiol. 43(6): 1565–1575. 10.1046/j.1365-2958.2002.02839.x
  • Jarvis B.A., Romsdahl T.B., McGinn M.G., Nazarenus T.J., Cahoon E.B., Chapman K.D., Sedbrook J.C. (2021) CRISPR/Cas9-induced fad2 and rod1 mutations stacked with fae1 confer high oleic acid seed oil in Pennycress (Thlaspi arvense L.). Front. Plant Sci. 12: 652319. 10.3389/FPLS.2021.652319
  • Jia H., Nian W. (2014) Targeted genome editing of sweet orange using Cas9/sgRNA. PLoS One. 9(4): e93806. 10.1371/journal.pone.0093806
  • Kalia A. (2018) Nanotechnology in bioengineering: transmogrifying plant biotechnology. [In:] Omics Technologies and Bio-engineering. Vol. 2: Towards Improving Quality of Life. Academic Press: 211–229.
  • Karunarathna N.L., Wang H., Harloff H.J., Jiang L., Jung C. (2020) Elevating seed oil content in a polyploid crop by induced mutations in seed fatty acid reducer genes. Plant Biotechnol. J. 18: 2251–2266. 10.1111/pbi.13381
  • Khosravi S., Dreissig S., Schindele P., Wolter F., Rutten T., Puchta H., Houben A. (2020) Live-cell CRISPR imaging in plant cells with a telomere-specific guide RNA. Methods Mol. Biol. 2166: 343–356. 10.1007/978-1-0716-0712-1_20
  • Kim Y.A., Moon H., Park C.J. (2019) CRISPR/Cas9-targeted mutagenesis of Os8N3 in rice to confer resistance to Xanthomonas oryzae pv. oryzae. Rice. 12(1): 67. 10.1186/s12284-019-0325-7
  • Koornneef M., Meinke D. (2010) The development of Arabidopsis as a model plant. Plant J. 61(6): 909–921. 10.1111/j.1365-313X.2009.04086.x
  • Kozubek S., Lukášová E., Amrichová J., Kozubek M., Lišková A., Šlotová J. (2000) Influence of cell fixation on chromatin topography. Anal. Biochem. 282(1): 29–38. 10.1006/abio.2000.4538
  • Langer-Safer P.R., Levine M., Ward D.C. (1982) Immunological methods for mapping genes on Drosophila polytene chromosomes. Proc. Natl. Acad. Sci. USA. 79(14): 4381–4385. 10.1073/pnas.79.14.4381
  • Lee K., Eggenberger A.L., Banakar R., McCaw M.E., Zhu H., Main M., Kang M., Gelvin S.B., Wang K. (2019) CRISPR/Cas9-mediated targeted T-DNA integration in rice. Plant Mol. Biol. 99(4): 317–328. 10.1007/S11103-018-00819-1
  • Lee Z.H., Yamaguchi N., Ito T. (2018) Using CRISPR/Cas9 system to introduce targeted mutation in Arabidopsis. Methods Mol. Biol. 1830: 93–108. 10.1007/978-1-4939-8657-6_6
  • Li C., Nguyen V., Liu J., Fu W., Chen C., Yu K., Cui Y. (2019) Mutagenesis of seed storage protein genes in soybean using CRISPR/Cas9. BMC Res. Notes. 12(1): 176. 10.1186/s13104-019-4207-2
  • Li L., He Z.Y., Wei X.W., Gao G.P., Wei Y.Q. (2015) Challenges in CRISPR/CAS9 delivery: potential roles of nonviral vectors. Hum. Gene Ther. 26(7): 452–462. 10.1089/hum.2015.069
  • Li Q., Lu L., Liu H., Bai X., Zhou X., Wu B., Yuan M., Yang L., Xing Y. (2020) A minor QTL, SG3, encoding an R2R3-MYB protein, negatively controls grain length in rice. Theor. Appl. Genet. 133(8): 2387–2399. 10.1007/s00122-020-03606-z
  • Li Q., Wu G., Zhao Y., Wang B., Zhao B., Kong D., Wei H., Chen C., Wang H. (2020) CRISPR/Cas9-mediated knockout and overexpression studies reveal a role of maize phytochrome C in regulating flowering time and plant height. Plant Biotechnol. J. 18(12): 2520–2532. 10.1111/pbi.13429
  • Li T., Liu B., Chen C.Y., Yang B. (2016) TALEN-mediated homologous recombination produces site-directed DNA base change and herbicide-resistant rice. J. Genet. Genomics. 43(5): 297–305. 10.1016/j.jgg.2016.03.005
  • Liu H., Wang K., Jia Z., Gong Q., Lin Z., Du L., Pei X., Ye X. (2020) Efficient induction of haploid plants in wheat by editing of TaMTL using an optimized Agrobacterium-mediated CRISPR system. J. Exp. Bot. 71(4): 1337–1349. 10.1093/JXB/ERZ529
  • Liu H.J., Jian L., Xu J., Zhang Q., Zhang M., Jin M., Peng Y., Yan J., Han B., Liu J., et al. (2020) High-throughput CRISPR/Cas9 mutagenesis streamlines trait gene identification in maize. Plant Cell. 32(5): 1397–1413. 10.1105/TPC.19.00934
  • Lu Y., Ye X., Guo R., Huang J., Wang W., Tang J., Tan L., Zhu J., Chu C., Qian Y. (2017) Genome-wide targeted mutagenesis in rice using the CRISPR/Cas9 system. Mol. Plant. 10(9): 1242–1245. 10.1016/J.MOLP.2017.06.007
  • Ma X., Zhang Q., Zhu Q., Liu W., Chen Y., Qiu R., Wang B., Yang Z., Li H., Lin Y., et al. (2015) A robust CRISPR/Cas9 system for convenient, high-efficiency multiplex genome editing in monocot and dicot plants. Mol. Plant. 8(8): 1274–1284. 10.1016/j.molp.2015.04.007
  • Ma X., Zhang X., Liu H., Li Z. (2020) Highly efficient DNA-free plant genome editing using virally delivered CRISPR–Cas9. Nat. Plants. 6(7): 773–779. 10.1038/s41477-020-0704-5
  • Ma X., Zhu Q., Chen Y., Liu Y.G. (2016) CRISPR/Cas9 platforms for genome editing in plants: developments and applications. Mol. Plant. 9(7): 961–974. 10.1016/j.molp.2016.04.009
  • Mangena P., Mokwala P.W., Nikolova R.V. (2017) Challenges of in vitro and in vivo Agrobacterium-mediated genetic transformation in soybean. [In:] Soybean – the basis of yield, biomass and productivity. Ed. Kasai M., InTech Open.
  • Mangena P. (2018) The role of plant genotype, culture medium and Agrobacterium on soybean plantlets regeneration during genetic transformation. [In:] Transgenic Crops – Emerging Trends and Future Perspectives. Ed. Khan M.S., Malik K.A., InTechOpen.
  • Mao Y., Botella J.R., Liu Y., Zhu J.K. (2019) Gene editing in plants: Progress and challenges. Natl. Sci. Rev. 6(3): 421–437. 10.1093/nsr/nwz005
  • Meng X., Yu H., Zhang Y., Zhuang F., Song X., Gao S., Gao C., Li J. (2017) Construction of a genome-wide mutant library in rice using CRISPR/Cas9. Mol. Plant. 10(9): 1238–1241. 10.1016/J.MOLP.2017.06.006
  • Miao J., Guo D., Zhang J., Huang Q., Qin G., Zhang X., Wan J., Gu H., Qu L.J. (2013) Targeted mutagenesis in rice using CRISPR-Cas system. Cell Res. 23(10): 1233–1236. 10.1038/cr.2013.123
  • Mikami M., Toki S., Endo M. (2015) Comparison of CRISPR/Cas9 expression constructs for efficient targeted mutagenesis in rice. Plant Mol. Biol. 88(6): 561–572. 10.1007/s11103-015-0342-x
  • Miki D., Zhang W., Zeng W., Feng Z., Zhu J.K. (2018) CRISPR/Cas9-mediated gene targeting in Arabidopsis using sequential transformation. Nat. Commun. 9(1): 1967. 10.1038/S41467-018-04416-0
  • Musunuru K. (2017) The hope and hype of CRISPR-Cas9 genome editing: A review. JAMA Cardiol. 2(8): 914–919. 10.1001/jamacardio.2017.1713
  • Nishitani C., Hirai N., Komori S., Wada M., Okada K., Osakabe K., Yamamoto T., Osakabe Y. (2016) Efficient genome editing in apple using a CRISPR/Cas9 system. Sci. Rep. 6: 31481. 10.1038/srep31481
  • Nyaboga E.N., Njiru J.M., Tripathi L. (2015) Factors influencing somatic embryogenesis, regeneration, and Agrobacterium-mediated transformation of cassava (Manihot esculenta Crantz) cultivar TME14. Front. Plant Sci. 6: 411. 10.3389/FPLS.2015.00411
  • Plan D., Van Den Eede G. (2010) The EU Legislation on GMOs – an overview. EUR 24279 EN. Luxembourg (Luxembourg): Publications Office of the European Union; JRC57223.
  • Pompili V., Dalla Costa L., Piazza S., Pindo M., Malnoy M. (2020) Reduced fire blight susceptibility in apple cultivars using a high-efficiency CRISPR/Cas9-FLP/FRT-based gene editing system. Plant Biotechnol. J. 18(3): 845–858. 10.1111/pbi.13253
  • Qi L.S., Larson M.H., Gilbert L.A., Doudna J.A., Weissman J.S., Arkin A.P., Lim W.A. (2013) Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell. 152(5): 1173–1183. 10.1016/j.cell.2013.02.022
  • Qin P., Parlak M., Kuscu C., Bandaria J., Mir M., Szlachta K., Singh R., Darzacq X., Yildiz A., Adli M. (2017) Live cell imaging of low- and non-repetitive chromosome loci using CRISPR-Cas9. Nat. Commun. 8(1): 1–10. 10.1038/ncomms14725
  • Rojo F.P., Seth S., Erskine W., Kaur P. (2021) An improved protocol for Agrobacterium-mediated transformation in subterranean clover (Trifolium subterraneum l.). Int. J. Mol. Sci. 22(8): 4181. 10.3390/ijms22084181
  • Ron M., Kajala K., Pauluzzi G., Wang D., Reynoso M.A., Zumstein K., Garcha J., Winte S., Masson H., Inagaki S., et al. (2014) Hairy root transformation using Agrobacterium rhizogenes as a tool for exploring cell type-specific gene expression and function using tomato as a model. Plant Physiol. 166(2): 455–469. 10.1104/pp.114.239392
  • Sander J.D., Joung J.K. (2014) CRISPR-Cas systems for editing, regulating and targeting genomes. Nature Biotechnol. 32(4): 347–350. 10.1038/nbt.2842
  • Schiml S., Fauser F., Puchta H. (2016) CRISPR/Cas-mediated site-specific mutagenesis in Arabidopsis thaliana using Cas9 nucleases and paired nickases. Methods Mol. Biol. 1469: 111–122. 10.1007/978-1-4939-4931-1_8
  • Schwarzacher T., Heslop-Harrison J.S. (1994) Direct fluorochrome-labeled DNA probes for direct fluorescent in situ hybridization to chromosomes. Methods Mol. Biol. 28: 167–176. 10.1385/0-89603-254-x:167
  • Singh D.J.K., Mat Jalaluddin N.S., Sanan-Mishra N., Harikrishna J.A. (2019) Genetic modification in Malaysia and India: current regulatory framework and the special case of non-transformative RNAi in agriculture. Plant Cell Rep. 38(12): 1449–1463. 10.1007/s00299-019-02446-6
  • Shew A.M., Nalley L.L., Snell H.A., Nayga R.M., Dixon B.L. (2018) CRISPR versus GMOs: Public acceptance and valuation. Glob. Food Sec. 19: 71–80. 10.1016/J.GFS.2018.10.005
  • Sun Y., Li J., Xia L. (2016a) Precise genome modification via sequence-specific nucleases-mediated gene targeting for crop improvement. Front. Plant Sci. 7: 1928. 10.3389/fpls.2016.01928
  • Sun Y., Zhang X., Wu C., He Y., Ma Y., Hou H., Guo X., Du W., Zhao Y., Xia L. (2016b) Engineering herbicide-resistant rice plants through CRISPR/Cas9-mediated homologous recombination of acetolactate synthase. Mol. Plant. 9(4): 628–631. 10.1016/J.MOLP.2016.01.001
  • Tian S., Jiang L., Gao Q., Zhang J., Zong M., Zhang H., Ren Y., Guo S., Gong G., Liu F., et al. (2017) Efficient CRISPR/Cas9-based gene knockout in watermelon. Plant Cell Rep. 36(3): 399–406. 10.1007/s00299-016-2089-5
  • Tsuchimatsu T., Kakui H., Yamazaki M., Marona C., Tsutsui H., Hedhly A., Meng D., Sato Y., Städler T., Grossniklaus U., et al. (2020) Adaptive reduction of male gamete number in the selfing plant Arabidopsis thaliana. Nat. Commun. 11(1): 1–9. 10.1038/s41467-020-16679-7
  • Tsutsui H., Higashiyama T. (2017) pKAMA-ITACHI vectors for highly efficient CRISPR/Cas9-mediated gene knockout in Arabidopsis thaliana. Plant Cell Physiol. 58(1): 46–56. 10.1093/PCP/PCW191
  • United Nations. (2019) World Population Prospects 2019. [accessed 2020 Sep 7]. https://population.un.org/wpp/Download/Standard/Population/
  • USDA. (2018) Secretary Perdue Issues USDA Statement on Plant Breeding Innovation. [accessed 2021 Sep 15]. https://www.usda.gov/media/press-releases/2018/03/28/secretary-perdue-issues-usda-statement-plant-breeding-innovation
  • Usman B., Zhao N., Nawaz G., Qin B., Liu F., Liu Y., Li R. (2021) CRISPR/Cas9 guided mutagenesis of grain size 3 confers increased rice (Oryza sativa L.) grain length by regulating cysteine proteinase inhibitor and ubiquitin-related proteins. Int. J. Mol. Sci. 22(6): 1–19. 10.3390/IJMS22063225
  • Waltz E. (2016) Gene-edited CRISPR mushroom escapes US regulation. Nature. 532(7599): 293. 10.1038/nature.2016.19754
  • Wang F., Wang C., Liu P., Lei C., Hao W., Gao Y., Liu Y.G., Zhao K. (2016) Enhanced rice blast resistance by CRISPR/Cas9-targeted mutagenesis of the ERF transcription factor gene OsERF922. PLoS One. 11(4): e0154027. 10.1371/journal.pone.0154027
  • Wang L., Chen S., Peng A., Xie Z., He Y., Zou X. (2019) CRISPR/Cas9-mediated editing of CsWRKY22 reduces susceptibility to Xanthomonas citri subsp. citri in Wanjin-cheng orange (Citrus sinensis (L.) Osbeck). Plant Biotechnol. Rep. 13(5): 501–510. 10.1007/s11816-019-00556-x
  • Wang L., Sun S., Wu T., Liu L., Sun X., Cai Y., Li J., Jia H., Yuan S., Chen L., et al. (2020) Natural variation and CRISPR/Cas9-mediated mutation in GmPRR37 affect photoperiodic flowering and contribute to regional adaptation of soybean. Plant Biotechnol. J. 18(9): 1869–1881. 10.1111/pbi.13346
  • Wang M., Wang S., Liang Z., Shi W., Gao C., Xia G. (2018) From genetic stock to genome editing: gene exploitation in wheat. Trends Biotechnol. 36(2): 160–172. 10.1016/j.tibtech.2017.10.002
  • Westra E.R., Buckling A., Fineran P.C. (2014) CRISPR-Cas systems: Beyond adaptive immunity. Nature Rev. Microbiol. 12(5): 317–326. 10.1038/nrmicro3241
  • Wu J., Chen C., Xian G., Liu D., Lin L., Yin S., Sun Q., Fang Y., Zhang H., Wang Y. (2020) Engineering herbicide-resistant oilseed rape by CRISPR/Cas9-mediated cytosine base-editing. Plant Biotechnol. J. 18(9): 1857–1859. 10.1111/pbi.13368
  • Wu X., Mao S., Ying Y., Krueger C.J., Chen A.K. (2019) Progress and challenges for live-cell imaging of genomic loci using CRISPR-based platforms. Genomics Proteomics Bioinformatics. 17(2): 119–128. 10.1016/j.gpb.2018.10.001
  • Yang H., Wu J.J., Tang T., Liu K.D., Dai C. (2017) CRISPR/Cas9-mediated genome editing efficiently creates specific mutations at multiple loci using one sgRNA in Brassica napus. Sci. Rep. 7(1): 1–13. 10.1038/s41598-017-07871-9
  • Yang X. (2015) Applications of CRISPR-Cas9 mediated genome engineering. Mil. Med. Res. 2(1): 11. 10.1186/s40779-015-0038-1
  • Yin K., Gao C., Qiu J.L. (2017) Progress and prospects in plant genome editing. Nat. Plants. 3: 17107. 10.1038/nplants.2017.107
  • Yu S., Huang A., Li J., Gao L., Feng Y., Pemberton E., Chen C. (2018) OsNAC45 plays complex roles by mediating POD activity and the expression of development-related genes under various abiotic stresses in rice root. Plant Growth Regul. 84(3): 519–531. 10.1007/S10725-017-0358
  • Zhang P., Du H., Wang J., Pu Y., Yang C., Yan R., Yang H., Cheng H., Yu D. (2020a) Multiplex CRISPR/Cas9-mediated metabolic engineering increases soya bean isoflavone content and resistance to soya bean mosaic virus. Plant Biotechnol. J. 18(6): 1384–1395. 10.1111/pbi.13302
  • Zhang S., Zhang R., Song G., Gao J., Li W., Han X., Chen M., Li Y., Li G. (2018) Targeted mutagenesis using the Agrobacterium tumefaciens-mediated CRISPR-Cas9 system in common wheat. BMC Plant Biol. 18(1): 1–12. 10.1186/S12870-018-1496-X
  • Zhang X., Long Y., Huang J., Xia J. (2020b) OsNAC45 is involved in ABA response and salt tolerance in rice. Rice. 13(1): 1–13. 10.1186/s12284-020-00440-1
  • Zheng J., Wu H., Zhu H., Huang C., Liu C., Chang Y., Kong Z., Zhou Z., Wang G., Lin Y., et al. (2019) Determining factors, regulation system, and domestication of anthocyanin biosynthesis in rice leaves. New Phytol. 223(2): 705–721. 10.1111/nph.15807
  • Zhou J., Peng Z., Long J., Sosso D., Liu B., Eom J.S., Huang S., Liu S., Vera Cruz C., Frommer W.B., et al. (2015) Gene targeting by the TAL effector PthXo2 reveals cryptic resistance gene for bacterial blight of rice. Plant J. 82(4): 632–643. 10.1111/tpj.12838
  • Zimny T., Sowa S., Tyczewska A., Twardowski T. (2019) Certain new plant breeding techniques and their marketability in the context of EU GMO legislation – recent developments. New Biotechnol. 51: 49–56. 10.1016/J.NBT.2019.02.003

Insights in Plant Biotechnology: 2021

Cover image for research topic "Insights in Plant Biotechnology: 2021"

Loading... Editorial 30 January 2023 Editorial: Insights in plant biotechnology: 2021 James R. Lloyd , Ralf Wilhelm , Manoj K. Sharma , Jens Kossmann  and  Peng Zhang 974 views 0 citations

research paper on plant biotechnology

Original Research 25 July 2022 Pan-genome analysis of three main Chinese chestnut varieties Guanglong Hu ,  4 more  and  Yanping Lan 4,030 views 6 citations

Original Research 14 June 2022 Differential Gene Expression and Withanolides Biosynthesis During in vitro and ex vitro Growth of Withania somnifera (L.) Dunal Sachin Ashok Thorat ,  8 more  and  Annamalai Muthusamy 2,633 views 3 citations

Original Research 06 June 2022 Identification of Reference Genes for Reverse Transcription-Quantitative PCR Analysis of Ginger Under Abiotic Stress and for Postharvest Biology Studies Gang Li ,  9 more  and  Yongxing Zhu 2,603 views 14 citations

Original Research 02 June 2022 Nested miRNA Secondary Structure Is a Unique Determinant of miR159 Efficacy in Arabidopsis Muhammad Imran ,  8 more  and  Min Zhang 1,492 views 2 citations

Original Research 31 May 2022 Production of Recombinant Active Human TGFβ1 in Nicotiana benthamiana Aditya Prakash Soni ,  3 more  and  Inhwan Hwang 3,486 views 3 citations

Perspective 27 May 2022 Gene Editing to Accelerate Crop Breeding Kanwarpal S. Dhugga 3,585 views 9 citations

Mini Review 26 May 2022 Clustered Regularly Interspaced Short Palindromic Repeats-Associated Protein System for Resistance Against Plant Viruses: Applications and Perspectives Fredy D. A. Silva  and  Elizabeth P. B. Fontes 2,239 views 4 citations

Original Research 20 May 2022 Genomics-Assisted Improvement of Super High-Yield Hybrid Rice Variety “Super 1000” for Resistance to Bacterial Blight and Blast Diseases Zhizhou He ,  15 more  and  Junhua Peng 1,784 views 5 citations

Loading... Original Research 19 May 2022 Eradication of Potato Virus S, Potato Virus A, and Potato Virus M From Infected in vitro-Grown Potato Shoots Using in vitro Therapies Jean Carlos Bettoni ,  7 more  and  Jayanthi Nadarajan 6,087 views 21 citations

Original Research 13 May 2022 Successful Production and Ligninolytic Activity of a Bacterial Laccase, Lac51, Made in Nicotiana benthamiana via Transient Expression André van Eerde ,  8 more  and  Jihong Liu Clarke 1,938 views 3 citations

Review 09 May 2022 Molecular Determinants of in vitro Plant Regeneration: Prospects for Enhanced Manipulation of Lettuce (Lactuca sativa L.) Tawni Bull  and  Richard Michelmore 8,425 views 8 citations

Review 29 April 2022 A Walk Through the Maze of Secondary Metabolism in Orchids: A Transcriptomic Approach Devina Ghai ,  3 more  and  Jaspreet K. Sembi 2,477 views 4 citations

Review 29 April 2022 Glyco-Engineering Plants to Produce Helminth Glycoproteins as Prospective Biopharmaceuticals: Recent Advances, Challenges and Future Prospects Alex van der Kaaij ,  3 more  and  Arjen Schots 3,650 views 3 citations

Review 29 April 2022 Heat Stress-Mediated Constraints in Maize (Zea mays) Production: Challenges and Solutions Ahmed H. El-Sappah ,  13 more  and  Manzar Abbas 9,458 views 36 citations

Review 25 April 2022 Biotechnological Road Map for Innovative Weed Management Albert Chern Sun Wong ,  3 more  and  Bhagirath Singh Chauhan 5,077 views 5 citations

Original Research 13 April 2022 A Multi-Omics Approach for Rapid Identification of Large Genomic Lesions at the Wheat Dense Spike (wds) Locus Zhenyu Wang ,  14 more  and  Aili Li 2,049 views 2 citations

Review 08 April 2022 CRISPR/Cas9 and Nanotechnology Pertinence in Agricultural Crop Refinement Banavath Jayanna Naik ,  7 more  and  Soo-Hong Lee 7,570 views 16 citations

Original Research 22 March 2022 Physiological and Molecular Changes in Cherry Red Tobacco in Response to Iron Deficiency Stress Fei Liu ,  6 more  and  Zhongbang Song 1,486 views 3 citations

Original Research 10 March 2022 Salt Stress Alleviation in Triticum aestivum Through Primary and Secondary Metabolites Modulation by Aspergillus terreus BTK-1 Muhammad Ikram Khan ,  6 more  and  In-Jung Lee 1,916 views 13 citations

  • Open access
  • Published: 15 July 2022

Bioinformatics approaches and applications in plant biotechnology

  • Yung Cheng Tan 1 ,
  • Asqwin Uthaya Kumar   ORCID: orcid.org/0000-0002-8785-6260 1 , 2 ,
  • Ying Pei Wong 1 &
  • Anna Pick Kiong Ling   ORCID: orcid.org/0000-0003-0930-0619 1  

Journal of Genetic Engineering and Biotechnology volume 20, Article number: 106 (2022)

In recent years, major advances in molecular biology and genomic technologies have led to an exponential growth in biological information. With this deluge of genomic information, there has been a parallel growth in the demand for tools for the storage and management of data, and for software for the analysis, visualization, modelling, and prediction of large data sets.

Particularly in plant biotechnology, the amount of information has multiplied exponentially, with a large number of databases available for many individual plant species. Efficient bioinformatics tools and methodologies have also been developed to allow rapid genome sequencing and the study of plant genomes through ‘omics’ approaches. This review focuses on the various bioinformatic applications in plant biotechnology and their advantages in improving agricultural outcomes. The bioinformatics-related challenges and limitations in plant biotechnology that explain the slower progress of plant genomics relative to animal genomics are also reviewed and assessed.

There is a critical need for effective bioinformatic tools, which are able to provide longer reads with unbiased coverage in order to overcome the complexity of the plant’s genome. The advancement in bioinformatics is not only beneficial to the field of plant biotechnology and agriculture sectors, but will also contribute enormously to the future of humanity.

Over the past decades, the term ‘bioinformatics’ has become a buzzword in all areas of biological research. With the continuous advancement of molecular biology, the explosive growth of biological information has required a more organized, computerized system to collect, store, manage, and analyse the vast amount of biological data generated by experiments in all fields [ 1 ]. Bioinformatics, an interdisciplinary field that has emerged over the past few decades, offers many tools and techniques that are essential for efficiently sorting and organizing biological data into databases [ 1 , 2 ]. Bioinformatics can be described as a computer-based scientific field that combines mathematics, biology, and computer science into a single discipline for the analysis and interpretation of genomics and proteomics data [ 2 , 3 ]. In short, the main components of bioinformatics are (a) the collection and analysis of databases and (b) the development of software tools and algorithms for the interpretation of biological data [ 2 ]. Bioinformatics plays a crucial role in many areas of biology, as its applications provide various types of data, including nucleotide and amino acid sequences, protein domains and structures, and expression patterns from various organisms [ 3 ]. Similarly, the field of plant biotechnology has taken advantage of bioinformatics, which provides full genomic information for various plant species and allows efficient exploration of plants as a biological resource for humans [ 1 , 3 , 4 ]. This article describes some of the key concepts, tools, and applications of bioinformatics that are relevant to plant biotechnology. The current challenges and limitations for the improvement and continued development of bioinformatics in plant science are also described.

Applications of bioinformatics in plant biotechnology

The introduction of bioinformatics and computational biology into plant biology is drastically accelerating scientific discovery in the life sciences. With the aid of sequencing technology, plant biologists have revealed the genetic architecture of various plant and microorganism species, including their proteomes, transcriptomes, metabolomes, and even their metabolic pathways [ 1 ]. Sequence analysis is the most fundamental approach in modern science for obtaining the DNA, RNA, and protein sequences of an organism. Sequencing whole genomes permits the genomic organization of different species to be determined and provides a starting point for understanding their functionality. Complete sequence data comprise coding and non-coding regions, which can act as a necessary precursor for any functional gene that determines the unique traits possessed by an organism. The resulting sequence includes all regions, such as exons, introns, regulatory elements, and promoters, which often leads to a very large amount of genome information [ 5 ]. With the emergence of next-generation sequencing (NGS) and other omics technologies used to examine plant genomics, more and more plant genomes will be sequenced [ 1 , 6 , 7 , 8 ]. To deal with these vast amounts of data, the development and implementation of bioinformatics allow scientists to capture, store, and organize them in systematic databases [ 1 , 5 ].

Bioinformatics databases and tools for plant biotechnology

In the field of bioinformatics, a variety of databases and tools are available for analyses related to plant biotechnology. Over the years, next-generation sequencing (NGS) and bioinformatics analyses of plant genomes have generated a large amount of data, which are submitted to multiple databases that are publicly available online. Each database is unique and has its own focus. For instance, the CottonGen database is solely dedicated to genomics and breeding information for cotton species [ 9 ]. Such a database makes work easier for researchers in cotton genomics, who can use a single database instead of searching through all the others. However, some databases are designed to cater not to one specific species or genus but to all plant species, such as the National Center for Biotechnology Information (NCBI) ( https://www.ncbi.nlm.nih.gov/ ) database, which as of 2021 held almost 21,000 plant genomes available for access [ 10 ]. Such a database is useful for studies that do not focus on one specific genus or species, as it gives researchers access to all kinds of genomic data in one place. This section briefly discusses some of the publicly accessible plant genome databases that are not designated for one genus or species alone.

The first is the NCBI database, which is known and recognized by researchers and biologists worldwide. NCBI is dedicated to gathering and analysing information about molecular biology, biochemistry, and genetics. In the NCBI database, genomic information on a plant species of interest can be downloaded from either the Gene Expression Omnibus (GEO) ( https://www.ncbi.nlm.nih.gov/geo/ ) or the Sequence Read Archive (SRA) ( https://www.ncbi.nlm.nih.gov/sra ) by entering the scientific name of the plant in the search bar. GEO and SRA hold the processed or raw gene expression and RNA sequencing data of plants that have been deposited in the repository. For instance, entering Rosa chinensis (rose) in the search bar leads to a results page where the researcher can select the most recent or most suitable datasets, each with a specific accession number. Depending on the profiling platform used in each dataset, researchers can retrieve gene symbols, Ensembl IDs, open reading frames, chromosomal locations, regulatory elements, etc. This information allows researchers to further analyse the subject of study using bioinformatics tools such as Gene Ontology ( http://geneontology.org/ ), the Database for Annotation, Visualization and Integrated Discovery (DAVID) ( https://david.ncifcrf.gov/ ), the Basic Local Alignment Search Tool (BLAST) ( https://blast.ncbi.nlm.nih.gov/Blast.cgi ), and others relevant to the study.
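The search-bar lookup described above can also be expressed programmatically. The following is a minimal sketch that only builds a query URL for NCBI's E-utilities ESearch endpoint and does not contact the server; the helper name `build_esearch_url` and the choice of the `[Organism]` field tag are assumptions made for illustration, not something the article prescribes.

```python
from urllib.parse import urlencode

# Public ESearch endpoint of the NCBI E-utilities.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(organism: str, db: str = "sra", retmax: int = 20) -> str:
    """Build an ESearch URL that looks up records for one organism.

    `build_esearch_url` is a hypothetical helper; only the URL is built,
    no request is sent.
    """
    params = {
        "db": db,                          # e.g. "sra" for SRA, "gds" for GEO datasets
        "term": f"{organism}[Organism]",   # restrict the match to the organism field
        "retmax": retmax,                  # maximum number of record IDs to return
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = build_esearch_url("Rosa chinensis")
print(url)
```

Opening the printed URL in a browser or an HTTP client should return the matching record identifiers, which can then be inspected on the corresponding SRA or GEO pages.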

Another database for accessing plant genomes is EnsemblPlants ( https://plants.ensembl.org/index.html ). Unlike the NCBI database, which is not dedicated solely to plant genomes, EnsemblPlants is specifically dedicated to them. EnsemblPlants is part of the Ensembl project, which started in 1999 with the aim of automatically annotating genomes, integrating the annotation with other publicly available biological data, and establishing an open-access online archive for the research community [ 11 ]. The Ensembl project later launched taxon-specific websites, including one for plants. The database is a user-friendly integrative platform that is continuously updated with new plant species as their genomes are completely sequenced. Compared with the NCBI database mentioned earlier, EnsemblPlants provides not only the genome sequence, gene models, and functional annotation of the plant species of interest, but also polymorphic loci, population structure, genotype, linkage, and phenotype information [ 11 , 12 ]. Unlike NCBI, EnsemblPlants also provides comparative genomics data for the plant species of interest. The platform therefore offers not only genome sequence data but also additional analytical data, which saves researchers in plant bioinformatics a lot of time by reducing the tedious work of running the analyses themselves. Researchers can still re-assess the data if necessary, depending on the stringency of their work.

Aside from the abovementioned databases, which are widely used for retrieving plant genome sequences, other plant databases such as PlantGDB, MaizeDIG, and Phytozome can also be considered. Table 1 lists the available databases and tools that are widely applied in plant biotechnology.

Biotechnology and bioinformatics for plant breeding

Plant breeding can be defined as the changing or improvement of desired traits in plants to produce improved new crop cultivars for the benefit of humankind [ 8 ]. Jhansi and Usha [ 13 ] mentioned a few benefits brought by genetically engineered plants, such as improved quality, enhanced nutritional value, and maximized yield. The revolution in molecular biology and genomics has enabled leaps forward in plant breeding by applying the knowledge and biological data obtained from genomics research to crops [ 6 , 8 , 13 ]. In modern agriculture, transgenic technology refers to the genetic modification of plants or crops by altering their genes or introducing foreign genes to make them more useful and productive and to enhance their characteristics [ 13 , 14 ]. As mentioned above, the evolution of next-generation sequencing (NGS) and other sequencing technologies produces large volumes of biological data, which require databases for storage. The accessibility of whole genome sequences in databases allows free association across genomes with respect to gene sequence, putative function, or genetic map position. With the aid of software, it is possible to formulate predictive hypotheses and to incorporate desired phenotypes arising from complex combinations into plants by looking for genetic markers that score well and give higher reliability in breeding [ 2 , 15 ]. Beyond genome sequence information, databases that store information on metabolites also play a crucial role in studying their interaction with proteomics and genomics to reflect changes in the phenotype and specific functions of an organism [ 1 ]. 
Some of the most widely used metabolomics databases for plants and crops are Metlin ( http://metlin.scripps.edu ), which provides multiple metabolite search options and holds about 240,000 metabolites and nearly 72,000 high-resolution MS/MS spectra, and PlantCyc ( https://plantcyc.org/ ), which stores information about biochemical pathways and their catalytic enzymes and genes in plants [ 1 , 16 ]. Moreover, single-nucleotide polymorphism (SNP) markers have also benefited from the revolution in NGS and other sequencing technologies. RNA sequencing (RNA-seq) based on NGS allows direct measurement of the mRNA profile in order to identify known SNPs [ 1 ]. An SNP is a single-base allelic variation within the genome of a species, which can be used as a biological marker to locate the genes associated with desired traits in plants [ 17 , 18 ]. In addition, transcriptome resequencing using NGS allows rapid and inexpensive SNP discovery within large, complex genomes with highly repetitive regions, such as those of wheat, maize, sugarcane, avocado, and black currant [ 17 ]. Figure 1 illustrates briefly the process involved in plant breeding using NGS and bioinformatics.
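To make the idea of an SNP marker concrete, the sketch below compares two sequences that are assumed to be already aligned and reports the positions where a single base differs. The function name `find_snps` and the two toy sequences are hypothetical; real SNP discovery from NGS reads additionally relies on read depth, base quality, and dedicated variant callers.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Both sequences must already be aligned to the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, alt_base) in enumerate(pairs, start=1):
        # Ignore gaps and ambiguous symbols; keep only clear base substitutions.
        if ref_base in "ACGT" and alt_base in "ACGT" and ref_base != alt_base:
            snps.append((position, ref_base, alt_base))
    return snps


# Toy sequences invented for illustration only.
reference_allele = "ATGGCCATTGTA"
cultivar_allele = "ATGGCTATTGCA"
print(find_snps(reference_allele, cultivar_allele))
```

Each reported position is a candidate marker that could then be tested for association with a trait of interest.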

Figure 1. Brief process of plant breeding involving NGS and bioinformatics

Ever since the first transgenic rice was produced in 2000, crop genome sequencing projects have undergone a significant revolution which, along with advances in technology, has rapidly increased the pace of development of genetically modified organisms (GMOs) [ 2 , 13 , 19 ]. Among all the products of rice biotechnology, one of the most widely known GM rice varieties is golden rice. Golden rice is engineered by introducing the biosynthetic pathway for β-carotene (pro-vitamin A) into this staple food in order to address vitamin A deficiency. The World Health Organization has classified vitamin A deficiency as a public health problem, as it causes childhood blindness in half a million children [ 13 ]. Vitamin A is an essential nutrient for humans as it helps with the development of vision, growth, cellular differentiation, and proliferation of the immune system; insufficient intake may lead to childhood blindness, anaemia, and reduced immune responsiveness to infection [ 20 ]. As the first crop genome to be sequenced, rice has become the most suitable model for initiating the genomic development and improvement of other species [ 21 , 22 , 23 , 24 ]. This is due to its small genome size and diploidy, which make rice an excellent model for other cereal crops with larger genomes, such as maize and wheat [ 21 , 23 ]. In 2005, Song et al. [ 22 ] reported the complete genome sequences of two rice subspecies, japonica and indica , which laid a strong foundation for molecular studies and plant breeding research [ 22 , 24 ]. With recent advances in bioinformatics, it is now possible to align the large and complex genomes of other crop species against the genomic data available from rice, using different software or tools, in order to identify shared conserved sequences through comparative genomics [ 2 , 7 ]. Vassilev et al. stated that some of the most commonly used programmes, such as BLAST and FASTA, allow rapid sequence searching in databases and give the best possible alignment for each sequence [ 25 ]. The underlying algorithm calculates an alignment score that measures the proportion of matching homologous residues between sequences from related species [ 2 ].
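The notion of an alignment score can be illustrated with a small dynamic-programming example. The sketch below computes a Needleman-Wunsch global alignment score; this is a swapped-in textbook technique, not the heuristic local search that BLAST or FASTA actually perform, and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for simplicity.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Score of the best global (Needleman-Wunsch) alignment of a and b."""
    # previous[j] is the best score for aligning the processed prefix of a with b[:j].
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]  # aligning a[:i] against an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current.append(max(
                previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous[j] + gap,               # gap in b
                current[j - 1] + gap,            # gap in a
            ))
        previous = current
    return previous[-1]


print(global_alignment_score("ACGT", "ACGT"))  # identical sequences
print(global_alignment_score("ACGT", "AGT"))   # one base missing in the second
```

A higher score indicates more matching residues relative to mismatches and gaps, which is the intuition behind ranking database hits by alignment quality.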

Wheat, as one of the most widely grown and consumed crops, together with rice and maize contributes more than 60% of the calories and protein in the human diet [ 26 , 27 ]. To meet the demands of human population growth, a better understanding of wheat research and breeding is necessary in order to accelerate the increase in wheat yield by 2050 [ 26 , 27 , 28 ]. Despite its importance, the improvement of wheat has been challenging, as researchers have had to overcome the complexity of the wheat genome, which is large, polyploid, and highly repetitive, in order to obtain a fully sequenced reference genome [ 26 , 29 ]. Advances in next-generation sequencing (NGS) platforms and other bioinformatics tools have revealed extensive structural rearrangements and complex gene content in wheat, which has revolutionized wheat genomics, improving wheat yield and its adaptation to diverse environments [ 26 , 29 ]. NGS platforms allow the swift detection of DNA markers in huge genome data sets within a short period of time. These NGS-based approaches have undoubtedly revolutionized allele discovery and genotyping-by-sequencing (GBS). A high-quality wheat reference genome in databases allows more sequence comparisons between wheat and other species to identify more homologous genes. Moreover, advances in sequencing technologies in both high-throughput genotyping and read length, combined with biological databases, allow the rapid development of novel algorithms for the complex wheat genome [ 29 , 30 ]. For instance, genome-wide association studies (GWAS) are an approach used in genome research that allows rapid screening of raw data to select specific regions associated with agronomic traits [ 29 , 31 ]. GWAS allows multiple genetic variants across the genome to be tested for genotype-phenotype association; thus, this method can be used to facilitate improvement in crop breeding via genomic selection and genetic modification [ 16 , 29 ].
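The genotype-phenotype testing at the core of GWAS can be sketched with a toy single-marker scan. The example below ranks markers by the squared Pearson correlation between allele dosage (0, 1, or 2 copies) and a trait value; the marker names, genotypes, and yield figures are invented for illustration, and a real GWAS would use mixed models, population-structure correction, and multiple-testing control.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic marker or constant trait carries no signal
    return covariance / sqrt(var_x * var_y)


def rank_markers(genotypes: dict[str, list[int]],
                 phenotype: list[float]) -> list[tuple[str, float]]:
    """Rank markers by r squared between allele dosage and the trait."""
    scores = [(marker, pearson_r(dosage, phenotype) ** 2)
              for marker, dosage in genotypes.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical data: allele dosage for six plants at three markers.
genotypes = {
    "snp_A": [0, 1, 2, 0, 1, 2],
    "snp_B": [0, 2, 1, 2, 0, 1],
    "snp_C": [1, 1, 1, 1, 1, 1],
}
grain_yield = [4.0, 5.0, 6.0, 4.0, 5.0, 6.0]

ranked = rank_markers(genotypes, grain_yield)
for marker, r_squared in ranked:
    print(marker, round(r_squared, 3))
```

Markers at the top of the ranking would be the candidates to carry forward into genomic selection or functional validation.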

Maize, a globally important crop, not only has a wide variety of uses with economic impact, but can also serve as a genetic model species for genotype-to-phenotype relationships in plant genomic studies [ 32 , 33 ]. In addition, owing to its extremely high level of genetic diversity, maize has high potential for yield improvement to meet the demands of population growth [ 33 ]. Despite this combination of economic and genomic importance, generating a whole genome sequence for maize has been a computational challenge owing to the tremendous structural variation (SV) in its genome [ 34 ]. The introduction of NGS techniques in several crops, including maize, has allowed rapid de novo genome sequencing and the production of huge amounts of genomic and phenomic information [ 1 , 35 ]. Better integration of data across multiple genome assemblies is much needed to study the connection between phenotype and genotype in order to improve the yield and quality of maize [ 35 ]. Nowadays, user-friendly online databases such as qTeller, MaizeDIG, and MaizeMine are designed to ease the comparison and visualization of relationships between genotypes and phenotypes [ 36 ]. MaizeGDB, a model organism database for maize, provides access to data on genes, alleles, molecular markers, metabolic pathway information, phenotypic images with descriptions, and more, which are useful for maize research [ 35 , 36 ]. MaizeMine is a data mining resource under MaizeGDB that was designed to accelerate genomic analysis by allowing researchers to better script their own research data in downstream analysis [ 36 ], whereas MaizeDIG is a genotype-phenotype database that allows users to link genotypes with phenotypes expressed in images [ 35 , 36 ]. Cho et al. [ 35 ] reported that, with access through an image search tool, the relationship between a gene and its phenotypic features can be visualized within an image. The integration and visualization of high-quality data with these tools enables quick prioritization of phenotypes of interest in crops, which plays a crucial role in improving plant breeding.

Bioinformatics for studying stress resistance in plants

Understanding the stress response in plants is vital for improving breeding efforts in agriculture and for predicting the fate of natural plants under abiotic change, especially in the current era of continuous climate change [ 37 ]. Stress in plants can be divided into biotic and abiotic stress. Biotic stress mainly refers to the negative influence of living organisms such as viruses, fungi, bacteria, insects, nematodes, and weeds [ 38 ], while abiotic stress refers to factors such as extreme temperature, drought, flood, salinity, and radiation, which dramatically affect crop yield [ 37 ]. NGS technologies and other potent computational tools, which allow sequencing of whole genomes and transcriptomes, have led to extensive studies of plant stress responses on a molecular basis [ 1 , 2 , 37 ]. The tremendous amount of plant genome data obtained from genome sequencing allows the investigation of correlations between the molecular backbone of living organisms and their adaptations to the environment [ 16 ].

Biotic and abiotic stress management

How plants and crops respond to a stressful environment is key to ensuring their growth and development and to avoiding the heavy yield penalty caused by harsh conditions [ 35 , 39 ]. Therefore, bioinformatic tools are important for studying and analysing the plant transcriptome in response to biotic and abiotic stress. In addition, applying bioinformatics tools to plant and crop genomes can benefit the agricultural community by allowing desired genes to be searched for among the genomes of different species and their functions in crops to be elucidated [ 35 ]. Genome databases play a crucial role in storing and mining large and complex plant genome sequences. Besides data storage, some genome databases can also perform gene expression profiling to predict the pattern of genes expressed at the transcript level in cells or tissues. Using in silico genomic technologies, disease resistance genes and enzymes, together with their respective transcription factors, which play a role in the defence mechanism against stress, can be identified [ 40 , 41 ]. For instance, Xu et al. [ 40 ] carried out large-scale transcriptome sequencing of chrysanthemum plants to study dehydration stress. An online database called the Chrysanthemum Transcriptome Database ( http://www.icugi.org/chrysanthemum ) was developed to allow the storage and distribution of the transcriptome sequences and their analysis results among the research community [ 40 ]. With the aid of different protein databases, the biochemical pathways and kinase activity of chrysanthemum in response to dehydration stress could be predicted [ 40 ]. Xu et al. [ 40 ] also reported a total of 306 transcription factors and 228 protein kinases that are important upstream regulators in plants encountering various biotic and abiotic stresses.
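A common first step in profiling the transcriptome under stress is to compare expression between control and stressed samples. The following minimal sketch computes a log2 fold change per gene and keeps genes that change at least twofold; the gene names, read counts, pseudocount, and threshold are all assumptions for illustration, and this is not the analysis pipeline used by Xu et al. Real studies normalize counts and apply statistical tests with replicates.

```python
from math import log2


def log2_fold_change(control: list[float], stressed: list[float],
                     pseudocount: float = 1.0) -> float:
    """log2 ratio of mean stressed to mean control expression.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    mean_control = sum(control) / len(control)
    mean_stressed = sum(stressed) / len(stressed)
    return log2((mean_stressed + pseudocount) / (mean_control + pseudocount))


# Hypothetical normalized counts: gene -> (control replicates, stressed replicates).
expression = {
    "gene_up": ([7, 7, 7], [31, 31, 31]),
    "gene_down": ([15, 15, 15], [3, 3, 3]),
    "gene_flat": ([10, 10, 10], [10, 10, 10]),
}

fold_changes = {gene: log2_fold_change(ctrl, stress)
                for gene, (ctrl, stress) in expression.items()}

# Keep genes whose expression at least doubles or halves under stress.
responsive = {gene: lfc for gene, lfc in fold_changes.items() if abs(lfc) >= 1.0}
print(responsive)
```

Genes that pass the threshold, such as transcription factors or protein kinases, would then be candidates for closer functional study.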

Bioinformatics approaches to study resistance to plant pathogen

One of the challenges for modern agriculture in meeting the nutritional demands of a growing world population is crop loss due to disease. The study of plant pathogens plays an essential role in the study of plant diseases, including pathogen identification, disease aetiology, disease resistance, and economic impact, among others [ 41 ]. Plants protect themselves through a complex defence system against a variety of pathogens, including insects, bacteria, fungi, and viruses. Plant-pathogen interaction is a multicomponent system mediated by the detection of pathogen-derived molecules, in the form of proteins, sugars, and polysaccharides, by pattern recognition receptors (PRRs) within the plant [ 42 , 43 , 44 , 45 ]. After these foreign molecules are recognized, signal transduction is carried out accordingly and the plant immune system responds defensively through different pathways involving different genes [ 42 ]. According to Schneider et al. [ 46 ], the development of molecular plant pathology can be broadly divided into three eras, beginning with the era of disease physiology from the early 1900s until the 1980s. In the second era, that of molecular plant genetic studies, the focus was on one or a few genes of bacterial pathogens, whereas the third era, that of plant genomic studies, began in 2000 with genome sequencing, when the first complete genome of a bacterial pathogen, Xylella fastidiosa , was obtained [ 46 ]. Recent advances in DNA sequencing technologies allow researchers to study the immune system of plants at the genomic and transcriptomic levels [ 1 , 41 , 42 ]. Genomics has uncovered the complexity of phytopathogens and a wealth of information about them. A clearer picture of plant-pathogen interactions in the context of transcriptomics and proteomics can be obtained through the application of different bioinformatics tools, which in turn makes engineering resistance to microbial pathogens in plants feasible [ 43 ].

PRGdb: bioinformatics web for plant pathogen resistance gene analysis

Plants have developed a wide range of defence mechanisms against different pathogens that ultimately inhibit the growth and spread of the pathogen [ 47 , 48 ]. The plant defence system is mediated by resistance (R) genes, which play an important role in the defence mechanism [ 47 ]. They encode proteins that recognize specific pathogen avirulence (Avr) proteins and initiate the defence mechanism through one or more signal transduction pathways in a hypersensitive response (HR) [ 41 , 47 , 48 ]. However, the essential components that these proteins need to confer resistance are still unidentified [ 48 ]. High-throughput genomic experiments and plant genomic sequences are essential for exploring the function of R genes and discovering novel ones [ 47 ]. In 2009, the Plant Disease Resistance Gene database (PRGdb), a comprehensive bioinformatics resource spanning hundreds of plant species, was launched to facilitate plant genome research on the discovery and prediction of plant disease resistance genes [ 47 , 48 ]. To date, PRGdb 3.0 has been released with 153 reference resistance genes and 177,072 annotated candidate pathogen receptor genes (PRGs) [ 49 ]. This database acts as an important reference site and repository for research on the exploration and use of plant resistance genes [ 48 , 49 ].

Apart from storing resistance genes, this easily accessible platform also offers different tools that are essential for the exploration and discovery of novel R genes. For instance, the DRAGO 2.0 tool, which was built to explore known and novel disease resistance genes, can be run on any transcriptome or proteome to annotate and predict PRGs from DNA or amino acid sequences with high accuracy [ 49 ]. In addition, the BLAST search tools available in PRGdb allow different sequences to be compared, which supports the determination of gene homology and expression analysis. Apart from the database, the field of plant pathology has also benefited from whole genome sequencing technologies. DNA sequencing technologies such as NGS and Sanger sequencing allow the study of genomics, proteomics, metabolomics, and transcriptomics in both the host plant and the pathogen [ 1 ]. The phytopathogen genomes that have been sequenced are expected to provide valuable information on the molecular basis of plant host infection and to reveal potential novel virulence factors [ 1 ]. Figure 2 illustrates a brief process involved in producing stress-resistant plants using a bioinformatics approach.

Figure 2. Brief process involved in producing stress-resistant plant using bioinformatics approach

Metagenomics in plant biotechnology and Cas9 modification

The community of environmental microorganisms, especially soil microorganisms, may contribute to plant growth and pathogenesis. Through metagenomics approaches, the soil microbial community that contributes to plant growth may provide great genomic insight into plant physiology and pathology [ 50 , 51 , 52 , 53 ]. In metagenomics approaches, the total genetic material obtained from soil is sequenced and then subjected to microbial community analysis via data analytics [ 53 , 54 , 55 ]. The genetic material extracted from the soil undergoes high-throughput metagenomic analysis via various NGS approaches, such as 16S rRNA sequencing, shotgun metagenomic sequencing, and MiSeq sequencing [ 54 , 55 , 56 ], for microbial species identification, functional genomics studies, and structural metagenomic analysis. NGS produces huge amounts of genomic data for each study; thus, bioinformatics tools add value to metagenomic analysis, as the target genes identified can advance the elucidation of plant growth, plant disease, soil contamination, and microbial taxonomy [ 52 ]. For example, UNITE ( https://unite.ut.ee/ ) is used for fungal identification [ 57 ], SILVA ( https://www.arb-silva.de/ ) for 16S rRNA [ 58 ], and MGnify ( https://www.ebi.ac.uk/metagenomics/ ) holds microbiome metagenomics data [ 59 ]. These databases allow researchers to retrieve and analyse the relevant sequenced metagenomic data for a specific study.
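Once reads have been assigned to taxa, microbial community analysis often begins with relative abundances and a diversity summary. The sketch below computes both, using the Shannon index as one widely used diversity measure; the genus names and read counts are hypothetical and the choice of index is an assumption, since the article does not name a specific statistic.

```python
from math import log


def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-taxon read counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any taxon")
    return {taxon: count / total for taxon, count in counts.items()}


def shannon_index(counts: dict[str, int]) -> float:
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero abundance."""
    proportions = relative_abundance(counts).values()
    return -sum(p * log(p) for p in proportions if p > 0)


# Hypothetical 16S read counts per genus for one soil sample.
soil_sample = {"Bacillus": 40, "Pseudomonas": 40, "Streptomyces": 20}

print(relative_abundance(soil_sample))
print(round(shannon_index(soil_sample), 3))
```

Comparing such summaries between, for example, healthy and diseased plots is one simple way to relate the soil community to plant growth or pathogenesis.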

Since metagenomic analysis provides greater insight into plant-microbe interaction, the genes responsible for plant immunity may play a crucial role in protecting against disease-causing microorganisms [ 60 , 61 ]. With the emergence of the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) gene editing technique, Cas9 modification could produce better plant traits and disease-resistant plants [ 62 , 63 ]. The CRISPR/Cas9 system is employed to study functional genomics in plants in relation to plant-microbe interaction. It facilitates gene editing by introducing a double-stranded break at the target site, which is followed by genome repair and results in a targeted gene mutation [ 63 , 64 , 65 ]. CRISPR/Cas9 modification of OsSWEET14 protects Super Basmati rice from bacterial blight caused by Xanthomonas oryzae pv. oryzae [ 66 ]. Gene editing to knock out the OsMPK5 and OsERF922 genes in rice protects against Magnaporthe grisea and Magnaporthe oryzae , respectively [ 67 , 68 , 69 ]. In addition, Cas9 modification of CsWRKY22 and TcNPR3 increased host defence immunity by regulating salicylic acid in Citrus sinensis and Theobroma cacao , respectively [ 70 , 71 ]. Thus, CRISPR/Cas9 modification could be one of the important scientific advances for validating metagenomic analyses of plant-microbe interaction.
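Choosing where Cas9 will cut is itself a small bioinformatics task. As background not stated in the article, the commonly used Streptococcus pyogenes Cas9 requires an NGG protospacer-adjacent motif (PAM) directly after a target of about 20 bases. The sketch below scans one strand for such sites; the function name, the 20-base length, and the toy sequence are assumptions, and real guide design also scans the reverse strand and screens for off-target matches.

```python
import re


def find_cas9_targets(sequence: str,
                      protospacer_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (0-based start, protospacer, PAM) for every NGG-adjacent site.

    Only the given strand is scanned.
    """
    seq = sequence.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Toy sequence invented for illustration: 20 bases, then a TGG PAM, then filler.
toy_gene = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
for start, protospacer, pam in find_cas9_targets(toy_gene):
    print(start, protospacer, pam)
```

Each reported protospacer is a candidate guide sequence that would direct Cas9 to make its double-stranded break near the adjacent PAM.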

Current challenges of bioinformatics applications in plant biotechnology

Despite the promise of bioinformatics in plant biotechnology, many challenges and limitations must be addressed before its potential can be fully realised [ 1 ]. Alongside the rapid growth in plant genome data mining and database development, bioinformaticians and scientists face several challenges, which are grouped into the areas discussed in the subsections below.

Bioinformatic data management, organization, and synchronized updating of resources

Since next-generation sequencing (NGS) became commercially available in 2004, an enormous amount of data has been generated in plant genome research. Thousands of Gb of plant sequences are deposited in various public databases every month [ 1 , 72 , 73 ]. Moreover, the continual sequencing and re-sequencing of plant genomes adds vast numbers of new genome sequences to public databases. This technology-driven growth in sequenced plant genomes has created problems in storing and updating such large volumes of data [ 72 , 74 ]. Updates should be propagated to all comparative databases, not only to individual genome databases [ 72 ]. Synchronized updating of genome data resources across plant genomics platforms would provide a robust, current, and reliable database community on which plant researchers can rely [ 72 ].

Complexity of plant genetic content

Besides the tremendous volume of genome sequence generated, the complexity of plant genetic content is a challenge for the plant research community. Although next-generation sequencing technologies have enabled rapid DNA sequencing of non-model or orphan plant species, the pace of plant sequencing lags far behind that of animals and microorganisms [ 74 ]. The main reason is that a plant genome can be nearly a hundred times larger than the animal and microbial genomes sequenced to date [ 73 ]. In addition, some plant genomes are polyploid, that is, the entire genome has been duplicated, which is estimated to occur in 80% of plant species [ 73 , 75 ]. According to Schatz et al., assembling a large plant genome with abundant repetitive sequence can be likened to putting together a large puzzle consisting of blue sky separated by nearly indistinguishable white clouds of small genes [ 73 ]. This is mainly because NGS reads are shorter than Sanger reads and require dedicated assembly algorithms [ 74 ]. Consequently, in early studies most plant genomes sequenced by NGS could only be used for establishing gene catalogues, interpreting repeat content, glimpsing evolutionary mechanisms, and performing comparative genomics [ 74 ].
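
The effect of read length on repeat resolution can be shown with a toy example. The sketch below is an illustration only, using an invented 24-nt sequence rather than real plant data: it counts substrings of length k (k-mers, standing in for error-free reads) and reports those that occur at more than one position. Reads shorter than the repeated element cannot be placed uniquely, whereas reads longer than it span into unique flanking sequence.

```python
from collections import Counter


def repeated_kmers(genome, k):
    """Return the set of k-mers that occur at more than one position."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    return {kmer for kmer, n in counts.items() if n > 1}


# Toy "genome" (hypothetical): the 7-nt element GATTACA occurs twice.
genome = "CCG" + "GATTACA" + "TTGC" + "GATTACA" + "AGT"

for k in (5, 8):
    print(k, sorted(repeated_kmers(genome, k)))
# 5 ['ATTAC', 'GATTA', 'TTACA']   reads shorter than the repeat are ambiguous
# 8 []                            reads longer than the repeat are all unique
```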

Advances in sequencing technologies

There are two basic approaches to genome assembly: comparative genome assembly and de novo genome assembly [ 75 ]. It is important to distinguish between them. Comparative assembly is a reference-guided method that uses a genome, a transcriptome, or both for guidance, whereas de novo assembly reconstructs the genome of an organism that has not been sequenced before [ 74 , 75 ]. Table 2 compares some of the assembly and NGS technologies available for genome sequencing. However, these two approaches are not completely exclusive due to a lack of bioinformatic tools designed to cope with the unique and challenging features of plant genomes [ 74 , 75 ]. One of the biggest challenges in developing bioinformatics software is algorithm development [ 76 ]. Bioinformatics programs are generally very computationally intensive. Because most assemblies available today rely on a single assembler, better algorithms with lower resource requirements are needed so that assemblers built on different underlying algorithms can be combined to give a more credible final assembly [ 74 , 76 ].
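
The difference between the two approaches can be sketched in a few lines of Python. This is a deliberately simplified illustration with invented, error-free reads, not the method of any assembler compared in the review: the reference-guided function places each read by exact match against a known sequence, while the de novo function greedily merges the pair of reads with the longest suffix-prefix overlap until no overlaps remain. Production assemblers must additionally handle sequencing errors, both strands, contained reads, and repeats.

```python
def place_on_reference(reference, reads):
    """Reference-guided: position of each read's first exact match (-1 if absent)."""
    return {read: reference.find(read) for read in reads}


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """De novo: repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Hypothetical reads sampled from the toy sequence ATGGCGTGCAATCC.
reads = ["ATGGCGT", "GCGTGCA", "TGCAATCC"]

print(place_on_reference("ATGGCGTGCAATCC", reads))
print(greedy_assemble(reads))
```

The pairwise overlap search is quadratic in the number of reads, which hints at why the text describes assembly software as computationally intensive.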

Database accessibility

To date, about 374,000 plant species are known worldwide [ 77 ]. The first complete plant genome sequence, that of Arabidopsis thaliana , was obtained by Sanger sequencing in 2000 [ 78 ]. Although the introduction of molecular biology decades ago may have facilitated species identification, obtaining full plant genomic data remains challenging because of genome complexity. The development of NGS platforms may foster plant genome sequencing, yet only a limited number of sequenced datasets have been deposited in databases. To date, only 29 plant genome databases are accessible in the PlantGDB genome browser, which allows researchers to retrieve information on gene structure, matched GSS contigs, similar proteins, spliced EST alignments, etc. In addition, the PlaD database ( http://systbio.cau.edu.cn/plad/index.php ), developed by China Agricultural University, is a transcriptomic database of plant defence against pathogens that focuses on plant microarray data. However, it is limited to Arabidopsis , rice, maize, and wheat [ 79 ]. The Plant Omics Data Center (PODC; http://plantomics.mind.meiji.ac.jp/podc/ ) is another publicly available web-based plant database featuring omics data on co-expression profiles, regulatory networks, and plant ontology information [ 80 ]. Although curated omics datasets can be retrieved from PODC, the information is restricted to certain plants and crops such as Arabidopsis , tobacco, earthmoss, barrelclover, soybean, potato, rice, tomato, grape, maize, and sorghum. Furthermore, all these publicly available databases require constant updating with newly released or resequencing data so that researchers can obtain the most up-to-date genome datasets for their research.

The application of bioinformatics in plant biotechnology represents a fundamental shift in the way scientists study living organisms. Bioinformatics plays a significant role in the development of the agricultural sector because it supports the study of stress resistance and plant pathogens, which is critical for advancing crop breeding [ 75 ]. NGS and other sequencing technologies will make more plant genome data accessible in public databases and enable the identification of genomic variants and the prediction of protein structure and function [ 75 , 76 ]. Moreover, GWAS, which identifies loci and allelic variation related to valuable traits, has eased crop modification and improvement [ 74 ]. In brief, advances in bioinformatics applications in plant biotechnology enable researchers to achieve a fundamental and systematic understanding of economically important plants. However, despite these achievements, automated full genome sequencing and assembly at low cost is still a long way off [ 76 ]. There is a critical need for effective bioinformatic tools that can provide longer reads with unbiased coverage in order to overcome the complexity of plant genomes. To achieve this, improved algorithm development is essential to enable data mining, analysis, comparison, and so on. Therefore, bioinformaticians and experts with mathematical and programming skills will play an important role in bringing fresh approaches and knowledge into bioinformatics, not only for the advancement of plant biotechnology and the agricultural sector, but for the future of humanity as well.

Availability of data and materials

Not applicable.

Abbreviations

GWAS: Genome-wide association studies

NGS: Next-generation sequencing

PRGdb: Plant Disease Resistance Gene database

RNA-seq: RNA sequencing

SNP: Single-nucleotide polymorphism

Gomez-Casati DF, Busi MV, Barchiesi J, Peralta DA, Hedin N, Bhadauria V (2018) Applications of bioinformatics to plant biotechnology. Curr Issues Mol Biol 27:89–104. https://doi.org/10.21775/cimb.027.089

Zhang SY, Liu SL (2013) Bioinformatics. In: Maloy S, Hughes K (eds) Brenner’s Encyclopedia of Genetics, 2nd edn. Academic Press, London. https://doi.org/10.1016/B978-0-12-374984-0.00155-8

Tiwari A, Singh P, Kumawat S (2020) Applications of bioinformatics in plant breeding system. Int J Curr Microbiol App Sci. 11:2825–2831

Rhee SY, Dickerson J, Xu D (2006) Bioinformatics and its applications in plant biology. Annu Rev Plant Biol 57:335–360. https://doi.org/10.1146/annurev.arplant.56.032604.144103

Normand EA, Van den Veyver IB (2019) Next-generation sequencing for gene panels and clinical exomes. In: Leung PCK, Qiao J (eds) Human Reproductive and Prenatal Genetics, 1st edn. Academic Press, London. https://doi.org/10.1016/B978-0-12-813570-9.00025-5

Blätke MA, Szymanski JJ, Gladilin E, Scholz U, Beier S (2021) Editorial: advances in applied bioinformatics in crops. Front Plant Sci 12:640394. https://doi.org/10.3389/fpls.2021.640394

Kushwaha UKS, Deo I, Jaiswal JP, Prasad B (2017) Role of bioinformatics in crop improvement. Glob J Sci Front Res D Agric Vet 17(1):13–23

Caligari PDS, Brown J (2017) Plant Breeding, Practice. In: Thomas B, Murray BG, Murphy DJ (eds) Encyclopedia of Applied Plant Sciences, 2nd edn. Academic Press, London. https://doi.org/10.1016/B978-0-12-394807-6.00195-7

Yu J, Jung S, Cheng CH, Lee T, Zheng P, Buble K et al (2021) CottonGen: the community database for cotton genomics, genetics, and breeding research. Plants. 10(12):2805. https://doi.org/10.3390/plants10122805

Sayers EW, Bolton EE, Brister JR, Canese K, Chan J, Comeau DC et al (2022) Database resources of the national center for biotechnology information. Nucleic Acids Res 50(D1):D20–D26. https://doi.org/10.1093/nar/gkab1112

Howe KL, Contreras-Moreira B, De Silva N, Maslen G, Akanni W, Allen J et al (2019) Ensembl Genomes 2020 – enabling non-vertebrate genomic research. Nucleic Acids Res 48(D1):D689–D695. https://doi.org/10.1093/nar/gkz890

Bolser D, Staines DM, Pritchard E, Kersey P (2016) Ensembl plants: integrating tools for visualizing, mining, and analyzing plant genomics data. In: Edwards D (ed) Plant Bioinformatics. Methods in Molecular Biology, vol 1374. Humana Press. https://doi.org/10.1007/978-1-4939-3167-5_6

Jhansi Rani S, Usha R (2013) Transgenic plants: Types, benefits, public concerns and future. J Pharm Res 6(8):879–883. https://doi.org/10.1016/j.jopr.2013.08.008

Barragán-Ocaña A, Reyes-Ruiz G, Olmos-Peña S, Gómez-Viquez H (2019) Transgenic crops: trends and dynamics in the world and in Latin America. Transgenic Res 28(3-4):391–399. https://doi.org/10.1007/s11248-019-00123-8

Platten JD, Cobb JN, Zantua RE (2019) Criteria for evaluating molecular markers: Comprehensive quality metrics to improve marker-assisted selection. PLoS One 14(1):e0210529. https://doi.org/10.1371/journal.pone.0210529

Filho HA, Machicao J, Bruno OM (2018) A hierarchical model of metabolic machinery based on the kcore decomposition of plant metabolic networks. PLoS One 13(5):e0195843. https://doi.org/10.1371/journal.pone.0195843

Mammadov J, Aggarwal R, Buyyarapu R, Kumpatla S (2012) SNP markers and their impact on plant breeding. Int J Plant Genomics 728398:1–11. https://doi.org/10.1155/2012/728398

Hoskins RA, Phan AC, Naeemuddin M, Mapa FA, Ruddy DA, Ryan JJ et al (2001) Single nucleotide polymorphism markers for genetics mapping in Drosophila melanogaster . Genome Res 11(6):1100–1113. https://doi.org/10.1101/gr.gr-1780r

Edwards D, Batley J (2010) Plant genome sequencing: applications for crop improvement. Plant Biotechnol J 8(1):2–9. https://doi.org/10.1111/j.1467-7652.2009.00459.x

Tang G, Qin J, Dolnikowski GG, Russell RM, Grusak MA (2009) Golden Rice is an effective source of vitamin A. Am J Clin Nutr 89(6):1776–1783. https://doi.org/10.3945/ajcn.2008.27119

Yu J, Hu S, Wang J, Wong GKS, Li S, Liu B et al (2002) A draft sequence of the rice genome ( Oryza sativa L. ssp. Indica ). Science. 296(5565):79–92. https://doi.org/10.1126/science.1068037

Song S, Tian D, Zhang Z, Hu S, Yu J (2018) Rice genomics: over the past two decades and into the future. Genomics Proteomics Bioinformatics 16(6):397–404. https://doi.org/10.1016/j.gpb.2019.01.001

Jackson SA (2016) Rice: The First Crop Genome. Rice. 9(14). https://doi.org/10.1186/s12284-016-0087-4

Jain R, Jenkins J, Shu S, Chern M, Martin JA, Copetti D et al (2019) Genome sequence of the model rice variety KitaakeX. BMC Genomics 20(905). https://doi.org/10.1186/s12864-019-6262-4

Vassilev D, Leunissen J, Atanassov A, Nenov A, Dimov G (2005) Application of bioinformatics in plant breeding. Biotechnol Biotechnol Equip 19(sup3):139–152. https://doi.org/10.1080/13102818.2005.10817293

Walkowiak S, Gao L, Monat C, Haberer G, Kassa MT, Brinton J et al (2020) Multiple wheat genomes reveal global variation in modern breeding. Nature. 588(7837):277–283. https://doi.org/10.1038/s41586-020-2961-x

Appels R, Eversole K, Stein N, Feuillet C, Keller B, Rogers J et al (2018) Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science. 361(6403). https://doi.org/10.1126/science.aar7191

Gill BS, Appels R, Botha-Oberholster AM, Buell CR, Bennetzen JL, Chalhoub B et al (2004) A workshop report on wheat genome sequencing: International Genome Research on Wheat Consortium. Genetics. 168(2):1087–1096. https://doi.org/10.1534/genetics.104.034769

Babu P, Baranwal DK, Harikrishna PD, Bharti H, Joshi P et al (2020) Application of genomics tools in wheat breeding to attain durable rust resistance. Front Plant Sci 11:567147. https://doi.org/10.3389/fpls.2020.567147

Guan J, Garcia DF, Zhou Y, Appels R, Li A, Mao L (2020) The battle to sequence the bread wheat genome: a tale of the three kingdoms. Genomics Proteomics Bioinformatics 18(3):221–229. https://doi.org/10.1016/j.gpb.2019.09.005

Bolser D, Staines DM, Pritchard E, Kersey P (2016) Ensembl plants: integrating tools for visualizing, mining and analyzing plant genomics data. Methods Mol Biol 1374:115–140. https://doi.org/10.1007/978-1-4939-3167-5_6

Haberer G, Young S, Bharti AK, Gundlach H, Raymond C, Fuks G et al (2005) Structure and architecture of the maize genome. Plant Physiol 139(4):1612–1624. https://doi.org/10.1104/pp.105.068718

Li C, Song W, Luo Y, Gao S, Zhang R, Shi Z et al (2019) The HuangZaoSi maize genome provides insights into genomic variation and improvement history of maize. Mol Plant 12(3):402–409. https://doi.org/10.1016/j.molp.2019.02.009

Lu F, Romay MC, Glaubitz JC, Bradbury PJ, Elshire RJ, Wang T et al (2015) High-resolution genetic mapping of maize pan-genome sequence anchors. Nat Commun 6:6914. https://doi.org/10.1038/ncomms7914

Cho KT, Portwood JL, Gardiner JM, Harper LC, Lawrence-Dill CJ, Friedberg I et al (2019) MaizeDIG: maize database of images and genomes. Front Plant Sci 10:1050. https://doi.org/10.3389/fpls.2019.01050

Portwood JL, Woodhouse MR, Cannon EK, Gardiner JM, Harper LC, Schaeffer ML et al (2018) MaizeGDB 2018: the maize multi-genome genetics and genomics database. Nucleic Acids Res 47(D1):D1146–D1154. https://doi.org/10.1093/nar/gky1046

Ambrosino L, Colantuono C, Diretto G, Fiore A, Chiusano ML (2020) Bioinformatics resources for plant abiotic stress responses: state of the art and opportunities in the fast evolving -omics era. Plants. 9(5):591. https://doi.org/10.3390/plants9050591

Singla J, Krattinger SG (2016) Biotic stress resistance genes in wheat. Reference Module in Food Science. https://doi.org/10.1016/B978-0-08-100596-5.00229-8

Costa MCD, Farrant JM (2019) Plant resistance to abiotic stresses. Plants (Basel) 8(12):553. https://doi.org/10.3390/plants8120553

Xu Y, Gao S, Yang Y, Huang M, Cheng L, Wei Q et al (2013) Transcriptome sequencing and whole genome expression profiling of chrysanthemum under dehydration stress. BMC Genomics 14:662. https://doi.org/10.1186/1471-2164-14-662

Nishad R, Ahmed T, Rahman VJ, Kareem A (2020) Modulation of plant defense system in response to microbial interactions. Front Microbiol 11:1298. https://doi.org/10.3389/fmicb.2020.01298

Andersen EJ, Ali S, Byamukama E, Yen Y, Nepal MP (2018) Disease resistance mechanisms in plants. Genes (Basel) 9(7):339. https://doi.org/10.3390/genes9070339

Dong OX, Ronald PC (2019) Genetic engineering for disease resistance in plants: recent progress and future perspectives. Plant Physiol 180(1):26–38. https://doi.org/10.1104/pp.18.01224

Abdulkhair WM, Alghuthaymi MA (2016) Plant pathogens. In: Rigobelo EC (ed) Plant Growth, 1st edn. InTechOpen. https://doi.org/10.5772/65325 Available from: https://www.intechopen.com/chapters/52387

Gupta R, Lee SE, Agrawal GK, Rakwal R, Sangryeol P, Wang Y et al (2015) Understanding the plant-pathogen interactions in the context of proteomics-generated apoplastic proteins inventory. Front Plant Sci 6:352. https://doi.org/10.3389/fpls.2015.00352

Schneider DJ, Collmer A (2010) Studying plant-pathogen interactions in the genomics era: beyond Molecular Koch’s postulates to systems biology. Annu Rev Phytopathol 48:457–479. https://doi.org/10.1146/annurev-phyto-073009-114411

Sanseverino W, Hermoso A, D’Alessandro R, Vlasova A, Andolfo G, Frusciante L et al (2013) PRGdb 2.0: towards a community-based database model for the analysis of R-genes in plants. Nucleic Acids Res 41(Database Issue):D1167–D1171. https://doi.org/10.1093/nar/gks1183

Sanseverino W, Roma G, Simone MD, Faino L, Melito S, Stupka E et al (2010) PRGdb: a bioinformatics platform for plant resistance gene analysis. Nucleic Acids Res 38(Database Issue):D814–D821. https://doi.org/10.1093/nar/gkp978

Osuna-Cruz CM, Paytuvi-Gallart A, Donato AD, Sundesha V, Andolfo G, Cigliano RA et al (2018) PRGdb 3.0: a comprehensive platform for prediction and analysis of plant disease resistance genes. Nucleic Acids Res 46(D1):D1197–D1201. https://doi.org/10.1093/nar/gkx1119

Hily JM, Demanèche S, Poulicard N, Tannières M, Djennane S, Beuve M et al (2018) Metagenomic-based impact study of transgenic grapevine rootstock on its associated virome and soil bacteriome. Plant Biotechnol J 16(1):208–220. https://doi.org/10.1111/pbi.12761

Fadiji AE, Babalola OO (2020) Metagenomics methods for the study of plant-associated microbial communities: a review. J Microbiol Methods 70:105860. https://doi.org/10.1016/j.mimet.2020.105860

Piombo E, Abdelfattah A, Droby S, Wisniewski M, Spadaro D, Schena L (2021) Metagenomics approaches for the detection and surveillance of emerging and recurrent plant pathogens. Microorganisms. 9(1):188. https://doi.org/10.3390/microorganisms9010188

Chaudhary P, Khati P, Chaudhary A, Maithani D, Kumar G, Sharma A (2021) Cultivable and metagenomic approach to study the combined impact of nanogypsum and Pseudomonas taiwanensis on maize plant health and its rhizospheric microbiome. PLoS One 16(4):e0250574. https://doi.org/10.1371/journal.pone.0250574

Chukwuneme CF, Ayangbenro AS, Babalola OO (2021) Metagenomic analyses of plant growth-promoting and carbon-cycling genes in maize rhizosphere soils with distinct land-use and management histories. Genes (Basel) 12(9):1431. https://doi.org/10.3390/genes12091431

Zhao J, Ma J, Yang Y, Yu H, Zhang S, Chen F (2021) Response of soil microbial community to vegetation reconstruction modes in mining areas of the Loess Plateau, China. Front Microbiol 12:714967. https://doi.org/10.3389/fmicb.2021.714967

Babalola OO, Fadiji AE, Ayangbenro AS (2020) Shotgun metagenomic data of root endophytic microbiome of maize ( Zea mays L.). Data Brief 31(105893). https://doi.org/10.1016/j.dib.2020.105893

Nilsson RH, Larsson KH, Taylor AFS, Bengtsson-Palme J, Jeppesen TS, Schigel D et al (2019) The UNITE database for molecular identification of fungi: handling dark taxa and parallel taxonomic classifications. Nucleic Acids Res 47(D1):D259–D264. https://doi.org/10.1093/nar/gky1022

Quast C, Pruesse E, Yilmaz P et al (2013) The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res 41(Database issue):D590–D596. https://doi.org/10.1093/nar/gks1219

Mitchell AL, Almeida A, Beracochea M, Boland M, Burgin J, Cochrane G et al (2020) MGnify: the microbiome analysis resource in 2020. Nucleic Acids Res 48(D1):D570–D578. https://doi.org/10.1093/nar/gkz1035

Musidlak O, Buchwald W, Nawrot R (2014) Plant defense responses against viral and bacterial pathogen infections. Focus on RNA-binding proteins (RBPs). Herba Polonica 60:60–73. https://doi.org/10.1515/hepo-2015-0005

Silva MS, Arraes FBM, Campos MDA, Grossi-de-Sa M, Fernandez D, Cândido EDS et al (2018) Review: potential biotechnological assets related to plant immunity modulation applicable in engineering disease-resistant crops. Plant Sci 270:72–84. https://doi.org/10.1016/j.plantsci.2018.02.013

Feng Z, Zhang B, Ding W, Liu X, Yang DL, Wei P et al (2013) Efficient genome editing in plants using a CRISPR/Cas system. Cell Res 23(10):1229–1232. https://doi.org/10.1038/cr.2013.114

Wada N, Ueta R, Osakabe Y, Osakabe K (2020) Precision genome editing in plants: state-of-the-art in CRISPR/Cas9-based genome engineering. BMC Plant Biol 20:234. https://doi.org/10.1186/s12870-020-02385-5

Nekrasov V, Staskawicz B, Weigel D, Jones JD, Kamoun S (2013) Targeted mutagenesis in the model plant Nicotiana benthamiana using Cas9 RNA-guided endonuclease. Nat Biotechnol 31(8):691–693. https://doi.org/10.1038/nbt.2655

Langner T, Kamoun S, Belhaj K (2018) CRISPR crops: plant genome editing toward disease resistance. Annu Rev Phytopathol 56:479–512. https://doi.org/10.1146/annurev-phyto-080417-050158

Zafar K, Khan MZ, Amin I, Mukhtar Z, Yasmin S, Arif M et al (2020) Precise CRISPR-Cas9 mediated genome editing in super basmati rice for resistance against bacterial blight by targeting the major susceptibility gene. Front Plant Sci 11:575. https://doi.org/10.3389/fpls.2020.00575

Xie K, Yang Y (2013) RNA-guided genome editing in plants using a CRISPR-Cas system. Mol Plant 6(6):1975–1983. https://doi.org/10.1093/mp/sst119

Wang F, Wang C, Liu P, Lei C, Hao W, Gao Y et al (2016) Enhanced rice blast resistance by CRISPR/Cas9-targeted mutagenesis of the ERF transcription factor gene OsERF922. PLoS One 11(4):e0154027. https://doi.org/10.1371/journal.pone.0154027

Oliva R, Ji C, Atienza-Grande G, Huguet-Tapia JC, Perez-Quintero A, Li T et al (2019) Broad-spectrum resistance to bacterial blight in rice using genome editing. Nat Biotechnol 37(11):1344–1350. https://doi.org/10.1038/s41587-019-0267-z

Wang L, Chen S, Peng A, Xie Z, He Y, Zou X (2019) CRISPR/CAS9-mediated editing of CsWRKY22 reduces susceptibility to Xanthomonas citri subsp. citri in Wanjincheng orange ( Citrus sinensis (L.) Osbeck). Plant Biotechnol Rep 13(5):501–510. https://doi.org/10.1007/s11816-019-00556-x

Fister AS, Landherr L, Maximova SN, Guiltinan MJ (2018) Transient expression of CRISPR/Cas9 machinery targeting TcNPR3 enhances defense response in Theobroma cacao. Front Plant Sci 9:268. https://doi.org/10.3389/fpls.2018.00268

Ong Q, Nguyen P, Thao NP, Le L (2016) Bioinformatics approach in plant genomic research. Curr Genomics 17(4):368–378. https://doi.org/10.2174/1389202917666160331202956

Schatz MC, Witkowski J, McCombie WR (2012) Current challenges in de novo plant genome sequencing and assembly. Genome Biol 13(4):243. https://doi.org/10.1186/gb-2012-13-4-243

Claros MG, Bautista R, Guerrero-Fernández D, Benzerki H, Seoane P, Fernández-Pozo N (2012) Why assembling plant genome sequences is so challenging. Biology (Basel) 1(2):439–459. https://doi.org/10.3390/biology1020439

Kyriakidou M, Tai HH, Anglin NL, Ellis D, Strömvik MV (2018) Current strategies of polyploid plant genome sequence assembly. Front Plant Sci 9:1660. https://doi.org/10.3389/fpls.2018.01660

Mathur M (2018) Bioinformatics challenges: a review. Int J Adv Sci Res 3(6):29–33

Fazan L, Song YG, Kozlowski G (2020) The woody planet: from past triumph to manmade decline. Plants (Basel) 9(11):1593. https://doi.org/10.3390/plants9111593

Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 408(6814):796–815. https://doi.org/10.1038/35048692

Qi H, Jiang Z, Zhang K, Yang S, He F, Zhang Z (2018) PlaD: a transcriptomics database for plant defense responses to pathogens, providing new insights into plant immune system. Genomics Proteomics Bioinformatics 16(4):283–293. https://doi.org/10.1016/j.gpb.2018.08.002

Ohyanagi H, Takano T, Terashima S, Kobayashi M, Kanno M, Morimoto K et al (2015) Plant Omics Data Center: an integrated web repository for interspecies gene expression networks with NLP-based curation. Plant Cell Physiol 56(1):e9. https://doi.org/10.1093/pcp/pcu188

Acknowledgements

The authors wish to thank Prof. Hoe I. Ling of Columbia University (New York, USA) for his editorial input and for proofreading the manuscript.

Author information

Authors and affiliations.

Division of Applied Biomedical Sciences and Biotechnology, School of Health Sciences, International Medical University, 126 Jalan Jalil Perkasa 19, Bukit Jalil, 57000, Kuala Lumpur, Malaysia

Yung Cheng Tan, Asqwin Uthaya Kumar, Ying Pei Wong & Anna Pick Kiong Ling

School of Biosciences and Biotechnology, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Malaysia

Asqwin Uthaya Kumar

Contributions

YCT designed the content and was a major contributor in writing the manuscript. AUK and YPW edited the manuscript. APKL designed and edited the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Anna Pick Kiong Ling .

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Tan, Y.C., Kumar, A.U., Wong, Y.P. et al. Bioinformatics approaches and applications in plant biotechnology. J Genet Eng Biotechnol 20 , 106 (2022). https://doi.org/10.1186/s43141-022-00394-5

Received : 30 November 2021

Accepted : 05 July 2022

Published : 15 July 2022

DOI : https://doi.org/10.1186/s43141-022-00394-5

  • Bioinformatics
  • Biotic and abiotic
  • Plant breeding
  • Plant sequencing
  • Plant pathogen
  • PRGdb sequence analysis

Recent Advances in Plant Biotechnology

  • © 2009
  • Ara Kirakosyan
  • Peter B. Kaufman

University of Michigan, Ann Arbor, U.S.A.

  • Presents a full overview of plant biotechnology from the history to applications
  • Approach includes associated risks and the effects of plant biotechnology on global warming, alternative energy initiatives, food production, and medicine
  • Includes supplementary material: sn.pub/extras

Table of contents (16 chapters)

Front Matter

Plant Biotechnology from Inception to the Present

Overview of Plant Biotechnology from Its Early Roots to the Present

  • Ara Kirakosyan, Peter B. Kaufman, Leland J. Cseke

The Use of Plant Cell Biotechnology for the Production of Phytochemicals

  • Ara Kirakosyan, Leland J. Cseke, Peter B. Kaufman

Molecular Farming of Antibodies in Plants

  • Rainer Fischer, Stefan Schillberg, Richard M. Twyman

Use of Cyanobacterial Proteins to Engineer New Crops

  • Matias D. Zurbriggen, Néstor Carrillo, Mohammad-Reza Hajirezaei

Molecular Biology of Secondary Metabolism: Case Study for Glycyrrhiza Plants

  • Hiroaki Hayashi

Applications of Plant Biotechnology in Agriculture and Industry

New Developments in Agricultural and Industrial Plant Biotechnology

Phytoremediation: The Wave of the Future

  • Jerry S. Succuro, Steven S. McDonald, Casey R. Lu

Biotechnology of the Rhizosphere

  • Beatriz Ramos Solano, Jorge Barriuso Maicas, Javier Gutierrez Mañero

Plants as Sources of Energy

  • Leland J. Cseke, Gopi K. Podila, Ara Kirakosyan, Peter B. Kaufman

Use of Plant Secondary Metabolites in Medicine and Nutrition

Interactions of Bioactive Plant Metabolites: Synergism, Antagonism, and Additivity

  • John Boik, Ara Kirakosyan, Peter B. Kaufman, E. Mitchell Seymour, Kevin Spelman

The Use of Selected Medicinal Herbs for Chemoprevention and Treatment of Cancer, Parkinson’s Disease, Heart Disease, and Depression

  • Maureen McKenzie, Carl Li, Peter B. Kaufman, E. Mitchell Seymour, Ara Kirakosyan

Regulating Phytonutrient Levels in Plants – Toward Modification of Plant Metabolism for Human Health

Risks and Benefits Associated with Plant Biotechnology

Risks and Benefits Associated with Genetically Modified (GM) Plants

  • Peter B. Kaufman, Soo Chul Chang, Ara Kirakosyan

Risks Involved in the Use of Herbal Products

  • Peter B. Kaufman, Maureen McKenzie, Ara Kirakosyan

Risks Associated with Overcollection of Medicinal Plants in Natural Habitats

  • Maureen McKenzie, Ara Kirakosyan, Peter B. Kaufman
  • agriculture
  • alternative energy
  • bioremediation
  • biotechnology
  • genetically modified plants
  • herbal medicine
  • herbal products
  • plant biotechnology
  • transgenic plants

About this book

Authors and affiliations.

Ara Kirakosyan, Peter B. Kaufman

About the authors

Bibliographic information.

Book Title : Recent Advances in Plant Biotechnology

Authors : Ara Kirakosyan, Peter B. Kaufman

DOI : https://doi.org/10.1007/978-1-4419-0194-1

Publisher : Springer New York, NY

eBook Packages : Biomedical and Life Sciences , Biomedical and Life Sciences (R0)

Copyright Information : The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2009

Hardcover ISBN : 978-1-4419-0193-4 Published: 30 July 2009

Softcover ISBN : 978-1-4899-7916-2 Published: 23 August 2016

eBook ISBN : 978-1-4419-0194-1 Published: 15 August 2009

Edition Number : 1

Number of Pages : XIV, 405

Topics : Plant Genetics and Genomics , Plant Sciences


From microbes to microbiomes: Applications for plant health and sustainable agriculture

Affiliations.

  • 1 Iowa State University, Plant Pathology & Microbiology, 2213 Pammel Drive, 4005 ATRB, Ames, Iowa, United States, 50011-1101.
  • 2 Murdoch University, Food Futures Institute, Murdoch, Western Australia, Australia.
  • 3 Oakridge National Laboratory Biosciences Division, Oakridge, Tennessee, United States.
  • 4 International Alliance for Phytobiomes Research, Eau Claire, Wisconsin, United States.
  • 5 Mosaic Biosciences, Plant City, Florida, United States.
  • 6 AIT Austrian Institute of Technology, Health & Bioresources Unit, Tulln, Austria.
  • 7 Trace Genomics, Ames, Iowa, United States.
  • 8 Eversole Associates, Bethesda, Maryland, United States.
  • 9 Eversole Associates, 5207 Wyoming Road, Bethesda, Maryland, United States, 20816.
  • 10 International Alliance for Phytobiomes Research Inc, Eau Claire, Wisconsin, United States.
  • PMID: 38776137
  • DOI: 10.1094/PHYTO-02-24-0054-KC

Plant-microbe interaction research has had a transformative trajectory, from individual microbial isolate studies to comprehensive analyses of plant microbiomes within the broader phytobiome framework. Acknowledging the indispensable role of plant microbiomes in shaping plant health, agriculture, and ecosystem resilience, we underscore the urgent need for sustainable crop production strategies in the face of contemporary challenges. We discuss how the synergies between advancements in 'omics technologies and artificial intelligence can help advance the profound potential of plant microbiomes. Furthermore, we propose a multifaceted approach encompassing translational considerations, transdisciplinary research initiatives, public-private partnerships, regulatory policy development, and pragmatic expectations for the practical application of plant microbiome knowledge across diverse agricultural landscapes. We advocate for strategic collaboration and intentional transdisciplinary efforts to unlock the benefits offered by plant microbiomes and address pressing global issues in food security. By emphasizing a nuanced understanding of plant microbiome complexities and fostering realistic expectations, we encourage the scientific community to navigate the transformative journey from discoveries in the laboratory to field applications. As companies specializing in agricultural microbes and microbiomes undergo shifts, we highlight the necessity of understanding how to approach sustainable agriculture with site-specific management solutions. While cautioning against over-promising, we underscore the excitement of exploring the many impacts of microbiome-plant interactions. We emphasize the importance of collaborative endeavors with societal partners to accelerate our collective capacity to harness the diverse and yet-to-be-discovered beneficial activities of plant microbiomes.

Keywords: Biological Control; Biotechnology; Data Science; Disease Control and Pest Management; Endophytic Interactions; Microbiome; Modelling; Plant Stress and Abiotic Disorders; Rhizobial Interactions; Systems Biology.

  • Published: 17 May 2024

A second chance for plant biotechnology in Europe

  • Cormac Sheridan

Nature Biotechnology volume 42, pages 687–689 (2024)

Europe tilts towards gene-edited plants, but progress could be derailed over who owns the patents.

On 7 February the European Parliament voted in favor of a legislative proposal to markedly relax rules for certain gene-edited plants. But it also added several amendments to the draft legislation, originally proposed by the European Commission, that, if adopted, would also ban patents for all CRISPR–Cas9-edited plants, a stance likely to discourage companies from investing in new plant products.

The European Union has a long history of opposition to genetically modified crops, but CRISPR and other genome editing technologies have prompted a rethink of the rules. A genetically modified plant or organism is obtained by inserting genetic material from another species using genetic engineering, in a way that does not occur in nature, whereas genome editing is a technology that can introduce desired traits (increased yield, improved resistance to pests, climate resilience, long shelf-life) by introducing modifications indistinguishable from those that could have happened naturally or by selective breeding. In the United States, such CRISPR-edited crops have been cultivated and sold without oversight since 2016. Globally, only a small number of gene-edited plants are available at present, but that is set to change dramatically over the next decade.

The European Parliament’s support for gene-edited plants, however qualified, is noteworthy. “Getting through this part of the legislative process was not necessarily expected by many,” says Garlich von Essen, secretary general of Euroseeds, a Brussels-based lobby group for the seed sector. The vote reflects a consensus among public scientific institutions, industry and farmers on the need for reform, he says. The patent-related hurdles could be interpreted as a ploy on the part of those who remain staunchly opposed to commercial plant biotechnology to “split the pack,” says von Essen.

But patent revisions could complicate the picture considerably. They could result in “different patent rules for NGTs [new genomic technologies], for GMOs and for conventionally produced plants,” says Mathijs Vleugel, scientific policy officer at Berlin-based All European Academies (ALLEA), an umbrella body for European academies of sciences and learned bodies. Another concern, von Essen says, is that adoption of gene-edited plants should not lead to market domination by a small number of large multinational firms, as was the case with transgenic crops. “Then the question is, how do you do this in practice?”

The push to revamp European Union plant rules follows a controversial judgment in 2018, which essentially ruled that all gene-edited plants are considered genetically modified organisms (GMOs). The European Union has been notoriously unfertile ground for transgenic crops, which have been widely adopted by the United States, Brazil, Argentina, Canada and India, among other countries, over the past 25 years. European countries import large volumes of genetically modified crops as food ingredients and animal feed, but their cultivation is limited. Just one, Leverkusen, Germany-based Bayer’s insect-resistant maize strain MON 810, is authorized, but less than 70,000 hectares were planted in Spain and Portugal in 2022, which represents a tiny fraction of its global production.

Gene editing represents a second chance for plant biotechnology in Europe. Precise gene editing methods, such as CRISPR–Cas, zinc finger nucleases, transcription activator-like effector nucleases (TALENs) and oligonucleotide-directed mutagenesis — the European Union refers to them collectively as ‘new genomic techniques’ (NGTs) — can alter important traits such as nutritional profile, resistance to stress, and yield, without introducing foreign DNA.

Early examples of gene-edited plants include the Sicilian Rouge tomato (Solanum lycopersicum), from Tokyo-based Sanatech Life Science. The tomato produces high levels of γ-aminobutyric acid (GABA), which the company claims can help lower blood pressure. The edit inserts a stop codon that interferes with expression of the autoinhibitory domain of the SIGAD3 gene, which encodes the enzyme that catalyzes glutamate-to-GABA conversion in the plant. In the United States, where gene-edited plants that resemble conventionally bred counterparts do not require approval, Pairwise is ready to market its Conscious Greens mustard leaves (Brassica juncea), which carry CRISPR–Cas12a-induced knockouts of the type I myrosinase multigene family. Myrosinase enzymes normally hydrolyze sinigrin, and the resulting breakdown products give rise to the pungent ‘mustard bomb’ effect that occurs when the plant is eaten in large quantities. The company is now seeking partners to commercialize the milder-tasting product, which is more nutritious than many other types of salad leaves.

Even if only a few gene-edited plants are commercially available at present, the global development pipeline is large, as many countries have adopted liberal regulatory regimes with either no special rules or just minimal regulations for gene-edited plants that do not contain foreign DNA. The list of states embracing gene-edited plants extends well beyond the main adopters of GMOs.

“Latin America is clearly a leader here,” says Dan Jenkins, vice president, regulatory and government affairs at Pairwise. Argentina was an early mover: it put a system in place in 2015, which, according to one analysis, has helped to foster a diverse group of innovators, led by small and medium-sized enterprises and public research institutes. Chile, Brazil, Colombia, Paraguay, Honduras, Guatemala and El Salvador all followed between 2017 and 2019. More recently, Costa Rica has also eased its regulatory requirements, and a gene-edited banana resistant to the fungal diseases sigatoka and fusarium wilt (caused by Mycosphaerella fijiensis and Fusarium oxysporum, respectively) may become available later this year.

In Africa, Nigeria, Kenya and Malawi acted early. China, a late adopter of GMO crops, has already approved a gene-edited soybean, which Shandong-based Shandong Shunfeng Biotechnology developed. The gene-edited soybean produces high levels of oleic acid, a monounsaturated fatty acid that may lower the risk of coronary heart disease. (Calyxt, now part of Cibus, introduced a TALEN-edited high-oleic-acid soybean to the United States in 2019.) The United Kingdom has also enacted new legislation to allow “precision-bred organisms” that do not contain foreign DNA.

In Europe, some plant biotechnologists are forging ahead with their development plans despite the regulatory uncertainty over patents (Table 1). But it is difficult to see how companies can build a viable commercial market without patent protection for the traits they have introduced to their target plants.

In the absence of patents, the European Parliament has proposed that the existing Community Plant Variety Rights (CPVR) system, which has long been in place for conventionally bred varieties, would provide sufficient intellectual property (IP) protection. But this includes a ‘breeder’s rights’ provision, which would allow any rival breeder free access to a given innovation once it became commercially available. “You lose the incentive to develop the trait because you cannot capture the value,” says Mario Caccamo, CEO of NIAB, a Cambridge, UK-based not-for-profit crop science organization. Jenkins concurs: “I just don’t know why somebody would invest in our company if, as soon as we go on the market, somebody else can take that trait.”

What’s more, the CPVR system requires breeders to demonstrate the “distinctness, uniformity, stability, and novelty” of a given variety. That is not easily done. “It takes a long time and quite a lot of money,” says Heinz Müller, an IP expert and emeritus professor of biochemistry at the University of Basel, in Switzerland, who is co-chair of ALLEA’s task force on IP and NGTs. That provision is more appropriate to a commercial variety to be planted at scale rather than a single trait that a developer would aim to distribute across multiple varieties.

In the absence of patent protection, gene-editing companies could instead opt for trade secrets, which would lead to less transparency, says Caccamo. Moreover, banning patents from certain forms of innovation would, says Müller, “go against some of the international agreements” on IP, such as the longstanding Paris Convention for the Protection of Industrial Property, administered by the World Intellectual Property Organization, and the World Trade Organization’s Agreement on Trade-Related Aspects of Intellectual Property Rights, or TRIPS.

Ultimately, the legislation will be shaped by a three-way negotiation among the institutions that make up the European Union. This will take time, and new elections to the European Parliament in June will probably further delay the process. Even so, the parliamentary vote has added real momentum to the legislative initiative. The European Union’s scientific credibility could suffer if it fails to pass legislation that gives its agbiotech sector an opportunity to embrace this innovation in the coming decades.

Climate change is another consideration. The extent to which gene editing — of plants or livestock — can help reduce the impact of agriculture on greenhouse gas emissions and deforestation and at the same time mitigate the effects of climate change on food production is an open question. But those considerations will loom large as the European Union’s institutions hammer out a compromise in the coming months. “Not everybody will be happy. Not everybody will be equally happy,” says von Essen. But for plant breeders grappling with the vagaries of climate change, legislative reform will buy them precious time, as it will accelerate their efforts to introduce useful traits to their germplasm.

The timetable leading to the final legislation is still unclear. The Commission’s proposal had yet to receive an official response from the European Council, as Nature Biotechnology went to press.

Author information

Authors and affiliations.

Dublin, Ireland

Cormac Sheridan

About this article

Cite this article.

Sheridan, C. A second chance for plant biotechnology in Europe. Nat Biotechnol 42 , 687–689 (2024). https://doi.org/10.1038/s41587-024-02246-8

Issue Date : May 2024

Open Access

Peer-reviewed

Meta-Research Article

Meta-Research Articles feature data-driven examinations of the methods, reporting, verification, and evaluation of scientific research.

Assessing the evolution of research topics in a biological field using plant science as an example

Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

Affiliations Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America, Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, Michigan, United States of America, DOE-Great Lake Bioenergy Research Center, Michigan State University, East Lansing, Michigan, United States of America

Roles Conceptualization, Investigation, Project administration, Supervision, Writing – review & editing

Affiliation Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America

  • Shin-Han Shiu, 
  • Melissa D. Lehti-Shiu

  • Published: May 23, 2024
  • https://doi.org/10.1371/journal.pbio.3002612

Scientific advances due to conceptual or technological innovations can be revealed by examining how research topics have evolved. But such topical evolution is difficult to uncover and quantify because of the large body of literature and the need for expert knowledge in a wide range of areas in a field. Using plant biology as an example, we used machine learning and language models to classify plant science citations into topics representing interconnected, evolving subfields. The changes in prevalence of topical records over the last 50 years reflect shifts in major research trends and recent radiation of new topics, as well as turnover of model species and vastly different plant science research trajectories among countries. Our approaches readily summarize the topical diversity and evolution of a scientific field with hundreds of thousands of relevant papers, and they can be applied broadly to other fields.

Citation: Shiu S-H, Lehti-Shiu MD (2024) Assessing the evolution of research topics in a biological field using plant science as an example. PLoS Biol 22(5): e3002612. https://doi.org/10.1371/journal.pbio.3002612

Academic Editor: Ulrich Dirnagl, Charite Universitatsmedizin Berlin, GERMANY

Received: October 16, 2023; Accepted: April 4, 2024; Published: May 23, 2024

Copyright: © 2024 Shiu, Lehti-Shiu. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The plant science corpus data are available through Zenodo ( https://zenodo.org/records/10022686 ). The codes for the entire project are available through GitHub ( https://github.com/ShiuLab/plant_sci_hist ) and Zenodo ( https://doi.org/10.5281/zenodo.10894387 ).

Funding: This work was supported by the National Science Foundation (IOS-2107215 and MCB-2210431 to MDL and SHS; DGE-1828149 and IOS-2218206 to SHS), Department of Energy grant Great Lakes Bioenergy Research Center (DE-SC0018409 to SHS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: BERT, Bidirectional Encoder Representations from Transformers; br, brassinosteroid; ccTLD, country code Top Level Domain; c-Tf-Idf, class-based Tf-Idf; ChatGPT, Chat Generative Pretrained Transformer; ga, gibberellic acid; LOWESS, locally weighted scatterplot smoothing; MeSH, Medical Subject Heading; SHAP, SHapley Additive exPlanations; SJR, SCImago Journal Rank; Tf-Idf, Term frequency-Inverse document frequency; UMAP, Uniform Manifold Approximation and Projection

Introduction

The explosive growth of scientific data in recent years has been accompanied by a rapidly increasing volume of literature. These records represent a major component of our scientific knowledge and embody the history of conceptual and technological advances in various fields over time. Our ability to wade through these records is important for identifying relevant literature for specific topics, a crucial practice of any scientific pursuit [ 1 ]. Classifying the large body of literature into topics can provide a useful means to identify relevant literature. In addition, these topics offer an opportunity to assess how scientific fields have evolved and when major shifts took place. However, such classification is challenging because the relevant articles in any topic or domain can number in the tens or hundreds of thousands, and the literature is in the form of natural language, which takes substantial effort and expertise to process [ 2 , 3 ]. In addition, even if one could digest all literature in a field, it would still be difficult to quantify such knowledge.

In the last several years, there has been a quantum leap in natural language processing approaches due to the feasibility of building complex deep learning models with highly flexible architectures [ 4 , 5 ]. The development of large language models such as Bidirectional Encoder Representations from Transformers (BERT; [ 6 ]) and Chat Generative Pretrained Transformer (ChatGPT; [ 7 ]) has enabled the analysis, generation, and modeling of natural language texts in a wide range of applications. The success of these applications is, in large part, due to the feasibility of considering how the same words are used in different contexts when modeling natural language [ 6 ]. One such application is topic modeling, the practice of establishing statistical models of semantic structures underlying a document collection. Topic modeling has been proposed for identifying scientific hot topics over time [ 1 ], for example, in synthetic biology [ 8 ], and it has also been applied to, for example, automatically identify topical scenes in images [ 9 ] and social network topics [ 10 ], discover gene programs highly correlated with cancer prognosis [ 11 ], capture “chromatin topics” that define cell-type differences [ 12 ], and investigate relationships between genetic variants and disease risk [ 13 ]. Here, we use topic modeling to ask how research topics in a scientific field have evolved and what major changes in the research trends have taken place, using plant science as an example.

Plant science corpora allow classification of major research topics

Plant science, broadly defined, is the study of photosynthetic species, their interactions with biotic/abiotic environments, and their applications. For modeling plant science topical evolution, we first identified a collection of plant science documents (i.e., corpus) using a text classification approach. To this end, we first collected over 30 million PubMed records and narrowed down candidate plant science records by searching for those with plant-related terms and taxon names (see Materials and methods ). Because there remained a substantial number of false positives (i.e., biomedical records mentioning plants in passing), a set of positive plant science examples from the 17 plant science journals with the highest numbers of plant science publications covering a wide range of subfields and a set of negative examples from journals with few candidate plant science records were used to train 4 types of text classification models (see Materials and methods ). The best text classification model performed well (F1 = 0.96, F1 of a naïve model = 0.5, perfect model = 1) where the positive and negative examples were clearly separated from each other based on prediction probability of the hold-out testing dataset (false negative rate = 2.6%, false positive rate = 5.2%, S1A and S1B Fig ). The false prediction rate for documents from the 17 plant science journals annotated with the Medical Subject Heading (MeSH) term “Plants” in NCBI was 11.7% (see Materials and methods ). The prediction probability distribution of positive instances with the MeSH term has an expected left-skew to lower values ( S1C Fig ) compared with the distributions of all positive instances ( S1A Fig ). Thus, this subset with the MeSH term is a skewed representation of articles from these 17 major plant science journals. 
To further benchmark the validity of the plant science records, we also conducted manual annotation of 100 records where the false positive and false negative rates were 14.6% and 10.6%, respectively (see Materials and methods ). Using 12 other plant science journals not included as positive examples as benchmarks, the false negative rate was 9.9% (see Materials and methods ). Considering the range of false prediction rate estimates with different benchmarks, we should emphasize that the model built with the top 17 plant science journals represents a substantial fraction of plant science publications but with biases. Applying the model to the candidate plant science records led to 421,658 positive predictions, hereafter referred to as “plant science records” ( S1D Fig and S1 Data ).
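
For readers less familiar with the benchmarking metrics quoted above, the following is a minimal sketch of how F1, the false negative rate, and the false positive rate can be computed from binary labels. It is an illustration only and not the code used in this study; the toy labels and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # plant science records predicted as plant science
    fp: int  # non-plant records predicted as plant science
    tn: int  # non-plant records predicted as non-plant
    fn: int  # plant science records predicted as non-plant


def confusion(y_true, y_pred):
    """Count the four outcomes for binary labels (1 = plant science)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return Confusion(tp, fp, tn, fn)


def f1_score(c):
    """Harmonic mean of precision and recall."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def false_negative_rate(c):
    """Fraction of true positives that the model missed."""
    return c.fn / (c.fn + c.tp)


def false_positive_rate(c):
    """Fraction of true negatives that the model flagged as positive."""
    return c.fp / (c.fp + c.tn)


# Hypothetical hold-out labels: 4 plant science records, 4 other records.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
c = confusion(y_true, y_pred)
print(f1_score(c), false_negative_rate(c), false_positive_rate(c))
```

Because the false negative and false positive rates use different denominators (all true positives versus all true negatives), they can diverge substantially when the benchmark set is unbalanced, which is one reason the estimates above vary across benchmarks.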

To better understand how the models classified plant science articles, we identified important terms from a more easily interpretable model (Term frequency-Inverse document frequency (Tf-Idf) model; F1 = 0.934) using Shapley Additive Explanations [ 14 ]; 136 terms contributed to predicting plant science records (e.g., Arabidopsis, xylem, seedling) and 138 terms contributed to non-plant science record predictions (e.g., patients, clinical, mice; Tf-Idf feature sheet, S1 Data ). Plant science records as well as PubMed articles grew exponentially from 1950 to 2020 ( Fig 1A ), highlighting the challenges of digesting the rapidly expanding literature. We used the plant science records to perform topic modeling, which consisted of 4 steps: representing each record as a BERT embedding, reducing dimensionality, clustering, and identifying the top terms by calculating class (i.e., topic)-based Tf-Idf (c-Tf-Idf; [ 15 ]). The c-Tf-Idf represents the frequency of a term in the context of how rare the term is to reduce the influence of common words. SciBERT [ 16 ] was the best model among those tested ( S2 Data ) and was used for building the final topic model, which classified 372,430 (88.3%) records into 90 topics defined by distinct combinations of terms ( S3 Data ). The topics contained 620 to 16,183 records and were named after the top 4 to 5 terms defining the topical areas ( Fig 1B and S3 Data ). For example, the top 5 terms representing the largest topic, topic 61 (16,183 records), are “qtl,” “resistance,” “wheat,” “markers,” and “traits,” which represent crop improvement studies using quantitative genetics.
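
The last step of the topic modeling procedure, ranking terms by c-Tf-Idf, can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in this study: it assumes one commonly used class-based formulation in which the within-topic term frequency is multiplied by log(1 + A / f), where A is the average number of words per topic and f is the frequency of the term across all topics. The toy documents and function names are hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_topic):
    """Score terms per topic; docs_by_topic maps topic -> list of token lists."""
    # Treat all documents assigned to a topic as a single bag of words.
    counts = {
        topic: Counter(tok for doc in docs for tok in doc)
        for topic, docs in docs_by_topic.items()
    }
    # Frequency of each term across all topics.
    total = Counter()
    for topic_counts in counts.values():
        total.update(topic_counts)
    # Average number of words per topic (assumed normalization constant A).
    avg_words = sum(total.values()) / len(counts)

    scores = {}
    for topic, topic_counts in counts.items():
        n_words = sum(topic_counts.values())
        scores[topic] = {
            term: (freq / n_words) * math.log(1 + avg_words / total[term])
            for term, freq in topic_counts.items()
        }
    return scores


def top_terms(scores, topic, k=3):
    """Return the k highest-scoring terms for a topic (ties broken alphabetically)."""
    ranked = sorted(scores[topic].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical tokenized records already assigned to two topics.
docs_by_topic = {
    "A": [["qtl", "wheat", "resistance", "plant"], ["qtl", "markers", "plant"]],
    "B": [["auxin", "signaling", "plant"], ["auxin", "root", "plant"]],
}
scores = c_tf_idf(docs_by_topic)
print(top_terms(scores, "A"), top_terms(scores, "B"))
```

In this toy corpus the word "plant" occurs in every record, so its score is pushed below that of rarer terms even though it is frequent within each topic, which is the behavior the c-Tf-Idf weighting is meant to produce.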

(A) Numbers of PubMed (magenta) and plant science (green) records between 1950 and 2020. (a, b, c) Coefficients of the exponential function, y = ae b . Data for the plot are in S1 Data . (B) Numbers of documents for the top 30 plant science topics. Each topic is designated by an index number (left) and the top 4–6 terms with the highest cTf-Idf values (right). Data for the plot are in S3 Data . (C) Two-dimensional representation of the relationships between plant science records generated by Uniform Manifold Approximation and Projection (UMAP, [ 17 ]) using SciBERT embeddings of plant science records. All topics panel: Different topics are assigned different colors. Outlier panel: UMAP representation of all records (gray) with outlier records in red. Blue dotted circles: areas with relatively high densities indicating topics that are below the threshold for inclusion in a topic. In the 8 UMAP representations on the right, records for example topics are in red and the remaining records in gray. Blue dotted circles indicate the relative position of topic 48.

https://doi.org/10.1371/journal.pbio.3002612.g001

Records with assigned topics clustered into distinct areas in a two-dimensional (2D) space ( Fig 1C , for all topics, see S4 Data ). The remaining 49,228 outlier records not assigned to any topic (11.7%, middle panel, Fig 1C ) have 3 potential sources. First, some outliers likely belong to unique topics but have fewer records than the threshold (>500, blue dotted circles, Fig 1C ). Second, some of the many outliers dispersed within the 2D space ( Fig 1C ) were not assigned to any single topic because they had relatively high prediction scores for multiple topics ( S2 Fig ). These likely represent studies across subdisciplines in plant science. Third, some outliers are likely interdisciplinary studies between plant science and other domains, such as chemistry, mathematics, and physics. Such connections can only be revealed if records from other domains are included in the analyses.

Topical clusters reveal closely related topics but with distinct key term usage

Related topics tend to be located close together in the 2D representation (e.g., topics 48 and 49, Fig 1C ). We further assessed intertopical relationships by determining the cosine similarities between topics using cTf-Idfs ( Figs 2A and S3 ). In this topic network, some topics are closely related and form topic clusters. For example, topics 25, 26, and 27 collectively represent a more general topic related to the field of plant development (cluster a , lower left in Fig 2A ). Other topic clusters represent studies of stress, ion transport, and heavy metals ( b ); photosynthesis, water, and UV-B ( c ); population and community biology (d); genomics, genetic mapping, and phylogenetics ( e , upper right); and enzyme biochemistry ( f , upper left in Fig 2A ).
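
A minimal sketch of how such a topic network can be derived is given below. It computes cosine similarities between sparse term-weight vectors and keeps topic pairs at or above a threshold (0.6, as used for Fig 2A). The cTf-Idf values shown are invented for illustration, and the function names are hypothetical; this is not the code used in this study.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as term -> weight dicts."""
    shared = set(u) & set(v)
    dot = sum(u[t] * v[t] for t in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def topic_edges(vectors, threshold=0.6):
    """Return (topic_a, topic_b, similarity) for every pair at or above the threshold."""
    edges = []
    for a, b in combinations(sorted(vectors), 2):
        sim = cosine(vectors[a], vectors[b])
        if sim >= threshold:
            edges.append((a, b, sim))
    return edges


# Invented cTf-Idf weights for three topics (not values from S3 Data).
vectors = {
    26: {"arabidopsis": 0.7, "development": 0.6, "light": 0.4},
    27: {"arabidopsis": 0.6, "development": 0.6, "flowering": 0.4},
    61: {"qtl": 0.9, "wheat": 0.7},
}
edges = topic_edges(vectors)
for a, b, sim in edges:
    print(f"topic {a} -- topic {b}: {sim:.3f}")
```

Topics that share highly weighted terms are linked, whereas a topic with no terms in common with the others has a similarity of zero and remains an isolated node.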

(A) Graph depicting the degrees of similarity (edges) between topics (nodes). Between each topic pair, a cosine similarity value was calculated using the cTf-Idf values of all terms. A threshold similarity of 0.6 was applied to illustrate the most related topics. For the full matrix presented as a heatmap, see S4 Fig . The nodes are labeled with topic index numbers and the top 4–6 terms. The colors and width of the edges are defined based on cosine similarity. Example topic clusters are highlighted in yellow and labeled a through f (blue boxes). (B, C) Relationships between the cTf-Idf values (see S3 Data ) of the top terms for topics 26 and 27 (B) and for topics 25 and 27 (C) . Only terms with cTf-Idf ≥ 0.6 are labeled. Terms with cTf-Idf values beyond the x and y axis limit are indicated by pink arrows and cTf-Idf values. (D) The 2D representation in Fig 1C is partitioned into graphs for different years, and example plots for every 5-year period since 1975 are shown. Example topics discussed in the text are indicated. Blue arrows connect the areas occupied by records of example topics across time periods to indicate changes in document frequencies.

https://doi.org/10.1371/journal.pbio.3002612.g002

Topics differed in how well they were connected to each other, reflecting how general the research interests or needs are (see Materials and methods ). For example, topic 24 (stress mechanisms) is the most well connected with median cosine similarity = 0.36, potentially because researchers in many subfields consider aspects of plant stress even though it is not the focus. The least connected topics include topic 21 (clock biology, 0.12), which is surprising because of the importance of clocks in essentially all aspects of plant biology [ 18 ]. This may be attributed, in part, to the relatively recent attention in this area.
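
One simple way to summarize how well connected a topic is, assuming a precomputed topic-by-topic similarity matrix, is to take the median of its similarities to all other topics. The sketch below illustrates this idea with invented similarity values and hypothetical topic labels; the exact procedure used in this study is described in Materials and methods and may differ.

```python
from statistics import median


def median_similarity(sim):
    """Median similarity of each topic to all other topics (self-similarity excluded)."""
    return {
        topic: median(value for other, value in row.items() if other != topic)
        for topic, row in sim.items()
    }


# Invented symmetric cosine similarity matrix for four topics.
sim = {
    "stress": {"stress": 1.0, "clock": 0.15, "qtl": 0.40, "development": 0.35},
    "clock": {"stress": 0.15, "clock": 1.0, "qtl": 0.10, "development": 0.20},
    "qtl": {"stress": 0.40, "clock": 0.10, "qtl": 1.0, "development": 0.25},
    "development": {"stress": 0.35, "clock": 0.20, "qtl": 0.25, "development": 1.0},
}
connectedness = median_similarity(sim)
most_connected = max(connectedness, key=connectedness.get)
least_connected = min(connectedness, key=connectedness.get)
print(most_connected, least_connected)
```

The median is less sensitive than the mean to a single very close neighbor, so a topic scores highly only if it is moderately similar to many other topics.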

Examining topical relationships and the cTf-Idf values of terms also revealed how related topics differ. For example, topic 26 is closely related to topics 27 and 25 (cluster a on the lower left of Fig 2A ). Topics 26 and 27 both contain records of developmental process studies mainly in Arabidopsis ( Fig 2B ); however, topic 26 is focused on the impact of light, photoreceptors, and hormones such as gibberellic acids (ga) and brassinosteroids (br), whereas topic 27 is focused on flowering and floral development. Topic 25 is also focused on plant development but differs from topic 27 because it contains records of studies mainly focusing on signaling and auxin with less emphasis on Arabidopsis ( Fig 2C ). These examples also highlight the importance of using multiple top terms to represent the topics. The similarities in cTf-Idfs between topics were also useful for measuring the editorial scope (i.e., diverse, or narrow) of journals publishing plant science papers using a relative topic diversity measure (see Materials and methods ). For example, Proceedings of the National Academy of Sciences , USA has the highest diversity, while Theoretical and Applied Genetics has the lowest ( S4 Fig ). One surprise is the relatively low diversity of American Journal of Botany , which focuses on plant ecology, systematics, development, and genetics. The low diversity is likely due to the relatively larger number of cellular and molecular science records in PubMed, consistent with the identification of relatively few topical areas relevant to studies at the organismal, population, community, and ecosystem levels.

Investigation of the relative prevalence of topics over time reveals topical succession

We next asked whether relationships between topics reflect chronological progression of certain subfields. To address this, we assessed how prevalent topics were over time using dynamic topic modeling [ 19 ]. As shown in Fig 2D , there is substantial fluctuation in where the records are in the 2D space over time. For example, topic 44 (light, leaves, co, synthesis, photosynthesis) is among the topics that existed in 1975 but has diminished gradually since. In 1985, topic 39 (Agrobacterium-based transformation) became dense enough to be visualized. Additional examples include topics 79 (soil heavy metals), 42 (differential expression), and 82 (bacterial community metagenomics), which became prominent in approximately 2005, 2010, and 2020, respectively ( Fig 2D ). In addition, animating the document occupancy in the 2D space over time revealed a broad change in patterns over time: Some initially dense areas became sparse over time and a large number of topics in areas previously only loosely occupied at the turn of the century increased over time ( S5 Data ).

While the 2D representations reveal substantial details on the evolution of topics, comparison over time is challenging because the number of plant science records has grown exponentially ( Fig 1A ). To address this, the records were divided into 50 chronological bins each with approximately 8,400 records to make cross-bin comparisons feasible ( S6 Data ). We should emphasize that, because of the way the chronological bins were split, the number of records for each topic in each bin should be treated as a normalized value relative to all other topics during the same period. Examining this relative prevalence of topics across bins revealed a clear pattern of topic succession over time (one topic evolved into another) and the presence of 5 topical categories ( Fig 3 ). The topics were categorized based on their locally weighted scatterplot smoothing (LOWESS) fits and ordered according to timing of peak frequency ( S7 and S8 Data , see Materials and methods ). In Fig 3 , the relative decrease in document frequency does not mean that research output in a topic is dwindling. Because each row in the heatmap is normalized based on the minimum and maximum values within each topic, there still can be substantial research output in terms of numbers of publications even when the relative frequency is near zero. Thus, a reduced relative frequency of a topic reflects only a below-average growth rate compared with other topical areas.
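The binning and normalization steps can be sketched as follows. This is a minimal illustration, assuming that records are sorted by date and cut into bins of nearly equal record count, and that each topic's per-bin values are then rescaled with min-max normalization; the helper names and toy numbers are hypothetical.

```python
def chronological_bins(dates, n_bins):
    """Split dates, sorted chronologically, into n_bins bins of near-equal size."""
    ordered = sorted(dates)
    base, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)  # spread any remainder over early bins
        bins.append(ordered[start:start + size])
        start += size
    return bins


def min_max_normalize(values):
    """Rescale one topic's per-bin values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy publication years with growing output over time (hypothetical).
years = [1975] * 2 + [1990] * 4 + [2005] * 8 + [2020] * 16
bins = chronological_bins(years, n_bins=3)

# Hypothetical per-bin record counts for one topic.
normalized = min_max_normalize([120, 300, 210])
```

Here the first bin maps to 0.0 even though it contains 120 records, which illustrates why a relative frequency near zero does not imply an absence of research output.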

(A-E) A heat map of relative topic frequency over time reveals 5 topical categories: (A) stable, (B) early, (C) transitional, (D) sigmoidal, and (E) rising. The x axis denotes different time bins with each bin containing a similar number of documents to account for the exponential growth of plant science records over time. The sizes of all bins except the first are drawn to scale based on the beginning and end dates. The y axis lists different topics denoted by the label and top 4 to 5 terms. In each cell, the prevalence of a topic in a time bin is colored according to the min-max normalized cTf-Idf values for that topic. Light blue dotted lines delineate different decades. The arrows left of a subset of topic labels indicate example relationships between topics in topic clusters. Blue boxes with labels a–f indicate topic clusters, which are the same as those in Fig 2 . Connecting lines indicate successional trends. Yellow circles/lines 1 – 3: 3 major transition patterns. The original data are in S5 Data .

https://doi.org/10.1371/journal.pbio.3002612.g003

The first topical category is a stable category with 7 topics mostly established before the 1980s that have since remained stable in terms of prevalence in the plant science records (top of Fig 3A ). These topics represent long-standing plant science research foci, including studies of plant physiology (topics 4, 58, and 81), genetics (topic 61), and medicinal plants (topic 53). The second category contains 8 topics established before the 1980s that have mostly decreased in prevalence since (the early category, Fig 3B ). Two examples are physiological and morphological studies of hormone action (topic 45, the second in the early category) and the characterization of protein, DNA, and RNA (topic 18, the second to last). Unlike other early topics, topic 78 (paleobotany and plant evolution studies, the last topic in Fig 3B ) experienced a resurgence in the early 2000s due to the development of new approaches and databases and changes in research foci [ 20 ].

The 33 topics in the third, transitional category became prominent in the 1980s, 1990s, or even 2000s but have clearly decreased in prevalence ( Fig 3C ). In some cases, the early and the transitional topics became less prevalent because of topical succession: refocusing of earlier topics led to newer ones that either show no clear sign of decrease (the sigmoidal category, Fig 3D ) or continue to increase in prevalence (the rising category, Fig 3E ). Consistent with the notion of topical succession, topics within each topic cluster ( Fig 2 ) were found across topic categories and/or were prominent at different time periods (indicated by colored lines linking topics, Fig 3 ). One example is topics in topic cluster b (connected with light green lines and arrows, compare Figs 2 and 3 ); the study of cation transport (topic 47, the third in the transitional category), prominent in the 1980s and early 1990s, is connected to 5 other topics, namely, another transitional topic 29 (cation channels and their expression) peaking in the 2000s and early 2010s, sigmoidal topics 24 and 28 (stress response, tolerance mechanisms) and 30 (heavy metal transport), which rose to prominence in the mid-2000s, and the rising topic 42 (stress transcriptomic studies), which increased in prevalence in the mid-2010s.

The rise and fall of topics can be due to a combination of technological or conceptual breakthroughs, maturity of the field, funding constraints, or publicity. The study of transposable elements (topic 62) illustrates the effect of publicity; the rise in this field coincided with Barbara McClintock’s 1983 Nobel Prize but not with the publication of her studies in the 1950s [ 21 ]. The reduced prevalence in the early 2000s likely occurred in part because analysis of transposons became a central component of genome sequencing and annotation studies, rather than the subject of dedicated studies. In addition, this example indicates that our approaches, while capable of capturing topical trends, cannot be used to directly infer major papers leading to the growth of a topic.

Three major topical transition patterns signify shifts in research trends

Beyond the succession of specific topics, 3 major transitions in the dynamic topic graph should be emphasized: (1) the relative decreasing trend of early topics in the late 1970s and early 1980s; (2) the rise of transitional topics in the late 1980s; and (3) the relative decreasing trend of transitional topics in the late 1990s and early 2000s, which coincided with a radiation of sigmoidal and rising topics (yellow circles, Fig 3 ). The large numbers of topics involved in these transitions suggest major shifts in plant science research. In transition 1, early topics decreased in relative prevalence in the late 1970s to early 1980s, which coincided with the rise of transitional topics over the following decades (circle 1, Fig 3 ). For example, there was a shift from the study of purified proteins such as enzymes (early topic 48, S5A Fig ) to molecular genetic dissection of genes, proteins, and RNA (transitional topic 35, S5B Fig ) enabled by the wider adoption of recombinant DNA and molecular cloning technologies in the late 1970s [ 22 ]. Transition 2 (circle 2, Fig 3 ) can be explained by the following breakthroughs in the late 1980s: better approaches to create transgenic plants and insertional mutants [ 23 ], more efficient creation of mutant plant libraries through chemical mutagenesis (e.g., [ 24 ]), and availability of gene reporter systems such as β-glucuronidase [ 25 ]. Because of these breakthroughs, molecular genetics studies shifted away from understanding the basic machinery to understanding the molecular underpinnings of specific processes, such as molecular mechanisms of flower and meristem development and the action of hormones such as auxin (topic 27, S5C Fig ); this type of research was discussed as a future trend in 1988 [ 26 ] and remains prevalent to this date. Another example is gene silencing (topic 12), which became a focal area of study along with the widespread use of transgenic plants [ 27 ].

Transition 3 is the most drastic: A large number of transitional, sigmoidal, and rising topics became prevalent nearly simultaneously at the turn of the century (circle 3, Fig 3 ). This period also coincides with a rapid increase in plant science citations ( Fig 1A ). The most notable breakthroughs included the availability of the first plant genome in 2000 [ 28 ], increasing ease and reduced cost of high-throughput sequencing [ 29 ], development of new mass spectrometry–based platforms for analyzing proteins [ 30 ], and advancements in microscopic and optical imaging approaches [ 31 ]. Advances in genomics and omics technology also led to an increase in stress transcriptomics studies (42, S5D Fig ) as well as studies in many other topics such as epigenetics (topic 11), noncoding RNA analysis (13), genomics and phylogenetics (80), breeding (41), genome sequencing and assembly (60), gene family analysis (23), and metagenomics (82 and 55).

In addition to the 3 major transitions across all topics, there were also transitions within topics revealed by examining the top terms for different time bins (heatmaps, S5 Fig ). Taken together, these observations demonstrate that knowledge about topical evolution can be readily revealed through topic modeling. Such knowledge is typically only available to experts in specific areas and is difficult to summarize manually, as no researcher has a command of the entire plant science literature.

Analysis of taxa studied reveals changes in research trends

Changes in research trends can also be illustrated by examining changes in the taxa being studied over time ( S9 Data ). There is a strong bias in the taxa studied, with the records dominated by research models and economically important taxa ( S6 Fig ). Flowering plants (Magnoliopsida) are found in 93% of records ( S6A Fig ), and the mustard family Brassicaceae dominates at the family level ( S6B Fig ) because the genus Arabidopsis accounts for 13% of plant science records ( Fig 4A ). When examining the prevalence of taxa being studied over time, clear patterns of turnover emerged similar to topical succession ( Figs 4B , S6C, and S6D ; Materials and methods ). Given that Arabidopsis is mentioned in more publications than other species we analyzed, we further examined the trends for Arabidopsis publications. The increase in the normalized number (i.e., relative to the entire plant science corpus) of Arabidopsis records coincided with advocacy of its use as a model system in the late 1980s [ 32 ]. While it remains a major plant model, there has been a decrease in overall Arabidopsis publications relative to all other plant science publications since 2011 (blue line, normalized total, Fig 4C ). Because the same chronological bins, each with the same number of records, from the topic-over-time analysis ( Fig 3 ) were used, the decrease here does not mean that there were fewer Arabidopsis publications; in fact, the number of Arabidopsis papers has remained steady since 2011. This decrease means that Arabidopsis-related publications represent a relatively smaller proportion of plant science records. Interestingly, this decrease took place much earlier (approximately 2005) and was steeper in the United States (red line, Fig 4C ) than in all countries combined (blue line, Fig 4C ).

(A) Percentage of records mentioning specific genera. (B) Change in the prevalence of genera in plant science records over time. (C) Changes in the normalized numbers of all records (blue) and records from the US (red) mentioning Arabidopsis over time. The lines are LOWESS fits with fraction parameter = 0.2. (D) Topical over (red) and under (blue) representation among 5 genera with the most plant science records. LLR: log 2 likelihood ratios of each topic in each genus. Gray: topic-species combination not significantly enriched at the 5% level based on enrichment p -values adjusted for multiple testing with the Benjamini–Hochberg method [ 33 ]. The data used for plotting are in S9 Data . The statistics for all topics are in S10 Data .

https://doi.org/10.1371/journal.pbio.3002612.g004

Assuming that the normalized number of publications reflects the relative intensity of research activities, one hypothesis for the relative decrease in focus on Arabidopsis is that advances in, for example, plant transformation, genetic manipulation, and genome research have allowed the adoption of more previously nonmodel taxa. Consistent with this, there was a precipitous increase in the number of genera being published in the mid-90s to early 2000s during which approaches for plant transgenics became established [ 34 ], but the number has remained steady since then ( S7A Fig ). The decrease in the proportion of Arabidopsis papers is also negatively correlated with the timing of an increase in the number of draft genomes ( S7B Fig and S9 Data ). It is plausible that genome availability for other species may have contributed to a shift away from Arabidopsis. Strikingly, when we analyzed US National Science Foundation records, we found that the numbers of funded grants mentioning Arabidopsis ( S7C Fig ) have risen and fallen in near perfect synchrony with the normalized number of Arabidopsis publication records (red line, Fig 4C ). This finding likely illustrates the impact of funding on Arabidopsis research.

By considering both taxa information and research topics, we can identify clear differences in the topical areas preferred by researchers using different plant taxa ( Fig 4D and S10 Data ). For example, studies of auxin/light signaling, the circadian clock, and flowering tend to be carried out in Arabidopsis, while quantitative genetic studies of disease resistance tend to be done in wheat and rice, glyphosate research in soybean, and RNA virus research in tobacco. Taken together, joint analyses of topics and species revealed additional details about changes in preferred models over time, and the preferred topical areas for different taxa.

Countries differ in their contributions to plant science and topical preference

We next investigated whether there were geographical differences in topical preference among countries by inferring country information from 330,187 records (see Materials and methods ). The 10 countries with the most records account for 73% of the total, with China and the US each contributing approximately 18% ( Fig 5A ). The exponential growth in plant science records (green line, Fig 1A ) was in large part due to the rapid rise in annual record numbers in China and India ( Fig 5B ). When we examined the publication growth rates using the top 17 plant science journals, the general patterns remained the same ( S7D Fig ). On the other hand, the US, Japan, Germany, France, and Great Britain had slower rates of growth compared with all non-top 10 countries. The rapid increase in records from China and India was accompanied by a rapid increase in metrics measuring journal impact ( Figs 5C and S8 and S9 Data ). For example, using citation score ( Fig 5C , see Materials and methods ), we found that during a 22-year period China (dark green) and India (light green) rapidly approached the global average (y = 0, yellow), whereas some of the other top 10 countries, particularly the US (red) and Japan (yellow green), showed signs of decrease ( Fig 5C ). It remains to be determined whether these geographical trends reflect changes in priority, investment, and/or interest in plant science research.

(A) Numbers of plant science records for countries with the 10 highest numbers. (B) Percentage of all records from each of the top 10 countries from 1980 to 2020. (C) Difference in citation scores from 1999 to 2020 for the top 10 countries. (D) Shown for each country is the relationship between the citation scores averaged from 1999 to 2020 and the slope of linear fit with year as the predictive variable and citation score as the response variable. The countries with >400 records and with <10% missing impact values are included. Data used for plots (A–D) are in S11 Data . (E) Correlation in topic enrichment scores between the top 10 countries. PCC, Pearson’s correlation coefficient, positive in red, negative in blue. Yellow rectangle: countries with more similar topical preferences. (F) Enrichment scores (LLR, log likelihood ratio) of selected topics among the top 10 countries. Red: overrepresentation, blue: underrepresentation. Gray: topic-country combination that is not significantly enriched at the 5% level based on enrichment p -values adjusted for multiple testing with the Benjamini–Hochberg method (for all topics and plotting data, see S12 Data ).

https://doi.org/10.1371/journal.pbio.3002612.g005

Interestingly, the relative growth/decline in citation scores over time (measured as the slope of linear fit of year versus citation score) was significantly and negatively correlated with average citation score ( Fig 5D ); i.e., countries with lower overall metrics tended to experience the strongest increase in citation scores over time. Thus, countries that did not originally have a strong influence on plant sciences now have increased impact. These patterns were also observed when using H-index or journal rank as metrics ( S8 Fig and S11 Data ) and were not due to increased publication volume, as the metrics were normalized against numbers of records from each country (see Materials and methods ). In addition, the fact that different metrics with different caveats and assumptions yielded consistent conclusions indicates the robustness of our observations. We hypothesize that this may be a consequence of easier scientific communication among geographically isolated research groups, of the prevalence of open-access online journals, which makes scientific information more readily accessible, or of increasing international collaboration. In any case, the causes for such regression toward the mean are not immediately clear and should be addressed in future studies.
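The relationship between the slope of the citation score over time and the average citation score can be sketched with ordinary least squares and Pearson's correlation coefficient. The sketch below is illustrative only: the country labels and scores are hypothetical, and the study's own fitting procedure may differ in detail.

```python
import math
from statistics import mean


def linear_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical yearly citation scores (relative to a global average of 0).
years = [1999, 2000, 2001, 2002]
scores = {
    "country_A": [-1.0, -0.8, -0.6, -0.4],  # low average, rising
    "country_B": [0.0, 0.05, 0.1, 0.15],    # near average, slowly rising
    "country_C": [0.6, 0.55, 0.5, 0.45],    # high average, declining
}

averages = [mean(s) for s in scores.values()]
slopes = [linear_slope(years, s) for s in scores.values()]
r = pearson(averages, slopes)
```

With these toy values the correlation is negative: the country with the lowest average score has the steepest increase, which is the pattern described for Fig 5D.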

We also assessed how the plant research foci of countries differ by comparing topical preference (i.e., the degree of enrichment of plant science records in different topics) between countries. For example, Italy and Spain cluster together (yellow rectangle, Fig 5E ) partly because of similar research focusing on allergens (topic 0) and mycotoxins (topic 54) and less emphasis on gene family (topic 23) and stress tolerance (topic 28) studies ( Fig 5F , for the fold enrichment and corrected p -values of all topics, see S12 Data ). There are substantial differences in topical focus between countries ( S9 Fig ). For example, research on new plant compounds associated with herbal medicine (topic 69) is a focus in China but not in the US, but the opposite is true for population genetics and evolution (topic 86) ( Fig 5F ). In addition to revealing how plant science research has evolved over time, topic modeling provides additional insights into differences in research foci among different countries, which are informative for science policy considerations.
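The enrichment p-values referred to above were adjusted with the Benjamini–Hochberg method. A minimal sketch of the standard step-up adjustment is given below; the function name and the example p-values are hypothetical and do not come from S12 Data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for 4 topic-country combinations.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
```

In this toy case only the first combination remains significant at the 5% level after adjustment; the others would be shown in gray in a plot such as Fig 5F.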

In this study, topic modeling revealed clear transitions among research topics, which represent shifts in research trends in plant sciences. One limitation of our study is the bias in the PubMed-based corpus. The cellular, molecular, and physiological aspects of plant sciences are well represented, but there are many fewer records related to evolution, ecology, and systematics. Our use of titles/abstracts from the top 17 plant science journals as positive examples allowed us to identify papers we typically see in these journals, but this may have led to us missing “outlier” articles, which may be the most exciting. Another limitation is the need to assign only one topic to a record even when a study is interdisciplinary and straddles multiple topics. Furthermore, a limited number of large, inherently heterogeneous topics were summarized to provide a more concise interpretation, which undoubtedly underrepresents the diversity of plant science research. Despite these limitations, dynamic topic modeling revealed changes in plant science research trends that coincide with major shifts in biological science. While we were interested in identifying conceptual advances, our approach can identify trends but not their underlying causes; in particular, the key records leading to the growth of certain topics still need to be identified. It also remains to be determined which changes in research trends lead to paradigm shifts as defined by Kuhn [ 35 ].

The key terms defining the topics frequently describe various technologies (e.g., topic 38/39: transformation, 40: genome editing, 59: genetic markers, 65: mass spectrometry, 69: nuclear magnetic resonance) or are indicative of studies enabled through molecular genetics and omics technologies (e.g., topic 8/60: genome, 11: epigenetic modifications, 18: molecular biological studies of macromolecules, 13: small RNAs, 61: quantitative genetics, 82/84: metagenomics). Thus, this analysis highlights how technological innovation, particularly in the realm of omics, has contributed to a substantial number of research topics in the plant sciences, a finding that likely holds for other scientific disciplines. We also found that the pattern of topic evolution is similar to that of succession, where older topics have mostly decreased in relative prevalence but appear to have been superseded by newer ones. One example is the rise of transcriptome-related topics and the correlated, reduced focus on regulation at levels other than transcription. This raises the question of whether research driven by technology negatively impacts other areas of research where high-throughput studies remain challenging.

One observation on the overall trends in plant science research is the approximately 10-year cycle in major shifts. One hypothesis is that this cycle reflects not only scientific advances but also the fashion-driven aspect of science. Nonetheless, given that there were only 3 major shifts and the sample size is small, it is difficult to speculate as to why they happened. By analyzing the country of origin, we found that China and India have been the 2 major contributors to the growth in the plant science records in the last 20 years. Our findings also show an equalizing trend in global plant science where countries without a strong plant science publication presence have had an increased impact over the last 20 years. In addition, we identified significant differences in research topics between countries reflecting potential differences in investment and priorities. Such information is important for discerning differences in research trends across countries and can be considered when making policy decisions about research directions.

Materials and methods

Collection and preprocessing of a candidate plant science corpus

For reproducibility purposes, a random state value of 20220609 was used throughout the study. The PubMed baseline files containing citation information ( ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/ ) were downloaded on November 11, 2021. To narrow down the records to plant science-related citations, a candidate citation was identified as having, within the titles and/or abstracts, at least one of the following words: “plant,” “plants,” “botany,” “botanical,” “planta,” and “plantarum” (and their corresponding upper case and plural forms), or plant taxon identifiers from NCBI Taxonomy ( https://www.ncbi.nlm.nih.gov/taxonomy ) or USDA PLANTS Database ( https://plants.sc.egov.usda.gov/home ). Note that the search terms used here have nothing to do with the values of the keyword field in PubMed records. The taxon identifiers include all taxon names at and below the taxonomic level of “Viridiplantae” down to the genus level (species names were not used). This led to 51,395 search terms. After looking for the search terms, qualified entries were removed if they were duplicated, lacked titles and/or abstracts, or were corrections, errata, or withdrawn articles. This left 1,385,417 citations, which were considered the candidate plant science corpus (i.e., a collection of texts). For further analysis, the title and abstract for each citation were combined into a single entry. Text was preprocessed by lowercasing, removing stop-words (i.e., common words), and removing non-alphanumeric and non-white space characters (except Greek letters, dashes, and commas). Lemmatization (i.e., grouping inflected forms of a word as a single word) was also applied for comparison, but because it led to truncated scientific terms, it was not included in the final preprocessing pipeline.
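The candidate search and the final preprocessing steps can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the stop-word list is a short hypothetical stand-in, taxon names are passed in as a plain set, and the Greek-letter range is an assumption about which characters were retained.

```python
import re

KEYWORDS = {"plant", "plants", "botany", "botanical", "planta", "plantarum"}
STOP_WORDS = {"the", "of", "in", "a", "and", "is", "to"}  # hypothetical stand-in


def is_candidate(title, abstract, taxon_names=frozenset()):
    """True if the title/abstract contains a keyword or taxon name as a whole word."""
    text = f"{title} {abstract}".lower()
    tokens = set(re.findall(r"[a-z]+", text))
    search_terms = KEYWORDS | {name.lower() for name in taxon_names}
    return bool(tokens & search_terms)


def preprocess(text):
    """Lowercase, strip disallowed characters, and drop stop-words (no lemmatization)."""
    text = text.lower()
    # Keep alphanumerics, white space, Greek letters (assumed U+0370-U+03FF),
    # dashes, and commas; replace everything else with a space.
    text = re.sub(r"[^a-z0-9\s\u0370-\u03ff,\-]", " ", text)
    return " ".join(tok for tok in text.split() if tok not in STOP_WORDS)


cleaned = preprocess("The β-glucuronidase (GUS) reporter in Arabidopsis!")
```

A title such as "Power plant emissions" passes the keyword search in this sketch, which illustrates why the keyword-based candidate set contains false positives and needs a subsequent classification step.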

Definition of positive/negative examples

Upon closer examination, a large number of false positives were identified in the candidate plant science records. To further narrow down citations with a plant science focus, text classification was used to distinguish plant science and non-plant science articles (see next section). For the classification task, a negative set (i.e., non-plant science citations) was defined as entries from 7,360 journals that appeared <20 times in the filtered data (total = 43,329, journal candidate count, S1 Data ). For the positive examples (i.e., true plant science citations), 43,329 plant science citations (positive examples) were sampled from 17 established plant science journals each with >2,000 entries in the filtered dataset: “Plant physiology,” “Frontiers in plant science,” “Planta,” “The Plant journal: for cell and molecular biology,” “Journal of experimental botany,” “Plant molecular biology,” “The New phytologist,” “The Plant cell,” “Phytochemistry,” “Plant & cell physiology,” “American journal of botany,” “Annals of botany,” “BMC plant biology,” “Tree physiology,” “Molecular plant-microbe interactions: MPMI,” “Plant biology,” and “Plant biotechnology journal” (journal candidate count, S1 Data ). Plant biotechnology journal was included, but only 1,894 records remained after removal of duplicates, articles with missing info, and/or withdrawn articles. The positive and negative sets were randomly split into training and testing subsets (4:1) while maintaining a 1:1 positive-to-negative ratio.
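The 4:1 split with a 1:1 positive-to-negative ratio can be sketched as below. The sketch assumes that both classes are first sampled down to the same size and then split separately; the function name and toy records are hypothetical, and only the random state value is taken from the text.

```python
import random


def balanced_split(positives, negatives, test_fraction=0.2, seed=20220609):
    """Split two classes into training/testing sets (4:1 by default), keeping them 1:1."""
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    # Sampling without replacement also shuffles the records.
    pos = rng.sample(positives, n)
    neg = rng.sample(negatives, n)
    n_test = round(n * test_fraction)
    test = [(x, 1) for x in pos[:n_test]] + [(x, 0) for x in neg[:n_test]]
    train = [(x, 1) for x in pos[n_test:]] + [(x, 0) for x in neg[n_test:]]
    return train, test


# Toy records standing in for citations (hypothetical).
toy_pos = [f"p{i}" for i in range(50)]
toy_neg = [f"n{i}" for i in range(50)]
train_set, test_set = balanced_split(toy_pos, toy_neg)
```

Splitting each class separately guarantees that both subsets retain the 1:1 ratio, so an F1 of 0.5 corresponds to random guessing.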

Text classification based on Tf and Tf-Idf

Instead of using the preprocessed text as features for building classification models directly, text embeddings (i.e., representations of texts in vectors) were used as features. These embeddings were generated using 4 approaches (model summary, S1 Data ): Term-frequency (Tf), Tf-Idf [ 36 ], Word2Vec [ 37 ], and BERT [ 6 ]. The Tf- and Tf-Idf-based features were generated with CountVectorizer and TfidfVectorizer, respectively, from Scikit-Learn [ 38 ]. Different maximum features (1e4 to 1e5) and n-gram ranges (uni-, bi-, and tri-grams) were tested. The features were selected based on the p- value of chi-squared tests testing whether a feature had a higher-than-expected value among the positive or negative classes. Four different p- value thresholds were tested for feature selection. The selected features were then used to retrain vectorizers with the preprocessed training texts to generate feature values for classification. The classification model used was XGBoost [ 39 ] with 5 combinations of the following hyperparameters tested during 5-fold stratified cross-validation: min_child_weight = (1, 5, 10), gamma = (0.5, 1, 1.5, 2.5), subsample = (0.6, 0.8, 1.0), colsample_bytree = (0.6, 0.8, 1.0), and max_depth = (3, 4, 5). The rest of the hyperparameters were held constant: learning_rate = 0.2, n_estimators = 600, objective = binary:logistic. RandomizedSearchCV from Scikit-Learn was used for hyperparameter tuning and cross-validation with scoring = F1-score.
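To make the size of the hyperparameter search concrete, the sketch below enumerates the grid listed above and draws 5 combinations from it without replacement. In the study this sampling was handled by RandomizedSearchCV; the standard-library code here only mimics the sampling step, fits no model, and uses hypothetical helper names.

```python
import itertools
import random

# Values tested for each tuned hyperparameter (from the text above).
SEARCH_SPACE = {
    "min_child_weight": [1, 5, 10],
    "gamma": [0.5, 1, 1.5, 2.5],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
    "max_depth": [3, 4, 5],
}
# Hyperparameters held constant (from the text above).
FIXED = {"learning_rate": 0.2, "n_estimators": 600, "objective": "binary:logistic"}


def all_combinations(space):
    """Enumerate every combination in the search space as a dict."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in itertools.product(*space.values())]


def sample_combinations(space, n_iter, seed):
    """Draw n_iter distinct combinations at random."""
    rng = random.Random(seed)
    return rng.sample(all_combinations(space), n_iter)


grid = all_combinations(SEARCH_SPACE)
candidates = [
    {**FIXED, **combo} for combo in sample_combinations(SEARCH_SPACE, 5, 20220609)
]
```

The full grid contains 3 × 4 × 3 × 3 × 3 = 324 combinations, so testing 5 of them with 5-fold cross-validation requires 25 model fits rather than 1,620.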

Because the Tf-Idf model had a relatively high model performance and was relatively easy to interpret (terms are frequency-based, instead of embedding-based like those generated by Word2Vec and BERT), the Tf-Idf model was selected as input to SHapley Additive exPlanations (SHAP; [ 14 ]) to assess the importance of terms. Because the Tf-Idf model was based on XGBoost, a tree-based algorithm, the TreeExplainer module in SHAP was used to determine a SHAP value for each entry in the training dataset for each Tf-Idf feature. The SHAP value indicates the degree to which a feature positively or negatively affects the underlying prediction. The importance of a Tf-Idf feature was calculated as the average SHAP value of that feature among all instances. Because a Tf-Idf feature is generated based on a specific term, the importance of the Tf-Idf feature indicates the importance of the associated term.

Text classification based on Word2Vec

The preprocessed texts were first split into train, validation, and test subsets (8:1:1). The texts in each subset were converted to 3 n-gram lists: a unigram list obtained by splitting tokens based on the space character, or bi- and tri-gram lists built with Gensim [ 40 ]. Each n-gram list of the training subset was next used to fit a Skip-gram Word2Vec model with vector_size = 300, window = 8, min_count = (5, 10, or 20), sg = 1, and epochs = 30. The Word2Vec model was used to generate word embeddings for train, validate, and test subsets. In the meantime, a tokenizer was trained with train subset unigrams using Tensorflow [ 41 ] and used to tokenize texts in each subset and turn each token into indices to use as features for training text classification models. To ensure all citations had the same number of features (500), longer texts were truncated, and shorter ones were zero-padded. A deep learning model was used to train a text classifier with an input layer the same size as the feature number, an attention layer incorporating embedding information for each feature, 2 bidirectional Long-Short-Term-Memory layers (15 units each), a dense layer (64 units), and a final, output layer with 2 units. During training, adam, accuracy, and sparse_categorical_crossentropy were used as the optimizer, evaluation metric, and loss function, respectively. The training process lasted 30 epochs with early stopping if validation loss did not improve in 5 epochs. An F1 score was calculated for each n-gram list and min_count parameter combination to select the best model (model summary, S1 Data ).
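The step that gives every citation the same number of features can be sketched as below. The text does not state which end of a sequence was truncated or padded, so this sketch assumes the trailing end; the function name is hypothetical.

```python
def pad_or_truncate(token_ids, length=500, pad_value=0):
    """Return a list of exactly `length` token indices.

    Assumption: longer inputs lose their trailing tokens and shorter inputs
    are zero-padded at the end.
    """
    if len(token_ids) >= length:
        return token_ids[:length]
    return token_ids + [pad_value] * (length - len(token_ids))


long_text = pad_or_truncate(list(range(1, 601)))  # 600 tokens in, 500 out
short_text = pad_or_truncate([7, 8], length=5)
```

Reserving index 0 for padding lets downstream layers distinguish real tokens from filler, provided the tokenizer never assigns 0 to a word.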

Text classification based on BERT models

Two pretrained models were used for BERT-based classification: DistilBERT (Hugging face repository [ 42 ] model name and version: distilbert-base-uncased [ 43 ]) and SciBERT (allenai/scibert-scivocab-uncased [ 16 ]). In both cases, tokenizers were retrained with the training data. BERT-based models had the following architecture: the token indices (512 values for each token) and associated masked values as input layers, pretrained BERT layer (512 × 768) excluding outputs, a 1D pooling layer (768 units), a dense layer (64 units), and an output layer (2 units). The rest of the training parameters were the same as those for Word2Vec-based models, except training lasted for 20 epochs. Cross-validation F1-scores for all models were compared and used to select the best model for each feature extraction method, hyperparameter combination, and modeling algorithm or architecture (model summary, S1 Data ). The best model was the Word2Vec-based model (min_count = 20, window = 8, ngram = 3), which was applied to the candidate plant science corpus to identify a set of plant science citations for further analysis. The candidate plant science records predicted as being in the positive class (421,658) by the model were collectively referred to as the “plant science corpus.”

Plant science record classification

In PubMed, 1,384,718 citations containing “plant” or any plant taxon names (from the phylum to genus level) were considered candidate plant science citations. To further distinguish plant science citations from those in other fields, text classification models were trained using titles and abstracts of positive examples consisting of citations from 17 plant science journals, each with >2,000 entries in PubMed, and negative examples consisting of records from journals with fewer than 20 entries in the candidate set. Among the 4 models tested, the best model (built with Word2Vec embeddings) had a cross-validation F1 of 0.964 (random guess F1 = 0.5, perfect model F1 = 1, S1 Data ). When the model was tested using 17,330 testing set citations independent from the training set, the F1 remained high at 0.961.

We also conducted another analysis attempting to use the MeSH term “Plants” as a benchmark. Records with the MeSH term “Plants” also include pharmaceutical studies of plants and plant metabolites or immunological studies of plants as allergens in journals that are not generally considered plant science journals (e.g., Acta astronautica , International journal for parasitology , Journal of chromatography ) or journals from local scientific societies (e.g., Acta pharmaceutica Hungarica , Huan jing ke xue , Izvestiia Akademii nauk . Seriia biologicheskaia ). Because we explicitly labeled papers from such journals as negative examples, we focused on 4,004 records with the “Plants” MeSH term published in the 17 plant science journals that were used as positive instances and found that 88.3% were predicted as the positive class. Thus, based on the MeSH term, there is an 11.7% false prediction rate.

We also enlisted 5 plant science colleagues (3 advanced graduate students in plant biology and genetic/genome science graduate programs, 1 postdoctoral breeder/quantitative biologist, and 1 postdoctoral biochemist/geneticist) to annotate 100 randomly selected abstracts, as a reviewer suggested. Each record was annotated by 2 colleagues. Among the 85 entries where the annotations were consistent between annotators, 48 were annotated as negative, of which 7 were predicted as positive (false positive rate = 14.6%), and 37 were annotated as positive, of which 4 were predicted as negative (false negative rate = 10.8%). To further benchmark the performance of the text classification model, we identified another 12 journals that focus on plant science studies to use as benchmarks: Current opinion in plant biology (number of articles: 1,806), Trends in plant science (1,723), Functional plant biology (1,717), Molecular plant pathology (1,573), Molecular plant (1,141), Journal of integrative plant biology (1,092), Journal of plant research (1,032), Physiology and molecular biology of plants (830), Nature plants (538), The plant pathology journal (443), Annual review of plant biology (417), and The plant genome (321). Among the 12,611 candidate plant science records, 11,386 were predicted as positive. Thus, there is a 9.9% false negative rate.
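The manual-annotation error rates quoted above follow directly from the reported counts. The short sketch below only reproduces that arithmetic; the helper name `error_rate` is illustrative and not part of the original analysis.

```python
def error_rate(n_wrong, n_total):
    """Percentage of records in a class that the model predicted incorrectly."""
    return 100.0 * n_wrong / n_total


# 48 records annotated as negative, 7 of them predicted as positive
false_positive_rate = error_rate(7, 48)
# 37 records annotated as positive, 4 of them predicted as negative
false_negative_rate = error_rate(4, 37)

print(f"{false_positive_rate:.1f}% {false_negative_rate:.1f}%")  # prints "14.6% 10.8%"
```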

Global topic modeling

BERTopic [ 15 ] was used for preliminary topic modeling with n-grams = (1,2) and with an embedding initially generated by DistilBERT, SciBERT, or BioBERT (dmis-lab/biobert-base-cased-v1.2; [ 44 ]). The embedding models converted preprocessed texts to embeddings. The topics generated based on the 3 embeddings were similar ( S2 Data ). However, SciBERT-, BioBERT-, and DistilBERT-based embedding models had different numbers of outlier records (268,848, 293,790, and 323,876, respectively) with topic index = −1. In addition to generating the fewest outliers, the SciBERT-based model led to the highest number of topics. Therefore, SciBERT was chosen as the embedding model for the final round of topic modeling. Modeling consisted of 3 steps. First, document embeddings were generated with SentenceTransformer [ 45 ]. Second, a clustering model to aggregate documents into clusters using hdbscan [ 46 ] was initialized with min_cluster_size = 500, metric = euclidean, cluster_selection_method = eom, min_samples = 5. Third, the embedding and the initialized hdbscan model were used in BERTopic to model topics with neighbors = 10, nr_topics = 500, ngram_range = (1,2). Using these parameters, 90 topics were identified. The initial topic assignments were conservative, and 241,567 records were considered outliers (i.e., documents not assigned to any of the 90 topics). After assessing the prediction scores of all records generated from the fitted topic models, the 95th-percentile score was 0.0155. This score was used as the threshold for assigning outliers to topics: If the maximum prediction score was above the threshold and this maximum score was for topic t , then the outlier was assigned to t . After the reassignment, 49,228 records remained outliers. To assess if some of the outliers were not assigned because they could be assigned to multiple topics, the prediction scores of the records were used to put records into 100 clusters using k-means. Each cluster was then assessed to determine if the outlier records in a cluster tended to have higher prediction scores across multiple topics ( S2 Fig ).
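The outlier reassignment rule can be illustrated with a small sketch. It assumes each record carries one prediction score per topic; the function name `reassign_outliers` and the toy scores are hypothetical and are not taken from the original code.

```python
def reassign_outliers(topics, scores, threshold):
    """Assign outliers (topic -1) to their best-scoring topic when the
    maximum prediction score exceeds `threshold`.

    topics: one topic index per record, with -1 marking an outlier.
    scores: one list of per-topic prediction scores per record.
    """
    updated = []
    for topic, record_scores in zip(topics, scores):
        if topic == -1:
            best = max(range(len(record_scores)), key=record_scores.__getitem__)
            if record_scores[best] > threshold:
                topic = best
        updated.append(topic)
    return updated


topics = [-1, -1, 2]
scores = [
    [0.010, 0.020, 0.005],  # outlier whose best score (topic 1) passes the threshold
    [0.001, 0.002, 0.003],  # outlier with no score above the threshold
    [0.000, 0.000, 0.900],  # already assigned to topic 2, left unchanged
]
print(reassign_outliers(topics, scores, threshold=0.0155))  # prints [1, -1, 2]
```

Records whose scores are spread thinly across several topics stay at -1 under this rule, which is the situation the k-means analysis of outlier clusters was meant to examine.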

Topics that are most and least well connected to other topics

The most well-connected topics in the network include topic 24 (stress mechanisms, median cosine similarity = 0.36), topic 42 (genes, stress, and transcriptomes, 0.34), and topic 35 (molecular genetics, 0.32; all t test p-values < 1 × 10^−22). The least connected topics include topic 0 (allergen research, median cosine similarity = 0.12), topic 21 (clock biology, 0.12), topic 1 (tissue culture, 0.15), and topic 69 (identification of compounds with spectroscopic methods, 0.15; all t test p-values < 1 × 10^−24). Topics 0, 1, and 69 are specialized topics; it is surprising that topic 21 is not as well connected, as explained in the main text.

Analysis of documents based on the topic model

Topical diversity among top journals with the most plant science records

Using a relative topic diversity measure (ranging from 0 to 10), we found that there was a wide range of topical diversity among the 20 journals with the largest numbers of plant science records ( S3 Fig ). The 4 journals with the highest relative topical diversities are Proceedings of the National Academy of Sciences, USA (9.6), Scientific Reports (7.1), Plant Physiology (6.7), and PLOS ONE (6.4). The high diversities are consistent with the broad editorial scopes of these journals. The 4 journals with the lowest diversities are American Journal of Botany (1.6), Oecologia (0.7), Plant Disease (0.7), and Theoretical and Applied Genetics (0.3), which reflects their discipline-specific focus and audience of classical botanists, ecologists, plant pathologists, and specific groups of geneticists.

Dynamic topic modeling

The code for dynamic modeling was based on _topic_over_time.py in BERTopic and modified to allow additional outputs for debugging and graphing purposes. The plant science citations were binned into 50 subsets chronologically (for timestamps of bins, see S5 Data ). Because the numbers of documents increased exponentially over time, dividing them into equal-sized time intervals would leave far fewer records at earlier time points and introduce bias; we therefore divided them into time bins of similar size (approximately 8,400 documents). Thus, the earlier time subsets had larger time spans compared with later time subsets. Prior to binning the subsets, the publication dates were converted to UNIX time (timestamp) in seconds; the plant science records start in 1917-11-1 (timestamp = −1646247600.0) and end in 2021-1-1 (timestamp = 1609477201). The starting dates and corresponding timestamps for the 50 subsets including the end date are in S6 Data . The input data included the preprocessed texts, topic assignments of records from global topic modeling, and the binned timestamps of records. Three additional parameters were set for topics_over_time, namely, nr_bin = 50 (number of bins), evolution_tuning = True, and global_tuning = False. The evolution_tuning parameter specified that averaged c-Tf-Idf values for a topic be calculated in neighboring time bins to reduce fluctuation in c-Tf-Idf values. The global_tuning parameter was set to False because of the possibility that some nonexisting terms could have a high c-Tf-Idf for a time bin simply because there was a high global c-Tf-Idf value for that term.
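Binning by record count rather than by calendar interval can be sketched as follows. This is an illustration of the idea only, not the modified BERTopic code; the function name `equal_count_bins` is hypothetical, and records sharing the same timestamp may land in adjacent bins under this simple scheme.

```python
def equal_count_bins(timestamps, n_bins):
    """Give each record a bin index (0 .. n_bins - 1) so that bins are
    chronological and hold similar numbers of records."""
    n = len(timestamps)
    # Positions of the records sorted from earliest to latest timestamp
    order = sorted(range(n), key=timestamps.__getitem__)
    bins = [0] * n
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // n
    return bins


# Six toy timestamps split into 3 bins of 2 records each
print(equal_count_bins([10, 1, 5, 7, 3, 9], n_bins=3))  # prints [2, 0, 1, 1, 0, 2]
```

With 421,658 records and 50 bins, each bin holds roughly 8,400 records, so early bins span decades while recent bins span months.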

The binning strategy based on similar document numbers per bin allowed us to increase signal particularly for publications prior to the 90s. This strategy, however, may introduce more noise for bins with smaller time durations (i.e., more recent bins) because of publication frequencies (there can be seasonal differences in the number of papers published, biased toward, e.g., the beginning of the year or the beginning of a quarter). To address this, we examined the relative frequencies of each topic over time ( S7 Data ), but we found that recent time bins had similar variances in relative frequencies as other time bins. We also moderated the impact of variation using LOWESS (10% to 30% of the data points were used for fitting the trend lines) to determine topical trends for Fig 3 . Thus, the influence of the noise introduced via our binning strategy is expected to be minimal.

Topic categories and ordering

The topics were classified into 5 categories with contrasting trends: stable, early, transitional, sigmoidal, and rising. To define which category a topic belongs to, the frequency of documents over time bins for each topic was analyzed using 3 regression methods. We first tried 2 forecasting methods: recursive autoregressor (the ForecasterAutoreg class in the skforecast package) and autoregressive integrated moving average (ARIMA implemented in the pmdarima package). In both cases, the forecasting results did not clearly follow the expected trend lines, likely due to the low numbers of data points (relative frequency values), which resulted in the need to extensively impute missing data. Thus, as a third approach, we sought to fit the trendlines with the data points using LOWESS (implemented in the statsmodels package) and applied additional criteria for assigning topics to categories. When fitting with LOWESS, 3 fraction parameters (frac, the fraction of the data used when estimating each y-value) were evaluated (0.1, 0.2, 0.3). While frac = 0.3 had the smallest errors for most topics, in situations where there were outliers, frac = 0.2 or 0.1 was chosen to minimize mean squared errors ( S7 Data ).

The topics were classified into 5 categories based on the slopes of the fitted line over time: (1) stable: topics with near 0 slopes over time; (2) early: topics with negative (<−0.5) slopes throughout (with the exception of topic 78, which declined early on but bounced back by the late 1990s); (3) transitional: early positive (>0.5) slopes followed by negative slopes at later time points; (4) sigmoidal: early positive slopes followed by zero slopes at later time points; and (5) rising: continuously positive slopes. For each topic, the LOWESS fits were also used to determine when the relative document frequency reached its peak, first reaching a threshold of 0.6 (chosen after trial and error for a range of 0.3 to 0.9), and the overall trend. The topics were then ordered based on (1) whether they belonged to the stable category or not; (2) whether the trends were decreasing, stable, or increasing; (3) the time the relative document frequency first reached 0.6; and (4) the time that the overall peak was reached ( S8 Data ).
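Two of the ordering criteria, the time of the overall peak and the time the fitted frequency first reaches 60% of the peak, can be read off a LOWESS-fitted curve as sketched below. The function name `peak_and_first_reach` and the toy values are illustrative assumptions; the sketch takes the fitted values as given and treats "reaching" the threshold as greater than or equal to it.

```python
def peak_and_first_reach(times, fitted, frac_of_peak=0.6):
    """Return (time of the maximum fitted value,
               first time the fitted value reaches frac_of_peak * maximum)."""
    peak_idx = max(range(len(fitted)), key=fitted.__getitem__)
    cutoff = frac_of_peak * fitted[peak_idx]
    first_idx = next(i for i, value in enumerate(fitted) if value >= cutoff)
    return times[peak_idx], times[first_idx]


bin_end_years = [1980, 1990, 2000, 2010, 2020]
fitted_freq = [0.1, 0.3, 0.7, 1.0, 0.8]   # toy LOWESS-fitted relative frequencies
print(peak_and_first_reach(bin_end_years, fitted_freq))  # prints (2010, 2000)
```

Sorting topics by these two times, after grouping by category and trend direction, gives an ordering in which earlier-peaking topics come first.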

Taxa information

To identify a taxon or taxa in all plant science records, NCBI Taxonomy taxdump datasets were downloaded from the NCBI FTP site ( https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/ ) on September 20, 2022. The highest-level taxon was Viridiplantae, and all its child taxa were parsed and used as queries in searches against the plant science corpus. In addition, a species-over-time analysis was conducted using the same time bins as used for dynamic topic models. The number of records in different time bins for top taxa are in the genus, family, order, and additional species level sheet in S9 Data . The degree of over-/underrepresentation of a taxon X in a research topic T was assessed using the p -value of a Fisher’s exact test for a 2 × 2 table consisting of the numbers of records in both X and T, in X but not T, in T but not X, and in neither ( S10 Data ).
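For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2 × 2 table can be written from the hypergeometric distribution using only the standard library. This is a didactic sketch with a hypothetical function name; in practice an established implementation such as `scipy.stats.fisher_exact` would be preferable, and nothing here is taken from the original analysis code.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    For taxon X and topic T: a = records in both X and T, b = in X but not T,
    c = in T but not X, d = in neither.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x records in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    # (the small tolerance guards against floating-point ties)
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-7))


print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # prints 0.4857
```

The exact binomial coefficients make this slow for corpus-sized counts, which is one reason to use a library implementation for the real tables.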

For analysis of plant taxa with genome information, genome data of taxa in Viridiplantae were obtained from the NCBI Genome data-hub ( https://www.ncbi.nlm.nih.gov/data-hub/genome ) on October 28, 2022. There were 2,384 plant genome assemblies belonging to 1,231 species in 559 genera (genome assembly sheet, S9 Data ). The date of the assembly was used as a proxy for the time when a genome was sequenced. However, some species have updated assemblies and have more recent data than when the genome first became available.

Taxa being studied in the plant science records

Flowering plants (Magnoliopsida) are found in 93% of records, while most other lineages are discussed in <1% of records, with conifers and related species being exceptions (Acrogymnospermae, 3.5%, S6A Fig ). At the family level, the mustard (Brassicaceae), grass (Poaceae), pea (Fabaceae), and nightshade (Solanaceae) families are in 51% of records ( S6B Fig ). The prominence of the mustard family in plant science research is due to the Brassica and Arabidopsis genera ( Fig 4A ). When examining the prevalence of taxa being studied over time, clear patterns of turnover emerged ( Figs 4B , S6C, and S6D ). While the study of monocot species (Liliopsida) has remained steady, there was a significant uptick in the prevalence of eudicot (eudicotyledon) records in the late 1990s ( S6C Fig ), which can be attributed to the increased number of studies in the mustard, myrtle (Myrtaceae), and mint (Lamiaceae) families, among others ( S6D Fig ). At the genus level, records mentioning Gossypium (cotton), Phaseolus (bean), Hordeum (barley), and Zea (corn), similar to the topics in the early category, were prevalent until the 1980s or 1990s but have mostly decreased in number since ( Fig 4B ). In contrast, Capsicum , Arabidopsis , Oryza , Vitis , and Solanum research has become more prevalent over the last 20 years.

Geographical information for the plant science corpus

The geographical information (country) of authors in the plant science corpus was obtained from the address (AD) fields of first authors in Medline XML records accessible through the NCBI EUtility API ( https://www.ncbi.nlm.nih.gov/books/NBK25501/ ). Because only first author affiliations are available for records published before December 2014, only the first author’s location was considered to ensure consistency between records before and after that date. Among the 421,658 records in the plant science corpus, 421,585 had Medline records and 421,276 had unique PMIDs. Among the records with unique PMIDs, 401,807 contained address fields. For each of the remaining records, the AD field content was split into tokens with a “,” delimiter, and the token likely containing geographical info (referred to as location tokens) was selected as either the last token or the second to last token if the last token contained “@” indicating the presence of an email address. Because of the inconsistency in how geographical information was described in the location tokens (e.g., country, state, city, zip code, name of institution, and different combinations of the above), the following 4 approaches were used to convert location tokens into countries.
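The location-token rule described above (take the last comma-separated token, or the second to last if the last one holds an email address) can be sketched as follows. The function name and the example addresses are made up for illustration, and stripping whitespace and a trailing period is an added assumption rather than a step stated in the text.

```python
def location_token(ad_field):
    """Pick the comma-separated token most likely to hold the location."""
    # Assumption: surrounding whitespace and a trailing period are noise
    tokens = [token.strip().rstrip(".") for token in ad_field.split(",")]
    if "@" in tokens[-1] and len(tokens) > 1:
        return tokens[-2]   # last token is an email address, step back one
    return tokens[-1]


print(location_token("Department of Plant Biology, Example University, East Lansing, MI, USA."))
print(location_token("Institute of Botany, Example University, Germany, someone@example.org"))
```

The first call yields "USA" and the second "Germany". The selected token may still be a state, city, or institution name, which is why several conversion approaches were needed afterwards.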

The first approach was a brute force search where full names and alpha-3 codes of current countries (ISO 3166–1), current country subregions (ISO 3166–2), and historical countries (i.e., countries that no longer exist, ISO 3166–3) were used to search the address fields. To reduce false positives using alpha-3 codes, a space prior to each code was required for the match. The first approach allowed the identification of 361,242, 16,573, and 279,839 records with current country, historical country, and subregion information, respectively. The second method was the use of a heuristic based on common address field structures to identify “location strings” toward the end of address fields that likely represent countries, then the use of the Python pycountry module to confirm the presence of country information. This approach led to 329,025 records with country information. The third approach was to parse first author email addresses (90,799 records), recover top-level domain information, and use country code Top Level Domain (ccTLD) data from the ISO 3166 Wikipedia page to define countries (72,640 records). Only a subset of email addresses contains country information because some are from companies (.com), nonprofit organizations (.org), and others. Because a large number of records with address fields still did not have country information after taking the above 3 approaches, another approach was implemented to query address fields against a locally installed Nominatim server (v.4.2.3, https://github.com/mediagis/nominatim-docker ) using OpenStreetMap data from GEOFABRIK ( https://www.geofabrik.de/ ) to find locations. Initial testing indicated that the use of full address strings led to false positives, and the computing resource requirement for running the server was high. Thus, only location strings from the second approach that did not lead to country information were used as queries. Because multiple potential matches were returned for each query, the results were sorted based on their location importance values. The above steps led to an additional 72,401 records with country information.

Examining the overlap in country information between approaches revealed that brute force current country and pycountry searches were consistent 97.1% of the time. In addition, both approaches had high consistency with the email-based approach (92.4% and 93.9%). However, brute force subregion and Nominatim-based predictions had the lowest consistencies with the above 3 approaches (39.8% to 47.9%) and each other. Thus, a record’s country information was finalized if the information was consistent between any 2 approaches, except between the brute force subregion and Nominatim searches. This led to 330,328 records with country information.
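The consensus rule can be expressed compactly. The sketch below assumes each approach yields either a country code or nothing for a record; the approach labels and the function name `consensus_country` are hypothetical, and when two different pairs agree on different countries this sketch simply returns the first agreement it finds, a case the text does not address.

```python
from itertools import combinations

# Agreement between these two approaches alone is not accepted
EXCLUDED_PAIR = frozenset({"subregion", "nominatim"})


def consensus_country(calls):
    """calls maps an approach name to a country code, or None if that
    approach found nothing. Return a country supported by any 2 approaches,
    except the subregion/Nominatim pair; otherwise return None."""
    for (name1, country1), (name2, country2) in combinations(calls.items(), 2):
        if country1 is None or country1 != country2:
            continue
        if frozenset({name1, name2}) == EXCLUDED_PAIR:
            continue
        return country1
    return None


record = {"brute_country": "USA", "pycountry": "USA", "email": None,
          "subregion": "CAN", "nominatim": "CAN"}
print(consensus_country(record))  # prints USA
```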

Topical and country impact metrics

To determine annual country impact, impact scores were determined in the same way as that for annual topical impact, except that values for different countries were calculated instead of topics ( S8 Data ).

Topical preferences by country

To determine topical preference for a country C , a 2 × 2 table was established with the number of records in topic T from C , the number of records in T but not from C , the number of non- T records from C , and the number of non- T records not from C . A Fisher’s exact test was performed for each T and C combination, and the resulting p-values were corrected for multiple testing with the Benjamini–Hochberg method (see S12 Data ). The preference of T in C was defined as the degree of enrichment calculated as the log likelihood ratio of values in the 2 × 2 table. Topic 5 was excluded because >50% of the countries did not have records for this topic.
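The enrichment score and the multiple-testing correction can be sketched with the standard library. The log likelihood ratio follows the definition given with S12 Data, log((a/b)/(c/d)); the use of the natural logarithm, the function names, and the toy numbers are assumptions made for illustration.

```python
from math import log


def log_likelihood_ratio(a, b, c, d):
    """a: records from country C in topic T; b: from C but not in T;
    c: not from C but in T; d: neither. Natural log is assumed."""
    return log((a / b) / (c / d))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=pvalues.__getitem__)
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


print(round(log_likelihood_ratio(20, 80, 10, 90), 3))   # prints 0.811
print([round(p, 3) for p in benjamini_hochberg([0.01, 0.04, 0.03, 0.005])])
```

A positive ratio indicates overrepresentation of the topic in the country and a negative ratio underrepresentation. A zero count in any cell makes the ratio undefined, so sparse topic and country combinations need separate handling.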

The top 10 countries could be classified into a China–India cluster, an Italy–Spain cluster, and remaining countries (yellow rectangles, Fig 5E ). The clustering of Italy and Spain is partly due to similar research focusing on allergens (topic 0) and mycotoxins (topic 54) and less emphasis on gene family (topic 23) and stress tolerance (topic 28) studies ( Figs 5F and S9 ). There are also substantial differences in topical focus between countries. For example, plant science records from China tend to be enriched in hyperspectral imaging and modeling (topic 9), gene family studies (topic 23), stress biology (topic 28), and research on new plant compounds associated with herbal medicine (topic 69), but less emphasis on population genetics and evolution (topic 86, Fig 5F ). In the US, there is a strong focus on insect pest resistance (topic 75), climate, community, and diversity (topic 83), and population genetics and evolution but less focus on new plant compounds. In summary, in addition to revealing how plant science research has evolved over time, topic modeling provides additional insights into differences in research foci among different countries.

Supporting information

S1 Fig. Plant science record classification model performance.

(A–C) Distributions of prediction probabilities (y_prob) of (A) positive instances (plant science records), (B) negative instances (non-plant science records), and (C) positive instances with the Medical Subject Heading “Plants” (ID = D010944). The data are color coded in blue and orange if they are correctly and incorrectly predicted, respectively. The lower subfigures contain log10-transformed x axes for the same distributions as the top subfigure for better visualization of incorrect predictions. (D) Prediction probability distribution for candidate plant science records. Prediction probabilities plotted here are available in S13 Data .

https://doi.org/10.1371/journal.pbio.3002612.s001

S2 Fig. Relationships between outlier clusters and the 90 topics.

(A) Heatmap demonstrating that some outlier clusters tend to have high prediction scores for multiple topics. Each cell shows the average prediction score of a topic for records in an outlier cluster. (B) Size of outlier clusters.

https://doi.org/10.1371/journal.pbio.3002612.s002

S3 Fig. Cosine similarities between topics.

(A) Heatmap showing cosine similarities between topic pairs. Top-left: hierarchical clustering of the cosine similarity matrix using the Ward algorithm. The branches are colored to indicate groups of related topics. (B) Topic labels and names. The topic ordering was based on hierarchical clustering of topics. Colored rectangles: neighboring topics with >0.5 cosine similarities.

https://doi.org/10.1371/journal.pbio.3002612.s003

S4 Fig. Relative topical diversity for 20 journals.

The 20 journals with the most plant science records are shown. The journal names were taken from the journal list in PubMed ( https://www.nlm.nih.gov/bsd/serfile_addedinfo.html ).

https://doi.org/10.1371/journal.pbio.3002612.s004

S5 Fig. Topical frequency and top terms during different time periods.

(A-D) Different patterns of topical frequency distributions for example topics (A) 48, (B) 35, (C) 27, and (D) 42. For each topic, the top graph shows the frequency of topical records in each time bin, which are the same as those in Fig 3 (green line), and the end date for each bin is indicated. The heatmap below each line plot depicts whether a term is among the top terms in a time bin (yellow) or not (blue). Blue dotted lines delineate different decades (see S5 Data for the original frequencies, S6 Data for the LOWESS fitted frequencies and the top terms for different topics/time bins).

https://doi.org/10.1371/journal.pbio.3002612.s005

S6 Fig. Prevalence of records mentioning different taxonomic groups in Viridiplantae.

(A, B) Percentage of records mentioning specific taxa at the (A) major lineage and (B) family levels. (C, D) The prevalence of taxon mentions over time at the (C) major lineage and (D) family levels. The data used for plotting are available in S9 Data .

https://doi.org/10.1371/journal.pbio.3002612.s006

S7 Fig. Changes over time.

(A) Number of genera being mentioned in plant science records during different time bins (the date indicates the end date of that bin, exclusive). (B) Numbers of genera (blue) and organisms (salmon) with draft genomes available from National Center of Biotechnology Information in different years. (C) Percentage of US National Science Foundation (NSF) grants mentioning the genus Arabidopsis over time with peak percentage and year indicated. The data for (A–C) are in S9 Data . (D) Number of plant science records in the top 17 plant science journals from the USA (red), Great Britain (GBR) (orange), India (IND) (light green), and China (CHN) (dark green) normalized against the total numbers of publications of each country over time in these 17 journals. The data used for plotting can be found in S11 Data .

https://doi.org/10.1371/journal.pbio.3002612.s007

S8 Fig. Change in country impact on plant science over time.

(A, B) Difference in 2 impact metrics from 1999 to 2020 for the 10 countries with the highest number of plant science records. (A) H-index. (B) SCImago Journal Rank (SJR). (C, D) Plots show the relationships between the impact metrics (H-index in (C) , SJR in (D) ) averaged from 1999 to 2020 and the slopes of linear fits with years as the predictive variable and impact metric as the response variable for different countries (A3 country codes shown). The countries with >400 records and with <10% missing impact values are included. The data used for plotting can be found in S11 Data .

https://doi.org/10.1371/journal.pbio.3002612.s008

S9 Fig. Country topical preference.

Enrichment scores (LLR, log likelihood ratio) of topics for each of the top 10 countries. Red: overrepresentation, blue: underrepresentation. The data for plotting can be found in S12 Data .

https://doi.org/10.1371/journal.pbio.3002612.s009

S1 Data. Summary of source journals for plant science records, prediction models, and top Tf-Idf features.

Sheet–Candidate plant sci record j counts: Number of records from each journal in the candidate plant science corpus (before classification). Sheet—Plant sci record j count: Number of records from each journal in the plant science corpus (after classification). Sheet–Model summary: Model type, text used (txt_flag), and model parameters used. Sheet—Model performance: Performance of different model and parameter combinations on the validation data set. Sheet–Tf-Idf features: The average SHAP values of Tf-Idf (Term frequency-Inverse document frequency) features associated with different terms. Sheet–PubMed number per year: The data for PubMed records in Fig 1A . Sheet–Plant sci record num per yr: The data for the plant science records in Fig 1A .

https://doi.org/10.1371/journal.pbio.3002612.s010

S2 Data. Numbers of records in topics identified from preliminary topic models.

Sheet–Topics generated with a model based on BioBERT embeddings. Sheet–Topics generated with a model based on distilBERT embeddings. Sheet–Topics generated with a model based on SciBERT embeddings.

https://doi.org/10.1371/journal.pbio.3002612.s011

S3 Data. Final topic model labels and top terms for topics.

Sheet–Topic label: The topic index and top 10 terms with the highest cTf-Idf values. Sheets– 0 to 89: The top 50 terms and their c-Tf-Idf values for topics 0 to 89.

https://doi.org/10.1371/journal.pbio.3002612.s012

S4 Data. UMAP representations of different topics.

For a topic T , records in the UMAP graph are colored red and records not in T are colored gray.

https://doi.org/10.1371/journal.pbio.3002612.s013

S5 Data. Temporal relationships between published documents projected onto 2D space.

The 2D embedding generated with UMAP was used to plot document relationships for each year. The plots from 1975 to 2020 were compiled into an animation.

https://doi.org/10.1371/journal.pbio.3002612.s014

S6 Data. Timestamps and dates for dynamic topic modeling.

Sheet–bin_timestamp: Columns are: (1) order index; (2) bin_idx–relative positions of bin labels; (3) bin_timestamp–UNIX time in seconds; and (4) bin_date–month/day/year. Sheet–Topic frequency per timestamp: The number of documents in each time bin for each topic. Sheets–LOWESS fit 0.1/0.2/0.3: Topic frequency per timestamp fitted with the fraction parameter of 0.1, 0.2, or 0.3. Sheet—Topic top terms: The top 5 terms for each topic in each time bin.

https://doi.org/10.1371/journal.pbio.3002612.s015

S7 Data. Locally weighted scatterplot smoothing (LOWESS) of topical document frequencies over time.

There are 90 scatter plots, one for each topic, where the x axis is time, and the y axis is the document frequency (blue dots). The LOWESS fit is shown as orange points connected with a green line. The category a topic belongs to and its order in Fig 3 are labeled on the top left corner. The data used for plotting are in S6 Data .

https://doi.org/10.1371/journal.pbio.3002612.s016

S8 Data. The 4 criteria used for sorting topics.

Peak: the time when the LOWESS fit of the frequencies of a topic reaches maximum. 1st_reach_thr: the time when the LOWESS fit first reaches a threshold of 60% maximal frequency (peak value). Trend: upward (1), no change (0), or downward (−1). Stable: whether a topic belongs to the stable category (1) or not (0).

https://doi.org/10.1371/journal.pbio.3002612.s017

S9 Data. Change in taxon record numbers and genome assemblies available over time.

Sheet–Genus: Number of records mentioning a genus during different time periods (in Unix timestamp) for the top 100 genera. Sheet–Family: Number of records mentioning a family during different time periods (in Unix timestamp) for the top 100 families. Sheet–Order: Number of records mentioning an order during different time periods (in Unix timestamp) for the top 20 orders. Sheet–Species levels: Number of records mentioning 12 selected taxonomic levels higher than the order level during different time periods (in Unix timestamp). Sheet–Genome assembly: Plant genome assemblies available from NCBI as of October 28, 2022. Sheet–Arabidopsis NSF: Absolute and normalized numbers of US National Science Foundation funded proposals mentioning Arabidopsis in proposal titles and/or abstracts.

https://doi.org/10.1371/journal.pbio.3002612.s018

S10 Data. Taxon topical preference.

Sheet– 5 genera LLR: The log likelihood ratio of each topic in each of the top 5 genera with the highest numbers of plant science records. Sheets– 5 genera: For each genus, the columns are: (1) topic; (2) the Fisher’s exact test p -value (Pvalue); (3–6) numbers of records in topic T and in genus X (n_inT_inX), in T but not in X (n_inT_niX), not in T but in X (n_niT_inX), and not in T and X (n_niT_niX) that were used to construct 2 × 2 tables for the tests; and (7) the log likelihood ratio generated with the 2 × 2 tables. Sheet–corrected p -value: The 4 values for generating LLRs were used to conduct Fisher’s exact test. The p -values obtained for each country were corrected for multiple testing.

https://doi.org/10.1371/journal.pbio.3002612.s019

S11 Data. Impact metrics of countries in different years.

Sheet–country_top25_year_count: number of total publications and publications per year from the top 25 countries with the most plant science records. Sheet—country_top25_year_top17j: number of total publications and publications per year from the top 25 countries with the highest numbers of plant science records in the 17 plant science journals used as positive examples. Sheet–prank: Journal percentile rank scores for countries (3-letter country codes following https://www.iban.com/country-codes ) in different years from 1999 to 2020. Sheet–sjr: Scimago Journal rank scores. Sheet–hidx: H-Index scores. Sheet–cite: Citation scores.

https://doi.org/10.1371/journal.pbio.3002612.s020

S12 Data. Topical enrichment for the top 10 countries with the highest numbers of plant science publications.

Sheet—Log likelihood ratio: For each country C and topic T, it is defined as log((a/b)/(c/d)) where a is the number of papers from C in T, b is the number from C but not in T, c is the number not from C but in T, d is the number not from C and not in T. Sheet: corrected p -value: The 4 values, a, b, c, and d, were used to conduct Fisher’s exact test. The p -values obtained for each country were corrected for multiple testing.

https://doi.org/10.1371/journal.pbio.3002612.s021
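
The enrichment statistics described for S12 Data can be illustrated with a minimal Python sketch. It assumes the natural logarithm (the base is not stated in the description) and uses a plain two-sided Fisher's exact test built from the hypergeometric distribution. The function names are hypothetical and are not taken from the authors' code. The multiple-testing correction is omitted because the method is not specified.

```python
from math import comb, log


def log_likelihood_ratio(a, b, c, d):
    """Return log((a/b)/(c/d)) for the 2 x 2 table [[a, b], [c, d]].

    a: papers from country C in topic T
    b: papers from C but not in T
    c: papers not from C but in T
    d: papers not from C and not in T
    Assumption: natural log; b, c, and d must be non-zero.
    """
    return log((a / b) / (c / d))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1 = a + b          # all papers from C
    col1 = a + c          # all papers in T
    n = a + b + c + d     # all papers
    denom = comb(n, col1)

    def prob(x):
        # Probability that exactly x of the col1 topic papers come from C.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    # Small relative tolerance so that ties with p_obs are included.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))


if __name__ == "__main__":
    # Toy counts, invented purely for illustration.
    a, b, c, d = 30, 70, 20, 180
    print("LLR:", log_likelihood_ratio(a, b, c, d))
    print("Fisher p-value:", fisher_exact_two_sided(a, b, c, d))
```

A positive ratio indicates that topic T is over-represented among papers from country C, and a negative ratio indicates under-representation. For counts on the scale of a full literature corpus, a library routine would likely be preferable to this exact-integer sketch.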

S13 Data. Text classification prediction probabilities.

This compressed file contains the PubMed ID (PMID) and the prediction probabilities (y_pred) of testing data with both positive and negative examples (pred_prob_testing), plant science candidate records with the MeSH term “Plants” (pred_prob_candidates_with_mesh), and all plant science candidate records (pred_prob_candidates_all). The prediction probability was generated using the Word2Vec text classification models for distinguishing positive (plant science) and negative (non-plant science) records.

https://doi.org/10.1371/journal.pbio.3002612.s022

Acknowledgments

We thank Maarten Grootendorst for discussions on topic modeling. We also thank Stacey Harmer, Eva Farre, Ning Jiang, and Robert Last for discussion on their respective research fields and input on how to improve this study and Rudiger Simon for the suggestion to examine differences between countries. We also thank Mae Milton, Christina King, Edmond Anderson, Jingyao Tang, Brianna Brown, Kenia Segura Abá, Eleanor Siler, Thilanka Ranaweera, Huan Chen, Rajneesh Singhal, Paulo Izquierdo, Jyothi Kumar, Daniel Shiu, Elliott Shiu, and Wiggler Catt for their good ideas, personal and professional support, collegiality, fun at parties, as well as the trouble they have caused, which helped us improve as researchers, teachers, mentors, and parents.


plant biotechnology Recently Published Documents


Role of plant biotechnology in enhancement of alkaloid production from cell culture system of Catharanthus roseus: A medicinal plant with potent anti-tumor properties

Callus induction of red binahong (Basella rubra L.) leaves with the application of 2,4-D and kinetin

Red binahong (Basella rubra L.) is a plant that contains medicinal secondary metabolites. Callus culture is one solution for producing secondary metabolites in large quantities. This research aimed to determine the effect of 2,4-D and kinetin in inducing callus on red binahong leaves. The research was conducted at the Laboratory of Plant Biotechnology, Faculty of Agriculture, the University of Riau, from November 2019 to March 2020. The experiment used a randomized block design with two factors, namely four levels of 2,4-D (0, 0.5, 1, and 2 ppm) and four levels of kinetin (0, 0.5, 1, and 2 ppm), with three replications. The results showed that the combination of 0 ppm 2,4-D and 0.5 ppm kinetin gave the fastest callus formation (11.67 days after planting), and the combination of 1 ppm 2,4-D and 2 ppm kinetin produced the heaviest callus (6.4 mg) and the highest percentage of callus formation (62.50%).
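
The factorial layout described in this abstract can be sketched in Python. This is an illustration only: it assumes each of the three replications is treated as one block and that all 16 treatment combinations are randomized within each block. The names `rcbd_layout`, `LEVELS_24D`, and `LEVELS_KINETIN` are hypothetical.

```python
import itertools
import random

LEVELS_24D = [0, 0.5, 1, 2]       # ppm 2,4-D, as listed in the abstract
LEVELS_KINETIN = [0, 0.5, 1, 2]   # ppm kinetin, as listed in the abstract
N_BLOCKS = 3                      # assumption: one block per replication


def rcbd_layout(seed=0):
    """Return a list of (block_number, randomized_treatments) pairs.

    Each treatment is a (2,4-D ppm, kinetin ppm) tuple, and every block
    contains each of the 4 x 4 = 16 combinations exactly once.
    """
    rng = random.Random(seed)  # seeded so the layout is reproducible
    treatments = list(itertools.product(LEVELS_24D, LEVELS_KINETIN))
    layout = []
    for block in range(1, N_BLOCKS + 1):
        order = treatments[:]   # copy so each block is shuffled independently
        rng.shuffle(order)
        layout.append((block, order))
    return layout


if __name__ == "__main__":
    for block, order in rcbd_layout():
        print(f"Block {block}: first three units -> {order[:3]}")
```

Under these assumptions the design has 16 treatments in 3 blocks, for 48 experimental units in total.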

Highly Variable Dietary RNAi Sensitivity Among Coleoptera

Many herbivorous beetles (Order Coleoptera) contribute to serious losses in crop yields and forest trees, and plant biotechnology solutions are being developed with the hope of limiting these losses. Due to the unprecedented target-specificity of double-stranded RNA (dsRNA), and its utility in inducing RNA interference (RNAi) when consumed by target pest species, dsRNA-based plant biotechnology approaches represent the cutting edge of current pesticide research and development. We review dietary RNAi studies in coleopterans and discuss prospects and future directions regarding RNAi-based management of coleopteran plant pests. Herein, we also provide a balanced overview of existing studies in order to provide an accurate re-assessment of dietary RNAi sensitivity in coleopterans, despite the limitations to the existing body of scientific literature. We further discuss impediments to our understanding of RNAi sensitivity in this important insect order and identify critical future directions for research in this area, with an emphasis on using plant biotechnology approaches.

A plant-biotechnology approach for producing highly potent anti-HIV antibodies for antiretroviral therapy consideration

Despite a reduction in global HIV prevalence, the development of a pipeline of new therapeutics or pre-exposure prophylaxis to control the HIV/AIDS epidemic is of high priority. Antibody-based therapies offer several advantages and have been shown to prevent HIV infection. Plant-based production is efficient for several biologics, including antibodies. We provide a short review of the work by Singh et al., 2020, who demonstrated the transient production of potent CAP256-VRC26 broadly neutralizing antibodies. These antibodies have engineered posttranslational modifications, namely N-glycosylation in the fragment crystallizable region and O-sulfation of tyrosine residues in the complementarity-determining region H3 loop. The glycoengineered Nicotiana benthamiana mutant (ΔXTFT) was used, with glycosylating structures lacking β1,2-xylose and/or α1,3-fucose residues, which is critical for enhanced effector activity. The CAP256-VRC26 antibody lineage targets the first and second variable regions of the HIV-1 gp120 envelope glycoprotein. The high potency of this lineage is mediated by a protruding O-sulfated tyrosine in the CDR H3 loop. Nicotiana benthamiana lacks human tyrosyl protein sulfotransferase 1, the enzyme responsible for tyrosine O-sulfation. The transient coexpression of the CAP256-VRC26 antibodies with tyrosyl protein sulfotransferase 1 in planta restored the efficacy of these antibodies through the incorporation of the O-sulfation modification. This approach demonstrates the strategic incorporation of posttranslational modifications in production systems, which may not have been previously considered. These plant-produced CAP256-VRC26 antibodies have therapeutic as well as topical and systemic pre-exposure prophylaxis potential in enabling the empowerment of young girls and women, given that gender inequalities remain a major driver of the epidemic.

Peculiarities of the Transformation of Asteraceae Family Species: The Cases of Sunflower and Lettuce

The Asteraceae family is the largest and most diversified family of the Angiosperms, characterized by the presence of numerous clustered inflorescences, which have the appearance of a single compound flower. It is estimated that this family represents around 10% of all flowered species, with a great biodiversity, covering all environments on the planet, except Antarctica. Also, it includes economically important crops, such as lettuce, sunflower, and chrysanthemum; wild flowers; herbs, and several species that produce molecules with pharmacological properties. Nevertheless, the biotechnological improvement of this family is limited to a few species and their genetic transformation was achieved later than in other plant families. Lettuce (Lactuca sativa L.) is a model species in molecular biology and plant biotechnology that has easily adapted to tissue culture, with efficient shoot regeneration from different tissues, organs, cells, and protoplasts. Due to this plasticity, it was possible to obtain transgenic plants tolerant to biotic or abiotic stresses as well as for the production of commercially interesting molecules (molecular farming). These advances, together with the complete sequencing of lettuce genome allowed the rapid adoption of gene editing using the CRISPR system. On the other hand, sunflower (Helianthus annuus L.) is a species that for years was considered recalcitrant to in vitro culture. Although this difficulty was overcome and some publications were made on sunflower genetic transformation, until now there is no transgenic variety commercialized or authorized for cultivation. In this article, we review similarities (such as avoiding the utilization of the CaMV35S promoter in transformation vectors) and differences (such as transformation efficiency) in the state of the art of genetic transformation techniques performed in these two species.

Improved the Activity of Phosphite Dehydrogenase and its Application in Plant Biotechnology

Phosphorus (P) is a nonrenewable resource, which makes it one of the major challenges for sustainable agriculture. Although phosphite (Phi) can be absorbed by plant cells through the Pi transporters, it cannot be metabolized by plants and therefore cannot be used as a P fertilizer for crops. However, transgenic plants that overexpress phosphite dehydrogenase (PtxD) from bacteria can utilize phosphite as the sole P source. In this study, we aimed to improve the catalytic efficiency of PtxD from Ralstonia sp. 4506 (PtxDR4506) by directed evolution. Five mutations were generated by saturation mutagenesis at the 139th site of PtxDR4506 and showed higher catalytic efficiency than native PtxDR4506. PtxDQ showed the highest catalytic efficiency (5.83-fold compared to PtxDR4506), contributed by a 41.1% decrease in the Km and a 2.5-fold increase in the kcat values. Overexpression of PtxDQ in Arabidopsis and rice showed increased efficiency of phosphite utilization and excellent development when phosphite was used as the primary source of P. High-efficiency PtxD transgenic plants are an essential prerequisite for future agricultural production using phosphite as a P fertilizer.

Genomic Designing of Climate Smart Turmeric

Turmeric is highly tolerant to several climatic changes and can grow under high temperatures and moderate drought conditions. The herb depends strongly on optimum rainfall and optimum heat, with little chilling or freezing. When these conditions exceed the normal range, they tend to reduce crop yield and affect productivity. To reduce such drastic yield losses, conventional plant breeding methods were employed, but they were much less effective than plant biotechnology. To reduce stress-induced losses, extensive and effective molecular biology methods were employed to identify stress-responsive genes, and methods such as gene transfer and genetic engineering were also found to be effective. All these methods help mitigate yield losses and promote healthy plant growth. Maintaining rhizome size, curcumin content, essential oils, etc. is essential for the turmeric crop because of its role, especially in the medical field. Therefore, yield losses are reduced to the maximum extent so that the development of smart turmeric is easier, and crop designing is possible only with the advanced techniques of agricultural biotechnology.

Nanotechnology applications in plant tissue culture and molecular genetics: A holistic approach

Nanotechnology is one of the most important modern sciences that has integrated all sectors of science. Nanotechnology has been applied in the agricultural sector in the last ten years in pursuit of increasing agricultural production and ensuring food security. Plant biotechnology is an essential science that is concerned with plant production. The use of nanotechnology in plant biotechnology under controlled conditions has facilitated the understanding of important internal mechanisms of the plant biological system. The application of nanoparticles (NPs) in plant biotechnology has demonstrated an interesting impact on in vitro plant growth and development. This includes the positive effect of the NPs on micropropagation, callus induction, somatic embryogenesis, cell suspension culture, and plant disinfection. In addition, other biotechnology processes, including the genetic transformation of plants, plant conservation, and secondary metabolite production, have improved through the use of NPs. Furthermore, nanotechnology is used to improve plant tolerance to different stress conditions that limit plant production. In this review article, we attempt to consolidate the achievements of nanotechnology and plant biotechnology and discuss advances in the applications of nanotechnology in plant biotechnology. It has been concluded that more research is needed to understand the mechanism of nanoparticle delivery and translocation in plants in order to avoid any future hazardous effects of nanomaterials. This will be key to the achievement of magnificent progress in plant nanobiotechnology.

High-efficiency retron-mediated single-stranded DNA production in plants

ABSTRACT Background: Retrons are a class of retroelements that produce multicopy single-stranded DNA (msDNA) and participate in anti-phage defenses in bacteria. Retrons have been harnessed for the over-production of single-stranded DNA (ssDNA), genome engineering, and directed evolution in bacteria, yeast, and mammalian cells. However, no studies have shown retron-mediated ssDNA production in plants, which could unlock potential applications in plant biotechnology. For example, ssDNA can be used as a template for homology-directed repair (HDR) in several organisms. However, current gene editing technologies rely on the physical delivery of synthetic ssDNA, which limits their applications. Main methods and major results: Here, we demonstrated retron-mediated over-production of ssDNA in Nicotiana benthamiana. Additionally, we tested different retron architectures for improved ssDNA production and identified a new retron architecture that resulted in greater ssDNA abundance. Furthermore, co-expression of the gene encoding the ssDNA-protecting protein VirE2 from Agrobacterium tumefaciens with the retron systems resulted in a 10.7-fold increase in ssDNA production in vivo. We also demonstrated CRISPR-retron-coupled ssDNA over-production and targeted HDR in N. benthamiana. Conclusion: We present an efficient approach for in vivo ssDNA production in plants, which can be harnessed for biotechnological applications.

Transcriptomic Changes in Internode Explants of Stinging Nettle during Callogenesis

Callogenesis, the process during which explants derived from differentiated plant tissues are subjected to a trans-differentiation step characterized by the proliferation of a mass of cells, is fundamental to indirect organogenesis and the establishment of cell suspension cultures. Therefore, understanding how callogenesis takes place is helpful to plant tissue culture, as well as to plant biotechnology and bioprocess engineering. The common herbaceous plant stinging nettle (Urtica dioica L.) is a species producing cellulosic fibres (the bast fibres) and a whole array of phytochemicals for pharmacological, nutraceutical and cosmeceutical use. Thus, it is of interest as a potential multi-purpose plant. In this study, callogenesis in internode explants of a nettle fibre clone (clone 13) was studied using RNA-Seq to understand which gene ontologies predominate at different time points. Callogenesis was induced with the plant growth regulators α-naphthaleneacetic acid (NAA) and 6-benzylaminopurine (BAP) after having determined their optimal concentrations. The process was studied over a period of 34 days, a time point at which a well-visible callus mass had developed on the explants. The bioinformatic analysis of the transcriptomic dataset revealed specific gene ontologies characterizing each of the four time points investigated (0, 1, 10 and 34 days). The results show that, while the advanced stage of callogenesis is characterized by the iron deficiency response triggered by the high levels of reactive oxygen species accumulated by the proliferating cell mass, the intermediate and early phases are dominated by ontologies related to the immune response and cell wall loosening, respectively.

COMMENTS

  1. Plant biotechnology

    Haploids fast-track hybrid plant breeding. Two studies report the use of paternal haploids to enable one-step transfer of cytoplasmic male sterility in maize and broccoli, which resolves a key ...

  2. Plant Biotechnology Journal

    About this journal. Plant Biotechnology Journal (PBJ) is an open access journal publishing high-impact original research and incisive reviews with an emphasis on molecular plant sciences and their applications through plant biotechnology. It is published by Wiley in collaboration with the Society for Experimental Biology (SEB) and the ...

  3. The future of plant biotechnology in a globalized and environmentally

    The paper argues that science alone will not solve problems. Three major forces - science, the economy and society - shape our modern world. There is a need for a new social contract to harmonize these forces. ... The political decision triggered the shrinking of public funds for research on plant biotechnology, severely delaying the ...

  4. CRISPR/Cas9 in plant biotechnology: applications and challenges

    Hence, research should be focused on improving current delivery methods or developing novel ones to facilitate CRISPR/Cas9-based gene editing studies. Strict regulations on the sale and commercial growth of gene-edited crops have restricted more efforts in applying CRISPR/Cas9 technology in plant species. Therefore, a shift in public viewpoint ...

  5. Home

    Plant Biotechnology Reports is a peer-reviewed journal emphasizing fundamental and applied research in plant biotechnology. Offers comprehensive coverage extending to molecular biology, genetics, biochemistry, and more. Prioritizes studies on plants indigenous to the Asia-Pacific region. Encourages studies related to commercialization of plant ...

  6. Plant Biotechnology Journal

    About this journal. Plant Biotechnology Journal (PBJ) is an open access journal publishing high-impact original research and incisive reviews with an emphasis on molecular plant sciences and their applications through plant biotechnology. It is published by Wiley in collaboration with the Society for Experimental Biology (SEB) and the ...

  7. Articles

    Characterization of the effects of exogenous abscisic acid (ABA) application on the expression of ABA-responsive genes in Abies koreana. Plant Biotechnology Reports is a peer-reviewed journal emphasizing fundamental and applied research in plant biotechnology. Offers comprehensive coverage ...

  8. Frontiers in Plant Science

    Research Topics. See all (123) Learn more about Research Topics. This section explores all branches of plant biotechnology, addressing the attempts of modern technologies to satisfy increasing demands for crop production.

  9. Frontiers

    Editorial on the Research Topic. Insights in plant biotechnology: 2021. The Plant Biotechnology section at Frontiers in Plant Science mainly publishes applied studies examining how plants can be improved using modern genetic techniques ( Lloyd and Kossmann, 2021 ). This Research Topic was designed to allow editors from the section to highlight ...

  10. Plant Biotechnology Journal

    Department of Plant Pathology and Microbiology, Texas A&M University, College Station, Texas, USA. Institute for Advancing Health Through Agriculture, Texas A&M AgriLife, College Station, Texas, USA. Correspondence (Tel 1-956-969-5634; fax 1-956-969-5620)

  11. Insights in Plant Biotechnology: 2021

    The Plant Biotechnology section at Frontiers in Plant Science mainly publishes applied studies examining how plants can be improved using modern genetic techniques (Lloyd and Kossmann, 2021). This Research Topic was designed to allow editors from the section to highlight some of their own plant biotechnological work.

  12. Bioinformatics approaches and applications in plant biotechnology

    Biotechnology and bioinformatics for plant breeding. Plant breeding can be defined as the changing or improvement of desired traits in plants to produce improved new crop cultivars for the benefit of humankind. Jhansi and Usha mentioned a few benefits brought by genetically engineered plants, such as improved quality, enhanced nutritional value, and maximized yield.

  13. Recent Advances in Plant Biotechnology

    Dr. Kirakosyan is principal author of over 50 peer-reviewed research papers in professional journals and several chapters in books dealing with plant biotechnology and molecular biology. He is second author of the best-selling book "Natural Products from Plants", 2nd edition (2006). Ara Kirakosyan is a full member of the Phytochemical Society of ...

  14. Plant Biotechnology

    Plant Tissue Culture-Based Industries. Saurabh Bhatia, Kiran Sharma, in Modern Applications of Plant Biotechnology in Pharmaceutical Sciences, 2015. Abstract. Plant biotechnology has been internationally acknowledged as one of the significant tools for direct application in the field of agriculture. Plant tissue culture and its developments have a remarkable effect on the agricultural sector ...

  15. Plant Biotechnology—An Indispensable Tool for Crop Improvement

    These aspects have been addressed in the 17 papers published in this Special Issue titled 'Plant Biotechnology and Crop Improvement'. There have been four general review papers covering different biotechnologies and thirteen original research contributions focusing on different crop groups, including tropical and temperate cereal, legume ...

  16. Applications of Plant Biotechnology and Tissue Culture ...

    Feature papers represent the most advanced research with significant potential for high impact in the field. A Feature Paper should be a substantial original Article that involves several techniques or approaches, provides an outlook for future research directions and describes possible research applications. ... New plant biotechnology ...

  17. (PDF) The future of plant biotechnology in a globalized and

    endangered world. Marc Van Montagu. 1 VIB-International Plant Biotechnology Outreach, Ghent University, Ghent, Belgium. Abstract. This paper draws on the importance of science-based agriculture in ...

  18. Social and biological innovations are essential to deliver

    With the recent launch of the new section of New Phytologist on 'transformative plant biotechnology' (Halpin et al., 2023), it is an apt time to consider what is needed for transformative leaps that can enhance forest tree biotechnology. By biotechnology, we mean modification by recombinant DNA (rDNA) in the form of gene-editing (i.e. targeted modification of native DNA) or gene insertion ...

  19. From microbes to microbiomes: Applications for plant health and

    Plant-microbe interaction research has had a transformative trajectory, from individual microbial isolate studies to comprehensive analyses of plant microbiomes within the broader phytobiome framework. Acknowledging the indispensable role of plant microbiomes in shaping plant health, agriculture, an …

  20. A second chance for plant biotechnology in Europe

    A second chance for plant biotechnology in Europe. Nature Biotechnology 42 , 687-689 ( 2024) Cite this article. Europe tilts towards gene-edited plants, but progress could be derailed over who ...

  21. Overview of Applications of Plant Biotechnology

    The purpose of this chapter is to provide a brief overview of the spectrum of applications of plant biotechnology that are in current use or are under development in research labs around the world. Plant biotechnology, in the sense of the application of recombinant DNA techniques to crop improvement, or the production of valuable molecules in ...

  22. Assessing the evolution of research topics in a biological field using

    Our ability to understand the progress of science through the evolution of research topics is limited by the need for specialist knowledge and the exponential growth of the literature. This study uses artificial intelligence and machine learning approaches to demonstrate how a biological field (plant science) has evolved, how the model systems have changed, and how countries differ in terms of ...

  23. Current Research in Biotechnology

    Current Research in Biotechnology (CRBIOT) is a new primary research, gold open access journal from Elsevier. CRBIOT publishes original papers, reviews, and short communications (including viewpoints and perspectives) resulting from research in biotechnology and biotech-associated disciplines. Current Research in Biotechnology is a peer-reviewed gold open access (OA) journal and upon acceptance ...

  24. plant biotechnology Latest Research Papers

    This research aimed to determine the effect of 2,4-D and kinetin in inducing callus on red binahong leaves. The research was conducted at the Laboratory of Plant Biotechnology, Faculty of Agriculture, the University of Riau from November 2019 to March 2020. The experiment used a randomized block design with two factors, namely four levels of 2 ...

  25. Biotechnology Advances

    In addition, we introduce the applications of type V effectors for gene editing in animals and plants, including the development of base editors, tools for regulating gene expression, methods for gene targeting, and biosensors. We emphasize the prospects for development and application of CRISPR/Cas12 effectors with the goal of better utilizing ...

  26. Isolation, anticancer potency, and camptothecin—producing ability of

    An estimated 10 million cancer-related deaths were reported in 2020 [8], while patients had to face a significant financial burden ranging from $1500 to $22,000 per year for cancer treatments [9]. The cost is believed to continue rising due to the sheared budget for cancer drugs, new research, and clinical trials [9]. At present, approximately one ton of raw CPT material is required for the global ...