Abstract
Introduction
Herbal supplements are used worldwide for reasons such as treating numerous ailments, enhancing performance, or maintaining health. Estimated sales of herbal dietary supplements in the United States reached $7.452 billion in 2016, an increase of 7.7% from the previous year. 1 Herbal sales have increased consistently over the last 13 years, 1 and consumers often perceive herbals as safer alternatives to conventional drug interventions. 2 Although herbals are used by a large portion of the population, the United States has no guidelines for premarket approval of their safety and efficacy comparable to those the US Food and Drug Administration (FDA) provides for drugs. 3 Because most herbal intake falls outside the supervision of healthcare professionals, safety and efficacy concerns are not routinely documented; one estimate suggests that only 1 in 100 adverse events associated with dietary supplements (which include herbals) is reported to the FDA. 4 In 2006, a bill introduced in the Senate of the 109th Congress (S.3546, the “Dietary Supplement and Nonprescription Drug Consumer Protection Act”, colloquially referred to as the “AER bill”) was signed into law (Public Law 109-462), requiring manufacturers of dietary supplements (which include botanical products) and over-the-counter (OTC) products to submit serious adverse event reports (AERs) to the FDA. 5 Preliminary evidence among military personnel indicates that 22% of dietary supplement users have reported one or more adverse events. 6 Dietary supplements that include herbals are sold on military bases in many countries. 7 A committee on dietary supplement use by military personnel emphasized the critical need to identify potentially harmful effects. 8
The perception that herbal supplements (“botanicals”) are safer alternatives may not always be true (eg, St. John’s Wort [
Efforts to identify adverse drug reactions (ADRs) have focused on data from Spontaneous Reporting Systems (SRS),14-16 biomedical literature,17-19 clinical reports (including Electronic Health Records),20-22 social media,23,24 and PubChem BioAssay. 25 Resources such as SIDER (Side Effect Resource 26) provide comprehensive lists of drug-ADR information. In addition, the chemical, biological, and phenotypic properties of drugs have been used for larger-scale prediction of ADRs. 27 Publicly accessible SRS resources, such as the FDA Adverse Event Reporting System (FAERS) 28 and the Canada Vigilance Adverse Reaction (CVAR) database, 29 provide a means for monitoring the safety of drug interventions in North America.2,30 However, the current data organization within these sources does not allow systematic identification of AERs specific to herbals (eg, taxonomically). The inability to acquire relevant data robustly therefore hinders the design and execution of data-driven analyses of associated health outcomes.
Compiling and detecting potential adverse signals for herbal ADRs from SRSs is challenging. This is partly due to the variable inclusion of herbals in public SRSs, since reporting is required only for serious adverse events that result in hospitalization, significant disability, or death. 5 Herbal interventions can be reported in SRSs under a range of brand names, vernacular names, and scientific name synonyms, often embedded within free text without association with a standardized code or nomenclature. Taking full advantage of SRS data therefore requires data cleaning, mapping, and normalization to controlled terminologies. For pharmaceutical drugs, several standardization and curation efforts have been made. Natural language processing tools such as MedEx 31 and other drug mapping tools30,32 have been used to normalize reported drug names to RxNorm. 33 Community efforts such as Observational Health Data Sciences and Informatics (OHDSI) provide a platform for researchers to access standardized data and generate insights through analytics. 34 Resources such as the Adverse Event Open Learning through Universal Standardization (AEOLUS) facilitate interoperability and ease of analysis by making FAERS data available in a standardized form. 30 The biomedical domain is equipped with standard vocabularies from several sources that allow such standardization. However, the existing gap between dietary supplement terms and standard terminologies limits the ability to retrieve associated health outcome data. 35 Despite the acknowledged importance of identifying herbal-specific adverse events, informatics approaches have so far seen limited use for systematically identifying herbal-specific AERs from public SRS data, and the resulting signals that could motivate subsequent safety studies.
This study explored the feasibility of extracting herbal adverse events from two SRSs, FAERS and CVAR. A thesaurus of plant names was integrated with a mapping and alignment approach that accommodates spelling errors and approximate matches to identify records related to herbals. The results demonstrate that a substantial amount of herb-specific information is embedded within SRSs and can be used to analyze associated botanical adverse events.
Materials and Methods
The main goal of this study was to develop an approach to facilitate the acquisition of herb-related adverse event information from SRSs. A thesaurus of plant names was created by combining entries from three major taxonomic sources and grouping scientific names, synonyms, and vernacular names. Using this as a nomenclature scaffold, a mapping approach was designed to detect variants and spelling errors. The intervention name strings from the SRSs were processed to identify and resolve mappings to standardized taxonomic plant names. The standardized data were used to calculate statistically significant safety signals at two levels of granularity, leveraging the hierarchical structure of the Medical Dictionary for Regulatory Activities (MedDRA). A general overview of the approach is depicted in Figure 1.

Figure 1. Overview of the approach followed in this study.
Data sources used for the study
Data from two SRSs were considered for this study: (1) FAERS and (2) the CVAR database.
FAERS
CVAR
Identification of records associated with herbals
The pipeline for identifying herb-associated AERs involved two steps: (1) creating an herb-specific thesaurus and (2) mapping and resolving intervention name strings to herb names.
Plant species thesaurus (uBiota)
A unified compendium of plant species names, synonyms, and vernacular names (“uBiota”) was created from the union of three sources. The data were organized by accepted scientific names, to which unique identifiers were assigned while keeping track of each source and its identifiers. Synonyms and vernacular names were also isolated, their relations labeled, and organized under the unique identifiers. To maintain tractability and feasibility, uBiota represents the seven canonical Linnaean taxonomic ranks: Kingdom, Phylum, Class, Order, Family, Genus, and Species. The three sources chosen for this study are briefly described below:
Catalog of Life (COL): This is the most comprehensive and authoritative global index of species, compiled from diverse sources around the world. It provides critical species information on synonymy, higher taxa, and distribution across global regions.
Integrated Taxonomic Information System (ITIS): This resource was designed after the White House Subcommittee on Biodiversity and Ecosystem Dynamics recognized the importance of organizing and providing access to standardized nomenclature. The database ensures high quality through valid classifications, revisions, and additions of newly described species.
NCBI Taxonomy: This resource provides manually curated classification and nomenclature for all the organisms in the public sequence databases.
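To make the organization of uBiota concrete, the following Python sketch shows one way such a compendium could be keyed by accepted scientific name while recording provenance and linking synonyms and vernacular names. The class names, the `UB000001`-style identifiers, and the source identifiers are hypothetical, and the black cohosh names serve only as an example. The sketch performs exact, case-insensitive lookups only and does not reproduce the published resource.

```python
from dataclasses import dataclass, field


@dataclass
class Taxon:
    """One accepted scientific name and its linked names (hypothetical schema)."""
    taxon_id: str                                 # identifier assigned during unification
    accepted_name: str
    rank: str                                     # one of the seven Linnaean ranks
    sources: dict = field(default_factory=dict)   # source name -> source identifier
    synonyms: set = field(default_factory=set)
    vernaculars: set = field(default_factory=set)


class Thesaurus:
    """Union of several taxonomic sources, keyed by accepted scientific name."""

    def __init__(self):
        self.taxa = {}         # taxon_id -> Taxon
        self.by_accepted = {}  # lower-cased accepted name -> taxon_id
        self.index = {}        # lower-cased name string -> {(taxon_id, relation)}

    def add_record(self, source, source_id, accepted_name, rank,
                   synonyms=(), vernaculars=()):
        """Merge one source record into the compendium and return its taxon_id."""
        key = accepted_name.lower()
        if key not in self.by_accepted:
            taxon_id = f"UB{len(self.taxa) + 1:06d}"
            self.taxa[taxon_id] = Taxon(taxon_id, accepted_name, rank)
            self.by_accepted[key] = taxon_id
            self._index(accepted_name, taxon_id, "accepted")
        taxon = self.taxa[self.by_accepted[key]]
        taxon.sources[source] = source_id  # provenance is kept per contributing source
        for name in synonyms:
            taxon.synonyms.add(name)
            self._index(name, taxon.taxon_id, "synonym")
        for name in vernaculars:
            taxon.vernaculars.add(name)
            self._index(name, taxon.taxon_id, "vernacular")
        return taxon.taxon_id

    def _index(self, name, taxon_id, relation):
        self.index.setdefault(name.lower(), set()).add((taxon_id, relation))

    def resolve(self, name):
        """Accepted scientific names that a name string maps to (exact, case-insensitive)."""
        hits = self.index.get(name.lower(), set())
        return sorted({self.taxa[tid].accepted_name for tid, _ in hits})


thesaurus = Thesaurus()
# Source identifiers are placeholders, not real COL or ITIS IDs.
thesaurus.add_record("COL", "col-0001", "Actaea racemosa", "Species",
                     synonyms=["Cimicifuga racemosa"], vernaculars=["black cohosh"])
thesaurus.add_record("ITIS", "itis-0001", "Actaea racemosa", "Species",
                     vernaculars=["Black Cohosh"])
print(thesaurus.resolve("Cimicifuga racemosa"))  # ['Actaea racemosa']
```

Because records from different sources merge on the accepted name, a synonym contributed by one source and a vernacular name contributed by another both resolve to the same entry.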
Mapping of intervention name strings
The mapping and normalization step involved indexing the intervention name strings and active ingredient name strings from FAERS and CVAR using Apache Solr. The retrieval step leveraged Solr's fuzzy-matching feature. The retrieved strings were then verified using the Smith-Waterman local alignment algorithm. Based on the alignment, a match was accepted under the following constraints: (1) the aligned region covered the full length of the taxonomic name, (2) the difference between the herb name length and the alignment score was less than half of the herb name length, (3) the first and last characters of the herb name matched, and (4) if the herb name was four characters or shorter, only perfect matches were accepted. The identified herb names were then resolved to their respective scientific names. A more detailed description, evaluation, and comparison of this mapping approach with other existing tools is provided in our previously published study. 36
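The verification step can be sketched as follows. This is a minimal Python illustration that assumes candidate strings have already been retrieved (the Solr fuzzy query is not reproduced) and assumes a simple linear scoring scheme (match +1, mismatch -1, gap -1), which the text does not specify. The function names `smith_waterman` and `is_herb_match` are hypothetical.

```python
def smith_waterman(query, target, match=1, mismatch=-1, gap=-1):
    """Local alignment of query against target.

    Returns (score, q_start, q_end, t_start, t_end) with inclusive 0-based ends.
    The linear scoring scheme is an assumption made for illustration.
    """
    rows, cols = len(query) + 1, len(target) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if query[i - 1] == target[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, i, best_pos[0] - 1, j, best_pos[1] - 1


def is_herb_match(herb, candidate):
    """Apply the four acceptance constraints to one retrieved candidate string."""
    h, c = herb.lower(), candidate.lower()
    score, q_start, q_end, t_start, t_end = smith_waterman(h, c)
    n = len(h)
    if score <= 0:
        return False
    if n <= 4:                                              # (4) short names: perfect match only
        return score == n
    covers_name = q_start == 0 and q_end == n - 1           # (1) alignment spans the whole name
    close_score = (n - score) < n / 2                       # (2) score close to the name length
    ends_match = h[0] == c[t_start] and h[-1] == c[t_end]   # (3) first and last characters match
    return covers_name and close_score and ends_match
```

With this scoring scheme, a single dropped letter (for example, "valerina" for "valeriana") still passes all four constraints, whereas a short name such as "hops" is accepted only when it occurs exactly.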
Evaluation of intervention name mapping
The mapping of intervention name strings was evaluated on 1000 uncorrected name strings from the SALVIAS database (provided and used previously by Boyle et al 37). Performance was assessed using the standard evaluation metrics of Precision, Recall, and F-score.
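For reference, the three metrics can be computed as below, treating each (input string, resolved name) pair as one prediction. This is a minimal sketch: the example strings are invented rather than drawn from SALVIAS, and the balanced F1 form of the F-score is assumed.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted (name string, resolved name) pairs against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                          # correctly resolved pairs
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical misspelled strings; not taken from SALVIAS.
gold = {("Valerina", "Valeriana"), ("Hypericum perforatm", "Hypericum perforatum"),
        ("Pipr methysticum", "Piper methysticum")}
predicted = {("Valerina", "Valeriana"), ("Hypericum perforatm", "Hypericum perforatum"),
             ("Pipr methysticum", "Piper nigrum")}
print(precision_recall_f1(predicted, gold))
```

A string the mapper leaves unresolved is simply absent from `predicted`, which lowers recall without affecting precision.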
Detection of adverse event signals
Signal Disproportionality Analysis (SDA) was conducted at the Preferred Term (PT) and System Organ Class (SOC) levels of the MedDRA hierarchy. Two metrics were used for signal detection: (1) the Proportional Reporting Ratio (PRR) and (2) the Reporting Odds Ratio (ROR). As described by Gavali et al, 38 a signal was considered significant if (1) the number of co-occurrences was three or more, (2) the PRR or ROR value was greater than or equal to three, and (3) the lower bound of the 95% confidence interval (CI) was greater than or equal to one. For reporting purposes, a filter was applied to retain only the most reliable adverse event associations: to account for the confounding factor of herb name variations, only associations in which the scientific name was present at least once were retained.
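The following Python sketch shows how PRR, ROR, and their 95% CIs are commonly computed from a 2x2 contingency table, and how the three criteria above could be applied to each metric. It uses the standard log-scale standard errors; the handling of empty cells (skipping them rather than applying a continuity correction) and the example counts are assumptions for illustration.

```python
import math


def disproportionality(a, b, c, d, z=1.96):
    """PRR and ROR with 95% CIs from a 2x2 contingency table.

    a: reports with the herb and the adverse event
    b: reports with the herb and any other event
    c: reports without the herb but with the event
    d: reports with neither
    All cells must be positive (no continuity correction is applied here).
    """
    prr = (a / (a + b)) / (c / (c + d))
    se_prr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    ror = (a * d) / (b * c)
    se_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    def ci(ratio, se):
        # Symmetric interval on the log scale, transformed back.
        return (math.exp(math.log(ratio) - z * se), math.exp(math.log(ratio) + z * se))

    return {"prr": prr, "prr_ci": ci(prr, se_prr), "ror": ror, "ror_ci": ci(ror, se_ror)}


def signal_flags(a, b, c, d, min_count=3, min_ratio=3.0, min_lower_ci=1.0):
    """Apply the three significance criteria separately to PRR and ROR."""
    if min(a, b, c, d) <= 0:
        return {"prr": False, "ror": False}   # assumption: tables with empty cells are skipped
    stats = disproportionality(a, b, c, d)
    flags = {}
    for metric in ("prr", "ror"):
        flags[metric] = (a >= min_count
                         and stats[metric] >= min_ratio
                         and stats[metric + "_ci"][0] >= min_lower_ci)
    return flags


# Hypothetical counts for one herb-PT pair.
print(signal_flags(12, 188, 400, 99400))  # {'prr': True, 'ror': True}
```

In the example, the herb's share of reports mentioning the event (6%) is roughly 15 times the background share (about 0.4%), so both metrics pass all three criteria.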
Herb-drug adverse event similarity analysis
The significant associations identified using SDA from the two data sources (FAERS and CVAR) were merged. To compare herbal adverse event profiles with those of drugs, this dataset was combined with significant associations from AEOLUS. 30 AEOLUS is a publicly available standardized resource that provides a pre-processed, cleaned version of FAERS data along with pre-computed statistical measures of disproportionality (PRR and ROR). The combined dataset was used to hierarchically group herbs and drugs using the maximum parsimony phylogenetic technique. The input matrix for the phylogenetic analysis was constructed such that adverse event terms were represented as characters and their presence (“1”) or absence (“0”) as character states. These adverse event profile data were then analyzed using the maximum parsimony phylogenetic inference tool Tree analysis using New Technology (TNT). 39 The output tree was evaluated using the drug pharmacodynamic pathway entries listed in PharmGKB. 40 For each selected pathway, related pathways were gathered and compared against drug group subsets from the generated tree; the evaluation was based on whether drugs belonging to related pathways were grouped together. A web tool called PhyloZoom enables interactive exploration of the tree (available at https://bcbi.brown.edu/phylozoom/). A selected evaluation based on comparison with results from Mizutani et al 41 is also presented in Figure 3A to C.
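The presence/absence encoding and the parsimony criterion can be illustrated with a short Python sketch. It scores fixed candidate trees with Fitch's algorithm; it does not perform the heuristic tree search that TNT carries out, and the taxa and event terms are toy placeholders.

```python
def character_matrix(profiles):
    """Binary presence/absence matrix in which adverse event terms are characters."""
    events = sorted(set().union(*profiles.values()))
    matrix = {taxon: [1 if e in terms else 0 for e in events]
              for taxon, terms in profiles.items()}
    return events, matrix


def fitch_length(tree, matrix):
    """Parsimony length (number of state changes) of a rooted binary tree.

    `tree` is a nested tuple of taxon names, e.g. (("A", "B"), ("C", "D")).
    """
    length = 0

    def state_set(node, k):
        nonlocal length
        if isinstance(node, str):              # leaf: its observed state
            return {matrix[node][k]}
        left, right = state_set(node[0], k), state_set(node[1], k)
        if left & right:                       # children can agree: no change needed
            return left & right
        length += 1                            # disagreement costs one state change
        return left | right

    n_chars = len(next(iter(matrix.values())))
    for k in range(n_chars):
        state_set(tree, k)
    return length


# Toy profiles with hypothetical event terms.
profiles = {"A": {"e1", "e2"}, "B": {"e1", "e2"}, "C": {"e3", "e4"}, "D": {"e3", "e4"}}
events, matrix = character_matrix(profiles)
print(fitch_length((("A", "B"), ("C", "D")), matrix))  # 4
print(fitch_length((("A", "C"), ("B", "D")), matrix))  # 8
```

The tree that groups taxa with identical profiles requires fewer state changes (4 versus 8), which is the sense in which the parsimony criterion places herbs and drugs with similar adverse event profiles close together.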
Results
Summary of herb thesaurus and mapping evaluation
The compilation of plant species names and their taxonomy from COL, ITIS, and NCBI Taxonomy resulted in 390,638 scientific names, 660,645 synonyms, and 82,672 vernacular names. The evaluation of the pipeline on the SALVIAS dataset resulted in Precision, Recall, and
Summary of herb-related reports
Herb-related reports accounted for 2.51% of all reports in FAERS and 6.83% of all reports in CVAR (Table 1). In general, AERs (both herb and non-herb) were more numerous for females. However, the proportions of herb AERs (group-specific herb AERs out of total herb AERs) were similar between the male and female groups in both the FAERS and CVAR datasets. The proportion of non-herb AERs was similar to or higher than that of herb AERs in all age groups below 65. For the age group 65 and older, the proportion of herb AERs was significantly higher than that of non-herb AERs (
Table 1. AER counts from CVAR and FAERS.
AER: adverse event report; CVAR: Canada Vigilance Adverse Reaction; FAERS: FDA’s Adverse Event Reporting System. Timeframe: FAERS: 2004-2016 and CVAR: 1965-2017.

Figure 2. AER count comparison among different age groups. Asterisks indicate significant differences. Timeframe: CVAR: 1965-2017 and FAERS: 2004-2016.
Herb-associated adverse event signals
The adverse event associations identified from the FAERS dataset involved 109 and 130 plant species associated with PTs and SOCs, respectively. Similarly, those from the CVAR dataset involved 298 and 249 plant species associated with PTs and SOCs, respectively. The adverse event counts from FAERS and CVAR are provided in Table 2, which also lists PRR values and CIs for associations reported at least once as “PS/SS” (FAERS dataset) or “Suspect” (CVAR dataset). The full set of identified associations from both data sources, along with their respective SDA scores, is provided in the supplemental tables. A small subset of significant adverse events common to FAERS and CVAR with 10 or more reports is provided in Tables 3 and 4.
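The filtering behind these tables can be sketched as set operations over (scientific name, term) pairs. In this hypothetical Python sketch, the tuple layout, the herb placeholders, and the counts are invented, and the threshold of 10 reports is assumed to apply in each source separately. "PS" and "SS" (FAERS primary and secondary suspect) and "Suspect" (CVAR) are the role values named in the text.

```python
def suspect_pairs(reports, suspect_roles):
    """Pairs reported at least once with the herb in a suspect role.

    `reports` is an iterable of (scientific_name, term, role) tuples (assumed layout).
    """
    return {(name, term) for name, term, role in reports if role in suspect_roles}


def common_associations(faers, cvar, min_reports=10):
    """Significant (herb, term) pairs present in both sources with enough reports.

    `faers` and `cvar` map (scientific_name, term) -> report count.
    """
    shared = faers.keys() & cvar.keys()
    return sorted((pair, faers[pair], cvar[pair]) for pair in shared
                  if faers[pair] >= min_reports and cvar[pair] >= min_reports)


# Invented placeholder data for illustration only.
faers_reports = [("Herb A", "Hepatobiliary disorders", "PS"),
                 ("Herb B", "Cardiac disorders", "C")]
faers_counts = {("Herb A", "Hepatobiliary disorders"): 25,
                ("Herb B", "Psychiatric disorders"): 12,
                ("Herb C", "Hepatobiliary disorders"): 4}
cvar_counts = {("Herb A", "Hepatobiliary disorders"): 14,
               ("Herb B", "Psychiatric disorders"): 9}
print(suspect_pairs(faers_reports, {"PS", "SS"}))
print(common_associations(faers_counts, cvar_counts))
```

Under these assumptions, only the first pair survives: it is present in both sources and reaches 10 reports in each.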
Table 2. Counts of adverse events and System Organ Classes (SOCs) found significant based on both PRR and ROR values and their respective confidence intervals.
CVAR: Canada Vigilance Adverse Reaction; FAERS: FDA’s Adverse Event Reporting System; PRR: Proportional Reporting Ratio; ROR: Reporting Odds Ratio; SOC: System Organ Class. Timeframe: FAERS: 2004-2016 and CVAR: 1965-2017.
Table 3. Common System Organ Class (SOC)a associations from CVAR and FAERS.
CI: confidence interval; CVAR: Canada Vigilance Adverse Reaction; FAERS: FDA’s Adverse Event Reporting System; PRR: Proportional Reporting Ratio; SOC: System Organ Class. Timeframe: FAERS: 2004-2016 and CVAR: 1965-2017.
Not all data represent 100% adverse events; PRR scores are provided along with confidence intervals. ROR scores are available in the Supplementary Tables.
Table 4. Common adverse event associations of herbs found in FAERS and CVAR.
CVAR: Canada Vigilance Adverse Reaction; FAERS: FDA’s Adverse Event Reporting System; PRR: Proportional Reporting Ratio; ROR: Reporting Odds Ratio. Timeframe: FAERS: 2004-2016 and CVAR: 1965-2017.
PRR scores are provided along with confidence intervals from FAERS and CVAR, in that order. ROR scores are available in the Supplementary Tables.
Note: Not all data represent 100% adverse events.
Herb-drug adverse event similarity
To assess the severity and extent of adverse events associated with herbs, as well as their potential to cause harm, a comparison with prescription drugs was carried out. Evaluating the hierarchical relationships among drugs and herbs based on their adverse event profiles using the maximum parsimony criterion revealed characteristically distinguishable groupings. Selected examples are shown in Figure 3A to C as an evaluation of the validity of the approach. Drugs with similar mechanisms of action and indications were grouped together. Warfarin and Acenocoumarol, anticoagulants targeting Vitamin K epoxide reductase, were placed in the same cluster. The nonsteroidal anti-inflammatory drugs Celecoxib, Rofecoxib, and Valdecoxib (selective COX-2 inhibitors) were also grouped together as having similar adverse events. In addition, Aspirin and Clopidogrel, which have different targets but similar indications of use and adverse reactions, 41 were paired together. Similarly, Aripiprazole, Ziprasidone, Risperidone, Olanzapine, and Quetiapine were grouped together based on related ADRs, consistent with previous work. 41 Drugs indicated for attention deficit hyperactivity disorder and narcolepsy, which share a mechanism of increasing neurotransmitter levels (dopamine and norepinephrine), were clearly grouped. Results from more thorough investigations of drug groupings are presented in the Supplementary data. The closest neighbors (drug concepts from AEOLUS) of the herbal interventions identified in this study are listed in Table 5, where distance is defined by the parsimony criterion. In 5 of the 25 closest associations identified, the same herb concept appeared on both sides of the pair (once from this study and once from AEOLUS), further supporting the validity of the approach for grouping herbs and drugs based on the similarity of their adverse event profiles.
The associations in Table 5 show several herbs with adverse event profiles similar to those of prescription drugs.

Figure 3. Evaluation of the hierarchical grouping of herbs and drugs using the maximum parsimony phylogenetic technique. Groupings of drugs with known underlying mechanisms (eg, A, B, and C represent three different groupings) were manually examined to validate the approach (evaluation based on comparison with results from Mizutani et al).
Table 5. Herb-drug pairs placed closest to each other in the hierarchical grouping based on tree analysis.
Discussion
Although botanicals are widely used, their efficacy and safety remain a concern. Studies have evaluated these aspects of botanicals, but their results are isolated and embedded within a large body of biomedical literature. 42 Available studies of ADRs take the form of systematic reviews. 2 In addition to these challenges, most herb-related ADRs remain under-reported in post-marketing SRSs.
Beyond these obstacles, data-driven studies that analyze AERs from multiple data sources are notably hindered because the data are not organized according to common standards. 35 Resolving mentions of supplement names and ingredients to their respective plant species names is a primary requirement for further identification and analysis of reports. Interventions can be mentioned by vernacular names, synonyms, scientific names, or product names, and this clear lack of standardization of intervention names is a major obstacle to identifying herb-specific ADRs. Standardizing and resolving herb mentions, as well as adverse event terminology, across databases such as LiverTox 43 and the Dietary Supplement Label Database 44 may enhance connectivity and provide avenues for generating data-driven insights. The associations identified here may be a valuable resource for enriching the content of existing toxicity databases but will require manual assessment. Leads from the analysis of data across several sources could support the design of more informative labels for herbal supplements on the market, thereby helping consumers make informed choices.
The goal of this study was to test the feasibility of mapping and extracting herbal ADRs from two current SRSs, FAERS and CVAR. Resolving herb species names was a major issue. A preliminary examination of the intervention names listed in these two databases showed that they required an approach for effective mapping and resolution of herb names that could scale to their characteristically large size. Ambiguity and synonymy of taxonomic names pose a major issue in the study of biodiversity, and there is still a pressing need for automated systems that recognize taxonomic names while overcoming these hurdles. 45 The lack of resources that provide authoritative taxonomic solutions hinders integration, and thereby interoperability of knowledge, between biodiversity and biomedical data sources. 46
AERs were identified for herbal interventions from both FAERS and CVAR. Comparing demographic variables (sex and age) between herb-related and drug-related reports revealed a difference in the proportion of associated AERs; in particular, the proportion of herb AERs was higher in the age group 65 and older. This is consistent with previous estimates that older adults are more likely to use supplements. 3 The AER data extracted from FAERS and CVAR were used to detect adverse event signals at the level of associated adverse events and SOCs. Signals were also calculated at the SOC level because aggregation to this level is known to improve signal detectability. 25 Significant associations from both FAERS and CVAR were identified (Supplementary Data). A list of significant ADRs common to FAERS and CVAR is provided in Tables 3 and 4. Some of the herbs identified previously in an assessment of PubMed showed significant adverse effect associations, for example, germander, black cohosh, kava kava, and green tea with liver toxicity. 10 Kava kava was also associated with renal and urinary disorders (SOC). Similarly, St John’s Wort showed kidney toxicity 11 in the FAERS dataset. Other high-scoring herb-SOC associations common to the two datasets were as follows: (1) Timothy-grass, which is a known allergen, although its extract is used to help the body develop immunity; (2) St John’s Wort, marijuana, hops, and valerian, which are known for their psychotropic properties, were associated with psychiatric disorders; (3) dandelion and milk thistle, herbs commonly known to improve liver function, were associated with hepatobiliary disorders, possibly as a result of discrepancies in reporting; and (4) foxglove and guarana showed adverse effects within the SOC category of cardiac disorders. Although foxglove is used for heart-related conditions, its toxicity may arise in response to higher doses.
Guarana, a common ingredient in energy drinks, is known to cause cardiovascular adverse effects when taken in large quantities. 47 Beyond the signals identified in this study, additional ADRs may be identifiable from reports that appear non-significant because herbal AERs are underreported.
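Because each MedDRA PT is assigned a single primary SOC, reports can be rolled up from PTs to SOCs before computing disproportionality, which is why SOC-level analysis can surface signals that individual PTs miss. The minimal Python sketch below illustrates this with an illustrative two-term mapping (not an authoritative MedDRA extract) and hypothetical reports: neither PT reaches the three-report threshold, but their shared SOC does.

```python
from collections import Counter


def pt_counts(reports):
    """Number of reports mentioning each PT (a PT counts once per report)."""
    counts = Counter()
    for pts in reports:
        counts.update(set(pts))
    return counts


def soc_counts(reports, pt_to_soc):
    """Number of reports per SOC after mapping each PT to its primary SOC."""
    counts = Counter()
    for pts in reports:
        # A report contributes once per SOC even if several of its PTs share that SOC.
        counts.update({pt_to_soc[pt] for pt in pts if pt in pt_to_soc})
    return counts


# Illustrative PT -> primary SOC excerpt (not an authoritative MedDRA extract).
pt_to_soc = {"Hepatitis": "Hepatobiliary disorders", "Jaundice": "Hepatobiliary disorders"}
herb_reports = [{"Hepatitis"}, {"Hepatitis"}, {"Jaundice"}, {"Jaundice"}]
print(pt_counts(herb_reports))               # each PT has only 2 reports
print(soc_counts(herb_reports, pt_to_soc))   # the shared SOC has 4 reports
```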
A unified compendium of taxonomic names, synonyms, and vernacular names was created and used with a mapping strategy for fast and accurate mapping that can identify misspelled names. 36 Unifying taxonomic names from three sources provides more comprehensive coverage for resolving names that would otherwise be missed, which would lead to incomplete information recovery. This resource is available at https://bcbi.brown.edu/solrplant/. At present, the tool does not resolve authorities from scientific names; this capability will be added in the future.
The challenges in normalizing herbal ingredient names and the underreported nature of herbal ADR data are major hurdles in identifying adverse reactions associated with herbals. Additional challenges may arise in mining SRS data because the MedDRA PTs used to code ADRs evolve across versions. 32 MedDRA is updated twice a year, and some codes may become obsolete in more recent versions. Such changes can cause reports to be missed during AER identification, which in turn may lead to missed adverse event signals. To overcome this challenge, future work will map listed adverse event PTs to a standardized clinical terminology resource such as Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) 48 or the Adverse Drug Reaction Classification System (ADReCS). 49 A cursory survey of the data also revealed that the herbal supplements listed in the SRSs are taken in combination with one or more other drugs or supplements. This suggests that the observed ADRs may also result from interactions with other drugs or supplements, in addition to direct effects. Sex, age, and underlying medical conditions are also possible confounding factors in supplement-related ADRs. Although important, this area has received very little attention, as reflected by the significant shortage of scientific literature, and warrants in-depth investigation.
To gain a comparative view of the ADR profiles of herbs and drugs, this study analyzed their grouping and categorization using a maximum parsimony tree-building framework. A character-based approach enables the identification of discrete characters shared among entities. 50
Inspection of the resulting tree for drugs with known indications and mechanisms of action revealed meaningful groupings. Drugs with common underlying mechanisms may produce similar ADRs, and such associations have been studied using drug indication information and associated ADRs. The groupings were evaluated on this premise, and the results were compared with related pathways from PharmGKB. Selected examples used for evaluation, by comparison with the results discussed by Mizutani et al, 41 are provided in Figure 3A to C.
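The two tree queries used in this kind of evaluation, checking whether a set of pathway-related drugs forms one group and retrieving an herb's nearest neighbors, can be sketched in Python on a nested-tuple tree. Treating the sister group as the set of closest neighbors is an assumption for illustration; the study defines distance by the parsimony criterion on the TNT output, and `herb_X`, `drug_Y`, and `drug_Z` are hypothetical taxa.

```python
def leaves(node):
    """Set of taxon names below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def subtrees(node):
    """Yield every node of the tree, root first."""
    yield node
    if not isinstance(node, str):
        for child in node:
            yield from subtrees(child)


def is_clade(tree, taxa):
    """True if exactly `taxa` form one subtree, i.e., they are grouped together."""
    target = frozenset(taxa)
    return any(leaves(sub) == target for sub in subtrees(tree))


def sister_group(tree, leaf):
    """Leaves of the sibling subtree(s) of `leaf`, taken here as its closest neighbors."""
    for sub in subtrees(tree):
        if isinstance(sub, str):
            continue
        if leaf in sub:
            return frozenset().union(*(leaves(c) for c in sub if c != leaf))
    return frozenset()


tree = (("warfarin", "acenocoumarol"), (("herb_X", "drug_Y"), "drug_Z"))
print(is_clade(tree, {"warfarin", "acenocoumarol"}))  # True
print(sorted(sister_group(tree, "herb_X")))           # ['drug_Y']
```

A pathway-based evaluation would call `is_clade` once per set of related drugs, while a table of closest neighbors would call `sister_group` once per herb.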
Representative closest drug neighbors of herbs were also extracted (Table 5). Herbs placed near pharmaceutical drugs share similar adverse event profiles, a similarity that may otherwise have gone unnoticed. For example, the hierarchical clustering shows comparable adverse event profiles as well as potential interaction 51 for
Conclusion
Evidence from observational and biomedical data sources has the potential to provide significant insight into efficacy and safety issues related to currently marketed dietary and herbal supplements. This study focused specifically on identifying herbals in SRSs using a combination of informatics approaches. The promising results suggest that systematic approaches involving appropriate mapping and normalization of herb names may facilitate interoperability among traditionally disconnected biodiversity, biomedical, and observational data sources.
Supplemental Material
SupplementalTables-HerbalADR_xyz35739a1cd2138 – Supplemental material for Identifying Herbal Adverse Events From Spontaneous Reporting Systems Using Taxonomic Name Resolution Approach
Supplemental material, SupplementalTables-HerbalADR_xyz35739a1cd2138 for Identifying Herbal Adverse Events From Spontaneous Reporting Systems Using Taxonomic Name Resolution Approach by Vivekanand Sharma, Luiz Fernando Fracassi Gelin and Indra Neil Sarkar in Bioinformatics and Biology Insights
Footnotes
Funding:
Declaration of conflicting interests:
Authors’ Note
Author Contributions
References
