The biggest data quality concerns cited by users of primary biodiversity data in a recent survey [24] were georeference quality and taxonomic quality; we found that studies addressed these issues in 24% (spatial error in georeferences), 39% (taxonomic nomenclature), and 19% (species identifications) of published papers from our dataset (Table 6). Taxonomic nomenclature was the most commonly checked data quality issue for all other top uses, ranging from 40% of papers (conservation and data quality uses) to 56% (taxonomy). Non-experts can check for spatial outliers or incorrect georeferences using standardized methods and online georeferencing tools [37,117]. Such efforts unlock previously inaccessible data and expand their availability to researchers around the world. When data are available, researchers must check for common errors and biases known to occur in opportunistic datasets that are often assembled over long time periods (e.g. Once a country attains the capacity to manage its genetic resources, that capacity will automatically enable it to produce novel products from its own biodiversity. Among environmental variables, climate data are perhaps the most readily available, are relevant for the distribution of organisms on a global scale, and provide essential information for determining impacts of climate change on distribution [111,112]. The prevalence of plants in studies that use online biodiversity databases may be due to a strong history of plant diversity work, in Europe in particular, and to the relative ease with which herbarium records can be digitized by scanning herbarium sheets. The biocollections community must therefore continue to promote digitization efforts, which in part requires demonstrating compelling applications of the data. Australia, fish) of each database. Data types fall within one of four categories, including 1.)
We found that 34 percent of papers (n = 170) had insufficient citation information for one or more databases; this meant that either no URL was provided to access the database, or the URL was invalid. However, this may be an effect of small sample sizes. [75]). We downloaded the first 500 records (or all records if there were fewer than 500 results), which are presumably the most relevant search returns, for each search term into a Zotero reference management database [57]. environmental data (e.g. However, most records in GBIF, for example, still do not have uncertainty radii; in a recent assessment of GBIF records for Odonata, Ephemeroptera, Plecoptera, and Trichoptera from the U.S.A., we found that the percentage of records with associated uncertainty radii was only 7–36% for these aquatic insect groups (as of April 2017). The top research uses for online species occurrence databases, from our dataset of 501 relevant papers, were studies on species distribution (n = 175), diversity/population studies that usually assess species richness (n = 122), dataset description (i.e. Depending on data needs, one may also use existing uncertainty radii associated with georeferenced coordinates to select appropriate records for a study. https://doi.org/10.1371/journal.pone.0215794.s003. Many digitization efforts, for insects in particular, have prioritized transcribing and publishing specimen label information and have not yet begun or completed georeferencing. Up to this point, researchers have most often cited GBIF in this case (usually in-text, not in the reference section) and have neglected to credit original data sources [77]. We found only four studies since 2010 that address hundreds of thousands of taxa, and most papers address numbers of taxa in the single or double digits.
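The idea of using existing uncertainty radii to select records, and of reporting what share of records carry a radius at all, can be illustrated with a minimal sketch. It assumes records are plain dictionaries keyed by the Darwin Core term `coordinateUncertaintyInMeters`; the function names and the sample records are hypothetical and are not taken from the study.

```python
from typing import Any, Dict, List, Optional

# Darwin Core term for the georeference uncertainty radius, in meters.
UNCERTAINTY_FIELD = "coordinateUncertaintyInMeters"


def parse_radius(record: Dict[str, Any]) -> Optional[float]:
    """Return the uncertainty radius as a positive float, or None if absent or unusable."""
    raw = record.get(UNCERTAINTY_FIELD)
    if raw is None or raw == "":
        return None
    try:
        radius = float(raw)
    except (TypeError, ValueError):
        return None
    # Zero, negative, and NaN values are treated as missing.
    return radius if radius > 0 else None


def select_by_uncertainty(records: List[Dict[str, Any]], max_radius_m: float) -> List[Dict[str, Any]]:
    """Keep only records whose uncertainty radius is known and within the threshold."""
    kept = []
    for record in records:
        radius = parse_radius(record)
        if radius is not None and radius <= max_radius_m:
            kept.append(record)
    return kept


def percent_with_radius(records: List[Dict[str, Any]]) -> float:
    """Percentage of records that carry a usable uncertainty radius."""
    if not records:
        return 0.0
    n_with = sum(1 for record in records if parse_radius(record) is not None)
    return 100.0 * n_with / len(records)


# Hypothetical records for illustration only.
sample = [
    {"id": 1, UNCERTAINTY_FIELD: "250"},
    {"id": 2},
    {"id": 3, UNCERTAINTY_FIELD: 5000},
    {"id": 4, UNCERTAINTY_FIELD: "unknown"},
]
usable = select_by_uncertainty(sample, max_radius_m=1000)
coverage = percent_with_radius(sample)
```

The threshold is a study-specific choice: a continental-scale distribution model may tolerate radii of several kilometers, whereas a fine-scale habitat analysis may not. Note that this filter also discards records with no radius, which, given the low coverage reported above, can remove most of a dataset.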
While distribution studies were still the most common application across groups, significantly smaller percentages of plant (33%) and invertebrate (41%) studies dealt with species distribution. Our data show that climate is indeed the most common environmental variable used in association with occurrence records (Fig 6; also documented in [56]). Field Museum of Natural History, Chicago, IL, United States of America. We also determine uses with the highest number of citations, how online occurrence data are linked to other data types, and whether and how data quality is addressed. Studies were not relevant if they exclusively used data that are not available online or that came from systematic surveys, government monitoring programs, or field data collected explicitly for the study in question. Large species often receive more research and conservation funding, and very few conservation assessments exist for invertebrate taxa; most insect species are classified as data deficient (e.g. Other groups may lack online sources or have sources that are significantly out of date [123]. While efforts towards workflow optimization will undoubtedly improve efficiency in certain areas [12,16–19], it is critical that the biocollections community prioritize efforts; we must advocate for continued digitization through production of innovative data products, tools, and interdisciplinary collaborations, and by highlighting research that requires primary biodiversity data [3,20–22]. The total number of invertebrate studies was equivalent to the total number of vertebrate studies (Fig 3). https://doi.org/10.1371/journal.pone.0215794.g006. Geographic errors (or missing information) may be more readily corrected and associated with appropriate uncertainty estimates using standardized methods [31,37] and online tools (i.e.
https://doi.org/10.1371/journal.pone.0215794.t001. We characterize a variety of ways in which researchers are using species occurrence records by assessing the prevalence of individual tags corresponding to topics of interest. Opportunistic species occurrence records may therefore be best used to identify data gaps and promising areas for resurveys or standardized long-term monitoring studies when dealing with species decline [48]. The increasing application of biotechnology to biodiversity (including genetic engineering) has greatly enhanced the value and availability of biological resources and products for mankind. GEOLocate, www.geo-locate.org). These general taxonomic categories also correspond to common divisions for the organization of natural history collections and associated databases. We searched for information regarding other data types used within the methods section of each paper. In addition, we determine the prevalence of these tags over time to assess positive or negative trends. We argue that the scale of data that needs processing, along with issues of often sparse data, data obsolescence [109], and data of uncertain quality, makes large-scale analyses challenging for anyone but a small group of data science-savvy end users. https://doi.org/10.1371/journal.pone.0215794.t007. Our goals here were to characterize the most commonly studied taxonomic groups and the number of taxa addressed, and to determine uses associated with the three most common organismal groupings (plants, vertebrates, and invertebrates). We attach documentation of search terms as Supplemental Material. Table 5 summarizes top data linkages for different key uses.
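Assessing the prevalence of a tag over time, as described above, amounts to computing, for each publication year, the share of papers carrying that tag and then checking whether the share rises or falls. The following is a minimal sketch under stated assumptions: the paper records, field names (`year`, `tags`), and the use of an ordinary least-squares slope as the trend measure are all hypothetical choices for illustration, not the authors' documented procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, TypedDict


class Paper(TypedDict):
    year: int
    tags: Set[str]


def tag_prevalence_by_year(papers: Iterable[Paper], tag: str) -> Dict[int, float]:
    """Fraction of papers per publication year that carry the given tag."""
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for paper in papers:
        totals[paper["year"]] += 1
        if tag in paper["tags"]:
            hits[paper["year"]] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


def linear_trend(prevalence: Dict[int, float]) -> float:
    """Least-squares slope of prevalence against year (positive means increasing)."""
    if len(prevalence) < 2:
        raise ValueError("need at least two years to estimate a trend")
    years = list(prevalence)
    values = [prevalence[y] for y in years]
    mean_x = sum(years) / len(years)
    mean_y = sum(values) / len(values)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    denominator = sum((x - mean_x) ** 2 for x in years)
    return numerator / denominator


# Hypothetical tagged papers for illustration only.
papers: List[Paper] = [
    {"year": 2010, "tags": {"distribution"}},
    {"year": 2010, "tags": {"taxonomy"}},
    {"year": 2011, "tags": {"distribution", "climate"}},
]
prevalence = tag_prevalence_by_year(papers, "distribution")
slope = linear_trend(prevalence)
```

Normalizing by the number of papers per year matters here because the total number of published studies changes over time; raw tag counts alone would conflate growth in a topic with growth in the literature as a whole.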
We subsequently split up certain aggregated topics and revised and added use categories based on important subject areas that arose during the tagging process. More recently, programs to automate and document data cleaning workflows have been developed, such as Kurator, a Kepler data curation package [38], but these are not yet widely used because of their highly technical user interfaces, and their future support is uncertain. While an in-depth assessment of specific taxa is beyond the scope of the current paper, we did tag the number of taxa addressed in each paper, if that number was apparent. We found that the most common uses of online biodiversity databases have been to estimate species distribution and richness, to outline data compilation and publication, and to assist in developing species checklists or describing new species. The terms included: species occurrence database (8,800 results), natural history collection database (634 results), herbarium database (16,500 results), biodiversity database (3,350 results), primary biodiversity data database (483 results), museum collection database (4,480 results), digital accessible information database (10 results), and digital accessible knowledge database (52 results). Note that quotation marks are used as part of the search terms where specific phrases are needed in whole. These include data needed for taxonomic/phylogenetic studies, namely those from natural history specimens, genetic data, and phylogenetic data.
https://doi.org/10.1371/journal.pone.0215794, Editor: Daniel de Paiva Silva, Instituto Federal de Educacao Ciencia e Tecnologia Goiano - Campus Urutai, BRAZIL, Received: April 8, 2019; Accepted: July 13, 2019; Published: September 11, 2019. In addition to errors, some studies address specific biases known to be a problem in opportunistic datasets, including taxonomic, spatial, temporal, and environmental biases. We need better methods to document confidence in data at a record and dataset level [23]. https://doi.org/10.1371/journal.pone.0215794.g001, https://doi.org/10.1371/journal.pone.0215794.g002. Data uses requiring large numbers of dispersed records, such as species distribution models and biodiversity studies, will be the most common applications of online databases. Adverse impacts on biodiversity through the introduction of GMOs may also result from disturbance of the dynamic population equilibrium of ecosystems. Taxonomy-related uses of online species occurrence databases sometimes involve describing new species, but more commonly involve compilation of regional species checklists. Of the 6.2 million catalogued molluscan lots in U.S. and Canadian collections, 4.5 million have undergone some form of data digitization. However, we still have not reached the major goal of having online taxonomic data sources that are consistently updated by taxonomic experts for all species, although community-supported resources such as FishBase [65], WoRMS [120], and the latter's affiliated databases such as MilliBase [121] and MolluscaBase [122] are approaching that goal for many taxonomic groups. We propose that neutral theory can serve as a valuable first-order approximation to reduce complexity and, by design, account for drift and stochasticity. Moreover, some models based on neutral theory subdivide space into local community and metacommunity, which reflects concepts commonly used in conservation science.
attributes of occurrence information, 2.) Three of the top five data types linked to online occurrence records were other types of occurrence data: literature-based occurrence data, surveys, and specimen data from natural history collections (n = 189, n = 145, and n = 135 papers used these data types, respectively). Neutral approaches have been used in conservation to generate realistic species-abundance distributions and species-area relationships, provide a standard against which to compare species loss, prioritize species protection, model biological invasions, and support protected area design. Environmental data used in conjunction with online biodiversity records are often applied in studies of species distribution. The prevalence of inaccessible databases and incomplete database citations indicates that many biodiversity researchers lack the resources to manage and preserve data for the long term and/or are unaware of best practices. Both taxonomy and data papers used collection data most frequently in addition to data already available in online databases. Some exceptions were that a relatively large number of survey respondents claimed that they use biodiversity data for ecology/evolution studies, natural resources management, life history/phenology studies, and education/outreach, but relatively few published studies in our dataset used occurrence data for these purposes. barcoding, citizen science, species interactions) that can be linked to species occurrence records will increase. We also characterize studies that exclude certain inappropriate records, remove records with high georeferencing uncertainty, remove outliers, and address collection effort; see S1 Table. In some cases, researchers appropriately cited a database that is no longer in operation or has subsequently been integrated into an aggregate system.
Indirect impacts of biotechnology are immense and of very great relevance to people in developing countries who rely directly on biodiversity for their sustenance. We then determine the average number of data link tags associated with the six top uses, and the most common data type associated with each of these top uses. This may be one limiting factor holding back studies that utilize all data currently held within biodiversity databases and studies that address very large numbers of taxa within clades. Continued efforts in data preservation and promoting best practices in data citation are essential for advancing scientific reproducibility, sustaining data resources, and encouraging publication of high-quality biodiversity data. Elevation, land use, and vegetation data are also among the most readily available environmental data types, and are often relevant for evaluating species distribution at smaller spatial scales [113]. Another possible direct impact of GMOs raised for conferring viral resistance is the likely emergence of new viruses with new biological characteristics through recombination.
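The step of determining the average number of data link tags for each top use, and the most common linked data type per use, can be sketched as a simple aggregation. This is an illustrative sketch only: the record layout (`uses`, `data_links`), the function name, and the sample papers are assumptions and do not reproduce the study's actual tagging data.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set, Tuple, TypedDict


class TaggedPaper(TypedDict):
    uses: Set[str]
    data_links: List[str]


def summarize_links(
    papers: Iterable[TaggedPaper], top_uses: Iterable[str]
) -> Dict[str, Tuple[float, Optional[str]]]:
    """For each use, return (mean number of distinct data link tags, most common tag)."""
    paper_list = list(papers)
    summary: Dict[str, Tuple[float, Optional[str]]] = {}
    for use in top_uses:
        subset = [p for p in paper_list if use in p["uses"]]
        if not subset:
            continue  # no papers tagged with this use
        # Count each data type at most once per paper.
        counts = Counter(link for p in subset for link in set(p["data_links"]))
        mean_links = sum(len(set(p["data_links"])) for p in subset) / len(subset)
        most_common = counts.most_common(1)[0][0] if counts else None
        summary[use] = (mean_links, most_common)
    return summary


# Hypothetical tagged papers for illustration only.
example: List[TaggedPaper] = [
    {"uses": {"distribution"}, "data_links": ["climate", "elevation"]},
    {"uses": {"distribution", "taxonomy"}, "data_links": ["climate"]},
    {"uses": {"taxonomy"}, "data_links": ["specimen"]},
    {"uses": {"taxonomy"}, "data_links": ["specimen", "genetic"]},
]
result = summarize_links(example, ["distribution", "taxonomy", "conservation"])
```

A paper tagged with more than one use contributes to each of those uses, so the per-use subsets overlap; this matches a tagging scheme in which uses are not mutually exclusive, which is itself an assumption of the sketch.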