The Plant List — A working list for all plant species

⚠ Version 1 of The Plant List has been superseded.

You should refer instead to the current version of The Plant List.

About The Plant List

The Plant List is a working list of all known plant species. Version 1, released in December 2010, aims to be comprehensive for species of Vascular plant (flowering plants, conifers, ferns and their allies) and of Bryophytes (mosses and liverworts). It does not include algae or fungi. Version 1 contains 1,244,871 scientific plant names of which 298,900 are accepted species names. It includes no vernacular or common plant names.

Collaboration between the Royal Botanic Gardens, Kew and Missouri Botanical Garden enabled the creation of The Plant List by combining multiple checklist datasets held by these institutions and other collaborators.

The Plant List provides the Accepted Latin name for most species, with links to all Synonyms by which that species has been known. It also includes Unresolved names for which the contributing data sources did not contain sufficient evidence to decide whether they were Accepted or Synonyms.

A description of the content, creation and use of The Plant List follows.

Overview

The Plant List is a widely accessible working list of known plant species and has been developed and disseminated as a direct response to the Global Strategy for Plant Conservation, adopted in 2002 by the 193 governments who are Parties to the Convention on Biological Diversity. The GSPC was designed as a framework for action to halt the loss of plant diversity. Target 1 of the Strategy called for the completion by 2010 of a widely accessible working list of all known plant species, as a step towards a complete world Flora. Released in December 2010, Version 1 of The Plant List aims to be comprehensive for species of Vascular plant (flowering plants, conifers, ferns and their allies) and of Bryophytes (mosses and liverworts). This is consistent with the initial focus of the GSPC.

The Plant List is not perfect and represents work in progress. Our aim was to produce a ‘best effort’ list by 2010 to demonstrate progress and stimulate further work.

The Plant List was produced as a collaborative venture coordinated by the Royal Botanic Gardens, Kew and the Missouri Botanical Garden and involving collaborators worldwide.

Data records from numerous existing global checklist databases (derived from primary taxonomic publications) were brought together and combined with regional and national checklist data and other records from Tropicos. These resources were then complemented by the inclusion of additional names found in IPNI (for Angiosperms, Gymnosperms and Fern & Fern Allies). The Plant List may omit some names and may include some duplicate names. Furthermore those names derived from nomenclators may not include any indication of whether they are Accepted names or Synonyms. Our purpose has been to detect inconsistencies between overlapping data sources and resolve them.

The Plant List does not seek to duplicate the efforts of collaborators that have contributed data to the creation of The Plant List. This version will not be edited but feedback will be forwarded to our collaborators so that they can extend and improve their original data (see Enhancing The Plant List and Recreating The Plant List). Feedback will arise from our own analysis of the data (and its comparison with other resources) and from users of The Plant List (see How to Submit Feedback).

In the future we hope to:

  1. include improved and extended versions of the data sets included in this version of The Plant List;
  2. include other data sets which we were unable to include in Version 1; and
  3. refine the procedures that were used to create The Plant List: e.g. for locating duplicate name records, for resolving inconsistencies and for detecting conflicting opinions expressed within alternative data sets and then for selecting from among those opinions (see How The Plant List was Created).

We welcome comments on the content of The Plant List, and offers of contributions for inclusion in the next edition.

Target audience

The name of a plant is the key to communicating about it and to finding information about its uses, conservation status, relationships and place within ecosystems. The Plant List provides a tool for resolving or verifying the spelling of plant names and a means to find, from a global view, the botanically accepted name for a plant and all of its alternative synonyms. Since the ability to plan the sustainable use of plants (essential resources for food, medicines and ecosystem services) depends on effective retrieval of information about plants, there is a broad constituency of potential users of The Plant List.

Scope

The Plant List is a working list of known plant species, which aims to be comprehensive in coverage at species level for all names of mosses and liverworts and their allies (Bryophytes) and of Vascular plants which include the flowering plants (Angiosperms), conifers, cycads and their allies (Gymnosperms) and the ferns and their allies including horsetails and club mosses (Pteridophytes).

For each name at species level we aim to provide the author of the name, the original place of publication and an assessment of whether the name is accepted or is a synonym for another name from data resources held by Kew, by Missouri Botanical Garden and by our collaborators. Wherever possible for each name included links are also provided to the original online database record, to its corresponding entry in IPNI and to further sources of information about that plant.

The names of some subspecies or varieties of plant are also included in The Plant List primarily where they are synonyms or accepted names for species names and where they were available from the contributing data sets. The Plant List does not aspire to comprehensive coverage of infraspecific taxa (subspecies, varieties, forms etc.).

What does The Plant List not contain?

Version 1 of The Plant List does not contain:

  • scientific names for fossil plants, algae or fungi;
  • common (or vernacular) names for the plants included;
  • the geographic distribution or any other data about the plants included (though such data may be obtained from the source databases in many instances).

Description of The Plant List data set

Taxonomic coverage

The Plant List includes all known species of the following major plant groups:

  • Angiosperms
  • Gymnosperms
  • Pteridophytes
  • Bryophytes

Genera and species are presented in families which follow the source database(s) except in the case of Angiosperms where we have, wherever possible, allocated accepted genera to the families recognised by the Angiosperm Phylogeny Group.

Angiosperms

Angiosperms (Subclass Magnoliidae Novák ex Takht.). Subclass level classification follows Chase, M.W. & Reveal, J.L., 2009. A phylogenetic classification of the land plants to accompany APG III. Botanical Journal of the Linnean Society, 161, 122–127.

Genera and species of Angiosperms are presented in families following family circumscriptions in The Angiosperm Phylogeny Group, 2009. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Botanical Journal of the Linnean Society, 161, 105–121.

Within the Angiosperms, data quality varies widely, reflecting the patchiness of the taxonomic and geographic coverage of the source databases. Coverage is believed to be most comprehensive and consistent for Monocotyledons, where The Plant List benefited from the existence of comprehensive checklists fully reviewed by experts (see WCSP and GrassBase). For Angiosperms other than Monocotyledons, expert-reviewed lists of similar quality provide comprehensive and consistent coverage for certain major families. Otherwise coverage is more patchy and likely to be less consistent as the name records have been assembled from regional lists and/or other sources not fully reviewed by specialist systematists. Coverage is probably least reliable for areas for which regional lists were not available for incorporation, especially for South East Asia, and for genera with names beginning with the letters H–Z (as genera beginning with the letters A–G benefited from earlier compilation effort as part of development of the World Checklist of Selected Plant Families).

Gymnosperms

Gymnosperms — Conifers, Cycads, Ephedras, Gnetum, Ginkgo and Welwitschia (including Subclass Ginkgooidae Engl.; Subclass Cycadidae Pax; Subclass Pinidae Cronquist, Takht. & Zimmerm.; Subclass Gnetidae Pax, following Chase and Reveal, 2009)

Gymnosperm records derive primarily from WCSP and incorporate the 2001 World Checklist of Conifers by A. Farjon. Coverage is thought to be comprehensive.

Pteridophytes

Pteridophytes — ferns, horsetails and club mosses (including Subclass Equisetidae Warm.; Subclass Marattiidae Klinge; Subclass Ophioglossidae Klinge; Subclass Polypodiidae Cronquist, Takht. & Zimmerm.; Subclass Psilotidae Reveal; Subclass Lycopodiidae Beketov)

No peer-reviewed global list of any family of ferns or other Pteridophytes has been incorporated. The data presented are compiled from regional and nomenclatural sources not reviewed by experts and are therefore likely to be less comprehensive and consistent than those for Angiosperms and Gymnosperms.

Bryophytes

Bryophytes — mosses, liverworts and hornworts (including Subclass Anthocerotidae Engl.; Subclass Bryidae Engl.; Subclass Marchantiidae Engl.)

Nomenclatural coverage

The Plant List aims to provide all the scientific names for species for these plant groups. A breakdown of the numbers of plants and names included in each plant group is provided (see Statistics). Coverage of infraspecific taxa (subspecies, varieties, forms etc.) is not comprehensive; they are included primarily where they are synonyms or accepted names for species names.

Coverage and data quality are primarily influenced by the source data sets used to build The Plant List. We are aware of additional data sets which, had they been included, would have enriched and improved the final product. We hope to include such data sets in later releases.

The Status of name records

Each name record included within The Plant List is assigned one of the statuses listed below. The Status of each name is derived primarily from the data source from which that name record comes (see Derivation of Name Status). The decision, for example, that one name is the accepted name of a given plant is based upon a taxonomic opinion recorded within the cited data source. Such decisions were automated using a rules-based approach which, where necessary, selected from among alternative taxonomic opinions expressed, within or between different data sources. For an explanation of how these decisions were reached see How The Plant List was Created.

Accepted Name

This is the name which should be used to refer to the species (or to a subspecies, variety or forma).

For each name with Status of ‘Accepted’ The Plant List aims to provide:

  • the name currently accepted as the one which should be used in preference to refer to this species (or subspecies, variety or forma);
  • the author(s) credited with publishing that name;
  • the place and date of original publication of the name where this was supplied;
  • a reference to the source database supplying this name record that recorded the opinion that it is an accepted name (with, where possible, a link to that record in the source database);
  • other names (synonyms) considered to refer to that species;
  • the IPNI identifier (linking the name record to the International Plant Names Index, a bibliographic resource which will provide full original publication details for this name);
  • an assessment of the Confidence that The Plant List attaches to the name being accepted. (This is an indication of the confidence that the Status of the name is correct).

Synonym

A Synonym is an alternative name which has been used to refer to a species (or to a subspecies, variety or forma) but which The Plant List does not consider to be the currently Accepted name. The decision to assign a Status of Synonym to a name record is based upon a taxonomic opinion recorded in the cited data source (selected using an automated rules-based approach; see How The Plant List was Created).

Synonymy can be derived directly from the source data (showing identical data as the source data) or can be derived indirectly using the automated decision rules (e.g. if source 1 says that A is a synonym of B and source 2 says that B is a synonym of C, then The Plant List will show A to be a synonym of C).
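The indirect derivation described above can be illustrated with a minimal Python sketch. It is not the procedure actually used to build The Plant List: the function name, the dictionary representation and the handling of circular chains (returning None) are all assumptions made for illustration.

```python
def resolve_accepted(name, synonym_of):
    """Follow 'is a synonym of' links until a name that is not itself a synonym.

    synonym_of maps a name to the name it is recorded as a synonym of.
    Returns None for a circular chain (an assumed convention, not from the source).
    """
    seen = {name}
    current = name
    while current in synonym_of:
        current = synonym_of[current]
        if current in seen:
            return None  # circular chain: no accepted name can be derived
        seen.add(current)
    return current


# Placeholder names taken from the example in the text.
source_1 = {"A": "B"}  # source 1: A is a synonym of B
source_2 = {"B": "C"}  # source 2: B is a synonym of C
merged = {**source_1, **source_2}

# Every synonym now points at the end of its chain.
resolved = {name: resolve_accepted(name, merged) for name in merged}
```

Here both A and B resolve to C, even though neither source on its own records A as a synonym of C.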

For each name with Status of Synonym The Plant List aims to provide:

  • the name;
  • the author(s) credited with publishing that name;
  • the place and date of original publication of the name where this was supplied;
  • a link to its Accepted name;
  • a reference to the source database supplying this name record and expressing the opinion that it is a Synonym (with, where possible, a link to that record in the source database);
  • the IPNI identifier (linking the name record to the International Plant Names Index, a bibliographic resource which will provide full original publication details for this name);
  • an assessment of the Confidence that The Plant List attaches to the Status of the name being Synonym.

Unresolved Name

Unresolved names are those to which it is not yet possible to assign a status of either ‘Accepted’ or ‘Synonym’. For an explanation of how names were assigned a status please refer to How The Plant List was Created. Unresolved names fall into two sub-classes:

Unassessed names

for which there is no evidence within any of the contributing data sources that the status of this name had been evaluated by the data owners. None had recorded that it was either ‘accepted’ or a ‘synonym’. None had recorded that they had attempted such an evaluation. Since, by definition, a name is accepted by the publishing author at the time of publication, it could be argued that all names are putatively Accepted until such time as they are demonstrated to be Synonyms.

Unplaced names

for which the data source supplying that record indicated positively that the data owners had sought to resolve its status and not been able to come to a conclusion so as to place it either in synonymy or as the accepted name of a new species. This is often the case if the name has insufficient description and no herbarium specimens are known.

Among Unresolved names, Unassessed names are much more numerous than Unplaced names.

It is also important to note that in a small number of cases the status ‘Unresolved’ was assigned to a name record during creation of The Plant List despite a taxonomic opinion having been recorded in the contributing data source. This occurs where to have followed this opinion would have conflicted with opinions recorded elsewhere in other data sources. To follow both would have resulted in inconsistencies within the working list of plants. In such cases:

  • the status of the record on this website is marked with an ‘*’ to indicate that it derives from the procedures used to build The Plant List and
  • the original status of the name (as recorded in the source database) is indicated on the details page for that name.

For each name with Status of ‘Unresolved’ The Plant List aims to provide:

  • the name;
  • the author(s) credited with publishing that name;
  • the place and date of original publication of the name where this was supplied;
  • a reference crediting the source database providing the name (with, where possible, a link to that record in the source database);
  • the IPNI identifier (linking the name record to the International Plant Names Index which will provide full original publication details for this name);
  • an indication of Confidence (Unresolved names are generally flagged as ‘Low Confidence’ entries).

Misapplied Names

Some data sets which contributed to The Plant List record not only how plant names should be used but also where in the published literature a given name may previously have been used inappropriately (to refer erroneously to another species). Recording such misapplication of names helps users to avoid pitfalls when interpreting the literature. The decision that a record establishes the misuse of a name is derived from the cited data source (see How The Plant List was Created).

For each reported misapplication of a plant name we aim to provide:

  • the name;
  • the author(s) that published that name and wherever possible an indication of where or by whom this was misused (e.g. ‘sensu Smith’ may appear after the publishing author);
  • a link to the Accepted name of the species to which this name has been previously and erroneously applied;
  • a reference crediting the source database recording this misuse of the name; (with a link to that record in the source database and hence the publication details of where this name was misapplied);
  • an assessment of the Confidence that The Plant List attaches to this name having been erroneously applied to the other species.

Annotation of names

Sources which contributed name records to The Plant List included, on relatively few occasions, additional information about individual names beyond their status as Accepted or Synonym. Where possible this information is retained within The Plant List and made visible to users as annotations attached to the relevant name record.

Invalid and Illegitimate Names

Some of the names in The Plant List were recorded by the contributing data sets to be either invalidly or illegitimately published according to the rules of the International Code of Botanical Nomenclature.

Spelling variants (or Orthographic variants)

Some data sources include names which are recorded as ‘Orthographic variants’ (or spelling variants) of another name. These misspelt names may not have been validly published and yet are nevertheless used in the literature and therefore included in The Plant List to guide those that find them.

Confidence Levels

For each name record The Plant List offers an indication of the confidence that the Status of the name record is correct. Our confidence assessments are based primarily on the nature and taxonomic integrity of the source data.

High Confidence level

is applied to the Status of name records derived from taxonomic datasets which treat the whole of the taxonomic group in question on a global basis and have been peer reviewed (e.g. ILDIS, WCSP, see collaborators).

Medium Confidence level

is applied to the Status of name records derived from:

  • Either national or regional databases via a rules-based automated process, reflecting the challenges inherent in resolving taxonomic differences between different name data sets for the same species for different geographic areas. Regional datasets used as sources for The Plant List are primarily those stored within Tropicos (see collaborators for details).
  • Or taxonomic datasets which treat the whole of the taxonomic group in question on a global basis but which have not yet undergone peer review (e.g. GCC and WCSP (in review) see Collaborators).

Low Confidence level

is applied to the Status of:

  • name records from any of the contributing data sets which were recorded as unresolved in those data sets;
  • name records whose status has been inferred from (sometimes conflicting) information from more than one source database;
  • name records derived from nomenclatural resources such as IPNI which do not contain opinions about the status of the name and which were assigned a status of Unresolved in The Plant List.
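As a rough illustration, the three confidence levels above can be expressed as a small Python function. This is a sketch under stated assumptions, not the code used for The Plant List: the NameRecord fields and the source_kind labels are hypothetical, and every Unresolved record is treated as Low Confidence for simplicity.

```python
from dataclasses import dataclass


@dataclass
class NameRecord:
    name: str
    status: str                    # "Accepted", "Synonym" or "Unresolved"
    source_kind: str               # hypothetical labels, see confidence() below
    status_inferred: bool = False  # True if status was inferred from several sources


def confidence(record):
    """Return 'High', 'Medium' or 'Low' following the descriptions above."""
    # Low: unresolved names, inferred statuses, and nomenclator-only records.
    if (record.status == "Unresolved"
            or record.status_inferred
            or record.source_kind == "nomenclator"):
        return "Low"
    # High: global, peer-reviewed taxonomic data sets.
    if record.source_kind == "global_reviewed":
        return "High"
    # Medium: global data sets awaiting review, or national/regional data sets.
    if record.source_kind in ("global_unreviewed", "regional"):
        return "Medium"
    raise ValueError(f"unknown source kind: {record.source_kind!r}")


# Placeholder names, not real plant names.
reviewed = NameRecord("Aus bus", "Accepted", "global_reviewed")
regional = NameRecord("Aus cus", "Synonym", "regional")
inferred = NameRecord("Aus dus", "Synonym", "global_reviewed", status_inferred=True)
```

The Low Confidence conditions are checked first so that an inferred status is rated Low even when the record comes from a peer-reviewed global data set.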

Contributing data sets

The data resources used to build Version 1 of The Plant List are listed here and we are grateful to the many collaborators listed below that made their data available.

We welcome offers of additional data sets for inclusion in the future editions of The Plant List (see Contributions).

Global species resources

  • World Checklist of Selected Plant Families

    This large database of global monographic treatments was supplied to The Plant List as two separate data sets which were treated slightly differently:

    1. WCSP

      Peer reviewed treatments are available online for 151 Seed Plant families (view published families). WCSP gives information on the accepted scientific names and synonyms of selected plant families. It includes more than 320,000 names and allows the user to search for all the scientific names of a particular plant, or the areas of the world in which it grows (distribution). The data set draws on 16 years of collaboration with 132 specialists from 25 countries who have contributed data or acted as reviewers.

    2. WCSP (in review)

      In addition to the published family checklists the World Checklist database contains data for many other families which have either been completed and await review by specialists or are still being compiled. The Plant List also incorporates these unpublished data which include more than 290,000 additional names.

  • GrassBase – The Online World Grass Flora

    The nomenclatural component of this database currently holds over 60,000 names and lists names for any given genus, geographical region or genus within a geographical region; and links to the GrassBase description for any species. The nomenclatural data from GrassBase is made available through the WCSP system.

  • The Global Compositae Checklist

    is an integrated database of nomenclatural and taxonomic information for the second largest vascular plant family in the world. This checklist is published by the International Compositae Alliance and compiled from many contributed datasets. The database will be continually updated. Additional information such as references, distribution and infraspecific taxa is available on the website. All species are marked as ‘provisionally accepted names’ in the Beta version. The data set has not yet been fully peer-reviewed and may contain some errors. More than 100,000 records derived from The Global Compositae Checklist are included in The Plant List.

  • The International Legume Database and Information Service

    is a long-term programme of co-operation among legume specialists worldwide to create a biodiversity database for the Leguminosae (Fabaceae) family. The database provides a taxonomic checklist plus basic factual data on distribution, common names, life-forms, uses, literature references to descriptions, illustrations and maps. More than 40,000 records derived from ILDIS are included in The Plant List.

  • The iPlants project

    developed and tested the processes and procedures that would be required during production of an authoritative, global online list of plant names. The project was a collaboration between The Royal Botanic Gardens, Kew, the Missouri Botanical Garden and the New York Botanical Garden and was funded from April 2004 to May 2006 by the Gordon and Betty Moore Foundation. Checklists for the following families were made available for The Plant List: Bignoniaceae, Iridaceae, Lecythidaceae, Melanophyllaceae, Physenaceae, Sarcolaenaceae, Schlegeliaceae and Sphaerosepalaceae. More than 11,000 records derived from iPlants are included in The Plant List.

  • The International Organization for Plant Information

    aims to provide a series of computerised databases summarizing taxonomic, biological, and other information on plants of the world. IOPI’s mission is to develop an efficient and effective means of providing basic plant information to users, and guide them toward sources of authoritative data. Their checklist currently holds over 200,000 names from which The Plant List includes records for Juncaceae compiled by J. Kirschner (Institute of Botany, Pruhonice) (over 1,000 name records).

  • Missouri Botanical Garden

    The Bryophyte information was primarily gathered from A Checklist of Mosses and ongoing projects dealing with mosses and liverworts to create World Checklists for these groups. Some liverwort names were not yet available from data sources but are expected to be added in future versions.

Floristic Datasets

  • Missouri Botanical Garden

    Tropicos, the botanical information system at the Missouri Botanical Garden, contains information on over one million plant names and 3.9 million herbarium specimens. The system was developed through the actions of a wide variety of floristic, nomenclatural, and bibliographic projects both at the Garden and in collaboration with other institutions. All of this information is available on the Internet through the Garden’s web site.

    Tropicos provides access to the accumulated data on vascular plants and bryophytes as authority files for the development of floras and checklists that provide a synthesis of local and regional vegetation. Included within each of these syntheses are indications of acceptance, synonymy and misapplication of names within a floristic region. This information was used to evaluate plant names from these regions for The Plant List.

    The project data held by Tropicos and used in the development of The Plant List includes:

    Information was also gleaned from recently published literature where acceptance or synonymy had been recorded in Tropicos.

    More than 240,000 records derived from Tropicos were included in The Plant List.

  • Madagascan endemics

    The iPlants project also provided a checklist for Madagascan endemics.

Plant nomenclatural resources

  • The International Plant Names Index

    is a database of the names and associated basic bibliographical details of seed plants, ferns and fern allies. Its goal is to eliminate the need for repeated reference to primary sources for basic bibliographic information about plant names. The data are freely available and are gradually being standardised and checked. IPNI will be a dynamic resource, depending on direct contributions by all members of the botanical community. IPNI is the product of a collaboration between the Royal Botanic Gardens, Kew, the Harvard University Herbaria, and the Australian National Herbarium.

  • Uncompiled name data records derived from Kew’s checklist databases.

  • Uncompiled name data records from Missouri’s Tropicos database.

How The Plant List was Created

Development of The Plant List has been a collaborative venture coordinated at the Royal Botanic Gardens, Kew and Missouri Botanical Garden and relying on the generosity of many collaborators who manage significant taxonomic data resources. The purpose was to merge into a single consistent database the best of the nomenclatural information available in these diverse data resources through a defined and automated process. In summary, development of The Plant List involved merging many taxonomic data sources, taking the accepted name and synonymy relationships from those that were global checklist datasets, then augmenting these with additional names and synonymy relationships from regional and national floristic datasets following a set of decision rules. Species names not accounted for in any of the previously incorporated data sets were then added from nomenclatural resources, with the aim of making the list comprehensive for all plant names. Finally, a further set of rules was applied to the final data set to resolve inconsistencies and conflicting or overlapping statuses, and to correct logical data errors.

The Sequence for Merging Data Sets

The starting point was the set of global peer reviewed family checklists published within the World Checklist of Selected Plant Families (WCSP). Families available through the WCSP from other sources including GrassBase, iPlants (Bignoniaceae, Iridaceae, Lecythidaceae, Melanophyllaceae, Physenaceae, Sarcolaenaceae, Schlegeliaceae and Sphaerosepalaceae) and IOPI (Juncaceae) were also included. To these were added additional global checklists from collaborating partners: The Global Compositae Checklist from the International Compositae Alliance and The International Legume Database and Information Service. Also incorporated were all of the compiled WCSP data records for families other than those which have been published (i.e. are in the process of being compiled or are under peer review): WCSP (in review).

The second category of information sources was various national and regional checklists. Missouri Botanical Garden’s Tropicos system primarily provided data from nearly ten digital flora projects. Each of these national or regional floras or checklists was created at a different time by a different team of botanists and considers only plant specimens found within that area’s borders. Thus these floras/checklists contain different subsets of plants (and plant names) and record conflicting opinions as to which are the accepted names for particular plants or what are their synonyms. In building The Plant List, therefore, a significant task was to automate procedures to trawl each of these different data sets to locate new information that they might contain about names and synonymy, then to detect and resolve conflicting opinions among these data sets and to add this additional information to the merged data set. A set of decision rules was employed to differentiate between and select from among the diverse opinions expressed within these national and regional data sets.

Finally, there were many scientific plant names (recorded in IPNI or included in Tropicos or WCSP as uncompiled records) that had not been included in any of the data sets consulted up to that point. The combined set of global and regional data was therefore compared with the IPNI database to detect names missing from our merged data set so that they could be added to our final product. Names derived from IPNI (and other nomenclatural data sets consulted) were included as ‘Unresolved’, since these data sets did not record whether a name was the Accepted name for a new plant (not yet in the merged data set) or a Synonym of a plant already in the merged list.
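The gap-filling step described above amounts to adding any name known to a nomenclator but absent from the merged data, with a status of ‘Unresolved’. The Python sketch below shows the idea only; the function name and data layout are hypothetical, the names are placeholders, and exact string comparison stands in for whatever name matching was actually used.

```python
def add_missing_names(merged_status, nomenclator_names):
    """Return a copy of merged_status extended with nomenclator-only names.

    merged_status maps a name to its status in the merged global and regional data.
    Names found only in the nomenclator are added as 'Unresolved'.
    """
    result = dict(merged_status)  # leave the input untouched
    for name in nomenclator_names:
        # Exact string comparison is an assumption made for brevity.
        result.setdefault(name, "Unresolved")
    return result


# Placeholder names, not real plant names.
merged_status = {"Aus bus": "Accepted", "Aus cus": "Synonym"}
nomenclator_names = ["Aus bus", "Aus dus"]

extended = add_missing_names(merged_status, nomenclator_names)
```

Names already present keep the status recorded in the merged data; only the previously unseen name is added as Unresolved.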

A significant component of this and later phases of the creation of The Plant List involved the matching of names between different data sets to identify whether a name was unique to one data set or included in multiple data sets. A variety of algorithms were employed to perform name matching at different stages depending upon the requirements at that stage in the process.
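The matching algorithms themselves are not described here, so the following Python sketch shows only the general idea of finding which data sets contain a given name. The normalisation used (collapsing whitespace and ignoring case) is an assumption, as are the function names and the placeholder data.

```python
from collections import defaultdict


def normalise(name):
    """Collapse runs of whitespace and ignore case (an assumed, simplistic rule)."""
    return " ".join(name.split()).casefold()


def index_names(datasets):
    """Map each normalised name to the set of data set labels that contain it."""
    index = defaultdict(set)
    for label, names in datasets.items():
        for name in names:
            index[normalise(name)].add(label)
    return dict(index)


# Placeholder data set labels and names, not real plant names.
datasets = {
    "global": ["Aus bus", "Aus cus"],
    "regional": ["aus  bus ", "Dus eus"],
}
index = index_names(datasets)

# Names present in exactly one data set.
unique_names = sorted(n for n, labels in index.items() if len(labels) == 1)
```

A name found under more than one label is a candidate for conflict detection, while a name found under a single label contributes new information to the merged list.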

Derivation of Name Status

The procedures used to build The Plant List were designed to follow the taxonomic opinions recorded within the contributing data sets. Where necessary these procedures selected from among alternative and conflicting opinions recorded between data sets so as to achieve a coherent taxonomic consensus.

Consistent application of the decision rules allowed resolution of most instances of conflicts between data sources so that most species names can be clearly established as either an accepted name or as a synonym with reference to the data source in which that status is recorded. It is important to note that the set of synonyms which point to a given accepted name in The Plant List may have originated from more than one data source i.e. some synonyms for a given species may derive from a data set other than that from which the accepted name record derived.

Approximately 98% of all Status values within The Plant List derive directly from the data source which supplied that name record.

The Status of the remaining 2% of name records in The Plant List has been modified from that stored in the source data set as a result of the conflict resolution processes. Such changes were made only where necessary to avoid illogical conflicts detected within the data sets supplied or within the merged data set (i.e. were made to improve the consistency of The Plant List). Where such changes were made, these were primarily to downgrade name records recorded as having a status of ‘Accepted’ in the source database to having a status of ‘Unresolved’ in The Plant List.

Any name records whose status was modified during the creation of The Plant List are labelled (using an asterisk) and the original status in the data source is also indicated. The Confidence level of any record modified by these procedures was set to ‘Low Confidence’.
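
The labelling described above can be pictured with a small record structure. This is a sketch under stated assumptions, not the data model behind The Plant List: the `NameRecord` class, its field names, the 'High Confidence' label and the display format are hypothetical, and "Aus bus" is a placeholder name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameRecord:
    name: str
    status: str                           # 'Accepted', 'Synonym' or 'Unresolved'
    confidence: str                       # e.g. 'Low Confidence'
    source_status: Optional[str] = None   # set only if the status was modified


def modify_status(record, new_status):
    """Return a record with the new status, keeping the original source status."""
    if new_status == record.status:
        return record
    return NameRecord(
        name=record.name,
        status=new_status,
        confidence="Low Confidence",      # modified records are downgraded
        source_status=record.source_status or record.status,
    )


def display_status(record):
    """Mark modified records with an asterisk and show the source status."""
    if record.source_status is None:
        return record.status
    return f"{record.status}* (source: {record.source_status})"


# Placeholder record invented for the example.
original = NameRecord("Aus bus", "Accepted", "High Confidence")
modified = modify_status(original, "Unresolved")
```

Keeping the source status on the record means a reader can always see what the contributing data set said, even after conflict resolution has changed the status shown.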

Decision Rules to arbitrate between Conflicts of Opinion

A set of decision rules was employed to differentiate between and select from among the diverse opinions expressed within the data sources consulted. These rules were developed by the team at Kew and Missouri in an attempt to mimic the sort of decision-making rationale a botanist might use when they encounter conflict between taxonomic treatments in the literature but are not in a position to resolve the question by examining the original material. For example:

  • monographic treatments which consider the group in question in its entirety throughout its distribution are given priority over geographically defined treatments which can result in a single species being treated under different names in different parts of its range;
  • synonym relationships reported in more recent treatments are given priority over those published earlier;
  • publication dates are used to assist in detecting likely illegitimate names;
  • author details are used to detect likely orthographic variants (alternative spellings of the same name);
  • the decision rules are informed by the principles embedded in the International Code of Botanical Nomenclature.
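
The first two rules in the list above can be sketched as a simple ordering over conflicting opinions. This is an illustration only: the `Opinion` class, its fields, the 'global'/'regional' labels and the order in which the two rules are applied are assumptions, and the actual decision rules were more extensive than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    name: str            # the name whose status is in question
    status: str          # 'Accepted' or 'Synonym'
    accepted_name: str   # the name this opinion treats as accepted
    scope: str           # 'global' (monographic) or 'regional' (hypothetical labels)
    year: int            # year of the treatment


def preferred(opinions):
    """Prefer global treatments over regional ones, then the most recent."""
    return max(opinions, key=lambda o: (o.scope == "global", o.year))


# Placeholder opinions invented for the example.
opinions = [
    Opinion("Aus bus", "Accepted", "Aus bus", "regional", 2009),
    Opinion("Aus bus", "Synonym", "Aus cus", "global", 1998),
    Opinion("Aus bus", "Synonym", "Aus dus", "global", 2005),
]
chosen = preferred(opinions)
```

Here the regional treatment is the most recent of the three, but it is outranked by both global treatments, and the later of those two is selected.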

Data analysis of logical inconsistencies and data integrity issues

The data set created by merging records from the various data sources as described above was found initially to be inconsistent and logically incongruous for a variety of reasons.

Each of the taxonomic data sets incorporated into The Plant List is itself still being developed and improved upon by its owners and editors. None therefore can be considered to be complete or entirely up to date. Nor would their owners claim that these data sets were free of inconsistencies, gaps or data errors. Furthermore, these databases use terminology in different ways, which necessitated some level of standardisation. Some contained fossil plant names or names of taxonomic ranks that are not intended to be included in The Plant List and yet which, nevertheless, might link to names in the merged data set. Careful filtering of the record set was needed.

Inevitably, the process of bringing many different data sets together added a further layer of complexity. For example, it is not straightforward to automate reliable recognition of a particular Latin binomial within different data sets: plant name authors may have been cited or abbreviated differently, subtle differences in spelling and punctuation occur between the data sources, and not all of them included the place of publication of a name to help resolve suspected matches. This added a degree of uncertainty even before other complexities, such as those surrounding homonyms and misapplied names, were dealt with. As a result, in certain circumstances the procedures created a merged data set in which a few names were used inconsistently based upon records derived from different sources.

The goal of The Plant List project is to create a single internally coherent view rather than a set of alternative views. The final stage of development of The Plant List therefore involved rigorous logical analysis of the data set. Steps were taken, for example, to identify likely duplicates used in different senses, to detect where a number of Synonyms link one to another but lack any link to an Accepted name, where illegitimate names are assigned Accepted status or where a subspecies included in the dataset occurs within a species which itself does not occur.
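
Two of the checks mentioned above can be sketched as follows: Synonyms that never lead to an Accepted name, and subspecies whose parent species is absent. This is a minimal sketch and not the analysis actually performed: the record layout, the function names and the placeholder names are assumptions, and the parent species is recognised here only by the simple pattern "Genus species subsp. epithet".

```python
def find_unanchored_synonyms(records):
    """Return Synonyms whose chain of links never reaches an Accepted name.

    records maps name -> (status, target), where target is the name a
    Synonym points to and is None for other statuses.
    """
    problems = []
    for name, (status, target) in records.items():
        if status != "Synonym":
            continue
        seen = {name}
        current = target
        # Follow synonym-to-synonym links, stopping on a cycle or dead end.
        while (current in records
               and records[current][0] == "Synonym"
               and current not in seen):
            seen.add(current)
            current = records[current][1]
        if current not in records or records[current][0] != "Accepted":
            problems.append(name)
    return sorted(problems)


def find_orphan_subspecies(names):
    """Return subspecies names whose species name is not in the list."""
    present = set(names)
    orphans = []
    for name in present:
        parts = name.split()
        if len(parts) >= 4 and parts[2] == "subsp.":
            if " ".join(parts[:2]) not in present:
                orphans.append(name)
    return sorted(orphans)


# Placeholder records invented for the example.
records = {
    "Aus bus": ("Accepted", None),
    "Aus cus": ("Synonym", "Aus bus"),
    "Aus dus": ("Synonym", "Aus cus"),   # chain ends at an Accepted name
    "Aus eus": ("Synonym", "Aus fus"),   # the next two point at each other
    "Aus fus": ("Synonym", "Aus eus"),
    "Aus gus": ("Synonym", "Aus hus"),   # target is missing altogether
}
unanchored = find_unanchored_synonyms(records)
orphans = find_orphan_subspecies(
    ["Aus bus", "Aus bus subsp. cus", "Aus dus subsp. eus"]
)
```

Checks of this kind only flag the problem records. Deciding what to do with them was a matter for the decision rules.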

Resolution of logical inconsistencies and data integrity issues

For each type of data inconsistency detected, solutions were derived based upon the concepts and principles outlined above and used in the previous stages. Additional decision rules were created and new automated steps introduced to perform the following actions on the merged data set:

  1. Standardisation of terminology
  2. Standardisation, Selection and Filtering of name records
  3. Deduplication of names
  4. Resolving referential integrity regarding linkages among synonyms
  5. Resolving referential integrity regarding taxonomic relationships
  6. Standardisation of the names of Families and Major Groups so as to create the taxonomic hierarchy necessary to support browsing of The Plant List

Online Publication of The Plant List

Target 1 of the GSPC was to achieve a "widely accessible" working list of all known plant species. To accomplish that aspect of Target 1, this website was created to enable world-wide access to the working list. The final merged and resolved data set of all plant species is accessible through the search and browse features offered here.

Next Steps

As a result of the data analysis and conflict resolution steps described above, it is now intended to provide detailed feedback to each of the collaborators that contributed datasets, providing them with enriched data records, information on inconsistencies detected and comparisons with other relevant data sets. Details of the data processing entailed in creating The Plant List are to be published for broader discussion. Interest in the process and suggestions for refinements to the decision rules are welcome.

The project team

The Plant List owes its origins to a three-day workshop at Missouri Botanical Garden in May 2008. Bob Allkin, Eimear Nic Lughadha and Alan Paton (Kew) joined Bob Magill and Chuck Miller (MO) to plan how existing resources could best be combined to produce a best efforts working list to meet the 2010 deadline. The principles underlying our approach were agreed at that time, along with many of the decision rules and initial drafts of workflows for the data processing required. Translating that initial plan into action and refining the process to improve the product involved many more people over many months, with datasets, e-mails and occasionally people moving back and forward between Kew and St Louis.

Contributors working at Kew

  • Bob Allkin – Project Manager
  • Abigail Barker – Applications Development Manager
  • Matthew Blissett – Lead Developer Web Application
  • Charlotte Couch – Support Families and Genera Index
  • Paramjit Dhaliwal – IT Operations team
  • Jeff Eden – Graphic Designer
  • Rafaël Govaerts – Editor of the World Checklist of Selected Plant Families
  • Graham Hawkes – Developer responsible for The Plant List data and procedures
  • Chris Hopkins – Developer Web Application
  • Eimear Nic Lughadha – Senior Responsible Owner
  • Nicky Nicolson – Developer responsible for IPNI
  • Alan Paton – Assistant Keeper, Herbarium
  • John Stone – Graphic Designer
  • Julius Welby – Data administration
  • Ian Wright – IT Operations team leader

Contributors working at Missouri

  • Bob Magill – Senior Vice President of Science & Conservation
  • Chuck Miller – Vice President of Information Technology
  • Chris Freeland – Director of Center for Biodiversity Informatics
  • Jay Paige – Application and Database Developer
  • Heather Stimmel – Application and Database Developer
  • Craig Geil – Application and Database Developer

Enhancing The Plant List

In the future we envisage producing subsequent versions of The Plant List at regular intervals. Subsequent versions could:

  1. merge improved and extended versions of the data sets used to create Version 1. As the custodians of the source datasets enhance their data sets through their own planned additions and corrections, their improvements will feed into subsequent versions of The Plant List. We are committed to supplying feedback to the owners of each data set as this arises from the use and creation of The Plant List.
  2. include additional data sets which were unavailable in Version 1. If you are interested in making a future contribution, please contact contributors@theplantlist.org.
  3. reflect enhancements in the procedures that were used to create The Plant List, e.g. for locating duplicate name records, for resolving inconsistencies, and for detecting conflicting opinions expressed within alternative data sets and then selecting from among those opinions (see How The Plant List was created).

Our immediate priorities following the December 2010 launch are to:

  • provide feedback to the collaborators who contributed datasets
  • complete documentation of the decision rules for publication
  • plan future versions of The Plant List

Likely priorities for future work include:

  • increasing the number of data contributors — especially for South East Asia which is poorly represented in Version 1
  • adding links to country level geography
  • providing unique identifiers.

Relationships to other resources

Of the data resources that were used to create The Plant List, many of the previously published global monographic datasets are also available through the Catalogue of Life, which provides peer reviewed information for many plant families.

The Plant List goes beyond the scope of Catalogue of Life by also including global treatments which are in peer review or have not yet been published, and by seeking to complete the list of all species names by filling the gaps using further digital resources including regional and national floras and nomenclatural databases. The Plant List thus aims to be comprehensive in coverage at species level for all names of mosses and liverworts, flowering plants, conifers, cycads and their allies and the ferns and their allies. It is nevertheless a work in progress. The Confidence in the Status assigned to each name record is indicated.