For Data Managers

Background

Bionomia scribes attribute specimen records to the collectors and determiners represented in your dataset(s) by linking the natural history specimen records you publish to GBIF to those people's Wikidata Q numbers or ORCID IDs. People with ORCID IDs can also claim the specimen records they themselves collected or identified. Wikidata and ORCID identifiers come with associated resources and services that are unquestionably useful to collections, with uses ranging from disambiguating people's names to gauging the impact your collection has on the academic community.

Engaging With Your Community

Bionomia scribes are a welcoming, international group of enthusiasts who are driven to help you attribute specimen records to the collectors and determiners represented in your dataset(s). They work tirelessly to enhance the Wikidata entries of deceased natural historians by adding links and attributes such as birth and death dates. They are also advocates of ORCID and can help you campaign for its adoption at your institution.

Data Round Trip

Incorporating Enhancements

Every few weeks, Bionomia refreshes a subset of the Darwin Core data you publish to GBIF. See how it works for more details.

Frictionless Data

Search for your dataset(s) and find the link to a Frictionless Data package. These zipped packages of UTF-8 encoded relational files are similar to the Darwin Core Archives you produce for GBIF, but they represent many-to-many relationships more efficiently. A broad range of open-source software libraries in many programming languages can read, validate, and process Frictionless Data. You can also extract the package and import the UTF-8 encoded CSV files into any spreadsheet software, provided the files are not excessively large.

The packages contain a standard datapackage.json metadata file and up to ten zipped CSV files: users.csv.zip, occurrences.csv.zip, problem_collector_dates.csv.zip, problem_determiner_dates.csv.zip, citations.csv.zip, articles.csv.zip, and attributions.csv.zip, plus three optional files: missing_attributions.csv.zip, unresolved_users.csv.zip, and not_them_assertions.csv.zip.
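Because each table is itself a zip inside the package, a short script can load everything without extracting files by hand. The sketch below uses only the Python standard library and assumes that the outer zip holds datapackage.json next to the *.csv.zip members, and that each member wraps a single UTF-8 CSV file. Check a real package for its exact layout before relying on it.

```python
import csv
import io
import json
import zipfile


def read_package(package):
    """Return (metadata, tables) from a Frictionless Data package.

    `package` may be a file path or a binary file-like object. Assumes the
    outer zip holds datapackage.json next to *.csv.zip members, and that each
    member wraps a single UTF-8 CSV file.
    """
    metadata, tables = None, {}
    with zipfile.ZipFile(package) as outer:
        for member in outer.namelist():
            base = member.rsplit("/", 1)[-1]  # ignore any folder prefix
            if base == "datapackage.json":
                metadata = json.loads(outer.read(member).decode("utf-8-sig"))
            elif base.endswith(".csv.zip"):
                # The member is itself a zip archive; open it from memory.
                with zipfile.ZipFile(io.BytesIO(outer.read(member))) as inner:
                    csv_name = inner.namelist()[0]
                    text = inner.read(csv_name).decode("utf-8-sig")
                table = base[: -len(".csv.zip")]
                tables[table] = list(csv.DictReader(io.StringIO(text)))
    return metadata, tables
```

For example, `metadata, tables = read_package("my-dataset-package.zip")` (a hypothetical file name) yields `tables["users"]`, `tables["occurrences"]`, and so on as lists of row dictionaries. For very large occurrences files, iterating over the CSV reader instead of building a list keeps memory use down.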

The datapackage.json metadata file contains a "created" timestamp recording when the package was last produced. Packages are typically regenerated every few weeks, but if you would like a more up-to-date version, please create a ticket.
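Before requesting a fresh package, you can check how old your copy is. This sketch assumes "created" is an ISO 8601 / RFC 3339 date-time string, which is how the Data Package specification describes that property, and it treats a timestamp without an offset as UTC. Note that Python versions before 3.11 accept fractional seconds in `fromisoformat` only with 3 or 6 digits.

```python
import json
from datetime import datetime, timezone


def package_age_days(datapackage_json, now=None):
    """Return the days elapsed since the package's "created" timestamp."""
    created_raw = json.loads(datapackage_json)["created"]
    # Accept a trailing "Z" (UTC) as well as explicit offsets like +02:00.
    created = datetime.fromisoformat(created_raw.replace("Z", "+00:00"))
    if created.tzinfo is None:
        # Assumption: a timestamp without an offset is in UTC.
        created = created.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (now - created).total_seconds() / 86400
```

You might compare the result against a threshold of your own choosing to decide whether the package is recent enough or whether a ticket is worthwhile.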

The users.csv.zip file lists the unique people to whom specimen records in your dataset have been attributed or who have claimed such records as their own. It includes their full names, aliases, and ORCID IDs or Wikidata Q numbers, plus birth and death dates for those with Wikidata Q numbers. The occurrences.csv.zip file contains a subset of Darwin Core fields for the specimen records in your dataset that have received attributions. The problem_collector_dates.csv.zip file lists occurrence records whose eventDate is earlier than a collector's birthDate or later than their deathDate. Likewise, the problem_determiner_dates.csv.zip file lists occurrence records whose dateIdentified is earlier than a determiner's birthDate or later than their deathDate. The attributions.csv.zip file is a join table linking users.csv.zip and occurrences.csv.zip; it also records who made each attribution, that person's ORCID ID, and a timestamp for when the attribution was made. Finally, the missing_attributions.csv.zip file contains attributions made in Bionomia that are not present in the dataset's identifiedByID and recordedByID fields.
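Since attributions.csv.zip is a join table, resolving it amounts to two dictionary lookups per row. In the sketch below, the column names `user_id`, `occurrence_id`, `id`, and `gbifID` are hypothetical placeholders; the real names are declared in the schema inside datapackage.json. Results are keyed by user id rather than by name so that two people who share a name are not merged.

```python
from collections import defaultdict


def occurrences_by_user(users, occurrences, attributions):
    """Map each attributed user's id to the occurrence rows linked to them.

    Hypothetical column names (check datapackage.json for the real ones):
    attributions.user_id -> users.id, attributions.occurrence_id -> occurrences.gbifID.
    """
    known_users = {u["id"] for u in users}
    occurrences_by_id = {o["gbifID"]: o for o in occurrences}
    linked = defaultdict(list)
    for row in attributions:
        occurrence = occurrences_by_id.get(row["occurrence_id"])
        # Skip rows whose keys do not resolve instead of raising KeyError.
        if row["user_id"] in known_users and occurrence is not None:
            linked[row["user_id"]].append(occurrence)
    return dict(linked)
```

The inputs are lists of row dictionaries, such as those produced by `csv.DictReader` over each extracted CSV file.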

Assessing Data Quality

On the "Help Others" pages, where specimen records are attributed to collectors and determiners, there are Fix and Visualize tabs. Here, a collector's birth and death dates are cross-referenced against the dates on their specimen records. Countries on maps and date ranges on charts can also be clicked to apply dynamic filters. In time, and as more attributions are made, data quality reports like these on individuals' specimen records may be rolled up into dataset-level reports.
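The cross-reference itself is a simple comparison: a record is suspect if its date falls before the person's birth or after their death. The sketch below is a simplified illustration, not Bionomia's own implementation. It compares only full YYYY-MM-DD values and does not flag partial dates (such as "1901") or date ranges, which a fuller check would need to handle. Pass eventDate for collectors or dateIdentified for determiners.

```python
from datetime import date


def _parse(value):
    """Parse a full ISO 8601 calendar date; return None if absent or partial."""
    try:
        return date.fromisoformat(value) if value else None
    except ValueError:
        return None


def date_outside_lifespan(record_date, birth_date, death_date):
    """Return True if record_date falls before birth or after death.

    Only full YYYY-MM-DD values are compared; partial dates and ranges
    are not flagged in this simplified sketch.
    """
    event = _parse(record_date)
    if event is None:
        return False
    born, died = _parse(birth_date), _parse(death_date)
    return (born is not None and event < born) or (died is not None and event > died)
```

For instance, a specimen collected on 1925-01-01 by someone who died on 1920-12-31 is flagged, while a record dated only "1900" is left for manual review.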

Reconcile

OpenRefine reconciliation endpoint:
https://api.bionomia.net/reconcile
Recommended Use

The endpoint works best when each cell in a person column holds a single name. Other columns, such as the family collected or identified and/or the date collected or identified, may optionally be used to adjust the scores of returned results. Dates of birth and death (when known) are cross-referenced against the date column you use. Try out the Bionomia ID endpoint, among others, on the Reconciliation service test bench.
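Outside OpenRefine, the endpoint can also be queried from a script. This sketch assumes the service follows the standard Reconciliation Service API conventions that OpenRefine uses: a form-encoded `queries` parameter holding a JSON batch, and a response keyed by the same query ids with a `result` list of candidates carrying `id`, `name`, and `score`. It sends one person name per query, in line with the recommended use above. The property identifiers Bionomia accepts for family or date columns are not given here, so any `pid` you pass is an assumption to verify against the service.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://api.bionomia.net/reconcile"


def build_queries(names, properties=None):
    """Build the form body for a batch reconciliation request.

    properties: optional list of (pid, value) pairs applied to every query;
    the pids are hypothetical until checked against the service.
    """
    batch = {}
    for i, name in enumerate(names):
        query = {"query": name}  # one person name per query
        if properties:
            query["properties"] = [{"pid": pid, "v": v} for pid, v in properties]
        batch[f"q{i}"] = query
    return urllib.parse.urlencode({"queries": json.dumps(batch)}).encode("utf-8")


def reconcile(names, properties=None):
    """POST the batch and return {name: highest-scoring candidate or None}."""
    request = urllib.request.Request(ENDPOINT, data=build_queries(names, properties))
    with urllib.request.urlopen(request, timeout=30) as response:
        results = json.load(response)
    best = {}
    for i, name in enumerate(names):
        candidates = results.get(f"q{i}", {}).get("result", [])
        best[name] = max(candidates, key=lambda c: c["score"], default=None)
    return best
```

Calling `reconcile(["A. B. Collector"])` (a placeholder name) returns either `None` or a candidate dictionary whose score you can review before accepting a match, much as you would in OpenRefine.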