Kurator

This is the front page for the Kurator project public wiki.

Kurator is a project to develop a provenance-enabled Workflow Platform and Toolkit to Curate Biodiversity Data.

The University of Illinois at Urbana-Champaign and Harvard University have been awarded grants to develop software tools for scientific data digitization, sharing, integration and use. The considerable challenge of digitizing natural science collections in the U.S. (and globally) necessitates a focus on both digitization efficiencies and the utility of the generated data. This grant will develop a novel, extensible, open source toolkit (Kurator) for automated and semi-automated workflows with diverse curation services to aid biodiversity research and beyond. This project will enhance discovery and understanding while promoting teaching, training and learning. A postdoctoral fellow will be trained in provenance-enhanced workflow technology and contribute to Kurator design and implementation. Principles of data management and curation, with a focus on provenance, will be taught in undergraduate and graduate courses. This project will also enhance infrastructure for research and education through collaborations with iDigBio and the Encyclopedia of Life. Educational modules and outreach activities on data quality and data curation will be developed for undergraduates and high school educators. Several Thematic Collection Networks will provide data via iDigBio for testing to ensure dissemination of high quality data. Critical community authority files that do not have associated web services will be made available to the greater community as Kurator actors and services.

Kurator will consist of a user-friendly web interface for users to configure and launch workflows while maintaining provenance, and a workflow platform for rapid development of new curation services and workflow variants. The latter will also be used to "wrap" valuable domain authority files that are not currently available as services. New "curator-in-the-loop" workflow technology allows us to directly involve experts in semi-automatic curation pipelines, using human interaction actors via FilteredPush, other syndication methods, and discovery environments. Kurator will allow examination of data lineages to facilitate the assessment of credibility, support repeatability in publication, inform legal proceedings where data are regulated, and provide context for feedback to a given resource. Kurator will facilitate digitization efforts through custom processing of raw data obtained from hardcopy specimen labels against existing services, including taxonomic name resolution, georeferencing, and duplicate specimen detection, as well as newly created customizable actors that clean the data against appropriate controlled vocabularies. Where required, semi-automated services can invoke expert review using annotation services and existing discovery environments. Curation pipelines can be integrated with other workflows for analysis of ecological, evolutionary, phenological, genomic and related data, can be shared or repurposed easily, and can be made accessible for publication.

Public discussion list: https://lists.illinois.edu/lists/info/kurator

Products

FP-Akka User Documentation

YesWorkflow Developer wiki

Kurator-Akka Developer tutorial

Publications

  • SPNHC 2016: P. J. Morris, J. Hanken, D. B. Lowery, B. Ludäscher, J.A. Macklin, T.M. McPhillips, R.A. Morris, A. M. Saraiva, T. Song, A.K. Veiga, J. Wieczorek. (2016) Error? What Error? Expectation management in reporting data quality issues to data curators. 2016 SPNHC Conference, 31st Annual Meeting. Society for the Preservation of Natural History Collections. pp. 142-144. doi:10.3372/SPNHC2016
  • SPNHC 2016: T.M. McPhillips, Q. Zhang, B. Ludäscher, J. Hanken, D. B. Lowery, J. A. Macklin, P. J. Morris, R. A. Morris, L. Russell, J. Wieczorek. (2016) DemoCamp: Using YesWorkflow to explore the results of cleaning a dataset using a script. Green Museum - How to practice what we preach? 2016 SPNHC Conference, 31st Annual Meeting. Society for the Preservation of Natural History Collections. pp. 138-139. doi:10.3372/SPNHC2016
  • SPNHC 2016: P.J. Morris, J. Hanken, D.B. Lowery, B. Ludäscher, J.A. Macklin, T.M. McPhillips, R. A. Morris, J. Wieczorek, Q. Zhang. (2016) DemoCamp: Kurator: Extensible and accessible tools for quality assessment of biodiversity data. 2016 SPNHC Conference, 31st Annual Meeting. Society for the Preservation of Natural History Collections. pp. 141-142. doi:10.3372/SPNHC2016
  • TDWG 2015: P.J. Morris, L. Dou, J. Hanken, S. Koehler, D.B. Lowery, B. Ludäscher, J. A. Macklin, T. M. McPhillips, R. A. Morris, T. Song. (2015) FP-Akka: EXPERIENCES AND CHALLENGES IN INTEGRATING SCIENTIFIC WORKFLOWS FOR DATA QUALITY CONTROL WITH BIODIVERSITY SERVICES. #819.
  • TDWG 2015: T.M. McPhillips, D.B. Lowery, J. Hanken, B. Ludäscher, J. A. Macklin, P.J. Morris, R. A. Morris, T. Song, J. Wieczorek. (2015) Data cleaning with the Kurator Toolkit: Bridging the gap between conventional scripting and high-performance workflow automation. Biodiversity Information Standards TDWG 2015 Annual Conference. #822.
  • SPNHC 2015 DemoCamp: B. Ludäscher, J. Hanken, D. Lowery, J.A. Macklin, T.M. McPhillips, P.J. Morris, R.A. Morris, T. Song. (2015) YESWORKFLOW: How to render a data curation script as a workflow in under 10 minutes. Society for the Preservation of Natural History Collections, 30th Annual Meeting. p. 49.
  • SPNHC 2015 DemoCamp: P.J. Morris, B. Ludäscher, S. Köhler, J. Hanken, D. Lowery, J.A. Macklin, T.M. McPhillips, R.A. Morris, T. Song. (2015) A scientific workflow tool for targeted data quality improvement of natural science collections data. Society for the Preservation of Natural History Collections, 30th Annual Meeting. p. 55.
  • SPNHC 2015: B. Ludäscher, T.M. McPhillips, T. Song, J. Hanken, D. Lowery, J.A. Macklin, P.J. Morris, R.A. Morris. (2015) KURATOR: An Extensible, open-source workflow platform for users and makers of data curation tools. Society for the Preservation of Natural History Collections, 30th Annual Meeting. pp. 49-50.
  • SPNHC 2015 Poster: J.A. Macklin, B. Ludäscher, J. Hanken, D. Lowery, T.M. McPhillips, P.J. Morris, R.A. Morris, T. Song. (2015) What are *Your* data curation challenges. Please tell us. Society for the Preservation of Natural History Collections, 30th Annual Meeting. p. 50.
  • TDWG 2014: B. Ludäscher, J. Hanken, D.B. Lowery, J.A. Macklin, P.J. Morris, R.A. Morris, T. Song. (2014) Workflow Support for Continuous Data Quality Control in a FilteredPush Network. TDWG 2014 Annual Conference. #700.

See also the publications of the FilteredPush Project

Reports

  • P. K. Morris, 2016, "SOLVE_WITH_MORE_DATA and other Lessons from Biodiversity Data Quality Initiatives at the Museum of Comparative Zoology". Presentation at FASREP Symposium: Biodiversity Data Quality Symposium: Developing a Common Framework to Improve Fitness for Use of Biodiversity Data
  • iDigBio Webinar, May 2015: D. Lowery, B. Ludäscher, J.A. Macklin, T. McPhillips, P.J. Morris, R.A. Morris, T. Song, 2015, Webinar: Towards user-definable, semi-automated workflows for curating biodiversity data
  • D. Lowery, D. Mozzherin, J. Wieczorek, Data Quality Assessment, Improvement, and Annotation - Citstitch Hackathon Notes, iDigBio 2014

Internal

Developer Documentation

Funding

Kurator: A Provenance-enabled Workflow Platform and Toolkit to Curate Biodiversity Data
Collaborative NSF Awards 1356438 and 1356751
