Towards Automated Design, Analysis and Optimization of Declarative Curation Workflows

  • Tianhong Song
  • Sven Köhler
  • Bertram Ludäscher
  • James Hanken
  • Maureen Kelly
  • David Lowery
  • James A. Macklin
  • Paul J. Morris
  • Robert A. Morris


Data curation is becoming increasingly important. Our previous work on a Kepler curation package demonstrated the advantages of automating data curation pipelines with workflow systems. However, manually designed curation workflows can be error-prone and inefficient because of users' limited understanding of the workflow system, misuse of actors, or simple human error. Correcting problematic workflows is often very time-consuming. A more proactive workflow system can help users avoid such pitfalls. For example, static analysis performed before execution can detect potential problems in a workflow and help the user improve its design. In this paper, we propose a declarative workflow approach that supports semi-automated workflow design, analysis and optimization. We show how the workflow design engine helps users construct data curation workflows, how the workflow analysis engine detects various design problems in workflows, and how workflows can be optimized by exploiting parallelism.