Bachelor's Thesis for the Bachelor Program "Software & Information Engineering" at Vienna University of Technology

raffaelfoidl/ProvCaptPyEnvs

Provenance Capturing for Python Environments

This repository serves as an overview of the practical part of my Bachelor's Thesis for Software & Information Engineering at Vienna University of Technology.

Abstract

Scientific experiments commonly involve several computational steps that consume certain input data and are responsible for the generation of results. Such workflows are often encoded or implemented in the form of scripts. While scripts can be adapted to every requirement, they lack valuable metadata such as experiment parameters, process flows and file accesses - in short, provenance. There have been research efforts addressing this issue, which resulted in tools that collect the provenance of scripts. However, they rarely utilize a common provenance format, which makes processing and exchanging their output difficult. In this thesis, two of these tools - YesWorkflow and noWorkflow - are extended such that they produce results that are compliant with the World Wide Web Consortium's PROV standard. The main goal is to increase the utility of their output by facilitating interoperability and machine-aided processing. This work outlines how the proposed modifications to YesWorkflow and noWorkflow were implemented and how they can be leveraged. Moreover, capitalizing on the ontological representation of their provenance, possibilities to gain deeper insight into the scripts' structure and execution details are highlighted. To this end, RDF serializations of the PROV ontology are used to infer otherwise only implicitly available information. This is achieved with the help of specifically constructed SPARQL queries.
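
To illustrate what "inferring otherwise only implicitly available information" can look like, here is a minimal sketch in plain Python. It is not taken from the thesis and does not reproduce its SPARQL queries. The triples, the `ex:` names, and the helper functions `direct_sources` and `all_sources` are hypothetical and exist only for illustration. The sketch stores provenance as (subject, predicate, object) triples using the PROV relations `prov:used` and `prov:wasGeneratedBy`, and derives every upstream input of a result file by following those relations transitively. In SPARQL 1.1, a comparable traversal could presumably be expressed with a property path.

```python
# Hypothetical PROV-style triples (subject, predicate, object).
# All "ex:" names are made up for this example.
PROV_USED = "prov:used"
PROV_WAS_GENERATED_BY = "prov:wasGeneratedBy"

triples = [
    ("ex:clean_data", PROV_USED, "ex:raw.csv"),
    ("ex:clean.csv", PROV_WAS_GENERATED_BY, "ex:clean_data"),
    ("ex:plot", PROV_USED, "ex:clean.csv"),
    ("ex:figure.png", PROV_WAS_GENERATED_BY, "ex:plot"),
]


def direct_sources(entity, triples):
    """Return the entities used by the activities that generated `entity`."""
    activities = {
        o for s, p, o in triples
        if s == entity and p == PROV_WAS_GENERATED_BY
    }
    return {
        o for s, p, o in triples
        if s in activities and p == PROV_USED
    }


def all_sources(entity, triples):
    """Return every upstream entity, following generation/usage transitively."""
    seen = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        for source in direct_sources(current, triples):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


# The link between figure.png and raw.csv is never stated explicitly;
# it only follows from chaining the recorded relations.
print(sorted(all_sources("ex:figure.png", triples)))
# prints ['ex:clean.csv', 'ex:raw.csv']
```

The point of the sketch is that the dependency of `ex:figure.png` on `ex:raw.csv` appears in no single triple. It only emerges once the recorded relations are combined, which is the kind of implicit information a query over a PROV graph can surface.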

Contents of this Repository

Related Repositories

Note: The recommended versions are Python 3.5 and Java 8 (for SPARQL Playground) or Java 11 (for YesWorkflow), since the scripts and software artifacts involved in this thesis were developed and tested with them.
