AVS 66th International Symposium & Exhibition
    New Challenges to Reproducible Data and Analysis Focus Topic Monday Sessions
       Session RA+AS+NS+SS-MoA

Invited Paper RA+AS+NS+SS-MoA3
Enhancing Data Reliability, Accessibility and Sharing using Stealthy Approaches for Metadata Capture

Monday, October 21, 2019, 2:20 pm, Room A211

Session: Quantitative Surface Analysis II/Big Data, Theory and Reproducibility
Presenter: Steven Wiley, Pacific Northwest National Laboratory

Science is entering a data-driven era that promises to accelerate scientific advances to meet pressing societal needs in medicine, manufacturing, clean energy and environmental management. However, to be usable in big data applications, scientific data must be linked to sufficient metadata (data about the data) to establish its identity, source, quality and reliability. This need has also driven funding agencies to require projects to use community-based data standards that support the FAIR principles: Findable, Accessible, Interoperable, and Reusable. Current concerns about data reproducibility and reliability have further reinforced these requirements. Truly reusable data, however, requires an enormous amount of associated metadata, some of which is highly discipline- and sample-specific. In addition, this metadata is typically distributed across multiple data storage modalities (e.g., lab notebooks, electronic spreadsheets, instrumentation software) and is frequently generated by different people. Assessing and consolidating all of the relevant metadata has traditionally been extremely complex and laborious, requiring highly trained and motivated investigators as well as specialized curators and data management systems. This high price has led to poorly documented datasets that can rarely be reused. To simplify metadata capture and thus increase the probability that it will be captured, EMSL (Environmental Molecular Sciences Laboratory) has developed a general-purpose metadata capture and management system built around the popular ISA-Tab standard (Investigation-Study-Assay Tables). We have modified this framework by mapping it onto the EMSL workflow, organized as a series of “transactions”. These transactions, which are natural points where metadata is generated, include specifying how samples will be generated and shipped, instrument scheduling, sample storage, and data analysis.
Software tools have been built to facilitate these transactions, automatically capture the associated metadata and link it to the relevant primary data. This metadata capture system works in concert with automated instrument data downloaders and is compatible with commercial sample tracking and inventory management systems. By creating value-added tools that are naturally integrated into the normal scientific workflow, our system enhances scientific productivity, thus incentivizing adoption and use. The entire system is designed to be general purpose and extensible and thus should be a useful paradigm for other scientific projects that can be organized around a transactional model.
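
The transactional model described above can be illustrated with a minimal sketch. It is not the EMSL software and does not reproduce the ISA-Tab file format; the names `Transaction`, `MetadataLedger`, `record`, `link_data` and `metadata_for`, along with the sample identifier, metadata keys and file path, are all hypothetical. It only shows the general idea: each workflow step records its metadata as a side effect, and the accumulated records are later joined to a primary data file through a shared sample identifier.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class Transaction:
    """One workflow step at which metadata is generated (hypothetical model)."""
    kind: str                  # e.g. "sample_shipping", "instrument_scheduling"
    sample_id: str             # identifier shared by every step for one sample
    metadata: Dict[str, str]   # whatever the step's tool already knows
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class MetadataLedger:
    """Collects transactions and links them to primary data files."""

    def __init__(self) -> None:
        self._transactions: Dict[str, List[Transaction]] = {}
        self._data_files: Dict[str, List[str]] = {}

    def record(self, kind: str, sample_id: str, **metadata: str) -> Transaction:
        """Store the metadata produced by one workflow step."""
        tx = Transaction(kind, sample_id, dict(metadata))
        self._transactions.setdefault(sample_id, []).append(tx)
        return tx

    def link_data(self, sample_id: str, path: str) -> None:
        """Associate a primary data file with a sample."""
        self._data_files.setdefault(sample_id, []).append(path)

    def metadata_for(self, path: str) -> Dict[str, str]:
        """Merge the metadata of every transaction tied to a data file."""
        merged: Dict[str, str] = {}
        for sample_id, paths in self._data_files.items():
            if path not in paths:
                continue
            for tx in self._transactions.get(sample_id, []):
                for key, value in tx.metadata.items():
                    # Prefix each key with the step that produced it.
                    merged[f"{tx.kind}.{key}"] = value
        return merged


# Illustrative usage with made-up values.
ledger = MetadataLedger()
ledger.record("sample_shipping", "S-001", carrier="courier", storage="-80C")
ledger.record("instrument_scheduling", "S-001", instrument="XPS")
ledger.link_data("S-001", "raw/S-001_run1.dat")
merged = ledger.metadata_for("raw/S-001_run1.dat")
```

In this sketch the investigator never fills in a separate metadata form: the scheduling and shipping tools call `record` as part of their normal operation, which is the sense in which the capture is "stealthy".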