AVS 66th International Symposium & Exhibition
    New Challenges to Reproducible Data and Analysis Focus Topic Monday Sessions
       Session RA+AS+NS+SS-MoA

Paper RA+AS+NS+SS-MoA11
R2R (Raw-to-Repository) Characterization Data Conversion for Reproducible and Repeatable Measurements

Monday, October 21, 2019, 5:00 pm, Room A211

Session: Quantitative Surface Analysis II/Big Data, Theory and Reproducibility
Presenter: Mineharu Suzuki, National Institute for Materials Science (NIMS), Japan
Authors: M. Suzuki, National Institute for Materials Science (NIMS), Japan
H. Nagao, National Institute for Materials Science (NIMS), Japan
H. Shinotsuka, National Institute for Materials Science (NIMS), Japan
K. Watanabe, ULVAC-PHI Inc., Japan
A. Sasaki, Rigaku Corp., Japan
A. Matsuda, National Institute for Materials Science (NIMS), Japan
K. Kimoto, National Institute for Materials Science (NIMS), Japan
H. Yoshikawa, National Institute for Materials Science (NIMS), Japan

NIMS, Japan, has been developing a materials data platform linked with a materials data repository system to enable rapid searching for new materials by materials informatics. Converting raw data into a human-legible and machine-readable data file is one of the key preparation steps prior to data analysis, and the converted data file should include meta-information. Our tools convert raw data into a structured data package that consists of (1) characterization measurement metadata, (2) primary parameters, which we do not call “metadata” in order to distinguish them from (1), (3) raw parameters as written in the original raw data, and (4) formatted numerical data. The formatted numerical data are expressed in matrix form with robust flexibility rather than following a rigid definition. This flexibility is achieved by adopting a Schema-on-Read style of data conversion, rather than a Schema-on-Write style based on de jure standards such as ISO documents. The primary parameters are carefully selected from the raw parameters, and their vocabulary is changed from instrument-dependent terms to general ones that everyone can readily understand. These primary parameters, together with linked specimen information, are useful for reproducible and repeatable instrument setup. Using this R2R conversion flow, we have verified that we can generate and store interoperable data files of XPS spectra and depth profiles, powder XRD patterns, (S)TEM images, TED patterns, EELS spectra, AES spectra, EPMA spectra and elemental mapping, and theoretical electron IMFP data. We have also developed a system that allows semi-automatic data transfer from an instrument-controlling PC isolated from the network, using the scripting capability of a Wi-Fi-capable SD card while keeping the PC offline. We are working on further software development for on-demand data manipulation after R2R data conversion. So far, it has become possible to perform XPS peak separation using an automated information compression technique.
With these components, high-throughput data conversion, accumulation, and analysis are realized with minimal human interaction. Using the metadata extracted from raw data, other users can reproduce or repeat measurements even if they did not carry out the original measurement. The human-legible and machine-readable numerical data are used for statistical analyses in informatics.
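
The four-part data package and the Schema-on-Read idea described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class name R2RPackage, the functions convert and read_columns, the vocabulary table, and the raw parameter names are hypothetical and are not taken from the NIMS tools or from any instrument vendor's file format.

```python
# Hypothetical sketch of an R2R-style data package; no name here comes
# from the actual NIMS software.
from dataclasses import dataclass

# Assumed mapping from instrument-dependent terms to general vocabulary.
VOCABULARY = {
    "XraySrc": "x_ray_source",
    "PassE": "pass_energy_eV",
    "Dwell": "dwell_time_ms",
}


@dataclass
class R2RPackage:
    measurement_metadata: dict  # (1) characterization measurement metadata
    primary_parameters: dict    # (2) selected parameters, general vocabulary
    raw_parameters: dict        # (3) parameters as written in the raw data
    numerical_data: list        # (4) matrix stored as a list of rows


def convert(raw_parameters, rows, technique):
    """Build a package from raw parameters and numeric rows."""
    # Keep only the parameters that have a general-vocabulary name.
    primary = {
        VOCABULARY[key]: value
        for key, value in raw_parameters.items()
        if key in VOCABULARY
    }
    return R2RPackage(
        measurement_metadata={"technique": technique},
        primary_parameters=primary,
        raw_parameters=dict(raw_parameters),  # preserved unchanged
        numerical_data=[list(row) for row in rows],
    )


def read_columns(package, schema):
    """Schema-on-Read: column names are supplied when the data are read,
    not fixed when the data are written."""
    return {
        name: [row[index] for row in package.numerical_data]
        for index, name in enumerate(schema)
    }


package = convert(
    {"XraySrc": "Al Ka", "PassE": 55.0, "VendorFlag": 3},
    [(284.0, 1200), (284.5, 1350)],
    technique="XPS",
)
columns = read_columns(package, ["binding_energy_eV", "counts"])
```

In this sketch the stored matrix carries no column names of its own. The reader supplies a schema at read time, so the same package layout could hold a spectrum, a depth profile, or an image without a rigid definition fixed at write time.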