AVS 60th International Symposium and Exhibition
    Accelerating Materials Discovery for Global Competitiveness Focus Topic Thursday Sessions
       Session MG-ThA

Invited Paper MG-ThA10
Data-Driven Discovery of Physical, Chemical, and Pharmaceutical Materials

Thursday, October 31, 2013, 5:00 pm, Room 202 B

Session: Theory, Computation and Data-Enabled Scientific Discovery
Presenter: B.A. Jones, IBM Almaden Research Center

Data-driven insights have aided materials discovery in the pharmaceutical and related chemical areas for some years now, and commercial products are even available. I will describe some successes in these areas and derive lessons that might be applicable to condensed matter and polymeric materials. I will emphasize three points. a) Experiment, theory, and computation must guide each other for MGI to succeed, forming a triangle of inter-relationships; for MGI, however, it is really a tetrahedron, with computer science forming the fourth vertex. I will discuss the benefits that modern computer science can bring through data mining, machine learning, and big data analytics techniques. The volume of data on materials is growing fast and is scattered across many sources. While new tools and platforms have allowed the processing of vast volumes of data, our ability to integrate heterogeneous and unstructured data sets is still developing. The ability to correlate data from multiple sources deepens the value of data and allows new insights to emerge. b) The elements of accelerated materials discovery differ across scientific fields. Pharmaceutical discovery involves extracting chemical constituents and structures from patents; polymer data is scattered, unstructured, statistical, and often ambiguous; and in condensed matter we tend to look at materials properties as a function of some parameter such as doping or temperature, often in graph form. Understanding the needs of both soft and hard condensed matter will help common tools and synergies to develop. c) There are many challenges ahead in fully incorporating data-enabled scientific discovery, and much to learn on both the computer science and materials science sides. Drawing scientific insights from computer scientists as well as from data mining and databases is not yet common; both communities have work ahead to familiarize themselves with the opportunities and to optimize the tools needed for future materials by design.