In the ongoing search for advanced or even novel materials with tailored properties and functions, high-throughput screening is by now an established branch of materials research. To explore the huge chemical-compound space successfully from a computational point of view, two aspects are crucial: (i) reliable methodologies that accurately describe all relevant properties of all materials on the same footing, and (ii) new concepts for extracting maximal information from the big data of materials, which have been produced for many years at an exponentially growing rate.
The talk will address both challenges. In particular, I will stress that big data of materials are structured in a way that is typically not visible to standard tools. Furthermore, with respect to a certain (desired) property, the practically infinite-dimensional space of different materials is very sparsely populated. Indeed, the key issue in data-driven materials science and machine learning of materials is to find the proper descriptive parameters (descriptors) that characterize the materials and the property of interest.
We will show that, and how, compressed sensing, originally designed for representing a complex signal in the lowest possible dimensionality, can select a low-dimensional descriptor out of a huge-dimensional space of potential descriptors (features). The talk also emphasizes the importance of causality in this learning process.
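
To make the descriptor-selection idea concrete, the following is a minimal sketch and not the method presented in the talk. It assumes that the selection step can be illustrated by l1-regularized least squares (LASSO), a standard sparse-recovery technique from compressed sensing, solved here with plain coordinate descent. The data are synthetic, and all names (`lasso_coordinate_descent`, `n_materials`, `n_candidates`, `true_descriptor`) as well as the regularization strength are hypothetical choices made for illustration only.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize (1/2n) * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_sq = [sum(v * v for v in c) / n for c in cols]
    w = [0.0] * p
    residual = list(y)  # equals y - X w; w starts at zero
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            c = cols[j]
            # Correlation of feature j with the residual that excludes its own contribution.
            rho = sum(c[i] * residual[i] for i in range(n)) / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            delta = w_new - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * c[i]
                w[j] = w_new
    return w


# Synthetic example (assumption): fewer "materials" than candidate features,
# and a target property that depends on only two of the candidates.
rng = random.Random(0)
n_materials, n_candidates = 50, 100
X = [[rng.gauss(0.0, 1.0) for _ in range(n_candidates)] for _ in range(n_materials)]
true_descriptor = {3: 3.0, 17: -2.0}  # hypothetical feature index -> coefficient
y = [sum(coef * row[j] for j, coef in true_descriptor.items()) for row in X]

w = lasso_coordinate_descent(X, y, lam=0.3)

# Features with non-zero weight form the selected low-dimensional descriptor.
selected = [j for j, wj in enumerate(w) if abs(wj) > 1e-8]
# The two features with the largest absolute weight.
top_two = sorted(sorted(range(n_candidates), key=lambda j: -abs(w[j]))[:2])
print("selected features:", selected)
print("two largest-weight features:", top_two)
```

The l1 penalty drives most coefficients exactly to zero, so the surviving features act as a candidate descriptor. The penalty strength `lam` controls how many features survive and would have to be tuned, for example by cross-validation, on real data.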