Data science and cyberinfrastructure: critical enablers for accelerated development of hierarchical materials

Abstract

The slow pace of new/improved materials development and deployment has been identified as the main bottleneck in the innovation cycles of most emerging technologies. Much of the continuing discussion in the materials development community is therefore focused on the creation of novel materials innovation ecosystems designed to dramatically accelerate materials development efforts while lowering the overall cost involved. In this paper, it is argued that recent advances in data science can be leveraged to address this challenge by effectively mediating between the seemingly disparate, inherently uncertain, multiscale and multimodal measurements and computations involved in current materials development efforts. Proper utilisation of modern data science in materials development can lead to a new generation of data-driven decision support tools for guiding effort investment (for both measurements and computations) at various stages of materials development. It should also be recognised that the success of such ecosystems is predicated on the creation and utilisation of integration platforms for promoting intimate, synchronous collaborations between cross-disciplinary and distributed team members (i.e. cyberinfrastructure). Indeed, data sciences and cyberinfrastructure form the two main pillars of the emerging new discipline broadly referred to as materials informatics (MI). This paper provides a summary of current capabilities in this emerging new field as they relate to the accelerated development of advanced hierarchical materials (materials whose internal structure plays a dominant role in controlling their overall properties and performance) and identifies specific directions of research that offer the most promising avenues.

Materials, Manufacturing, and Informatics

Materials with enhanced performance characteristics have served as critical enablers for the successful development of advanced technologies throughout human history and have contributed immensely to the prosperity and wellbeing of various nations. A majority of the materials employed in advanced technologies exhibit hierarchical internal structures with rich details at multiple length and/or structure scales (spanning from atomic to macroscale). Collectively, these features of the material internal structure are here simply referred to as the structure and constitute the central consideration in the development of new/improved hierarchical materials. Indeed, the existence of a causal relationship between the material structure and its properties is the central tenet in the field of materials science and engineering. It should be noted that the word structure is used very broadly in these statements (and in this paper) to include and refer to any of the details of the material internal structure (spanning all relevant length or structure scales involved).

Indeed, the mathematical description of the material internal structure in its entirety, in any selected material system, is unimaginably complex and demands very high dimensional representation. For example, most materials being explored for structural applications (e.g. Ti alloys in jet engines and advanced high strength steels, Mg alloys in lightweight automobiles, Al alloys in aerospace frames, and Zr alloys in nuclear industry) exhibit polycrystalline microstructures at the mesoscale.^1–4 As an example, Fig. 1 shows details of the mesoscale structure in such materials. A rigorous representation of the hierarchical structure in such materials should also include details at other relevant length/structure scales (e.g. point defects, dislocations, grain boundaries, phase boundaries). Although the above discussion was framed in the context of a crystalline material, similar considerations exist in most other material classes. For example, the hierarchy in polymer structures⁵ includes details of monomers and their spatial arrangements into blocks and branches at the molecular or macromolecular level, micro-fibrils and crystallites at the nanoscale, and spherulites at the microscale. The hierarchy in most biological materials is indeed much richer. For example, the hierarchy in bone structure includes details of collagen molecules and mineral crystals, collagen fibrils, collagen fibre, lamella, osteons and macrostructure (e.g. cancellous or cortical).^6–8 Furthermore, most materials of interest in advanced technologies actually tend to be composites comprising multiple material classes.

Fig. 1. Mesoscale internal structure of beta-stabilised polycrystalline titanium containing 4300 crystals (or grains) taken from Ref. 4. This experimental dataset was generated by a three-dimensional (3-D) reconstruction that entailed the use of serial sectioning, optical microscopy with intermittent electron backscatter diffraction (EBSD), and image segmentation and processing algorithms. The sample size is 1·115×0·516×0·3 mm³ (1670×770×200 voxels). The 3-D crystal lattice orientation in each voxel is included in this experimental dataset. The colour key, shown at the bottom left, corresponds to the stereographic projection of the crystallographic orientation parallel to the Z-axis.

It is emphasised again that the discussion in this paper is exclusively focused on hierarchical materials. In other words, the simplest of these materials exhibits at least two distinct well separated length or structure scales (e.g. the macroscale and the microscale). It should also be noted that the description of the structure in such hierarchical materials implicitly includes a full description of the chemical compositions of all distinct microscale constituents (called local states) present in the material system, in addition to their relative spatial placement in the internal structure. In other words, the information included in the description of the material structure is orders of magnitude more detailed than the simple overall chemical composition typically used to identify or label a material system.

Based on the above description, it should be clear that a vast number of tiered spatial distributions have to be quantified to faithfully represent the complex hierarchical structure of advanced material systems. It is obvious that such an effort would result in an extremely large and unwieldy representation. Fortunately, the field of materials science and engineering has already empirically discovered that only certain salient features of the material structure dominate the macroscale performance characteristics of interest for any selected application. Therefore, the main challenge in the development of materials with enhanced properties reduces to identifying and tracking the salient structure features that are important to a specific engineering or technology application. In other words, the core knowledge needed to guide the materials’ development efforts can be sought and expressed as reduced-order process–structure–property (PSP) linkages that capture the roles of different unit manufacturing (or processing) steps on the salient structure features that control the property combinations (or performance characteristics) of interest. It is important to recognise that these linkages represent reduced-order models as they utilise reduced-order representations of the material structure. Historically, such efforts have been largely guided by the scientific approach that entails formulating a fundamental hypothesis and then validating it with carefully designed experiments conducted in highly controlled environments. Such science-driven approaches for establishing PSP linkages have been expensive and slow,^9–11 because their focus has been to isolate and study each physical mechanism (i.e. cause) and its associated effect in a highly systematic manner.

From a data science perspective, one can formalise the discussion above in terms of the fundamental data transformations involved, as summarised in Fig. 2. Raw data related to materials phenomena of interest is usually generated by some combination of experiments, models, and simulations. Recent years have witnessed an explosion in the ability of materials experts to generate data from novel experiments and simulations. For example, the 3-D experimental dataset shown in Fig. 1 can now be generated using mostly automated protocols.^12,13 Even with this automation, acquiring such a dataset still takes a substantial amount of time (of the order of several days). An exciting development in this field is the use of a femtosecond laser for fast serial sectioning of the sample,¹⁴ as opposed to the conventional mechanical approaches used in the earlier studies. This new technique has the potential to dramatically reduce the time required to obtain a 3-D structure dataset. It has also been demonstrated that a focused ion beam attached to a scanning electron microscope can be used for serial sectioning of samples and reconstructing a 3-D material structure dataset (e.g. Refs. 15 and 16). However, this technique is suitable only for studies of very small volumes of material (with length scales of the order of a few micrometres). While the approaches mentioned earlier are all destructive (they ablate the material to expose new surfaces of the sample), there are also a number of non-destructive techniques that rely on the use of X-rays. When X-ray techniques are combined with computed tomography, it is possible to produce reconstructions of a broad range of 3-D material datasets including porous structures (e.g. Refs. 17–19), defect distributions (e.g. Refs. 20–22) and polycrystal microstructures (e.g. Refs. 23 and 24). In fact, it is now possible to obtain 4-D (three spatial dimensions and time) reconstructions using data gathered from high energy X-rays.²⁵ At the finest spatial resolution, it is also possible to obtain 3-D and 4-D structure datasets at the atomic scale using techniques such as transmission electron microscopy²⁶ and atom probe microscopy (e.g. Refs. 27 and 28). In parallel, there have also been tremendous advances in the ability to generate simulation datasets from computations at multiple length/structure scales (e.g. Refs. 29–43). The volume of this data (from both experiments and models) can be very large, ushering the materials community into the materials big data era.
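To put "very large" in perspective for a single experimental dataset, the arithmetic below sizes the orientation field of the Fig. 1 dataset. The grid dimensions come from the figure caption; the storage format (three float32 Euler angles per voxel, with no compression or metadata) is purely an assumption for illustration and says nothing about how the original data were actually stored.

```python
# Voxel grid of the Fig. 1 dataset, as given in its caption.
nx, ny, nz = 1670, 770, 200
n_voxels = nx * ny * nz

# Assumption (not stated in the source): the lattice orientation in each
# voxel is stored as three Euler angles, each a 4-byte float32.
bytes_per_voxel = 3 * 4
total_bytes = n_voxels * bytes_per_voxel

print(f"{n_voxels:,} voxels")          # 257,180,000 voxels
print(f"{total_bytes / 1e9:.2f} GB")   # 3.09 GB
```

Roughly 3 GB for one orientation map of a sub-millimetre sample, before any derived fields, time steps or companion simulations, illustrates why storage and curation quickly become a concern.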

Fig. 2. Schematic description of the envisioned transformations for materials data

The main value of the structure datasets is that they allow materials specialists to extract trends in the evolution of selected salient structure features during a given manufacturing route and to study how these details affect certain effective properties/performance characteristics of interest for the material. For example, considerable prior effort in the development of structural metals has been spent on correlating the average grain size in the final metal product with the various thermo-mechanical deformation histories applied during the manufacture of metal alloys. This is because the average grain size is generally observed to strongly influence the overall mechanical properties of the metal product in service (e.g. Refs. 44–49), although it is not the only factor influencing the final performance. Nevertheless, this approach of salient structure parameter identification and exploration has provided tremendous new insights (higher value information) that have improved the performance of many structural material systems of interest. In the data science formalism, one might characterise these higher value descriptions of PSP linkages (identifying specific trends between selected parameters as opposed to comprehensive multivariate linkages) as information (see Fig. 2). This is mainly because, at this stage, not all the dominant features in the PSP linkages of interest have been identified in a comprehensive manner. At the next higher level, one can aim to extract much more rigorous, reliable, and complete PSP linkages from all the available data; this information could then be characterised as materials knowledge. One of the central goals of the emerging field of materials informatics (MI) is to introduce novel data-driven approaches for mining materials knowledge from the large collections of experimental, modelling and simulation datasets available (and/or being produced) today. 
Furthermore, the comprehensive PSP linkages available at this stage should allow a rigorous quantification of the inherent uncertainty. At this level of knowledge, the available PSP linkages can be successfully employed in simulating manufacturing processes of interest and predicting performance of the final product. However, the main focus in the data transformations at the knowledge level continues to be in the forward direction (process→structure→properties). At the final stage of data transformation, effort would be focused on establishing invertible PSP linkages that allow customised process and materials design for targeted applications (i.e. address inverse problems). This highest level of the understanding of PSP linkages can then be characterised as wisdom. The primary focus in this paper will be on data analytics needed to extract materials knowledge from the ensembles of materials structure and performance datasets being produced by the materials experts, with an eye towards attaining wisdom in the future.

In order to realise the goals stated above for efficiently transforming materials data into knowledge and wisdom, and dramatically lowering the cost and time involved in materials development efforts, it is imperative to develop novel protocols that fully exploit the large data generation capabilities made possible through the recent advances in multiscale measurements^12–28 and simulations of materials phenomena (e.g. Refs. 34–42). The central challenge is that in spite of the many advances there remain a large number of unknowns or gaps in capturing the underlying physics (at the different length scales). These critical gaps hinder the development of fully predictive PSP linkages for most hierarchical materials of interest to advanced technologies. The only practical way forward for the foreseeable future is to formally treat the hierarchical material as a complex system,⁵⁰ which by definition is not yet amenable to predictive models. If one embraces the premise that a certain degree of uncertainty is inevitable in the formulation of the desired PSP linkages for hierarchical material systems, the focus could then be shifted to managing the uncertainty (i.e. complexity). In other words, the effort could be focused on the design, development, and validation of decision support systems that will leverage the best available understanding (with its uncertainties) and provide objective guidance on future effort investment (e.g. what combination of experiments and simulations are needed to reduce the uncertainty).
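One simple way to turn "which experiment or simulation next?" into an objective computation is to check where an ensemble of cheap surrogate models disagrees most and to sample there. The sketch below is only one possible heuristic, not a method from the cited works: it assumes a single scalar input (for example, one processing parameter), a quadratic polynomial surrogate, and bootstrap disagreement as a proxy for uncertainty. The function `next_experiment` and the synthetic data are hypothetical.

```python
import numpy as np

def next_experiment(x_obs, y_obs, candidates, degree=2, n_boot=200, seed=0):
    """Return the candidate input where a bootstrap ensemble of polynomial
    surrogates disagrees most, plus the disagreement at every candidate."""
    rng = np.random.default_rng(seed)
    n = len(x_obs)
    preds = np.empty((n_boot, len(candidates)))
    for j in range(n_boot):
        idx = rng.integers(0, n, size=n)              # resample with replacement
        coeffs = np.polyfit(x_obs[idx], y_obs[idx], degree)
        preds[j] = np.polyval(coeffs, candidates)
    spread = preds.std(axis=0)                        # ensemble disagreement
    return candidates[np.argmax(spread)], spread

# Hypothetical data: 12 measurements of a property at inputs in [0, 1].
rng = np.random.default_rng(42)
x_obs = rng.uniform(0.0, 1.0, 12)
y_obs = 2.0 * x_obs + 0.5 * x_obs**2 + rng.normal(0.0, 0.1, 12)
candidates = np.linspace(0.0, 3.0, 31)
best, spread = next_experiment(x_obs, y_obs, candidates)
print(f"suggested next input: {best:.1f}")
```

Because the data only cover [0, 1], the ensemble disagrees most where the surrogate extrapolates, so the sketch recommends probing far from existing data. A practical decision support system would also weigh the cost of each candidate measurement or simulation.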

Given the high cost of the multiscale measurements, it is also obvious that the desired protocols for establishing materials knowledge and wisdom (see Fig. 2) will have to rely on a limited number of experimental investigations. However, these experiments have to be specifically designed to efficiently cross-feed multiscale structure-sensitive materials models. The central considerations for these new protocols should be:

(i) model maturity
(ii) model interoperability
(iii) model inversion.

Briefly, model maturity quantifies the reliability (or the uncertainty) of the predictions of any given model over a prescribed window on the input ranges. The focus here is largely on multiscale physics-based models (these are critical for achieving adequate accuracy over sufficiently large windows on the input ranges) for predicting either the structure–property relationships or the manufacturing process–structure evolution relationships. Consequently, protocols are critically needed for robust evaluation of model maturity over any selected range of initial structures and boundary conditions (defining either the manufacturing process conditions or the in-service loading conditions). The main impediments to establishing these protocols are:

(i) lack of a broadly adopted framework for rigorous quantification of the material structure
(ii) lack of validated experimental protocols for direct measurement of the various materials parameters introduced in the multiscale models and/or for ‘at-scale’, full-field measurements of the response variables predicted by the multiscale models (needed for the critical validation of the models).
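The idea of evaluating model maturity over a prescribed window of inputs can be made concrete with a minimal sketch. It assumes a one-dimensional input, a model supplied as a Python callable, and reference (e.g. experimental) data; the function name `model_maturity` and the choice of root-mean-square error and maximum absolute error as reliability measures are illustrative assumptions, not protocols from the cited literature.

```python
import numpy as np

def model_maturity(model, x_ref, y_ref, window):
    """Error statistics of `model` against reference data whose inputs
    fall inside the closed window [lo, hi]."""
    x_ref = np.asarray(x_ref, dtype=float)
    y_ref = np.asarray(y_ref, dtype=float)
    lo, hi = window
    inside = (x_ref >= lo) & (x_ref <= hi)
    if not inside.any():
        raise ValueError("no reference data inside the window")
    err = model(x_ref[inside]) - y_ref[inside]
    return {
        "n": int(inside.sum()),
        "rmse": float(np.sqrt(np.mean(err**2))),
        "max_abs_error": float(np.max(np.abs(err))),
    }

def linear_model(x):
    """Hypothetical model under evaluation."""
    return 2.0 * x

# Hypothetical reference data: the model is exact at low inputs only.
x_ref = [0.0, 1.0, 2.0, 3.0]
y_ref = [0.0, 2.0, 4.5, 7.0]
print(model_maturity(linear_model, x_ref, y_ref, (0.0, 1.0)))  # narrow window
print(model_maturity(linear_model, x_ref, y_ref, (0.0, 3.0)))  # wider window
```

The same model can be "mature" over a narrow window and unreliable over a wider one, which is why maturity is tied to a stated input range rather than to the model alone.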

Model interoperability ensures that the distinct components of a hierarchical multiscale model chain, which typically address specific materials phenomena at selected length/structure scales, are able to exchange high value information seamlessly with the other components of the chain, with a manageable (quantifiable) loss of accuracy.⁵⁰ For example, in modelling the plastic response of polycrystalline metals,⁵¹ it is not yet clear what information about the dislocation structure needs to be communicated from dislocation dynamics simulations to crystal plasticity simulations. As a simple approach, one might decide to communicate only the average dislocation density. However, if one is interested in understanding and predicting strain hardening and damage initiation/evolution, it would be necessary to communicate information on the higher moments of the dislocation field (or equivalently higher-order spatial correlations in the dislocation networks) to the crystal plasticity models operating at the next higher length/structure scale. The third key capability listed earlier, model inversion, arises from the need to drive materials development efforts from considerations of performance requirements (i.e. invert the current ‘cause and effect’ approach into a transformative ‘goal-means’ approach articulated by Olson^42,52). A major impediment in model inversion arises from the simple fact that most currently used approaches in computational materials modelling have not been designed with invertibility in mind. For example, numerical approaches such as the finite element method or the finite volume method have been designed to study the effects of imposed loading or boundary conditions on a selected initial microstructure. They are ill-equipped for tackling inverse problems such as identifying the set of material structures that are expected to meet or exceed a specified set of property/performance requirements. 
In most cases, model invertibility requires the formulation of simplified, but sufficiently accurate, metamodels (also referred to as surrogate models) that cover the desired space of material structures and loading/processing conditions. In general, these approaches demand compact, simple (e.g. algebraic), and sufficiently accurate representations of the PSP linkages^1,42,50,52–54 to be of practical utility in providing critical decision support in the materials development efforts.
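A minimal sketch of how an algebraic metamodel makes inversion tractable, under loudly stated assumptions: the structure is described by two reduced-order descriptors (for example, two PC weights), `expensive_model` is a hypothetical stand-in for a costly physics-based simulation, the metamodel is a quadratic polynomial fitted by least squares, and inversion is done by brute-force evaluation of the cheap surrogate on a grid. Real inverse design would typically use an optimiser and account for the surrogate's uncertainty.

```python
import numpy as np

rng = np.random.default_rng(1)

def expensive_model(x):
    """Stand-in for a costly physics-based simulation (hypothetical)."""
    return 100.0 + 20.0 * x[:, 0] - 10.0 * x[:, 1] + 5.0 * x[:, 0] * x[:, 1]

def features(x):
    """Quadratic polynomial features: 1, x1, x2, x1^2, x1*x2, x2^2."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2])

# Training set: 40 structures, each described by two reduced-order descriptors.
X = rng.uniform(-1.0, 1.0, size=(40, 2))
y = expensive_model(X)

# Forward metamodel: a least-squares fit of an algebraic expression.
coef, *_ = np.linalg.lstsq(features(X), y, rcond=None)

def surrogate(x):
    return features(x) @ coef

# Inverse step: evaluate the cheap surrogate on a dense grid of descriptors
# and keep every point predicted to meet or exceed the property target.
g = np.linspace(-1.0, 1.0, 101)
grid = np.stack(np.meshgrid(g, g, indexing="ij"), axis=-1).reshape(-1, 2)
target = 120.0
feasible = grid[surrogate(grid) >= target]
print(f"{len(feasible)} of {len(grid)} grid points meet the target")
```

The expensive model is called only 40 times; the 10,201 evaluations needed to map the feasible region are all made on the algebraic surrogate.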

The above discussion should make clear the critical need and potential for the utilisation of modern data sciences (including advanced statistics, dimensionality reduction and formulation of metamodels) and cyberinfrastructure (including integration platforms, databases and customised tools for enhancement of collaborations among cross-disciplinary team members) in overcoming the impediments described above. These have been identified as the critical enablers for the emerging materials innovation ecosystems in many national and international strategic initiatives.^9,55–60 In fact, data sciences and cyberinfrastructure have already been successfully employed in a broad range of other application domains. Examples include recommendation systems (e.g. Amazon⁶¹), personal informatics (e.g. Ref. 62), drug discovery (e.g. Ref. 63), decision systems (e.g. Ref. 64), and healthcare (e.g. Ref. 65).

Data sciences and cyberinfrastructure are the foundational pillars of the emerging field broadly referred to as Materials Informatics (MI).^66–76 This emerging new field has thus far focused largely on materials discovery through combinatorial chemistry and variations of crystal structures at a single length/structure scale. In this paper, the focus will remain on hierarchical materials, where microstructural features at different length/structure scales play important roles in controlling the macroscale properties/performance characteristics of interest. Consequently, major emphasis is placed on first identifying and then communicating the high value information among the constituent length scales for a hierarchical material system. Furthermore, because the governing physics at different length/structure scales vary dramatically, and because of the highly localised nature of the knowledge and expertise of such phenomena, realisation of the goals articulated earlier is critically dependent on the availability of suitably designed cyberinfrastructure that will facilitate and enhance cross-disciplinary collaborations.

Extensible Framework for Structure Quantification

The lack of an extensible framework for material structure quantification, which is broadly applicable to the wide range of hierarchical materials of interest to emerging advanced technologies, is the central impediment in ushering materials science and engineering into the big data age. Rigorous structure quantification is also foundational to the critically needed advances in the development of novel data-driven protocols for model maturation, model interoperability, and model inversion. In spite of the central role structure plays in establishing core materials knowledge expressed as PSP linkages, it has eluded a broadly accepted quantitative definition. For example, although the American Society for Testing and Materials (ASTM) standards are widely adopted by the multiple stakeholders in the manufacturing value chain (including materials producers, product designers and original equipment manufacturers), there is no ASTM standard yet for a comprehensive quantification of the material structure. At best, the current standards only address quantification of very primitive structure measures such as the average grain size^77,78 in relatively simple material systems. Measures such as the average grain size should be considered primitive because it is easy to envision multiple hierarchical material structures that have exactly the same values for such primitive measures while exhibiting dramatically different macroscale properties/performance characteristics.
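The point that primitive measures cannot distinguish different structures can be illustrated with two idealised periodic two-phase patterns. The sketch uses volume fraction as a stand-in primitive measure (an average grain size would require a segmentation step that is beside the point here) and a single 2-point autocorrelation value, computed under an assumed periodic boundary, as a richer measure. The patterns and the function name are hypothetical.

```python
import numpy as np

def autocorrelation_periodic(m, shift):
    """2-point autocorrelation of a binary field m for one vector r = shift,
    assuming periodic boundaries: the fraction of bins s for which both s
    and s + r are occupied by the phase of interest."""
    moved = np.roll(m, [-k for k in shift], axis=tuple(range(m.ndim)))
    return float(np.mean(m * moved))

n = 8
i, j = np.indices((n, n))
stripes = (j % 2).astype(float)          # alternating columns
checker = ((i + j) % 2).astype(float)    # checkerboard

for name, m in [("stripes", stripes), ("checkerboard", checker)]:
    vf = m.mean()                                  # 1-point statistic
    f_10 = autocorrelation_periodic(m, (1, 0))     # 2-point statistic, r = (1, 0)
    print(f"{name}: volume fraction {vf}, f at r=(1,0) {f_10}")
```

Both patterns have a volume fraction of 0.5, yet a neighbour one bin away along the first axis is always the same phase in the stripes and never in the checkerboard, so the 2-point value separates them immediately.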

Core knowledge needed for the development of advanced hierarchical materials is best archived, curated and visualised in the higher dimensional space of variables used to represent the material structure.^54,67 This is because structure evolution during processing can be represented as a distinct pathline in the structure space and each point in this space can be associated with a unique combination of the properties of interest.⁷⁹ Therefore, it would be possible to visualise the salient PSP linkages in a suitably defined low-dimensional projection of the structure space. The central challenge therefore is to define a practically useful structure space. When the structure space is defined using very primitive measures, it would not be able to distinguish between structures that exhibit very distinct performance characteristics. On the other hand, if the structure space is defined to account for every minute detail of the structure (implicitly demanding a high dimensional representation), it would not be amenable to a comprehensive exploration (e.g. for the optimisation of performance characteristics of interest for a selected application). This is precisely where a data-driven approach offers many advantages. In a data-driven approach, the decision on exactly what constitutes the set of important salient features is not taken in a static manner; instead, it is taken objectively based on the actual available data and is continuously refined as more data becomes available. Therefore, the emerging interdisciplinary MI field focuses mainly on computational algorithms and tools designed to extract and curate the embedded materials knowledge in an objective (data-driven) and dynamic manner. This is accomplished using a combination of advanced statistics, applied mathematics and modern cyberinfrastructure.

The above discussion should make clear that an extensible framework for material internal structure quantification is the central starting point in formulating a data-driven approach to hierarchical materials development. Only an extensible framework would permit automated and efficient evaluation of the multiple choices one faces in this daunting task. Moreover, only an extensible framework will allow automated documentation of the novel integrated workflows that are yet to be explored and evaluated in pursuit of the grand challenges identified earlier. When such a framework is implemented on a broadly accessible cyberinfrastructure, it will allow identification of the best integrated workflows (integrating experiments and models, materials and manufacturing, etc.) based on the experience accumulated from the broader community. The desired requirements laid out above can be satisfied by seeking a digital signal representation of the material structure⁸⁰ as $m_s^h$, which denotes the probability that a specified spatial bin (or voxel) indexed by s is physically occupied by a potential local state indexed by h. Since the values of $m_s^h$ are bounded between zero and one (in many cases they can be just binary⁸⁰), this produces a generalised representation for a broad range of material systems at different length/structure scales. The information on the different length scales is encoded into the properties associated with the spatial bins, while the information on the local state of the material (e.g. chemical composition, phase identifiers, tensorial representations of different defect configurations of interest) is encoded into the properties associated with the bins in the local state space. In addition to transforming the material structure into a versatile digital signal, this approach inherently treats the material structure as a stochastic process because of the probabilistic interpretation assigned to the variable m. 
The digital signal representation of structure offers many advantages including fast computation of spatial correlations,^1,81,82 automated identification of salient structure features in large datasets,⁸³ extraction of representative volume elements (RVEs) from an ensemble of datasets,^3,15,84 reconstructions of structures from measured statistics,^81,85–87 building of real-time searchable structure databases,^67,88 and mining of high fidelity multiscale structure–performance–structure evolution correlations from physics-based models.^89–92
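A minimal sketch of the digital representation $m_s^h$, assuming a 2-D grid whose bins carry integer labels for a small number of local states (e.g. phase identifiers). A one-hot encoding gives the binary case; averaging over blocks of bins (a coarser spatial binning, used here only to illustrate how fractional values arise) gives values between zero and one. In this sketch the values over all local states in each bin sum to one by construction. The function names are hypothetical.

```python
import numpy as np

def microstructure_function(labels, n_states):
    """One-hot m[s, h]: 1.0 if spatial bin s holds local state h, else 0.0.
    `labels` is an integer array of local-state indices, one per bin."""
    labels = np.asarray(labels)
    return (labels[..., None] == np.arange(n_states)).astype(float)

def coarsen(m, factor):
    """Average a 2-D microstructure function over factor x factor blocks of
    bins (grid sizes assumed divisible by factor); each coarse bin then
    holds the fraction of its area occupied by each local state."""
    nx, ny, n_states = m.shape
    blocks = m.reshape(nx // factor, factor, ny // factor, factor, n_states)
    return blocks.mean(axis=(1, 3))

# Hypothetical 4 x 4 structure with three local states (0, 1, 2).
labels = np.array([[0, 0, 1, 1],
                   [0, 2, 1, 1],
                   [2, 2, 0, 1],
                   [2, 2, 0, 0]])
m = microstructure_function(labels, n_states=3)   # binary values
m_coarse = coarsen(m, factor=2)                   # values in [0, 1]
print(m_coarse[0, 0])  # fractions of states 0, 1, 2 in the top-left coarse bin
```

Storing m as an array indexed by (spatial bin, local state) is what lets the DFT-based statistics and the data-mining steps that follow treat very different material classes with the same code.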

Because there is no natural origin from which to start indexing the spatial bins, only the relative placement of local states in the material structure contains meaningful information. An extensible framework for rigorous quantification of spatial correlations in the material structure is available in the form of n-point spatial correlations (or n-point statistics).^1,67,81,82,93,94 Although a number of other ad hoc measures of material structure are possible, the n-point spatial correlations provide the most complete set of measures, naturally organised by increasing amounts of structure information. For example, the most basic of the n-point statistics are the 1-point statistics, which reflect the probability density associated with finding a specific local state of interest at any randomly selected single point (or voxel) in the material structure. In other words, they essentially capture the volume fractions of the various distinct local states present in the material system. The next higher level of structure information is contained in the 2-point statistics, denoted as $f_r^{hh'}$, which capture the probability density associated with finding local states h and h′ at the tail and head, respectively, of a prescribed vector r randomly placed into the microstructure. Mathematically, these are expressed as^81,82

$$f_r^{hh'} = \frac{1}{S_r} \sum_{s} m_s^h \, m_{s+r}^{h'} \qquad (1)$$

where r indexes the bins in the space of vectors (generally using the same binning scheme as that used for the spatial domain). In equation (1), $S_r$ denotes the number of spatial bins for which the bins indexed s and s+r both lie within the spatial domain of the material structure instantiation being studied, and the sum runs over exactly these bins. If assumptions of periodicity of the material structure are invoked (e.g. this is routinely done in evaluating the response of a selected structure using numerical approaches such as molecular dynamics (MD), dislocation dynamics, finite element models, and phase-field models), then $S_r = S$, where S is the total number of spatial bins in the microstructure instantiation. Computationally efficient schemes for computing the spatial correlations using discrete Fourier transforms (DFTs) have also been developed and utilised successfully.^81,82 Although several of the prior studies have routinely assumed periodicity of the material structure in their computations, it is relatively simple to devise a padding scheme⁹⁵ that allows one to efficiently compute the spatial correlations using DFTs, even when the assumptions of periodicity are not invoked.
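Equation (1) and the DFT route mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation of Refs. 81, 82 or 95: the function name `two_point_stats` is hypothetical, the structure is assumed to be stored as arrays of $m_s^h$ on a regular grid, and the non-periodic branch uses simple zero-padding to twice the domain size along each axis as one possible padding scheme. Shifts are returned in NumPy's FFT ordering (non-negative shifts first, negative shifts wrapped to the end of each axis).

```python
import numpy as np

def two_point_stats(m_h, m_hp, periodic=True):
    """Sketch of f_r^{hh'} = (1/S_r) * sum_s m_s^h * m_{s+r}^{h'} via DFTs.

    m_h, m_hp: arrays of equal shape holding m_s^h and m_s^{h'} on a grid.
    periodic=True:  assumes a periodic structure, so S_r = S for every r.
    periodic=False: zero-pads every axis to twice its length so that no
                    shift wraps around, then divides by the count S_r.
    """
    m_h = np.asarray(m_h, dtype=float)
    m_hp = np.asarray(m_hp, dtype=float)
    if periodic:
        corr = np.fft.ifftn(np.conj(np.fft.fftn(m_h)) * np.fft.fftn(m_hp)).real
        return corr / m_h.size
    axes = tuple(range(m_h.ndim))
    shape = tuple(2 * n for n in m_h.shape)
    corr = np.fft.ifftn(np.conj(np.fft.fftn(m_h, s=shape, axes=axes))
                        * np.fft.fftn(m_hp, s=shape, axes=axes), axes=axes).real
    # S_r: number of bins s such that both s and s + r lie inside the domain,
    # obtained by correlating an all-ones indicator of the domain with itself.
    ones = np.fft.fftn(np.ones(m_h.shape), s=shape, axes=axes)
    S_r = np.rint(np.fft.ifftn(np.conj(ones) * ones, axes=axes).real)
    # Shifts with no valid pairs (S_r == 0) are reported as zero.
    return np.divide(corr, S_r, out=np.zeros_like(corr), where=S_r > 0)

# Usage on a hypothetical random two-phase structure.
rng = np.random.default_rng(0)
m = (rng.random((64, 64)) < 0.3).astype(float)
f_periodic = two_point_stats(m, m, periodic=True)
f_nonperiodic = two_point_stats(m, m, periodic=False)
print(f_periodic.shape, f_nonperiodic.shape)
```

For a binary field, $m_s^h m_s^h = m_s^h$, so the autocorrelation at r = 0 recovers the 1-point statistic (the volume fraction) in both branches, which makes a convenient sanity check.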

It should be noted that there is a tremendous leap in the amount of structure information contained in the 2-point statistics compared to the 1-point statistics. Higher-order correlations (3-point and higher) are defined in a completely analogous manner. The relationships between these microstructure measures and several of the classically defined ones are summarised in several books.^53,93 An implicit benefit of treating the material structure in a statistical framework is that it naturally leads to a quantification of the variance associated with the structure.^30,53,96–100 The variance in structure can then be combined appropriately with the other uncertainties in the process (for example, those associated with the measurements and those associated with the models used to predict overall properties or performance characteristics of interest in an engineering application) to arrive at the overall variance in the component performance. The lack of tight variances on the performance characteristics of the final product is often cited as one of the main reasons for the inability to scale a process from the laboratory scale to the manufacturing environment. As these variances can be traced to variances in material structure (produced by variances in processing), it is imperative to track the variances in the material structure using a practical approach. Once again, data-driven approaches provide a way forward in addressing this challenging task.^88,96
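The combination of structure variance with measurement and model uncertainties can be illustrated with a small Monte Carlo sketch. Everything in it is hypothetical: a single structure descriptor x, an assumed linearised structure–property model with sensitivity `slope`, and independent Gaussian error sources with made-up standard deviations. For a linear model with independent sources, the variances add after scaling by the squared sensitivity, so the sampled variance should agree with that analytical value to within sampling error.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical standard deviations of three independent sources.
sigma_structure = 4.0    # sample-to-sample scatter of a structure descriptor x
sigma_measurement = 1.0  # error in measuring x
sigma_model = 10.0       # model-form error in predicting the property from x

slope = -5.0             # assumed sensitivity dP/dx of a linearised model

x_true = 20.0 + rng.normal(0.0, sigma_structure, n)
x_meas = x_true + rng.normal(0.0, sigma_measurement, n)
P = 400.0 + slope * x_meas + rng.normal(0.0, sigma_model, n)

var_mc = P.var()
var_linear = slope**2 * (sigma_structure**2 + sigma_measurement**2) + sigma_model**2
print(f"Monte Carlo variance: {var_mc:.1f}, linear propagation: {var_linear:.1f}")
```

With these numbers the structure scatter alone contributes 400 of the total variance of 525, which is the kind of budget that tells one whether tighter processing control or better models would pay off more.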

The strongest support for the choice of n-point spatial correlations as the most appropriate measures of material structure comes from the pioneering work of Kröner,¹⁰¹ who showed that the effective properties of composite material systems can be conveniently expressed as a series sum with the structure details entering this series explicitly in the form of n-point spatial correlations. These composite theories have been generalised to a broad range of materials phenomena and have been summarised in several books.^53,93,102 There are also several reports in the literature where they have been successfully applied to estimate effective properties (both linear and non-linear) of a broad range of materials with complex structures.^103–109 Physically, the n-point spatial correlations are very effective in rigorously quantifying the local neighbourhoods in the complex internal structure of most advanced materials. As the local neighbourhoods control the local response, it is only logical that the n-point spatial correlations are the ideal measures of the material structure in formulating PSP linkages of interest in designing high performance engineering components.
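The role of the lowest-order statistics in such series can be seen in the simplest estimates they allow. In perturbation expansions of this kind the leading term is commonly the volume average of the local property, which depends only on 1-point statistics. The sketch below computes the arithmetic (Voigt-type) and harmonic (Reuss-type) averages of a scalar property such as thermal or electrical conductivity; for such a property these are the classical Wiener bounds. The function name and the phase data are hypothetical.

```python
import numpy as np

def wiener_bounds(volume_fractions, conductivities):
    """Harmonic (Reuss-type) and arithmetic (Voigt-type) averages of a
    scalar property such as conductivity; both use only 1-point statistics."""
    v = np.asarray(volume_fractions, dtype=float)
    k = np.asarray(conductivities, dtype=float)
    if not np.isclose(v.sum(), 1.0):
        raise ValueError("volume fractions must sum to 1")
    upper = float(np.sum(v * k))          # phases arranged 'in parallel'
    lower = float(1.0 / np.sum(v / k))    # phases arranged 'in series'
    return lower, upper

# Hypothetical two-phase material: equal volume fractions, 10:1 contrast.
lower, upper = wiener_bounds([0.5, 0.5], [1.0, 10.0])
print(f"effective conductivity lies between {lower:.3f} and {upper:.3f}")
```

Every structure with these volume fractions has an effective conductivity in this interval; where in the interval it actually falls depends on the spatial arrangement of the phases, which is exactly the information carried by the 2-point and higher-order correlations in the series.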

Reduced-order representations of microstructure

For most structural material systems of interest in advanced technologies, the set of n-point statistics is extremely large and unwieldy, even for n = 2. Rigorous analysis and mining of these datasets are only possible with the application of data science tools. For example, it was recently demonstrated that techniques such as principal component analysis (PCA)^110–112 can be used to obtain objective low-dimensional representations of the 2-point statistics.^67,96 PCA applies a linear transformation that maps high-dimensional data into a new orthogonal frame whose axes are ordered according to the observed variance among the elements of the dataset. Consequently, a truncated PCA representation provides an objective (data-driven) reduced-order representation of the original data. Although PCA dimensionality reduction techniques have been explored for materials problems in prior literature,^69,113 they have only recently been applied to 2-point spatial correlations of microstructure to extract high fidelity structure–property linkages.^67,88,96,114

As an example, let $f_r^{(k)}$ ($r = 1, 2, \dots, R$) denote the truncated set of independent 2-point statistics⁸² of interest in a specific application. Let $k = 1, 2, \dots, K$ enumerate the elements of an ensemble of material structures being studied. It is generally expected that $K \le R$. In such situations, PCA identifies a maximum of $(K-1)$ orthogonal directions in the $R$-dimensional space, arranged by decreasing levels of variance in the given ensemble of structures. Mathematically, the PCA representation of any member of the selected ensemble (of structures), labelled by superscript $(k)$, can be expressed as

$$f_r^{(k)} = \bar{f}_r + \sum_{i=1}^{\min(K-1,\,R)} \alpha_i^{(k)} \varphi_{ir} \qquad (2)$$

Here $\bar{f}_r$ is simply the averaged 2-point statistics for the entire ensemble. The coefficients $\alpha_i^{(k)}$ (referred to as PC weights) provide an objective representation of the $(k)$th structure in the new orthogonal reference frame identified by $\varphi_{ir}$ (from PCA). Another important output from the PCA is the significance of each principal component, $b_i$, obtained in the eigenvalue decomposition performed as part of the PCA.^110–112 The values of $b_i$ provide important measures of the inherent variance among the members of the ensemble of structures.⁹⁶ More importantly, by retaining only the components associated with the most significant eigenvalues, it is often possible to obtain an objective reduced-order representation of the structure with only a handful of parameters. Mathematically, this reduced-order representation can be expressed as

$$f_r^{(k)} \approx \bar{f}_r + \sum_{i=1}^{R^*} \alpha_i^{(k)} \varphi_{ir} \qquad (3)$$

where $R^* < \min(K-1, R)$. The selection of $R^*$ will depend on the specific properties that need to be correlated to the structure metrics. Note also that the concepts described above can be easily extended to include higher-order statistics of the structure (e.g. 3-point spatial correlations).
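Equations (2) and (3) can be computed with a singular value decomposition (SVD) of the mean-centred statistics. This is one standard way of performing PCA, not necessarily the implementation used in Refs. 67 and 96. In the sketch below, the rows of `Vt` play the role of $\varphi_{ir}$, the projections give $\alpha_i^{(k)}$, and the squared singular values scaled by $1/(K-1)$ give $b_i$. The function names, the random ensemble and the 95% cumulative-variance rule for picking $R^*$ are all hypothetical choices made for illustration.

```python
import numpy as np

def pca_of_statistics(F):
    """PCA of an ensemble of 2-point statistics.

    F: (K, R) array; row k holds the R statistics f_r^(k) of structure k.
    Returns the ensemble mean f_bar, the orthonormal basis phi (rows phi_i),
    the PC weights alpha (K x n_pc) and the eigenvalues b (variance per PC).
    """
    f_bar = F.mean(axis=0)
    X = F - f_bar
    # Rows of Vt are the principal directions phi_i, ordered by decreasing
    # singular value (i.e. decreasing variance in the ensemble)
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    alpha = X @ Vt.T
    b = s**2 / (F.shape[0] - 1)
    return f_bar, Vt, alpha, b

def truncated_reconstruction(f_bar, phi, alpha, R_star):
    """Equation (3): keep only the first R_star principal components."""
    return f_bar + alpha[:, :R_star] @ phi[:R_star]

# Hypothetical ensemble: K = 20 random two-phase structures, R = 16 * 16 statistics each
rng = np.random.default_rng(1)
F = []
for vf in rng.uniform(0.2, 0.5, size=20):
    m = (rng.random((16, 16)) < vf).astype(float)
    M = np.fft.fftn(m)
    F.append((np.real(np.fft.ifftn(np.conj(M) * M)) / m.size).ravel())
F = np.array(F)

f_bar, phi, alpha, b = pca_of_statistics(F)
explained = np.cumsum(b) / b.sum()
R_star = int(np.searchsorted(explained, 0.95) + 1)  # hypothetical 95% criterion
F_approx = truncated_reconstruction(f_bar, phi, alpha, R_star)
```

Any other truncation criterion only changes how `R_star` is picked from `b`. Keeping all $K-1$ components reproduces `F` exactly, consistent with equation (2).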

The PCA representations of the n-point statistics have been successfully used for automated and efficient classification of various ensembles of structures.^67,88 An example is reproduced here in Fig. 3. Although only the first three dimensions are plotted in Fig. 3 (i.e. $R^* = 3$), this approach yields data-driven reduced-order representations of structure ensembles at any truncation level selected by the user. As noted earlier, PCA provides guidance on the significance of each principal component ($b_i$), from which the user can make an objective decision about the acceptable truncation level for a specific application.

Visualisation of an ensemble of material structures, taken from Ref. 67. Each point in the reduced-order three-dimensional PCA space represents a micrograph (examples shown on left) and each coloured volume represents a structure class. The size of each coloured region reflects the variance within the class. The axes in the 3-D plot correspond to the α_i in equation (3). The colour key for the different heat treatments is as follows: HT1 = Red, HT2 = Blue, HT3 = Green, HT4 = Cyan, HT5 = Magenta

One of the benefits of the PCA representations shown in Fig. 3 is that they also quantify the inherent variance in a given class of structures. For example, it is clear from Fig. 3 that, among the five heat treatments studied, the structures in HT3 exhibited the highest variance, whereas those in HT2 exhibited the lowest. Although quantitative values of the variance were not reported in this specific study, they were explored in great detail in a subsequent study that used the same foundational concepts.⁹⁶

In the examples presented above, the local state was defined at the continuum scale and identified the specific phase found in the micrograph. However, the same methodology can be applied to a broad range of other material structures at other length scales. In a recent paper, this approach was successfully applied to quantify semi-crystalline polymer structures produced by molecular dynamics (MD) simulations.¹¹⁵

Structure measurements and reconstructions

The discussion above raises an important question: exactly what should we be measuring when we wish to extract the important PSP linkages needed for materials development efforts? Conventional approaches in materials science and engineering generally focus on mapping contiguous volumes of the material internal structure in two or three dimensions at various length scales of interest. If only a finite set of spatial correlations is indeed needed to formulate the PSP linkages of interest (as suggested by the PCA example presented earlier), it should be possible to develop customised protocols that focus exclusively on the important statistics and produce the required information cost-effectively. This is especially true when the characterisation technique probes the material structure voxel-by-voxel and each measurement incurs a significant cost (e.g. measurement of crystal lattice orientations by electron back-scattered diffraction¹¹⁶ and measurement of local mechanical properties using nanoindentation^117–119). For example, Adams and co-workers^53,100,120 have demonstrated that it is much easier to recover 2-point and 3-point spatial correlations in three dimensions in polycrystalline samples than to measure the material structure in contiguous 3-D volumes in the same class of samples.^{4,14,121–123} These authors have also demonstrated that it is often possible to recover distribution functions quantifying structure in 3-D from information gathered on 2-D sections, using theories from stereology.^124,125 While these prior studies demonstrate tremendous potential for dramatically reducing the cost of structure quantification, they are still at a nascent stage of development. Much future work is needed to further refine and critically validate these approaches, including their ability to produce robust and reliable PSP linkages for a broad range of materials being developed for emerging technologies.
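The following sketch illustrates why targeting specific statistics can be cheaper than mapping a full volume. A single 2-point probability $f_r$ can be estimated from randomly placed probe pairs separated by the vector $r$, and its standard error shrinks as $1/\sqrt{n}$ in the number of probes. This is a generic Monte Carlo estimator chosen for illustration, not the stereological methods of Adams and co-workers. The map, the offset and the function name are hypothetical.

```python
import numpy as np

def sampled_two_point(m, r, n_pairs, rng):
    """Estimate f_r = Pr(m[s] = 1 and m[s + r] = 1) from n_pairs random probe pairs.

    m: binary indicator array (treated as periodic); r: offset, one entry per axis.
    Returns the estimate and its standard error.
    """
    starts = np.stack([rng.integers(0, n, size=n_pairs) for n in m.shape], axis=1)
    ends = (starts + np.asarray(r)) % np.asarray(m.shape)  # periodic partner points
    hits = m[tuple(starts.T)] * m[tuple(ends.T)]
    estimate = hits.mean()
    std_err = hits.std(ddof=1) / np.sqrt(n_pairs)           # shrinks as 1/sqrt(n_pairs)
    return estimate, std_err

rng = np.random.default_rng(2)
m = (rng.random((200, 200)) < 0.35).astype(float)  # hypothetical two-phase map
r = (3, 5)
exact = np.mean(m * np.roll(m, shift=(-r[0], -r[1]), axis=(0, 1)))  # full-map value
est, se = sampled_two_point(m, r, n_pairs=20_000, rng=rng)
```

In a real measurement campaign, each probe pair would correspond to two point measurements rather than a lookup in an existing map.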

After deciding what should be measured, the next question to address is how much data are needed. In general, the goal of structure measurement should be to quantify not only the expected values of the structure measures of interest, but also their variance. Controlling the structure variance is widely regarded as the most effective means of controlling the variance in the properties/performance of the final product. Within the framework of n-point statistics, combined with the PCA representations in an orthonormal frame (in the space of statistics) described earlier, the variance can be related to the eigenvalues computed as part of the PCA decomposition.^88,96 Roughly speaking, the variance can be related mathematically to the volumes of the regions occupied by the members of the ensemble of structures extracted from the sample (or from multiple samples subjected to nominally the same processing history), as depicted in Fig. 3. Consequently, it is possible to establish a data-driven process that objectively decides how much data are adequate to reliably estimate (to within a set accuracy limit) the distribution of the selected structure measures in any given sample.
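One simple way to turn this into a stopping rule is sketched below, assuming independently sampled structures and normal-theory standard errors. The standard error of each mean PC weight is $\sqrt{b_i/N}$, so a pilot estimate of the variances gives the number of structures $N$ needed to reach a target accuracy. The determinant of the covariance serves as a proxy for the "volume" of the point cloud in Fig. 3. The function names and the tolerance are hypothetical, and the pilot PC weights are made-up numbers.

```python
import numpy as np

def required_sample_count(alpha, tol):
    """Number of independent structures needed so that the standard error of
    every mean PC weight falls below tol (normal-theory estimate).

    alpha: (N, R_star) PC weights of a pilot set of N structures.
    """
    var = alpha.var(axis=0, ddof=1)          # sample variance of each PC weight
    return int(np.ceil(var.max() / tol**2))  # SE_i = sqrt(var_i / N) <= tol

def scatter_volume(alpha):
    """Size of the point cloud in PC space: sqrt(det(covariance)), which is
    proportional to the volume of its covariance ellipsoid."""
    cov = np.atleast_2d(np.cov(alpha, rowvar=False))
    return float(np.sqrt(np.linalg.det(cov)))

# Hypothetical pilot PC weights (4 structures, R_star = 2)
pilot = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
n_needed = required_sample_count(pilot, tol=0.5)  # → 11
volume = scatter_volume(pilot)
```

The pilot variances here are 2/3 and 8/3. Reaching a standard error of 0.5 on the noisier component therefore requires at least (8/3)/0.25 ≈ 10.7, i.e. 11, structures.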

In addition to the amount of data, one also needs to decide on a scan size for structure measurements. Within the framework of n-point statistics, the relevant length to consider in deciding on the scan size is the coherence length,^53,93 defined as the length beyond which the n-point statistics (obtained on an ensemble of structures taken from a given sample) are completely uncorrelated. This length therefore depends on the specific sample being studied. For example, in a perfectly disordered structure, the coherence length is of the order of the individual spatial bin size. However, perfectly disordered structures are seldom realised in practice. As the coherence length of a given sample is not known a priori, a few preliminary measurements (for example, long line scans) are needed to establish it. One can then ensure that the scan size is larger than the coherence length of the structure in the given sample. Generally speaking, if the scan sizes are of the order of the coherence length, one needs to acquire a sufficiently large number of scans to reliably establish the variance in the desired subset of spatial correlations, as discussed earlier. The general practice in the field, however, has been to obtain very large scans (as large as practically feasible within available resources) and to use a small number of these large scans instead of a large number of smaller scans from different locations in the physical sample. This practice is tantamount to sub-dividing the large scan into smaller regions and treating each smaller region as an independent measurement (although in reality it is not!). From a statistics viewpoint, the preferred practice would be to obtain a large number of adequately sized scans (each approximately twice the coherence length) from randomly selected regions in the physical sample.
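The sketch below estimates the coherence length from a preliminary long line scan. It tracks the 2-point autocovariance $f_r - \phi^2$, which vanishes once points a distance $r$ apart are uncorrelated, and reports the first lag at which it drops below a small fraction of its zero-lag value $\phi(1-\phi)$. Both the 10% threshold and the "first crossing" rule are assumptions. Structures with oscillatory correlations would need a more careful criterion. The line scan itself is synthetic: grains 20 bins long with random phases.

```python
import numpy as np

def coherence_length(line, bin_size=1.0, rel_tol=0.1, max_lag=None):
    """First lag at which the normalised 2-point autocovariance of a binary
    line scan falls below rel_tol (a hypothetical threshold).

    Returns None if the scan is too short to reach that point.
    """
    x = np.asarray(line, dtype=float)
    n = x.size
    max_lag = n // 2 if max_lag is None else max_lag
    phi = x.mean()
    var = phi * (1.0 - phi)  # autocovariance at zero lag for a 0/1 signal
    for r in range(1, max_lag):
        # f_r - phi^2: 2-point probability minus its uncorrelated limit
        cov = np.mean(x[: n - r] * x[r:]) - phi**2
        if abs(cov) < rel_tol * var:
            return r * bin_size
    return None

# Hypothetical long line scan: grains 20 bins long, each randomly in phase 0 or 1
rng = np.random.default_rng(3)
line = np.repeat(rng.random(20_000) < 0.5, 20)
L_c = coherence_length(line)
scan_size = 2 * L_c  # the text suggests scans of roughly twice the coherence length
```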

The discussion above naturally leads to the oft-debated question of how one should produce an RVE of the material structure. In the present context, it is highly desirable that the RVE reflect the expected values of the important structure measures. Most commonly adopted definitions of RVE in the current literature^{102,126–138} focus largely on convergence in the prediction of selected macroscale (effective) properties. They do not explicitly consider whether the RVE has captured the structure details to sufficient accuracy. Incidentally, the classical definition of RVE provided by Hill¹³⁹ requires the RVE to capture both the representative structure and its homogenised effective properties. In the present discussion, we focus first on the structure aspects and address the predictions of macroscale properties later. Within the framework of n-point statistics presented earlier, in order to faithfully capture the material structure, the RVE should reflect as closely as possible the expected values of the salient set of n-point statistics. Given the very high-dimensional representations of n-point statistics, the only practical approach to this task is through the use of reduced-order representations such as those described earlier. For example, looking at Fig. 3, the goal would be to construct an RVE for any of the ensembles of structures (from any one of the heat treatments) such that the n-point statistics of the RVE correspond to the centre of the relevant volume shown in this figure. Herein lies the main challenge of constructing RVEs that faithfully capture the main features of an ensemble of measured structures: it is often not easy to construct such structures from a prescribed set of spatial correlations.

One trivial solution to the RVE construction described above is to think of the RVE as an equally weighted representation of all the members of the selected ensemble of structures. In this approach, each member of the ensemble represents a statistical volume element (SVE),^{98,99,140,141} and the SVEs can be significantly smaller than the RVE. Using a set of smaller SVEs (instead of a single RVE) offers tremendous computational savings, especially when the macroscale properties need to be evaluated using sophisticated physics-based numerical simulations. The main disadvantage of using an equally weighted set of SVEs is that one typically needs a fairly large number of SVEs to approximate the RVE,¹³² especially when the SVEs are chosen to be relatively small.

An alternative approach was recently presented by Niezgoda et al.,⁸⁴ who introduced the concept of weighted sets of statistical volume elements (WSVEs). In this approach, the identification of a WSVE is posed as an optimisation problem. The search runs over all weighted combinations of the available SVEs and minimises the difference between the spatial statistics of the constructed WSVE and the ensemble-averaged spatial statistics of all available SVEs. Two constraints apply: (i) the number of SVEs used to build the WSVE is limited to the number prescribed by the user, and (ii) the weights assigned to the individual members of the WSVE have to be positive and sum to one. In other words, a WSVE approximates the RVE as a set of optimally selected and weighted SVEs (from the available ensemble of SVEs), with the weights essentially representing the volume fractions of the selected SVEs in the RVE (see Fig. 4). It was demonstrated that WSVEs established in this way automatically provided good approximations of the effective properties associated with the larger structure datasets. At the same time, they offered major computational advantages because of the dramatic reduction in the sizes and numbers of the volume elements.^3,15 This is mainly because the WSVEs efficiently capture the spatial statistics of the ensemble of SVEs (or the RVE). The computational advantages of the WSVEs were particularly impressive when computationally expensive models (e.g. coupled multiscale models, crystal plasticity) were used to estimate the effective properties or performance associated with a given microstructure.³
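The WSVE optimisation can be illustrated on a small ensemble by exhaustive search over subsets of at most a user-prescribed size. For each subset, the sketch solves the equality-constrained least-squares problem (weights summing to one) through its KKT system and rejects any subset that yields non-positive weights. This brute-force solver is an illustrative assumption, practical only for a handful of SVEs, and is not presented as the optimiser used by Niezgoda et al. All function names are hypothetical, and the rows of `F` may be full or PCA-reduced statistics.

```python
import itertools
import numpy as np

def simplex_lsq(A, target):
    """Minimise ||A w - target|| subject to sum(w) = 1, via the KKT system."""
    n = A.shape[1]
    kkt = np.zeros((n + 1, n + 1))
    kkt[:n, :n] = 2.0 * A.T @ A
    kkt[:n, n] = 1.0  # Lagrange-multiplier column for sum(w) = 1
    kkt[n, :n] = 1.0
    rhs = np.append(2.0 * A.T @ target, 1.0)
    return np.linalg.lstsq(kkt, rhs, rcond=None)[0][:n]

def find_wsve(F, max_sves):
    """Exhaustive search for a weighted set of at most max_sves SVEs.

    F: (K, R) array of spatial statistics, one row per SVE.
    Returns (indices, weights, error): the weighted statistics closest to the
    ensemble average, with strictly positive weights summing to one.
    """
    target = F.mean(axis=0)
    best = (None, None, np.inf)
    for size in range(1, max_sves + 1):
        for idx in itertools.combinations(range(F.shape[0]), size):
            A = F[list(idx)].T             # columns = candidate SVEs
            w = simplex_lsq(A, target)
            if np.any(w <= 0):             # weights must be positive
                continue
            err = np.linalg.norm(A @ w - target)
            if err < best[2]:
                best = (idx, w, err)
    return best

rng = np.random.default_rng(4)
F = rng.random((6, 10))  # hypothetical statistics of 6 SVEs
idx, w, err = find_wsve(F, max_sves=3)
```

Because single SVEs are included in the search, the selected WSVE is never worse than the best individual SVE.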

Illustration of the construction of a weighted set of statistical volume elements (WSVE) composed of three optimally selected and weighted statistical volume elements (SVEs) for an experimentally characterised precipitate structure. The corresponding plots of 2-point statistics are shown on the right

Automated mining of process–structure–property linkages

The structure information gathered using the protocols described above has very little intrinsic value on its own. High value (both scientific and economic) is usually derived from these structure datasets when they can be associated with appropriate information on either the properties they exhibit or the manufacturing processes employed to modulate them (into new structures with better final properties). This additional and crucial information is typically gathered through multiscale measurements and/or execution of physics-based numerical simulation tools. This task usually requires significant resources and time and therefore presents a major risk to those who undertake materials development activities. This risk can be mitigated to a large extent if suitable core knowledge is mined from such effort- and time-consuming activities in automated, cost-effective ways and successfully transferred to subsequent related tasks. This can be accomplished through the mining and establishment of reliable PSP linkages that can be applied to a broad range of structures (much broader than those used to establish the linkages themselves).

As discussed earlier, the PSP linkages needed for the development of advanced hierarchical materials are best archived in a suitably defined low-dimensional projection of the structure space.^54,67,79 In some cases, it is possible to establish such linkages using an intuitive selection of structure measures (e.g. Hall–Petch relations^142,143). However, given the high-dimensional representations demanded by the complex structures in most hierarchical material systems, it is highly desirable to explore such relationships through data-driven approaches. These novel approaches offer many benefits. (i) They allow automation in evaluating the multiple options one invariably faces when mining the salient PSP linkages of interest from the available experimental and simulation datasets. (ii) They often cast the PSP linkages as simple metamodels (also referred to as surrogate models) that are potentially invertible and require significantly lower computational cost than the physics-based multiscale models and experiments that generated the raw data used to establish these linkages. This feature is of significant value to the engineering design/manufacturing stakeholders in the advanced technology sectors.

The reduced-order representations of the spatial correlations in the microstructure (see equation (3)) are foundational to a new data-driven framework^67,114,144 for establishing reliable low-cost structure–(homogenised) property metamodels from ensembles of experimental and/or simulation datasets. Although establishing the PSP linkages in this manner incurs a one-time cost, this effort is expected to lead to major savings in future tasks where the low-cost metamodels can be substituted for the more expensive experiments and/or simulations. To illustrate this approach, let $\left(P^{(k)}, \alpha_1^{(k)}, \alpha_2^{(k)}, \dots, \alpha_{R^*}^{(k)}\right)$ denote one data point for each microstructure. Here $(k)$ indexes a specific microstructure in an ensemble of microstructures, $P^{(k)}$ denotes a specific macroscale property of interest established either from experiments or from models, and $\alpha_i^{(k)}$ ($i = 1, 2, \dots, R^*$) denote the reduced-order representation of the microstructure (see equation (3)). Consider a dataset with $K$ (i.e. $k = 1, 2, \dots, K$) data points. The goal is to mine high fidelity structure–property linkages from such a dataset. In recently reported case studies,^67,114,144 this was successfully accomplished using simple polynomial functions and ordinary least squares linear regression techniques.³⁶ To mine such simple linkages, one needs to define a suitable error $E^{(k)}$ associated with each data point and use it appropriately in the regression method. As an example, this can be accomplished as

$$E^{(k)} = P^{(k)} - \mathcal{F}_p\left(\alpha_1^{(k)}, \alpha_2^{(k)}, \dots, \alpha_{R^*}^{(k)}\right) \qquad (4)$$

where $\mathcal{F}_p(\cdot)$ denotes a $p$th-order polynomial function. The polynomial coefficients can then be established by the standard protocol of minimising the sum of the squares of the residuals, $\sum_{k=1}^{K} \left(E^{(k)}\right)^2$, over the entire dataset. The accuracy of the extracted polynomial linkage depends critically on the selection of both $p$ and $R^*$, and careful selection of these parameters is essential for extracting high-fidelity structure–property linkages. Higher values of $p$ and $R^*$ will never increase the regression error (the misfit on the data used in the regression), but they do not necessarily increase the fidelity of the extracted linkages. This is because higher values of $p$ and $R^*$ may lead to over-fitting and can produce erroneous estimates in any subsequent application of the linkages to new microstructures (those not included in the regression analyses). Leave-one-out cross-validation (LOOCV) is one of many ways to guide the objective selection of $p$ and $R^*$ while avoiding over-fitting of the data. This technique trains the polynomial fit $K$ times, each time leaving one data point out of the training set and using it to test the fit. Applied over all $K$ data points, LOOCV yields the prediction error for each excluded point and quantifies the contribution of each data point to the coefficients of a proposed polynomial fit. For an over-fitted polynomial, excluding a single data point will cause a significant change in the coefficients, whereas for a good fit this change will be negligible. In summary, one makes a judicious compromise in choosing the best fit, based on a thorough consideration of the error distributions from both the regression method and the cross-validation technique.
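The regression-plus-LOOCV scan over $p$ and $R^*$ can be sketched in plain NumPy. For each combination, the code below fits a full polynomial of total degree $\le p$ in the first $R^*$ PC weights by ordinary least squares, which minimises $\sum_k (E^{(k)})^2$ from equation (4). It then refits $K$ times with one point held out, recording both the held-out prediction error and the largest change in the coefficients. The dataset is synthetic: the "property" depends only on $\alpha_1$ and $\alpha_2$ plus a little noise. The helper names are hypothetical, and full total-degree polynomials are an assumed choice of basis.

```python
import itertools
import numpy as np

def poly_features(X, p):
    """All monomials of total degree <= p in the columns of X (incl. constant)."""
    cols = [np.ones(X.shape[0])]
    for degree in range(1, p + 1):
        for combo in itertools.combinations_with_replacement(range(X.shape[1]), degree):
            cols.append(np.prod(X[:, list(combo)], axis=1))
    return np.column_stack(cols)

def fit_poly(X, P, p):
    """Ordinary least squares: minimise the sum of squared residuals E^(k)."""
    return np.linalg.lstsq(poly_features(X, p), P, rcond=None)[0]

def regression_and_loocv(alpha, P, p, R_star):
    """Return (regression RMSE, LOOCV RMSE, largest coefficient shift)."""
    X = alpha[:, :R_star]
    K = len(P)
    coef = fit_poly(X, P, p)
    fit_rmse = np.sqrt(np.mean((poly_features(X, p) @ coef - P) ** 2))
    loo_err, coef_shift = [], []
    for k in range(K):
        keep = np.arange(K) != k                      # leave point k out of training
        c = fit_poly(X[keep], P[keep], p)
        pred = (poly_features(X[k:k + 1], p) @ c)[0]
        loo_err.append(pred - P[k])                   # error on the held-out point
        coef_shift.append(np.max(np.abs(c - coef)))   # sensitivity of the coefficients
    return fit_rmse, np.sqrt(np.mean(np.square(loo_err))), max(coef_shift)

# Hypothetical data: K = 40 structures, 3 PC weights; property depends on alpha_1, alpha_2
rng = np.random.default_rng(5)
alpha = rng.normal(size=(40, 3))
P = 1.0 + 2.0 * alpha[:, 0] - alpha[:, 1] ** 2 + 0.01 * rng.normal(size=40)

results = {(p, r): regression_and_loocv(alpha, P, p, r)
           for p in range(1, 4) for r in range(1, 4)}
```

Plotting `results` against `p` and `R_star` gives curves of the kind shown in Fig. 5. The regression error only falls as terms are added, while the LOOCV error stops improving once the model is rich enough.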

The data-driven approach described above offers many advantages. (i) The process of establishing the PSP linkages can be largely automated, with a comprehensive exploration of different error measures, different functions for capturing the linkages, and different techniques for quantifying the degree of over-fit. (ii) The established PSP linkages can often be updated dynamically with only incremental effort (although this requires cleverly designed algorithms) when additional data become available. (iii) The error distributions computed as part of these protocols also quantify the inherent uncertainty of the mined PSP linkages. Figure 5 depicts an example from our recent work,¹⁴⁴ where the focus was on establishing structure–property linkages that could guide the design of the optimal processing path in a class of steels with inclusions.
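As one simple illustration of point (ii), a least-squares linkage can be updated incrementally by accumulating the normal equations $A^{\mathsf T}A\,w = A^{\mathsf T}P$. Each new batch of data then costs work proportional to its own size, without revisiting earlier data. This is a generic scheme chosen for illustration, not necessarily one used in the cited work. Forming $A^{\mathsf T}A$ squares the condition number, so QR-updating schemes are preferable for ill-conditioned features. The class name and the synthetic feature matrix are hypothetical. In practice, the rows of `A` would be polynomial terms in the PC weights.

```python
import numpy as np

class IncrementalLinkage:
    """Least-squares linkage that is updated as new data arrive, by
    accumulating the normal equations (A^T A) w = A^T P."""

    def __init__(self, n_features):
        self.AtA = np.zeros((n_features, n_features))
        self.AtP = np.zeros(n_features)

    def add_data(self, A_new, P_new):
        """Fold a new batch (rows of features, property values) into the sums."""
        A_new = np.atleast_2d(A_new)
        self.AtA += A_new.T @ A_new
        self.AtP += A_new.T @ np.atleast_1d(P_new)

    def coefficients(self):
        return np.linalg.lstsq(self.AtA, self.AtP, rcond=None)[0]

rng = np.random.default_rng(6)
A = rng.normal(size=(60, 4))  # hypothetical feature matrix
P = A @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.01 * rng.normal(size=60)

model = IncrementalLinkage(4)
model.add_data(A[:40], P[:40])  # initial calibration
model.add_data(A[40:], P[40:])  # later, when new data become available
batch = np.linalg.lstsq(A, P, rcond=None)[0]
```

After both updates, `model.coefficients()` agrees with the batch fit `batch` on all 60 points.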

Variation of the error from the regression analyses and the cross-validation for different truncation levels in the reduced-order quantification of the spatial correlations in a class of two-phase material structures and their linkage to macroscale yield properties.¹⁴⁵ Examination of these errors indicates that $R^* = 3$ and $p = 4$ present the best compromise between a good fit and the risk of over-fitting. The plot on the bottom left shows the match between the original data (gathered from finite element simulations of the type shown on the bottom right) and the predictions of the metamodel mined from the data. A total of 400 data points were generated using finite element simulations on an ensemble of material structures with a range of precipitate volume fractions and precipitate shape and size distributions to establish this structure–property linkage

In another variation of the data science approach, the computational cost of solving the numerically stiff non-linear constitutive laws of crystal plasticity theory was reduced by about two orders of magnitude.^145–149 This was accomplished using a compact database of discrete Fourier transforms (DFTs) that efficiently reproduces the solutions from the physics-based model for the main functions of crystal plasticity theory, for any given crystal orientation subjected to an arbitrary deformation mode. As with the earlier example, a special advantage of the database approaches suggested here is that the user can trade off accuracy against computation speed in any simulation by selecting the truncation levels in the metamodel. In crystal plasticity simulations, this is controlled by the number of dominant DFT terms retained in the metamodel.
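The accuracy/speed trade-off controlled by the number of retained DFT terms can be seen on a one-dimensional toy problem. Here a smooth periodic function stands in for the multidimensional crystal plasticity functions, which are defined over orientation and deformation-mode spaces. The sketch builds the DFT coefficients once, then evaluates the series at arbitrary off-grid points using only the `n_keep` largest coefficients. Fewer terms mean cheaper evaluations and larger errors. The function, grid size and names are hypothetical.

```python
import numpy as np

def build_spectral_database(samples):
    """DFT coefficients c_k of a periodic function sampled at N equispaced
    points on [0, 2*pi), so that g(theta) = sum_k c_k * exp(i * k * theta)."""
    n_samples = samples.size
    c = np.fft.fft(samples) / n_samples
    k = np.fft.fftfreq(n_samples, d=1.0 / n_samples)  # integer frequencies
    return c, k

def evaluate(c, k, theta, n_keep):
    """Evaluate the series at arbitrary angles using the n_keep largest |c_k|."""
    dominant = np.argsort(np.abs(c))[::-1][:n_keep]
    terms = np.exp(1j * np.outer(theta, k[dominant]))  # (len(theta), n_keep)
    return np.real(terms @ c[dominant])

def g(theta):
    """Hypothetical smooth periodic function standing in for a model output."""
    return np.exp(np.cos(theta))

N = 64
c, k = build_spectral_database(g(2 * np.pi * np.arange(N) / N))
theta = np.random.default_rng(7).uniform(0, 2 * np.pi, size=500)  # off-grid queries
max_error = {n_keep: np.max(np.abs(evaluate(c, k, theta, n_keep) - g(theta)))
             for n_keep in (3, 11, N)}
```

For this smooth function, the error drops rapidly as more dominant terms are retained.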

The structure–property linkages described earlier are aimed at passing the salient information from lower length scales to higher length scales. However, in certain situations, it becomes necessary to simulate coupled phenomena at two well-separated length scales. As an example, consider the simulation of a complex processing operation in which different macroscale locations in the workpiece experience different thermal histories (often an unavoidable consequence of the boundary conditions imposed at the macroscale). Consequently, strong variations in the material structure should be expected at different locations in the workpiece, and it is not enough to track the evolution of a single representative material structure for the entire workpiece. These structure heterogeneities can be expected to influence the macroscale simulation by altering the local effective properties at different locations in the workpiece. In such a situation, it is necessary to track material structures independently at multiple macroscale locations in the workpiece and to pass high-value information in both directions between the constituent length scales. Accomplishing this task within currently employed computational frameworks requires executing a very large number of numerical simulations at the lower length scale within simulations executed at a higher length scale (e.g. the multilevel finite element method¹⁵⁰). It is therefore extremely difficult, if not impossible, to address real-world hierarchical materials design and development problems using any of the currently employed computational strategies.

The challenge described above can be addressed with modest computational resources using a data science approach called materials knowledge systems (MKS).^{90–92,151–155} In the MKS framework, the focus is on localisation relationships (i.e. the opposite of homogenisation). These capture the spatial distribution of the response field of interest (e.g. stress or strain rate fields) at the microscale (on an RVE) for an imposed loading condition at the macroscale. In this approach, the localisation relationships are expressed as calibrated metamodels whose specific forms are inspired by rigorously established composite theories called statistical continuum theories.^{81,101,156–158} More specifically, these localisation linkages are expressed as a simple algebraic series whose terms systematically capture the individual contributions from a hierarchy of local structure descriptors. Each term in this series expansion is expressed as a convolution of the appropriate local structure descriptor with a physics-capturing kernel. A salient feature of the MKS approach is that the physics-capturing kernels are calibrated to results from previously validated numerical models of the multiscale phenomena being studied. For example, in studies of stress or strain localisation in a composite material system, the MKS linkages would be calibrated to results obtained from validated micromechanical finite element models executed on a selected ensemble of material structures. The most impressive benefit of the MKS approach lies in the dramatic reduction of the computational cost, often by several orders of magnitude compared to the numerical approaches typically employed in material structure design problems. In various preliminary demonstrations, the MKS methodology has been successfully applied to capturing thermo-elastic stress (or strain) distributions in composite RVEs,^90,92,152 rigid-viscoplastic deformation fields in composite RVEs,⁸⁹ and the evolution of the composition fields in spinodal decomposition of binary alloys.⁹¹

Let $\langle p \rangle$ denote a macroscale imposed variable (e.g. a stress, strain or strain rate tensor) that needs to be spatially distributed in the microstructure as $p_s$ for each spatial cell indexed by $s$. In the MKS case studies completed to date, the physical quantities of interest were chosen such that $\langle p \rangle$ equals the volume-averaged value of $p_s$ over the microscale. In other words, the response variable was selected such that it is conserved in going between the constituent length scales. The MKS localisation relationship is expressed as^92,153

$$p_s = \left( \sum_{h=1}^{H} \sum_{t=0}^{S-1} \alpha_t^h\, m_{s+t}^h + \sum_{h=1}^{H} \sum_{h'=1}^{H} \sum_{t=0}^{S-1} \sum_{t'=0}^{S-1} \alpha_{tt'}^{hh'}\, m_{s+t}^h\, m_{s+t+t'}^{h'} + \dots \right) \langle p \rangle \qquad (5)$$

Here the kernels $\alpha_t^h$ and $\alpha_{tt'}^{hh'}$ are referred to as the first-order and second-order influence coefficients, respectively. They are independent of the microstructure descriptors $m_s^h$, the fraction of local state $h$ in spatial bin $s$. For multiscale problems involving elasticity, these influence coefficients are fourth-rank tensors. The influence coefficients capture the contributions of various microstructure features in the neighbourhood of spatial position $s$ to the local response field at that position. In this notation, $t$ enumerates the bins in the vector space used to define the neighbourhood of the spatial bin of interest.⁸⁰ This neighbourhood has been tessellated using the same scheme that was used for the spatial domain of the material internal structure. The influence coefficients in the localisation relationship (equation (5)) are closely related to the well known Green's functions.⁸¹

The calibration of the first-order term in the MKS series is made possible by the fact that equation (5) takes a much simpler form when transformed into DFT space. For the first-order term, this can be expressed as

$$P_k = \left( \sum_{h=1}^{H} \left(A_k^h\right)^* M_k^h \right) \langle p \rangle \qquad (6)$$

where $P_k = \mathfrak{I}_k(p_s)$, $A_k^h = \mathfrak{I}_k(\alpha_t^h)$ and $M_k^h = \mathfrak{I}_k(m_s^h)$. In this notation, $\mathfrak{I}_k(\cdot)$ denotes the DFT with respect to the spatial variable $s$ or $t$, $k$ indexes the frequencies, and the superscript $*$ denotes the complex conjugate. For each $k$, only $H$ first-order coefficients are coupled in equation (6), although the total number of first-order coefficients remains $S \times H$. This simplification is a direct consequence of the well known convolution properties of DFTs.¹⁵⁹ Because of this dramatic uncoupling of the influence coefficients into smaller sets, it becomes trivial to estimate their values (in DFT space) using standard regression methods. It is emphasised here that establishing $A_k^h$ is a one-time (calibration) computational task for a selected composite material system and a selected physical phenomenon of interest (including a description of the boundary conditions).
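The sketch below illustrates the first-order calibration of equation (6). At each frequency $k$, the $H$ unknowns $(A_k^h)^*$ are fitted by least squares across an ensemble of calibration structures. The fitted coefficients then predict $p_s$ for a new structure via an inverse DFT. Everything here is an assumption made for illustration: a 2-D periodic grid, $H = 2$ local states, $\langle p \rangle = 1$, and a random first-order kernel evaluated directly from equation (5) as the "reference model", in place of the finite element results used in the cited studies. Because $m^1 + m^2 = 1$ in every bin, the two columns at nonzero frequencies are linearly dependent. A small `rcond` lets `np.linalg.lstsq` return a minimum-norm solution that still predicts correctly. Function names are hypothetical.

```python
import numpy as np

def calibrate_first_order(m, p, p_avg=1.0, rcond=1e-10):
    """Calibrate first-order influence coefficients in DFT space.

    m: (N, H, *grid) local-state fractions m_s^h of N calibration structures.
    p: (N, *grid) local responses p_s from a trusted reference model.
    Returns conj(A_k^h) with shape (H, *grid).
    """
    N, H = m.shape[:2]
    grid = m.shape[2:]
    M = np.fft.fftn(m, axes=tuple(range(2, m.ndim))).reshape(N, H, -1)
    P = (np.fft.fftn(p, axes=tuple(range(1, p.ndim))) / p_avg).reshape(N, -1)
    A_conj = np.empty((H, M.shape[2]), dtype=complex)
    for k in range(M.shape[2]):
        # Frequencies decouple: only H unknowns per k, fitted over the ensemble
        A_conj[:, k] = np.linalg.lstsq(M[:, :, k], P[:, k], rcond=rcond)[0]
    return A_conj.reshape((H,) + grid)

def predict_first_order(A_conj, m, p_avg=1.0):
    """Equation (6) for a new structure m of shape (H, *grid), then inverse DFT."""
    M = np.fft.fftn(m, axes=tuple(range(1, m.ndim)))
    return np.real(np.fft.ifftn((A_conj * M).sum(axis=0))) * p_avg

def reference_model(alpha, m, p_avg=1.0):
    """Stand-in 'physics' model: direct first-order sum of eq. (5), periodic."""
    H, n0, n1 = alpha.shape
    p = np.zeros((n0, n1))
    for h in range(H):
        for t0 in range(n0):
            for t1 in range(n1):
                # np.roll by -t places m[s + t] at position s
                p += alpha[h, t0, t1] * np.roll(m[h], (-t0, -t1), axis=(0, 1))
    return p * p_avg

rng = np.random.default_rng(9)
n = 8
alpha_true = rng.normal(size=(2, n, n))  # hypothetical kernel, H = 2 local states

def random_structure():
    phase = (rng.random((n, n)) < 0.5).astype(float)
    return np.stack([phase, 1.0 - phase])  # m^1 + m^2 = 1 in every bin

m_train = np.stack([random_structure() for _ in range(12)])
p_train = np.stack([reference_model(alpha_true, mi) for mi in m_train])
A_conj = calibrate_first_order(m_train, p_train)

m_new = random_structure()
p_mks = predict_first_order(A_conj, m_new)
p_ref = reference_model(alpha_true, m_new)
```

Once calibrated, a prediction costs a few FFTs, which is the source of the speed-ups reported for MKS.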

The calibration procedures for the influence coefficients have been discussed in detail in prior publications.^90,92 Briefly, the influence coefficients were calibrated using finite element simulations on digitally created microscale volume elements (MVEs) subjected to periodic boundary conditions,^{90,92,160–162} which are particularly well suited to DFT representations. The size of the MVE can also have a significant influence on the calibrated values of the influence coefficients. As the influence coefficients are expected to decay to zero with increasing $|t|$, the localisation captured by equation (5) is associated with a finite interaction zone (or finite memory). To capture the spatial characteristics of localisation accurately, it is recommended that the MVE used for generating the calibration datasets be at least twice the size of the interaction zone. Since the size of the interaction zone is not known a priori, a few trials are typically needed to establish a suitable MVE size for a given material system and a selected physical phenomenon. Finally, the MVEs must be large enough that the boundary conditions do not significantly affect the calibrated values of the influence coefficients.

The influence functions established on smaller spatial domains (MVEs) can be easily extended and applied to significantly larger spatial domains such as those needed to represent RVEs.⁹² As the influence functions decay sharply with increasing t (just like Green's functions), they can be extended to larger spatial domains by simply padding the functions with zeros. It was demonstrated⁹² that the trivially extended influence coefficients accurately reproduced the microscale spatial distribution of the desired field on the larger MVEs with about the same accuracy that was realised for the smaller MVEs.
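Zero-padding has one subtlety with DFT-indexed arrays. The kernel stores $t = 0$ at index 0 and wraps negative $t$ to the end of the array, so the padding must be inserted between the positive and negative lags rather than simply appended. The sketch below handles this by centring the kernel with `np.fft.fftshift`, padding so that $t = 0$ lands at the centre of the larger grid, and shifting back with `np.fft.ifftshift`. It assumes an odd small-domain size, which keeps $+t$ and $-t$ unambiguous. The kernel and structure are random placeholders, and the function names are hypothetical.

```python
import numpy as np

def extend_kernel(alpha_small, big_shape):
    """Zero-pad an origin-indexed periodic kernel to a larger grid.

    alpha_small stores t = 0 at index 0 with negative t wrapped to the end
    (np.fft convention); odd sizes keep +t and -t unambiguous.
    """
    centred = np.fft.fftshift(alpha_small)       # move t = 0 to the middle
    pad = []
    for s, big in zip(alpha_small.shape, big_shape):
        before = big // 2 - s // 2               # so t = 0 lands on index big // 2
        pad.append((before, big - s - before))
    return np.fft.ifftshift(np.pad(centred, pad))  # move t = 0 back to index 0

def localise(alpha, m):
    """First-order term of eq. (5) for one local state with <p> = 1, via FFTs."""
    return np.real(np.fft.ifftn(np.conj(np.fft.fftn(alpha)) * np.fft.fftn(m)))

rng = np.random.default_rng(8)
alpha_small = rng.normal(size=(9, 9))               # hypothetical MVE-scale kernel
m_big = (rng.random((32, 32)) < 0.4).astype(float)  # hypothetical larger structure
p_big = localise(extend_kernel(alpha_small, m_big.shape), m_big)
```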

Figure 6 demonstrates the accuracy of the MKS approach for predicting the local rigid-perfectly plastic response in an example material structure with two isotropic phases.⁸⁹ The error between the MKS predictions and the FEM analysis was quantified in each spatial bin, and the average error in the MKS predictions was only 2·2%. More importantly, the FE analysis using 93×93×93 3-D elements could not be performed on a regular desktop PC. It was executed on an IBM e1350 supercomputing system (part of The Ohio Supercomputer Centre) and required 94 processor hours. In contrast, the MKS method took only 32 s on a regular laptop (2 GHz CPU and 2 GB RAM).
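The speed-up implied by these timings can be checked with a line of arithmetic. The ratio compares processor-hours on a cluster with run time on a laptop, so it indicates an order of magnitude rather than a like-for-like benchmark:

```latex
\frac{94\,\text{h} \times 3600\,\text{s\,h}^{-1}}{32\,\text{s}}
  = \frac{338\,400\,\text{s}}{32\,\text{s}}
  \approx 1.06 \times 10^{4}
```

That is, roughly four orders of magnitude, consistent with the "several orders of magnitude" noted above.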

Comparison of the contour maps of the local component of the strain rate tensor for a 3-D material structure, as predicted by the MKS approach and by conventional micromechanical finite element simulations. The middle section of the 3-D RVE used in the calculation is shown at the top (a). The strain rate contours predicted for the same section by the FE method (b) and by the MKS established in this work (c) are shown below. Both phases are assumed to exhibit isotropic plasticity with yield strengths of 200 and 250 MPa, respectively. The macroscopic applied strain rate is 0·02 s⁻¹

Integration and collaboration platforms

The data science tools described earlier are aimed at mining low-dimensional representations of the important PSP linkages that are critically needed to dramatically accelerate the rate at which new materials are designed, developed, and deployed in new high performance products introduced in the marketplace. However, to fully realise these ambitious goals, it is imperative to develop and validate suitable protocols for effectively integrating the core materials knowledge (i.e. PSP linkages) into manufacturing process simulation and product design tools. Historically, this integration has not been easy (see Fig. 7). There exists a fundamental disconnect between how knowledge is sought and expressed in the materials and manufacturing fields. Materials experts often express the knowledge they accumulate from their experiments and models as highly simplified PSP linkages. This preference for simplified PSP linkages is largely a byproduct of the use of simplified, intuitive measures for quantifying the complex hierarchical material structure. However, these PSP linkages are rarely cast in a form suitable for formulating the internal state variable theories that are widely used in manufacturing process simulation tools (and in product design tools) to describe the material constitutive response. This is because most internal state variable theories use sophisticated tensorial descriptors of the material internal state, which do not necessarily connect directly with the physical quantities measured and modelled by materials experts. As a consequence of this fundamental disconnect between the practices of the two fields, the integration of materials knowledge into broadly used manufacturing simulation and product design tools continues to face major hindrances.

Schematic depiction of the current and proposed protocols for integration of high value materials knowledge into manufacturing process and product design simulation tools

The data-driven approaches described in this paper offer an alternative that might address the challenge described above. They are capable of organising the core materials knowledge (i.e. PSP linkages) as either low-cost metamodels or easily accessible databases that can be directly integrated into manufacturing simulation and product design tools (see Fig. 7). Preliminary examples of such integration have been demonstrated in recent work^145,151 and have shown major computational advantages. In other words, data sciences can serve as an effective and direct integrator of the core materials knowledge into the various components of the product design and manufacturing value chain. This would, however, be possible only through intimate cross-disciplinary collaboration between materials experts, design/manufacturing experts, and data scientists. Because of the many barriers that currently exist between these fields (e.g. differences in approaches and terminology), it is imperative to design and build novel integration platforms (i.e. cyberinfrastructure) specifically aimed at enhancing and accelerating such collaborations. Some of the desired components of this supporting cyberinfrastructure include (i) automated protocols for capturing and tracking data provenance through the many adaptations of the data by the collaboration team members, (ii) automated protocols for identifying the salient aspects of the data (i.e. metadata) and sharing them with cross-disciplinary team members, (iii) community building of ontologies and domain lexicons that enable and promote meaningful exchange of ideas, data, tools, and knowledge between cross-disciplinary team members, and (iv) a code repository with versioning.
In essence, the approach described here, which can be referred to as DC-MGI or DC-ICME, is a data science and cyberinfrastructure supported approach to the practical realisation of the materials genome initiative (MGI)⁹ and integrated computational materials engineering (ICME)¹⁰ visions (see Fig. 8).
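To make the idea of a low-cost metamodel concrete, the sketch below reduces flattened 2-point statistics with PCA (computed via an SVD) and regresses a property on the leading PC scores by ordinary least squares. The linear form, the number of retained PCs, and the names `fit_psp_metamodel` and `evaluate_psp_metamodel` are illustrative assumptions, not the protocol of refs 145 and 151. The point is that the fitted object is only a few small arrays that a process or product simulation could evaluate at negligible cost.

```python
import numpy as np


def fit_psp_metamodel(stats, prop, n_pc=3):
    """PCA on structure statistics followed by a linear regression.

    stats: (n_samples, n_features) flattened 2-point statistics.
    prop:  (n_samples,) property values (e.g. a homogenised modulus).
    """
    mean = stats.mean(axis=0)
    # Principal directions are the leading right singular vectors of the centred data
    _, _, vt = np.linalg.svd(stats - mean, full_matrices=False)
    basis = vt[:n_pc]
    scores = (stats - mean) @ basis.T
    design = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(design, prop, rcond=None)
    return {"mean": mean, "basis": basis, "coef": coef}


def evaluate_psp_metamodel(model, stats):
    """Cheap evaluation: project onto the retained PCs, then apply the linear fit."""
    scores = (np.atleast_2d(stats) - model["mean"]) @ model["basis"].T
    return model["coef"][0] + scores @ model["coef"][1:]
```

In practice the number of PCs and the regression form would be chosen by cross-validation, and the uncertainty of the fit would be reported alongside the prediction.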

Schematic depiction of the DC-materials genome initiative/integrated computational materials engineering (DC-MGI/ICME) approach

Summary and outlook

This paper has summarised the current status of an emerging framework for accelerating the development of new/improved hierarchical materials, built on the foundations of data sciences and cyberinfrastructure while fully leveraging recent advances in both experimental and computational sciences. Although the initial results described above are very promising, they represent the very early stages of this nascent field. The framework needs several extensions before it can be applied to the large number of complex material systems explored in advanced technologies. In the results presented here, both the local states of the materials and the materials phenomena explored were relatively simple. Furthermore, most of the case studies completed to date have focused mainly on the mesoscale. It is therefore necessary to extend the framework and tools presented here to more realistic material systems, where the material structure definitions demand the use of continuous state variables (e.g. polycrystalline materials, where the local state description requires the specification of some combination of composition, phase identifier, and crystal lattice orientation) and span multiple length scales (from atomistic to the macroscale). Such extensions will in turn allow exploration of more complex materials phenomena encountered in typical manufacturing process routes (e.g. thermo-mechanical treatments) and in service conditions (e.g. fatigue).

As a simple example, consider a material system where the local state in the material structure requires a description of the chemical composition (i.e. the structure description requires specification of the spatial distribution of the chemical composition). Let c_s denote the average chemical composition in spatial bin s (suitably defined at the hierarchical length scale of interest). In prior work,⁹¹ this structure description was converted to a digital signal in a trivial manner by using the local state identifier h to index a primitive binning of the local state space. In other words, the range of composition was divided into a convenient number of bins, and each bin was indexed with a different value of h, allowing easy conversion to a versatile digital signal that can be readily used with the spatial statistics calculators (based on DFT methods^81,82). While this approach produced reasonable results, it is clearly not computationally efficient, especially when a large number of local state bins is needed to capture the complex underlying physics of the problem (as with structure datasets produced from phase-field simulations). A better approach would be to explore new spectral representations of functions over the continuous local state spaces of interest, as they are likely to produce computationally efficient representations of the structure field. For the chemical composition variable mentioned earlier, for example, preliminary (not yet published) work has indicated that spectral representations using Legendre polynomials produce highly efficient and compact representations. In a similar vein, it was very recently demonstrated¹⁵⁵ that generalised spherical harmonics¹⁶³ serve as an excellent spectral basis for functions on the crystal lattice orientation space (needed in describing polycrystalline microstructures).
If such spectral representations are established in a rigorous mathematical framework, it should be possible to obtain the most compact spectral representations of the n-point spatial correlations of a complex microstructure. These representations would be particularly suited to data-driven approaches, where the higher-order terms in the spectral series would be explored on an as-needed basis (as demanded by the available data) with incremental, non-redundant effort (i.e. the additional terms in the series do not change the values of the terms already included).
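The contrast with primitive binning can be sketched as follows: a composition field is encoded with a handful of Legendre channels instead of many composition bins, and 2-point statistics between channels are computed with FFTs. This is a minimal sketch under stated assumptions, not the unpublished method mentioned above: compositions are assumed to lie in [0, 1] and are mapped to x = 2c − 1, each voxel's local state is treated as a delta function at its composition, boundaries are assumed periodic, and `legendre_signal` and `two_point` are hypothetical names.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_signal(c, n_terms):
    """Encode a composition field c (values in [0, 1]) as Legendre channels.

    Each voxel's local state is treated as a delta function at x_s = 2 c_s - 1
    on [-1, 1]; its Legendre coefficients are (2l + 1) / 2 * P_l(x_s).
    Returns an array of shape c.shape + (n_terms,).
    """
    x = 2.0 * np.asarray(c, dtype=float) - 1.0
    vander = legendre.legvander(x, n_terms - 1)      # P_l(x_s); last axis is l
    return vander * (2.0 * np.arange(n_terms) + 1.0) / 2.0


def two_point(signal, a, b):
    """Periodic 2-point statistics f_t = (1/S) sum_s m_s^a m_{s+t}^b via FFTs."""
    axes = tuple(range(signal.ndim - 1))             # spatial axes of one channel
    fa = np.fft.fftn(signal[..., a], axes=axes)
    fb = np.fft.fftn(signal[..., b], axes=axes)
    n_cells = np.prod(signal.shape[:-1])
    return np.real(np.fft.ifftn(np.conj(fa) * fb, axes=axes)) / n_cells
```

Because the Legendre polynomials are orthogonal, requesting more terms only appends channels: `legendre_signal(c, 6)[..., :4]` equals `legendre_signal(c, 4)`, which is the incremental, non-redundant behaviour sought for such spectral series.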

A second critically needed extension to the framework presented here for computing the n-point statistics could focus on the treatment of point cloud datasets, such as those produced in MD simulations. As these datasets do not typically provide data on a uniform spatial grid, the techniques described here need further refinement to be computationally efficient for such datasets. One possible direction would be to develop efficient computational protocols for converting point cloud datasets into digital microstructure signals defined on a uniform spatial grid. Another option is to explore special algorithms that compute Fourier transforms efficiently for data on a non-uniform grid. It is also possible that the microstructure information sometimes cannot be expressed as point data, for example when describing complex defect structures (e.g. dislocation structures). Further enhancements to the framework are needed to address such situations.
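The first option, converting a point cloud into a digital signal on a uniform grid, can be sketched with simple nearest-bin deposition: each point is assigned to the spatial bin that contains it, and the signal m_s^h is the fraction of points in bin s that carry local state h. The periodic box, the integer state labels, and the helper name `points_to_grid` are assumptions for illustration; smoother deposition kernels or non-uniform FFT algorithms are alternatives that are not shown.

```python
import numpy as np


def points_to_grid(positions, labels, box, shape, n_states):
    """Bin a labelled point cloud onto a uniform periodic grid.

    positions: (N, d) coordinates inside a periodic box with edge lengths `box`.
    labels:    (N,) integer local-state identifiers h in [0, n_states).
    Returns m of shape shape + (n_states,), where m[s, h] is the fraction of
    the points in bin s that carry state h (empty bins are left as zeros).
    """
    positions = np.asarray(positions, dtype=float)
    grid = np.asarray(shape)
    # Nearest-bin assignment; the modulo wraps points lying on the upper boundary
    idx = np.floor(positions / np.asarray(box, dtype=float) * grid).astype(int) % grid
    counts = np.zeros(tuple(shape) + (n_states,))
    np.add.at(counts, tuple(idx.T) + (np.asarray(labels),), 1.0)
    total = counts.sum(axis=-1, keepdims=True)
    return np.divide(counts, total, out=np.zeros_like(counts), where=total > 0)
```

The resulting array has the same layout as a binned microstructure signal, so it can be passed to the same DFT-based spatial statistics calculators.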

The reconstruction of microstructures from spatial correlations represents a major gap at this time. Although it is possible to reconstruct a specific image from knowledge of the complete set of its 2-point statistics,⁸¹ there is little practical motivation to do so. Instead, the desire is to reconstruct RVEs from an ensemble of microstructures. Although this paper presented one approach to this problem, much more work is clearly needed in this direction. Furthermore, reconstructions from partial datasets are of particularly high practical value. For example, one might often have only a partial set of experimentally measured spatial correlations (e.g. from 2-D scans on specific sections of the sample). Also, one might be interested in reconstructing microstructures from reduced-order PCA representations to make physical connections between the PCs and specific microstructural features. All these problems are likely to be of high value to future ICME and MGI efforts.
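For orientation, a simple stochastic-optimisation baseline for reconstruction from a target 2-point autocorrelation is sketched below. It is swapped in for illustration only; it is neither the phase-recovery algorithm of ref. 81 nor the RVE approach of this paper. It assumes a periodic two-phase image, and the names `autocorrelation` and `reconstruct` are hypothetical. Each step swaps a random phase-1 pixel with a random phase-0 pixel (which keeps the volume fraction fixed) and keeps the swap only if it lowers the squared mismatch, i.e. a zero-temperature version of simulated annealing.

```python
import numpy as np


def autocorrelation(img):
    """Periodic 2-point autocorrelation f_t = (1/S) sum_s m_s m_{s+t} via FFTs."""
    spectrum = np.abs(np.fft.fftn(img)) ** 2
    return np.real(np.fft.ifftn(spectrum)) / img.size


def reconstruct(target, volume_fraction, n_iter=2000, seed=0):
    """Greedy pixel-swap reconstruction of a periodic two-phase image."""
    rng = np.random.default_rng(seed)
    n_one = int(round(volume_fraction * target.size))
    img = np.zeros(target.size)
    img[rng.choice(target.size, n_one, replace=False)] = 1.0
    img = img.reshape(target.shape)
    err = np.sum((autocorrelation(img) - target) ** 2)
    history = [err]
    for _ in range(n_iter):
        i = rng.choice(np.flatnonzero(img == 1.0))   # a phase-1 pixel
        j = rng.choice(np.flatnonzero(img == 0.0))   # a phase-0 pixel
        trial = img.copy()
        trial.flat[i], trial.flat[j] = 0.0, 1.0      # swap keeps volume fraction
        trial_err = np.sum((autocorrelation(trial) - target) ** 2)
        if trial_err < err:                          # accept improving swaps only
            img, err = trial, trial_err
        history.append(err)
    return img, history
```

Because 2-point statistics are invariant to translations, any reconstruction is at best recovered up to such symmetries; a greedy search of this kind can also stall in local minima, which is one reason more sophisticated formulations are needed.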

As noted earlier, it is anticipated that most advanced materials used in emerging technologies will demand a tiered description to address the hierarchical material internal structure (spanning multiple length scales). In this paper, the assumption of well separated length scales was implicitly invoked, as is routinely done in most composite theories. In other words, it was assumed that the same overall philosophy can be applied repeatedly, as many times as needed, to describe materials whose structures exhibit salient features at multiple well separated length scales. In practice, there might be several situations where the length scales are not well separated. In such situations, one is forced to employ the spatial resolution of the smaller length scale and to extend the RVEs until they provide a statistically meaningful representation of the spatial correlations at the larger length scale; such RVEs can become extremely large. Furthermore, the RVE concept itself can encounter additional limitations in practice. For example, in thin films or graded materials, the assumption of statistical homogeneity might fail. Another example arises in applications where the structure features of interest are rare occurrences (e.g. features responsible for fatigue damage initiation); here extremely large RVEs might be needed, possibly as large as the entire sample.

Multiscale measurements play an important role in realising the goals articulated in this paper, as they provide the critical data needed to improve and validate material structure-sensitive models (i.e. to increase model maturity). In particular, new measurement protocols are critically needed for combinatorial synthesis and/or high throughput processing and characterisation (of both structure and response) aimed at rapid exploration of the multiscale PSP linkages in hierarchical materials. For example, traditional approaches that combine material structure characterisation with standard mechanical testing (simple tension or simple compression) evaluate material responses one material structure at a time, and therefore produce a relatively low volume of high quality data at a relatively high cost. This may not be the best strategy for accelerated development of new/improved materials. It might be more cost-effective to pursue testing protocols that combine high throughput material structure prototyping (e.g. single or double cone tests, Jominy bars) with fast quantification of structures (e.g. customised protocols for the direct measurement of salient spatial correlations in the structure) and local evaluation of mechanical properties (e.g. indentation methods). New protocols that can provide the critical data at the requisite speed, cost and accuracy to support objective decision making in materials development present an exciting new direction for research in support of MGI and ICME.

As a final note, it is emphasised that the data-driven approaches described here for establishing the core materials knowledge (i.e. PSP linkages or metamodels) are ideally suited for incorporation into multiscale robust design approaches such as the inductive design exploration method (IDEM).^40,50,164 Implementing IDEM requires the formulation of PSP metamodels at various levels of the material hierarchy, along with a rigorous quantification of the associated uncertainty. Integrating the PSP metamodels with the robust design framework of IDEM represents an exciting new research direction that can provide a practical pathway for addressing the grand challenges described in this paper.

Footnotes

Acknowledgement

The author acknowledges funding from the Office of Naval Research (ONR) award N00014-11-1-0759 (Dr William M. Mullins, program manager). The author also acknowledges numerous discussions with colleagues Professor David McDowell and Dr Tony Fast on the various concepts presented and discussed in this paper.

* Crystal plasticity theories are widely used to predict the plastic anisotropy of polycrystalline materials; they account for the fundamental mechanism of plastic deformation at the scale of the constituent single crystals, including the details of the slip system geometry in each individual crystal.

References

1. Fullwood, Niezgoda, Adams and Kalidindi: ‘Microstructure sensitive design for performance optimization’, Prog. Mater. Sci., 2010, 55, (6), 477–562.

2. Schwartz, Kumar and Adams: ‘Electron backscatter diffraction in materials science’; 2000, New York, Kluwer Academic/Plenum Publishers.

3. Qidwai, Turner, Niezgoda, Lewis, Geltmacher, Rowenhorst and Kalidindi: ‘Estimating response of polycrystalline materials using sets of weighted statistical volume elements (WSVEs)’, Acta Mater., 2012, 60, 5284–5299.

4. Rowenhorst, Lewis and Spanos: ‘Three-dimensional analysis of grain topology and interface curvature in a β-titanium alloy’, Acta Mater., 2010, 58, (16), 5511–5519.

5. Baer, Hiltner and Keith: ‘Hierarchical structure in polymeric materials’, Science, 1987, 235, (4792), 1015–1022.

6. Hancox: ‘Biology of bone’; 1972, Cambridge, Cambridge University Press.

7. Currey: ‘The many adaptations of bone’, J. Biomech., 2003, 36, (10), 1487–1495.

8. Currey: ‘The mechanical adaptations of bones’; 1984, Princeton, Princeton University Press.

9. NSTC: ‘Materials genome initiative for global competitiveness’, National Science and Technology Council, 2011.

10. Pollock et al.: ‘Integrated computational materials engineering: a transformational discipline for improved competitiveness and national security’; 2008, Washington, DC, The National Academies Press.

11. McDowell and Story: ‘New Directions in Materials Design Science and Engineering (MDS&E)’, Report of a NSF DMR-sponsored workshop, The Georgia Center for Advanced, Atlanta, October 19–21, 1998.

12. Spowart: ‘Automated serial sectioning for 3-D analysis of microstructure’, Scr. Mater., 2006, 5, 5–10.

13. Spowart, Mullens and Puchala: ‘Collecting and analyzing microstructures in three dimensions: a fully automated approach’, J. Miner. Met. Mater., 2003, 55, (10), 35–37.

14. Echlin, Mottura, Torbet and Pollock: ‘A new TriBeam system for three-dimensional multimodal materials analysis’, Rev. Sci. Instrum., 2012, 83, (2), 023701.

15. Wargo, Hanna, Çeçen, Kalidindi and Kumbur: ‘Selection of representative volume elements for pore-scale analysis of transport in fuel cell materials’, J. Power Sources, 2012, 197, 168–179.

16. Kotula, Rohrer and Marsh: ‘Focused ion beam and scanning electron microscopy for 3D materials characterization’, MRS Bull., 2014, 39, (04), 361–365.

17. Villanova, Peter, Heikki, Jérôme, François, U.-V, Elisa, Gérard, Pierre, David, Denis, Aaron and Christophe: ‘Multi-scale 3D imaging of absorbing porous materials for solid oxide fuel cells’, J. Mater. Sci., 2014, 49, (16), 5626–5634.

18. Ebner, Geldmacher, Marone, Stampanoni and Wood: ‘X-ray tomography of porous, transition metal oxide based lithium ion battery electrodes’, Adv. Ener. Mater., 2013, 3, (7), 845–850.

19. Betz, Wegst, Weide, Heethoff, Helfen, Lee and Cloetens: ‘Imaging applications of synchrotron x-ray micro-tomography in biological morphology and biomaterial science. I. General aspects of the technique and its advantages in the analysis of arthropod structures’, J. Microsc., 2007, 227, (1), 51–71.

20. Stiénon, Fazekas, Buffière J.-Y., Vincent, Daguier and Merch: ‘A new methodology based on X-ray micro-tomography to estimate stress concentrations around inclusions in high strength steels’, Mater. Sci. Eng. A, 2009, 513–514, 376–383.

21. Proudhon, Buffière and Fouvry: ‘Three-dimensional study of a fretting crack using synchrotron X-ray micro-tomography’, Eng. Fract. Mech., 2007, 74, (5), 782–793.

22. Bingert et al.: ‘High-energy diffraction microscopy characterization of spall damage’, in ‘Dynamic behavior of materials’, (eds. B. Song, D. Casem and J. Kimberley), Vol. 1, 397–403; 2014, New York, Springer.

23. Wang, Almer and Bieler: ‘Microstructural characterization of polycrystalline materials by synchrotron X-rays’, Front. Mater. Sci., 2013, 7, (2), 156–169.

24. Pokharel, Lind, Kanjarla, Lebensohn, Kenesei, Suter and Rollett: ‘Polycrystal plasticity: comparison between grain-scale observations of deformation and simulations’, Annu. Rev. Condens. Matter Phys., 2014, 5, (1), 317–346.

25. Gulsoy, Shahani, Gibbs, Fife and Voorhees: ‘Four-dimensional morphological evolution of an aluminum silicon alloy using propagation-based phase contrast X-ray tomographic microscopy’, Mater. Trans., 2014, 55, (1), 161–164.

26. Barwick, Park, Kwon, Baskin and Zewail: ‘4D imaging of transient structures and morphologies in ultrafast electron microscopy’, Science, 2008, 322, (5905), 1227–1231.

27. Miller and Forbes: ‘Atom probe tomography’, Mater. Character., 2009, 60, (6), 461–469.

28. Arslan, Marquis, Homer, Hekmaty and Bartelt: ‘Towards better 3-D reconstructions by combining electron tomography and atom-probe tomography’, Ultramicroscopy, 2008, 108, (12), 1579–1585.

29. Kirane, Ghosh, Groeber and Bhattacharjee: ‘Grain level dwell fatigue crack nucleation model for Ti alloys using crystal plasticity finite element analysis’, J. Eng. Mater. Technol. Trans. ASME, 2009, 131, (2), 021003.

30. Przybyla and McDowell: ‘Simulated microstructure-sensitive extreme value probabilities for high cycle fatigue of duplex Ti-6Al-4V’, Int. J. Plast., 2011, 27, (12), 1871–1895. [Special issue in honor of Nobutada Ohno].

31. McDowell and Dunne FPE: ‘Microstructure-sensitive computational modeling of fatigue crack formation’, Int. J. Fatigue, 2010, 32, (9), 1521–1542. [Special issue on Emerging Frontiers in Fatigue].

32. Wang, Wen, Simmons and Wang: ‘Systematic approach to microstructure design of Ni-base alloys using classical nucleation and growth relations coupled with phase field modeling’, Metall. Mater. Trans. A, 2008, 39A, (5), 984–993.

33. Wen, Simmons, Shen, Woodward and Wang: ‘Phase-field modeling of bimodal particle size distributions during continuous cooling’, Acta Mater., 2003, 51, (4), 1123–1132.

34. Ghosh, Nowak and Lee: ‘Quantitative characterization and modeling of composite microstructures by Voronoi cells’, Acta Mater., 1997, 45, (6), 2215–2234.

35. Ghosh, Lee and Moorthy: ‘Multiple scale analysis of heterogeneous elastic structures using homogenization theory and Voronoi cell finite element method’, Int. J. Solids Struct., 1995, 32, (1), 27–62.

36. Kouznetsova, Geers MGD and Brekelmans WAM: ‘Multi-scale second-order computational homogenization of multi-phase materials: a nested finite element solution strategy’, Comput. Methods Appl. Mech. Eng., 2004, 193, (48–51), 5525–5550.

37. Kouznetsova, Geers MGD and Brekelmans WAM: ‘Multi-scale constitutive modelling of heterogeneous materials with a gradient-enhanced computational homogenization scheme’, Int. J. Numer. Methods Eng., 2002, 54, (8), 1235–1260.

38. Kadowaki and Liu: ‘Bridging multi-scale method for localization problems’, Comput. Methods Appl. Mech. Eng., 2004, 193, (30–32), 3267–3302.

39. McDowell, Choi H.-J., Panchal, Austin, Allen and Mistree: ‘Plasticity-related microstructure–property relations for materials design’, Key Eng. Mater., 2007, 340–341, 21–30.

40. Choi, McDowell, Allen and Mistree: ‘An inductive design exploration method for hierarchical systems design under uncertainty’, Eng. Optim., 2008, 40, (4), 287–307.

41. Luscher, McDowell and Bronkhorst: ‘A second gradient theoretical framework for hierarchical multiscale modeling of materials’, Int. J. Plast., 2010, 26, (8), 1248–1275.

42. Olson: ‘Computational design of hierarchically structured materials’, Science, 1997, 277, (29), 1237–1242.

43. Kalidindi, Bhattacharyya and Doherty: ‘Detailed analyses of grain-scale plastic deformation in columnar polycrystalline aluminium using orientation image mapping and crystal plasticity models’, Proc. R. Soc. Lond. A, 2004, 460, (2047), 1935–1956.

44. Choi, Kim, Kim and Park: ‘The effect of grain size distribution on the shape of flow stress curves of Mg-3Al-1Zn under uniaxial compression’, Mater. Sci. Eng. A, 2008, 488, (1–2), 458–467.

45. El-Danaf, Kalidindi and Doherty: ‘Influence of grain size and stacking-fault energy on deformation twinning in fcc metals’, Metall. Mater. Trans. A, 1999, 30, (5), 1223–1233.

46. Dimiduk, Hazzledine, Parthasarathy, Mendiratta and Seshagiri: ‘The role of grain size and selected microstructural parameters in strengthening fully lamellar TiAl alloys’, Metall. Mater. Trans. A Phys. Metall. Mater. Sci., 1998, 29, (1), 37–47.

47. Morrison and Moosbrugger: ‘Effects of grain size on cyclic plasticity and fatigue crack initiation in nickel’, Int. J. Fatigue, 1997, 20, S51–S59.

48. Lasalmonie and Strudel: ‘Influence of grain size on the mechanical behaviour of some high strength materials’, J. Mater. Sci., 1986, 21, (6), 1837–1852.

49. Hull: ‘Effect of grain size and temperature on slip, twinning and fracture in 3% silicon iron’, Acta Metall., 1961, 9, (3), 191–204.

50. McDowell, Panchal, Choi H.-J., Seepersad, Allen and Mistree: ‘Integrated design of multiscale, multifunctional materials and products’; 2009, Burlington, Elsevier.

51. McDowell: ‘A perspective on trends in multiscale plasticity’, Int. J. Plast., 2010, 26, (9), 1280–1309. [Special issue in honor of David L. McDowell].

52. Olson: ‘Pathways of discovery designing a new material world’, Science, 2000, 228, (12), 933–998.

53. Adams, Kalidindi and Fullwood: ‘Microstructure sensitive design for performance optimization’; 2012, Waltham, Butterworth-Heinemann.

54. Panchal, Kalidindi and McDowell: ‘Key computational modeling issues in integrated computational materials engineering’, Comput. Aided Des., 2013, 45, (1), 4–25.

55. NSTC: ‘A national strategic plan for advanced manufacturing’, National Science and Technology Council, Executive Office of the President, February 2012.

56. Office of Science and Technology Policy: ‘Obama administration unveils “Big Data” initiative: announces $200 million in new R&D investments’, Office of Science and Technology Policy, Washington, DC, 20502; 2012.

57. Allison: ‘Integrated computational materials engineering: a perspective on progress and future steps’, J. Miner. Met. Mater. Soc., 2011, 63, (4), 15–18.

58. Schmitz and Prahl: ‘ICMEg – the Integrated Computational Materials Engineering expert group – a new European coordination action’, Integr. Mater. Manuf. Innovation, 2014, 3, (1), 2.

59. Schmitz and Prahl: ‘Integrative computational materials engineering: concepts and applications of a modular simulation platform’; 2012, Chichester, John Wiley & Sons.

60. The European Materials Modelling Council: [cited 2014 Aug 12], available at: http://emmc.info/index.html

61. Linden, Smith and York: ‘Amazon.com recommendations: item-to-item collaborative filtering’, IEEE Internet Comput., 2003, 7, (1), 76–80.

62. Dey and Forlizzi: ‘A stage-based model of personal informatics systems’, Proc. SIGCHI Conf. on Human Factors in Computing Systems, 557–566; 2010, New York, NY, USA, ACM.

63. Hohman, Gregory, Chibale, Smith, Ekins and Bunin: ‘Novel web-based tools combining chemistry informatics, biology and social networks for drug discovery’, Drug Discov. Today, 2009, 14, (5), 261–270.

64. Tien: ‘Toward a decision informatics paradigm: a real-time, information-based approach to decision making’, IEEE Trans. Syst. Man Cybern. Part C Appl. Rev., 2003, 33, (1), 102–113.

65. Wan: ‘Healthcare informatics research: from data to evidence-based management’, J. Med. Syst., 2006, 30, (1), 3–7.

66. Rajan: ‘Materials informatics’, Mater. Today, 2005, 8, (10), 38–45.

67. Kalidindi, Niezgoda and Salem: ‘Microstructure informatics using higher-order statistics and efficient data-mining protocols’, JOM, 2011, 63, (4), 34–41.

68. Gorse and Lahana: ‘Functional diversity of compound libraries’, Curr. Opin. Chem. Biol., 2000, 4, (3), 287–294.

69. Curtarolo, Morgan, Persson, Rodgers and Ceder: ‘Predicting crystal structures with data mining of quantum calculations’, Phys. Rev. Lett., 2003, 91, (13), 135503.

70. Ceder: ‘Predicting properties from scratch’, Science, 1998, 280, (5366), 1099–1100.

71. Breneman, Brinson, Schadler, Natarajan, Krein, Morkowchuk, Deng and Xu: ‘Stalking the materials genome: a data-driven approach to the virtual design of nanostructured polymers’, Adv. Funct. Mater., 2013, 23, (46), 5746–5752.

72. Krein, Natarajan, Schadler, Brinson, Deng, Gai and Breneman: ‘Development of materials informatics tools and infrastructure to enable high throughput materials design’, MRS Online Proc. Libr., 2012, 1425, mrsf11-1425-uu06-05, doi:10.1557/opl.2012.57.

73. Cebon and Ashby: ‘Engineering materials informatics’, MRS Bull., 2006, 31, (12), 1004–1012.

74. Kalidindi: ‘Microstructure informatics’, in ‘Informatics for materials science and engineering: data-driven discovery for accelerated experimentation and application’, (ed. K. Rajan), 443–466; 2013, Butterworth-Heinemann.

75. Peurrung, Ferris and Osman: ‘The materials informatics workshop: theory and application’, JOM, 2007, 59, (3), 50–50.

76. Liu Z.-K., Chen L.-Q. and Rajan: ‘Linking length scales via materials informatics’, JOM, 2006, 58, (11), 42–50.

77. ASTM International: ‘E112 – 10: Standard test methods for determining average grain size’; 2010, West Conshohocken, PA, USA, ASTM International.

78. ASTM International: ‘E1181 – 02: Standard test methods for characterizing duplex grain sizes’; 2008, West Conshohocken, PA, USA, ASTM International.

79. Shaffer, Knezevic and Kalidindi: ‘Building texture evolution networks for deformation processing of polycrystalline fcc metals using spectral approaches: applications to process design for targeted performance’, Int. J. Plast., 2010, 26, (8), 1183–1194.

80. Adams, Gao and Kalidindi: ‘Finite approximations to the second-order properties closure in single phase polycrystals’, Acta Mater., 2005, 53, (13), 3563–3577.

81. Fullwood, Niezgoda and Kalidindi: ‘Microstructure reconstructions from 2-point statistics using phase-recovery algorithms’, Acta Mater., 2008, 56, (5), 942–948.

82. Niezgoda, Fullwood and Kalidindi: ‘Delineation of the space of 2-point correlations in a composite material system’, Acta Mater., 2008, 56, (18), 5285–5292.

83.

Niezgoda

and Kalidindi

: ‘Applications of the phase-coded generalized Hough transform to feature detection, analysis, and segmentation of digital microstructures’, Comput. Mater. Continua, 2009, 14, (2), 79–97.

84.

Niezgoda

Turner

Fullwood

and Kalidindi

: ‘Optimized structure based representative volume element sets reflecting the ensemble-averaged 2-point statistics’, Acta Mater., 2010, 58, (13), 4432–4445.

85.

Fullwood

Kalidindi

Niezgoda

Fast

and Hampson

: ‘Gradient-based microstructure reconstructions from distributions using fast Fourier transforms’, Mater. Sci. Eng. A Struct. Mater. Prop. Microstruct. Process., 2008, 494, (1–2), 68–72.

86.

Bochenek

and Pyrz

: ‘Reconstruction of random microstructures: a stochastic optimization problem’, Comput. Mater. Sci, 2004, 31, (1–2), 93–111.

87.

Roberts

: ‘Statistical reconstruction of three-dimensional porous media from two-dimensional images’, Phys. Rev. E, 1997, 56, (3), 3203.

88.

Niezgoda

Kanjarla

and Kalidindi

: ‘Novel microstructure quantification framework for databasing, visualization, and analysis of microstructure data’, Integr. Mater. Manuf. Innovation, 2013, 2, 3.

89.

Kalidindi

: ‘Computationally-efficient fully-coupled multi-scale modeling of materials phenomena using calibrated localization linkages’, ISRN Mater. Sci., 2012, doi:10.5402/2012/305692.

90.

Fast

and Kalidindi

: ‘Formulation and calibration of higher-order elastic localization relationships using the MKS approach’, Acta Mater., 2011, 59, 4595–4605.

91.

Fast

Niezgoda

and Kalidindi

: ‘A new framework for computationally efficient structure–structure evolution linkages to facilitate high-fidelity scale bridging in multi-scale materials models’, Acta Mater., 2011, 59, (2), 699–707.

92.

Landi

Niezgoda

and Kalidindi

: ‘Multi-scale modeling of elastic response of three-dimensional voxel-based microstructure datasets using novel DFT-based knowledge systems’, Acta Mater., 2010, 58, (7), 2716–2725.

93.

Torquato

: ‘Random heterogeneous materials’; 2002, New York, Springer-Verlag.

94.

Brown

: ‘Solid mixture permittivities’, J. Chem. Phys., 1955, 23, (8), 1514–1517.

95.

Press

Teukolsky

Vetterling

and Flannery

: ‘Numerical Recipes: The art of scientific computing, 3rd edn, 2007, Cambridge University Press.

96.

Niezgoda

Yabansu

and Kalidindi

: ‘Understanding and visualizing microstructure and microstructure variance as a stochastic process’, Acta Mater., 2011, 59, 6387–6400.

97.

Przybyla

Adams

and Miles

: ‘A method for determining property variance in polycrystalline materials’, NUMIFORM; 2004, Ohio State University, Columbus, OH, AIP Conference Proceedings.

98.

Przybyla

Prasannavenkatesan

Salajegheh

and McDowell

: ‘Microstructure-sensitive modeling of high cycle fatigue’, Int. J. Fatigue, 2010, 32, (3), 512–525. [Special issue on Fatigue of Materials: Competing Failure Modes and Variability in Fatigue Life].

99.

Przybyla

and McDowell

: ‘Microstructure-sensitive extreme value probabilities for high cycle fatigue of Ni-base superalloy IN100’, International J. Plast., 2010, 26, (3), 372–394.

100. Gao, Przybyla and Adams: 'Methodology for recovering and analyzing two-point pair correlation functions in polycrystalline materials', Metall. Mater. Trans. A, 2006, 37, (8), 2379–2387.

101. Kröner: 'Statistical modelling', in 'Modelling small deformations of polycrystals', (eds. J. Gittus and J. Zarka), 229–291; 1986, London, Elsevier Science Publishers.

102. Milton: 'The theory of composites', Cambridge Monographs on Applied and Computational Mathematics, Vol. 6; 2002, Cambridge, Cambridge University Press.

103. Fullwood, Adams and Kalidindi: 'A strong contrast homogenization formulation for multi-phase anisotropic materials', J. Mech. Phys. Solids, 2008, 56, (6), 2287–2297.

104. Adams, Garmestani and Saheli: 'Microstructure design of a two phase composite using two-point correlation functions', J. Comput. Aided Mater. Des., 2004, 11, 103–115.

105. Saheli, Garmestani and Adams: 'Microstructure design of a two phase composite using two-point correlation functions', J. Comput. Aided Mater. Des., 2004, 11, (2–3), 103–115.

106. Garmestani, Lin, Adams and Ahzi: 'Statistical continuum theory for large plastic deformation of polycrystalline materials', J. Mech. Phys. Solids, 2001, 49, (3), 589–607.

107. Adams: 'Use of microstructural statistics in predicting polycrystalline material properties', Metall. Mater. Trans., 1999, 30A, 969.

108. Adams and Olson: 'The mesostructure–properties linkage in polycrystals', Prog. Mater. Sci., 1998, 43, (1), 1–87.

109. Beran, Mason, Adams and Olsen: 'Bounding elastic constants of an orthotropic polycrystal using measurements of the microstructure', J. Mech. Phys. Solids, 1996, 44, (9), 1543–1563.

110. Halko, Martinsson, Shkolnisky and Tygert: 'An algorithm for the principal component analysis of large data sets', SIAM J. Sci. Comput., 2011, 33, (5), 2580–2594.

111. Rokhlin, Szlam and Tygert: 'A randomized algorithm for principal component analysis', SIAM J. Matrix Anal. Appl., 2009, 31, (3), 1100–1124.

112. Jolliffe: 'Principal component analysis: a beginner's guide – I. Introduction and application', Weather, 1990, 45, (10), 375–382.

113. Suh, Rajagopalan and Rajan: 'The application of principal component analysis to materials science data', Data Sci. J., 2002, 1, 19–26.

114. Çeçen, Fast, Kumbur and Kalidindi: 'A data-driven approach to establishing microstructure–property relationships in porous transport layers of polymer electrolyte fuel cells', J. Power Sources, 2014, 245, 144–153.

115. Dong, McDowell, Kalidindi and Jacob: 'Dependence of mechanical properties on crystal orientation of semi-crystalline polyethylene structures', Polymer, 2014, 55, (16), 4248–4257.

116. Adams, Wright and Kunze: 'Orientation imaging: the emergence of a new microscopy', Metall. Trans. A, 1993, 24A, (4), 819–831.

117. Pathak, Michler, Wasmer and Kalidindi: 'Studying grain boundary regions in polycrystalline materials using spherical nano-indentation and orientation imaging microscopy', J. Mater. Sci., 2012, 47, 815–823.

118. Pathak, Stojakovic and Kalidindi: 'Measurement of the local mechanical properties in polycrystalline samples using spherical nanoindentation and orientation imaging microscopy', Acta Mater., 2009, 57, (10), 3020–3028.

119. Kalidindi and Vachhani: 'Mechanical characterization of grain boundaries using nanoindentation', Curr. Opin. Solid State Mater. Sci., 2014, 18, 196–204.

120. Mason and Adams: 'Use of microstructural statistics in predicting polycrystalline material properties', Metall. Mater. Trans. A, 1999, 30, (4), 969–979.

121. Uchic, Groeber and Rollett: 'Automated serial sectioning methods for rapid collection of 3-D microstructure data', JOM, 2011, 63, (3), 25–29.

122. Ferry, Mateescu, Cairney and Humphreys: 'Techniques for generating 3-D EBSD microstructures by FIB tomography', Mater. Charact., 2007, 58, (10), 961–967.

123. Van Boxel, Schmidt, Ludwig, Zhang, Jensen and Pantleon: 'Direct observation of grain boundary migration during recrystallization within the bulk of a moderately deformed aluminium single crystal', Mater. Trans., 2014, 55, (1), 128–136.

124. Adams and Field: 'Measurement and representation of grain-boundary texture', Metall. Trans. A, 1992, 23A, (9 pt 2), 2501–2513.

125. Adams: 'Orientation imaging microscopy: application to the measurement of grain boundary structure', Mater. Sci. Eng. A, 1993, 166, (1–2), 59–66.

126. Gusev: 'Representative volume element size for elastic composites: a numerical study', J. Mech. Phys. Solids, 1997, 45, (9), 1449–1459.

127. Nemat-Nasser and Hori: 'Micromechanics: overall properties of heterogeneous materials', 2nd edn; 1999, Amsterdam, Elsevier.

128. Hornung: 'Homogenization and porous media', Interdisciplinary Applied Mathematics Series, Vol. 6; 1997, Berlin, Springer.

129. Cherkaev: 'Variational methods for structural optimization', Applied Mathematical Sciences, Vol. 140; 2000, New York, Springer.

130. Chen and Liu: 'Square representative volume elements for evaluating the effective material properties of carbon nanotube-based composites', Comput. Mater. Sci., 2004, 29, (1), 1–11.

131. Drugan and Willis: 'A micromechanics-based nonlocal constitutive equation and estimates of representative volume element size for elastic composites', J. Mech. Phys. Solids, 1996, 44, (4), 497–524.

132. Kanit, Forest, Galliet, Mounoury and Jeulin: 'Determination of the size of the representative volume element for random composites: statistical and numerical approach', Int. J. Solids Struct., 2003, 40, (13–14), 3647–3679.

133. Shan and Gokhale: 'Representative volume element for non-uniform micro-structure', Comput. Mater. Sci., 2002, 24, (3), 361–379.

134. Sab: 'On the homogenization and the simulation of random materials', Eur. J. Mech. A. Solids, 1992, 11, (5), 505–515.

135. Ostoja-Starzewski: 'Microstructural randomness and scaling in mechanics of materials'; 2008, Boca Raton, Chapman & Hall/CRC.

136. Kröner: 'Berechnung der elastischen Konstanten des Vielkristalls aus den Konstanten des Einkristalls', Zeitschrift für Physik A Hadrons Nuclei, 1958, 151, (4), 504–518.

137. Willis: 'Variational and related methods for the overall properties of composite materials', Adv. Appl. Mech., 1981, 21, 2–78.

138. McCoy: 'Macroscopic response of continua with random microstructures', in 'Mechanics today', (ed. S. Nemat-Nasser), Vol. 6, 1–40; 1981, Oxford, Pergamon Press.

139. Hill: 'Elastic properties of reinforced solids: some theoretical principles', J. Mech. Phys. Solids, 1963, 11, 357–372.

140. Jeulin and Ostoja-Starzewski: 'Mechanics of random and multiscale microstructures'; 2001, Wien, New York, Springer.

141. McDowell, Ghosh and Kalidindi: 'Representation and computational structure–property relations of random media', JOM, 2011, 63, (3), 45–51.

142. Petch: 'Cleavage strength of polycrystals', J. Iron Steel Inst., 1953, 174, (Part 1), 25–28.

143. Hall: 'The deformation and ageing of mild steel III. Discussion of results', Proc. Phys. Soc. Sect. B, 1951, 64, 747–753.

144. Gupta et al.: 'Structure–property linkages for non-metallic inclusions/steel composite system using a data science approach', Acta Mater., 2014, in preparation.

145. Al-Harbi and Kalidindi: 'Crystal plasticity finite element simulations using a database of discrete Fourier transforms', Int. J. Plast., 2014, doi:10.1016/j.ijplas.2014.04.006.

146. Knezevic, Al-Harbi and Kalidindi: 'Crystal plasticity simulations using discrete Fourier transforms', Acta Mater., 2009, 57, (6), 1777–1784.

147. Knezevic, Kalidindi and Fullwood: 'Computationally efficient database and spectral interpolation for fully plastic Taylor-type crystal plasticity calculations of face-centered cubic polycrystals', Int. J. Plast., 2008, 24, (7), 1264–1276.

148. Kalidindi, Duvvuru and Knezevic: 'Spectral calibration of crystal plasticity models', Acta Mater., 2006, 54, (7), 1795–1804.

149. Al-Harbi, Knezevic and Kalidindi: 'Spectral approaches for the fast computation of yield surfaces and first-order plastic property closures for polycrystalline materials with cubic-triclinic textures', Comput. Mater. Continua, 2010, 15, (2), 153–172.

150. Feyel: 'A multilevel finite element method (FE²) to describe the response of highly non-linear structures using generalized continua', Comput. Methods Appl. Mech. Eng., 2003, 192, (28), 3233–3244.

151. Al-Harbi, Landi and Kalidindi: 'Multi-scale modeling of the elastic response of a structural component made from a composite material using the materials knowledge system', Modell. Simul. Mater. Sci. Eng., 2012, 20, (5), 055001.

152. Landi and Kalidindi: 'Thermo-elastic localization relationships for multi-phase composites', Comput. Mater. Continua, 2010, 16, (3), 273–293.

153. Kalidindi, Niezgoda, Landi, Vachhani and Fast: 'A novel framework for building materials knowledge systems', Comput. Mater. Continua, 2010, 17, (2), 103–125.

154. Landi, Niezgoda and Kalidindi: 'Multi-scale modeling of elastic response of three-dimensional voxel-based microstructure datasets using novel DFT-based knowledge systems', Acta Mater., 2010, 58, (7), 2716–2725.

155. Yabansu, Patel and Kalidindi: 'Calibrated localization relationships for elastic response of polycrystalline aggregates', Acta Mater., 2014, 81, 151–160.

156. Kröner: 'Bounds for effective elastic moduli of disordered materials', J. Mech. Phys. Solids, 1977, 25, (2), 137–155.

157. Binci, Fullwood and Kalidindi: 'A new spectral framework for establishing localization relationships for elastic behavior of composites and their calibration to finite-element models', Acta Mater., 2008, 56, (10), 2272–2282.

158. Kalidindi, Binci, Fullwood and Adams: 'Elastic properties closures using second-order homogenization theories: case studies in composites of two isotropic constituents', Acta Mater., 2006, 54, (11), 3117–3126.

159. Oppenheim, Schafer and Buck: 'Discrete-time signal processing'; 1999, Englewood Cliffs, NJ, Prentice Hall.

160. Anglin, Lebensohn and Rollett: 'Validation of a numerical method based on fast Fourier transforms for heterogeneous thermoelastic materials by comparison with analytical solutions', Comput. Mater. Sci., 2014, 87, 209–217.

161. Lebensohn, Kanjarla and Eisenlohr: 'An elasto-viscoplastic formulation based on fast Fourier transforms for the prediction of micromechanical fields in polycrystalline materials', Int. J. Plast., 2012, 32–33, 59–69.

162. Moulinec and Suquet: 'A numerical method for computing the overall response of nonlinear composites with complex microstructure', Comput. Methods Appl. Mech. Eng., 1998, 157, (1–2), 69–94.

163. Bunge: 'Texture analysis in materials science: mathematical methods'; 1993, Göttingen, Cuvillier Verlag.

164. Choi, Allen, Rosen, McDowell and Mistree: 'An inductive design exploration method for robust multiscale materials design', J. Mech. Des., 2008, 130, (3), 031402.