## I Introduction

Fast and accurate emulation of atomic forces and energies is essential for accessing the microscopic details of chemical and biological events via molecular simulation. Classical molecular dynamics (cMD) relies on a pre-defined force field with semi-empirical forms of the potential energy, which often lacks accuracy, while *ab initio* MD (AIMD) sacrifices computational efficiency for a higher level of accuracy. In principle, machine learning (ML) can provide a surrogate model that combines the accuracy of AIMD with the computational efficiency of cMD, thus promising new applications that would not be accessible with conventional methods. While recent years have witnessed enormous development of ML potentials, the field is still rapidly evolving, and many theoretical and computational issues remain to be addressed before potential-energy surfaces can be represented efficiently and robustly in large-scale simulation from electronic structure calculations.

The essential task in constructing an ML potential is to develop an efficient map between the potential-energy surface and atomic coordinates. Deep neural networks (DNNs) and kernel ridge regression (KRR) models that encode physical symmetries are popular tools to emulate AIMD simulations containing a large number of single atoms or simple small molecules (such as H$_2$O) bartok2013representing; bartok2018machine; lu202186. The task becomes more challenging for molecules that contain tens to hundreds of atoms, due to heterogeneous effects from different atom types, chemical bonds between atoms, and so on. Some effective machine learning approaches have been developed for emulating the dynamics of single molecules containing a few types of atoms. The KRR approach based on pairwise interatomic distances and nuclear charges, for instance, was proposed to emulate potential energies of single organic molecules rupp2012fast. DNN architectures were also developed to emulate the dynamics of single molecules gilmer2017neural, where large sets of training samples were used to achieve highly accurate predictions. KRR typically requires fewer samples for accurate predictions, as a closed-form expression of the KRR predictor is available, whereas a DNN typically requires optimizing a large number of parameters. Yet KRR has a known computational limit, as the number of computational operations increases cubically with the number of simulated runs, prohibiting the use of a large training sample. It was recently found that atomic forces can also be predicted accurately, and that combining force and energy samples can improve the predictive accuracy of potential energies in AIMD simulation chmiela2017machine; chmiela2018towards; schutt2018schnet; zhang2018deep; christensen2020fchl. The approach, called gradient-domain machine learning (GDML) chmiela2017machine, starts from the principle of the conservation of energy, where the force $\mathbf{F}_i$ on each atom is related to the potential energy by

$$\mathbf{F}_i(\mathbf{R}) = -\nabla_{\mathbf{r}_i} E(\mathbf{R}), \qquad (1)$$

where $\mathbf{R} = (\mathbf{r}_1, \dots, \mathbf{r}_N)^T$ is a matrix of atomic Cartesian coordinates for a system with $N$ atoms, and $\mathbf{r}_i$ denotes the 3D coordinates of atom $i$. The pairwise inverse distances of atom positions are often used to construct the descriptor of a molecule,

$$D_{ij}(\mathbf{R}) = \frac{1}{\lVert \mathbf{r}_i - \mathbf{r}_j \rVert}, \qquad i > j, \qquad (2)$$

where $\lVert \cdot \rVert$ denotes the Euclidean distance. The similarity of potential energies between two molecules with atom positions $\mathbf{R}$ and $\mathbf{R}'$ can be expressed by a kernel function $k(\mathbf{x}, \mathbf{x}')$ of the vectorized descriptors $\mathbf{x}$ and $\mathbf{x}'$. By the conservation of energy in Equ. (1), the covariance of atomic forces is the Hessian of this kernel, $\mathbf{K}_H$, whose elements are second derivatives of $k(\mathbf{x}, \mathbf{x}')$ with respect to the atomic coordinates. Given $n$ configurations of a molecule, each containing $N$ atoms, parameter estimation and prediction of the atomic forces from KRR with the conservation-of-energy principle chmiela2017machine; chmiela2019sgdml involve constructing and inverting a $3Nn \times 3Nn$ Hessian covariance matrix. The computational cost of constructing this covariance matrix scales as $\mathcal{O}(N^2n^2)$ and the cost of its inversion scales as $\mathcal{O}(N^3n^3)$, both increasing rapidly with the number of atoms or the number of available simulation runs. This immense computational complexity limits the surrogate model to predicting forces in simulations of small single molecules with modestly small training sample sizes.
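As a concrete illustration, the inverse pairwise-distance descriptor of Equ. (2) takes only a few lines; this is a minimal NumPy sketch (the function name and the 3-atom toy coordinates are ours, not from the GDML code):

```python
import numpy as np

def inverse_distance_descriptor(r):
    """Inverse pairwise-distance descriptor of Equ. (2).

    r: (N, 3) array of Cartesian atomic coordinates.
    Returns the N(N-1)/2 entries 1/||r_i - r_j|| for i < j.
    """
    i, j = np.triu_indices(r.shape[0], k=1)
    return 1.0 / np.linalg.norm(r[i] - r[j], axis=1)

# Toy 3-atom configuration (hypothetical coordinates, arbitrary units)
r = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
desc = inverse_distance_descriptor(r)  # [1/d01, 1/d02, 1/d12]
```

Since the descriptor depends only on interatomic distances, it is invariant to global rotations and translations of the molecule, which is one reason this choice is common.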

The second point of interest in ML is predicting molecular configurations (e.g., protein structures) from the potential-energy landscape. To achieve this goal, we need an efficient emulator of the potential-energy surface of large molecules or protein structures. However, the KRR approach that satisfies the conservation of energy chmiela2017machine; chmiela2018towards; christensen2020fchl incurs a large computational cost for predicting the potential energies. This is caused by inverting the large covariance matrix of force vectors, which is of order $\mathcal{O}(N^3n^3)$, where $n$ is the number of simulated (training) runs of a molecular configuration, each containing $N$ atoms. This computational bottleneck limits applications to energy prediction for single molecules with approximately 20 atoms or fewer. For molecules with more atoms, or systems with multiple molecules, a faster approach is needed to bypass the computational bottleneck in simulation and emulation.

To reduce the computational cost associated with inverting this enormous covariance matrix in conventional KRR approaches for force emulation, we propose a new data-driven approach, called the atomized force field (AFF) emulator, which reduces the number of computational operations while maintaining high accuracy in predicting forces and energies. We outline three unique features of our approach. First, we partition the atoms into permutationally equivalent (PE) atom sets (formally defined later), as the correlation of atomic forces between different atom sets is found to be almost negligible. Thus, we can decompose the large covariance matrix of the simulated force vectors into small sub-covariance matrices for each permutationally distinguishable atom, which contain most of the information for parameter estimation and prediction. This feature reduces the computational cost of matrix inversion from $\mathcal{O}(N^3n^3)$ in the GDML approach chmiela2017machine to $\mathcal{O}(Nn^3)$ in the AFF model. Second, inspired by the popular inducing-input approach for approximating Gaussian processes snelson2006sparse; wilson2015kernel, we develop a new model that approximates the prediction of the potential energy of a molecular configuration by the simulated energy vector and the forces that are strongly correlated with this configuration, which reduces the computational cost of emulating the potential energy.
For molecules without permutational symmetries, or with a small number of permutational symmetries, the computational cost of computing the likelihood and the predictions of forces and the potential energy in the AFF model is $\mathcal{O}(Nn^3)$, much smaller than the $\mathcal{O}(N^3n^3)$ operations required in the GDML approach chmiela2017machine, and the cost of constructing the required covariance matrices is reduced accordingly. Compared to some recent DNN approaches, the AFF model requires far fewer samples, typically ranging from a few hundred to a thousand, to achieve high predictive accuracy, which makes it an attractive approach for emulating computationally expensive AIMD simulations. Finally, our model provides both predictions and uncertainty quantification, as any quantile of the predictive distribution has a closed-form expression. Quantifying the uncertainty in predictions is critically important in inverse problems, such as optimizing molecular structures under constraints on physical properties. Based on these new features, we are able to accurately reproduce atomic force vectors and potential energies from simulations of molecules with more atoms given the same computational budget. For instance, we can accurately emulate AIMD simulations of molecules with around 50 atoms within only tens of seconds on a desktop computer, and quantify the uncertainties of both the predicted forces and energies.

Although the AFF model is motivated by imposing a physically informed sparse structure on the covariance matrix to simplify computation, it maintains key physical ingredients of the more computationally intensive approaches, such as the GDML approach and its variants chmiela2017machine; chmiela2018towards. The kernel function of the force vectors, for instance, is the same in the GDML and AFF models, and it is derived from the conservation of energy in Equ. (1). Moreover, the prediction for each atom in the AFF model depends on the information of all other atoms in a molecular configuration, expressed through the pairwise distances of the atoms' positions. Thus, our approach should not be interpreted as a conventional method that captures only local atomic information. Empirical results on various examples show that our method is more efficient and more accurate than the alternatives on the same held-out data sets.

The rest of the article is organized as follows. We first present the motivation of the AFF method in Sec. II. The formal definition of PE atoms and an algorithm for finding PE atom sets are introduced in Sec. II.1. In Sec. II.2, we give a detailed introduction to the AFF framework for force prediction. Then, in Sec. II.3, we discuss the details of our approach to energy prediction. Finally, in Sec. III, we apply our method to emulate atomic force vectors and potential energies in ab initio simulation, and demonstrate that our approach provides higher predictive accuracy and reliable uncertainty assessment compared to other widely used methods. The conclusions of this study and potential research directions are provided in Sec. IV.

## II Methodology

Let us consider a molecule with $N$ atoms and denote the atomic coordinates of two configurations of this molecule as $\mathbf{R}$ and $\mathbf{R}'$. Assume that the vectorized descriptors $\mathbf{x}$ and $\mathbf{x}'$ are used as input variables, and that the correlation of the potential energies of the two configurations is encoded by a kernel function, denoted as $k(\mathbf{x}, \mathbf{x}')$, where the explicit form of the kernel function is discussed in Sec. II.2. By applying the connection between potential energy and force vectors in Equ. (1), the Hessian covariance matrix of atomic forces, denoted as $\mathbf{K}_H$, has $(i,j)$th block $\partial^2 k(\mathbf{x}, \mathbf{x}') / \partial \mathbf{r}_i\, \partial \mathbf{r}'_j$, where $\mathbf{r}_i$ and $\mathbf{r}'_j$ denote the coordinates of the $i$th atom in $\mathbf{R}$ and the $j$th atom in $\mathbf{R}'$, respectively. Each term of the Hessian covariance matrix can be written explicitly by the chain rule (see Appendix A), which suggests that the correlation of force vectors between different atoms in any two configurations is small in general.

Part (b) in Fig. 1 gives the empirical covariance of force vectors for three configurations of uracil in the MD17 dataset chmiela2017machine. Note that the correlation between the force vectors of the same atom in the three simulations is relatively large, whereas the correlation of force vectors between different atoms is close to zero, which is consistent with the mathematical results shown in Appendix A. Since the correlation between atomic forces on different atoms of uracil is small, we can construct separate emulators of the force vectors based on the sub-covariance matrices of the force vectors of the same atom in different configurations. Note that the computational cost of constructing a separate emulator for each atom is much smaller than that of a joint emulator for all atoms, such as the GDML approach, due to the difference in the sizes of the covariance matrices. Thus our approach is more computationally efficient than GDML and sGDML, while retaining the large correlations between force vectors that carry most of the information for prediction. Hence, more simulated runs can be used to train the model to improve predictive accuracy given the same computational budget, and the approach can be applied to predicting atomic forces of molecules with many more atoms.
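The empirical check behind Fig. 1(b) amounts to computing the sample correlation of the flattened force vectors across configurations. A minimal sketch with synthetic data (the array shapes and the random stand-in forces are our assumptions, not the MD17 file format):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for simulated forces: 500 configurations of a
# 2-atom system, flattened to 3N = 6 force components per configuration.
F = rng.standard_normal((500, 6))

# Sample correlation across configurations (columns = force components),
# the quantity visualized in Fig. 1(b); for real simulations, blocks
# linking different (non-PE) atoms are found to be close to zero.
C = np.corrcoef(F, rowvar=False)
```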

Another improvement comes from incorporating physical symmetries into the emulator. Atoms in a molecule may rotate, and they may switch positions when recorded in simulation. Emulators that encode this symmetry information can typically achieve higher predictive accuracy xie2010permutationally; bartok2013representing; jiang2014permutation; chmiela2018towards; koner2020permutationally. The covariance function of force vectors in the sGDML approach chmiela2018towards, for example, was adjusted to satisfy the permutational symmetries of the system. Here we use a similar approach to find the permutational symmetries, and then extend it to define permutationally equivalent (PE) groups of atoms. Unlike uracil, other molecules in the MD17 dataset chmiela2017machine, such as benzene, aspirin, and naphthalene, contain a few PE groups. After grouping the PE atoms and parameterizing the covariance by the permutationally symmetric kernel function (formally defined in Sec. II.1), we are able to capture the large absolute covariance between these atoms. This feature empirically improves the predictive accuracy, as will be shown in Sec. III.

The key idea of the new approach, called the atomized force field (AFF) emulator, is to partition atoms into different PE groups and to encode the large correlation between force vectors of PE atoms at different configurations into the model. Note that the descriptor of each atom still contains information from all other atoms in the configuration, so our approach does not merely approximate the local structure around each atom. Rather, it retains only the non-negligible elements of the Hessian covariance matrix across the different configurations in the training sample, to reduce the computational challenges in emulation. We next discuss partitioning the atomic space to obtain PE subsets of atoms.

### II.1 Permutationally equivalent set

We first define a group of atoms to be permutationally equivalent (PE) if they are interchangeable through a permutational operation. We call atoms from different PE sets permutationally distinct atoms. The AFF approach predicts the force on an atom in a molecule based on the forces of its PE group of atoms rather than of all atoms in the molecule. The near-zero elements of the Hessian kernel matrix in Fig. 1 indicate that we may not need to include all atoms in a large covariance matrix for predicting atomic forces, which yields computational efficiency in emulation. Note that the force on an atom is expected to be similar to those on other atoms in its PE atom set, and this similarity can be captured by correlation in the model. For example, all four hydrogen atoms in methane (CH$_4$) form one set of PE atoms, while the carbon atom itself is another PE set, as the coordinates of all hydrogen atoms are interchangeable under the permutation symmetries. Benzene (C$_6$H$_6$), as another example, comprises just two sets of PE atoms: the first PE set containing the six carbon atoms and the second containing the six hydrogen atoms. By contrast, all twelve atoms in a uracil molecule (C$_4$H$_4$N$_2$O$_2$) are permutationally unique, which leads to twelve PE atom sets in uracil.

PE sets of atoms can be found by minimizing a loss function over the permutation matrix $\mathbf{P}$ umeyama1988eigendecomposition; chmiela2019sgdml,

$$\arg\min_{\mathbf{P}} \lVert \mathbf{P}\mathbf{A}\mathbf{P}^T - \tilde{\mathbf{A}} \rVert^2, \qquad (3)$$

where $\mathbf{A}$ and $\tilde{\mathbf{A}}$ are adjacency matrices of two isomorphic molecular graphs, with entries given by pairwise interatomic distances. By analyzing the index locations in the permutation matrices of all permutation symmetries of the same type of molecule, we can partition the $N$ atoms of a molecule into $M$ sets of PE atoms, where the $i$th set contains $N_i$ atoms, for $i = 1, \dots, M$.
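For intuition, the partition can also be computed by brute force: enumerate the permutations that leave the adjacency matrix invariant and collect the orbit of each atom. The sketch below is our own code, feasible only for very small graphs (the eigendecomposition-based matching of umeyama1988eigendecomposition is the practical route); it recovers the PE sets of a water-like graph:

```python
import numpy as np
from itertools import permutations

def pe_sets(adj):
    """Partition atoms into PE sets as orbits of the automorphism group
    of the adjacency matrix (brute force over all N! permutations)."""
    n = adj.shape[0]
    autos = [np.array(p) for p in permutations(range(n))
             if np.array_equal(adj[np.ix_(np.array(p), np.array(p))], adj)]
    orbits, seen = [], set()
    for i in range(n):
        if i not in seen:
            orbit = sorted({int(p[i]) for p in autos})
            seen.update(orbit)
            orbits.append(orbit)
    return orbits

# Water-like graph: atom 0 (O) bonded to atoms 1 and 2 (H); the two
# hydrogens are interchangeable, so they form one PE set.
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]])
sets = pe_sets(adj)  # -> [[0], [1, 2]]
```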

Note that atoms in a PE set may exchange positions (e.g., through rotation), and thus their forces may be recorded in a different order in simulation. The Euclidean distance between the inverse pairwise-distance descriptors in Equ. (2) cannot capture the similarity between two atomic forces in this scenario. To represent the large similarity (correlation) between forces of atoms in a PE set, one may permute the positions of atoms through the permutationally symmetric (PS) kernel function proposed in chmiela2019sgdml:

$$k_{\rm sym}(\mathbf{x}, \mathbf{x}') = \frac{1}{S} \sum_{s=1}^{S} k\big(\mathbf{x}_{\mathbf{P}_s}, \mathbf{x}'\big), \qquad (4)$$

where $S$ is the number of permutation symmetries found by Equ. (3), $\mathbf{P}_s$ is the permutation matrix of the $s$th permutation symmetry, and $\mathbf{x}_{\mathbf{P}_s}$ denotes the descriptor computed after permuting the atomic coordinates by $\mathbf{P}_s$. Note that for a molecule like uracil, where no nontrivial permutation symmetry exists, we have $S = 1$ with $\mathbf{P}_1$ the identity, and the kernel reduces to a conventional Hessian kernel. The correlation between atom $i$ of $\mathbf{R}$ and atom $j$ of $\mathbf{R}'$ under the PS kernel function is the average of the corresponding Hessian covariance terms over the $S$ permutation symmetries.
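The averaging in Equ. (4) can be sketched directly. Below we use a squared-exponential kernel as a stand-in for the Matérn kernel of Sec. II.2, and a hypothetical water-like 3-atom system whose two hydrogens form a PE set; the point of the check is that relabeling the PE atoms leaves the PS kernel value unchanged:

```python
import numpy as np

def inv_dist(r):
    """Inverse pairwise-distance descriptor, Equ. (2)."""
    i, j = np.triu_indices(len(r), k=1)
    return 1.0 / np.linalg.norm(r[i] - r[j], axis=1)

def base_kernel(u, v, ell=1.0):
    """Squared-exponential stand-in for the Matérn kernel of Equ. (7)."""
    return float(np.exp(-np.sum((u - v) ** 2) / (2.0 * ell ** 2)))

def ps_kernel(rx, ry, perms):
    """PS kernel of Equ. (4): average the base kernel over the S recovered
    permutation symmetries; with S = 1 (identity only, as for uracil) it
    reduces to the plain kernel."""
    return float(np.mean([base_kernel(inv_dist(rx[p]), inv_dist(ry))
                          for p in perms]))

# Water-like toy system: O at index 0, H at indices 1 and 2 (PE atoms)
perms = [np.array([0, 1, 2]), np.array([0, 2, 1])]  # identity + H-swap
rx = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [0.0, 0.96, 0.0]])
ry = np.array([[0.0, 0.0, 0.1], [1.00, 0.0, 0.0], [0.0, 0.90, 0.0]])

# Relabeling the two hydrogens of rx leaves the PS kernel unchanged
assert np.isclose(ps_kernel(rx, ry, perms), ps_kernel(rx[[0, 2, 1]], ry, perms))
```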

As shown in Fig. 1 (a) and (d), when permutation symmetries are found by Equ. (3), the absolute correlation between the atoms in a PE set under the PS kernel in Equ. (4) is much larger than zero. We found that grouping the PE atoms and using all PE atoms significantly improves the predictive accuracy of atomic forces, compared with the approach that treats each atom as its own set. This result is sensible, as the forces on PE atoms are similar, and the correlation of forces from the PS kernel between PE atoms can capture this similarity. In contrast, the conventional Hessian kernel does not encode the permutational symmetries into the model.

As mentioned before, the sGDML approach groups all atoms into one set for prediction, and thus its computational cost is high, as it conditions on all atoms in all simulated configurations. Note that the correlation of forces between atoms from different PE sets is close to zero, compared to the correlation between atoms from the same PE set. This feature allows us to define a marginal model of the forces of atoms in each PE set separately, which substantially reduces the computational complexity.

### II.2 Atomized force field model

Consider a molecule with $N$ atoms grouped into $M$ PE atom sets, each set containing $N_i$ atoms, for $i = 1, \dots, M$. To avoid inverting the $3Nn \times 3Nn$ covariance matrix required to compute the predictive distribution in the GDML and sGDML approaches, we decompose the large covariance matrix and construct predictive models for each PE atom set in parallel. Let $\mathbf{R}_1, \dots, \mathbf{R}_n$ be $n$ configurations of this molecule that have been simulated before (henceforth, training configurations), with vectorized descriptors $\mathbf{x}_1, \dots, \mathbf{x}_n$. Denote the forces on the atoms of the $i$th PE set in the $n$ training configurations by a vector $\mathbf{F}_i = (\mathbf{F}_{i,1}^T, \dots, \mathbf{F}_{i,n}^T)^T$, where $\mathbf{F}_{i,j}$ is the $3N_i$-dimensional atomic force vector of the $i$th PE set at the $j$th training configuration.

For a new molecular configuration with descriptor $\mathbf{x}^*$, the KRR estimator minimizes a loss function that penalizes both the squared-error fitting loss and the complexity of the latent function simultaneously rasmussen2006gaussian, leading to a weighted average of the force vectors at the training configurations:

$$\hat{\mathbf{F}}_i(\mathbf{x}^*) = \mathbf{K}_i^* \big(\mathbf{K}_i + \lambda\, \mathbf{I}_{3N_in}\big)^{-1} \mathbf{F}_i, \qquad (5)$$

where $\mathbf{I}_{3N_in}$ is an identity matrix of size $3N_in$. Here $\mathbf{K}_i$ is a $3N_in \times 3N_in$ covariance matrix whose $(j,k)$th block is the Hessian matrix of the kernel function $k_{\rm sym}(\mathbf{x}_j, \mathbf{x}_k)$; $\lambda$ is an estimated regularization parameter; and $\mathbf{K}_i^*$ is a $3N_i \times 3N_in$ matrix whose $j$th block is the Hessian of $k_{\rm sym}(\mathbf{x}^*, \mathbf{x}_j)$. Note that the KRR estimator in Equ. (5) is the predictive mean of a Gaussian process (GP) regression model rasmussen2006gaussian; kanagawa2018gaussian. In addition to a point prediction for an untested run, the GP model also provides an uncertainty assessment of the predictions, as any quantile or interval of the predictions has a closed-form expression.

Here we construct a GP model for approximating the expensive simulation of forces and potential energies for each PE atom set in parallel. For any set of inputs, the marginal distribution of the force vector $\mathbf{F}_i$ follows a multivariate normal distribution:

$$\mathbf{F}_i \sim \mathcal{MN}\big(\mathbf{0},\; \sigma^2_i (\mathbf{K}_i + \eta\, \mathbf{I}_{3N_in})\big), \qquad (6)$$

for $i = 1, \dots, M$, where $\sigma^2_i$ is a variance parameter for the $i$th PE set, and $\eta$ is a nugget parameter shared across all PE sets. We allow the variance parameter to differ across PE sets, as the scale of the forces can vary significantly between PE atom sets, especially for atoms of different types. The range and nugget parameters are assumed to be the same across PE atom sets, as the smoothness of the latent function mapping atom positions to forces is approximately the same across atom sets. The computational complexity of the predictive mean in a GP emulator with shared kernel and nugget parameters across atom sets is smaller than that of a GP emulator with different parameters Gu2016PPGaSP. Similar assumptions were used in constructing the parallel partial Gaussian process emulator for computationally expensive computer simulations with massive outputs Gu2016PPGaSP.

The power-exponential and Matérn covariance functions are widely used in GP emulators rasmussen2006gaussian. The Matérn kernel with roughness parameter 5/2 is the default covariance function in several GP emulator packages roustant2012dicekriging; gu2018robustgasp, as well as in the GDML approach for energy-conserving force-field emulation chmiela2017machine, as the sample path of the process is twice differentiable, leading to relatively accurate predictions for both rough and smooth functional data. Here we also use the Matérn kernel with roughness parameter 5/2 to define the kernel function:

$$k(\mathbf{x}, \mathbf{x}') = \left(1 + \frac{\sqrt{5}\, d}{\gamma} + \frac{5 d^2}{3 \gamma^2}\right) \exp\!\left(-\frac{\sqrt{5}\, d}{\gamma}\right), \qquad (7)$$

where $\gamma$ is the range parameter and $d = \lVert \mathbf{x} - \mathbf{x}' \rVert$ is the Euclidean distance between the vectorized descriptors $\mathbf{x}$ and $\mathbf{x}'$. Similar to the adjustment of the kernel function used in the sGDML approach chmiela2018towards, we transform the Matérn kernel into the PS kernel function of Equ. (4) in the AFF emulator, to capture the permutational symmetries between PE atoms. Conditional on $\gamma$ and $\eta$, the maximum likelihood estimator (MLE) of the variance is $\hat{\sigma}^2_i = \mathbf{F}_i^T (\mathbf{K}_i + \eta\, \mathbf{I}_{3N_in})^{-1} \mathbf{F}_i / (3N_in)$ for the $i$th PE atom set. The nugget parameter $\eta$ and the inverse range parameter $1/\gamma$ can be estimated by numerically optimizing the profile likelihood, or through cross-validation with respect to the squared-error loss of the predictions. When the number of training configurations is small, the marginal posterior mode may be used to avoid unstable estimation of the range and nugget parameters gu2018jointly.
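Equ. (7) translates directly into code; a minimal sketch (our naming) evaluating the Matérn-5/2 correlation as a function of the descriptor distance $d$ and range parameter $\gamma$:

```python
import numpy as np

def matern52(d, gamma):
    """Matérn correlation with roughness parameter 5/2, Equ. (7)."""
    a = np.sqrt(5.0) * d / gamma   # note a**2 / 3 equals 5 d^2 / (3 gamma^2)
    return (1.0 + a + a ** 2 / 3.0) * np.exp(-a)
```

The correlation equals 1 at $d = 0$ and decays monotonically with $d$, and the corresponding process is twice differentiable, matching the smoothness argument above.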

Conditional on the estimated parameters $(\hat{\sigma}^2_i, \hat{\gamma}, \hat{\eta})$, the predictive distribution of the forces on the atoms in the $i$th PE atom set at any new configuration with descriptor $\mathbf{x}^*$ follows a multivariate normal distribution

$$\mathbf{F}_i(\mathbf{x}^*) \mid \mathbf{F}_i \sim \mathcal{MN}\big(\hat{\mathbf{F}}_i(\mathbf{x}^*),\; \hat{\sigma}^2_i\, \mathbf{K}_i^{**}\big), \qquad (8)$$

where the predictive mean vector and predictive covariance matrix follow

$$\hat{\mathbf{F}}_i(\mathbf{x}^*) = \mathbf{K}_i^* \big(\mathbf{K}_i + \hat{\eta}\, \mathbf{I}_{3N_in}\big)^{-1} \mathbf{F}_i, \qquad (9)$$

$$\mathbf{K}_i^{**} = \mathbf{K}(\mathbf{x}^*, \mathbf{x}^*) - \mathbf{K}_i^* \big(\mathbf{K}_i + \hat{\eta}\, \mathbf{I}_{3N_in}\big)^{-1} (\mathbf{K}_i^*)^T, \qquad (10)$$

with $\mathbf{K}(\mathbf{x}^*, \mathbf{x}^*)$ being the Hessian matrix of the kernel function at $(\mathbf{x}^*, \mathbf{x}^*)$. Note that the predictive mean in Equ. (9) is exactly the KRR estimator in Equ. (5), with the nugget playing the role of the regularization parameter. The advantage here is the quantified uncertainty of the predictions, as the predictive distribution follows a multivariate normal distribution. Uncertainty assessment of predictions is important, for example, in determining a sequential design for efficiently emulating computationally expensive simulations.
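Equs. (9)-(10) are the standard GP conditioning formulas applied once per PE atom set; a generic numerical sketch via a Cholesky factorization (the function name and the 1D toy check are ours):

```python
import numpy as np

def gp_predict(K, k_star, k_ss, y, eta):
    """Predictive mean (Equ. (9)) and covariance (Equ. (10)) for one set.

    K      : (m, m) training covariance of the set's force components
    k_star : (m, p) cross-covariance with the new configuration
    k_ss   : (p, p) prior covariance at the new configuration
    eta    : nugget parameter
    """
    L = np.linalg.cholesky(K + eta * np.eye(K.shape[0]))  # O(m^3), per set
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = k_star.T @ alpha
    v = np.linalg.solve(L, k_star)
    return mean, k_ss - v.T @ v

# 1D sanity check: with a tiny nugget, predicting at a training input
# reproduces the training value, with near-zero predictive variance.
X = np.array([0.0, 1.0])
y = np.array([0.0, 1.0])
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2)
k_star = np.exp(-0.5 * (X - 0.0) ** 2)[:, None]  # new input at x* = 0
mean, cov = gp_predict(K, k_star, np.array([[1.0]]), y, 1e-10)
```

Factorizing once and reusing the factor for every prediction is what keeps the per-set cost at a single cubic solve.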

Note that the computational cost of the AFF model is much smaller than that of GDML, sGDML, or a joint model using both forces and energies chmiela2017machine; chmiela2018towards; christensen2020fchl, as the covariance matrix of the AFF model is of size $3N_in \times 3N_in$ for the $i$th PE atom set, which is much smaller than the $3Nn \times 3Nn$ covariance matrix in the GDML or sGDML model. Computing the likelihood function and the predictive mean of the AFF model costs only $\mathcal{O}(n^3 \sum_{i=1}^{M} N_i^3)$ for $n$ training configurations of a molecule with $M$ PE atom sets, whereas the computational cost of the predictive mean from GDML or sGDML is $\mathcal{O}(N^3n^3)$ in comparison.

### II.3 Predicting potential energy through the AFF model

Emulating the energy based on both simulated force vectors and energies can also incur high computational costs, due to the inversion of the large joint covariance matrix of the simulated forces and energies christensen2020fchl. Here we introduce a computationally feasible approach to emulate the potential energy. For any molecule with atomic configuration $\mathbf{R}^*$, the potential energy $E^*$ correlates with the vector of potential energies from previously simulated molecular configurations, $\mathbf{E} = (E_1, \dots, E_n)^T$, and with the atomic force $\mathbf{F}^*$ at this molecular configuration (noting that $\mathbf{F}^*$ is not observed). Conditional on $\mathbf{E}$ and $\mathbf{F}^*$, the correlation between $E^*$ and the forces at other configurations is small. Thus the predictive distribution of $E^*$ conditional on both the simulated energies and all atomic forces can be approximated by conditioning on $(\mathbf{E}, \mathbf{F}^*)$ only, where $\mathbf{F}^*$ can be estimated by the predictive distribution of the AFF model discussed in Sec. II.2. The motivation of this method is related to the inducing-point approximation of Gaussian processes wilson2015kernel; snelson2006sparse, where, given the outcomes of a function at a set of well-chosen pseudo-inputs, the predictive distribution of the outcome at a new input is assumed to be conditionally independent of the outputs in the training dataset. Here the induced points for $E^*$ are $(\mathbf{E}, \mathbf{F}^*)$, due to the large correlation between these variables. Conditional on $(\mathbf{E}, \mathbf{F}^*)$, we assume the energy at this configuration is approximately independent of the force vectors at the other training configurations. This simplification avoids constructing and inverting the large Hessian covariance matrix of force vectors, allowing us to invert a covariance matrix whose size does not grow with the product of the number of atoms and the number of training runs when computing the predictive distribution of the energy at this configuration. When we need to predict the energy at many new molecular configurations, the inversion of the $n \times n$ sub-covariance matrix of the simulated energies is shared among all predictive distributions, so it only needs to be computed once.
Details of efficient computation for predictive distributions of energy, and model extension by conditioning on a small batch of force vectors (instead of only one force vector) to improve predictive accuracy, are discussed in Appendix B.
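At its core, Equ. (12) is a single Gaussian conditioning step: the joint normal vector is split into an observed block (the $n$ training energies plus the predicted force at the new configuration) and the unobserved energy. A generic sketch of that operation (function name ours), verified on a bivariate example with known conditional moments:

```python
import numpy as np

def condition_gaussian(mu, Sigma, obs_idx, y_obs):
    """Mean and covariance of the unobserved entries of N(mu, Sigma),
    given that the entries obs_idx equal y_obs. The AFF energy predictor
    applies this with a small observed block, so the matrix inverted here
    stays small instead of growing with the number of training forces."""
    rest = [i for i in range(len(mu)) if i not in obs_idx]
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]
    S_ro = Sigma[np.ix_(rest, obs_idx)]
    mean = mu[rest] + S_ro @ np.linalg.solve(S_oo, y_obs - mu[obs_idx])
    cov = Sigma[np.ix_(rest, rest)] - S_ro @ np.linalg.solve(S_oo, S_ro.T)
    return mean, cov

# Bivariate check with correlation 0.5, observing the second coordinate:
# the classic result is E[x0 | x1 = 1] = 0.5 and Var[x0 | x1 = 1] = 0.75.
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
mean, cov = condition_gaussian(mu, Sigma, [1], np.array([1.0]))
```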

For any molecule with atomic coordinates $\mathbf{R}^*$, denote the combined vector of energy and force by $\mathbf{z}^* = (E^*, (\mathbf{F}^*)^T)^T$, where $E^*$ is the potential energy of the molecule with atomic coordinates $\mathbf{R}^*$, and $\mathbf{F}^*$ is the force-field vector of this molecule. Assuming a GP model for the potential energy with kernel function $k(\cdot, \cdot)$ and a constant mean, the random vector $(\mathbf{E}^T, (\mathbf{z}^*)^T)^T$ follows a multivariate normal distribution:

$$\begin{pmatrix} \mathbf{E} \\ \mathbf{z}^* \end{pmatrix} \sim \mathcal{MN}\big(\boldsymbol{\mu},\; \boldsymbol{\Sigma}\big), \qquad (11)$$

where the mean vector follows $\boldsymbol{\mu} = (\mu\, \mathbf{1}_{n+1}^T, \mathbf{0}_{3N}^T)^T$, with $\mu$ assumed to be an unknown constant (estimated by MLE in this work), and $\mathbf{0}_{3N}$ a zero vector of length $3N$, as differentiating the constant mean gives a zero prior mean for the forces. The covariance matrix in Equ. (11) follows

$$\boldsymbol{\Sigma} = \sigma^2 \begin{pmatrix} \mathbf{C} & \mathbf{c}^* & \mathbf{G} \\ (\mathbf{c}^*)^T & 1 & (\mathbf{g}^*)^T \\ \mathbf{G}^T & \mathbf{g}^* & \mathbf{K}^{**} \end{pmatrix},$$

where the upper-left blocks form the correlation of the energy vectors $(\mathbf{E}^T, E^*)^T$. The $(j,l)$th element of the correlation matrix $\mathbf{C}$ is $k(\mathbf{x}_j, \mathbf{x}_l)$, for $j, l = 1, \dots, n$, and $k(\mathbf{x}^*, \mathbf{x}^*) = 1$ (denoting the correlation of the total energy of the configuration with itself). The vector $\mathbf{c}^*$ denotes the correlation between the potential energy at input $\mathbf{x}^*$ and the potential energies at the training inputs $\mathbf{x}_1, \dots, \mathbf{x}_n$. The correlation matrix between the forces at $\mathbf{x}^*$ is denoted by $\mathbf{K}^{**}$. Finally, $\mathbf{G}$ and $\mathbf{g}^*$ denote the cross-correlations between energies and forces: $\mathbf{G}$ is an $n \times 3N$ matrix whose $j$th row is the cross-correlation between the energy at $\mathbf{x}_j$ and the force at $\mathbf{x}^*$, obtained by differentiating the kernel, and $\mathbf{g}^*$ is the cross-correlation between the force and the energy of the molecular configuration with atom positions $\mathbf{R}^*$.

Similar to the parameter estimation of the AFF model discussed in Sec. II.2, the mean and variance parameters can be estimated by the MLE based on the simulated (training) energy vector,

$$\hat{\mu} = \big(\mathbf{1}_n^T \mathbf{C}^{-1} \mathbf{1}_n\big)^{-1} \mathbf{1}_n^T \mathbf{C}^{-1} \mathbf{E}, \qquad \hat{\sigma}^2 = \frac{(\mathbf{E} - \hat{\mu} \mathbf{1}_n)^T \mathbf{C}^{-1} (\mathbf{E} - \hat{\mu} \mathbf{1}_n)}{n}.$$

The range parameter and the nugget parameter in the kernel function can be estimated through numerical optimization by cross-validation or MLE.

Based on the previous discussion, assuming that, given $(\mathbf{E}, \mathbf{F}^*)$, the energy $E^*$ is approximately independent of the rest of the force vectors, we have

$$E^* \mid \mathbf{E}, \mathbf{F} \;\stackrel{\cdot}{\sim}\; \mathcal{N}\big(\hat{E}^*,\; \hat{\sigma}^{*2}\big), \qquad (12)$$

where $\stackrel{\cdot}{\sim}$ denotes the approximation of the predictive distribution, and the predictive mean $\hat{E}^*$ is a weighted average of the training energies $\mathbf{E}$ and the forces:

$$\hat{E}^* = \hat{\mu} + \mathbf{w}_E^T (\mathbf{E} - \hat{\mu} \mathbf{1}_n) + \mathbf{w}_F^T\, \hat{\mathbf{F}}^*. \qquad (13)$$

Closed-form expressions of the weights $\mathbf{w}_E$ and $\mathbf{w}_F$ and the predictive variance $\hat{\sigma}^{*2}$ are derived in Appendix C.

In practice, a noise term is added to the distribution of $\mathbf{z}^*$ in Equ. (11), and the energies on the testing set are estimated in batches using the conditional distribution in Equ. (12). A particular benefit of our method is that we exploit the estimable information in the forces while avoiding computing the inverse of the gigantic kernel matrix of all training forces, which substantially simplifies the computation.

The comparison between our predictive model and the GDML model for predicting the energy is illustrated in Fig. 2. In GDML, as well as in sGDML and FCHL, all simulated energies and forces are used, but the inversion of the resulting large covariance matrix is computationally expensive. Here, conditional on the simulated energies and the force of a new molecular configuration, we assume the potential energy of the new molecule is independent of the forces of the other molecular configurations simulated before. Since our approach does not need to handle the full covariance matrix of the forces, our method is scalable to large molecules and, potentially, to emulating MD simulations with multiple molecules.

## III Results

We evaluate the performance of the AFF approach by analyzing the required training time and the learning curves on a variety of molecules, including benzene, uracil, and naphthalene from the MD17 dataset, and aspirin, alpha-glucose, and hexadecane from our simulated dataset. We compare the predictive error and required training time of AFF with some of the most commonly used KRR-based models, such as the GDML and sGDML approaches, for force and energy predictions. All comparisons are implemented on the same training and testing sets. In addition, we also provide an uncertainty assessment of the predictions from our model through the proportion of held-out outcomes covered by the predictive interval and the average length of the predictive interval (see Table 1 for details on the prediction accuracy, required training time, and uncertainty assessment of the AFF predictions). An efficient method should have a small mean squared error, a small training cost, a short average predictive interval, and around 95% of the held-out outcomes covered by the 95% predictive interval.

Previous studies have shown that GDML and sGDML have relatively small errors compared with other approaches chmiela2018towards; christensen2020fchl. Indeed, according to Fig. 3, the error of both approaches is relatively small. However, both GDML and sGDML have a large computational cost, mainly due to the inversion of the covariance matrix of the force vectors at all training configurations. Because of the reduced computational order of force prediction achieved by partitioning the atoms into PE atom sets, the AFF model has smaller predictive errors for forces (blue curves) when using similar or even less training time (blue histograms) than the GDML and sGDML approaches. The improved accuracy of force predictions by the AFF model is even more noticeable in the additional simulations of aspirin, alpha-glucose, and hexadecane shown in Fig. 3.

The sGDML approach typically has smaller error than GDML when at least two PE atom sets exist in the simulation, as for the benzene, aspirin, and naphthalene molecules, consistent with the results reported in the literature chmiela2018towards. This is because the PS kernel in Equ. (4) allows the sGDML approach to properly represent the large correlation of forces between atoms in a PE atom set. Note that the reduced computational complexity of AFF allows us to train our models with more observations than the GDML and sGDML approaches for predicting the forces, given the same computational budget. The number of training observations required by the AFF model, however, is still very small (from a few hundred to a thousand). This differs from some recent studies of deep neural network approaches schutt2018schnet; zhang2018deep; zhang2019embedded; unke2019physnet, where a much larger set of training observations is required. The computational cost of obtaining such a substantially larger simulation dataset for deep neural network approaches is not negligible.

Given the same number of observations, the error in predicting the potential energy by the AFF model is typically smaller than that of the sGDML and GDML approaches, as shown in the second row of Fig. 3. For some of the test molecules, such as naphthalene and benzene in the MD17 dataset, and aspirin and alpha-glucose in our simulated dataset, the AFF model has a much smaller predictive error than the sGDML approach. This is because our approach incorporates both force and energy vectors in energy prediction through a joint statistical model, making predictions of the energy based on both the simulated forces and energies. A joint model of forces and energies was recently studied in christensen2020fchl, but directly implementing the conditional prediction incurs a large computational cost. Here the approximation by the inducing-point approach introduced in Sec. II.3 allows us to keep the computational complexity of predicting the energy the same as that of predicting the forces, while maintaining predictive accuracy comparable to that of christensen2020fchl. For larger molecules, like alpha-glucose, aspirin, and hexadecane (with 21 to 51 atoms) in our simulated dataset, the reduction in computation is huge (see the blue histograms in Fig. 3). For these examples, AFF achieves higher accuracy in predicting the potential energy and atomic forces, despite costing less than 10% of the training time of the sGDML approach, as shown in Table 1. Therefore, our approach is potentially applicable to emulating the physical quantities of interest in simulations of larger molecules containing many more atoms.

Among all molecules we compared, the AFF model has a larger predictive error for uracil when using the same number of training inputs (third panel in the second row of Fig. 3). Since the AFF model estimates the energy based on the emulated atomic force, the accuracy of energy prediction is reduced when the emulated force is not accurate. As shown in Fig. 3, the force estimated by AFF for uracil is not accurate when the number of training samples is small. This problem can be solved by using a moderately large sample size (around 1000) to achieve predictions as accurate as those of the sGDML model. The predicted energies along the AIMD trajectories are graphed in Figure 4 along with the held-out energies in the simulation. Also plotted are the predicted atomic forces and the truth at two held-out configurations. Based on simulated forces and energies, predictions of potential energies and forces by the AFF model are accurate.

Table 1: Performance of AFF (sGDML results in parentheses).

| Molecule | Energy [kcal/mol] | Force [kcal/mol/Å] | Training Time [s] | Coverage (95%) | Length (95%) |
|---|---|---|---|---|---|
| Naphthalene | 0.07 (0.12) | 0.11 (0.11) | 68 (345) | 98.5% | 1.36 |
| Benzene | 0.04 (0.07) | 0.173 (0.176) | 23 (45) | 85% | 0.7 |
| Uracil | 0.10 (0.10) | 0.239 (0.249) | 16 (43) | 97.8% | 2.37 |
| Alpha-glucose | 0.09 (0.36) | 0.0003 (0.012) | 32 (964) | 100% | 0.03 |
| Hexadecane | 0.008 (0.35) | 0.0008 (0.003) | 37 (767) | 99% | 0.05 |
| Aspirin | 0.06 (0.09) | 0.0028 (0.009) | 32 (629) | 99.8% | 0.08 |

The coverage column reports the proportion of held-out forces covered by the 95% predictive credible interval from the multivariate normal distribution in Equ. (8); the length column reports the average length of the 95% predictive credible interval. The number of training samples used in the AFF and sGDML (in parentheses) methods is: naphthalene 1600 (1000), benzene 1200 (1000), uracil 1600 (1000), alpha-glucose 1000 (1000), aspirin 1200 (1000), and hexadecane 600 (600).

Table 1 gives the predictive error of force vectors and energy, the percentage of forces covered by the predictive interval, the average length of the predictive intervals of forces, and the computational costs of emulation for the different methods. First, the predictive errors of the AFF method for both forces and energy are typically no larger than those of the sGDML approach. For some molecules, such as alpha-glucose and hexadecane, the predictive error of the AFF model is roughly one order of magnitude smaller, based on the same held-out test set. It is worth noting that the AFF model incurs much smaller computational costs (ranging from 1/2 to 1/30 of the cost of sGDML) to achieve this higher level of predictive accuracy. These results indicate that the AFF model is more efficient in emulating atomic forces and energy in AIMD simulation. Furthermore, a high percentage (85% to 100%) of the held-out atomic forces are covered by relatively short predictive intervals from the AFF approach, indicating that the AFF model provides a reliable way to quantify the uncertainty in its predictions.

Finally, it is worth mentioning that the reduction of computational cost by the AFF model is most pronounced for molecules with more PE atom sets, such as alpha-glucose, aspirin, and hexadecane. For molecules with fewer PE sets, such as benzene (where we can only partition the atoms into two PE sets for each configuration), the computational reduction is smaller. Thus, our approach may be useful for reducing the computational cost of simulating interactions between a large number of molecules, where most PE sets may contain only one atom.

## Iv Concluding remarks

We have proposed an accurate and computationally efficient approach to predict potential energy surfaces and molecular force fields in *ab initio* simulation. While the theoretical framework of gradient-based KRR and GP models such as GDML, sGDML, and FCHL was already established, the huge computational cost limited the applicability of these methods for emulating systems with larger molecules. We propose the AFF emulator to overcome this computational challenge without compromising accuracy. First, the efficient emulation of forces hinges on the fact that the similarity of atomic forces between permutationally equivalent atoms is high. By partitioning the atoms of a molecule into different atom sets, the AFF model captures the large correlation of forces between PE atoms, thereby providing accurate predictions of the atomic forces of the molecule at a new configuration at lower computational cost. Second, we introduce the induced input approach to reduce the computational complexity of emulating the potential energy, compared to the joint model of energy and atomic forces of simulated configurations.
Numerical results have shown that the predictions are more accurate than those of some alternative approaches given the same computational budget. The AFF approach offers a novel path for representing the correlation between forces and energy at significantly lower computational cost.
We anticipate that our approach will enable chemists to run fast and accurate molecular dynamics simulations of larger molecules with machine-learned potentials and forces throughout chemical space.

There are a few potential research directions for emulating simulations of a large-scale system with interactions between a large number of atoms. First, for emulating simulations involving a large number of molecules, one may represent interactions between atoms by partitioning the atoms into PE atom sets to decompose the covariance matrix for efficient computation. Second, sparse Cholesky factorization or Markov models may be used to reduce the large computational cost when the number of simulations required for training machine learning models grows large. Third, given a set of physical constraints, one may inversely design the atomic positions to achieve a particular force field or potential energy. The uncertainty of the map from atomic positions to potential energy learned by the AFF model is important for a sequential design that reduces the uncertainty of this inverse problem. These improvements can empower molecular dynamics simulation for large molecules at the cost of cMD with the accuracy of AIMD.

## Acknowledgements

This study was supported by the U.S. National Science Foundation’s Computational and Data-Enabled Science and Engineering program under Award No. 2053423. MZ and JW acknowledge financial support from the U.S. National Science Foundation’s Harnessing the Data Revolution (HDR) Big Ideas Program under Grant no. NSF 1940118.

## Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## References

## Appendix A The correlation of force between different atoms

The correlation between the force of two molecules and with positions and is a matrix , where the element follows . Direct computation through the chain rule gives . Based on the form of the descriptor matrix in Equ. (2), the gradient of element of w.r.t follows

(14) |

The correlation between the th atom of molecule and the th atom of molecule is given by

where is the simplified notation of

According to the above equation, when , the number of terms in the summation is much larger than when . This mathematical fact indicates that the absolute correlation between different atoms , where , is typically small compared to the correlation between the same atoms . Note that this empirical result typically holds for common kernel functions, such as the Gaussian and Matérn kernels, before enforcing the constraint of permutational symmetry.

The low absolute correlation of force between different atoms does not reflect the permutational symmetries that intrinsically hold for some molecules. Permutational symmetry can be enforced by the PS kernel function from Equ. (4). As shown, e.g., in part (a) of Fig. 1 for benzene, after adopting the PS kernel function, the correlation between PE atoms is large due to permutational symmetries, whereas the correlation of force between atoms in different PE atom sets remains small, indicating that we may only need to condition on the forces of atoms in the same PE atom set to efficiently calculate the predictive distribution.
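The computational payoff of conditioning only within each PE atom set can be sketched as a block-diagonal approximation of the force covariance: each set's block is handled independently. The block sizes below are hypothetical (chosen only for illustration) and the covariance blocks are random SPD matrices, not actual force kernels.

```python
import numpy as np

def blockdiag_solve(K_blocks, y_parts):
    """Solve K x = y under a block-diagonal approximation of K:
    forces are conditioned only within each PE atom set, so the cost
    drops from O((sum n_i)^3) to sum_i O(n_i^3)."""
    return [np.linalg.solve(Kb, yb) for Kb, yb in zip(K_blocks, y_parts)]

rng = np.random.default_rng(3)
sizes = [6, 6, 3]            # hypothetical PE atom sets of a 15-atom molecule
blocks, ys = [], []
for n in sizes:
    M = rng.standard_normal((n, n))
    blocks.append(M @ M.T + n * np.eye(n))   # random SPD covariance block
    ys.append(rng.standard_normal(n))
x_parts = blockdiag_solve(blocks, ys)

# sanity check: agrees with the full solve when cross-set blocks are zero
K_full = np.zeros((sum(sizes),) * 2)
off = 0
for n, Kb in zip(sizes, blocks):
    K_full[off:off + n, off:off + n] = Kb
    off += n
x_full = np.linalg.solve(K_full, np.concatenate(ys))
```

When the neglected cross-set correlations are small, as argued above, the block-wise solution is a close and much cheaper approximation to the full conditional calculation.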

## Appendix B Fast Predictions of potential energies in batches

This section discusses an efficient way to calculate the inversion of the covariance matrix in the AFF model for energy prediction. We achieve the reduced computational cost by predicting the molecules' energies in batches, rather than the energy of one configuration at a time. Let be the batch size, and be configurations of a molecular structure. For predicting the energy of a new configuration of this molecular structure, we need to calculate the inversion of , where , with being a matrix, being a matrix, being a matrix, and being a matrix. Note that the sub-covariance matrix of the training energy samples is the same across all batches in prediction. Thus we only need to invert once, and by applying the block matrix inversion for , we have:

(15) |

where . Accordingly, for each batch of samples, we only need to invert a matrix , which is much faster than inverting the entire covariance matrix for each batch.
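A minimal numpy sketch of this block inversion via the Schur complement follows; the matrix sizes are illustrative. The training-block inverse is computed once and reused, and only the small batch-sized Schur complement is inverted per batch.

```python
import numpy as np

def block_inverse(A_inv, B, D):
    """Inverse of [[A, B], [B^T, D]] given a precomputed A_inv, via the
    Schur complement S = D - B^T A^{-1} B; only the small S is inverted."""
    AiB = A_inv @ B
    S = D - B.T @ AiB                      # batch-size x batch-size
    S_inv = np.linalg.inv(S)
    top_left = A_inv + AiB @ S_inv @ AiB.T
    top_right = -AiB @ S_inv
    return np.block([[top_left, top_right],
                     [top_right.T, S_inv]])

rng = np.random.default_rng(0)
n, m = 50, 5                               # training size, batch size
M = rng.standard_normal((n + m, n + m))
K = M @ M.T + (n + m) * np.eye(n + m)      # SPD covariance matrix
A, B, D = K[:n, :n], K[:n, n:], K[n:, n:]
A_inv = np.linalg.inv(A)                   # computed once, reused per batch
K_inv = block_inverse(A_inv, B, D)         # matches np.linalg.inv(K)
```

Per batch, the cost is dominated by the matrix products involving the fixed `A_inv`, rather than a fresh inversion of the full covariance matrix.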

## Appendix C Predictive distribution of potential energy

To simplify the notation, we use batch size in this section. Conditional on the simulated energies in the training dataset , the atomic force at the new configuration , and the estimated parameters , the conditional distribution of the energy at this configuration follows

(16) |

where the conditional mean follows

with and the conditional variance follows

Note that we do not observe , and thus Equ. (16) cannot be directly used for predicting the energy at the new configuration . Given the energy of the training set and the atomic force of the training set , we use the law of total expectation to integrate out the unobserved force vector with respect to its predictive distribution:

where denotes the approximation of by , which is equivalent to assuming that, given , is independent of the rest of the force vectors of the simulated configurations. Plugging in the predictive mean from the above equation to replace in , we approximate the predictive mean of the energy for by with the following expression:

The predictive variance in Equ. (12) can be computed by properties of multivariate normal distributions:

Let and be the first block matrix and the latter block matrix of , respectively, and , where follows from Equ. (5). We can also write as the weighted average of and :

(17) |

where

and .
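The total-expectation step described in this appendix can be sketched schematically. The notation below is illustrative and not taken from the paper: $E^*$ denotes the energy at the new configuration, $F^*$ its unobserved force vector, and $y_E$, $y_F$ the training energies and forces.

```latex
% E^*: energy at the new configuration; F^*: its unobserved force vector;
% y_E, y_F: training energies and forces. Illustrative notation only.
\mathbb{E}\left[E^* \mid y_E, y_F\right]
  = \mathbb{E}_{F^* \mid y_F}\!\left[\,
      \mathbb{E}\left[E^* \mid F^*, y_E, y_F\right] \right]
```

The inner expectation is the conditional mean of Equ. (16), and the outer expectation integrates $F^*$ out with respect to its predictive distribution, which is where the induced-input approximation enters.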

## Appendix D Simulation details

In addition to molecules available from the MD17 dataset, forces and energies of additional molecules with more atoms and more complicated structures are generated in this work. *Ab initio* MD (AIMD) simulations are performed via Q-Chem to generate highly accurate molecular forces and energies to benchmark AFF and other machine learning force fields. In this work, all AIMD simulations are carried out in the NVT ensemble with a timestep of 1 fs at room temperature (300 K). All calculations were performed at the Perdew-Burke-Ernzerhof (PBE)/6-31G(d,p) level. vdW interactions are taken into account by using the TS-vdW method. The Nosé-Hoover thermostat is used to control the temperature.

## Appendix E Timings

All timings were performed on a compute cluster equipped with Intel 6148 CPUs (20 cores each) with a high speed OmniPath interconnect. The compute nodes consist of 64 nodes of 40 core/192GB of RAM compute systems, 4 nodes with 768GB of RAM plus 300 GB Intel Optane Memory Drive, and 3 GPU nodes with four NVIDIA V100/32GB GPUs with NVLINK.
