My group's research is in computational applied mathematics, including numerical analysis, mathematical modelling and software development. We are interested in taking challenging science or engineering problems and identifying efficient computational strategies to address them. We work with application scientists in many different areas, as well as with computer scientists, high performance computing specialists, and industry. Our work uses methods from analysis (especially numerical analysis), probability, and statistics, and incorporates many concepts from statistical physics and theoretical chemistry. We also develop our own industrial-strength software for various applications, an increasingly demanding part of our work.
Molecular Dynamics.
Much of our work has been on the foundations of numerical methods for simulating the dynamics of molecules. We usually assume that the model can be described in terms of the motions of the nuclei of the system, which turns out to be the natural starting point in many cases. So essentially we have a large system of atoms which interact through forces described by potential energy functions, and the goal is to figure out how the molecules move and arrange themselves over time. For example, the images below show two conformational states of a certain molecule.
These would represent more or less likely states, and it is important that we observe simulated states with a likelihood that is similar to that with which they would be observed in reality; that is, we would wish to have an accurate approximation of the natural distribution of different states.
In the simplest setting one solves equations defined by Newton's second law (\(F=ma\)), but more commonly one has to include additional constraints on the system, make the model relevant to a laboratory environment (where temperature and pressure come into play), or include additional contributions to the forces from a noisy environment or an auxiliary quantum computation. So things get more complicated, but the basic idea is straightforward: we start with a conformation and approximate the motion after a brief instant, then take another step, and another, eventually constructing a series of snapshots that describe the motion of the system. Ultimately we might be interested in how the molecule arranges itself after a very long time, the so-called "invariant distribution".
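To make the stepping idea concrete, here is a minimal Python sketch of the velocity Verlet method, the standard workhorse integrator for Newtonian dynamics. The function names and the harmonic-well example are illustrative only, not drawn from our production codes.

    import numpy as np

    def velocity_verlet(q, p, force, mass, h, n_steps):
        # March the system forward by n_steps velocity Verlet steps:
        # half kick, full drift, half kick.
        traj = [q.copy()]
        f = force(q)
        for _ in range(n_steps):
            p = p + 0.5 * h * f        # half step in momenta
            q = q + h * p / mass       # full step in positions
            f = force(q)               # recompute forces at the new positions
            p = p + 0.5 * h * f        # second half step in momenta
            traj.append(q.copy())
        return np.array(traj)

    # Toy usage: one particle in a harmonic well, U(q) = q^2/2, so F(q) = -q.
    traj = velocity_verlet(q=np.array([1.0]), p=np.array([0.0]),
                           force=lambda q: -q, mass=1.0, h=0.01, n_steps=1000)

Each entry of traj is one of the snapshots described above; long trajectories of this kind are the raw material from which distributions of states are estimated.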
As examples of our recent successes in this area, I would highlight the understanding of theoretical foundations and the development of numerical methods for Langevin dynamics. In Langevin dynamics, the Newtonian dynamics model is supplemented by dissipative and random perturbations which capture the interactions with suppressed degrees of freedom or compensate for other limitations of the conservative description. The system of interest is thus a set of stochastic differential equations. We have considered the formulation of the equations of motion, the properties of those equations (e.g. questions of ergodicity), and the design of accurate and efficient numerical discretisation methods. Some of our work in this area is described in the textbook I have written with Charles Matthews (Molecular Dynamics, Springer 2015).
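In a standard form, writing \(q\) for positions, \(p\) for momenta, \(M\) for the mass matrix, \(U\) for the potential energy, \(\gamma\) for the friction coefficient and \(T\) for the temperature, the Langevin equations of motion read
\[ dq = M^{-1} p\, dt, \qquad dp = -\nabla U(q)\, dt - \gamma p\, dt + \sqrt{2\gamma k_B T}\, M^{1/2}\, dW, \]
where \(W\) is a standard Wiener process. The balance between the friction term and the noise term (fluctuation-dissipation) is what allows the dynamics to sample the correct temperature-dependent invariant distribution.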
More recent work has generalised the algorithms to constrained systems and systems with memory (generalised Langevin), or looked at the redesign of the stochastic-dynamical model to enhance exploration in the presence of extreme metastability due to energetic or entropic barriers. We also have an important stream of research on mesoscale particle models such as dissipative particle dynamics.
Molecular Dynamics Software.
It is also worth mentioning our recent work on molecular dynamics software. In this domain there are some very large consortia which have produced the most important products, such as CHARMM, NAMD, GROMACS, Amber and LAMMPS. These packages generally combine contributions from dozens of authors from different disciplines. Because all classical molecular dynamics models involve compromises (quantum mechanics would be more accurate, but is astronomically expensive computationally due to the 'curse of dimensionality'), a lot of work goes into making the force fields as accurate as possible. This means fitting empirical force field models to functions of the pair distances, coordinates of local triples, etc., which is highly nontrivial. Other substantial work goes into making the codes run fast on the latest hardware, for example GPUs.
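As a concrete illustration of an empirical pair term (a standard textbook example, not a production force field), the Lennard-Jones potential models the interaction energy of two atoms at separation \(r\) as
\[ U_{\mathrm{LJ}}(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right], \]
with well depth \(\varepsilon\) and length scale \(\sigma\). A full force field sums many such nonbonded terms together with bonded contributions (bond, angle and dihedral terms), all fitted to quantum calculations and experimental data.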
When we develop new algorithms, it can take a long time to get them implemented in these codes; there are often development queues and priorities that delay implementation. So our algorithms sometimes languish on the shelf for years before finally getting tested, which can be frustrating. For this reason, we have worked with scientists from the Edinburgh Parallel Computing Centre (EPCC) to create a software interface called MIST ("Molecular Integration Simulation Toolkit") that allows us to implement our algorithms very easily. The software interface handles all the communication with various molecular simulation packages such as NAMD, GROMACS, etc. In this way we can take advantage of the optimisations built into those codes while also having the ability to test our algorithms rapidly on "real-world" problems. MIST is freely available.
Bayesian Models for Data Analysis. We have also been progressively exploring algorithms that can be used for Bayesian parameterisation of complex models. The model can be a neural network or a traditional mathematical model. Let us denote the model by \(\Phi \) and suppose that it takes input \(X \) to output \(Y\), given parameters \(\theta\):
\[ Y = \Phi(X,\theta). \]
Now in a common setup, one assumes there is stochastic uncertainty in the relationship, and so it is natural to consider a model in which the inputs and outputs are interpreted as random variables and the strict equality above is replaced by a statistical model, say
(1) \(\pi \left (x,y|\theta \right ) = Z^{-1}\exp \left ( -\frac{\left [ y-\Phi(x,\theta)\right ]^2}{2\sigma^2} \right )\),
where, for simplicity, we take \(\sigma\) fixed. This is just one (elementary) choice for the statistical model, but it will serve for this exposition.
Now given a collection of samples (data) \(D=\{(x_{\alpha},y_{\alpha})\}\), we may ask what is the best choice of the parameter vector \(\theta\) to fit the given model to the given data. From the Bayesian perspective, the parameters themselves can be viewed as uncertain, with a distribution defined by Bayesian inversion:
\[\pi\left (\theta| D \right) \propto \pi(D|\theta)\pi_0(\theta).\]
The factors here are all probability densities. \(\pi_0\) is an assumed "prior" for the parameters which might, for example, restrict the parameters to a specified bounded domain (or it may be Gaussian). The usual choice for the conditional distribution, \(\pi(D|\theta)\), is the likelihood defined by the statistical model (1) under the assumption that all the data points are independent:
\[\pi(D|\theta) = \prod_{\alpha=1}^{|D|} \pi(x_{\alpha},y_{\alpha}|\theta).\]
It is now possible to treat the parameterisation problem in various ways. The most usual approach is to optimise the parameters by finding the global maximum of the probability density \(\pi(\theta|D)\). A more nuanced approach is to embrace the uncertainty in the choice of \(\theta\) in the Bayesian perspective and explore, i.e. "sample", the parameter distribution. The distribution is normally multimodal, and we may attempt to find a number of the modes or to average the parameter values across the distribution. This is the approach we are currently taking in our data analytics work.
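As a minimal sketch of the sampling approach, the following Python fragment runs a random-walk Metropolis chain over \(\theta\) for the Gaussian statistical model (1) with a Gaussian prior. The linear model \(\Phi(x,\theta)=\theta_0 x+\theta_1\), the noise level and the prior scale are illustrative assumptions, not choices from our papers.

    import numpy as np

    def phi(x, theta):
        # Hypothetical model: a straight line with slope theta[0], intercept theta[1]
        return theta[0] * x + theta[1]

    def log_posterior(theta, xs, ys, sigma=0.5, prior_scale=10.0):
        # Gaussian log-likelihood from model (1) plus a Gaussian log-prior
        resid = ys - phi(xs, theta)
        return -np.sum(resid**2) / (2.0 * sigma**2) \
               - np.sum(theta**2) / (2.0 * prior_scale**2)

    def metropolis(xs, ys, n_steps=10000, step=0.1, seed=0):
        rng = np.random.default_rng(seed)
        theta = np.zeros(2)
        lp = log_posterior(theta, xs, ys)
        samples = []
        for _ in range(n_steps):
            prop = theta + step * rng.standard_normal(theta.shape)
            lp_prop = log_posterior(prop, xs, ys)
            if np.log(rng.uniform()) < lp_prop - lp:  # accept with prob. min(1, ratio)
                theta, lp = prop, lp_prop
            samples.append(theta.copy())
        return np.array(samples)

Averaging the resulting samples, or clustering them to locate modes, corresponds to the two ways of using the posterior described above.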
Among the specific problems we have looked at, or are looking at, in this domain, I would mention the following. First, there is a basic problem with the simple technique I mentioned above in the context of large data sets. Every time we compute a new value of the density (or its gradient, as is often needed), we must compute a product of likelihoods across the entire data set, which is very expensive if \(|D|\) is large. One way to reduce the computational burden is to replace the full likelihood by an approximation obtained by subsampling the data set. The subsample might be just a small percentage of the full data set, so the properties of the data set, e.g. its concentration, play a critical role in determining when this approach will be successful in practice. Nonetheless it is widely used in the data science community, where it goes under the name of "stochastic gradients". A very similar problem arises in multiscale modelling, when the forces are computed by some device which perturbs the conservative structure. This gradient noise creates an additional source of noise in the parameterisation process. It turns out that this noise can be neutralised by use of a "thermostat" method, something we studied extensively in the context of chemical/physical modelling. This remains very much an open problem, and we are currently exploring it as part of other work we are doing.
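As a hedged sketch of the subsampling idea, in the style of stochastic gradient Langevin dynamics (continuing the illustrative linear model above; the step size and batch size are arbitrary choices): each step forms a minibatch gradient, rescaled by \(|D|/|B|\) so that it is an unbiased estimate of the full log-posterior gradient, and uses it in an Euler-Maruyama step for overdamped Langevin dynamics.

    import numpy as np

    def grad_log_post_minibatch(theta, xs, ys, batch, sigma=0.5, prior_scale=10.0):
        # Minibatch gradient of log pi(theta|D) for the linear model above,
        # with the likelihood part rescaled by |D|/|B| so it is unbiased
        n, m = len(xs), len(batch)
        xb, yb = xs[batch], ys[batch]
        resid = yb - (theta[0] * xb + theta[1])
        grad_lik = np.array([np.sum(resid * xb), np.sum(resid)]) / sigma**2
        grad_prior = -theta / prior_scale**2
        return (n / m) * grad_lik + grad_prior

    def sgld(xs, ys, n_steps=10000, h=1e-4, batch_size=32, seed=0):
        rng = np.random.default_rng(seed)
        theta = np.zeros(2)
        traj = []
        for _ in range(n_steps):
            batch = rng.choice(len(xs), size=batch_size, replace=False)
            g = grad_log_post_minibatch(theta, xs, ys, batch)
            # Euler-Maruyama step for overdamped Langevin with a stochastic gradient
            theta = theta + h * g + np.sqrt(2.0 * h) * rng.standard_normal(theta.shape)
            traj.append(theta.copy())
        return np.array(traj)

Note that this simple scheme makes no correction for the extra noise introduced by the stochastic gradient; that is precisely the gap that thermostat methods are designed to fill.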
Enhancing Sampling. Another important issue is the treatment of multimodality of the probability distribution. In many cases there will be multiple reasonable parameterisations of the model defined by the specific data provided. In such situations it becomes crucial to explore or sample the probability distribution defined by Bayes' formula. This task is computationally demanding, since standard MCMC schemes will get stuck in one or another localised region of the parameter space. In order to enhance exploration one may use innovative sampling strategies much like the schemes we would commonly employ to sample, for example, the conformations of biological macromolecules. A very promising strategy for this purpose is to employ a collection of coupled 'walkers', which provide a means of rescaling the dynamics to increase exploration, as in our ensemble quasi-Newton preconditioned MCMC scheme. Another exciting possibility is to use multiple temperatures to enhance the exploration, as in our infinite swap simulated tempering scheme (a generic replica-exchange sketch is given below). A third intriguing method is based on the diffusion map analysis of simulated data (the data being parameters, in this case). Research in these areas, and on their application to real-world problems, is a very active current direction.
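For illustration, here is a generic replica-exchange (parallel tempering) sketch, the simplest member of the multiple-temperature family. It is not our infinite swap scheme; the proposal scale, swap schedule and temperature ladder are all arbitrary choices.

    import numpy as np

    def replica_exchange(log_target, betas, n_steps, step=0.5, seed=0):
        # Run one Metropolis chain per inverse temperature beta; hot replicas
        # cross barriers easily, and occasional swaps ferry those states down
        # to the cold chain.
        rng = np.random.default_rng(seed)
        thetas = [np.zeros(2) for _ in betas]
        for _ in range(n_steps):
            # Within-replica Metropolis move at each temperature
            for k, beta in enumerate(betas):
                prop = thetas[k] + step * rng.standard_normal(2)
                if np.log(rng.uniform()) < beta * (log_target(prop) - log_target(thetas[k])):
                    thetas[k] = prop
            # Propose a swap between a random neighbouring pair of replicas
            k = rng.integers(len(betas) - 1)
            d = (betas[k] - betas[k + 1]) * (log_target(thetas[k + 1]) - log_target(thetas[k]))
            if np.log(rng.uniform()) < d:
                thetas[k], thetas[k + 1] = thetas[k + 1], thetas[k]
        return thetas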
Simulating Hamiltonian Dynamics
B. Leimkuhler and S. Reich,
Cambridge University Press, 2005. Geometric integrators are time-stepping methods, designed such that they exactly satisfy conservation laws, symmetries or symplectic properties of a system of differential equations. In this book the authors outline the principles of geometric integration and demonstrate how they can be applied to provide efficient numerical methods for simulating conservative models. Beginning from basic principles and continuing with discussions of the advantageous properties of such schemes, the book introduces methods for the N-body problem, systems with holonomic constraints, and rigid bodies. More advanced topics treated include high-order and variable stepsize methods, schemes for treating problems involving multiple time-scales, and applications to molecular dynamics and partial differential equations. The emphasis is on providing a unified theoretical framework as well as a practical guide for users. The inclusion of examples, background material and exercises enhances the usefulness of the book for self-instruction or as a text for a graduate course on the subject.
• Thorough treatment of a relatively new subject, covers theory, applications and also gives practical advice on implementing the techniques • Emphasis on 'efficient' numerical methods • Large number of examples and exercises.
Contents
1. Introduction; 2. Numerical methods; 3. Hamiltonian mechanics; 4. Geometric integrators; 5. The modified equations; 6. Higher order methods; 7. Constrained mechanical systems; 8. Rigid body dynamics; 9. Adaptive geometric integrators; 10. Highly oscillatory problems; 11. Molecular dynamics; 12. Hamiltonian PDEs.



Molecular Dynamics
B. Leimkuhler and C. Matthews,
Springer Verlag, 2015. This book describes the mathematical underpinnings of algorithms used for molecular dynamics simulation, including both deterministic and stochastic numerical methods. Molecular dynamics is one of the most versatile and powerful methods of modern computational science and engineering and is used widely in chemistry, physics, materials science and biology. Understanding the foundations of numerical methods means knowing how to select the best one for a given problem (from the wide range of techniques on offer) and how to create new, efficient methods to address particular challenges as they arise in complex applications.
Aimed at a broad audience, this book presents the basic theory of Hamiltonian mechanics and stochastic differential equations, as well as topics including symplectic numerical methods, the handling of constraints and rigid bodies, the efficient treatment of Langevin dynamics, thermostats to control the molecular ensemble, multiple time-stepping, and the dissipative particle dynamics method.
Contents
1. Introduction; 2. Numerical integrators; 3. Analyzing geometric integrators; 4. The stability threshold; 5. Phase space distributions and microcanonical averages; 6. The canonical distribution and stochastic differential equations; 7. Numerical methods for stochastic differential equations; 8. Extended variable methods.



TATi: The Thermodynamic Analytics ToolkIt. TATi is a state-of-the-art Python software suite for the examination of loss landscapes of neural networks, based on TensorFlow's Python API. It brings advanced sampling methods to neural network training: its tools allow the user to explore the loss manifold, which is a function of the employed neural network and the dataset, and its simulation module makes it easy and straightforward to apply sampling methods to neural networks in Python. The goal of the software is to enable the user to analyze and adapt the network employed for a specific classification problem to best fit her or his needs.
TATi has received financial support from a seed funding grant and a Rutherford fellowship from the Alan Turing Institute in London (R-SIS-003, R-RUT-001) and EPSRC grant no. EP/P006175/1 (Data Driven Coarse Graining using Space-Time Diffusion Maps, B. Leimkuhler PI). For a discussion of TATi see the arXiv article at https://arxiv.org/abs/1903.08640, F. Heber, Z. Trstanova, B. Leimkuhler.
MIST: The Molecular Integration Simulation Toolkit. MIST is a lightweight and efficient software library written in C++ which provides an abstract interface to common molecular dynamics codes, enabling rapid and portable development of new integration schemes for molecular dynamics. The initial release provides plug-in interfaces to NAMD-Lite, GROMACS and Amber, and includes several standard integration schemes, a constraint solver, temperature control using Langevin dynamics, and two tempering schemes. The architecture and functionality of the library are meant to be highly flexible, and include C and Fortran APIs which can be used to interface additional MD codes to MIST. We have shown, for a range of test systems, that MIST introduces negligible overheads for serial, shared-memory parallel, and GPU-accelerated cases, except for Amber, where the native integrators run directly on the GPU itself. It has been applied to implement simulated tempering algorithms and has been used to study the free energy landscapes of biomolecules in both vacuum and detailed solvent conditions. A research article describing MIST is available: https://www.sciencedirect.com/science/article/pii/S0010465518303540, I. Bethune, R. Banisch, E. Breitmoser, A. B. K. Collis, G. Gibb, G. Gobbo, C. Matthews, G. J. Ackland, and B. J. Leimkuhler.
MDdotM: Molecular Dynamics in Matlab. A small software package for MATLAB, designed to teach students about MD in a familiar and easy-to-use setting. HTML documentation is included in the folder (after unzipping, find and load the file MDdotM/Doc/MDdotM.html in your web browser). B. Leimkuhler and C. Matthews.
ThermoML: C++ and Qt tool for studying machine learning algorithms. B. Leimkuhler.