We’ve Been Developing Batteries 1,000,000x Slower Than We Could Be.
How machine learning is revolutionizing the guess-and-check methods in material development, experiment design, and testing for batteries
If we made a highlight reel of the things that have helped us get to where we are today during the past 200 years, automation would be a big focus.
Everything from manufacturing to farming to cleaning has seen a paradigm shift with machines taking over tedious tasks and enabling humans to focus on other pressing matters (unless we’re talking about Roomba — cause let’s be honest, who does anything other than watch it bump into objects and laugh?)
Nowadays, AI and its subset, machine learning (ML), are at the forefront of this ongoing transition. But ironically, the very things powering the shift could use some of the magic too.
Battery development is painfully slow. Given its reliance on the reasoning of chemists and a largely trial-and-error approach, this is not too surprising.
Whereas most of the technological advancements to date have resulted from mechanical improvements like shrinking transistors or buses on a circuit board, batteries are a complex fusion of several scientific disciplines, making them ever harder to innovate on.
Yet it’s absolutely crucial that we speed up the process if we want to achieve the clean transition. Not only can energy storage compensate for the intermittency of renewables, it is also important for supporting the grid as we turn away from fossil fuels.
We’re just now seeing a glimpse of how advanced data analytics techniques like ML could send a wave of renewed hope to the battery R&D cycle.
In this article, we’ll explore how ML is influencing areas of the research cycle with specific case studies to demonstrate its positive impact.
The structure will be as follows:
- A brief description of ML and the general structure of how it works
- The problem with the current research & commercialization cycle in battery tech
- How the technology can be used in materials discovery and experiment design, battery testing, and imaging and understanding of degradation pathways, explained with specific case studies
Machine learning is a discipline aimed at deriving insights and extracting knowledge hidden in data.
One of the key things that makes it special is that it learns from experience in order to produce rapid and repeatable results.
When it comes to its application to batteries, the focus lies mainly on property prediction and materials informatics.
While the specific model is chosen to suit the problem, and the steps depend on the type of learning method, there is a general structure to how ML is applied to a situation.
Understanding this can help in wrapping your head around the core of how ML works and why.
- Defining a problem: Getting an idea of the problem at hand and what the model will need to learn and predict in order to address it.
- Data collection: Building a dataset containing inputs and outputs accordingly, for example material features and resulting properties. The data can consist of experiments, simulations, or both.
- Pre-processing: Readying the data for the following steps. Raw data is almost always messy: there can be different units and scales, missing, inaccurate, or incomplete data, and disorganization. Therefore, it is important to pre-process it, both to “clean” it and to remove unintended bias.
- Feature extraction and selection: At this point, raw data is translated into the usable inputs that the chosen algorithm requires. This process, feature extraction, is key because the needs of the algorithm are best met by representing the data appropriately. Therefore the raw data may be reformatted, combined, and totally transformed from its primary features into new ones. Feature selection deals with eliminating the unimportant features from the dataset to prevent overfitting.
- Learning: The crucial step of training an algorithm with the previous data in order to obtain a model. There are two main types of learning methods: supervised and unsupervised learning. The goal of supervised learning is to match any given X with a corresponding Y using labeled datasets, whereas unsupervised learning uses unlabeled datasets and instead seeks to discover hidden patterns or structure in data. There are other categories of learning methods as well, which will be mentioned later in the article. During the learning process, the model’s hyperparameters are tuned to get the most accurate and valuable results.
- Deriving value: Extracting knowledge from a model is not always straightforward. The most important part is then to analyze the results and see what new insights they give way to.
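To make the steps above a little more concrete, here is a minimal sketch of the whole loop in plain Python. Everything in it is an assumption for illustration: the single made-up "feature", the linear relation used to generate the data, and the choice of a one-variable least-squares fit as the "model". A real battery dataset would have many features and would usually call for a dedicated ML library.

```python
import random
import statistics

random.seed(0)

# Steps 1-2. Problem and data: predict a made-up "property" from one made-up
# feature. The linear relation and noise level are invented for illustration.
features = [random.uniform(0.0, 10.0) for _ in range(200)]
targets = [3.0 * x + 2.0 + random.gauss(0.0, 0.5) for x in features]

# Hold out the last 20% of samples so the model can be checked on unseen data.
split = int(0.8 * len(features))
x_train, x_test = features[:split], features[split:]
y_train, y_test = targets[:split], targets[split:]

# Step 3. Pre-processing: standardise the feature using training statistics only.
mean_x = statistics.fmean(x_train)
std_x = statistics.pstdev(x_train)


def scale(x):
    return (x - mean_x) / std_x


# Step 5. Learning: closed-form least squares for a single feature.
z_train = [scale(x) for x in x_train]
mean_z = statistics.fmean(z_train)
mean_y = statistics.fmean(y_train)
covariance = sum((z - mean_z) * (y - mean_y) for z, y in zip(z_train, y_train))
variance = sum((z - mean_z) ** 2 for z in z_train)
slope = covariance / variance
intercept = mean_y - slope * mean_z


def predict(x):
    return slope * scale(x) + intercept


# Step 6. Deriving value: measure the error on the held-out samples.
test_mae = statistics.fmean(abs(predict(x) - y) for x, y in zip(x_test, y_test))
print(f"held-out mean absolute error: {test_mae:.3f}")
```

Feature extraction and selection (step 4) is trivial here because there is only one feature; with real data it is often the step that takes the most thought.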
The development of novel battery materials is on a crash course with time.
While the process of discovering, designing, testing, and validating novel materials typically spans 15–25 years and requires the synthesis and characterization of thousands to millions of samples, the people funding research demand results in a fifth of that time.
The reason behind this is that the governments or businesses backing materials development are influenced by competition and environmental considerations, and therefore need something to show during the time the decision-makers are in control.
What makes the current battery research cycle so tedious and time-consuming?
Consider the general process of materials discovery in the battery space:
a. A problem to solve or metric to optimize is identified, taking into consideration a specific battery type and use case
b. A hypothesis to address the issue is formed based on the researcher’s prior knowledge and an experiment is designed
c. Numerous forms of analysis and testing are run on the synthesized material within a minuscule test cell (sometimes smaller than your pinky fingernail!)
d. Researchers investigate the results and try to understand the science behind what happened. This step can be plodding, especially when the results are unprecedented and the researchers therefore have no previous knowledge to help decipher them.
e. Based on the results, different forms of action may follow. Most commonly, the results serve to identify what needs further improvement and inform the next experiment.
f. The loop repeats.
After countless runs of trial-and-error, it’s possible that nothing but a “this doesn’t work” conclusion follows, but otherwise, the years of hard work pay off.
What this typically looks like is a company coming in and licensing the technology that the researchers have developed. The company then proceeds to spend 1–2 years testing the technology for themselves, tuning it to their needs, and figuring out manufacturing, etc.
In any case, it’s an overly lengthy process and looks something like this:
Thankfully, computational tools like ML that focus experimental efforts in the most promising directions are one of the three “missing links” that the U.S. Materials Genome Initiative has identified to ease the process (the other two being repositories to aggregate learnings and identify trends, and higher-throughput experimental tools).
An important player in batteries x ML models is density functional theory (DFT)
Density functional theory tries to solve the theory of nature that is the Schrödinger equation.
At a very high level, Schrödinger’s equation tells us how an electron will behave. It involves describing the energy and position of an electron in space and time.
The purpose behind solving this equation can be many-sided but is primarily to understand the electronic structure of an atom or molecule.
The electronic structure of an atom is useful in materials development as it determines the chemical reactions the atom can participate in and determines the kinds of molecules that atoms can combine into to form more complicated substances.
In addition, understanding the allowed energy levels in an atom is a key piece of information to predict properties and understand the behaviour of the atom.
The issue is that solving Schrödinger’s equation is no easy task given that electrons don’t follow the rules of particles, but rather, waves.
In fact, it’s really only possible to solve it exactly for hydrogen. With one electron and one proton, the number of degrees of freedom (i.e. the range of states that the electron and proton can exist in, namely their x, y, and z coordinates) is merely 6.
But as you get into more complex molecules, as scientists always do, it becomes near impossible to solve the equation and you must resort to using approximations, hindering accuracy.
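For the hydrogen case mentioned above, the exact solution gives the well-known allowed energy levels E_n = -13.6 eV / n², where n is the principal quantum number. The short sketch below just evaluates that formula; the function name is my own, and the constant is the standard Rydberg energy (it ignores small corrections such as the finite mass of the proton).

```python
RYDBERG_ENERGY_EV = 13.605693  # hydrogen ground-state binding energy, in eV


def hydrogen_energy_level(n: int) -> float:
    """Energy of the n-th level of hydrogen in eV (negative means bound)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return -RYDBERG_ENERGY_EV / n**2


for n in range(1, 5):
    print(n, round(hydrogen_energy_level(n), 3))
```

No comparably simple closed-form expression exists for multi-electron atoms or molecules, which is why approximations become necessary.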
Density functional theory aims to save the trouble of venturing into complexity and takes a different approach to determine the way that quantum systems will behave.
The approach is modelled on the work of Walter Kohn and Pierre Hohenberg, who showed that you don’t really need to solve Schrödinger’s equation for the full wave function: the properties of a system in its ground state are determined by its electron density. This quantity tells us the relative probability of finding an electron at a particular point in space.
As opposed to the wave functions that Schrödinger’s equation relies on, electron density is a physical, measurable characteristic, which makes DFT much easier to compute and obtain accurate results from. Plus, the electron density is always a function of just three variables (the x, y, and z position in space), regardless of the number of electrons.¹
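A rough way to see why this matters is to count how many numbers you would have to store to describe each object on a grid. The sketch below is a back-of-the-envelope illustration only: it assumes a many-electron wave function depends on three coordinates per electron, ignores spin and symmetry, and uses an arbitrary 10 grid points per axis.

```python
def wavefunction_grid_values(n_electrons: int, points_per_axis: int) -> int:
    """Grid values needed for a wave function with 3 coordinates per electron."""
    return points_per_axis ** (3 * n_electrons)


def density_grid_values(points_per_axis: int) -> int:
    """Grid values needed for the electron density, always a 3-variable function."""
    return points_per_axis ** 3


POINTS_PER_AXIS = 10  # arbitrary, deliberately coarse grid

for n_electrons in (1, 2, 5, 10):
    wf = wavefunction_grid_values(n_electrons, POINTS_PER_AXIS)
    rho = density_grid_values(POINTS_PER_AXIS)
    print(f"{n_electrons:>2} electrons: wave function {wf:.1e} values, density {rho} values")
```

The wave function count explodes with every added electron, while the density count stays the same.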
Therefore, just like Schrödinger’s equation, DFT allows us to calculate several important values that act as inputs for a range of purposes. Specifically, it enables much of the computational simulation, atomistic modelling, and property / behaviour prediction under different conditions that’s possible today.
A few examples of how DFT applies to battery tech are explained well in the abstract of the paper “Understanding electrode materials of rechargeable lithium batteries via DFT calculations”:
The applications of DFT calculations involve in the following points of crystal structure modeling and stability investigations of delithiated and lithiated phases, average lithium intercalation voltage, prediction of charge distributions and band structures, and kinetic studies of lithium ion diffusion processes, which can provide atomic understanding of the capacity, reaction mechanism, rate capacity, and cycling ability. The results obtained from DFT are valuable to reveal the relationship between the structure and the properties, promoting the design of new electrode materials.
Several studies highlight the value ML could drive in different aspects of battery development.
Discovering improved materials for li-ion electrodes
The Problem
Even after decades of development, lithium-ion batteries still suffer from problems such as the difficulty lithium ions have diffusing through the conventional materials used for the positive electrodes, which are usually transition metal oxides.
This slows down the reaction and is thus a big contributor to poor power capacity.
Solution & Approach
Therefore, one study set out to develop novel cathode materials with improved ionic diffusivity, ensuring that the charge and energy densities remain untouched.
This would in turn enhance the cell voltages, power capacity, and charge and energy capacity.
The study was based on a high-throughput screening method, which is a general term for the use of automated equipment to rapidly test thousands to millions of samples at the molecular level. It made use of quantum mechanical density functional theory (DFT) modelling combined with a machine learning strategy.
The goal was to predict the redox potential of a given material, which is a highly determining factor in the performance of an electrode.
The DFT modelling served the purpose of cheaply computing the necessary quantum mechanical characteristics to use as inputs, some of which were electron affinity (EA), highest occupied molecular orbital (HOMO), and lowest unoccupied molecular orbital (LUMO).
Other input parameters were elements like carbon, lithium, boron, oxygen, and hydrogen.
The study used an artificial neural network (ANN) to tackle this problem. ANNs are the pillar of deep learning, wherein models loosely imitate the workings of a human brain and “understand” and work with a dataset without explicit instructions on what to do with it.
In this case the network is trained on examples with a known answer (the redox potential), but it is never told which patterns in the inputs matter. A neural network aims to automatically find structure in the data and extract features on its own, using “hidden layers” that do the work of weighing the importance of the inputs and generating an output accordingly.
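To give a feel for what “hidden layers” actually do, here is a minimal forward pass through a network with two hidden layers, written in plain Python. This is not the study’s model: the layer sizes, the tanh activation, the random (untrained) weights, and the three placeholder inputs are all assumptions made for illustration.

```python
import math
import random

random.seed(0)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(weights * inputs + biases)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(n_in, n_out):
    """Untrained layer with random weights in [-1, 1] and zero biases."""
    weights = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def identity(value):
    return value


# Hypothetical sizes: 3 inputs, two hidden layers of 4 units each, 1 output.
w1, b1 = random_layer(3, 4)
w2, b2 = random_layer(4, 4)
w3, b3 = random_layer(4, 1)


def predict(x):
    hidden_1 = dense(x, w1, b1, math.tanh)        # first hidden layer
    hidden_2 = dense(hidden_1, w2, b2, math.tanh)  # second hidden layer
    return dense(hidden_2, w3, b3, identity)[0]    # single regression output


example_inputs = [0.2, -0.5, 0.1]  # placeholder, already-scaled descriptor values
print(predict(example_inputs))
```

Training would consist of adjusting the weights and biases until the outputs match known values as closely as possible; that part is left out here to keep the sketch short.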
Result
The ANN used two hidden layers and achieved an accuracy of 96.5% in predicting the redox potential of a given material, demonstrating the usefulness of using ML to inform materials experiments for battery components and save time.
Advancing progress in solid state li-ion batteries
The Problem
To date, solid state versions of the lithium-ion battery have remained a pipe dream due to the many challenges associated with their development.
Unsurprisingly, one of them is material discovery, especially for the solid electrolyte.
At its core, the difficulty lies in finding materials with high lithium conductivity, layered with the need for robust chemical and phase stability, a wide electrochemical stability window, low electronic conductivity, and low cost.
While the number of potential electrolytes is >10,000, merely tens to low hundreds have been studied, and the process is overwhelmingly slow.
Solution & Approach
To do away with this challenge, Evan Reed’s team at Stanford University set out to apply machine learning algorithms to the data collected over the last several decades.
Their approach mimicked a funnel, with each step further eliminating candidate materials.
First, atomic and electronic structure data for all 12,000+ Li-containing materials was pulled from the popular Materials Project database. The materials were filtered for thermodynamic phase stability, low electronic conduction, low cost, high earth abundance, high electrochemical stability, and the absence of transition metals (to enhance stability against reduction).
This left them with 317 candidate materials. The missing key was whether the materials were fast ion conductors.
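The funnel idea is easy to sketch in code: apply one rule at a time and keep only what passes. The records, field names, and the band-gap cutoff below are all invented for illustration; they are not the study’s data or thresholds.

```python
# Entirely made-up candidate records; field names and values are placeholders.
candidates = [
    {"name": "A", "phase_stable": True, "band_gap_ev": 5.1, "transition_metal": False},
    {"name": "B", "phase_stable": False, "band_gap_ev": 4.0, "transition_metal": False},
    {"name": "C", "phase_stable": True, "band_gap_ev": 0.3, "transition_metal": False},
    {"name": "D", "phase_stable": True, "band_gap_ev": 3.2, "transition_metal": True},
    {"name": "E", "phase_stable": True, "band_gap_ev": 6.0, "transition_metal": False},
]

# Each stage is (label, rule). The band-gap cutoff is an assumed stand-in for
# "low electronic conduction", not a value taken from the study.
stages = [
    ("phase stability", lambda m: m["phase_stable"]),
    ("low electronic conduction", lambda m: m["band_gap_ev"] > 1.0),
    ("no transition metals", lambda m: not m["transition_metal"]),
]

remaining = candidates
for label, rule in stages:
    remaining = [m for m in remaining if rule(m)]
    print(f"after {label}: {len(remaining)} left")

survivors = [m["name"] for m in remaining]
print(survivors)
```

Cheap filters go first, so that the more expensive steps later on only ever see a short list.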
So, they then applied a logistic regression model, a form of supervised learning that predicts the probability of a binary (yes/no) event occurring. In this case, the model predicted the likelihood “Psuperionic” that an arbitrary material exhibits fast Li-ion conduction at room temperature. Psuperionic is a logistic function.
The input parameters to the model were features taken from the atomistic structure of the unit cell.
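In a logistic regression model, the features are combined into a weighted sum, which is then squashed into a probability between 0 and 1. The sketch below shows that calculation; the two features, the weights, and the bias are hypothetical numbers of my own and have nothing to do with the fitted values in the paper.

```python
import math


def logistic(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def p_superionic(features, weights, bias):
    """Probability of fast Li-ion conduction under a logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)


# Hypothetical fitted parameters and structural features for one material.
weights = [1.5, -2.0]
bias = -0.5
features = [1.0, 0.25]

probability = p_superionic(features, weights, bias)
print(f"P_superionic = {probability:.3f}")
print("predicted fast conductor" if probability > 0.5 else "predicted poor conductor")
```

The 0.5 cutoff used to turn the probability into a yes/no answer is itself a choice, assumed here for simplicity.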
What resulted was a list of potentially suitable crystalline compounds narrowed down by more than 99%, from over 12,000 to 21 in total, all of which were predicted to be rapid ion conductors with great structural and electrochemical stability.
But the team wasn’t done making enormous gains in speed and efficiency.
From there, they ruled out a few of the candidates based on existing knowledge that invalidated them, and performed DFT-molecular dynamics (DFT-MD) calculations on 19 identified materials.
As described explicitly in their paper “Machine learning-assisted discovery of solid Li-ion conducting materials”, the DFT-MD simulation process was as follows:
Materials are chosen either according to the machine learning-based model (left stream) or by random selection (right stream). Materials are initially simulated at 900K for approximately 50–100 picoseconds. If no Li diffusion is observed on this timescale, the materials are immediately considered poor ion conductors and no further simulation is performed. If materials exhibit Li diffusion at 900K, they are simulated again at alternate temperatures so a simple two- or three-point Arrhenius extrapolation to room temperature can be made. If melting is observed in the ML-selected materials at 900K, the simulation is restarted at a lower temperature. Randomly chosen materials that exhibit melting at 900K are discarded immediately for computational efficiency.
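The “Arrhenius extrapolation” in that description relies on the relation D(T) = D0 · exp(-Ea / (kB · T)): the logarithm of the diffusivity is a straight line in 1/T, so a fit through two or three high-temperature points can be extended down to room temperature. Here is a minimal sketch of that fit. The temperatures, prefactor, and activation energy are made-up values used to generate the example points, not results from the paper.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def diffusivity(temperature_k, prefactor, activation_energy_ev):
    """Arrhenius law: D = D0 * exp(-Ea / (kB * T))."""
    return prefactor * math.exp(-activation_energy_ev / (BOLTZMANN_EV_PER_K * temperature_k))


def fit_arrhenius(temperatures_k, diffusivities):
    """Least-squares line through (1/T, ln D); returns (D0, Ea in eV)."""
    xs = [1.0 / t for t in temperatures_k]
    ys = [math.log(d) for d in diffusivities]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope * BOLTZMANN_EV_PER_K


# Made-up "simulation results" generated from D0 = 1e-3 cm^2/s and Ea = 0.30 eV.
true_d0, true_ea = 1e-3, 0.30
temps = [700.0, 900.0, 1100.0]
d_values = [diffusivity(t, true_d0, true_ea) for t in temps]

d0, ea = fit_arrhenius(temps, d_values)
d_room = diffusivity(300.0, d0, ea)
print(f"Ea = {ea:.3f} eV, extrapolated D(300 K) = {d_room:.2e} cm^2/s")
```

With real simulation data the points scatter around the line, and the long extrapolation from 900 K to 300 K is one of the main sources of uncertainty.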
Result
This study shows that screening all known candidate solid electrolyte materials against a range of criteria, using both ML and DFT simulation, is effective. By speeding up what would otherwise be a one-by-one process of elimination by nearly a million times, this work represents a big step forward for solid-state batteries, which could eliminate the problem of battery-related fires and explosions.
State of health testing
The Problem
As we know, materials discovery is only one part of the process. Testing is equally important, to ensure not only effectiveness but also safety and suitability for applications.
One key metric to test is the usable lifetime, which is controlled by a battery’s design, combined with its cycling protocol, and ultimately determines the usefulness of a battery for a particular use case.
This directly relates to a battery’s state of health (SOH), which gets affected by interior material degradation and structural changes.
Right now, SOH is typically defined as the percentage of capacity retained. However, this definition doesn’t take into account future behaviour and therefore doesn’t refer to the actual “health”.
This is true because several cells could have the same SOH by this definition, but their true SOH could be drastically different, as each will have its own modes of failure and therefore its own future.
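A tiny numerical example makes the point. The two cells below are hypothetical, with invented capacity readings: they end up at exactly the same capacity retention, yet one is fading steadily while the other has started to fall off quickly.

```python
def capacity_retention_soh(current_capacity_ah, initial_capacity_ah):
    """State of health as percentage of the initial capacity still available."""
    return 100.0 * current_capacity_ah / initial_capacity_ah


# Capacity (Ah) at four successive check-ups for two hypothetical cells.
cell_a = [3.00, 2.90, 2.80, 2.70]  # steady, linear fade
cell_b = [3.00, 2.98, 2.92, 2.70]  # slow start, then accelerating fade

soh_a = capacity_retention_soh(cell_a[-1], cell_a[0])
soh_b = capacity_retention_soh(cell_b[-1], cell_b[0])

# Capacity lost between the last two check-ups hints at where each cell is heading.
last_drop_a = cell_a[-2] - cell_a[-1]
last_drop_b = cell_b[-2] - cell_b[-1]

print(f"cell A: SOH {soh_a:.1f}%, last drop {last_drop_a:.2f} Ah")
print(f"cell B: SOH {soh_b:.1f}%, last drop {last_drop_b:.2f} Ah")
```

Both cells report the same SOH, but cell B is losing capacity more than twice as fast at the last check-up, which a single retention number cannot show.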
The issue is that failure modes are complex, nonlinear and mutually intersecting, which means they require mechanistic chemistry- and physics-based models in order to be represented.
Sadly, the amount of data needed to feed such models stands in the way of this possibility.
The issue is important because current estimates of usable life and SOH are merely approximations, which can be both dangerous and restricting. For grid applications, for example, the economics depend on knowing life-related metrics.
Solution & Approach
In response, a team of data scientists and electrochemists at Argonne National Laboratory, which is owned by the DOE, came together to use advanced SOH descriptors to diagnose the health of lithium-ion batteries with limited data usage.
To predict battery life and degradation trends, accessible degradation data was needed to train the models. Data was obtained by collecting historical testing data from thousands of cells tested by Argonne and others, and by generating synthetic cycling data.
Later, several Li-ion chemistries were examined, with voltage and timestamps recorded each second of each cycle for each cell.
After gathering data and extracting features from cycling data, correlations and relationships were drawn using an ANN, in order to eventually “decode” the data and understand the expected cycles to failure.
Result
The deep learning approach is reportedly transferable to other chemistries, as the neural network can evaluate the similarity of a given cell to others based on factors like cathode/anode/electrolyte composition, cycling protocol, and environmental temperature, and shape its predictions accordingly.
While this application has been around for longer than others, this project is special in that it enables more accurate predictions on account of the model taking into consideration real-life operating parameters.
Materials imaging: a non-battery case study
The Problem
Microstructure image data (or micrographs) — microstructure = structure visible under 25× magnification — is highly useful for understanding a material’s morphology, formation of its microstructure, as well as the mechanisms responsible for its behaviour and performance.
Therefore, effective analysis of micrographs has a big role to play in establishing processing–structure–property relationships and thus in the design of new material systems.
The problem lies in the inconsistent and inaccurate recognition and analysis of said image data. The challenge stems from the great knowledge and skill needed to obtain micrographs, the diversity of image data types, specific challenges in image analysis techniques, and more.
Solution & Approach
Thankfully, ML has a role to play here too.
In a model scenario involving a binary uranium–molybdenum (U–Mo) alloy, one group sought to create a method enabling objective, repeatable analysis of image data.
The U–Mo alloy has potential as a nuclear fuel for test reactors but there remains a need to understand microstructure–processing relationships to allow for improved fabrication, design and fuel qualification.
One example of such relationships is the high temperature the material undergoes during fabrication that in turn alters the microstructure.
To better understand cause-and-effect relationships like this one, multi-class classification was performed to link microstructure to processing condition.
Multi-class classification is just as it sounds; it aims to match a given input to one output drawn from a set of more than two options.
Result
The training dataset consisted of micrographs for ten different thermo-mechanical processing conditions of a U-10Mo alloy. The image data was segmented for feature extraction and it was found that area, spatial, and texture information are needed for accurately describing image data.
Using their proposed approach, an F1 score of 95.1% was achieved during the training period. This measure is the harmonic mean of precision (the number of true positives over the total number of things labelled positive) and recall (the number of positive examples labelled correctly over the total number of things that were actually positive).
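For anyone who wants to see the arithmetic, here is a small sketch that computes precision, recall, and F1 for one class at a time. The labels "A", "B", and "C" are hypothetical stand-ins for processing conditions, and the predictions are invented; for a multi-class problem like this one, the per-class scores are typically averaged, though how the study combined them is not something I’m assuming here.

```python
def precision_recall_f1(true_labels, predicted_labels, positive):
    """Precision, recall and F1 when `positive` is treated as the class of interest."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical true and predicted processing conditions for six micrographs.
true_labels = ["A", "A", "A", "B", "B", "C"]
predicted = ["A", "A", "B", "B", "A", "C"]

for label in ("A", "B", "C"):
    p, r, f1 = precision_recall_f1(true_labels, predicted, label)
    print(f"class {label}: precision {p:.2f}, recall {r:.2f}, F1 {f1:.2f}")
```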
While this study may be somewhat unrelated to the topic at hand, it shows how automating our approach to comprehending relationships in materials and their respective systems can enable improved process design and quality control, and overall microstructure-processing understanding.
Adding onto such examples, a future workflow incorporating not only ML, but other techniques within AI, digital twins and other methods, may look something like this:
Challenges and opportunities
Like any nascent technology, machine learning for battery development faces a few key challenges.
Most prominently, there is data scarcity. Datasets for battery applications are notoriously small, due to the high cost of running experiments and therefore of obtaining data, or the limited resources available for informing your model. This issue is important to get around because otherwise the model is prone to overfitting.
In most cases, data gathering and cleaning is the most time-consuming aspect of the process due to these challenges.
While the difficulty persists, we’re already seeing creative ways to surmount this challenge with the use of efficient and non-data-hungry algorithms.
For example, transfer learning is a technique that works by taking a model built to solve a problem similar to yours (it may not target the exact measurement you’re looking for, but one that is correlated with it), even if it is not as accurate. Instead of comparing it to your model side by side, you use it as an input while building yours. The data sources should be heterogeneous and the structure hierarchical.
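Here is a minimal sketch of that idea as described: an existing “source” model predicts a correlated property, and its output becomes the input feature for a small model fitted on your own scarce data. The source model, the four data points, and the simple line fit are all invented for illustration; real transfer learning setups are usually more involved (for example, reusing layers of a neural network).

```python
def source_model(x):
    """Stand-in for a model trained elsewhere on plentiful data.

    It predicts a property that is correlated with, but not the same as,
    the one we actually care about.
    """
    return 2.0 * x + 1.0


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Tiny, made-up target dataset: only four measurements of the property we want.
target_x = [0.0, 1.0, 2.0, 3.0]
target_y = [1.7, 4.7, 7.7, 10.7]

# Use the source model's predictions as the input feature for our own model.
source_features = [source_model(x) for x in target_x]
slope, intercept = fit_line(source_features, target_y)


def predict_target(x):
    return slope * source_model(x) + intercept


print(f"prediction for x = 4.0: {predict_target(4.0):.2f}")
```

Because the source model has already captured most of the underlying trend, the new model only needs to learn a small correction, which is what makes the approach workable with so little data.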
Another challenge is the selection of universal materials descriptors, which is almost wholly responsible for the efficiency and success of an ML model. As it stands, the process of identifying descriptors suitable for an arbitrary target property is far from systematic.
Automatic schemes capable of creating universally relevant descriptors, taking into account scientific knowledge and the like, could be a big step forward in aiding the practicality and accuracy of ML models for battery applications.
A third challenge is poor real-world applicability due to difficulties in assessing accuracy. Immature representation of results is highly likely given that AI methods cannot fully incorporate the physical laws governing complex materials attributes.
In other words, the way that error is quantified is a little iffy because models are usually not fully representative of the full chemical space under exploration. Therefore, improved ways to assess the error bars and transferability of ML models are needed.
Conclusions
The case studies above hopefully shine a light on the sheer vastness of interesting applications of ML in the battery space.
ML is capable of speeding up and revolutionizing the way many things are done in the field of batteries, but the primary purpose it’s serving now is to greatly shrink the test matrix for materials, and therefore save time over current trial-and-error approaches in battery materials research.
More mature applications include the estimation of battery metrics like SOH, state of charge, and remaining useful life.
Another observation you may have already made is that the lion’s share of research work to date remains in lithium-ion batteries. This is to be expected given their high adoption and relevance today, but it also leaves a big opportunity to start bringing knowledge gained in this one area over to other battery types.
Some interesting applications that I didn’t touch on for the sake of readability are text mining, deriving reaction mechanisms from electrochemical measurements like cyclic voltammetry, battery manufacturing, and more.
If you’re interested in the first one however, I recommend checking out this paper about generating a database of battery materials based on extracted info from scientific papers.
Thanks for reading my article! I hope you enjoyed learning about how ML is beginning to make traditional approaches in battery development more of an optimization problem than a chance for researchers to rack their brains.