Knowledge Embedding and Knowledge Discovery in Scientific Machine Learning

On January 11th, at the Machine Heart AI Technology Annual Conference, Professor Zhang Dongxiao, a member of the US National Academy of Engineering and executive vice president of the Institute for Advanced Study of Eastern Technology, delivered a keynote speech titled "Knowledge Embedding and Knowledge Discovery in Scientific Machine Learning." He first briefly introduced the state of the art in data-driven modeling, and then focused on theory-guided data-driven modeling, namely knowledge embedding, and on mining data-driven models for physical insight, namely knowledge discovery. Academician Zhang pointed out that machine learning algorithms can effectively handle problems with complex nonlinear mapping relationships, and that introducing industry knowledge can substantially improve machine learning models. Combining knowledge embedding and knowledge discovery into a closed loop can greatly improve AI's ability to solve practical problems.

The following is Zhang Dongxiao's speech at the Heart of the Machine AI Technology Annual Conference, edited by Heart of the Machine without altering the original meaning:

I am very glad to have the opportunity to attend the Heart of the Machine online AI Technology Annual Conference and share some of our recent thinking. Today's talk has three parts: the first is data-driven models; the second is theory-guided data-driven models, namely knowledge embedding; and the third is mining data-driven models, namely knowledge discovery.

I. Data-Driven Models

First of all, everyone is familiar with the model-driven approach: an input goes through a model and produces an output. When we learn to program, whether the algorithm is complex or simple, we are building a model so that a given input yields an output. Of course, the algorithm can be deterministic or stochastic.

On the other side is the data-driven approach. Here we do not yet know the mapping between input and output, but we do have data. By learning from the data, we can establish a mapping between input and output. Of course, this mapping may be a black box; it does not have to be an explicit expression. But once we have the mapping, a new input gives us a new output. This data-driven approach is at the heart of the current generation of machine learning.

When solving a problem, should it be data-driven or model-driven? This is a question worth thinking about.

First, let's look at some data-driven examples, such as the familiar big data analytics, data science, and machine learning. In the balance between data and model, these approaches sit on the data side, looking for mapping relationships through data.

Here are a few examples, such as forecasting renewable energy generation. If we have irradiation, temperature, humidity, wind speed, day/night conditions, and historical photovoltaic (PV) production data, we can build a mapping from these data. Based on this mapping and the weather forecast, we can predict how much the PV plant will produce the next day. The same goes for wind power.
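As a rough illustration of what such a mapping might look like in code (not the speaker's actual pipeline), the sketch below fits a generic multilayer perceptron to placeholder weather features and PV output, then applies it to a stand-in weather forecast:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Placeholder data: one row per hour, columns = [irradiation, temperature,
# humidity, wind speed, day/night flag]; y = measured PV output for that hour.
X_hist = np.random.rand(1000, 5)      # stand-in historical weather features
y_hist = np.random.rand(1000)         # stand-in historical PV production

scaler = StandardScaler().fit(X_hist)

# Learn the input-output mapping from historical data.
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
model.fit(scaler.transform(X_hist), y_hist)

# Apply the learned mapping to tomorrow's weather forecast to predict PV output.
X_forecast = np.random.rand(24, 5)    # stand-in next-day weather forecast
pv_prediction = model.predict(scaler.transform(X_forecast))
```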

The mapping can be established by various methods, such as support vector machines, convolutional neural networks, or recurrent neural networks. The core task is to find the complex mapping between multiple input variables and the target variable, and thus to construct a predictive model.

This is a case study of a centralized photovoltaic power plant using the information just mentioned. We process the data, build the mapping, and make predictions about the future. In the end, the day-ahead forecast reached 97% accuracy.

The same is true of wind power, of course, because there is also a strong correlation between historical wind power generation and historical wind speed data. If you can build a mapping between them, you can forecast electricity generation.

Data-driven models work well on many problems, but in many applications data are not readily available. For example, well-log curves are important for underground resource exploration and development, yet drilling a well for measurement may cost tens of millions of yuan; another example is that a single set of adsorption analysis experiments takes a long time to run. For such problems it is difficult to obtain enough data to build a purely data-driven model.

As we all know, large models need large amounts of data and computing power. The well-known GPT-3 has 96 layers, a hidden dimension of more than 10,000, and about 175 billion parameters. Training such a model is very expensive and requires an enormous amount of data.

In addition, the metrics used by data-driven models often have limitations. The commonly used MSE (mean squared error), for example, is an average measure of error and cannot distinguish between the physical processes behind the error. Whether a system's entropy increases or decreases looks the same to MSE, even though entropy increase and entropy decrease are very different for a physical system. Metrics defined in an averaged sense tend to ignore the physical process: if we stand with one foot on ice and one foot on fire, the average temperature may seem comfortable, but that is clearly not the real situation. MSE only sees the average result of the data. Therefore, in practice, data-driven metrics such as MSE are often limited.
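A toy illustration of this point (not from the talk): two forecasts with the same MSE can describe physically very different behavior, for example one preserving a monotonic trend and one oscillating around it.

```python
import numpy as np

# A quantity that should increase monotonically (e.g. entropy of an isolated system).
truth = np.array([1.0, 2.0, 3.0, 4.0])

pred_a = np.array([2.0, 3.0, 4.0, 5.0])  # biased, but still monotonically increasing
pred_b = np.array([2.0, 1.0, 4.0, 3.0])  # same error magnitude, but oscillates (unphysical)

mse = lambda pred: np.mean((pred - truth) ** 2)
print(mse(pred_a), mse(pred_b))          # both print 1.0: MSE cannot tell them apart
```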

On the other hand, many of the models we build lack common sense; they do not possess the everyday knowledge of the human world, which makes them easy to attack. In the well-known adversarial example, a picture of a panda with a small amount of added noise may be classified by the machine as a gibbon. Similarly, for a handwritten digit, humans still recognize it as an 8 or a 9 after a little noise is added, but the machine often does not, because much of the time it has no common sense.

Since the data-driven approach has these problems, could we instead model purely from knowledge, as in the early days of AI? For many complex problems, however, it is difficult to construct models from knowledge alone. This is also one of the reasons why data-driven AI models are now so widely promoted.

For many industries, such as energy, model robustness and interpretability are essential, while data collection is time-consuming and costly and the systems are extremely complex, with many high-dimensional nonlinear mapping relationships. These characteristics mean that neither a purely data-driven nor a purely knowledge-driven model can achieve satisfactory results. To address this, we aim to build models driven jointly by knowledge and data, flexibly leveraging the knowledge accumulated over the years in the energy industry to improve model accuracy and robustness and to reduce the demand for data.

Here we propose the concept of intelligent energy: a technical system built on domain knowledge, observational data, and artificial intelligence methods.

The fusion of domain knowledge and data-driven methods involves two aspects. The first is knowledge embedding, that is, how to build AI models that possess general physical knowledge. By embedding domain knowledge into the AI model, the strong fitting ability of machine learning can be used to describe the high-dimensional, complex mappings between variables and improve model accuracy, while the industry's prior knowledge ensures that predictions conform to the physical mechanisms and do not violate common sense. This is the role of knowledge embedding in machine learning.

The second aspect is using scientific machine learning to discover knowledge, namely knowledge discovery: using deep learning to explore physical principles and mine governing equations directly from observational or experimental data, thereby advancing the frontiers of human knowledge. Knowledge embedding and knowledge discovery can form a closed loop that realizes the integration of knowledge and data.

In the second part I will focus on knowledge embedding, that is, how to build an AI model with general physical knowledge. The third part is about knowledge discovery: how to use artificial intelligence to discover new knowledge, such as physical principles, governing equations, and first principles. With such a closed loop, many problems, including simulation, inverse problems, and interpretability, can be solved well.

II. Theory-Guided Data-Driven Models (Knowledge Embedding)

Let's look at knowledge embedding in Part Two. Here both data and models are present, and the approach is a balance between the two. We want to use both, embedding knowledge throughout the data-driven modeling process.

The purpose of knowledge embedding is to build machine learning models that are physically reasonable, mathematically accurate, and computationally stable and efficient by introducing physical knowledge into a data-driven model. The core problems therefore include embedding complex governing equations, embedding general knowledge beyond governing equations, embedding knowledge about irregular physical fields, and automatically adjusting the weights of the regularization terms in the loss function.

Knowledge embedding can take place at multiple stages of the modeling process. During data preprocessing, physical constraints, human domain knowledge, and prior experience can be embedded; this is often related to feature engineering and data normalization. In model structure design, the network architecture or topology can be adjusted based on domain knowledge. Domain knowledge can also be embedded in model optimization and tuning, for example through penalties and rewards during learning, the simplest method being a specially designed loss function. Let's take a few examples.
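Before turning to the examples, here is a minimal sketch of the simplest form of this idea, a loss function with a penalty term; the non-negativity constraint and the weight are illustrative assumptions, not a constraint from the talk.

```python
import torch

def knowledge_embedded_loss(y_pred, y_obs, weight=1.0):
    """Data-fit term plus a penalty whenever the prediction violates a known constraint."""
    data_loss = torch.mean((y_pred - y_obs) ** 2)

    # Illustrative constraint: the predicted quantity is known to be non-negative
    # (e.g. a power output). Violations are penalized; valid predictions add zero cost.
    violation = torch.clamp(-y_pred, min=0.0)
    physics_penalty = torch.mean(violation ** 2)

    return data_loss + weight * physics_penalty
```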

The first example is electrical load forecasting in a power system. In this work, knowledge embedding appears mainly in data preprocessing and in the model's feedback-update step. In data preprocessing we introduced a load-ratio decomposition method to embed knowledge, and for the feedback-update step we used our self-developed EnLSTM model, which uses an ensemble-based algorithm to improve the optimization process.

In data preprocessing, we decompose the power load data into a large trend and local fluctuations. The large trend reflects the internal patterns of the region being forecast, such as its energy structure and population structure, and is determined from historical data and expert experience. The local fluctuation is the system's response to external driving forces such as weather and is predicted by the data-driven model. Finally, the large trend and the small fluctuations are recombined. In addition, we use a load-ratio transformation to stabilize the data, with the period of the ratio determined by the physical process. I will not go into the details here; if you are interested, please see our TgDLF paper published in 2021.
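The actual TgDLF decomposition relies on load-ratio transforms and expert-defined trends; the sketch below only illustrates the general split-then-recombine idea, using a simple weekly moving average as a stand-in for the large trend.

```python
import numpy as np

def decompose(load, window=24 * 7):
    """Split an hourly load series into a slow trend and a local fluctuation (illustrative only)."""
    kernel = np.ones(window) / window
    trend = np.convolve(load, kernel, mode="same")  # stand-in for the expert-derived large trend
    fluctuation = load - trend                      # weather-driven part, learned by the model
    return trend, fluctuation

# After the data-driven model predicts the future fluctuation, the forecast is recombined:
#   forecast = future_trend + predicted_fluctuation
```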

This method has been tested on 12 districts in Beijing. Using more than three years of real hourly data, the model is trained on some districts and used to predict the others. For example, on the right is the load forecast for Fengtai District, covering a total of 1,362 days, with five segments zoomed in. The black line is the actual measurement, the red line is the prediction, and the gray band is the confidence interval. The results are quite good, with accuracy stably above 90%. Note that the district's own data were not used in training; the model was trained only on the surrounding districts.

Knowledge can also be embedded in the model evaluation stage. For example, in wind power forecasting, we embed the information contained in a probability distribution into the data-driven model as a constraint, extending the optimization loss function with a prior probability density function.

In wind power generation, the turbine's output power is strongly related to the wind speed. Because of complex real-world operating conditions, the power curve is not a one-to-one mapping; it must be represented by a probability distribution describing the relationship between wind speed and output power. Given historical data, we can derive a prior wind power curve and embed it into the model's training process by modifying the loss function. A model built this way retains the advantages of being data-driven while ensuring that its outputs conform to the prior probability distribution.
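A hedged sketch of how such a prior might enter the loss: `prior_logpdf` stands in for a density estimated from historical data (e.g. a kernel density estimate of power given wind speed) and is an assumption, not the speaker's exact formulation.

```python
import torch

def prior_informed_loss(power_pred, power_obs, wind_speed, prior_logpdf, weight=0.1):
    """Data-fit term plus a term favoring predictions that are plausible under the
    prior wind-speed -> power distribution estimated from historical data."""
    data_loss = torch.mean((power_pred - power_obs) ** 2)

    # prior_logpdf(wind_speed, power) returns log p(power | wind_speed);
    # maximizing it pulls predictions toward the prior wind power curve.
    prior_loss = -torch.mean(prior_logpdf(wind_speed, power_pred))

    return data_loss + weight * prior_loss
```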

This is a prediction, and the actual results are very good.

When noise is present, the model with the prior probability density function embedded performs much better than the purely data-driven model. This shows that embedding domain knowledge can effectively improve a model's noise resistance and robustness.

Next, let us look at embedding domain knowledge in model evaluation, where there are plenty of examples. The main approach is to embed the governing equations into the AI model as constraints by modifying the loss function.

During training, on the one hand the data-driven model learns from large amounts of data; on the other hand we also have prior information such as governing equations, physical laws, engineering theory, and expert experience. If we can embed this prior information into the data-driven model, we not only fit the data but also ensure that the model's outputs obey physical criteria and engineering theory, giving the model better generalization ability.

For example, we may have observational data, governing equations, boundary conditions, initial conditions, engineering control criteria, and expert knowledge. Each of these can be converted into a regularization term in the loss function, constraining the model's output. A loss function constructed this way has multiple regularization terms, each preceded by a coefficient. Why are coefficients needed? Because the physical meaning of each term is different and their dimensions often differ, the terms cannot simply be added together. These weights are therefore very important. That is why some people apply such a framework to one class of problems and find it works, while others apply it elsewhere and find it does not: the process is not a simple summation, and the design and adjustment of the weights matter greatly. Done well, this way of embedding knowledge improves the model's predictive power and gives it strong generalization ability.
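A minimal sketch of such a composite loss, in the spirit of physics-informed training; the residual functions and weight values are placeholders to be defined per problem, not the speaker's code.

```python
import torch

def composite_loss(model, data, pde_residual, bc_residual, ic_residual, weights):
    """Weighted sum of regularization terms; each weight balances terms whose
    physical meaning and units differ."""
    x_obs, y_obs = data
    data_term = torch.mean((model(x_obs) - y_obs) ** 2)  # fit the observations
    pde_term = torch.mean(pde_residual(model) ** 2)      # governing-equation residual
    bc_term = torch.mean(bc_residual(model) ** 2)        # boundary conditions
    ic_term = torch.mean(ic_residual(model) ** 2)        # initial conditions

    w = weights  # e.g. {"data": 1.0, "pde": 0.1, "bc": 1.0, "ic": 1.0}, tuned or adapted
    return (w["data"] * data_term + w["pde"] * pde_term
            + w["bc"] * bc_term + w["ic"] * ic_term)
```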

These coefficients may change during learning. For example, when data are plentiful, as in an interpolation problem, the weight of the data term is very large. When data are scarce or inaccurate, or when extrapolating, the governing equation becomes very important. However, the equation does not act alone. Giving me an equation is like giving me a trajectory, a very complicated trajectory floating around with uncertainty; we also need boundary conditions or initial conditions to put the equation on the right orbit. Without those boundary and initial conditions, the trajectory will be erratic.

Let me give an example of the importance of knowledge. Suppose we had a model to predict the temperature in Shenzhen over these two days: what would you say today's temperature is? If you say minus 10 °C, people elsewhere in the country may not sense anything wrong, but that never happens in Shenzhen. Experts will tell you that the lowest temperature in Shenzhen's history is only a few tenths of a degree above zero, never below zero. In such cases, adding expert experience to the model can be very helpful.

Let's illustrate this with a series of examples. In a groundwater flow problem, suppose we have data for an initial period, but the boundary conditions are changed later, so that the scenario changes substantially and there are no observations for the later period. A conventional data-driven model cannot handle this. However, if the boundary conditions and governing equations are known, then combined with the data from the initial period we can still make predictions, and the results are very good, as shown in the comparison in the figure. Because the flow-field conditions have changed, the distribution of the observed data from the initial period no longer matches that of the later period; both periods, however, are constrained by the same governing equation. Using a purely data-driven model to predict the later period therefore produces large errors, whereas adding domain knowledge such as the governing equations and boundary conditions effectively improves the model's accuracy. The governing equations and boundary conditions play a major role in this problem.
