When the Transformer Meets Partial Differential Equation Solving
2022-06-27 00:01:00 【Shengsi MindSpore】

This article shares a recent read on solving partial differential equations with Transformers: the paper Choose a Transformer: Fourier or Galerkin, which has been accepted at NeurIPS 2021.
Background
In our world, from the motion of stars in the universe, to forecasts of temperature and wind speed, down to the interactions between molecules and atoms, a great many processes in engineering, the natural sciences, economics, and business can be described by partial differential equations (PDEs). Traditional approaches such as the finite element method, the finite difference method, and spectral methods use a discrete structure to reduce the infinite-dimensional operator mapping to a finite-dimensional approximation problem. In recent years, models such as physics-informed neural networks (PINNs) [1] train a neural network to approximate the PDE solution by sampling in the solution space. However, for both the traditional methods and physics-informed neural networks, even a slight change in boundary conditions or equation parameters usually requires recomputation or retraining.
By contrast, the goal of operator learning is to learn a mapping between infinite-dimensional function spaces, so that a PDE can be solved for new inputs without retraining, greatly saving computational resources. Operator learning for PDE solving (the operator learner) is a rapidly developing research direction, with the Fourier neural operator (FNO) [2] as its typical representative.
With the release of NeurIPS 2021, the Transformer-based operator-learning paper Choose a Transformer: Fourier or Galerkin [4] offers a new interpretation for solving parametric PDEs and ultimately achieves state-of-the-art results.
Main work
In this paper, the operator learner is trained with supervised learning: training samples are obtained by sampling the input function and the output function on the same discrete grid points. As shown in the figure below, solving the equation can therefore be cast as a seq2seq problem and modeled with a Transformer [3].

Figure 1: Operator learner schematic
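To make the data layout concrete, the toy sketch below stores a batch of grid-sampled input functions and target solutions as arrays of shape (batch, n_grid, channels), which is what a seq2seq-style operator learner consumes; the array names and sizes are illustrative assumptions, not the paper's data pipeline.

```python
import numpy as np

# Hypothetical layout of one training batch for an operator learner:
# each sample pairs an input function a(x) with the corresponding solution
# u(x), both sampled on the same uniform grid of n_grid points.
batch, n_grid, in_ch, out_ch = 8, 256, 1, 1
a = np.random.randn(batch, n_grid, in_ch)   # input functions on the grid
u = np.random.randn(batch, n_grid, out_ch)  # target solutions on the grid

# A Transformer-style operator learner treats the n_grid points as a
# sequence and predicts the solution at the same points:
#   u_pred = model(a), with u_pred.shape == (batch, n_grid, out_ch)
print(a.shape, u.shape)
```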
Building on the Transformer, the main contributions of this paper are as follows:
1. Softmax-free attention. A scale-preserving self-attention mechanism and a softmax-free attention are proposed, and mathematical interpretations of both schemes are given.
2. An operator learner for parametric PDEs. The new attention operator is combined with FNO, significantly improving accuracy on parametric PDE benchmark problems.
3. State-of-the-art experimental results. On three benchmarks, both the accuracy and the performance of the solver are greatly improved.
Pipeline

Figure 2: Network structure of a two-dimensional operator learner
The network structure of the operator learner is shown in the figure above. It mainly consists of the following modules:
1. Feature extractor: a feed-forward neural network for one-dimensional problems, a CNN for two-dimensional problems, and so on;
2. Interpolation-based CNN: built by stacking up-/down-sampling layers with CNN layers;
3. Positional encoding: the Cartesian coordinates of each grid point are concatenated to the input data as additional feature dimensions (see the sketch after this list);
4. Decoder: maps the representation features learned by the encoder back to the original dimension.
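As a minimal illustration of the positional-encoding step (item 3), the sketch below simply concatenates each grid point's normalized coordinate to the sampled function values as an extra feature channel; the one-dimensional uniform grid and the example function are assumptions for illustration only.

```python
import numpy as np

def add_coordinate_features(u, x_min=0.0, x_max=1.0):
    """Concatenate normalized grid coordinates to the input features.

    u: array of shape (n_grid, n_channels), a function sampled on a
       uniform 1D grid of n_grid points.
    Returns an array of shape (n_grid, n_channels + 1).
    """
    n = u.shape[0]
    coords = np.linspace(x_min, x_max, n).reshape(n, 1)
    return np.concatenate([u, coords], axis=-1)

# toy example: one input channel on a 64-point grid -> 2 feature channels
u = np.sin(2 * np.pi * np.linspace(0.0, 1.0, 64)).reshape(64, 1)
print(add_coordinate_features(u).shape)  # (64, 2)
```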
The main body of the training loss is the MSE between the network output and the label; in addition, the loss includes a regularization term on the difference between the output and the label.
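A rough sketch of such a loss is shown below, under the assumption that the regularization term penalizes the mismatch of first-order finite differences (i.e., approximate derivatives) of the prediction and the label; the exact norm and weighting used in the paper may differ.

```python
import numpy as np

def operator_loss(u_pred, u_true, h, gamma=0.1):
    """MSE plus a derivative-mismatch regularizer (illustrative form).

    u_pred, u_true: arrays of shape (batch, n_grid) on a uniform grid
    h: grid spacing; gamma: regularization weight (illustrative value)
    """
    mse = np.mean((u_pred - u_true) ** 2)
    # first-order finite differences approximate the spatial derivative
    du_pred = np.diff(u_pred, axis=-1) / h
    du_true = np.diff(u_true, axis=-1) / h
    reg = np.mean((du_pred - du_true) ** 2)
    return mse + gamma * reg
```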
The Fourier-type and Galerkin-type attention in the Transformer are computed as shown in Figures 3 and 4:

Figure 3: Fourier-type attention

Figure 4: Galerkin-type attention
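The sketch below gives a minimal NumPy version of the two softmax-free attention variants as described in the paper: the Fourier-type attention forms (Q K^T) V / n with layer-normalized Q and K, while the Galerkin-type attention forms Q (K^T V) / n with layer-normalized K and V. Multi-head splitting, learned projections, and residual connections are omitted, so this is an illustration of the operator rather than the paper's implementation.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row (token) of x to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def fourier_attention(q, k, v):
    """Softmax-free Fourier-type attention: (Q~ K~^T) V / n.
    q, k, v: arrays of shape (n, d) -- n grid points, d head dimension."""
    n = q.shape[0]
    return (layer_norm(q) @ layer_norm(k).T) @ v / n

def galerkin_attention(q, k, v):
    """Softmax-free Galerkin-type attention: Q (K~^T V~) / n.
    Computing K^T V first keeps the intermediate matrix at size d x d."""
    n = q.shape[0]
    return q @ (layer_norm(k).T @ layer_norm(v)) / n

# toy check: both variants map (n, d) inputs to an (n, d) output
n, d = 128, 32
q, k, v = (np.random.randn(n, d) for _ in range(3))
print(fourier_attention(q, k, v).shape, galerkin_attention(q, k, v).shape)
```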
Experimental results
1. Burgers' equation
The viscous Burgers' equation is defined as

$$\partial_t u(x,t) + u(x,t)\,\partial_x u(x,t) = \nu\,\partial_{xx} u(x,t), \qquad x \in (0,1),\ t \in (0,1],$$

with initial condition $u(x,0) = u_0(x)$ and periodic boundary conditions, where $\nu$ is the viscosity.
The task in this paper is to predict the solution u at t = 1 from the initial condition at t = 0. The comparison between the model and FNO is shown in the table below; the model's accuracy is better than FNO's.

2. Darcy flow problem
The equation is defined as

$$-\nabla \cdot \big(a(x)\,\nabla u(x)\big) = f(x), \quad x \in D, \qquad u(x) = 0, \quad x \in \partial D,$$

where $a$ is the diffusion coefficient and $f$ is the forcing term.
The problem is to learn the mapping from a two-dimensional random coefficient field a to the two-dimensional solution u. The comparison between the model and FNO is shown in the table below; the model's accuracy is better than FNO's.

Besides accuracy, the performance of the models is also compared; the results are shown below. The Transformer with Galerkin-type attention has a clear advantage in memory footprint and speed.
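The advantage comes from the order of the matrix products: Galerkin-type attention multiplies K^T V first, which yields a small d x d matrix, so cost and memory grow linearly in the sequence length n, whereas forming Q K^T (as in standard softmax attention) materializes an n x n matrix. The rough operation count below illustrates the gap; the numbers are made up for illustration and ignore constants, normalization, and multiple heads.

```python
# Rough multiply-add counts for one attention head with sequence length n
# and head dimension d (illustrative only):
n, d = 8192, 64
quadratic_in_n = 2 * n * n * d  # (Q K^T) then (.) V -> stores an n x n matrix
linear_in_n = 2 * n * d * d     # (K^T V) then Q (.) -> stores only a d x d matrix
print(f"quadratic in n: {quadratic_in_n:.2e} multiply-adds")
print(f"linear in n:    {linear_in_n:.2e} multiply-adds")
```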

Thoughts and summary
The Galerkin Transformer interprets the attention mechanism from a mathematical point of view and, by combining it with operator learning, applies it to parametric PDE solving; its accuracy and performance surpass those of FNO, the "big brother" of operator learning. In the future it could be applied to higher-dimensional and more complex scenarios to further verify the model's effectiveness.
References
[1] Raissi M, Perdikaris P, Karniadakis G E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations[J]. Journal of Computational Physics, 2019, 378: 686-707.
[2] Li Z, Kovachki N, Azizzadenesheli K, et al. Fourier neural operator for parametric partial differential equations[J]. arXiv preprint arXiv:2010.08895, 2020.
[3] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Advances in neural information processing systems. 2017: 5998-6008.
[4] Cao S. Choose a Transformer: Fourier or Galerkin[J]. arXiv preprint arXiv:2105.14995, 2021.

MindSpore Official information
GitHub: https://github.com/mindspore-ai/mindspore
Gitee: https://gitee.com/mindspore/mindspore
Official QQ group: 486831414