Before approximating solutions to differential equations we shall introduce the concept and method on a simpler problem:
Given a polynomial function U(x) of some order, find the best approximation to some target function u(x) in the range x∈[a,b]
So u(x) is a known function and we need to find another function U(x) in the given range to approximate u.
First we need to define what is meant by "finding an approximation".
Given that U(x) is a polynomial function of some order we have the following choices to make (degrees of freedom):
We need to choose the polynomial order
We need to choose the constants of the polynomial function
We can define the polynomial function as e.g.,
$$U(x) = a_1 + a_2 x + a_3 x^2$$
which is a second-order polynomial where $a_i$ are the constants to be tweaked to find the best approximation to $u(x)$. More generally we can say that the approximation function is defined as
$$U(x) = \sum_{i=1}^{n} a_i x^{i-1}$$
where $n-1$ defines the order of the polynomial and we need to find the constants $a_1, \dots, a_i, \dots, a_n$ which give the best approximation.
We can also choose to define $U$ as a linear combination of basis functions $\varphi_i(x)$ with control points $u_i$:
$$U(x) = \sum_{i=1}^{n} u_i \varphi_i(x)$$
The control points always lie on the function $U$, and we shall see that this has more advantages when we get to later topics.
Now, what does "best approximation" mean? Well, there can be many definitions, but we shall look at it as finding the parameters $u_i$ such that the area between the curves is as small as possible. This is equivalent to saying that, on average, the error between the functions is zero. Now, let's address the obvious: if we know $u$, why not just choose the same polynomial, or a very high-order polynomial? The answer is: we do not know what $u$ is in general; this exercise is just to illustrate the method which, as we shall see, also works the same way for approximating solutions to differential equations. Besides, choosing a high-order polynomial as an approximation comes with computational cost and other numerical problems; there are better ways of achieving higher accuracy, as we shall see.
Example
Let the approximation be linear, basically a linear interpolation between the nodal values u1 and u2
$$U(x) = u_1 \underbrace{\frac{b-x}{b-a}}_{\varphi_1} + u_2 \underbrace{\frac{x-a}{b-a}}_{\varphi_2}$$
$\varphi_i(x)$ are called basis functions; here they are Lagrange polynomials, which are used in a type of interpolation called Lagrange interpolation. We prefer this type of interpolation because the $u_i$ become points on the approximation curve.
⚠ Note
We shall later see how the basis functions φi are defined and learn about their properties.
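As a small illustration, here is a minimal MATLAB sketch of this linear interpolation; the interval and nodal values below are made up for the example and are not taken from the text.

```matlab
% Minimal sketch: linear interpolation between nodal values u1 and u2 on [a, b].
a = 0; b = 1;                     % assumed interval
u1 = 2; u2 = 5;                   % assumed nodal values
phi1 = @(x) (b - x)./(b - a);     % basis function attached to the node at x = a
phi2 = @(x) (x - a)./(b - a);     % basis function attached to the node at x = b
U = @(x) u1*phi1(x) + u2*phi2(x); % U(a) = u1 and U(b) = u2
fplot(U, [a, b])                  % a straight line between the two nodal values
```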
To define an average area between the curves we first define the difference between the curves, a.k.a. the residual: $r(x, u_i) = U(x, u_i) - u(x)$
Now our problem can be stated as: find $u_i$ such that we minimize the residual $r(x, u_i) = U(x) - u(x)$ on the domain $x \in [a, b]$.
This is equivalent to minimizing the area between the curves, which we can formulate as
$$A(u_i) = \int_a^b w(x)\, r(x, u_i)\, dx \tag{5}$$
Minimizing the area A in (5) means finding which ui minimize it. This method is called the weighted residual method, and the w(x) in the equation is the weight function for which there are several choices.
The Ritz method is a weighted residual method and aims at minimizing the square of the residual. The Ritz method was suggested by the Swiss physicist W. Ritz as a way of obtaining approximate solutions to physical problems fulfilling a minimum principle [3]. Lord Rayleigh published an article claiming that Ritz's idea was already presented in his own prior work, leading to the name Rayleigh-Ritz for this method, which is used by many authors even to this day, although it has been shown clearly in [2] that Rayleigh's claim is not justified.
The Ritz method suggests approximating the actual solution $u$ by a function $U$ of the form
$$u(x) \approx U(x) = \sum_{i=0}^{n} a_i x^i$$
which then shall minimize the functional
$$S(a_i) = \frac{1}{2}\int_a^b r^2(x, a_i)\, dx$$
with respect to the parameters ai.
This is minimized by setting the partial derivatives to zero
$$\frac{\partial S}{\partial a_i} = 0, \quad i = 1, \dots, n$$
which leads to a system of equations to solve for the parameters ai.
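As a rough sketch of this minimization (assuming MATLAB's Symbolic Math Toolbox, and picking $u(x) = x^2$ on $[0, 1]$ purely as an illustrative target), the system can be set up and solved symbolically:

```matlab
% Sketch: Ritz / least squares fit of a known u(x) by U(x) = a1 + a2*x on [0, 1].
% The target u(x) = x^2 is an assumed example, not prescribed by the text.
syms x a1 a2
u = x^2;
U = a1 + a2*x;
S = int((U - u)^2, x, 0, 1)/2;                        % S(a1, a2) = 1/2 * int r^2 dx
sol = solve([diff(S, a1) == 0, diff(S, a2) == 0], [a1, a2]);
% sol.a1 = -1/6 and sol.a2 = 1, i.e., U(x) = x - 1/6 is the best linear fit
```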
If we instead use the form
$$u(x) \approx U(x) = \sum_{i=1}^{n} u_i \varphi_i(x)$$
with $\varphi_1 = x$, $\varphi_2 = x^2$, etc., we can get the weight functions by differentiating the residual with respect to the parameters:
$$w_i(x) = \frac{\partial r}{\partial u_i} = \varphi_i(x)$$
We have not yet seen the Ritz method in its proper setting, which is what it was initially used for: applied to a minimum principle for solving problems governed by differential equations; see e.g. [4] for a historical overview of the method and how it led up to the modern method we now know as the finite element method. Thus, consider the differential equation
$$-\frac{d}{dx}\left(k \frac{du}{dx}\right) = f \quad \text{on } x \in [0, 1] \tag{14}$$
with $u(0) = 0$ and $\frac{du}{dx}(1) = 0$. Now, set $k = 1$ and $f = 1$. Let $u \approx U(x) = u_1\varphi_1(x) + u_2\varphi_2(x)$ with $\varphi_1(x) = x$ and $\varphi_2(x) = x^2$. The exact solution to this problem is $u(x) = x - x^2/2$, so the approximation has a form which contains the exact solution. Now, there exists a minimization principle for (14) such that we minimize the functional
$$S(u_1, u_2) = \frac{1}{2}\int_0^1 \left(\frac{dU}{dx}\right)^2 dx - \int_0^1 U\, dx \tag{15}$$
This is a variational form of (14) and represents the potential energy of the physical system which (14) models. The solution $u$ to the problem in (14) is the one that minimizes the potential energy in (15). The potential energy is minimized by differentiating the functional with respect to the parameters $u_1$ and $u_2$ and setting the derivatives to zero. Differentiating first, we arrive at
$$\frac{\partial S}{\partial u_1} = \int_0^1 \frac{dU}{dx}\frac{d\varphi_1}{dx}\, dx - \int_0^1 \varphi_1\, dx = u_1 + u_2 - \frac{1}{2} = 0$$
$$\frac{\partial S}{\partial u_2} = \int_0^1 \frac{dU}{dx}\frac{d\varphi_2}{dx}\, dx - \int_0^1 \varphi_2\, dx = u_1 + \frac{4}{3}u_2 - \frac{1}{3} = 0$$
and thus $u_1 = 1$ and $u_2 = -1/2$, and subsequently $U(x) = x - x^2/2 = u(x)$.
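The same small computation can be reproduced symbolically; a sketch assuming the Symbolic Math Toolbox:

```matlab
% Sketch: Ritz solution of -u'' = 1 on [0,1] with u(0) = 0, u'(1) = 0,
% using the approximation U = u1*x + u2*x^2 from the text.
syms x u1 u2
U = u1*x + u2*x^2;
S = int(diff(U, x)^2, x, 0, 1)/2 - int(U, x, 0, 1);   % potential energy functional
sol = solve([diff(S, u1) == 0, diff(S, u2) == 0], [u1, u2]);
% sol.u1 = 1 and sol.u2 = -1/2, i.e., U(x) = x - x^2/2, the exact solution
```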
This method can be generalized to use an arbitrary number of approximating functions for any problem which can be written as a variational form or a minimization principle. We summarize the method in general terms: let the approximation have the form
$$u(x) \approx U(x) = \sum_{i=1}^{n} u_i \varphi_i(x) \tag{18}$$
where $u_i$ are the unknown parameters to be determined by the method and $\varphi_i(x)$ are known functions, typically simple polynomials or trigonometric functions chosen cleverly to approximate the solution of the differential equation well. The Ritz method minimizes the functional $S$ using only functions of the specific form (18), which is straightforward since the functional then becomes a function of the parameters $u_i$:
$$S(U) = S\left(\sum_{i=1}^{n} u_i \varphi_i(x)\right)$$
and the minimum is obtained by differentiating $S$ with respect to $u_i$ and setting the derivatives to zero, leading to $n$ equations to be solved for the unknown parameters $u_i$. For our model problem (as an example) we have
$$S_{ij} = \int_0^1 \frac{d\varphi_i}{dx}\frac{d\varphi_j}{dx}\, dx, \qquad f_i = \int_0^1 f\, \varphi_i\, dx$$
and the quadratic form satisfies
$$\mathbf{u}^T \mathbf{S}\, \mathbf{u} = \int_0^1 \left(\frac{dU}{dx}\right)^2 dx > 0$$
as long as we have non-trivial solutions away from the boundary, i.e., $U \neq 0$; recall that the boundary conditions state $U(0) = 0$ and $\frac{dU}{dx}(1) = 0$. The positive definiteness guarantees that $\mathbf{S}$ is non-singular, and thus that the system $\mathbf{S}\mathbf{u} = \mathbf{f}$ has a unique solution. The fact that the stiffness matrix is symmetric, positive definite and non-singular is a cornerstone of an efficient numerical method and characteristic of the finite element method.
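For a general number of basis functions the same procedure can be sketched as an assembly loop; the monomial basis $\varphi_i = x^i$ below is an assumption made for illustration (it satisfies $\varphi_i(0) = 0$), and the Symbolic Math Toolbox is assumed.

```matlab
% Sketch: assemble and solve the n-by-n Ritz system S*u = f for -u'' = 1,
% u(0) = 0, u'(1) = 0, using the assumed basis phi_i = x^i, i = 1..n.
syms x
n = 3;                                   % illustrative number of basis functions
phi = x.^(1:n);                          % phi_1 = x, phi_2 = x^2, phi_3 = x^3
S = sym(zeros(n));  f = sym(zeros(n, 1));
for i = 1:n
    for j = 1:n
        S(i, j) = int(diff(phi(i), x)*diff(phi(j), x), x, 0, 1);   % stiffness matrix
    end
    f(i) = int(1*phi(i), x, 0, 1);       % load vector with f = 1
end
u = S \ f;                               % u = [1; -1/2; 0], so U = phi*u = x - x^2/2
```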
The strength of the Ritz method lies in that it recovers the optimal solution for the parameters $u_i$: since it is based on the principle of minimum potential energy, any other choice of $u_i$ will take us away from the minimum of potential energy. Additionally, if the exact solution is contained in the approximation, we are certain to recover it using the method. This is, however, not the case in general; especially in several dimensions we can only get an approximation of the solution which is, in a sense, optimal on average.
The method which Ritz proposed was almost immediately utilized by Russian engineers, such as Timoshenko, who first realized the importance of the method for engineering applications and used it on many interesting problems [5]. Ivan Bubnov then learned of the method through Timoshenko and used it in shell computations for ship (and submarine) building [6], leading to a manual on ship building with heavy use of the Ritz method. Bubnov contributed to the development of the finite element method by realizing in [7] that simple approximate solutions could be obtained by multiplying the differential equation by $\varphi_i$ and integrating over the entire domain, obtaining the same equations that Timoshenko found (Bubnov did not cite the work of Ritz directly). Bubnov's realization gave the recipe for deriving the weak form and the linear system without needing a principle of minimum potential energy, as is the case with the Ritz method.
In 1915, the Russian engineer Boris G. Galerkin (more accurately Galyorkin) suggested a variation of the Ritz method in his most famous publication [1], which is usually quoted when referring to the Galerkin method. Galerkin's main contribution in that paper was to realize that the minimization principle is not needed in order to construct a finite-dimensional system following the recipe of Bubnov. It is noteworthy that Galerkin himself referred to this method as the Ritz method. The many examples from applications that he provided are perhaps the explanation for why we today know this as the Galerkin method instead of the Ritz method.
Galerkin's method suggests that, instead of differentiating $r(x, u_i)$ as in (5), we differentiate the approximation $U(x, u_i)$ with respect to the parameters $u_i$ to obtain the weight functions $w_i(x) = \frac{\partial U}{\partial u_i} = \varphi_i(x)$.
Considering the same model problem from (14), we multiply the equation by $v$ and integrate over the whole domain, using integration by parts, to get
$$\int_0^1 k\frac{du}{dx}\frac{dv}{dx}\, dx = \int_0^1 f v\, dx \tag{34}$$
Galerkin suggested that, given that the trial function (the approximation) has the form $U(x) = \sum_{i=1}^{n} u_i\varphi_i(x)$, the weight function $v$ should have the same form as the trial function, i.e., $v(x) = \sum_{i=1}^{n} c_i\varphi_i(x)$. This means we can simply choose the basis functions as the weight functions, loosely speaking setting $v = \varphi_j$; in the case of the weak form the derivative of the weight function becomes $\frac{dv}{dx} = \frac{d\varphi_j}{dx}$, such that the weak form in (34) becomes:
$$\int_a^b k\frac{dU}{dx}\frac{d\varphi_j}{dx}\, dx = \int_a^b f\varphi_j\, dx$$
which is the same expression obtained using the Ritz method! This holds in a much more general setting, but more importantly: the Galerkin method can be used even if no minimum principle exists. It is always possible to take a general (partial) differential equation, multiply by a test function and integrate over the entire domain (using integration by parts) to obtain a Galerkin method. The real strength of Galerkin's method is in its minimization properties (and thus the relation to Ritz's method).
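As a quick check (a sketch assuming the Symbolic Math Toolbox), setting up the weak form for the model problem with $\varphi_1 = x$ and $\varphi_2 = x^2$ reproduces the Ritz result:

```matlab
% Sketch: Galerkin weak form of -u'' = 1, u(0) = 0, u'(1) = 0,
% with the trial/test functions phi_1 = x and phi_2 = x^2.
syms x u1 u2
phi = [x, x^2];
U = u1*phi(1) + u2*phi(2);
eq1 = int(diff(U, x)*diff(phi(1), x), x, 0, 1) == int(1*phi(1), x, 0, 1);
eq2 = int(diff(U, x)*diff(phi(2), x), x, 0, 1) == int(1*phi(2), x, 0, 1);
sol = solve([eq1, eq2], [u1, u2]);       % again u1 = 1 and u2 = -1/2
```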
Example: Find an approximate solution to the differential equation
$$\frac{d^2u}{dx^2} + 2\frac{du}{dx} = x \quad \text{for } x \in [0, 1]$$
with the boundary conditions u(0)=u(1)=0.
It is possible to show that there is no corresponding minimum principle in the sense we have discussed earlier, and thus Ritz' method cannot be applied. The exact solution to this equation is given by u=x(x−1)/4, and we use the approximation U=ax(x−1). A weak form can be derived using the recipe of multiplying by v and integrating by parts to get
$$\int_0^1 \left(-\frac{du}{dx}\frac{dv}{dx} + 2\frac{du}{dx}v\right) dx = \int_0^1 x\, v\, dx \qquad \forall v \text{ such that } v = 0 \text{ where } u \text{ is known.}$$
Applying Galerkin's method means using $v = \frac{\partial U}{\partial a} = x(x-1)$, so we obtain
$$-\frac{a}{3} = -\frac{1}{12} \quad \Rightarrow \quad a = \frac{1}{4}$$
and thus $U(x) = x(x-1)/4 = u(x)$, i.e., the exact solution is recovered.
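A small symbolic sketch of this computation (assuming the Symbolic Math Toolbox):

```matlab
% Sketch: Galerkin solution of u'' + 2u' = x, u(0) = u(1) = 0,
% with the one-parameter approximation U = a*x*(x - 1) and v = x*(x - 1).
syms x a
U = a*x*(x - 1);
v = x*(x - 1);
lhs = int(-diff(U, x)*diff(v, x) + 2*diff(U, x)*v, x, 0, 1);   % weak form, left-hand side
rhs = int(x*v, x, 0, 1);                                       % right-hand side
a_sol = solve(lhs == rhs, a);            % a = 1/4, so U(x) = x*(x - 1)/4 = u(x)
```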
Above we mentioned the minimization properties of Galerkin's method. Let us explore what this means and how Galerkin's method relates to minimization, least squares and orthogonality. For a great visual proof that inspired this text see this video by Virtually Passed. Another great explanation of the Galerkin method with special emphasis on the orthogonality is this video by Good Vibrations with Freeball
The least squares method is about minimizing the squared distance between a set of function values $g(x_j)$ and an approximation of $g(x)$
$$g(x) \approx G(x) = \sum_{i=1}^{n} a_i \phi_i(x),$$
where $a_i$ are the unknown parameters to be determined and $\phi_i(x)$ are known functions. In other words, we are trying to fit an approximation $G(x, a_i)$ to a set of points $g(x_j)$. The reader might remember that in the discrete least squares method one minimizes the squares of the differences point-wise:
$$\min_{a_i} \sum_{j=1}^{m} r(x_j, a_i)^2,$$
where
$$r(x_j, a_i) = G(x_j, a_i) - g(x_j)$$
One way of getting to the solution and finding the parameters $a_i$ that minimize the residual is using calculus: we remember that taking the derivative and setting it to zero leads to the minimum. We denote the squared residual
$$S(a_i) = \sum_{j=1}^{m} r(x_j, a_i)^2$$
and differentiate with respect to each parameter,
$$\frac{\partial S}{\partial a_i} = 2\sum_{j=1}^{m} r(x_j, a_i)\,\phi_i(x_j) = 0,$$
which is a system of $n$ equations where we solve for $a_i$. This can be written using matrices to express the equations as a linear system which can be solved more efficiently; using some linear algebra we can also show the orthogonality more intuitively through geometric relations.
We notice that $r_j = a_1\phi_1(x_j) + a_2\phi_2(x_j) + \dots + a_n\phi_n(x_j) - g(x_j)$, which we can write as
$$\mathbf{r} = \mathbf{A}\mathbf{x} - \mathbf{g}, \qquad A_{ji} = \phi_i(x_j),$$
where $\mathbf{x} = [a_1, \dots, a_n]^T$. The conditions $\sum_j r_j\,\phi_i(x_j) = 0$ stack, row by row, into the transpose of $\mathbf{A}$, and thus we have
$$\mathbf{A}^T\mathbf{r} = \mathbf{0}$$
Inserting $\mathbf{r} = \mathbf{A}\mathbf{x} - \mathbf{g}$ into this we get
$$\mathbf{A}^T(\mathbf{A}\mathbf{x} - \mathbf{g}) = \mathbf{0} \quad \Rightarrow \quad \mathbf{A}^T\mathbf{A}\mathbf{x} = \mathbf{A}^T\mathbf{g}$$
from which we get the components ai by
$$\mathbf{x} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{g}$$
If we use the linear approximation $G(x) = a_1\phi_1(x) + a_2\phi_2(x)$ with $\phi_1(x) = 1 - x$ and $\phi_2(x) = x$, this yields an $m$-by-$2$ matrix $\mathbf{A}$, where $m$ is the number of data points to fit the approximation to.
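As an illustration, a minimal numeric sketch of the discrete least squares fit via the normal equations; the sample points are assumptions, and the target $2x\sin(2\pi x) + 3$ is borrowed from a later example in this text.

```matlab
% Sketch: discrete least squares fit with phi_1 = 1 - x and phi_2 = x using the
% normal equations A'*A*x = A'*g. The data points are made up for the example.
xj = linspace(0, 1, 11)';            % m = 11 assumed sample points
g  = 2*xj.*sin(2*pi*xj) + 3;         % assumed target values g(x_j)
A  = [1 - xj, xj];                   % m-by-2 matrix of basis function values
a  = (A'*A) \ (A'*g);                % solve the normal equations (backslash, no explicit inverse)
G  = A*a;                            % least squares approximation at the sample points
```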
Now that the matrix formulation is introduced we can formulate the problem differently: Find the vector x such that
$$\min_{\mathbf{x}} \|\mathbf{A}\mathbf{x} - \mathbf{g}\|^2$$
which can be accomplished by making the approximation $\mathbf{A}\mathbf{x}$ as close to $\mathbf{g}$ as possible. For any non-zero choice of $\mathbf{x}$, the transformation by $\mathbf{A}$ confines the approximation to a plane, the so-called column space of $\mathbf{A}$. Recall that the matrix $\mathbf{A}$ is made up of the column vectors $\boldsymbol{\phi}_i = [\phi_i(x_1), \phi_i(x_2), \dots, \phi_i(x_m)]^T$ such that
$$\mathbf{A} = \begin{bmatrix} \boldsymbol{\phi}_1 & \boldsymbol{\phi}_2 & \cdots & \boldsymbol{\phi}_n \end{bmatrix}_{m \times n}$$
which means that for any choice of $a_i$ the resulting vector $\mathbf{A}\mathbf{x} = a_1\boldsymbol{\phi}_1 + a_2\boldsymbol{\phi}_2 + \dots$ will always lie in the plane spanned by the vectors $\boldsymbol{\phi}_i$.
We can think of the vectors $\boldsymbol{\phi}_i$ as basis vectors which span a plane (see the figure) onto which the approximation must be confined. For any data vector $\mathbf{g}$, the shortest distance to this plane is obviously attained by an orthogonal projection, as is clear from the figure. This means that the optimal choice of $a_i$ will lead to the residual error vector $\mathbf{r}$ being orthogonal to the basis vectors $\boldsymbol{\phi}_i$. Let $\mathbf{x}^*$ denote the optimal parameters $a_i$ such that the approximation vector becomes $\mathbf{G} = \mathbf{A}\mathbf{x}^*$; from the figure we see that
$$\mathbf{r} = \mathbf{g} - \mathbf{A}\mathbf{x}^* \;\perp\; \boldsymbol{\phi}_1, \boldsymbol{\phi}_2$$
since
$$\boldsymbol{\phi}_i \cdot (\mathbf{g} - \mathbf{A}\mathbf{x}^*) = 0$$
but we can express this scalar product as a matrix multiplication
$$\boldsymbol{\phi}_i \cdot (\mathbf{g} - \mathbf{A}\mathbf{x}^*) = 0 \;\;\forall i \quad \Leftrightarrow \quad \mathbf{A}^T(\mathbf{g} - \mathbf{A}\mathbf{x}^*) = \mathbf{0}$$
from which we get
$$\mathbf{A}^T\mathbf{A}\mathbf{x}^* = \mathbf{A}^T\mathbf{g} \quad \Rightarrow \quad \mathbf{x}^* = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{g}$$
This discrete setting can be generalized into a continuous setting, where we have functions instead of vectors and integrals instead of finite sums. The functions u and v are orthogonal when the integral of their product is zero, i.e.,
$$\int u(x)\, v(x)\, dx = 0$$
this is the equivalent of a scalar product. This so-called Galerkin orthogonality is the central idea of the Galerkin method and is what Galerkin discovered. This orthogonality concept is what connects the least squares method, minimization, the Ritz method and variational calculus:
$$\int r(x, u_i)\, \varphi_j(x)\, dx = 0$$
The residual function $r(x, u_i)$ is orthogonal to the basis function $\varphi_j$ if the above integral holds. From this we find the $u_i$ which minimize the residual error; any deviation from these values will increase the residual error.
We want to approximate the function u using a linear function U
$$U(x) = \varphi_1(x)u_1 + \varphi_2(x)u_2 = u_1\frac{1 - x}{1 - 0} + u_2\frac{x - 0}{1 - 0}$$
Thus $\varphi_1 = 1 - x$, $\varphi_2 = x$.
We want to determine the function $U(x)$ that minimizes the area between $U(x)$ and $u(x)$. Using the Galerkin method, this means finding $u_1$ and $u_2$ such that
$$\int_0^1 \big(U(x) - u(x)\big)\, \varphi_i(x)\, dx = 0 \quad \text{for } i = 1, 2$$
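As a sketch (Symbolic Math Toolbox assumed; the target $u(x) = x^2$ is an assumed example, since the text does not fix $u$ here), the two Galerkin equations can be solved symbolically:

```matlab
% Sketch: linear Galerkin approximation of an assumed target u(x) = x^2
% with phi_1 = 1 - x and phi_2 = x on [0, 1].
syms x u1 u2
u = x^2;
U = u1*(1 - x) + u2*x;
sol = solve([int((U - u)*(1 - x), x, 0, 1) == 0, ...
             int((U - u)*x,       x, 0, 1) == 0], [u1, u2]);
% sol.u1 = -1/6 and sol.u2 = 5/6: the same line as the least squares fit,
% now expressed through its end values U(0) = u1 and U(1) = u2.
```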
Choosing the approximation $u \approx U = u_1\varphi_1 + u_2\varphi_2$ is not the only option. We can increase the polynomial order by adding terms to get quadratic, cubic and even higher order approximations.
We will spend a whole section on studying how to define and create basis functions φi later. For now let us take a closer look at the quadratic approximation.
We want to approximate $u = x^2$ using quadratic basis functions on the interval $0 \le x \le 1$, for which the basis functions are
$$\varphi_1 = 2(x - 1)(x - 1/2), \qquad \varphi_2 = -4x(x - 1), \qquad \varphi_3 = 2x(x - 1/2)$$
and our approximation becomes
$$u(x) \approx U(x) = \sum_{i=1}^{3} u_i\varphi_i(x)$$
Using Galerkin, we get 3 equations:
$$\int_0^1 \big(U(x) - u(x)\big)\, \varphi_i(x)\, dx = 0 \quad \text{for } i = 1, 2, 3$$
3 equations and 3 unknowns ⇒ unique solution.
clear; close all
syms x u1 u2 u3
The quadratic basis functions are defined for $x \in [0, 1]$.
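Continuing the script, here is a minimal sketch (Symbolic Math Toolbox assumed) that sets up and solves the three Galerkin equations for $u = x^2$:

```matlab
% Sketch: Galerkin approximation of u(x) = x^2 with the quadratic basis above.
syms x u1 u2 u3
phi = [2*(x - 1)*(x - 1/2), -4*x*(x - 1), 2*x*(x - 1/2)];  % quadratic Lagrange basis
U   = u1*phi(1) + u2*phi(2) + u3*phi(3);                   % approximation U(x)
u   = x^2;                                                 % target function
eq1 = int((U - u)*phi(1), x, 0, 1) == 0;
eq2 = int((U - u)*phi(2), x, 0, 1) == 0;
eq3 = int((U - u)*phi(3), x, 0, 1) == 0;
sol = solve([eq1, eq2, eq3], [u1, u2, u3]);   % u1 = 0, u2 = 1/4, u3 = 1, so U = x^2 exactly
```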
If we take a look at the function $u(x) = 2x\sin(2\pi x) + 3$ for $0 \le x \le 1$
We can see from the graph that a linear approximation, or even a quadratic one, will not be able to "approximate" this exactly! In general, even with higher and higher polynomial orders, we would not be able to exactly represent sinusoids on the whole domain. Instead of trying to fit one function across the whole domain, we can approximate smaller pieces of it by limiting the approximation to a set of subdomains. This is called piecewise approximation. We essentially divide the domain into finite elements.
For $x \in [x_1, x_2]$, let $a = x_1$ and $b = x_2$. Using linear basis functions we have
$$\boldsymbol{\varphi} = \left[\frac{b - x}{b - a}, \; \frac{x - a}{b - a}\right]$$
Then we let $x \in [x_2, x_3]$ with $a = x_2$, $b = x_3$, and so on.
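A small sketch of the piecewise linear idea (plain interpolation of nodal values element by element; the number of elements is an arbitrary choice, and this is not yet a Galerkin projection):

```matlab
% Sketch: piecewise linear approximation of u(x) = 2*x*sin(2*pi*x) + 3 by
% interpolating nodal values on each element [x_e, x_{e+1}].
u  = @(x) 2*x.*sin(2*pi*x) + 3;
xn = linspace(0, 1, 6);                  % element boundaries (5 elements, assumed)
un = u(xn);                              % nodal values u_i
xx = linspace(0, 1, 200);
UU = zeros(size(xx));
for e = 1:numel(xn) - 1
    a = xn(e);  b = xn(e + 1);
    idx = xx >= a & xx <= b;             % points inside the current element
    UU(idx) = un(e)*(b - xx(idx))/(b - a) + un(e + 1)*(xx(idx) - a)/(b - a);
end
plot(xx, u(xx), xx, UU, '--')            % compare u with its piecewise approximation
```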
Figure 1: The finite element method.
In the next section, we will see how this method is derived and implemented in Matlab.
References

[1] B. G. Galerkin, "Series occurring in various questions concerning the elastic equilibrium of rods and plates," Vestnik Inzhenerov i Tekhnikov (Engineers and Technologists Bulletin), vol. 19, pp. 897–908, 1915.

[2] A. W. Leissa, "The historical bases of the Rayleigh and Ritz methods," Journal of Sound and Vibration, vol. 287, no. 4, pp. 961–978, 2005, doi: 10.1016/j.jsv.2004.12.021.

[3] W. Ritz, "Über eine neue Methode zur Lösung gewisser Variationsprobleme der mathematischen Physik," vol. 1909, no. 135, pp. 1–61, 1909, doi: 10.1515/crll.1909.135.1.

[4] M. J. Gander and G. Wanner, "From Euler, Ritz, and Galerkin to modern computing," SIAM Review, vol. 54, no. 4, pp. 627–666, 2012, doi: 10.1137/100804036.

[5] S. P. Timoshenko, Sur la stabilité des systèmes élastiques. A. Dumas, 1914, pp. 496–566.

[6] I. G. Bubnov, "Structural mechanics of shipbuilding," 1914.

[7] I. G. Bubnov, "Report on the works of Professor Timoshenko which were awarded the Zhuranskyi prize," Symposium of the Institute of Communication Engineers, 1913.