Convolutional neural networks for the shape design of a magnetic core for material testing: Forward and inverse approaches

Abstract

In this paper, convolutional neural networks (CNNs) are used to solve an optimization problem with two different approaches. In the first approach, a CNN is used as a surrogate model of the forward problem and inserted in an optimization loop governed by a genetic algorithm; in the second approach, a CNN is trained to solve the inverse problem directly. The case study is the shape design of a magnetic core used for material testing.

Keywords

Convolutional neural networks, inverse problems, material testing, magnetic field, finite elements

1. Introduction

In the area of non-destructive testing, a magnetic field is applied to the object under test and any resulting changes of magnetic flux in the region of interest can be detected. Either direct or alternating fields can be applied. Localized phenomena such as surface or sub-surface cracks in ferrite steels can be detected by means of a flux linkage variation [1–7]. In this context, the shape design of the magnetic core of the electromagnet is particularly important for inducing a deep and homogeneous flux density in the material under test. Traditionally, the problem is treated in terms of optimization procedures based on the Finite Element Method (FEM) for the field analysis. Recently, deep learning techniques have been used to build surrogate models for the field analysis of electromagnetic devices [8–10]. Convolutional Neural Networks (CNNs) are neural networks which make it possible to evaluate a field quantity (either a single value or a vector of values, e.g. a field distribution) starting from an image [10,11]. These CNNs are usually trained on data obtained from numerical analyses, in particular FEM. Once a neural network is trained, it can be used as a surrogate model and inserted in an optimization loop. This way, the computational burden is limited to the net training, while the evaluation of the field quantities during the optimization loop is rather inexpensive. Another innovative way to use CNNs for solving optimization problems is to invert the net: given a field quantity or distribution, the geometry is drawn by the net [12]. To this end, transposed 2D convolution layers are used in the net solving the inverse problem. In this paper, these new approaches are applied to an optimization problem of shape design of an iron core used for material testing.

2. The case study

Let us consider an iron-cored electromagnet of the kind widely used in magnetic non-destructive testing of ferromagnetic materials. It is assumed that the electromagnet is operated with DC and has an adjustable geometry. The electromagnetic device is schematically shown in Fig. 1 (due to the symmetry, only the right part is shown). The sensitivity of detection for any portion of a component being tested varies not only with the distance between the pole pieces but also with the shape of the ferrite core. Hence, a correct design of the overall unit is essential in order to ensure the highest system sensitivity. The question we ask is as follows: what is the optimal shape of the electromagnet when it is used to energize a magnetic field in a ferrite steel plate under test? Clearly, the magnetic flux density should be as high as possible on the reverse side of the plate under test.

Fig. 1.

(a) Geometry of the iron-core electromagnet with the ferrite steel plate; the sampling points (blue dots) for the magnetic induction field are highlighted. (b) Exemplary finite element mesh. (c) Magnetic field map.

In order to reduce the computational burden, the edge effects are neglected and the field analysis is done with a simplified 2D FEM model. Such an approach is common in practice: a rough solution is first obtained with a 2D model, and a final 3D model can then be built around the previously obtained results [13,14]. Comsol Multiphysics software is used and a magnetostatic problem is solved [15]. In Fig. 1(b), an exemplary finite element mesh is shown. The mesh used for calculations consists of around 55,000 elements. The calculation domain has been truncated by applying infinite elements on the outer edge of the calculation area. In the calculations it is assumed that the electrical parameters of the coil are: electrical conductivity 60 MS/m, number of coil turns N = 500, wire diameter w_d = 0.5 mm. In order to reduce the computational cost, the subsequent calculations have been performed with linear materials, i.e. μ_r = 2000 was taken for the iron-cored electromagnet and μ_r = 1000 for the ferrite steel plate.

In a single optimization process the iron core electromagnet geometry is adjustable with the help of four design variables, i.e.: k, l, m and 𝛼, whereas core thickness d, and air gap w, are constant.

3. The inverse problem

Since the core geometry greatly affects the detection sensitivity of the device, the problem is to find the optimal core shape which gives the maximum average value of magnetic flux density on the reverse side of the slab under test (boundary Γ).

The region of interest, the boundary Γ, is defined as in Fig. 1a. The magnetic flux density is evaluated in ten points on the boundary Γ. Four design variables x = [k, l, m, 𝛼] are considered (see Fig. 1). The variation range of the design variables is shown in Table 1.

Table 1
Variation range for the design variables

	k (mm)	l (mm)	m (mm)	𝛼 (deg)
Minimum value	6	11	5	−70
Maximum value	60	70	40	70

The objective function f is defined as follows

$f(x)=\frac{1}{|\Gamma_{i}|}\int_{\Gamma_{i}}B_{t}(x)\,d\Gamma_{i}$ (1)

where Γ_i is the interior side of the Γ boundary; discretizing the boundary Γ_i in n_p = 10 points, one obtains the discretized version of (1)

$f(x)=\frac{1}{n_{p}}\sum_{k=1}^{n_{p}}B_{t}(x_{k})$ (2)

where x_k lies in the grid discretizing the slab boundary.
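As a minimal illustration of Eq. (2), the objective reduces to the arithmetic mean of the B_t values sampled at the n_p = 10 points. The function name and the sample values below are assumptions made for illustration only; they do not come from the FEM model of the case study.

```python
def objective(b_t_samples):
    """Discretized objective of Eq. (2): mean of B_t over the sampling points."""
    if not b_t_samples:
        raise ValueError("at least one sampling point is required")
    return sum(b_t_samples) / len(b_t_samples)


# Hypothetical B_t values (in tesla) at n_p = 10 points along the boundary.
b_t = [1.60, 1.65, 1.70, 1.72, 1.75, 1.75, 1.72, 1.70, 1.65, 1.60]
f_value = objective(b_t)
print(f"f(x) = {f_value:.3f} T")
```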

For the sake of comparison, the optimization problem has first been solved in a classical way: a genetic algorithm is applied and the field analysis is performed with FEM.

As an optimization tool, a modified version of the genetic algorithm (GA) presented in [16] has been implemented in MATLAB, while the field equations have been solved in Comsol Multiphysics software. At the beginning of the optimization process, the genetic algorithm parameters, the electromagnet core parameters, the objective function (based on the magnetic flux density B) and the constraints are defined. Next, with the help of a floating-point number generator, the initial population is randomly generated and encoded. The objective function is calculated for all the candidates. A selection process is then carried out, which in this case is based on roulette-wheel selection. In all the cases the selection coefficient is equal to 0.5, which means that exactly half of the population passes to the next step, called crossover. After the crossover, random changes are introduced (i.e. mutation). The purpose of these random variations is to improve the candidates, turning them into more efficient solutions. The mutation rate in all the cases has been set to 0.2. The offspring go to the next generation and become a new set of candidate solutions, which are subjected to the next round of objective function evaluation. This process repeats. The expectation is that the average objective function value of the population will increase each round, so that by repeating this process very good solutions to the problem will be obtained. The process stops when the maximum number of generations is reached, and finally the best solution is stored. The flowchart of the optimization procedure is shown in Fig. 2.
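The loop described above can be sketched in a few lines of Python. This is only a schematic sketch under stated assumptions, not the modified MATLAB implementation used in the paper: `toy_objective` is a hypothetical stand-in for the FEM evaluation, the arithmetic blend crossover and the per-gene interpretation of the mutation rate are assumed, and the bounds are those of Table 1. The population size and number of generations default to the values reported later in the results (12 and 150).

```python
import random

# Bounds of the design variables x = [k, l, m, alpha] taken from Table 1.
BOUNDS = [(6.0, 60.0), (11.0, 70.0), (5.0, 40.0), (-70.0, 70.0)]


def toy_objective(x):
    """Hypothetical stand-in for the FEM evaluation of f(x).

    It is NOT the objective of the paper: it simply rewards designs close
    to the centre of the design space and always returns a positive value.
    """
    score = 0.0
    for value, (lo, hi) in zip(x, BOUNDS):
        centre = 0.5 * (lo + hi)
        score += ((value - centre) / (hi - lo)) ** 2
    return 1.0 / (1.0 + score)


def random_individual(rng):
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]


def roulette_select(population, fitness, n_keep, rng):
    """Draw n_keep parents with probability proportional to their fitness."""
    return rng.choices(population, weights=fitness, k=n_keep)


def crossover(parent_a, parent_b, rng):
    """Arithmetic blend of two parents (an assumed crossover operator)."""
    beta = rng.random()
    return [beta * a + (1.0 - beta) * b for a, b in zip(parent_a, parent_b)]


def mutate(x, rate, rng):
    """With probability `rate`, replace a gene by a random value within its bounds."""
    return [rng.uniform(lo, hi) if rng.random() < rate else value
            for value, (lo, hi) in zip(x, BOUNDS)]


def run_ga(objective, pop_size=12, generations=150, selection_coeff=0.5,
           mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    best_x, best_f = None, float("-inf")
    for generation in range(generations):
        fitness = [objective(x) for x in population]
        # Store the best candidate evaluated so far.
        for x, f in zip(population, fitness):
            if f > best_f:
                best_x, best_f = list(x), f
        if generation == generations - 1:
            break
        # Half of the population (selection coefficient 0.5) becomes parents.
        n_keep = int(selection_coeff * pop_size)
        parents = roulette_select(population, fitness, n_keep, rng)
        # Offspring from crossover and mutation refill the population.
        offspring = []
        while len(parents) + len(offspring) < pop_size:
            a, b = rng.sample(parents, 2)
            offspring.append(mutate(crossover(a, b, rng), mutation_rate, rng))
        population = parents + offspring
    return best_x, best_f


best_x, best_f = run_ga(toy_objective)
print("best design:", [round(v, 2) for v in best_x], "objective:", round(best_f, 4))
```

Replacing `toy_objective` with a call to a field solver, or to a trained surrogate, leaves the rest of the loop unchanged, which is what makes the surrogate-based variant inexpensive once the net is trained.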

Fig. 2.

The flowchart of the optimization procedure.

4. CNN-based approaches

In order to solve the optimization problem, two different approaches are implemented: the forward and the inverse approach. In the forward approach, the field analysis is based on a surrogate model and a genetic algorithm is applied for solving the optimization problem [17]; the objective function used by the optimization algorithm is evaluated by means of the surrogate model. In the inverse approach, a surrogate model of the inverse problem itself is built.

To this end, two CNNs are built and trained. The first CNN (CNN-1), relevant to the forward approach, receives an image of the geometry as input and returns the magnetic field profile along Γ boundary as output. The genetic algorithm uses the CNN-1 for the objective function evaluation.

In turn, the second CNN (CNN-2), relevant to the inverse approach, receives the vector of the magnetic field profile prescribed along Γ boundary and returns the corresponding geometry image of the iron core.

For implementing the CNN approach, an appropriate database has been built by solving, with the help of FEM, 25,000 geometries for the forward approach (CNN-1) and 70,000 for the inverse approach (CNN-2). As for the definition of a suitable control region, a sub-domain Ω_c is considered (see Fig. 3a): for each analyzed geometry, the corresponding image of 80 × 140 pixels representing Ω_c is stored with the relevant magnetic field profile. The resolution is chosen as a trade-off: low enough to keep the number of pixels as small as possible, yet high enough to resolve the image details.

The stored image relevant to Ω_c contains only the core of the device, cutting out the slab and the feeding coil, and it is a black/white image (matrix composed of 1 or 0 entries). In Fig. 3 both the images of the domain of the FEM and the corresponding image, with lower resolution and stored in the database, are shown.
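A possible way to obtain such a black/white matrix is to test the centre of each pixel against the core outline. The sketch below assumes an image of 80 rows by 140 columns and uses a hypothetical L-shaped polygon inside a 140 mm × 80 mm control region; neither the outline nor the region size is taken from the paper, and the actual database may have been generated differently.

```python
def point_in_polygon(px, py, polygon):
    """Even-odd ray-casting test of a point against a closed polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the horizontal ray starting at (px, py) cross this edge?
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, width_mm, height_mm, n_rows=80, n_cols=140):
    """Return an n_rows x n_cols matrix of 0/1 entries (1 = core material)."""
    dx = width_mm / n_cols
    dy = height_mm / n_rows
    image = []
    for r in range(n_rows):
        yc = (r + 0.5) * dy  # y coordinate of the pixel centres in this row
        row = [1 if point_in_polygon((c + 0.5) * dx, yc, polygon) else 0
               for c in range(n_cols)]
        image.append(row)
    return image


# Hypothetical L-shaped core outline (mm), for illustration only.
core = [(10, 10), (130, 10), (130, 30), (40, 30), (40, 70), (10, 70)]
image = rasterize(core, width_mm=140.0, height_mm=80.0)
filled = sum(sum(row) for row in image)
print(len(image), "x", len(image[0]), "pixels,", filled, "belong to the core")
```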

Fig. 3.

Example of a possible geometry: image of the domain of the FEM, the control sub-region is highlighted (a) and B/W image 80 × 140 pixels (b).

4.1. Forward approach

The convolutional neural network CNN-1 is composed of 18 layers, in which 4 blocks can be highlighted. Each block is composed of a convolutional layer, a batch normalization layer [18] and a Rectified Linear Unit (ReLU) activation function, as shown in Table 2.

The ReLU function is one of the most widely used activation functions for CNNs because it has shown good performance in training this kind of neural network in terms of avoiding overfitting [11].

Table 2
CNN-1 structure

Input	Image based input (size 80 × 140)
Block 1	Convolution 2D (size 3 × 8)
	Batch normalization
	ReLU activation function
	Average pooling 2D (size 2 × 2)
Block 2	Convolution 2D (size 3 × 16)
	Batch normalization
	ReLU activation function
	Average pooling 2D (size 2 × 2)
Block 3	Convolution 2D (size 3 × 32)
	Batch normalization
	ReLU activation function
Block 4	Convolution 2D (size 3 × 32)
	Batch normalization
	ReLU activation function
	Dropout (20% probability)
	Fully connected layer (10 outputs)
Output	Regression layer

The convolutional layers are characterized by filters of size 3 × 3. The number of filters varies from 8 to 32, depending on the block. Two average pooling layers with filters of size 2 × 2 are applied in order to obtain a more stable solution.

At the end of the structure a dropout layer is used, and a fully connected layer followed by the regression layer yields a vector of 10 elements as output.
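To make the structure of Table 2 more concrete, the following sketch traces the feature-map size through the convolution and pooling layers. The padding and stride settings are not stated in the paper, so the sketch assumes a single-channel input, 'same' padding with stride 1 for the convolutions and non-overlapping 2 × 2 pooling; the resulting sizes and parameter counts hold only under these assumptions. Batch normalization parameters are not counted.

```python
# (layer type, parameter) pairs following Table 2. Batch normalization, ReLU
# and dropout do not change the feature-map shape and are omitted here.
CNN1_LAYERS = [
    ("conv", 8), ("pool", 2),    # Block 1
    ("conv", 16), ("pool", 2),   # Block 2
    ("conv", 32),                # Block 3
    ("conv", 32),                # Block 4
]


def trace_shapes(height, width, channels, layers, kernel=3):
    """Return the shape after each layer and the number of conv parameters."""
    shapes = [(height, width, channels)]
    n_params = 0
    for kind, value in layers:
        if kind == "conv":
            # Assumed 'same' padding and stride 1: spatial size is preserved.
            n_params += kernel * kernel * channels * value + value  # weights + biases
            channels = value
        elif kind == "pool":
            # Assumed non-overlapping pooling window (stride = window size).
            height //= value
            width //= value
        else:
            raise ValueError(f"unknown layer type: {kind}")
        shapes.append((height, width, channels))
    return shapes, n_params


shapes, conv_params = trace_shapes(80, 140, 1, CNN1_LAYERS)
h, w, c = shapes[-1]
n_features = h * w * c               # inputs of the fully connected layer
fc_params = n_features * 10 + 10     # 10 outputs, one per sampling point
for shape in shapes:
    print(shape)
print("conv parameters:", conv_params, "fully connected parameters:", fc_params)
```

Under these assumptions most of the trainable parameters sit in the fully connected layer, which is one reason why reducing the image resolution and pooling before that layer matters.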

4.2. Inverse approach

Since there are relatively few input data (a 1 × 10 field vector) while the output data are more than 11,000 values (the 80 × 140 image matrix), a typical upsampling problem must be handled in order to achieve reliable training.

To this end, the convolutional neural network CNN-2 is composed of 25 layers, three of which are transposed 2D convolutional layers, while the same block as in CNN-1 is used 7 times. Transposed convolutional layers are used to upsample the input vector. The transposed convolution operation implements the convolution operator on a modified (enlarged) input image. Hence, given an input image and a filter, the input image is first modified by inserting zeros between rows and columns and, where needed, around the input image. Then, the filter is applied to this enlarged image, i.e. the filter is slid across the image, implementing the convolution operator. The result of this operation is an output image which is larger than the input image.
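The zero-insertion view of the transposed convolution can be checked numerically. The sketch below, written in plain Python with an illustrative 2 × 2 input, a 3 × 3 filter and a stride of 2 (all assumed values, not the settings of CNN-2), computes the operation in two ways: directly, by letting every input pixel stamp a scaled copy of the filter onto the output, and by sliding the filter over the zero-enlarged image. In the second route the filter is flipped, a convention detail that makes the two results coincide exactly.

```python
def transposed_conv2d_direct(x, kernel, stride):
    """Reference definition: each input pixel stamps a scaled copy of the kernel."""
    h, w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * ((w - 1) * stride + kw) for _ in range((h - 1) * stride + kh)]
    for i in range(h):
        for j in range(w):
            for a in range(kh):
                for b in range(kw):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out


def enlarge(x, stride, pad_h, pad_w):
    """Insert stride-1 zeros between pixels and pad zeros around the image."""
    h, w = len(x), len(x[0])
    zh, zw = (h - 1) * stride + 1, (w - 1) * stride + 1
    z = [[0.0] * (zw + 2 * pad_w) for _ in range(zh + 2 * pad_h)]
    for i in range(h):
        for j in range(w):
            z[pad_h + i * stride][pad_w + j * stride] = x[i][j]
    return z


def transposed_conv2d_zero_insert(x, kernel, stride):
    """Same result, obtained by sliding the flipped kernel over the enlarged image."""
    kh, kw = len(kernel), len(kernel[0])
    z = enlarge(x, stride, kh - 1, kw - 1)
    flipped = [row[::-1] for row in kernel[::-1]]
    out_h, out_w = len(z) - kh + 1, len(z[0]) - kw + 1
    return [[sum(z[p + a][q + b] * flipped[a][b]
                 for a in range(kh) for b in range(kw))
             for q in range(out_w)] for p in range(out_h)]


x = [[1.0, 2.0],
     [3.0, 4.0]]
kernel = [[1.0, 0.0, -1.0],
          [2.0, 0.0, -2.0],
          [1.0, 0.0, -1.0]]
y_direct = transposed_conv2d_direct(x, kernel, stride=2)
y_zero = transposed_conv2d_zero_insert(x, kernel, stride=2)
print(len(x), "x", len(x[0]), "->", len(y_direct), "x", len(y_direct[0]))
```

With an input of size H, a filter of size K and a stride s, the output size in this unpadded setting is (H − 1)·s + K, which is how a short field vector can be expanded step by step towards the image size.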

The CNN-2 structure is represented in Table 3.

Table 3
CNN-2 structure

Input	Image based input (size 1 × 10)
Transposed 2D convolution	Transposed convolution 2D (size 80 × 140 × 10)
Block 1	Convolution 2D (size 3 × 64)
	Batch normalization
	ReLU activation function
Block 2	Convolution 2D (size 3 × 32)
	Batch normalization
	ReLU activation function
Block 3	Convolution 2D (size 3 × 16)
	Batch normalization
	ReLU activation function
Block 4	Convolution 2D (size 3 × 8)
	Batch normalization
	ReLU activation function
Block 5	Convolution 2D (size 3 × 4)
	Batch normalization
	ReLU activation function
	Max pooling 2D (2 × 2)
Transposed 2D convolution	Transposed convolution 2D (size 41 × 71 × 2)
Block 6	Convolution 2D (size 3 × 2)
	Batch normalization
	ReLU activation function
	Max pooling 2D (1 × 2)
Transposed 2D convolution	Transposed convolution 2D (size 41 × 71 × 1)
Block 7	Convolution 2D (size 3 × 1)
	Batch normalization
	Clipped ReLU activation function
Output	Regression layer

5. Results

A quantitative measure of the quality of the net training is the mean square error e calculated on the validation set as

$e=\frac{1}{n_{v}}\sum_{i=1}^{n_{v}}(Y_{i}-Y_{i\_pred})^{2}$ (3)

where Y_i is the true value and Y_{i_pred} the predicted value. The error e is also the loss function minimized by the optimization method when training the net.

A qualitative way to assess the quality of the trained net is to plot the true values versus the predicted values, e.g. for the training set. The closer the points lie to the diagonal line, the more accurate the model prediction.
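Both checks are straightforward to compute. The sketch below evaluates Eq. (3) and, as a numerical counterpart of the true-versus-predicted plot, the largest distance of a point from the diagonal. The function name and the sample values are hypothetical and serve only as an illustration.

```python
def mean_square_error(y_true, y_pred):
    """Mean square error of Eq. (3) over n_v validation values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true and predicted flux-density values (T), for illustration only.
y_true = [1.50, 1.62, 1.70, 1.58]
y_pred = [1.48, 1.65, 1.69, 1.60]
e = mean_square_error(y_true, y_pred)
# Largest deviation from the diagonal of the true-versus-predicted plot.
max_dev = max(abs(t - p) for t, p in zip(y_true, y_pred))
print(f"e = {e:.6f}, max |Y - Y_pred| = {max_dev:.3f}")
```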

5.1. Forward approach

The net CNN-1 has been trained with n = 25,000 samples, n_t = 23,000 of which for the training set and n_v = 2000 for the validation set. The training lasted 200 epochs and the Stochastic Gradient Descent with Momentum (SGDM) method has been applied for training.

The mean square error obtained after training the net is 0.04.

The results relevant to CNN-1 are shown in Fig. 4.

Fig. 4.

(a) Magnetic induction field FEM solutions (coloured continuous line) and predicted values (black star) for different geometries, (b) true values vs predicted values for the validation set.

Figure 4 shows how the net is able to predict the field profile, given the image of the geometry. In Fig. 4a, the true field profiles (i.e. the FEM results) are compared with the values predicted by the network after training, for different geometries. In Fig. 4b, the true values are plotted against the predicted values for all points in the validation set: all points lie very close to the diagonal (red dashed line), showing good agreement between true and predicted values.

For the sake of comparison, the optimization problem was first solved in a classical way: the genetic algorithm described previously was applied with the field analysis performed by FEM, without resorting to any network. The optimization was based on a population of 12 individuals and run for 150 generations. An example of the optimization result is shown in Fig. 5a, whereas the numerical results are summarized in Table 4.

Fig. 5.

Results of the genetic algorithm: optimal geometry obtained with the objective function calculation based on (a) FEM, (b) CNN-1.

Subsequently, the same genetic algorithm with the same settings (i.e. 12 individuals and 150 generations) is run but the objective function is now calculated by means of CNN-1, previously trained with FEM. However, during this optimization, no FE calculations are performed.

The optimized geometry so found is shown in Fig. 5b; the relevant optimal design variables and objective function values are listed in Table 4.

Table 4

Optimization results of the forward approach

	k (mm)	l (mm)	m (mm)	𝛼 (deg)	Obj. fun (T)
Optimal solution FEM	10.1	69.4	5.3	69.5	1.79
Optimal solution CNN	48.1	21.4	19.9	69.9	1.71*

The * indicates that the optimal value of the objective function is re-calculated by FEM, even though the optimization is based on the CNN-1.

As can be noted from Table 4, the values of the objective function obtained with the two methods are close, but the relevant device geometries are different.

5.2. Inverse approach

The net has been trained with n = 70,000 samples, n_t = 60,000 of which for the training set and n_v = 10,000 for the validation set. The training lasted 100 epochs and the ADAptive Moment estimation (ADAM) method has been applied for training.

The mean square error obtained after training the net is 0.042.

CNN-2, after being trained, is able, given a field profile of 10 values as input, to find a cloud of grey and black pixels as shown in Fig. 6b.

Fig. 6.

Example of (a) input and (b) output of CNN-2.

In Fig. 6b the outline of the “true” geometry, which gives rise to the field profile in Fig. 6a, is superimposed on the cloud found by the net.

As can be noted in Fig. 7, given two different profiles, the net is able to find two different geometries.

Fig. 7.

Examples of (a) input and (b) output of CNN-2 regarding two different field profiles.

When the CNN is fed with similar field profiles, similar outputs (similar clouds) are found. However, different geometries can give rise to similar profiles, as shown in Fig. 8.

Fig. 8.

Example of (a) input (similar field profiles), (b) output (similar clouds) of the CNN-2 and (c) output cloud with the “true” geometries superimposed.

From Figs 6–8 it can be noted that a part of the output image, specifically a subset of pixels belonging to the shaded cloud, is darker than the rest of the cloud. This means that the geometry is more likely to lie in that part of the domain: it appears that the position d of the final part of the core (see Fig. 8c) is correctly identified in most of the cases. The angle 𝛼 also appears to be well identified, because the grey cloud is shaped rather precisely in the subregion close to the pole.

It can be stated that the grey cloud represents a kind of probability density function helping to identify the topology of the core, where the grey scale is an indicator of probability. Unfortunately, the CNN is not able to find a unique geometry, given a field profile. This issue is due to the ill-posedness of the problem: because different geometries can give rise to very similar profiles, it is not always possible to find a unique geometry, given a field profile [19].

6. Conclusion

In recent papers and in this work, CNNs have shown good capabilities as surrogate models in electromagnetics, mimicking the field analysis. However, further studies are needed to investigate the computational burden of the net training compared with that of an optimization based on FEM.

A further approach is proposed in the paper: using a CNN to solve the optimization problem directly, without using either FE analysis or an optimization algorithm. The CNN is able to reconstruct a cloud that acts as a probability density, in which the grey level of each pixel indicates how likely the geometry is to occupy it. However, due to the ill-posedness of the problem, the CNN is not able to identify a unique geometry.

All in all, the paper, by comparing the two different approaches based on CNNs, aims to go one step further in the direction of deep learning applied to optimization problems in electromagnetism.

References

1. Komorowski, Gratkowski and Chady, Choice of the distance between the pole-pieces of the electromagnet yoke in a magnetic method of material testing, AIP Conference Proceedings 760 (2005), 602.

2. Blitz, Electrical and Magnetic Methods of Nondestructive Testing, Adam Hilger, New York, USA, 1991, ISBN 0-7503-0148-1.

3. Huang and Wang, New Technologies in Electromagnetic Non-destructive Testing, First edn, Springer, 2016, ISBN 978-981-10-0577-0.

4. Trimm, An overview of nondestructive evaluation methods, Practical Failure Analysis 3 (2003), 17–31.

5. Stupakov, Tomáš and Kadlecová, Optimization of single-yoke magnetic testing by surface fields measurement, Journal of Physics D: Applied Physics 39(2) (2006), 248–254.

6. Zhao, Roy, Addepalli and Tinsley, A review of miniaturised non-destructive testing technologies for in-situ inspections, Procedia Manufacturing 16 (2018), 16–23.

7. Gotoh, Hirano, Nakano, Fujiwara and Takahashi, Electromagnetic nondestructive testing of rust region in steel, IEEE Transactions on Magnetics 41(10) (2005), 3616–3618.

8. Khan, Ghorbanian and Lowther, Deep learning for magnetic field estimation, IEEE Transactions on Magnetics 55(6) (2019), 1–4.

9. Khan, M.H. Mohammadi, Ghorbanian and D.A. Lowther, Efficiency map prediction of motor drives using deep learning, IEEE Transactions on Magnetics 56(3) (2020).

10. Lei, Bramerdorfer, Peng, Sun and Zhu, Machine learning for design optimization of electromagnetic devices: Recent developments and future directions, Applied Sciences 11(4) (2021), 1627.

11. Goodfellow, Bengio and Courville, Deep Learning, The MIT Press, 2016.

12. Baldan, Di Barba and Nacke, Magnetic properties identification by using a bi-objective optimal multi-fidelity neural network, IEEE Transactions on Magnetics 57(6) (2021).

13. Ziolkowski, Kwiatkowski, Gratkowski and Ziolkowski, Static analysis of a balanced armature receiver, COMPEL - The International Journal for Computation and Mathematics in Electrical and Electronic Engineering 37(4) (2018), 1392–1404.

14. Tumanski and Bakon, Measuring system for two-dimensional testing of electrical steel, Journal of Magnetism and Magnetic Materials 223(3) (2001), 315–325.

15. COMSOL, Inc., Burlington, MA, USA, 2020 [Online]. Available: http://www.comsol.com.

16. R.L. Haupt and S.E. Haupt, Practical Genetic Algorithms, John Wiley & Sons, Inc., 2003.

17. Di Barba, Multiobjective Shape Design in Electricity and Magnetism, Lecture Notes in Electrical Engineering, Springer, 2010.

18. Ioffe and Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in: 32nd International Conference on Machine Learning, ICML, Vol. 1, 2015, pp. 448–456.

19. Neittaanmäki, Rudnicki and Savini, Inverse Problems and Optimal Design in Electricity and Magnetism, Oxford University Press, 1996.