Abstract
The authors explore the possibility of applying a Convolutional Neural Network (CNN) to the solution of coupled electromagnetic and thermal problems, focusing on the classical problem of induction heating systems, traditionally solved by resorting to Finite Element (FE) models. FE modelling is widely used in the design of induction heating systems due to its accuracy, even though the solution of a coupled nonlinear problem is expensive in terms of computational time and hardware resources, notably in 3D analysis.
A model based on a CNN could be an interesting alternative; in fact, the CNN is a learning model selected for its excellent ability to converge, even when trained with a limited dataset. CNNs are able to treat images as input and they are used here as follows: given a temperature map in the workpiece, identify the corresponding vector of current, frequency and process heating time; this mapping is a model of the inverse induction heating problem. Specifically, we consider as an example the induction heating of a cylindrical billet, made of C45 steel, placed in a solenoidal inductor coil with the same axial length as the billet (TEAM 36 problem). A through-heating process is usually applied before hot working of the billet, as in an extrusion process, but this methodology can also be applied in the design of induction hardening processes.
First, a CNN has been trained from scratch by means of a dataset of FE solutions of coupled electromagnetic and thermal problems. For the sake of a comparison, a transfer learning technique is applied using GoogLeNet, i.e. a Deep Convolutional Neural Network able to classify images: starting from the pre-trained GoogLeNet, its training has been subsequently refined with the dataset of solutions from FE analyses.
When the training dataset contains a limited number of samples, GoogLeNet shows good accuracy in predicting the process parameters; when the training set contains a high number of samples, namely beyond a threshold of about 1,500, both CNNs give accurate results.
Introduction
Electrothermal technologies are increasingly applied in various industrial sectors for the processing of materials, as they are generally very efficient and allow for repeatable and controllable processes. Furthermore, the electrification of thermal treatments often allows for more sustainable industrial processes in terms of environmental impact and rational use of primary energy [1].
In this article we focus on a numerical model of induction heating devices and on the possibility of using models based on deep learning to create extremely fast models that can be used not only in design but also for the control and optimization of the process itself.
The design of an induction heating system generally requires the optimal sizing of both the physical equipment and the process. Typically, the greatest difficulties are found in the design of the inductor (e.g. working frequency, number and density of turns, distance from the workpiece, magnetic flux yokes), which must meet the process specifications and at the same time exhibit lumped parameters, e.g. equivalent resistance and inductive reactance, that can be matched with the characteristics of the frequency generators commonly available on the market. For example, in induction hardening, the process must fulfil specific surface distributions of hardness and depth of hardening.
For some configurations characterized by a simple geometry it is possible to find analytical solutions that predict the distribution of the heat sources and the resulting thermal profile. In practice, the geometries of the pieces to be treated are often complex and the process can include several phases, for example heating, equalization and final cooling [2].
The use of numerical models for the design of induction heating systems has seen a rapid evolution in recent years, and they have become a prevailing method of designing electrothermal systems in the industrial sector [3]. In fact, numerical modelling has made it possible to significantly reduce design times and costs [4–6]. Before advanced and reliable software became available, design was based on the experience of the designer, who could rely on formulae based on highly simplified analytical models.
An induction heating process involves various physical phenomena; in particular, the process can be described with sufficient precision using coupled electromagnetic and thermal models, where the properties of the materials are updated during the evolution of the thermal transient. During the heating cycle, the electromagnetic properties (electrical resistivity and magnetic permeability) and the thermal properties (conductivity and heat capacity) vary with the temperature. The coupling is implemented in a weak way, since the time constants that characterize the electromagnetic and thermal phenomena are very different. The temporal evolution of the process is divided into a certain number of time steps: at each step an electromagnetic solution is calculated that provides a distribution of the heat sources, which in turn is used as a source for the transient thermal solution. Although different numerical methods have been used, such as finite differences, finite element models represent the state of the art in these applications. FE models are generally robust and, thanks to the evolution of information technology, the solution requires less and less time. Although computation times may be acceptable, for example for design, they are still quite burdensome in the case of three-dimensional models or when the materials exhibit highly non-linear properties, such as the magnetic permeability and the specific heat of ferromagnetic materials. Long calculation times are difficult to accommodate in automated optimal design and are restrictive in control system applications that are based on the prediction provided by models or digital twins [7].
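The weak coupling described above can be illustrated with a minimal Python sketch. It is not the FE model used in this work: it assumes a 1D slab instead of the axisymmetric billet, an analytical exponential decay of the induced power in place of the electromagnetic FE solution, an explicit finite-difference thermal step, and hypothetical constant material values. The function names and the surface field value are likewise illustrative.

```python
import math

MU0 = 4e-7 * math.pi         # vacuum permeability, H/m
T_CURIE = 770.0              # Curie temperature, degC
RHO_EL = 5e-7                # hypothetical electrical resistivity, ohm*m
K_TH = 30.0                  # hypothetical thermal conductivity, W/(m K)
RHO_C = 7800.0 * 600.0       # hypothetical volumetric heat capacity, J/(m^3 K)


def em_solve(temps, x, freq, h0_rms):
    """Toy stand-in for the electromagnetic solution.

    Returns the Joule power density (W/m^3) at the nodes x, assuming the
    semi-infinite-slab decay exp(-2x/delta) and a two-level relative
    permeability switched by the current surface temperature.
    """
    mu_r = 100.0 if temps[0] < T_CURIE else 1.0   # hypothetical values
    delta = math.sqrt(RHO_EL / (math.pi * freq * MU0 * mu_r))
    p_surface = 2.0 * RHO_EL * h0_rms ** 2 / delta ** 2
    return [p_surface * math.exp(-2.0 * xi / delta) for xi in x]


def thermal_step(temps, sources, dx, dt):
    """One explicit finite-difference step of 1D conduction with heat sources.

    Both ends are treated as adiabatic (mirror nodes); surface losses are
    neglected in this toy model.
    """
    n = len(temps)
    new_temps = []
    for i in range(n):
        left = temps[i - 1] if i > 0 else temps[1]
        right = temps[i + 1] if i < n - 1 else temps[n - 2]
        laplacian = (left - 2.0 * temps[i] + right) / dx ** 2
        new_temps.append(temps[i] + dt * (K_TH * laplacian + sources[i]) / RHO_C)
    return new_temps


def weakly_coupled_run(freq=4e3, h0_rms=8e4, t_end=7.0, n_nodes=61, depth=0.03):
    """Alternate EM and thermal solves over the time steps (weak coupling)."""
    dx = depth / (n_nodes - 1)
    x = [i * dx for i in range(n_nodes)]          # x = 0 is the heated surface
    temps = [20.0] * n_nodes
    dt = 0.2 * dx ** 2 * RHO_C / K_TH             # below the explicit stability limit
    t = 0.0
    while t < t_end:
        sources = em_solve(temps, x, freq, h0_rms)      # EM solution -> heat sources
        temps = thermal_step(temps, sources, dx, dt)    # thermal solution -> new temperatures
        t += dt
    return temps


final_temps = weakly_coupled_run()
print(f"surface: {final_temps[0]:.0f} degC, core: {final_temps[-1]:.0f} degC")
```

The default surface field of 8e4 A/m is a rough long-solenoid estimate (20 turns, 4 kA, 1 m length) and is an assumption of this sketch. The key point is the loop structure: the heat sources are recomputed from the current temperatures before each thermal step.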
Neural networks, suitably trained through data sets of solutions obtainable either from numerical models or from measurements, are able to provide the solution in a very short time, making them suitable for developing automated optimization processes, notably when the optimization algorithm requires many calls to the FE solver [8].
To this end, the authors have solved the direct problem of TEAM 36 (given the frequency, the current and the time instant, find the temperature map) using a deep learning approach, and presented their work at the CEFC 2022 conference [9].
On the other hand, preliminary results of a deep learning approach for solving an inverse problem relevant to the TEAM 36 problem have been presented at the Compumag 2021 conference [10], where a pre-trained CNN (GoogLeNet), refined by means of a few FE analyses, was used. In contrast, in this paper, a comparison of the performance of the pre-trained GoogLeNet with a CNN trained from scratch is presented. Moreover, two databases of different sizes are used for training both the CNNs and the relevant results are shown.
The use of pretrained networks has shown good results in different fields of research like the recognition of faults in electrical machines [11]. Motivated by this background work, a contribution to the use of convolutional neural networks applied to coupled-field problems is now proposed.
Proposed case study: TEAM Benchmark problem n.36
The ‘Compumag’ scientific community dealing with computational electromagnetism has made available a number of benchmark examples useful for verifying the correctness of the solutions of numerical models. Only recently, a problem related to coupled electromagnetic and thermal calculations was proposed, problem TEAM (Testing Electromagnetic Analysis Methods) n.36 [12].
To evaluate the possibility of using metamodels based on neural networks for coupled problems, it was decided to use the problem proposed in the mentioned benchmark.
The device under study [13] is composed of a solenoidal inductor coil and a cylindrical steel billet, coaxially located with respect to the inductor and exhibiting the same axial length, see Fig. 1.
The billet is made of C45 steel, whose material properties (specific heat capacity, thermal conductivity, electrical conductivity and mass density) are functions of temperature, as shown in Table 1. The mass density of C45 steel is 7,800 kg m⁻³.

Geometry of the system with h/2 = 50 cm, r = 3 cm, h_c = 4 cm and w_c = 2 cm (a), mesh detail (b), magnetic field pattern (c), thermal map (d) at t = 7 s.
Linearized material properties of C45 steel
As far as the magnetic permeability is concerned, the following B-H relationship of the steel is assumed
The temperature dependence of the magnetic permeability is modelled as follows:
The inductor coil is composed of 20 circular turns, made of copper tube, series connected and uniformly spaced along the Y axis. The inductor is assumed to be supplied at a frequency f in the range 2–6 kHz with a sinusoidal current I with rms value in the range 2–6 kA. The general scope of the field analysis is to compute the thermal map in the billet longitudinal section at the end of the heating process (final time instant).
The electromagnetic (EM) and thermal (TH) problems can be solved using a 2D axisymmetric Finite Element (FE) model.
Specifically, the EM problem is solved under time-harmonic conditions, whereas the TH one is solved under transient conditions, with thermal sources due to the power density induced in the billet.
In turn, this asks for an adaptive refinement of the FE mesh discretizing the billet region; in fact the magnetic mesh must be structured according to the value of the penetration depth δ
Because the relative magnetic permeability is strongly temperature dependent, in particular close to the Curie temperature, at some time instants the billet material exhibits substantially different properties depending on its local temperature: it behaves like a non-linear magnetic material in the region of the billet below the Curie temperature, and like a non-magnetic one in the region above the Curie temperature, leading to a quite complex distribution of Joule losses.
The space transition between the two regions depends on time.
In order to save computational time without deteriorating accuracy, it could be effective to divide the billet domain into two regions: the one below the Curie temperature and the one above it. This way, two different penetration depths can be considered, namely δ_c and δ_h, respectively, with δ_c (related to the part below the Curie temperature) smaller than δ_h (related to the part above the Curie temperature). Moreover, the FE mesh could be structured accordingly: below the Curie temperature it is important that the problem, especially the magnetic one, is solved with high accuracy, hence the maximum element size of the mesh could be set e.g. to half of the penetration depth δ_c for second-order elements or to a quarter of δ_c for first-order elements. In contrast, in the region above the Curie temperature, the maximum element size of the mesh could be set e.g. to twice the penetration depth δ_h.
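As a hedged illustration of this sizing rule, the sketch below computes the penetration depth from the classical expression δ = sqrt(ρ / (π f μ0 μr)) and applies the element-size factors quoted above. The resistivity and permeability values are hypothetical placeholders, not the TEAM 36 data, and the function names are introduced here for illustration only.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m


def penetration_depth(resistivity, freq, mu_r):
    """Classical skin depth in metres: sqrt(rho / (pi * f * mu0 * mu_r))."""
    return math.sqrt(resistivity / (math.pi * freq * MU0 * mu_r))


def max_element_size(delta, below_curie, element_order):
    """Maximum mesh element size following the rule of thumb in the text."""
    if below_curie:
        # half of delta_c for second-order elements, a quarter for first-order ones
        return delta / 2.0 if element_order == 2 else delta / 4.0
    # above the Curie temperature: twice delta_h
    return 2.0 * delta


# Hypothetical material values, for illustration only
freq = 4e3                                                               # Hz
delta_c = penetration_depth(resistivity=2.5e-7, freq=freq, mu_r=100.0)   # cold, magnetic
delta_h = penetration_depth(resistivity=1.1e-6, freq=freq, mu_r=1.0)     # hot, non-magnetic

print(f"delta_c = {delta_c * 1e3:.2f} mm, first-order element size <= "
      f"{max_element_size(delta_c, True, 1) * 1e3:.2f} mm")
print(f"delta_h = {delta_h * 1e3:.2f} mm, element size <= "
      f"{max_element_size(delta_h, False, 1) * 1e3:.2f} mm")
```

Because δ scales with the inverse square root of the relative permeability, the cold region requires a much finer mesh than the hot one, which is what motivates the two-region meshing idea.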
Because the transition zone between the hot and the cold regions of the billet is time-varying, the two different meshes in the billet must be both time- and space-varying as well. Unfortunately, this feature is not usually implemented in commercially licensed FE codes. Consequently, the maximum element size of the mesh should be set considering the smallest value of the penetration depth δ_c in the whole billet; however, this could be prohibitive in terms of running times.
The mesh generated for solving the EM problem was a first-order mesh, with about 50,000 triangular/quadrilateral elements. A detail of the mesh is shown in Fig. 1b: the mesh in the copper turns also respects the same rule, applied to the relevant skin depth value δ_Cu.
The same discretization in the billet region has been used for solving the weakly coupled transient thermal problem.
The time-harmonic solution resorts to the usual magnetic vector potential formulation coupled with the electric scalar one, A-AV, using phasor unknowns. The transient thermal solution solves directly for the temperature field using an explicit method for the time evolution.
Taking cylindrical symmetry into account, the electromagnetic domain is composed of one half of the inductor, one half of the billet and the surrounding air region; in turn, the thermal domain is composed of half of the billet and the following relevant boundary conditions are applied: the symmetry applied along the X axis imposes adiabatic condition while thermal exchanges due to radiation and convection are set on the lines describing the outer surfaces of the billet.
In order to set up a metamodel driven by a CNN, a numerically evaluated temperature map is necessary; to this end, the temperature is sampled on a grid of 32 × 12 regularly spaced points in the billet cross-section. In particular, one sampling point about every 1.6 cm is considered in the z-direction, while a denser sampling, i.e. one sampling point every 0.25 cm, is considered in the radial direction: in fact, both the magnetic field and the temperature change more steeply in the radial direction than in the z-direction.
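The grid spacing quoted above follows directly from the billet dimensions (h/2 = 50 cm, r = 3 cm), as the short Python sketch below shows. The placement of the points at cell centres and the function name are assumptions; only the grid size and the approximate spacing are given in the text.

```python
def sampling_grid(half_height_cm=50.0, radius_cm=3.0, n_z=32, n_r=12):
    """Return axial and radial coordinates (cm) of a regular sampling grid.

    Assumption: points are placed at the centres of equal cells.
    """
    dz = half_height_cm / n_z
    dr = radius_cm / n_r
    z = [(i + 0.5) * dz for i in range(n_z)]
    r = [(j + 0.5) * dr for j in range(n_r)]
    return z, r, dz, dr


z, r, dz, dr = sampling_grid()
# A temperature map is then a 32 x 12 array, one value per (z, r) pair
temperature_map = [[20.0 for _ in r] for _ in z]   # placeholder values, degC
print(len(temperature_map), len(temperature_map[0]), round(dz, 2), dr)
```

The axial step of 50/32 cm rounds to the 1.6 cm mentioned above, and the radial step is exactly 0.25 cm.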
In order to reduce the burden of computing the temperature distribution in the billet region, while still maintaining good accuracy in the predicted values, a linearized model is used to train the neural network. It is based on a two-level approximation of (2), in which the magnetic permeability and electrical resistivity values below and above the Curie temperature, respectively, are assumed to be constant; this still preserves a prediction of the final temperature distribution close to the real one. The values of the other material properties (specific heat capacity, thermal conductivity, electrical conductivity and mass density) are set in a similar way. The final temperature distribution computed through the linearized model is strongly dependent on the criterion adopted for switching the heat source distribution from the one computed below the Curie point to the one calculated at high temperature. Several criteria for setting the switch time have been tested, such as updating the thermal sources when a point, or a grid of points, has exceeded the Curie temperature. A criterion based on thermal energy balance gives the best results at the end of the heating transient. This criterion is based on the theoretical energy required to heat an external layer, the thickness of which depends on the skin depth of the billet, up to 770 °C.
For the purpose of validating the linearized model with the energy-based criterion, a comparison with the non-linear model, based on the strongly-coupled finite-element analysis of the magneto-thermal field described by (2)–(3) and (4), was performed. After validation, the linearized model was used for creating a database of 1,654 finite-element solutions. In summary, the linearized model solves the forward field problem, i.e. given current, frequency and process duration, find the temperature map in the billet at the end of the process.
The trained convolutional neural network has been applied to solve the inverse (identification) problem: given a temperature map, the CNN has to identify the input vector of data, notably frequency, current intensity and process time, that leads to that temperature distribution. More generally, the temperature map can represent a prescribed thermal field that should be inverted.
For solving the problem, a CNN-based approach is used. In such a case, the network is a surrogate of the inverse problem, meaning that by applying an image of the temperature field as input to the CNN, the current, frequency and final time instant of the heating process are identified without the use of an FE model.
In order to train the CNNs for solving this problem, two subsets of the database, both based on the linearized model, are used:
Subset 1: 120 solutions; Subset 2 (full dataset): 1,654 solutions.
For solving the identification problem, a CNN trained from scratch is used. The CNN is composed of 18 layers, in which 4 blocks can be highlighted. Each block is composed of a convolutional layer, a batch normalization layer [14] and a Rectified Linear Unit (ReLU) function, as shown in Table 2.
The ReLU function is one of the most widely used activation functions for CNNs because it has shown good performance in training this kind of neural network and avoiding overfitting [7].
The convolutional layers are characterized by filters of size 3 × 3. The number of filters varies from 8 to 32, depending on the block. Two average pooling layers with filters of size 2 × 2 are applied in order to obtain a more stable solution.
At the end of the structure a dropout layer is used and a fully connected layer followed by the regression layer allows a vector of 3 elements to be obtained as the output.
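The layer count stated above can be checked with a short Python sketch. The ordering below (an input layer, four convolution/batch-normalization/ReLU blocks, two average-pooling layers, dropout, a fully connected layer and a regression output) is one arrangement that adds up to 18 layers. The filter counts per block, the position of the pooling layers, the 32 × 12 single-channel input, the 'same' padding and the stride-2 pooling are assumptions made for illustration.

```python
def build_layer_list(filters=(8, 16, 32, 32), pool_after_blocks=(1, 2)):
    """List the layers of a small regression CNN as (kind, size) pairs.

    filters and pool_after_blocks are hypothetical choices: the text only
    states that the filter count ranges from 8 to 32 over four blocks and
    that two average-pooling layers are present.
    """
    layers = [("input", None)]
    for block, n_filters in enumerate(filters, start=1):
        layers += [("conv3x3", n_filters), ("batchnorm", None), ("relu", None)]
        if block in pool_after_blocks:
            layers.append(("avgpool2x2", None))
    layers += [("dropout", None), ("fully_connected", 3), ("regression", None)]
    return layers


def trace_shapes(layers, height=32, width=12, channels=1):
    """Follow the feature-map size, assuming 'same' padding and stride-2 pooling."""
    n_outputs = None
    for kind, size in layers:
        if kind == "conv3x3":
            channels = size                       # spatial size unchanged
        elif kind == "avgpool2x2":
            height, width = height // 2, width // 2
        elif kind == "fully_connected":
            n_outputs = size
    return (height, width, channels), n_outputs


layers = build_layer_list()
feature_map, n_outputs = trace_shapes(layers)
print(len(layers), feature_map, n_outputs)
```

The three outputs of the fully connected layer correspond to the identified current, frequency and final time instant.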
For the sake of comparison, a pre-trained net is used for solving the same identification problem. In particular, the convolutional neural network GoogLeNet, i.e. a Deep Convolutional Neural Network able to classify 1,000 image categories, is used [15].
Using a pre-trained neural network means applying a transfer learning technique: a network trained to solve a problem belonging to a given class is reused for solving a problem belonging to another class; the two problems need not be related. In particular, one or more layers of the first network can be reused in the second network. In this paper, the last three layers of GoogLeNet (fully connected layer, softmax layer and classification output layer) are replaced by a fully connected layer with three outputs and a regression layer.
This way, the weights of neurons in re-used layers are the starting point for the new training process and they are adapted in response to the new problem; this use of transfer learning is a form of weight initialization scheme.
For refining the training of the pre-trained net, Subset 1 (120 FE solutions) and Subset 2 (1,654 FE solutions) are used in turn.
CNN architecture (18 layers, trained from scratch)
Both CNNs (the one trained from scratch and the pre-trained one) are able to treat images as input and they are used here as follows: given a temperature map in the billet, the corresponding vector of current, frequency and final time instant is predicted; this mapping is a model of the inverse induction heating problem.
For evaluating the quality of the CNN prediction, a possible figure of merit is the Mean Absolute Percentage Error (MAPE) calculated on N = n_v points of the validation set, namely
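As a minimal sketch, assuming the standard definition of MAPE (the mean over the N validation points of |true − predicted| / |true|, expressed in percent), the figure of merit can be computed as follows. The function name and the sample values are illustrative only.

```python
def mape(true_values, predicted_values):
    """Mean Absolute Percentage Error (in percent) over paired samples."""
    if not true_values or len(true_values) != len(predicted_values):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((t - p) / t) for t, p in zip(true_values, predicted_values))
    return 100.0 * total / len(true_values)


# Illustrative values only (e.g. currents in kA): each prediction is off by 10 %
error_percent = mape([4.0, 5.0], [4.4, 4.5])
print(f"MAPE = {error_percent:.1f} %")
```

The metric is evaluated separately for each identified quantity (current, frequency and final time instant), and it requires non-zero true values.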
The CNN trained with Subset 1 uses 100 solutions for the training set and 20 for the validation set. The training was performed with the ADAM solver and lasted 200 epochs. When GoogLeNet is trained with Subset 1, the same settings are used (100 solutions for the training set, 20 for the validation set, ADAM solver, 200 epochs).
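The split of Subset 1 can be sketched as follows. How the samples were assigned to the two sets is not stated, so the random shuffle with a fixed seed and the function name are assumptions.

```python
import random


def split_dataset(n_samples, n_train, seed=0):
    """Randomly split sample indices into training and validation sets."""
    if not 0 < n_train < n_samples:
        raise ValueError("n_train must be between 1 and n_samples - 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # fixed seed for reproducibility
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = split_dataset(n_samples=120, n_train=100)
print(len(train_idx), len(val_idx))
```

The same function covers Subset 2 by calling `split_dataset(n_samples=1654, n_train=1500)`, which leaves 154 samples for validation.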
The normalized solutions obtained with the CNN so trained are shown in Fig. 2a.

True versus predicted normalized values of current (circle), frequency (cross) and final time instant (red star) using a CNN with 120 finite-element samples (a) and using GoogLeNet with 120 finite-element samples (b).
The “true vs predicted values” plane shows the relevant level of agreement: the closer the points are to the diagonal, the more accurate the solution. The points represented in Fig. 2a are spread across the plane, meaning that the CNN identifies the three quantities with a low level of accuracy. This is probably due to the low number of samples in the training set.
In turn, the normalized solutions obtained with GoogLeNet trained with Subset 1 are shown in Fig. 2b.
The final time instant is identified reasonably well, while a higher error is introduced when identifying the current and the frequency values (crosses and circles more spread than stars in Fig. 2b).
When training CNN and GoogLeNet with Subset 2, 1,654 samples are used, 1,500 of which for the training set, 154 for the validation set. The ADAM solver is used and 1,000 epochs are considered for the CNN, while, in order to prevent overfitting, only 200 epochs for GoogLeNet were used.
In Figs 3(a) and 3(b), the results obtained with Subset 2 are shown for the CNN and for GoogLeNet, respectively.

True versus predicted normalized values of current (circle), frequency (cross) and final time instant (red star) with 1,654 finite-element samples, (a) using a CNN, (b) using GoogLeNet.
Both approaches, i.e. the CNN and GoogLeNet, show good accuracy in identifying the three quantities.
In Table 3 a quantification of the MAPE error, computed according to (5), is given for the three identified quantities.
MAPE error of GoogLeNet and CNN predictions
The CNN trained from scratch is not able to identify the three quantities with good accuracy when only a few samples are used for training, e.g. the 120 samples of Subset 1. On the other hand, GoogLeNet, being a pre-trained net, is able to identify the three quantities in a satisfactory way, even with Subset 1. When the number of samples is increased, e.g. using Subset 2, which has more than 1,600 samples, both the CNN and GoogLeNet are able to identify the three quantities with good accuracy. In particular, the highest prediction accuracy is obtained for current intensity and frequency.
In this paper, a surrogate model of the inverse induction heating problem has been developed.
When searching for a surrogate model for the identification, two different convolutional networks were considered: a pretrained CNN (GoogLeNet) and a CNN trained from scratch. The following remarks can be put forward:
(i) both CNNs were able to solve the inverse problem with good accuracy when the number of samples used for training them was high; (ii) for solving the identification problem, a database size of about 1,600 samples is sufficient for training purposes; (iii) GoogLeNet showed good performance even when the number of samples was low (120 samples only).
In the authors’ opinion this result is interesting and original, because GoogLeNet is pretrained on image classification, whereas in this paper the net is used for solving a regression problem.
In fact, it might be considered paradoxical that a network pretrained on a dataset coming from a completely different domain could be adapted to solve the identification problem investigated here with only a few modifications to its final layers. However, the capability of learning is exploited by refining the a priori knowledge; this is a valuable feature of pretrained neural networks.
