Design study of FIT dataflow machine for high performance 3-D electrostatic field simulation

Abstract

To investigate portable, low-power, low-cost and green high-performance computing (HPC) technologies suitable for industrial applications, the authors have been developing dedicated computers based on dataflow architecture for electromagnetic field simulations. In this paper, we propose a Finite Integration Technique (FIT) dataflow machine based on the BiCG-Stab scheme for 3-D electrostatic field simulations.

Keywords

HPC technologies, FIT dataflow machine, BiCG-Stab, electrostatic field simulation

1. Introduction

High-performance computing (HPC) technologies continue to advance, and the peak performance of the latest supercomputer has reached 1680 Pflop/s. However, such supercomputers are extremely large systems, which consist of over 8 million cores and consume 20 MW of power. This means that such HPC technologies are not suitable for product design in industry. We have been developing dedicated computers with the aim of achieving portable, low-cost, low-power and green HPC for industrial applications [1–4]. In previous works, we proposed a dedicated computer for 2-D magneto-static field simulations [5–7], in which we designed a hardware circuit of the BiCG-Stab matrix solver based on dataflow architecture to execute the finite integration technique (FIT) scheme in a 2-D grid space. In addition, we proposed a dedicated computer for microwave simulations [8–16], in which we discussed hardware circuits of the finite-difference time-domain (FDTD) scheme based on a sliced 3-D dataflow architecture for a 3-D grid space. In this paper, we design a 3-D FIT dataflow machine for electrostatic field simulations by combining the BiCG-Stab scheme circuits and the sliced 3-D grid dataflow architecture of the previous works.

2. FIT scheme for 3-D electrostatic field simulation

To achieve a high-performance dedicated computer for 3-D electrostatic fields, the use of the FIT scheme is crucial for highly parallel computation based on dataflow architecture. Here we summarize the FIT scheme for 3-D electrostatic field simulation.

In the FIT scheme for 3-D electrostatic field simulation, the following integral form of Gauss’s law is discretized in the 3-D grid space,

$\displaystyle \oint_{S}(\varepsilon\nabla\phi)\cdot d\mathbf{S}=-\int_{V}\rho\,dv$ (1)

where 𝜌 is the charge density, 𝜙 is the scalar potential and ε is the permittivity. As shown in Fig. 1, if we allocate the component of the scalar potential 𝜙 at the (i, j, k)-th grid point throughout the 3-D grid space, (1) can be expressed in discretized form as follows:

$\displaystyle c_{0}\phi_{i,j,k}-c_{1}\phi_{i+1,j,k}-c_{2}\phi_{i,j+1,k}-c_{3}\phi_{i,j,k+1}-c_{4}\phi_{i-1,j,k}-c_{5}\phi_{i,j-1,k}-c_{6}\phi_{i,j,k-1}=-\rho_{i,j,k}\Delta l^{2},$ (2)

where

$\displaystyle \begin{array}{l}\displaystyle c_{1}=\frac{\varepsilon_{i,j,k}+\varepsilon_{i,j,k-1}+\varepsilon_{i,j-1,k-1}+\varepsilon_{i,j-1,k}}{4},\\[10pt] \displaystyle c_{2}=\frac{\varepsilon_{i,j,k}+\varepsilon_{i,j,k-1}+\varepsilon_{i-1,j,k-1}+\varepsilon_{i-1,j,k}}{4},\\[10pt] \displaystyle c_{3}=\frac{\varepsilon_{i,j,k}+\varepsilon_{i,j-1,k}+\varepsilon_{i-1,j-1,k}+\varepsilon_{i-1,j,k}}{4},\\[10pt] \displaystyle c_{4}=\frac{\varepsilon_{i-1,j,k}+\varepsilon_{i-1,j,k-1}+\varepsilon_{i-1,j-1,k-1}+\varepsilon_{i-1,j-1,k}}{4},\\[10pt] \displaystyle c_{5}=\frac{\varepsilon_{i,j-1,k}+\varepsilon_{i,j-1,k-1}+\varepsilon_{i-1,j-1,k-1}+\varepsilon_{i-1,j-1,k}}{4},\\[10pt] \displaystyle c_{6}=\frac{\varepsilon_{i,j,k-1}+\varepsilon_{i,j-1,k-1}+\varepsilon_{i-1,j-1,k-1}+\varepsilon_{i-1,j,k-1}}{4},\\[10pt] c_{0}=c_{1}+c_{2}+c_{3}+c_{4}+c_{5}+c_{6}\end{array}$ (3)

To obtain the distribution of the electrostatic potential 𝜙 that satisfies the FIT discretization (2) at all grid points of the 3-D grid space simultaneously, we construct the FIT matrix equation shown in Fig. 2 and solve it. The potential is set to 0 at the outer boundary, which is taken sufficiently far away from the field source.
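As an illustration of (2) and (3), the following NumPy sketch computes the seven coefficients from a cell-wise permittivity array and applies the left-hand side of (2) to a potential array without assembling the matrix of Fig. 2. The function names, the array layout and the padding convention (`eps_pad[i+1, j+1, k+1]` holds ε at cell (i, j, k), so that the index i − 1 is always available) are assumptions made for this example and are not taken from the hardware design. The potential outside the grid is treated as 0, following the boundary condition stated above.

```python
import numpy as np

def fit_coefficients(eps_pad):
    """Return c[0..6] of Eq. (3) for every node, shape (7, nx, ny, nz).

    eps_pad has shape (nx+1, ny+1, nz+1); eps_pad[i+1, j+1, k+1] is the
    permittivity of cell (i, j, k) (assumed layout, one extra cell per axis).
    """
    nx, ny, nz = (n - 1 for n in eps_pad.shape)

    def E(a, b, c):
        # Permittivity eps_{i+a, j+b, k+c} for all nodes, with a, b, c in {0, -1}.
        return eps_pad[1 + a:1 + a + nx, 1 + b:1 + b + ny, 1 + c:1 + c + nz]

    c1 = (E(0, 0, 0) + E(0, 0, -1) + E(0, -1, -1) + E(0, -1, 0)) / 4
    c2 = (E(0, 0, 0) + E(0, 0, -1) + E(-1, 0, -1) + E(-1, 0, 0)) / 4
    c3 = (E(0, 0, 0) + E(0, -1, 0) + E(-1, -1, 0) + E(-1, 0, 0)) / 4
    c4 = (E(-1, 0, 0) + E(-1, 0, -1) + E(-1, -1, -1) + E(-1, -1, 0)) / 4
    c5 = (E(0, -1, 0) + E(0, -1, -1) + E(-1, -1, -1) + E(-1, -1, 0)) / 4
    c6 = (E(0, 0, -1) + E(0, -1, -1) + E(-1, -1, -1) + E(-1, 0, -1)) / 4
    c0 = c1 + c2 + c3 + c4 + c5 + c6
    return np.stack([c0, c1, c2, c3, c4, c5, c6])

def apply_fit_operator(c, phi):
    """Evaluate the left-hand side of Eq. (2) at every node (zero potential outside)."""
    p = np.pad(phi, 1)  # zero padding models the 0 boundary condition
    return (c[0] * phi
            - c[1] * p[2:, 1:-1, 1:-1]    # phi_{i+1,j,k}
            - c[2] * p[1:-1, 2:, 1:-1]    # phi_{i,j+1,k}
            - c[3] * p[1:-1, 1:-1, 2:]    # phi_{i,j,k+1}
            - c[4] * p[:-2, 1:-1, 1:-1]   # phi_{i-1,j,k}
            - c[5] * p[1:-1, :-2, 1:-1]   # phi_{i,j-1,k}
            - c[6] * p[1:-1, 1:-1, :-2])  # phi_{i,j,k-1}
```

For a uniform permittivity, every c₁ to c₆ equals that permittivity and c₀ is six times as large. A function such as `apply_fit_operator` can serve as the matrix-vector product required by an iterative solver.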

Fig. 1.

FIT discretization in 3-D grid space.

Fig. 2.

FIT matrix equation of (2).

3. Hardware circuit of BiCG-Stab matrix solver based on dataflow architecture

For the FIT matrix equation (Fig. 2), we use the BiCG-Stab scheme, which is a relatively stable matrix solver. The detailed procedure of the BiCG-Stab scheme for a matrix equation Ax = b is as follows:

1. Set an initial value x₀

2. Compute initial residual r₀ = b − Ax₀

3. Set the shadow residual vector r₀^∗ s.t. (r₀, r₀^∗) ≠ 0, e.g., r₀^∗ = r₀

4. Set 𝛽₋₁ = 0, then the search direction vector p₀ = r₀

5. For n = 0, 1, 2, …, until $\|\mathbf r_{n+1}\|\leq\varepsilon\|\mathbf b\|$ (ϵ = 10⁻⁹)

begin

$\displaystyle \begin{array}{ll}\displaystyle \alpha_{n}=\frac{(\mathbf{r}_{0}^{\ast},\mathbf{r}_{n})}{(\mathbf{r}_{0}^{\ast},A\mathbf{p}_{n})} & \qquad \text{(i)}\\[10pt] \displaystyle \mathbf{t}_{n}=\mathbf{r}_{n}-\alpha_{n}A\mathbf{p}_{n} & \qquad \text{(ii)}\\[5pt] \displaystyle \varsigma_{n}=\frac{(A\mathbf{t}_{n},\mathbf{t}_{n})}{(A\mathbf{t}_{n},A\mathbf{t}_{n})} & \qquad \text{(iii)}\\[10pt] \displaystyle \mathbf{x}_{n+1}=\mathbf{x}_{n}+\alpha_{n}\mathbf{p}_{n}+\varsigma_{n}\mathbf{t}_{n} & \qquad \text{(iv)}\\[5pt] \displaystyle \mathbf{r}_{n+1}=\mathbf{t}_{n}-\varsigma_{n}A\mathbf{t}_{n} & \qquad \text{(v)}\\[5pt] \displaystyle \beta_{n}=\frac{\alpha_{n}}{\varsigma_{n}}\frac{(\mathbf{r}_{0}^{\ast},\mathbf{r}_{n+1})}{(\mathbf{r}_{0}^{\ast},\mathbf{r}_{n})} & \qquad \text{(vi)}\\[10pt] \displaystyle \mathbf{p}_{n+1}=\mathbf{r}_{n+1}+\beta_{n}(\mathbf{p}_{n}-\varsigma_{n}A\mathbf{p}_{n}) & \qquad \text{(vii)}\end{array}$ (4)

end
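The procedure above can be sketched in software as follows. This is a minimal NumPy reference of steps 1 to 5 and (i)–(vii), intended only for checking the arithmetic, not a description of the hardware circuit. The function name `bicgstab`, the `matvec` callback and the `max_iter` limit are assumptions introduced for this example. The sketch does not handle breakdown, where a denominator becomes zero, beyond the trivial case t_n = 0.

```python
import numpy as np

def bicgstab(matvec, b, x0=None, tol=1e-9, max_iter=1000):
    """Solve A x = b with the BiCG-Stab steps (i)-(vii); matvec(v) returns A v."""
    x = np.zeros_like(b, dtype=float) if x0 is None else np.array(x0, dtype=float)
    r = b - matvec(x)          # step 2: initial residual
    r0s = r.copy()             # step 3: shadow residual r0* = r0
    p = r.copy()               # step 4: initial search direction
    b_norm = np.linalg.norm(b)
    for n in range(max_iter):
        if np.linalg.norm(r) <= tol * b_norm:
            return x, n
        Ap = matvec(p)
        rho = np.vdot(r0s, r)
        alpha = rho / np.vdot(r0s, Ap)                  # (i)
        t = r - alpha * Ap                              # (ii)
        At = matvec(t)
        denom = np.vdot(At, At)
        if denom == 0.0:
            # t = 0: x + alpha * p already solves the system
            return x + alpha * p, n + 1
        zeta = np.vdot(At, t) / denom                   # (iii)
        x = x + alpha * p + zeta * t                    # (iv)
        r_new = t - zeta * At                           # (v)
        beta = (alpha / zeta) * np.vdot(r0s, r_new) / rho  # (vi)
        p = r_new + beta * (p - zeta * Ap)              # (vii)
        r = r_new
    return x, max_iter
```

Each pass through the loop needs two matrix-vector products (Ap_n and At_n) and four inner products, which is why the text that follows concentrates on these operations.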

In the BiCG-Stab scheme, appropriate initial values x₀ and the initial residual r₀ are set before the iteration process (i)–(vii) starts. The iteration process (i)–(vii) is then repeated until the residual satisfies the convergence condition $\|\mathbf r_{n+1}\|\leq\varepsilon\|\mathbf b\|$ (ϵ = 10⁻⁶ ∼ 10⁻⁹). The unit grid circuit of the FIT dataflow machine for 3-D electrostatic field simulation based on the BiCG-Stab scheme is designed as shown in Fig. 3. All unknown values (p_n, Ap_n, t_n, At_n, x_n, r_n) of the BiCG-Stab iteration process and the coefficient values $(c_{i,j,k}^{(0)},c_{i,j,k}^{(1)},c_{i,j,k}^{(2)},c_{i,j,k}^{(3)},c_{i,j,k}^{(4)},c_{i,j,k}^{(5)},c_{i,j,k}^{(6)})$ of (2) are stored in the registers of each grid, and these registers are connected to each other by arithmetic circuits that execute the iteration process (i)–(vii) in (4). For example, the circuit connection for p_n in step (vii) of (4) is highlighted in Fig. 3, and the logic circuits of the other steps of the BiCG-Stab iteration process can be implemented in the same manner as the circuit for p_n in Fig. 3. In addition, Fig. 3 includes circuits for one row of the matrix-vector multiplications Ap_n and At_n, which appear in steps (i) and (iii). If the unit grid circuits of Fig. 3 are connected over the entire 3-D grid space, the inner product calculations of (i) and (iii) in (4) can be executed in a single clock cycle, which gives extremely high-performance computation. However, such a 3-D grid circuit results in very large hardware, which cannot be implemented in a single LSI (large-scale integrated circuit).

Fig. 3.

Unit grid circuit of BiCG-Stab matrix solver.

4. Sliced 3-D dataflow architecture for BiCG-Stab scheme

In this work, we use the sliced 3-D dataflow architecture, which was used in the 3-D FDTD dataflow machine, to implement the 3-D dataflow architecture machine in a practical hardware size. Figure 4(a) depicts three grid circuits, which consist of the arithmetic grid circuit of Fig. 3 in the middle and two additional register grid circuits. The upper and lower grid circuits (register grid circuits) contain only registers for storing the unknown values (p_n, Ap_n, t_n, At_n, x_n, r_n) of the BiCG-Stab iteration process and the coefficient values $(c_{i,j,k}^{(0)},c_{i,j,k}^{(1)},c_{i,j,k}^{(2)},c_{i,j,k}^{(3)},c_{i,j,k}^{(4)},c_{i,j,k}^{(5)},c_{i,j,k}^{(6)})$ of (2). We call this set of circuits the “arithmetic 3 grid circuit”. In Fig. 4(b), additional register grid circuits (identical to the upper or lower grid circuit in Fig. 4(a)) are stacked vertically on the arithmetic 3 grid circuit. By connecting the vertical grid circuits of Fig. 4(b) horizontally, the 3-D grid circuit is constructed as in Fig. 4(c). In the 3-D grid circuit of Fig. 4(c), after the calculation of (4) is executed for the bottom-layer register values, the register values must be replaced by those of the upper layer through a shift-down operation. That is, by repeating the execution of (4) at the bottom layer and the shift-down operation, the calculation of (4) can be executed for all 3-D grids. It is known that the hot spots of the BiCG-Stab scheme are the matrix-vector multiplications Ap_n and At_n in (i) and (iii) of (4). In the 3-D grid circuit for the FIT scheme, the part of the calculation of Ap_n belonging to one layer can be executed in a single clock cycle, as shown in Fig. 5. If we denote the number of layers of the 3-D grid circuit of Fig. 4(c) by N_z, the calculation of Ap_n takes only 2N_z clock cycles, including the vertical shift operations.
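The layer-by-layer evaluation of Ap_n can be mimicked in software as a rough analogy. In the sketch below, a window of three consecutive z-layers stands in for the arithmetic 3 grid circuit, and advancing the window by one layer stands in for the shift-down operation. Counting one clock cycle for the arithmetic of a layer and one for the shift is an assumed reading of the 2N_z figure. The function name and array layout (`c` of shape (7, nx, ny, nz), holding c⁽⁰⁾ to c⁽⁶⁾ per grid) are hypothetical.

```python
import numpy as np

def sliced_matvec(c, p):
    """Compute A p one z-layer at a time; return (Ap, counted clock cycles)."""
    nx, ny, nz = p.shape
    zero_layer = np.zeros((nx, ny))      # zero potential outside the grid
    Ap = np.empty(p.shape, dtype=float)
    cycles = 0
    for k in range(nz):
        lower = p[:, :, k - 1] if k > 0 else zero_layer
        mid = p[:, :, k]
        upper = p[:, :, k + 1] if k < nz - 1 else zero_layer
        m = np.pad(mid, 1)               # in-plane neighbours with zero boundary
        Ap[:, :, k] = (c[0, :, :, k] * mid
                       - c[1, :, :, k] * m[2:, 1:-1]    # i+1 neighbour
                       - c[2, :, :, k] * m[1:-1, 2:]    # j+1 neighbour
                       - c[3, :, :, k] * upper          # k+1 neighbour (upper layer)
                       - c[4, :, :, k] * m[:-2, 1:-1]   # i-1 neighbour
                       - c[5, :, :, k] * m[1:-1, :-2]   # j-1 neighbour
                       - c[6, :, :, k] * lower)         # k-1 neighbour (lower layer)
        cycles += 1  # arithmetic for this layer (assumed one cycle)
        cycles += 1  # shift to the next layer (assumed one cycle)
    return Ap, cycles
```

In this model the work per layer is independent of the horizontal grid size, so the counted cycles depend only on the number of layers.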

Fig. 4.

Sliced 3-D dataflow architecture.

Fig. 5.

The matrix equation corresponding to the 3-D grid space.

Fig. 6.

The whole configuration of FIT dataflow machine.

Fig. 7.

VHDL simulation for MASTER SCHEDULER.

Fig. 8.

VHDL simulation result for one column of the 3-D grid space.

5. Configuration of FIT dataflow machine for 3-D electrostatic field simulation

In this section, we construct the entire structure of the FIT dataflow machine for 3-D electrostatic field simulation using the 3-D grid circuit of Fig. 4(c), and describe the machine operation, in particular the execution of the BiCG-Stab matrix calculation. The 3-D FIT dataflow machine consists of three parts, as shown in Fig. 6: the MASTER SCHEDULER, the INNER PRODUCT MODULE, and the FIT GRID MODULE. The FIT GRID MODULE executes the BiCG-Stab iteration process (i)–(vii) as described in the previous section. The INNER PRODUCT MODULE collects the products r_{0i, j, k}r_{i, j, k}, r_{0i, j, k}Ap_{i, j, k}, t_{i, j, k}At_{i, j, k}, At_{i, j, k}At_{i, j, k} from the FIT GRID MODULE, executes the inner product calculations for 𝛼_n, 𝜍_n and 𝛽_n, and sends the results back to the FIT GRID MODULE. These two modules are controlled by data strobe (DS) signals from the MASTER SCHEDULER. The circuits of the dedicated computer are designed in the hardware description language VHDL. A VHDL logic circuit simulation of the MASTER SCHEDULER for a single iteration of the BiCG-Stab scheme (4) is depicted in Fig. 7. For each of the calculations (ii), (iv), (v) and (vii), the arithmetic calculation at the bottom layer and the vertical shift operation are carried out over all 3-D grids. Accordingly, a single BiCG-Stab iteration for the entire 3-D grid space takes (2N_z × 5 + 3N_z + 48) clock cycles.
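To make the cycle count concrete, the short sketch below evaluates the expression 2N_z × 5 + 3N_z + 48 for a given number of layers. The function names and the idea of multiplying by an iteration count are illustrative assumptions. The formula itself is taken from the text, and no clock frequency is assumed.

```python
def bicgstab_iteration_cycles(n_z: int) -> int:
    """Clock cycles of one BiCG-Stab iteration: 2*N_z*5 + 3*N_z + 48."""
    if n_z < 1:
        raise ValueError("n_z must be a positive number of layers")
    return 2 * n_z * 5 + 3 * n_z + 48

def total_cycles(n_z: int, n_iterations: int) -> int:
    """Cycles for a given number of iterations (hypothetical helper)."""
    return n_iterations * bicgstab_iteration_cycles(n_z)

# Example: N_z = 100 layers gives 1000 + 300 + 48 = 1348 cycles per iteration.
print(bicgstab_iteration_cycles(100))
```

The expression simplifies to 13N_z + 48, so the cost of one iteration grows linearly with the number of layers and does not depend on the horizontal grid size.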

6. Numerical example

Figure 8 shows an example of the VHDL logic circuit simulation for p_n of (vii) in (4) in the vertical grid circuit (Fig. 4(b)). When the DS signal for p_n is high, the value of p_n in the computation layer is updated. When the DS signal for p_n is low, the unknown values (p_n, Ap_n, t_n, At_n, x_n, r_n) of the BiCG-Stab iteration process and the coefficient values $(c_{i,j,k}^{(0)},c_{i,j,k}^{(1)},c_{i,j,k}^{(2)},c_{i,j,k}^{(3)},c_{i,j,k}^{(4)},c_{i,j,k}^{(5)},c_{i,j,k}^{(6)})$ of each layer are shifted down to the lower layer. We confirm that the calculation of p_n is performed correctly according to (vii) in (4), which means that the VHDL design of this part of the vertical grid circuit in the sliced 3-D dataflow machine is correct.
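The DS-controlled behaviour described here can be imitated with a small behavioural model of one vertical column. This is a software sketch, not the VHDL design. It assumes that the DS signal alternates between high and low on successive clock cycles, that index 0 is the computation (bottom) layer, and that the value leaving the bottom layer re-enters at the top so the column returns to its original order after N_z shifts. The function and register names are hypothetical.

```python
import numpy as np

def update_p_column(r_new, p, Ap, beta, zeta):
    """Apply step (vii) to every layer of one column by compute-and-shift."""
    regs = {
        "r": np.array(r_new, dtype=float),   # r_{n+1} per layer
        "p": np.array(p, dtype=float),       # p_n per layer
        "Ap": np.array(Ap, dtype=float),     # A p_n per layer
    }
    n_z = regs["p"].size
    for cycle in range(2 * n_z):
        ds_high = (cycle % 2 == 0)           # assumed alternating DS pattern
        if ds_high:
            # DS high: update p in the computation layer, step (vii)
            regs["p"][0] = regs["r"][0] + beta * (regs["p"][0] - zeta * regs["Ap"][0])
        else:
            # DS low: shift every register down by one layer (with wrap-around)
            for name in regs:
                regs[name] = np.roll(regs[name], -1)
    return regs["p"]
```

After 2N_z cycles every layer has passed through the computation layer once, so the returned column equals the vector update r_{n+1} + 𝛽_n(p_n − 𝜍_n Ap_n) applied to the whole column.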

7. Conclusion

In this paper, we have presented the design of the FIT dataflow machine for 3-D electrostatic field simulations. The detailed logic circuits of the sliced 3-D dataflow architecture for the BiCG-Stab scheme were proposed, and the whole configuration of the FIT dataflow machine for 3-D electrostatic fields was discussed. The logic circuit of the vertical grid circuit of the 3-D FIT dataflow machine was designed in VHDL, and the VHDL circuit simulation confirmed that the logic circuits for the BiCG-Stab scheme are implemented correctly. We will proceed to the VHDL logic circuit simulation of the whole 3-D FIT dataflow machine for electrostatic field simulation in the near future.

References

1. Placidi, Verducci, Matrella, Roselli and Ciampolini, A custom VLSI architecture for the solution of FDTD equations, IEICE Trans. Electron. E85-C(3) (2002), 572–577.

2. Sano, Hatsuda, Wang and Yamamoto, Performance evaluation of Finite-Difference Time-Domain (FDTD) computation accelerated by FPGA-based custom computing machine, Interdisciplinary Information Sciences 15(1) (2009), 67–78.

3. Fujita and Kawaguchi, Development of improved memory architecture FDTD/FIT dedicated computer based on SDRAM for large scale microwave simulation, International Journal of Applied Electromagnetics and Mechanics 32(3) (2010), 145–157.

4. Kawaguchi, Takahara and Yamauchi, Design study of ultra-high speed microwave simulator engine, IEEE Transactions on Magnetics 38(2) (2002), 689–692.

5. Wang C.X., Ota and Kawaguchi, Conceptual design of dataflow machine for magnetostatic field simulation, in: Proceedings of the 2021 International Conference on Electromagnetics in Advanced Applications (ICEAA), Hawaii, USA, 2021, p. 223, ID:694.

6. Wang C.X. and Kawaguchi, Design study of BiCG-Stab matrix solver circuit for FIT scheme based on dataflow architecture, in: Proceedings of the International Conference on Simulation Technology (JSST 2021), Kyoto, Japan, 2021, pp. 402–403.

7. Wang, Kawaguchi and Watanabe, Study of FIT dedicated computer with dataflow architecture for high performance 2-D magneto-static field simulation, IEICE Trans. Electron. E106-C(4) (2023), to be published.

8. Kawaguchi, Fujita, Fujishima and Matsuoka, Improved architecture of FDTD/FIT dedicated computer for higher performance computation, IEEE Transactions on Magnetics 44(6) (2008), 1226–1229.

9. Fujita and Kawaguchi, Full custom PCB implementation of FDTD/FIT dedicated computer, IEEE Transactions on Magnetics 45(3) (2009), 1100–1103.

10. Fujita and Kawaguchi

11. Fujita and Kawaguchi, Development of portable high performance computing system by parallel FDTD dedicated computers, in: 18th International Conference on the Computation of Electromagnetic Fields, 2011.

12. Matsuoka, Ohmi and Kawaguchi, Study of a microwave simulation dedicated computer, FDTD/FIT data flow machine, IEICE Trans. Electron. E86-C(11) (2003), 2199–2206.

13. Kawaguchi and Matsuoka, Conceptual design of 3D FDTD dedicated computer with dataflow architecture for high performance microwave simulation, IEEE Trans. Magn. 51(3) (2015), 7202404.

14. Kawaguchi, Improved architecture of FDTD dataflow machine for higher performance electromagnetic wave simulation, IEEE Trans. Magn. 52(3) (2016), 7206604.

15. Kawaguchi and Matsuoka, Implementation of microwave simulation at dispersive material in dataflow architecture FDTD dedicated computer, IEEE Trans. Magn. 54(3) (2018), 7202205.

16. Kawaguchi, Design study of domain decomposition operation in dataflow architecture FDTD/FIT dedicated computer, IEICE Trans. Electron. E101-C(1) (2018), 20–25.