Mental health diagnosis of college students based on facial recognition and neural network

Abstract

In recent years, college campus incidents caused by mental health problems have been increasing year by year, and college students’ mental health problems have become the focus of attention of schools, society and parents. Based on this, this paper proposes a facial emotion recognition method for college students. By using moving target detection, target classification, target tracking, and a series of image preprocessing techniques, this method achieves intelligent monitoring of the area where college students are located and can automatically alert when a potentially dangerous target is found. Moreover, this method uses a combination of shape features and motion features to select and extract feature quantities. In addition, the method calculates the similarity between the target and candidate target corresponding sub-models, and according to the ability of each feature to distinguish between the target and the background, monitors the student’s mental health in real time and prevents various problems from occurring. Through experimental research, we can see that the model constructed in this paper has good performance.

Keywords

Facial recognition; neural network; college students; mental health

1 Introduction

The psychological quality of contemporary college students not only affects their own development, but also relates to the improvement of the quality of the entire nation and the cultivation of talents. For contemporary college students, having a positive and optimistic attitude towards life, good interpersonal communication skills, and healthy psychological qualities are important factors to ensure their growth and development, and are the basic requirements of society for high-quality talents. Therefore, it is more and more important to set up psychological health education courses for college students. The psychological health education courses for college students generally teach mental health knowledge to students through scientific, systematic and focused teaching methods. Moreover, it promptly educates and guides the confusion and psychological problems of college students, which allows students to learn to properly adjust their emotions and promote the healthy development of body and mind while mastering relevant knowledge theories [1].

As a new research trend in psychology, positive psychology can provide a new way of thinking for the study of the effectiveness of psychological health education courses for college students. By studying the effectiveness of college students’ mental health courses, we can find problems in the course teaching activities in a timely manner and find solutions to them, thereby improving the effectiveness of course teaching. This requires that we correctly understand the basic links and characteristics of mental health education and teaching activities. Fundamentally speaking, it requires teachers to give full play to their leading role and students to actively learn, digest knowledge and internalize ability, so as to truly achieve the unity of education and self-education [2].

College students are lucky and outstanding among their peers. However, during the period of economic transformation and educational transition in my country, the great changes in the social environment have caused them to encounter various contradictions, difficulties, frustrations and worries in their studies and life. Since college students are not yet fully psychologically mature and have insufficient ability to recognize problems, self-regulation and self-control, when dealing with contradictions and conflicts in their studies and life, they often suffer from setbacks and obstacles, which cause anxiety and trouble, cause psychological depression and psychological tension, and cause various psychological problems. These psychological problems are mainly manifested as environmental stress problems caused by changes in the environment and conditions, self-awareness-related inadaptation, problems in interpersonal relationships and personality, and sexual-related inadaptation caused by psychological problems and psychological barriers [3].

With the development and progress of modern education, colleges and universities are paying more and more attention to the psychological health problems of college students. For college students who have just entered an unfamiliar environment like university, their psychological, physical, and ideological awareness is gradually maturing, and compared with other ages, they are more prone to problems in psychological and physical aspects. College students have more problems with mental health. The reason is that after entering the university, great changes have taken place in the living environment, interpersonal relationships, personal roles, learning methods, etc., which requires students to be able to make corresponding changes to the above changes in a short period of time. If the adjustment is inappropriate or incorrect, students are prone to problems in certain areas. If these problems cannot be solved in time, it will cause students to have other psychological problems and affect students’ study and life during and after college.

2 Related work

The simple local binary pattern (LBP) proposed in the literature [4] has been widely applied to the description and expression of 2D facial features and has rapidly been extended to 3D. However, because it was designed for 2D images, it cannot act directly on the depth map, or can do so only by mapping the depth map to a grayscale image. The depth map needs to be normalized to compensate for posture changes, and it is also affected by self-occlusion (such as the occlusion caused by the nose). Many 3D face recognition methods have been proposed, and the following focuses on some related solutions at home and abroad. Literature [5] has shown the effectiveness of LBP-based solutions on 2D static images. Inspired by these achievements, the idea of extending LBP to the three-dimensional geometry of the human face has been improved and developed in some studies. Most face recognition methods based on LBP operate on depth images. This format allows the direct application of 2D LBP, which has been verified in the pioneering research results of the literature [6]. The literature [7] proposed multi-scale extended LBP (eLBP), which is composed of multiple layers of LBP coding and is obtained by calculating the difference in pixel value between the central pixel and the surrounding pixels. The literature [8] introduced the Local Normal Binary Pattern (LNBP), which uses the angle between the normals at two points, instead of the depth value, to obtain the LBP binary code. This idea has also been adopted by many researchers in subsequent research. The literature [9] extracted the normal information of the surface from the three-dimensional data, used the normal component values along the three coordinate axes as depth values, and calculated the LBP on the resulting maps. 
In a further extension, the literature [10] constructed an image of azimuthal equidistant projection. The azimuthal equidistant projection can project the normal to the point of Euclidean space according to the direction. Although the projected information is not depth information, 2D LBP features can still be calculated on the projected image based on the normal of the 3D surface. The three-dimensional LBP method proposed in the literature [11] calculates the interpolation between the normal of a central vertex and the angle between 8 adjacent vertices. This descriptor can be used to represent region-based facial information, similar to the representation process in 2D face recognition. This research includes the idea of using computational normal on the grid, but the grid requires a rigorous preprocessing operation to compress it to 8 vertices close to the center. In addition, according to the different sorting methods of these vertices, different LBP modes can be calculated, and this method also does not support the calculation of multi-resolution LBP [12].

With the advancement of 3D imaging technology, the new generation of acquisition equipment has been able to capture information such as the geometry and texture of 3D objects. The format of the geometric information captured by this 3D acquisition device is usually a point cloud format, which represents the 3D coordinates of a series of 3D object surface points. However, dealing directly with such a point cloud is inconvenient or even impossible. Therefore, the representation of 3D objects in other formats was later developed [13]. Depth images are also a more commonly used imaging method, which can be used to solve many computer vision depth dimension and pattern recognition solutions and can be directly extended to photometric information in 2D images [14]. Although it is also a research hotspot to extend 3D information directly to the 2D method, it is actually 2.5D information, and all geometric information will be lost in this way. However, the triangular grid manifold method allows complete 3D shape information to be retained, and has the characteristics of simple, compact and flexible encoding format. Therefore, it can be widely used in many fields such as animation, medical imaging, computer-aided design, terrain modeling and so on. In addition, in the application of shape scanning and modeling research, 3D mesh manifold supports the method of fusing photometric and geometric information and has rich mesh feature information [15].

In order to evaluate the recognition rate and effectiveness of face recognition technology, comparison experiments and evaluations are usually performed on a unified public face data set [16]. In the past ten years, with the continuous improvement and update of software and hardware, more and more public data sets are available for research scholars, which brings them many conveniences. At present, instead of being restricted by the difficulty of acquiring 3D face images, scientists can directly use the standard data sets that have been created to conduct research and experiments [17]. The article [23] dealt with IoT and human behaviour data with the collection and analysis of data from distinctive resources. The article [24] implements cooperative cognitive intelligence in the field of vehicular communication. The article [25] proposes the concept of SmartBuddy for implementing intelligent and smart city-based environments. The article [26] uses partitioning algorithm for speeding up the process of video processing. The article [27] does IoT and BigData Analytics in the real time environments using Hadoop ecosystem [28, 29].

3 Extraction and normalization

When performing feature extraction, the contour of the target area needs to be obtained. We can use the Canny edge detection algorithm to perform edge detection on the target area and then apply morphological processing. The schematic diagram of the detection effect is shown in Fig. 1.

The extraction of shape features is generally based on the static contour similarity of the target. This article first circumscribes the smallest rectangle to the target area. In terms of shape feature selection, this paper uses the following parameters as shape feature metrics [18].

(1) Area

The area is the sum of the total number of pixels in the target contour, which can reflect the size of the target volume to a certain extent, and it can be used as an indicator to distinguish between large and small targets (such as large animals and small animals).

(2) Rectangularity

The ratio of the area A of the target area to the area A_rect of its smallest circumscribed rectangle is the rectangularity: $R = \frac{A}{A_{rect}}$ (1)

Rectangularity reflects how fully the area fills its smallest circumscribed rectangle. When the area is rectangular, R = 1; when the area is round, R = π/4; when the area has curved boundaries and an irregular distribution, 0 < R < 1 [19].

(3) Dispersion

Dispersion is also called density and complexity, which is defined as the ratio of the square of the perimeter P of the target area to the area A. $D = \frac{P^{2}}{A}$ (2)

The dispersion describes the perimeter per unit area of the region. The greater the dispersion, the greater the perimeter per unit area, that is, the region is scattered and has a complex shape. Conversely, when the dispersion is small, the region has a relatively simple shape. When the image area is a circle, D takes its minimum value of 4π. For any other shape, D > 4π. Moreover, the more complex the shape, the larger the D value. Dispersion is one of the important criteria to distinguish rigid bodies from flexible bodies [20].

(4) Roundness

Circularity C is the feature quantity defined by all points on the target contour, that is: $C = \frac{δ_{R}}{μ_{R}}$ (3)

In the formula, μ_R is the average distance from the contour centroid to the boundary point, and δ_R is the mean square deviation of the distance from the contour centroid to the boundary point. Among them, the values of μ_R and δ_R are respectively: $μ_{R} = \frac{1}{N} \sum_{n = 1}^{N} | (x_{n}, y_{n}) - (\bar{x}, \bar{y}) |$ (4) $δ_{R} = \frac{1}{N} \sum_{n = 1}^{N} {[| (x_{n}, y_{n}) - (\bar{x}, \bar{y}) | - μ_{R}]}^{2}$ (5)

When the target contour tends to be round, the feature quantity C decreases monotonically and tends to zero. It is not affected by the translation, rotation and scale changes of the region, and can be applied to the complex shape changes of a flexible body. The target contour map is shown in Fig. 1.

Fig. 1

Target contour map.

The above four parameters are denoted by φ₁, φ₂, φ₃, φ₄ respectively. Shape features are mainly used for classification between rigid and flexible bodies, and also provide certain reference coefficients for classification between flexible bodies (such as large animals and small animals).
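The four shape features can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes the contour is given as an ordered list of polygon vertices, approximates the pixel-count area with the polygon (shoelace) area, and uses an axis-aligned bounding rectangle as a stand-in for the smallest circumscribed rectangle. The function name `shape_features` is hypothetical.

```python
import math

def shape_features(contour):
    """Return (area, rectangularity, dispersion, circularity) for a closed
    polygon given as an ordered list of (x, y) vertices."""
    n = len(contour)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1            # shoelace term
        perimeter += math.hypot(x2 - x1, y2 - y1)  # edge length
    area = abs(twice_area) / 2.0

    xs = [p[0] for p in contour]
    ys = [p[1] for p in contour]
    # Assumption: axis-aligned box instead of the true minimum rectangle.
    rect_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    rectangularity = area / rect_area              # Eq. (1)
    dispersion = perimeter ** 2 / area             # Eq. (2)

    # Eqs. (3)-(5): distances from the centroid of the contour points.
    cx, cy = sum(xs) / n, sum(ys) / n
    dists = [math.hypot(x - cx, y - cy) for x, y in contour]
    mu = sum(dists) / n
    delta = sum((d - mu) ** 2 for d in dists) / n
    circularity = delta / mu
    return area, rectangularity, dispersion, circularity

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
circle = [(math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360))
          for k in range(360)]
print(shape_features(square))
print(shape_features(circle))
```

For the square the rectangularity is 1, while the 360-sided polygon approximates a circle, so its rectangularity is close to π/4 and its dispersion is slightly above 4π.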

The extraction of motion features is mainly based on changes in the shape or shape of the target, such as the inclination angle of the human or animal body relative to the horizontal plane and the change in the distance between the limbs. Similarly, this paper first circumscribes the smallest ellipse to the target area, as shown in Fig. 2. After that, this paper analyzes the geometric characteristics of the minimum circumscribed ellipse, and uses the following parameters as the target motion feature metric [21]:

Fig. 2

Schematic diagram of circumscribed ellipse analysis.

(1) Target movement gait and amplitude

Because the movement gait of humans and animals is obviously different, targets can be classified by the gait and span of the moving target. Specifically, this article uses the two parameters ∠EOF and EF to perform classification. Here, O is the center of the circumscribed ellipse (that is, the centroid of the target area), with image coordinates (x₀, y₀), and E and F are the two lowest points at which the target area intersects its minimum circumscribed ellipse, with image coordinates (x₁, y₁) and (x₂, y₂), respectively. Then, ∠EOF is: $∠ EOF = \arccos (\frac{\vec{OE} \cdot \vec{OF}}{| \vec{OE} | \cdot | \vec{OF} |})$ (6)

In the formula, $\begin{matrix} \vec{OE} = (x_{0} - x_{1}, y_{0} - y_{1}) \\ \vec{OF} = (x_{0} - x_{2}, y_{0} - y_{2}) \end{matrix}$ (7)

EF is $EF = \sqrt{{(x_{1} - x_{2})}^{2} + {(y_{1} - y_{2})}^{2}}$ (8)

∠EOF represents the average swing angle of the lower limbs during target movement; in general, the value of this parameter for humans is smaller than that for animals. EF represents the gait amplitude when the target is in motion. When this value is greater than a certain threshold, the possibility that the target is a small animal can be ruled out.

(2) Target minimum circumscribed ellipse axis length and length-to-short axis ratio

Because the size of humans or large animals is different from that of small animals, they can be classified by the axis lengths of the smallest circumscribed ellipse and the ratio of the long axis to the short axis. The three parameters used are the long axis AB, the short axis CD and the ratio AB/CD of the ellipse. When AB is small, we think the target is a small animal. When AB and CD are both large, we think the target is a large animal. Meanwhile, when AB/CD is large, we think the target is human.

The above five parameters, illustrated in Fig. 2, are denoted by φ₁, φ₂, φ₃, φ₄, φ₅ respectively.
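As an illustration of Eqs. (6) to (8), the following Python sketch computes the swing angle ∠EOF and the gait amplitude EF from three image points. The function name `gait_features` and the sample coordinates are hypothetical; how O, E and F are extracted from the fitted ellipse is outside the scope of this sketch.

```python
import math

def gait_features(o, e, f):
    """Return (angle_EOF in radians, EF) for centre o and lowest points e, f."""
    # Vectors as written in Eq. (7); negating both leaves the angle unchanged.
    oe = (o[0] - e[0], o[1] - e[1])
    of = (o[0] - f[0], o[1] - f[1])
    dot = oe[0] * of[0] + oe[1] * of[1]
    norm = math.hypot(oe[0], oe[1]) * math.hypot(of[0], of[1])
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    angle = math.acos(cos_angle)                    # Eq. (6)
    ef = math.hypot(e[0] - f[0], e[1] - f[1])       # Eq. (8)
    return angle, ef

angle, ef = gait_features((0.0, 0.0), (-1.0, 1.0), (1.0, 1.0))
print(math.degrees(angle), ef)
```

In this example the two contact points are symmetric about the centre, so the swing angle is a right angle and the gait amplitude is 2 pixels.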

Among the shape features, φ₂, φ₃ and φ₄ (rectangularity, dispersion and circularity) are target feature quantities with translation, rotation and scale invariance. Although the area φ₁ and the five motion features do not satisfy the above invariance, this article limits the collection of these feature values to a specific image area, and the feature quantities referred to in this article are averages of the feature quantities. Therefore, they can be viewed as being approximately feature-invariant.

Since the size difference of the target’s various feature data is obvious, if it is directly used in subsequent calculations, there will be a problem of uneven feature weights, that is, feature data with a larger value will play a leading role. In order to avoid this situation, it is necessary to do certain processing on the extracted target features to achieve effective classification in the recognition system. Therefore, this paper first normalizes the features, and uses the following formula to normalize each feature value into the [0, 1] range, that is [22]: $E = \frac{E_{0} - E_{min}}{E_{max} - E_{min}}$ (9)

In the formula, E_min is the minimum value in the feature quantity, E_max is the maximum value in the feature quantity, and E₀ is the original feature value. Meanwhile, E is the normalized feature value.
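A minimal Python sketch of the normalization in Eq. (9) is given below. The function name `min_max_normalize` is hypothetical, and the handling of a constant feature (all values equal, so E_max = E_min) is an assumption, since the source does not discuss that case.

```python
def min_max_normalize(values):
    """Map each feature value into [0, 1] as in Eq. (9)."""
    e_min, e_max = min(values), max(values)
    if e_max == e_min:
        # Assumption: a constant feature carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - e_min) / (e_max - e_min) for v in values]

print(min_max_normalize([2, 4, 6]))
```

Normalizing each feature separately prevents features with large raw values, such as the area, from dominating features that already lie in a small range, such as the rectangularity.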

4 Adaboost classifier overview

The basic flow of Adaboost algorithm is shown in Fig. 3.

Fig. 3

Adaboost algorithm flowchart.

We assume that X represents the sample feature space and Y represents the sample category identification set. Since the target classification mentioned in this article is a binary classification problem, Y = {- 1, 1} is defined, whose elements correspond to “negative” and “positive” samples, respectively.

We set S = {(x₁, y₁) , (x₂, y₂) , (x₃, y₃) , ⋯ , (x_N, y_N)} as the sample training set. Among them, x_i ∈ X, y_i ∈ Y, i = 1, 2, ⋯ , N, and N is the number of samples. The specific steps and procedures are as follows:

(1) Sample weight initialization

For each (x_i, y_i) ∈ S, we set $D_{1} (x_{i}, y_{i}) = \frac{1}{N}$ (10)

(2) We set parameter t = 1

a. The weak classifier is selected, that is $h_{t} (x_{i}) = \begin{cases} 1, & λ_{i} x_{i} < λ_{i} θ_{i} \\ - 1, & λ_{i} x_{i} ⩾ λ_{i} θ_{i} \end{cases}$ (11)

Among them, the threshold θ_i is the median of the values of this feature, and λ_i ∈ { 1, - 1 } represents the direction of the inequality sign. After that, this paper trains on the samples weighted by D_t to obtain the weak classifier h_t : X → Y.

b. The misjudgment rate is calculated, that is $ɛ_{t} = \sum_{i : y_{i} \neq h_{t} (x_{i})} D_{t} (x_{i}, y_{i})$ (12)

If ɛ_t < 0.5, the algorithm chooses $α_{t} = \frac{1}{2} \ln [(1 - ɛ_{t}) / ɛ_{t}]$ (13)

If ɛ_t ⩾ 0.5, the algorithm deletes the weak classifier generated in the current round, sets t = t + 1, and returns to step a.

c. The sample weight is updated, that is $D_{t + 1} (x_{i}, y_{i}) = \frac{D_{t} (x_{i}, y_{i}) e^{- α_{t} y_{i} h_{t} (x_{i})}}{Z_{t}}$ (14)

Among them, Z_t is a normalization factor, which makes the following formula hold: $\sum_{i = 1}^{N} D_{t + 1} (x_{i}, y_{i}) = 1$ (15)

d. We set t = t + 1. T is the maximum number of iterations of the weak classifier, that is, the upper limit of the number of training rounds. If t = T, the training ends; if t < T, the training returns to step a.

(3) The strong classifier is defined, that is $H (x) = sign (\sum_{t = 1}^{T} α_{t} h_{t} (x))$ (16)

Among them, α_t is the performance evaluation factor of the weak classifier h_t (x) generated in the t-th round of training. It is determined by ɛ_t, the sum of the weights of the samples that h_t (x) misclassifies. α_t is a decreasing function of ɛ_t: the smaller ɛ_t is, the larger α_t is, and the more important h_t (x) is. The strong classifier H (x) is obtained by weighted summation of all weak classifiers h₁ (x) , h₂ (x) , ⋯ , h_T (x).
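The training loop described above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code. It assumes that each weak classifier is a threshold test on a single feature with the median as threshold, that the best feature and direction are chosen by exhaustive search in each round, and that labels take values in {-1, 1}. Because this stump selection is deterministic, the sketch stops (instead of retrying) when no stump has a misjudgment rate below 0.5. The names `train_adaboost` and `predict` are hypothetical.

```python
import math
from statistics import median

def stump(x, k, theta, lam):
    """Weak classifier: 1 if lam * x[k] < lam * theta, otherwise -1."""
    return 1 if lam * x[k] < lam * theta else -1

def train_adaboost(X, y, rounds):
    n = len(X)
    weights = [1.0 / n] * n                       # Eq. (10)
    model = []
    for _ in range(rounds):
        best = None
        for k in range(len(X[0])):
            theta = median(row[k] for row in X)   # median threshold
            for lam in (1, -1):
                preds = [stump(row, k, theta, lam) for row in X]
                # Eq. (12): total weight of misclassified samples.
                err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, k, theta, lam, preds)
        err, k, theta, lam, preds = best
        if err >= 0.5:
            break                                 # no stump beats chance
        err = max(err, 1e-10)                     # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)   # Eq. (13)
        model.append((alpha, k, theta, lam))
        # Eq. (14): reweight, then normalize by Z_t so the weights sum to 1.
        weights = [w * math.exp(-alpha * t * p)
                   for w, t, p in zip(weights, y, preds)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return model

def predict(model, x):
    """Strong classifier of Eq. (16)."""
    score = sum(alpha * stump(x, k, theta, lam)
                for alpha, k, theta, lam in model)
    return 1 if score >= 0 else -1

X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [1, 1, 1, -1, -1, -1]
model = train_adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])
```

The toy data are separable by a single threshold, so every round selects the same stump. On real feature vectors the reweighting in Eq. (14) shifts attention to the samples that earlier stumps misclassified.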

5 Multi-feature similarity measurement and target location

After establishing the target and candidate target models, it is necessary to measure the multi-feature similarity between the target and the candidate target to complete the feature matching to achieve target tracking. Since the distinguishing ability of different features to targets and candidate targets is different in actual scenes, the similarity measure should focus on those features with strong distinguishing ability. Based on this consideration, this paper first calculates the similarity between the corresponding sub-models of the target and the candidate target, then according to the ability of each feature to distinguish the target and the background, the total similarity measure obtained by linear weighted summation is used to describe the Similarity measure ρ, that is: $ρ (T_{i}, C_{j}) = \sqrt{1 - \sum_{k = 1}^{l} λ_{k} ρ_{k}}$ (17)

In the formula, ρ_k is the similarity measure between the kth sub-models of T_i and C_j, and λ_k is the weight assigned to each sub-model according to the feature selection rate, which satisfies $\sum_{k = 1}^{l} λ_{k} = 1$ . Among them, l = 2 and λ₁ = 0.417, λ₂ = 0.583. The similarity of two sub-models can be measured by the Euclidean distance. Then, the similarity ρ_k between the kth sub-model of the target and the candidate target is expressed as: $ρ (T_{i, k}^{⌢}, C_{j, k}^{⌢}) = \sqrt{\sum_{h = 1}^{m_{k}} {(T_{i, k, h} - C_{j, k, h})}^{2}}$ (18)

Then, the total similarity measure can be expressed as: $ρ (T_{i}, C_{j}) = \sqrt{1 - \sum_{k = 1}^{l} λ_{k} \sqrt{\sum_{h = 1}^{m_{k}} {(T_{i, k, h} - C_{j, k, h})}^{2}}}$ (19)

Here, a total similarity metric threshold ρ_min is set to describe the minimum similarity limit between the target and the candidate target.
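The weighted similarity of Eq. (19) can be sketched in Python as follows. The weights 0.417 and 0.583 come from the text; the function names and the sample feature vectors are hypothetical. The sketch assumes the features are already normalized so that the weighted distance stays within [0, 1], and clamps the radicand at zero as a safeguard, which is an addition not stated in the source.

```python
import math

def sub_model_distance(t, c):
    """Euclidean distance between two sub-model feature vectors, Eq. (18)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, c)))

def total_similarity(target, candidate, weights=(0.417, 0.583)):
    """Total similarity of Eq. (19) over all sub-models."""
    weighted = sum(w * sub_model_distance(t, c)
                   for w, t, c in zip(weights, target, candidate))
    # Assumption: clamp at 0 in case the weighted distance exceeds 1.
    return math.sqrt(max(0.0, 1.0 - weighted))

target = [[0.0, 0.0], [0.0]]        # two sub-models (hypothetical values)
candidate = [[0.3, 0.4], [0.0]]
print(total_similarity(target, target))
print(total_similarity(target, candidate))
```

Identical models give a similarity of 1, and the value decreases as the weighted distance grows. A candidate would be accepted as a match only if its similarity is at least ρ_min.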

According to the similarity measure, the shape, LI, and coordinates of the target area and its matching candidate target area are obtained and recorded respectively to realize the positioning of the target.

To make the actual monitoring area correspond to the image, that is, to establish the reversible transformation relationship between the image plane coordinates and the actual background coordinates, camera calibration is required. From the perspective of projective geometry, it can be proved that two images of the same scene satisfy two quadratic constraints in the form of the Kruppa equations, and the camera parameters can be obtained by directly solving the Kruppa equations. In view of the difficulties caused by directly solving the Kruppa equations, Hartley and Triggs also proposed the idea of hierarchical and stepwise calibration, that is, projective reconstruction of the image sequence, followed by affine calibration and Euclidean calibration based on it.

Here, a method of estimating the linear projection matrix M_3×4 from three-dimensional world coordinates to two-dimensional image coordinates is used. The principle of this method is that when the actual background coordinates and the image plane coordinates of n ⩾ 6 non-collinear points are known, we can use linear least squares to solve the following equation to obtain the transformation matrix M_3×4: $P m = 0, \quad P = \begin{bmatrix} p_{1}^{T} & 0^{T} & - u_{1} p_{1}^{T} \\ 0^{T} & p_{1}^{T} & - v_{1} p_{1}^{T} \\ \vdots & \vdots & \vdots \\ p_{n}^{T} & 0^{T} & - u_{n} p_{n}^{T} \\ 0^{T} & p_{n}^{T} & - v_{n} p_{n}^{T} \end{bmatrix}$ (20)

In the formula, $\begin{matrix} m = (m_{1}, m_{2}, m_{3}) \\ p_{i} = {(x_{i}^{W}, y_{i}^{W}, z_{i}^{W}, 1)}^{T} \end{matrix}$ (21)

$(x_{i}^{W}, y_{i}^{W}, z_{i}^{W})$ represents the actual background coordinates of the i-th point, (u_i, v_i) represents the image coordinates of the i-th point, and m_i is the i-th row of the transformation matrix M_3×4.

In this paper, only the situation where the monitoring target is moving on the road is considered, so the three-dimensional space in the actual background can be simplified to a two-dimensional plane, that is, the Z-axis coordinate value in the world coordinate system is always zero. Then, the transformation matrix is reduced to M_3×3. The equation to be solved has the same form as formula (20), with $p_{i} = {(x_{i}^{W}, y_{i}^{W}, 1)}^{T}$ . Since M_3×3 has 9 unknowns and each point contributes two equations, solving the equation requires the actual background coordinates and image coordinates of at least 5 points. After solving for M_3×3, the formula for calculating image coordinates from actual background coordinates can be obtained, where p = (x, y, 1) is the homogeneous actual background coordinate: $\begin{cases} u = \frac{m_{1} \cdot p}{m_{3} \cdot p} \\ v = \frac{m_{2} \cdot p}{m_{3} \cdot p} \end{cases}$ (22)

We set invM_3×3 as the inverse matrix of M_3×3 and n_i as the i-th row of invM_3×3. Then, the formula for calculating the actual scene coordinates from the image coordinates is: $\begin{cases} x = \frac{n_{1} \cdot p}{n_{3} \cdot p} \\ y = \frac{n_{2} \cdot p}{n_{3} \cdot p} \end{cases}$ (23)

In the formula, p = (u, v, 1).
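The two mappings in Eqs. (22) and (23) can be checked with a small Python sketch. It assumes the matrix M_3×3 is already known; the numerical matrix below is hypothetical and only serves to show that mapping a road point to the image and back recovers the original coordinates. The function names are illustrative.

```python
def invert_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def project(matrix, point):
    """Apply Eq. (22) or Eq. (23): divide rows 1 and 2 by row 3."""
    p = (point[0], point[1], 1.0)
    r1, r2, r3 = (sum(mv * pv for mv, pv in zip(row, p)) for row in matrix)
    return r1 / r3, r2 / r3

# Hypothetical calibration result, not taken from the paper.
M = [[2.0, 0.0, 10.0],
     [0.0, 3.0, 20.0],
     [0.001, 0.0, 1.0]]
inv_M = invert_3x3(M)

u, v = project(M, (5.0, 4.0))        # road coordinates -> image coordinates
x, y = project(inv_M, (u, v))        # image coordinates -> road coordinates
print((u, v), (x, y))
```

The same `project` function serves both directions because Eqs. (22) and (23) have identical form and differ only in the matrix used.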

Thus, the transformation relationship between the actual scene coordinates and the image coordinates is obtained. In specific applications, the vertical point of the camera lens to the road surface can be used as the origin of the road surface coordinates, and the camera can be directed in a certain direction. After determining the road coordinate system of the actual scene, you need to set the marker points for camera calibration. Figure 4 shows the marked points set in the background pictures collected in the laboratory, which are denoted by A-K, respectively, and the road surface coordinates (x, y) in the actual scene are shown in Table 1. After solving the transformation matrix by using the road coordinates and image coordinates of the five points A, C, E, F, H, this paper uses the formula to calculate the estimated road coordinates (x′, y′) of these points. It can be seen from the marking results that although only five marking points are used, the calibration results have reached a very good level, and the relative error can be controlled within 1%. However, in order to prevent the larger measurement error of some of the five points from causing errors in the entire transformation matrix, this article recommends using more marked points.

Fig. 4

Marking points set in camera calibration.

Table 1

Mental health score results

No.	Score	No.	Score	No.	Score
1	35	24	75	47	74.5
2	45	25	75.5	48	74
3	60	26	76	49	73.5
4	63	27	76.5	50	73
5	65	28	77	51	72.5
6	66	29	77.5	52	72
7	66.5	30	78	53	71.5
8	67	31	78.5	54	71
9	67.5	32	79	55	70.5
10	68	33	79.5	56	70
11	68.5	34	80	57	69.5
12	69	35	80.5	58	69
13	69.5	36	81	59	68.5
14	70	37	79	60	68
15	70.5	38	78.5	61	67.5
16	71	39	78	62	67
17	71.5	40	77.5	63	66.5
18	72	41	77	64	66
19	72.5	42	76.5	65	65
20	73	43	76	66	63
21	73.5	44	75.5	67	60
22	74	45	75	68	45
23	74.5	46	75	69	35

After the camera is calibrated, the actual monitoring area can correspond to the image. However, in actual surveillance images, not all targets of interest pose a threat to the tower. Therefore, whether a target is a potentially dangerous person can be determined by setting a sensitive area and further analyzing the movement trend of the target in that area.

Aiming at the special environment in which college students are located, a circular monitoring area is set around the tower, that is, monitoring is performed by a spherical camera fixed on the tower. In order to facilitate the calculation, this paper circumscribes the hexagon in the circular area. The camera should have a reasonable height and a positive angle of view in order to achieve better detection results. This article sets the camera to be installed at the height 2/3 of the tower. Considering that the camera cannot cover all the monitoring area in many cases, you need to activate the “tour” function of the spherical camera for monitoring. In this paper, the horizontal rotation angle of the camera is set to 120°, and the depression angle is fixed at 30°. Then the actual area to be monitored when the camera is stationary is divided into three parts, and each area is an isosceles trapezoid. Moreover, the camera rotates 120° at intervals to monitor the three areas. Its vertical plan view and three-dimensional schematic diagram are shown in Fig. 5. Among them, the black shading area is the trapezoidal area that actually needs to be monitored.

Fig. 5

Vertical plane and three-dimensional schematic diagram of the sensitive area.

The trapezoidal area is still embodied as an isosceles trapezoid in the obtained image angle of view. Therefore, the view of the sensitive area in the image plane is shown as the trapezoidal area in Fig. 6a. After the sensitive area is determined, it needs to be described mathematically in the image coordinate system. When performing image processing, the origin of the image coordinate system is at the upper left corner of the image frame. Then, the sensitive area analysis in the image coordinate system is shown in Fig. 6b.

Fig. 6

Plan view of the sensitive area image and its analysis diagram.

Among them, $(x_{1}, y_{1}), (x_{2}, y_{1}), (0, y_{0}), (x_{0}, y_{0})$ are the coordinates of the four vertices of the trapezoidal sensitive area, h₀ is the height of the trapezoid, and θ is the inner angle of the trapezoid. Here, the sensitive area is represented as A (x, y) in the image coordinate system. Then, when the value of A (x, y) is 1, the target is in the sensitive area, namely $A (x, y) = \begin{cases} 1, & y > - \cos θ \, x + y_{0} \cap y > \cos θ \, (x - x_{0}) + y_{0} \cap y > y_{0} - h_{0} \cap y < y_{0} \\ 0, & otherwise \end{cases}$ (24)

Among them, $\begin{matrix} h_{0} = y_{0} - y_{1} \\ \cos θ = \sqrt{\frac{x_{1}^{2}}{{(y_{0} - y_{1})}^{2} + x_{1}^{2}}} \end{matrix}$ (25)
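A membership test for the trapezoidal sensitive area can be sketched in Python as follows. The sketch keeps the four inequalities of Eq. (24) but, as an assumption, takes the slope of the two slanted sides directly from the vertex coordinates (h₀ / x₁) instead of using the cos θ coefficient written in the equation, so that the boundary lines pass exactly through the four vertices. The function name and the sample coordinates are hypothetical.

```python
def in_sensitive_area(x, y, x0, y0, x1, y1):
    """Return 1 if image point (x, y) lies inside the isosceles trapezoid
    with vertices (x1, y1), (x0 - x1, y1), (0, y0), (x0, y0), else 0.
    The image origin is the upper-left corner, so y0 > y1."""
    h0 = y0 - y1
    # Assumption: slope of the slanted sides taken from the vertices.
    k = h0 / x1
    inside = (
        y > -k * x + y0            # right of the left slanted side
        and y > k * (x - x0) + y0  # left of the right slanted side
        and y > y0 - h0            # below the top edge
        and y < y0                 # above the bottom edge
    )
    return 1 if inside else 0

# Hypothetical trapezoid: bottom edge from (0, 80) to (100, 80),
# top edge from (20, 40) to (80, 40).
print(in_sensitive_area(50, 60, 100, 80, 20, 40))
print(in_sensitive_area(5, 50, 100, 80, 20, 40))
```

The first point lies in the middle of the trapezoid, while the second lies to the left of the left slanted side and is therefore outside the sensitive area.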

When the target appears in a sensitive area, in many cases, the target is not necessarily a potentially dangerous target. At this time, it is necessary to judge based on certain conditions. This paper believes that when a target appears in the sensitive area for more than the maximum alert time T_max and continues to approach the tower, it may be a potentially dangerous target. That is, the vertical distance between the target and the bottom edge of the trapezoidal sensitive area continues to decrease within a certain period of time, and this process can be reflected in the Y coordinate value of the image coordinates of the target centroid continuously increasing. The schematic diagram of the trajectory of potentially dangerous targets is shown in Fig. 7. Among them, the black dots represent the target centroid, and the dotted line simulates the change trajectory of the target centroid.

Fig. 7

The trajectory of potentially dangerous targets.

In order to determine whether the target is a potentially dangerous target, an alarm function needs to be established, and this function is determined by analyzing the target’s trajectory. To this end, this article first defines an alarm transition function C_j (T_i), and its mathematical expression is as follows:

$C_{j} (T_{i}) = \begin{cases} 1, & y_{j} - y_{j - 12} > 0 \cap A (x_{j}, y_{j}) = 1 \\ 0, & otherwise \end{cases}$ (26)

Among them, (x_j, y_j) is the centroid coordinate of the target T_i at the jth frame, y_{j-12} is the centroid Y coordinate of T_i at the (j - 12)th frame, and C_j (T_i) indicates the movement state of the target within 12 frames. That is, when the target of interest is still in the sensitive area and closer to the tower after 12 frames, C_j (T_i) is set to 1, and this state of the target is recorded.

The alarm function can be expressed as: $C(T_{i}) = \begin{cases} 1, & \prod_{l = 1}^{n} C_{j - 12l}(T_{i}) = 1 \\ 0, & \text{otherwise} \end{cases}$ (27)

Among them, n satisfies T_max = 12n; that is, a decision is made every 12 frames, and a total of n decisions are made. The value of the alarm function is determined by combining the results of these n past decisions.

The alarm function C (T_i) is determined by the value of the alarm transition function C_j (T_i) in consecutive T_max frames. This alarm mechanism can be understood as: when a tracking target has been in the sensitive area for the alert time T_max, and the Y coordinate value of the centroid coordinates continues to increase, the alarm on the tower is triggered and alerts the monitoring center. If no such situation occurs, monitoring is continued.
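The alarm logic of Eqs. (26) and (27) can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the function names, the representation of a track as a list of per-frame centroids, and the `in_area` callable standing in for A (x, y) are all hypothetical. Frames without 12 frames of history are treated as C_j (T_i) = 0, which is also an assumption.

```python
from typing import Callable, Sequence, Tuple

STEP = 12  # frame gap used in Eqs. (26) and (27)

Centroid = Tuple[float, float]


def alarm_transition(track: Sequence[Centroid], j: int,
                     in_area: Callable[[float, float], int]) -> int:
    """C_j(T_i) of Eq. (26): 1 if Y grew over the last 12 frames and the
    centroid at frame j lies in the sensitive area, else 0."""
    if j - STEP < 0 or j >= len(track):
        return 0  # assumption: not enough history counts as "no transition"
    x_j, y_j = track[j]
    _, y_prev = track[j - STEP]
    return 1 if (y_j - y_prev > 0 and in_area(x_j, y_j) == 1) else 0


def alarm(track: Sequence[Centroid], j: int, n: int,
          in_area: Callable[[float, float], int]) -> int:
    """C(T_i) of Eq. (27): product of C_{j-12l}(T_i) for l = 1..n."""
    for l in range(1, n + 1):
        if alarm_transition(track, j - STEP * l, in_area) == 0:
            return 0  # one zero factor makes the whole product zero
    return 1


# Illustrative track: the centroid's Y coordinate grows by 0.5 per frame.
approaching = [(50.0, 176.0 + 0.5 * t) for t in range(40)]
stationary = [(50.0, 180.0) for _ in range(40)]
always_inside = lambda x, y: 1  # stand-in for A(x, y)
```

With n = 2 (so T_max = 24 frames), `alarm(approaching, 36, 2, always_inside)` evaluates both decisions as 1 and raises the alarm, whereas the stationary track never does.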

6 Model performance analysis

On the basis of the above analysis, this article combines a neural network with facial recognition to carry out psychological analysis of college students. Taking 50 students as an example, this article analyzes videos of students’ daily lives and classroom teaching, combines facial recognition with neural networks to perform image analysis, and relates the images to students’ mental health. In this paper, 69 students were tested to score mental health. The results obtained are shown in Table 1 and Fig. 8.

Fig. 8

Statistics of mental health score.

It can be seen from Fig. 8 that the statistical results of the model constructed in this paper are relatively consistent with the actual situation. Moreover, judging from the statistical chart, although the scores identified by the system are not strictly normally distributed, their distribution is close enough to be approximated as normal.

Based on the above analysis, this paper conducts model performance analysis. Because this article uses face recognition for mental health diagnosis, the effect of the system on student emotion recognition must also be verified. Emotion recognition is performed on 69 students, and the results are shown in Table 2 and Fig. 9.

Table 2

Statistical table of emotion recognition results

No.	Accuracy (%)	No.	Accuracy (%)	No.	Accuracy (%)
1	93	24	88	47	86
2	90	25	90	48	87
3	96	26	91	49	96
4	86	27	93	50	87
5	96	28	98	51	93
6	96	29	92	52	87
7	88	30	91	53	89
8	90	31	89	54	87
9	96	32	95	55	90
10	90	33	91	56	86
11	96	34	97	57	90
12	94	35	93	58	92
13	89	36	92	59	90
14	96	37	90	60	94
15	88	38	90	61	88
16	94	39	92	62	87
17	91	40	98	63	90
18	90	41	90	64	92
19	88	42	92	65	93
20	92	43	97	66	96
21	94	44	96	67	89
22	90	45	91	68	87
23	91	46	89	69	88

Fig. 9

Statistical graph of emotion recognition results.

It can be seen from Table 2 and Fig. 9 that the psychological diagnosis system for college students constructed in this paper performs well in recognizing college students’ emotions: the recognition accuracy for the 69 students ranges from 86% to 98%. The model constructed in this paper can therefore be applied to the efficient diagnosis of college students’ mental health.
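The per-student accuracies in Table 2 can be summarized with a few descriptive statistics. The sketch below simply transcribes the table in order of student number; the variable names are illustrative and not part of the original system.

```python
from statistics import mean, median

# Accuracy (%) for students No. 1 to 69, transcribed from Table 2.
accuracy = [
    93, 90, 96, 86, 96, 96, 88, 90, 96, 90, 96, 94, 89, 96, 88, 94, 91, 90, 88, 92, 94, 90, 91,
    88, 90, 91, 93, 98, 92, 91, 89, 95, 91, 97, 93, 92, 90, 90, 92, 98, 90, 92, 97, 96, 91, 89,
    86, 87, 96, 87, 93, 87, 89, 87, 90, 86, 90, 92, 90, 94, 88, 87, 90, 92, 93, 96, 89, 87, 88,
]

summary = {
    "students": len(accuracy),
    "min": min(accuracy),
    "max": max(accuracy),
    "mean": mean(accuracy),
    "median": median(accuracy),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```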

7 Conclusion

A mental health information management system has been developed. The mental health test system combines mental health knowledge, mental health tests, a mental health forum, and other functions to support comprehensive mental health interventions. At present, some mental health systems can integrate mental health knowledge and classify and display data from various aspects, providing users with more comprehensive mental health knowledge. Applying intelligent video technology to the safety monitoring of college students’ mental health is of great significance for ensuring the safe and stable operation of college education and improving the overall quality of students. Aiming at the various unstable factors that affect facial recognition and the special environment in which college students are located, this paper proposes a method for facial emotion recognition of college students. The method uses moving target detection, target classification, target tracking, and a series of image preprocessing techniques to achieve intelligent monitoring of the area where college students are located. In addition, the method can automatically raise an alert when a potentially dangerous target is discovered, monitor students’ mental health in real time, and help prevent various problems from occurring. The experimental results indicate that the model constructed in this paper performs well.

Acknowledgments

This paper was supported by Chengde Medical University Youth Fund for Humanities and Social Sciences research projects: Research on the relationship of social support, academic self-efficacy and mental health among Chengde Medical University Upgraded College students, No. 201841.

References

1. Anjos, Chakka M.M. and Marcel, Motion-based counter-measures to photo attacks in face recognition, Biometrics, IET 3(3) (2014), 147–158.

2. Zhuang, Chan T.H., Yang A.Y., et al., Single-Sample Face Recognition with Image Corruption and Misalignment via Sparse Illumination Transfer, Proceedings / CVPR, IEEE Computer Society Conference on Computer Vision and Pattern Recognition 114(2-3) (2014), 272–287.

3. Wang, Shi, Shu, et al., Embedded Manifold-Based Kernel Fisher Discriminant Analysis for Face Recognition, Neural Processing Letters 43(1) (2016), 1–16.

4. Chuk, Chan A.B. and Hsiao J.H., Understanding eye movements in face recognition using hidden Markov models, Journal of Vision 14(11) (2014), 8–18.

5. Bouchech, Selection of optimal narrowband multispectral images for face recognition, Monthly Notices of the Royal Astronomical Society 402(4) (2015), 2140–2186.

6. Ramachandra and Busch, Presentation Attack Detection Methods for Face Recognition Systems: A Comprehensive Survey, ACM Computing Surveys 50(1) (2017), 8.1–8.37.

7. Huang, Shan, Wang, et al., A Benchmark and Comparative Study of Video-Based Face Recognition on COX Face Database, IEEE Transactions on Image Processing 24(12) (2015), 5967–5981.

8. Wei C.P. and Wang Y.C.F., Undersampled Face Recognition via Robust Auxiliary Dictionary Learning, IEEE Transactions on Image Processing 24(6) (2015), 1722–1734.

9. Krisshna N.L.A., Deepak V.K., Manikantan, et al., Face recognition using transform domain feature extraction and PSO-based feature selection, Applied Soft Computing 22(4) (2014), 141–161.

10. Weng, Lu and Tan Y.P., Robust Point Set Matching for Partial Face Recognition, IEEE Transactions on Image Processing 25(3) (2016), 1163–1176.

11. Cui, Chang, Shan, et al., Joint sparse representation for video-based face recognition, Neurocomputing 135(21) (2014), 306–312.

12. Phillips P.J. and O’Toole A.J., Comparison of human and computer performance across face recognition experiments, Image and Vision Computing 32(1) (2014), 74–85.

13. Tang, Zhu, Yu, et al., A novel sparse representation method based on virtual samples for face recognition, Neural Computing & Applications 24(3-4) (2014), 513–519.

14. Mehta, Yuan and Egiazarian, Face recognition using scale-adaptive directional and textural features, Pattern Recognition 47(5) (2014), 1846–1858.

15. Raghavendra, Raja K.B. and Busch, Presentation Attack Detection for Face Recognition Using Light Field Camera, IEEE Transactions on Image Processing 24(3) (2015), 1060–1075.

16. Yang, Zhu, Liu, et al., Joint representation and pattern learning for robust face recognition, Neurocomputing 168(30) (2015), 70–80.

17. Yan, Wang and Suter, Multi-subregion based correlation filter bank for robust face recognition, Pattern Recognition 47(11) (2014), 3487–3501.

18. Fan, Ni, Zhu, et al., Weighted sparse representation for face recognition, Neurocomputing 151(31) (2015), 304–309.

19. Tang, Feng and Cai, Weighted group sparse representation for undersampled face recognition, Neurocomputing 145(16) (2014), 402–415.

20. Al-Arashi W.H., Ibrahim and Suandi S.A., Optimizing principal component analysis performance for face recognition using genetic algorithm, Neurocomputing 128(27) (2014), 415–420.

21. Andrés A.M., Padovani, Tepper, et al., Face recognition on partially occluded images using compressed sensing, Pattern Recognition Letters 36(1) (2014), 235–242.

22. Ding and Tao, Pose-invariant face recognition with homography-based normalization, Pattern Recognition 66(2) (2017), 144–152.

23. Paul and Jeyaraj R., Internet of Things: A primer, Human Behavior and Emerging Technologies 1(1) (2019), 37–47.

24. Paul, Daniel, Ahmad and Rho, Cooperative cognitive intelligence for internet of vehicles, IEEE Systems Journal 11(3) (2017), 1249–1258.

25. Paul, Ahmad, Rathore M.M. and Jabbar, Smartbuddy: defining human behaviors using big data analytics in social internet of things, IEEE Wireless Communications 23(5) (2016), 68–74.

26. O’Regan, Some Topological Theorems for Compact Multifunctions, Dynamic Systems and Applications 28 (2019), 827–836.

27. Shabani, Sheikhani A.H.R. and Aminikhah, Robust Control for Variable Order Time Fractional Financial System, Dynamic Systems and Applications 29(1) (2020), 111–122.

28. Paul, Jiang Y.C., Wang J.F. and Yang J.F., Parallel reconfigurable computing-based mapping algorithm for motion estimation in advanced video coding, ACM Transactions on Embedded Computing Systems (TECS) 11(S2) (2012), 1–18.

29. Rathore M.M., Paul, Hong W.H., Seo H.C., Awan and Saeed, Exploiting IoT and big data analytics: Defining smart digital city using real-time urban data, Sustainable Cities and Society 40 (2018), 600–610.