Abstract
Artificial intelligence-based image processing has attracted growing research interest for tumor identification and diagnosis. Magnetic resonance imaging (MRI) is the clinical technique of choice for detecting tumors because of advantages such as accurate localization through tomographic imaging in any orientation. Nevertheless, owing to the complexity of the images and the heterogeneity of tumors, existing methods have a limited field of view and require expensive computation to capture the semantic information within it, which limits their general applicability. This thesis therefore develops a medical image segmentation algorithm based on a global field of view attention network (GVANet). GVANet replaces the original convolutions with a transformer structure that builds a global view over a larger field of view at each layer, capturing refined pixel information and category information in the region of interest with fewer parameters and thereby addressing the poor segmentation of tumor edges. The method exploits the pixel-level information of the input image and the category information of the tumor and normal tissue regions to segment the MRI image, using them to assign weights to the pixel representations. The algorithm handles the segmentation of ambiguous tumor edges with low computational complexity while maximizing segmentation accuracy and model performance. Nearly four thousand MRI images from the Monash University Research Center for Artificial Intelligence were used in the experiments. The results indicate that the approach achieves outstanding segmentation performance on this dataset, improving IoU and DSC by 7.6 and 6.3 percentage points, respectively, over the strongest baseline.
Introduction
Osteosarcoma is a malignant cancer that accounts for one-fifth of all primary bone cancers. It typically occurs in children and adolescents under 20 years of age and is the most common pediatric bone malignancy [1]. Chemotherapy can raise survival rates from 20% to 60%. However, such treatment is mainly available in developed countries and regions and has not been widely adopted in developing countries [2, 3]. In China, about 15% of patients already have metastases at initial diagnosis. Once osteosarcoma metastasizes or recurs, the 5-year survival rate falls below 30% [4–6]. Prompt and correct diagnosis is therefore key to administering appropriate and successful treatment.
Most developing countries suffer from insufficient medical technical capacity and quality of care, inadequate overall investment in health care, misallocation of limited resources, and uneven distribution of medical resources [7–10]. These factors lead to resource constraints and underutilization, which cause various problems in the treatment of osteosarcoma [11–13]. In China, for example, nearly eighty-five percent of urban medical facilities are concentrated in secondary and tertiary hospitals, with the rest dispersed among primary community health care facilities [14–16]. Osteosarcoma has a large incidence base and is highly lethal, and the current cure rate is only about 65% [17]. Accurate lesion diagnosis plays a crucial role in prognosis, lesion resection, and treatment outcome. The main imaging methods used to diagnose osteosarcoma are plain radiography, CT, and MRI. MRI has become the standard practice for diagnosing osteosarcoma because it involves no radiation damage, produces clear images, and accurately depicts soft tissue masses and intramedullary lesions [18–20]. The precision with which osteosarcoma is recognized in MRI images shapes the treating physician's clinical decisions and influences the assessment of the patient's outcome after surgery. This recognition relies on the delineation performed by medical imaging technologists [21–23]. However, because each patient produces a large number of MRI images, manual diagnosis is highly subjective and lacks reproducibility. An automated tumor segmentation method is therefore needed [24].
Intelligent expert systems have been increasingly introduced in the medical field to assist diagnosis [25–27], and automated segmentation methods for osteosarcoma have been studied extensively. These methods spare medical practitioners complicated image preprocessing and allow diagnostic results to be stored and visualized directly, which partly alleviates medical resource constraints in developing countries, improves diagnostic accuracy, and reduces the rate of missed diagnoses [28]. However, building an intelligent expert system for segmenting tumor tissue in osteosarcoma images still faces many challenges: (1) Ambiguous osteosarcoma boundaries, which make the extent of the tumor difficult to determine. (2) The specificity of osteosarcoma: different patients, and different tumors in the same patient, present different symptoms [29]. (3) Differences in MRI equipment and protocols: osteosarcoma MRI images come from various imaging devices whose different protocols can produce non-negligible differences between images [30, 31]. All these factors make MRI segmentation of osteosarcoma a challenging task.
Most existing medical image processing techniques use CNN-based segmentation methods [32]. Since the end-to-end FCN was proposed [25], deep convolutional networks have become the leading solution for image segmentation. Early studies focused on the spatial scale of the context, that is, its spatial extent [33–35]: they exploit multiscale context, use pyramid pooling modules, and aggregate contextual representations [36–39]. The context of a pixel refers to the pixels surrounding it. A common drawback of convolutional neural networks (CNNs), including U-Net, is their limited field of view: each convolution kernel covers only a fixed-size region [40, 41]. Moreover, CNNs are sensitive to dense local pixel values, which makes it almost impossible for them to learn accurate shape information [42]. Yet boundary accuracy, which depends on shape information, is closely related to the accuracy of tumor diagnosis and is therefore very important in clinical practice [43].
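To make the fixed field-of-view limitation concrete, the following minimal Python sketch computes the theoretical receptive field of a stack of convolutions with the standard recurrence r ← r + (k − 1)·d·j, where k is the kernel size, d the dilation, and j the cumulative stride. The layer configurations are illustrative assumptions, not layers taken from any network discussed in this paper.

```python
def receptive_field(layers):
    """Theoretical receptive field (in input pixels) of stacked convolutions.

    layers: sequence of (kernel_size, stride, dilation) tuples, input to output.
    Recurrence: r += (k - 1) * d * jump; jump *= stride.
    """
    r, jump = 1, 1
    for k, s, d in layers:
        r += (k - 1) * d * jump  # growth contributed by this (dilated) kernel
        jump *= s                # spacing of this layer's outputs in input pixels
    return r


# Illustrative configurations (hypothetical, not from any model in this paper).
plain = [(3, 1, 1)] * 10                       # ten 3x3 convs, stride 1
strided = [(3, 2, 1)] * 2 + [(3, 1, 1)] * 8    # two stride-2 convs first
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]    # cascaded dilation rates 1, 2, 4

for name, cfg in [("plain", plain), ("strided", strided), ("dilated", dilated)]:
    print(name, receptive_field(cfg))  # plain 21, strided 71, dilated 15
```

Even the strided stack sees only a 71×71 window, a small fraction of a 256×256 image, whereas a single self-attention layer relates every pair of positions in the feature map.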
To resolve the above problems, this thesis designs an osteosarcoma-assisted segmentation system (GVAOS) built on a neural network with a global field-of-view attention mechanism. In the dataset optimization stage, the MixMatch method is used to optimize the dataset and address the large discrepancies between MRI images. The network segments these images using both pixel-level information and category information, which are weighted to enhance the pixel representation: pixel-level information covers all pixels in the input image, and category information is the pixel representation within the tumor regions and normal tissue regions. Accurate segmentation of osteosarcoma images is achieved by a transformer architecture with a larger field of view that incorporates this additional information. First, the dataset is optimized by the dataset preprocessing module; then the images are enhanced; finally, the images are passed into the global field of view attention network (GVANet) to predict the segmentation result.
The principal contributions and innovations of this dissertation are as follows. (1) The semi-supervised MixMatch algorithm is applied to partition the MRI image dataset into useful pictures (UP) and hard pictures (HP), which are fed into the network selectively and in sequence to improve the training efficiency and generalization capability of the model and to increase clinicians' detection productivity. (2) A medical image recognition method is presented that identifies obscure tumor boundaries and performs predictive classification: it replaces the original convolutions with a transformer structure operating over a larger field of view, builds a global view at each layer, and aggregates MRI pixel information and tumor-region category information with similarity as the weights, which improves prediction precision. The resulting Global View Attention Net (GVANet) segments ambiguous tumor edges with low computational complexity and greatly improves model performance and generalization capability. (3) A dataset of more than four thousand tumor MRI images from the Monash University Research Center for Artificial Intelligence was studied and analyzed to verify the effectiveness of the model architecture. The results show that our automatic and rapid segmentation approach substantially outperforms other approaches.
Related works
In medical image analysis, using artificial intelligence to assist in pixel-level segmentation has become common practice. Because medical images require higher segmentation accuracy than ordinary images, improving segmentation networks to meet this high standard has become a hot research topic. This section reviews related work on tumor image segmentation.
Most earlier supervised osteosarcoma image segmentation algorithms are based on pattern recognition and mathematical morphology methods [28], which involve many redundant calculations, run slowly, and cannot learn global features. These limitations seriously affect the accuracy of segmentation networks on osteosarcoma MRI images.
Ronneberger et al. [44] designed the U-shaped network structure (U-Net). Because FCN has a limited receptive field and cannot capture contextual semantics, researchers have done extensive work to refine it. Wang et al. [45] designed an architecture in which high-resolution and low-resolution convolutions are connected in parallel; the resulting high-resolution backbone, HRNet, expands the receptive field through multiple parallel convolution branches. DenseASPP applies DenseNet-style dense connectivity [46] to ASPP, connecting a set of dilated convolutions more densely to form a denser feature pyramid that covers a wider range of dilation rates. The PSPNet of Zhao et al. [47] fuses features of different scales.
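The pyramid pooling idea behind PSPNet can be sketched in a few lines of numpy. This is a simplified illustration under stated assumptions: the bin sizes (1, 2, 3, 6) follow the PSPNet paper's default, the learned 1×1 convolutions are omitted, nearest-neighbour upsampling stands in for bilinear interpolation, and the function names are hypothetical.

```python
import numpy as np


def adaptive_avg_pool(x, bins):
    """Average-pool a (C, H, W) map into a (C, bins, bins) grid."""
    c, h, w = x.shape
    out = np.zeros((c, bins, bins))
    for i in range(bins):
        r0, r1 = (i * h) // bins, -(-((i + 1) * h) // bins)  # floor start, ceil end
        for j in range(bins):
            c0, c1 = (j * w) // bins, -(-((j + 1) * w) // bins)
            out[:, i, j] = x[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out


def upsample_nearest(x, h, w):
    """Nearest-neighbour resize of a (C, bh, bw) map to (C, h, w)."""
    _, bh, bw = x.shape
    rows = (np.arange(h) * bh) // h
    cols = (np.arange(w) * bw) // w
    return x[:, rows][:, :, cols]


def pyramid_pooling(x, bin_sizes=(1, 2, 3, 6)):
    """Pool at several scales, upsample back, and concatenate with the input."""
    _, h, w = x.shape
    pooled = [upsample_nearest(adaptive_avg_pool(x, b), h, w) for b in bin_sizes]
    return np.concatenate([x] + pooled, axis=0)


feat = np.random.default_rng(0).random((4, 12, 12))
print(pyramid_pooling(feat).shape)  # (20, 12, 12)
```

The 1×1 bin carries the global average of each channel to every pixel, which is how pyramid pooling injects image-level context that a single convolution kernel cannot see.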
However, fully convolutional networks are constrained by a small effective receptive field and cannot adequately capture long-range information. To let fully convolutional networks combine multi-scale features for better performance, Chen et al. [48] introduced the attention mechanism into semantic segmentation networks with multi-scale inputs. Zhao et al. [49] later argued that the physical design of the convolution kernel confines the information flow in CNNs to local regions, limiting the understanding of complex scenes, and designed PSANet to address this issue. PSANet adaptively learns an attention mask that connects each position on the feature map with all other positions, relaxing the local-neighborhood constraint, and uses bidirectional information propagation paths. DANet [50] and OCNet [51] compute the similarity of each pixel only with the pixels in its row and column, i.e., the pixels on its cross, and obtain the similarity to all other pixels indirectly by repeating this operation. Works such as Double Attention [52–55] and ACFNet [56] group pixels into regions to exploit class-level contextual information and then enhance each pixel representation using its similarity to these regions as weights.
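The row-and-column attention pattern described above can be illustrated with a small sketch that ignores the learned attention weights and tracks only which positions can exchange information. It shows why repeating the operation is enough: one pass connects each pixel to the H + W − 1 pixels on its cross, and a second pass reaches every pixel in the map. The function names are hypothetical.

```python
import numpy as np


def criss_cross_mask(h, w):
    """(h*w, h*w) boolean matrix: [p, q] is True if q is in p's row or column."""
    rows, cols = np.divmod(np.arange(h * w), w)
    return (rows[:, None] == rows[None, :]) | (cols[:, None] == cols[None, :])


def reachable_after(h, w, passes):
    """Which positions can influence each position after `passes` row/column steps."""
    step = criss_cross_mask(h, w).astype(np.int64)
    reach = np.eye(h * w, dtype=np.int64)
    for _ in range(passes):
        reach = (reach @ step > 0).astype(np.int64)
    return reach.astype(bool)


h, w = 4, 5
print(reachable_after(h, w, 1).sum(axis=1))  # every pixel sees h + w - 1 = 8 positions
print(reachable_after(h, w, 2).all())        # True: two passes cover the whole map
```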
Although artificial intelligence techniques have made great progress in image segmentation and clinical diagnosis [57], MRI images of osteosarcoma differ from natural images: their edge features are critical for medical diagnosis, so lesion and tumor segmentation demands higher accuracy. To meet the needs of osteosarcoma MRI segmentation, our method learns tumor regions under ground-truth supervision and uses them to enhance the pixel representations, and it obtains a wider global field of view with fewer parameters by replacing all convolutions with an attention mechanism. We propose a global field-of-view attention network-based MRI segmentation system for osteosarcoma to help clinicians analyze and interpret the MRI images of patients with osteosarcoma.
System design
The cure rate of patients is limited by high medical costs and strained medical resources, which stem from inadequate medical technology and quality of care in developing countries, insufficient overall investment in health care, misallocation of limited resources, and uneven distribution of resources. At the same time, osteosarcoma has a large incidence base, its prognosis is difficult to judge, and its treatment is costly. Many patients do not receive timely and adequate diagnosis and treatment.
MRI is essential for the staging of osteosarcoma and is often the primary means of evaluating the efficacy of neoadjuvant chemotherapy. MRI clearly shows the relationship between the tumor and surrounding normal structures such as muscles and nerves, as well as the spread of the tumor within the medullary cavity and into the epiphysis and joint cavity. However, the MRI presentation of osteosarcoma is complex and diverse, and physicians must review a large number of images, some of which are of low quality. In developing countries where medical resources are scarce, relying solely on physicians' manual judgment risks missed diagnoses caused by limited experience or fatigue of imaging physicians, which may delay the diagnosis of patients with osteosarcoma.
With the development of computer technology and digital health, automated image processing methods occupy an increasingly important position in clinical diagnosis. In clinical practice, automated osteosarcoma image segmentation mainly addresses the shortage of medical imaging technologists in developing countries: it can assist physicians in diagnosis, improve their efficiency, reduce their workload, and provide an additional safeguard for patients' diagnoses. Automatic diagnosis technology can also help medical supervisors assess the diagnostic level of hospitals. Based on these considerations, we propose GVAOS, built on the Global View Attention Net (GVANet). This approach assists physicians in diagnosing osteosarcoma MRI images so that they can identify normal tissue and diseased areas more effectively and quickly and provide more accurate results for patients. Figure 1 illustrates the general flow of osteosarcoma image segmentation. After dataset optimization and preprocessing, GVANet integrates category information and pixel information for tumor image segmentation.

Overall process framework diagram.
The general architecture is illustrated in Fig. 1. It consists of two main parts: dataset optimization and preprocessing, and neural network segmentation of the input images.
Dataset optimization
An acquired osteosarcoma MRI dataset is usually large, but not all of its images are useful for model training, and some images have very blurred tumor regions. Therefore, in the image selection stage of the dataset processing module, this study uses the MixMatch method to optimize the dataset by dividing the images into useful pictures (UP) and hard pictures (HP), and continues to divide newly added images in the same way, as shown in Fig. 2.

The flow chart of the MixMatch dataset optimization algorithm.
When selecting a residual network as the classifier for the MixMatch algorithm, we considered ResNet-18, ResNet-50, and ResNet-101 as the backbone. Compared with ResNet-18, ResNet-50 and ResNet-101 have more parameters and higher computational complexity while yielding no significant gain in dataset optimization. We therefore used ResNet-18 as the classifier to divide the dataset; it showed more stable performance, faster convergence, and better dataset optimization. ResNet-18 contains 17 convolutional layers (Conv) and one fully-connected layer (FC). Residual (shortcut) connections between layers mitigate vanishing and exploding gradients, and the final fully-connected layer performs the classification.
This thesis applies the MixMatch algorithm to divide newly added, unlabeled images and to improve the model's generalization performance. The algorithm consists of four steps. (1) Data augmentation: standard cropping and flipping are applied to labeled and unlabeled data. (2) Label guessing: the model predicts labels for the unlabeled data, and these predictions serve as targets for the unsupervised loss. (3) Sharpening: the mean of the model's predictions is sharpened to reduce its entropy. (4) MixUp: labeled and unlabeled MRI images are combined into new mixed data.
The network input is randomly divided into X1 and X2, where X1 is labeled and X2 is unlabeled, and the network parameters are denoted θ. The algorithm proceeds as follows. Data augmentation is applied to the batch, the augmented samples are fed into the classifier to obtain the average classification probability, and the Sharpen algorithm (Equation 4) is applied to obtain the guessed label Y2. Lowering the temperature parameter T reduces the classification entropy, i.e., it makes the guessed UP/HP labels for osteosarcoma images more confident.
X1 and X2, together with their labels (the guessed labels Y2 for X2), are concatenated and shuffled to obtain the dataset ϖ.
Using the MixUp algorithm, X1 and its labels are mixed with the first part of ϖ to obtain the augmented labeled data, and X2 and its guessed labels are mixed with the remaining part of ϖ in the same way to obtain the augmented unlabeled data.
The weight factor λ′ in the MixUp formula is derived from a sample of a Beta distribution parameterized by the hyperparameter α.
The loss terms are calculated separately for the augmented labeled data and the augmented unlabeled data; the stricter L2 loss is used for the unlabeled data.
Here, each loss term is averaged over the batch, and H denotes the cross-entropy loss function. The two terms are weighted to obtain the total loss, with the hyperparameter k balancing them.
Because MixMatch combines several mechanisms for exploiting unlabeled data, it introduces various hyperparameters. However, we found in practice that most of them can be fixed and do not need to be tuned per experiment or per dataset. Specifically, for all experiments we set the sharpening temperature T = 0.5 and the number of unlabeled augmentations K = 2, and found α = 0.75 to be a good starting point for tuning. The MixMatch preprocessing produced reliable label guesses, which improved the prediction accuracy and generalization ability of GVANet for tumor MRI masks in the segmentation stage. Finally, the dataset is divided into UP and HP, with UP accounting for 43% and HP for 57%. Feeding UP and HP selectively into the network yields better training results. Prioritizing these useful MRI pictures (UP) not only facilitates model training but also improves clinicians' detection efficiency.
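The label-guessing, shuffling, and MixUp steps above can be sketched in numpy as follows. This is a minimal illustration assuming the standard MixMatch formulation, Sharpen(p, T)_i = p_i^(1/T) / Σ_j p_j^(1/T) and λ′ = max(λ, 1 − λ) with λ ~ Beta(α, α); the function names are hypothetical, random Dirichlet draws stand in for the ResNet-18 classifier outputs, and λ is drawn per sample. T = 0.5, K = 2, and α = 0.75 are the values reported for this work.

```python
import numpy as np


def sharpen(p, T=0.5):
    """Temperature sharpening: p_i^(1/T) / sum_j p_j^(1/T), applied row-wise."""
    p = np.power(p, 1.0 / T)
    return p / p.sum(axis=-1, keepdims=True)


def guess_labels(probs_per_aug, T=0.5):
    """probs_per_aug: (K, N, C) outputs for K augmentations of N unlabeled
    images; average over K, then sharpen."""
    return sharpen(probs_per_aug.mean(axis=0), T)


def build_pool(x1, y1, x2, y2, rng):
    """Concatenate labeled and unlabeled data with their (guessed) labels, then shuffle."""
    x = np.concatenate([x1, x2])
    y = np.concatenate([y1, y2])
    idx = rng.permutation(len(x))
    return x[idx], y[idx]


def mixup(xa, ya, xb, yb, alpha, rng):
    """MixUp with lambda' = max(lambda, 1 - lambda), lambda ~ Beta(alpha, alpha)
    drawn per sample, so each mixed sample stays closer to (xa, ya)."""
    lam = rng.beta(alpha, alpha, size=len(xa))
    lam = np.maximum(lam, 1.0 - lam)
    lx = lam.reshape((-1,) + (1,) * (xa.ndim - 1))  # broadcast over image dims
    ly = lam.reshape((-1,) + (1,) * (ya.ndim - 1))  # broadcast over classes
    return lx * xa + (1 - lx) * xb, ly * ya + (1 - ly) * yb


rng = np.random.default_rng(0)
x1 = rng.random((4, 8, 8))                      # labeled images (toy 8x8 arrays)
y1 = np.eye(2)[[0, 1, 0, 1]]                    # one-hot UP/HP labels
x2 = rng.random((4, 8, 8))                      # unlabeled images
probs = rng.dirichlet([1.0, 1.0], size=(2, 4))  # stand-in: K=2 augs x 4 images x 2 classes
y2 = guess_labels(probs, T=0.5)
wx, wy = build_pool(x1, y1, x2, y2, rng)
x1_mix, y1_mix = mixup(x1, y1, wx[:4], wy[:4], alpha=0.75, rng=rng)
x2_mix, y2_mix = mixup(x2, y2, wx[4:], wy[4:], alpha=0.75, rng=rng)
```

Taking max(λ, 1 − λ) keeps each mixed labeled sample dominated by its labeled source, so the two loss terms still see mostly labeled and mostly unlabeled inputs, respectively.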
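The loss described above can be sketched as follows, assuming the standard MixMatch form: a batch-averaged cross-entropy H on the mixed labeled data, a squared L2 term averaged over batch and classes on the mixed unlabeled data, and a total of L = L_X + k·L_U. The function name is hypothetical, and because the value of k used in this work is not given here, it is a required argument; the 2.0 below is only a placeholder.

```python
import numpy as np


def mixmatch_loss(p_lab, y_lab, p_unl, q_unl, k, eps=1e-12):
    """Return (total, L_X, L_U) with total = L_X + k * L_U.

    p_lab, p_unl: model class probabilities, shape (B, C)
    y_lab: mixed targets for labeled data; q_unl: mixed guessed labels, shape (B, C)
    """
    # Labeled term: cross-entropy H(y, p), averaged over the batch.
    l_x = -np.mean(np.sum(y_lab * np.log(p_lab + eps), axis=1))
    # Unlabeled term: squared L2 distance, averaged over batch and classes.
    l_u = np.mean((q_unl - p_unl) ** 2)
    return l_x + k * l_u, l_x, l_u


total, l_x, l_u = mixmatch_loss(
    p_lab=np.array([[0.5, 0.5]]), y_lab=np.array([[1.0, 0.0]]),
    p_unl=np.array([[0.5, 0.5]]), q_unl=np.array([[1.0, 0.0]]),
    k=2.0,  # placeholder weight, not the value used in this work
)
print(l_x, l_u, total)  # about 0.693 (= ln 2), 0.25, 1.193
```

The L2 term is bounded, so a wrong guessed label cannot blow up the unlabeled loss the way a confident cross-entropy error can.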
This study preprocessed the data in two ways:
A. Image normalization
MRI images of osteosarcoma were acquired with different MRI instruments, and the range of MRI intensities varied from patient to patient. To reduce the variation introduced by different instruments, the intensity of each patient's images was normalized with a linear normalization method:
$$I_{norm} = \frac{I - I_{min}}{I_{max} - I_{min}}$$
where I is the grayscale value of the input MRI image, I_norm is the grayscale value after normalization, and I_max and I_min are the maximum and minimum grayscale values of the image, respectively.
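A minimal sketch of the per-patient linear normalization, assuming it is the usual min-max mapping I_norm = (I − I_min)/(I_max − I_min). The function name is hypothetical, and the small eps guarding against a constant image is our addition.

```python
import numpy as np


def minmax_normalize(volume, eps=1e-8):
    """Linearly map one patient's grayscale values to [0, 1]."""
    volume = np.asarray(volume, dtype=np.float64)
    i_min, i_max = volume.min(), volume.max()
    return (volume - i_min) / (i_max - i_min + eps)


# Two patients scanned on different machines with different intensity ranges.
patient_a = np.array([[100, 300], [500, 700]])
patient_b = np.array([[10, 30], [50, 70]])
print(minmax_normalize(patient_a))  # approximately [[0, 0.333], [0.667, 1]]
print(minmax_normalize(patient_b))  # approximately [[0, 0.333], [0.667, 1]]
```

After normalization, the two scans with very different raw intensity ranges map to the same relative values, which is the instrument-level variation the step is meant to remove.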
B. Positioning images
In osteosarcoma MRI images, the effective area only occupies a part of the image, such as osteosarcoma tumor in the joint region. The specific process of localizing the image is divided into four steps, as shown in Fig. 3.

Pre-processing flow chart.
(1) Noise reduction processing
Noise is inevitably introduced during the acquisition of osteosarcoma MRI images, so this study smooths each image with a Gaussian convolution kernel, which replaces every pixel with a weighted average of its neighborhood, to obtain a Gaussian-filtered image.
(2) Extracting edge information
The gradients in the horizontal and vertical directions are computed first; regions with high horizontal gradients and low vertical gradients are then retained to extract the edge information of the tumor image.
(3) Fuzzy binarization
To locate the effective information region and separate it from the background, the grayscale image is converted into a binary image by fuzzy binarization.
(4) Obtain the contour of the effective region and locate it
The contour of the effective region is obtained from its boundary pixels. Because osteosarcoma appears only within the effective region, the image is cropped so that the region enclosed by this contour is kept.
In this section, normalization and localization improve the quality of the osteosarcoma MRI images: after noise reduction, edge information extraction, and fuzzy binarization, the contour of the effective osteosarcoma region is located and the image is cropped to it. These denoised, cropped MRI images also improve clinicians' diagnostic efficiency and facilitate the detection of osteosarcoma in MRI images.
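The four localization steps can be sketched with numpy alone. This is an illustration under stated assumptions, not the pipeline used in this study: the kernel size, σ, and the mean-value threshold are hypothetical choices; step (2) follows one reading of "high horizontal, low vertical gradient" as |∂I/∂x| − |∂I/∂y| clipped at zero; and the bounding box of the binary mask stands in for true contour extraction (an OpenCV pipeline would typically use cv2.findContours).

```python
import numpy as np


def gaussian_kernel(size=5, sigma=1.0):
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def filter2d(img, kernel):
    """Correlate img with kernel using edge padding (same output size)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out


def localize(img, threshold=None):
    """Return the crop around the effective region and its (r0, r1, c0, c1) box."""
    img = np.asarray(img, dtype=np.float64)
    # (1) Noise reduction with a Gaussian kernel.
    smooth = filter2d(img, gaussian_kernel())
    # (2) Edge information: keep high horizontal and low vertical gradients.
    gy, gx = np.gradient(smooth)  # derivatives along rows (y) and columns (x)
    edges = np.clip(np.abs(gx) - np.abs(gy), 0.0, None)
    # (3) Blur the edge map, then binarize it.
    edges = filter2d(edges, gaussian_kernel())
    if threshold is None:
        threshold = edges.mean()
    mask = edges > threshold
    if not mask.any():
        return img, (0, img.shape[0], 0, img.shape[1])
    # (4) Bounding box of the foreground, used here in place of a contour.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    r0, r1, c0, c1 = rows[0], rows[-1] + 1, cols[0], cols[-1] + 1
    return img[r0:r1, c0:c1], (r0, r1, c0, c1)


image = np.zeros((64, 64))
image[20:40, 20:40] = 1.0  # toy bright region on a dark background
crop, box = localize(image)
print(box, crop.shape)
```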
After dataset optimization and preprocessing, the Global View Attention Net (GVANet) performs medical image recognition and analysis, identifying obscure tumor boundaries and performing predictive classification. Specifically, the network replaces the original convolutions with a transformer structure operating over a larger field of view, builds a global view at each layer, and aggregates MRI pixel information and tumor-region category information with similarity as the weights, which improves prediction precision. By integrating category information and pixel information, GVANet helps physicians diagnose osteosarcoma MRI images, allowing them to identify normal tissue and lesion areas more efficiently and quickly and to provide more accurate results for patients.
Given an input MRI image, the backbone first extracts the initial pixel information.
This initial pixel information is passed through the pixel-level module ℘ and then through a feed-forward network (FFN) to obtain the exact pixel information.
The category module also uses the initial pixel information to make a preliminary distinction between lesion areas and normal tissue areas.
Using "Scaled Dot-Product Attention", the object category information Obj is then calculated to further determine the location of the osteosarcoma lesion area.
Next, the object category information Obj, which indicates normal tissue or lesion regions, is used as the keys K and values V, and the exact pixel information as the queries Q.
Finally, the segmentation result z is predicted by the linear model ρ(·), where ρ is implemented as a 1×1 self-attention layer followed by batch normalization (BN) and ReLU.
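The attention flow above, with pixel queries attending over object-category keys and values, can be sketched in numpy. This is not GVANet's implementation: the learned projections, the FFN, the pixel module ℘, and the ρ block are omitted or replaced by a random linear classifier, and deriving the category representations from a spatial softmax over coarse class scores is an assumption borrowed from object-contextual designs such as OCRNet. All names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v):
    """softmax(q k^T / sqrt(d)) v with q: (N, d), k: (M, d), v: (M, d_v)."""
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v, weights


def fuse_pixel_and_category(pixels, coarse_logits):
    """pixels: (N, D) pixel features; coarse_logits: (N, K) coarse scores for
    K categories (e.g. normal tissue vs. tumor). Returns (N, 2D) features."""
    # Spatial softmax: each category becomes a weighting over all pixels.
    region_weights = softmax(coarse_logits, axis=0)  # (N, K)
    category_repr = region_weights.T @ pixels         # (K, D), plays the role of Obj
    # Pixels are the queries; category representations are keys and values.
    context, attn = scaled_dot_product_attention(pixels, category_repr, category_repr)
    return np.concatenate([pixels, context], axis=1), attn


rng = np.random.default_rng(0)
H, W, D, K = 8, 8, 16, 2
pixels = rng.standard_normal((H * W, D))  # stand-in for the exact pixel information
coarse = rng.standard_normal((H * W, K))  # stand-in for the category module output
fused, attn = fuse_pixel_and_category(pixels, coarse)
rho = rng.standard_normal((2 * D, K))     # hypothetical linear stand-in for rho
z = (fused @ rho).reshape(H, W, K)        # per-pixel class scores
print(fused.shape, attn.shape, z.shape)   # (64, 32) (64, 2) (8, 8, 2)
```

Because each pixel attends over only K category vectors rather than all H×W pixels, this kind of region-level attention costs O(N·K) instead of O(N²), which is one way such designs keep computation low while still using image-wide context.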
The Pixel Module ℘ is proposed to aggregate the refined pixel information within each region, with the following expression:
Where the input features
The introduction of Cate Module
Loss Function
As mentioned in Section 3.1, a multi-task loss is used to jointly optimize the parameters of our model.
ξ represents the transformation, and $C \in \mathbb{R}^{K \times H \times W}$ is the output obtained by applying Softmax to the predicted values. The $L_O$ term makes the classification of each pixel more accurate, as follows:
The final multi-task loss is constructed as formula (25):
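The individual loss terms and formula (25) are not recoverable from this text, so the following is only an illustration, assuming a term such as L_O is a standard pixel-wise cross-entropy computed from the Softmax output C of shape K×H×W. The function name is hypothetical.

```python
import numpy as np


def pixel_cross_entropy(logits, target, eps=1e-12):
    """Mean pixel-wise cross-entropy.

    logits: (K, H, W) raw class scores; target: (H, W) integer labels in [0, K).
    """
    shifted = logits - logits.max(axis=0, keepdims=True)  # stabilize exp
    probs = np.exp(shifted)
    probs /= probs.sum(axis=0, keepdims=True)             # Softmax over K gives C
    rows, cols = np.indices(target.shape)
    p_true = probs[target, rows, cols]                     # probability of the true class
    return -np.mean(np.log(p_true + eps))


logits = np.zeros((2, 3, 3))               # uninformative scores for K = 2 classes
target = np.zeros((3, 3), dtype=int)
print(pixel_cross_entropy(logits, target))  # ln 2, about 0.693
```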
This chapter has outlined the proposed GVAOS architecture, whose GVANet integrates category information and pixel information for osteosarcoma MRI image segmentation. First, the MixMatch dataset preprocessing module optimizes the dataset by classifying the images into UP and HP; image normalization and image localization then perform noise reduction, edge information extraction, fuzzy binarization, and contour localization of the effective osteosarcoma region. Prioritizing the useful MRI images improves the model's generalization capability, and the denoised images cropped to the effective region further improve clinicians' detection efficiency. Next, the images are selectively passed into the backbone to obtain the initial pixel information; the Pixel Module and the Cate Module extract the accurate pixel information and the category information, respectively; the Cate Module yields the object category information, the Pixel Module yields the object pixel information, and the final segmentation result is predicted after this information augmentation. Our system can locate the invasion area of osteosarcoma precisely and distinguish the tumor from the surrounding important normal tissues and organs, helping doctors understand the relationship between the tumor and normal tissues and organs.
Experimental setup
This thesis validates the performance of the proposed algorithm on a real-world osteosarcoma dataset provided by the Monash University Research Center for Artificial Intelligence, which contains over 4000 osteosarcoma MRI images together with other indicator values [2]. The ratio of positive to negative samples is about 1:15, keeping the samples as consistent as possible with the real data distribution while maintaining a consistent positive-to-negative sample ratio. We augment the data with random vertical and horizontal flips, random 90° rotations, random scaling in the range [0.5, 2], and random brightness jitter in the range [–10, 10]. Images were split into 164/40 images as training and test sets.
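The augmentation policy above can be sketched as a single numpy function. The ranges come from the text; everything else is an assumption: each flip and rotation fires with probability 0.5, scaling uses nearest-neighbour sampling so mask labels stay discrete, and brightness jitter is an additive grayscale offset applied to the image only. The function name is hypothetical.

```python
import numpy as np


def augment(image, mask, rng):
    """Randomly flip, rotate, scale, and brighten an MRI slice and its mask.

    Assumptions: each flip/rotation fires with probability 0.5, scaling uses
    nearest-neighbour sampling, and brightness jitter is an additive offset.
    """
    if rng.random() < 0.5:
        image, mask = image[::-1, :], mask[::-1, :]    # vertical flip
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]    # horizontal flip
    if rng.random() < 0.5:
        image, mask = np.rot90(image), np.rot90(mask)  # 90-degree rotation
    scale = rng.uniform(0.5, 2.0)                       # random scale in [0.5, 2]
    h, w = image.shape
    new_h, new_w = max(1, int(round(h * scale))), max(1, int(round(w * scale)))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    image, mask = image[np.ix_(rows, cols)], mask[np.ix_(rows, cols)]
    image = image + rng.uniform(-10.0, 10.0)            # brightness jitter (image only)
    return image.astype(np.float64), mask


rng = np.random.default_rng(0)
slice_ = np.full((32, 32), 100.0)
label = np.zeros((32, 32), dtype=int)
label[8:16, 8:16] = 1
aug_img, aug_mask = augment(slice_, label, rng)
print(aug_img.shape, np.unique(aug_mask))
```

Applying the geometric transforms to image and mask together, but brightness only to the image, keeps the ground-truth tumor outline aligned with the augmented slice.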
Our proposed GVAOS method is compared with the Pyramid Scene Parsing Network (PSPNet) [47], the Multi-Supervised Convolutional Network (MSFCN) [58], UNet [44], the Multi-Supervised Residual Network (MSRN) [59], the Panoramic Feature Pyramid Network (FPN) [60], and the fully convolutional network (FCN) [25]. We compared the precision (Pr), recall, F1-score, accuracy (Acc), IoU, and DSC of each method. A total of 800 epochs were used to train the segmentation networks in this paper.
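For reference, the pixel-level metrics compared in this section can be computed from binary masks as below, using the standard confusion-matrix definitions. Whether the reported scores are computed per image and then averaged, or pooled over the whole test set, is not stated, so this sketch simply pools all pixels of the masks it is given; the function name is hypothetical. Under these definitions, DSC and F1-score coincide for a single binary mask.

```python
import numpy as np


def segmentation_metrics(pred, truth):
    """Pixel-level metrics for binary masks (1 = tumor, 0 = background)."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": (tp + tn) / pred.size,
        "iou": tp / (tp + fp + fn) if tp + fp + fn else 1.0,
        "dsc": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0,
    }


# One TP, one FP, one FN, one TN.
m = segmentation_metrics([[1, 0, 1, 0]], [[1, 1, 0, 0]])
print(m)  # precision 0.5, recall 0.5, f1 0.5, accuracy 0.5, iou 0.333..., dsc 0.5
```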
Results and discussion
In the image selection module, we use the semi-supervised MixMatch algorithm to divide the dataset into UP and HP, as shown in Fig. 4. UP images are those the model can learn from effectively, whereas HP images contribute little to model training. In UP images, the boundary between the tumor and bone tissue is clearly visible and the osteosarcoma is clearly displayed.

UP and HP comparison.
Figure 5 shows the segmentation results for osteosarcoma lesions in MRI images from the test set. The ground truth was delineated by medical imaging technologists, and the other results come from FCN8, FCN16, FPN, U-Net, MSFCN, and MSRN, respectively. FCN and FPN fail to capture the full extent of the osteosarcoma, and MSFCN leaves severe residual errors in the overall shape. In most cases, the tumor boundaries predicted by our method are closer to the ground truth and better separate the tumor from normal bone tissue, and our method segments the tumor edge accurately even when the tumor area is relatively small. Introducing the global field of view and object category features makes GVANet more accurate than the other methods in both overall and detailed segmentation, and its delineation of osteosarcoma edges is closer to the ground truth.

The prediction results of each segmentation network.
For quantitative analysis, we compared Accuracy, DSC, and other scores. To gauge model size and, indirectly, inference cost, we also compared the number of parameters (Params). The results are presented in Table 1. GVANet is clearly superior to the convolutional baselines and outperforms the other approaches in IoU, DSC, and other indexes. Compared with U-Net, the strongest baseline, it improves Precision by 5.6 points, Recall by 2.4 points, F1-score by 1.3 points, accuracy by 6.7 points, IoU by 7.6 points, and DSC by 6.3 points.
Comparison of image segmentation effects of different models
Figure 6 compares the parameter counts and DSC of the different models. GVANet's DSC is 6% higher than that of the second-best model, U-Net, at the cost of only 12.26M additional parameters, and 6.7% higher than that of the third-ranked FPN model with a lower number of parameters. Specifically, GVANet achieves a DSC of 0.933 with only 29.72M parameters, which speeds up training and inference. This accelerates segmentation and reduces the hardware burden on hospitals.

Parameters of each model and DSC.
Figure 7 shows how the accuracy of each model changes over the first 45 epochs for the six models compared. After 30 epochs, the accuracy of all models stabilizes and the learning rate has dropped to a very low order of magnitude. Our model shows the highest and most stable accuracy, reaching a maximum of 99%. The accuracy ranking is GVANet > MSRN > MSFCN > U-Net ≈ FPN > SepUNet.

Accuracy changes in the first 45 epochs of each model.
In the recall curves shown in Fig. 8, UNet, FPN, and SepUNet fluctuate strongly and remain unstable during the first 30 epochs; after 30 epochs, all models except SepUNet, which continues to fluctuate, reach a stable state. GVANet attains the highest recall among the models and converges quickly to a stable value. This means our method covers tumors more completely and can largely avoid missed detections. These results support the stability and convergence of GVANet.

Recall changes in the first 45 epochs of each model.
We then compared the F1-scores of the same six models, as shown in Fig. 9. Our model's F1-score is consistently the highest; MSFCN and SepUNet fluctuate drastically, and FPN and UNet are more stable but always lower than GVANet. The segmentation precision of GVANet is thus superior and more stable, which benefits the practical deployment of the system in clinical environments.

F1-score changes in the first 45 epochs of each model.
Finally, Fig. 10 shows the change in IoU of our model over the first 50 epochs. The blue line is the IoU for normal tissue, the yellow line is the IoU for osteosarcoma, and the green line is the mean IoU. Fig. 10 shows that the IoU for normal tissue remains high throughout, so the model's main learning target is the identification of abnormal bone tissue. The predicted extent of the tumor region gradually approaches the true extent, with increasing overlap, and finally reaches a high level of agreement. Compared with the IoU of the other models in Table 1, our model achieves a better segmentation effect, so it can aid clinical diagnosis and serve as a reference for physicians.

IoU changes of every epoch.
In this work, we propose an osteosarcoma MRI image segmentation system based on a deep neural network, the Global View Attention Net (GVANet). The system comprises semi-supervised dataset optimization, image preprocessing, and model segmentation. Compared with existing improved FCN-based methods, GVANet expands the field of view with an attention module at the building-block level, effectively reducing the number of model parameters and obtaining a global field of view at every stage of feature learning. By introducing category features and object features, we improve the accuracy of the model for pixels within each object region as well as the accuracy of edge segmentation. Higher segmentation accuracy can better support clinicians in identifying osteosarcoma.
In future work, we will further refine the modeling of osteosarcoma edge texture to analyze its associated characteristics, segment the edge region more accurately, depict the true shape of the tumor, and improve the model's sensitivity to gray-scale differences between osteosarcoma and normal tissue. We will also consider analyzing 3D osteosarcoma images and visualizing osteosarcoma from a 3D perspective to provide clinicians with additional diagnostic evidence, making the model a more reliable tool for osteosarcoma segmentation.
Data availability statement
Data used to support the findings of this study are currently under embargo while the research findings are commercialized.
Footnotes
Acknowledgments
All data analyzed during the current study are included in the submission.
Conflicts of interest
The authors declare no conflict of interest.
Funding
The general project of Changsha Technology Bureau (Grant No. KC1705026), Natural Science Foundation of Hunan Province (Grant No.2020JJ4647, Grant No.2020JJ6064), and Hunan Provincial Natural Science Foundation of China under Grant 2023JJ30701 and Grant 2023JJ60116 support this work.
