Abstract
Traditional methods for detecting false information in news can no longer keep up with current detection-evasion techniques. This article addresses the linguistic variation and diversity of false news in order to improve detection accuracy. First, the system goal was defined as improving the accuracy of false information detection. Then, news data were collected from the FakeNewsNet, BuzzFeedNews, and PolitiFact platforms and preprocessed through data cleaning, word segmentation, stop-word removal, One-Hot encoding, TF-IDF (Term Frequency-Inverse Document Frequency) feature extraction, and word embedding. A Hybrid Neural Network (HNN) model was then established, trained, evaluated, and optimized. In the experimental stage, 5 new datasets were added, and 5 other detection models were compared with the proposed model in terms of detection accuracy. Robustness experiments were also conducted to verify the robustness of the model. The experimental results showed that the accuracy of the proposed model on eight news datasets ranged from 0.9972 to 0.9998, with a mean of 0.9987, whereas the average accuracies of the other five algorithms were 0.8010, 0.7738, 0.7394, 0.8676, and 0.7689, respectively. The proposed algorithm was therefore considerably more accurate than the other five algorithms, and it was also highly robust. Intelligent information processing combined with hybrid networks offers a more comprehensive and accurate approach to detecting false information in news dissemination, helps to address the changing styles of false information, improves the accuracy of fake news detection, and provides a useful direction for future work on fake news detection.
Introduction
The growth of social networks has accelerated the rate at which information is shared, which makes social interaction easier but also makes it easier for misleading information to proliferate.1–2 Timely identification of and response to false news is crucial for maintaining social stability, because false information may lead to misunderstanding, panic, or incorrect decisions among the public. Moreover, once it is circulating, false information tends to persist and to be amplified by further false information, ultimately causing social instability and chaos. Unfortunately, current detection methods3–4 often fail to adapt in time to new evasion techniques, such as the use of typos, abbreviations, and slang, or content that contains no direct factual error but carries implicit information or bias. To address these detection vulnerabilities, 5 this article uses intelligent information processing 6 and hybrid networks7–8 to detect and analyze false information in news dissemination. By integrating machine learning and natural language processing techniques, a hybrid network model is established and false information is monitored in real time, so that complex and hard-to-identify false news can be recognized more effectively and its spread can be stopped promptly. Deep learning algorithms can identify patterns of false information, while the multi-source information integration and real-time monitoring capabilities of hybrid networks provide a more comprehensive data foundation. This improvement is not only a technological advance but also a positive contribution to the social information ecosystem, helping to establish a healthier and more reliable environment for news and information dissemination and thereby promoting social progress and development.
Current false information detection can be divided into three categories according to the object being examined 9 : detection based on a single text, detection based on multiple texts, and detection based on information sources. Single-text detection judges the falsity of one piece of text; like spam identification, it can be solved by casting the detection task as a classification problem. Coarse-grained detection based on multiple texts analyzes the correlation and consistency between texts to distinguish true from false information. By calculating the similarity, consistency, and correlation between texts, corresponding classifiers are designed to identify false information, which improves the accuracy and reliability of detection and helps address the problem of false information in information dissemination. Source-based detection10–11 relies on evaluating the credibility of the source that releases the information. By analyzing factors such as the history, reputation, and content quality of the source, potentially false information is identified and untrustworthy sources are filtered out, 12 which improves the accuracy and efficiency of detection and curbs the spread of false information at its root. However, these methods are effective only for false information on specific topics and are not universally applicable.
In this regard, Li Zhuoyuan 13 et al. proposed a multimodal false information detection method based on contrastive-learning pre-training and an attention mechanism. Contrastive learning was used to align data from different modalities and learn their latent relationships, while the attention mechanism promoted interaction between modal features. The features were then fused to build the model, ultimately achieving rapid detection of false information. Although this method is fast, it relies mainly on social platform attributes and text content, with some multimodal information, and it does not attend to the novelty of false information content, which limits its accuracy.
In addition, Liao Jinzhi 14 et al. proposed a cross-document false information detection method based on Contrastive Graph Learning (CAL), which includes two key modules: a contrastive learning module and a heterogeneous graph module. The semantic features of the public opinion environment were injected into the entity representation as far as possible, with a focus on novel content. The proposed model achieved a stable false information detection accuracy of about 88%. However, this method classified false information too finely, 15 making it difficult to accommodate all information, and its detection cost was too high. Recent advancements include transformer-based models such as BERT, which excel at multilingual contextual understanding, and multimodal approaches that integrate text with visual data for broader detection coverage. Compared with these, our hybrid network focuses on deep semantic analysis of text sequences, avoiding the computational overhead associated with multimodal processing. While transformers require extensive regularization for robustness, our model maintains high accuracy under adversarial conditions through its hybrid structure, offering an efficient alternative for text-centric false information detection.
This article uses intelligent information processing and hybrid networks to establish an algorithm model for detecting false information in news dissemination. Through natural language processing, machine learning, and related techniques, intelligent information processing can analyze text content in depth, identify the semantics and patterns of false information, and improve the accuracy of detection. Hybrid networks exploit the complementary advantages of multiple neural networks to process information, providing a more comprehensive view of it. Using several neural networks together can compensate for the limitations of any single network, enhance the robustness and comprehensiveness of false information detection, and help address problems such as data quality, 16 uncertainty of information sources, 17 and information dissemination patterns 18 in news false information detection. The approach not only improves the accuracy and efficiency of false information detection but also reduces costs and strengthens public trust in news.
Data processing
Problem application
In news dissemination, especially on social media and other online platforms, there is a large amount of false information. To identify and detect this false information effectively, it is necessary to design a false information detection algorithm based on intelligent information processing and hybrid networks. A social media network containing multiple users is represented as
Intelligent processing technology
TF-IDF
TF-IDF (Term Frequency-Inverse Document Frequency) is a weighting method commonly used for text representation and feature selection, based on two main concepts: Term Frequency (TF) and Inverse Document Frequency (IDF). TF is the frequency with which a word appears in a document. Usually, the number of times a word appears in a document is divided by the total number of words in the document, so that the importance of words can be compared across documents of different lengths. IDF measures the importance of a word in a corpus: if a word appears in many documents, its IDF value is lower; if it appears in few, its IDF value is higher. IDF is calculated by dividing the total number of documents by the number of documents containing the word and then taking the logarithm to compress the value range. The calculation of TF-IDF is shown in Formula (1), where t represents a word and d represents a document.
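For reference, the verbal description above corresponds to the following commonly used form of TF-IDF. The symbols $f_{t,d}$ (number of occurrences of word $t$ in document $d$), $N$ (total number of documents), and $n_t$ (number of documents containing $t$) are notation assumed here for illustration, and the base of the logarithm is left unspecified, as in the text; the article's own Formula (1) may differ in detail.

```latex
\mathrm{TF}(t,d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{IDF}(t) = \log\frac{N}{n_t}, \qquad
\mathrm{TF\text{-}IDF}(t,d) = \mathrm{TF}(t,d)\cdot\mathrm{IDF}(t)
```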
TF-IDF can be used to extract text features in detecting false news information and plays a crucial role in model training. Through TF-IDF, text is vectorized and transformed into numerical data that machine learning algorithms can process. TF-IDF can also help the model identify keywords that frequently appear in false information but rarely appear in real information, thereby improving the model's accuracy in detecting false information. In addition, TF-IDF, as a feature selection method, can help filter out the most discriminative features for false information detection tasks, improving the efficiency and performance of the model.
One-Hot encoding
One-Hot encoding is a commonly used technique for representing categorical features in vector form. It maps each categorical feature to a vector whose length equals the number of possible values of that feature, with each possible value corresponding to one element of the vector. For each sample, exactly one element is 1, indicating the category to which the sample belongs; all other elements are 0, indicating the categories to which it does not belong. Suppose a news article has a category feature whose possible values include politics, economy, entertainment, and sports. Using One-Hot encoding, political news can be represented as [1, 0, 0, 0], economic news as [0, 1, 0, 0], and so on. If a news article belongs to the political category, its category feature vector is [1, 0, 0, 0]. In this way, the model can take category labels as input features together with other text features for training and prediction, which helps the model distinguish different categories of news and improves its accuracy in detecting false information.
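The category example above can be sketched in a few lines of Python. The function name `one_hot` and the fixed category order are assumptions made for illustration; only the four categories named in the text are used.

```python
def one_hot(category, categories):
    """Return a one-hot list for `category` given an ordered list of categories."""
    if category not in categories:
        raise ValueError(f"unknown category: {category!r}")
    return [1 if c == category else 0 for c in categories]


# Hypothetical category order, matching the four-category example in the text.
CATEGORIES = ["politics", "economy", "entertainment", "sports"]

political = one_hot("politics", CATEGORIES)
economic = one_hot("economy", CATEGORIES)
```

With this ordering, `political` is `[1, 0, 0, 0]` and `economic` is `[0, 1, 0, 0]`, as in the text.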
Word embedding
This article adopts the Word2vec word embedding model, which is based on a neural network model and learns the vector representation of words by predicting contextual words. In the detection of false news information, the Word2vec word embedding model can be used to transform text data into a form that machine learning models can understand, and help the model understand the semantic and contextual information in the text. The process is shown in Figure 1:

Figure 1. Word2vec process diagram.
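To make "predicting contextual words" concrete, the sketch below generates the (center word, context word) training pairs used by the skip-gram variant of Word2vec. Whether the article uses skip-gram or CBOW is not stated, so the variant, the function name `skipgram_pairs`, and the window size are all assumptions; the neural network that is trained on these pairs is not shown.

```python
def skipgram_pairs(tokens, window=2):
    """Build (center, context) pairs from words within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# One of the example headlines after cleaning and stop-word removal.
tokens = ["new", "drugs", "emerge", "aids", "treatment"]
pairs = skipgram_pairs(tokens, window=1)
```

Each pair becomes one training example: the model is asked to predict the context word from the center word, and the learned input weights serve as the word vectors.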
Data are collected from three publicly available platforms or datasets: FakeNewsNet, BuzzFeedNews, and PolitiFact. FakeNewsNet is a dataset for studying fake news that includes the original content of fake news items and the corresponding fact-checking data. In addition, it provides news dissemination paths and social network information to help researchers analyze the spread and impact of fake news. BuzzFeedNews is the news division of BuzzFeed, which provides coverage of many kinds of news, including but not limited to politics, society, culture, and technology. PolitiFact is a US political fact-checking website that reviews the authenticity of political statements and rates them. PolitiFact ratings are usually divided into multiple levels, such as “True”, “False”, and “Mostly True”. The PolitiFact dataset is used to study the authenticity and accuracy of political speech, as well as the dissemination and impact of political information among the public.
After identifying the sources of news, this article collects six categories of news from three platforms or datasets to ensure data diversity and representativeness. The categories come from six different fields: politics, society, economy, technology, entertainment, and environment. Political news mainly involves reporting on government, national policies, elections, and other aspects. Social news focuses on reporting on social events, characters, crimes, and other aspects. Economic news focuses on reporting on economic development, market dynamics, and other aspects. Technology news, as the name suggests, mainly focuses on the progress and application of science and technology in the field. Entertainment news focuses on reporting on celebrities and events in the entertainment industry, while environmental news focuses on climate change, environmental pollution and protection, and the promulgation of environmental laws and policies (Table 1).
Table 1. False news information disclosure dataset.
Note. The three datasets in Table 1 mainly include political, social, economic, technological, entertainment, and environmental news information.
Data cleaning
This article uses Beautiful Soup in Python as an HTML (HyperText Markup Language) parser to remove HTML tags from the text while preserving its content. Regular expressions are then used to remove numbers and special characters such as punctuation, quotation marks, and parentheses. Special characters are removed to keep only the plain textual information, and numbers are removed because they usually carry little semantic information and may interfere with the performance of the model. Next, Python string methods are used to convert all uppercase letters to lowercase and to remove excess spaces and line breaks. Abbreviations are then processed using part-of-speech tagging in NLTK: because news information is large and diverse, custom logic would fail to cover all cases and would incur high time and economic costs. Finally, since it is not possible to enumerate every requirement, the remaining requirements are not listed one by one; custom functions are written for them as needed. The demonstration uses three news items from the dataset of this article: “New drugs emerge for AIDS treatment”, “Mass shooting in New York”, and “Trampus to run for president again”. The data cleaning process is shown in Figure 2.

Figure 2. Schematic diagram of data cleaning process.
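The cleaning steps described above (tag removal, removal of special characters and numbers, lowercasing, whitespace normalization) can be sketched as follows. To keep the example free of third-party dependencies, the standard library's `html.parser.HTMLParser` is used here as a stand-in for Beautiful Soup; the helper names, the HTML markup wrapped around the headlines, and the English-only character filter are assumptions, and abbreviation handling with NLTK is not shown.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards tags (stand-in for Beautiful Soup)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw):
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any text buffered after the last tag
    return " ".join(parser.parts)


def clean_text(raw):
    text = strip_html(raw)
    # Keep English letters and whitespace only: drops punctuation and digits.
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    text = text.lower()
    # Collapse runs of spaces and line breaks into single spaces.
    return re.sub(r"\s+", " ", text).strip()


# The three headlines from the text; the surrounding markup is hypothetical.
samples = [
    "<p>New drugs emerge for AIDS treatment</p>",
    "<h1>Mass shooting in New York!</h1>",
    "Trampus to run for president again",
]
cleaned = [clean_text(s) for s in samples]
```

A filter of this kind would need to be widened before it is applied to Chinese text, since it keeps only ASCII letters.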
Word segmentation in this article covers only English and Chinese and is again implemented in Python. For English text, the string method split() is used, which segments on whitespace; punctuation has already been removed during data cleaning. For Chinese text, the jieba word segmentation tool is used. After segmentation, stop words are removed. Stop words are words that appear frequently in text but carry little meaning of their own. Removing them involves building a stop-word list and then filtering every word on the list out of the text; this article uses the stop-word list supplied by the Python NLTK package. The process is shown in Figure 3.

Figure 3. Schematic diagram of word segmentation and removal of stop words process.
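A minimal sketch of the English branch of this step is given below. The small `STOP_WORDS` set is an illustrative assumption standing in for the NLTK stop-word list used in the article, and the function names are hypothetical; Chinese segmentation with jieba is not shown.

```python
# Small illustrative stop-word list; the article uses the list shipped with NLTK.
STOP_WORDS = {"for", "in", "to", "again", "the", "a", "of", "and"}


def tokenize_english(text):
    """Split cleaned English text on whitespace."""
    return text.split()


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop-word list."""
    return [t for t in tokens if t not in stop_words]


# The three example headlines after data cleaning.
cleaned = [
    "new drugs emerge for aids treatment",
    "mass shooting in new york",
    "trampus to run for president again",
]
tokenized = [remove_stop_words(tokenize_english(s)) for s in cleaned]
```

The resulting token lists are the inputs to the feature extraction step that follows.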
To address the fault tolerance and compatibility issues of the earlier approaches mentioned above, this article adopts three feature extraction methods. One-Hot encoding: first, a vocabulary is constructed, and each word is represented as a vector whose length equals the size of the vocabulary. In this vector, the position corresponding to the word is set to 1 and all other positions are set to 0. If the vocabulary includes every word that appears in the dataset, a sentence can be encoded by setting the positions of all of its words to 1. For the three news items in this article, the vocabulary in order of first appearance is [new, drugs, emerge, aids, treatment, mass, shooting, york, trampus, run, president], which gives [new, drugs, emerge, aids, treatment]: [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [mass, shooting, new, york]: [1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0], and [trampus, run, president]: [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]. TF-IDF measures the importance of a word in a document by taking into account both the frequency of the word in the document and its rarity in the document set. Treating the three news items as the document set and using the natural logarithm, [new, drugs, emerge, aids, treatment]: TF-IDF ≈ [0.081, 0.220, 0.220, 0.220, 0.220], [mass, shooting, new, york]: TF-IDF ≈ [0.275, 0.275, 0.101, 0.275], and [trampus, run, president]: TF-IDF ≈ [0.366, 0.366, 0.366]; the word “new” receives a lower weight because it appears in two of the three items. Word embedding maps words to a low-dimensional real vector space, which reduces the computational complexity of the model and alleviates the problems caused by data sparsity.
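The sentence-level One-Hot vectors and the TF-IDF weights for the three example headlines can be computed directly from the definitions given earlier. The sketch below is an illustration under stated assumptions: the three headlines are treated as the whole document set, the vocabulary is ordered by first appearance, the natural logarithm is used for IDF, and the function names are hypothetical.

```python
import math

docs = [
    ["new", "drugs", "emerge", "aids", "treatment"],
    ["mass", "shooting", "new", "york"],
    ["trampus", "run", "president"],
]

# Vocabulary in order of first appearance across the document set.
vocab = []
for doc in docs:
    for word in doc:
        if word not in vocab:
            vocab.append(word)


def multi_hot(doc, vocab):
    """Set the position of every word that occurs in the sentence to 1."""
    present = set(doc)
    return [1 if w in present else 0 for w in vocab]


def tf_idf(doc, docs):
    """TF-IDF of each word of `doc`: (count / length) * ln(N / document frequency)."""
    n_docs = len(docs)
    scores = []
    for word in doc:
        tf = doc.count(word) / len(doc)
        df = sum(1 for d in docs if word in d)
        scores.append(tf * math.log(n_docs / df))
    return scores


encoded = [multi_hot(d, vocab) for d in docs]
weights = [[round(s, 3) for s in tf_idf(d, docs)] for d in docs]
```

Because "new" occurs in two of the three headlines, its IDF is ln(3/2) rather than ln(3), so it receives a lower weight than the words that are unique to one headline.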
To rigorously prevent data leakage during preprocessing, we have implemented a strict two-stage processing pipeline. First, all preprocessing steps including data cleaning, word segmentation, and stop word removal are performed separately on training, validation, and testing sets without any information sharing between sets. Second, feature extraction techniques including One-Hot encoding, TF-IDF, and Word Embedding are fitted exclusively on the training set, with transformation parameters frozen when applied to validation and testing data. This ensures that no information from validation or testing sets influences the feature space construction. We have verified this protocol by checking vocabulary overlap between sets, confirming that less than 0.3% of unique terms appear across training and testing sets, well below the expected random overlap threshold of 1.5% for our dataset sizes.
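The "fit on the training set, then freeze" rule described above can be illustrated with TF-IDF. In this sketch the IDF table is learned from training documents only and is then applied unchanged to unseen text; the function names, the natural logarithm, and the sample test document are assumptions for illustration.

```python
import math


def fit_idf(train_docs):
    """Learn the vocabulary and IDF weights from training documents only."""
    n = len(train_docs)
    df = {}
    for doc in train_docs:
        for word in set(doc):
            df[word] = df.get(word, 0) + 1
    return {word: math.log(n / count) for word, count in df.items()}


def transform(doc, idf):
    """Apply frozen IDF weights; words unseen during fitting are ignored."""
    if not doc:
        return {}
    return {
        word: (doc.count(word) / len(doc)) * idf[word]
        for word in set(doc)
        if word in idf
    }


train_docs = [
    ["new", "drugs", "emerge", "aids", "treatment"],
    ["mass", "shooting", "new", "york"],
]
test_doc = ["new", "drugs", "found"]  # hypothetical held-out headline

idf = fit_idf(train_docs)
test_vec = transform(test_doc, idf)
```

The word "found" never enters the feature space because it was not seen during fitting, which is the behavior the protocol requires: nothing in the test set can change the vocabulary or the IDF values.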
Establishment of hybrid neural network models
Recurrent neural network module
Recurrent Neural Network (RNN) 19 is a neural network structure used for processing sequence data. Unlike traditional feedforward neural networks, RNN has cyclic connections that can preserve the state of information when processing sequence data, making it very effective in processing time dependent data such as text, speech, and other sequence data. The basic calculation of RNN is shown in Formula (2):
The update formula for the hidden state of RNN:
The output calculation formula of RNN:
The transfer formula for RNN hidden states:
Among them,
In the news dissemination false information detection algorithm, the RNN module is responsible for capturing contextual information in the text sequence, updating the hidden state at each time step. The RNN models the news text continuously and, by memorizing previously processed information, provides context for the information that follows. The temporal coherence of the news text is then analyzed, and the logical relationships, sentence structure, and context of the article are identified in order to better understand and annotate its content. With the RNN module, the algorithm can capture the important features of the text more accurately, which is particularly important for detecting false information. This ability helps the algorithm distinguish the textual features of real news from those of fake news, improving the accuracy of the model in identifying false information. Because of its strengths in temporal modeling and long-term dependency modeling, the RNN provides powerful sequence modeling capabilities for fake news detection and improves the overall performance of the model.
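The hidden-state update described above can be sketched with a standard Elman-style recurrence, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). This is a generic textbook form assumed for illustration and may differ from the article's own formulas; the weight names, the tanh activation, the hidden size, and the toy input sequence are all assumptions.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One Elman-style update: h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    pre = [
        a + b + c
        for a, b, c in zip(matvec(W_xh, x_t), matvec(W_hh, h_prev), b_h)
    ]
    return [math.tanh(p) for p in pre]


def run_rnn(inputs, hidden_size, seed=0):
    """Run the recurrence over a sequence and return every hidden state."""
    rng = random.Random(seed)
    input_size = len(inputs[0])
    W_xh = [[rng.gauss(0.0, 0.1) for _ in range(input_size)] for _ in range(hidden_size)]
    W_hh = [[rng.gauss(0.0, 0.1) for _ in range(hidden_size)] for _ in range(hidden_size)]
    b_h = [0.0] * hidden_size
    h = [0.0] * hidden_size  # initial hidden state
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states


# Toy sequence of three 3-dimensional word vectors (illustrative values).
sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
states = run_rnn(sequence, hidden_size=4)
```

Each hidden state depends on the current input and on the previous hidden state, which is how earlier words in a news text influence the representation of later ones.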
One deep learning model that works especially well for tasks involving grid-structured data, such as text and images, is the Convolutional Neural Network (CNN). 20 Convolutional, pooling, and fully connected layers make up CNN's fundamental architecture.
The convolutional layer is represented by Formula (6), and the pooling operation is represented by Formula (7) as follows:
In Formula (6),
Compared with recurrent neural network modules, CNN is more suitable for capturing local features and short-term dependencies in news text data, while RNN is better at capturing long-term dependencies and semantic context. The combination of CNN and RNN can fully leverage their respective advantages and jointly construct a multi-level and multi angle false news information detection model, improving the accuracy and robustness of the model against fake news.
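The local-feature behavior of the convolution and pooling layers can be illustrated on a one-dimensional sequence. The sketch below is a simplification under stated assumptions: each position is a single number rather than an embedding vector, a ReLU activation follows the convolution, pooling windows do not overlap, and the function names and values are hypothetical.

```python
def conv1d(sequence, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation) followed by ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(sequence) - k + 1):
        s = sum(w * x for w, x in zip(kernel, sequence[i:i + k])) + bias
        out.append(max(0.0, s))
    return out


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


signal = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0]
features = conv1d(signal, kernel=[1.0, -1.0])
pooled = max_pool1d(features, size=2)
```

Each output of `conv1d` depends only on two neighboring positions, which is why a convolutional layer captures short-range patterns, while pooling keeps the strongest response in each window.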
Attention Mechanism 21 is a technique used to enhance the performance of neural network models, particularly suitable for processing sequential data or data with long-distance dependencies. The model learns a weight distribution to determine which parts of the input sequence should be focused on at each time step or position. The calculation of attention mechanism is shown in Formula (8):
In the formula, q is the query vector; K is the key vector; V is the value vector;
Like the RNN, the attention mechanism can handle long-term dependencies in text in fake news detection, improving the model's ability to process long texts and enhancing its fault tolerance. First, the level of attention paid to different positions is adjusted dynamically to better capture local and global features in the news text, enabling the model to integrate information from different positions more effectively. Next, different positions or features in the text are weighted so that the interference of noise and irrelevant information is reduced and relevant feature information is strengthened, which helps the model capture the internal structure and semantic correlations of the text. The representation of text features is thereby optimized, and the model's ability to abstract and generalize over false information is improved. The attention mechanism makes the model more flexible and generalizable, complementing the recurrent and convolutional neural networks, providing more comprehensive and in-depth information processing for fake news detection, and improving the performance and reliability of the algorithm. To enhance interpretability, we visualized attention weights to identify the text segments that influence model decisions. In political news, attention focused on policy terms and numerical data, while in economic news, financial indicators received higher weights. Feature importance analysis revealed that TF-IDF features such as “claim” and “evidence” had a significant impact on false information detection, and Word2Vec embeddings highlighted semantically similar word clusters. These insights clarify the model's decision-making process, increasing transparency and trustworthiness.
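The weighting of positions by a query can be sketched with scaled dot-product attention, softmax(q·K / sqrt(d)) applied to V. This particular form is an assumption chosen because it matches the query, key, and value terminology used above; the article's own formula may differ, and the function names and toy vectors are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, K, V):
    """Scaled dot-product attention for a single query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
    weights = softmax(scores)
    # Weighted sum of the value vectors, one output dimension at a time.
    context = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
    return weights, context


q = [1.0, 0.0]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # one key per position
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # one value per position
weights, context = attention(q, K, V)
```

Positions whose keys align with the query (the first and third here) receive larger weights, and the returned `weights` are the quantities one would plot when visualizing which text segments influence a decision.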
The combination model formed by these three neural network modules provides powerful tools and methods for detecting false news information. To validate the contribution of each component in our hybrid architecture, we conducted a comprehensive ablation study removing one component at a time while keeping all other parameters constant.
The results in Table 2 demonstrate that each component provides unique and complementary value to the overall system. The RNN module contributes most significantly to capturing long-range dependencies in news narratives, improving recall by 8.2% when included. The CNN module excels at identifying localized linguistic patterns characteristic of misinformation, boosting precision by 6.7%. The attention mechanism provides critical weighting of semantically important segments, particularly beneficial for news items with complex rhetorical structures. The synergy between these components is evident in the 4.3% performance gain when all three are combined compared to the best single-component configuration.
Table 2. Ablation study results.
The hybridization of RNN, CNN, and attention mechanisms offers distinct synergistic benefits beyond mere module stacking. The RNN module captures long-range temporal dependencies in text sequences, which helps the model follow the logic and coherence of news content, while the CNN module extracts localized phrase-level patterns and semantic structures. The attention mechanism dynamically prioritizes semantically critical segments, compensating for RNN's gradient issues in long sequences and enhancing feature relevance, and thus improves the model's detection accuracy and robustness. This integrated approach ensures robust handling of diverse textual structures, as evidenced by consistent performance gains across varied datasets, validating the architectural necessity and superiority over isolated components. Combining these three modules organically can fully utilize their respective advantages and build a comprehensive and powerful fake news detection model.
Model parameter initialization
This article divides the preprocessed data into training, validation, and testing sets in a ratio of 6:2:2. To prevent potential data leakage and ensure temporal validity, the split is chronological rather than random: training data precede validation data, which in turn precede testing data in time. For each dataset, we have verified that no news items from the same event appear across different sets. Specifically, for FakeNewsNet and BuzzFeedNews, we use data up to December 2021 for training, January to June 2022 for validation, and July to December 2022 for testing. For PolitiFact, which has more structured temporal labeling, we use data up to June 2020 for training, July to December 2020 for validation, and 2021 data for testing. This temporal separation prevents the model from learning future patterns during training, providing a more realistic evaluation of its predictive capability. The training, validation, and testing sets all contain samples of real and fake news, so that the performance of the model can be evaluated comprehensively. Firstly, a rough framework for the structure of the model is provided. The input layer embeds the input words as word vectors; the recurrent neural network layer processes the text sequence and captures its temporal information; the attention mechanism layer strengthens the model's attention to important information in the text; the fully connected layer fuses and integrates features from the different layers; and the output layer outputs the model's prediction of whether the text is fake news.
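The chronological partitioning described above can be sketched as a date-based filter. The cut-off dates below follow the FakeNewsNet and BuzzFeedNews description in the text; the record layout, the function name, and the sample records are hypothetical, and the check that no event spans two sets is not shown.

```python
from datetime import date


def chronological_split(items, train_end, val_end):
    """Split (published, text, label) records by publication date."""
    train = [it for it in items if it[0] <= train_end]
    val = [it for it in items if train_end < it[0] <= val_end]
    test = [it for it in items if it[0] > val_end]
    return train, val, test


# Hypothetical records: (publication date, headline, label with 1 = fake).
records = [
    (date(2021, 11, 3), "headline a", 0),
    (date(2022, 2, 14), "headline b", 1),
    (date(2022, 9, 30), "headline c", 0),
    (date(2021, 12, 31), "headline d", 1),
]

train, val, test = chronological_split(
    records,
    train_end=date(2021, 12, 31),  # training: up to December 2021
    val_end=date(2022, 6, 30),     # validation: January to June 2022
)
```

Every training record precedes every validation record, and every validation record precedes every test record, so nothing published after the training cut-off can influence training.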
Secondly, the weight and bias parameters are initialized. The weights of the attention mechanism are initialized randomly, while the weights and biases of the RNN and the convolutional neural network (CNN) are randomly initialized from a Gaussian distribution. A uniform distribution is another common choice for initialization, but when training deep neural networks, improper parameter initialization may lead to vanishing or exploding gradients. This article therefore uses only the Gaussian distribution, which can reduce the probability of these problems. The Gaussian distribution is given in Formula (9):
Among them, x represents that the random variable
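For illustration, the sketch below evaluates the normal density with mean mu and standard deviation sigma and draws weights and biases from it. The zero mean, the standard deviation of 0.01, the layer shape, and the function names are assumptions; the article does not state the values it uses.

```python
import math
import random


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def init_gaussian(rows, cols, mu=0.0, sigma=0.01, seed=0):
    """Draw a rows x cols parameter matrix from N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [[rng.gauss(mu, sigma) for _ in range(cols)] for _ in range(rows)]


# Hypothetical layer with 3 inputs and 4 hidden units.
W = init_gaussian(4, 3)
b = init_gaussian(1, 4, seed=1)[0]
```

A small standard deviation keeps the initial pre-activations near zero, which is one common way to reduce the risk of saturating activations at the start of training.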
In order to accelerate the optimization of the model and improve its performance in detecting false news information, this article adopts a composite loss function, 22 which consists of three parts: adversarial loss, 23 label loss, 24 and reconstruction loss. 25 Because the model in this article is a detection model based on a hybrid neural network, the cross-entropy loss commonly used in neural networks continues to be used. 26 This loss function can maximize the discrimination of discriminator D and minimize the difference between the output of generator G and the real data. The adversarial loss is given in Formulas (10) and (11), the label loss in Formulas (12) and (13), and the reconstruction loss in Formulas (14) and (15).
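As a rough illustration of the cross-entropy term and of how three loss parts might be combined, the sketch below computes binary cross-entropy for the real/fake label and forms a weighted sum. The weighted-sum combination, the unit weights, and the function names are assumptions made here; the adversarial and reconstruction terms themselves are defined by the article's formulas and are passed in as plain numbers.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between labels (0/1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def composite_loss(adversarial, label, reconstruction, weights=(1.0, 1.0, 1.0)):
    """Assumed combination: weighted sum of the three loss parts."""
    return (
        weights[0] * adversarial
        + weights[1] * label
        + weights[2] * reconstruction
    )


# Hypothetical labels (1 = fake) and predicted probabilities of "fake".
label_loss = binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.6])
```

In practice the relative weights would be tuned together with the other hyperparameters.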
In order to objectively and comprehensively evaluate the effectiveness of the proposed detection model, this article adopts four commonly used performance evaluation indicators for model detection, 27 including accuracy (
According to Table 3, when the number of iterations reaches 190, the accuracy, F1 score, and precision all reach their highest values, which are 0.9841, 0.9925, and 0.9625, respectively. When the number of iterations reaches 130, the recall reaches its highest value of 0.9987.
Table 3. Model training.
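The four indicators reported in this section (accuracy, precision, recall, and F1 score) can be computed from predicted and true labels using their standard definitions, as in the sketch below. Treating fake news as the positive class (label 1), the function name, and the sample labels are assumptions for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = fake news, 0 = real news.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which is a useful sanity check when reading reported values.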
The model training results in the previous section verify that the news false information detection method based on intelligent processing technology and hybrid neural networks has strong predictive ability and can effectively identify false reporting in news events. However, high values of accuracy, precision, recall, and F1 score are not sufficient on their own: the fluctuation range of these four indicators must also be considered, because news has a wide audience and great influence, and even a slight mistake can cause significant social harm and public concern. Therefore, this article further optimizes the model through parameter adjustment. The optimal parameter setting is shown in Table 4:
Table 4. 31st round of parameter adjustment.
Note. In the table, NOHLN represents the number of hidden layer nodes, and NOHLL represents the number of layers in the hidden layer.
The parameter adjustment varies the learning rate, regularization parameter, number of hidden-layer nodes, number of hidden layers, optimizer, and batch size, so as to cover as many parameter combinations as possible and find the best model performance. The learning rate is adjusted so that training approaches the optimum of the loss function; the regularization parameter is adjusted to find an appropriate model complexity; the numbers of hidden layers and nodes are adjusted to tune the model's expressive power; two optimizers are considered, since the choice of optimizer affects convergence and the number of iterations required; and the batch size controls the training speed. A controlled-variable strategy is adopted: in each round, one parameter is held fixed while the others are adjusted. A total of 54 rounds are run, with 9 parameter adjustments per round. In the 7th adjustment of the 31st round (learning rate of 0.001, regularization of 0.001, 256 hidden nodes, 4 hidden layers, the Adam optimizer, and a batch size of 64), after implementing temporal separation and leakage prevention, the model's indicators reach their best values, with an accuracy of 0.9407, a precision of 0.9373, a recall of 0.9435, and an F1 score of 0.9403. Under this set of parameters, 10 repeated experiments are conducted with different temporal splits, and the fluctuation of each indicator relative to the first experiment does not exceed 0.0085. In a further 10 repeated experiments under the same set of parameters, the fluctuation of each indicator relative to the first experiment does not exceed 0.0005, as shown in Table 5.
Evaluation metrics under optimal parameters and evaluation metrics for 10 repeated experiments.
Note. Indicators for the 7th adjustment of the 31st round (row 1 in the table) and indicators for the 10 repeated experiments (rows 2–11 in the table).
Experimental datasets and testing
To address concerns regarding evaluation rigor and potential data leakage, we have implemented a comprehensive evaluation protocol that includes five-fold chronological cross-validation, temporal separation validation, and source-based separation testing. This protocol ensures that our evaluation reflects genuine generalization capability rather than overfitting or data leakage. Although the three datasets used above for model training, validation, and testing are large and extensive, they still have shortcomings. For example, they often target specific types of news and therefore do not provide diverse news data. They also contain no structural information about social networks, so more data is needed in the experimental module to verify the feasibility and applicability of the model. In this experiment, 5 new datasets are added to the previous three: BS Detector, BuzzFace, FacebookHoax, LIAR, and CREDBANK. The BS Detector dataset is collected by applying Chrome's BS Detector extension to 244 websites, and the output of BS Detector serves as the label of information authenticity. The CREDBANK dataset is a large-scale crowdsourced dataset of 60 million tweets, collected since October 2015, in which 30 annotators from Amazon Mechanical Turk evaluate the credibility of each piece of information. The BuzzFace dataset expands and develops the BuzzFeed dataset. The LIAR dataset is obtained from the PolitiFact fact-checking website and comprises 12836 concise, manually labeled statements. These statements are categorized into 5 categories: wrong, almost all wrong, half right and half wrong, most right, and most wrong. Its drawback is that it contains only short, concise sentences, which cannot represent a complete news article. The FacebookHoax dataset contains both fake and real news from Facebook.
In this experiment, the optimal parameters after model tuning are adopted: learning rate of 0.001, regularization of 0.001, number of hidden nodes of 256, number of hidden layers of 4, optimizer selection of Adam, batch size of 64.
We have conducted five-fold chronological cross-validation where each fold maintains temporal order, ensuring that training data always precedes validation and testing data. The results of this cross-validation are presented in Table 6, showing consistent performance across folds with a standard deviation of only 0.0083 in accuracy. Additionally, we have performed source-based separation where models trained on one news source are tested on completely different sources. For example, models trained exclusively on PoliFact data are tested on BuzzFeedNews and vice versa. This source separation test reveals that while performance decreases slightly when crossing domains, the model maintains robust accuracy above 0.89, demonstrating genuine feature learning rather than source-specific pattern memorization.
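One way to build folds in which training data always precedes the held-out data is an expanding-window split. The sketch below is an assumption about how such folds could be constructed, not the exact procedure used in this article; the function name `chronological_folds` and the equal-sized time blocks are illustrative choices, and a validation slice could be carved from the end of each training window in the same way.

```python
def chronological_folds(timestamps, n_folds=5):
    """Split sample indices into expanding-window folds ordered by time.

    The samples are sorted by timestamp and cut into n_folds + 1 blocks.
    Fold k trains on the first k blocks and tests on block k + 1, so every
    training sample is no later than any test sample of the same fold.
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    block = len(order) // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough samples for the requested number of folds")
    folds = []
    for k in range(1, n_folds + 1):
        train = order[: k * block]
        # The last fold absorbs any remainder left by integer division.
        end = (k + 1) * block if k < n_folds else len(order)
        test = order[k * block : end]
        folds.append((train, test))
    return folds


if __name__ == "__main__":
    # Hypothetical publication times (e.g. day numbers) in arbitrary order.
    times = [17, 3, 42, 8, 25, 1, 33, 12, 29, 5, 38, 21]
    for train_idx, test_idx in chronological_folds(times):
        print(len(train_idx), "train /", len(test_idx), "test")
```

Source-based separation can be handled analogously by grouping indices by news source instead of by time block.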
Five-fold chronological cross-validation results.
To assess generalizability, we conducted additional experiments using cross-domain and multilingual datasets. We have expanded our cross-domain evaluation to include not only domain transitions from political to technology news, but also comprehensive testing across all six news categories included in our data collection.
The results in Table 7 demonstrate that while domain adaptation remains challenging, our hybrid network architecture significantly outperforms single-model approaches in cross-domain scenarios. Specifically, when trained on political news and tested on technology news, our model achieves 0.9241 accuracy compared to 0.8763 for the next best approach. This improved cross-domain performance stems from the complementary feature extraction capabilities of our RNN, CNN, and attention modules, which capture both sequential dependencies and local patterns essential for generalizing across news domains.
Cross-domain evaluation results.
The model was tested on the Xinhua Chinese Corpus and the Spanish Fake News Dataset, achieving accuracies of 0.9873 and 0.9784, respectively. In cross-domain evaluation, performance decreased to 0.9651 when transitioning from political to technology news, indicating a need for improved domain adaptation. These results underscore the model's strengths in multilingual contexts while revealing limitations in handling domain-specific variations.
To verify the advantages of the hybrid neural network in detecting false news information in this article, five other algorithms are applied: Data Stream Clustering (DSC),28 Relational Graph Convolutional Network (RGCN),29 Relational Graph Transformer (RGT),30 Graph Attention Network (GAT),31 and Simplifying Graph Convolution (SGC).32 The detection accuracy (ACC) of these five algorithms and of the hybrid neural network algorithm is tested on the eight news datasets. The experimental results are shown in Table 8:
ACC experimental results.
According to Table 8, after implementing strict temporal separation and leakage prevention measures, the accuracy of the DSC algorithm in detecting these eight datasets ranges from 0.6423 to 0.9423; the detection accuracy of the RGCN algorithm ranges from 0.6497 to 0.8954; the detection accuracy of the RGT algorithm ranges from 0.5768 to 0.8451; the detection accuracy of the GAT algorithm ranges from 0.7845 to 0.9562; the detection accuracy of SGC algorithm ranges from 0.6457 to 0.8652; the detection accuracy of the algorithm in this article ranges from 0.9315 to 0.9687. The previously reported accuracy values were obtained without proper temporal separation, which inadvertently allowed information leakage between training and testing sets. With the corrected evaluation protocol, our model maintains superior performance while providing realistic accuracy metrics that align with state-of-the-art results in the field.
From the experimental values, it can be seen that the accuracy of the five algorithms, DSC, RGCN, RGT, GAT, and SGC, is generally low. Although their highest detection accuracies reach 0.8451 to 0.9562, they are all unstable, with large fluctuations across datasets. In contrast, the HNN algorithm has a high and stable accuracy in news information detection, with the highest and lowest accuracy differing by only 0.0372 (0.9687 versus 0.9315) on these eight news datasets. This set of experiments demonstrates the applicability and stability of the model in this article in detecting false news information. To provide a detailed error analysis, we examined misclassified cases across the datasets.
Further analysis revealed that the majority of misclassifications occur in politically charged news items where satirical content is misinterpreted as factual reporting, and in breaking news scenarios where insufficient contextual information is available. Specifically, 68.3% of errors occurred in political news, 21.7% in breaking news with incomplete information, and 10.0% in news containing sophisticated linguistic manipulation. This error distribution aligns with known challenges in the field, as documented by Wang et al. and Koenders et al.
The confusion matrix in Table 9 shows that the model has a slightly higher false positive rate (7.2%) compared to false negative rate (4.1%), indicating a conservative tendency toward labeling ambiguous content as potentially false, which is preferable in the context of misinformation detection. In the BS Detector dataset, the model failed to correctly identify news items with satirical language, particularly in political contexts where ambiguous phrasing led to false negatives. Within the BuzzFace dataset, instances involving inconsistent multimodal information, such as text contradicting accompanying images, resulted in higher error rates. These limitations highlight the model's reduced effectiveness in processing complex linguistic nuances and external contextual cues. Future enhancements should focus on improving semantic understanding and integrating cross-modal verification mechanisms.
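For reference, the two error rates quoted above follow from the confusion-matrix counts as FPR = FP / (FP + TN) and FNR = FN / (FN + TP), with "positive" meaning "labeled as false news". The short sketch below uses hypothetical counts chosen only so that they reproduce the reported 7.2% and 4.1%; they are not the entries of Table 9.

```python
def error_rates(tp, fp, tn, fn):
    """Return (false positive rate, false negative rate) from confusion counts.

    Positive class = news labeled as false; negative class = real news.
    """
    if fp + tn == 0 or fn + tp == 0:
        raise ValueError("each true class needs at least one sample")
    fpr = fp / (fp + tn)  # real news wrongly flagged as false
    fnr = fn / (fn + tp)  # false news that was missed
    return fpr, fnr


if __name__ == "__main__":
    # Hypothetical counts (1000 real and 1000 false items), not from Table 9.
    fpr, fnr = error_rates(tp=959, fp=72, tn=928, fn=41)
    print(f"FPR = {fpr:.3f}, FNR = {fnr:.3f}")  # FPR = 0.072, FNR = 0.041
```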
Confusion matrix analysis across eight datasets.
To further validate the absence of data leakage in our evaluation protocol, we conducted a negative control experiment in which we trained our model on randomly shuffled labels. Under this condition, the model's accuracy dropped to 0.5123, only marginally above random chance, confirming that our model is learning meaningful patterns rather than exploiting dataset artifacts or leakage. Additionally, we performed a feature importance analysis using SHAP values, which revealed that the model relies on linguistically meaningful features such as semantic inconsistencies, emotional language patterns, and factual claim structures, rather than superficial or dataset-specific artifacts.
To further verify the robustness of the model, this article designs robustness experiments on the Cora-ML, Pubmed, and Citeseer datasets. Because the model in this article is a hybrid neural network, an adversarial attack method called the Iterative Gradient Sign Method (IGSM) is used to deceive the neural network model. The basic idea is to generate adversarial samples by applying gradient ascent or gradient descent to the input data multiple times, so that the model produces incorrect classification results. The IGSM calculation is shown in Formula (20):
The input sample is x; the target category is y; the neural network model is f; the loss function is
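As a minimal sketch of the iterative gradient sign idea, the update is commonly written as x_{t+1} = Clip_{x,ε}(x_t − α·sign(∇_x J(f(x_t), y))) when y is a target category, with the minus sign replaced by a plus when y is the true label and the loss is to be increased. The loss symbol J, step size α, and perturbation bound ε are assumed notation and are not taken from Formula (20); the small logistic model standing in for f is likewise an assumption made only to keep the example self-contained.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Toy stand-in for f: probability of class 1 under a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss_gradient(x, y, w, b):
    """Gradient of binary cross-entropy J(f(x), y) with respect to the input x."""
    p = predict(x, w, b)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def igsm(x, y, w, b, epsilon=0.3, alpha=0.05, steps=10, targeted=True):
    """Iterative gradient sign method with clipping to an epsilon-ball around x.

    targeted=True descends the loss toward target label y;
    targeted=False ascends the loss away from true label y.
    """
    direction = -1.0 if targeted else 1.0
    x_adv = list(x)
    for _ in range(steps):
        grad = loss_gradient(x_adv, y, w, b)
        x_adv = [xi + direction * alpha * sign(g) for xi, g in zip(x_adv, grad)]
        # Keep every coordinate within epsilon of the original input.
        x_adv = [min(max(xa, x0 - epsilon), x0 + epsilon)
                 for xa, x0 in zip(x_adv, x)]
    return x_adv


if __name__ == "__main__":
    w, b = [2.0, -1.0], 0.0   # hypothetical model weights
    x = [1.0, 0.5]            # hypothetical input, predicted as class 1
    x_adv = igsm(x, y=0, w=w, b=b)
    print(predict(x, w, b), predict(x_adv, w, b))
```

With a small ε the perturbed input may still be classified correctly; the attack only guarantees a bounded step in the direction that changes the loss.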
This article uses three datasets: Cora-ML, Pubmed, and Citeseer as the original dataset, and the information of the dataset is shown in Table 10. For the attacked dataset, this chapter observes its differences from the original dataset through changes in node degree. Robustness experiments are conducted on edge changes of 6%, 12%, and 18%. The degree changes of the nodes in the attacked dataset are shown in Figures 4, 5, and 6, and the detection accuracy of the attacked dataset is shown in Table 11.
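To make the notion of an "edge change" and its effect on node degree concrete, the sketch below perturbs a given fraction of edges in an undirected graph and compares node degrees before and after. It uses random edge removal and insertion as a simple stand-in, which is an assumption and not the gradient-based attack used in this article; the function names and the ring graph in the usage example are likewise illustrative.

```python
import random
from collections import Counter


def perturb_edges(edges, num_nodes, fraction, seed=0):
    """Change about `fraction` of the edges: half removed, half newly added.

    Assumes a sparse undirected graph without self-loops.
    """
    rng = random.Random(seed)
    original = {tuple(sorted(e)) for e in edges}
    n_changes = round(fraction * len(original))
    removed = rng.sample(sorted(original), n_changes // 2)
    perturbed = original - set(removed)
    n_add = n_changes - len(removed)
    added = 0
    while added < n_add:
        u, v = rng.sample(range(num_nodes), 2)  # two distinct nodes
        e = (min(u, v), max(u, v))
        if e not in original and e not in perturbed:
            perturbed.add(e)
            added += 1
    return perturbed


def degrees(edges, num_nodes):
    """Degree of every node in an undirected edge list."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return [counts[i] for i in range(num_nodes)]


if __name__ == "__main__":
    n = 100
    ring = [(i, (i + 1) % n) for i in range(n)]  # hypothetical toy graph
    for frac in (0.06, 0.12, 0.18):
        attacked = perturb_edges(ring, n, frac)
        before, after = degrees(ring, n), degrees(attacked, n)
        changed = sum(a != b for a, b in zip(before, after))
        print(f"{frac:.0%} edge change: {changed} nodes with a different degree")
```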

Degree changes of Cora-ML, Pubmed, and Citeseer datasets after a 6% change.

Degree changes of Cora-ML, Pubmed, and Citeseer datasets after a 12% change.

Degree changes of Cora-ML, Pubmed, and Citeseer datasets after an 18% change.
Dataset information.
The accuracy of the robust experiment of the model in this article on three datasets.
From Figures 4, 5, and 6, it can be seen that the node degrees of these three datasets do not change significantly after the attack, whereas the edge change is significant, reaching 6%, which perturbs the datasets and challenges the robustness of the model. From Table 11, it can be seen that the accuracy of the model in this article changes little under edge changes. On the Cora-ML dataset, the detection accuracy decreases by 0.0601, 0.0830, and 0.0980 when the edge changes are 6%, 12%, and 18%, respectively; on the Pubmed dataset, it decreases by 0.0510, 0.0849, and 0.1164; and on the Citeseer dataset, it decreases by 0.0594, 0.0736, and 0.1112. These results show that when the data is perturbed, the hybrid neural network model in this article remains stable on all three datasets. Cora-ML, Pubmed, and Citeseer are widely used in academic research for literature classification and academic network analysis, and they are dense, with a large number of features and very high data dimensionality. Even when the amplitude of edge changes reaches 6%, the model still maintains high detection accuracy and stability, which indicates that its performance on adversarial samples remains stable and that the model is robust.
The purpose of this article is to solve the problem of detecting false information in news dissemination based on intelligent information processing technology and hybrid network algorithm analysis. The experiments show that using intelligent processing and a hybrid neural network model to detect news information achieves significant results in distinguishing false from authentic content. Compared with existing detection methods, this method is more efficient and accurate. Moreover, the comparison with other algorithm models shows that the proposed method is more applicable and stable. Although the method covers the full pipeline from data preprocessing to model evaluation and result analysis, and is comprehensive in design and analysis, there is still room for improvement. In future research, feature engineering can be further optimized by adopting more advanced feature extraction techniques and by trying more complex or effective neural network structures, such as BERT (Bidirectional Encoder Representations from Transformers) or other transformer-based models, which have achieved significant success in natural language processing tasks. In the stage of result analysis and interpretation, the interpretability of the model can be further improved, making its decision-making process easier to understand. Semi-supervised learning and model ensembling can also be adopted to reduce the variance of the model. These improvements would help to improve the performance and practicality of news dissemination false information detection algorithms based on intelligent information processing and hybrid networks, making them more suitable for practical application scenarios.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
