Abstract
Understanding consumer behavior is vital for businesses seeking to personalize services, optimize marketing strategies, and improve customer retention. However, analyzing such behavior at scale is challenging because of the volume, velocity, and variety of the data and the need for accurate, interpretable prediction models. Traditional classification methods often fall short on large-scale, high-dimensional behavioral datasets, leading to problems with scalability, accuracy, and real-time processing. To address these limitations, this paper introduces a novel framework for consumer behavior analysis, Improved Fuzzy Classification with Bagging and MapReduce Coordination (IFCBMC), designed specifically for big data environments. The primary objectives of this research are: (1) to develop a scalable classification model suitable for distributed data processing, (2) to enhance prediction accuracy through fuzzy rule-based learning, and (3) to evaluate the robustness of the proposed model against existing state-of-the-art classifiers. The process begins with data preprocessing, including cleaning and modified normalization, after which the data are distributed across a MapReduce architecture to manage scale and speed. Features extracted from multiple data partitions (mappers) are aggregated and processed by an enhanced fuzzy rule-based classification model. To improve prediction robustness, a bagging ensemble strategy is applied: multiple classifiers are trained on different data subsets, and the best-performing models are randomly selected and merged during the reduce phase. The proposed IFCBMC method outperforms all compared models, achieving the highest accuracy of 0.960, well above LSTM (0.862), LINKNET (0.866), SQUEEZENET (0.858), SVM (0.863), DCNN (0.860), Bi-GRU (0.854), RNN (0.861), and DNN (0.860).
Introduction
Recent developments in big data technologies and the spread of digital platforms have substantially changed how companies anticipate and understand consumer behavior. Large-scale datasets can yield valuable insights, but exploiting them requires advanced data processing and analytical methods. A consumer is an individual or group that is not acting in a commercial or business capacity but orders or uses purchased items, goods, or products mainly for social, family, or domestic purposes (Anand et al., 2021; Dengler & Prüfer, 2021). Because consumers are central to any entrepreneur's or company's activity, it is crucial to address each consumer's needs and demands individually. This is the core idea behind consumer behavior analysis (Li et al., 2023; Reyes-Menendez et al., 2022).
By conducting a consumer behavior study, an organization can learn how consumers choose a product or service (Simona-Vasilica et al., 2021). Such studies require a combination of quantitative and qualitative information from consumer surveys, customer interviews, and observation of customer activity in stores and online (Hung & Van-Nam, 2023; Shiu-Li & Yi-Hsien, 2022). Because there is no personal contact, buyers are often uncertain about product quality when purchasing from a digital retailer (Abedin et al., 2023). Inexperienced buyers can reduce their search costs and their uncertainty about product quality by drawing on past review data (Ulph et al., 2023). Organizations can also use user-generated feedback to design, build, and market new goods and to gauge specific customer needs, wants, satisfaction, and concerns (Sundararaj & Rejeesh, 2021).
By studying customer behavior, organizations can target specific demographics with their advertising, increase customer loyalty, and spot new trends. This information also helps organizations stay ahead of the competition and adapt to shifting consumer preferences. Consumer behavior analysis is now widely used across industries, including electronic commerce (Li et al., 2020; Mou & Benyoucef, 2021; Shiu-Li & Yi-Hsien, 2022; Tassell & Aurisicchio, 2023), marketing (Raj Kannan et al., 2021), social networking sites (Mikalef et al., 2023; Sharifi & Shokouhyar, 2021; Sundararaj & Rejeesh, 2021), tourism (Li et al., 2022), business (Tao & Zhou, 2023), finance (Abedin et al., 2023), and utilities (Simona-Vasilica et al., 2021).
Decision-making often relies on consumer behavior analysis models that are essentially one-dimensional (Taghikhah et al., 2021). These models consider only a few factors in specific circumstances and assume that individuals remain the same, ignoring differences among customers and goods, values or interests, age, and gender (Xue et al., 2022; Zaremohzzabieh et al., 2021). Consumer behavior studies are important for businesses because they help them understand their target market, identify customer needs, and develop effective marketing strategies that influence purchase decisions (Bilal et al., 2020; Stangherlin et al., 2023). Previous research frequently concentrates on either conventional statistical techniques or basic applications of big data frameworks, without fully exploiting sophisticated machine learning models. Furthermore, few studies in customer behavior research have examined the combination of MapReduce for distributed processing with fuzzy rule-based models for classification. This study is motivated by the need to fill these gaps and puts forward a novel IFCBMC strategy. It aims to improve the scalability and accuracy of customer behavior prediction models by exploiting the scalability of the MapReduce framework and integrating fuzzy logic with ensemble learning through bagging-based classifiers. The major contributions of this IFCBMC-based consumer behavior analysis model are summarized below:
Proposing a modified normalization in the preprocessing phase, in which min-max normalization is combined with Tanh normalization to enhance the input data, reduce the effect of redundant and extreme values, and ensure that the data are ready for further processing.
Proposing a MapReduce framework to handle big data, in which improved entropy and correlation-based features are extracted in the mapper phase to strengthen uncertainty modeling in big data analysis. The improved entropy combines modified Deng entropy with belief entropy.
Proposing an improved fuzzy rule-based classification model that incorporates data bagging and mapping criteria.
Research Objectives
To investigate the preprocessing phase of big data, focusing on data cleaning and modified normalization techniques.
To utilize the MapReduce framework to handle and process large volumes of data efficiently.
To extract and combine features from distributed data sources to enhance classification accuracy.
Research Questions
How effective is the IFCBMC approach in analyzing customer behavior compared to traditional methods?
What are the impacts of data cleaning and modified normalization on the quality of big data used for customer behavior analysis?
How does the MapReduce framework contribute to managing and processing big data in the context of customer behavior analysis?
The remainder of the paper is organized as follows. Section 2 reviews related work and states the problem based on existing models. Section 3 presents the proposed approach for predicting consumer behavior using an improved fuzzy rule-based classification model. Section 4 reports the experimental results and analysis. Finally, Section 5 concludes the paper.
Literature Review
In 2019, Mirashk et al. (2019) introduced a novel approach that uses big data tools and techniques based on networked computing methodologies. Models for forecasting consumer behavior had mostly been successful only with small sets of data and parameters. In this study, RNNs were used to analyze transactional data and build a model for forecasting POS user activity, since RNNs can model high-dimensional and time-series data. The model was validated using real data from a private Iranian bank, and its predictions were about 87% accurate. The results were compared with those of earlier methods, which the model outperformed.
In 2020, Raj Kannan et al. (2021) developed a consumer behavior evaluation model based on mouse movement patterns. These patterns supported the data gathering and mining used to forecast client behavior in the online marketplace. Multi-layer neural network classifiers and the decision tree algorithm, an efficient data mining technique, were used for the behavioral evaluation, enabling precise analysis and determination of customer behavior. Several standard datasets were used for testing and assessment, and the findings showed that the model provided better analysis than previous research.
In 2020, Argyris et al. (2020) proposed a new conceptual framework, VCSI, based on the Similarity-Attraction Model and contextualized in the social influence literature. Using VCSI, they identified how influencers build close relationships with followers by using visual congruence to signal shared interests in a particular field. This implied affinity strengthened the positive effect of visual consistency on followers' brand engagement. DL techniques were deployed to group each image automatically, and social network analytics were then used to reveal previously undiscovered relationships between visual components and brand interaction. The test results provided evidence for VCSI, advancing theory in the rapidly developing area of multimodal information.
In 2020, Tao and Zhou (2023) explored forecasting business closure from online customer reviews and validated the forecasting models in the food service industry, which has an unusually high attrition rate. The proposed method included several new elements: time-series analysis and DL methods, a novel triple word embedding model for text representation, and the collection of data from online reviews through hybrid classification methods. An evaluation on Yelp online reviews showed improved performance in predicting business closure, indicating that online reviews are a reliable source of information for this purpose.
In 2021, Chaudhary et al. (2021) used big data technology to process and analyze data for forecasting customer behavior on social media platforms. They examined customer behavior on these platforms against several metrics and requirements, and investigated consumer attitudes toward and perceptions of the social media network. To obtain high-quality results, they employed several data preparation strategies to detect errors, noise, outliers, and duplicate entries. They then built ML-based computational models to forecast consumer behavior on the platform.
In 2021, Jeong (2021) examined customers' DFs for programmable thermostats to recognize and forecast unexpected customer preferences, using a dataset of 141 million Amazon reviews. The study proposed methods for extracting PCDs from review text, predicting individual customers' satisfaction before a purchase decision, predicting consumers' sentiment before they write a review, and classifying consumers' sentiment toward a particular PCD using context-dependent word embeddings and DL models. The proposed method was practical, scalable, and interpretable for identifying significant factors that influence customer evaluations of products in a specific sector, and businesses could apply it to build consumer-focused marketing approaches.
In 2022, Yuan (2022) divided research on consumer purchasing behavior features into three groups: data-driven, theory-driven, and experience-driven. The study proposed combining consumption behavior characteristics such as happiness and loyalty, together with an analysis algorithm based on consumer consumption behavior. A comparative analysis showed that the data-driven strategy was the most successful for analyzing the purchasing patterns of online customers. A decision-support knowledge base was used to provide different service plans for clients at different evaluation levels. A genetic algorithm was used to optimize samples, increasing sample classification accuracy and improving the output function, and a deep neural network was proposed for classifying samples of customer transaction data. The approach helps businesses understand personalization and evaluate consumer behavior for consumption and production planning.
In 2022, Li et al. (2022) investigated an ML technique that uses big data from social media to predict tourism demand. To forecast tourist arrivals, they extracted the key topics discussed on Twitter and computed the mean sentiment score of each topic as an estimate of people's overall feelings about it. The analysis identified important social media discussions that could be used to forecast visitor numbers to Sydney, with theoretical and practical implications for destination marketing and tourism research.
In 2022, Meena and Kumar (2022) studied the efficiency of online food delivery (OFD) companies and customer expectations during the COVID-19 pandemic using social media data. The findings showed that customers in India were more concerned with social responsibility, whereas customers in the US were more concerned with money-related issues. Indian consumers were generally more satisfied with OFD companies during the pandemic than American consumers. The authors also found that factors such as the identity of the OFD company, market size, country, and COVID-19 wave significantly moderated consumer attitudes. The findings offer several managerial insights.
In 2023, Abedin et al. (2023) analyzed consumer behavior and activity in the banking industry, using a variety of feature transformation approaches to convert behavioral information into different data structures. Using a real bank dataset of 24,000 active and inactive customers, the study offered a fresh perspective on the role of feature engineering in bank customer classification. It presents a thorough, systematic analysis of bank client behavior modeling that can help financial service providers take the steps needed to increase customer activity.
In 2024, Sitharamulu et al. (2024) proposed CSDHAP, a hybrid classifier model that uses the MapReduce framework to hybridize the deer hunting optimization (DHO) and sunflower optimization (SFO) algorithms with an adaptive pollination rate. The approach was compared with existing methods on several criteria, and the authors report that it outperformed every traditional model.
In 2024, Navin and Krishnan (2024) presented a novel mapping tool model for evaluating electronic health records and providing evidence-based decision support to healthcare providers. The work analyzes health information from hospital databases, focusing on indicators taken from routine health examinations. The core component of this strategy is a fuzzy rule-based classifier system. The features and limitations of existing methods are summarized in Table 1.
Features and Limitations of Reviewed Works.
A review of existing works on consumer behavior analytics reveals a growing reliance on machine learning (ML) and deep learning (DL) approaches, such as RNNs, CNN-LSTM, fuzzy logic, optimization algorithms, and hybrid models. These methods have been applied to various domains such as banking, retail, tourism, and social media analytics. However, most suffer from common drawbacks such as limited scalability, poor generalizability across industries, difficulty in handling imbalanced datasets, and lack of interpretability. Some methods effectively use ensemble learning or sentiment analysis, while others rely on single-platform datasets, which restrict broader applicability. Few existing works integrate distributed frameworks like MapReduce with explainable models such as fuzzy rule-based systems. These gaps underscore the need for a scalable, interpretable, and generalizable framework like the proposed IFCBMC model.
Proposed Framework for Consumer Behavior Analytics via Fuzzy Rule-Based Classification
Consumer choices are influenced by a variety of circumstances, which lead to planned, impulsive, and unforeseen purchases. Organizations and marketing platforms observe consumer behavior to analyze how consumers engage with goods. This study uses a fuzzy rule-based classification model with data mapping and bagging techniques within a MapReduce scheme to deliver consumer behavior insights. The implementation has three phases, preprocessing, entropy-based feature extraction, and fuzzy ensemble classification (prediction), all within a distributed MapReduce environment. First, the input big data D, which describe customer attributes and behavior such as Income, Kidhome, and Teenhome, are preprocessed through data cleaning and modified normalization. The MapReduce framework is then used to handle the big data, and features are extracted from the normalized, preprocessed data. The prediction model is an enhanced fuzzy rule-based classification model that applies a mapping function to the testing dataset and a bagging technique to the training dataset to improve efficiency. Figure 1 displays the overall prediction model.

Overview of IFCBMC Framework.
Data preprocessing is a significant phase in data mining. It prepares the data for analysis by cleansing, transforming, and integrating it, with the aim of improving the data's quality and suitability for the specific data mining task. In this work, the preprocessing phase consists of data cleaning and modified normalization, explained as follows.
Data Cleaning
First, the input data D are subjected to data cleaning (also known as data cleansing), which identifies and corrects errors or inconsistencies in the data, such as missing values, outliers, and duplicates. The goal of data cleaning is to resolve these quality issues in the simplest way and to convert the data into a standardized numeric form that is easier to manipulate. Once cleaning is complete, the cleaned data are passed to the modified normalization step.
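As a minimal sketch of this cleaning step, assuming the attributes have already been encoded as numeric columns, the NumPy code below imputes missing values with the column median, clips outliers with the interquartile-range (IQR) rule, and then removes duplicate rows. The median imputation, the 1.5 × IQR clipping factor, and the function name `clean` are illustrative assumptions; the paper does not specify which cleaning rules IFCBMC uses.

```python
import numpy as np


def clean(x: np.ndarray, iqr_factor: float = 1.5) -> np.ndarray:
    """Impute missing values, clip outliers, then drop duplicate rows (assumed rules)."""
    x = np.array(x, dtype=float)  # copy so the caller's array is left untouched

    # 1. Missing values: replace each NaN with its column median.
    medians = np.nanmedian(x, axis=0)
    rows, cols = np.where(np.isnan(x))
    x[rows, cols] = medians[cols]

    # 2. Outliers: clip each column to [Q1 - k*IQR, Q3 + k*IQR].
    q1, q3 = np.percentile(x, [25, 75], axis=0)
    iqr = q3 - q1
    x = np.clip(x, q1 - iqr_factor * iqr, q3 + iqr_factor * iqr)

    # 3. Duplicates: keep the first occurrence of each row, in the original order.
    _, first_idx = np.unique(x, axis=0, return_index=True)
    return x[np.sort(first_idx)]


raw = np.array([[1, 100], [1, 100], [2, np.nan], [3, 200], [1000, 150]])
print(clean(raw))
```

Imputation runs before deduplication here because rows containing NaN never compare equal, so duplicates with missing values would otherwise survive.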
Modified Normalization
After cleaning has removed noise, inconsistencies, and missing values, a modified normalization process is applied to standardize the data for machine learning tasks. This approach combines min-max normalization with the Tanh estimator to improve data scaling and stability.
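(i)
The min-max normalization (Horng et al., 2009) is a linear transformation that preserves the relationships among the original data values. It maps each attribute into the range [0, 1], as expressed in Equation (1).
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$
where $x$ is an original attribute value, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of that attribute, and $x'$ is the normalized value.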
(ii)
The output of the min-max normalization is then passed to the Tanh estimator (Prasetyo et al., 2020), which is regarded as a more effective and robust normalization method. It produces values ranging from −1 to 1. The normalized value
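One way to sketch the modified normalization, under stated assumptions, is shown below. Min-max scaling uses the standard form (x − min)/(max − min). For the tanh stage the code uses a centered tanh estimator, tanh(c · (x − μ)/σ), which stays within (−1, 1) as the text describes. The plain column mean and standard deviation (rather than the robust Hampel estimators of the classic tanh estimator), the constant c = 0.01, and the function names are assumptions, since the paper's Equation (2) is not reproduced here. The classic tanh estimator additionally applies 0.5 · (· + 1) to map values into (0, 1).

```python
import numpy as np


def min_max(x: np.ndarray) -> np.ndarray:
    """Column-wise min-max scaling into [0, 1]; constant columns map to 0."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (x - lo) / span


def tanh_estimator(x: np.ndarray, c: float = 0.01) -> np.ndarray:
    """Centered tanh estimator tanh(c * z), bounded in (-1, 1).

    Assumption: z is computed with the plain column mean and standard deviation.
    """
    mu, sigma = x.mean(axis=0), x.std(axis=0)
    sigma = np.where(sigma > 0, sigma, 1.0)
    return np.tanh(c * (x - mu) / sigma)


def modified_normalize(x: np.ndarray, c: float = 0.01) -> np.ndarray:
    """Min-max scaling followed by the tanh estimator (hypothetical composition)."""
    return tanh_estimator(min_max(x), c)


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(modified_normalize(X))
```

With c = 0.01, typical z-scores are compressed into a narrow band around 0; a larger c spreads the outputs further toward ±1. The value here simply follows the classic estimator.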
This study processes the large-scale customer behavior data using the MapReduce architecture. Following the MapReduce framework (del Rio et al., 2015), the mapper phase performs feature extraction, extracting the improved entropy and correlation features, and the reducer phase combines the features gathered from the mappers. The features extracted through the MapReduce framework are illustrated in Figure 2.

MapReduce Framework for Feature Extraction. This Figure Depicts How Consumer Data are Split Across Mapper Nodes for Entropy and Correlation Feature Extraction and Then Aggregated by Reducers. This Process Enables Scalable and Parallel Processing for High-Dimensional Big Data.
The map and reduce functions are the two main components of the MapReduce model. The map function first processes the input data and produces intermediate results; in this work, the mapper performs feature extraction. Feature extraction is carried out on the normalized, preprocessed data,
In the map function, the input data are split across n mappers. The map function takes the normalized training data as input and outputs a set of intermediate data, from which the improved entropy and correlation features are extracted. The improved entropy captures nuances in feature distributions more effectively, improving the discriminative power of the extracted features. Features with high correlation coefficients are prioritized in subsequent analysis to improve model interpretability and predictive accuracy.
Improved Entropy
In information theory, entropy is the expected amount of information (Trovati et al., 2020) over all possible outcomes of a random variable. In this paper, an improved entropy-based feature is proposed that combines modified Deng entropy and belief entropy (Zhao et al., 2019).
Deng entropy measures the degree of uncertainty in a body of evidence more effectively than many other uncertainty measures. This work uses a modified Deng entropy, which is more effective because it considers all possible propositions more thoroughly. In Equation (3), M denotes the mass function,
The belief entropy accounts for additional information in the body of evidence (BOE), such as the scale of the frame of discernment (FOD), indicated as
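To make the two uncertainty measures concrete, the sketch below computes the standard Deng entropy, E_d(m) = −Σ m(A) log2[m(A) / (2^|A| − 1)] over the focal elements A with m(A) > 0, and a scale-aware belief entropy that multiplies the term inside the logarithm by exp((|A| − 1)/|X|), where |X| is the size of the FOD. Both are assumptions standing in for the paper's own definitions. This section does not reproduce the modified Deng entropy of Equation (3), the exact belief entropy, the rule that combines them into the improved entropy, or how a mapper builds a mass function from its partition. The function names are hypothetical.

```python
import math
from typing import Dict, FrozenSet

# A mass function: focal element (set of hypotheses) -> mass; masses sum to 1.
Mass = Dict[FrozenSet[str], float]


def deng_entropy(m: Mass) -> float:
    """Standard Deng entropy: -sum m(A) * log2(m(A) / (2^|A| - 1))."""
    return -sum(v * math.log2(v / (2 ** len(a) - 1)) for a, v in m.items() if v > 0)


def scale_aware_belief_entropy(m: Mass, frame_size: int) -> float:
    """Deng-style entropy with an extra factor exp((|A| - 1) / |X|) inside the log.

    frame_size is |X|, the number of hypotheses in the frame of discernment.
    This particular variant is an assumption, not the paper's exact definition.
    """
    total = 0.0
    for a, v in m.items():
        if v > 0:
            scale = math.exp((len(a) - 1) / frame_size)
            total -= v * math.log2(v / (2 ** len(a) - 1) * scale)
    return total


m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.5}
print(deng_entropy(m), scale_aware_belief_entropy(m, frame_size=2))
```

For Bayesian mass functions (all focal elements are singletons), both measures reduce to Shannon entropy. Mass on multi-element sets raises Deng entropy, and the scale factor reduces that increase, more strongly when |A| is large relative to the frame size.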
The strength of the linear relationship between two variables can be assessed through correlation analysis using the Pearson correlation coefficient. The coefficient cannot be computed directly over the entire population of values of the two index variables. Instead, samples representing the two index factors are first collected, and the sample Pearson correlation coefficient computed from them is used to estimate the population correlation coefficient between the two index factors.
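The Pearson correlation coefficient is determined as shown in Equation (6):
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \tag{6}$$
where $x_i$ and $y_i$ are the $i$-th sample values of the two variables, $\bar{x}$ and $\bar{y}$ are their sample means, and $n$ is the number of samples. The coefficient lies in [−1, 1], and a value of 0 indicates no linear correlation between the two variables.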
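To make the Pearson coefficient and the correlation-based prioritization concrete, the sketch below computes the sample coefficient and keeps the k features most strongly correlated (in absolute value) with a target variable. Using a class label or behavior score as the target, and a fixed top-k cutoff, are assumptions for illustration; the text states only that highly correlated features are prioritized. The function names are hypothetical.

```python
from typing import List

import numpy as np


def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Sample Pearson correlation coefficient; returns 0.0 if either input is constant."""
    dx, dy = x - x.mean(), y - y.mean()
    denom = np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    return float((dx * dy).sum() / denom) if denom > 0 else 0.0


def rank_features(X: np.ndarray, y: np.ndarray, k: int) -> List[int]:
    """Indices of the k columns of X with the largest |r| against the target y."""
    scores = [abs(pearson(X[:, j], y)) for j in range(X.shape[1])]
    return sorted(range(X.shape[1]), key=lambda j: scores[j], reverse=True)[:k]


rng = np.random.default_rng(0)
y = rng.normal(size=200)
X = np.column_stack([rng.normal(size=200), 2 * y + 0.1 * rng.normal(size=200), -y])
print(rank_features(X, y, k=2))
```

Ranking by |r| treats strong negative correlation as equally informative as strong positive correlation, which suits a relevance score.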
In the reduce function, the features from the map function are shuffled, reduced, and finally combined into the final extracted feature set. The classifier parameters are listed in Table 2. Hence, the complete extracted feature set is represented as
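Correlation features can be merged across mappers only if each mapper emits statistics that combine additively. The sketch below is a hypothetical illustration, not the paper's implementation. Each mapper emits its row count, column sums, and cross-product matrix, and the reducer adds these and derives the Pearson correlation matrix, which matches a single-machine computation. Python's built-in `map` and `functools.reduce` stand in for a real MapReduce runtime such as Hadoop, and the function names are assumptions.

```python
from functools import reduce
from typing import Tuple

import numpy as np

Stats = Tuple[int, np.ndarray, np.ndarray]  # (row count, column sums, X^T X)


def mapper(chunk: np.ndarray) -> Stats:
    """Map step: summarize one data partition without shipping its rows."""
    return chunk.shape[0], chunk.sum(axis=0), chunk.T @ chunk


def combine(a: Stats, b: Stats) -> Stats:
    """Reduce step: partition statistics add up element-wise."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def correlation_from_stats(stats: Stats) -> np.ndarray:
    """Turn the merged statistics into the full Pearson correlation matrix."""
    n, s, ss = stats
    mean = s / n
    cov = ss / n - np.outer(mean, mean)  # population covariance; the 1/n cancels in r
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)


def mapreduce_correlation(data: np.ndarray, n_mappers: int) -> np.ndarray:
    chunks = np.array_split(data, n_mappers)  # one partition per mapper
    partials = map(mapper, chunks)            # map phase (sequential stand-in)
    return correlation_from_stats(reduce(combine, partials))  # reduce phase


rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 4))
data[:, 3] = data[:, 0] + 0.5 * data[:, 1]
print(np.round(mapreduce_correlation(data, n_mappers=5), 3))
```

The same pattern (emit mergeable partial statistics, then finalize in the reducer) applies to any feature whose computation decomposes over rows. For example, entropy estimates can be merged by summing per-partition histogram counts before normalizing.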
Parameter of Classifiers.
In the prediction phase, the extracted features F are fed to the improved fuzzy rule-based classification model (Trawinski et al., 2011), which adopts improved bagging and mapping criteria to enhance prediction by accounting for consumer behavior. This subsection specifies the fuzzy rule-based classification algorithm used in IFCBMC. To improve robustness and generalization, a bagging approach is employed in which the dataset is randomly sampled with replacement into multiple bags. Each bag serves as the training set for a distinct fuzzy rule-based classifier, allowing diverse patterns in customer behavior to be captured. In the ensemble phase, the bagged classifiers are combined through a voting mechanism, which improves predictive accuracy and reduces overfitting. Mappers extract features from distributed datasets and pass them to reducers, where the features are combined and aggregated. Fuzzy logic is well suited to modeling human behavior because it supports approximate reasoning and handles ambiguity effectively, which makes it appropriate for interpreting variables such as purchase intent, income brackets, and household dynamics. The MapReduce framework ensures scalability and efficiency on massive datasets by parallelizing feature extraction and aggregation across distributed computing nodes. Within this framework, the improved entropy (combining modified Deng entropy and belief entropy) and the correlation-based features provide robust uncertainty modeling and relevance scoring for high-impact features. The overall prediction procedure is as follows. First, the preprocessed, normalized dataset is separated into two groups: a training dataset and a testing dataset.
The normalized dataset is divided into two equal parts: 50% is used for training and 50% for testing. The mapper-reducer system extracts features from the training dataset: improved entropy and correlation features are extracted in the mapper step, and the features from the different mappers are combined in the reducer step to obtain the final extracted features. An ensemble of fuzzy rule-based classifiers is then created using bagging, in which a distinct training set is created for each base classifier by randomly sampling the data with replacement. Bagging averages many classifiers, each fitted to a random sample that follows the distribution of the training set, to reduce classification variance. It works best with unstable classifiers, whose final model can change substantially after slight modifications to the training set, and it is also recommended when the dataset contains a limited number of samples. In addition, bagging allows the learners in the ensemble to be trained independently and in parallel. At the same time, the 50% testing portion of the normalized data is passed to the mapping function, where the testing samples are split across multiple mappers. Each mapper's data are then fed to the final set of classifiers. In the fuzzy rule-based classification model, the mapping of the testing data and the bagging of the training data are combined to obtain the final set of classifiers, which involve 72 built-in fuzzy rules. The classifier outputs are then shuffled in the reducer phase to produce the final output, yielding the original set of classification scores. For each score improved weight factor Thus, the prediction outcome for customer behavior is generated in seven label categories.
These categories are extremely low, low, medium-low, medium, medium-high, high, and very high. Figure 3 shows the overall prediction process of the IFCBMC model.
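The bagging-plus-voting scheme described above can be sketched as follows, under several assumptions. The base learner is a Chi-style fuzzy rule-based classifier, swapped in because this section does not give the internals of the improved classifier or its 72 rules. It generates one candidate rule per training sample, uses triangular low/medium/high fuzzy sets on inputs normalized to [0, 1], weights rules by certainty factor, and applies single-winner inference. `rng.integers(0, n, size=n)` draws n indices with replacement, so each bag is a bootstrap sample, and the ensemble combines members by majority vote. The random selection of best-performing models mentioned in the abstract is not reproduced. All class and function names are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

import numpy as np

CENTERS = np.array([0.0, 0.5, 1.0])  # peaks of the low / medium / high fuzzy sets
HALF_WIDTH = 0.5                     # each triangle falls to zero 0.5 from its peak


def memberships(x: np.ndarray) -> np.ndarray:
    """Triangular membership of each feature in each fuzzy set, shape (n_features, 3)."""
    return np.clip(1.0 - np.abs(x[:, None] - CENTERS[None, :]) / HALF_WIDTH, 0.0, 1.0)


class ChiFuzzyClassifier:
    """Chi-style fuzzy rule-based classifier: one candidate rule per training sample."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ChiFuzzyClassifier":
        # antecedent (tuple of fuzzy-set indices) -> class -> summed matching degree
        votes = defaultdict(lambda: defaultdict(float))
        for x, label in zip(X, y):
            mu = memberships(x)
            best_sets = mu.argmax(axis=1)                     # best fuzzy set per feature
            degree = mu[np.arange(len(x)), best_sets].prod()  # rule matching degree
            votes[tuple(best_sets.tolist())][int(label)] += degree
        self.rules: List[Tuple[Tuple[int, ...], int, float]] = []
        for antecedent, per_class in votes.items():
            label, best = max(per_class.items(), key=lambda kv: kv[1])
            total = sum(per_class.values())
            if total > 0:  # certainty-factor rule weight
                self.rules.append((antecedent, label, best / total))
        self.default = int(np.bincount(y.astype(int)).argmax())  # fallback class
        return self

    def predict_one(self, x: np.ndarray) -> int:
        mu = memberships(x)
        rows = np.arange(len(x))
        best_label, best_score = self.default, 0.0
        for antecedent, label, weight in self.rules:
            score = mu[rows, list(antecedent)].prod() * weight  # single-winner inference
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, X: np.ndarray) -> np.ndarray:
        return np.array([self.predict_one(x) for x in X])


def bagged_fuzzy(X: np.ndarray, y: np.ndarray, n_bags: int = 10, seed: int = 0):
    """Train one fuzzy classifier per bootstrap sample (drawn with replacement)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)
        models.append(ChiFuzzyClassifier().fit(X[idx], y[idx]))
    return models


def vote(models, X: np.ndarray) -> np.ndarray:
    """Majority vote over the ensemble members' predicted labels."""
    preds = np.stack([m.predict(X) for m in models])  # shape (n_bags, n_samples)
    return np.array([np.bincount(col).argmax() for col in preds.T])
```

Because each bag is fitted independently, the loop in `bagged_fuzzy` is the part a MapReduce job would parallelize across nodes.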

Prediction Structure of the Improved Fuzzy Classification With Bagging and MapReduce Coordination (IFCBMC) Model. It Shows How Extracted Features are Used to Train Multiple Fuzzy Rule-Based Classifiers With Bagging and How Predictions are Aggregated, Enhancing Accuracy and Robustness.
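The mapping of the testing data and the final seven-level output can be sketched as follows. The test set is split across n mappers, each mapper scores its shard, the reducer reassembles the scores in their original order, and each score is mapped to one of the seven labels. The scoring function passed in is a hypothetical stand-in for the fuzzy ensemble, and the equal-width binning of scores in [0, 1] into seven levels is an assumption, since the paper does not state the thresholds or the improved weight factor.

```python
from typing import Callable, List

import numpy as np

LEVELS = ["extremely low", "low", "medium-low", "medium",
          "medium-high", "high", "very high"]


def to_level(score: float) -> str:
    """Map a score to one of seven equal-width bins over [0, 1] (assumed thresholds)."""
    clipped = min(max(float(score), 0.0), 1.0)
    return LEVELS[min(int(clipped * len(LEVELS)), len(LEVELS) - 1)]


def mapped_prediction(X_test: np.ndarray,
                      score_fn: Callable[[np.ndarray], np.ndarray],
                      n_mappers: int) -> List[str]:
    """Map: each mapper scores one shard of the test set. Reduce: reassemble in order."""
    shards = np.array_split(X_test, n_mappers)
    partial_scores = [score_fn(shard) for shard in shards]  # map phase (sequential stand-in)
    scores = np.concatenate(partial_scores)                  # reduce phase
    return [to_level(s) for s in scores]


# Hypothetical scorer: uses the first feature directly as the behavior score.
X = np.linspace(0, 1, 14).reshape(-1, 1)
print(mapped_prediction(X, lambda shard: shard[:, 0], n_mappers=3))
```

Because `np.array_split` and `np.concatenate` both preserve row order, each label stays aligned with its test sample regardless of how many mappers are used.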
Experimental Setup
The proposed consumer behavior analytics model was implemented and simulated in Python 3.7 on an Intel(R) Core(TM) i5-4210U CPU @ 1.70 GHz. The consumer behavior classification analysis was conducted on the Consumer Buying Behavior Analysis dataset (Pratap, 2022).
Dataset Description
This dataset contains four categories of attributes: People, Products, Promotion, and Place. The People category includes attributes such as ID, Year of Birth, Education, and Income. The Products category includes attributes such as MntWines, MntFruits, and MntGoldProds. The Promotion category includes features such as NumDealsPurchases, AcceptedCmp1, AcceptedCmp2, and Response. Finally, the Place category includes attributes such as NumWebPurchases, NumCatalogPurchases, and NumStorePurchases.
Performance Analysis
The IFCBMC approach and the baseline approaches were all evaluated on the classification task. The IFCBMC method was compared with the state-of-the-art approaches RNN [24] and DNN [28], as well as with traditional methods including LINKNET, SQUEEZENET, LSTM, SVM, CNN, DCNN, and Bi-GRU.
Comparative Study on Positive Metrics
Figure 4 compares the IFCBMC method with LINKNET, SQUEEZENET, LSTM, SVM, CNN, DCNN, Bi-GRU, RNN [24], and DNN [28] on positive metrics for consumer behavior analytics. Precise classification of consumer behavior requires high accuracy. At a training rate of 70, IFCBMC achieved an accuracy of 92.492, whereas the compared methods attained lower accuracy: LINKNET = 79.1, SQUEEZENET = 81.12, LSTM = 80.174, SVM = 81.236, CNN = 82.465, DCNN = 85.357, Bi-GRU = 83.591, RNN [24] = 84.753, and DNN [28] = 86.846. The precision of IFCBMC is also higher than that of LINKNET, SQUEEZENET, LSTM, SVM, CNN, DCNN, Bi-GRU, RNN [24], and DNN [28] at all training rates.

Performance Comparison Based on Positive Evaluation Metrics (Accuracy, Precision, Sensitivity). This Bar Graph Compares the IFCBMC Model to Several Existing Classifiers, Demonstrating its Superior Performance Across Key Metrics Critical for Successful Customer Behavior Classification.
Figure 4c and d present the sensitivity and specificity of IFCBMC and the existing models. At a training rate of 90, the sensitivity of IFCBMC is 88.932, whereas LINKNET achieves 62.63, SQUEEZENET 65.124, LSTM 62.541, SVM 61.836, CNN 60.514, DCNN 64.947, Bi-GRU 58.792, RNN [24] 59.488, and DNN [28] 63.814. Overall, these results indicate that IFCBMC classifies consumer behavior effectively. This capability is attributed to the integration of modified normalization, improved entropy, improved fuzzy rule-based classification, and final classification through improved bagging and mapping.
Figure 5 compares IFCBMC with LINKNET, SQUEEZENET, LSTM, SVM, CNN, DCNN, Bi-GRU, RNN [24], and DNN [28] on the negative metrics for consumer behavior analysis. At a training rate of 90, the FNR of IFCBMC is 14.382, notably lower than that of LINKNET, SQUEEZENET, LSTM, SVM, CNN, DCNN, Bi-GRU, RNN [24], and DNN [28]. The FPR of IFCBMC is also lower at all training rates. Its lowest FPR is 3.218, at a training rate of 80, whereas LINKNET (7.653), SQUEEZENET (7.854), LSTM (8.137), SVM (7.842), CNN (8.288), DCNN (8.198), Bi-GRU (8.934), RNN [24] (7.761), and DNN [28] (8.478) recorded higher FPR values. This improvement is attributed to the improved bagging and mapping algorithms, modified normalization, improved entropy, and improved fuzzy rule-based classification.

Performance Comparison Based on Negative Evaluation Metrics (False Positive Rate and False Negative Rate). The Proposed Model Achieves Substantially Lower Error Rates Than Conventional Methods, Indicating More Reliable Classification.
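The sensitivity, specificity, FNR, and FPR values discussed above all derive from a binary confusion matrix. The following minimal Python sketch shows these relationships; the counts are hypothetical, and the positive class is assumed to be the consumer behavior of interest. Because the results are reported as percentages, the sketch prints the rates scaled by 100.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (positive class = behavior of interest)."""
    tp: int
    fp: int
    tn: int
    fn: int


def rates(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, FNR and FPR as fractions in [0, 1]."""
    sensitivity = c.tp / (c.tp + c.fn)  # true positive rate (recall)
    specificity = c.tn / (c.tn + c.fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fnr": 1.0 - sensitivity,  # equals FN / (TP + FN)
        "fpr": 1.0 - specificity,  # equals FP / (TN + FP)
    }


# Hypothetical counts, not taken from the paper's experiments.
example = ConfusionCounts(tp=80, fp=5, tn=95, fn=20)
for name, value in rates(example).items():
    print(f"{name}: {100 * value:.3f}%")
```

When all four rates are computed from the same confusion matrix, FNR equals 100 minus sensitivity and FPR equals 100 minus specificity (in percent).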
Figure 6 compares IFCBMC with LINKNET, SQUEEZENET, LSTM, SVM, CNN, DCNN, Bi-GRU, RNN [24], and DNN [28] on the remaining metrics for consumer behavior analysis. At a training rate of 90%, the NPV of IFCBMC is 92.485, whereas LINKNET, SQUEEZENET, LSTM, SVM, CNN, DCNN, Bi-GRU, RNN [24], and DNN [28] reach 81.325, 82.685, 86.139, 85.215, 84.382, 87.956, 83.763, 81.437, and 82.519, respectively. IFCBMC also attains the highest MCC, 87.902, surpassing all of these baselines. By achieving better values on these measures as well, IFCBMC outperforms the previously used methods, showing strong potential for correctly categorizing customer behavior.

Evaluation of Other Classification Metrics (F1-Score, NPV, Specificity). This Figure Presents the Model's Balanced Performance Across Additional Key Metrics, Reinforcing Its General Effectiveness in Consumer Behavior Analysis.
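NPV and MCC can likewise be computed from confusion-matrix counts. In this hedged Python sketch the counts are hypothetical. Since MCC lies in [-1, 1], values such as 87.902 reported above appear to be MCC scaled by 100, and the sketch prints it the same way.

```python
import math


def npv(tn: int, fn: int) -> float:
    """Negative predictive value: share of predicted negatives that are truly negative."""
    return tn / (tn + fn)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, not the paper's data.
counts = dict(tp=80, fp=5, tn=95, fn=20)
print(f"NPV = {100 * npv(counts['tn'], counts['fn']):.3f}%")
print(f"MCC x 100 = {100 * mcc(**counts):.3f}")
```

MCC uses all four cells of the confusion matrix, which is why it is often preferred as a single balanced summary when the classes are imbalanced.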
Table 3 presents the ablation study, which compares IFCBMC with three variants: a model with conventional entropy, a model with standard normalization, and a model without feature extraction. The study isolates the contribution of each component to the effectiveness of IFCBMC. The specificity of IFCBMC is 0.960, compared with 0.917 for the conventional entropy model, 0.870 for the conventional normalization model, and 0.836 for the model without feature extraction. Correspondingly, the FPR of IFCBMC is 0.040, against 0.083 for the conventional entropy model, 0.130 for the conventional normalization model, and 0.164 for the model without feature extraction. Removing feature extraction therefore causes the largest degradation, followed by replacing the modified normalization and then the improved entropy.
Ablation Assessment on IFCBMC, Model Without Feature Extraction, Model With Conventional Normalization, and Model With Conventional Entropy.
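Because FPR = 1 − specificity, the FPR values in Table 3 follow directly from the reported specificities. This short Python sketch uses only the four specificities quoted above and also computes how much specificity each variant loses relative to the full model, as a simple measure of each component's contribution.

```python
def ablation_summary(
    specificity: dict[str, float], full_key: str
) -> dict[str, tuple[float, float]]:
    """Return (FPR, specificity drop vs. the full model) for each ablation variant."""
    full = specificity[full_key]
    return {
        # FPR = FP / (FP + TN) = 1 - specificity
        variant: (1.0 - spec, full - spec)
        for variant, spec in specificity.items()
    }


# Specificity values reported in Table 3.
table3_specificity = {
    "IFCBMC": 0.960,
    "conventional entropy": 0.917,
    "conventional normalization": 0.870,
    "without feature extraction": 0.836,
}

for variant, (fpr, drop) in ablation_summary(table3_specificity, "IFCBMC").items():
    print(f"{variant:28s} FPR = {fpr:.3f}  specificity drop = {drop:.3f}")
```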
Table 4 reports the statistical assessment of IFCBMC against LINKNET, SQUEEZENET, LSTM, SVM, CNN, DCNN, Bi-GRU, RNN [24], and DNN [28] for consumer behavior analysis. Each approach is evaluated in detail to ensure precise comparisons. Under the maximum (best-case) statistic, IFCBMC achieves the highest accuracy, 0.960, whereas the traditional approaches achieve lower accuracy: LSTM 0.862, LINKNET 0.866, SQUEEZENET 0.858, SVM 0.863, DCNN 0.860, Bi-GRU 0.854, RNN [24] 0.861, and DNN [28] 0.860. Under the median statistic, IFCBMC reaches an accuracy of 0.938, while LSTM, SVM, CNN, DCNN, Bi-GRU, RNN [24], and DNN [28] all produce lower scores. This superior performance is attributed to the integration of fuzzy rule-based classification, entropy-based feature extraction, and bagging ensemble techniques within a MapReduce framework, which together handle data uncertainty, distribution, and high dimensionality.
Statistical Evaluation of Accuracy, Precision, Recall, and F-Measure.
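The section does not state how many runs the statistics in Table 4 aggregate. Assuming each metric is recorded over repeated independent runs, the best, median, and other summary statistics could be computed as in this Python sketch. The accuracies are invented for illustration and are not the paper's data.

```python
import statistics


def summarize(runs: list[float]) -> dict[str, float]:
    """Summary statistics of one metric across repeated runs."""
    if not runs:
        raise ValueError("at least one run is required")
    return {
        "best": max(runs),
        "worst": min(runs),
        "mean": statistics.mean(runs),
        "median": statistics.median(runs),
        # The sample standard deviation needs at least two runs.
        "std": statistics.stdev(runs) if len(runs) > 1 else 0.0,
    }


# Hypothetical accuracies from five runs (illustrative only).
accuracy_runs = [0.90, 0.92, 0.95, 0.91, 0.93]
for name, value in summarize(accuracy_runs).items():
    print(f"{name:6s} {value:.3f}")
```

Reporting the median alongside the best run guards against conclusions driven by a single favorable run.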
Table 5 summarizes the K-fold analysis comparing IFCBMC with conventional approaches such as LINKNET, SQUEEZENET, LSTM, SVM, CNN, DCNN, Bi-GRU, RNN [24], and DNN [28] for consumer behavior analytics. K-fold cross-validation is a common technique for evaluating the robustness and efficacy of prediction models: the data are split into K folds, and each fold serves once as the test set while the remaining K − 1 folds are used for training. Notably, IFCBMC achieves an NPV of 0.967, clearly surpassing the lower NPV values of LSTM (0.891), SVM (0.892), CNN (0.893), Bi-GRU (0.887), RNN [24] (0.379), and DNN [28] (0.389). The high NPV and consistent K-fold performance indicate that IFCBMC is not only accurate but also dependable for real-world consumer behavior analysis.
Assessment on K-fold.
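The K-fold protocol described above can be sketched in plain Python. The fold splitter, the toy threshold classifier, and the `fit` interface (a function that returns a predictor) are hypothetical stand-ins, not the paper's implementation. NPV is averaged over the folds in which it is defined.

```python
import math
import random


def k_fold_indices(n: int, k: int, seed: int = 0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        yield order[:start] + order[start + size:], order[start:start + size]
        start += size


def npv(y_true, y_pred) -> float:
    """TN / (TN + FN); NaN when the fold has no negative predictions."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tn / (tn + fn) if tn + fn else math.nan


def cross_validated_npv(X, y, fit, k=5, seed=0):
    """Mean NPV over the folds where it is defined."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(npv([y[i] for i in test], [predict(X[i]) for i in test]))
    defined = [s for s in scores if not math.isnan(s)]
    return sum(defined) / len(defined) if defined else math.nan


def fit_threshold(X_train, y_train):
    """Toy one-feature classifier: predict 1 above the midpoint of the class means."""
    pos = [x for x, t in zip(X_train, y_train) if t == 1]
    neg = [x for x, t in zip(X_train, y_train) if t == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x > cut)


X = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.15, 0.85]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
print(f"{cross_validated_npv(X, y, fit_threshold, k=5):.3f}")
```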
Table 6 presents the sensitivity analysis across various metrics. Accuracy is the proportion of correctly classified instances among all instances. Here, Conventional Normalization yields the highest accuracy (0.812064), indicating that it improves overall predictive performance compared with the other normalization methods considered. Because each normalization technique prepares the data differently for analysis, the choice of technique influences classifier performance. Conventional Normalization, which typically refers to standard min-max scaling or a similar method, performs consistently well across multiple metrics, making it a reliable choice for many consumer behavior analytics applications.
Sensitivity Analysis of Various Metrics.
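Since Conventional Normalization is described as standard min-max scaling or a similar method, this Python sketch shows min-max scaling under that assumption. The paper's modified normalization is not specified in this section and is not reproduced. The scaling parameters are fitted on the training split and reused on the test split, so test values can fall outside [0, 1].

```python
def fit_min_max(train_column: list[float]):
    """Fit min-max scaling on training values; return a function that applies it."""
    lo, hi = min(train_column), max(train_column)
    span = hi - lo

    def transform(values: list[float]) -> list[float]:
        if span == 0:
            # Constant training feature: map everything to 0 to avoid dividing by zero.
            return [0.0 for _ in values]
        return [(v - lo) / span for v in values]

    return transform


# Hypothetical feature column, e.g. monthly purchase counts.
scale = fit_min_max([10, 20, 30])
print(scale([10, 20, 30]))  # training values map into [0, 1]
print(scale([5, 40]))       # unseen test values may fall outside [0, 1]
```

Fitting the scaler on the training data only avoids leaking test-set statistics into preprocessing.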
Figure 7 presents the ROC curves of IFCBMC, LSTM, SVM, CNN, DCNN, Bi-GRU, RNN [24], and DNN [28] for consumer behavior analysis at a training rate of 70%. Each ROC curve plots the true positive rate against the false positive rate as the decision threshold varies. At a false positive rate of 1.0, IFCBMC achieves a true positive rate of 0.974, higher than those of LSTM, SVM, CNN, DCNN, Bi-GRU, RNN [24], and DNN [28]. The Area Under the Curve (AUC) was computed with the trapezoidal rule, which numerically integrates between consecutive ROC points. AUC ranges from 0 to 1, and values closer to 1 indicate a stronger ability to discriminate between classes across all thresholds. The ROC results therefore demonstrate that IFCBMC correctly categorizes consumer behavior, owing to the integration of improved entropy, fuzzy rule-based classification, modified normalization, and improved bagging and mapping rules.

Receiver Operating Characteristic (ROC) Curve Comparison. The ROC Curves Demonstrate the True Positive Rate Versus False Positive Rate for the IFCBMC and Benchmark Models. The Area Under the Curve (AUC) Indicates the Proposed Model's Strong Discriminative Ability.
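The ROC construction and trapezoidal AUC described above can be sketched in Python for a binary classifier that outputs one score per sample. This is an illustrative implementation under that assumption, not the authors' evaluation code. Tied scores are processed together so that each distinct threshold produces one ROC point.

```python
def roc_points(y_true: list[int], scores: list[float]) -> list[tuple[float, float]]:
    """(FPR, TPR) points from sweeping the threshold down through every distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs at least one positive and one negative sample")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Count every sample tied at this score before emitting the next point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points  # the last point is always (1.0, 1.0)


def auc_trapezoid(points: list[tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
pts = roc_points(y_true, scores)
print(pts)
print(f"AUC = {auc_trapezoid(pts):.3f}")
```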
Table 7 presents the proposed consumer behavior analytics model with different entropy-based feature extraction methods (Shannon entropy, Belief entropy, Deng entropy, and Improved Deng entropy) and highlights the superior performance of Improved Deng entropy across nearly all evaluation metrics. It attains the highest accuracy (0.9247), specificity (0.9562), precision (0.7372), F-measure (0.7367), and Matthews Correlation Coefficient (MCC, 0.6928), demonstrating strong classification capacity, particularly in correctly detecting both positive and negative instances. Although its sensitivity is slightly lower than that of Belief entropy, the trade-off is justified by the overall gains in precision and model balance. Incorporating Improved Deng entropy thus enhances the discriminative power of the features, resulting in more accurate and reliable consumer behavior predictions.
Proposed Model Using Different Entropy-Based Feature Extraction Techniques.
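For reference, this Python sketch computes the Shannon entropy of a probability distribution and the Deng entropy of a basic probability assignment (BPA) in Dempster-Shafer theory, whose focal elements may contain several outcomes. The Improved Deng entropy used in the paper is not defined in this section and is not reproduced here. How features are converted into BPAs is also an assumption left to the caller.

```python
import math


def shannon_entropy(probs: list[float]) -> float:
    """H(p) = -sum p_i log2 p_i, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def deng_entropy(bpa: dict[frozenset, float]) -> float:
    """E_d(m) = -sum_A m(A) log2(m(A) / (2^|A| - 1)) over focal elements A."""
    if abs(sum(bpa.values()) - 1.0) > 1e-9:
        raise ValueError("masses must sum to 1")
    total = 0.0
    for focal, mass in bpa.items():
        if mass > 0:
            if not focal:
                raise ValueError("the empty set cannot carry mass")
            total -= mass * math.log2(mass / (2 ** len(focal) - 1))
    return total


# Hypothetical BPA over purchase intents {buy, browse}.
m = {frozenset({"buy"}): 0.6, frozenset({"buy", "browse"}): 0.4}
print(f"Deng entropy: {deng_entropy(m):.4f}")
print(f"Shannon entropy of (0.6, 0.4): {shannon_entropy([0.6, 0.4]):.4f}")
```

When every focal element is a singleton, 2^|A| − 1 = 1 and Deng entropy reduces to Shannon entropy; mass assigned to larger focal elements adds uncertainty.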
Table 8 examines the efficacy of the proposed consumer behavior classification model with three kinds of fuzzy membership function generators: triangular, Gaussian, and trapezoidal. The triangular membership function yields the best results across all performance parameters, with the highest accuracy (0.9309), sensitivity (0.7578), specificity (0.9598), and precision (0.7589). It also has the highest Matthews Correlation Coefficient (MCC, 0.7181) and F-measure (0.7584), demonstrating a good balance between true positive and true negative predictions, and the lowest false positive rate (0.0402) and false negative rate (0.2422), making it the most dependable choice for fuzzy rule-based classification in this setting. The Gaussian function performs slightly better than the trapezoidal function, but both are outperformed by the triangular function, underscoring its effectiveness in capturing subtle variations in consumer behavior data.
Proposed Model Using Different Fuzzy Membership Function Generators.
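The three membership function generators compared in Table 8 have standard closed forms, implemented in the Python sketch below. The parameter values in the usage lines, such as a "medium" fuzzy set over a normalized feature, are hypothetical rather than the paper's learned settings.

```python
import math


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular MF: 0 outside (a, c), rising to 1 at the peak b (needs a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def trapezoidal(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal MF: rises on [a, b], equals 1 on [b, c], falls on [c, d] (needs a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def gaussian(x: float, mean: float, sigma: float) -> float:
    """Gaussian MF: exp(-(x - mean)^2 / (2 sigma^2)), equal to 1 at the mean."""
    return math.exp(-((x - mean) ** 2) / (2 * sigma ** 2))


# Degree to which a normalized feature value of 0.4 belongs to a hypothetical "medium" set.
x = 0.4
print(f"triangular : {triangular(x, 0.25, 0.5, 0.75):.3f}")
print(f"trapezoidal: {trapezoidal(x, 0.2, 0.4, 0.6, 0.8):.3f}")
print(f"gaussian   : {gaussian(x, 0.5, 0.1):.3f}")
```

Triangular and trapezoidal functions are piecewise linear and cheap to evaluate, with membership exactly zero outside their support; the Gaussian is smooth and never reaches zero, so every rule fires to some small degree.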
The proposed model exhibits high sensitivity, specificity, and precision, and its MCC confirms well-balanced overall performance. The accuracy of the existing models (LINKNET, SQUEEZENET, LSTM, SVM, CNN, DCNN, Bi-GRU, RNN, and DNN) ranges from about 80.7% to 86.5%, which suggests that, using the information extracted during the preprocessing and feature extraction stages, they forecast consumer behavior reasonably well. Likewise, the F-measure and recall increase accordingly, and the test results provide further measurements for the components discussed above. Overall, the experiments show that the proposed architecture classifies mobile commerce behavior with a very high degree of accuracy. Its high accuracy and well-balanced metrics make it well suited to real-world applications where accuracy and dependability are essential, such as customer behavior research in marketing.
The proposed IFCBMC framework provides valuable implications for managers by enabling more accurate and scalable analysis of consumer behavior in big data environments. Its application supports improved customer segmentation, allowing businesses to tailor marketing strategies and personalize services more effectively. The model's robust classification capabilities also enhance fraud detection
Conclusion
This study proposed a method for assessing consumer behavior using fuzzy rule-based classification with data mapping and bagging in a MapReduce scheme. First, the input consumer behavior big data were preprocessed through data cleaning and a modified normalization process. The normalized data were then split into training and testing datasets. Using the MapReduce framework, features such as correlation and improved entropy were extracted from the normalized training dataset. The extracted features were then fed into the bagging stage, in which an enhanced fuzzy rule-based classification model was trained. Similarly, the normalized testing dataset was passed to the mapper function. The final classification set was obtained by combining the testing dataset from the mapping function with the training output of the bagging procedure. These classifiers were then shuffled in the Reducer phase to form the prediction outcome. Under the maximum statistic, IFCBMC achieved the highest accuracy, 0.960, whereas traditional approaches such as LSTM (0.862), SVM (0.863), DCNN (0.860), Bi-GRU (0.854), RNN [24] (0.861), and DNN [28] (0.860) had lower accuracy. Using MapReduce to process large volumes of data improves scalability but may complicate deployment and maintenance, particularly cluster management and resource allocation. Fuzzy logic offers greater flexibility in handling uncertainty, yet fuzzy rule-based models may be difficult to interpret.
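The pipeline summarized above can be illustrated with a single-process Python sketch of the map and reduce phases. It makes several simplifying assumptions: a nearest-centroid learner stands in for the enhanced fuzzy rule-based classifier, each mapper trains a few bagged models on bootstrap samples of its own partition, and the reducer pools all models and predicts by majority vote, omitting the selection of the best-performing models. No Hadoop or Spark API is used.

```python
import random
from collections import Counter


def fit_nearest_centroid(rows, labels):
    """Stand-in base learner: classify a row by its closest class centroid."""
    sums, counts = {}, Counter(labels)
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(row):
        return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])))

    return predict


def map_phase(partition, n_bags, seed):
    """Mapper: train n_bags base learners on bootstrap samples of one partition."""
    rng = random.Random(seed)
    rows, labels = partition
    models = []
    for _ in range(n_bags):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]  # sample with replacement
        models.append(fit_nearest_centroid([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def reduce_phase(model_lists, test_rows):
    """Reducer: pool the models from every mapper and take a majority vote per row."""
    models = [model for mapper_models in model_lists for model in mapper_models]
    return [Counter(model(row) for model in models).most_common(1)[0][0] for row in test_rows]


# Synthetic, well-separated two-class data split into two partitions (one per mapper).
gen = random.Random(42)
rows = [[gen.uniform(0.0, 0.3), gen.uniform(0.0, 0.3)] for _ in range(20)]
rows += [[gen.uniform(0.7, 1.0), gen.uniform(0.7, 1.0)] for _ in range(20)]
labels = [0] * 20 + [1] * 20
partitions = [(rows[0::2], labels[0::2]), (rows[1::2], labels[1::2])]

model_lists = [map_phase(p, n_bags=5, seed=i) for i, p in enumerate(partitions)]
print(reduce_phase(model_lists, [[0.12, 0.18], [0.88, 0.82]]))
```

Because each mapper only touches its own partition, the map phase parallelizes naturally; only the trained models, not the raw data, need to reach the reducer.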
Future work will develop methodologies for real-time or near-real-time customer behavior analytics that extend beyond the batch processing capabilities of traditional MapReduce frameworks, which could involve stream processing architectures and adaptive learning algorithms.
Footnotes
List of abbreviations
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
