Abstract
In this study, we present an indoor localization system leveraging Zigbee-based hardware for real-time positioning. We conduct in-house measurements in a controlled environment to generate a robust dataset for the system. A six-step methodology is employed, encompassing received signal strength indicator (RSSI)-to-image transformation, feature extraction using a pre-trained convolutional neural network (CNN) model, dimensionality reduction, clustering, and classification. By converting RSSI values into images, we capture spatial patterns more effectively. A multi-power level fusion strategy combines RSSI data from three distinct power levels into RGB images, improving localization accuracy and robustness against interference. K-means clustering segments the environment into distinct zones, streamlining classification. Experiments demonstrate the significant contributions of power level fusion, clustering, and classification, with the multi-power level fusion yielding improved performance compared to single power levels. Our approach, based on a custom CNN architecture, offers enhanced precision and efficiency for indoor localization tasks, providing valuable insights into the integration of advanced signal processing and deep learning.
Keywords
Introduction
In recent years, there has been a growing demand for indoor positioning solutions. This is due to the proliferation of applications that require the ability to track assets, people, and goods indoors. Examples of such applications include asset tracking in warehouses and factories, navigation of visitors in museums and galleries, real-time tracking of operating theater usage in healthcare facilities (Taşkın et al., 2024), monitoring the flow of passengers in airports and stations, adapting promotional ads to the location of customers in supermarkets, and so on (cf. Figure 1). However, global positioning system (GPS), the most common outdoor positioning technology, is not reliable indoors primarily due to non-line-of-sight issues (Wahab et al., 2022). This is because GPS signals can be blocked or attenuated by walls, ceilings, and other obstacles. In response to this problem, scientists have devised diverse methods for indoor positioning (Farahsari et al., 2022), one of which is known as fingerprinting. This approach entails constructing a database containing received signal strength indicator (RSSI) measurements obtained from predefined locations. When a mobile device requires localization, its RSSI readings are matched against the database to derive its probable position (cf. Figure 2; Grira et al., 2023).

Indoor positioning systems applications.

The two-phased RSSI fingerprinting IPS. RSSI: received signal strength indicator; IPS: indoor positioning system.
Our research focuses on the RSSI fingerprinting method using Zigbee technology (Fahama et al., 2023). Zigbee is a low-power wireless communication technology built on the IEEE 802.15.4 standard and is well suited to indoor positioning applications. Zigbee devices are typically small, inexpensive, and easy to deploy. They can also operate on a single battery for several years, which makes them good candidates for indoor positioning systems (IPSs). The primary objective of this study is to create a highly detailed depiction of signal strength patterns within the specific indoor area under consideration. This aims to accurately capture the complex characteristics unique to each location. However, relying solely on discrete RSSI values collected at specific indoor positions is insufficient to achieve this goal. To address this issue, we propose a new approach that consists of converting the RSSI values into images. This RSSI-to-image transformation offers several advantages. RSSI values are numeric and can be difficult to interpret directly. Converting them into images provides a visual representation of the signal strength patterns across the indoor space and can capture spatial patterns and relationships between different signal strength readings. Areas with strong signal strength are represented with bright colors, while areas with weak signals are represented with dark colors. Moreover, images are well suited to processing with convolutional neural networks (CNNs), a deep learning (DL) architecture that excels at capturing local patterns. By converting RSSI values into images, it is possible to apply powerful CNN models and to take advantage of several machine learning (ML) libraries that are optimized for image-related tasks such as feature extraction, classification, model training, and evaluation.
Another significant advantage of utilizing the RSSI-to-image transformation lies in its ability to employ “Interpolation.” This technique allows for the estimation of signal strengths at positions not directly measured, resulting in a smoother and more continuous representation of signal strength distribution. Additionally, the transformation enables “Upsampling,” which increases the image resolution by adding new pixels between existing ones. This process enhances image detail, especially when the converted RSSI values produce low-resolution images due to sparsity in the original data.
Given these advantages, we have chosen to convert RSSI measurements into images to leverage advances in DL, particularly CNNs, for an effective IPS. The methodology outlined in this paper integrates the conversion of RSSI to images with feature extraction using pre-trained models. This approach simplifies the subsequent K-means clustering stage, where similar-patterned images are grouped together, enabling characterization of areas with similar radio properties. Moreover, our approach innovatively examines the impact of the power level on the positioning performance. It incorporates a fusion technique involving three power levels, combining them into an RGB image designed for indoor positioning. This pioneering fusion methodology contributes to improved accuracy, effectively capturing discrete clustering areas within the indoor scene. The outcome is a richer representation of the radio environment, thereby enhancing the precision of the positioning system. In addition, the utilization of measurements from multiple power levels significantly improves localization accuracy by providing a more robust fingerprint sensitive to various obstacles and interference scenarios.
The key contributions of this study are summarized as follows:
We carry out in-house measurements in a controlled indoor environment using Zigbee-based hardware. This includes the XBee-PRO S2C Zigbee RF module (Digi International Inc., 2022) with a receiver sensitivity of
We propose a novel RSSI-to-image transformation methodology, enabling effective use of DL techniques for indoor positioning.
Our approach incorporates a multi-power level fusion technique, combining RSSI measurements into RGB images, resulting in enhanced localization accuracy and robustness against interference.
We leverage K-means clustering to partition the indoor environment into distinct zones, which simplifies the classification process and improves positioning performance.
We evaluate the impact of key components (power levels, clustering, and classification) on the precision of the IPS, providing insights into their respective contributions.
The remainder of this article is organized as follows. In Section 2, we survey existing approaches for IPS with a main focus on works at the intersection of ML, RSSI-to-image transformation, and CNN applications for indoor positioning. Following this, Section 3 details our comprehensive six-step adopted methodology. The subsequent Section 4 showcases the impact of power levels, K-means clustering, and the classification process on the precision of indoor positioning. The article concludes by summarizing our novel contributions, highlighting the effectiveness of integrating RSSI-to-image transformation with power level exploration for improved accuracy in indoor positioning.
IPSs have witnessed substantial advancements, surpassing traditional geometric approximation techniques. Numerous studies have explored the integration of RSSI fingerprinting with ML, presenting refined alternatives for location estimation (Maduranga et al., 2023). In their hybrid fingerprinting approach, Campuzano et al. (2015) demonstrated the feasibility of combining zone-based classification and coordinate-based regression using WiFi signals, achieving 78.45% room-level accuracy with random forests and 1.83 m mean positioning error with neural networks in their experimental setup. While innovative in its dual-granularity output, their method requires separate models for zone and coordinate estimation, manual feature extraction, and extensive calibration efforts. Our approach differs fundamentally by learning spatial signal structures directly from multi-power RSSI representations using deep CNNs, streamlining the localization into a single, data-efficient model. In a work by Bhatti et al. (2020), an IPS utilizing the K-nearest neighbors (KNNs) algorithm achieved an accuracy of 96.89%. However, concerns arose regarding its susceptibility to high error rates in specific scenarios. Another strategy, proposed by Umair et al. (2014), involved determining the
Despite the efficacy of ML and DL models in IPS, their implementation can be difficult and time-consuming, particularly during the offline phase when collecting RSSI data. In a prior study (Grira et al., 2023), we addressed this challenge by using generative adversarial networks (GANs) to expand collected RSSI data, which helped avoid the lengthy offline phase, even with limited datasets. We also transformed RSSI measurements into images, enhancing positioning accuracy by capturing comprehensive information about radio features, thereby better exploiting spatial correlations.
In contrast to the approach described by Liu et al. (2021), which generates
Additionally, our method introduces a multi-power level fusion technique, combining three power levels in an RGB image for indoor positioning. This fusion technique enhances accuracy by capturing distinct clustering areas in the indoor scene, providing a clearer picture of the radio landscape. The transformation of numerical RSSI data into images also enables the application of CNNs, known for their effectiveness in image processing.
Alitaleshi et al. (2023) introduced a method leveraging a CNN framework combined with an extreme learning machine autoencoder (ELM-AE). Their strategy involves data augmentation to address shortages and improve the accuracy of positioning, showcasing notable performance on Tampere and UJIIndoorLoc datasets. Cheng et al. (2022) presented a position estimation method utilizing the density-based spatial clustering of applications with noise (DBSCAN) algorithm. This approach selects appropriate reference points (RPs) based on the situation, contributing to improved stability and reduced calculation errors. Both Alitaleshi et al. and Cheng et al. focused on enhancing accuracy through innovative techniques, showcasing advancements in indoor positioning methodologies. Additionally, Hassen Fekih and Mezghani (2022) proposed an IPS based on WiFi and CNN, emphasizing WiFi RSSI fingerprinting. Their model employs CNN for learning the structure of unstable WiFi RSSI data, addressing RP ambiguity using the distance-based label distribution (DBLD) method for RP encoding. Furthermore, they incorporate random forest regression as an error correction model (ECM) to refine predicted positions.
However, our work stands out by integrating RSSI-to-image transformation and fusing multi-power levels in an RGB image for indoor positioning. Our unique approach in generating images distinguishes us from prior methods, offering a robust and accurate solution that surpasses the limitations of existing models.
Compared to existing works such as Liu et al. (2021) and Feng et al. (2022), our approach demonstrates significant advantages through a combination of RSSI-to-image transformation and multi-power level fusion. Liu et al. (2021) employ a clustering-based noise elimination scheme for RSSI data preprocessing, which improves classifier performance but is primarily aimed at noise reduction. Our method, by transforming RSSI data into images and using DL for classification, allows for more effective extraction of spatial patterns, achieving a 100% zone classification accuracy with three power levels, as shown in our results in Section 4. In comparison, Liu et al.’s approach achieves an average classification accuracy of 93.5%, with errors in zone identification still prevalent. Feng et al. (2022) employ hierarchical clustering and image retrieval techniques for indoor visual localization, but their approach requires larger datasets and focuses on image-based retrieval rather than classification of RSSI data. Our method, using only 18 testing images, achieves comparable zone-level accuracy, while their method requires considerably more data for similar performance. Moreover, our emphasis on multi-power level fusion reduces sensitivity to signal interference, achieving higher robustness against environmental variations. While Feng et al.’s method shows an improvement of 5% in accuracy over baseline approaches, our method’s ability to achieve zone-level classification with a smaller dataset and higher power level fusion offers a more efficient and effective solution for indoor localization in constrained environments.
Methodology
Figure 3 presents a concise overview of the six-step methodology applied in this study. It spans from initial RSSI measurements, through RSSI-to-image transformation, pre-trained model-based feature extraction, dimensionality reduction via principal component analysis (PCA), K-means clustering for indoor environment characterization, to final mobile location identification using CNN classification. The subsequent sections provide detailed insights into each of these crucial steps.

Adopted methodology.
During the offline phase of the RSSI fingerprinting approach, anchor nodes (ANs), or beacons, are strategically positioned throughout the indoor environment, as illustrated in Figure 4. RPs, denoted as M1, G2, and so on, represent distinct physical locations within the indoor environment where signal strength measurements are collected between each RP and the deployed ANs.

Fingerprinting offline phase: The scene analysis.
For each RP, a set of A = 6 RSSI values is generated and stored in a radio mapping database, along with the corresponding coordinates of the RP. This set of values acts as the distinctive fingerprint for that specific location. To capture the radio characteristics of each position more accurately, multiple measurements are taken. In our case,
Converting RSSI values into images for indoor positioning presents numerous benefits. RSSI values, being numerical, can be challenging to interpret directly. Transforming them into images provides a visual depiction of signal strength patterns across the indoor environment. This approach can effectively capture spatial relationships and patterns among various signal strength readings. The RSSI-to-image transformation, along with the measurement campaigns it requires, is described in Algorithm 1. The transformation from RSSI values to images relies on a suitable numerical representation for the signal strengths. By scaling the RSSI values to fall within the range of 0–255, we ensure that they align with the pixel intensity values commonly used in grayscale images. This adjustment enables a seamless transition from RSSI data to the visual representation of an image, facilitating the subsequent image processing and analysis steps in the indoor positioning process. The algorithm iterates through each RP in the indoor scene and, for each RP, conducts N = 3 measurement series. Within each series, matrices matP0, matP2, and matP4 are initialized and populated by RSSI values obtained from the RPs to A = 6 ANs at different power levels. These matrices are normalized, scaled, and resized to form individual images, where each image corresponds to a set of A
To create RGB images, the grayscale images corresponding to P0, P2, and P4 are stacked as the blue, green, and red channels, respectively, forming a composite image. As illustrated in Figure 3, an example for RP A1 in the indoor environment shows the transformation process from RSSI values to a grayscale image and subsequently into an RGB image. This RGB representation encodes signal variations across power levels, offering a richer spatial depiction. The resulting RGB image is further normalized, scaled, and resized to ensure compatibility with subsequent processing steps. This process is repeated for all N measurement series and each RP, generating a comprehensive dataset of grayscale and RGB images for training and evaluation.
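The scaling and channel-stacking steps described above can be sketched as follows. This is a minimal illustration rather than the paper's Algorithm 1: the RSSI bounds `RSSI_MIN` and `RSSI_MAX`, the 2 × 3 arrangement of the six anchor readings, the 32 × 32 output size, and the sample values are all assumptions made for the example, and nearest-neighbour repetition stands in for whatever resizing method the authors used.

```python
import numpy as np

# Hypothetical RSSI bounds (dBm) used for scaling; not taken from the paper.
RSSI_MIN, RSSI_MAX = -100.0, -20.0

def rssi_to_gray(mat, size=(32, 32)):
    """Scale an RSSI matrix to 0-255 and upsample it by nearest-neighbour repetition."""
    mat = np.asarray(mat, dtype=float)
    clipped = np.clip(mat, RSSI_MIN, RSSI_MAX)
    scaled = (clipped - RSSI_MIN) / (RSSI_MAX - RSSI_MIN) * 255.0
    gray = np.rint(scaled).astype(np.uint8)
    # Map every output pixel back to the source cell that covers it.
    rows = np.arange(size[0]) * gray.shape[0] // size[0]
    cols = np.arange(size[1]) * gray.shape[1] // size[1]
    return gray[np.ix_(rows, cols)]

def fuse_power_levels(mat_p0, mat_p2, mat_p4, size=(32, 32)):
    """Stack the P0, P2, and P4 grayscale images as blue, green, and red channels."""
    channels = [rssi_to_gray(m, size) for m in (mat_p0, mat_p2, mat_p4)]
    return np.stack(channels, axis=-1)  # shape (H, W, 3); channel 0 = blue = P0

# Illustrative readings for one RP: six anchors arranged as a 2 x 3 matrix (assumed layout).
mat_p0 = [[-80, -75, -90], [-85, -70, -95]]
mat_p2 = [[-72, -68, -83], [-78, -64, -88]]
mat_p4 = [[-65, -60, -76], [-70, -57, -81]]

img = fuse_power_levels(mat_p0, mat_p2, mat_p4)
print(img.shape, img.dtype)  # (32, 32, 3) uint8
```

The channel order follows the text (P0 as blue, P2 as green, P4 as red), which matches the BGR convention of some imaging libraries. A library expecting RGB order would need the last axis reversed.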
Feature extraction using a pre-trained model
Using a pre-trained model for feature extraction before clustering allows us to work with more meaningful, lower-dimensional representations of the data, potentially leading to more effective clustering results. It is a common practice, especially in tasks like image analysis, where raw pixel values are often too high-dimensional and noisy for clustering algorithms to work effectively. Performing clustering directly on images can be computationally expensive (e.g. a
The activations from the final convolutional layers of VGG19 provide robust, abstract features, especially from deeper layers of the model. Such features offer meaningful representations of an image’s content and are more resistant to variations in environmental conditions (e.g. temperature changes, obstacles, or other nuisances) than hand-crafted features. Additionally, DL models are optimized for fast inference, making feature extraction a relatively efficient process.
We evaluate multiple pre-trained models, including VGG16, DenseNet121, ResNet50, and VGG19, to assess their impact on the extracted features. The goal is to determine the model that best preserves spatial information for effective clustering and localization. Based on our comparative analysis, VGG19 demonstrates the most consistent and well-structured feature representations, making it the best choice for our approach; its features are also more compatible with the subsequent PCA and clustering phases. The impact of these models on clustering performance is further investigated across three power levels, with results presented in Section 4.2.
Importantly, this feature extraction step does not involve any training or fine-tuning of the pre-trained model; it solely leverages the model’s pre-trained weights to generate feature maps. These feature maps are then further refined through dimensionality reduction using PCA, resulting in a compact representation that serves as input to the clustering algorithms.
PCA for dimensionality reduction
The feature extraction phase is followed by the PCA step. PCA is a dimensionality reduction technique typically applied after feature extraction to eliminate redundant and irrelevant information. Its goal is to decrease the number of features while preserving as much of the original information (variance) as possible. It accomplishes this by computing the covariance matrix of the features, whose eigenvectors define a new set of axes known as principal components. These components are mutually orthogonal, so the projected features are uncorrelated, and they are ordered by the amount of variance they capture. They form a new coordinate system onto which the data is projected. Keeping only the leading principal components reduces the data dimensionality. The number of components to retain is a hyperparameter that should be tuned to balance dimensionality reduction against information retention. PCA offers advantages such as expediting training times, mitigating the risk of overfitting, and enhancing the model’s interpretability. The reduced-dimensional feature vectors are then passed to the K-means clustering algorithm, where they are used to characterize the indoor environment by grouping similar signal patterns.
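As an illustration of the covariance-based construction described above, the following sketch performs PCA with NumPy alone. The feature matrix is random stand-in data (108 rows to mirror the number of images per set, 64 columns chosen arbitrarily), and the choice of 10 retained components is an assumption for the example, not the paper's setting.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project row-wise feature vectors onto their leading principal components."""
    X = np.asarray(features, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (columns are variables).
    cov = np.cov(X_centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]     # principal axes as columns
    explained = eigvals[:n_components] / eigvals.sum()
    return X_centered @ components, explained

# Stand-in for extracted CNN features: 108 images x 64 features (hypothetical sizes).
rng = np.random.default_rng(0)
features = rng.normal(size=(108, 64))

reduced, ratio = pca_reduce(features, n_components=10)
print(reduced.shape)        # (108, 10)
print(float(ratio.sum()))   # fraction of total variance kept by the 10 components
```

In practice, the cumulative explained-variance ratio is a common guide for choosing the number of components, for example by keeping enough components to reach a target fraction of the total variance.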
Characterizing the indoor environment: K-means clustering
K-means clustering provides an efficient and effective way to organize and categorize images resulting from RSSI-to-image transformation and feature extraction. By grouping similar patterns together, it enables accurate and real-time indoor positioning. This approach leverages the power of clustering to handle the complexities of indoor environments and varying signal strength patterns.
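For readers unfamiliar with the algorithm, a minimal NumPy sketch of K-means (Lloyd's iterations) is given below. It is illustrative only: the two-dimensional synthetic points stand in for PCA-reduced features, the value k = 3 is arbitrary, and the deterministic farthest-point initialization is a choice made here for reproducibility, not a detail taken from the paper.

```python
import numpy as np

def init_farthest(X, k):
    """Deterministic initialization: start at X[0], then repeatedly add the farthest point."""
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids)

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    X = np.asarray(X, dtype=float)
    centroids = init_farthest(X, k)
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Synthetic stand-in for reduced feature vectors: three well-separated groups of 36 points.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(36, 2)) for c in centers])

labels, centroids = kmeans(X, k=3)
print(np.bincount(labels))  # number of samples assigned to each cluster
```

Each resulting label can then serve as the zone identifier that the classifier is trained to predict.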
Locating via CNN classification
Hierarchical feature extraction: The model begins with convolutional blocks that progressively expand in depth. Initial layers focus on capturing low-level spatial variations, while deeper layers extract higher-order spatial representations crucial for distinguishing different indoor locations. The transition from shallow to deeper feature maps allows the model to learn hierarchical structures within the RSSI-to-image representation.
Efficient feature aggregation: In contrast to standard CNNs that stack uniform convolutional layers, we incorporate triple convolutional layers
Dimensionality reduction with max pooling: To balance computational complexity and feature retention, max pooling layers are introduced after each convolutional stage. This step gradually reduces spatial dimensions, eliminating redundant information while maintaining the most discriminative features. The pooling operation contributes to robustness against noise and environmental variations, which are common challenges in indoor positioning.
Bottleneck layer for compact representations: A
Final decision layer: The final stage consists of a flattening layer followed by a fully connected layer
Unlike traditional CNN architectures such as VGG or ResNet, which are primarily designed for generic image classification, our model is specifically optimized for indoor localization. By incorporating multi-stage convolutional blocks, the architecture effectively captures both fine-grained and high-level spatial patterns in RSSI-transformed images, ensuring robust feature extraction. The progressive depth expansion balances feature complexity with computational efficiency, allowing the model to refine spatial representations without excessive overhead. Additionally, the inclusion of a bottleneck layer reduces redundancy in extracted features, improving generalization and preventing overfitting. The overall design ensures high accuracy in localization tasks while maintaining computational feasibility, making it suitable for real-time deployment in indoor positioning systems.
While VGG19 is utilized to extract meaningful features from the RSSI-to-image representations, the custom CNN is specifically designed for the classification task. Table 1 outlines the architecture of the classification model, describing the input and output dimensions, kernel sizes, and layer configurations. Unlike the VGG19-based feature extractor, which is pre-trained and frozen, the classification model is trained from scratch and custom-built for the specific task of indoor localization. Convolutional layers extract spatial features, batch normalization stabilizes activations to improve training dynamics, and LeakyReLU ensures better learning performance by addressing vanishing gradients. Fully connected layers process the learned features into predicted cluster labels, completing the multi-class classification process.
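Two of the building blocks named above, LeakyReLU activation and max pooling, can be illustrated in a few lines of NumPy. This is a sketch of the operations themselves, not of the classification model: the negative slope of 0.01, the 2 × 2 pooling window, and the 4 × 4 input values are assumptions chosen for the example.

```python
import numpy as np

def leaky_relu(x, negative_slope=0.01):
    """Pass positive values through and scale negative values by a small slope."""
    return np.where(x >= 0, x, negative_slope * x)

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W) feature map with even H and W."""
    h, w = x.shape
    # Split the map into 2x2 blocks and keep the maximum of each block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Illustrative 4 x 4 feature map (values are arbitrary).
fmap = np.array([[ 1.0, -2.0,  3.0,  0.0],
                 [-1.0,  5.0, -3.0,  2.0],
                 [ 0.0,  0.0, -4.0, -1.0],
                 [ 7.0, -6.0, -2.0, -8.0]])

activated = leaky_relu(fmap)
pooled = max_pool_2x2(activated)
print(pooled)  # rows: [5, 3] and [7, -0.01]
```

Because negative inputs keep a small non-zero slope, their gradients do not vanish entirely, while pooling halves each spatial dimension and retains the strongest response in every window.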
Model architecture summary.
The proposed indoor positioning system is designed to operate in two distinct phases: an offline phase, where the model is trained and the environment is characterized, and an online phase, where real-time localization is performed using incoming signal data.
Results
Measurement campaign and image generation
In the examined indoor environment with 36 RPs, illustrated in Figure 4, we place A = 6 ANs. Each AN (cf. Figure 5) consists of an XBee-PRO S2C Zigbee RF module with a receiver sensitivity of

Anchor node (AN) hardware components.
The algorithm generates a total of
In this phase, K-means clustering is individually applied to four sets of images: P0, P2, P4, and the combined information from all three power levels (3P), each set containing 108 images. The clustering results are visually represented in Figure 6, where distinct colored circles delineate unique clusters within the indoor environment. These clusters group together data points with similar characteristics, and the use of different colors aids in visually distinguishing each cluster.

Impact of the power level on the characterization of the indoor scene using K-means clustering: (a) clustering using P0; (b) clustering using P2; (c) clustering using P4; and (d) clustering using three power levels.
To determine the optimal number of clusters (
Upon closer examination, distinct patterns emerge in the clustering results. Notably, with increasing power levels (from P0 to P4), the clustering areas become more prominent and well-defined. This observation holds true for each individual power level figure. However, the combined figure (Figure 6(d)), incorporating information from all three power levels, provides a comprehensive view of the indoor environment. Circles representing clusters are not only distinct but also exhibit a higher level of clarity and precision in this combined representation. In addition to the primary clustering visualization method, an alternative representation is provided in Appendix A (see Figure A1).
To assess the impact of different feature extraction models on clustering, we conducted an empirical comparison using VGG16, DenseNet121, ResNet50, and VGG19. Figure 7(a) to (d) illustrates the resulting cluster formations for each model under three power levels. The results indicate that VGG19 provides the most well-defined cluster boundaries, effectively preserving spatial relationships and improving localization accuracy. In contrast, other models resulted in overlapping or poorly separated clusters, reducing the overall performance. This analysis showcases that VGG19 is the most suitable feature extractor for our approach.

Impact of the feature extraction model on the characterization of the indoor scene using K-means clustering: (a) clustering using ResNet50; (b) clustering using DenseNet121; (c) clustering using VGG16; and (d) clustering using VGG19.
The strategic integration of information from multiple power levels significantly enhances the accuracy of the clustering results. The improved clarity of the clusters, especially in the combined figure, emphasizes the effectiveness of leveraging a comprehensive set of power level data for establishing a robust IPS. Samples of these clustered images, both RGB images and those corresponding to power level P4, are illustrated in Figure 8.

Impact of the power level on the clustering performance: (a) clustering using P4: sample images; and (b) clustering using three power levels: sample images.
Next, we present the results of the classification process, the final step in our methodology, aiming to locate the mobile node within one of the identified clusters. The classification phase culminates with comprehensive insights into model performance, starting with the optimization of the learning rate. The learning rate is fine-tuned using the FastAI library, which employs an automatic learning rate finder, applying the “LR range test” to determine the optimal rate for model training. This technique, introduced by Smith (2017), involves progressively increasing the learning rate while tracking the corresponding loss. To select the optimal learning rate, we use the “valley method,” an approach developed by the Environmental Systems Research Institute (ESRI) and implemented as the default one in FastAI (Fast.ai, 2023). This method identifies the longest valley in the loss curve and selects the learning rate at the steepest descending slope, roughly two-thirds into this valley. The rationale is to find the region where the loss drops significantly before stabilizing, ensuring both rapid convergence and stable optimization (Smith, 2018). As shown in Figure 9, the loss decreases sharply up to a point before rising again. Our selected rate, 7.585

Learning rate optimization.
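The valley heuristic described above can be sketched in plain Python as follows. This is a simplified reading of the description (longest decreasing stretch of the loss curve, then a point about two-thirds of the way along it) and not FastAI's implementation; the function names, the `fraction` parameter, and the synthetic loss curve are hypothetical.

```python
def longest_valley(losses):
    """Return (start, end) indices of the longest strictly decreasing subsequence."""
    n = len(losses)
    length = [1] * n   # longest decreasing subsequence ending at index i
    prev = [-1] * n    # predecessor of i in that subsequence
    for i in range(1, n):
        for j in range(i):
            if losses[i] < losses[j] and length[j] + 1 > length[i]:
                length[i] = length[j] + 1
                prev[i] = j
    end = max(range(n), key=lambda i: length[i])
    start = end
    while prev[start] != -1:
        start = prev[start]
    return start, end

def suggest_lr(lrs, losses, fraction=2 / 3):
    """Pick the learning rate located a given fraction of the way into the valley."""
    start, end = longest_valley(losses)
    idx = start + int((end - start) * fraction)
    return lrs[idx]

# Synthetic LR range test: rates grow geometrically while the loss falls, then diverges.
lrs = [10 ** (-6 + 0.5 * i) for i in range(11)]
losses = [2.0, 1.9, 1.8, 1.6, 1.3, 1.0, 0.8, 0.7, 0.9, 1.5, 3.0]

print(longest_valley(losses))   # (0, 7)
print(suggest_lr(lrs, losses))  # the rate at index 4 of lrs
```

Choosing a point inside the valley rather than at its minimum keeps the rate in the region where the loss is still falling steeply, away from the onset of divergence.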
Subsequently, the training and validation performance of the classification model over 100 epochs is depicted in Figure 10. For P0, the validation loss fluctuates between 1 and 1.75, reflecting a moderate performance, while the training loss steadily decreases from an initial value of 0.5 to near-zero, suggesting a potential risk of overfitting.

Impact of the power level on the classification performance: (a) classification using P0; (b) classification using P2; (c) classification using P4; and (d) classification using three power levels.
In contrast, for P4, the performance improves significantly, with both training and validation losses converging to lower values, stabilizing around 0.25 for training loss. The accuracy is also higher for P4, indicating better model generalization. This improved performance at P4 is likely due to the stronger signal strength, which enables the model to differentiate between locations with greater precision. For P2 (cf. Figure 10(b)), the performance is relatively weaker. The training loss stabilizes around 0.5, indicating a more gradual decrease in error compared to P4. The accuracy is lower for P2, which can be attributed to the weaker signal strength at this power level, making it more challenging for the model to distinguish between different locations. The model is less able to capture fine-grained spatial variations in RSSI values at P2, which results in higher validation loss and lower accuracy compared to P4.
The most notable outcome is observed in the fusion of all three power levels (3P), where the model showcases remarkable efficiency by achieving convergence to near-zero loss in less than 20 epochs (cf. Figure 10(d)). This underscores the significant advantage gained by leveraging information from multiple power levels, contributing to improved model accuracy and generalization. The accuracy curves closely mirror the trends in loss, with 3P achieving 100% accuracy in less than 20 epochs. This outcome highlights the robustness and efficacy of the classification model when enriched with comprehensive power level data. The evaluation is further deepened through the analysis of confusion matrices in Figure 11, providing insights into the model’s ability to correctly classify images within each power level scenario. The classification process is based on the subset of 18 images used for testing in the cross-validation setup, with the remaining images utilized for model training. Remarkably, as shown in Figure 11(d), 3P attains flawless classification, while P4 demonstrates a single misclassification instance. In the case of P0, errors are observed in five images, signaling a need for further refinement. Although the number of test samples is limited (18 images), this reflects a realistic scenario where extensive measurements may not be feasible. Each sample corresponds to a distinct RP under varied acquisition conditions. The high classification accuracy—especially 100% with multi-power fusion—demonstrates the method’s robustness and its ability to generalize effectively despite modest training data.
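The error counts reported above translate into test accuracies as in the short calculation below, assuming each erroneous image counts as exactly one misclassification among the 18 test images. P2 is left out because its error count is not stated in the text.

```python
def accuracy(n_test, n_errors):
    """Fraction of correctly classified test samples."""
    return (n_test - n_errors) / n_test

N_TEST = 18  # test images per power-level scenario, as stated in the text

# Misclassified images per scenario, as reported in the text (P2 not reported).
errors = {"P0": 5, "P4": 1, "3P": 0}

for name, n_err in errors.items():
    print(f"{name}: {accuracy(N_TEST, n_err):.1%}")
# P0: 72.2%
# P4: 94.4%
# 3P: 100.0%

# Each test image accounts for about 5.6 percentage points of accuracy.
print(f"{1 / N_TEST:.1%}")  # 5.6%
```

With a test set of this size, a single image shifts the accuracy by roughly 5.6 percentage points, which is worth keeping in mind when comparing scenarios.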

Classification confusion matrices: (a) classification using P0; (b) classification using P2; (c) classification using P4; and (d) classification using three power levels.
This study demonstrates that high indoor positioning accuracy can be achieved even with a relatively small dataset, provided that the data is collected across multiple transmission power levels and transformed into a structured representation. The transformation of RSSI measurements into images—especially RGB images that encode signal variations across three power levels—emerges as a critical enabler of spatial pattern extraction and model generalization.
A key contribution of this work lies in its data efficiency. While many existing approaches require large-scale fingerprint databases, we show that by fusing measurements from different power levels and exploiting DL models trained on image data, robust and accurate localization can be achieved with significantly fewer samples. This makes our approach particularly attractive for real-world applications where exhaustive data collection is impractical.
Our results highlight that traditional public datasets, which often rely on single-power RSSI or WiFi-only measurements, do not provide the multi-power level granularity required by our fusion strategy. For example, the widely used UJIIndoorLoc (Torres-Sospedra et al., 2014) dataset offers WiFi-based RSSI data but lacks power-level control and does not support image-based fusion approaches like ours. As such, to reproduce or extend our method, the research community would benefit from: (i) new datasets that include synchronized RSSI measurements at multiple transmission power levels, preferably collected under varying conditions; (ii) standardized procedures for RSSI-to-image transformation, allowing reproducibility and comparability across studies; and (iii) benchmarking protocols that consider both classification accuracy and sample efficiency, to evaluate performance under limited data regimes.
The 100% zone classification accuracy obtained in the fused setting, albeit on only 18 test samples, underscores the effectiveness of combining signal-level diversity with spatial encoding. Furthermore, the model’s ability to generalize from limited data suggests that such fusion-based image representations could support transfer learning or domain adaptation in new environments, provided that the image generation process is standardized.
We encourage future work to contribute to or build upon multi-power level datasets, and to explore adaptive schemes that dynamically determine optimal power settings during operation. By investing in richer yet compact datasets and exploring novel representations like ours, the field can move closer to scalable and deployable indoor positioning solutions.
In conclusion, this article introduces a novel methodology for indoor positioning that integrates RSSI-to-image transformation with an in-depth exploration of power levels. Fusing information from three power levels into RGB images significantly improves the precision and reliability of IPSs. Our results reveal the critical role of power levels in K-means clustering, leading to clear and well-defined clusters, especially in the combined multi-power level approach. The CNN classification model, trained on clustered data, achieves high accuracy, with the fused 3P approach outperforming individual power levels in a cross-validation setup. Notably, all test images in the 3P approach are correctly classified. In practical terms, the methodology demonstrates resilience to wireless signal fluctuations, efficacy in distinguishing between closely spaced points, and adaptability to dynamic indoor environments. By combining power levels, the system becomes less susceptible to false positives or negatives caused by temporary obstructions or changes in the environment.
A compelling direction for future research is the integration of ML techniques for adaptive power level selection that adjusts dynamically to the prevailing environmental conditions. This entails developing algorithms or models that autonomously assess the characteristics of the indoor environment and select the most suitable power level for optimal performance.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Appendix A
The clustering outcomes are presented here through an alternative visualization. Instead of using circles to delineate clustered areas on the floor plan, a dotted representation is adopted, in which each RP in the indoor scene is colored according to its assigned cluster. Figure A1 contains four subfigures, each illustrating the clustering results for one power level configuration (P0, P2, P4, and 3P). This representation offers a complementary view of how the clusters are distributed across the indoor environment.
