Abstract
RFID is an established technology, and its adoption has been increasing steadily across different industries in the last decades. An important and relatively recent RFID breakthrough has been the move from pallet- or case-level tagging to item-level tagging. This development has opened up a new set of use cases and benefits, especially in retail. One of these new use cases is the estimation of items’ location by positioning and tracking the tags attached to them. This problem is often treated as a classification problem, especially when tags read in a retail store must be assigned either to the sales floor or to the backroom area. The typical approach to ease this classification consists of physically shielding the areas of interest via hardware installations, although this solution is expensive and lacks flexibility. In this paper, we present a different solution, namely a software-based shielding approach, to address the classification problem. Our solution makes use of item-level RFID tags and is based on the well-known logistic regression. Whenever a reading session is performed by means of a handheld reader, the classification model estimates in real time (i.e. within a few seconds) which tagged items are in the same area as the reader and which are not, with no need for any shielding hardware installation. According to the preliminary validation tests presented in this paper, in which we simulated a fashion retail store, the proposed approach has an overall average accuracy of 95.5%.
Introduction
Radio Frequency IDentification (RFID) is a proven technology to automate processes and increase inventory and process accuracy throughout the supply chain (Atzori, Iera, & Morabito 2010). RFID benefits have been proven in different industries and at different levels of the supply chain (Bottani et al. 2017), and they are often connected to the level of tagging, i.e. pallet- or case-level, rather than item-level tagging (Bertolini et al. 2017). In the last decade, several authors have shown that item-level RFID tagging is very effective in improving inventory accuracy, reducing out-of-stocks, and enabling omnichannel retailing (Goyal et al. 2016; Hardgrave 2012; Rizzi, Romagnoli, & Thiesse 2016).
Item-level RFID tagging, in fact, can be leveraged at the retail store by enabling a set of use cases that cannot be tackled with pallet-level or case-level tags, such as replenishment from the backroom, loss prevention, faster checkout processes by means of RFID-based sale transactions, and item location (Madhani 2011; Gaukler 2010). In a recent review of in-store RFID deployments in the fashion and apparel industry, Cilloni, Leporati, Rizzi, and Romagnoli (2019) report that the most frequent use cases of this technology are inventory accuracy, process automation and replenishment from the backroom. We note that all these use cases rely upon the fact that RFID enables more frequent inventory counts, both in the sales floor and in the backroom of retail stores. Concerning inventory counts, supply chain practice often distinguishes between two applications. The former is the selective inventory count, whose aim is to know the number of items of every Stock Keeping Unit (SKU) available in a small area, often not delimited by physical barriers. The latter is the massive inventory count, whose aim is to account for the number of items of every SKU in a much larger area, often delimited by physical barriers (e.g., walls, shelves, doors). In both cases, the goal of the inventory counting process is to provide a robust and highly accurate classifier that accounts for all items located in a given area by means of noisy sensor data.
As a matter of fact, field applications use different solutions to provide robust and accurate inventory counts. Selective inventory counts are often carried out by means of portable RFID readers, also known as handhelds, whose reading power and mode are set to achieve a limited reading range (Rizzi & Romagnoli 2017). This solution often provides sufficient accuracy, to the detriment of process time. Massive inventory counts, on the contrary, can be carried out with both handheld readers and fixed infrastructures, and they often employ readers whose power and mode are set to achieve a wide read range, to allow quicker inventory counts. However, to avoid inaccuracies, these latter solutions must be provided with physical shielding of the area where the inventory count is performed (Swedberg 2019b). In detail, a metal foil is usually applied to the walls that separate the sales floor area from the backroom, thus preventing radio waves from penetrating the walls. In this way, each time a reading session is carried out in one of the two areas, it is possible to have good confidence that only the tags belonging to that area are read.
Thus, RFID-enabled retail stores are often provided with RF-absorbing materials, to allow accurate massive inventory counts in selected areas. This solution, however, is not perfect, as inventory errors may be generated by false tag reads (Metzger et al. 2013). Furthermore, physical shielding introduces three issues (Swedberg 2019a). Firstly, it is an expensive solution, and it lacks flexibility, because RF-shielding materials constitute, or are applied to, physical barriers; therefore, they are not used to improve selective inventory counts, as this would imply a rigid store layout that conflicts with the need to rearrange store areas. Secondly, practical experience has shown that metal-based shielding does not fully prevent stray reads, so RFID readers need to be tuned by decreasing the read power, which slows the inventory-counting process. Finally, the installation of physical barriers is not always possible and can be unaesthetic, especially in the historical buildings often used as shops by important fashion companies, such as Ferragamo, Gucci, and many others.
In this paper, an alternative solution that allows accurate inventory counts with no need for physical shielding between different areas is proposed. Our approach makes use of item-level tags and, since it is software-based, it has been named software-based shielding. It makes use of a logistic regression model to estimate, every time a new reading session is carried out, which tags (i.e., items) are in the same area as the reader, and which are not. Software-based shielding offers the flexibility that physical shielding traditionally lacks, mainly because the latter requires fixed layouts (Swedberg 2019a); it is cheaper, since it does not need the physical installation of foil lining and metallic painting (Swedberg 2019a); and, finally, according to the preliminary experiments presented in this work, it can achieve a better accuracy. In fact, according to Bertolini et al. (2015), the use of physical shielding for inventory practices allows up to 99.9% inventory accuracy, but a leakage of up to 20% of mis-located items is commonly experienced even if physical installations are applied with the highest precision. According to the preliminary experiments presented in this work, the average positioning accuracy of tags is instead stable around 95%.
To the best of the authors’ knowledge, this field of research is still unexplored, as there is no scientific contribution introducing such an approach. Concerning the industrial context, two recently filed patents claim to implement a similar solution, although their accuracy has not been disclosed to the scientific community.
The remainder of the paper is organised as follows. In section 2, a review of indoor location systems is provided, focusing respectively on scientific literature, industrial solutions, and filed patents. Then, in section 3, a more detailed description of the problem is provided, by describing the typical application context and the data collection. The proposed classification model, the preliminary validation tests and the related results are described in section 4. Finally, conclusions and future perspectives are reported in section 5.
Current state on indoor localization of objects via RFID
The localization problem
Localization consists in estimating the position of objects. The problem can take different forms, although the scientific literature does not seem to distinguish strictly and unanimously among them. As most authors agree (see for example Farid et al., 2013), localization problems can be divided into indoor and outdoor localization, depending on the surrounding environment. Bergeron et al. (2018) suggest that the location problem can also be organised according to the kind of result sought: precise position finding versus qualitative position estimates. For instance, the Global Positioning System (GPS) provides precise latitude and longitude on Earth’s surface; still, in many instances it could suffice to have a relative position of objects, especially for qualitative applications. Indeed, if the goal is to provide a location, that is, a given portion of space, in either two or three dimensions, where the object is positioned with a given confidence, the problem can be seen as a classification problem. RFID technology, in particular, has proven to be effective for this type of localization (Bergeron et al. 2018).
The technologies and methods used for localization consist, respectively, of wireless communication technologies used for acquiring the transmitted signal (e.g. GPS, Wi-Fi, Bluetooth, RFID), and algorithms for estimating the location based on the acquired signals (Liu et al., 2007). Various wireless technologies are used for indoor localization, and they can be classified on the basis of the physical layer or of the location sensor infrastructure. Indoor localization systems consist of at least two separate hardware components: a signal transmitter and a unit that receives the signal. It is generally recognized that GPS, although suitable for outdoor localization, proves to be inaccurate for indoor localization (Saab & Nakad, 2010). This depends on the missing line of sight to the satellites, and on the attenuation of the GPS signal as it crosses walls. Also, the precision of GPS location detection (up to 50 meters) can be inappropriate for indoor localization, where applications often need more precise location detection. Hence, other technologies are often used for indoor localization purposes, such as Wi-Fi, Bluetooth, UWB and RFID (Xu et al. 2018). Among these technologies, RFID is often chosen due to the effective trade-off among cost, precision, effectiveness and efficiency of the system (Farid et al., 2013; Xu et al., 2018). On the one hand, RFID might allow location accuracy in the order of centimetres (Buffi, Nepa, & Lombardini 2014). On the other hand, especially for applications with a great number of tagged objects, cheap passive RFID tags are economically feasible, and, in the UHF frequency range, a reading range of a few metres is deployed more and more for locating applications in logistics, industrial environments and retail. In these sectors, in fact, automatic identification can be extended to include localization functionalities and to build networks of sensors, controllers, machines, persons, and objects that can be interconnected (Xu et al. 2018; Uckelmann & Romagnoli 2016).
Concerning the methods, Liu et al. (2007) and Farid et al. (2013) identified three categories, namely (i) proximity, (ii) triangulation, and (iii) scene analysis.
Proximity detection provides symbolic relative location information. The position of a mobile client is determined by the cell-of-origin method, with known starting position and limited range (Hu, Cheng, & Zhang 2011). Usually, it relies upon a dense grid of antennas, each of which has a known position. When a mobile target is detected by a single antenna, it is considered to be co-located with that antenna. When more than one antenna detects the mobile target, it is considered to be located near the antenna that receives the strongest signal.
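The strongest-signal rule described above can be sketched in a few lines of Python. This is only an illustration: the function name, the antenna identifiers and the RSSI values (in dBm) are hypothetical and are not taken from any of the cited works.

```python
from typing import Dict, Optional


def nearest_antenna(rssi_by_antenna: Dict[str, float]) -> Optional[str]:
    """Return the antenna the target is assumed to be co-located with.

    rssi_by_antenna maps each antenna that detected the target to the
    signal strength it received. With a single detection that antenna is
    returned; with several, the strongest signal wins.
    """
    if not rssi_by_antenna:
        return None  # the target was not detected by any antenna
    return max(rssi_by_antenna, key=rssi_by_antenna.get)


# Hypothetical detections of one target by three antennas (dBm).
detections = {"antenna_A": -62.0, "antenna_B": -48.5, "antenna_C": -71.0}
located_at = nearest_antenna(detections)
```

The result is symbolic (an antenna identifier), which is why the accuracy of this family of methods depends on how dense the antenna grid is.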
Triangulation uses the geometric properties of triangles to determine the target location. It has two derivations: lateration and angulation. Lateration techniques are based on propagation-time measurements, e.g., time of arrival, time difference of arrival, round-trip time of flight, or received signal phase (Vossiek et al. 2003; Seco et al. 2009). These are distance-based techniques. On the contrary, angulation techniques are based on the angle of arrival of the mobile signal coming from a single location and received by the readers (Liu et al., 2007). Triangulation is known to perform well in outdoor localization. However, like proximity detection and scene analysis, it does not work very well indoors, because line of sight is often missing, and other impediments can affect the positioning process, such as floor layout, moving objects, and numerous reflecting surfaces (Pahlavan et al., 2002; Liu et al., 2007).
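As an illustration of the distance-based (lateration) variant, the sketch below estimates a 2-D position from known reader positions and measured distances. It assumes noise-free distances and uses a standard linearization (subtracting the first range equation from the others) solved by least squares; the function name and the coordinates are hypothetical.

```python
import numpy as np


def laterate(anchors, distances):
    """Estimate a position from anchor coordinates and ranges.

    Each range gives |x - p_i|^2 = d_i^2. Subtracting the equation of the
    first anchor removes the quadratic term and leaves the linear system
    2 (p_i - p_0) . x = |p_i|^2 - |p_0|^2 - d_i^2 + d_0^2.
    """
    anchors = np.asarray(anchors, dtype=float)
    distances = np.asarray(distances, dtype=float)
    p0, d0 = anchors[0], distances[0]
    A = 2.0 * (anchors[1:] - p0)
    b = (np.sum(anchors[1:] ** 2, axis=1) - np.sum(p0 ** 2)
         - distances[1:] ** 2 + d0 ** 2)
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position


# Three readers at known positions (metres) and a target at (1, 1).
readers = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
true_position = np.array([1.0, 1.0])
ranges = [np.linalg.norm(true_position - np.array(r)) for r in readers]
estimate = laterate(readers, ranges)
```

Indoors, the measured ranges are distorted by reflections and missing line of sight, so the least-squares estimate degrades accordingly.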
Finally, scene analysis techniques determine the objects’ positions by looking at the last determined position and incrementing it according to its average speed and the time elapsed. In this case, new positions are calculated entirely from previous positions (see for instance House et al., 2011; Pai et al., 2012).
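The update rule described in this paragraph (last position incremented by average speed times elapsed time) can be written as a short sketch. The function name, the 2-D coordinates and the velocities are hypothetical and only illustrate the rule as worded here.

```python
from typing import Tuple

Vector = Tuple[float, float]


def advance(position: Vector, velocity: Vector, elapsed_s: float) -> Vector:
    """Return the new position after moving at `velocity` for `elapsed_s`."""
    x, y = position
    vx, vy = velocity
    return (x + vx * elapsed_s, y + vy * elapsed_s)


# Start at (2.0, 1.0) m, move 4 s along x, then 2 s along negative y.
p = (2.0, 1.0)
p = advance(p, (0.5, 0.0), 4.0)
p = advance(p, (0.0, -0.25), 2.0)
```

Because every new position is derived from the previous one, any error in speed or timing accumulates over successive updates.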
Relevance of localization problem in retail
One localization challenge with interesting industrial implications is known as shielding. It consists of dividing the space where tags must be localised into two or more areas, so that RFID readers positioned in one of these areas do not read RFID tags placed in the other areas. Typically, shielding makes use of physical solutions, which are expensive, not completely safe from errors, and lacking in flexibility. For these reasons, the interest in software-based shielding solutions has been increasing significantly in recent years. To our knowledge, the software-based shielding problem has not yet been approached in academia. On the contrary, some companies have provided their own solutions. For instance, Nedap, an RFID system integrator that implements stock management solutions in retail, has developed a solution named ‘Virtual Shielding’, an algorithm implemented in its ‘!D Cloud software’, for determining the correct location of an item in the store, distinguishing between sales area and backroom. Similarly, Detego launched its machine learning approach called ‘Smart Shield’.
Contribution of the proposed solution
The use of RFID technology for goods tracking along the supply chain is well established both in academia and in industrial and commercial environments. Also, RFID is considered a robust technology, and it is often used for implementing indoor localization services. However, the use of RFID technology for inventory counts based on software-based shielding has been considered by some companies, but, to the best of our knowledge, it has been completely neglected by the scientific community. The common solution, based on physically shielding different areas by means of shielding materials, is expensive and lacks flexibility. Software-based solutions seem to be a promising alternative in terms of accuracy, efficiency, cost, and flexibility. Even if a few commercial solutions already exist, their accuracy and efficiency have not yet been disclosed, and this might constitute a new line of research that has rarely been considered before. The current paper partially fills this gap by proposing and testing a software-based shielding solution. Our contribution is two-fold. Firstly, we propose an alternative solution to the classification problem, which relies on RFID technology and logistic regression to classify objects into different areas. Secondly, the practical validity of the method is assessed through a case study of inventory counts in a real environment.
Application insights
Problem description and application context
Inventorying, or inventory count, consists of compiling a detailed list or report of the things in one's possession. In retail, store managers are required to make a periodic survey of goods in stock, by reporting which and how many items are available to customers in the sales floor area, and which items are stored in the backroom. It is, in fact, quite common for retail stores to be partitioned into at least those two areas (see for example Rizzi and Romagnoli, 2017). The inventorying process can therefore be described as an estimate of how many items are available in the sales floor area, rather than in the backroom. These two areas are usually divided by a wall, and the transition between them often happens through a door.
Automating the inventory count may lead to important benefits, such as saving several hours of labour per inventory count and avoiding distraction errors. When the items to be inventoried are provided with RFID tags (i.e., item-level tagging), the inventory process is often performed with a mobile reader, or handheld: the store employees walk through the area of the store where the inventory count is to be performed, waving the handheld in a reading session that typically lasts minutes, depending on the size of the store and on the number of items in stock. However, since radio waves can pass through walls, it is important to shield them, so as to only detect the tags that are in the same area as the reader. As explained above, the classic solution to avoid reading errors, i.e. reading tags that are in different areas, is to physically shield the walls that separate different store areas by means of some RF-shielding material, such as aluminium foil or wire netting. This solution, however, has some limitations, especially in terms of material and installation costs and of flexibility. Moreover, the installation of physical barriers is not always possible and can be unaesthetic, especially in the historical buildings often used as shops by important fashion companies, such as Ferragamo, Gucci, and many others. Conversely, the solution proposed in this paper aims to estimate which tags, and therefore which attached items, are located in one store area rather than the other by only interpreting the reading values of the tags, such as Read Rate (RR) and Received Signal Strength Indicator (RSSI), with no need for any physical shielding between the different store areas. For the sake of clarity, a scheme that visually represents the context and the objective of this application is reported in Fig. 1. From now on, we will refer to the simulated sales floor area as front, and to the simulated backroom area as back.

Fig. 1. Visual representation of the application context considered in this work.
The implementation of RFID in a retail store involves several aspects that might affect the results of inventorying. In order to carry out a comprehensive feasibility analysis, three of them have been considered in this work, prioritizing those that, according to some preliminary tests, had the major impact on results. Hence, the aspects or environmental variables considered are: (i) the door in the partition wall between the front and the back, which might be closed or open; (ii) the model of the reader; and (iii) the effective radiated power (ERP) of the reader, expressed in milliwatts [mW].
Concerning the door, we expect better results when it is closed, since the shielding between the front and the back is more pronounced. Concerning the ERP, it is reasonable to think that higher ERP values increase the possibility of reading the tags in the back, and therefore the complexity of shielding different areas. The reader’s model may also have a significant impact, but it is difficult to define it a priori.
According to the well-known theory of statistical full factorial experiments, in order to analyse the effect of the environmental variables and the reliability of the proposed approach as a function of them, the combinations of a set of discrete values have been explored. Hence, two different states for each of the first two environmental variables have been defined and subsequently emulated during the data collection. Concerning the door, the possible states considered are an opening of 30 degrees and a closed door, while, with respect to the reader model, two well-known and widely used commercial readers have been selected, i.e. Bluebird RFR900 and Zebra RFD8500, which, for the sake of brevity, will be called simply Bluebird and Zebra in the remainder of this paper. Conversely, concerning the effective radiated power (ERP), for a more comprehensive analysis, three different levels have been identified (i.e., 65 mW, 125 mW, 500 mW). The first two levels might be considered as representative of selective inventorying, while the last represents a condition of massive inventorying. Since the Zebra reader is able to deliver up to 1500 mW of ERP, a further set of experiments using that reader, an ERP of 1500 mW, and the closed door has also been performed. For the sake of clarity, the chosen states for each environmental variable are reported in Table 1. Subsequently, all the possible combinations have been explored, for a total of 13 different experiment scenarios characterized by different environmental variables. More precisely, since there are 3 independent decision variables (the door and the reader have two possible states, while the ERP has three, i.e., 65 mW, 125 mW, 500 mW), the resulting number of combinations is 2×2×3 = 12, and each of these combinations represents a different scenario that may have a different impact on results. The thirteenth case is an additional scenario we wanted to consider, since the Zebra reader is the only one able to reach that level of ERP (i.e., 1500 mW). It should be regarded as an exceptional case, possible only in a few situations, because not all readers offer the possibility of using such a high ERP. A separate data collection has been done for each scenario, and the proposed approach has been tested on all of them to observe its reliability and robustness.
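The scenario count described above can be reproduced with a short enumeration. The dictionary keys and state labels below are illustrative names chosen for this sketch; only the values themselves (door states, reader models, ERP levels) come from the text.

```python
from itertools import product

# States of the three environmental variables in the factorial design.
doors = ["open_30deg", "closed"]
readers = ["Bluebird", "Zebra"]
erp_mw = [65, 125, 500]

# Full factorial part: 2 x 2 x 3 = 12 scenarios.
scenarios = [
    {"door": door, "reader": reader, "erp_mw": erp}
    for door, reader, erp in product(doors, readers, erp_mw)
]

# Additional scenario: only the Zebra reader reaches 1500 mW,
# and it was tested with the door closed.
scenarios.append({"door": "closed", "reader": "Zebra", "erp_mw": 1500})

n_scenarios = len(scenarios)
```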
Table 1. Environmental variables considered and respective explored values.
The data collection was carried out trying to simulate a retail store as accurately as possible. The area we used is characterized by a front of 50 square metres and a back of 15 square metres. The two rooms are divided by a plasterboard wall with a door that allows the transition from one area to the other.
The authors are aware that the material of which the partition wall is made (usually plasterboard or bricks) could slightly affect the results, and that the wall material and thickness should be considered as a possible environmental variable. However, due to the time constraints of the research, we have not been able to find a test environment in which, all other conditions being equal, the wall was made of a different material. We are confident that the scientific community will be able to verify the results of the proposed approach with different wall materials and thicknesses. Still, we point out that it is generally agreed by the scientific community that, as far as radio waves are concerned, brick walls have a greater shielding effect than plasterboard walls (Wang et al. 2017). Because of this, the results presented below might provisionally be seen as a worst-case scenario.
For each scenario, characterized by a single reader model, state of the door, and ERP level, 10 reading sessions of 1 minute have been carried out. In each reading session, 100 tags have been placed in the front and 300 in the back, to respect a plausible proportion of tags between the sales floor and the backroom area of a fashion retail store. From one reading session to the next, the tags were shifted, both between different areas and inside every area, i.e. their position in the room was changed. However, to reduce testing times, denoting the scenarios by i = 1, …, 13 and the reading sessions by j = 1, …, 10, the tags (and the positions of the tags) used in scenario i and session j are the same as those used in scenario i + 1 and session j, and this is true for each i = 1, …, 12.
Further expedients have also been adopted to better simulate a real situation. First, given the 100 tags in the front (and the respective 100 garments attached to them), 50 of them have been folded on shelves or into cardboard boxes, and the remaining 50 have been hung on rails (such as jackets or shirts). A similar expedient has been adopted in the back, where 150 tags have been placed into cardboard boxes with the respective garments, and the remaining 150 have been hung. Secondly, during the reading sessions, the person waving the handheld reader at the garments has been asked to walk randomly around the room (i.e., the sales floor area). Finally, 4000 further non-registered tags have been placed in the simulated store (i.e., 1000 in the front and 3000 in the back), and they have been used as a source of noise for the reader. These additional non-registered tags may produce signal collisions and noise, which are important to consider, since they will always be present in any real application.
From the authors’ point of view, it is also important to highlight that, in each reading session, the tagged items placed in the front and in the back, whether hung or folded, have been equally distributed inside each room, placing 50% of them within the first 1.5 metres from the partition wall, and the remaining 50% further away from the wall.
During each 1-minute reading session, many tags have been read more than once. The information that has been collected to estimate the tag location is (i) the number of times the tag has been read, i.e. the RR, and (ii) the RSSI of every read of the tag.
Proposed approach
Data pre-processing and logistic regression
The problem at hand can be cast as a binary classification problem (Murphy 2012), in which observations are categorized according to their input variables. Indeed, since the objective is to identify which garments are placed in the front room (first class) and which in the back room (second class), the task can be addressed with established Machine Learning techniques. The logistic regression (Friedman, Hastie, & Tibshirani 2001) has been chosen to assess the viability of the proposed approach, given that the model is rather simple, with few or no hyperparameters to be tuned, and easy to interpret.
However, before delving into the details of the classification model, a data pre-processing step must be performed. Indeed, during a reading session, the reader detects each tag multiple times, whereas our purpose is to assess the location of a given tag only once. Moreover, very little data is associated with a single reading, namely the timestamp and the instantaneous RSSI. For this reason, the pre-processing step aggregates the multiple readings associated with a single tag into a single vector, which is associated with the tag itself and can be used to carry out the classification task.
Thus, the feature engineering process first defined which quantities could be descriptive and discriminating in order to determine the position of the tags. As inferred from the physical theory underpinning the functioning of RFID tags, two variables have been selected: (i) the Read Rate (RR), and (ii) the RSSI. The former counts how many times a given tag has been detected by the reader, and, ideally, the higher the value, the higher the probability that the tag is positioned in the front room. The latter measures the strength of the signal received by the reader, and, again, the higher the value, the higher the probability that the tag is in the front room. However, while the Read Rate is already an aggregated measure of the readings of a tag, the RSSI must be further synthesized by using well-known statistical indices (i.e., mean, standard deviation, and quantiles), calculated on the measures coming from the readings of a tag. To summarize, the following is the list of input variables associated with a tag: (i) the Read Rate (RR), (ii) the average RSSI, (iii) the standard deviation of the RSSI, and (iv) five different quantiles (5%, 25%, 50%, 75%, 95%) of the RSSI readings. As represented in Fig. 2, each input record provided to the classification model is therefore made of 8 elements.

Fig. 2. Inputs of the logistic regression.
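A minimal sketch of how the readings of one tag could be aggregated into the 8-element record is given below. It is an illustration under stated assumptions, not the authors' code: the function name and the sample RSSI values are hypothetical, the RR is taken as the plain count of reads in the session, and the standard deviation is the population form (NumPy default), since the text does not specify which one was used.

```python
import numpy as np

# Quantile levels listed in the text: 5%, 25%, 50%, 75%, 95%.
QUANTILES = (0.05, 0.25, 0.50, 0.75, 0.95)


def tag_features(rssi_readings):
    """Aggregate all RSSI readings of one tag in one session into 8 values:
    [RR, mean RSSI, std RSSI, q05, q25, q50, q75, q95]."""
    rssi = np.asarray(rssi_readings, dtype=float)
    return np.concatenate((
        [rssi.size, rssi.mean(), rssi.std()],  # RR = number of reads
        np.quantile(rssi, QUANTILES),
    ))


# Hypothetical readings (dBm) of two tags during one session.
session = {
    "tag_front": [-45.0, -47.5, -44.0, -46.0, -50.0],
    "tag_back": [-68.0, -71.5],
}
X = np.vstack([tag_features(reads) for reads in session.values()])
```

Each row of `X` is one input record for the classifier; tags that are never read in a session produce no record at all.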
The output/target variable, of course, is the position of the tag, either in the front room or back one. The former is coded as zero, while the latter is coded as one. Concerning the amount of data, as mentioned above, 400 different tags have been used in each reading session (i.e., 100 in the front and 300 in the back), and 10 reading sessions per each scenario have been done, for a total of 4000 records per scenario.
The objective of this work is to analyze the possible implementation of the classification model in a specific scenario. Because of this, each dataset of 4000 records has been treated as an independent experiment. Thus, a regularized logistic regression model (i.e., with L2 penalty; Schaefer et al., 1984) has been trained in each scenario, splitting the dataset, as usual, into a training set with 80% of the data (3200 records) and a test set with the remaining 20%. The algorithm used for training is the Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS or LM-BFGS) (Nocedal 1980; Liu & Nocedal 1989), with the default parameters suggested by the authors. The model has been implemented using Python and the well-known scikit-learn library (Pedregosa et al. 2011). As already stated, since the logistic regression is a rather simple model, no hyperparameters have been tuned, which is a very desirable characteristic from an operational point of view, as it makes the deployment of the model in a production pipeline very easy and fast. To cope with the unbalanced setting of the problem (the tags in the front account for only 25% of the total), the model has been trained setting the class_weight option to ‘balanced’ (King & Zeng 2001), which has the effect of penalizing more heavily the errors made on the minority class, i.e. the tags in the front room.
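The training setup described above can be sketched with scikit-learn as follows. The data are synthetic placeholders generated only to make the example runnable (the function `synthetic_scenario` and its distributions are assumptions, not the experimental data), and the regularization strength is left at the library default because the text does not report it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.default_rng(0)


def synthetic_scenario(n_sessions=10, n_front=100, n_back=300):
    """Placeholder for one scenario: 8 features per tag, label 0 = front,
    label 1 = back. Front tags get higher feature values on average."""
    n_f, n_b = n_sessions * n_front, n_sessions * n_back
    X = np.vstack((rng.normal(1.0, 1.0, size=(n_f, 8)),
                   rng.normal(-1.0, 1.0, size=(n_b, 8))))
    y = np.concatenate((np.zeros(n_f, dtype=int), np.ones(n_b, dtype=int)))
    return X, y


X, y = synthetic_scenario()  # 4000 records
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)  # 3200 / 800 records

# 'balanced' weights are n_samples / (n_classes * class_count):
# 4000 / (2 * 1000) = 2.0 for the front, 4000 / (2 * 3000) = 0.67 for the back.
weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y)

# L2 is the default penalty of LogisticRegression; L-BFGS is the solver.
model = LogisticRegression(solver="lbfgs", class_weight="balanced")
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```

With these weights, a misclassified front tag contributes three times as much to the loss as a misclassified back tag, which offsets the 1:3 class ratio.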
The results of the validation tests are presented in Table 2. As said above, each scenario has been treated as a single experiment, and, for each dataset of 4000 records relative to the considered scenario, 80% of the records, randomly selected, have been used to train the model, and the remaining 20% to test it. Moreover, to measure the reliability of the approach, 30 logistic regression models have been trained and tested on each dataset, randomly changing each time the records used for training and those used for testing.
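The repeated random-split procedure described above can be sketched with scikit-learn's `ShuffleSplit`, which draws an independent 80/20 partition at each repetition. The dataset is again a synthetic placeholder, and the use of `cross_val_score` is a convenience chosen for this sketch rather than a statement about how the experiments were coded.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Synthetic stand-in for one scenario: 1000 front (0) and 3000 back (1) records.
rng = np.random.default_rng(1)
X = np.vstack((rng.normal(0.5, 1.0, size=(1000, 8)),
               rng.normal(-0.5, 1.0, size=(3000, 8))))
y = np.concatenate((np.zeros(1000, dtype=int), np.ones(3000, dtype=int)))

# 30 independent random 80/20 splits, one model trained and tested per split.
splitter = ShuffleSplit(n_splits=30, test_size=0.2, random_state=0)
model = LogisticRegression(solver="lbfgs", class_weight="balanced")
accuracies = cross_val_score(model, X, y, cv=splitter, scoring="accuracy")

# The three statistics reported per scenario.
summary = {
    "mean": accuracies.mean(),
    "std": accuracies.std(),
    "best": accuracies.max(),
}
```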
Table 2. Results across different scenarios.
Table 2 reports, for each scenario, the environmental variables that characterize it, the number of iterations used to train the classification models, the average and standard deviation of the accuracy obtained by the 30 logistic regressions on their respective test datasets, and the best result obtained, which might be interpreted as the response of the model whose training provided the best vector of weights.
Moreover, since it is widely accepted that, in disciplines such as deep learning, data normalization generally improves outcomes (Jayalakshmi & Santhakumaran, 2011), a further validation has been made. The same tests described before have been repeated, but a normalization of the data has been carried out before training the models. This normalization has been made separately on each reading session of one minute (see Fig. 3), because, in a possible real implementation, that is the only level at which it could take place. Given a specific input value x (e.g. the RR), the transformation is given by Equation (1),

Representation of the normalization process.
where
The results are in general very satisfactory. The model has an overall average accuracy of 95.54% without normalization and 95.55% with normalization, while the average best results are 96.87% and 96.92%, respectively. The small differences between the overall average results and the average best results indicate that the model is consistent and reliable. On the other hand, the small differences between the results obtained with and without normalization indicate that this kind of pre-processing is not very effective and that the model is probably close to its maximum effectiveness. Indeed, a t-test on the results obtained with and without normalization shows no statistically significant difference, with a very high p-value (> 0.98).
Concerning the impact of the environmental variables, the overall average accuracy suggests that the model performs better when the door in the partition wall is closed, other conditions being equal, and this matches the expectations of the authors. However, looking at each single scenario, there are some exceptions. For instance, in all the scenarios where the Bluebird reader is used, there is a slight accuracy improvement when the door is open. The difference is very small, however, and we believe it should be explored in more detail before drawing any conclusion.
The environmental variable with the greatest impact is the reader. The average results show that, when the Bluebird reader is used, both with and without normalization, the accuracy of the model is 2% higher than with the Zebra reader. This difference may not seem large, but it could matter for large inventories and in retail stores holding a great quantity of goods. The effect of the reader is even more evident in the box plots presented in Fig. 4. Looking at the raw data provided to the regression model, it is clear that, in the experiments where the Bluebird reader is used, there is a greater difference in terms of read rate and RSSI between the tags in the front room and those in the back room. Conversely, with the Zebra reader, the values of read rate and RSSI for tags in the front and in the back rooms are more similar. It follows that, with the Zebra reader, shielding is more difficult for reasons that do not depend on the regression model on which this paper is focused.

Box plot of results in all the scenarios, with no normalization of input data.
Concerning the ERP, it is difficult to find a clear correlation with the results. More specifically, there appears to be no linear correlation between the ERP and the accuracy of the proposed approach in identifying which tags are in the front room and which are in the back room. Rather, it seems that each reader has its own best configuration. This absence of a linear relationship with the environmental variables justifies our choice to consider them all, and the adoption of a logistic regression instead of a simpler multivariate linear regression.
Without normalization, the best result occurs in the second scenario, where the logistic regression reaches a surprising accuracy of over 99%; this scenario also shows a high average accuracy of 97.46%. Conversely, with normalization, the best model is obtained in the fourth scenario and has an accuracy of 99.25%. Since the fourth scenario corresponds to the best result obtained with normalization and to the best average result obtained without normalization, it is reasonable to consider it the best context in which to use the proposed approach. However, the proposed model also obtains good results in other scenarios, such as the second and the third.
To systematically assess which scenario showed the best results, with the aim of identifying the scenario settings on which to focus in future work, paired t-tests have been employed to test the statistical difference between the results obtained in the fourth scenario and those of all other scenarios. The mean accuracy measured for the fourth scenario was statistically significantly different from that of all other scenarios, with very low p-values (on average p ≪ 3e-4).
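A comparison of this kind could be sketched with scipy.stats.ttest_rel, which performs a paired t-test on two equally sized samples. The accuracy values below are randomly generated placeholders, not the paper's results, and the scenario names and the helper compare_to_reference are hypothetical. The sketch also assumes that the i-th run of each scenario can be matched with the i-th run of the reference scenario, which the text does not state explicitly.

```python
import numpy as np
from scipy.stats import ttest_rel


def compare_to_reference(acc_by_scenario, reference):
    """Paired t-test of the reference scenario against every other scenario."""
    ref = np.asarray(acc_by_scenario[reference])
    p_values = {}
    for name, acc in acc_by_scenario.items():
        if name == reference:
            continue
        # Each sample holds the accuracies of the 30 models of one scenario.
        result = ttest_rel(ref, np.asarray(acc))
        p_values[name] = result.pvalue
    return p_values


# Placeholder accuracies (assumed values, for illustration only).
rng = np.random.default_rng(0)
accuracies = {
    "scenario_4": rng.normal(0.975, 0.005, 30),
    "scenario_1": rng.normal(0.950, 0.005, 30),
    "scenario_2": rng.normal(0.960, 0.005, 30),
}

p_values = compare_to_reference(accuracies, "scenario_4")
for name, p in p_values.items():
    print(f"scenario_4 vs {name}: p = {p:.2e}")
```

If the runs of different scenarios cannot be matched one to one, an unpaired test such as scipy.stats.ttest_ind would be the more natural choice.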
Furthermore, Fig. 4 shows the box plot of the results obtained for each scenario: there are few outliers (highlighted by a bullet), and most of the 30 trained models have an accuracy close to the average. This is a further indicator of the reliability of the approach.
To the best of the authors' knowledge, these results suggest that the proposed approach could be implemented in production very soon, especially considering that (i) these tests have been carried out in the pessimistic case in which the partition wall is made of plasterboard, (ii) the logistic regression model could be replaced with more sophisticated Machine Learning and Deep Learning approaches (e.g., Random Forest, neural networks, etc.), and (iii) these results refer to a single reading session of 1 minute.
In this paper, we propose a software-based approach for classifying RFID reading data of tags located in different areas of a simulated retail store. We named our solution software-based shielding, as it simulates via software the effect of a physical shielding of radio waves between different store areas. The solution we propose relies on the well-known logistic regression, and it reports an average overall accuracy of 95.5% in the classification of tagged items. The proposed solution has been tested in various scenarios by using two different readers, four reading power configurations, and two states of the door in the partition wall. Results show that all these environmental variables might affect the results. However, the solution we provide has always shown good accuracy, and it also proved to be very robust and reliable. The results obtained are satisfactory, even if we are aware that there is still room for improvement.
Future research and possible extensions might focus on two main directions. First, the solution proposed in this paper could be tested in different scenarios. For example, we did not have the possibility to test our solution with a brick partition wall (only a plasterboard wall has been used). Also, we did not test different room sizes. Lastly, the number of items that we tested in the front and back rooms is rather limited. We expect that, in larger simulated retail stores, the results could differ to some extent and further considerations could come into play. Moreover, before the solution can be deployed in a real retail store, the model would need to be fine-tuned in the environment where the software-based shielding is to operate, with the goal of readjusting the parameter weights.
Another future research line could concern the testing of different (and more complex) models, such as neural networks, random forests, or any other algorithm able to approximate non-linear functions. Indeed, the authors are already at work on some of those lines for future research.
