Natural Language processing (NLP) derives its roots from artificial intelligence and computational linguistics. The proliferation of large-scale web corpora and social media data as well as advances in machine learning and deep learning have led to practical applications in diverse NLP areas such as machine translation, information extraction, named entity recognition (NER), text summarization and sentiment analysis. Named-entity recognition (NER), is a sub task of information extraction that seeks to discover and categorize specific entities such as nouns or relations in unstructured text. In this paper, we present a review of the foundations three tolerance-based granular computing methods (rough sets, fuzzy-rough sets and near sets) for representing structured (documents) and unstructured (linguistic entities) text. Applications of these methods are presented via semi-supervised and supervised learning algorithms in labelling relational facts from web corpora and sentiment classification (non-topic based text). The performance of the three presented algorithms is discussed in terms of bench marked datasets and algorithms. We make the case that tolerance relations provide an ideal framework for studying the concept of similarity for text-based applications. The aim of our work is to demonstrate that approximation structures viewed through the prism of tolerance have a great deal of fluidity and integrate conceptual structures at different levels of granularity thereby facilitating learning in the presented NLP applications.
Natural Language processing (NLP) derives its roots from artificial intelligence and computational linguistics. The proliferation of large-scale web corpora and social media data as well as advances in machine learning and deep learning have led to practical applications in diverse NLP areas such as machine translation, machine learning-based information extraction, named entity recognition (NER), text summarization, to name a few. Named-entity recognition (NER), is a sub task of information extraction that seeks to discover and categorize specific entities such as nouns or relations in unstructured text [1]. Opinion mining and sentiment analysis are two popular NLP research areas that where the opinions of users are analyzed to detect sentiment polarity for non topic-based text categorization [2, 3, 4, 5, 6]. Reasoning with a tolerance form of granular methods is particularly useful in natural language applications (NLP) with uncertain and overlapping class boundaries inherent in the structure of linguistic data. Fuzzy set theory [7], rough set theory [8], and near sets [9, 10] are core theories that constitute granular computing. These methods use information granules where a granule is a clump of objects (points) in the universe of discourse drawn together by indistinguishability, similarity, proximity, or functionality [7]. In this paper, we present foundations tolerance-based granular computing methods and its application in: named-entity recognition in structured (documents) and unstructured text (entities) as well as sentiment classification.
Poincaré’s work in 1905 [11] led to the idea of tolerance which can be traced back to his work on the similarity of sensations. Subsequently, tolerance relations were considered by Zeeman [12] in his study of the topology of the brain where he was concerned with different cells of the brain which, when they are very near, are perceived as identical or indistinguishable. Tolerance spaces as a framework for studying the concept of resemblance was presented in [13]. For natural language processing applications, classical rough set theory [8] based on equivalence relations is considered too restrictive. Tolerance relations provide the most general tool for studying indiscernibility phenomena [14]. Authors [15, 16, 14, 17] have worked on combining rough sets and tolerance relations [18] to leading to the tolerance rough sets (TRS) model.
Authors [19, 20, 21], proposed the idea of combining fuzzy and rough sets (FRS) to develop soft similarity classes i.e., fuzzifying the approximations of rough set theory. In fuzzy set theory, the relational counterpart generates soft similarity classes which permits partial overlap, even when the fuzzy relation is reflexive, symmetrical and -transitive, i.e., a so called fuzzy -equivalence relation [22]; a property that lies at the heart of fuzzy-rough set models which is ideal for text categorization applications [23]. In near set theory, instead of set approximation operators used by tolerance rough sets to generate tolerance classes, the tolerance classes are directly induced from the feature vectors using a tolerance level parameter and a distance function. Tolerance near set-based (TNS) classification algorithm was first introduced in [24].
TRS for clustering structured text (documents) was introduced in [25] where a hierarchical clustering alogrithm based on an extension of the agglomerative method was applied. Non-hierarchical document clustering method using TRS was presented in [26] to address the issue of due to exploding time and space requirements of the hierarchical method especially when dealing with a large document corpora. A similar approach was used by the authors [27] where k-means clustering was applied to web documents represented using the TRS model. Web document classification with TRS was discussed in [28]. Online hot topic detection with TRS based clustering was presented in [29]. A tolerance-based semi-supervised two-class ensemble classifier for documents was introduced in [30]. In [31] document clustering using a lexicon-based document representation based on the TRS model was presented, as well a framework for retrieval of indonesian language text [32]. The problem of learning weights was addressed in [33]. TRS model was applied to semantic document indexing in [34, 35, 36]. An in-depth survey of application of tolerance rough sets in text categorization (structured and unstructured) can be found in [37]. Fuzzy rough set (FRS) method was proposed as a model for structured text representation (document) and retrieval [38]. The application of fuzzy rough sets to query refinement was first presented in [22] with the theory outlined in [39]. More recently, FRS was used in document classification by [40].
The growth of the Web has spawned knowledge-bases from web corpora such as Wikipedia, DBpedia, WordNet,1
https://wordnet.princeton.edu/.
Never Ending learning (NELL) [41] and Google’s Knowledge Vault [42]. A knowledge base usually contains facts that are linguistic (unstructured) and referred to commonly as nouns (unary relations) or relations (binary relations). These facts are learned using in either a semi-supervised or unsupervised manner. The representation of facts is usually in the form of triple (subject, predicate, object). The predicate is referred to as a contextual pattern. With a semi-supervised form of learning which we present in this paper, three challenges are encountered: i) the number of training examples are few, i.e., relations and their known categories, ii) a relation may belong to more than one category depending on its contextual pattern(s), and iii) new relations are frequently mislabelled (also known as concept drift [43]) in the process of iterative learning. In our tolerance-based granular approach with TRS and FRS [44, 45, 46], we address these challenges by creating an approximation space for representing unstructured text in form of either crisp or graded co-occurrence matrix and designing semi-supervised learning algorithms, Tolerant Pattern Learner (TPL) and Fuzzy Pattern Learner (FRL) (Fig. 1 illustrates a typical steps followed during the iterative learning process for linguistic entities). We frame our discussion in terms of benchmarked NELL web corpora as well as two semi-supervised learning algorithms: Coupled Pattern Learner (CPL) [47] and Coupled Bayesian Sets (CBS) [48]. To address the problem of concept drift, both CPL and CBS employ external constraints. In CPL, simultaneous training of many extractors for learning relations using mutual exclusion and type checking relationships were employed. In CBS, a probabilistic score for every candidate noun fact belonging to a category was used. No external constraints were required for the TPL algorithm. However, mutual exclusion constraints were used with the FRL algorithm.
Semi-supervised Learning in NER with granular methods [49]. Steps 1–5 shows the iterative process of enlarging the trusted entity set (nouns) using a promotion mechanism with approximation spaces.
Non topic-based text classification includes sentiment classification and news categorization. We discuss our recent work on near-set technique-based algorithm (TSC) to classify sentiment polarities of tweets and news classification [50]. The supervised TSC algorithm takes advantage of the recent advances in efficient feature extraction and vector generation from pre-trained bi-directional transformer encoders for creating tolerance classes. Feature vectors are generated using two pre-trained deep learning models BERT [51] and SBERT [52] and cosine distance is used for generating tolerance classes and representative vectors during the training process. The effect of different vector-generation methods, tolerance class sizes, balanced and imbalanced curated datasets as well as a number of sentiment classes on the TSC algorithm are examined.
The outline of the paper is as follows: In Sec. 2, we present formal definitions and models for TRS, FRS and TNS methods for structured text (documents), unstructured text (linguistic entities) and non-topic based text (sentiments). We define approximation spaces as representational structures for applying machine learning algorithms. In Sec. 3 we present two semi supervised learning approaches in the context of an NER application for categorizing nouns (with TPL) and relations (with FRL). In Sec. 4, we discuss a near set-based algorithm (TSC) to classify sentiments of tweets and news categories from text. In Sec. 5, we briefly describe the datasets used in the TPL, FRL and TSC experiments. Results of the experiments using NER and non-topic classification datasets are discussed in Sec. 6 followed by concluding remarks in Sec. 7.
Preliminaries
Tolerance rough sets
In classical rough sets theory, a universe of objects is partitioned into indiscernible classes (i.e., non-overlapping granules) by means of an equivalence relations () which are reflexive, symmetric and transitive. Specifically, for NLP applications where overlapping classes are desired, a non-transitive binary relation that is reflexive and symmetric is necessary. Hence the need for a tolerance form of rough sets. Tolerance relations are reflexive and symmetric but they are not necessarily transitive, so the classes induced by such relations can overlap. Let be a finite, non-empty universe of objects, be the power set of , and let be a binary relation such that holds for any . defines a tolerance relation and defines a tolerance class of . Then a tolerance membership function (also known as vague inclusion function) is defined as . The lower and upper approximations of set can be defined as
where defines a simplified form of tolerance approximation space [53] which leads to the the definition of a tolerance rough set:
(Tolerance Rough Set).
A tolerance rough set is defined as a pair .
Document representation with TRS
Documents are considered as structured information and were first modeled using tolerance rough sets by Kawasaki [25]. With this model, documents are represented as vectors, where each vector dimension corresponds to a term weight that is enhanced by the two approximation operators defined (see Def. 1), by relating terms across documents. This is useful particularly when each document is characterized by only a small number of terms along with many zero-valued entries in a high dimensional term vector space.
Let be a set of documents and and let be a universe of index terms that occur in those documents. Let each document be represented as a weight vector of its terms where shows the significance of term in document . Let a query be a cluster representative in the form of where and [26]. The goal is to find ordered documents that are relevant to .
(Document Approximation Space).
Let be the approximation space for approximating documents, where is the tolerance relation and is the inclusion function. The tolerance relation is defined as:
where is a user-defined threshold and denotes the number of terms in which and co-occur. The inclusion function is defined as:
where represents the concept (documents) to be approximated and the structurality function is 1 for . The lower and upper approximations of are redefined as follows:
.
The uncertainty function defines the tolerance class for each index term and, in effect, captures affinity amongst the terms via the co-occurrence function .
.
The inclusion function also called vague inclusion function, measures the overlap between tolerance class (containing ) and set .
.
The inclusion function also called vague inclusion function, measures the overlap between tolerance class (containing ) and set .
Weight Adjustment via Tolerance Rough Sets:
The following weighting scheme takes into account boundary terms and assigns non-zero weights. Note that the upper approximation of a document covers the tolerant terms for all of its own terms as well, creating the enriched representation.
.
Once the weights are adjusted within the framework of tolerance rough sets, one can measure the similarity between a query vector by using the following formula which is typically used in retrieval (by clustering similar documents).
Named entity representation with TRS
In contrast to document representation and learning, named entity identification consists of recognizing common linguistic entities (in unstructured text), and relationships between them as well as categorizing them into predefined categories such as names, location, organization. These entities are meaningful terms (nouns) or phrases (relations) in unstructured text. In this section, we give formal definitions for approximating entities (nouns). Definitions for relations can be found in [44]. Let represent the universe of nouns, and let represent the universe of contextual patterns. Let map each noun to its set of co-occurring contexts: where occurs times within context . Let map each context to its set of co-occurring nouns: .
Let where and are as defined previously. is the parametrized uncertainty function describing the tolerance classes for the contexts, in terms of contextual overlaps:
Here, is the tolerance threshold and is the overlap index which is the Sorensen-Dice index [55]:
The degree of inclusion is measured by and is defined similar to the the aproximation space for documents. Within the framework of , the lower and upper approximation of a noun is defined as:
and the upper approximation is defined as:
.
A noun (concept) can now be approximated using its contexts. The lower approximation represents closely related contexts while the upper approximation represents somewhat related contexts.
Contexts: “_ league”, “_ and other sports”, “_ is popular in _”
Co-occurrence statistics: e.g. (“Ice Hockey”, “_ league”)
.
Biomedical entities from biomedical literature [56]
entities: sars, saliva, immunohistochemistry
Categories: coronavirus, substrate, diagnostics
Contexts: “respiratory syndrome-associated_”, “binding of _”, “virus replication in _”
Co-occurrence statistics: e.g. (“coronavirus”, “respiratory syndrome-associated_”)
Named entity representation with FRS
A fuzzy rough set is a pair where A is a fuzzy set in X such that and . is a fuzzy relation in [39] where represents the lower approximator and represents the upper approximator.
A knowledge base can be viewed as a thesaurus consisting of linguistic entities such as nouns, relations, and their contexts (or contextual patterns). We now define give formal definitions for relation entities. Let represent the universe of relations described via the relational co-occurrence function where occurs times within the context and is a noun and is a relational context. Let represent the mapping of each relation to its set of co-occurring relational contexts: .
(Trusted relations).
Let represent a set of trusted relations such that and index .
(Co-occurrence).
Let denote mapping of each relation to its set of co-occurring relational contexts:
Let represent the approximation space used for thesaurus contruction. The fuzzy relation is defined as a fuzzy set in . Let represent the fuzzy sets of relations and trusted relations respectively. The upper and lower approximations of the fuzzy set in is denoted by and [22, 57, 58].
.
The first step towards creation of a fuzzy thesaurus is by normalizing the co-occurrence information using . The next step involves fuzzifying the co-occurrence function with the S-function where and are constants determined experimentally.
Normalized frequency for relation FC Barcelona Lionel Messi
FC Barcelona Lionel Messi
Normalized frequency
Attacking midfielder
1/7
Forward
1/7
Have handed
2/7
Player
2/7
Prodigy
1/7
S-function value for relation FC Barcelona Lionel Messi
FC Barcelona Lionel Messi
S-function
Attacking midfielder
1
Forward
1
Have handed
1
Player
1
Prodigy
1
(Upper approximation).
.
The upper approximation makes it possible to select candidate relations having a membership co-occurrence value either equal to or more than that of the trusted relations .
(Lower approximation).
.
The lower approximation will makes it possible to select candidate relations as trusted when there is at least one common context with a trusted noun .
.
To counter the problem where candidate relations with high membership degrees are promoted and added to the set of trusted relations, irrespective of whether the candidate relation is related to the trusted relation, the upper approximation operation of all candidate relations followed by lower approximation for the remaining candidate relations is performed. This operation is called tight upper approximation and is defined in Eq. (6).
The micro score for a candidate relation (see Eq. (7)) is calculated using the tight approximation operator. This will score will be used during the learning process.
(micro score).
where and are application dependent.
Tolerance near sets in sentiment classification
Near sets are essentially disjoint sets that contain objects with similar descriptions provided the intersection of the sets is nonempty. The basic structure which underlies near set theory is a perceptual system which consists of perceptual objects [10]. Near sets are characterized by a perceptual system, a nearness relation and a near set [60].
A perceptual system is a pair , where is a nonempty set of perceptual objects and is a countable set of real-valued probe functions where is a probe function of a single feature.
The probe functions give rise to a number of perceptual relations. The notion of tolerance is directly related to the idea of closeness between objects, that resemble each other with a tolerable level of difference. The tolerance space for a text-based nearness relation consists of textual objects where the universe consisting of text described by set of vectors .
Let be a universe of nonempty set of objects and be the feature set. Let where represents textual features. A tolerance space is defined as:
where dist is the cosine distance given in Eq. (8). The tolerance relation induces a tolerance class where is a user-defined tolerance level.
.
In other words, given a set of text (e.g., tweets) , where , , each tweet or text can be represented as a k-dimensional word vector where text similarity is measured using the cosine distance measure. The universe of text described by set of vectors , is spread amongst tolerance classes with a tolerance level for semantic textual similarity, where a tolerance class is maximal with respect to inclusion [62].
.
Let is set of tweets elaborated as: {“So there is no way for me to plug it in here in the US unless I go by a converter.”, “Good case, Excellent value.”, “Tied to charger for conversations lasting more than 45 minutes.MAJOR PROBLEMS!!”, “He was very impressed when going from the original battery to the extended battery.”, “The design is very odd, as the ear clip is not very comfortable at all.”}
.
Let {[7.22507477e-01, 1.02917385e+00, …, 6.84888303e-01, 2.86179781e-01], [2.96921253e-01, 7.52622932e-02, …, 4.93599415e-01, 1.82238102e-01], [1.55841690e-02, 7.04942644e-02, …, 3.76997262e-01, 2.59568900e-01], [7.95753375e-02, 2.57343829e-01, …, 6.54604957e-02, 2.85186619e-01 ], [ 4.41945344e-01, 5.39983749e-01, …, 4.36222434e-01, 1.82313472e-01]} where is a set of 768 dimensional vectors extracted using SBERT.
.
Figure 2 shows the cosine distance matrix created by using Eq. (8).
Distance matrix with cosine distance values between tweets .
.
Based on the distance matrix shown in Fig. 2, and 0.79, we can derive the following following tolerance classes:
since all of the pairs satisfy 0.79 which creates the first tolerance class . . which creates the second tolerance class. which creates the third tolerance class. The final tolerance class is union of all above classes .
Method I – Named entity recognition application
Semi-supervised learning with TRS and FRS
In this section, we give an overview of the TRS-based pattern learner (TPL) algorithm for categorizing nouns with the representation described in sec. 2.1.2. TPL follows an iterative method and categorizes nouns using a semi-supervised approach. In this approach, given a trusted noun for a category and a candidate noun , three approximations are computed: , , shown in blue, white and orange regions respectively in Fig. 3. Each region represents a degree of similarity for a candidate noun which will be used to determine an overall score for the promotion of .
Zones of Approximation induced by trusted entity (noun ()) and candidate entity (noun ()) and contexts and respectively for a certain category.
.
In other words, a trusted noun has its universe of contexts partitioned by its three descriptors (context, upper and lower). For example, the degree of similarity is highest in zone 1 (orange) since it is covered by lower approximation of the trusted noun and the context of the candidate noun. These zones will be used in the promotion scoring defined below.
(Promotion scoring).
.
A micro score is calculated for each category. , and are weights that reflect the contribution of each zone and are determined by the application domain. A macro score is calculated over all categories. The overlap index function given in Eq. (1) is used for this calculation.
A large unlabelled corpus , trusted nouns, categories and the co-occurrence matrix. Promoted nouns for each category. These are labelled nouns. each category cateach new trusted noun of cat Calculate the approximations and each candidate noun phrase Calculate each candidate noun Rank nouns by Promote top nouns as trusted
The input to the TPL algorithm (Alg. 13) is an ontology comprised of a set of categories (e.g. City) and a handful of trusted nouns also called as seeds (e.g., Winnipeg, New Delhi, Ankara). Additionally, a large co-occurrence matrix representing the nouns and the contextual patterns are input to the TPL algorithm. The output consists of promoted trusted nouns assigned to their respective categories adding to the growing knowledge base of trusted (labelled) nouns. Lines 6 and 8 show the computation of the scores. Once the score is calculated for every candidate noun of cat, the candidates are ranked by their macro-scores (normalized by the number of trusted instances of cat) shown in line 8. The top new candidates are promoted as trusted nouns shown in line 9. Two versions of the TPL were designed. TPL 1.0 was the first implementation of TRS in a NER application to categorize nouns and relations. The motivation for the development of TPL 2.0 was to explore the scalability (with a larger ontology) of tolerant pattern learner and to test whether any external constraints were necessary to overcome concept drift.
The FRL algorithm is also a semi-supervised algorithm which several features common with TPL in terms of the process of learning and promotion. In contrast to the TPL algorithm, Alg. 13 gives the categorization of relations (pair of nouns) with the representation described in Sec. 2.2. The input to FRL is an ontology of trusted relations (seeds), categories, unlabeled relational facts, and a co-occurrence matrix. The process consists of building a fuzzy thesaurus (line 5), applying the upper and lower approximator operators (resulting in tight upper approximation) are shown in steps 6–10.
An ontology defining categories; a large corpus , co-occurrence matrix, a small set of trusted relations is a set of all new promoted trusted relations each category cateach new trusted relations belonging to cateach candidate relation Calculate Fuzzy Relation using Eqs (2) and (3) Calculate Upper Approximation as in Eq. (4) Calculate score each candidate relation Calculate Lower Approximation using Eq. (9) Calculate score Calculate using Eq. (7) Sort trusted instances by Promote top trusted instances, such that
Method II – Sentiment polarity classification
In this section, we present a near set-based algorithm (TSC) to classify sentiments of tweets and news categories from text. The TSC classification algorithm has two parts: training and testing. Alg. 4 describes the training process where a tolerance class representative vector is generated.
[!h] TSC Training Phase: Generating tolerance class representative vectors [61]InputInput OutputOutput , // Transformer Vectors for training // Tolerance level parameter NT is the size of the Tolerance class set
// Create a distance matrix for all training vectors based on the function defined in Eq. (9)
//a. Create object pairs satisfying the tolerance relation defined in Eq. (8) b. Create a neigbhourhood of training vectors. c. GenerateToleranceClass Set where is one tolerance class member induced by the tolerance relationd. Determine the majority polarity element in each e. Compute representative class vectors
; ; //Compute the neighbourhood of training vector
all, ; // Include from into ; //T is set of all tolerance classes; //For each , determine polarity/category by majority voting
; //Number of tolerance classes in T //End of the process of generating set which is set of all tolerance classes.//For each Tolerance class in , generate a tolerance representative class vector and its polarity/category.;
Given a tolerance level , tolerance classes are induced from the training set vectors using the cosine distance, and the representative of each tolerance class is computed as the mean value of the feature vector. The polarity (or category) of the representative vector is determined based on majority voting. In the classification phase shown in Alg. 4, TSC uses the representative tolerance class vectors generated in the training phase and their associated polarity/text category. The computeCosineDist function (line 3) calculates the cosine distance between each test set vector and all the representative tolerance class vectors. The DeterminePolarity function (line 4) chooses the representative tolerance class that is closest to the test set vector and assigns the polarity of the representative to the test set vector.
[h!] TSC Classification Phase: Assigning Sentiment Polarities [61]InputInput OutputOutput , //Tolerance level parameter, //Size of the Tolerance class set , , //Transformer Vectors for testing
//Representative class vectors generated in the training phase and their associated polarities
//Transformer Vectors with assigned polarities/categories
; //Computes min. distance and assigns polarity to the test set vector
Materials
NER dataset for TPL 1.0. TPL 2.0 and FRL experiments
The original source for TPL 1.0 and FRL experiments is ClueWeb092
https://lemurproject.org/clueweb09/.
which is a large web document collection consisting of roughly 1 billion web pages in ten languages, collected in early 2009. NLP methods were used to process the web pages from CluWeb09 consisting of millions of raw noun phrases, contextual patterns and all the co-occurrence information. The authors [48] shared the all-pairs dataset from which 68,919 unique nouns and 59,325 unique contextual patterns were extracted for TPL 1.0 experiments. For the TPL 1.0 experiments with relations, 13,020,010 unique noun phrase pairs and 11,424,413 unique contextual patterns were used. For TPL 2.0 for categorizing nouns, more than 900 million sentences from ClueWeb123
https://lemurproject.org/clueweb12/specs.php.
were extracted by us to derive an enlarged set of 130,536 nouns and 118,648 contextual patterns. Preprocessing details of the ClueWeb12 dataset for TPL 2.0 experiments can be found in [46]. ClueWeb12 is a collection of about 733,019,372 English web pages (approximate ly 6TB of compressed data) in Web Archive format (WARC).4
https://www.iso.org/standard/44717.html.
Text dataset
A subset of ten curated benchmark datasets (shown in Table 3) consisting of a mix of long and short words with varying number and sizes of sentiment classes (positive, negative, neutral and irrelevant) were used in the TSC experiments. The Covid-Sentiment is a manually labelled data which is a subset derived from [63] using Tweets ID for April 2020 and May 2020. 47,386 tweets were extracted with the help of Twitter API. The tweets in languages other than English (ex: French, Hindi, Mandarin, Portuguese) were removed. Extensive pre-processing of 29,981 English language tweets from the original dataset such as removal of HTML tags, @Username, Hashtags, URLs, and incorrect spellings was performed. A total of 8,003 hand labelled tweets were prepared for experimentation. The 20 Newsgroups dataset introduced in [64], contains approximately 20,000 newsgroup posts and this dataset is partitioned (nearly) evenly across 20 different newsgroups organized into broader categories: computers, recreation, religion, science, sale, and politics. Details of the curated datasets used in the TSC experiments can be found in [50].
Precision@30 metric was used for evaluating the TPL and FRL classifier performance. In each iteration, Precision@N is calculated as follows:
As the dataset is not labeled, the correctness of an instance was judged manually.
Categorizing nouns
11 categories were used in the experiments: Company, Disease, KitchenItem, Person, PhysicsTerm, Plant, Profession, Sociopolitics, Website, Vegetable, Sport. Each category was initialized by 5–8 seed nouns (trusted) for 10 iterations. For every category, the top 5 new nouns promoted as “trusted” per iteration. The tolerance threshold was set to 50% and the weight values were set to: 0.5, 0.25 and 0.25.
TPL 2.0 and TPL 1.0 [71] and FRL [59] and CBS [48] results with Precision@30 per category
Categories
Iteration 5
Iteration 10
TPL 2.0
TPL 1.0
CBS
FRL
TPL 2.0
TPL 1.0
CBS
FRL
Company
100
100
100
100
100
100
100
100
Disease
100
100
100
100
100
100
100
100
KitchenItem
97
100
94
97
97
100
94
73
Person
100
100
100
100
100
100
100
100
PhysicsTerm
97
93
100
67
97
90
100
77
Plant
94
100
100
77
97
97
100
100
Profession
100
100
100
100
100
100
87
100
Sociopolitics
94
100
47
93
97
100
34
87
Sport
100
97
97
100
100
100
100
100
Website
97
90
94
97
97
90
90
93
Vegetable
74
93
83
83
90
63
48
47
Average
95.7
97.5
92
92
97.7
94.5
87
89
Categorizing relations
The following 11 categories were used: Athlete_Team, CEO_Company, City_Country, City_State, Coach_Team, Company_City, Stadium _City, State_Capital, State_Country, Team_vs_Team. 6 seed instances were used for 10 iterations. In every iteration, the top 5 new relations in every category were promoted as trusted relations for subsequent iterations. All the parameter values for and were the same as the ones used for categorizing nouns. To evaluate promotion-based results, the same steps implemented by CPL [47] were applied. 30 pairs were sampled from all the promoted pairs and Precision@30 values was calculated.
Precision@30 of TPL 1.0, CPL and FRL for promotion-based method
Categories
TPL
FRL
CPL
1
5
10
1
5
10
10
Athlete_Team
100
96
87
100
100
83
100
CEO_Company
100
100
100
100
100
100
100
City_Country
100
100
100
100
93
96
93
City_State
100
100
100
100
100
100
100
Coach_Team
100
100
93
100
100
100
100
Company_City
40
84
97
100
100
100
50
Stadium _City
80
92
70
80
92
90
100
State_Capital
100
100
63
100
88
43
60
State_Country
100
100
100
100
100
100
97
Team_vs_Team
100
84
80
100
96
100
100
Average
92.0%
95.6%
89.0%
98.0%
96.6%
91.2%
90.0%
Table 5 gives the promotion-based results for all three algorithms. Table 6 gives the ranking-based results. In both ranking and promotion-based methods, correctness results were verified manually. Bold-faced values indicate average values for all categories at the end of iteration 10. Based on the results in Table 6, the average precision after 10 iterations for the proposed FRL algorithm is significantly better than TPL using a ranking-based method. For this data set, one can observe that FRL is able to handle concept drift better than TPL in all categories. It is important to note that FRL enforces mutual exclusion constraint, whereas TPL was able to maintain high precision with no externally defined constraints. From Table 5, it can be seen that FRL performs better than TPL and CPL. It is noteworthy that CPL enforces 3 forms of constraints during the learning process [47]. Here FRL was able to do better than TPL in terms of concept drift only in two categories, Athlete_Team and Team_vs_Team. With CPL, the only result that was available was after the tenth iteration.
Precision@30 for relations TPL 1.0 and FRL for ranking-based method
Categories
TPL
FRL
Iter. 1
Iter. 5
Iter.10
Iter. 1
Iter. 5
Iter. 10
Athlete_Team
100
90
87
97
100
97
CEO_Company
100
100
100
100
100
100
City_Country
100
100
100
93
100
100
City_State
100
100
100
97
100
100
Coach_Team
93
93
93
100
100
100
Company_City
83
90
93
97
100
100
Stadium _City
97
93
80
93
70
93
State_Capital
100
97
73
93
83
76
State_Country
100
100
100
90
100
100
Team_vs_Team
93
83
80
100
100
100
Average
96.6%
94.6%
90.6%
96%
95.3%
96.7%
TPL’s top-16 ranked instances for selected categories. Incorrect instances are boldfaced [44]
PhysicsTerms
Socipolitics
Vegetables
Iteration 1
Inertia
Socialism
Zucchini
Acceleration
Democracy
Spinach
Gravity
Dictatorship
Cucumber
Buoyancy
Monarchy
Tomato
Velocity
Independence
Broccoli
Momentum
Justice
Lettuce
Magnetism
Equality
Celery
Resonance
Pluralism
Cabbage
Curvature
Internationalism
Kale
Electromagnet.
Federalism
Cauliflower
Density
Secularism
Asparagus
Elasticity
Liberalism
Carrots
Surface Tension
Hegemony
Tomatoes
Polarization
Self-determ.
Avocado
Vibration
Unification
Eggplant
Entropy
Capitalism
Carrot
Iteration 10
Density
Humanism
Zucchini
Conductivity
Pluralism
Cabbage
Intensity
Federalism
Kale
Viscosity
Internationalism
Celery
Permeability
Nationalism
Cauliflower
Velocity
Rationality
Eggplant
Brightness
Liberalism
Carrots
Attenuation
Secularism
Asparagus
Luminosity
Individualism
Tomatoes
Reflectance
Democracy
Spinach
Sensitivity
Environmentalism
Squash
Amplitude
Morality
Cucumber
Thickness
Pragmatism
Melon
Frequency
Spirituality
Chicken
Water cont.
Regionalism
Tofu
Salinity
Subjectivity
Shrimp
Concept drift
One of the issues with semi-supervised learning is the problem of concept drift where incorrect categorizing of entities results over a period of time due a limited number of trusted (seed) labeled examples and a large number of unlabelled examples. This problem is mitigated by constraining the learning process (e.g., simultaneously learning independent classifiers, specifying mutual exclusion or other constraints). Table 7 is a illustration of incorrect promotions during the iterative learning process.
The motivation for designing TPL 2.0 was to explore its performance two important issues: scalability (larger noun and contextual patterns) and concept drift (observed in terms of increased number of iterations). The precision value at the end of the twentieth iteration for TPL 2.0 was 96.2%. The choice of a tolerance rough set-based learner was motivated by the fact that TPL 1.0 did not require definition of any external constraints to constrain the learning process when compared with the three bench-marked algorithms: CPL, CBS and FRL.
SBERT vector-based weighted F1-score (rounded) results for six classifiers. Best results are in bold-face
Dataset
TSC-mean
RF
ME
SVM
SGD
LGBM
Covid-Sentiment
55
44
57
57
57
56
U.S. Airline
77
77
77
77
75
77
IMDB
76
69
73
73
72
72
SST-2
85
85
85
86
85
85
Sentiment140
70
68
72
72
66
70
SemEval 2017
60
54
64
63
63
60
Sanders corpus
69
70
76
74
76
75
UCI Sentence
89
84
86
87
87
83
AG-News
82
79
88
81
88
83
20-Newsgroups
66
41
58
52
52
53
SBERT vector-based AUC-ROC score for six classifiers. Best results are in bold-face
Dataset
TSC-mean
RF
ME
SVM
SGD
LGBM
Covid-Sentiment
0.65
0.71
0.68
0.70
0.70
0.67
U.S. Airline
0.75
0.77
0.76
0.77
0.77
0.79
IMDB
0.76
0.83
0.83
0.86
0.88
0.87
SST-2
0.85
0.82
0.81
0.82
0.82
0.73
Sentiment140
0.70
0.71
0.74
0.72
0.73
0.73
SemEval 2017
0.66
0.71
0.71
0.71
0.70
0.72
Sanders corpus
0.77
0.77
0.78
0.79
0.80
0.76
UCI Sentence
0.90
0.72
0.78
0.77
0.76
0.67
AG-News
0.87
0.86
0.90
0.90
0.90
0.88
20-Newsgroups
0.82
0.81
0.86
0.86
0.87
0.82
TSC results
Table 8 gives weighted F1-score with the following benchmark algorithms: Random Forest (RF) [72], Maximum Entropy (ME), Support Vector Machine (SVM) [73], Stochastic Gradient Decent (SGD) [74] and Light Gradient Boosting Machine (LGBM) [75] classifier implementations in Scikit-learn.7
https://scikit-learn.org/stable/.
The range of tolerance values used by TSC for different datasets varied from 0.08 to 0.38. The best performance was observed in the U.S. Airline, IMDB, UCI SST-2 and 20-Newsgroup datasets and comparable with the COVID-Sentiment, SST-2 and Sentiment 140 datasets. Our experiments leads us to conclude that balanced tolerance classes with two sentiment classes (e.g., SST-2 and UCI sentence datasets) generated by the SBERT vectors results in good semantic similarity between vectors. The weighted F1-score with these two datasets is over 85%. Table 9 gives the AUC-ROC score for all the tested classifiers. A pair-wise comparison of all combinations (ovo) with the weighted average score parameter was used for obtaining the results. Based on the AUC-ROC score, the TSC algorithm performs best in the UCI and SST-2 datasets that have approximately similar number of tolerance classes and two classes. It is also noteworthy that the UCI and AGnews datasets have the best separability (90%) and IMDB (88%), 20-Newsgroups (87%) and SST-2 (85%) have scores above 84% based on the tested classifiers. Since SGD is a linear SVM, for most datasets, the weighted F1 scores are either similar or identical. Results with AUC-PRC score and a more detailed reporting of results can be found in [61]. Overall, it was observed that TSC performs poorly with highly imbalanced datasets having more than two sentiment classes.
Conclusion and future work
In this paper, we review novel tolerance-based rough and fuzzy-rough hybrid models that were developed specifically for representing linguistic entities as well as semi-supervised machine learning algorithms for categorizing linguistic relational facts from web corpora. We also present a recently developed tolerance near-set based supervised machine learning algorithm for non-topic classification (e.g., sentiments and text). Performance of the presented algorithms was discussed in terms of bench marked datasets, and algorithms. With the tolerance rough set model and semi-supervised learning, the practical challenges when working with large NER datasets can be summarized as: i) determination of an optimal tolerance value , and the weights , and , ii) efficient construction and processing of the crisp co-occurrence matrix. With the fuzzy rough set model, the challenges are: i) to determine the optimal parameters and for the S-function to construct a graded co-occurrence matrix, ii) to define constraints to minimize concept drift.
Future work with NER applications and tolerance-based granular methods will involve categorizing fine-grained entity types. For example, in BioNER (biomedical NER), with severe acute respiratory syndrome virus, other fine-grained types such as sars, mers, covid-19 in addition to categorizing common biomedical entity types such as genes, chemicals and diseases is important. With tolerance near set-based classification learning, experiments with other word embeddings methods (e.g., Word2Vec), as well as examining the impact of the bag-of-words representation dimensionality will be part of work in future. Experiments with other pre-trained transformer-based vectors (besides SBERT) would be considered. Parallel computing with CUDA8
for co-occurrence and distance matrices are a natural extension for optimizing the performance of the granular tolerance-based algorithms.
Footnotes
Acknowledgments
Sheela Ramanna’s research has been supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) discovery grant 194376. The author wishes to acknowledge the contribution of her graduate students: Cenker Sengoz, Aditya Bharadwaj, Hoora Rezai Moghaddam and Vrushang Patel.
References
1.
NadeauDSekineS. A survey of named entity recognition and classification. Linguisticae Investigationes. 2007 January; 30(1): 3-26. Publisher: John Benjamins Publishing Company. Available from: http://www.ingentaconnect.com/content/jbp/li/2007/00000030/00000001/art00002.
2.
PangBLeeLVaithyanathanS. Thumbs up? Sentiment Classification using Machine Learning Techniques. arXiv; 2002. Available from: https://arxiv.org/abs/cs/0205070.
3.
GiachanouACrestaniF. Like it or not: A survey of twitter sentiment analysis methods. ACM Computing Surveys. 2017; 49(2): 1-41.
4.
MedhatWHassanAKorashyH. Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal. 2014; 5(4): 1093-1113. Available from: https://www-sciencedirect-com-443.web.bisu.edu.cn/science/article/pii/S2090447914000550.
5.
ZhangLWangSLiuB. Deep Learning for Sentiment Analysis : A Survey. arXiv; 2018. Available from: https://arxiv.org/abs/1801.07883.
6.
HusseinDMEDM. A survey on sentiment analysis challenges. Journal of King Saud University – Engineering Sciences. 2018; 30(4): 330-338. Available from: https://www-sciencedirect-com-443.web.bisu.edu.cn/science/article/pii/S1018363916300071.
7.
ZadehLA. Towards a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets Systems. 1997; 177(19): 111-127.
8.
PawlakZ. Rough sets. International Journal of Computer & Information Sciences. 1982; 11(5): 341-356.
9.
PetersJF. Near sets. General theory about nearness of objects. Applied Mathematical Sciences. 2007; 1(53): 2609-2029.
10.
PetersJF. Near sets. Special theory about nearness of objects. Fundamenta Informaticae. 2007; 75(1-4): 407-433.
11.
PoincaréH. Science and Hypothesis. Brock University: The Mead Project; 1905. L. G. Ward’s translation.
12.
ZeemanEC. The topology of the brain and visual perception. Univ of Georgia Institute Conf Proc. 1962. pp. 240-256. M.K. Fort, Jr. (Ed.), Topology of 3-Manifolds and Related Topics, Prentice-Hall, Inc.
13.
SossinskyAB. Tolerance space theory and some applications. Acta Applicandae Mathematicae: An International Survey Journal on Applying Mathematics and Mathematical Applications. 1986; 5(2): 137-167.
14.
PolkowskiLSkowronAZytkowJ. Tolerance Based Rough Sets. In: Lin TY, Wildberger M. (eds.) Soft Computing: Rough Sets, Fuzzy Logic, Neural Networks, Uncertainty Management, Knowledge Discovery. San Diego: Simulation Councils Inc.; 1994. pp. 55-58.
15.
NieminenJ. Rough tolerance equality and tolerance black boxes. Fundamenta Informaticae. 1988; 11: 289-296.
16.
NovotnýMPawlakZ. Black Box Analysis and rough top equality. Bull Polish Academy of Sciences, Technical Sciences. 1985; 33: 105-113.
17.
MarcusS. Tolerance rough sets, cech topologies, learning processes. Bull Polish Academy of Sciences, Technical Sciences. 1994; 42(3): 471-487.
18.
SchroederM, M W. Tolerance and weak tolerance relations. Journal of Combinatorial Mathematics and Combinatorial Computing. 1992; 11: 123-160.
19.
NakamuraA. Fuzzy rough sets. Note on Multiple-Valued Logic in Japan. 1988; 9(8): 1-8.
20.
DuboisDPradeH. Rough fuzzy sets and fuzzy rough sets*. International Journal of General System. 1990; 17(2-3): 191-209.
21.
Del CerroFPradeH. Rough sets, twofold fuzzy sets and modal logic Fuzziness in indiscernibility and partial information. The Mathematics of Fuzzy Systems. 1986; 103-120.
22.
De CockMCornelisC. Fuzzy rough set based web query expansion. In: Proceedings of Rough Sets and Soft Computing in Intelligent Agent and Web Technology; 2005. pp. 9-16.
23.
CockMDCornelisCKerreEE. Fuzzy Rough Sets: Beyond the Obvious. In: Proceedings of the 2004 IEEE International Conference on Fuzzy Systems. Vol. 1; 2004. pp. 103-108.
24.
PoliGLlapaECecattoJSaitoJPetersJRamannaS, et al. Solar flare detection system based on tolerance near sets in a GPU-CUDA framework. Knowledge-Based Systems Journal, Elsevier. 2014; 70: 345-360.
25.
KawasakiSNguyenNBHoTB. Hierarchical Document Clustering Based on Tolerance Rough Set Model. In: Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery; 2000. pp. 458-463.
26.
HoTBNguyenNB. Nonhierarchical document clustering based on a tolerance rough set model. International Journal of Intelligent Systems. 2002; 17: 199-212.
27.
NgoCL. A tolerance rough set approach to clustering web search results. Warsaw University; 2003.
28.
YiGHuHLuZ. Web Document Classification Based on Extended Rough set. In: Sixth International Conference on Parallel and Distributed Computing Applications and Technologies (PDCAT’05); 2005. pp. 916-919.
29.
WuYDingYWangXXuJ. On-line hot topic recommendation using tolerance rough set based topic clustering. J Comput. 2010; 5: 549-556.
30.
ShiLMaXXiLDuanQZhaoJ. Rough set and ensemble learning based semi-supervised algorithm for text classification. Expert Syst Appl. 2011 May; 38(5): 6300-6306.
VirginiaGNguyenHS. A semantic text retrieval for indonesian using tolerance rough sets models. Transactions on Rough Sets. 2015; LNCS 8988(XIX): 138-224.
33.
ŚwiebodaWMeinaMNguyenHS. Weight Learning for Document Tolerance Rough Set Model. In: Lingras P, Wolski M, Cornelis C, Mitra S, Wasilewski P, eds. Rough Sets and Knowledge Technology. Berlin, Heidelberg: Springer Berlin Heidelberg; 2013. pp. 385-396.
34.
NguyenSHNguyenHS. An Approach to Semantic Indexing Based on Tolerance Rough Set Model. In: Nguyen NT, van Do T, le Thi HA, eds. Advanced Computational Methods for Knowledge Engineering. Heidelberg: Springer International Publishing; 2013. pp. 343-354.
35.
SwiebodaWKrasuskiANguyenHSJanuszA. Interactive method for semantic document indexing based on explicit semantic analysis. Fundam Informaticae. 2014; 132(3): 423-438. Available from: https://doi.org/10.3233/FI-2014-1052.
36.
NguyenHS. Applications of Tolerance Rough Set Model Semantic Text Analysis. In: Ropiak K, Polkowski L, Artiemjew P, eds. Proceedings of the 28th International Workshop on Concurrency, Specification and Programming, Olsztyn, Poland, September 24–26th, 2019. vol. 2571 of CEUR Workshop Proceedings. CEUR-WS.org; 2019. Available from: http://ceur-ws.org/Vol-2571/CSP2019_paper_18.pdf.
37.
RamannaSPetersJFSengozC. Application of tolerance rough sets in structured and unstructured text categorization: a survey. In: Thriving Rough Sets. Springer; 2017. pp. 119-138.
38.
SrinivasanPRuizMEKraftDHChenJ. Vocabulary mining for information retrieval: Rough sets and fuzzy sets. Information Processing & Management. 2001; 37(1): 15-38.
39.
CornelisCDe CockMRadzikowskaAM. Fuzzy Rough Sets: From Theory into Practice. In: Handbook of Granular Computing. John Wiley & Sons, Ltd; 2008. pp. 533-552.
40.
BeheraBKumaravelanG. Text document classification using fuzzy rough set based on Robust nearest neighbor. Soft Computing. 2021 08; 25.
41.
MitchellTCohenWHruschkaETalukdarPBetteridgeJCarlsonA, et al. Never-ending learning. Communications of the ACM. 2018; 61(5): 103-115.
42.
DongXLGabrilovichEHeitzGHornWLaoNMurphyK, et al. Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion. In: The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, New York, NY, USA; 2014. pp. 601-610.
43.
CurranJRMurphyTScholzB. Minimising semantic drift with mutual exclusion bootstrapping. In: Proc. of PACLING; 2007. pp. 172-180.
44.
SengozC. A Granular-based Approach for Semi-supervised Web Information Labeling. University of Winnipeg; 2014. Supervisor: S.Ramanna.
45.
BharadwajA. Fuzzy Rough Set Based Unstructured Text Categorization. University of Winnipeg; 2017. Supervisor: S.Ramanna.
46.
MoghaddamHR. Exploring Scalability and Concept drift issues in Learning Categorical facts with Tolerance Rough Sets. University of Winnipeg; 2019. Supervisor, S.Ramanna.
47.
CarlsonABetteridgeJWangRCHruschkaER JrMitchellTM. Coupled semi-supervised learning for information extraction. In: Proceedings of the Third ACM International Conference on Web Search and Data Mining; 2010. pp. 101-110.
48.
VermaSHruschkaER Jr. Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction. In: ECML PKDD Part II LNCS 7524; 2012. pp. 307-322.
49.
MoghaddamHRamannaS. Harvesting patterns from textual web sources with tolerance rough sets. Cell Press, Elsevier. 2020; 1(4): 100053.
50.
PatelV. Short Text Classification with Tolerance Near Sets. University of Winnipeg; 2021. University of Winnipeg, Supervisor: S.Ramanna.
51.
DevlinJChangMWLeeKToutanovaK. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805. 2018.
52.
ReimersNGurevychI. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics; 2019. Available from: https://arxiv.org/abs/1908.10084.
SengozCRamannaS. A Semi-supervised Learning Algorithm for Web Information Extraction with Tolerance Rough Sets. In: AMT 2014, LNCS 8610; 2014. pp. 1-10.
55.
SørensenT. A Method of Establishing Groups of Equal Amplitude in Plant Sociology Based on Similarity of Species Content and Its Application to Analyses of the Vegetation on Danish Commons. Biologiske skrifter. I kommission hos E. Munksgaard; 1948.
56.
SeeratpalJRamannaS. Named entity recognition on CORD-19 bio-medical dataset with tolerance rough sets. Transactions on Rough Sets, Springer. 2022; XXIII(13610): 1-10. To appear.
57.
JensenRShenQ. Computational intelligence and feature selection: rough and fuzzy approaches. vol. 8. John Wiley & Sons; 2008.
58.
RadzikowskaAMKerreEE. A comparative study of fuzzy rough sets. Fuzzy Sets and Systems. 2002; 126: 137-156.
59.
BharadwajARamannaS. Categorizing relational facts from the web with fuzzy rough sets. Knowledge and Information Systems. 2019; 61(3): 1695-1713. doi: 10.1007/s10844-018-0505-8.
60.
WolskiM. Perception and classification. A Note on Near sets and Rough sets. Fundamenta Informatica. 2010; 101: 143-155.
61.
PatelVRamannaSKotechaKWalambeR. Short text classification with tolerance-based soft computing method. Algorithms. 2022; 15(8). Available from: https://www.mdpi.com/1999-4893/15/8/267.
ChenELermanKFerraraE. Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set. JMIR Public Health and Surveillance. 2020;6(2):e19273.
64.
LangK. Newsweeder: Learning to filter netnews. In: Machine Learning Proceedings 1995. Elsevier; 1995. pp. 331-339.
65.
RaneAKumarA. Sentiment classification system of Twitter data for US airline service analysis. In: 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC). vol. 1. IEEE; 2018. pp. 769-773.
66.
MaasADalyREPhamPTHuangDNgAYPottsC. Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies; 2011. pp. 142-150.
67.
SocherRBauerJManningCDNgAY. Parsing with compositional vector grammars. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); 2013. pp. 455-465.
68.
RosenthalSFarraNNakovP. SemEval-2017 task 4: Sentiment analysis in Twitter. In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017); 2017. pp. 502-518.
69.
KotziasDDenilMDe FreitasNSmythP. From group to individual labels using deep features. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2015. pp. 597-606.
70.
ZhangXZhaoJLeCunY. Character-level convolutional networks for text classification. arXiv preprint arXiv:150901626. 2015.
71.
SengozCRamannaS. Learning relational facts from the web: A tolerance rough set approach. Pattern Recognition Letters. 2015; 67(P2): 130-137.
72.
BreimanL. Random forests. Machine Learning. 2001 10; 45: 5-32.
73.
ChangCCLinCJ. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology. 2011 4; 2: 1-27.
74.
BottouLBousquetO. The Tradeoffs of Large Scale Learning. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. vol. 20; 2007. pp. 1-9.
75.
TianqiCCarlosG. XGBoost. In: Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems. ACM; 2016. pp. 785-794. Available from: doi: 10.1145/2F2939672.2939785.