Abstract
Argumentation in academic writing is necessary to communicate students' ideas clearly. The relations between argumentative components are essential, since they show whether the presented ideas support or contrast with one another. In this paper, we present two approaches to identifying relations between pairs of components. In the first, we initially detect which components are related and then classify the related pairs as support or attack relations. In the second, we directly identify which components have a support relation. For both approaches, we employed machine learning techniques with lexical, syntactic, semantic, structural and indicator features. Experiments on argumentative sections of academic theses showed that the models achieve encouraging results on the task and reveal the argumentative structures prevailing in student writing.
Keywords
Introduction
Argumentation in academic writing is necessary to communicate students' ideas clearly, which makes it a critical skill. This ability consists of formulating an argumentation structure that provides numerous arguments in support of each claim presented in an academic text. An argument is a set of statements (i.e. premises) that, individually or collectively, bear on a claim (the conclusion); most premises provide support, although some may oppose the claim. The identification of these relations (support or attack) is an essential part of argument analysis. This task belongs to argument mining, which aims to identify argumentative relations such as support and attack between arguments in natural language by classifying pairs of text segments as attack, support or neither [15]. The automatic analysis of arguments in theses has become a necessity in the academic field, to facilitate the analysis of long texts and shorten revision times. In previous studies, we approached the tasks of argumentative component classification [8] and assessment at the paragraph level [9]. In this paper, we focus on a fine-grained analysis at the level of argumentative relations.
In recent years, researchers have studied the automatic processing of arguments in a variety of fields. In Law, for instance, the aim is to facilitate access to jurisprudence that supports a case [19, 31]. In political discourse, automatic relation prediction has been studied to classify pairs of arguments as support or attack relations [17]. In scientific articles, particularly in biomedicine, automatic argument processing can be applied to identify arguments for or against a hypothesis under investigation more rapidly [11]. In social networks, argument analysis is also employed to identify the overall stance (for or against) of comments in a debate [2]. There are also studies aimed at identifying support relations between components in persuasive essays [25]. Still, studies of longer and more complex academic works such as theses remain scarce, in particular for Spanish.
In this paper, we present models for argumentative relation identification using machine learning techniques with lexical, syntactic, semantic, structural and indicator features. We propose two approaches to identifying the argumentative relation between a pair of components. In the first, a model initially detects which components are related, that is, whether the pair has an attack or support relation; a second model then classifies the type of relation. In the second approach, we directly identify which components have a support relation, since this type prevails in students' arguments. To evaluate our models, we created a corpus of thesis sections (problem statement, justification, conclusions) annotated with argumentative components and relations.
The paper is structured as follows. In Section 2, we discuss related work on argument relation identification. In Section 3, we present the theoretical background on argumentation structures. Section 4 details the corpus used in the experiments. The proposed features and learning approach are described in Section 5. In Section 6, we report the effectiveness of our models for argument relation identification. Finally, we conclude with final remarks and work in progress.
Related work
Several researchers have recently addressed the task of argument relation identification. Once the types of argumentative components and their relations are identified, the argumentative structure can be elucidated. In this section, we present previous related research.
In the work of [19, 20] on legal texts, a Context-Free Grammar (CFG) is used to obtain a tree expressing the argument structure. The CFGs are employed not only to identify relations but also to classify premises and conclusions in legal texts at the sentence level, as detailed in [19].
Using machine learning to identify relations between components, the authors of [25] tackled the identification of pairs of components with a support relation in persuasive essays. To create the training set, all possible pairs of components were assembled and each pair was labeled as support or non-support, yielding 989 (15.6%) support pairs and 5,341 (84.4%) non-support pairs. The attributes used to represent the pairs were of four types: structural, including the number of tokens and the number of punctuation marks; lexical, such as word pairs, first words and modal words; syntactic, expressed in terms of production rules; and indicators, such as argumentative markers encoded as binary presence features, together with the predicted type of each component (premise, conclusion or major conclusion). Using a support vector machine, the authors reported an accuracy of 86.3% and an F1 measure of 0.519 for identifying pairs of components with a support relation. In [23], attack and support relations between argument components were identified with the same features; the authors report an F measure of 0.388 using a maximum entropy classifier implemented in MALLET (Machine Learning for Language Toolkit).
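To make the structural attributes of [25] concrete, the following is a minimal sketch of token and punctuation counts for a (source, target) pair. The regular-expression tokenizer, the difference features and the function name `structural_features` are our own assumptions, not the feature extractor of [25].

```python
import re

# Assumed tokenizer: runs of word characters, or single non-space symbols.
WORD_OR_PUNCT = re.compile(r"\w+|[^\w\s]")
# Any single character that is neither a word character nor whitespace
# (this also covers Spanish marks such as "¿" and "¡").
PUNCT = re.compile(r"[^\w\s]")


def structural_features(source: str, target: str) -> dict:
    """Token and punctuation counts for a (source, target) component pair."""
    feats = {}
    for side, text in (("src", source), ("tgt", target)):
        feats[f"{side}_n_tokens"] = len(WORD_OR_PUNCT.findall(text))
        feats[f"{side}_n_punct"] = len(PUNCT.findall(text))
    # Differences describe how the two components compare in length.
    feats["diff_n_tokens"] = feats["src_n_tokens"] - feats["tgt_n_tokens"]
    feats["diff_n_punct"] = feats["src_n_punct"] - feats["tgt_n_punct"]
    return feats


print(structural_features("Schools have more computers.",
                          "Therefore, students can practice more."))
```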
A similar approach is followed in [4], which also applies machine learning with a set of features to represent pairs of sentences. The authors used Bag of Words and attributes that relate two sentences, such as WordNet-based similarity, as well as distance and textual entailment measures. For sentiment features, the authors considered word lists and discourse markers (for instance, from the SentiWordNet tool). Employing a Random Forest classifier, the authors classified the relation between two sentences as support, attack or none, and reported an accuracy of 77.5% on a corpus of 854 pairs of social media messages. However, the authors do not report an inter-annotator agreement level, as the annotation study was still ongoing.
In [14], a method was proposed that uses the semantic similarity between sequential propositions of the same paragraph to determine whether they are connected (support/conflict). To obtain the similarity between two propositions, WordNet was used to compute the distance between each word of the first proposition and each word of the second. This similarity is inversely proportional to the number of nodes along the shortest path between the synsets of those words. The method reached a precision of 0.82, a recall of 0.56 and an F1 measure of 0.67 for identifying a connection between two propositions. The data set is based on the Araucaria corpus [5], from which only 78 complete arguments, with a total of 404 propositions, were extracted. The argumentation schemes considered were expert opinion and positive consequence.
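The path-based similarity used in [14] can be sketched on a toy taxonomy. The hypernym table below is hypothetical and stands in for WordNet synsets; similarity is 1 divided by the number of nodes on the shortest path, which equals 1/(edges + 1). How [14] aggregates word-level scores into a proposition score is not described here, so the best-match average in `proposition_similarity` is an assumption.

```python
from collections import deque

# Hypothetical hypernym table (child -> parent) standing in for WordNet synsets.
HYPERNYM = {
    "computer": "machine", "machine": "device", "device": "artifact",
    "artifact": "entity", "school": "institution",
    "institution": "organization", "organization": "group", "group": "entity",
}


def neighbors(node):
    """Undirected view of the taxonomy: the parent plus all children."""
    if node in HYPERNYM:
        yield HYPERNYM[node]
    for child, parent in HYPERNYM.items():
        if parent == node:
            yield child


def path_similarity(a, b):
    """1 / (nodes on the shortest path) = 1 / (edges + 1); 0.0 if unconnected."""
    seen = {a}
    queue = deque([(a, 0)])
    while queue:  # breadth-first search finds the shortest path first
        node, dist = queue.popleft()
        if node == b:
            return 1.0 / (dist + 1)
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return 0.0


def proposition_similarity(words1, words2):
    """Assumed aggregation: mean over words1 of the best match in words2."""
    best = [max(path_similarity(w1, w2) for w2 in words2) for w1 in words1]
    return sum(best) / len(best)


print(proposition_similarity(["computer", "school"], ["machine", "institution"]))  # prints 0.5
```

The reported scores of [14] are internally consistent: 2 · 0.82 · 0.56 / (0.82 + 0.56) ≈ 0.67.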
Another approach to identifying relations between arguments is textual entailment. In [2], formal texts (debates) from Debatepedia were analyzed: 100 pairs were selected to train the textual entailment tool EDITS and another 100 pairs were extracted for testing. The entailment between two sentences was measured; if an entailment was found, the pair was classified as support, otherwise it was labeled as attack. The authors reported an accuracy of 75% for this classification task.
The prediction of relations in monological political discourse, where there is no direct interaction between opponents, is analyzed in [17]. A corpus was created from the transcriptions of speeches and official statements issued by US Presidents Nixon and Kennedy during the 1960 presidential campaign. The pairs of arguments were annotated as support or attack relations. The analysis proceeds in two steps. The first step uses as features the lexical overlap between fragments, the position of the mention of the subject in the fragment, and the cosine similarity between fragments. The second step employs features such as lexical overlap, the number of negation words, thematic keywords represented with word2vec embeddings [18], textual entailment between fragments, and a sentiment score. Both steps apply an SVM classifier with a radial basis function kernel from the LIBSVM implementation. In step one, the authors achieved an F1 measure of 0.65 for classifying pairs as related or unrelated, and in step two an F1 measure of 0.77 for classifying attack and support. The authors indicate that performing the classification in sequence (steps one and two) gave better results than a single multi-class classification (i.e. with the three classes support, attack and unrelated).
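As an illustration of the first-step features of [17], the sketch below computes a lexical overlap and a cosine similarity between two fragments. The exact definitions used in [17] are not given here; Jaccard overlap over word sets and cosine over raw term frequencies are our assumptions.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased word tokens (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def lexical_overlap(a, b):
    """Jaccard overlap between the word sets of two fragments."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Cosine similarity between raw term-frequency vectors."""
    tf_a, tf_b = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(count * tf_b[word] for word, count in tf_a.items())
    norm = (math.sqrt(sum(c * c for c in tf_a.values()))
            * math.sqrt(sum(c * c for c in tf_b.values())))
    return dot / norm if norm else 0.0


a, b = "The tax cut helps families", "The tax cut hurts families"
print(lexical_overlap(a, b), cosine_similarity(a, b))
```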
A deep learning architecture is employed in [6] to identify support, attack or no relation in texts about movies, technology and politics. The corpus contains 31% attack relations, 32% support relations and 37% unrelated pairs. The proposed architecture receives as input the words of the two text segments, encoded with GloVe [22], which are then processed by two parallel LSTM recurrent networks that generate a separate representation for each text. The representations obtained by the LSTM networks are concatenated and passed to a softmax classifier. With this model, the authors report an F1 measure of 0.89 for identifying support, attack or unrelated relations.
Argumentative structures
An argument consists of several components: a set of assertions that, individually or as a whole, support another statement [3]. The supported assertion is the conclusion (claim), which is the principal component of an argument. Each argument has only one conclusion; however, each conclusion can be based on a series of supporting or opposing assertions. Assertions that provide support (or attack) are called premises. A premise is a reason provided by the writer to convince the reader of the conclusion. These components are linked by support or attack relations that model the structure of the argument. Among argumentation theories [7, 29], there is consensus that the structure of an argument consists of several argumentative components. In this work, we adhere to the conclusion-premise model presented by Freeman [7], since it introduces structures employed in argument mining [12].
A graphical representation of an argument structure facilitates understanding how argument components are associated. Argument diagramming is the area that supports students in formulating their arguments. Each premise and conclusion is associated with a label that is then expressed as a node of a graph. Directed edges (arrows) are then established between nodes to indicate the relations between components. A simple argument has only one premise that supports one conclusion [28], as in example (1).
(1) [Today educational institutions have a greater number of computers with Internet.]/P1 [
As we can observe in example (1), the first sentence is a premise (in square brackets, labeled /P1) that supports the conclusion in the second sentence (in square brackets, labeled /C2). In a simple argument, the premise provides elements to sustain the veracity of the associated conclusion. Figure 1 illustrates a simple argument structure in which the premise P1 supports the conclusion C2 (on the right side).

Figure 1. A simple argument structure as a graph: a premise with a support relation to a conclusion.
As the example shows, the word “therefore” plays an essential role in identifying a possible conclusion. Such words and expressions are called argumentative markers, and they can help detect the elements of an argument.
As mentioned previously, an argument can have more than one premise as support. Figure 2 depicts four types of argument structures. A convergent structure (a) has several premises, each supporting the conclusion independently; in this case, either of the two premises can be eliminated and the conclusion is still supported [28]. A linked structure (b) has two premises that jointly provide support for a conclusion. A divergent structure (c) has a single premise that supports several conclusions. A serial structure (d) has arguments deployed in succession: the conclusion of one argument acts as a premise for another [7]. These simple structures can be combined to form more complex graphs.
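The structures of Figure 2 can be read off a directed graph of relations. The sketch below, with a hypothetical `describe_structure` function, flags convergent, divergent and serial patterns from node degrees. A linked structure produces the same edges as a convergent one, so telling them apart would need extra annotation (e.g. which premises act jointly) that this sketch does not model.

```python
from collections import defaultdict


def describe_structure(edges):
    """Flag structure patterns in a directed graph of support relations.

    edges: list of (source, target) pairs such as ("P1", "C2").
    A linked structure has the same edges as a convergent one (several
    premises pointing to one conclusion), so the two are reported together.
    """
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for src, tgt in edges:
        out_deg[src] += 1
        in_deg[tgt] += 1
    nodes = set(out_deg) | set(in_deg)
    return {
        # Several components point to the same target.
        "convergent_or_linked": any(in_deg[n] > 1 for n in nodes),
        # One component points to several targets.
        "divergent": any(out_deg[n] > 1 for n in nodes),
        # Some component is supported and also supports another one.
        "serial": any(in_deg[n] > 0 and out_deg[n] > 0 for n in nodes),
    }


print(describe_structure([("P1", "C2")]))               # simple argument
print(describe_structure([("P1", "C3"), ("P2", "C3")]))  # convergent (or linked)
print(describe_structure([("P1", "P2"), ("P2", "C3")]))  # serial
```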

Figure 2. Types of argument structures.
The corpus created to identify argumentative aspects in undergraduate and graduate academic writing comes from the Coltypi collection [10]. This collection contains 468 theses and research proposals in the area of information technology and computing, written in Spanish. The texts come from the undergraduate (college and higher technical studies) and graduate (master's and doctoral) levels. In particular, our study focuses on the Problem statement, Justification and Conclusions sections, since these are considered highly argumentative [16].
The corpus consists of 444 sections, each annotated at least twice following an annotation guide. The annotation team consisted of seven annotators from fields related to linguistics, who worked on different subsets of the corpus. The results of the annotation process are as follows. The agreement analysis for the argumentative annotations considered all possible combinations of pairs of argumentative components per paragraph; therefore, only paragraphs with at least one conclusion and one premise were taken into account. We also considered the direction of each relation, from a source component to a target component; for example, a premise supporting a conclusion and a conclusion supporting a premise count as different pairs. We found 663 paragraphs fulfilling these criteria, with a total of 6,370 pairs labeled as support, attack or unrelated. The agreement for the annotation of support, attack and unrelated relations was a kappa of 0.569, corresponding to a “moderate” level [13]. The agreement for the support relation (with support vs. without support) was a kappa of 0.574, and for the attack relation (with attack vs. without attack) a kappa of 0.538. Grouping attack and support relations (i.e. with relation vs. without relation), we obtained a kappa of 0.568. Table 1 presents the agreement levels for the annotation of relations between components; for all configurations, the agreement corresponds to a “moderate” level.
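For reference, Cohen's kappa compares the observed agreement p_o with the agreement expected by chance p_e, κ = (p_o - p_e) / (1 - p_e). The following minimal sketch computes it for a three-class labeling and for a binary collapse such as "with support vs. without support", and maps the value to the Landis and Koch bands [13]. The six labeled pairs are invented for illustration and are not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, from each annotator's label distribution.
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_expected == 1:
        return 1.0  # both annotators used one and the same label throughout
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch(kappa):
    """Qualitative agreement band following Landis and Koch [13]."""
    if kappa < 0:
        return "poor"
    for upper, band in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                        (0.80, "substantial")):
        if kappa <= upper:
            return band
    return "almost perfect"


def collapse(labels, positive):
    """Binary view used for per-relation agreement, e.g. support vs. rest."""
    return [lab == positive for lab in labels]


# Toy labels for six directed pairs (invented, not from the corpus).
ann1 = ["support", "unrelated", "unrelated", "attack", "support", "unrelated"]
ann2 = ["support", "unrelated", "support", "unrelated", "support", "unrelated"]
k3 = cohen_kappa(ann1, ann2)
k_sup = cohen_kappa(collapse(ann1, "support"), collapse(ann2, "support"))
print(round(k3, 3), landis_koch(k3), round(k_sup, 3), landis_koch(k_sup))
# prints: 0.429 moderate 0.667 substantial
```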
Table 1. Agreement level of annotators for argumentative relations
Only relations on which both annotators agreed were selected to create the final corpus used in the experiments. A total of 5,531 pairs were obtained, of which 15.5% are support relations, 1.4% attack relations and 83.1% unrelated pairs. The final corpus thus contains 854 support relations and 79 attack relations, so most relations are of the support type. Support relations are established mainly from a premise to a conclusion, with 788 relations of this kind; in addition, we found 47 support relations between premises (P→P) and 19 between conclusions (C→C). Attack relations are also established mainly from a premise to a conclusion, with a total of 39. There are also 36 attack relations between premises (P→P) and, to a lesser extent, three attack relations from a conclusion to a premise (C→P) and only one attack relation between conclusions (C→C).
We frame the identification of argumentative relations between components (one component as source, another as target) as the classification of text pairs as support, attack or unrelated. We explored two approaches to the task. The first begins with a binary classification that determines whether a pair of components is related or unrelated; afterwards, the related pairs are classified as attack or support, focusing mainly on the identification of attack relations, since they are the scarcest. The second approach directly classifies support relations, since they represent almost all relations in our corpus and are therefore the most relevant for the structure of arguments in academic texts. The instances comprise all possible pairs of argumentative components within each paragraph. The features representing the instances are lexical, syntactic, semantic, structural, indicators, component prediction, lexical similarity and semantic similarity, and they are extracted before applying the machine learning algorithms.
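Instance generation can be sketched as follows: every ordered pair of components in a paragraph becomes a candidate, labeled with its annotated relation or "unrelated". Treating pairs as ordered follows the directed agreement analysis described for the corpus and is an assumption for the experiments; `candidate_pairs` and `binarize` are hypothetical helper names. `binarize` derives the label sets of the two approaches, related vs. unrelated or support vs. the rest.

```python
from itertools import permutations


def candidate_pairs(components, relations):
    """(source, target, label) instances for one paragraph.

    components: component ids in the paragraph, e.g. ["P1", "P2", "C3"].
    relations: dict (source, target) -> "support" | "attack"; every other
    ordered pair is labeled "unrelated".
    """
    return [(src, tgt, relations.get((src, tgt), "unrelated"))
            for src, tgt in permutations(components, 2)]


def binarize(instances, positive):
    """Binary labels for one approach, e.g. positive={"support", "attack"}."""
    return [(src, tgt, label in positive) for src, tgt, label in instances]


pairs = candidate_pairs(["P1", "P2", "C3"],
                        {("P1", "C3"): "support", ("P2", "C3"): "attack"})
related = binarize(pairs, {"support", "attack"})   # first approach, step one
with_support = binarize(pairs, {"support"})         # second approach
print(len(pairs), sum(flag for *_, flag in related), sum(flag for *_, flag in with_support))
```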
Experimental analysis
In this section, we analyze argumentative relations for each type of feature, that is, we identify whether a pair of argumentative components (one component as source and another as target) is related. Our first approach starts with the detection of related pairs, presented in Section 6.1, followed by the classification of attack and support relations, which focuses mainly on the identification of the scarcer attack relations and required balancing the classes; this model is detailed in Section 6.2. The second approach is a model to identify support relations, since they are the most numerous in academic texts; it is explained in Section 6.3.
Analysis of the existence of argumentative relations
We performed experiments to detect the presence of argumentative relations; that is, given two components, we determine whether they have some type of relation (attack or support) or are unrelated. The experiments used 460 paragraphs with at least one conclusion and one premise, both with almost exactly matching text boundaries. We employed 2,402 instances: 519 related (attack or support) and 1,883 unrelated. We performed stratified 10-fold cross-validation with several classifiers, namely Support Vector Machine (SVM), Naive Bayes, Decision Tree (DT), Random Forest and Logistic Regression (LR) [6], implemented in Scikit-learn [21]. Since our classes are unbalanced, Table 2 reports the best macro-F measure for each representation as the evaluation metric.
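Stratified cross-validation keeps the 519/1,883 class ratio roughly constant across folds. The experiments rely on Scikit-learn for this (its `StratifiedKFold` class provides it); the standard-library sketch below shows one simple way to do it, shuffling each class and dealing its indices round-robin over the folds. The function names and the seed are illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split instance indices into k folds that preserve class proportions.

    Each class is shuffled and its indices are dealt round-robin over the
    folds, continuing where the previous class stopped so fold sizes stay even.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds


def train_test_splits(folds):
    """Yield (train, test) index lists, each fold serving once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Class sizes of the relation-detection experiment: 519 related, 1,883 unrelated.
labels = ["related"] * 519 + ["unrelated"] * 1883
for train, test in train_test_splits(stratified_folds(labels)):
    n_related = sum(labels[i] == "related" for i in test)
    print(len(train), len(test), n_related)
```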
Table 2. Results of argumentative relation detection for each feature type
The component prediction representation (Comp) obtained the best result, with a macro-F measure of 0.6763; it was also the best at identifying instances of the class “with relation”, reaching an F measure of 0.489. We assume that relations are possible when the analyzed text contains a premise and a conclusion; therefore, the prediction of component types provides crucial information to the model. The second-best result was obtained with the lexical representation (Lex), which reached the best accuracy (80.02%), a macro-F measure of 0.648, and the best effectiveness for the class “without relation”, with an F measure of 0.879. The structural representation (Str) reaches an F measure of 0.608 and identifies the class “without relation” better than the class “with relation”; we attribute this difference to the class imbalance, since the class “without relation” has many more instances. The least informative features are lexical similarity (Lexsim) and semantic similarity (Semsim), both with an F measure of 0.439. Neither identified any instance of the class “with relation”, which indicates that the similarity between components does not provide information to identify a relation.
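The 0.439 macro-F of the similarity features is exactly what a classifier that never predicts “with relation” obtains on these class sizes: the “without relation” class then has F = 2 · 1883 / (2402 + 1883) ≈ 0.879 and the other class F = 0, so the macro average is about 0.439. The sketch below reproduces this arithmetic with a minimal macro-F1 implementation; it checks the figures and is not the authors' evaluation code.

```python
def f1_per_class(gold, pred, label):
    """F1 of one class, treating it as the positive class."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1, so the minority class counts equally."""
    labels = sorted(set(gold) | set(pred))
    return sum(f1_per_class(gold, pred, lab) for lab in labels) / len(labels)


gold = ["related"] * 519 + ["unrelated"] * 1883
pred = ["unrelated"] * len(gold)  # never predicts "related"
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(round(accuracy, 4), round(macro_f1(gold, pred), 4))  # prints: 0.7839 0.4394
```

The high accuracy of this degenerate classifier, next to its low macro-F, illustrates why macro-F is the more informative metric for these unbalanced classes.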
We then analyzed the representation with all features, as well as the impact of omitting one feature type at a time. Table 3 shows the classification results for these combinations of features in terms of F measure. The table is ordered by F measure from lowest to highest; the best values are indicated in bold and the lowest values are underlined. The representation with all features obtains an F measure of 0.7527 with the Logistic Regression algorithm. Comparing this value against the configurations involving the semantic, syntactic, lexical similarity and semantic similarity features, we observed that these features decreased the effectiveness of the model, with macro-F values below 0.7522. Based on these results, the four feature types that contribute information to the model were selected to build an improved representation; they correspond to those in the upper part of Table 3 (Str + Lex + Comp + Ind).
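The leave-one-out ablation behind Table 3 can be sketched generically. `evaluate` is a hypothetical callable (for instance, cross-validated training restricted to the given feature groups); the abbreviations Syn and Sem are ours, while the others follow the paper. The toy scorer at the end only demonstrates the mechanics and says nothing about the actual results.

```python
FEATURE_GROUPS = ("Lex", "Syn", "Sem", "Str", "Ind", "Comp", "Lexsim", "Semsim")


def ablation(evaluate, groups=FEATURE_GROUPS):
    """Score the full feature set and every 'all minus one group' variant.

    evaluate: hypothetical callable mapping a tuple of group names to a
    macro-F score. Results are sorted from lowest to highest score.
    """
    results = [("All", evaluate(tuple(groups)))]
    for g in groups:
        kept = tuple(x for x in groups if x != g)
        results.append((f"All - {g}", evaluate(kept)))
    return sorted(results, key=lambda item: item[1])


def helpful_groups(results):
    """Groups whose removal drops the score below that of the full set."""
    full = dict(results)["All"]
    return [name[len("All - "):] for name, score in results
            if name != "All" and score < full]


# Toy scorer for illustration only: four helpful groups, four harmful ones.
WEIGHTS = {"Lex": 0.1, "Str": 0.1, "Comp": 0.1, "Ind": 0.1,
           "Syn": -0.01, "Sem": -0.01, "Lexsim": -0.01, "Semsim": -0.01}
print(helpful_groups(ablation(lambda gs: sum(WEIGHTS[g] for g in gs))))
```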
Table 3. Impact analysis of features in the detection of argumentative relations
Table 4 shows the effectiveness of the improved representation, which combines the lexical, structural, indicator and component prediction features, with an SVM classifier. The representation obtains an F measure of 0.7716 and a Cohen's kappa of 0.5439, which corresponds to a “moderate” level.
Table 4. Best representation for the detection of argumentative relations
The structures generated by the best model for identifying related components were mostly simple. Of the 460 paragraphs processed, some relation was identified in only 319. Simple relations were detected in 228 paragraphs; in most of them (121), the source component is a premise and the target component is a conclusion. In addition, 49 simple relations were observed from one premise to another premise, as well as 27 relations between conclusions.
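Counting predicted relations by component-type configuration, as in the premise-to-conclusion, premise-to-premise and conclusion-to-conclusion figures above, only requires the predicted pairs and the component types. A minimal sketch, with a hypothetical function name and invented inputs:

```python
from collections import Counter


def tally_configurations(predicted_pairs, component_type):
    """Count predicted relations by source-type -> target-type configuration.

    predicted_pairs: iterable of (source_id, target_id) predicted as related.
    component_type: dict mapping a component id to "P" (premise) or "C" (conclusion).
    """
    return Counter(f"{component_type[src]}->{component_type[tgt]}"
                   for src, tgt in predicted_pairs)


types = {"P1": "P", "P2": "P", "C3": "C"}
print(tally_configurations([("P1", "C3"), ("P2", "C3"), ("P1", "P2")], types))
```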
Figure 3 depicts the most frequent argument structures. The convergent structures (a) and (b) relate several premises to a conclusion; we observed seven structures of type (a) and three of type (b). The convergent structure is the one most employed by students, since it provides more support to the connected conclusion. In addition, some paragraphs have structures composed of several simple relations, such as type (c), with five structures identified, and type (d), with seven. Finally, we observed serial structures in which a chain of linked components ends in a conclusion, such as types (e) and (f). The most common serial structures are of type (f), in which a succession of two premises supports a conclusion.

Figure 3. Structures of arguments identified in the corpus.
We also explored a model to classify attack and support relations, in particular to find a way to differentiate attack relations, of which we have only 49 instances. To obtain a balanced set, the experiment used the 49 attack instances and 49 support instances. We performed 5-fold cross-validation with Support Vector Machine (SVM), Naive Bayes, Decision Tree (DT), Random Forest and Logistic Regression (LR) [6] classifiers, again from Scikit-learn [21]. The features extracted to classify argumentative relations were detailed in Section 5.
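The balanced attack/support set can be obtained by random undersampling. The text does not state how the 49 support instances were chosen, so random selection with a fixed seed is an assumption in this sketch, and `undersample` is a hypothetical name.

```python
import random


def undersample(instances, label_of, seed=0):
    """Randomly reduce every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_label = {}
    for inst in instances:
        by_label.setdefault(label_of(inst), []).append(inst)
    n_min = min(len(group) for group in by_label.values())
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n_min))
    rng.shuffle(balanced)
    return balanced


# Placeholder pairs: 49 attack and 470 support relations.
pairs = [("attack", i) for i in range(49)] + [("support", i) for i in range(470)]
balanced = undersample(pairs, label_of=lambda pair: pair[0])
print(len(balanced))  # prints 98
```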
Table 5 shows the best macro-F measure for each individual representation. The indicators feature with the LR classifier is the best model to classify attack and support relations, obtaining a macro-F measure of 0.867 and an accuracy of 86.73%. We suppose that the indicators feature achieves the best result because it contains the justification and refutation categories, which provide information to differentiate attack relations. On the other hand, the least effective feature is semantic similarity, which could indicate that lexical information is needed for the task.
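Indicator features of this kind can be encoded as binary presence flags per marker category. The justification and refutation categories come from the text; the Spanish marker lists, the extra "conclusion" category and the function name are hypothetical placeholders, not the authors' lexicon.

```python
import re

# Hypothetical marker lexicons; the authors' actual word lists, and any
# categories beyond justification and refutation, are not given in the text.
INDICATORS = {
    "justification": ["porque", "ya que", "debido a", "puesto que"],
    "refutation": ["sin embargo", "pero", "aunque", "no obstante"],
    "conclusion": ["por lo tanto", "por consiguiente", "en conclusión"],
}


def indicator_features(source, target):
    """Binary presence of each marker category in the source and the target."""
    feats = {}
    for side, text in (("src", source.lower()), ("tgt", target.lower())):
        for category, markers in INDICATORS.items():
            # Word boundaries avoid matching "pero" inside "espero".
            found = any(re.search(r"\b" + re.escape(m) + r"\b", text) for m in markers)
            feats[f"{side}_{category}"] = int(found)
    return feats


print(indicator_features("Sin embargo, los costos aumentan.",
                         "Por lo tanto, el sistema es viable."))
```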
Table 5. Classification results for support/attack relations
We also analyzed the impact of the features on the classification of support and attack relations. Table 6 presents the models in which a single feature type is omitted from the complete set, with the resulting F measures. The lowest performance was obtained when omitting the syntactic, lexical similarity and structural features, with macro-F1 values of 0.7647, 0.7938 and 0.8059, respectively, which indicates that these features contribute information to the model. The semantic feature again affects the model negatively, since omitting it improves effectiveness. This analysis confirms that the best model uses the indicators feature with an LR classifier, achieving a macro-F measure of 0.8673.
Table 6. Impact of features on the support/attack relation model
The model using indicators features with the LR classifier produced 82 simple relations. We identified 21 support relations between a premise and a conclusion; support relations were also found between premises (8 relations) and between conclusions (9 relations). Moreover, 18 attack relations were found from one premise to another premise, and 13 from a premise to a conclusion.
Based on these results, the model built with argumentative indicators and an LR classifier is the best option for identifying attack or support relations between two components.
We also proposed a model to detect support relations: since most relations used by students aim to reinforce their statements, detecting these relations makes it possible to offer more direct feedback to the student. The experiment used the 470 instances of support relations; for the ‘without support’ class, the 49 attack relations were combined with 421 randomly chosen pairs of unrelated components to obtain balanced sets. Ten-fold cross-validation was performed with the following classification algorithms from Scikit-learn: SVM, Naive Bayes, DT, Random Forest and LR. We extracted features of argumentative relations such as lexical, syntactic, semantic, structural, indicators, lexical similarity and semantic similarity.
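The composition of the balanced support data set (470 support pairs against 49 attack plus 421 randomly chosen unrelated pairs) can be sketched as follows; the seed, the placeholder ids and the function name are illustrative assumptions.

```python
import random


def support_dataset(support, attack, unrelated, seed=0):
    """Balanced 'with support' (1) vs. 'without support' (0) instances.

    The negative class contains every attack pair, topped up with randomly
    chosen unrelated pairs until it matches the number of support pairs.
    """
    n_missing = len(support) - len(attack)
    if n_missing < 0 or n_missing > len(unrelated):
        raise ValueError("cannot balance the classes with these inputs")
    rng = random.Random(seed)
    negatives = list(attack) + rng.sample(list(unrelated), n_missing)
    data = [(x, 1) for x in support] + [(x, 0) for x in negatives]
    rng.shuffle(data)
    return data


# Placeholder ids with the class sizes reported in the text.
data = support_dataset(support=[f"s{i}" for i in range(470)],
                       attack=[f"a{i}" for i in range(49)],
                       unrelated=[f"u{i}" for i in range(1883)])
print(len(data), sum(label for _, label in data))  # prints 940 470
```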
Table 7 shows the results of the models with individual features, indicating the classification algorithm with the best macro-F measure. The lexical feature obtains the best effectiveness, with a macro-F of 0.70 and an accuracy of 70.9%. The structural feature performs best for the class “w/ support”, and the component type prediction performs best for the class “w/o support”. These results suggest that a combination of features such as lexical, component type prediction and structural could achieve better effectiveness. The features that contribute least are lexical similarity and semantic similarity.
Table 7. Detection results of support relations for each feature type
To analyze the contribution of each feature type to the model, we evaluated models that omit only one of them. As Table 8 shows, the representation with all features obtains a macro-F of 0.7351 with the SVM classifier. Omitting the structural feature lowers the macro-F to 0.7053, which indicates the importance of this feature for the task. Similarly, omitting the syntactic, indicators, lexical or component prediction features also lowers performance, which indicates that these features provide information to the model. On the other hand, the features that affect the model negatively are the semantic feature and, once again, lexical similarity and semantic similarity.
Table 8. Impact analysis in the detection of support relations
The best configuration for identifying support relations combined the lexical, component prediction, structural and indicators features. Table 9 shows the results of this representation (Lex + Comp + Str + Ind) with the SVM algorithm, which reached a macro-F measure of 0.7872 and an accuracy of 78.72%. These two figures are close because this experiment used a balanced data set.
Table 9. Best representation for the detection of support relations
Finally, the model for detecting support relations identified 332 structures: 219 simple structures, which basically consist of a relation between two components, and 113 more complex structures. Simple structures were observed mainly from a source premise to a target conclusion (P → C), with a total of 121 structures of this configuration. In addition, there were 40 support relations from premise to premise (P → P) and 29 from conclusion to conclusion (C → C). To describe the more complex structures, we take as reference those in Figure 3, which presents different types of convergent, simple and serial structures. We found eleven convergent structures of type (a), in which two premises support a conclusion, and two of type (b), with three premises supporting a conclusion. We also obtained four structures of type (c) (P→C, P→P) and six of type (d) (P→C, P→C), as well as serial structures: two of type (e) (P→C→C) and three of type (f) (P→P→C). Support relations are thus employed mainly as a premise that supports a conclusion, or as several premises that support a conclusion, as in the convergent structure. We assume that most argumentative relations in thesis writing are support relations because students mainly present elements to support their statements, whether in the Problem statement, Justification or Conclusions section.
In this research work, we have presented methods for identifying relations between argumentative components in academic theses. We proposed two approaches: the first initially detects which components are related and then classifies them as support or attack; the second identifies which components have support relations. The first approach obtains a finer-grained relation type but requires two steps, whereas the second approach obtains the support relations in a single step.
We built a corpus for the investigation of arguments in theses and research proposals written in Spanish. The annotation was carried out by seven annotators from several fields related to linguistics, who worked on different subsets of the corpus and reached a “moderate” kappa agreement level for argumentative relations. We found that students establish support relations most of the time, mainly between a premise and a conclusion, and rarely question (attack) conclusions. This could reveal a deficiency in student argumentation, since it is advisable to offer pros and cons for a conclusion. The corpus was the basis for the development and evaluation of the proposed methods.
Regarding the two approaches to identifying argumentative relations, we conclude that, for the second (one-step) approach, an SVM classifier applied to lexical, structural, indicators and component type prediction features is the most effective. In this approach, the structural feature is useful for identifying relations with support, while the component type prediction feature provides more information for detecting pairs without support. Notice that, except for the lexical features, the remaining features (structural, indicators and component type prediction) are relatively language-independent, so we plan to explore them for the task in other languages.
The argument structures generated by the one-step model were mostly simple, that is, a premise that supports a conclusion, and the most frequent complex structures are convergent, with two premises supporting a conclusion. However, more attack relation instances are required to refine the classification of that type of relation.
In future work, we expect to integrate the models for argument analysis so that they can be processed jointly, for example, classifying components and identifying relations simultaneously using optimization algorithms.
We have also gained enough knowledge of the task and the resources to explore improvements to the proposed models by incorporating deep learning techniques, such as recurrent neural networks (e.g. long short-term memory, LSTM), to represent the word sequences of argumentative components.
