Improved argumentative paragraphs detection in academic theses supported with unit segmentation

Abstract

Academic thesis writing is a complex task that requires the author to be skilled in argumentation. The goal of the academic author is to communicate clear ideas and to convince the reader of the presented claims. However, few students are good arguers, and this is a skill that takes time to master. In this paper, we present an exploration of lexical features used to model automatic detection of argumentative paragraphs using machine learning techniques. We present a novel proposal, which combines the information in the complete paragraph with the detection of argumentative segments in order to achieve improved results for the detection of argumentative paragraphs. We propose two approaches: a more descriptive one, which uses a decision tree classifier with indicator and lexical features, and a more efficient one, which uses an SVM classifier with lexical features and a Document Occurrence Representation (DOR). Both approaches consider the detection of argumentative segments to ensure that a paragraph detected as argumentative does indeed contain argumentative segments. We achieved encouraging results for both approaches.

Keywords

Academic writing, argumentation analysis, machine learning, text representation, natural language processing

1 Introduction

The arguments in academic theses are essential to sustain their assertions. These arguments have a structure that provides numerous statements to support each claim presented in the thesis. An argument is a set of statements (i.e. premises) that individually or collectively provide support to a claim (a conclusion).

Recent research has begun to study automatic argument processing, applying artificial intelligence and argumentation theory together to improve the extraction and retrieval of information.

For example, in the legal field, argument analysis seeks to facilitate access to jurisprudence that supports a case [18]. In another line of work, on scientific biomedical articles, argumentation is studied to identify arguments for or against a hypothesis under investigation [13].

In social networks, arguments are analyzed to identify comments for or against a topic in debate, to observe what is the position of the majority [4] and eventually evaluate these arguments based on whether they comply with an admissible structure [21].

Finally, essay writing is another area where the level of argumentation is evaluated, to assess the student and offer immediate feedback [23]. However, we have not found studies that aim to analyze textual argumentation in larger academic works such as theses. These writings are often prepared at the end of academic programs to demonstrate the student’s research and writing skills. As a result, such products are important for the student’s further education prospects or future employment. For this reason, a model to detect argumentative paragraphs in theses is essential.

A simple argument contains at least one premise and a conclusion [8]. In Example 1 below, an argument from our corpus is presented with a conclusion which is supported by a single premise.

Example 1:

[The generation of an ontology for a domain represents a challenge,]conclusion [since to generate them requires the services of one or more experts in the domain, with experience in the subject, and experience in the construction of semantic dictionaries.]premise

The conclusion asserts that generating an ontology is a challenge. To support this assertion, the premise gives a reason: the task requires experts. In addition, we observe the argumentative marker ‘since’, which allows us to identify the premise that supports the conclusion. Identifying the premise provides the basis for taking the given conclusion as true. The example thus contains two argumentative segments, which provide evidence to identify the paragraph as argumentative.

In this paper, we present a model for the detection of argumentative paragraphs supported with the information of argumentative segments. We propose a method for the classification of the presence of argumentative paragraphs using machine learning techniques together with lexical and indicator features. We also offer a method to perform argumentative unit segmentation employing Conditional Random Fields (CRF) with lexical, syntactic, structural and indicator features to classify sequences by token, to identify segments in the paragraph, and then obtain the possible segments of argumentative components. We integrate the result of both methods to perform an effective detection of argumentative paragraphs. To evaluate our model, we created a corpus of thesis sections (problem statement, justification, conclusions) with annotated argumentative paragraphs.

This article is organized as follows. Section 2 briefly reviews related work, closing with details of the corpus for experiments. The detection of argumentative paragraphs as a whole is presented in Section 3, while Section 4 focuses on the identification of argumentative segments. Section 5 details the fusion of the information for the improved detection of argumentative paragraphs by the two methods. The last section discusses conclusions and further work.

2 Related work

The detection of argumentative paragraphs, sentences or clauses has been established as a preliminary step to identify the presence of premises or conclusions. For this purpose, the authors of [19] classify argumentative and non-argumentative sentences in the Araucaria corpus, representing sentences with features such as combinations of word pairs, verbs and text statistics. Using a Bayes classifier, they report an accuracy of 73.75%.

In addition, [18] employ the corpus of legal texts ECHR with 47 annotated documents, where clauses (sub-sentences) are classified as argumentative or not, using a maximum entropy classifier, and report an accuracy of 80% for the task. Note that legal texts have a particular structure that allows lawyers to identify the arguments.

The identification of argumentative paragraphs in public policy formulation is investigated by [7], who employ five sets of argument categories (justification, explanation, deduction, refutation and conditional) and features based on the mode and tense of verbs. In this work, the authors identify segments of text with argumentation using the J48 decision tree classification algorithm, reporting an F measure of 0.764.

Also, the identification of argumentation in text segments is studied by [12], who built a corpus of 204 documents collected from social networks, which were annotated with their premises. They used structural, lexical, contextual and grammatical features to represent each sentence. They report an F measure of 0.77 employing a logistic regression classifier.

The classification of segments of text that correspond to argumentative components is analyzed by [25]. They employed Part Of Speech (POS) tags, a list of keywords and distributional representations to characterize the texts. They report an F measure of 0.3221 using CRF.

Furthermore, deep learning architectures have been employed in the argumentative unit segmentation task, as in the investigation of [1], which uses algorithms such as SVM, CRF and Bi-LSTM (bi-directional long short-term memory) with semantic, syntactic, structural and pragmatic features. The authors report the best performance when utilising all the features and an architecture that employs several recurrent Bi-LSTM networks, reaching an F measure of 0.885 in the identification of argumentative segments in academic essays.

2.1 Corpus

The corpus to identify the argumentative aspects in undergraduate and graduate academic writings comes from the Coltypi collection [11]. This collection has 968 theses and research proposals from the area of information technology and computing, written in Spanish. In particular, our study focuses on sections of Problem statement, Justification and Conclusions, since these are mainly argumentative.

The data set includes 444 sections with at least two annotations per section. Seven annotators from fields related to Linguistics worked on different subsets of the corpus. This was a challenging task for the annotators since advanced concepts of computer and information science were discussed in the theses. The annotation study covered 1,973 paragraphs. Cohen's kappa [6] agreement between annotators for the identification of argumentative paragraphs was 0.399, which corresponds to a ‘fair’ level [16].

We also performed the annotation of argumentative components for the task of argumentative unit segmentation. The annotators tagged the segments as premises and conclusions; segments without argumentation were left without any annotation. We extracted the segments where the annotators agreed on the limits of the selection; we also accepted cases where the segment of one annotator is contained within the segment selected by the other annotator.

Finally, a total of 4,989 segments of premise, conclusion or none types were obtained. The agreement reached for the annotation of argumentative components was 0.461, i.e. a ‘moderate’ level [16]. The complete developed corpus is described in detail in [10], and is available online at the corpus site. In the following sections, we detail the number of instances taken for the experiments.

3 Detection of argumentative paragraphs

This section presents the method for the automatic detection of argumentative paragraphs using machine learning techniques. We use the corpus of arguments detailed in the previous section to perform the experiments. We consider as paragraphs with argumentation those in which both annotators indicated the presence of argumentation and at least two argumentative components were identified. These criteria were adopted because some paragraphs were annotated as argumentative but without agreement at the component level, so that they lacked the minimum of two components (one premise and one conclusion) to analyze (260 paragraphs).

We found 1,174 paragraphs annotated as argumentative (with_argument) or non-argumentative (without_argument), which met the stated criteria. A proportion of 70.7% (830) of paragraphs was found with arguments and 29.3% (344) paragraphs without arguments.

We employed the NLTK tool to process the paragraphs for tokenization and feature extraction [3]. Representations created for our experiments reflect lexical, syntactic, semantic, and indicator features, which are described next.

Lexical features (Lex): These are based on words and punctuation symbols in texts. Such features consider unigrams as all terms in the text, including some punctuation marks (;:,.), and bigrams, i.e. pairs of consecutive terms in the text, also including punctuation marks. We computed the Tf-Idf (Term frequency - Inverse document frequency) weight for the lexical features, which assigns higher values to the most discriminative terms. Thus, the lexical feature representation consisted of unigrams and bigrams of words. Also, from previous experiments, we observed that punctuation marks provided information for the classification task. For example, the comma (,) alone obtained an information gain of 0.01944, and the bigram “. the” (“. el” in Spanish) obtained an information gain of 0.0181.
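The lexical representation can be illustrated with a minimal sketch. It assumes a plain tf × log(N/df) weighting and a simple regular-expression tokenizer; the exact Tf-Idf variant and tokenizer used in the experiments are not specified here, and the function names and the two Spanish sentences are made up for illustration.

```python
import math
import re
from collections import Counter

# Keep words plus the punctuation marks mentioned in the text (; : , .).
TOKEN_RE = re.compile(r"\w+|[;:,.]")


def tokenize(text):
    """Lowercase the text and split it into words and punctuation marks."""
    return TOKEN_RE.findall(text.lower())


def ngrams(tokens):
    """Return the unigrams followed by the bigrams of consecutive tokens."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def tfidf(paragraphs):
    """Return one {term: weight} dict per paragraph, weight = tf * log(N / df)."""
    counts = [Counter(ngrams(tokenize(p))) for p in paragraphs]
    n_docs = len(paragraphs)
    doc_freq = Counter(term for c in counts for term in c)
    return [
        {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in c.items()}
        for c in counts
    ]


# Toy paragraphs (illustrative only, not taken from the corpus).
paragraphs = [
    "El modelo es útil, ya que reduce el costo.",
    "El modelo fue evaluado. El corpus es pequeño.",
]
weights = tfidf(paragraphs)
```

Under this weighting, a term that occurs in every paragraph (such as the bigram "el modelo" in the toy data) receives a weight of zero, while terms concentrated in few paragraphs receive higher values.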

Syntactic features (Syn): we extracted lemmas, Part Of Speech (POS) tags and grammatical categories, i.e., the first letter of the POS tag. These elements in combination were taken as syntactic features. The POS tags that indicate the type of role a token has in a sentence were analyzed, for example, nouns, verbs, adjectives, adverbs, etc. Freeling [20] provides the lemmas that are the canonical form of the word. For example, for ‘ya – que’ (since), the POS tags are ‘RG - CS’ that indicate an adverb ‘RG’ and conjunction ‘CS’. To represent the syntactic features, each token (word or symbol) was represented by joining its lemma with its grammatical category, e.g., the adverb ‘ya’ leads to ‘ya-R’ or the verb ‘investigando’ (investigating) to ‘investigar-V’ (investigate-V). We build a vector to represent the paragraphs using the frequencies of these features.
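As a minimal sketch of this representation, the snippet below joins each lemma with the first letter of its POS tag and counts the results. The (lemma, tag) pairs are written by hand to stand in for a tagger's output, and the function name is hypothetical.

```python
from collections import Counter


def syntactic_features(tagged_tokens):
    """Join each lemma with the first letter of its POS tag and count them."""
    return Counter(f"{lemma}-{pos[0]}" for lemma, pos in tagged_tokens)


# Hand-written (lemma, POS tag) pairs; the verb tag is illustrative.
tagged = [("ya", "RG"), ("que", "CS"), ("investigar", "VMG0000")]
features = syntactic_features(tagged)
```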

Semantic features (Sem): we used Polyglot [2] with 100,004 word embeddings for Spanish, trained on Wikipedia. The word embeddings represent words in the vocabulary as a feature vector, learned with a neural network. The model employs the context of each word to train the vector. Contexts are a set of randomly initialized word vectors, which are terms before and after the trained words. Polyglot contains words with associated vectors that represent the meaning of the words. For each instance, we computed the average of the word embedding vectors of the words contained in the paragraph.
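The averaging step can be sketched as follows. The two-dimensional toy vectors and the choice to skip words without an embedding are assumptions made for illustration; how out-of-vocabulary words were handled in the experiments is not stated.

```python
def average_embedding(tokens, embeddings, dim):
    """Average the vectors of the tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        # No known word: fall back to a zero vector (assumption).
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy 2-dimensional embeddings (illustrative, not Polyglot vectors).
toy = {"modelo": [1.0, 0.0], "corpus": [0.0, 1.0]}
avg = average_embedding(["el", "modelo", "corpus"], toy, dim=2)
```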

Indicator features (Ind): argumentative markers are useful in identifying components of an argument. These features correspond to five sets of word patterns (argumentative markers), used to obtain the frequency of each of the following categories: justification, explanation, deduction, refutation, and conditional. Pattern sets were created based on the investigation of different sources of argumentative markers [9, 24]. Some examples of these patterns are ‘ya-que’ (since), for the category of justification, that is related to an argumentative premise component; or “por-lo-tanto” (therefore), for the deduction category, indicating a conclusion. The lemmas of the word patterns were used to capture a higher number of expressions. Below we include the lists of the argumentative markers categories for Spanish:

Justification Category: a causa de, a el fin y a el cabo, a el fin y a el postre, a fin de cuenta, como, como mostrar, como ser indicar por, con decir te, dar que, de acuerdo con, de hecho, deber a, deber se a, después de todo, el anterior porque, el motivo ser, el razón ser, el razón ser que, en tanto que, en vista de que, gracia a, motivo de que, no en vano, poner que ser consecuencia de, por causa de, por cuanto, por todo ello, porque, pues, puesto que, razón de que, se poder deducir de, se poder derivar de, se seguir de, ser que, ver que, ya que.

Explanation Category: a causa de, a fin de cuenta, así, de otro modo, deber a, decir de otro modo, el motivo ser, en concreto, en definitivo, en otro palabra, en particular, el razón ser, motivo de que, poner, poner por caso, por ejemplo, por ello, por ese razón, por este motivo, por este razón, razón de que, uno ejemplo, uno poner.

Deduction Category: a consecuencia de, a el fin y a el cabo, ante el anterior, así, así pues, así que, como conclusión, como consecuencia, como resultado, concluir que, conclusión, consecuentemente, consiguientemente, correspondientemente, de acuerdo a el anterior, de ahí que, de este forma, de manera que, de tal forma, de tal manera, deducir que, demostrar que, el cual apuntar a el conclusión de que, el cual implicar que, el cual mostrar que, el cual nos permitir inferir que, el cual probar que, el cual significar que, en conclusión, en consecuencia, en definitivo, en fin, en resumen, en resumir cuenta, en sí, en síntesis, en suma, en tal caso, entonces, establecer que, finalmente, implicar que, inferir que, llegar a el, llegar a el conclusión, para, para concluir, para terminar, poder inferir que, por consiguiente, por el que, por el tanto, por ello, por ende, por ese, por este razón, por tanto, por último, probar que, que, resumir, se desprender, se desprender de, se seguir que, ser por ese que.

Refutation Category: a el contrario, a menos, a pesar de, a pesar de todo, ahora, antes bien, aun así, aunque, bien a el contrario, de cualquiera modo, de todo modo, después de todo, empero, en cambio, ese sí, mas, más aun, más bien, muy a el contrario, no obstante, no parecer, pero, pero sin embargo, pesar a, por contra, por el contrario, pues, si bien, sin embargo, sino, sólo que.

Conditional Category: según, con tal que, a condición de que, a menos que, con que, suponer que, aunque, si, en caso de, si y solo si.
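The indicator features can be sketched as a count of marker occurrences per category over the lemmatized paragraph. The snippet below uses only a handful of markers from the lists above and a made-up lemma sequence; the matching strategy (exact match of consecutive lemmas, overlapping patterns counted separately) is an assumption.

```python
# A small subset of the marker lists above, keyed by category.
MARKERS = {
    "justification": ["ya que", "puesto que", "porque"],
    "deduction": ["por el tanto", "en conclusión", "por ende"],
    "refutation": ["sin embargo", "no obstante", "pero"],
}


def indicator_features(lemmas, markers=MARKERS):
    """Count, per category, how often its marker patterns occur in the lemmas."""
    counts = {}
    for category, patterns in markers.items():
        total = 0
        for pattern in patterns:
            words = pattern.split()
            n = len(words)
            # Slide a window of the pattern length over the lemma sequence.
            total += sum(
                lemmas[i:i + n] == words for i in range(len(lemmas) - n + 1)
            )
        counts[category] = total
    return counts


# Made-up lemmatized paragraph (illustrative only).
lemmas = "el tarea ser difícil ya que requerir experto , por el tanto ser costoso".split()
features = indicator_features(lemmas)
```

The resulting dictionary holds one frequency per category, which corresponds to one feature per category in the paragraph vector.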

Document Occurrence Representation (DOR): we also explore a vector-based representation of terms based on their occurrence in the documents of the corpus. DOR is based on the latent semantics of a term, which can be revealed by the statistical distribution of its occurrence on the documents in the corpus. A term is represented as a vector of weights associated with the documents in the collection. Weights represent the contribution of a document to the semantics of a term [5]. The size of the term vector for a DOR representation is equal to the number of documents in the training set. So in this work, the size of the training set is 1,056 paragraphs, which is also the size of the DOR vectors. We associate a vector with each term in our vocabulary. In addition, we calculated the DOR representation taking as terms the unigrams and bigrams of text. The DOR representation for each instance is obtained by adding all the vectors of terms in a given paragraph.
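A minimal sketch of the DOR construction follows. It uses raw term counts as document weights, which is a simplifying assumption: the weighting scheme of [5] is more refined and is not reproduced here. Function names and the toy training documents are hypothetical.

```python
from collections import Counter


def build_dor(train_docs):
    """Map each term to a vector with one weight per training document."""
    n_docs = len(train_docs)
    term_vectors = {}
    for j, tokens in enumerate(train_docs):
        for term, freq in Counter(tokens).items():
            # Weight = raw count of the term in document j (assumption).
            term_vectors.setdefault(term, [0.0] * n_docs)[j] = float(freq)
    return term_vectors


def represent(tokens, term_vectors, n_docs):
    """Represent a paragraph as the sum of the DOR vectors of its terms."""
    vector = [0.0] * n_docs
    for term in tokens:
        for j, weight in enumerate(term_vectors.get(term, ())):
            vector[j] += weight
    return vector


# Three toy training documents, so every vector has three dimensions.
train = [["ya", "que", "el"], ["el", "modelo"], ["modelo", "modelo"]]
dor = build_dor(train)
paragraph_vector = represent(["el", "modelo"], dor, len(train))
```

The vector length equals the number of training documents, which is why the representation in the experiments has 1,056 dimensions.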

First, the feature representations described above were subjected to a Feature Selection process based on information gain, for each type of feature. This is indicated as greater than zero (>0) in the corresponding column (FS) of Table 1. In two cases, this was not applicable.
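The selection criterion can be sketched for the simplest case of a binary feature (term present or absent). This is an illustration of information gain itself, not of the tool used in the experiments, which may discretize numeric features differently; the function names and toy labels are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_present, labels):
    """Information gain of a binary feature with respect to the class labels."""
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for x, y in zip(feature_present, labels) if x == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain


labels = ["with_arg", "with_arg", "without_arg", "without_arg"]
perfect = information_gain([True, True, False, False], labels)
useless = information_gain([True, False, True, False], labels)

# Keep only the features whose information gain is greater than zero.
selected = [name for name, g in [("ya que", perfect), ("el", useless)] if g > 0]
```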

Table 1
Results of argumentative paragraphs detection

Group	Representation	FS	Classifier	Kappa	Accuracy	F1 Macro	F1 with_arg	F1 without_arg
Single features	Semantic	>0	NB	0.3914	0.7513	0.6956	0.8258	0.5655
	Indicator	>0	DT	0.3939	0.7658	0.6952	0.8419	0.5484
	Syntactic	>0	RF	0.4490	0.7939	0.7210	0.8636	0.5784
	DOR	-	SVM	0.4926	0.8041	0.7448	0.8678	0.6217
	Lexical	>0	DT	0.5044	0.7905	0.7521	0.8496	0.6545
Combinations of features	Lex + Syn	>0	NB	0.4954	0.7794	0.7468	0.8376	0.6560
	Lex + Sem	>0	NB	0.5012	0.7777	0.7490	0.8339	0.6641
	Lex + Ind	>0	DT	0.5247	0.8032	0.7623	0.8609	0.6638
	Lex + DOR	>0	SVM	0.5260	0.8143	0.7621	0.8735	0.6506
	Florou	-	RF	0.3760	0.7683	0.6834	0.8474	0.5194
	Moens	>0	RF	0.4831	0.7990	0.7403	0.8637	0.6169
	All features	>0	SVM	0.5033	0.8066	0.7505	0.8689	0.6321
All the features excluding only one	All w/o Lex	>0	SVM	0.447	0.7802	0.7229	0.8489	0.5969
	All w/o Sem	>0	SVM	0.4851	0.7981	0.7416	0.8624	0.6208
	All w/o DOR	>0	NB	0.5016	0.7836	0.7501	0.8416	0.6586
	All w/o Ind	>0	SVM	0.5076	0.8092	0.7525	0.8710	0.6340
	All w/o Syn	>0	SVM	0.5235	0.8152	0.7605	0.8749	0.6460
	All features	>0	SVM	0.5033	0.8066	0.7505	0.8689	0.6321

The task of detecting argumentative paragraphs was approached as a binary classification to detect whether a given paragraph contains argumentation. In the experiments, we applied 10-fold stratified cross-validation with the Scikit-learn toolkit [22]. We used the same train/test split of each fold in all the experiments. The machine learning tool Weka [14] was used to perform the classification using the default hyper-parameters. We trained the classifiers on the training data of each fold and applied them to the corresponding test data. We explored the efficacy of four classifiers: Naive Bayes (NB), Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM).

Table 1, in the upper (Single features) part, shows Cohen's kappa, accuracy and F1 measures for each representation. We observed that the Decision Tree classifier with lexical features achieved the best macro F measure of 0.752, while the best accuracy of 80.4% was reached by the SVM classifier with the DOR representation.
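For reference, the three kinds of measures reported in Table 1 can be computed from gold and predicted labels as in the sketch below. It uses the standard definitions of accuracy, per-class F1 and Cohen's kappa; the function name and the four toy labels are illustrative and unrelated to the reported figures.

```python
def evaluate(gold, pred, labels=("with_arg", "without_arg")):
    """Return accuracy, macro F1 and Cohen's kappa for two label lists."""
    n = len(gold)
    accuracy = sum(g == p for g, p in zip(gold, pred)) / n
    f1_scores = []
    expected = 0.0  # Agreement expected by chance, used by kappa.
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        n_gold = gold.count(label)
        n_pred = pred.count(label)
        # F1 = 2TP / (2TP + FP + FN) = 2TP / (n_gold + n_pred).
        f1_scores.append(2 * tp / (n_gold + n_pred) if n_gold + n_pred else 0.0)
        expected += (n_gold / n) * (n_pred / n)
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0
    return {
        "accuracy": accuracy,
        "macro_f1": sum(f1_scores) / len(labels),
        "kappa": kappa,
    }


gold = ["with_arg", "with_arg", "with_arg", "without_arg"]
pred = ["with_arg", "with_arg", "without_arg", "without_arg"]
scores = evaluate(gold, pred)
```

Macro F1 averages the per-class F1 values without weighting by class size, so the minority class (without_arg) counts as much as the majority class.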

We analyzed the decision tree created with the lexical features using attributes with information gain (265), and we observed, in the first levels of the tree, word patterns that are part of argumentative markers. For instance, the bigram ‘ya-que’ (since), an argumentative marker for identifying premises, appears at the root. In lower levels of the tree, we noticed words that are part of markers, such as ‘por-lo’ (at-it), from the argumentative marker “por lo tanto” (therefore), which indicates conclusions; as well as the word “debido” (due), which indicates a premise as part of the marker “debido a” (due to). In addition, we notice that semantic features with the NB classifier obtain the lowest efficacy in terms of accuracy.

Once the efficacy of features was observed individually, we carried out their combinations. We combined the best individual feature, ‘Lexical’, with each of the others in pairs. We also created a representation with all the features to test whether it achieves improved efficacy.

Additionally, we include the results of solutions previously proposed by authors such as [19] and [7], without seeking a thorough comparison, since we have a different data set. The representation developed by Florou consists of categories of argumentative markers and features based on the mode and tense of verbs. The representation proposed by Moens consists of combinations of all possible word pairs, main verbs, and text statistics.

In Table 1, the results of the described combinations of features are presented. We considered the best model to be the combination of lexical and indicator features (Lex+Ind) using feature selection with information gain (279), with a DT classifier reaching the best macro F1 measure of 0.7623. The model with the best accuracy reaches 81.43% using lexical features together with a DOR representation (Lex+DOR), feature selection with information gain, and an SVM classifier. Observe that this model identifies the paragraphs with arguments (with_arg) better, reaching an F measure of 0.8735. Moreover, we notice that the representations ‘Lex+Ind’ and ‘Lex+DOR’ achieved higher results than the implementations of the representations proposed by Moens and Florou.

An additional analysis was carried out to identify the contribution of each set of features in the classification task. A full representation was built that includes all types of features. To examine them, representations were created in which only one type of feature of the complete representation is omitted.

In Table 1 bottom, we present the results of the representations with information gain as the Feature Selection criterion, including the best classifier for each representation. Omitting the lexical features (Lex) yields the lowest efficacy, indicating that they provide the most information to the model to identify argumentative paragraphs. Omitting the syntactic features (Syn) yields the highest result, even higher than the full representation, which means that syntactic features negatively affect the efficacy of the model.

4 Identification of argumentative segments

The segmentation of argumentative units is an indispensable task for the detection of argumentation, which consists of identifying the argumentative sequences of the text. This task was done by classifying the sequences by token. To capture the context around each token, we extract several features detailed below. Conditional Random Field (CRF) was employed in the classification task to capture the token sequences for labeling as in [15].

The argumentative components were coded using the IOB scheme (abbreviation of Inside, Outside, Beginning) as in [27], considering each sentence as a sequence. The argumentative components were represented using the IOB format tags, where the first token of the component is indicated by ‘B-Arg’, the tokens inside the component with ‘I-Arg’ and all non-argumentative tokens with the ‘O’ label.
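The IOB encoding can be sketched as follows, assuming that each annotated component is given as a (start, end) pair of token positions with an exclusive end. The function name and the example sentence are illustrative.

```python
def iob_encode(n_tokens, spans):
    """Label tokens with B-Arg / I-Arg / O given (start, end) spans, end exclusive."""
    labels = ["O"] * n_tokens
    for start, end in spans:
        labels[start] = "B-Arg"
        for i in range(start + 1, end):
            labels[i] = "I-Arg"
    return labels


# One argumentative component covering tokens 3 to 5 (illustrative).
tokens = ["Sin", "duda", ",", "el", "modelo", "mejora", "."]
labels = iob_encode(len(tokens), [(3, 6)])
```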

The representation of each token in the sentences was done by lexical, syntactic, structural and indicator features. The description of each of the features used in the experiments is presented next.

Lexical features (T_Lex) include tokens (words or symbols) in lowercase, as well as the elements around them, to reflect contextual information. The representation of each token is done through a group of variables with their corresponding values, in the form ‘variable = value’.

For example, the information taken to characterize the second token of the text “La primera definicion” (in English ‘The first definition’), corresponding to word ‘primera’ (first), considering a window of size one, employs three variables to indicate the word in question (token = first), the previous (-1: token = the) and the following (+1: token = definition). When taking a window of size one in the example, only one word before and one word after the word being characterized, are employed.
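A minimal sketch of this windowed representation is shown below. The key format ('-1:token', '+1:token') mirrors the example in the text; the function name and the `name` parameter are assumptions, and the values are kept in Spanish rather than translated.

```python
def window_features(tokens, index, window=1, name="token"):
    """Build 'variable = value' pairs for a token and its neighbours."""
    features = {name: tokens[index].lower()}
    for offset in range(1, window + 1):
        if index - offset >= 0:
            features[f"-{offset}:{name}"] = tokens[index - offset].lower()
        if index + offset < len(tokens):
            features[f"+{offset}:{name}"] = tokens[index + offset].lower()
    return features


# Second token of the example text, with a window of size one.
features = window_features(["La", "primera", "definicion"], 1)
```

The same helper could be reused for other per-token attributes (for instance POS tags or marker tags) by passing a different sequence and `name`.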

The syntactic feature (T_Syn) represents each token with the first two letters of its POS tag, which denote its grammatical category; for example, the tag for the verb ‘investigar’ (investigate) is ‘VM’, indicating that it is a main verb. An example of the variables used for the representation of the token ‘primera’ (first), with grammatical category ‘AO’ (ordinal adjective) and a window of size one, is: postag = AO, -1: postag = DA, +1: postag = NC.

Lexical-syntactic feature (T_Lexsyn) combines the lexical information with the token syntax, for which the syntax tree generated from the sentence is taken. For the construction of the syntax tree, the Stanford Parser [17] tool was employed. Each token was represented as in [26]. That is, for each token the upper node Nw was searched with the same lexical head as the token, which must also have a right sibling Ns. Three attributes were taken, Nw as the upper node with the same lexical head; Ns was employed as the node corresponding to the right sibling of Nw; and Np, as the parent node of Nw.

The indicator feature by token (T_Ind) denotes whether the word is at the beginning of an argumentative marker or in the rest of the marker pattern. For instance, in the marker “ya que” (being that), the word ‘ya’ (being) is indicated as a token that starts a marker (Bm), and “que” (that) as a word that is part of the marker (Im). Words that are not part of an argumentative marker were assigned ‘Om’. We consider that the presence of argumentative markers reveals the presence of argumentation to the segmentation model. An example of the variables obtained to represent the token ‘first’ is the following: marker = Om, -1:marker = Om, +1:marker = Om.
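The marker tagging can be sketched as below. Trying the longest marker first at each position is an assumption made to resolve overlapping patterns; the function name and the example sentence are illustrative.

```python
def tag_markers(tokens, markers):
    """Tag each token as Bm (marker start), Im (inside marker) or Om (outside)."""
    tags = ["Om"] * len(tokens)
    # Try longer markers first so that they win over shorter ones (assumption).
    patterns = sorted((m.split() for m in markers), key=len, reverse=True)
    i = 0
    while i < len(tokens):
        for words in patterns:
            if tokens[i:i + len(words)] == words:
                tags[i] = "Bm"
                for k in range(i + 1, i + len(words)):
                    tags[k] = "Im"
                i += len(words) - 1  # Skip the rest of the matched marker.
                break
        i += 1
    return tags


tags = tag_markers("es útil ya que funciona".split(), ["ya que", "por lo tanto"])
```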

The structural feature (T_Est) captures the token location within the sentence, paragraph and section with five categories, and indicates whether it is in the first or last sentence of the paragraph and the first or last paragraph of the section. For example, the variables to represent a token are the following: location.in.sentence = Start, location.in.paragraph = Medium, location.in.section = Start, first.paragraph = False, last.paragraph = False, first.sentence = false, last.sentence = false.

In experiments, we used the corpus of arguments presented previously with 2,971 sentences, composed of 2,865 labels annotated as ‘B-Arg’, 63,399 labels of class ‘I-Arg’ and 59,557 labels of class ‘O’. We performed 10-fold cross-validation using CRF Suite with the averaged perceptron training method.

Table 2, in the single features part, presents the macro F1 measure and accuracy. The best efficacy was observed using a window size of 15 tokens around the token to be tagged. We assume that with this window size it is possible to incorporate information from argumentative markers near the token, which helps in the segmentation. The feature with the best result was T_Lexsyn, achieving a macro F1 measure of 0.549. In contrast, the structural feature T_Est achieves the best efficacy for the labeling of the class ‘I-Arg’, which corresponds to the internal part of an argumentative sequence, reaching an F measure of 0.613. The indicator feature T_Ind is the best for predicting labels that are not part of an argument, i.e., the ‘O’ class.

Table 2
Results of argumentative unit segmentation

Group	Representation	F1 Macro	Accuracy	F1 B-Arg	F1 I-Arg	F1 O
Single features	T_Lex	0.5455	0.5779	0.4775	0.5998	0.5593
	T_Syn	0.5212	0.5519	0.4627	0.5967	0.5041
	T_Ind	0.4692	0.5505	0.2920	0.5386	0.5769
	T_Est	0.4475	0.5227	0.3312	0.6127	0.3987
	T_Lexsyn	0.5489	0.5871	0.4681	0.6092	0.5695
Combinations of features	T_Lex + T_Ind + T_Est	0.5692	0.6005	0.5038	0.6264	0.5775
	Previous + T_Syn	0.5728	0.6018	0.5128	0.6288	0.5770
	Previous + T_Lexsyn (All)	0.5733	0.6032	0.5108	0.6276	0.5814
All the features excluding only one	All w/o T_Lexsyn	0.5723	0.6016	0.5117	0.6289	0.5763
	All w/o T_Est	0.5614	0.5935	0.4935	0.6151	0.5755
	All w/o T_Ind	0.5652	0.5957	0.5025	0.6237	0.5694
	All w/o T_Syn	0.5675	0.5989	0.5020	0.6243	0.5761
	All w/o T_Lex	0.5695	0.5996	0.5074	0.6269	0.5744

Finally, we note that the lexical feature T_Lex is the most effective to identify the beginning of an argument sequence ‘B-Arg’. Based on these results, we combine the representations of T_Lex, T_Ind, and T_Est to verify their performance together.

Table 2, in the combination of features part, shows that the best efficacy is obtained with the representation that uses all the features (Previous + T_Lexsyn), which reaches the best accuracy (60.3%) and the best macro F1 measure (0.573). This means that the tokens, the grammatical category, the structure, and the markers all contribute to the segmentation. This representation identifies the tokens that are part of an argument (‘I-Arg’) better than the tokens of the ‘B-Arg’ and ‘O’ classes.

Finally, we generate representations where only one type of feature was omitted to determine the impact of its absence on the model.

In Table 2 bottom, we present the results of these representations (all the features excluding only one). The structural feature (T_Est) is observed to provide the most information to the model, since omitting it yields the lowest performance; in particular, it provides information for the segmentation of classes B-Arg and I-Arg.

After reviewing the performance of the other features, we deduce that they all provide information for the argumentative segmentation of texts since we did not observe a combination with higher efficacy than the representation with all the features. According to these results, our best model for the argumentative unit segmentation employs all the features (T_Lex + T_Ind + T_Est + T_Syn + T_Lexsyn), reaching a macro F of 0.5733.

5 Improved detection of argumentative paragraphs

The prediction of argumentative paragraphs is improved by incorporating information about the argumentative segments in the paragraph. The two methods are fused by taking the prediction of the argumentative paragraph detection method presented earlier and, in those cases in which no argumentative segments are detected, classifying the paragraph as ‘without argumentation’. This criterion improves the detection of argumentative paragraphs and also guarantees that every paragraph labeled as argumentative contains argumentative segments.

The idea is depicted in Fig. 1, which illustrates the process for paragraph argument analysis. The process begins with the input text to the model. Each paragraph is then processed by the argumentative paragraph identification method to obtain a prediction; simultaneously, the argumentative unit segmentation method is applied to the paragraph to identify the segments that correspond to argumentation. Next, both results are fused using the criterion described above. The output of the process consists of only those paragraphs with arguments and their corresponding identified segments.

Fig. 1

Argument analysis model with method fusion.
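The fusion criterion can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that the paragraph classifier returns a boolean and that the segmentation model returns one ‘B-Arg’, ‘I-Arg’ or ‘O’ label per token. The function names `extract_segments` and `fuse` are hypothetical.

```python
from typing import List, Tuple


def extract_segments(labels: List[str]) -> List[Tuple[int, int]]:
    """Turn a B-Arg/I-Arg/O label sequence into (start, end) token spans.

    `end` is exclusive. An I-Arg that does not follow a B-Arg or I-Arg
    is treated as opening a new segment (an assumption of this sketch).
    """
    segments = []
    start = None
    for i, label in enumerate(labels):
        if label == "B-Arg":
            if start is not None:
                segments.append((start, i))
            start = i
        elif label == "I-Arg":
            if start is None:
                start = i
        else:  # "O" closes any open segment
            if start is not None:
                segments.append((start, i))
                start = None
    if start is not None:
        segments.append((start, len(labels)))
    return segments


def fuse(paragraph_is_argumentative: bool,
         labels: List[str]) -> Tuple[bool, List[Tuple[int, int]]]:
    """Keep a paragraph only if the paragraph classifier accepts it
    and the segmentation model finds at least one argumentative segment."""
    segments = extract_segments(labels)
    if paragraph_is_argumentative and segments:
        return True, segments
    return False, []


example_labels = ["O", "B-Arg", "I-Arg", "I-Arg", "O", "B-Arg", "I-Arg"]
print(fuse(True, example_labels))
print(fuse(True, ["O", "O", "O"]))
```

A paragraph that the classifier accepts but that contains no detected segment is therefore returned as ‘without argumentation’, which is the guarantee described above.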

Some evidence supporting this idea is that, if we consider only the presence of argumentative segments to identify argumentative paragraphs, we obtain a macro F measure of 0.666 and a kappa of 0.36. This is due to the large number of paragraphs with detected argumentative sequences (1,027 in total), compared with the 830 argumentative paragraphs in the gold standard. For this reason, the classification results of both methods must be considered to detect paragraphs with argumentation more accurately.

We propose two approaches for the fusion of the methods:

First approach. This is a more descriptive model that uses lexical and indicator features with information gain (285 attributes) and a DT classifier. The resulting tree can be presented to the student to explain the decision taken, indicating the rule used, and thus provide more complete feedback.

In Fig. 2, the upper nodes of the tree contain argumentative markers such as the attributes ‘ya-que’ (‘already-that’), ‘por-lo’ (‘for-it’), and ‘debido’ (‘due’), whose presence in a paragraph classifies it as argumentative. In addition, the nodes with argumentative categories are highlighted in the tree; these appear mostly in lower nodes and provide information for decision making, in particular the sub-tree with the category ‘C_justification’.

Fig. 2

Decision tree for detection of argumentative paragraphs.
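To make the marker attributes concrete, the following Python sketch builds binary presence features for the three markers named above. It assumes that a hyphenated attribute such as ‘ya-que’ stands for two adjacent tokens; that reading, the function name `marker_features` and the example sentence are assumptions of this sketch, and the full set of 285 attributes is not reproduced here.

```python
import re
from typing import Dict, Iterable

# Markers named in the text as upper nodes of the tree.
MARKERS = ("ya-que", "por-lo", "debido")


def marker_features(paragraph: str,
                    markers: Iterable[str] = MARKERS) -> Dict[str, int]:
    """Return 1 for each marker present in the paragraph, 0 otherwise.

    Single-word markers are matched against tokens; hyphenated markers
    are matched against pairs of adjacent tokens joined with a hyphen.
    """
    tokens = re.findall(r"\w+", paragraph.lower())
    unigrams = set(tokens)
    bigrams = {f"{a}-{b}" for a, b in zip(tokens, tokens[1:])}
    present = unigrams | bigrams
    return {marker: int(marker in present) for marker in markers}


# Hypothetical example paragraph, written for this illustration.
example = "El método falla, ya que los datos son escasos debido al ruido."
print(marker_features(example))
```

A feature vector of this kind is what a decision tree can test node by node, which is why the resulting rule can be shown to the student as an explanation.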

Second approach. This uses lexical features with DOR and with information gain (1,327 attributes) and an SVM classifier. This model could give a faster response, which would be adequate for offering the instructor a quick global report on the performance of the group.
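As a rough illustration of the idea behind DOR, the Python sketch below represents each term by the training documents in which it occurs, and each document as the sum of the vectors of its terms. It is a simplified, count-based reading: the weighting scheme used in the experiments is not restated in this section, so raw occurrence counts stand in for it, and the names `term_vectors` and `dor_vector` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def term_vectors(train_docs: List[List[str]]) -> Dict[str, List[float]]:
    """Represent each term by its occurrence counts across training documents."""
    n_docs = len(train_docs)
    vectors = defaultdict(lambda: [0.0] * n_docs)
    for k, doc in enumerate(train_docs):
        for term, count in Counter(doc).items():
            vectors[term][k] = float(count)
    return dict(vectors)


def dor_vector(doc: List[str],
               vectors: Dict[str, List[float]],
               n_docs: int) -> List[float]:
    """Represent a document as the sum of the vectors of its known terms.

    Terms that never appeared in the training documents are ignored.
    """
    result = [0.0] * n_docs
    for term in doc:
        for k, weight in enumerate(vectors.get(term, ())):
            result[k] += weight
    return result


# Toy tokenised documents, invented for this illustration.
train = [["a", "b", "a"], ["b", "c"]]
vectors = term_vectors(train)
print(dor_vector(["a", "c"], vectors, len(train)))
```

The resulting vector has one dimension per training document rather than one per vocabulary term, and it is this kind of dense vector that would be passed to the SVM classifier.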

The fusion of Lex+DOR features with an SVM classifier and the argumentative segment detection model reaches 78.2% accuracy with a macro F1 measure of 0.823, i.e. the best efficacy. The other proposed alternative offers a descriptive solution, using the Lex+Ind features with a DT classifier and the segment detection model; it obtains values close to those of the best model, reaching 77.8% accuracy with a macro F1 measure of 0.81.

Finally, we compare our models with the Cohen kappa agreement between annotators for the labeling of argumentative paragraphs of the corpus, which is 0.399 and corresponds to a ‘Fair’ level. Our models reach a higher value of 0.5651, which corresponds to an improved level of ‘Moderate’. These results provide support for the use of the proposed models to identify argumentative paragraphs in academic texts.
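For reference, Cohen's kappa and the agreement levels of Landis and Koch can be computed as in the following Python sketch. The function names are illustrative, and the short label sequences used to exercise `cohen_kappa` are toy data rather than corpus annotations.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence, b: Sequence) -> float:
    """Cohen's kappa between two label sequences of equal length."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: proportion of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement by chance, from each rater's label distribution.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single, identical label
    return (observed - expected) / (1.0 - expected)


def landis_koch_level(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch agreement level."""
    if kappa < 0:
        return "Poor"
    if kappa <= 0.20:
        return "Slight"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Substantial"
    return "Almost perfect"


# Toy labels (1 = argumentative, 0 = without argumentation).
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
print(landis_koch_level(0.399), landis_koch_level(0.5651))
```

Applied to the two values reported above, the mapping yields ‘Fair’ for 0.399 and ‘Moderate’ for 0.5651.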

6 Conclusions

Automatic argument analysis is a field of research that combines, in an interdisciplinary way, artificial intelligence and theories of argumentation, with the purpose of improving the process of extracting and analyzing information.

In this article, we focus on the argument analysis of academic theses, intending to support students in writing their texts. To accomplish this, we develop and validate methods for the detection of argumentative paragraphs, improved using the information from the argumentative unit segmentation model. These methods could be incorporated into a system that provides the student with assessment and feedback to improve the argumentation in their writings.

We proposed two approaches for argumentative paragraph detection. The first, more descriptive, employs a decision tree algorithm as a classifier with lexical features and indicators. The second, more efficient, uses an SVM classifier with lexical features with DOR. Both approaches consider the detection of argumentative segments to ensure that a paragraph detected as argumentative has segments with argumentation. In addition, we observe that individual features contribute differently: lexical features provide more information for detecting paragraphs without argumentation, whereas the DOR representation is useful for identifying paragraphs with argumentation.

One benefit of our method fusion is that paragraphs identified as argumentative already have their argumentative segments determined, so a component classification task can be applied directly to decide their type (i.e. premise or conclusion) and then to establish whether they are related by support or attack.

The proposed approaches also provide hints on how to tackle the problem with a deep learning approach, focusing initially on lexical and indicator features, possibly coded as embeddings. Such a representation could then be fed to architectures that simultaneously process the paragraph and its sub-sentence segments.

Although our representations, methods, and experiments were designed for Spanish, they can also serve other languages with slight changes, since they do not depend on sophisticated resources.

Further work involves the identification of argumentative components (e.g. premises, conclusions), as well as their relations (support or attack), to indicate precisely to students the argument errors and deficiencies identified in their writings.

Footnotes

We carried out experiments with different word embeddings; the best result was observed using Polyglot with 64 dimensions, as opposed to Word2vec with 300 dimensions.

Acknowledgments

This research was partially supported by SNI, México.

References

1. Ajjour, Chen W.-F., Kiesel, Wachsmuth and Stein, Unit segmentation of argumentative texts, In Proceedings of the 4th Workshop on Argument Mining, pages 118–128, Association for Computational Linguistics, 2017.

2. Al-Rfou, Perozzi and Skiena, Polyglot: Distributed word representations for multilingual NLP, In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria, August 2013, Association for Computational Linguistics.

3. Bird, Klein and Loper, Natural Language Processing with Python, O’Reilly Media, Inc., 1st edition, 2009.

4. Cabrio and Villata, Combining textual entailment and argumentation theory for supporting online debates interactions, In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2, ACL ’12, pages 208–212, Association for Computational Linguistics, 2012.

5. Carrillo, Eliasmith and López-López, Combining text vector representations for information retrieval, In Matoušek and Mautner, editors, Text, Speech and Dialogue, TSD 2009, LNCS 5729, pages 24–31, Berlin, Heidelberg, 2009, Springer Berlin Heidelberg.

6. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20(1) (1960), 37–46.

7. Florou, Konstantopoulos, Koukourikos and Karampiperis, Argument extraction for supporting public policy formulation, In Proceedings of the 7th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 49–54, 2013.

8. Freeman J.B., Argument Structure: Representation and Theory, Springer, 2011.

9. Fuentes Rodríguez, Diccionario de conectores y operadores del español, ArcoLibros, SL, 2009.

10. García-Gorrostieta J.M., López-López, Rico-Sulayes and Carrillo, Argument corpus development and argument component classification: A study in academic Spanish, Digital Scholarship in the Humanities 36(2) (2021), 287–306.

11. González-López and López-López, Colección de tesis y propuesta de investigación en TICs: un recurso para su análisis y estudio, In XIII Congreso Nacional de Investigación Educativa, pages 1–15, 2015.

12. Goudas, Louizos, Petasis and Karkaletsis, Argument extraction from news, blogs, and social media, In Hellenic Conference on Artificial Intelligence, pages 287–299, Springer, 2014.

13. Green N.L., Towards mining scientific discourse using argumentation schemes, Argument & Computation 9(2) (2018), 121–135.

14. Hall, Frank, Holmes, Pfahringer, Reutemann and Witten I.H., The WEKA data mining software: an update, ACM SIGKDD Explorations Newsletter 11(1) (2009), 10–18.

15. Lafferty J.D., McCallum and Pereira F.C.N., Conditional random fields: Probabilistic models for segmenting and labeling sequence data, In Proceedings of the Eighteenth International Conference on Machine Learning, ICML ’01, pages 282–289, San Francisco, CA, USA, 2001, Morgan Kaufmann Publishers Inc.

16. Landis J.R. and Koch G.G., The measurement of observer agreement for categorical data, Biometrics 33(1) (1977), 159–174.

17. Manning, Surdeanu, Bauer, Finkel, Bethard and McClosky, The Stanford CoreNLP natural language processing toolkit, In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60, 2014.

18. Mochales and Moens M.-F., Study on the structure of argumentation in case law, In Proceedings of the 2008 Conference on Legal Knowledge and Information Systems, pages 11–20, IOS Press, 2008.

19. Moens M.-F., Boiy, Palau R.M. and Reed, Automatic detection of arguments in legal texts, In Proceedings of the 11th International Conference on Artificial Intelligence and Law, pages 225–230, ACM, 2007.

20. Padró and Stanilovsky, FreeLing 3.0: Towards wider multilinguality, In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), pages 2473–2479, Istanbul, Turkey, May 2012, European Language Resources Association (ELRA).

21. Park, Blake and Cardie, Toward machine-assisted participation in eRulemaking: An argumentation model of evaluability, In Proceedings of the 15th International Conference on Artificial Intelligence and Law, ICAIL ’15, pages 206–210, New York, NY, USA, 2015, ACM.

22. Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel, Blondel, Prettenhofer, Weiss, Dubourg, Vanderplas, Passos, Cournapeau, Brucher, Perrot and Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12(Oct) (2011), 2825–2830.

23. Persing and Ng, Modeling argument strength in student essays, In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 543–552, Beijing, China, July 2015, Association for Computational Linguistics.

24. Sánchez Avendaño, Los conectores discursivos: su empleo en redacciones de estudiantes universitarios costarricenses, Revista de Filología y Lingüística de la Universidad de Costa Rica 31(2), 2005.

25. Sardianos, Katakis I.M., Petasis and Karkaletsis, Argument extraction from news, In Proceedings of the 2nd Workshop on Argumentation Mining, pages 56–66, Association for Computational Linguistics, June 2015.

26. Soricut and Marcu, Sentence level discourse parsing using syntactic and lexical information, In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, pages 149–156, Association for Computational Linguistics, 2003.

27. Stab C.M.E., Argumentative Writing Support by means of Natural Language Processing, PhD thesis, Technische Universität Darmstadt, 2017.