Impact of passive and negative sentences in automatic generation of static UML diagram using NLP

Abstract

In this research work, we propose a rule-based approach for the automatic extraction of UML diagrams from unstructured software functional requirements. Existing work provides decent results for active and positive sentences, but the challenge addressed in our work is to automatically extract class diagram elements from passive-voice and negative sentences. Furthermore, there is scope for more research on the extraction process using multi-word terms. Thus, we have endeavored to automatically extract the class diagram elements while overcoming these challenges. The methodology uses the Stanford CoreNLP tools along with Java for the practical implementation of the formulated rules. Our approach shows that, without supplanting human beings and their decision making, one can reduce the human effort while designing functional requirements. Several case studies were performed to compare class diagrams generated by our methodology with those created by experts. Our methodology outperforms the existing work, with an average completeness of 0.82, an average correctness of 0.92, and an average redundancy of 0.15. The results show that the class diagram elements extracted by our methodology are precise as well as accurate; hence, in practice, such class diagrams would be a good preliminary diagram from which to converge towards precise and comprehensive class diagrams.

Keywords

Unified modeling language, class diagram, natural language processing, functional requirements

1 Introduction

Unified Modeling Language (UML) deserves considerable attention because it portrays the detailed design and description of object-oriented software development. It acts as a bridge among participants, helping them understand each other, share ideas, and resolve ambiguities. Hence, we have directed our work towards the automatic extraction of a conceptual diagram, i.e. the class diagram, which is a widely used and well-known structural model among practitioners such as tool builders, users, and notation designers [25].

Existing research shows that several methodologies, such as CM Builder [15], Use-Case driven Development Assistant (UCDA) [20], CIRCE [2], Visual Narrator tool [26], AnModeler [32], and aToucan [36], have been proposed for the extraction of conceptual models from functional requirements using Natural Language Processing (NLP) tools and techniques. Although the existing extraction rules have shown successful outcomes for active-voice and positive sentences, existing tools and methodologies face some major limitations.

1.1 Limitations of existing work

The existing work leaves out multi-word terms during the extraction process.

For passive-voice and negative sentences, there is scope for further improvement.

Thus, we propose a methodology to overcome these shortcomings for automatic class diagram extraction from unstructured functional requirements. Firstly, we have identified the multi-word terms in functional requirements. Secondly, we have come up with the paraphrasing rules for conversion of negative sentences into affirmative sentences. We strived to devise new NLP based extraction rules for passive voice and negative sentences in conjunction with multi-word term extraction.

The rest of the paper is organized as follows: we have discussed the related work in section 2, section 3 contains a detailed description of proposed methodology and extraction rules along with their implementation form, and section 4 contains results and analysis of our findings. Our concluding remarks are presented in section 5 together with some directions for future work.

2 Related work

The research trend in the field of automatic extraction of UML diagrams from functional requirements has witnessed a disruption in the last few years. Currently, a few factors boost the effort in this research field, such as recent progress in natural language (NL) analysis, i.e. semantic analysis, anaphora resolution, and detection of multi-word terms. These advances reduce the complexity of NL analysis to some extent.

The pattern in this research field shows that researchers use different rules and patterns for UML diagram extraction and have proposed extraction tools such as CM Builder [15], UCDA [20], CIRCE [2], Visual Narrator tool [26], Class-Gen [10], Static UML Model Generator from Analysis of Requirements (SUGAR) [18], AnModeler [32], and aToucan [36]. Other approaches extract the relations between entities from the requirement document using linguistic pattern matching, such as [21], [28], and [29]. Some researchers [9], [3], and [24] have emphasized the formulation of extraction rules to obtain better results. Some research works, such as [33] and [34], employ a dependency parser to extract domain elements from text documents. We have analysed the extraction rules employed in existing research and observed that these rules provide commendable results for active-voice and positive sentences, but for passive and negative sentences more research is required.

Based on the analysis of these pioneering works, we propose a methodology, which is discussed in detail in the next section. The emphasis is on minimizing human effort by improving the performance of the extraction rules.

3 Proposed methodology

The proposed methodology consists of two main steps, preprocessing of functional requirements and identification of entities, as described in Fig. 1.

Fig. 1

Steps followed in the proposed methodology.

This methodology employs the Stanford universal dependencies produced by the Stanford CoreNLP [13], [5], [30], [17], [7] tool. The universal dependencies provide a simple description of the grammatical relationships between the words of a sentence in terms of a head and a dependent. The representation contains approximately 50 universal dependencies, and each dependency is a binary relationship between a head and a dependent. The definitions use Penn Treebank [22] part-of-speech tags and labels. The grammatical relations are illustrated using an example in Table 1.

Table 1

Description of universal dependencies in a sentence

Universal dependencies	Head	Dependent	Grammatical relationship between words
nsubj	owns	account	‘account’ is subject for ‘owns’
root	root	owns	‘owns’ is root of the sentence.
compound	cart	shopping	‘shopping’ and ‘cart’ are parts of multi-word term.
dobj	owns	cart	‘cart’ is an object for ‘owns’.
cc	cart	and	‘and’ is the coordinating conjunction attached to ‘cart’, the first conjunct.
dobj	owns	orders	‘orders’ is object for ‘owns’.
conj:and	cart	orders	‘cart’ and ‘orders’ are conjunct words coordinate with ‘and’.

The example sentence “Account owns shopping cart and orders.” and the corresponding universal dependencies produced by the parser are given below.

[nsubj(owns-2, Account-1), root(ROOT-0, owns-2), compound(cart-4, shopping-3), dobj(owns-2, cart-4), cc(cart-4, and-5), dobj(owns-2, orders-6), conj:and(cart-4, orders-6)]
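As a minimal illustration, the bracketed listing above can be turned into (relation, head, dependent) triples with a few lines of Python. This is only a sketch: the paper's implementation uses Java with Stanford CoreNLP, and the helper name `parse_dependencies` and the regular expression are assumptions introduced here for illustration.

```python
import re

# Hypothetical helper (not from the paper): matches items such as
# "nsubj(owns-2, Account-1)" or "conj:and(cart-4, orders-6)".
DEP_PATTERN = re.compile(r"([\w:]+)\((\S+?)-(\d+), (\S+?)-(\d+)\)")


def parse_dependencies(text):
    """Return (relation, head, dependent) triples, dropping word indices."""
    return [(rel, head, dep) for rel, head, _, dep, _ in DEP_PATTERN.findall(text)]


parsed = parse_dependencies(
    "[nsubj(owns-2, Account-1), root(ROOT-0, owns-2), "
    "compound(cart-4, shopping-3), dobj(owns-2, cart-4), cc(cart-4, and-5), "
    "dobj(owns-2, orders-6), conj:and(cart-4, orders-6)]"
)
for triple in parsed:
    print(triple)
```

Dropping the word indices keeps the sketch short; a real implementation would retain them to distinguish repeated words in a sentence.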

The requirement specification must satisfy three restrictions before the functional requirements are preprocessed. The first restriction is that the sentences should be grammatically correct. The second restriction is that the requirement specification should not contain anaphoric relations, with one exception: our methodology handles only anaphora that refers to the subject of the same sentence, e.g. “Tutors in the organization are assigned courses to teach according to the area that they are specialized in and their availability”. Here, ‘their’ refers to the subject of the sentence. In general, co-reference resolution introduces ambiguity into the text, so we assume that any anaphora present refers to the subject of its sentence. The third restriction is that an entity should be denoted by the same term throughout the functional requirements. E.g. if the multi-word term “loan item” is used for an entity, then the same term shall be used in the entire document rather than “item”; otherwise, it will lead to ambiguity in the document.

3.1 Grouping multi-word term

Generally, functional requirements include complex phrases such as multi-word terms that can be decomposed into meaningful units. Thus, after conversion of the listed functional requirements into paragraph form, the next step is to identify the multi-word terms in the functional requirements using Algorithm 1, which employs universal dependencies. For example, in Table 1, “shopping cart” is a multi-word term represented by “compound(cart-4, shopping-3)”. We use this dependency to substitute the word “cart” with “shopping_cart” in the other universal dependencies as well.
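The substitution step can be sketched as follows. This is not Algorithm 1 itself, which is not reproduced in this text; it is a simplified Python illustration that assumes dependencies are given as (relation, head, dependent) triples, that each head has at most one compound modifier, and that the compound relation can be dropped once it has been applied. The function name `group_multiword_terms` is hypothetical.

```python
def group_multiword_terms(deps):
    """Replace heads of compound relations by the joined multi-word term."""
    # e.g. compound(cart, shopping) yields the mapping cart -> shopping_cart
    joined = {head: f"{dep}_{head}" for rel, head, dep in deps if rel == "compound"}
    return [
        (rel, joined.get(head, head), joined.get(dep, dep))
        for rel, head, dep in deps
        if rel != "compound"  # assumption: the compound relation is consumed
    ]


deps = [
    ("nsubj", "owns", "Account"),
    ("compound", "cart", "shopping"),
    ("dobj", "owns", "cart"),
    ("cc", "cart", "and"),
    ("dobj", "owns", "orders"),
    ("conj:and", "cart", "orders"),
]
grouped = group_multiword_terms(deps)
print(grouped)
```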

3.2 Paraphrasing of negative sentences

Negative sentences are also commonly used by software analysts while gathering software requirements, and they often lead to ambiguities. In existing research, negative sentences are often neglected or manually paraphrased into affirmative sentences. We have therefore formulated paraphrasing rules for negative sentences, and WordNet 2.0 [23], [11] is employed to look up the antonyms of certain words. We apply Algorithm 2 for the extraction and paraphrasing of negative sentences.

For paraphrasing negative sentences, we have implemented the rules (NR1-NR5) using the universal dependencies of the sentence. Here, the negation (¬) of a word denotes its antonym and ‘S’ denotes the corresponding requirement sentence.

NR1- If a sentence includes a negation word and the word it modifies is an adjective, then replace the adjective with its antonym and remove the negation word from the sentence.

E.g. Negative sentence- The course offerings not marked as “enrolled in” are marked as “selected” in the schedule.

Paraphrased sentence -The course offerings unmarked as “enrolled in” are marked as “selected” in the schedule.

NR1- ∀a ∀ n (neg (n, a) ∧ n ∈ adjective ∧ a ∈ negation word ⊢ remove (a, S) ∧ replace (n, ¬ n, S))

NR2- If a sentence includes a negation word and the word it modifies is a verb, then replace the verb with its antonym and remove the negation word from the sentence.

E.g. Negative sentence- The professor is not eligible to teach any course offerings in the upcoming semester.

Paraphrased sentence - The professor is ineligible to teach any course offerings in the upcoming semester.

NR2- ∀a ∀ n (neg (n, a) ∧ n ∈ verb ∧ a ∈ negation word ⊢ remove (a, S) ∧ replace (n, ¬ n, S))

NR3- If a sentence contains a negation word and the word it modifies is ‘has’ or ‘have’, then replace the modified word with its antonym and remove the negation word from the sentence.

E.g. Negative sentence- Course offerings that do not have enough students are canceled.

Paraphrased sentence-Course offerings do miss enough students are cancelled.

NR3- ∀a ∀ n (neg (n, a) ∧ n ∈ {′has′, ′have′} ∧ a ∈ negation word ⊢ remove (a, S) ∧ replace (n, ¬ n, S))

NR4- If there is a sentence pattern such as ‘if no A, then no B’, then eliminate the negation in A and B, and replace ‘then’ with ‘then only’ in sentence B. A and B denote sentences.

e.g. Negative sentence- If no alternates are available, then no substitution will be made.

Paraphrased sentence - If alternates are available, then only substitution will be made.

NR4- ∀P ∀ Q (if ¬ P then ¬ Q ⊢ remove (a, P) ∧ remove (a, Q) ∧ replace (′then′, ′then only′, S))

NR5- If a sentence includes a negation word and the word it modifies is a noun phrase, then replace the verb with its antonym and remove the negation word from the sentence.

e.g. Negative sentence- No Course Offerings is available.

Paraphrased sentence - Course Offerings unavailable.

NR5- ∀ a ∀ m ∀ n (neg (m, a) ∧ m ∈ noun ∧ a ∈ negation word ∧ n ∈ verb ⊢ remove (a, S) ∧ replace (n, ¬ n, S))
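The common core of rules NR1-NR3, remove (a, S) followed by replace (n, ¬ n, S), can be sketched in Python as below. This is an illustration under stated assumptions, not Algorithm 2: the sentence is pre-tokenized on whitespace, the positions of the negation word and of the modified word are supplied by hand rather than read from a neg dependency, and a tiny hand-written `ANTONYMS` table stands in for the WordNet lookup used in the paper.

```python
# Toy antonym table; an assumption standing in for the WordNet lookup.
ANTONYMS = {"marked": "unmarked", "eligible": "ineligible", "have": "miss"}


def paraphrase_negation(tokens, neg_index, modified_index):
    """Apply remove(a, S) and replace(n, not-n, S) to a token list."""
    antonym = ANTONYMS.get(tokens[modified_index].lower())
    if antonym is None:
        return list(tokens)  # no known antonym: leave the sentence unchanged
    out = list(tokens)
    out[modified_index] = antonym  # replace the modified word by its antonym
    del out[neg_index]             # then remove the negation word
    return out


sentence = ("The professor is not eligible to teach any course offerings "
            "in the upcoming semester.")
paraphrased = " ".join(paraphrase_negation(sentence.split(), 3, 4))
print(paraphrased)
```

Replacing the modified word before deleting the negation word keeps both indices valid regardless of which of the two comes first in the sentence.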

3.3 Extraction of entities

The next step of our workflow is to identify the class diagram elements using Algorithm 3. In our approach, grammatical semantic analysis of the requirement sentences is used to devise extraction rules, and the extracted class diagram entities are then lemmatized to avoid ambiguities and redundancies. We employed different annotated functional requirements (annotated by human assessors) to devise the automatic extraction rules. The extraction rules employ universal dependencies to make full use of the grammatical semantics of the sentences.

The implementation forms of the extraction rules are given in a logical representation that employs universal dependencies such as nsubjpass(n, a), cop(n, a), amod(n, a) etc.; these universal dependencies are explained in [8]. The other abbreviations and sets used in the implementation forms are as follows:

class (a) indicates that the extracted word ‘a’ is a class, and attribute (a, b) indicates that the extracted word ‘b’ is an attribute of the extracted class ‘a’.

association (a, b, n) indicates that the word ‘n’ is a relation between the two extracted classes ‘a’ and ‘b’, whereas method (a, n) indicates that the extracted word ‘n’ is a method of the extracted class ‘a’.

‘subject’ and ‘verb’ are the sets of subjects and verbs of the requirement text, respectively.

‘indefinite adjective’, ‘auxiliary verb’, and ‘numeral adjective’ are the sets of indefinite adjectives, auxiliary verbs, and numeral adjectives, respectively.

3.4 Extraction of the classes using class rule

Class Rule R1: In passive sentences that have the verb “identified by”, “recognized by” or “denoted by”, the subject of the sentence is always extracted as a class. E.g.: “The seat is identified by a location.” Here, “seat” is always considered a class.

R1: ∀a (nsubjpass (n, a) ∧ n ∈ {′identified′, ′recognized′, ′denoted′} ∧ a ∈ subject ⊢ class (a))

Class Rule R2: For all other passive sentences, if the subject is mentioned elsewhere in the text in a different reference, but not as a verb, then that subject is considered a class.

R2: ∀a (nsubjpass (n, a) ∧ a ∉ verb ⊢ class (a))

Class Rule R3: The predicate noun is always considered a class. E.g. “Customer is a web user”. Here, “web user” is a predicate noun, which is always considered a class.

R3: ∀a (nsubj (n, a) ∧ cop (a, b) ⊢ class (a))

Class Rule R4: Noun phrases used with indefinite adjectives such as ‘any’, ‘few’, ‘many’, ‘several’, ‘some’ are considered classes. E.g.: “There are specific pilots for each airline.” Here, “airline” is used with the indefinite adjective “each”, so ‘airline’ will be a class.

R4: ∀a (amod (a, n) ∧ n ∈ indefinite adjective ∧ a ∈ noun phrase ⊢ class (a))

Class Rule R5: If the sentence pattern is Noun Phrase (NP1) + “related to” (or similar words or synonyms such as “belongs to”, “associated to”) + Noun Phrase (NP2), then NP1 and NP2 both act as classes. E.g.: “Employee related to department.” Here, NP1 is ‘employee’ and NP2 is ‘department’, so ‘employee’ and ‘department’ are both classes.

R5: ∀a ∀ b (nsubjpass (n, a) ∧ (nmod : to (n, b) ∨ case (b, to) ∨ mark (b, to)) ∧ n ∈ {′belongs′, ′linked′, ′associated′} ⊢ class (a) , class (b))

After applying the class extraction rules, classes that coincide with stop words are removed from the class list. The stop words are words that treat the whole system as a single unit, such as ‘system’ and ‘information’. After extracting classes from the text, the next step is to apply the attribute extraction rules.
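To make the logical form of a class rule concrete, the following Python sketch applies rule R1 together with the stop-word filter to a list of (relation, head, dependent) triples. It is a simplified illustration, not the paper's Java implementation: the triples are written by hand rather than produced by a parser, and the names `extract_classes_r1`, `IDENTIFYING_VERBS`, and `STOP_WORDS` are assumptions.

```python
IDENTIFYING_VERBS = {"identified", "recognized", "denoted"}  # verbs named in R1
STOP_WORDS = {"system", "information"}  # the two stop words quoted in the text


def extract_classes_r1(deps):
    """R1: the passive subject of an identifying verb is extracted as a class."""
    classes = []
    for rel, head, dep in deps:
        if rel == "nsubjpass" and head.lower() in IDENTIFYING_VERBS:
            name = dep.lower()
            if name not in STOP_WORDS and name not in classes:
                classes.append(name)
    return classes


# Hand-written triples for "The seat is identified by a location."
# and a second sentence whose subject is a stop word.
deps = [
    ("nsubjpass", "identified", "seat"),
    ("auxpass", "identified", "is"),
    ("nsubjpass", "denoted", "System"),
]
print(extract_classes_r1(deps))
```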

3.5 Extraction of the attributes using attribute rule

Attribute Rule R6: Verbs such as ‘enter’, ‘type’, and ‘input’ signify that the object will be an attribute.

Attribute Rule R6(i): If the verb ‘enter’ is used with a preposition, then the object will be an attribute for the NP of the prepositional phrase. E.g.: “User enters username and password.”, “username” and “password” are attributes for “user”.

R6(i) ∀a ∀ b (dobj (n, b) ∧ nsubj (n, a) ∧ n ∈ {′enter′, ′type′, ′input′} ⊢ attribute (a, b))

Attribute Rule R6(ii): If the object is a multi-word term and both words are noun phrases (NP), then the second NP will be extracted as an attribute for the first NP of the multi-word term, provided the first NP appears elsewhere in the text. E.g.: “Customer enters the item id.”, “id” will be extracted as attribute for “item”.

R6(ii): ∀a ∀ b (dobj (n, b) ∧ nsubj (n, a) ∧ n ∈ {′enter′, ′type′, ′input′} ∧ b := b1 _ b2 ∧ b1 ∈ noun phrase ∧ b2 ∈ noun phrase ⊢ attribute (b1, b2) , class (b1))

b1 and b2 are words of multi-word term

Attribute Rule R6(iii): If object is not a multi-word term and not used with preposition then object will be attribute for subject. E.g.: “Passenger type the date for the reservation.”, “date” will be attribute for “reservation” class.

R6(iii): ∀a ∀ b (dobj (n, b) ∧ nsubj (n, a) ∧ prep (b, c) ∧n ∈ {′enter′, ′type′, ′input′} ∧ c ∈ nounphrase ⊢ attribute (b, c))

Attribute Rule R7: If a possessive adjective such as “my”, “your”, “his”, “her”, “its”, “our”, “their” refers to the subject, then the NP that comes after the possessive adjective will be extracted as an attribute for the subject. E.g.: “Students give their enrolment number for registration.”, “enrolment number” is an attribute for the “student” class.

R7: ∀a ∀ b (nsubj (n, a) ∧ poss (b, c) ∧ b ∉ class ∧ a ∈ class ⊢ attribute (b, a))

Attribute Rule R8: If pattern is such as “according to”+ NP, then NP will be extracted as attribute for the subject. E.g.: “Tutor are assigned courses according to area of interest.”, “area” will be extracted as attribute for the class “tutor”.

R8: ∀a ∀ b (nsubj (n, a) ∧ nmod : according (b, a) ∧ a ∈ class ⊢ attribute (b, a))

Attribute Rule R9: If the adverb phrase (AdP) for an object includes an NP and “for/on” adjectives, then the object will be extracted as a class and the NP will be extracted as an attribute of this object. E.g.: “Teacher will teach course for particular date and time.” Here, ‘course’ is the object, and ‘date’ and ‘time’ are the NPs. So, ‘course’ is a class, and ‘date’ and ‘time’ are attributes for the class Course.

R9: ∀a ∀ b (dobj (n, b) ∧ nsubj (n, a) ∧ mark (b, c) ⊢ class (b) , attribute (c, b))

Attribute Rule R10: If object has numeral adjective then subject will have an attribute that store the number of object. E.g.: “The schedule does not have the maximum number of primary courses selected.”, “number of primary courses” is attribute for “schedule” class.

R10: ∀a ∀ b (dobj (n, b) ∧ nsubj (n, a) ∧ b ∈ numeral adjective ⊢ attribute (a, b))

All attribute extraction rules are subject to one constraint: extracted attributes must not belong to the class list.
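The logical form of R6(i), combined with the constraint that attributes must not appear in the class list, can be sketched in Python as follows. The sketch assumes lemmatized (relation, head, dependent) triples written by hand and that a conjoined object is propagated as a second dobj, as in the Table 1 example; the function name `extract_attributes_r6i` is hypothetical and the code is not taken from the paper's Java implementation.

```python
INPUT_VERBS = {"enter", "type", "input"}  # verb set used in the R6 formulas


def extract_attributes_r6i(deps, classes):
    """R6(i): dobj(n, b) and nsubj(n, a) with n an input verb give attribute(a, b)."""
    subjects = {head: dep for rel, head, dep in deps if rel == "nsubj"}
    attributes = []
    for rel, head, dep in deps:
        if rel == "dobj" and head in INPUT_VERBS and head in subjects:
            if dep not in classes:  # attributes must not belong to the class list
                attributes.append((subjects[head], dep))
    return attributes


# Hand-written, lemmatized triples for "User enters username and password."
deps = [
    ("nsubj", "enter", "user"),
    ("dobj", "enter", "username"),
    ("dobj", "enter", "password"),
]
print(extract_attributes_r6i(deps, classes={"user"}))
```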

3.6 Extraction of the relation and method

Relation Rule R11: If the subject and the object are both in the class list, then the verb in the sentence is an association between the subject and the object. E.g.: “Employee related to department.” The association is employee – related – department.

R11: ∀a∀ b (dobj (n, b) ∧ nsubj (n, a) ∧ b, a ∈ class ∧ n ∈ verb ⊢ association (a, b, n, direction of association from a to b))

Relation Rule R12: Auxiliary verbs and the verbs “identified”, “denoted”, and “recognized” are not included as associations.

R12: ∀a ∀ b (dobj (n, b) ∧ nsubj (n, a) ∧ n ∈ {′identified′, ′denoted′, ′recognized′} ⊬ association (a, b, n))

Relation Rule R13: If the verb is “related to”, “linked to” or “associated to”, then the verb is a bidirectional association; in all other cases, the association is directed from subject to object. E.g. “web user is linked to shopping cart.” Here, “linked” is a bidirectional association between web user and shopping cart.

R13: ∀a∀ b (nsubj (n, a) ∧ dobj (n, b) ∧ n ∈ verb ∧ a, b ∈ class ∧ n ∈ {′related′, ′linked′, ′associated′} ⊢ bidirectional association (a, b, n))

Method Rule R14: If the subject is present in the class list and the object is not present in the class list, then the verb is a method of the subject.

R14: ∀a∀ b (dobj (n, b) ∧ nsubj (n, a) ∧ n ∈ verb ∧ a ∈ class ∧ b ∉ class ⊢ method (a, b, n))

Method Rule R15: Verbs such as “has”, “have”, auxiliary verbs, “identified”, “denoted”, “recognized”, “related to”, “belongs to”, “associated to” are not included as methods. E.g.: “Employee is identified by employee no.” Here, the verb is “identified”, so it is not included as a method of any class.

R15: ∀a ∀ b (dobj (n, b) ∧ nsubj (n, a) ∧ (n ∈ {′has′, ′have′, ′denoted′, ′recognized′, ′related′, ′belong′, ′association′, ′identified′} ∨ n ∈ auxiliary verb) ∧ a ∈ class ∧ b ∉ class ⊬ method (a, b, n))
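The interplay of rules R11-R15 can be summarized as a single decision over a (subject, verb, object) triple. The Python sketch below is an illustration under assumptions: the function name `classify_verb`, the tuple layout of the results, and the small `AUXILIARY_VERBS` set are all introduced here and are not taken from the paper.

```python
AUXILIARY_VERBS = {"be", "is", "are", "was", "were", "do", "does", "will"}  # assumed subset
NON_ASSOCIATION_VERBS = AUXILIARY_VERBS | {"identified", "denoted", "recognized"}  # R12
BIDIRECTIONAL_VERBS = {"related", "linked", "associated"}  # R13
NON_METHOD_VERBS = AUXILIARY_VERBS | {  # R15
    "has", "have", "identified", "denoted", "recognized",
    "related", "belongs", "associated",
}


def classify_verb(subject, verb, obj, classes):
    """Return an association or method tuple, or None if no rule applies."""
    if subject not in classes:
        return None
    if obj in classes:  # R11: both ends are classes
        if verb in NON_ASSOCIATION_VERBS:
            return None
        direction = "bidirectional" if verb in BIDIRECTIONAL_VERBS else "subject_to_object"
        return ("association", subject, obj, verb, direction)
    if verb in NON_METHOD_VERBS:  # R15 blocks the method of R14
        return None
    return ("method", subject, verb)  # R14


classes = {"library", "loan_item", "membership_card", "employee"}
print(classify_verb("library", "issue", "loan_item", classes))
print(classify_verb("membership_card", "show", "member_number", classes))
print(classify_verb("employee", "identified", "employee_no", classes))
```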

4 Results and analysis

We have examined several test cases to validate the extraction rules described in Section 3. The case study with the automatic extraction of class diagram elements is described in Section 4.1, and the performance evaluation in Section 4.2.

4.1 Case study and extracted class diagram elements

In this section, we describe the application of our rules to the case study of the Library Information System domain (LIS) [4] given in Fig. 2. The corresponding gold standard class diagram [4], illustrated in Fig. 3, is used for comparison with the automatically generated class diagram. Case studies from other domains have also been analyzed, namely the Online Shopping System domain (OSS) [16], the Automated Teller Machine domain (ATM) [27], the Airport System domain (AS) [31], the Railway Reservation System (RRS) [19], and the Course Registration Requirement (CRR) [6].

Fig. 2

Functional Requirement for LIS [4].

Fig. 3

Class Diagram for LIS [4].

In LIS [4], following the preprocessing phase of our methodology, we identified the multi-word terms and paraphrased the negative sentences using rules NR1-NR5, as described in Table 2. To obtain the class diagram elements, we applied the class extraction rules followed by the attribute, method, and relation extraction rules. Here, we describe the newly formulated extraction rules in detail. As shown in Table 3, we applied the class extraction rules to the functional requirements; for instance, subjects such as ‘Library’, ‘Membership_card’, ‘Language_tape’, ‘Book’, ‘Customer’, and ‘Membership’ are extracted as classes from statement no. 1, 4, 10, 11, 12, 15 and 17 using the rule described in [28]. The noun phrases ‘Library’ and ‘Subject_section’ are extracted as classes from statement no. 6 because this statement contains the phrase ‘made up of’ [28].

Table 2

Paraphrasing of negative sentences

Sentence No.	Rules for paraphrasing of negative sentence	Paraphrased sentence
16	NR1	If the loan item can be issued (e.g. unreserved) the loan item is stamped and then issued.

Table 3

Extracted class diagram elements from the corresponding sentence

Line No.	Class 1	Class 2	Attribute	Relation	Methods	Rules
1	Library	Loan_item		issues		[28], [35], R11
2	customer					R2
3	Customer	Membership_card		issue		R11, R2
4	Membership_card				Show_member_number()	[28], [35], R14
6	Library	Subject_Section		made_up_of		[28]
7	Subject_section		Classification_mark			R1, [10], [14]
8	Loan_item		Bar_code			R1, [10], [14]
9	Loan_item	Language_tapes, books		Type_of		[28]
10	Language_tape		Title_language, level			[28], [35], [15], [1]
11	Book		Title, author			[28], [15], [35], [1]
12	customer	Loan_item	Number_of_loan_item		Borrow()	[28], [35], R10
13	Loan_item				Borrowed(), reserved(), Renewed()	R2, R14
14	Loan_item	customer		issued		R2, R11
15	membership					[28], [35]
16	Loan_item				Stamped(), issued()	R2, R14
17	Library				Support_facility()	[28], [35], R14

Rule R2 extracts the subject of a passive sentence as a class; for instance, the subjects ‘Customer’ and ‘Loan_item’ are extracted as classes from the passive sentences in statement no. 2, 3, 13, 14, and 16, as these subjects are mentioned elsewhere in the text.

Rule R1 extracts subjects such as ‘subject_section’ and ‘Loan_item’ as classes from passive statement no. 7 and 8, as these statements include the phrases ‘denoted_by’ and ‘identified_by’ respectively. The noun phrases ‘language_tape’, ‘book’ and ‘loan_item’ are extracted as classes from statement no. 9 because this statement includes the phrase ‘type of’, which indicates generalization among the extracted noun phrases [28]. Trivial classes that are present in the stop word list are removed from the class list. Then, the attribute, method, and association extraction rules are applied to the text.

The noun phrase ‘classification_mark’ is extracted as an attribute for the subject ‘subject_section’ from statement no. 7, as this statement contains the phrase ‘denoted_by’ [15], [28]. The same rule is also applied to statement no. 8, where the noun phrase ‘Bar_code’ is extracted as an attribute for the subject ‘loan_item’, as the phrase ‘identified by’ indicates the presence of an attribute for the subject. In statement no. 10, the objects ‘title_language’ and ‘level’ are extracted as attributes for the subject ‘Language_tape’, as the verb ‘to have’ in this statement points out the attributes of the subject [15], [28], [35], [1]. The same rule is also applied to statement no. 11, where it extracts ‘title’ and ‘author’ for the subject ‘book’.

Rule R10 is applied to statement no. 12, where it extracts the numeral adjective, i.e. ‘number_of_loan_item’, as an attribute for the subject ‘Customer’.

Rule R11 is applied to statement no. 1: since the subject ‘Library’ and the object ‘loan_item’ are both in the class list, the verb ‘issue’ is extracted as an association between them. The same rule is also applied to statement no. 3 and 14, where it extracts the associations ‘customer –issue –Membership_card’ and ‘loan_item –issued –customer’ respectively. The phrase ‘made_up_of’ is extracted as an aggregation between the subject ‘Library’ and the object ‘subject_section’ from statement no. 6, and the term ‘type_of’ is extracted as a generalization among the noun phrases ‘loan_item’, ‘language_tape’ and ‘book’.

Rule R14 extracts the verb ‘show_membership_number’ as a method from statement no. 4 for its subject ‘Membership_card’, because the object, i.e. ‘Membership_card_number’, is not included in the class list while the subject is. The same rule extracts the methods from sentence no. 13, 16, and 17. Classes that are not associated with any other class and do not contain any attribute or method are removed from the class list; the final version of the extracted class diagram is illustrated in Fig. 4.

Fig. 4

Class Diagram for LIS [4].

As we have considered only explicitly defined information, some implicitly defined classes cannot be extracted, such as ‘check-in’, ‘check-out’, ‘loan transaction’, and ‘membership code’. In our automatic extraction of the UML class diagram, additional correct class diagram elements have been extracted, such as ‘number_of_loan_item’, ‘classification_mark’, ‘support_facility’, and ‘stamped’. Because negative sentences have also been considered when applying the extraction rules, additional class diagram elements have been identified; for example, the methods ‘Stamped’ and ‘issued’ are extracted for the class ‘Loan_item’. The correct identification of extra class diagram entities is an advantage for software development.

4.2 Performance evaluation of the automated extracted class diagram elements

Our objective is to evaluate the extracted class diagram elements against reference diagrams using three evaluation metrics: completeness, correctness, and redundancy. The reference diagrams were created by human experts such as persons with industry experience, postgraduate students, and Ph.D. students specializing in software engineering. The three evaluation metrics, taken from [36], are explained below:

Completeness of Class Diagram

The completeness of a class diagram is the number of correctly extracted class diagram elements over the total number of class diagram elements present in the reference diagram. We evaluate completeness as the average of class completeness (CM_c), attribute completeness (CM_a), method completeness (CM_m), and relation completeness (CM_r).

CM_cd = (CM_c + CM_a + CM_m + CM_r)/4, where CM_c, CM_a, CM_m, and CM_r are explained below:

CM_c = No. of correct identified classes (N_cc)/ No. of classes in reference class diagram (N_cr)

CM_a = No. of correct identified attributes (N_ac)/ No. of attributes in reference class diagram (N_ar)

CM_m = No. of correct identified methods (N_mc)/ No. of methods in reference class diagram (N_mr)

CM_r = No. of correct identified relations (N_rc)/ No. of relations in reference class diagram (N_rr)
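As a worked illustration of the completeness formulas, the short Python sketch below computes CM_cd from per-type counts. The counts are made-up example values, not results from the paper, and the function assumes that every reference count is non-zero.

```python
ELEMENT_TYPES = ("classes", "attributes", "methods", "relations")


def completeness(correct, reference):
    """CM_cd: mean over element types of correct count / reference count."""
    ratios = [correct[t] / reference[t] for t in ELEMENT_TYPES]
    return sum(ratios) / len(ratios)


# Hypothetical counts, for illustration only.
correct = {"classes": 4, "attributes": 3, "methods": 1, "relations": 2}
reference = {"classes": 5, "attributes": 4, "methods": 2, "relations": 4}
cm_cd = completeness(correct, reference)
print(round(cm_cd, 4))
```

With these counts the four ratios are 0.8, 0.75, 0.5, and 0.5, so CM_cd is their mean, 0.6375.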

Correctness of Class Diagram

The correctness of a class diagram is the number of correctly extracted class diagram elements over the total number of class diagram elements extracted. We evaluate correctness in terms of the average class correctness (AvgCR_c) and the average relation correctness (AvgCR_r), i.e.

CR_cd = (AvgCR_c + AvgCR_r)/2

$AvgCR_{c} = \sum_{i = 1}^{N_{c}} CR_{ci} / N_{c}$, where N_c is the total number of identified classes.

$AvgCR_{r} = \sum_{i = 1}^{N_{r}} CR_{ri} / N_{r}$, where N_r is the total number of identified relations.

The correctness of each class (CR_c) is calculated in terms of:

CR_c = (CR_cc + CR_cn + CR_ca + CR_cm)/4 where CR_cc, CR_cn, CR_ca, and CR_cm are explained below:

CR_cc = 1, if identified class represents significant class entity, otherwise 0.

CR_cn = 1, if identified named class is correct otherwise 0.

CR_ca = No. of correct identified attributes (N_ac)/ No. of identified attributes (N_a)

CR_cm = No. of correct identified methods (N_mc)/ No. of identified methods (N_m)

The correctness of each relation (CR_r) is calculated in terms of:

CR_r = (CR_rc + CR_rn + CR_rc1 + CR_rc2 + CR_rt)/5 where CR_rc, CR_rn, CR_rc1, CR_rc2, and CR_rt are explained below:

CR_rc = 1, if identified relation denotes significant relationship otherwise 0.

CR_rn = 1, if identified relation is named correctly otherwise 0.

CR_rc1 = 1, if correctly identified one related class otherwise 0.

CR_rc2 = 1, if correctly identified another related class otherwise 0.

CR_rt = 1, if relationship type is correctly identified otherwise 0.
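The correctness formulas compose in three layers: a score per class, a score per relation, and the diagram-level average of the two means. A minimal Python sketch with made-up scores (not data from the paper) is given below; the function names are introduced here for illustration.

```python
def class_correctness(cr_cc, cr_cn, cr_ca, cr_cm):
    """CR_c for one class: mean of its four components."""
    return (cr_cc + cr_cn + cr_ca + cr_cm) / 4


def relation_correctness(cr_rc, cr_rn, cr_rc1, cr_rc2, cr_rt):
    """CR_r for one relation: mean of its five components."""
    return (cr_rc + cr_rn + cr_rc1 + cr_rc2 + cr_rt) / 5


def diagram_correctness(class_scores, relation_scores):
    """CR_cd = (AvgCR_c + AvgCR_r) / 2."""
    avg_c = sum(class_scores) / len(class_scores)
    avg_r = sum(relation_scores) / len(relation_scores)
    return (avg_c + avg_r) / 2


# Hypothetical diagram: two classes and one relation with a wrong type.
class_scores = [class_correctness(1, 1, 0.5, 1.0), class_correctness(1, 1, 1.0, 1.0)]
relation_scores = [relation_correctness(1, 1, 1, 1, 0)]
cr_cd = diagram_correctness(class_scores, relation_scores)
print(class_scores, relation_scores, cr_cd)
```

Here the class scores are 0.875 and 1.0, the relation score is 0.8, and CR_cd is (0.9375 + 0.8)/2 = 0.86875.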

Redundancy in Class Diagram

The redundancy in class diagram (R_cd) is explained as the fraction of redundant class diagram elements extracted among identified class diagram elements. The redundancy is calculated in terms of class redundancy (R_c) , attribute redundancy (R_a), method redundancy (R_m), and relation redundancy (R_r).

R_cd = (R_c + R_a + R_m + R_r)/4 where R_c, R_a, R_m, and R_r are explained as follows:

R_c = No. of redundant classes (N_rc)/ No. of identified classes (N_c)

R_a = No. of redundant attributes (N_ra)/ No. of identified attributes (N_a)

R_m =No. of redundant methods (N_rm)/ No. of identified methods(N_m)

R_r =No. of redundant relations (N_rr)/No. of identified relations (N_r) where,

Redundant classes (N_rc) are classes extracted by our approach that are either incorrect or not included in any relation.

Redundant attributes (N_ra) and methods (N_rm) are those identified incorrectly by our approach.

Redundant relations (N_rr) are relations identified by our approach that are either incorrect or connect incorrect classes.
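The definition of redundant classes combines two conditions, which the following Python sketch makes explicit. The class names and the single relation are made-up example data, and the helper name `redundant_classes` is an assumption.

```python
def redundant_classes(identified, correct, relations):
    """Classes that are incorrect or that take part in no relation."""
    related = {cls for a, b in relations for cls in (a, b)}
    return {c for c in identified if c not in correct or c not in related}


# Hypothetical extraction result, for illustration only.
identified = {"library", "customer", "membership", "system"}
correct = {"library", "customer", "membership"}
relations = [("library", "customer")]

redundant = redundant_classes(identified, correct, relations)
r_c = len(redundant) / len(identified)  # R_c = N_rc / N_c
print(sorted(redundant), r_c)
```

In this example ‘system’ is redundant because it is incorrect and ‘membership’ because it is not part of any relation, giving R_c = 2/4 = 0.5.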

We have compared the obtained class diagrams with those of the existing approach proposed in [28]. We selected this approach because it is a well-known approach for unstructured functional requirements and it extracts the same class diagram elements, i.e. classes, attributes, methods and relations. Thus, this approach is suitable for comparison with ours; the obtained results are given in Table 4, and the corresponding bar chart comparisons are given in Figs. 5, 6, and 7.

Table 4
Comparison with existing approach

Test case	[28] CM_cd	[28] CR_cd	[28] R_cd	Proposed CM_cd	Proposed CR_cd	Proposed R_cd
LIS [4]	0.48	0.84	0.23	0.79	0.95	0.10
OSS [16]	0.37	0.67	0.34	0.87	0.96	0.11
ATM [27]	0.65	0.66	0.43	0.91	0.89	0.18
AS [31]	0.81	0.90	0.20	0.87	0.92	0.17
RRS [19]	0.42	0.68	0.24	0.83	0.98	0.18
CRR [6]	0.58	0.51	0.45	0.78	0.86	0.21
Neńios [12]	0.52	0.41	0.32	0.72	0.90	0.15
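As an illustrative check, the per-column averages can be recomputed from the values in Table 4. The sketch below is in Python, and the names `TABLE_4` and `column_averages` are hypothetical. The numbers are copied from the table, with each row holding the three values for [28] followed by the three values for the proposed approach.

```python
from statistics import mean

# Rows of Table 4: (test case, [28] CM_cd, CR_cd, R_cd, proposed CM_cd, CR_cd, R_cd)
TABLE_4 = [
    ("LIS",    0.48, 0.84, 0.23, 0.79, 0.95, 0.10),
    ("OSS",    0.37, 0.67, 0.34, 0.87, 0.96, 0.11),
    ("ATM",    0.65, 0.66, 0.43, 0.91, 0.89, 0.18),
    ("AS",     0.81, 0.90, 0.20, 0.87, 0.92, 0.17),
    ("RRS",    0.42, 0.68, 0.24, 0.83, 0.98, 0.18),
    ("CRR",    0.58, 0.51, 0.45, 0.78, 0.86, 0.21),
    ("Neńios", 0.52, 0.41, 0.32, 0.72, 0.90, 0.15),
]

COLUMNS = ("CM_cd", "CR_cd", "R_cd")


def column_averages(rows, offset):
    """Average the three metric columns that start at index `offset` in each row."""
    return {
        name: mean(row[offset + i] for row in rows)
        for i, name in enumerate(COLUMNS)
    }


existing = column_averages(TABLE_4, offset=1)  # approach of [28]
proposed = column_averages(TABLE_4, offset=4)  # proposed approach

for name in COLUMNS:
    print(f"{name}: [28] = {existing[name]:.3f}, proposed = {proposed[name]:.3f}")
```

Over the seven test cases, the proposed approach averages roughly 0.82 for completeness, 0.92 for correctness and 0.157 for redundancy, against roughly 0.55, 0.67 and 0.32 for [28]. Higher is better for the first two metrics and lower is better for redundancy.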

Fig. 5

Comparison of completeness.

Fig. 6

Comparison of correctness.

Fig. 7

Comparison of redundancy of class diagram.

We have also applied our methodology to an industrial case study, i.e. the Neńios Child Care Center software requirements, and the corresponding extracted class diagram is illustrated in Fig. 8. This document describes child care management software that assists in managing the child care center by minimizing the administrative working time spent on tasks such as report printing and invoice processing. The software also automates activities such as tracking child immunization and maintaining child enrolment, so that employees can spend their time on child care. This industrial case study consists mostly of passive and negative sentences, and on it our methodology achieved better completeness, correctness and redundancy than the existing approach [28] (Table 4).

Fig. 8

Extracted class diagram for industrial case study i.e. Neńios child care center system software.

Negative sentences and passive voice occur more often in large text documents, and our methodology performed particularly well on such documents, for example the Neńios Child Care Center software requirements [12] and the course registration requirements (CRR) [6].

5 Conclusions and future work

We have formulated new rules for the automatic extraction of class diagram elements from the functional requirements in software requirement documents. Applying these newly formulated NLP-based rules (for passive voice sentences and negative sentences) gives higher accuracy in the automated extraction of class diagram elements. Nevertheless, even the complete rule set (existing and novel rules together) is not sufficient to extract all class diagram elements, and the work can be extended in several directions:

The work can be improved by deriving a more diverse set of rules and patterns for various kinds of sentence structure.

Additionally, the work can be extended to the elicitation and documentation of non-functional requirements, in the same way as functional requirements, with the assistance of IR techniques in combination with NLP.

Anaphora resolution is also a challenging task in NLP. If anaphora resolution can be improved, then it will lead to more accurate automatic extraction of UML models.

References

1. Aguado de Cea, Gómez-Pérez, Montiel-Ponsoda and Suárez-Figueroa, Natural language-based approach for helping in the reuse of ontology design patterns, Knowledge Engineering: Practice and Patterns (2008), 32–47.

2. Ambriola and Gervasi, On the systematic analysis of natural language requirements with Circe, Automated Software Engineering 13(1) (2006), 107–167.

3. Arora, Sabetzadeh, Briand and Zimmer, Extracting domain models from natural-language requirements: approach and industrial evaluation, In Proceedings of the ACM/IEEE 19th International Conference on Model Driven Engineering Languages and Systems, pp. 250–260. ACM, (2016).

4. Callan R.E., Building Object-Oriented Systems: An introduction from concepts to implementation in C++. Computational Mechanics, (1994).

5. Chen and Manning, A fast and accurate dependency parser using neural networks, In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), (2014), pp. 740–750.

6. Corp, IBM Corp: IBM Rational Software, section 1: Course Registration Requirements. (2004).

7. De Marneffe M.-C., MacCartney and Manning C.D., et al., Generating typed dependency parses from phrase structure parses, In Proceedings of LREC, 6, pp. 449–454. Genoa Italy, (2006).

8. De Marneffe M.-C. and Manning C.D., Stanford typed dependencies manual, Technical Report, Stanford University, (2008).

9. Deeptimahanti D.K. and Babar M.A., An automated tool for generating UML models from natural language requirements, In Automated Software Engineering (2009), ASE’09. 24th IEEE/ACM International Conference on, pp. 680–682. IEEE, (2009).

10. Elbendak, Vickers and Rossiter, Parsed use case descriptions as a basis for object-oriented class model generation, Journal of Systems and Software 84(7) (2011), 1209–1223.

11. Fellbaum, A semantic network of English verbs, WordNet: An Electronic Lexical Database 3 (1998), 153–178.

12. Ferrari, Spagnolo G.O. and Gnesi, PURE: A dataset of public requirements documents, In Requirements Engineering Conference (RE), 2017 IEEE 25th International, pp. 502–505. IEEE, (2017).

13. Finkel J.R., Grenager and Manning, Incorporating nonlocal information into information extraction systems by Gibbs sampling, In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pp. 363–370. Association for Computational Linguistics, (2005).

14. Gomez, Segami and Delaune, A system for the semiautomatic generation of ER models from natural language specifications, Data & Knowledge Engineering 29(1) (1999), 57–81.

15. Harmain and Gaizauskas, CM-Builder: A natural language-based CASE tool for object-oriented analysis, Automated Software Engineering 10(2) (2003), 157–181.

16. F.K. Online shopping UML class diagram example, (2009).

17. Klein and Manning C.D., Fast exact inference with a factored model for natural language parsing, In Advances in Neural Information Processing Systems (2003), 3–10.

18. Kumar D.D. and Sanyal, Static UML model generator from analysis of requirements (SUGAR), In Advanced Software Engineering and Its Applications (2008). ASEA pp. 77–84. IEEE, (2008).

19. E.Y.M.Z.M.D.N.K.R. H. and Keshmiri, Software requirements specification for automated railway reservation system, (2000).

20. Liu, Subramaniam, Eberlein and Far, Natural language requirements analysis and class model generation using UCDA, Innovations in Applied Artificial Intelligence, (2004), 295–304.

21. Liu, Li and Kou, Eliciting relations from natural language requirements documents based on linguistic and statistical analysis, In Computer Software and Applications Conference (COMPSAC), 2014 IEEE 38th Annual, pp. 191–200. IEEE, (2014).

22. Marcus M.P., Marcinkiewicz M.A. and Santorini, Building a large annotated corpus of English: The Penn Treebank, Computational Linguistics 19(2) (1993), 313–330.

23. Miller G.A., WordNet: a lexical database for English, Communications of the ACM 38(11) (1995), 39–41.

24. Narawita C.R. and Vidanage, UML generator-an automated system for model driven development, In Advances in ICT for Emerging Regions (ICTer), 2016 Sixteenth International Conference on, pp. 250–256. IEEE, (2016).

25. Reggio, Leotta and Ricca, Who knows/uses what of the UML: A personal opinion survey, In International Conference on Model Driven Engineering Languages and Systems, pp. 149–165. Springer, (2014).

26. Robeer, Lucassen, van der Werf J.M.E., Dalpiaz and Brinkkemper, Automated extraction of conceptual models from user stories via NLP, In Requirements Engineering Conference (RE), 2016 IEEE 24th International, pp. 196–205. IEEE, (2016).

27. Rumbaugh, Blaha, Premerlani, Eddy and Lorensen, Object-oriented Modeling and Design, Prentice-Hall Inc, Upper Saddle River, NJ, USA, (1991).

28. Sagar V.B.R.V. and Abirami, Conceptual modeling of natural language functional requirements, Journal of Systems and Software 88 (2014), 25–41.

29. Shweta, Sanyal and Ghoshal, Automatic extraction of structural model from semi structured software requirement specification, In 2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS), (2018), pp. 543–558.

30. Socher, Bauer, Manning C.D., et al., Parsing with compositional vector grammars, In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1 (2013), pp. 455–465.

31. S.S.S. Class diagram for airport UML questions, (2012).

32. Thakur J.S. and Gupta, AnModeler: a tool for generating domain models from textual specifications, In Automated Software Engineering (ASE), 2016 31st IEEE/ACM International Conference on, pp. 828–833. IEEE, (2016).

33. Thakur J.S. and Gupta, Identifying domain elements from textual specifications, In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, pp. 566–577. ACM, (2016).

34. Thakur J.S. and Gupta, Automatic generation of analysis class diagrams from use case specifications, arXiv preprint arXiv:1708.01796, (2017).

35. Tjoa A.M. and Berger, Transformation of requirement specifications expressed in natural language into an EER model, In International Conference on Conceptual Modeling, pp. 206–217. Springer, (1993).

36. Yue, Briand L.C. and Labiche, aToucan: an automated framework to derive UML analysis models from use case models, ACM Transactions on Software Engineering and Methodology (TOSEM) 24(3) (2015), 13.