Abstract
Requirements are central to software development. Ambiguous requirements lead developers to inconsistent interpretations, which cause rework, delayed delivery, and other problems, and may even have devastating effects on the project. Many requirements texts written in natural language are not concise, intuitive, or accurate, which increases the workload of designers and the difficulty of their tasks. An effective solution to these problems is to extract actors and use cases from the requirements texts. This study proposes a model for extracting actors and use cases automatically, which combines bi-directional long short-term memory (BiLSTM) and conditional random fields (CRF). BiLSTM is used to capture the contextual information of the texts, and CRF is used to calculate the tag transfer scores and determine the best-scoring tag sequence, from which actors and use cases are extracted. Results show that the accuracy of extraction is significantly improved compared with the baseline methods, which verifies the effectiveness of the proposed method in extracting actors and use cases.
Introduction
Determining requirements is a key step in the software development life cycle. Natural language (NL), which is uncertain and inconsistent, is often used to describe requirements, which may affect the way developers understand the texts [1]. Requirements written in NL are prone to ambiguity, contradiction, and misunderstanding, and developers may simply be unable to deal with a large amount of information [2]. NL can therefore cause the same content to be understood differently; in other words, different people may interpret the same requirements text in different ways. If the designer’s understanding differs from the requirements expressed by the clients, the finished software will require expensive repairs at a later stage, and irreparable errors may occur. Transmitting requirements into the design stage intuitively and accurately is therefore an urgent issue. To reduce errors in the requirements output and improve the understanding of designers, automatically extracting actors and use cases from the requirements text can be an effective way to solve the problem of requirement transmission.
In recent years, various methods have emerged for extracting software requirements. Kamalrudin and Grundy [1] proposed an automatic tool to extract user requirements. In their work, requirements are represented by essential use cases, a simple and easy-to-understand language for illustrating the structure of an application. This language is generally concise but also descriptive. R-TOOL [3] is an NL-based method that constructs an object-oriented framework to display requirements. Another common method [4] is to use the unified modeling language (UML) to describe software requirements through use case diagrams, activity diagrams, and other diagrams. Developers and requirement engineers are more inclined to show requirements through a use case diagram. Deeptimahanti and Babar [5] pointed out that most methods for extracting actors and use cases have varying degrees of automation. Most of these methods can generate analytical models but require human intervention. At the same time, these methods focus more on model generation than on accuracy in extracting actors and use cases from the requirements text.
This study proposes an automatic method for accurately extracting requirements from requirements text. We pay attention to how actors and use cases are described in context, use models to analyze contextual content and relationships, combine a large amount of data and learned experience, and extract actors and use cases from the requirements text concisely and accurately.
The remaining sections of this paper are structured as follows. Section 2 reviews and discusses methods for extracting actors and use cases from the requirements text. Section 3 introduces the proposed method and its evaluation in more detail, including the structure and significance of the BiLSTM-CRF model. Section 4 describes the experimental setup. Section 5 analyzes the experimental results and discusses related problems in the experiment. Section 6 summarizes the study and proposes future research directions.
Related works
Existing papers on requirements text mining fall into two categories. One focuses on how to display actors and use cases clearly and intuitively, and the other focuses on how to extract information about actors and use cases from the requirements text. Most methods for extracting actors and use cases rely on preprocessing, such as using natural language processing (NLP) tools or methods [3] to analyze word relationships. UMGAR is a semi-automatic method proposed by Deeptimahanti et al. [6] for generating UML models, including the use case diagram, analysis class model, collaboration diagram, and design class model. This method normalizes (rewrites) NL requirements to remove ambiguity in sentence structures, introduces control constructs to organize interactions in the sentence structure, and then generates various object-oriented diagrams from the normalized requirements. The generated diagrams intuitively represent actors, use cases, and collaborations. However, the method simply identifies subjects and objects as actors and predicates as use cases, associates actors with use cases using the parse tree generated by Stanford Parser [7], and generates use case models, which is not a rigorous enough process. Jebril et al. [8] defined a semi-automatic method called AAEAA, which can draw use case diagrams according to functional requirements. This method uses NLP tools to decompose the language analysis process into different stages and finally determines the actors and use cases, which are expressed clearly and intuitively in a use case diagram. However, a command has to be prepared before the use case diagram is drawn. This command feeds the actors and use cases determined in the NLP phase into fixed software that draws the UML use case diagram. The command has a fixed format and syntax, which increases the complexity and limitations of the experimental steps and process.
Sonbol et al. [9] combined NLP and graph theory to propose an unsupervised method for identifying templates from requirements text. With the help of graph theory, that study summarizes requirement templates: it represents the list of requirement texts as a graph and then detects the requirement templates. The key step of obtaining the list of requirement texts from the text information is not studied. Our research accurately extracts important information from a large amount of text and intuitively expresses software requirements through the extracted actors and use cases.
Vinay et al. [3] designed R-TOOL to extract actors and use cases from requirements text. The extraction steps are as follows: (1) tokenization, (2) analyzing pronouns, (3) identifying actors and use cases, (4) identifying relationships and generating use case reports, and (5) classification. These five steps generate use case reports, classes, attributes, methods, and relationships. This method generates a variety of results, but it has some limitations: it requires the input text to be in the form of active sentences, and compound sentences need to be split into simple sentences. The method that automatically converts NL specifications into object-oriented (OO) analysis models [2] can be divided into three steps. First, parts of speech are identified in the NL text and divided into subject, predicate, and object groups, which is the language component analysis. Next, the entities extracted from the sentences are abstracted into nodes to build a semantic network. Finally, the knowledge in the semantic network is transformed into the OO model. In this method, each part is relatively complete and rigorous, but the meaning of the NL converted by the model can be guaranteed only when the three parts are well connected, which is difficult to ensure. Based on the MaRK method [10], Lian et al. proposed MaRK-II [11] to assist engineers in extracting requirements on components from domain documents. The process of MaRK-II includes six steps. Step 1 involves manually building an analytical domain reference model. In Step 2, the search terms are used to dynamically retrieve and rank documents by computing their relevance to components. In Steps 3 to 5, pertinent text is recommended to engineers.
Finally, in Step 6, the domain analyst browses the most relevant documents, leveraging the highlighted parts and summarization to focus on the most pertinent information, and then extracts and assimilates requirements knowledge from which the product requirements can be specified. This study mainly identifies product requirements from multiple documents in the same domain and requires manual modeling of different domains. Javed and Lin [12] proposed an iterative approach (iMER) for extracting models from requirements. The method has multiple iterations, each aiming to solve a sub-problem. The first iteration extracts data entities and attributes, the second finds relationships between data entities, and the third extracts cardinalities. The business process model is generated in the fourth iteration, including external (actors’) and internal (system’s) operations. With this method, if an error occurs in sorting sentences, the operations of users and systems can still be extracted, but the dependencies are affected, resulting in poor-quality business process models. Al-Hroob et al. [13] proposed a semi-automated approach, an intelligent technique for requirements engineering (IT4RE), which combines NLP with back propagation neural networks (BPNN). The first stage uses the goal-use case model extraction supporting tool (GUEST), which applies a rule-based approach to analyze part-of-speech (POS) tags and tokens’ type dependencies [14]. Then, the tokens are mapped by SRS semantics. Finally, the results are fed into the BPNN to extract actors and use cases. Tiwari et al. [15] labeled the actors and use cases in the text with P-ACTOR and UC-NAM, respectively, and trained multinomial naive Bayes, perceptron, linear classifier with SGD, and passive-aggressive classifier models on the data. The results show that the extraction performance of these methods is not very good.
AlDhafer et al. [16] proposed the use of a recurrent neural network to classify software requirements. The input can be a word sequence or a character sequence, and the input sentences are classified into functional and non-functional requirements. The method works well on word sequences, but its performance on character sequences degrades as the number of classes increases. In addition, this study only divides text descriptions into categories such as functionality and usability, which is not very helpful for designers who need to understand software requirements accurately. A large number of NL texts still cause misunderstanding and lead to software design errors. This type of research focuses more on how to obtain actors and use cases more accurately from a large amount of text information, but it usually places certain limitations on the input text or requires manual participation of the experimenter in some steps. Our research not only ensures the accuracy of the extraction results but also automates the experimental process, so that the experimenter does not need to take part in the extraction steps. Furthermore, requirements for the professionalism of the experimenter are less strict.
Currently, most studies in this area have focused on using development tools to display actors and use cases in UML [17, 18] or OO analysis models [2, 6], most of which call UML drawing software (such as Rational Rose [19], PowerDesigner, and others). Feeding the extracted actors and use cases into the software is an important step [2]. This type of research is divided into two steps. The first step is to extract actors, use cases, or other information from the requirements text, and the second step is to draw use case diagrams with the help of tools. Researchers have paid more attention to the implementation of the second step, even though the first step is the preparation for the second. Only a small part of the research focuses on how to accurately extract actors and use cases from NL requirements texts [8, 13] to facilitate the next stage of development work. These studies mostly propose semi-automatic methods that require manual participation to connect multiple stages. IT4RE [13] takes the characteristics of natural language into account and combines part-of-speech and dependency information in NL with the BPNN in deep learning, which effectively addresses the problem of extracting actors and use cases from the requirements text. However, its accuracy is not ideal and the process is complicated. Tiwari et al. [15] used the Perceptron method to extract actors and use cases in their research. Therefore, we choose IT4RE, Perceptron, and LSTM as the baseline methods and pay more attention to the accuracy of extracting actors and use cases from the requirements text. We analyze how to extract actors and use cases from the context of the text. At the same time, to increase the practicability of the method, we automate the experimental process, which removes manual effort and avoids subjectivity. We propose a method that combines bidirectional long short-term memory (BiLSTM) and conditional random field (CRF) to automatically extract actors and use cases. The BiLSTM memorizes the context of the text to capture the words, and the result is then handed to the CRF, which calculates the best sequence combination to complete the final extraction.
Methodology
Research framework
This paper studies the identification and extraction of actors and use cases from the requirements text. We feed a sentence or a text file into the model, which makes predictions character by character and extracts the required actors and use cases. For example, given the sentence meaning “administrator successfully logged in”, the model predicts a label for each character and then extracts actors and use cases based on the labels. As shown in Fig. 1, from the bottom up, the words in the sentence are processed by BiLSTM-CRF to output the content of the actor and use case. We extract two categories, actors and use cases, which are expressed as words, whereas labels are assigned per character. Therefore, we use five tags to mark content, namely, B-ACTOR, I-ACTOR, B-CASE, I-CASE, and O, which represent the first character of an actor, the other characters of an actor, the first character of a use case, the other characters of a use case, and all other content, respectively.

Fig. 1. Research framework.
In this paper, a model for extracting actors and use cases is proposed by combining BiLSTM and CRF. BiLSTM is used to capture contextual information for words, and CRF is used to calculate the optimal sequence combination. Long short-term memory (LSTM) is widely used to extract contextual text and remember contextual relations in text. However, information in an LSTM model can only be transmitted from front to back, which means that the input at the current moment can only use information from previous moments. For the information extraction task, the states before and after the current state are equally important, so a BiLSTM network is introduced. A strong dependency exists between the labels of the actors and the use cases in the requirements text, and BiLSTM can use the information both before and after the current moment, which makes it well suited to extracting actors and use cases from the requirements text. CRF is widely used for sequence labeling. To improve the accuracy of tag sequences, adjacent tags need to be taken into account when determining the tag at a position. CRF can add such constraints to ensure that the final prediction results are valid. Therefore, BiLSTM and CRF complement each other: together they predict a tag for each character of the actors and use cases in the text, output the tag sequence of the sentence, and then extract the content of actors and use cases based on the predicted tags.
This section introduces the input layer, one-hot encoding, and the subsequent look-up layer. The input layer passes the text to the model so that actors and use cases can be extracted. The input can be a sentence or a file. This paper takes a sentence as an example to explain the processing of the model; a file can be viewed as a collection of sentences. The input content is then encoded by the one-hot method. One-hot encoding, also known as one-bit effective encoding, uses N-bit registers to encode N states, each with its own register bit, only one of which is active at any time. A one-hot vector represents a categorical variable as a binary vector: the categorical value is first mapped to an integer value, and each integer value is then represented as a binary vector. In the requirements text, the actors might be administrators, students, and so on, and the use cases might be login, browse, log out, and so on. These eigenvalues are not continuous but discrete and unordered, so the features need to be digitized. If a feature has m possible values, one-hot encoding turns it into m binary features that are mutually exclusive, with only one active at a time. As shown in Table 1, the actors have two eigenvalues, “administrator” is 0 and “student” is 1, and the corresponding binary vectors are 10 and 01. Use cases have three eigenvalues, “login” is 0, “browse” is 1, and “exit” is 2, and the corresponding binary vectors are 100, 010, and 001. For the sentence meaning “administrator successfully logged in”, according to the contents of Table 1, the eigenvalue vector [0,2,0] yields the binary vector 10 | 0010 | 100. The one-hot result is handed to the look-up layer for the next conversion. The look-up layer converts each one-hot vector into an embedding. In this study, a randomly initialized embedding matrix is used to map the one-hot vector of each character $x_i$ in the sentence to a low-dimensional, dense character embedding vector. Overfitting easily occurs when a complex network is trained on a small data set [20]. Dropout is set in this model to mitigate the long training time and the overfitting of the complex network [21].
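The relationship between the one-hot vector and the look-up layer can be illustrated with a minimal sketch. The vocabulary size, embedding dimension, and character IDs below are hypothetical values chosen for illustration, not taken from the paper; the point is only that multiplying a one-hot vector by the embedding matrix selects one row of that matrix.

```python
import numpy as np

# Hypothetical toy setup: four distinct characters, 3-dimensional embeddings.
vocab_size = 4
embedding_dim = 3

rng = np.random.default_rng(0)
# Randomly initialized embedding matrix, one row per character ID.
embedding = rng.normal(size=(vocab_size, embedding_dim))


def one_hot(index: int, size: int) -> np.ndarray:
    """Return a binary vector with a single 1 at position `index`."""
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec


def look_up(char_ids, embedding):
    """Map character IDs to dense vectors via one-hot @ embedding."""
    one_hots = np.stack([one_hot(i, embedding.shape[0]) for i in char_ids])
    return one_hots @ embedding


char_ids = [2, 0, 3]           # assumed IDs of three characters in a sentence
dense = look_up(char_ids, embedding)
```

In practice the matrix product is replaced by direct row indexing (`embedding[char_ids]`), which gives the same result without building the one-hot vectors.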
Table 1. Part of eigenvalues with binary vectors
BiLSTM, which consists of a forward LSTM and a backward LSTM, is used to automatically extract sentence features. The LSTM unit is composed of a memory unit and three gates (forget, update, and output gates), as shown in Fig. 2. The memory unit manages and saves information, and the three gates control the update, attenuation, input, and output of information in the memory unit. The specific steps are as follows.

Fig. 2. LSTM unit.
Step 1: The forget gate determines what is discarded from the memory unit, that is, which information from the sentence is dropped. For example, the sentence meaning “Administrators can add or delete users after logging in to the system” might output “administrator,” “system,” “add user,” and “delete user” after processing by a unit. When the next input sentence is “Teachers can apply for classrooms,” the unit chooses to remember “teacher” and forgets the relatively unimportant “system”. To determine how much of the previous memory is forgotten or retained, the output of the previous unit and the input of the current unit are passed through the sigmoid function. In Equation (1), $f_t$ is a value from 0 to 1 that controls how much of the previous information is forgotten, $h_{t-1}$ is the output of the previous unit, and $x_t$ is the input of the current unit.
Step 2: The update gate decides what is put in the memory. The sigmoid function in Equation (2) determines which values need to be updated, that is, how much of the newly learned information is remembered. The tanh function in Equation (3) creates a new candidate value, such as “teacher”.
Step 3: The state of the memory unit is updated. The first two steps are preparing for updating the state of the memory unit, as shown in Equation (4), which needs to integrate the previously forgotten contents and the new information learned.
Step 4: The output gate controls the output of the memory unit. In the same way, the sigmoid function in Equation (5) determines which parts are output, and the tanh function in Equation (6) deals with the state of the memory unit. The sigmoid and tanh layers together determine the output of the state of the memory unit, that is, the content of the input to the next unit. Our study typically outputs information about actor classes and use case classes for the next unit.
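The equations cited in Steps 1 to 4 are not reproduced in this text. For reference, the standard LSTM formulation that matches the description above is sketched below. The weight matrices $W_\ast$, the bias vectors $b_\ast$, and the symbols $i_t$, $\tilde{C}_t$, $C_t$, $o_t$, and $h_t$ are assumed notation and may differ from the notation of the original equations; $\odot$ denotes element-wise multiplication.

```latex
\begin{align}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{1}\\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{2}\\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{3}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{4}\\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{5}\\
h_t &= o_t \odot \tanh\left(C_t\right) \tag{6}
\end{align}
```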
Here, σ represents the sigmoid activation function and tanh represents the hyperbolic tangent activation function.
When later text influences the current text, an LSTM that only processes the preceding text in the forward direction will not be accurate, so the following content of the text should also be considered. The forward process of LSTM is relatively easy, while the reverse process is more complicated. We define a loss function first as follows:
In this study, the characters of a sentence are embedded into the sequence $(x_1, x_2, \ldots, x_n)$, which serves as the input of each time step of BiLSTM. Then, the forward LSTM outputs a sequence of hidden states.
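The forward and backward passes can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the model used in the experiments: the parameter layout, the random initialization, the dimensions, and the concatenation of forward and backward hidden states are the conventional BiLSTM choices and are assumed here, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(input_dim, hidden_dim, rng):
    """One weight matrix and bias per gate, acting on [h_prev, x_t]."""
    shape = (hidden_dim, hidden_dim + input_dim)
    return {name: (rng.normal(scale=0.1, size=shape), np.zeros(hidden_dim))
            for name in ("forget", "update", "candidate", "output")}


def lstm_step(x_t, h_prev, c_prev, params):
    z = np.concatenate([h_prev, x_t])
    w, b = params["forget"]
    f_t = sigmoid(w @ z + b)          # Step 1: what to discard
    w, b = params["update"]
    i_t = sigmoid(w @ z + b)          # Step 2: what to update
    w, b = params["candidate"]
    c_hat = np.tanh(w @ z + b)        # Step 2: new candidate values
    c_t = f_t * c_prev + i_t * c_hat  # Step 3: new memory state
    w, b = params["output"]
    o_t = sigmoid(w @ z + b)          # Step 4: what to output
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t


def run_lstm(xs, params, hidden_dim):
    """Run one direction over the sequence and stack the hidden states."""
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, params)
        outputs.append(h)
    return np.stack(outputs)


def bilstm(xs, fwd_params, bwd_params, hidden_dim):
    forward = run_lstm(xs, fwd_params, hidden_dim)
    # The backward LSTM reads the sequence in reverse; its outputs are
    # reversed again so that row i describes character i.
    backward = run_lstm(xs[::-1], bwd_params, hidden_dim)[::-1]
    return np.concatenate([forward, backward], axis=1)


rng = np.random.default_rng(0)
input_dim, hidden_dim, n_chars = 4, 3, 7   # illustrative sizes only
xs = rng.normal(size=(n_chars, input_dim))
features = bilstm(xs,
                  init_params(input_dim, hidden_dim, rng),
                  init_params(input_dim, hidden_dim, rng),
                  hidden_dim)
```

Each row of `features` combines what the forward pass has seen before a character with what the backward pass has seen after it, which is the property the extraction task relies on.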
In this study, CRF is used to select the best prediction sequence. It first calculates the transfer score of the prediction tag in the BiLSTM layer, then calculates the path sequence score in combination with the sentence features of the BiLSTM layer. Finally, CRF selects the optimal sequence. The BiLSTM output in Fig. 1 shows that the predicted result is obviously wrong. The reason for such error is that BiLSTM can only predict the relationship between sequences and tags, but not between tags. The relationship between tags is the transition matrix in CRF. The CRF encoding layer can calculate the score of the predicted tag sequence and improve the accuracy of the predicted tag sequence in the sentence. After processing by the CRF layer, the correct prediction result is shown in the line of CRF’s output in Fig. 1.
1) The relationship constraints between labels are defined.
CRF is used for sentence-level sequence labeling to correct the labels predicted by the BiLSTM layer. The CRF layer can add constraints to the final predicted labels to ensure they are valid, and it learns these constraints automatically from the training data set during training. These constraints include the following. The first character in a sentence should be labeled B- or O rather than I-. In a pattern “B-label1, I-label2, I-label3, ...”, label1, label2, label3, ... should be the same class, either ACTOR or CASE. The pattern “O I-label” is not valid because the first label of an actor or use case should start with B- rather than I-; the valid pattern is “O B-label”.
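The three constraints can be made concrete with a small hand-written check. In the proposed model the CRF layer learns these constraints from data rather than applying explicit rules, so the function below (a hypothetical helper, `is_valid_sequence`) only illustrates which tag sequences count as valid.

```python
def is_valid_sequence(tags):
    """Check the tagging constraints on a list such as ['B-ACTOR', 'O']."""
    previous = "O"  # the sentence start behaves like an O tag
    for tag in tags:
        if tag.startswith("I-"):
            label = tag[2:]
            # An I- tag must continue a B- or I- tag of the same class.
            if previous not in (f"B-{label}", f"I-{label}"):
                return False
        previous = tag
    return True


example = ["B-ACTOR", "I-ACTOR", "I-ACTOR", "O", "O", "B-CASE", "I-CASE"]
print(is_valid_sequence(example))                 # valid sequence
print(is_valid_sequence(["B-ACTOR", "I-CASE"]))   # class changes inside a span
```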
2) Updating the transfer score matrix
All transfer scores in the matrix can be randomly initialized before the model is trained, and all random scores are updated automatically during training. That is, the CRF layer can learn the inter-tag relationship constraints by itself, and the matrix does not need to be built manually. The scores become more reasonable as the number of training iterations increases. We use $t_{y_i, y_j}$ to represent the transfer score. As shown in Table 2, $t_{\text{B-ACTOR},\,\text{I-ACTOR}} = 0.9$ means that the transfer score from the tag B-ACTOR to the tag I-ACTOR is 0.9. To make the transfer score matrix more robust, we added two tags, START and END. START marks the beginning of a sentence and END marks the end of a sentence.
Table 2. Transfer score matrix for all tags
As shown in Table 2, the transfer score matrix indicates that the model has learned some useful constraints. Corresponding to the constraints mentioned above, the findings are as follows. The transfer scores from START to I-ACTOR or I-CASE are very low. “B-ACTOR I-ACTOR” is valid, but “B-ACTOR I-CASE” is invalid; for example, the score from B-CASE to I-ACTOR is only 0.0003, which is lower than the other scores. $t_{\text{O},\,\text{I-ACTOR}}$ and $t_{\text{O},\,\text{I-CASE}}$ are very low.
3) Calculate the tag sequence score
The parameter of the CRF layer is a $(k + 2) \times (k + 2)$ matrix, and the value of k in the sentence in Fig. 1 is 7. From the above, $t_{ij}$ represents the transfer score from label $i$ to label $j$. When labeling a character, we can use the existing label. The reason for adding 2 is that we add a START state for the beginning of the sentence and an END state for the end of the sentence. If a tag sequence whose length equals the length of the sentence is denoted $y = (y_1, y_2, \ldots, y_n)$, then the CRF layer scores the tag sequence $y$ of the sentence $x$ as shown in the following equation:
Equation (13) shows that the score of the entire sequence is equal to the sum of the scores of all characters. The score of each character consists of two parts: one is determined by the $p_i$ output by BiLSTM, and the other is determined by the transfer matrix of CRF. Then, the normalized probability can be obtained by Softmax as follows:
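The sequence score and its Softmax normalization are not reproduced in this text. In the usual BiLSTM-CRF formulation, which matches the description above, they take the following form. The notation is assumed: $p_{i, y_i}$ is the BiLSTM score of tag $y_i$ for the $i$-th character, $t_{y_i, y_{i+1}}$ is the transfer score, $y_0$ is START, $y_{n+1}$ is END, and $y'$ ranges over all candidate tag sequences.

```latex
\begin{align}
\mathrm{score}(x, y) &= \sum_{i=1}^{n} p_{i, y_i} + \sum_{i=0}^{n} t_{y_i, y_{i+1}} \tag{13}\\
P(y \mid x) &= \frac{\exp\left(\mathrm{score}(x, y)\right)}{\sum_{y'} \exp\left(\mathrm{score}(x, y')\right)}
\end{align}
```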
In Fig. 1, the values 0.9, 0.1, and 0.3 in the CRF layer represent the scores of the candidate sequences, so the sequence with a score of 0.9 is selected as the optimal sequence, which is <B-ACTOR, I-ACTOR, I-ACTOR, O, O, B-CASE, I-CASE>.
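Selecting the highest-scoring sequence does not require enumerating every candidate. A common technique for CRF decoding is the Viterbi algorithm, sketched below; the paper does not state which decoding procedure it uses, so this is an assumption. For brevity the START row and END column of the transfer matrix are passed as separate vectors `start` and `end`, and all names are hypothetical.

```python
import numpy as np

TAGS = ["B-ACTOR", "I-ACTOR", "B-CASE", "I-CASE", "O"]


def sequence_score(emissions, transitions, start, end, path):
    """Sum of BiLSTM scores and transfer scores along one tag path."""
    score = start[path[0]] + emissions[0, path[0]]
    for i in range(1, len(path)):
        score += transitions[path[i - 1], path[i]] + emissions[i, path[i]]
    return score + end[path[-1]]


def viterbi(emissions, transitions, start, end):
    """Return the best tag path (as indices) and its score."""
    n, k = emissions.shape
    best = start + emissions[0]           # best score ending in each tag
    back = np.zeros((n, k), dtype=int)    # back-pointers for the best path
    for i in range(1, n):
        # candidate[a, b]: best path ending in tag a, then moving to tag b
        candidate = best[:, None] + transitions
        back[i] = candidate.argmax(axis=0)
        best = candidate.max(axis=0) + emissions[i]
    best = best + end
    last = int(best.argmax())
    path = [last]
    for i in range(n - 1, 0, -1):
        last = int(back[i, last])
        path.append(last)
    return path[::-1], float(best.max())


rng = np.random.default_rng(0)
emissions = rng.normal(size=(7, len(TAGS)))       # stand-in BiLSTM scores
transitions = rng.normal(size=(len(TAGS), len(TAGS)))
start = rng.normal(size=len(TAGS))
end = rng.normal(size=len(TAGS))
best_path, best_score = viterbi(emissions, transitions, start, end)
print([TAGS[i] for i in best_path])
```

The random scores here are placeholders; with trained scores, invalid transitions such as O to I-ACTOR receive very low transfer scores and are therefore avoided by the decoder.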
The output layer outputs the extracted actors and use cases. The output of the CRF layer only gives the predicted label for each input character and does not directly present the actors and use cases. Therefore, the output layer is needed to integrate the labels and filter out content that is neither an actor nor a use case. After filtering, only the characters labeled as actors or use cases are left in the sentence. The remaining labeled content is then sorted into actors and use cases, and their text is output clearly. Since the model processes text character by character while actors and use cases are words, it cannot be guaranteed that every word is identified and output completely. Thus, the situation shown in Table 3 can occur:
Table 3. Examples of possible extraction results
The comparison in Table 3 shows that the actual extraction result does not correspond exactly to the information we need. “Administrator” is extracted as “admin,” and “set permissions” is extracted as “set permissions for”. To better assess the capability of the model, an extraction that contains more or fewer characters than the standard extraction is counted as an error. In the example in Table 3, when the model evaluation index is calculated, only the extraction result of “logging in” is counted as correct; that is, in this example, the number of correctly extracted actors and use cases is 1.
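The output layer and the strict matching rule can be sketched as follows. The helper names are hypothetical, and the Latin letters in the usage example are placeholders standing in for the characters of a sentence; only the grouping and counting logic is being illustrated.

```python
def extract_spans(chars, tags):
    """Group characters into (class, text) spans according to B-/I- tags."""
    spans = []
    current_label, current_chars = None, []
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if current_label:
                spans.append((current_label, "".join(current_chars)))
            current_label, current_chars = tag[2:], [ch]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_chars.append(ch)
        else:
            # An O tag (or a stray I- tag) closes any open span.
            if current_label:
                spans.append((current_label, "".join(current_chars)))
            current_label, current_chars = None, []
    if current_label:
        spans.append((current_label, "".join(current_chars)))
    return spans


def count_exact_matches(predicted, reference):
    """A span is correct only if its class and text match exactly."""
    remaining = list(reference)
    correct = 0
    for span in predicted:
        if span in remaining:
            remaining.remove(span)
            correct += 1
    return correct


chars = list("ABCdeFG")  # placeholder characters
tags = ["B-ACTOR", "I-ACTOR", "I-ACTOR", "O", "O", "B-CASE", "I-CASE"]
spans = extract_spans(chars, tags)
```

With this rule, a predicted span that is one character shorter or longer than the reference contributes nothing to the count of correct extractions, as in the Table 3 example.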
We used the following evaluation indices to assess the effectiveness of BiLSTM-CRF. As the number of actor and use case words is relatively small compared with words of other categories in the requirements text, the category labels are unbalanced. Therefore, the F1 score is used as an evaluation index together with precision and recall. The higher the evaluation indices, the better the model. The confusion matrix is the basis for calculating the evaluation indices and represents the possible outcomes of a prediction. As shown in Table 4, the specific definitions are as follows:
Table 4. Confusion matrix
where TP (true positive) represents the number of actor and use case class attributes that are extracted; FP (false positive) represents the number of non-actor and non-use case class attributes that are extracted; FN (false negative) represents the number of actor and use case class attributes that are not extracted; and TN (true negative) represents the number of non-actor and non-use case class attributes that are not extracted.
The precision is the ratio of the number of actor and use case class attributes extracted to the total number of all extracted attributes. Recall is the ratio of the number of actor and use case class attributes extracted to the total number of all actor and use case class attributes.
Precision and recall tend to trade off against each other. Generally, when precision is high, recall is low, and when recall is high, precision is low. To consider precision and recall together, we use the F1 score, which is the harmonic mean of precision and recall.
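The three indices follow directly from the confusion matrix counts. The sketch below uses the standard definitions, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall); returning 0.0 when a denominator is zero is a convention assumed here, and the counts in the usage line are made up for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Illustrative counts only: 8 correct extractions, 2 spurious, 2 missed.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```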
Data preprocessing
Requirements text describes functional and non-functional requirements in NL, and functional requirement describes who can do what, i.e., actors and use cases. To train the BiLSTM-CRF model to extract actors and use cases, we collected a large number of requirements texts and collated the functional requirements described in them. To ensure the authority of the data set, we employed professional requirements analysts to identify actors and use cases from the collated requirements text. We labeled the actors and use cases by character (B-ACTOR, I-ACTOR, B-CASE, I-CASE, O), where B-ACTOR represents the first character of the actor, I-ACTOR represents other characters except the first character of the actor, B-CASE represents the first character of the use case, I-CASE represents other characters except the first character of the use case, and O represents other contents except the actors and use cases.
A large amount of labeled requirements text was integrated into a data set, which was divided into a training set and a test set containing 80% and 20% of the data, respectively. Figure 3 shows the transformation of the requirements text into a data set.

Fig. 3. Data preprocessing process.
The experiment consisted of three phases: data reading, model training, and actor and use case extraction, as shown in Fig. 4. From 60 different types of project requirement documents, we selected the text related to the roles and functions of the system and organized it into the experimental data set. The data is read line by line from the data set to generate a list of sentences and a list of labels for model training. In the model training phase, the input is a large amount of requirements text and its corresponding category labels (actor classes, use case classes, and others). This data is used to train the BiLSTM-CRF model to categorize requirements text into actor classes, use case classes, or other classes. After training, the model is used in two stages. First, given the requirements text, the trained model predicts the category to which each character belongs. Second, the predicted labels are used to extract actors and use cases as phrases, which makes the actors and use cases in the requirements text clearer and facilitates the next step of software design. The BiLSTM-CRF model is built in Python on the TensorFlow framework.

The extraction process with BiLSTM-CRF.
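The second stage, turning predicted character labels into actor and use case phrases, can be sketched as follows. The function name `extract_spans` and the handling of an I- tag that does not continue an open span are assumptions; the sample input is a hypothetical English stand-in.

```python
def extract_spans(chars, tags):
    """Collect (kind, text) pairs from character-level B-/I-/O tags."""
    results = []
    current_kind, buffer = None, []

    def flush():
        nonlocal current_kind, buffer
        if current_kind is not None:
            results.append((current_kind, "".join(buffer)))
        current_kind, buffer = None, []

    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            flush()                              # close any open span
            current_kind, buffer = tag[2:], [ch]
        elif tag.startswith("I-") and current_kind == tag[2:]:
            buffer.append(ch)                    # continue the open span
        else:
            flush()                              # "O" or a stray I- tag (assumed: dropped)
    flush()
    return results


chars = list("admin adds user")
tags = ["B-ACTOR"] + ["I-ACTOR"] * 4 + ["O"] + ["B-CASE"] + ["I-CASE"] * 8
phrases = extract_spans(chars, tags)
print(phrases)
```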
As shown in Fig. 3, each line in the data set consists of one character of a sentence and its corresponding label, separated by a space. We read the data line by line and create two lists, one for the sentence content and one for the tag content, so that the character and the label on each line go to the corresponding list.
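The line-by-line reading can be sketched as below. The blank line used as a sentence separator, the name `read_dataset`, and the placeholder characters are assumptions; only the "character, space, label" layout of each line comes from the description above.

```python
def read_dataset(lines):
    """Split 'character label' lines into a sentence list and a tag list."""
    sentences, tag_lists = [], []
    chars, tags = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():                 # assumed: blank line ends a sentence
            if chars:
                sentences.append(chars)
                tag_lists.append(tags)
                chars, tags = [], []
            continue
        ch, tag = line.rsplit(" ", 1)        # character and label share one line
        chars.append(ch)
        tags.append(tag)
    if chars:                                # last sentence without trailing blank line
        sentences.append(chars)
        tag_lists.append(tags)
    return sentences, tag_lists


sample = ["A B-ACTOR", "B I-ACTOR", ". O", "", "C B-CASE", "D I-CASE"]
sentences, tag_lists = read_dataset(sample)
print(sentences)
print(tag_lists)
```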
Table 5 shows the data reading results with three sentences as an example. The first column represents the actual content of the sentence content list and the second column represents the actual content of the tag list. The second, third, and fourth rows respectively represent the contents of the two lists corresponding to the three sentences. The label of punctuation marks is O. As shown in Table 5, the actors are “administrator,” “entry clerk,” and “analyst” in the first sentence (The system consists of three participants, the administrator, the entry clerk and the analyst, who respectively enjoy different levels of authority. The analyst is the highest user.). The actor is “administrator,” and the use cases are “add user” and “delete user” in the second sentence (After entering the system, the administrator can add or delete users and modify the permissions of users. In addition, the administrator can set different levels of permissions for users.). The actor is “analyst” in the third sentence (Analysts can enjoy full access to entry.).
Data read results
After the data is read, a dictionary needs to be built and the sentences have to be digitized. We count each distinct character and construct a dictionary of the form {‘first character’: [corresponding ID, count of the character], ‘second character’: [corresponding ID, count of the character], ...}. Then, we format the dictionary and remove low-frequency characters.
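A minimal sketch of the dictionary construction and digitization follows. The frequency threshold `min_count`, the special entries `<PAD>` and `<UNK>`, and the function names are assumptions; only the {character: [ID, count]} layout is taken from the text.

```python
from collections import Counter


def build_vocab(sentences, min_count=2):
    """Map each sufficiently frequent character to [ID, count]."""
    counts = Counter(ch for sent in sentences for ch in sent)
    vocab = {"<PAD>": [0, 0], "<UNK>": [1, 0]}   # assumed special entries
    for ch, n in counts.most_common():
        if n >= min_count:                       # drop low-frequency characters
            vocab[ch] = [len(vocab), n]
    return vocab


def digitize(sentence, vocab):
    """Replace each character by its ID; unseen characters map to <UNK>."""
    unk_id = vocab["<UNK>"][0]
    return [vocab[ch][0] if ch in vocab else unk_id for ch in sentence]


sentences = [list("aab"), list("abc")]
vocab = build_vocab(sentences)
ids = digitize(list("abc"), vocab)
print(vocab)
print(ids)
```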
BiLSTM-CRF model training is an optimization task that requires several iterations. In each iteration, we divided the training set into multiple batches.
Then, we processed each batch, which contains a list of sentences whose size is determined by the batch size parameter. For each batch, we first run the forward pass of the BiLSTM-CRF model, including the forward pass of both the forward and backward LSTM states, and compute the loss between the predicted and actual labels at all positions. Then, we run the forward and backward passes of the CRF layer to calculate the gradients of the network output and the state transition edges. Thereafter, we run the backward pass of the BiLSTM-CRF model, which includes the backward pass of the forward and backward LSTM states and the back-propagation of errors from the output layer to the input layer. Finally, we update the model parameters based on these gradients.
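The exact form of the loss is not given here. As an illustration, the sketch below computes the negative log-likelihood commonly used for a linear-chain CRF on a single sentence: the log of the summed scores of all tag sequences (forward algorithm) minus the score of the gold sequence. The absence of start and stop transitions, the pure-Python implementation, and all names and numbers are assumptions, not the TensorFlow code used in the experiment.

```python
import math


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_neg_log_likelihood(emissions, transitions, gold):
    """emissions[t][k]: score of tag k at position t (e.g., from the BiLSTM);
    transitions[j][k]: score of moving from tag j to tag k;
    gold: gold tag index at each position."""
    n_tags = len(transitions)
    # Score of the gold tag sequence.
    gold_score = emissions[0][gold[0]]
    for t in range(1, len(gold)):
        gold_score += transitions[gold[t - 1]][gold[t]] + emissions[t][gold[t]]
    # Forward algorithm: log-sum of the scores of all possible tag sequences.
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[j] + transitions[j][k] for j in range(n_tags)])
            + emissions[t][k]
            for k in range(n_tags)
        ]
    return logsumexp(alpha) - gold_score


# Toy scores for three tags (O, B-ACTOR, I-ACTOR) and three characters.
emissions = [[0.1, 1.0, 0.2], [0.2, 0.1, 0.9], [2.0, 0.1, 0.3]]
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
loss = crf_neg_log_likelihood(emissions, transitions, gold=[1, 2, 0])
print(loss)
```

Minimizing this quantity raises the score of the gold sequence relative to all competing sequences, which is what ties the emission scores and the transition scores together during training.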
Training and deploying the model requires setting the hyperparameters, including batch size, learning rate, clip, and dropout. Previous studies [19, 20] have shown that in a predictive model, hyperparameter optimization is crucial to achieve better performance. When adjusting the hyperparameters, we use 80% of the data as the training set and the rest as the test set. We cannot enumerate all the hyperparameter combinations, so we give a default value to every hyperparameter. Then, we vary one of the hyperparameters and keep the two best-performing settings, on which we then vary the next hyperparameter. In the same way, we add the hyperparameters one by one until the most effective combination is obtained. (Note that a hyperparameter keeps its default value until it is added to the combination.)
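The staged procedure can be sketched as a search that keeps the two best settings after each hyperparameter is added. The function `staged_search`, the scoring function `toy_score`, and every numeric value below are placeholders chosen for illustration; in the experiment, the score would come from training the model and evaluating it on the test set.

```python
def staged_search(defaults, candidates, evaluate, keep=2):
    """Vary one hyperparameter at a time, keeping the `keep` best settings."""
    beam = [dict(defaults)]
    for name, values in candidates.items():
        trials = []
        for config in beam:
            for value in values:
                trial = dict(config, **{name: value})
                if trial not in trials:
                    trials.append(trial)
        trials.sort(key=evaluate, reverse=True)   # higher score is better
        beam = trials[:keep]
    return beam[0]


def toy_score(config):
    # Placeholder for "train the model and measure it on the test set".
    return (-abs(config["batch_size"] - 64) / 64
            - abs(config["learning_rate"] - 0.001) * 10
            - abs(config["dropout"] - 0.5))


defaults = {"batch_size": 128, "learning_rate": 0.1, "dropout": 0.5}
candidates = {
    "batch_size": [64, 128, 256],
    "learning_rate": [0.1, 0.01, 0.001],
    "dropout": [0.3, 0.5],
}
best = staged_search(defaults, candidates, toy_score)
print(best)
```

Keeping two settings instead of one gives the search a limited chance to recover when the best value of one hyperparameter depends on the value of the next.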
In this study, we made the following specific adjustments to the hyperparameters. We first set the batch size to 128 and, taking 128 as the dividing line, multiplied it by 2 and by 0.5 in the upward and downward directions, respectively, and compared the test results. If the upward direction performed better, we kept multiplying by 2; if the downward direction performed better, we kept multiplying by 0.5, until the effect no longer improved. The learning rate determines whether and when the objective function converges to a local minimum. An appropriate learning rate makes the objective function converge to a local minimum in a short time. We set the initial learning rate to 0.1 and adjusted it manually based on experience: we tried different fixed learning rates, such as 0.1, 0.01, and 0.001, observed the relationship between the number of iterations and the loss, selected the learning rate with the fastest loss decline, and combined it with the batch size. In deep learning, trained models are prone to overfitting [21]. Overfitting means that the loss of the model on the training data is small and the prediction accuracy is high, whereas the loss on the test data is relatively large and the prediction accuracy is low. Dropout can effectively alleviate overfitting and achieves a regularization effect to a certain extent. Therefore, we added the dropout parameter to prevent overfitting. Similarly, we set the clip parameter to prevent gradient explosion.
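The batch size adjustment can be sketched as a doubling and halving search that starts at 128. The function name `tune_batch_size`, the size bounds, and the scoring function `toy_score` are assumptions; the real score would be the test result of a model trained with that batch size.

```python
import math


def tune_batch_size(evaluate, start=128, smallest=1, largest=4096):
    """Move from `start` by factors of 2 or 0.5 while the score improves."""
    best_size, best_score = start, evaluate(start)
    up_score, down_score = evaluate(start * 2), evaluate(start // 2)
    if max(up_score, down_score) <= best_score:
        return best_size                          # neither direction helps
    factor = 2.0 if up_score >= down_score else 0.5
    size, score = int(start * factor), max(up_score, down_score)
    while score > best_score:
        best_size, best_score = size, score
        size = int(size * factor)
        if not smallest <= size <= largest:       # assumed search bounds
            break
        score = evaluate(size)
    return best_size


def toy_score(batch_size):
    # Placeholder score that peaks at a batch size of 32.
    return -abs(math.log2(batch_size) - 5)


chosen = tune_batch_size(toy_score)
print(chosen)
```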
According to the adjustment process described above, we added all parameters into the combination one by one and selected the most effective hyperparameter combination as the parameter value of the model. Table 6 shows the hyperparameter values in the model.
Values of hyperparameters
Using the trained model, we extract actors and use cases from the input text, as shown in Fig. 5. First, a preset label O is assigned to the input content, and the content is fed into the model. The bidirectional LSTM layer concatenates the hidden states output by the forward LSTM and the backward LSTM at each position. The sentence features are thus extracted automatically, and multiple candidate prediction sequences are passed to the CRF layer. Then, the CRF layer calculates the score of each candidate sequence and selects the one with the highest score as the sentence-level sequence of predicted labels. These labels are divided into the actor, use case, and other classes. The model extracts the content of the actor and use case classes and discards the remaining information, which makes the results clear and intelligible.

The process of extracting the actors and use cases.
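Selecting the highest-scoring tag sequence in the CRF layer is typically done with the Viterbi algorithm rather than by scoring every sequence explicitly. The sketch below assumes per-character emission scores from the BiLSTM and a tag transition matrix; the function name `viterbi`, the three-tag setup, and all numbers are illustrative.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring tag index sequence and its score."""
    n_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for k in range(n_tags):
            # Best previous tag for reaching tag k at position t.
            prev = max(range(n_tags), key=lambda j: score[j] + transitions[j][k])
            pointers.append(prev)
            new_score.append(score[prev] + transitions[prev][k] + emissions[t][k])
        score = new_score
        backpointers.append(pointers)
    last = max(range(n_tags), key=lambda k: score[k])
    path = [last]
    for pointers in reversed(backpointers):       # follow pointers backwards
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


TAGS = ["O", "B-ACTOR", "I-ACTOR"]
emissions = [[0.1, 1.0, 0.2], [0.2, 0.1, 0.9], [2.0, 0.1, 0.3]]
# The large negative entry discourages the invalid move O -> I-ACTOR.
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
path, best_score = viterbi(emissions, transitions)
print([TAGS[k] for k in path], best_score)
```

The transition scores are what allow the CRF layer to rule out label sequences that the BiLSTM scores alone would permit, such as an I- tag directly after O.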
To verify the accuracy and simplicity of the proposed BiLSTM-CRF method, we compared it with the IT4RE, Perceptron, and LSTM methods. The IT4RE method combines NLP and BPNN. NLP was used for pre-processing, which converted large sections of text into a word-unit format. The Python NL processing library released by Stanford was used for word segmentation and for syntactic and semantic analysis of the Chinese data sets. The POS tags and dependency parsing results were then sorted out. To encode the word segmentation, POS tags, and dependency parsing results, the segmented words were collected into a dictionary in which identical words correspond to the same code. Then, the data contents were labeled according to the correct actors and use cases identified by the professionals. The training data set was passed into the BPNN model, which was trained to extract actors and use cases. The model can classify inputs into actors, use cases, and other types.
The perceptron is a linear model for binary classification, which aims to find a separating hyperplane that divides the training data linearly. A loss function based on misclassification was introduced into the model and minimized by the gradient descent method to obtain the model. We labeled the data according to the method in section 4.1 and input the labeled data into the Perceptron model, which learns its parameters from the labels of the data and predicts the labels of other data. Finally, we obtained the extracted actors and use cases. LSTM was also used as a benchmark method to further verify the effect of combining BiLSTM and CRF in this study.
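For reference, the misclassification-driven update of a binary perceptron can be sketched as follows. The feature vectors, the learning rate, and the function name `train_perceptron` are illustrative; how the characters of the requirements text were encoded as features for this baseline is not specified here.

```python
def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Learn weights w and bias b so that sign(w.x + b) matches labels (+1/-1)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin <= 0:                       # misclassified sample
                # Gradient step on the misclassification loss -y * (w.x + b).
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:                           # a separating hyperplane is found
            break
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


X = [(2.0, 2.0), (3.0, 1.0), (-1.0, -2.0), (-2.0, -1.0)]   # toy, linearly separable
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
print(w, b, [predict(w, b, x) for x in X])
```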
Results and discussion
Table 7 lists the results of precision, recall, and F1 scores for actors and use cases extracted by BiLSTM-CRF model and the baseline methods, which include IT4RE, Perceptron, and LSTM.
Comparison between the results of this experiment and the baseline experiment
Compared with IT4RE, Perceptron, and LSTM, the BiLSTM-CRF method improved the precision of extracting actors and use cases, reaching 88.82%. The improvement in precision may be due to two aspects: (1) The number of real actors and use cases extracted correctly increased, that is, more actors and use cases were fully extracted. (2) The number of extracted items that are not real actors or use cases decreased, that is, the extracted content contains fewer non-actors and non-use cases.
When BiLSTM-CRF was used to extract actors and use cases, the recall was only 75.50%, which was 1.98% lower than that of IT4RE, the best method in the benchmark experiment, but higher than that of the Perceptron and LSTM methods. The lower recall relative to IT4RE may be due to the following reasons: (1) The number of real actors and use cases extracted correctly is reduced, that is, the extracted content contains fewer real actors and use cases. (2) The number of real actors and use cases not extracted is increased, that is, more actors and use cases are missing from the extracted content. The F1 score of BiLSTM-CRF, which balances precision and recall, is 81.62%, which is 13.73% higher than IT4RE, 31.82% higher than Perceptron, and 16.19% higher than LSTM. The overall effect is good.
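The reported F1 score is consistent with the harmonic mean of the two other figures when 88.82% is taken as the precision and 75.50% as the recall. A short check, with `f1_score` as an illustrative helper name:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


overall_f1 = round(f1_score(88.82, 75.50), 2)
print(overall_f1)  # 81.62
```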
For example, in the sentence “After the administrator successfully logged into the system, the administrator can add or delete users”, the actor is “administrator,” and the use cases are “log in,” “add user,” and “delete user”. These are the real actor and use cases, and the rest are non-actors and non-use cases. Since any method has certain errors, the extracted content cannot be exactly the same as the real actors and use cases; it may contain real actors and use cases as well as non-actors and non-use cases. The numbers of these two parts affect the value of precision. Similarly, the extracted content may contain only part of the real actors and use cases, and the missing part consists of the real actors and use cases that are not extracted. The number of unextracted items and the number of real actors and use cases in the extracted content both affect the recall. As shown in Table 8, the extracted real actors and use cases are “administrator,” “login,” and “delete user,” and their number is 3. One extracted item is neither a real actor nor a real use case, so the number of extracted non-actors and non-use cases is 1. The real use case that is not extracted is “add user,” so the number of unextracted real actors and use cases is 1.
Experimental results of expected and possible
This experiment extracts two types of content: actors and use cases. To further analyze the experimental effect, we evaluate the extraction of actors and the extraction of use cases separately. Table 9 shows the evaluation of the BiLSTM-CRF method for the two types.
As shown in Table 9, the precision of extracting actors through the BiLSTM-CRF method is 92.16%, and that of extracting use cases is 87.39%. This suggests that a larger share of the extracted actors is correct, that is, proportionally fewer non-actors than non-use cases are extracted incorrectly. The recall of extracting actors is only 64.38%, whereas the recall of extracting use cases is higher (81.89%). This indicates that proportionally fewer real actors than real use cases are extracted correctly, that is, more real actors are missed. The F1 score for extracting use cases is 84.55%, higher than the 75.81% for actors. According to the F1 score, this method is more effective in extracting use cases than actors.
The result evaluation of Actors and Use Cases extracted by BiLSTM-CRF
For example, consider the sentence “After the administrator successfully logs into the system, the administrator can add or delete users”. In this sentence, the actor is “administrator,” and the use cases are “login,” “add user,” and “delete user,” which are the above-mentioned real actors and use cases. The extracted content includes an actor part and a use case part. In the actor part, there may be real actors such as “administrator,” there may be non-actors, and there may be real actors that are not extracted, that is, missing actors. The numbers of these parts affect the precision and recall of the method in extracting actors. Similarly, the use case part consists of real use cases and non-use cases, and some real use cases may be missing. The numbers of these parts affect the precision and recall of the method in extracting use cases. As shown in Table 8, the numbers of real actors and use cases correctly extracted are 1 and 2, the numbers of non-actors and non-use cases extracted are 0 and 1, and the numbers of unextracted actors and use cases are 0 and 1, respectively.
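The counts in this example translate into precision and recall as sketched below. The English glosses stand in for the original phrases, and the spurious item is a placeholder because its text is not given; `precision_recall` is an illustrative helper name.

```python
def precision_recall(gold, extracted):
    """Return (correct, spurious, missed, precision, recall) for two item sets."""
    gold, extracted = set(gold), set(extracted)
    correct = len(gold & extracted)      # real items that were extracted
    spurious = len(extracted - gold)     # extracted non-actors / non-use cases
    missed = len(gold - extracted)       # real items that were not extracted
    precision = correct / (correct + spurious) if correct + spurious else 0.0
    recall = correct / (correct + missed) if correct + missed else 0.0
    return correct, spurious, missed, precision, recall


gold = {("ACTOR", "administrator"), ("CASE", "login"),
        ("CASE", "add user"), ("CASE", "delete user")}
extracted = {("ACTOR", "administrator"), ("CASE", "login"),
             ("CASE", "delete user"), ("CASE", "<spurious item>")}  # placeholder

overall = precision_recall(gold, extracted)
actors = precision_recall({g for g in gold if g[0] == "ACTOR"},
                          {e for e in extracted if e[0] == "ACTOR"})
cases = precision_recall({g for g in gold if g[0] == "CASE"},
                         {e for e in extracted if e[0] == "CASE"})
print(overall)
print(actors)
print(cases)
```

For this single sentence, the overall counts are 3 correct, 1 spurious, and 1 missed, while the per-type counts are 1, 0, 0 for actors and 2, 1, 1 for use cases, matching the figures quoted from Table 8.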
To sum up, some errors occur in the results. One reason may be that the requirements documents in the data set are not rich enough in type, while the actors in different types of systems or software differ from one another. Another reason concerns the use cases, which largely fall into four basic categories: adding, deleting, modifying, and querying. When the model identifies one of these categories, combining the context helps it recognize the use case more fully. However, not every occurrence of adding, deleting, modifying, or querying is a use case, so some errors may occur when the context is combined.
Errors in the performance of the proposed method may be due to the following conditions:
Uniformity of data set format: Data sets are collected from the requirements survey and summary texts of different software. Different development teams have various writing formats and habits, so similar content may appear in different forms. For example, in a student management information system, the teacher can input scores for each student, and some requirements texts refine this into adding, modifying, querying, and deleting scores.
Unity of attribute content: The actors and use cases in the training set are annotated manually, so different people have different understandings and are easily affected by subjective judgment. For example, in a school’s educational administration system, administrators manage students’ grades and classroom resources, and there may be separate administrators for several different modules. If only “administrator” is marked, some actors may be missed.
Redundancy of results: In a short text, an actor or use case of a given type appears only once, and no problem occurs in the extraction process. In a long text, however, the same content may appear many times, which may lead to redundant actors and use cases.
Comprehensiveness of the field: The actors of software in different fields differ greatly. For example, an educational administration system has teachers, students, and others; a catering system has waiters, purchasers, and others; and an industrial system has analysts, entrants, and others. Because actors have no fixed form, judging only by part of speech and context has limitations. Thus, we need to scale up the data set and give the model more learning data.
This paper proposed an automated method that combines BiLSTM and CRF to extract actors and use cases from requirements text written in NL. BiLSTM was used to predict the tag of each character according to its context, forming a tag sequence. After the CRF calculated the transfer scores, the predicted tag sequence of the sentence was determined from the results of the BiLSTM and CRF layers. The actors, use cases, and other information were then distinguished by the predicted tags, and the actors and use cases were finally extracted. The proposed BiLSTM-CRF method addresses the problem of automated extraction of actors and use cases. It overcomes a limitation of previous studies, which required manual participation at multiple stages of requirement analysis experiments, and it deepens research on extracting actors and use cases from requirements texts. At the same time, from a practical point of view, the extracted actors and use cases can be provided directly to developers who create software functions. This approach not only saves the time software developers spend on understanding software functions from a large number of texts, but also prevents the ambiguity caused by different people’s inconsistent understanding.
However, this study has limitations that need to be further addressed. First, the extracted actors and use cases are scattered, and the use cases are not attributed to the corresponding actors. Future work could associate the actors with the use cases and distinguish which use cases belong to which actors. Second, our method can extract by sentence or by document, but cannot summarize and analyze the extraction results. Our method also cannot distinguish between cross use cases (common use cases) and unique use cases, and cannot determine the permission level of actors. In the future, these issues can be addressed, for example by dividing the scattered use cases into blocks. Furthermore, multiple operations of a function can be gathered and integrated.
Acknowledgment
The work was supported by Shaanxi Provincial Social Science Fund Research Project (2018S28). We thank all participants in the experiment and the professors for their advice.
