Software reusability metrics prediction by using evolutionary algorithms: The interactive mobile learning application RozGaar

Abstract

Object-oriented software metrics (cohesion, coupling and complexity) are significant indicators of software quality, and particularly of software component reusability. We have therefore considered six important CK metrics. The predominant reason for using these measurements is their individual relationship with design aspects and with fault-proneness or aging-proneness. The key objective of this paper is to generate employment openings for thousands of people with different skillsets and to let RozGaar service providers deliver hassle-free services to customers with the help of machine learning techniques. With the rapid growth of modernization and automation in the current century, manual labor has been reduced, which gives rise to mass unemployment. If we need technicians, workers, plumbers or drivers who work for daily wages, it is quite difficult to find one in our locality without contact references or knowledge of the quality of the work they provide. This paper helps to fill the gap between customers and service providers. We aim to introduce RozGaar as an ocean of opportunities where people can get jobs on a daily basis and earn money for their skills. The application is a dual-platform application that runs on Android devices and on the Internet as a website, and it promises to provide unmatched daily work services. To achieve this goal, we used a novel software prediction model with evolutionary algorithms such as decision tree, Rough Set, and Logistic Regression algorithms to predict software reusability.

Keywords

Software reusability metrics, reusability prediction, software metrics through regular expression, software reused code from apps, architecture and framework reused

1. Introduction

In the 21st century, mobile applications are developed rapidly and deployed in app stores. The apps may run on smartphones or tablets. Different app stores are available, e.g. Google Play App store, Blackberry App World, Windows Phone App Store, and Apple App Store. The question is how these apps were developed within a limited time. The answer is quite straightforward: the concept of reusability. It is a challenging task for developers to decide what percentage of code is going to be reused to develop a new app within a short time span. Mobile phones and tablets have become widely accepted over recent years. They offer the opportunity to connect to others and look for information on the web rapidly and effortlessly. This is the reason behind the wide use of smartphones by both customers and the industry [41, 42] that investigates these devices. People believe that their devices offer them a means of rapidly obtaining the information they demand. As an alternative to improving the hardware only, we can try to build smarter phones, e.g. let them forecast our events so that they can accomplish or organize some work for us. The aim of this paper is to attempt to predict the job market through mobile apps. This information is vital to reduce the load on job seekers. The apps that are employed focus more on the type of application that is used. Several agents are required to train the machine and to learn which type of ML algorithm is most successful when applied. Finally, the quality of the predictions will be reviewed using replicated data.

2. Literature review

With the rapid growth of apps, revenue exceeded 23 billion dollars by the end of 2016. A large number of developers worked with either Android or iOS [1]. As mobile applications grow in popularity, their complexity increases simultaneously [2, 3, 4]. Many developers believe that Android is the most important platform for handling different tasks smoothly. According to the literature, nearly 85% of users use the Android platform [2]. It is difficult to understand the source code of Android apps [2]. Yuan Tian et al. [4] examined 28 factors along eight dimensions to understand how highly rated apps differ from low-rated apps. They also applied a random forest classifier to find the factors that most influence whether an app is highly rated. Similarly, Padhy et al. [5] explained how mobile apps are developed using object-oriented programming languages such as C# and Java. They also demonstrated how to estimate software metrics from apps automatically. Their prime focus was to estimate metrics using the CK (Chidamber and Kemerer) suite.

The term mobile apps derives from mobile applications, which run on smartphones and tablets [6]. Many developers reuse existing code to develop new Android apps. The idea behind inheritance is reusability. Software metrics are estimated from the code of Android apps. The metrics WMC (Weighted Methods per Class), DIT (Depth of Inheritance Tree), CBO (Coupling Between Objects), NOC (Number of Children), RFC (Response for a Class), and LCOM (Lack of Cohesion in Methods) are called CK metrics [7]. Developers use the concept of software quality to improve the functionality of apps and attract customers. The following properties might be used: reliability, efficiency, maintainability, reusability, and quality. At every stage of the software development life cycle (SDLC), the developers must think about the functionality of the software. Metrics are meticulously checked and verified at each stage of the SDLC process. Some metrics may be used during coding, whereas others may be used during the development stage. Some typical metrics are used only when the project is completed, but in some situations metrics may be used at earlier stages [7]. Numerous studies have investigated the metrics and developed prototype models that can assess quality attributes such as maintainability [8], reusability [9], usability of the component [10], and reliability of the component-based architecture [11]. The benefits of intensified reuse are lower project costs, less staff, less time, and consequently improved reliability of the code [12]. Once a reusable component has been developed, it can be reused in a new product. The newly developed product is then expected to have a lower fault density [13].

Software qualities such as high cohesion, low coupling, modularity, and low complexity are the most important factors influencing software reusability prediction [14]. Most apps are written in the Java programming language with the help of the Android Software Development Kit. Padhy et al. [20] described the properties of software metrics and how to estimate the metrics from software code. They presented an estimation technique for measuring C++ and C# code. CK metrics are widely used in the field of software engineering, not only for prediction but also for cost estimation. With the help of OOM (Object Oriented Metrics), reusability assets can be measured efficiently. Padhy et al. [21] and Tian et al. [22] presented papers about rating apps. They focused on how highly rated apps differ from low-rated apps.

Considering the significance of the Chidamber and Kemerer object-oriented software metrics (i.e. CK metrics), in our research we intended to exploit the associated features to characterize the reuse-proneness of a software component in WoS software. Excessive component reuse might give rise to complexity and lack of cohesion, which eventually lead to aging-proneness or fault-proneness. Assessing these features (i.e. complexity, cohesion and coupling) might therefore play a vital role in identifying the reuse-proneness of a software component. Considering OOP-based software metrics (cohesion, coupling and complexity) and their significance in characterizing software quality, particularly software component reusability, we have selected the key components of the CK metrics (i.e. WMC, CBO, DIT, LCOM, NOC, and RFC). The predominant reason for using these software metrics is their individual relationship with design aspects and with fault-proneness or aging-proneness. For example, an increase in the lines of code (LOC) usually increases the complexity and execution time. Coupling features, which state how strongly a class is connected to others, might cause adverse effects in the case of improper coupling between (or among) classes or functions. In any software function, the response to instructions often plays a decisive role in assuring reliable operation. In this regard, the Response for a Class (RFC) metric must reflect the responses of the allied classes or functions. A function that denies requests shows aging behaviour. Thus, assessing the RFC of software can be efficient in assuring the responsiveness of the classes. Cohesion between or amongst classes signifies uniformity of the artefacts in software. Under such circumstances, assessing software metrics such as Lack of Cohesion in Methods (LCOM) can be significant in characterizing the aging-proneness of a software component. Similarly, the Depth of Inheritance Tree (DIT), the maximum length from the root to the node in the inheritance tree, signifies the reliability and coherence of the function. Therefore, the assessment of the selected software metrics (i.e. WMC, CBO, DIT, LCOM, NOC, and RFC) can assist in characterizing a class or software component for its reuse-proneness.

For the reasons mentioned above, we have considered these features for reuse-proneness (reusability) estimation. Khan and Mahmood [23] described the complexity of the project and the method of calculating shift and value shift in their work. ArunKumar and Dillibabu [24] developed “a model which enhances the software quality without increasing cost, effort and time”. Padhy et al. [25] discussed software reusability metrics and the associated proposed model, algorithms and optimization techniques.

Considering the suitability of CK metrics for software quality assessment (fault proneness, maintainability, reliability, reusability, scalability, etc.), we have applied CK metrics in this research. For the sake of simplicity, we have applied the Chidamber and Kemerer Java Metrics (CKJM) tool to extract the software feature or metric values. Since there are 100 software projects in our model (all developed in Java with OOP concepts), these projects are given as input to the CKJM tool, and the respective 22 CK metrics were obtained, out of which six metrics (i.e. WMC, CBO, DIT, LCOM, NOC, and RFC) were selected as input for further processing. We followed the instructions in online resources such as https://www.spinellis.gr/sw/ckjm/. Evolutionary algorithms were used to solve the optimization problems and were able to predict and determine high-performing reinforcement-learning strategies [38, 39]. An agent can predict a model of the state transition probabilities of the environment, but the state transition probabilities must be fixed. The behavior of a user on their phone has to be modeled as a Markov Decision Process (MDP). There have been successful attempts at predicting human intent using MDPs [40]. Another learning scheme, called SARSA, is closely related to Q-learning but integrates policy learning. When the actions are continuous, there is no need to consider all actions; instead, we had to identify the Q-values to select one. This type of learning is called actor-critic learning [43]. Software quality can be enhanced by using an approach to quality assessment with triangular fuzzy information. Y.R-Yang et al. used MADM (Multiple Attribute Decision Making) techniques along with fuzzy information to assess the software quality.

The key objective behind the RozGaar mobile application and website is to generate employment opportunities for people who have different skillsets and to provide hassle-free services to customers.

Scope

RozGaar will serve customers to a great extent by creating job opportunities and by making daily services available in their locality. Our main aim is to focus on the rural population and to act as a link between them and the customers who need helping hands to get any task done without any hassle. Besides these, certain features can be added to RozGaar, such as organizing coaching classes through video tutorials to help registered members enhance their skills for a better livelihood and better future prospects. We implemented evaluation and performance ratings so that we could analyze the overall efficiency of the RozGaar members who are responsible for providing services to the customers.

Social impact of the research work

With the help of MobileApps, job seekers can view the jobs for upcoming days, from which both parties can benefit. Job prediction is a difficult task nowadays, but our MobileApps can forecast upcoming jobs by using state-of-the-art machine learning algorithms.

2.1 Research goals

The problem statements pointed out in this paper are addressed below.

RQ1: What type of ML is required for the agents?

RQ2: Which mobile platform should be used?

RQ3: How can the RozGaar App predict the jobs using machine learning techniques?

RQ4: Which types of challenging tasks does RozGaar (MobileApps) perform?

These problems are addressed below.

RQ1: What type of ML is required for the agents?

This paper presents ML (machine learning) algorithms that are regularly used for mobile technology. We have simulated the algorithms so that MobileApps is able to predict jobs for job seekers. Different types of agents are available, but they have not been properly defined so far. According to Wooldridge [129], an agent can be defined as “a computer system that is situated in some environment, and that is capable of autonomous action in this environment in order to meet its design objectives.” Three types of machine learning techniques are used: supervised (1), unsupervised (2) and reinforcement learning (3). In category 1, the learner receives example input values along with their results and is supposed to learn the mapping from the inputs to the correct outputs. In category 2, the expected output is not known, and thus the learner has to find structure in the input on its own. In category 3, the learner is required to achieve a certain goal in a dynamic environment. In summary, there is a clear objective, the environment is dynamic, and although the results are assessed, the correct decisions are not explicitly stated. It follows that reinforcement learning needs to be used to learn the actions of a user. Compared to the other two ML approaches, the reinforcement learning approach is the most suitable, and it is thus the one we have chosen in this paper. The algorithm below uses basic information such as the state, time and reward. The states are denoted by ‘S’, the time interval by ‘T’ and the reward by ‘R’, while the quality and value measures are denoted by ‘Q’ and ‘V’.

Algorithm for quality measure with respect to state, time and reward
for each episode do
    $s\in S$ is initialized as the starting state
    $t:=0$
    repeat
        choose an action $a\in A(s)$
        perform action $a$
        observe the new state $s^{\prime}$ and received reward $r$
        update $\tilde{T}$, $\tilde{R}$, $\tilde{Q}$ and/or $\tilde{V}$ using the experience $\langle s,a,r,s^{\prime}\rangle$
        $s:=s^{\prime}$
    until $s^{\prime}$ is a goal state
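
The episode loop above leaves the update rule open. As a minimal sketch, the following Java code fills it in with tabular Q-learning, which is one possible choice and not necessarily the rule used in RozGaar. The five-state chain environment, the class name QLearningSketch, and the values of alpha, gamma and epsilon are all assumptions made for illustration.

```java
import java.util.Random;

final class QLearningSketch {
    static final int STATES = 5;   // hypothetical chain 0..4; state 4 is the goal
    static final int ACTIONS = 2;  // 0 = move left, 1 = move right

    // Deterministic transition of the toy environment.
    static int step(int s, int a) {
        return a == 1 ? Math.min(s + 1, STATES - 1) : Math.max(s - 1, 0);
    }

    // Runs the episode loop and returns the learned Q-table.
    static double[][] train(int episodes, long seed) {
        double alpha = 0.5, gamma = 0.9, epsilon = 0.2;  // assumed settings
        double[][] q = new double[STATES][ACTIONS];
        Random rnd = new Random(seed);
        for (int e = 0; e < episodes; e++) {
            int s = 0;                                   // starting state
            while (s != STATES - 1) {                    // until a goal state is reached
                int a;
                if (rnd.nextDouble() < epsilon || q[s][0] == q[s][1]) {
                    a = rnd.nextInt(ACTIONS);            // explore, or break a tie at random
                } else {
                    a = q[s][1] > q[s][0] ? 1 : 0;       // exploit the current estimate
                }
                int next = step(s, a);                   // perform the action
                double r = (next == STATES - 1) ? 1.0 : 0.0;
                double best = Math.max(q[next][0], q[next][1]);
                // update Q using the experience <s, a, r, s'>
                q[s][a] += alpha * (r + gamma * best - q[s][a]);
                s = next;
            }
        }
        return q;
    }

    public static void main(String[] args) {
        double[][] q = train(500, 42L);
        for (int s = 0; s < STATES - 1; s++) {
            System.out.println("state " + s + ": left=" + q[s][0] + " right=" + q[s][1]);
        }
    }
}
```

In this toy setting the learned values favour moving right in every non-goal state, because only reaching the goal is rewarded.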

RQ2: Which mobile platform should be used?

The three different platforms mentioned before all have their own perks. For this investigation it is essential that at least some of the features of the phone can be checked, that apps can be started from another app, and that the object-oriented language Java is supported. There are numerous platforms available for these kinds of applications. Here we have taken Android as the platform, for which Eclipse is the most likely choice of development environment, because it provides the tools and strongly supports Android development. Android fully supports the object-oriented paradigm. The platforms we have used for the mobile application are Windows Phone, Android, and iOS, because it is easy to set up these systems.

RQ3: How can the RozGaar App predict the jobs using machine learning techniques?

The competence and usefulness of a forecasting model depend on the class of the software measurement data. Figure 1 shows the proposed work. The flow diagram predicts the jobs from the apps using the novel evolutionary algorithms. First, the classes and the UML diagram have to be identified from the mobile app (RozGaar), which is a challenging task. Once the class information is identified, the metrics are obtained. The source code measurements are then validated with the help of the proposed model (outlined below). The selected metrics are fed into the ANN model.

Figure 1.

Effort measurement from Android Apps by using ML techniques.

The above flowchart represents the effort measurement from the Android application (RozGaar) using different ML (machine learning) techniques, taking the dataset into consideration. The dataset is the input to the model.

Step 1: This is the basic step of the RozGaar app: collection of the dataset for developing the effort estimation and prediction model.

Step 2: Preprocessing a large dataset is a tedious task. Therefore, the entire dataset is divided into separate parts. To obtain accurate results, the parts are then merged. Every partitioned dataset has to be handled properly. In this phase the key attributes are identified and processed. The rest of the attributes are removed from the dataset.

Step 3: Data normally distributed

A statistical investigation of the dataset has been carried out. Whether the dataset follows a normal distribution is determined from the values of skewness and kurtosis. If the data are already normally distributed, they can be used directly; otherwise a transformation is applied to obtain a more even distribution.
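
To make the skewness and kurtosis check concrete, here is a small sketch in Java. It uses the population (1/n) moment formulas; the class name NormalityCheck and the idea of comparing both values against user-chosen limits are assumptions, since the text does not state which thresholds were applied.

```java
final class NormalityCheck {
    // Central moment of the given order, using the population (1/n) convention.
    static double centralMoment(double[] x, int order) {
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= x.length;
        double sum = 0.0;
        for (double v : x) sum += Math.pow(v - mean, order);
        return sum / x.length;
    }

    // Skewness: third central moment divided by the variance to the power 1.5.
    static double skewness(double[] x) {
        return centralMoment(x, 3) / Math.pow(centralMoment(x, 2), 1.5);
    }

    // Excess kurtosis: fourth central moment over squared variance, minus 3.
    static double excessKurtosis(double[] x) {
        double m2 = centralMoment(x, 2);
        return centralMoment(x, 4) / (m2 * m2) - 3.0;
    }

    // Rough screen only; the limits are illustrative and chosen by the caller.
    static boolean looksNormal(double[] x, double maxAbsSkew, double maxAbsKurt) {
        return Math.abs(skewness(x)) <= maxAbsSkew
            && Math.abs(excessKurtosis(x)) <= maxAbsKurt;
    }

    public static void main(String[] args) {
        double[] sample = {1, 2, 3, 4, 5};
        System.out.println("skewness = " + skewness(sample));
        System.out.println("excess kurtosis = " + excessKurtosis(sample));
    }
}
```

Both values are zero for an ideal normal distribution, so values far from zero suggest that a transformation is needed.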

Table 1

Software prediction methods from the decade 2009–2018

Name of the author	Method used by the author
Abaei et al. [27]	Semi-supervised hybrid self-organizing map (HySOM)
Goyal et al. [29]	KNN-based regression
Padhy et al. [26]	Levenberg-Marquardt (LM) learning model, Extreme learning machines (ELM)
Cruz et al. [30]	Logistic regression
Burrows et al. [31]	Logistic regression
Aggarwal et al. [32]	ANN, Logistic regression
Zhou et al. [33]	Logistic regression
Fokaefs et al. [34]	Decision tree analysis
Dong et al. [35]	Multiple linear regression, Principal component analysis
Mishra et al. [36]	SVM, Fuzzy, and Naïve Bayes
Sezer et al. [28]	ANN, SVM and ANFIS
Padhy et al. [37]	Decision tree, SVM, Naïve Bayes model, Polynomial regression, Linear regression, Multivariate adaptive regression splines (MARS), Levenberg-Marquardt (LM) learning based ANN model, Gradient descent based ANN model

Step 4: Data transformation required

If the data are not normally distributed, we apply a logarithmic transformation to the dataset to normalize it. In addition, we use histograms to validate the distribution of the data before and after the transformation.
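
A logarithmic transformation of this kind can be sketched in a few lines of Java. The variant shown, log(1 + y), is an assumption chosen so that zero values remain valid; the text does not say which form of the logarithm was used, and the class name LogTransform is hypothetical.

```java
final class LogTransform {
    // Applies log(1 + y) to every value; negative inputs are rejected.
    static double[] apply(double[] y) {
        double[] out = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            if (y[i] < 0) {
                throw new IllegalArgumentException("value must be non-negative: " + y[i]);
            }
            out[i] = Math.log1p(y[i]);  // log1p(v) computes ln(1 + v)
        }
        return out;
    }

    public static void main(String[] args) {
        double[] effort = {0, 3, 12, 250, 4000};  // made-up, right-skewed values
        for (double v : apply(effort)) {
            System.out.println(v);
        }
    }
}
```

The transformation compresses large values much more than small ones, which is why it reduces right skew.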

Step 5: Scaling the dataset

In this step the values of the input vectors are scaled independently so that they lie within the range [0, 1]. The scaling of the dataset can be represented as:

$\displaystyle y^{\prime}=\frac{y-\min(Y)}{\max(Y)-\min(Y)}$ (1)

Figure 2.

Problem solving approach in RozGaar.

Figure 3.

Framework of RozGaar App.

In Eq. (1), $y^{\prime}$ is the normalized value, which lies in the range [0, 1], $\min(Y)$ is the minimum value of $y$ and $\max(Y)$ is the maximum value of $y$. We take the value to be 0.5 when $\min(Y)=\max(Y)$.
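
The min-max scaling of Eq. (1) can be sketched as follows. The class name MinMaxScaler is hypothetical, and returning 0.5 for a constant column is our reading of the threshold remark above.

```java
final class MinMaxScaler {
    // Scales one input column to [0, 1]; a constant column maps to 0.5.
    static double[] scale(double[] y) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : y) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double[] out = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            // Avoid division by zero when min(Y) equals max(Y).
            out[i] = (max == min) ? 0.5 : (y[i] - min) / (max - min);
        }
        return out;
    }

    public static void main(String[] args) {
        for (double v : scale(new double[] {2, 4, 6})) {
            System.out.println(v);
        }
    }
}
```

Each input column is scaled on its own, in line with the statement that the input vectors are handled independently.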

The complete dataset is divided into two sets, called the training (preparation) set and the test (examination) set. The training set is used for model estimation, whereas the test set is used only for evaluating the predicted effort of the final model. A ten-fold cross-validation process is used to predict the job.
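
As an illustration of the ten-fold procedure, the sketch below splits row indices into k folds; in round i, fold i serves as the test set and the remaining folds form the training set. The class name KFoldSplit, the shuffling step and the seed are assumptions that are not described in the text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class KFoldSplit {
    // Returns k folds of shuffled row indices 0..n-1.
    static List<List<Integer>> folds(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(indices.get(i));  // deal indices out round-robin
        }
        return folds;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = folds(100, 10, 1L);
        for (int round = 0; round < folds.size(); round++) {
            System.out.println("round " + round + " test rows: " + folds.get(round));
        }
    }
}
```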

Step 6: Apply ML (machine learning) techniques to predict the effort value

In this step the effort value is forecasted using different machine learning algorithms such as MARS, CART, decision tree induction, NB (Naïve Bayes classification), KNN, etc.

Step 7: Performance evaluation

In this step the performance of the different ML (machine learning) algorithms is measured. The following measures have been used: Root Mean Square Error (RMSE), Prediction Accuracy, the Mean Magnitude of Error Relative to the estimate (MMER), the Mean Magnitude of Relative Error (MMRE), and the Mean Absolute Error (MAE). These are the main parameters for performance measurement.
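
Four of these measures can be written down directly from their usual definitions, as in the sketch below. The class name ErrorMeasures is hypothetical, and the code assumes that both arrays have the same length and that no actual or estimated value is zero.

```java
final class ErrorMeasures {
    // Root mean square error.
    static double rmse(double[] actual, double[] estimate) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double e = actual[i] - estimate[i];
            sum += e * e;
        }
        return Math.sqrt(sum / actual.length);
    }

    // Mean absolute error.
    static double mae(double[] actual, double[] estimate) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - estimate[i]);
        }
        return sum / actual.length;
    }

    // Mean magnitude of relative error: each error is divided by the actual value.
    static double mmre(double[] actual, double[] estimate) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - estimate[i]) / Math.abs(actual[i]);
        }
        return sum / actual.length;
    }

    // Mean magnitude of error relative to the estimate: each error is divided by the estimate.
    static double mmer(double[] actual, double[] estimate) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - estimate[i]) / Math.abs(estimate[i]);
        }
        return sum / actual.length;
    }

    public static void main(String[] args) {
        double[] actual = {10, 20};    // made-up effort values
        double[] estimate = {8, 20};
        System.out.println("RMSE = " + rmse(actual, estimate));
        System.out.println("MAE  = " + mae(actual, estimate));
        System.out.println("MMRE = " + mmre(actual, estimate));
        System.out.println("MMER = " + mmer(actual, estimate));
    }
}
```

MMRE and MMER differ only in the denominator, so they diverge when a model systematically over- or underestimates.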

RQ4: Which types of challenging tasks does RozGaar (MobileApps) perform?

The next task is to work through the proposed model and flow chart. The key contributions are highlighted in the section below.

Figure 4.

Preference diagram.

Apart from the research goals and challenging tasks, the following problems are also discussed throughout this paper: how to recognize the class and UML diagram; how to identify the reusability metrics; how to predict software reusability; and how to estimate the reusability prediction through mobile apps.

3. Software prediction techniques

During the systematic survey of prediction techniques, different methods were examined and relationships were derived between the OOM (Object Oriented Metrics) and fault proneness, as summarized in tabular form. From Table 1 it can be observed that methods such as logistic regression, decision tree analysis and the Naïve Bayes classifier are commonly used by researchers.

4. Proposed framework of RozGaar mobile application (App)

The framework below has been used in mobile apps. The entire framework consists of 3 steps, which are described in Fig. 3.

Step 1: Extract all data from diverse sources and store it in the repository (i.e. central database). It contains information about the different sources (i.e. metadata).

Step 2: The requirements will be gathered from the users by using APKTOOL.

Step 3: The factors that influence the reusability will be analysed.

5. Flow diagram of main class (preferences checking)

Figure 4 depicts the state checking of the application using data stored in preferences. “Preferences” gets the constant PREFS, which stores the values for LOG_PREF and IS_LOGGED_IN; these are responsible for the initialization of tables and for checking whether a user is logged in. If the LOG_PREF value is false, the application calls initialization() to initialize the tables; this happens only once, the first time the application runs on a device. Similarly, IS_LOGGED_IN verifies whether the user is logged in. If so, the application opens the City_selection page. If not, it opens Client_Login_portal.

5.1 Algorithm for preference checking and initializing tables [Check login preferences]

(1) Let DB, LOG_STATUS, INITIAL_STATUS, USER, PASS, PREF

(2) Set DB := OpenorCreateDb(“RozGaar”)

(3) Set LOG_STATUS := false

(4) Set INITIAL_STATUS := false

(5) Set PREF := getPreferences()

(6) If PREF.getBoolean(LOG_PREF) = false, then

6.1 Initialization(DB, INITIAL_STATUS, PREF)

[End of IF Step 6]

(7) If PREF.getBoolean(IS_LOGGED_IN) = true, then

7.1 NextActivity(Src.class) [City_selection class]

7.2 Else NextActivity(Client_Login_portal class)

[End of IF Step 7]

5.2 [Initialization function]

Initialization (DB, INITIAL_STATUS, PREF)

(1) Set DB := OpenorCreateDb(“RozGaar”)

(2) Set DB := createTable(“users”)

(3) Set DB := initializeTable(“users”)

(4) Set DB := createTable(“emps”)

(5) Set DB := initializeTable(“emps”)

(6) Set INITIAL_STATUS := true

(7) Set PREF := edit(LOG_PREF, INITIAL_STATUS)
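
The preference check and the one-time initialization described in this section can be sketched in plain Java as follows. This is not the app's Android code: a Map stands in for the preference store and a Set of table names stands in for the database, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class PreferenceCheckSketch {
    static final String LOG_PREF = "LOG_PREF";
    static final String IS_LOGGED_IN = "IS_LOGGED_IN";

    // Stand-ins for the preference store and the database (assumptions).
    final Map<String, Boolean> prefs = new HashMap<>();
    final Set<String> tables = new LinkedHashSet<>();

    // Creates the tables once and records that initialization has happened.
    void initialization() {
        tables.add("users");
        tables.add("emps");
        prefs.put(LOG_PREF, true);
    }

    // Returns the name of the screen to open next.
    String nextActivity() {
        if (!prefs.getOrDefault(LOG_PREF, false)) {
            initialization();  // first run on this device
        }
        return prefs.getOrDefault(IS_LOGGED_IN, false)
                ? "City_selection"
                : "Client_Login_portal";
    }

    public static void main(String[] args) {
        PreferenceCheckSketch app = new PreferenceCheckSketch();
        System.out.println(app.nextActivity());  // user not logged in yet
        app.prefs.put(IS_LOGGED_IN, true);
        System.out.println(app.nextActivity());  // user logged in
    }
}
```

Keeping the initialization flag in the preferences means the tables are created only on the first launch, which matches the description of LOG_PREF.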

5.3 Pseudo code customer grievance

Generally, the mobile app is unable to provide information about the product, so it is unknown whether the user is satisfied. For this reason we have developed pseudo code for customer grievances, as satisfying the client is the main goal of the developer. This automated tool enables the user to review the product. The tool reviews the suggestions and grievances. Here we take the utmost care of customers' requests and responses.

(1) Set the grievance set to empty: Grievance $=$ {Ø} // Initially no grievance
(2) Collect everyone's complaints and suggestions
(3) Read all grievances of the different types
(4) Loop:
(5) Repeat for each suggestion:
(6) Inspect all the text in the suggestion
(7) If the suggestion is exactly of an existing type, then:
(8) Mark the suggestion with that specific type
(9) Else:
(10) Insert it as a new suggestion type and include it
(11) Repeat the steps with the new grievance
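
The grouping of suggestions by type can be illustrated with a short Java sketch. How two suggestions are judged to be of the same type is not specified in the text; here it is assumed, purely for illustration, that they match when their text is identical after lower-casing and collapsing whitespace. The class name GrievanceLog is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class GrievanceLog {
    // Starts empty: initially there is no grievance.
    private final Map<String, List<String>> byType = new LinkedHashMap<>();

    // Hypothetical notion of "type": normalised text of the suggestion.
    static String typeOf(String suggestion) {
        return suggestion.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    // Files the suggestion under its existing type or opens a new type.
    void add(String suggestion) {
        byType.computeIfAbsent(typeOf(suggestion), k -> new ArrayList<>()).add(suggestion);
    }

    int typeCount() {
        return byType.size();
    }

    int countFor(String suggestion) {
        List<String> entries = byType.get(typeOf(suggestion));
        return entries == null ? 0 : entries.size();
    }

    public static void main(String[] args) {
        GrievanceLog log = new GrievanceLog();
        log.add("Late arrival");
        log.add("late  arrival ");
        log.add("Overcharged");
        System.out.println(log.typeCount() + " types recorded");
    }
}
```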

6. Reusability estimation level

The reusability level can be estimated using the OO CK metric DIT. This metric measures the depth of the inheritance tree in the class diagram. It is sometimes also called the nesting level of the hierarchy. The metric can be estimated from the class diagram of any object-oriented program. It is the maximum length from the class to its root ancestor in the class hierarchy. A new formula is derived to measure the reusability in the class. As we know, the idea behind inheritance is reusability. Due to inheritance, the number of overridden methods and object references in a class indicates the reusability. Hence, inheritance is a direct indicator of reuse in the class itself.
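
For compiled Java classes, the depth of inheritance can be sketched with reflection by walking up the superclass chain. The convention used here, in which java.lang.Object has depth 0, is an assumption; metric tools may count the root differently. The class name DepthOfInheritance is hypothetical.

```java
final class DepthOfInheritance {
    // Counts the superclasses between the given class and the root of the hierarchy.
    static int dit(Class<?> type) {
        int depth = 0;
        for (Class<?> c = type.getSuperclass(); c != null; c = c.getSuperclass()) {
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Object:    " + dit(Object.class));
        System.out.println("Integer:   " + dit(Integer.class));
        System.out.println("ArrayList: " + dit(java.util.ArrayList.class));
    }
}
```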

By using least squares regression analysis, we calculate the reusability metric as follows:

$\displaystyle M=(\textit{TOVM}+\textit{TOR})$ (2)

$\displaystyle N=p+C*M$ (3)

$\displaystyle\textit{Reusability Metric}=N/M$ (4)

where $p=N-C*M$, $C$ is an empirical constant that ranges from 0.0 to 1.00 (0–100%), $T$ is the total number of methods, and $M$ is the total number of overridden methods in the class (TOVM) plus the number of object references in the class (TOR).

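import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Test {
    // Single variable or array declaration, e.g. "int[] values"
    static final String ARRAY =
        "\\w*\\s*(\\s*\\[\\s*\\]\\s*)*\\s*\\w*\\s*(\\s*\\[\\s*\\]\\s*)*";
    // Comma-separated list of declarations or arguments
    static final String ARGUMENTS = "(" + ARRAY + ",?)*";
    // Constructor header, e.g. "public Test(int a, String[] b) {"
    static final String CONS =
        "\\s*((public|private|protected)\\s+)?\\w+\\s*\\(\\s*"
        + "(\\w+(\\s*\\[\\s*\\])*\\s+\\w+(\\s*\\[\\s*\\])*\\s*,?\\s*)*\\)\\s*\\{?\\s*\\}?";
    // Class header, e.g. "public class Exp extends Test implements ActionListener {"
    static final String CLASS_HEADER =
        "\\s*(public\\s+)?class\\s+\\w+(\\s+extends\\s+\\w+)?"
        + "(\\s+implements\\s+\\w+(\\s*,\\s*\\w+)*)?\\s*\\{?\\s*\\}?";
    // Captures the class name that follows the "class" keyword
    static final Pattern CLASS_NAME = Pattern.compile("class\\s+(\\w+)");

    // Class name extraction: prints the class name found in a line, if any
    static void printClassName(String line) {
        Matcher matcher = CLASS_NAME.matcher(line);
        if (matcher.find()) {
            System.out.println(matcher.group(1));
        }
    }

    public static void main(String[] args) {
        String exprsn = CONS;  // RegEx under test; swap in ARRAY, ARGUMENTS or CLASS_HEADER
        String want = "";      // test case typed by the user
        Scanner sc = new Scanner(System.in);
        while (sc.hasNextLine()) {
            want = sc.nextLine();
            if (want.equals("exit")) {
                break;
            }
            System.out.println(Pattern.matches(exprsn, want));
        }
    }
}

// Sample class with "extends" and "implements" clauses, used as parser input
class Exp extends Test implements ActionListener {
    public void actionPerformed(ActionEvent ae) {
    }
}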

Once we have obtained the reusability, we can measure the complexity level. Complexity can be measured using a traditional or an object-oriented approach. Some of the complexity measurement techniques are Halstead measures and cyclomatic complexity. The Halstead measures depend on the numbers of operands and operators used in the source code, where operands are the identifiers and constants, and operators include keywords, etc. The automated tool calculates the complexity against the standard levels using some of these parameters. It counts the number of iterations present in the program, the total number of keywords, etc.

Table 2

Complexity of the metrics and identifying the risk level

Complexity level (range)	Risk level
01–10	A very simple program; no risk is considered
11–20	More complex; moderate risk level
21–50	Highly complex; moderate risk level
$>$ 50	Most complex; very high risk
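
The ranges in Table 2 translate directly into a lookup, sketched below. The enum name ComplexityBand and its constants are hypothetical labels for the four rows; the associated risk levels are those listed in the table.

```java
enum ComplexityBand {
    SIMPLE,          // 01-10
    MORE_COMPLEX,    // 11-20
    HIGHLY_COMPLEX,  // 21-50
    MOST_COMPLEX;    // above 50

    // Maps a complexity value to its band in Table 2.
    static ComplexityBand of(int complexity) {
        if (complexity < 1) {
            throw new IllegalArgumentException("complexity must be at least 1");
        }
        if (complexity <= 10) return SIMPLE;
        if (complexity <= 20) return MORE_COMPLEX;
        if (complexity <= 50) return HIGHLY_COMPLEX;
        return MOST_COMPLEX;
    }

    public static void main(String[] args) {
        for (int value : new int[] {4, 15, 37, 80}) {
            System.out.println(value + " -> " + of(value));
        }
    }
}
```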

Table 3

Influence of metric values on reusability

Name of the metric	Metric value	Influence on reusability
Depth of inheritance (DIT)	Increases	Decreases
Lack of cohesion metrics (LCOM)	Decreases	Increases
Number of children metrics (NOC)	Increases	Decreases
Number of line count metrics (LOC)	Increases	Decreases
Coupling between object (CBO)	Increases	Decreases
Number of methods (NOM)	Increases	Increases
// single variable or array declaration:
// \\w*\\s*(\\s*\\[\\s*\\]\\s*)*\\s*\\w*\\s*(\\s*\\[\\s*\\]\\s*)*
// multi-variable declaration or arguments:
// (\\w*\\s*(\\s*\\[\\s*\\]\\s*)*\\s*\\w*\\s*(\\s*\\[\\s*\\]\\s*)*,?)*
// constructor:
// p*\\w*\\s*\\w+\\s*\\({1}?\\s*(\\w+\\s*(\\s*\\[\\s*\\]\\s*)*\\s*\\w+\\s*(\\s*\\[\\s*\\]\\s*)*,?)*\\s*\\){1}?\\s*\\{?

1. Methods counter
2. Abstract method counter
3. Constructor counter
4. Class variable counter
5. Object variables counter
6. Inherited attributes

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Pattern;

public class Counter {
    int methods = 0;      // 1. methods
    int absMethods = 0;   // 2. abstract methods
    int cons = 0;         // 3. constructors
    int vCounter = 0;     // 4. class variables
    int ovCounter = 0;    // 5. object variables

    Counter() throws IOException {
        // Source file to scan; this path is specific to the machine used for the listing
        String path = "C:\\Users\\SURAJ\\Videos\\dfr\\FileScanner.java";
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            String s;
            while ((s = br.readLine()) != null) {
                // The loop body is empty in the listing; each line s would be
                // matched against the expressions above to update the counters.
            }
        }
    }

    public static void main(String[] args) {
        // "\\w b" needs one word character, a space and "b", so this prints false
        boolean b = Pattern.matches("\\w b", "adafvb");
        System.out.println(b);
    }
}

In the literature survey we found some important aspects of the metrics; their influence on reusability is listed in Table 3.

7. Software metrics estimation code and parser techniques from app

In this section the parser finds the metrics from the mobile application (RozGaar). Each time the parser scans the app, the object-oriented metrics (i.e. software metrics) are estimated. The parser is a search engine that determines the software metrics, from which we can predict the software reusability using novel machine learning techniques. The sample parser code shows how it scans the software code. It is a snippet developed in Java that analyzes arrays, classes, methods and constructors.

In this case, the code tests whether a token is a constructor. To test other tokens, we have to change the expression string to the corresponding RegEx (some of these expressions are listed with the code):

• The ‘exprsn’ variable contains the RegEx for matching with the tokens
• The ‘want’ variable contains the test case entered by the user to check whether the RegEx is correct

The loop continues until the user types exit. The loop is created to test all the possible combinations and cases of the input.

7.1 Regular expressions accepted by Java by using automata

The code was developed using regular expressions in Java. Its objective is to identify single and multiple variable declarations as well as constructors and methods.

7.2 Proposed software reuse code from mobile App (SRCM)

SRCM is a technique through which we can identify code reusability. How the code is reused in a mobile application is another challenge. We can estimate the percentage of code that is exclusive to a particular app and the class to which the code belongs. The proposed estimation technique is known as PCQR (the percentage of class names uniquely reused).

$\displaystyle\textit{PCQR}=1-\frac{\sum\text{Uniquely reused class}}{\sum\text{Unique class}}$ (5)

From the above equation it follows that, when PCQR is high, the number of reused class signatures is high as well.

From the literature survey, it is clear that almost all the mobile applications reused their software, code, architecture, and/or functionality. The survey pointed out that approximately 86.56% of the class signatures match the older versions of mobile apps.

7.3 Architecture and framework reused (FAR)

Not only code but also the framework and the architecture are reused in order to develop new apps within the stipulated time frame; this is referred to as FAR (Framework and Architecture Reused). It has been observed that new apps have a similar framework and architecture, in which a list of classes and methods are the same and exhibit the same functionality. The following numerical model has been proposed for this task:

X and Y are two classes whose signatures are similar. If the set functionality of class X is similar to that of class Y, then:

$\displaystyle\text{Limited(X,Y)}=\frac{|\text{s(X)}\cap\text{s(Y)}|}{|\text{s(X)}|,\text{s(Y)}}$ (6)

s(X) is the set of signatures in app X.

A high reuse figure means that most class prototypes are reused in the new apps.

Figure 5.

Client login portal.

Figure 6.

Welcome screen.

Figure 7.

Mail or call us.

Figure 8.

Service provider.

Figure 9.

Selection area.

Figure 10.

Easy hire.

8. Android application screenshots

The figures mentioned below are screenshots of the MobileApps. The user has to log in to the MobileApps by providing a user ID and password (Fig. 5). Once the user has logged in, a welcome screen appears (Fig. 6).

Figures 7 and 8 are intended for job seekers, who can call or mail the employers, so that different types of services can be chosen from the Apps.

Figure 11.

Proposed software reuse proneness prediction.

9. Proposed software reused prediction model

In this section, different prediction and classification algorithms, including the decision tree (DT) algorithm, Logistic Regression (LR), Logarithmic Regression (LRR), Naïve Bayes (NB), Pearson regression (PR), Support Vector Machine (SVM), Multivariate Adaptive Regression Spline (MARS), Artificial Neural Network (ANN), and Adaptive Genetic Algorithm (AGA) based ANN, are discussed for reusability prediction (Fig. 11).

9.1 Adaptive genetic algorithm (AGA)

Genetic Algorithm (GA) is an adaptive search method for finding optimal or near-optimal solutions, based on the evolutionary idea of natural selection. The fundamental concept of GA is to simulate the processes in natural systems that drive evolution, specifically those that follow Charles Darwin’s principle of survival of the fittest. In terms of procedural flow, GA first generates the initial population randomly, where the population is a set of candidate solutions. Each solution is a chromosome in the form of a binary string in which all the parameters are encoded. After generating the population, GA evaluates the fitness function of each chromosome. Based on the resulting fitness values, offspring are produced using the genetic operators crossover and mutation. By applying these operators, successive generations of the population are produced iteratively until the stopping criteria are satisfied and an optimal solution is reached.

As illustrated in Fig. 3, the proposed ANN model has an $i-h-o$ network configuration, with an input layer, a hidden layer and an output layer of neurons. In the proposed ANN model, all six CK metrics (feature vectors) are fed as input to the individual input nodes, where each feature vector holds the metric values for the classes available in the datasets. Given Fig. 3 and the corresponding network configuration, N weights have to be estimated. Mathematically, the number of weights is:

$\displaystyle N=(i+o)*h$ (7)

Here, each individual weight, which is treated as a gene in the chromosomes of the AGA, is a real number. If the gene length (the number of digits) is $l$, the length of the chromosome $L_{\textit{Chrom}}$ can be estimated by the following expression:

$\displaystyle L_{\textit{Chrom}}=N*l=(i+o)*h*l$ (8)
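
As a worked example of Eqs. (7) and (8), the sketch below computes the number of weights and the chromosome length. The six inputs correspond to the six CK metrics; the hidden-layer size, the single output neuron and the gene length of five digits are assumed values chosen only for illustration.

```java
class ChromosomeLength {
    // Eq. (7): number of weights for a network with i inputs,
    // h hidden neurons and o outputs.
    static int weightCount(int i, int h, int o) {
        return (i + o) * h;
    }

    // Eq. (8): chromosome length when every weight (gene) uses l digits.
    static int chromosomeLength(int i, int h, int o, int l) {
        return weightCount(i, h, o) * l;
    }

    public static void main(String[] args) {
        int i = 6; // six CK metrics as inputs
        int h = 4; // assumed hidden-layer size
        int o = 1; // assumed single output neuron
        int l = 5; // assumed gene length in digits
        System.out.println("N = " + weightCount(i, h, o));               // N = 28
        System.out.println("L_Chrom = " + chromosomeLength(i, h, o, l)); // L_Chrom = 140
    }
}
```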

9.2 Rough set analysis

Pawlak [15] introduced rough set analysis (RSA) as a general technique for approximating a conventional set. In general, preliminary information about the data is required before a data set is preprocessed, but rough set analysis requires no supplementary information about the data. The intention of this analysis is to find hidden patterns in the data set: it allows sets of decision rules to be generated automatically from the data, and it is suitable for parallel or distributed processing.

The sequential steps of the rough set analysis method are as follows:

Pass 1: Feature information gathering

In this stage, the features extracted from the CK metrics for every class are acquired.

Pass 2: Information discretization

In this stage, the data are discretized by means of the K-means clustering algorithm.

Pass 3: Lower/upper approximation for all feasible data sets

In this phase, the lower and upper approximations are obtained. The lower approximation is the union of all the sets (from Pass 2) that are contained in X.

Mathematically,

$\displaystyle\underline{B}X=\{x_{i}\in U|[x_{i}]_{\textit{Ind}(B)}\subset X\}$ (9)

Here, the upper approximation corresponds to the union of every set that has a non-empty intersection with X.

Mathematically,

$\displaystyle\overline{B}X=\{x_{i}\in U|[x_{i}]_{\textit{Ind}(B)}\cap X\neq\emptyset\}$ (10)

Pass 4: Compute the approximation accuracy of the possible sets

The accuracy of the approximation of $X$ with respect to $B\subseteq A$ is given by the following equation:

$\displaystyle\mu_{B}=\frac{\textit{Card}(\underline{B}X)}{\textit{Card}(\overline{B}X)}$ (11)

Here, the cardinality of a set is the total number of objects present in the lower or upper approximation of $X$.

Pass 5: Selection of the approximated sets

In this phase, the possible sets are selected such that their individual accuracy equals the accuracy of the common set.

Pass 6: Data set selection

In this phase, the retrieved set with the smallest possible cardinality is chosen as the reduced set and is then used for classification.
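
The approximation steps of Passes 3 and 4 can be sketched as follows. The equivalence classes of Ind(B), the target set X and the class name `RoughSetApprox` are hypothetical and serve only to illustrate Eqs. (9) to (11); they are not taken from the RozGaar data.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class RoughSetApprox {
    // Eq. (9): union of the equivalence classes fully contained in X.
    static Set<String> lower(List<Set<String>> classes, Set<String> x) {
        Set<String> result = new TreeSet<>();
        for (Set<String> c : classes) {
            if (x.containsAll(c)) {
                result.addAll(c);
            }
        }
        return result;
    }

    // Eq. (10): union of the equivalence classes that intersect X.
    static Set<String> upper(List<Set<String>> classes, Set<String> x) {
        Set<String> result = new TreeSet<>();
        for (Set<String> c : classes) {
            if (!Collections.disjoint(c, x)) {
                result.addAll(c);
            }
        }
        return result;
    }

    // Eq. (11): ratio of the two cardinalities; NaN when the upper
    // approximation is empty and the ratio is therefore undefined.
    static double accuracy(List<Set<String>> classes, Set<String> x) {
        int up = upper(classes, x).size();
        return up == 0 ? Double.NaN : (double) lower(classes, x).size() / up;
    }

    public static void main(String[] args) {
        // Hypothetical partition of five software classes c1..c5.
        List<Set<String>> classes = List.of(
            Set.of("c1", "c2"), Set.of("c3"), Set.of("c4", "c5"));
        Set<String> x = Set.of("c1", "c2", "c4"); // hypothetical target set X
        System.out.println(lower(classes, x));    // [c1, c2]
        System.out.println(upper(classes, x));    // [c1, c2, c4, c5]
        System.out.println(accuracy(classes, x)); // 0.5
    }
}
```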

9.3 Decision tree algorithm

Decision tree based classification has been in use for a long time [16, 17], and various enhancements have been incorporated. Decision trees remain a prominent machine learning technique in current research and belong to the supervised learning category. They accept both categorical and continuous input and output variables. Two well-known variants are C4.5 and C5.0, which are related to association rules and are widely used for mining and classification. In this paper, the C4.5 decision tree algorithm [18] has been applied; it uses recursive partitioning of the metrics data to classify classes as REUSABLE or NON-REUSABLE.
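
C4.5 selects each partition by the gain ratio of a candidate split. The sketch below evaluates that criterion for one threshold on a single metric; the metric values, labels, threshold and the class name `GainRatio` are hypothetical, and the code illustrates the split criterion only, not the full tree induction used in the paper.

```java
class GainRatio {
    // Shannon entropy (base 2) of a two-way distribution given as counts.
    static double entropy(int a, int b) {
        int n = a + b;
        double e = 0.0;
        for (int c : new int[] {a, b}) {
            if (c > 0) {
                double p = (double) c / n;
                e -= p * Math.log(p) / Math.log(2);
            }
        }
        return e;
    }

    // Gain ratio of the binary split "value <= threshold".
    static double gainRatio(double[] values, boolean[] reusable, double threshold) {
        int n = values.length;
        int leftPos = 0, leftNeg = 0, rightPos = 0, rightNeg = 0;
        for (int k = 0; k < n; k++) {
            if (values[k] <= threshold) {
                if (reusable[k]) leftPos++; else leftNeg++;
            } else {
                if (reusable[k]) rightPos++; else rightNeg++;
            }
        }
        int left = leftPos + leftNeg;
        int right = rightPos + rightNeg;
        if (left == 0 || right == 0) {
            return 0.0; // the threshold does not separate the data
        }
        double parent = entropy(leftPos + rightPos, leftNeg + rightNeg);
        double children = ((double) left / n) * entropy(leftPos, leftNeg)
                        + ((double) right / n) * entropy(rightPos, rightNeg);
        double splitInfo = entropy(left, right);
        return (parent - children) / splitInfo;
    }

    public static void main(String[] args) {
        double[] cbo = {1, 2, 8, 9};                    // hypothetical CBO values
        boolean[] reusable = {true, true, false, false}; // hypothetical labels
        // The threshold 5 separates the two labels perfectly here.
        System.out.println(gainRatio(cbo, reusable, 5.0)); // 1.0
    }
}
```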

9.4 Logistic regression

Logistic regression is a regression analysis technique typically applied to predict the outcome of a dependent variable on the basis of one or more independent variables [19]. Typically, the dependent variable can take only two values; therefore, the dependent variable describing the reusability of a software component or class is split into two groups, where one group contains the non-reusable components and the other contains the components that are reused at least once. In this paper, the LR technique has been used to build a prediction model that assesses the reuse proneness of the classes in the web of service software. Here, selected CK metrics have been used in combination. Mathematically, LR can be represented by the following equation:

$\displaystyle\operatorname{logit}[\pi(x)]=\beta_{0}+\beta_{1}X_{1}+\beta_{2}X_{2}+\ldots+\beta_{m}X_{m}$ (12)

Here, $\operatorname{logit}[\pi(x)]$ and $X_{i}$ denote the dependent and the independent variables, respectively. This shows that LR is a standard linear regression approach in which the dichotomous outcome is converted by the logit transform. So that it can be treated as a linear regression, the logit transform maps the range of $\pi(x)$ from $(0,1)$ to $(-\infty,+\infty)$. The variable $m$ denotes the number of independent variables, while $\pi$ refers to the probability of reuse proneness of the class during validation. Mathematically, it is given as follows:

$\displaystyle\pi(x)=\frac{e^{\beta_{0}+\beta_{1}X_{1}+\beta_{2}X_{2}+\ldots+\beta_{m}X_{m}}}{1+e^{\beta_{0}+\beta_{1}X_{1}+\beta_{2}X_{2}+\ldots+\beta_{m}X_{m}}}$ (13)
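
Eqs. (12) and (13) can be evaluated as in the following sketch. The coefficient values, the two metric values and the class name `LogisticModel` are hypothetical; in practice the coefficients would be fitted to the CK metrics data.

```java
class LogisticModel {
    // Eq. (12): linear predictor beta[0] + beta[1]*x[0] + ... + beta[m]*x[m-1].
    static double logit(double[] beta, double[] x) {
        if (beta.length != x.length + 1) {
            throw new IllegalArgumentException("beta needs one more entry than x");
        }
        double z = beta[0];
        for (int j = 0; j < x.length; j++) {
            z += beta[j + 1] * x[j];
        }
        return z;
    }

    // Eq. (13): pi(x) = e^z / (1 + e^z), written in the algebraically
    // equivalent form 1 / (1 + e^(-z)).
    static double probability(double[] beta, double[] x) {
        return 1.0 / (1.0 + Math.exp(-logit(beta, x)));
    }

    public static void main(String[] args) {
        double[] beta = {-1.0, 0.5, 0.25}; // hypothetical coefficients
        double[] x = {2.0, 0.0};           // hypothetical metric values
        System.out.println(logit(beta, x));       // 0.0
        System.out.println(probability(beta, x)); // 0.5
    }
}
```

A class would then be labelled reuse-prone when the probability exceeds a chosen cut-off, for example 0.5, which is again an assumption rather than a value stated in the text.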

Table 4

Logistic regression performance

SLNO	Techniques	Accuracy	F-measures
1	Logistic regression	87.91%	91.85%
2	Decision tree	93.41%	97.45%
3	Naive Bayes	81.32%	87.94%
4	Linear regression	85.62%	88.06%
5	Logistic regression	86.91%	90.58%
6	Polynomial regression	91.26%	94.81%

Figure 12.

Performance measurement of ML algorithms in RozGaar.

10. Simulation

This paper presents a prototype model developed in Java for test and validation purposes. In addition, the app was developed in two versions: a mobile app and a website. The minimum requirements were as follows: 24 MB of RAM and an Intel Q8400 2.66 GHz processor. Other parameters include the population size, the crossover and mutation rates, and the chromosome length, all of which can be set through the system.

In this research work, all algorithms were implemented using the MATLAB 2015a software tool. Since the proposed work intends to assess reusability for web of service software developed on the object-oriented design paradigm, the WSImport tool was first applied to convert the web of service software projects into Java files. Once the projects had been converted into Java files, the metrics of the respective classes were obtained using the CKJM tool. CKJM estimates the object-oriented CK metric values, specifically the WMC, DIT, NOC, CBO, RFC, and LCOM metrics. Once the CK metric values had been retrieved, linear univariate regression was applied to estimate the reusability threshold values for the data. This was followed by feature extraction, RSA-based feature reduction and cluster validation, and then by reusability prediction using the different prediction techniques. The prediction outcomes were obtained in terms of a confusion matrix, from which the performance of the respective classifiers was derived in terms of reusability prediction accuracy and F-Measure (see Table 4).
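
The step from a confusion matrix to the two reported measures can be sketched as follows, assuming the standard definitions of accuracy and F-Measure. The counts and the class name `ConfusionMetrics` are hypothetical and are not the paper's data.

```java
class ConfusionMetrics {
    // Accuracy: share of correctly classified classes.
    static double accuracy(int tp, int fp, int fn, int tn) {
        return (double) (tp + tn) / (tp + fp + fn + tn);
    }

    // F-Measure: harmonic mean of precision and recall.
    static double fMeasure(int tp, int fp, int fn) {
        double precision = (double) tp / (tp + fp);
        double recall = (double) tp / (tp + fn);
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Hypothetical counts: true/false positives and negatives for
        // the REUSABLE label.
        int tp = 40, fp = 10, fn = 10, tn = 40;
        System.out.println(accuracy(tp, fp, fn, tn)); // 80 of 100 correct
        System.out.println(fMeasure(tp, fp, fn));     // approximately 0.8
    }
}
```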

10.1 Results

We have derived the confusion matrix from our MobileApps (RozGaar). From Table 4 it can be concluded that the decision tree (DT) achieves the highest performance for the MobileApps, with an accuracy of 93.41% and an F-Measure of 97.45%. In Table 4, logistic regression achieves an accuracy of 87.91% and an F-Measure of 91.85%, which is the second highest performance of the Apps. Naïve Bayes gives the lowest performance, with an accuracy of 81.32% and an F-Measure of 87.94%. Figure 12 indicates that DT induction provides better results than the other ML (machine learning) algorithms: accuracy is 93.41% and F-Measure is 95.45%.

In order to forecast the job scenarios from the Apps (RozGaar) accurately, this paper used ML algorithms. Algorithms such as DT induction and AGA (Adaptive Genetic Algorithm) are used to optimize prediction. The AGA is used to select the model parameters of the SVM (Support Vector Machine) in order to obtain better prediction performance. We have proposed a model that predicts job opportunities from the mobile Apps. The data sample comprised 54 web-based application projects (most of them Android Apps), which were tested. Finally, the decision tree algorithm provides the highest accuracy (Table 4).

11. Conclusion and future scope

The system activities can be divided into two major parts, clients and service providers, although administrative services also exist to maintain the RozGaar application. Each has its own role to perform, and the system responds accordingly. Our developed MobileApps achieve a prediction accuracy of 93.41% and an F-Measure of 95.45%. In the future, researchers can use several machine learning algorithms to predict more jobs. Researchers can use suitable algorithms such as SVM (Support Vector Machine), Polynomial Regression, and GA-ANN, which may be more accurate. Furthermore, the AGA-SVR model gives a better forecasting performance than the other models, such as SVM, rough set analysis, and linear regression. Consequently, AGA (Adaptive Genetic Algorithm) and DT (decision tree) induction can be considered efficient alternative methods for forecasting jobs.

References

1. Columbus, Roundup of Mobile Apps and App Store Forecasts, http://www.forbes.com/sites/louiscolumbus/2013/06/09/roundup-of-mobile-apps-app-store-forecasts-2013/, Retrieved June, 2013.

2. Minelli and Lanza, Software analytics for mobile applications insights and lessons learned, in: European Conference on Software Maintenance and Reengineering (CS ’13), IEEE, 2013, pp. 144–153.

3. Ray, Wilcox and Woskoglou, Developer Economics – State of the Developer Nation Q3 2016, Vision Mobile, London, Tech Rep, (July 2015).

4. Tian, Nagappan and Hassan A.E., What are the characteristics of high-rated apps? A case study on free android applications, in: International Conference on Software Maintenance and Evolution, (ICSME ’15), IEEE, 2015, pp. 301–310.

5. Padhy, Singh R.P. and Satapathy, Utility of an object-oriented metrics component: Examining the feasibility of .Net and C# object-oriented program from the perspective of mobile learning, International Journal of Mobile Learning and Organization 12(3) (2018), 263–279, DOI: 10.1504/IJMLO.2018.092777.

6. Lee, Schneider and Schell, Mobile Applications: Architecture, Design, and Development, 1st edition, Prentice Hall, 2004.

7. Chidamber S.R. and Kemerer C.F., A metrics suite for object oriented design, IEEE Transactions on Software Engineering 20 (1994), 476–493, DOI: 10.1109/32.295895, ISSN: 0098-5589.

8. Heitlage, Kuipers and Visser, A practical model for measuring maintainability, Proceedings of the 6th International Conference on Quality of Information and Communications Technology, 2007, pp. 30–39.

9. Washizaki, Yamamoto and Fukazawa, A metrics suite for measuring reusability of software components, in: Proceedings of the 9th Software Metrics Symposium, 2003, pp. 211–223.

10. Bertoa M.F., Troya J.M. and Vallecillo, Measuring the usability of software components, The Journal of Systems and Software 79 (2006), 427–439.

11. Reussner R.H., Schmidt H.W. and Poernomo I.H., Reliability prediction for component-based software architectures, The Journal of Systems and Software 66 (2003), 241–252.

12. Haefliger, Von-Krogh and Spaeth, Code reuse in open source software, Management Science 54(1) (2008), 180–193.

13. Mohagheghi, Conradi, Killi O.M. and Schwarz, An empirical study of software reuse vs. defect-density and stability, in: Proceedings of the 26th International Conference on Software Engineering, 2004, pp. 282–291.

14. Taibi, On measuring the reusability proneness of mobile applications, World Academy of Science, Engineering and Technology, International Science Index 91, International Journal of Computer, Electrical, Automation, Control and Information Engineering, 8(7) (2014), 1251–1259.

15. Pawlak, Rough sets, International Journal of Computer and Information Sciences 11(5) (1982), 341–356.

16. Han, Kamber and Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2011.

17. Kuncheva L.I., Combining Pattern Classifiers: Methods and Algorithms, Wiley-Interscience, 2004.

18. Ting K.M., An instance-weighting method to induce cost-sensitive trees, IEEE Transactions on Knowledge and Data Engineering 14(3) (2002), 659–665.

19. Basili V.R., Briand L.C. and Melo W.L., A validation of object-oriented design metrics as quality indicators, IEEE Transactions on Software Engineering 22 (October 1996), 751–761.

20. Padhy, Panigrahi and Baboo, The statistical measurement of an object-oriented programme using an object oriented metric, 328 (2015), 605–618, DOI: https://doi.org/10.1007/978-3-319-12012-6_67, Springer.

21. Padhy, Satapathy and Singh R.P., State-of-the-art object oriented metrics and its Reusability: A decade review, Springer Nature, Smart Innovation Systems and Technologies 77 (2018), Springer, Singapore, DOI: 10.1007/978-981-10-54544-7_42, Print ISBN 978-981-10-5543-0.

22. Tian, Nagappan and Hassan A.E., What are the characteristics of high-rated Apps? A case study on free android applications, in: ICSME ’15 Proceedings of the 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), IEEE Computer Society Washington, DC, USA, September 29–October 01, 2015, pp. 301–310, DOI: 10.1109/ICSM.2015.7332476.

23. Khan M.A. and Mahmood, Measuring Flexibility in Software Project Schedules, Arabian Journal of Science and Engineering 40(5) (2015), 1343–1358, DOI: https://doi.org/10.1007/s13369-015-1597-x.

24. ArunKumar and Dillibabu, Design and application of new quality improvement model: Kano lean six sigma for software maintenance project, Computer Engineering and Computer Science 41(3) (March 2016), 997–1014.

25. Padhy, Satapathy S.C. and Singh R.P., Software reusability metrics estimation: Algorithms, models and optimization techniques, Computers and Electrical Engineering 69 (July 2018), 653–668, DOI: https://doi.org/10.1016/j.compeleceng.2017.11.022, Elsevier.

26. Padhy, Singh R.P. and Satapathy S.C., Enhanced evolutionary computing based artificial intelligence model for web-solutions software reusability estimation, Cluster Computing (2017), 1–17, DOI: https://doi.org/10.1007/s10586-017-1558-0, Springer.

27. Abaei, Selamat and Fujita, An empirical study based on semi-supervised hybrid self-organizing map for software fault prediction, Knowledge Based Syst 74 (2015), 28–39.

28. Erturk and Sezer E.A., A comparison of some soft computing methods for software fault prediction, Expert Syst Appl 42(4) (2015), 1872–1879.

29. Goyal, Chandra and Singh, Suitability of KNN regression in the development of interaction based software fault prediction models, IERI Procedia 6 (2014), 15–21.

30. Erica Cruz and Ochimizu, Towards logistic regression models for predicting fault prone code across software projects, in: 3rd International Symposium on Empirical Software Engineering and Measurement, ESEM, 2009, pp. 460–463.

31. Burrows, Ferrari F.C., Lemos O.A., Garcia and Taiani, The impact of coupling on the fault-proneness of aspect-oriented programs: An empirical study, in: 2010 IEEE 21st International Symposium on Software Reliability Engineering (ISSRE), 2010, pp. 329–338.

32. Aggarwal, Singh, Kaur and Malhotra, Empirical validation of object-oriented metrics for predicting fault proneness models, Softw Qual J 18(1) (2010).

33. Zhou and Leung, On the ability of complexity metrics to predict fault-prone classes in object-oriented systems, J Syst Softw 83(4) (2010), 660–674.

34. Fokaefs, Mikhaiel, Tsantalis, Stroulia and Lau, An empirical study on web service evolution, in: IEEE International Conference on Web Services (ICWS), 2011, pp. 49–56.

35. Dong and Wu, Adaptive cascade deep convolution neural networks for face alignment, Computer Standards Interface 42 (2015), 105–112.

36. Mishra, Shukla K.K. et al., Defect prediction for object oriented software using support vector based fuzzy classification model, Int J Computer Appl 60(15) (2012), 8–16.

37. Padhy, Singh R.P. and Satapathy S.C., Cost-effective and fault-resilient reusability prediction model by using adaptive genetic algorithm based neural network for web-of-service applications, Cluster Computing (2018), 1–23, DOI: https://doi.org/10.1007/s10586-018-2359-9, Springer.

38. Grefenstette J.J., Moriarty D.E. and Schultz A.C., Evolutionary algorithms for reinforcement learning, arXiv preprint arXiv: 1106.0221, 2011.

39. Whiteson, Evolutionary computation for reinforcement learning, in: Reinforcement Learning: State of the Art, Springer, Berlin, Germany, 2012, pp. 325–355.

40. McGhan C.L.R., Nasir and Atkins, Human intent prediction using markov decision processes, in: Proc Infotech@Aerospace Conference, 2012.

41. Jacob S.M. and Issac, The mobile devices and its mobile learning usage analysis, ArXiv preprint arXiv: 14104375, (2014).

42. Negahban and Chung C.-H., Discovering determinants of user’s perception of mobile device functionality fit, Computers in Human Behavior 35 (2014), 75–84.

43. Yang Y.R., Wang H.-C. and Xin Y.-H., Grey relational analysis model software quality assessment with triangular fuzzy information, International Journal of Knowledge-based and Intelligent Engineering Systems 21(2) (2017), 97–102, DOI: 10.3233/KES-170355.