Abstract
Factual scenario analysis of a judgement is critical to judges during sentencing. With the increasing number of legal cases, professionals typically endure heavy workloads on a daily basis. Although a few previous studies have applied information technology to legal cases, according to our research, no prior studies have predicted a pending judgement using legal documents. In this article, we introduce an innovative solution to predict relevant rulings. The proposed approach employs text mining methods to extract features from precedents and applies a text classifier to automatically classify judgements according to sentiment analysis. This approach can assist legal experts or litigants in predicting possible judgements. Experimental results from a judgement data set reveal that our approach is a satisfactory method for judgement classification.
1. Introduction
With the rapid development of information dissemination and increasing socioeconomic changes, a variety of illegal activities occur on a daily basis. When people’s rights are infringed upon, they seek legal recourse, which has led to an increasing number of lawsuits. When a judge issues a judgement, he or she must decide impartially between the two parties. Limited resources present a challenge for judges to maintain an objective and unbiased stance when managing a large number of cases.
The Internet has become a new service tool for legal experts and lay people alike to search for or acquire information. Various automatic information systems have been developed for specific information requirements. E-discovery is an emerging area of legal service that involves the discovery of relevant electronic legal information [1]. Moreover, online legal database systems such as Beck-Online and Westlaw International provide legal material search services. Even though these systems can search for related information using query methods, they cannot aid legal experts in predicting associated rulings according to related evidence.
Traditionally, a judge issues judgements on the basis of his or her knowledge, experience, personality and emotions. Because the amount of litigation has increased, judges now face two problems. First, because processing detailed evidence from numerous cases is difficult, the judgements may become more subjective. Second, with an increasing workload, a judge may be overwhelmed and may be unable to deliver a quality judgement. These two problems can be remedied if a judge can be provided with references regarding possible judgements and relevant articles that may have been violated. This type of prediction may also help litigants take appropriate actions before going to court.
Text mining has been widely applied in various disciplines. Only a few studies, however, have examined text mining in the legal field [2–8]. Furthermore, several online forums for legal systems are available that provide news, insights and professional discussions (Above the Law, Legal Informatics Blog and The Crime Report). Although these studies and forums can help professionals or lay people retrieve or classify legal documents or determine legal statutes, none of them were developed to assist judges in sentencing or litigants in predicting a possible ruling.
This study developed a framework to assist professionals in judgement prediction. Using a corpus-based semantic approach [9,10], we predicted the possible judgement category of a legal problem according to three features: relevant articles, sentiment analysis of criminal facts and the term of imprisonment. Within a precedent, the motivation and thought process of a criminal are described by several words or terms, some of which are sentimental or emotional. Examples of sentiment terms are ‘deliberately’, ‘cruel’ and ‘quite disgusted’. A unique characteristic is that the degree of sentiment of criminal facts has an influence on the judge’s sentence. The National Taiwan University Sentiment Dictionary (NTUSD) [11], a Chinese sentiment dictionary, was utilised for determining sentiment terms in legal documents. This custom-designed prediction method can be divided into two processes: training and testing. In the training process, two outputs are produced for application in the testing process. The first output, an article classification model, is generated by employing a multilabel text classifier implemented by a support vector machine (SVM) algorithm to match legal cases to articles. The second output, a judgement classification model, classifies legal cases into judgement categories. Cases are transformed into a set of vectors by combining the sentiment and punishment features, and these vectors are then used to train an SVM classifier. In the testing process, Phase 1 adopts the article classification model for classifying the target cases into the top k articles that are most relevant. In Phase 2, using all the top k articles, we applied the judgement classification model and considered the sentiment analysis of each crime’s motivation and the extent of the penalty. In addition, cross-validation (CV) was used in our evaluation. We applied a fourfold CV scheme to the data collection.
This study developed a framework that emphasises the crime sentiment scenario, infringement of articles and punishments incorporated within criminal judgements. On the basis of these features, the proposed method in this study can satisfactorily predict the judgements of legal cases. The contributions of this study are as follows:
This study is motivated by the practical needs of legal experts. According to our research, this two-phase model approach to judgement prediction has not previously been applied in the legal field.
This study presents an innovative approach that can predict possible judgements for legal issues that concern the public. The essential part of this approach involves determining the degree of criminal behaviour intention according to legal documents.
The experimental results demonstrate that the proposed method operates efficiently and achieves high accuracy when it has been trained with a large number of cases.
The remainder of this article is organised as follows. The relevant literature is first reviewed and described in section 2. The proposed method for judgement prediction is presented in section 3. In section 4, the experimental results and evaluations are discussed. Finally, conclusions, limitations, and future research suggestions are presented in section 6.
2. Literature review
2.1. Background
In the legal system, a judgement is a formal decree issued by a court of law at the conclusion of a lawsuit. The court delivers the judgement, which is the country’s impartial administration of justice. Therefore, its importance is self-evident. A judge has a variety of options in rulings, such as providing an adequate remedy for the plaintiff in a civil matter or imposing a sentence upon a defendant found guilty in a criminal case. A precedent consists of several sections: the file number, the name of the accused, the name of the counsel, the cause of action, the main body of the court judgement, the facts and evidence, the cited articles, the date of judgement and the name of the judge. The facts and evidence portion describes the criminal motivation, process and purpose. Using the cited articles, the judge issues a fair and just judgement to the defendant.
Because a precedent is a content-rich court declaration, it contains valuable features that can be further utilised. In this study, a collection of criminal precedents was used for model training and testing. Three parts of a criminal precedent were pertinent: the facts and evidence, the cited articles and the judgement. The semantics of the words and representative terms were drawn from the facts of the precedent. In addition, the cited articles and the sentence can be used as judgement classification labels at different stages of the proposed approach. When training a classifier, the cited articles and the judgement are two crucial features that are incorporated into a classification model, which can predict the degree of punishment of the judgement.
2.2. Sentiment analysis approaches and applications
In many applications, sentiment analysis is used to determine the sentiments, emotions and attitudes of a user or text [12–15]. Sentiment analysis involves determining whether a given text is subjective (presents a positive or negative sentiment) or objective and can be treated as a two-class classification problem when classifying positive or negative sentiments [9,16–18]. In addition, some variations have been proposed, such as cross-lingual sentiment analysis [19] and the combination of the psychological and linguistic features [20]. Bohlouli et al. [21] presented a social media analysis platform using big data technology, and sentiment analysis is employed to discover knowledge from social media. Several studies have attempted to classify emotions, such as happiness, sadness, anger and joy, instead of sentiments [22–24]. Some studies have investigated emoticons in sentiment classification [25,26]. Machine learning approaches have been widely employed to cluster and classify texts or documents into positive or negative categories, which include naive Bayes, k-means, maximum entropy and SVM [9,27,28].
In general, two types of approaches have been proposed to ascertain the sentiment of words or phrases in terms of semantic orientation determination: corpus-based and dictionary-based [29]. Corpus-based techniques exploit the interword co-occurrence relationships in large corpora to determine their sentiments. Several studies have investigated sentiment determination. For example, Baroni and Vegnaduzzo [30] used a seed list of known subjective adjectives to calculate the subjectivity score of target adjectives by computing their mutual information using the pointwise mutual information (PMI) technique [31]. Wilson et al. [32] used conjunctions, local negations (i.e. the presence of a negative word before a polar expression) and dependency trees to classify the contextual sentiment of given expressions, which helped to significantly improve the baseline for phrase-level sentiment classification tasks.
Dictionary-based techniques use dictionaries (e.g. WordNet) to compile sentiment terms by listing synonyms and antonyms for each word [10]. Kamps et al. [33] presented a distance-based WordNet approach to determine the semantic orientation of a given adjective according to the distance of a given word from two selected reference term sets (i.e. good and bad). Esuli and Sebastiani [34] developed an automatic subjectivity lexicon resource, SentiWordNet, based on WordNet. SentiWordNet assigns three numerical scores (Obj(s), Pos(s) and Neg(s)) to each synset of WordNet that describes how objective, positive and negative the terms in the synset are. In recent years, several sentiment classification studies have adopted SentiWordNet as their lexicon [35–38].
Because this study is domain specific and the data set consists of Chinese words, we employed a corpus-based approach to determine the sentiments of terms and computed their mutual information using the Chinese sentiment dictionary, NTUSD [11].
2.3. Text mining applications in the legal field
As the number of digital legal documents continues to increase rapidly, the enormous amount of content often overwhelms people and interferes with legal information management. Satisfying information needs, for example by analysing precedents, requires considerable effort and expertise because of the domain knowledge and the variety of terminology involved. Therefore, to reduce costs, a comprehensive framework and methodology should be designed for finding and retrieving associated legal information that meets user needs.
The application of information technology in the legal domain is a recent practice. Until now, little work has been conducted on this subject. An opinion mining application regarding legal web blogs was evaluated using sentiment analysis [2]. Chen and Chi [3] aimed to retrieve the most similar historical judgements for a prosecutor using police criminal investigation documents. Wyner et al. [4] provided an overview of recent approaches in text mining to automatically profile and extract arguments from legal cases. Chou and Hsing [5] presented a legal document classification, clustering and search methodology based on neural network technology to help law enforcement manage written judgements of criminals.
Similarly, Joshi et al. [1] presented a methodology for improving legal service capabilities by applying classification techniques to retrieve relevant legal information. Whereas most studies have been concerned with supporting professionals in retrieving or managing related legal documents, Chen et al. [6] presented an approach that the public could use to retrieve relevant judgements using everyday terms as queries. Moreover, they presented another approach that could predict relevant statutes regarding legal problems [8]. Truyens and Eecke [7] discussed the legal issues encountered during the assembly of texts into so-called ‘corpora’, as well as the application of such corpora.
Previous studies have been designed to support users in retrieving or classifying legal documents. This study differs from previous research in that our major focus is to propose a methodology for providing pertinent judgement predictions regarding legal problems in order to assist professionals.
3. Research design
According to corpus delicti, judgements issued by a judge are based on the articles violated and their severities. Because the judgement typically depends on the articles and the judge’s perspective, court decisions are not based on any absolute standard. Accordingly, designing automatic methods that can determine possible rulings for a legal case is challenging. For design purposes, however, three unique features in a judgement suggest the need to develop a custom-designed approach:
Law violation. In a guilty verdict, the judgement incorporates at least one cited article. An article is regarded as a label in a judgement prediction classification model. In other words, each precedent is annotated with multiple labels.
Punishment. An article is a paragraph of text that contains the terms of imprisonment, which is a time range whose length depends on the severity of the offence. The judge’s sentence is declared in the judgement.
Factual scenario. Within the facts of the crime, the circumstances of the crime can be described in emotional or sentimental words. In general, the more negative the orientation is, the more criminal the motivation or behaviour is.
To predict judgements accurately, we designed a two-phase classification method based on these three traits. The design architecture (i.e. training and testing procedure) is illustrated in Figures 1 and 2. The data collection includes historical criminal judgements, most of which are used for training; the rest are used for testing in automatic prediction. In this framework, Phase 1 determines the top k applicable articles to the test judgements, whereas Phase 2 ascertains the judgement class label from both relevant articles found in Phase 1 and the calculation of the test judgements’ sentiment orientations.

Figure 1. Conceptual training diagram of judgement prediction.

Figure 2. Conceptual testing diagram of judgement prediction.
As shown in Figure 1, four distinguishing features compose the judgement training vector: judgement category, sentiment score, punishment period (i.e. average period, total period) and a set of cited articles A1 … Ar:
Judgement category. A judge’s decision determines the guilt or innocence of the accused. Consequently, the sentence is recorded as the main judgement (i.e. principal sentence) in the written judgement. Because of its importance, we adopted it to examine its effect on prediction.
Sentiment score. Because a judgement consists of multiple sentiment terms, the degree of criminal intention can be determined using document sentiment detection. Hence, the judgement’s sentiment score is crucial when training a model.
Punishment period. In accordance with the criminal code in Taiwan, offences are sentenced according to the penalty provisions of the articles. Because imprisonment terms cannot be easily separated from the judgement, we set the imprisonment term for each offence equal to the average period of punishment in the article. When several terms of imprisonment were declared for several offences, we incorporated two punishment features in our model: (1) average period – the average imprisonment term for several punishments and (2) total period – the sum of the maximal imprisonment term of each punishment. For example, assume that a judgement cited three articles: A1, A2 and A3. Table 1 shows the imprisonment range and average penalty period of these three cited articles. The sentences of these three offences are 9 months, 2 years and 6 months, respectively. The value of total period is 5 years (i.e. 1 year + 3 years + 1 year), whereas the value of average period is 1 year 7.5 months.
Cited articles. The judge bases the judgement on the cited articles. To simplify the article vector (i.e. A1, A2, …, Ar), a method [39] was employed to represent article terms in binary forms, where the article is either present or not in a given judgement.
Table 1. Example of imprisonment terms of three cited articles
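The two punishment features can be illustrated with a minimal sketch. It assumes one reading of the description above: the per-article term is the midpoint of the article's statutory range, the average period is the mean of those midpoints and the total period is the sum of the maximal terms. The function name and the minimum terms are hypothetical; only the maxima (1 year, 3 years, 1 year) follow the worked example, and the values in Table 1 are not reproduced here.

```python
from statistics import mean

# Statutory ranges in months as (minimum, maximum).
# The maxima follow the worked example (1 + 3 + 1 years);
# the minima are hypothetical placeholders, not taken from Table 1.
ARTICLE_RANGES = {
    "A1": (2, 12),
    "A2": (6, 36),
    "A3": (2, 12),
}

def punishment_features(cited_articles, ranges=ARTICLE_RANGES):
    """Return (average_period, total_period) in months for the cited articles."""
    # Per-article term: midpoint of the statutory range (assumed reading).
    midpoints = [(ranges[a][0] + ranges[a][1]) / 2 for a in cited_articles]
    average_period = mean(midpoints)
    # Total period: sum of the maximal term of each cited article.
    total_period = sum(ranges[a][1] for a in cited_articles)
    return average_period, total_period

avg_months, total_months = punishment_features(["A1", "A2", "A3"])
print(avg_months, total_months)
```

With these placeholder ranges the total period is 60 months, which matches the 5 years of the example; the average period depends on the assumed minima.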
Before training the judgement classification model, we first manually labelled each judgement in the training data set. Depending on the severity of the crime, criminals are sentenced to imprisonment by the judge for a certain length of time or to death. The term of imprisonment can be a few days, several months, many years or life imprisonment. On the basis of suggestions from legal professionals, judgements can be annotated into n classes according to the punishment terms. For example, with a 5-class experiment, we can label each judgement as Class 1 (less than 6 months), Class 2 (not less than 6 months but not more than 1 year), Class 3 (not less than 1 year but not more than 10 years), Class 4 (life imprisonment) or Class 5 (death).
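The 5-class example can be written as a small labelling function. This is only a sketch: the function name and its arguments are hypothetical, a sentence of exactly 1 year is placed in Class 2 (the example leaves this boundary open), and fixed terms above 10 years are rejected because the example does not assign them a class.

```python
def judgement_class(months=None, life=False, death=False):
    """Map a sentence to the labels of the 5-class example (1 to 5)."""
    if death:
        return 5
    if life:
        return 4
    if months is None:
        raise ValueError("a fixed-term sentence needs a length in months")
    if months < 6:
        return 1
    if months <= 12:   # exactly 1 year goes to Class 2 (assumption)
        return 2
    if months <= 120:  # more than 1 year, up to 10 years
        return 3
    raise ValueError("terms above 10 years are not covered by the 5-class example")

print([judgement_class(m) for m in (3, 6, 12, 24)])
```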
Therefore, two classification models for two respective phases must be developed to determine an input judgement’s category. To generate the article classification model, text preprocessing should first be performed on the data set before judgement classification. After the legal terms and sentiment terms are generated, feature selection is used to discard the less discriminative terms. The training judgements are then represented by vector space documents (VSDs) according to the term frequency–inverse document frequency (TF-IDF) scheme. The classifier programme inspects all judgement vectors and constructs the article classification model in Phase 1. The judgement classification model in Phase 2 is generated by aggregating the sentiment terms, articles and penalty terms.
3.1. Preprocessing
Preprocessing consists of four steps: term extraction, stop-word removal, part-of-speech (POS) filtering and sentiment term identification. We applied Chinese knowledge and information processing (CKIP) [40] for term extraction. Because no standard Chinese stop-word list is available, this study developed a self-defined Chinese stop-word list for processing. Most previous studies have selected only nouns to represent documents. In criminal case documents, however, adjectives (共同, together) and verbs (破壞, damage) are also meaningful alongside nouns (罪犯, criminal). Therefore, we revised the traditional POS filtering rule to retain nouns, verbs and adjectives as candidate terms. Sentiment term detection was subsequently conducted. To represent the overall sentiment of a document, a group of sentiment features was first determined. Our study employed a Chinese sentiment dictionary, NTUSD [11], which consists of 9365 positive and 11,230 negative opinion words.
3.2. Feature selection
Numerous features can be generated from a document set, but not all features are useful for classification. Less discriminative features should be removed before classification. Recent studies have suggested that without feature selection, a classifier does not effectively utilise a few excellent features for text categorisation [41–43]. In the study by Liu et al. [8], precedents were treated as multilabel documents. An article is regarded as a label in the concept of classification. Because each precedent has at least one cited article, precedents have multiple labels. Although many feature selection methods have been proposed, they cannot be applied directly to multilabel data [44–47]. Therefore, we adopted a multilabel entropy feature selection method [48] to select candidate terms from articles
Here, ti denotes a term across legal documents, p(Aj|ti) denotes the relative frequency of article Aj given that term ti belongs to Aj and q(Aj|ti) = 1 − p(Aj|ti) denotes the relative frequency of article Aj given that term ti does not belong to Aj. If the entropy value of a given term ti is low, then the term ti is focused on a few distinct articles, rather than being widely distributed over articles.
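The selection rule described above can be sketched as follows. The sketch assumes a common form of multilabel entropy that is consistent with the definitions of p and q, namely the negative sum over articles of p log p + q log q; the exact form used in [48] may differ, and the function names and toy probabilities are hypothetical.

```python
import math

def multilabel_entropy(p_by_article):
    """Entropy of one term over all articles.

    p_by_article holds p(A_j | t_i) for every article A_j.
    Assumed form: -sum_j [p log2 p + q log2 q], with q = 1 - p.
    """
    def plogp(x):
        # By convention, 0 * log(0) is treated as 0.
        return 0.0 if x <= 0.0 else x * math.log2(x)
    return -sum(plogp(p) + plogp(1.0 - p) for p in p_by_article)

def select_terms(term_probs, n_keep):
    """Keep the n_keep terms with the lowest entropy (most article-specific)."""
    ranked = sorted(term_probs, key=lambda t: multilabel_entropy(term_probs[t]))
    return ranked[:n_keep]

# Toy probabilities over three articles (hypothetical values).
term_probs = {
    "weapon": [1.0, 0.0, 0.0],  # occurs only with the first article
    "person": [0.5, 0.5, 0.5],  # spread evenly over all articles
}
print(select_terms(term_probs, 1))
```

A term concentrated on a single article has zero entropy under this form and is therefore kept first, whereas an evenly spread term is discarded.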
3.3. Vector space representation
After feature selection, the VSD model is applied to generate representative vectors from the document set. Each document is denoted by a term set {t1, t2, …, tn}, where ti is a term and n is the number of terms that occur in the document. The relevance of each ti in document dj is calculated using the TF-IDF formula w(ti, dj) = tf(ti, dj) × log(m/mi). In this equation, tf(ti, dj) is the normalised frequency of term ti in document dj, m denotes the total number of documents and mi denotes the number of documents containing term ti in the document collection.
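The weighting can be sketched in a few lines. The sketch assumes that the term frequency is normalised by document length and that the natural logarithm is used; neither choice is stated in the text, and the function name and toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    m = len(documents)
    # m_i: number of documents that contain term t_i
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            # tf normalised by document length (assumed normalisation)
            term: (count / len(doc)) * math.log(m / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

docs = [["theft", "night", "theft"], ["fraud", "night"]]
vecs = tfidf_vectors(docs)
print(vecs[0])
```

A term that occurs in every document, such as "night" here, receives a weight of zero because log(m/mi) = log(1) = 0.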
3.4. Sentiment score calculation
Before transforming precedents into vectors, the sentiment scores of the sentiment terms must be determined. To obtain the sentiment orientation of a precedent, the sentiment similarity between sentiment features is calculated. In this study, the adjectives and adverbs of a document were selected. We adopted the PMI technique to calculate the strength of the semantic association between terms [31,49]. The PMI formula of terms ti and tj is as follows
$$\mathrm{PMI}(t_i, t_j) = \log \frac{p(t_i, t_j)}{p(t_i)\,p(t_j)} \qquad (2)$$
where p(ti,tj) is the probability that terms ti and tj co-occur, and p(ti) and p(tj) are the probabilities that ti and tj occur, respectively.
Based on PMI, the sentiment score of each term ti is defined as follows
Sp denotes a group of positive terms and Sn denotes a group of negative terms. To determine the overall sentiment of the precedent, the following formula, called the precedent sentiment score (PSS), is designed to calculate the sentiment score of precedent D
Using equation (4), the PSS judgement values are quantified. These PSS values can then be utilised by the precedent vectors. If the computed PSS value of a given document D is high, it indicates that the suspect had a strong intention to commit the crime.
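The scoring pipeline can be sketched under explicit assumptions. PMI follows its standard definition with document-level co-occurrence and a base-2 logarithm. The term score is assumed to be the association with Sp minus the association with Sn, and the precedent score is assumed to be the mean term score; the sign convention and the aggregation of equations (3) and (4) may differ from this sketch. All function names and the toy corpus are hypothetical.

```python
import math

def pmi(term_a, term_b, documents):
    """PMI from document-level co-occurrence; documents is a list of token sets."""
    n = len(documents)
    p_a = sum(term_a in d for d in documents) / n
    p_b = sum(term_b in d for d in documents) / n
    p_ab = sum(term_a in d and term_b in d for d in documents) / n
    if p_ab == 0.0:
        return 0.0  # assumption: terms that never co-occur contribute nothing
    return math.log2(p_ab / (p_a * p_b))

def term_sentiment(term, positive_seeds, negative_seeds, documents):
    """Assumed form: association with S_p minus association with S_n."""
    pos = sum(pmi(term, s, documents) for s in positive_seeds)
    neg = sum(pmi(term, s, documents) for s in negative_seeds)
    return pos - neg

def precedent_sentiment(doc_terms, positive_seeds, negative_seeds, documents):
    """Assumed aggregation: mean term score over the precedent's sentiment terms."""
    scores = [term_sentiment(t, positive_seeds, negative_seeds, documents)
              for t in doc_terms]
    return sum(scores) / len(scores) if scores else 0.0

# Toy corpus of four documents (hypothetical).
corpus = [{"cruel", "bad"}, {"cruel", "bad"}, {"kind", "good"}, {"kind", "good"}]
print(pmi("cruel", "bad", corpus))
print(term_sentiment("cruel", ["good"], ["bad"], corpus))
```

In this toy corpus "cruel" always co-occurs with the negative seed and never with the positive one, so its score is negative under the assumed sign convention.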
3.5. Judgement prediction approach
As illustrated in Figure 2, the proposed method contains two stages: relevant article retrieval through multilabel classification and judgement category forecast based on sentiment analysis using the classifier. SVM was chosen as the classification algorithm because of its strong performance in many previous text classification studies [50–52].
In the interest of brevity, the training process is not described in depth; it is explained only where required. The proposed testing process is described in detail. The two-phase testing procedure is as follows.
3.5.1. Phase 1: predict associated articles
Figure 3 depicts relevant article prediction. First, text preprocessing is performed on the test judgements. The VSD model is then used to represent documents. Because each precedent is annotated with multiple labels (i.e. related articles) according to the article classification model, a multilabel classifier generates predictions, leading to an estimated ranking of all articles for an input judgement. Accordingly, the output for the test judgement is the top k related articles.
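The final ranking step of Phase 1 can be sketched as follows, assuming that the multilabel classifier exposes one score per article for the input judgement. The function name, article labels and scores are hypothetical.

```python
def top_k_articles(article_scores, k):
    """Return the k article labels with the highest classifier scores."""
    ranked = sorted(article_scores.items(), key=lambda item: item[1], reverse=True)
    return [article for article, _ in ranked[:k]]

# Hypothetical per-article scores for one test judgement.
scores = {"A1": 0.91, "A2": 0.40, "A3": -0.35, "A4": 0.12}
print(top_k_articles(scores, 3))
```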

Figure 3. Relevant article prediction procedure.
3.5.2. Phase 2: choosing the most suitable judgement category
The judgement category prediction procedure can be divided into three steps, as shown in Figure 4. First, the text preprocessing procedure segments the test judgement into multiple terms. The sentiment terms are specifically retained for determining the overall sentiment score of the test case. The test judgement is then transformed into a vector for forecasting. The judgement vector contains two parts: the sentiment score and the article vector. Because each test judgement references the top k articles from Phase 1, it can be represented as a vector through an article term set. A method [39] is employed to represent terms in binary format, depending on whether the term is present or not. Finally, with regard to judgement label determination, both the test judgement vector and the judgement classification model are employed to predict a possible category for the test judgement.
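The construction of the test judgement vector can be sketched as follows. The sketch assumes a fixed ordering of all articles so that each position of the binary part always refers to the same article; the function name and the values are hypothetical.

```python
def judgement_vector(sentiment_score, predicted_articles, article_index):
    """Concatenate the sentiment score with a binary article-presence vector."""
    present = set(predicted_articles)
    return [sentiment_score] + [1 if a in present else 0 for a in article_index]

article_index = ["A1", "A2", "A3", "A4", "A5"]  # fixed ordering of all articles
vec = judgement_vector(-0.8, ["A1", "A4"], article_index)
print(vec)
```

The resulting vector is what the judgement classification model receives in the last step.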

Figure 4. Judgement category prediction procedure.
4. Experiment and evaluation
4.1. Data collection
In this study, we investigated the Chinese criminal precedents in the Law and Regulations Retrieving System (Judicial Yuan). The facts and cited articles were used for analysis of each precedent. The facts detailed the crime and evidence, whereas the cited articles were referenced during sentencing. Both fields are vital for assessing the effectiveness of our proposed approach. Details concerning the experimental data are shown in Table 2. The data set consisted of the 12 most common types of crime in 2015, as reported by the Judicial Yuan. A total of 1208 different criminal precedents were gathered manually over approximately 2 months. To evaluate the performance of the designed method, we used approximately one-sixth of the data collection as test samples.
Table 2. Description of data collection
Several text preprocessing tasks were then executed, including word segmentation, stop-word elimination, POS filtering and sentiment term identification. The results from these tasks were then treated as the input data for the following steps.
4.2. Experimental procedure
Experiments were conducted to evaluate the performance of our research approach. First, preprocessing was applied to the data collection. After stop-word removal and POS filtering, 6922 terms were retained for additional operations. The weights of these extracted terms were determined using the TF-IDF weighting scheme. All extracted feature terms were then filtered using the multilabel entropy method [48] during feature selection. After feature selection, 411 features were selected to represent every document in the data collection. Prior to training, judgement categories were defined. According to the advice of legal experts, we labelled judgements into six different classes in an ordinal sequence. Table 3 illustrates the defined judgement categories, their corresponding imprisonment terms and data allocation.
Table 3. Description of judgement category
In the following step, all document vectors were passed to LIBSVM [53], an open-source multilabel SVM package, to train the SVM classifier and produce the classification model. The trained classification model was then used for prediction. To evaluate the performance of the proposed algorithm, we performed an X-fold CV on the data set. For small data sets, test results may be biased if X is set too high. Therefore, we set X = 4 for the data set.
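The fourfold split can be sketched with the standard library alone. The shuffling, the seed and the function name are assumptions made for illustration; the way the folds were actually drawn is not described.

```python
import random

def k_fold_indices(n_samples, n_folds=4, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds nearly equal folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(1208, n_folds=4))
print([len(test) for _, test in splits])
```

With 1208 precedents and X = 4, each fold holds 302 cases for testing and the remaining 906 for training.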
During the preparation of the classification model, determining the optimal k value of Phase 1 is critical. As illustrated in Table 2, in Phase 1, we used 121 test cases. Because the first phase was designed to recommend articles, the threshold of output articles should be limited to different values for evaluation in the following phase. In general, the smaller the threshold value is, the better its performance is. To determine how k affects the performance of our proposed approach, testing various k settings was necessary. In this experiment, we set k to 3, 5, 7 and 9.
4.3. Evaluation metric
Standard measures in information retrieval performance evaluation are precision and recall; however, in this study, a revised concept of precision, called partial hit, was adopted to evaluate the experimental results. For each judgement, the imprisonment terms were categorised into several ordered intervals (i.e. n-class) and ordinal judgement labels. Two predictions may both miss the actual judgement label and yet exhibit different degrees of error. Therefore, this study employed a partial hit rate to evaluate the relative performance of various predictions. The partial hit rate for N test cases can be defined as follows
$$\text{Partial hit} = \frac{1}{N} \sum_{i=1}^{N} \left( 1 - \frac{\left| Predict_{category,i} - Actual_{category,i} \right|}{Max_{category}} \right) \qquad (5)$$
Maxcategory denotes the maximal category number, Predictcategory denotes the category of the test case predicted by our proposed method, and Actualcategory denotes the actual category of the test case. For example, assume that we have a three-class label and five test cases. The total number of categories is three. Also assume that the predicted categories of the five test cases are 3, 3, 3, 2 and 1, whereas the actual categories are 2, 3, 1, 1 and 1, respectively. From equation (5), the partial hit rates of the five test cases are 0.67, 1, 0.33, 0.67 and 1, respectively. The overall partial hit rate is 11/15, or approximately 0.733.
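The worked example can be checked with a short sketch. The per-case form, 1 minus the absolute category difference divided by the maximal category, is inferred from the example values (0.67, 1, 0.33, 0.67 and 1); the function name is hypothetical.

```python
def partial_hit_rate(predicted, actual, max_category):
    """Mean of 1 - |predicted - actual| / max_category over all test cases."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and equally long")
    hits = [1 - abs(p - a) / max_category for p, a in zip(predicted, actual)]
    return sum(hits) / len(hits)

rate = partial_hit_rate([3, 3, 3, 2, 1], [2, 3, 1, 1, 1], max_category=3)
print(round(rate, 3))
```

Computed from the unrounded per-case values, the overall rate is 11/15, or about 0.733; averaging the rounded per-case values instead gives 0.734.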
4.4. Experimental results and evaluation
To evaluate the performance of our proposed approach, we developed three algorithms for comparison. To evaluate the effect of sentiment, the first comparison method (denoted by ‘comparison method without sentiment’) excludes the sentiment score attribute during prediction. To assess the effect of the punishment period features, the second comparison method (denoted by ‘comparison method without punishment’) does not include the punishment features when predicting. The third comparison method (denoted by ‘comparison method without sentiment and punishment’) uses neither the sentiment score nor the punishment period. Table 4 summarises our research method and the three comparison methods regarding the 12 types of crime in our data collection. Comparing the algorithms, we found that our research method with k = 7 achieved the best performance, with a partial hit rate of 80.62%. For various k values, our proposed research approach clearly outperformed the other algorithms that lacked the sentiment attribute or the punishment attribute. The partial hit rate of the third comparison method was 61.12%, which was the poorest performance. The results indicate that our research method was more successful than all the comparison methods.
Table 4. Evaluation of our research method and three comparison methods (partial hit %)
We also evaluated the performance of our research method by comparing its results with those of three retrieval functions: cosine similarity, Pearson correlation coefficient and Spearman’s rank correlation coefficient. Table 5 summarises the results of our method and the three functions; the partial hit rates were 80.62%, 58.69%, 58.16% and 54.98%, respectively. The performance of our proposed approach substantially surpassed that of the other three methods. Therefore, our proposed approach provides an innovative method for judgement prediction.
Table 5. Evaluation of our research method and three retrieval functions
Because the judgement category is a type of ordered label, a predicted category may differ by up to three levels from the actual category. Therefore, we also conducted a class difference analysis to identify the distribution of level differences for test judgements. Level differences represent the difference between a predicted judgement category and the actual judgement category. Table 6 illustrates the four types of level differences: zero-level, one-level, two-level and three-level. The value of each entry in Table 6 indicates the percentage of cases with this particular level difference for a given k value. For example, 33.08% of cases have a zero-level difference when k = 7. When comparing different k values at zero-level and one-level, the best results were 33.08% and 34.73%, respectively, when k = 7. In addition, our method displayed a more favourable result than the other comparison methods did at zero-level for k = 7, whereas the comparison method that does not combine the sentiment attribute displayed a more favourable result at one-level for k = 7. We observed that the performance levels at zero-level and one-level determine the success of the partial hits. However, the performance at a three-level difference was negatively correlated to partial hit performance.
Table 6. Outcome of class difference analysis
The values are shown in %.
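The class difference analysis described above can be illustrated with a short sketch. It assumes, purely for illustration, that the ordered judgement categories are encoded as the integers 0 to 3; the function name and the encoding are hypothetical and are not taken from our implementation.

```python
from collections import Counter


def level_difference_distribution(predicted, actual, max_level=3):
    """Percentage of test cases at each level difference |predicted - actual|.

    Categories are assumed to be integer-coded ordered labels (e.g. 0..3).
    Differences larger than max_level are not reported.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one test case is required")
    counts = Counter(abs(p - a) for p, a in zip(predicted, actual))
    total = len(predicted)
    return {level: 100.0 * counts.get(level, 0) / total
            for level in range(max_level + 1)}


# Toy example with four test cases, one at each level difference.
distribution = level_difference_distribution([0, 1, 2, 3], [0, 2, 0, 0])
```

With this toy input, each of the four level differences accounts for 25% of the cases; one such distribution would be computed per k value to fill a table of this kind.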
5. Discussion
5.1. Findings
The aim of the experiments was to evaluate the performance of the proposed method. The indispensable part of this approach is determining the degree of intention behind criminal behaviour from legal documents. According to Table 5, the proposed method retrieved more accurate results than the other three retrieval functions. As can be seen from Table 4, when different numbers of articles generated by the proposed method are tested, the retrieval results are most accurate when the number of predicted articles reaches a certain value (k = 7 in our experiments). This result indicates that considering both the sentiment and punishment features can improve retrieval performance.
Prior studies have presented several predictive models to solve practical business problems, and these approaches support decision-making. Beyond business, the growing number of legal disputes has increased the need for more effective and smarter legal systems or databases that help legal experts acquire related judgement information when making legal decisions. Given this demand for judgement prediction, the results imply that our proposed method can act as a mediator that bridges the gap (i.e. lack of legal knowledge) between litigants and possible sentences, or that provides reliable evidence to legal experts making court decisions. The mediator was designed by incorporating several distinguishing features extracted from legal documents. Additionally, sentiment analysis was adopted to reveal the role that emotion plays, which is a vital factor in prediction. The experimental results suggest that automated techniques can help build judgement equity in the legal field and support the justice system.
In general, our findings imply that the designed approach can provide a third form of service for legal disputes in the legal service sector. Judges or litigants using such supportive tools may be better placed to increase judicial fairness and justice. In the long run, this study deserves to be extended to other areas of law, such as the civil code and intellectual property law. Additionally, this work could be implemented as business software once its functions become robust and mature.
5.2. Limitations
The limitations of this study should be acknowledged. First, the external validity of our work is somewhat restricted. Because different areas of law have specific characteristics, our study is confined to criminal codes and cannot be directly applied to other areas of law. Second, no public data sets of criminal precedents were available for the experiment, which may restrict its validity. If public data sets were provided as benchmarks, the validity of the experiment could be enhanced. Third, a limited amount of data was used in this study. The results were derived from the 12 most common types of crime in Taiwan in 2015, which may restrict the generalisation of the results. Nevertheless, according to our experimental results, the proposed method still achieves excellent performance. Finally, because legal experts have their own perceptions of appropriate sentencing, the judgements of different experts regarding the same case may be inconsistent. Because of privacy issues, we experienced difficulty in collecting relevant information on the subjective stances of relevant experts. In the long term, if more information becomes available, we may include further user properties as attributes to increase the accuracy of this research.
6. Conclusion and future research
Judging social disputes has become a vital issue for legal experts. It is crucial for a judge to issue a fair judgement. Because of a lack of assistive tools, the increasing number of legal cases has resulted in heavy judicial workloads. To overcome this difficulty, this article presented an advanced method that employs a classifier and utilises sentiment analysis to predict judgements automatically from historical criminal precedents. We incorporated 203 judgements from our data set as test cases and evaluated the effectiveness with various k values using our custom-designed, two-phase algorithm. The evaluation outcomes revealed that our proposed method could accurately predict a judgement category with a partial hit rate of 80.62% for the top 7 articles and 77.23% for the top 9 articles. Notably, the empirical results demonstrated that our research method achieved more favourable results than did the other comparison approaches. To the best of our knowledge, this study introduces an entirely new field of research.
Numerous other research possibilities are available for future studies. First, other mining techniques or models could be used to enhance the accuracy of judgement prediction. Second, because the sentiment corpus lacks relevant legal terms such as ‘voluntary confession’ and contains a limited amount of data, larger data sets and sentiment vocabularies should be collected to improve the performance of the proposed method. We also plan to investigate the effectiveness of other advanced information retrieval models, such as the Binary Independence Model, BM25 and Latent Dirichlet Allocation (LDA)-based models. Moreover, judgement prediction methods could be developed for specific fields of law. Furthermore, because similar factual scenarios are present in many different areas of law, the mediation between factual specifics and a particular area of the law is a notable future research topic. Finally, in our study, we adopted time spans (i.e. imprisonment terms) as classification labels. As recent relevant studies have suggested, regression algorithms could instead be used to predict the time spans directly. Our problem, the prediction of judgements, could therefore also be approached as the prediction of a continuous variable.
Acknowledgements
The authors would like to thank the Editor-in-Chief and the anonymous referees for their help and valuable comments, which improved this paper.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
This research was supported by the Open Research Funds of Shantou University (grant no. KFJJ201703).
