Performance comparison: Cloud-based vs. open-source NER for English and Polish

Abstract

Named Entity Recognition (NER) plays a vital role in Natural Language Processing (NLP) tasks, extracting valuable information from textual data. This study addresses a gap in NER research by comparing the effectiveness of cloud-based NER tools (Azure NER and Google Cloud NER) and a popular open-source tool (SpaCy) for recognizing named entities in both English and Polish text. Text data is imported into a PostgreSQL database and processed by each NER tool. The extracted entities and their labels are stored in a dedicated SQL Entity table, enabling performance evaluation across different languages and entity types. This research contributes to the field of NLP by investigating the suitability of cloud-based NER tools for multilingual tasks, particularly those involving Polish text, which presents unique linguistic challenges. By analyzing the performance of these NER approaches, the study provides valuable insights for selecting the most effective NER technique for specific NLP applications, especially when dealing with multilingual content.

Keywords

Named entity recognition (NER), Azure, Google Cloud, SpaCy, English vs. Polish

1. Introduction

In the era of big data, the ability to extract valuable information from textual content is crucial for diverse applications in Natural Language Processing (NLP). Named Entity Recognition (NER) plays a vital role in this endeavor, automatically identifying and classifying named entities such as locations, organizations, and people within text data. Extracting named entities supports various NLP tasks, including sentiment analysis,^1,2 topic modeling,^3,4 and text summarization.^5,6 The entities extracted in this way can improve the accuracy and efficiency of these tasks.

One key application of NER lies in information retrieval systems. By identifying named entities within a document, NER facilitates more precise searches.^7,8 For instance, when a user searches for news articles related to a specific historical figure, NER can pinpoint the relevant entities (e.g., “Nicolaus Copernicus”) within the text, enabling the search engine to return more accurate and relevant results than keyword-based search alone.

Furthermore, NER plays a crucial role in building intelligent question answering systems.^9,10 When a user asks a question like “What is the capital of Poland?”, NER can identify the entity “Poland” as a location. This information can then be used by the system to retrieve the correct answer “Warsaw” from a knowledge base.

Finally, accurate NER is essential for high-quality machine translation. By identifying named entities in the source text (e.g., organizations, locations, people), NER allows the translation system to handle them appropriately in the target language. This ensures a more accurate and natural-sounding translation by preserving the intended meaning of the named entities. These examples showcase the multifaceted role of NER in empowering various NLP applications to extract valuable information from text data, ultimately leading to improved performance and functionality.

Several studies have shown that existing NER tools primarily focus on English language data (e.g. Won et al.,¹¹ Ehrmann et al.,¹² Jehangir et al.¹³ and Krechowicz¹⁴). However, languages like Polish, with their inherent complexities, can pose challenges for these tools. This study aims to bridge this gap by examining the effectiveness of NER tools across both English and Polish text.

Dadas (2018) also achieved high performance in Polish NER using a deep learning model with knowledge-based features from Wikipedia.¹⁵ Their model surpassed others in PolEval 2018 (22.4% error reduction). The evaluation combined exact match (0.2 weight) and overlapping entity scores (0.8 weight). Notably, incorporating Wikipedia and lexicons further improved their model’s performance (scores up to 89.6). This suggests knowledge-based techniques can be highly effective, especially when enriched with external resources.

Choosing the most effective NER tool depends on the task and data at hand. Schmitt et al.¹⁶ compared popular open-source options like StanfordNLP, SpaCy, and others.¹⁶ Their evaluation, using the CoNLL 2003 and GMB corpora, revealed that StanfordNLP achieved the highest overall F1-score (81.05) on CoNLL 2003, potentially due to its default classifier being partially trained on that data. However, the performance gap narrowed significantly on the GMB corpus (StanfordNLP F1-score: 70.88), with SpaCy even showing improvement (F1-score: 59.66) compared to CoNLL 2003 (F1-score: 54.33). This underlines the importance of considering both the chosen tool’s training data and its suitability for your specific corpus to achieve optimal NER performance.

A study by Roy¹⁷ highlighted the shift towards deep learning architectures in NER due to the availability of large amounts of computer-readable textual data and hardware that can process the data.¹⁷ The study reviewed significant learning methods employed for NER and how they evolved from linear learning methods.

Regarding the Polish language, a study by Telenyk et al.¹⁸ evaluated the coherence of Polish texts using neural network models.¹⁸ Although not specifically focused on NER, their work demonstrates the applicability of advanced machine learning techniques to the Polish language, which could potentially extend to NER tasks.

Another comprehensive study by Pakhale¹⁹ offered an exhaustive exploration into the evolving landscape of NER methodologies, blending foundational principles with contemporary AI advancements.¹⁹ The study spanned a spectrum of techniques from traditional rule-based strategies to the contemporary marvels of transformer architectures, particularly highlighting integrations such as BERT with LSTM and CNN. The narrative accentuated domain-specific NER models, tailored for intricate areas like finance, legal, and healthcare, emphasizing their specialized adaptability.

In terms of specific tools, a study by Otto et al.²⁰ introduced a novel task, corpus, and baseline for Scholarly Entity Extraction focused on Machine Learning Models and Datasets.²⁰ Their work underscores the crucial role of NER models in various NLP tasks, including information extraction (IE) and text understanding.

Research by Tamla et al.²¹ introduces a novel cloud-based Information Extraction system (CIE) designed for Named Entity Recognition (NER) in the medical domain. Their system offers a comprehensive solution, managing cloud resources and providing functionalities for both ML practitioners and medical experts to develop and train their own NER models. Evaluation results demonstrate promising performance, with their model achieving F1-scores of 0.853 and 0.842 on Azure and AWS platforms, respectively, surpassing their baseline model (GERNERMED). This research highlights the potential of cloud-based NER systems to enhance performance and accessibility of NER tasks within the medical field.

Wang et al.²² propose a novel approach to multilingual and multimodal named entity recognition (2M-NER) with their 2M-NER model.²² Recognizing the limitations of existing datasets, they introduce a large-scale 2M-NER dataset encompassing four languages and text-image combinations. The 2M-NER model tackles this challenge by aligning text and image representations through contrastive learning and incorporating a multimodal collaboration module to capture the interactions between the two modalities. Their evaluations demonstrate the effectiveness of this approach, with the 2M-NER model achieving the highest F1-scores on the 2M-NER tasks for all four languages (English: 69.99, French: 63.13, Spanish: 71.21, German: 62.16) compared to benchmark models. These findings highlight the potential of combining multilingual and multimodal information to improve NER performance.

The field of NER is rapidly evolving, with new models and techniques continually being developed and refined. These advancements are not only enhancing the performance of NER tasks but also expanding their applicability across various domains and languages. However, it’s important to note that the choice of the most effective NER tool depends on the specific task and data at hand.

A review of Named Entity Recognition (NER) systems revealed a research gap: the lack of focus on cloud platforms and the Polish language. This gap motivated a comparative analysis in this area. Publicly available open-source algorithms and tools dominate current NLP research. Therefore, this study will compare the performance of cloud platforms (Azure and Google Cloud) with a popular open-source tool, SpaCy, for both English and Polish NER tasks.

Several factors might explain the limited research on Polish NER within cloud platforms. The dominance of English-language research in NLP is a significant factor. Many cloud-based NER services and benchmark datasets primarily cater to English text. This can lead to a bias towards English-centric solutions.

Furthermore, the inherent complexity of the Polish language itself presents a challenge. Polish boasts a rich inflectional morphology, complex syntax, and a unique alphabet with additional characters. These characteristics necessitate specialized approaches that might not be readily available in existing cloud-based NER solutions designed primarily for simpler languages.

By comparing the performance of these NER approaches across different languages and entity types, this research seeks to provide valuable insights into their strengths and weaknesses. This information can aid in selecting the most suitable NER technique for specific NLP applications, particularly when dealing with multilingual content.

2. Research methodology

This research investigates the effectiveness of Named Entity Recognition (NER) within cloud platforms compared to open-source tools. The study hypothesizes that cloud-based NER systems, such as Azure Text Analytics and Google Cloud Natural Language API, will achieve superior performance due to their potential access to more extensive and diverse language corpora for training. These corpora are critical for NER models to learn the intricacies of language and identify named entities accurately. Open-source tools, while readily available and often customizable (e.g., SpaCy), might be limited by the size and scope of the training data they utilize.

Furthermore, this research explores the potential influence of language complexity on NER performance. The hypothesis is that the English language will be more effectively recognized compared to Polish. This can be attributed to several factors: English, as a more widely used language, benefits from a larger pool of training data, leading to more robust and mature NER models. Additionally, English possesses a relatively simpler grammatical structure compared to Polish. Polish boasts a rich inflectional morphology, complex syntax, and a unique alphabet with additional characters. These characteristics can pose challenges for NER models, particularly those not specifically trained on the intricacies of the Polish language.

To investigate these hypotheses, a comparative analysis will be conducted. This analysis will evaluate the performance of NER systems offered by leading cloud platforms (e.g., Azure Text Analytics, Google Cloud Natural Language API) alongside a popular open-source tool (e.g., SpaCy with pre-trained Polish models) for both English and Polish text. Text data for evaluation will be sourced from publicly available datasets. To ensure consistency and focus on relevant entities, the analysis will concentrate on three commonly targeted named entity types in NER research: “location”, “organization”, and “person”.^16,22 This multifaceted approach will allow for an assessment of the impact of cloud-based resources and language complexity on NER accuracy specifically for these key entity types.

By comparing performance across these platforms and languages, this study aims to contribute valuable insights to the field of NLP. The findings can inform researchers and practitioners on the strengths and limitations of cloud-based NER tools, particularly for languages like Polish that might require specialized approaches.

2.1. Datasets

One of the significant challenges encountered during this research was the limited availability of suitable datasets, especially for the Polish language. This scarcity was most pronounced for datasets containing organizations, evident in both languages, and locations, particularly for Polish due to the lack of publicly available annotated location data. The lack of diverse datasets containing these entities necessitated additional effort in data preparation.

To address this challenge, the chosen approach went beyond utilizing solely ready-made linguistic corpora. In addition to existing datasets, articles freely available on the Internet were also included. This approach ensured a broader and more comprehensive representation of contemporary Polish language use, considering the variety of sources and the overall size of the data. Data cleaning steps were applied to the internet articles to remove noise and maintain consistency with the existing datasets.

The corpora for both languages are summarized in Table 1, which details the source and size (number of characters) of each dataset. The total size of the texts was 252504 characters for the English dataset and 235430 characters for the Polish dataset.

Table 1.
Polish and English corpora.

Language	Corpus name	Number of chars	Source
en	Named Entity Recognition (NER) Corpus	141502	Named Entity Recognition (NER) Corpus²³
en	Largest companies in the world for 2024	111002	Top 10 Largest & Most Valuable Companies in the World / Axi EU²⁴
pl	The NKJP-NER is based on a human-annotated part of NKJP	35873	Przepiorkowski²⁵
pl	50 most beautiful places in Poland	64481	50 najpiekniejszych miejsc w Polsce, 0200²⁶
pl	21st edition of the Economic Award of the President of Poland	57704	Final XXI edycji Nagrody Gospodarczej Prezydenta RP²⁷
pl	List of 500. Awards given to Polish leaders	77372	Lista 500²⁸

Table 2 details the number of named entities identified and manually labeled within the Polish and English corpora used for this research. The labels correspond to the three targeted entity types: “location”, “organization”, and “person”.

Table 2.

Number of named entities by type in Polish and English corpora.

Language	Named entity type	Number of entities
en	location	188
en	organization	227
en	person	207
pl	location	355
pl	organization	163
pl	person	261

2.2. Implementation

This research investigates the performance of three popular named entity recognition (NER) tools: Azure NER (ANER), Google Cloud NER (GNER), and SpaCy NER (SNER). This study evaluates their effectiveness on both English and Polish text, analyzing their capability to recognize three entity types: “location”, “organization”, and “person”.

The research leverages tools from two cloud service providers and an open-source framework. Azure AI Language²⁹ service provides pre-built models for various NLP tasks. The recognize entities function is used to identify entities within the provided content.

Google Cloud Natural Language API³⁰ offers a comprehensive suite of NLP functionalities. The analyze entities function is used to detect named entities in the text data.

SpaCy³¹ is the open-source library that provides pre-trained statistical models for various NLP tasks. A SpaCy model is loaded based on the document language to recognize named entities in text. The study used the en_core_web_lg model³² for English and pl_core_news_lg³³ for Polish.
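The language-dependent model choice described above could look roughly like the following minimal sketch. The two model names come from the text; the helper names (`model_name_for`, `spacy_entities`) and the caching dictionary are hypothetical, and the source does not show its actual code. The function that calls SpaCy imports it lazily, so it only works where SpaCy and both model packages are installed.

```python
# Model names are the ones cited in the text; everything else is illustrative.
SPACY_MODEL_BY_LANGUAGE = {
    "en": "en_core_web_lg",
    "pl": "pl_core_news_lg",
}

_loaded_models = {}  # cache so that each model is loaded only once


def model_name_for(language: str) -> str:
    """Return the SpaCy model name for a document language code."""
    try:
        return SPACY_MODEL_BY_LANGUAGE[language]
    except KeyError:
        raise ValueError(f"unsupported language: {language!r}") from None


def spacy_entities(text: str, language: str) -> list[tuple[str, str]]:
    """Return (phrase, lowercase label) pairs that SpaCy finds in `text`."""
    import spacy  # lazy import: requires SpaCy and the model packages

    name = model_name_for(language)
    if name not in _loaded_models:
        _loaded_models[name] = spacy.load(name)
    doc = _loaded_models[name](text)
    return [(ent.text, ent.label_.lower()) for ent in doc.ents]
```

Caching the loaded pipeline is a design choice made for this sketch: the large models are slow to load, and the study processes many short documents per language.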

Due to the inherent variation in naming conventions across the tools, all entity labels were converted to a lowercase format and mapping was performed to establish a base label for each entity type, as presented in Table 3.

Table 3.
Mapping entity labels to base labels for English and Polish in different NER tools.

Language	Base label	ANER label	GNER label	SNER label
en	location	location	location	gpe, loc, norp
en	organization	organization	organization	org
en	person	persontype	person	person
pl	location	location	location	geogname, placename
pl	organization	organization	organization	orgname
pl	person	persontype	person	persname
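The mapping in Table 3 can be expressed as a small lookup, sketched below. The label pairs are taken from Table 3; the function name `to_base_label`, the tool keys, and the choice to return "none" for any unmapped label are assumptions made for illustration (the value "none" itself appears in Table 5). The English and Polish SpaCy labels are merged into one dictionary because they do not collide.

```python
# Raw tool label (lowercase) -> base label, following Table 3.
BASE_LABELS = {
    "aner": {
        "location": "location",
        "organization": "organization",
        "persontype": "person",
    },
    "gner": {
        "location": "location",
        "organization": "organization",
        "person": "person",
    },
    "sner": {
        # English model labels
        "gpe": "location",
        "loc": "location",
        "norp": "location",
        "org": "organization",
        "person": "person",
        # Polish model labels
        "geogname": "location",
        "placename": "location",
        "orgname": "organization",
        "persname": "person",
    },
}


def to_base_label(tool: str, raw_label: str) -> str:
    """Lowercase a tool-specific label and map it to a base label."""
    # Assumption: labels outside Table 3 are treated as "none".
    return BASE_LABELS[tool].get(raw_label.strip().lower(), "none")


print(to_base_label("sner", "GPE"))       # location
print(to_base_label("sner", "persName"))  # person
print(to_base_label("gner", "EVENT"))     # none
```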

To assess the performance of the NER tools, this research employed standard metrics used in NER evaluation: Precision, Recall, and F1-score.^16,21 Precision measures the proportion of identified entities that are actually correct, while Recall reflects the proportion of actual entities that are correctly identified. F1-score provides a harmonic mean that balances these two measures.
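For reference, the standard definitions of these metrics in terms of true positives (TP), false positives (FP), and false negatives (FN) are given below. The result tables report them as percentages.

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]
```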

After importing data from the sources presented in Table 1, any necessary cleaning or pre-processing steps were performed on the text data. This might include removing punctuation, converting to lowercase, or handling special characters to ensure consistency for the NER models. The pre-processed text data was then imported into the PostgreSQL database for further analysis.

To perform a comparative assessment of NER accuracy for Azure NER (ANER), Google Cloud NER (GNER), and SpaCy NER (SNER) on Polish and English content, a custom research environment was developed in Python. This environment integrates with Azure Cloud, Google Cloud, and SpaCy to access the functions required to run each NER model (Figure 1).

Figure 1.

Named entity recognition research environment.

The text data obtained from the sources listed in Table 1 was imported into a PostgreSQL database for analysis. Each text entry, limited to a maximum of 250 characters, is stored within an SQL Document table. This table schema includes the following fields: (1) Document Id (integer): A unique identifier assigned to each text record. (2) Language (text): This field specifies the language of the text (“en” for English, “pl” for Polish). (3) Source (text): This field indicates the origin of the text data, such as the dataset name or website URL. (4) Content (text): This field stores the actual text content that will be analyzed for named entities. Sample data from this SQL Document table is presented in Table 4.

Table 4.

Sample data in the SQL Document table.

Document Id	Language	Source	Content
15745	en	Named Entity Recognition (NER) Corpus	The AP also reports Khayam has been on parole from prison since last year after serving half his six-year sentence for drug dealing.
70835	en	Largest companies in the world for 2024	Apple is a multinational technology company that produces a wide range of consumer electronics, along with software and services. The company was founded in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.
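The source states only that each text entry is limited to 250 characters, not how longer texts were divided. One possible approach, sketched below as an assumption, packs whole sentences greedily into entries and wraps any single over-long sentence on word boundaries. The function name `split_into_entries` and the naive sentence-splitting rule are hypothetical.

```python
import re
import textwrap

MAX_CHARS = 250  # entry size limit stated in the text


def split_into_entries(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into entries of at most `max_chars` characters."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    entries: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single over-long sentence is wrapped on word boundaries.
        if len(sentence) > max_chars:
            pieces = textwrap.wrap(sentence, width=max_chars)
        else:
            pieces = [sentence]
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # piece still fits in the open entry
            else:
                entries.append(current)  # close the entry, start a new one
                current = piece
    if current:
        entries.append(current)
    return entries


sample = (
    "Apple is a multinational technology company that produces a wide range "
    "of consumer electronics, along with software and services. The company "
    "was founded in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne."
)
for entry in split_into_entries(sample * 3):
    print(len(entry), entry[:40])
```

Keeping sentences intact where possible is a design choice of this sketch: it avoids cutting a multi-word entity such as a person's name across two entries.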

The implemented program iterates through each record in the SQL Document table. For each record, the program interacts with the chosen NER tool (ANER, GNER, or SNER) depending on the document language. The output from each NER tool, which consists of the identified named entities and their corresponding labels, is then parsed.

During parsing, the program extracts relevant information from the NER tool’s output, such as the identified entity phrase and its assigned label. This information is then stored in a separate SQL Entity table. This SQL Entity table includes the following fields: (1) Document Id (integer): Foreign key referencing the corresponding record in the SQL Document table. This establishes a link between the identified entity and the original text it was found in. (2) Phrase (text): The actual text phrase identified as a named entity. (3) Actual label (text): The ground truth label assigned to the entity based on the data source (“location”, “organization”, or “person”). (4) ANER label (text): The label assigned to the entity by the Azure NER tool (ANER). (5) GNER label (text): The label assigned to the entity by the Google Cloud NER tool (GNER). (6) SNER label (text): The label assigned to the entity by the SpaCy NER tool (SNER).

This structure allows for efficient storage and comparison of the named entities identified by each NER tool for the same text record. By analyzing the data in the SQL Entity table, the research can evaluate the performance of each NER tool and assess their effectiveness in recognizing different entity types across both English and Polish text. Sample data from this SQL Entity table is presented in Table 5.

Table 5.

Sample data in the SQL Entity table.

Document Id	Phrase	Actual label	ANER label	GNER label	SNER label
3284	radar	none	organization	none	none
15745	AP	organization	organization	organization	organization
15745	Khayam	person	person	person	person
15745	prison	location	none	location	none
70821	networking platform	none	none	organization	none
70835	Warren Buffett	person	person	person	person
70835	investors	person	person	person	none
70835	Berkshire Hathaway	organization	organization	organization	location
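The two tables described above can be sketched as follows. This is a minimal illustration that uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs without a server. The column names, the constraints, and the join query are assumptions; the sample rows are copied from Tables 4 and 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
conn.executescript(
    """
    CREATE TABLE document (
        document_id INTEGER PRIMARY KEY,
        language    TEXT NOT NULL CHECK (language IN ('en', 'pl')),
        source      TEXT NOT NULL,
        content     TEXT NOT NULL CHECK (length(content) <= 250)
    );
    CREATE TABLE entity (
        document_id  INTEGER NOT NULL REFERENCES document (document_id),
        phrase       TEXT NOT NULL,
        actual_label TEXT NOT NULL,
        aner_label   TEXT NOT NULL,
        gner_label   TEXT NOT NULL,
        sner_label   TEXT NOT NULL
    );
    """
)

conn.execute(
    "INSERT INTO document VALUES (?, ?, ?, ?)",
    (
        15745,
        "en",
        "Named Entity Recognition (NER) Corpus",
        "The AP also reports Khayam has been on parole from prison since last "
        "year after serving half his six-year sentence for drug dealing.",
    ),
)
conn.executemany(
    "INSERT INTO entity VALUES (?, ?, ?, ?, ?, ?)",
    [
        (15745, "AP", "organization", "organization", "organization", "organization"),
        (15745, "Khayam", "person", "person", "person", "person"),
        (15745, "prison", "location", "none", "location", "none"),
    ],
)

# Join each entity back to the language of the document it came from.
rows = conn.execute(
    """
    SELECT d.language, e.phrase, e.actual_label, e.gner_label
    FROM entity AS e
    JOIN document AS d USING (document_id)
    ORDER BY e.phrase
    """
).fetchall()
for row in rows:
    print(row)
```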

The NER tool’s (ANER, GNER, SNER) logic for identifying true positives, false positives, and false negatives can be described as follows: (1) True Positive: Both the actual label and the NER label accurately categorize a named entity as a “location”, “organization”, or “person”. (2) False Positive: The NER label erroneously classifies a named entity as a “location”, “organization”, or “person”, while the actual label is “none”. (3) False Negative: The actual label designates a named entity as a “location”, “organization”, or “person”, but the NER label fails to recognize it. (4) True Negative (Not Applicable): Given that the phrases were exclusively labeled as “location”, “organization”, or “person”, the concept of a “true negative” does not apply to the NER tool’s evaluation. A True Negative would occur if an entity were correctly identified as something other than these three categories, which was not possible based on the available labels.
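The counting rules above can be sketched in Python as shown below. The pairs are the (actual label, SNER label) columns of the sample rows in Table 5. Two points are assumptions rather than details given in the source: an entity that receives the wrong label (for example, an organization labeled as a location) is counted as a false negative for its actual label and a false positive for the predicted one, and the overall figures are obtained by summing the per-label counts.

```python
from dataclasses import dataclass

LABELS = ("location", "organization", "person")


@dataclass
class Counts:
    tp: int = 0
    fp: int = 0
    fn: int = 0

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp) if (self.tp + self.fp) else 0.0

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn) if (self.tp + self.fn) else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def count_for_label(pairs, label):
    """Count TP/FP/FN for one base label over (actual, predicted) pairs."""
    counts = Counts()
    for actual, predicted in pairs:
        if actual == label and predicted == label:
            counts.tp += 1
        elif predicted == label:
            counts.fp += 1  # tool assigned this label, ground truth differs
        elif actual == label:
            counts.fn += 1  # ground truth has this label, tool did not assign it
    return counts


def count_overall(pairs):
    """Sum the per-label counts (assumed aggregation for the overall rows)."""
    total = Counts()
    for label in LABELS:
        c = count_for_label(pairs, label)
        total.tp += c.tp
        total.fp += c.fp
        total.fn += c.fn
    return total


# (actual label, SNER label) for the sample rows of Table 5.
SNER_SAMPLE = [
    ("none", "none"),             # radar
    ("organization", "organization"),  # AP
    ("person", "person"),         # Khayam
    ("location", "none"),         # prison
    ("none", "none"),             # networking platform
    ("person", "person"),         # Warren Buffett
    ("person", "none"),           # investors
    ("organization", "location"), # Berkshire Hathaway
]

overall = count_overall(SNER_SAMPLE)
print(overall, round(overall.precision, 2), round(overall.recall, 2), round(overall.f1, 2))
```

As a rough consistency check, counts of TP = 133, FP = 0, and FN = 55 (inferred here from the recall and support of the ANER "location" row in Table 6, not reported in the source) give `Counts(133, 0, 55)` a recall of 70.74% and an F1-score of 82.87%, matching that row.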

3. Results and analysis

3.1. NER performance

Based on the conducted experiments, Precision, Recall, and F1-score metrics were calculated for each tool (ANER, GNER, SNER) across different entity labels and overall. The results for English are presented in Table 6, and for Polish in Table 7.

Table 6.
NER performance on English text.

Label	NER tool	Precision	Recall	F1-score	Support
location	ANER	100.00	70.74	82.87	188
location	GNER	98.90	95.74	97.30	188
location	SNER	100.00	67.55	80.63	188
organization	ANER	99.24	57.27	72.63	227
organization	GNER	96.05	96.48	96.26	227
organization	SNER	94.26	50.66	65.90	227
person	ANER	99.35	74.40	85.08	207
person	GNER	99.46	88.89	93.88	207
person	SNER	98.11	25.12	40.00	207
overall	ANER	99.52	67.04	80.12	622
overall	GNER	97.98	93.73	95.81	622
overall	SNER	97.35	47.27	63.64	622

Table 7.

NER performance on Polish text.

Label	NER tool	Precision	Recall	F1-score	Support
location	ANER	98.48	73.24	84.01	355
location	GNER	95.11	87.61	91.20	355
location	SNER	100.00	40.56	57.72	355
organization	ANER	97.37	45.40	61.92	163
organization	GNER	94.48	94.48	94.48	163
organization	SNER	100.00	42.33	59.48	163
person	ANER	99.46	70.50	82.51	261
person	GNER	94.33	89.27	91.73	261
person	SNER	100.00	19.92	33.23	261
overall	ANER	98.67	66.50	79.45	779
overall	GNER	94.71	89.60	92.08	779
overall	SNER	100.00	34.02	50.77	779

In English, the GNER tool achieved the best F1-score for each entity label (Table 6). The overall F1-score for GNER was 95.81, with individual entity label scores of 97.30 (“location”), 96.26 (“organization”), and 93.88 (“person”). The ANER tool achieved an overall F1-score of 80.12, while SNER achieved 63.64. GNER therefore outperformed both ANER (by 15.69 points) and SNER (by 32.17 points) in English.

Similar to English, the GNER tool achieved the best F1-score results for each entity label in Polish (Table 7). The overall F1-score for GNER was 92.08, with individual entity label scores of 91.20 (“location”), 94.48 (“organization”), and 91.73 (“person”). The ANER tool achieved an overall F1-score of 79.45, while SNER achieved 50.77. Here again, GNER demonstrated superior performance compared to ANER (by 12.63 points) and SNER (by 41.31 points).

As initially hypothesized, the comparison of English and Polish results (Figures 2 and 3) revealed that English yielded a better overall F1-score for all three tools. ANER showed the smallest difference between languages (0.67 points higher in English), followed by GNER (3.73 points). SNER exhibited the largest gap, with English results 12.87 points higher.
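The language gaps can be recomputed directly from the overall F1-scores reported in Tables 6 and 7. The short sketch below does this; the dictionary layout and variable names are illustrative only.

```python
# Overall F1-scores from Table 6 (English) and Table 7 (Polish).
OVERALL_F1 = {
    "ANER": {"en": 80.12, "pl": 79.45},
    "GNER": {"en": 95.81, "pl": 92.08},
    "SNER": {"en": 63.64, "pl": 50.77},
}

# Gap in points: English score minus Polish score, per tool.
gaps = {
    tool: round(scores["en"] - scores["pl"], 2)
    for tool, scores in OVERALL_F1.items()
}

# List the tools from the smallest to the largest English-Polish gap.
for tool, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{tool}: {gap:.2f} points")
```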

Figure 2.

Comparison of F1-score by entity label (English vs. Polish).

Figure 3.

Overall F1-score comparison (English vs. Polish).

Figure 4.

Comparison of F1-score by entity label (English vs. Polish) on identifying proper nouns.

Figure 5.

Overall F1-score comparison (English vs. Polish) on identifying proper nouns.

An unexpected finding was the consistently lower performance of the ANER tool compared to GNER for both English and Polish. To understand this better, a closer examination of the entities correctly identified by GNER but missed by ANER and/or SNER is planned for the next step of the analysis. This investigation will focus on specific cases where ANER and/or SNER misclassified entities that GNER identified correctly. A deeper examination of these discrepancies aims to uncover the reasons behind ANER’s underperformance and identify potential areas for improvement in these NER tools.

3.2. Focusing on proper nouns in NER performance

The initial analysis of NER tool performance revealed a potential bias. GNER recognized common nouns (e.g., “company”, “reporters”) as named entities much more readily than ANER and SNER did (Table 8). This could have influenced the overall results. To isolate the performance on proper nouns specifically, the methodology was refined.

Table 8.
Sample NER performance on common nouns compared to proper nouns.

Phrase	Phrase type	Actual label	ANER label	GNER label	SNER label
seaside resort	common nouns	location	none	location	none
company	common nouns	organization	none	organization	none
parlament	common nouns	organization	organization	organization	none
investor	common nouns	person	person	person	none
reporters	common nouns	person	none	person	none
Saudi Arabia	proper nouns	location	location	location	location
Afghanistan	proper nouns	location	location	location	none
Alphabet Inc.	proper nouns	organization	organization	organization	organization
Berkshire Hathaway	proper nouns	organization	organization	organization	location
Paul Allen	proper nouns	person	person	person	person
Tim Cook	proper nouns	person	person	person	person

First, an additional field named “phrase type” was introduced to the SQL Entity table (Table 8). This new field allowed for the distinction between common and proper nouns within the data. With this distinction in place, the experiments were then repeated, but this time considering only phrases that were identified as “proper nouns” based on the new field. This ensured a more focused analysis on the NER tools’ ability to recognize proper nouns. The refined results for English and Polish are presented in Tables 9 and 10, respectively.

Table 9.

NER performance on identifying proper nouns in English text.

Label	NER tool	Precision	Recall	F1-score	Support
location	ANER	100.00	66.49	79.87	188
location	GNER	98.46	68.09	80.50	188
location	SNER	100.00	67.55	80.63	188
organization	ANER	99.23	56.83	72.27	227
organization	GNER	93.53	57.27	71.04	227
organization	SNER	94.26	50.66	65.90	227
person	ANER	98.33	28.50	44.19	207
person	GNER	98.39	29.47	45.35	207
person	SNER	98.04	24.15	38.76	207
Overall	ANER	99.37	50.32	66.81	622
Overall	GNER	96.37	51.29	66.95	622
Overall	SNER	97.33	46.95	63.34	622

Table 10.

NER performance on identifying proper nouns in Polish text.

Label	NER tool	Precision	Recall	F1-score	Support
location	ANER	97.56	45.07	61.66	355
location	GNER	91.21	46.76	61.82	355
location	SNER	100.00	39.15	56.28	355
organization	ANER	97.06	40.49	57.14	163
organization	GNER	88.61	42.94	57.85	163
organization	SNER	100.00	38.65	55.75	163
person	ANER	98.25	21.46	35.22	261
person	GNER	79.41	20.69	32.83	261
person	SNER	100.00	19.92	33.23	261
Overall	ANER	97.58	36.20	52.81	779
Overall	GNER	88.15	37.23	52.35	779
Overall	SNER	100.00	32.61	49.18	779

When focusing on proper nouns in English text (Table 9), SNER emerged as the leader for “location” labels, slightly outperforming GNER and ANER by 0.13 points and 0.76 points, respectively. However, for “organization” and “person” labels, SNER’s performance dropped, lagging behind the best tool in each case: ANER by 6.37 points for “organization” and GNER by 6.59 points for “person”. Overall, ANER and GNER exhibited nearly identical F1-scores, with a difference of 0.14 points. SNER performed worse overall (3.61 points behind GNER) due to its weaknesses in recognizing “organization” and “person” entities as proper nouns.

Analyzing proper noun identification in Polish text (Table 10) revealed a different pattern. GNER slightly outperformed ANER for “location” (0.16 points) and “organization” labels (0.71 points). However, the trend reversed for “person” labels, where ANER outperformed GNER by 2.39 points. SNER’s performance was worse for “location” labels (5.54 points behind GNER). Surprisingly, for “organization” and “person” labels, SNER achieved results comparable to the best tools, with a difference of only 2.10 points and 1.99 points from the leaders, respectively. Overall, ANER emerged as the top performer for proper noun identification in Polish, surpassing GNER by 0.46 points and SNER by a substantial margin of 3.63 points.

The differences between the English and Polish results (Figures 4 and 5) are statistically significant for all three tools (ANER: 14.00, GNER: 14.60, SNER: 14.16 points in overall F1-score), which confirms the hypothesis that the tools are more effective at processing English. The results indicate a clear advantage for English over Polish.
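The per-tool gaps quoted above can be reproduced by subtracting the overall proper-noun F1-scores for Polish (Table 10) from those for English (Table 9). The sketch below only recomputes the gaps; it does not perform the significance test, and the variable names are illustrative.

```python
# Overall proper-noun F1-scores as reported in Tables 9 and 10.
F1_ENGLISH = {"ANER": 66.81, "GNER": 66.95, "SNER": 63.34}
F1_POLISH = {"ANER": 52.81, "GNER": 52.35, "SNER": 49.18}

# English minus Polish, rounded to the two decimals used in the tables.
gaps = {tool: round(F1_ENGLISH[tool] - F1_POLISH[tool], 2) for tool in F1_ENGLISH}

for tool, gap in gaps.items():
    print(f"{tool}: {gap:.2f} points lower for Polish")
```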

This refined analysis revealed GNER as the most effective tool for English, while ANER demonstrated superior performance for Polish, especially when considering the overall F1-score. Furthermore, SNER exhibited competitive results compared to the other tools.

4. Conclusion

The initial hypothesis that cloud-based NER systems outperform open-source systems needs to be considered in the context of the specific application. When dealing with both common and proper nouns, GNER emerges as the top performer (95.81 for English and 92.08 for Polish), followed by ANER (80.12 for English and 79.45 for Polish). SNER, on the other hand, consistently yields the lowest results (63.64 for English and 50.77 for Polish). This suggests that the SpaCy tool may not be well-suited for this type of application. However, when the task is limited to proper nouns, the performance of all tools becomes comparable. For English, the F1-scores are ANER: 66.81, GNER: 66.95, SNER: 63.34; and for Polish, they are ANER: 52.81, GNER: 52.35, SNER: 49.18. In this case, the choice of tool depends on the specific application context. The second hypothesis, regarding the superiority of NER performance in English over Polish, was confirmed. The results for Polish were consistently lower than those for English; for proper nouns, the smallest difference was 14.00 points (ANER) and the largest was 14.60 points (GNER). This trend indicates a general advantage for English.

The conducted research led to additional, unforeseen conclusions. There is a clear distinction between the NER tools in their approach to named entity recognition. GNER excels at identifying both proper and common nouns as named entities, while ANER performs noticeably worse in this regard, and SNER is essentially not suitable for recognizing common nouns. This highlights the importance of selecting the right tool for the specific application, and the conducted research has implications for this decision-making process.

Tools like ANER exhibit very high precision, while GNER, despite its high F1-score, often has lower precision. This is most visible in Polish text, where GNER achieved a precision of 94.71 for all nouns and 88.15 for proper nouns, both significantly lower than the corresponding values for ANER and SNER. This should also be considered when choosing a tool for specific needs and applications.

Future research should explore several avenues for improving named entity recognition. One direction is to analyze other open-source tools, particularly for handling common nouns. Additionally, it is worth investigating how to control the types of named entities recognized within individual NER tools (e.g., restricting recognition to only proper nouns). Furthermore, future studies could examine the impact of punctuation removal on entity recognition accuracy, with a particular focus on SpaCy. While this study has shed new light on the relationship between common nouns and proper nouns, further research is needed to explore this issue in greater depth, particularly in relation to the individual performance of NER tools on different datasets and the impact of corpus noise and quality on identification accuracy.
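One way to approach the restriction to proper nouns mentioned above is to post-filter the recognized entities by part-of-speech tag. The following is a minimal sketch under stated assumptions, not the method used in this study: it assumes each recognized entity is available together with one Universal POS tag per token (for SpaCy output these could be taken from each token's `pos_` attribute), and the `Entity` class and `keep_proper_nouns` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A recognized entity with one Universal POS tag per token (hypothetical structure)."""
    text: str
    label: str
    pos_tags: tuple


def keep_proper_nouns(entities):
    """Keep only entities in which at least one token is tagged PROPN."""
    return [entity for entity in entities if "PROPN" in entity.pos_tags]


# Illustrative input; the tags are assumed, not produced by any of the tested tools.
entities = [
    Entity("Nicolaus Copernicus", "person", ("PROPN", "PROPN")),
    Entity("city", "location", ("NOUN",)),
    Entity("Warsaw", "location", ("PROPN",)),
]

proper_only = keep_proper_nouns(entities)
print([entity.text for entity in proper_only])
```

Such a filter is applied after recognition, so it can be used uniformly across tools, but its quality depends on the accuracy of the part-of-speech tagger, which would itself need to be evaluated for Polish.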

Overall, the research findings provide valuable insights into the performance of cloud-based and open-source NER tools, highlighting the strengths and limitations of each approach. The recommendations for future work aim to address these limitations and further enhance the effectiveness of named entity recognition in various applications.

Footnotes

Acknowledgments

Not applicable.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article.

ORCID iD

Lukasz Pawlik

References

1. Wankhade, Rao ACS, Kulkarni. A survey on sentiment analysis methods, applications, and challenges. Artif Intell Rev 2022; 55: 5731–5780.

2. Pawlik, Plaza, Deniziak, et al. A method for improving bot effectiveness by recognising implicit customer intent in contact centre conversations. Speech Commun 2022; 143: 33–45.

3. Meddeb, Romdhane. Using topic modeling and word embedding for topic extraction in twitter. Procedia Comput Sci 2022; 207: 790–799.

4. Gelar, Sari. Bertopic and NER stop words for topic modeling on agricultural instructional sentences, Atlantis Press, 2024, pp.129–140. ISSN: 2352-5401. ISBN 978-94-6463-364-1. doi:10.2991/978-94-6463-364-1_14. https://www.atlantis-press.com/proceedings/icast-es-23/125998286.

5. El-Kassas, Salama, Rafea, et al. Automatic text summarization: a comprehensive survey. Expert Syst Appl 2021; 165: 113679.

6. Khan, Sheikh, Sinha. Graph neural network and NER-based text summarization, arXiv, 2024, arXiv:2402.05126 [cs]. doi:10.48550/arXiv.2402.05126. http://arxiv.org/abs/2402.05126.

7. Zhang, Yan, et al. Using search session context for named entity recognition in query. In: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’10, 2010, pp.765–766. New York, NY, USA: Association for Computing Machinery. ISBN 978-1-4503-0153-4. doi:10.1145/1835449.1835605.

8. Shelar, Kaur, Heda, et al. Named entity recognition approaches and their comparison for custom NER model. Sci Technol Libr 2020; 39: 324–337.

9. Liu, Xiao, Zhu, et al. QaNER: Prompting question answering models for few-shot named entity recognition, arXiv, 2022, arXiv:2203.01543 [cs]. doi:10.48550/arXiv.2203.01543. http://arxiv.org/abs/2203.01543.

10. Yin, Cheng, Pan, et al. Chinese named entity recognition based on knowledge based question answering system. Appl Sci 2022; 12(11): 5373.

11. Won, Murrieta-Flores, Martins. Ensemble named entity recognition (NER): evaluating NER tools in the identification of place names in historical corpora. Front Digit Humanit 2018; 5. doi:10.3389/fdigh.2018.00002.

12. Ehrmann, Hamdi, Pontes, et al. Named entity recognition and classification in historical documents: a survey. ACM Comput Surv 2023; 56: 27:1–27:47.

13. Jehangir, Radhakrishnan, Agarwal. A survey on named entity recognition - datasets, tools, and methodologies. Nat Lang Process J 2023; 3: 100017.

14. Krechowicz. Massive natural language processing in distributed environment. In: Distributed computing and artificial intelligence, special sessions I, 20th international conference (eds R Mehmood, V Alves, I Praca, J Wikarek, J Parra-Dominguez, R Loukanova, I de Miguel, T Pinto, R Nunes and M Ricca), 2023, pp.56–65. Cham: Springer Nature Switzerland. ISBN 978-3-031-38318-2. doi:10.1007/978-3-031-38318-2_6.

15. Dadas. Combining neural and knowledge-based approaches to named entity recognition in Polish. In: Rutkowski L, Scherer R, Korytkowski M, Pedrycz W, Tadeusiewicz R and Zurada JM (eds) Artificial intelligence and soft computing. Cham: Springer International Publishing, 2019, pp.39–50. ISBN 978-3-030-20912-4. doi:10.1007/978-3-030-20912-4_4.

16. Schmitt, Kubler, Robert, et al. A replicable comparison study of NER software: StanfordNLP, NLTK, OpenNLP, SpaCy, Gate. In: 2019 Sixth international conference on social networks analysis, management and security (SNAMS), 2019, pp.338–343. doi:10.1109/SNAMS.2019.8931850. https://ieeexplore.ieee.org/abstract/document/8931850.

17. Roy. Recent trends in named entity recognition (NER), arXiv, 2021, arXiv:2101.11420 [cs]. doi:10.48550/arXiv.2101.11420. http://arxiv.org/abs/2101.11420.

18. Telenyk, Pogorilyy, Kramov. Evaluation of the coherence of Polish texts using neural network models. Appl Sci 2021; 11(7): 3210.

19. Pakhale. Comprehensive overview of named entity recognition: models, domain-specific applications and challenges, arXiv, 2023, arXiv:2309.14084 [cs]. doi:10.48550/arXiv.2309.14084. http://arxiv.org/abs/2309.14084.

20. Otto, Zloch, Gan, et al. GSAP-NER: a novel task, corpus, and baseline for scholarly entity extraction focused on machine learning models and datasets, arXiv, 2023, arXiv:2311.09860 [cs]. doi:10.48550/arXiv.2311.09860. http://arxiv.org/abs/2311.09860.

21. Tamla, Hartmann, Nguyen, et al. CIE: a cloud-based information extraction system for named entity recognition in AWS, Azure, and medical domain. In: Coenen F, Fred A, Aveiro D, Dietz J, Bernardino J, Masciari E and Filipe J (eds) Knowledge discovery, knowledge engineering and knowledge management. Cham: Springer Nature Switzerland, 2023, pp.127–148. ISBN 978-3-031-43471-6. doi:10.1007/978-3-031-43471-6_6.

22. Wang, Feng, Liu, et al. 2M-NER: contrastive learning for multilingual and multimodal NER with language and modal fusion, arXiv, 2024, arXiv:2404.17122 [cs]. http://arxiv.org/abs/2404.17122.

23. Named Entity Recognition (NER) Corpus. https://www.kaggle.com/datasets/naseralqaydeh/named-entity-recognition-ner-corpus.

24. Top 10 Largest & Most Valuable Companies in the World / Axi EU. http://www.axi.com/eu/blog/education/shares/largest-companies-in-the-world.

25. Przepiorkowski. Narodowy korpus jezyka polskiego. Warsaw: Naukowe PWN, 2012.

26. 50 najpiekniejszych miejsc w Polsce, 0200. https://podroze.onet.pl/gdzie-na-weekend/50-najpiekniejszych-miejsc-w-polsce-co-zobaczyc-ciekawostki-przewodnik/5lxwzew.

27. Final XXI edycji Nagrody Gospodarczej Prezydenta RP. 2023. https://www.prezydent.pl/aktualnosci/inicjatywy/nagroda-gospodarcza/aktualnosci/final-xxi-edycji-nagrody-gospodarczej-prezydenta-rp,77785.

28.