Abstract
Every day, information users struggle to find relevant documents and data needed to perform their jobs effectively and efficiently. This paper presents a strategic, practical approach to information architecture, focusing on developing and adding metadata to improve information retrieval. The review starts with a discussion of common information management problems and potential business benefits, and addresses the need for overarching principles and policies to be aligned with a comprehensive enterprise architecture. Next, the paper outlines a solution based on combining elements of information architecture – the hierarchy of a taxonomy, the synonyms from a thesaurus and the relationships in an ontology. It discusses how this hybrid structure can improve content classification, user navigation and enterprise search. The final sections will explain how taxonomy-generated rules can improve content classification and search, and present the results of a recent Proof of Concept (PoC) for automated content classification. The article also includes two sidebars – one offering taxonomy best practice guidelines and the other exploring an advanced classification weighting to improve search precision.
Introduction
I always look forward to the annual survey from Swedish company Findwise that analyzes user requirements for enterprise search and ‘findability’.
The poll by these information specialists provides ample justification and business benefits for anyone battling to improve information architecture (IA) – the accurate, logical and consistent description of enterprise content and data. These descriptive tags are a prerequisite for improving information retrieval and delivery.
Whether you call it findability or discovery, the aim is to get the right information to the right people at the right time.
Two-thirds of responding organizations in the 2016 Findwise survey said that more than 50 per cent of employees depended on finding information in their daily work.
This evidence of clear demand was merely the set-up for some bad news – more than one-third of these respondents said it was difficult for users to find that information.
Focusing on solutions, Findwise noted that a strategy for improving information retrieval was a key factor in boosting user satisfaction.
In fact, over the past four years, in organizations that had a strategy, some 27 per cent of users found it easy or very easy to find information. Without a strategy, the number plunged to just 10 per cent.
In a related finding from the 2015 survey, respondents said that missing or inconsistent content tagging was the biggest obstacle to effective search.
So, your pesky information architect was right – you need a strategy, and the crux of the problem is metadata, that key structural component of IA. It is variously described as information about information or a concise summary of an organization’s knowledge assets.
In answer to the needs summarized above, this article will present a strategic, practical approach for developing and adding metadata. This blueprint will focus on unstructured content such as documents but align the strategy with methods for describing more structured content found in traditional databases or newer linked data repositories.
The next section will be a discussion of current information management problems and potential business benefits; the third section will review the need for overarching principles and policies, in line with a larger enterprise architecture; the fourth section will outline a solution based on combining elements of IA and look at how this hybrid structure can improve content classification, user navigation and enterprise search.
The fifth section will explain how taxonomy-generated rules can improve content classification and search, while the sixth section will present the results of a recent Proof of Concept (PoC) for automated content classification.
The article will also feature two sidebars – one offering taxonomy best practice guidelines and the other exploring an advanced classification weighting to improve search precision.
Information management issues and business benefits
There are several problems that IA can address:
Ineffective search
Poor website navigation
Inability to retrieve useful, related information
Content ‘silos’ and ‘Wild West’ folder structures
Chaos of personal classification systems
Obstacles to personalized information delivery
Duplicate and outdated content
Inconsistent, incomparable data fields and values that undermine management reporting efforts
Burden of creating and maintaining custom web page content (in the absence of dynamic, metadata-driven updates)
Limitations of purely technical solutions to information management problems
Addressing these issues in a concerted fashion can transform the list into significant business benefits. The sample benefits below, many of them corroborated by the Findwise surveys, will help your key project stakeholders understand the value and impact of supporting an IA strategy.
First, there are benefits associated with the improved accuracy of information by source, subject and version – a single point of truth. This is the same goal as master data management in the domain of structured information. The benefits include:
Efficiency in producing, finding, distributing and re-using high-quality information, specifically:
Improved content navigation
More effective search
Delivery via profiles/personas
More efficient discovery of subject specialists
Better informed staff
There are also strategic and operational benefits, such as:
Better business planning
Improved decision-making
Easier collaboration
Improved client service
More effective external communication and thought leadership
Finally, there are advantages for the organization in meeting internal and external compliance and regulatory responsibilities, covering issues such as:
Effective document life cycle management (e.g. to support Freedom of Information requests)
Efficient and effective regulatory compliance
Reputation protection and enhancement
Added value to (and protection of) intellectual property
Information management principles and policies (or how to play nice with other architects)
It is important to realize that IA is a separate and independent element of the enterprise architecture, whose other layers include structural solutions for business processes, applications and technical systems.
Unfortunately, in many organizations, IA is neglected. The IT department often claims it understands and represents business needs, but truly effective solutions require a technically savvy information architect, operating as a go-between, to interpret and champion business requirements to the IT specialists.
These business requirements need to be defined at an early stage for efficient integration with the other layers of enterprise architecture. The successful information architect must demand a seat at the table when the discussions begin about transformative enterprise solutions.
The information architect’s vision must also encompass the entire organization. The IA comprises descriptions of structured and unstructured information and thus includes data models as well as controlled vocabularies such as glossaries or taxonomies.
The information specialist must establish basic principles for information management and define them in detailed policies. This process is part of any proper enterprise architecture planning, so there will be allies on the project team.
A helpful starting point is the set of 21 principles for enterprise architecture set out in The Open Group Architecture Framework (TOGAF).
In complete form, each principle would include a Name, Statement and Rationale. To provide a flavour, here are a few high-level principles that I have consolidated and modified from the TOGAF list:
Information requires a common vocabulary, structure and definitions.
Information must be structured and formatted to support interoperability.
Information structure must be technology independent.
Information should be created or acquired once, then re-used and shared appropriately.
Information must be governed and managed effectively.
These principles can be followed with detailed policies for IA and for metadata management. For instance, any IA policy should explain the requirements to describe content accurately and consistently across all processes, applications and systems.
Metadata and IA structures
Any exploration of information structures and descriptions must start with metadata types and management. (Note – once you start organizing information, it is hard to stop; you must also organize information about information!)
Metadata can usefully be defined by three types:
Administrative metadata – content attributes focused on the creation or format of content, for example, the creator, date, media type, and so on. Many of these administrative terms are based on the Dublin Core metadata set and can be generated by the content management system. Administrative metadata also provides technical information such as the content source, how content is stored and how it is processed.
Descriptive metadata – content attributes that define what the information is about, such as geographic location, subject matter or event type, and who it would interest. Most of our content classification efforts are devoted to descriptive metadata.
Structural metadata – attributes that define collections or series of content, for example, a multimedia package, chapters or quarterly data.
Structured in this way, metadata improves machine-to-machine information flows, ensuring that information is understood, interpreted and presented correctly, can be uniquely identified and has business and quality rules consistently applied.
But why stop there? We can also define two types of metadata management:
Controlled vocabularies – someone is in charge, defining preferred terms, synonyms and related terms.
Folksonomies – no one is in charge! Users are free to tag content with personally meaningful key words.
Luckily, these two management approaches are no longer mutually exclusive. Practitioners of both methods generally agree that a folksonomy supplements, rather than replaces, a controlled vocabulary.
Proponents of centralized terms now recognize that folksonomies help users see the immediate benefits of metadata, for instance, with better search results on terms they contributed. Improved search also provides incentives for contributing up-to-date terms for inclusion in the controlled list.
Also, system administrators can modulate search queries using folksonomy key words, so the primacy and performance of controlled terms are not threatened.
Hybrid approach to create an ‘extended’ taxonomy
Unstructured information becomes more usable when structured hierarchically as a taxonomy, extended with the synonyms of a thesaurus and the relationships of an ontology.
Business stakeholders need protection from jargon, so I simply call the result of this hybrid approach an ‘extended’ taxonomy.
Helpfully, the fields and values of a high-level conceptual data model look nearly identical to a two-level taxonomy, providing a common source to unite information initiatives across the enterprise. When you add relationships to your entities and concepts, your more detailed logical and physical data models are easily aligned with your ontology, as shown in Figure 1.

Figure 1. Align content structures to unify information and data.
In addition, employing semantic web standards internally, such as the ‘triples’ of the Resource Description Framework described below, can expand an organization’s access to useful, authoritative information. For instance, such a linked data strategy will provide useful connections to Open Data provided by government agencies or rich Big Data sources from industry groups or digital devices. It can also reduce the need to bring all information ‘in house’.
The full structural journey starts with best practice for building taxonomies. (See related Sidebar One for detailed guidelines.) However, at this early stage, the first practical steps can be summarized under two structural views, vertical and horizontal.
Building a multifaceted taxonomy (vertical view)
As a vertical structure, a taxonomy provides the benefit of logical navigation and metadata inheritance, where any selection of a lower level heading should automatically invoke the parent metadata above it. If I select ‘Inheritance tax’ as a document tag, I would expect the higher level tags of ‘Personal taxation’ and ‘Taxation’ to be included as a default.
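A minimal sketch of this inheritance behaviour, assuming the hierarchy is held as a simple child-to-parent lookup. The `PARENT` map and the function name are illustrative only and are not taken from any particular product; only the three headings come from the example above.

```python
# Hypothetical child -> parent lookup for one branch of a taxonomy.
PARENT = {
    "Inheritance tax": "Personal taxation",
    "Personal taxation": "Taxation",
}


def with_inherited_tags(tag, parent=PARENT):
    """Return the selected tag followed by all of its ancestors, nearest first."""
    tags = [tag]
    seen = {tag}
    while tags[-1] in parent:
        ancestor = parent[tags[-1]]
        if ancestor in seen:  # guard against an accidental loop in the data
            break
        tags.append(ancestor)
        seen.add(ancestor)
    return tags


if __name__ == "__main__":
    print(with_inherited_tags("Inheritance tax"))
```

Because the lookup stores a single parent per term, the sketch assumes a mono-hierarchy; a term with several parents would need a list of parents and a decision about which branches to inherit.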
However, to reflect the reality that users enter the ‘information pool’ from a variety of directions, any taxonomy should be ‘multi-faceted’, that is, it should cover more than subject matter and answer a variety of questions about your organization, such as Who and Where are we, Who do we work with and for, What do we do, Why do we do it, and, finally, How do we do it.
The answers to these questions will suggest different facets of the taxonomy, and there are three useful components under which the resulting facets can be grouped:
Entities – proper nouns, such as Organizations, Company departments, Geographic locations, People, Products and Statutes. To help describe these entities, it is useful to capture or link to relevant attributes and values, such as location and subject expertise of staff members.
Subject matter – largely generic topics such as Business activities and Business sectors, and domain-specific topics such as Legal subjects, Science areas, Engineering processes, Medical conditions and so on.
Focused filters – descriptions of useful filters for finding specific information, such as Content types, Event types and Language. Many of these filters can be applied automatically using templates, as SharePoint does by adding metadata and workflow paths to Content types.
Within this vertical structure, we can employ the ‘top-down, bottom-up’ approach to refine the taxonomy. The facets above provide the ‘top-down’ categories. Existing vocabularies, glossaries, web sites, databases, library catalogues, and so on supply the ‘bottom-up’ terms to populate these top-level categories.
The key element in this vertical, multifaceted structure is the preferred term – the word or phrase we select to describe the topic. Once the preferred term is established, we can extend the taxonomy with further information on the topic, for example, by adding appropriate columns along its horizontal axis, as we would in a spreadsheet or database.
Extending the taxonomy (horizontal view)
There are several sources for extended taxonomy elements:
‘Runners up’ to the preferred term
Acronyms
Search queries
Subject specialists
Domain-specific documents
Text-mining software
Faceted-classification or search software (especially if employed when building the taxonomy, not after)
This horizontal extension allows us to supplement the taxonomy with the synonyms of a thesaurus, and the related taxonomy terms and stated relationships of an ontology (Figure 2).

Figure 2. Hybrid structures’ most useful features for ‘extended’ taxonomy.
In addition to related terms, we can add another type of associated description – ‘contextual keywords’. These are neither equivalent nor related terms from the taxonomy but reflect the ‘folksonomy’ knowledge derived from domain specialists and relevant documents.
By design, these elements comply with British Standard 8723 (the basis for ISO 25964) for structured vocabularies, by representing three types of relationship – hierarchical (Preferred term and Hierarchical parent), equivalent (Synonyms) and associative (Related terms, Contextual keywords and Properties).
These elements would also be familiar to anyone following the guidelines on construction, format and management of controlled vocabularies from the National Information Standards Organization (NISO).
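To make the three relationship types concrete, one row of an extended taxonomy could be sketched as a small record. The field names follow the elements listed above; the class, its method and the sample parent heading are assumptions for illustration, not a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaxonomyEntry:
    """One row of a hypothetical 'extended' taxonomy."""
    preferred_term: str
    parent: Optional[str] = None                                   # hierarchical
    synonyms: List[str] = field(default_factory=list)              # equivalent
    related_terms: List[str] = field(default_factory=list)         # associative
    contextual_keywords: List[str] = field(default_factory=list)   # associative
    properties: Dict[str, str] = field(default_factory=dict)       # associative

    def relationships(self):
        """Group the row's columns by the three relationship types."""
        return {
            "hierarchical": [self.parent] if self.parent else [],
            "equivalent": list(self.synonyms),
            "associative": (
                list(self.related_terms)
                + list(self.contextual_keywords)
                + list(self.properties)  # property names only
            ),
        }


# Sample row: the parent heading is assumed purely for illustration.
strategy = TaxonomyEntry(
    preferred_term="Strategy",
    parent="Business activities",
    synonyms=["Long-range plan"],
    contextual_keywords=["leadership", "direction", "objectives"],
)
```

Keeping each element in its own column is what later allows rules to be generated mechanically from the same rows.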
Another global standard, the Resource Description Framework (RDF), supports Linked Data by defining entities, concepts and their relationships in a consistent, machine-readable way.
It is one of the key semantic web standards developed by the World Wide Web Consortium (W3C). Two other separate but compatible W3C standards support RDF with consistent descriptions and syntax for the vocabulary structures described above.
One is the more formal Web Ontology Language (OWL); the other is the less rigid Simple Knowledge Organization System (SKOS).
Because of standards like these, internet search engines now return structured summaries that can answer many basic queries (e.g. when was Winston Churchill the British prime minister), rather than just list relevant web resources. The searches also rely on repositories that provide the structured content, such as DBpedia, which rebuilds Wikipedia content in RDF format.
The same standards, coupled with an extended taxonomy structure and a flexible user interface, could produce similar summaries for staff querying an organization’s intranet.
If your business stakeholders show an interest in RDF, you may find the following table useful, as it translates the grammar-influenced ‘triples’ into more understandable language that would resonate with database specialists (Figure 3).

Figure 3. How to translate ‘triples’.
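For readers who prefer code to grammar, the sketch below shows one plausible reading of that translation: the subject of a triple is treated as a record, the predicate as a field and the object as a value. The mapping, the predicate wording and the sample statements are illustrative assumptions, and the triples are plain Python tuples rather than real RDF.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) statements.
SAMPLE_TRIPLES = [
    ("Example Hospital", "located in", "Devon"),
    ("Example Hospital", "related to", "Mental health"),
    ("Inheritance tax", "has broader term", "Personal taxation"),
]


def triples_to_records(triples):
    """Regroup triples as records: subject -> {field: [values]}."""
    records = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        records[subject][predicate].append(obj)
    # Convert the nested defaultdicts to plain dicts for display.
    return {subject: dict(fields) for subject, fields in records.items()}


if __name__ == "__main__":
    for subject, fields in triples_to_records(SAMPLE_TRIPLES).items():
        print(subject, fields)
```

Values are kept as lists because a subject can carry the same predicate more than once, which a single database column would not allow.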
Taxonomy-generated rules for classification and search
There are at least five ways to add metadata to content. In practice, they can be complementary options, mixed and matched depending on an organization’s policies, resources and culture:
Manual tagging by authors and specialist indexers.
Entity text mining, with related metadata added from authority files (an example of a linked data strategy).
Stored search queries – aka ‘soft tagging’.
Automated content classification for key entities and subjects – ‘hard tagging’.
System-generated and template metadata, for example, Microsoft’s Active Directory could produce an author’s business unit and specialist subjects. At the same time, templates can define content types or event types more simply and accurately than can a textual analysis.
Whichever combination of methods appears attractive, some form of automated tagging is needed to assist manual classifiers and ensure efficiency, accuracy and consistency. It is not enough to devise a consistent extended taxonomy; that vocabulary also needs to be applied consistently.
Even subject specialists differ widely on how they understand and apply metadata. In a study for a global news agency, I found that tagging accuracy of the specialist editors ranged from 40 to 100 per cent, with nearly half of the 500 sample stories failing to hit an 80 per cent accuracy level. Many digital publishers would consider 85 per cent accuracy to be the required minimum.
However, by understanding the relationships among the terms from the extended taxonomy, we can exploit this structure to create effective rules for automated content classification.
This approach aligns closely with the capabilities of most automated content classification providers, such as SmartLogic, ConceptSearching, Expert System and open source systems like GATE.
These rules deliver focused metadata that can be leveraged by similarly structured and stored search queries.
It is a paradox of modern computing that search capabilities have far outstripped most users’ own abilities to take advantage. To benefit from modern search methods, a successful user must combine knowledge of the correct terminology, including synonyms and associated terms, the correct Boolean logic to link them and the proper syntax to run the search effectively. As shown in the Venn diagram in Figure 4, that user population would likely fit in a Volkswagen Beetle.

Figure 4. Few users can combine the elements needed for successful search.
Fortunately, the taxonomy-generated search rules make everyone an expert, by merging the terms from the extended taxonomy into the correct Boolean structure with the proper search syntax.
The rules can be generated via simple formulas, based on the contents of the data fields or linked data sources, much like a mail-merge application creates address labels from a list of contacts (Figure 5).

Figure 5. Creating taxonomy-generated rules via a ‘mail merge’ process.
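The ‘mail merge’ idea can be sketched in a few lines, assuming each taxonomy row is available as a dictionary of columns. The function names and column keys are hypothetical, and the output imitates the general shape of a two-part rule (a title match OR a proximity match); the operator names are placeholders, since real search engines each have their own syntax.

```python
def quote(term):
    """Wrap multi-word terms in double quotes so they are treated as phrases."""
    return f'"{term}"' if " " in term else term


def build_rule(row, window=7):
    """Merge one taxonomy row into a two-part search rule string."""
    topic_terms = [row["preferred_term"], *row.get("synonyms", [])]
    context_terms = [
        term
        for term in [
            row.get("parent", ""),
            *row.get("related_terms", []),
            *row.get("contextual_keywords", []),
        ]
        if term  # skip empty columns
    ]
    topic = "(" + ", ".join(quote(t) for t in topic_terms) + ")"
    context = "(" + ", ".join(quote(t) for t in context_terms) + ")"
    # Part 1: topic terms in the title. Part 2: topic terms near context terms.
    return f"{topic}: Title OR {topic} WNEAR{window} {context}"


if __name__ == "__main__":
    row = {
        "preferred_term": "Strateg*",
        "synonyms": ["vision", "business plan"],
        "contextual_keywords": ["leadership", "board of directors"],
    }
    print(build_rule(row))
```

Generating every rule from the same template is what keeps the rules consistent; a change to the template or to a taxonomy row can then be propagated by simply regenerating the rules.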
Such pre-built searches work in a variety of discovery applications, such as Autonomy’s IDOL, Google Search Appliance, OpenText’s integrated search and Microsoft’s FAST search (now the default tool in SharePoint).
These saved queries can be displayed as hyperlinks, helping users perform advanced searches with a single mouse click. The advanced queries can be supplemented by another practical use of the extended taxonomy – the exposure of taxonomy facets and related terms to filter search results.
From more than 25 projects for clients including Unilever, Oxfam and the NHS, InfoArk has devised an effective and efficient approach to creating these classification and search rules that delivers consistent, accurate and automated results for nonspecialist users.
For most clients, we begin with a two-part rule or query. First, the rule looks for topic matches in common metadata fields, and in prominent locations within the content itself. Next, it finds topics by proximity to other relevant descriptions from the extended taxonomy.
Specifically, the location test looks for the topic’s preferred term and synonyms in the content title, URL or other significant content locations, such as a Management Summary or Conclusion. The proximity test then attempts to find the preferred term and synonyms within several words of the hierarchical parent, related terms and contextual key words.
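The two tests can be illustrated with a deliberately simplified matcher. It assumes plain lower-cased word matching with no wildcards or stemming, checks only the title for the location test, and measures proximity between the starting words of each phrase; a commercial classification engine would be considerably more sophisticated. All function names are hypothetical.

```python
import re


def tokenize(text):
    """Split text into lower-case word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_phrase(tokens, phrase):
    """Return the start positions at which the phrase occurs in tokens."""
    words = tokenize(phrase)
    if not words:
        return []
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def location_test(title, topic_terms):
    """True if any topic term (preferred term or synonym) appears in the title."""
    tokens = tokenize(title)
    return any(find_phrase(tokens, term) for term in topic_terms)


def proximity_test(body, topic_terms, context_terms, window=7):
    """True if a topic term starts within `window` words of a context term."""
    tokens = tokenize(body)
    topic_hits = [i for t in topic_terms for i in find_phrase(tokens, t)]
    context_hits = [j for t in context_terms for j in find_phrase(tokens, t)]
    return any(i != j and abs(i - j) <= window
               for i in topic_hits for j in context_hits)


def matches_topic(title, body, topic_terms, context_terms, window=7):
    """Two-part rule: location test OR proximity test."""
    return (location_test(title, topic_terms)
            or proximity_test(body, topic_terms, context_terms, window))
```

For example, a document titled ‘Annual report’ whose body mentions a ‘business plan’ a few words away from ‘objectives’ would fail the location test but pass the proximity test, and so would still be tagged.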
Here is a sample search rule for the topic of Strategy, where we include synonyms such as ‘Long-range plan’ and contextual key words such as ‘leadership’, ‘direction’ and ‘objectives’. The use of wildcards, if supported, expands the search to ‘strategic’, ‘strategize’, and so on:
(Strateg*, vision, “long range plan*”, “long term plan*”, “business plan”, “mission statement”, “vision statement”): Title OR (Strateg*, vision, “long range plan*”, “long term plan*”, “business plan”, “mission statement”, “vision statement”) WNEAR7 (plan, leadership, direction, policy, review, goals, objectives, aims, priorities, roadmap, “future state”, transform*, CEO, “board of directors”, “senior manag*”, corporat*, develop*)
As evidence of the usefulness of this approach, some returned documents made no mention of ‘strategy’ but did include ‘business plan’, ‘vision’ and ‘mission’.
It helps to select the synonyms, related terms and contextual key words carefully and to limit their number. Limiting these elements keeps our rules roughly balanced between the dual accuracy measures of precision and recall. Every term we add to a rule widens its scope and increases recall, typically at some cost to precision; limiting the terms forces us to select the most precise descriptions available.
While the production of these consistent search rules depends on a clear understanding of the extended taxonomy, prolonged success requires a representative and transparent procedure for keeping it up to date.
Stable versus dynamic vocabularies
Finally, we need to recognize that while many of our taxonomies or taxonomy facets will be relatively stable, some will include more dynamic and revisable lists that will need to be maintained and governed outside the core taxonomy.
A list of staff members or top clients would be an example. However, a controlled list of properties that help define these entities could be part of our stable, controlled vocabulary; for instance, we could use our Geographic location facet to describe staff and client location.
Recent case study – Testing classification and measuring accuracy
A recent PoC for a global agribusiness company applied classification rules based on the preferred term and synonyms and basic identification of specific content types and sub-elements such as summaries and data tables; future rules will be more comprehensive, exploiting the rich descriptions and structure of the full extended taxonomy model.
Figure 6 shows the three successive phases, with associated actions, for improving content classification, starting with Phase 1 in this PoC.

Figure 6. Improving automated content classification is an iterative process.
As Figure 6 shows, an out-of-the-box classification engine, equipped with a few simple rules, would normally generate accuracy between 65 and 75 per cent.
Further taxonomy extensions, more thorough document analysis and increasingly sophisticated rules should push accuracy into the 75–85 per cent range.
Happily, as explained in detail below, the PoC produced results at the higher end of that range, which suggests that further refinements could reach the target levels of 85–95 per cent accuracy.
To measure this classification accuracy, we devised a spreadsheet matrix. It included a column for each of the 85 potential metadata tags agreed with stakeholders, about half of which commonly appeared, and a row for each of the 40 documents assessed.
Expected tags were marked as ‘hits’ and validated against the results of the tagging engine. Any tags the engine failed to attach were overwritten as ‘misses’, while any tags attached in error were marked as ‘noise’.
This consistent process – and the tagging guidelines agreed by two assessors – built a large degree of objectivity into what otherwise could have been a subjective exercise.
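Here is a summary of the measures used:
Hit – an expected tag that the engine attached correctly.
Miss – an expected tag that the engine failed to attach.
Noise – a tag that the engine attached in error.
Industry-standard accuracy formulas were embedded in the spreadsheet, producing separate scores for recall and precision and an evenly balanced composite rating for accuracy.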
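As a sketch of how such scores are conventionally calculated from hits, misses and noise: precision is the share of attached tags that were correct, and recall is the share of expected tags that were found. The composite shown here is the balanced F-measure (F1), which is an assumption on my part about what ‘evenly balanced’ means; the function names and the sample counts are illustrative.

```python
def precision(hits, noise):
    """Share of attached tags that were correct: hits / (hits + noise)."""
    attached = hits + noise
    return hits / attached if attached else 0.0


def recall(hits, misses):
    """Share of expected tags that were attached: hits / (hits + misses)."""
    expected = hits + misses
    return hits / expected if expected else 0.0


def composite_accuracy(p, r):
    """Balanced F-measure (F1): the harmonic mean of precision and recall.

    Assumed here as the 'evenly balanced' composite; other weightings exist.
    """
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Hypothetical counts for a single document.
    hits, misses, noise = 15, 4, 1
    p, r = precision(hits, noise), recall(hits, misses)
    print(f"precision={p:.2f} recall={r:.2f} accuracy={composite_accuracy(p, r):.2f}")
```

Scores are calculated per document and then summarized. Because the median is taken separately for each measure, the median precision, recall and accuracy figures need not combine exactly through the formula.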
The automated content classification test successfully demonstrated the natural language processing capabilities of the classification engine, with the following high-level results:
Better-than-expected median accuracy of 84 per cent.
Accuracy slightly higher on template-based documents.
Sufficient tags to assist with data migration – 15 was the median.
Tagging Precision was impressive at a median 94 per cent.
Missed tags (producing a median Recall score of 77 per cent) reduced accuracy more than did unwanted tags (limiting the Precision score).
Reasons generally were clear for the missed and unwanted tags, so further extensions of the taxonomy and more sophisticated rules should significantly improve the accuracy (Figure 7).

Figure 7. Missed topics were reflected in lower recall scores, reducing accuracy.
The median accuracy of 84 per cent is significant, as is the relatively narrow accuracy range of 62–95 per cent; the result is notably better than we would expect from a specialist manual tagger, based on the news agency analysis mentioned earlier.
It is worth looking in more detail at the PoC results. In most cases, the reasons for missed tags and noise were obvious and will be easily fixed. Specifically, the misses occurred because the tagging engine was not running a generic entity extractor (e.g. to find all organizations, not just pre-selected ones), and the rules were not yet extended to look for related terms and contextual key words.
The main reason for noise was overly broad synonyms in the information model.
The scoring was strict, with misses and noise based on what the tagging engine should have produced, even when the reasons for failure were obvious.
For example, we would expect the recognition of a product to generate the related terms of its active ingredients and the correct product category; we disregarded the fact there were no rules to associate these related terms, even when the descriptions were available in a separate repository containing linked data ‘triples’.
This unforgiving approach established a solid benchmark for improvement, when further taxonomy extensions are included and more sophisticated rules devised.
Even so, the results were much better than expected at this stage of taxonomy and rule development.
The results supported related plans to migrate content from one document repository to another. Plentiful hits were generated – the median was 15 tags – with correctable misses and minimal noise.
Similar tests can be run on the accuracy of taxonomy-generated searches. Staff can identify exemplary documents covering the selected topics and include them in the repository being searched. If the saved query finds them, it’s a ‘hit’ and if not, a ‘miss’. Any unwanted documents returned are ‘noise’.
Sidebar One – Best practice in action – ten guidelines
Based on established principles, policies and structural best practice, there are several rules to guide construction of a consistent, multifaceted and extended taxonomy. These guidelines are as follows:
Establish a single list as the sole repository for identical or similar topics.
Consolidate and remove duplicate or overlapping terms.
Replace an unneeded compound topic using new or existing terms as dual descriptors; for instance, in the absence of numerous lower-level terms, the heading ‘Real Estate tax’ could be replaced by the reusable dual descriptors of ‘Real Estate’ and ‘Tax’. Another example would be ‘Patents’ + ‘Litigation’, rather than ‘Patent litigation’. (There are three exceptions to the ‘building block’ approach above. First is the need to find a home for lower-level terms, which would require parent topics like ‘Construction law’. Second, if you have a large volume of documents requiring the precision of the compound term. Third, if the combined description matches a job function, allowing you to highlight staff expertise alongside displays of documents.)
Change a term name for clarity and consistency.
Establish a new category or heading to group underlying terms, especially those terms in puzzling, illogical or isolated locations.
Move a term under a more logical or intuitive category or heading.
Add a detailed term for clarity and consistency, particularly if comparable headings include lower level, detailed terms.
Aim to nest a child term under one parent term only (a mono-hierarchy rather than a poly-hierarchy) and use Related Terms to reflect the other mandatory or discretionary connections within the taxonomy. For example, a hospital listed under Devon could be linked to Mental health as a related term, rather than creating a second hierarchical parent. This guideline helps avoid the over-application of tangentially related metadata through hierarchical inheritance, which dilutes search accuracy by expanding recall over precision.
Convert a candidate term to a synonym or ‘contextual key word’ of a more common preferred term, especially if it is the only term under a heading.
Map a preferred term to an ‘alternate display term’ if required by existing systems and applications, to avoid undermining your validated vocabulary.
While these guidelines will pave the way toward taxonomy consolidation, an organization may need temporarily to maintain dual sources of metadata to support legacy systems.
Sidebar Two – Enhancing classification with the metadata weighting for ‘extent’
Our assessment of tagging accuracy in the accompanying case study purposely underplayed the impact of ‘noise’ or unwanted tags, as we had no agreed cut-off point for tagging frequency. Many annotations appeared once only. At the same time, the number of mentions alone – translated directly into metadata tags – is not a sufficient description of content meaning.
Classification engines usually calculate a ‘relevancy’ or ‘accuracy’ score for each applied tag. If transparent and improvable, this score can provide a helpful content description – and filter – behind the frequency of annotations. One digital publisher I worked for used this score to establish an 85 per cent threshold for attaching metadata. If all tags hit that level or above, the article was published; otherwise, it was sent for manual review.
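That publishing rule is simple enough to sketch. The example assumes the engine exposes a score between 0 and 1 for each tag; the function name is hypothetical, and sending an untagged article for review is my own assumption rather than part of the publisher’s process.

```python
def route_article(tag_scores, threshold=0.85):
    """Decide whether an article can be published automatically.

    tag_scores maps each attached tag to the engine's score (0.0 to 1.0).
    Returns the decision and the tags that fell below the threshold.
    """
    if not tag_scores:
        # Assumption: an article with no tags at all needs a human look.
        return "manual review", {}
    below = {tag: score for tag, score in tag_scores.items() if score < threshold}
    decision = "manual review" if below else "publish"
    return decision, below


if __name__ == "__main__":
    print(route_article({"US": 0.92, "Regulation": 0.88}))
    print(route_article({"US": 0.92, "UK": 0.60}))
```

Returning the below-threshold tags alongside the decision gives the manual reviewer a short list to check, rather than the whole article.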
Even so, it would be helpful to explore how classification engines could add and recognize an additional, numeric value for entity and concept tags. This number (with values of 1, 2 or 3) would measure an attribute I’m calling Extent – the degree to which the topic tag describes the content. Another definition would be the ‘importance’ of that tag, relative to others.
This measure is separate from accuracy, which defines if the tag is correctly applied, and relevancy, which should define its value to the user.
This weighting for Extent, rather than frequency alone, could help refine tagging accuracy and therefore our content retrieval success.
Such a measure would improve search by differentiating frequently and significantly mentioned entities from single or tangential references, and allow us to link entities with their associated topics. For instance, we could clarify that a document is primarily about US regulation because the tags ‘US’ and ‘Regulation’ share a top Extent score of 1, even if the document also mentions the UK (with a default Extent score of 3).
Such disambiguation is particularly important in lengthy documents that include references or bibliographies, where many tags would be of tangential interest.
The need for this measure also flows directly from our ‘building block’ approach to organizing information – where we prefer a modest number of reusable dual descriptors like ‘US’ and ‘Regulation’ to innumerable compound descriptors such as ‘US regulation’, ‘UK regulation’ and so on.
Here is another advantage: By using rules to add a number, rather than to block borderline tags, we retain the full range of applicable descriptions, but filterable by Extent.
We can already sketch out a process by which Extent values could be applied. To start with, we know which facets within the taxonomy represent Entities, that is, Organization, Location, Product, Person, and so on.
Any identified Entity could receive a default Extent score of 3. Entities with Frequency mentions above the median for all Entities, or those found in significant Content elements such as Summary (as opposed to tangential References or a Bibliography), could automatically receive an Extent score of 2. The same Extent score of 2 could be attached to any subject identified by our sophisticated classification rules.
We could then ask subject specialists reviewing the automated tagging results, optionally, to upgrade the scores of one or several entities to 1 and do the same for any primary subject. We would particularly expect upgrades for disambiguation, when the entity needs to be associated with the primary subject, such as ‘US’ and ‘Regulation’ noted above.
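The proposed process could be sketched as follows. This is only an illustration of the steps just described, not an existing engine feature: the function name, its arguments and the sample counts are hypothetical, and the inputs (mention counts, entities found in significant content elements, identified subjects and reviewer upgrades) are assumed to be supplied by the classification engine and the reviewers.

```python
from statistics import median


def assign_extent(entity_mentions, in_significant_elements=(), subjects=(), upgrades=()):
    """Assign Extent scores (1 = primary, 2 = significant, 3 = default).

    entity_mentions: entity -> number of mentions in the document.
    in_significant_elements: entities found in elements such as a Summary.
    subjects: subjects identified by the classification rules.
    upgrades: tags a subject specialist has promoted to Extent 1.
    """
    scores = {}
    midpoint = median(entity_mentions.values()) if entity_mentions else 0
    for entity, mentions in entity_mentions.items():
        significant = mentions > midpoint or entity in in_significant_elements
        scores[entity] = 2 if significant else 3
    for subject in subjects:
        scores[subject] = 2
    for tag in upgrades:
        if tag in scores:  # only tags already attached can be upgraded
            scores[tag] = 1
    return scores


if __name__ == "__main__":
    print(assign_extent(
        {"US": 12, "UK": 1, "Canada": 2},
        in_significant_elements={"US"},
        subjects=["Regulation"],
        upgrades=["US", "Regulation"],
    ))
```

Because every tag keeps a score instead of being blocked, a search interface could then filter or rank on Extent without losing the tangential descriptions.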
If our search engines could then be configured to recognise these weightings, the precision of our results would be significantly enhanced.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
