Abstract
This study provides a detailed comparative evaluation of freely available AI-powered tools designed to support the preparation of scientific literature reviews, a core competency in researcher education and training. Adopting a mixed-methods approach, seven prominent AI tools were assessed against a predefined set of weighted criteria. The findings reveal significant variations in tool performance, highlighting the need for educators and practitioners to provide targeted guidance on tool selection. While Scispace demonstrated superior comprehensive functionality (overall score: 91.5%), other tools offered notable strengths in specific niches relevant to different learning and research stages. This study contributes valuable, evidence-based insights for information science educators, trainers, and librarians on integrating these emerging technologies into research methods curricula. The aim is to enhance the efficiency, quality, and critical skills of students and researchers in their scholarly work.
Keywords
Introduction
A literature review is a systematic evaluation of written works on a particular subject, commonly conducted within the sciences and social sciences, to synthesize information, identify knowledge gaps, and formulate new research questions (Wong & Li, 2023). It serves as a foundational element for any defensible thesis and bridges past and present findings on a topic (Bolaños et al., 2024; Jhajj et al., 2024). Ensuring comprehensive and accurate coverage in a review is critical (Orel et al., 2023). However, locating and assessing all relevant studies poses a significant challenge, a task made more complex by the rapid proliferation of academic research. This challenge is particularly significant in the education and training of new researchers, where mastering the literature review is a critical yet often difficult milestone.
In this context, the development of Artificial Intelligence (AI) tools, particularly Large Language Models (LLMs) like ChatGPT, is creating an unprecedented shift (Pinzolits, 2023). These models, trained on vast corpora of human-generated text, can replicate complex language patterns (Altmäe, 2023). This technological leap has sparked diverse opinions on the role of AI in automated writing and editing, raising crucial questions about authorship, responsibility, and the potential to enhance the scientific research process (Jhajj et al., 2024; Zala et al., 2024). AI tools can now assist in various research stages, from organization and drafting to the expeditious retrieval of key information from large volumes of text (Khalifa & Albadawy, 2024; Souifi et al., 2024). This strongly suggests that AI is swiftly becoming an integral part of systems supporting scholarly work.
Literature Review
The emergence of AI offers a potential solution to the traditional challenges of conducting literature reviews (Mogoale et al., 2025; Van Dijk et al., 2023). The process is often fraught with difficulties, especially for students, who may struggle with a lack of experience and time, leading to lower quality standards (Souifi et al., 2024; Wong & Li, 2023). The sheer volume and complexity of scholarly literature make conventional methods for conducting reviews, particularly Systematic Literature Reviews (SLRs), inefficient and time-consuming (Tovar, 2023), a challenge that automation aims to alleviate (Orel et al., 2023; Souifi et al., 2024).
The application of AI in this domain is multifaceted. The concept of Human-Centered AI (HCAI), for example, emphasizes a collaborative approach where AI enhances human capabilities rather than replacing them, while maintaining transparency and control (Pinzolits, 2023). Functionally, AI tools can be applied at various stages of the review process, from initial research question formulation to the final analysis (Fabiano et al., 2024). This functional diversity is reflected in the classification of tools into distinct categories, such as Literature Search Tools (e.g., Consensus), Research Article Analysis Tools (e.g., Scholarly), and Academic Writing and Editing Tools (e.g., SciSpace Copilot), as outlined by researchers like Pinzolits (2023) and Filetti et al. (2024). Case studies have further demonstrated this diversity, examining both integrated systems like the “AI Literature Review Suite” and individual tools such as Elicit, LiteRev, and Google Bard (Aydin, 2023; Kung, 2023; Orel et al., 2023; Tovar, 2023; Wagner et al., 2022).
Despite the opportunities for enhanced efficiency, the use of these tools is not without risks. A primary concern is the potential for AI to generate plausible-sounding but unsupported information or to introduce errors during data summarization (Fabiano et al., 2024). Several studies have highlighted significant challenges related to accuracy, ethics, and the potential for misuse (Giglio & Costa, 2023; Gwon et al., 2024; Tomczyk et al., 2024). This underscores the importance of critical engagement from researchers and the implementation of mitigation strategies, such as using AI tools that provide verifiable citations to ensure accuracy and avoid fabrication (Fabiano et al., 2024).
This burgeoning body of literature reveals a clear trend: while many studies describe the capabilities of individual AI tools or provide general overviews (e.g., Jhajj et al., 2024; Souifi et al., 2024), a significant gap exists in direct, detailed comparisons between them. The literature often presents tools in a manner more akin to an annotated bibliography, without a synthesized exploration of their comparable, complementary, or contrasting features. This lack of comparative analysis leaves researchers without clear, evidence-based guidance for selecting the most appropriate tool for their specific needs. Consequently, the present study was conducted to address this critical gap by providing a much-needed, in-depth comparative evaluation of AI tools for preparing scientific reviews.
Methodological Framework
Research Problem
Although the literature suggests that AI tools have remarkable capabilities, a clear knowledge gap remains regarding how they can be used effectively in preparing scientific reviews. Describing capabilities, as the literature has done, is not equivalent to providing an exhaustive and detailed comparison that can help a user make an informed decision. This gap has led to confusion among researchers when it comes to selecting the appropriate tool for their research.
It also highlights the need to evaluate the AI-based tools designed for preparing scientific reviews and to compare them in a way that measures their claimed effectiveness, explains why some perform better than others, and offers practical insights on how best to navigate these tools.
Research Objectives
This research aims to first identify and evaluate the available AI tools that support the preparation of scientific reviews. Following this evaluation, the study conducts a detailed comparison of these tools based on a comprehensive set of criteria, including user interface, corpus creation capabilities, review preparation features, document analysis, AI-powered notebooks, citation management, and cost. Through this comparative analysis, the research seeks to identify the distinct strengths and weaknesses of each tool. Ultimately, the findings are synthesized to provide practical recommendations that can guide researchers in selecting the most appropriate tool for their specific research purposes.
Research Questions
This research seeks to answer the following questions: (1) What are the available AI tools that support the preparation of scientific reviews? (2) What are the similarities and differences between the AI tools in the study sample? (3) What are the strengths and weaknesses of each AI tool under study? (4) What practical recommendations can be offered to researchers on how to select the appropriate tool for their research purposes?
Research Significance
The significance of this study is threefold. Theoretically, it contributes to the scientific literature by providing a theoretical framework for understanding the capabilities and challenges of AI in the field of scientific reviews. In terms of applied significance, this research offers a practical set of tools to assist researchers in selecting the most suitable instrument for their specific purposes. Finally, from a societal perspective, the study contributes to promoting a culture of using modern technologies in scientific research, which can positively impact the overall quality of scholarly output and the competitive ability of research institutions.
Research Scope
Thematic Scope
This research is limited to studying the AI tools available to support the preparation of scientific reviews.
Temporal Scope
This research focuses on the tools and their features as they were available and evaluated during a specific timeframe, from September 2024 to November 2024. It is crucial to acknowledge that the AI tool landscape is characterized by exceptionally rapid evolution. Therefore, this study represents a “snapshot” analysis, and it is expected that functionalities of the evaluated tools may have been added or altered since the conclusion of the data collection period.
Linguistic Scope
The tools under study support both Arabic and English languages.
Research Methodology
The study combined descriptive, evaluative, comparative, and applied methodologies within a convergent mixed-methods design (Vedel et al., 2019). This approach was selected to integrate quantitative performance data with qualitative observational insights, providing a comprehensive evaluation of the tools. The first step involved describing the selected AI tools and defining the evaluation criteria. In the second step, the performance of the tools was systematically evaluated against the pre-set criteria to ensure a structured and consistent assessment. In the third step, the tools were compared on the basis of the evaluation results to identify their similarities and differences. Finally, in the fourth step, the tools were used practically to determine their usefulness in aiding the scientific review process.
Mitigation of Researcher Bias
Given that the evaluation was conducted by a single researcher, it is important to acknowledge the potential for subjectivity. To mitigate this risk and minimize bias, several methodological safeguards were implemented. First, the assessment was strictly anchored to the predefined criteria and weighted scoring system, which acted as an objective rubric to guide the evaluation of every tool. Second, the evaluation focused on verifying the existence and functionality of specific technical features (e.g., the ability to export citations or upload PDFs) rather than relying on general impressions. Third, a standardized evaluation protocol was applied across all tools, where the same research tasks and prompts were used to ensure that every tool was tested under identical conditions.
The methodological approach of this study followed four key phases, visually represented in Figure 1. It began with Planning and Preparation, where the research problem was identified, objectives defined, literature reviewed, and the evaluation criteria established. This was followed by Data Collection, involving the systematic gathering of both quantitative scores and qualitative observations through the direct application of the tools.
Figure 1. Steps of the Research Study. Note. The workflow diagram was developed by the author to illustrate the research process.
The third phase, Data Analysis, was critical and employed a convergent mixed-methods design. Quantitatively, the performance scores were analyzed using descriptive statistics—specifically means and standard deviations—to identify statistical trends and variability across the tools. Qualitatively, the researcher’s detailed observational notes on usability and features were analyzed through content analysis to identify recurring patterns and provide contextual explanations for the numerical data. Finally, the fourth phase involved the Presentation and Discussion of Results, where findings were synthesized to formulate conclusions and recommendations.
Study Population and Sample
AI Tool Sample for Comparative Evaluation
Note. The list of tools and their corresponding website links was compiled by the author to define the sample for this study.
Second, the sample selection was informed by the researcher’s exploratory use of specific platforms such as Scholarly and NotebookLM, as well as a review of recent reports and updated online inventories of AI research assistants. Tools explicitly highlighted in relevant literature, such as Elicit, were also prioritized. Finally, an ethical criterion regarding accessibility was applied (Fabiano et al., 2024; Siderska et al., 2023). To ensure fairness and broad applicability for researchers with varying resources, the study exclusively selected tools that offer substantial functionality within their free-tier versions, allowing for an evaluation based on freely accessible features.
The Evaluation Framework: Core Assessment Tasks
To systematically assess the performance of each of the seven tools, a flexible, task-oriented evaluation framework was applied. Instead of a rigid, identical sequence of prompts, each tool was evaluated based on its ability to perform a set of core functions essential to the literature review process. The assessment was adapted to the unique workflow and feature set of each tool to ensure a fair and contextually relevant evaluation. The core assessment areas were as follows:
Task 1: Corpus Creation
This involved evaluating the tool’s primary mechanism for building a literature collection. For search-based tools (e.g., Scispace), this was tested using the standardized research question: “What are the primary capabilities and limitations of using Artificial Intelligence (AI) tools for academic literature reviews?”. For upload-based tools, this was tested by uploading a core corpus of papers. For network-based tools like Research Rabbit, this involved evaluating the process of generating its visual network.
Task 2: Document Analysis
This assessed the tool’s features for analyzing individual papers. Where applicable (in tools with advanced NLP capabilities like Scispace and NotebookLM), this was tested using a standardized prompt: “Provide a structured summary of this paper’s methodology, key findings, and conclusions.” For tools like Research Rabbit, this task evaluated their unique analytical features, such as citation network visualization.
Task 3: Synthesis
This task evaluated the tool’s ability to synthesize information, either across multiple documents (tested in tools like Elicit and Consensus) or through other means, such as identifying key thematic concepts.
Task 4: Citation and Export Management
This assessed the availability of features for generating citations and exporting data for use in other software (e.g., reference managers and Excel).
Task 5: Advanced AI Assistance
This task evaluated unique, AI-powered features beyond basic analysis, such as generative notebooks, conceptual mind maps, or menus of pre-configured “AI agents.”
For complete transparency and to provide detailed, verifiable evidence of this framework in action, a comprehensive Supplementary Material file has been prepared. This document, which provides annotated screenshots and unedited AI-generated outputs for all seven evaluated tools, is openly available for peer review at the following secure link: https://figshare.com/s/56dd16e3dbfdff146b42?file=60042299.
Clarification on the Study’s Objective
It is important to note that the objective of this study was to evaluate and compare the inherent capabilities, features, and functionalities of the AI tools themselves as they perform standardized research tasks. The goal was not to compare the final output of a full AI-assisted review against a traditionally conducted one, but rather to provide a granular assessment of the tools that support that process.
Definition of the Tools Under Study
Research Rabbit
As shown in Figure 2, Research Rabbit is a tool that helps researchers explore topics related to their subject and create a knowledge map of previous research, contributing to identifying research gaps in prior intellectual output (Cole & Boutet, 2023).
Figure 2. Screenshot of the Research Rabbit Interface. Note. Image captured by the author from the official Research Rabbit website.
SciSpace
As shown in Figure 3, and according to Khan et al. (2023), SciSpace offers a comprehensive view of shared data across multiple geographically distributed High-Performance Computing (HPC) data centers through a single workspace that facilitates direct access to data for optimal performance when reading or writing data within the data center’s namespace. The platform provides a comprehensive and searchable database of over 270 million scientific papers, authors, topics, journals, and conferences.
Figure 3. Screenshot of the Scispace Interface. Note. Image captured by the author from the official Scispace website.
Consensus
As shown in Figure 4, Consensus is an AI-powered academic search engine designed to help researchers and students prepare literature reviews faster and more accurately.
Figure 4. Screenshot of the Consensus Interface. Note. Image captured by the author from the official Consensus interface.
Elicit
As shown in Figure 5, Elicit is an online tool developed by Ought, a non-profit machine learning (ML) research lab based in the United States. Elicit serves as an “AI research assistant” that automates part of the researcher’s workflow and is well suited to gathering evidence and extracting text. Elicit pulls publications from Semantic Scholar and speeds up the literature review process.
Figure 5. Screenshot of the Elicit Interface. Note. Image captured by the author from the official Elicit website.
NotebookLM
As shown in Figure 6, NotebookLM is an AI-powered tool developed by Google, designed to assist in reviewing literature and taking notes (Eager, 2024).
Figure 6. Screenshot of the NotebookLM Interface. Note. Image captured by the author from the official NotebookLM website.
Scholarly
As shown in Figure 7, Scholarly is a tool that allows users to upload files and research papers on the topic under study, organizes them, and explains the elements of each paper in detail, facilitating content understanding.
Figure 7. Screenshot of the Scholarly Interface. Note. Image captured by the author from the official Scholarly website.
Schobot
As shown in Figure 8, Schobot is a specialized AI-based search engine that assists researchers in writing scientific research.
Figure 8. Screenshot of the Schobot Interface. Note. Image captured by the author from the official Schobot website.
Comparison Criteria
While no single, standardized framework for comparing AI literature review tools yet exists in the literature, the evaluation criteria for this study were systematically developed to reflect the core stages of the established scientific review process (Atkinson, 2024; Fabiano et al., 2024; Gwon et al., 2024; Souifi et al., 2024). The criteria were derived from a comprehensive analysis of the literature review workflow, which typically includes: (1) literature search and retrieval, (2) document management and analysis, (3) synthesis of findings, and (4) citation and referencing to ensure academic integrity.
The relative weights for each criterion were assigned based on their centrality to the research process. Core functional criteria that directly contribute to the analysis and synthesis of literature—namely, “Adding Research Papers,” “Scientific Review Preparation,” and “Management and Analysis of Research Papers”—were each assigned a higher weight of 20%. Supporting criteria that enhance the user’s experience or address practical constraints—such as “User Interface,” “AI-Powered Notebooks,” “Citations and Referencing,” and “Cost”—were deemed essential but secondary to the core functions, and were therefore each assigned a weight of 10%. This weighting scheme was designed to prioritize the tools’ effectiveness in performing the most critical tasks of a literature review.
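The weighting scheme described above can be read as a simple weighted sum. The snippet below is an illustrative sketch rather than the scoring procedure actually used in the study: the weights are taken from the text, but the `overall_score` function, the idea of expressing each criterion as a fulfilment fraction between 0 and 1, and the example tool are assumptions introduced only for demonstration.

```python
import math

# Criterion weights as stated in the text: three core criteria at 20%
# and four supporting criteria at 10%.
WEIGHTS = {
    "User Interface": 0.10,
    "Adding Research Papers": 0.20,
    "Scientific Review Preparation": 0.20,
    "Management and Analysis of Research Papers": 0.20,
    "AI-Powered Notebooks": 0.10,
    "Citations and Referencing": 0.10,
    "Cost": 0.10,
}

# Sanity check: the weights should account for the whole score.
assert math.isclose(sum(WEIGHTS.values()), 1.0)


def overall_score(fulfilment):
    """Return a weighted overall score in percent.

    `fulfilment` maps each criterion to a fraction in [0, 1].
    Assumption: the fraction is the share of that criterion's
    sub-features that a tool offers.
    """
    missing = set(WEIGHTS) - set(fulfilment)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return 100 * sum(WEIGHTS[c] * fulfilment[c] for c in WEIGHTS)


# Hypothetical tool, for illustration only (not data from the study).
example_tool = {
    "User Interface": 0.5,
    "Adding Research Papers": 1.0,
    "Scientific Review Preparation": 0.5,
    "Management and Analysis of Research Papers": 0.75,
    "AI-Powered Notebooks": 0.0,
    "Citations and Referencing": 1.0,
    "Cost": 0.5,
}

print(f"{overall_score(example_tool):.1f}%")
```

Under this reading, a tool that fully satisfies only the three core criteria would already reach 60%, which is how the scheme prioritizes the core tasks of a literature review.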
Results
Evaluation Criteria and Weighting for AI Tools Comparison
Evaluation Criteria and Relative Weights
Note. The criteria and their relative weights were developed by the author for this study.
Detailed Feature Comparison of AI Tools
Key:
• ●: Feature Available.
• x: Feature Not Available.
Note. The data were compiled by the author based on a direct review and practical application of each tool, using information available on their official websites.
Statistical Analysis of Overall Tool Evaluation Results
Descriptive Statistics
Descriptive Statistics of Tool Evaluation Criteria
Note. The descriptive statistics were calculated by the author based on the evaluation data collected for this study.
Overall Performance Scores
Note. The overall performance scores were calculated by the author based on the weighted evaluation criteria developed for this study.
The data presented in Table 5 illustrate varying levels of performance consistency across the tools. An analysis of the standard deviation reveals distinct patterns. Specifically, a large variation was observed in the criteria for “Adding Research Papers” (0.057735), “Management and Analysis of Research Papers” (0.049369), and the “Overall Evaluation per Tool” (0.199317), indicating significant differences in the efficiency and depth of these features among the tools. In contrast, a moderate variation was noted in the “Citations and Referencing” (0.037796) and “Preparation of a Scientific Review” (0.034503) criteria. Finally, the tools demonstrated low variability regarding the “User Interface” (0.016102), “AI-Powered Notebooks” (0.029921), and “Cost” (0.026682) criteria, suggesting that performance in these areas was relatively consistent across the sample.
“Adding Research Papers” (mean 10), “Management and Analysis of Research Papers” (mean 9), “User Interface” (mean 7.86), and “Scientific Review Preparation” (mean 9.29) were rated highly. However, the quality of the AI-powered notebooks supplied by the tools was inconsistent (mean 4.57), which indicates that some tools require improvement in this area.
Figure 9. The Correlation Between Scientific Review Preparation Scores and Management and Analysis of Research Papers Scores. Note. The graph was generated by the author based on the study’s correlation analysis results.
Pearson Correlation Coefficient Analysis
To investigate the interrelationships among the evaluation criteria, a Pearson correlation coefficient analysis was performed. The analysis revealed a moderate positive correlation (r = 0.438) between the “Scientific Review Preparation” criterion and the “Management and Analysis of Research Papers” criterion (see Figure 9 for a graphical representation). This finding suggests that tools demonstrating strong capabilities in one of these areas are likely to perform well in the other, potentially due to the co-development or inherent interrelation of these feature sets.
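For readers who want to reproduce this kind of analysis, the following is a minimal sketch of how a Pearson coefficient is computed from two lists of per-tool scores. The `pearson_r` helper and the two score lists are hypothetical and chosen only to show the mechanics; they are not the study’s data and will not reproduce r = 0.438.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores on two criteria for five tools (NOT the study's data).
criterion_a = [1, 2, 3, 4, 5]
criterion_b = [2, 1, 4, 3, 5]

r = pearson_r(criterion_a, criterion_b)
print(f"r = {r:.3f}")
```

With only seven tools in the sample, a coefficient of this kind is best treated as descriptive; a few points can shift it considerably.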
Discussion
Summary and Discussion of Comparative Results
The findings of this study offer critical insights for the education and training of information professionals and researchers, particularly concerning the integration of new technologies into scholarly practice. This comparative analysis of seven AI tools for literature review automation reveals that each tool exhibits unique capabilities and constraints, requiring an informed selection process that should be a key component of research training. As presented in Table 5 and Figure 10, the overall performance scores provide a quantitative depiction of each tool’s effectiveness based on the predefined evaluation criteria. The following discussion integrates these quantitative findings with qualitative observations to highlight key performance trends and offer specific insights relevant to educators and practitioners in the field of information handling.
Figure 10. Overall Percentage Scores of AI Tools. Note. The bar chart was created by the author to visualize the overall performance scores calculated in this study (see Table 5).
Overall Performance Trends
Scispace achieved the highest score in all three core categories, finishing with a total score of 91.5%, which indicates strong capability and fit across most criteria. This result stems from its exclusive support for both searching and uploading research papers, its comprehensive scientific review preparation capabilities, its varied paper management and analysis features (including in-article chat and PDF reading), and its AI-powered notebook functions. Moreover, its dedicated browser extension (Figure 11), which no other tested tool offers, adds considerably to its usability and accounts for its full rating on the “Add Tool as Browser Extension” sub-criterion of the user interface.
Figure 11. The Browser Extension Feature in Scispace. Note. Screenshot captured by the author from the Google Chrome Web Store page for the Scispace extension.
Following Scispace, NotebookLM (56.6%) and Scholarly (54.5%) showcased commendable performance, particularly excelling in AI-powered notebooks and cost-effectiveness (for NotebookLM), and strong paper uploading capabilities and citation management (for Scholarly). Their comparatively lower overall scores stem from limitations in areas such as direct literature search capabilities (as both primarily rely on user-uploaded documents) and fewer integrated review preparation features. Schobot (51.3%) and Consensus (47.8%) presented moderate overall performance. Schobot’s utility is pronounced for researchers working with existing document sets, leveraging its PDF upload, training, rephrasing, and translation features. Conversely, Consensus excels in direct literature search and AI-powered summarization but lacks paper upload functionality and has limited AI notebook features. Scoring lowest, Research Rabbit (33.4%) and Elicit (31.2%) offer more niche functionalities. Research Rabbit’s primary strength lies in its visual mapping of research relationships and author discovery, while Elicit is strong in literature search and generating summaries/questions. However, both lack features in paper uploading and extensive AI-powered analysis, impacting their suitability for a complete, end-to-end literature review process.
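The seven overall scores quoted above are enough to recompute the descriptive statistics for the overall evaluation. The sketch below assumes that the reported figure is a sample standard deviation (n - 1 denominator) calculated on scores expressed as fractions; under that assumption the result comes out at roughly 0.199, which appears consistent with the value of 0.199317 reported for the “Overall Evaluation per Tool.” The variable names are illustrative.

```python
import statistics

# Overall percentage scores as reported in the text.
overall_scores = {
    "Scispace": 91.5,
    "NotebookLM": 56.6,
    "Scholarly": 54.5,
    "Schobot": 51.3,
    "Consensus": 47.8,
    "Research Rabbit": 33.4,
    "Elicit": 31.2,
}

values = list(overall_scores.values())
mean_score = statistics.mean(values)
# Assumption: sample standard deviation (n - 1 in the denominator).
sample_sd = statistics.stdev(values)

print(f"mean = {mean_score:.2f}%")
print(f"sample SD = {sample_sd:.2f} points ({sample_sd / 100:.4f} as a fraction)")

# Rank the tools from highest to lowest overall score.
ranking = sorted(overall_scores, key=overall_scores.get, reverse=True)
print(ranking)
```

The gap between the top score and the mean is about twice the standard deviation, which reflects how far Scispace sits from the rest of the sample.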
Key Criterion Insights
For User Interface (10% weight), all tools supported the Arabic language well and offered simple interfaces, reflecting the developers’ general orientation toward a new generation of users (average score: 7.86/10). The browser extension offered by Scispace was the main differentiator. For Adding Research Papers (20% weight), Scispace was the only tool that supported both direct search/filtering and upload/training, earning the full 20%, whereas each of the other tools supported only one of these methods. This difference is reflected in the variation recorded for this criterion (Std. Dev. 0.057735) and shows that the tools rely on different core mechanisms for building a corpus.
In Scientific Review Preparation (20% weight), Scispace again ranked first (15%) owing to its literature summarization (shown in Figure 12), analysis sharing, and specialized writer detection. Research Rabbit, Consensus, Schobot, and NotebookLM also offered useful features (10% each). In Management and Analysis of Research Papers (20% weight), Scispace ranked first (19.8%) with extensive features such as in-article chat and PDF reading. NotebookLM (16.5%) and Scholarly and Consensus (13.2% each) also provided strong functionality, with the “Chat with the Article” feature (available in Scispace, Consensus, and the paid version of Elicit) supporting richer interaction with content.
Figure 12. Summarizing Literature Within the Scispace Interface. Note. Screenshot captured by the author, illustrating the literature summarization feature in Scispace.
For AI-Powered Notebooks (10% weight), Scispace (10%) offered the most extensive feature set with note-taking, AI writing, and translation. NotebookLM (8%) also performed well, particularly with its Notebook Guide and rephrasing functionality (as shown in Figure 13). Other tools offered more limited features beyond basic note-taking. In Citations and Referencing (10% weight), Scholarly, Scispace, and Research Rabbit (10% each) demonstrated strong proficiency in citation generation and synchronization with reference management tools. Finally, under Cost (10% weight, assessed on free-tier features), Scispace (10%) provided the most complete set of free features, followed by Schobot and NotebookLM (7% each).
Figure 13. Overview of AI-Powered Services in NotebookLM. Note. Screenshot captured by the author, showcasing the various AI services and features available within the NotebookLM interface.
This detailed overview shows that although certain tools provide a more comprehensive range of functionalities, the optimal choice depends on the researcher’s needs, priorities, and phase in the literature review process.
Implications for Social Science Research
The findings of this comparative evaluation carry significant implications for researchers in the social sciences, extending the ongoing scholarly conversation about AI’s role in academic work (Tomczyk et al., 2024). The dominance of tools like Scispace, which excel in processing large volumes of unstructured text, presents new opportunities for qualitative researchers who traditionally face challenges in systematically managing and synthesizing extensive textual data. This aligns with the potential highlighted by Wagner et al. (2022) for AI to enhance search and screening in text-heavy disciplines. Features such as in-article chat and automated summarization can potentially accelerate the thematic analysis phase, a cornerstone of much qualitative inquiry. Conversely, the identified weaknesses in some tools, particularly inconsistencies in citation generation, echo the risks of generating plausible but unsupported information, a primary concern raised by Fabiano et al. (2024). This underscores that for social scientists, these AI tools should be adopted not as autonomous agents, but as sophisticated assistants that require continuous critical oversight, reinforcing the call for a Human-Centered AI (HCAI) approach as advocated by Pinzolits (2023). The researcher’s own interpretive skills and ethical judgment remain irreplaceable in validating sources and constructing a coherent scholarly argument (Mogoale et al., 2025). Therefore, the integration of these tools into social science research workflows necessitates the development of new digital literacy skills focused on the critical and ethical use of AI, a sentiment echoed in the broader literature on AI’s impact on research and publication (Jhajj et al., 2024).
Key Features and Factors for Tool Use
General User Interface Observations
Across all evaluated tools, a consistent modular design philosophy was observed, allowing users to select functionalities as needed. The Graphical User Interface (GUI) designs facilitated user interaction through segmented interfaces with independent controls and visualizations, adhering to established usability principles (Tovar, 2023).
Scispace Capabilities
Scispace distinguishes itself by acting as an in-house PDF reader, thereby avoiding external redirection. Key features include bookmark generation, citation creation, and a “Copilot” for live chat, which supports both user-generated questions and AI-generated prompts based on article content. It is important to note that chat results require manual saving via the “Save as Note” function. The tool offers comprehensive writing assistance, including direct note-taking, an AI writer, content review, outline building, grammar repair, and translation. In terms of access, the free version provides a chatbot limited to ten questions per session with a cooldown period, while the premium version offers unlimited queries and advanced rephrasing features.
NotebookLM Features
NotebookLM works best when the user first creates an “Outline” to guide the completion of individual sections. Its unique features include the generation of FAQs, Tables of Contents, and a Timeline view. The tool also provides a “Briefing Doc” for summarizing selected articles and a “Study Guide” for creating question-answer notes. Unlike some competitors, chat interactions in NotebookLM cite sources for their answers, although responses must be saved manually and users must specify sources to avoid external internet searches. The tool supports extensive note-taking, paraphrasing, AI writing, and translation, with a “Notebook Guide” for prompt ideas. The free version supports training on up to fifty sources and five thousand words.
Chat Citation in Elicit and Scispace
A comparative limitation observed in both Elicit and Scispace (specifically in general chat, not in-document mode) is that chat responses may lack explicit references to specific article sections. This contrasts with NotebookLM’s in-document chat, which provides more granular citation.
The findings of this study must also be contextualized within the hyper-dynamic nature of the AI development ecosystem. The rapid iteration cycle of these tools presents both an opportunity and a challenge for researchers. A compelling example of this was observed even during the revision phase of this manuscript: while our initial evaluation noted that certain key features, such as advanced data export or cross-document synthesis, were paywalled or absent in several tools, subsequent updates have since made some of these functionalities available in their free tiers, underscoring the pace of change. This dynamism reinforces the need for researchers to continuously re-evaluate tools and highlights that any comparative study, including this one, serves as a valuable but time-bound benchmark.
Limitations and Future Research
While this study provides a comprehensive comparative evaluation, several limitations should be acknowledged to contextualize the findings. Firstly, the assessment primarily focused on the freely available features of the selected AI tools. Paid or premium versions might offer enhanced functionalities or different performance levels that were not covered in this analysis. Secondly, the AI landscape is characterized by rapid evolution; therefore, this study represents a snapshot of the tools as they existed at the time of research. For instance, new features for data export and summarization were rolled out in Elicit shortly after the primary data collection for this study was completed, vividly illustrating this point. Thirdly, while efforts were made to select a representative sample, the study was limited to seven AI tools, and other potentially relevant tools may exist or emerge that were not included in this comparison.
These limitations pave the way for several key directions for future research. In-depth assessments of premium versions of AI tools are required to determine their full efficacy and cost-effectiveness. Furthermore, empirical experiments with researchers could quantify the effects of specific tools on thesis and paper writing efficiency and quality. Investigating the varying effectiveness of AI tools across diverse academic disciplines is also of great importance, as literature review needs differ significantly. Given the rapid pace of development, longitudinal studies tracking tool evolution and user uptake over time are essential. Finally, ongoing examination of the ethical implications of AI in literature reviews—addressing issues of bias, privacy, and integrity—is central to developing responsible best practices. These future inquiries will deepen our understanding and promote the efficient and ethical application of AI within social science research, ensuring that the adoption of these powerful computational tools enhances, rather than compromises, the integrity of scholarly inquiry.
Conclusion and Recommendations
This comparative analysis systematically assessed a range of AI-assisted tools for preparing scientific literature reviews. It reveals a varied landscape in which no tool is universally superior; instead, each shows particular strengths suited to distinct research needs and contexts. The analysis singles out Scispace for its feature richness, while rival tools such as NotebookLM and Scholarly demonstrate specialized strengths. The successful use of these fast-emerging AI technologies depends on the informed choices and critical engagement of the user. Based on these findings, the following recommendations are proposed for researchers, libraries, universities and research institutions, and developers, with the aim of enhancing the adoption and utility of AI in scientific reviews while addressing its shortcomings.
For researchers, the primary recommendation is to clearly define research objectives and focus areas before selecting any tool. It is crucial to experiment with different platforms based on individual needs rather than relying on a single opinion. Researchers must always verify AI-generated results personally and understand the limitations of each tool in order to benefit fully from it. Furthermore, integrating AI tools with other software, such as reference managers, is essential for optimal effectiveness, provided that users strictly adhere to research ethics and avoid plagiarism.
Libraries play a pivotal role by providing training on AI tools for scientific reviews and facilitating access to advanced technologies through developer collaborations. They should formulate clear guidelines for selection and usage while continuously tracking AI developments to modify services accordingly.
Universities and research institutions are encouraged to conduct dedicated research on AI utility in reviews, ensuring necessary support and guidelines. Providing financial and personnel resources to facilitate adoption is critical, as is establishing curricula that incorporate AI skills for scientific inquiry. Institutions should further investigate the effects of these tools on research quality and enact necessary regulations prioritizing ethical aspects.
Finally, developers should focus on improving current functionalities and adding features that meet researchers’ specific needs. Enhancing user interfaces for productivity and providing essential technical support are key priorities. Developers are also encouraged to offer free, high-quality versions to promote adoption and to prioritize ethical considerations in tool design.
