Abstract
Tamil is one of the world's oldest classical languages still in use. The Tamil language boasts a rich and extensive literary tradition, dating back over 2,000 years. Tamil literature addresses various aspects of life, such as love, war, social values and religion. Tamil classical literature encodes human emotions through dense metaphor, symbolism, and cultural convention, posing significant challenges for automatic emotion analysis. This research investigates the classification of melancholic emotions in Kuruntokai, a Sangam-era Tamil poetic anthology, focusing on two dominant affective categories: Lamentation and Consolation. A manually annotated dataset of 401 poems, along with their explanatory prose (urai), is used to evaluate classical machine learning models, recurrent neural networks, and a fine-tuned multilingual BERT (mBERT) model. To address the linguistic complexity of classical Tamil, the framework incorporates morphological analysis, a word reformation algorithm tailored to poetic constructs, and subword-level tokenization. Experimental results show that while Support Vector Machines perform best among classical classifiers, the fine-tuned mBERT model achieves superior performance, attaining an accuracy of 78% on urai-based classification. Quantitative analysis, supported by statistical significance tests and confidence intervals, demonstrates that explanatory prose provides richer emotional cues than the original poems. Qualitative error analysis further reveals how metaphorical compression in poetry leads to misclassification, which is resolved through urai. The findings highlight the effectiveness of transformer-based models for emotion classification in classical Tamil literature and underscore the importance of explanatory prose for reliable affective modelling.
Introduction
Tamil is one of the 22 languages listed in the Constitution of India and was the first Indian language to be awarded the status of classical language (Ramaswamy, 2023). It is also one of the oldest living languages in India, being over 2,000 years old. Within the Dravidian family, it belongs to the Southern group and is closely related to Malayalam and Kannada. Despite outside influences, Tamil has preserved the unity of its official and literary forms.
The Sangam period extends from 300 BCE to 300 CE, and Sangam literature is the earliest and one of the most important collections of Tamil writings to emerge during this period. Sangam, a Tamil word, means a gathering or academy of poets and writers. This literature is divided into three main parts: Aham (inner life and love), Puram (external life, war, and politics), and Padinenkilkanakku (eighteen minor ethical and moral works). The poems describe ancient Tamil society in detail, including its geography, emotions, and form (Sangeetha, 2025), and the corpus also includes polemical social and philosophical writings. The poems show a strong love of nature and the country. Tamil poets delineate five landscapes, each associated with distinct emotions and human experiences: Kurinji (the mountain region), Mullai (the forest region), Marutham (the agricultural region), Neithal (the coastal region), and Palai (the desert region).
Sangam literature documents the culture, governance, and everyday activities of the ancient Tamils. The poetry elaborates on love, virtue, philanthropy, and the social good, as well as the interactions between humanity and nature. The principal literary works of the Tamils include Tolkappiyam (the first and oldest Tamil grammar) (Adigalasiriyar, 1985) and Ettuthogai (the eight-fold anthology). These works testify to the literary and linguistic achievements of the poets and scholars of the Sangam Academy. The ethical work Thirukkural is a later composition that remains closely connected to Sangam concepts and literature. Sangam literature continues to nourish the Tamil language, culture, and identity.
Kuruntokai is one of the Eight Anthologies (Ettuthogai) of Sangam literature and focuses on love, separation, longing, and the emotional complexities of love. The poems express melancholic emotions using symbolic imagery, metaphors, and highly literary morphological constructs. Because of these features, such texts pose unique challenges for computational analysis.
Motivation
Recent advances in emotion analysis have increasingly explored multimodal frameworks, wherein textual cues are jointly modelled with audio and visual signals to improve robustness in conversational settings. Notably, the RMER-DT framework (Robust Multimodal Emotion Recognition based on Diffusion and Transformers) (Zhu et al., 2025) integrates diffusion-based feature refinement with transformer architectures to capture complex temporal and cross-modal emotional dynamics in dialogue-centric datasets. Such approaches demonstrate that transformer-based representations, when combined with sophisticated modelling strategies, can significantly enhance emotion recognition performance in noisy, real-world scenarios. While RMER-DT and similar models operate in multimodal conversational contexts, their success underscores the effectiveness of transformer architectures for nuanced emotion modelling. In contrast, the present work focuses on monomodal textual emotion analysis in classical Tamil literary texts (Kuruntokai), where emotions are implicitly encoded through poetic structure, metaphor, and cultural symbolism rather than explicit conversational signals. By leveraging multilingual BERT models in this low-resource literary domain, our research complements multimodal research by demonstrating that transformer-based methods remain effective even in text-only, historically rich corpora that pose distinct semantic and interpretive challenges.
Emotion interpretation in classical literary texts is inherently uncertain, as meanings are often conveyed implicitly through metaphor, symbolism, and culturally grounded expressions rather than explicit lexical cues. Prior work on uncertainty modelling, such as the framework of fuzzy random variables and transforms (Vijayabalan et al., 2025), provides a theoretical foundation for representing and reasoning under ambiguity in complex signal and language processing tasks. Such perspectives are particularly relevant to classical Tamil poetry, where emotional states may overlap or remain indeterminate depending on context and interpretation. While the present research does not explicitly employ fuzzy representations, this theoretical lens motivates the adoption of context-aware transformer models, which implicitly encode uncertainty by learning distributed semantic representations across diverse interpretations. This alignment supports the suitability of multilingual BERT for melancholic emotion classification in Kuruntokai.
Despite extensive research in sentiment analysis, hate-speech detection, and topic classification across Dravidian languages, there is no prior work that automatically classifies emotions in classical Tamil poetry. Kuruntokai, a 2,000-year-old Sangam anthology, expresses delicate melancholic emotions using highly figurative and literary vocabulary, making it difficult for modern NLP models to interpret. Existing Tamil NLP resources, such as tokenizers, morphological analysers, and annotated corpora, are designed for contemporary prose and social-media text, not for literary cīr-based poetic structures. Therefore, there is a need for an NLP system that can understand and classify the subtle emotional expressions in ancient poetry using modern deep-learning tools. This motivates the present work, which attempts to bridge classical Tamil literature and transformer-based NLP.
Objectives
The major objectives of this research are: (1) to create the first curated and manually annotated dataset of 401 Kuruntokai poems with expert-validated melancholic emotion labels (Lamentation/Consolation); (2) to design an NLP pipeline suitable for classical Tamil poetry, including word-reformation techniques to handle cīr-based constructions; (3) to evaluate machine learning, deep learning, and transformer-based models on this literary dataset; (4) to fine-tune mBERT for emotion classification and compare its performance with classical and DL models; and (5) to study the impact of classical Tamil interpretative prose (urai) on classification accuracy and analyse why urai performs better than poems.
These objectives together establish a systematic approach to understanding melancholic emotions embedded in classical Tamil verse.
Novelty of the Proposed Work
The novelty of this research lies in the following aspects: (1) the first attempt to perform melancholic emotion classification (Lamentation vs. Consolation) on classical Tamil poetry, specifically Kuruntokai; (2) construction of a new expert-annotated dataset of 401 poems with an inter-rater reliability of 91.77%, which did not previously exist; (3) integration of a word-reformation algorithm to reconstruct literary cīr-based Tamil words, a technique not used in earlier Tamil NLP works; (4) demonstration that mBERT, when fine-tuned, can effectively interpret classical Tamil poetic semantics, showing a significant improvement over traditional models; and (5) a novel analysis of poem vs. urai performance, showing that interpretive prose significantly improves classification accuracy.
These contributions collectively establish the uniqueness and originality of the proposed system.
Contribution of the Paper
The contribution of this paper is two-fold: (1) creating a curated dataset comprising 401 Kuruntokai songs with melancholic classification labels; and (2) fine-tuning the Multilingual BERT model for classifying Kuruntokai songs into two emotion categories.
The remainder of the paper is organised as follows. The next section presents the literature survey, Section 3 outlines the proposed work, Section 4 presents the results and discussion, and Section 5 contains the conclusion and future work.
Literature Survey
The literature survey covers two dimensions: classification work on the Dravidian languages and computational work on Tamil literature. The Dravidian languages include Tamil, Telugu, Malayalam, and Kannada.
Classification Works Done on Dravidian Languages
Saroj and Pal (2020) presented a study on sentiment analysis and hate speech detection in multilingual code-mixed text, describing the participation of IRLab@IIT(BHU) in the Dravidian-CodeMix and HASOC tasks of FIRE 2020. The research primarily aimed to classify subjective opinions and identify hate speech or offensive content in social media texts. For the Dravidian-CodeMix challenge, the team assembled a corpus of YouTube comments containing Tamil-English (Tanglish) and Malayalam-English (Manglish) code-mixed data. The HASOC component drew on over 10,000 pre-annotated tweets in Hindi and German. The system was built on the BERT_BASE architecture, which was fine-tuned for each task after a series of standard pre-processing operations: lowercasing, substitution of hyperlinks and numerical strings, and removal of various punctuation marks. The resulting system achieved F1 scores of 0.59 on the Tanglish subset and 0.60 on the Manglish subset of the Dravidian-CodeMix task. On HASOC, it scored 0.5028 on Hindi sub-task A and 0.3840 on German sub-task A.
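The pre-processing operations listed above (lowercasing, substitution of hyperlinks and numerical strings, punctuation removal) can be illustrated with a minimal sketch. This is not the authors' code: the function name `preprocess`, the placeholder tokens `URL` and `NUM`, and the regular expressions are assumptions made purely for illustration.

```python
import re
import string

# Hypothetical placeholder tokens; the cited paper does not specify them.
URL_TOKEN = "URL"
NUM_TOKEN = "NUM"

_URL_RE = re.compile(r"(https?://\S+|www\.\S+)")
_NUM_RE = re.compile(r"\d+(?:[.,]\d+)*")
# Only ASCII punctuation is removed, so Tamil or Malayalam script is untouched.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> str:
    """Lowercase, replace links and numbers with placeholders, drop punctuation."""
    text = text.lower()
    # Replace links first so digits inside URLs are not treated as numbers.
    text = _URL_RE.sub(f" {URL_TOKEN} ", text)
    text = _NUM_RE.sub(f" {NUM_TOKEN} ", text)
    text = text.translate(_PUNCT_TABLE)
    # Collapse the extra whitespace introduced by the substitutions.
    return " ".join(text.split())


cleaned = preprocess("Semma MOVIE!!! 10/10 https://youtu.be/abc123 :)")
```

Lowercasing happens before substitution so that the placeholder tokens stay distinguishable from ordinary lowercased words.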
Sai and Sharma (2020) present a simplified approach for detecting abusive language in code-mixed and Romanised Dravidian text, evaluated in the HASOC-Dravidian-CodeMix track of the FIRE 2020 workshop. The authors implemented a hybrid pre-processing step of selective translation and transliteration that converts a given text fragment to a fully native character set, thus enabling a language-agnostic representation. Subsequently, the multilingual transformer backbones XLM-RoBERTa and mBERT were fine-tuned and stacked in a stage-wise ensemble to exploit common latent knowledge. The experiments were conducted on a dataset of scraped YouTube comments and yielded the following F1 scores: 0.95 for code-mixed Malayalam (Task-1), 0.90 for Tanglish, and 0.77 for Manglish, confirming the usefulness of transformer models in low-resource Dravidian contexts.
Kumar et al. (2020) presented the systems submitted to the FIRE 2020 shared tasks, focusing on multilingual joint training across three classification challenges: Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC), Code-Mixed Sentiment Analysis of Dravidian Languages, and Event Detection from News Articles in Indian Languages (EDNIL). Following a multi-task paradigm, the authors fine-tuned DistilBERT, RoBERTa, and XLM-R models on the various homogeneous task datasets. Each experiment was run with the Simple Transformers library, which builds on Hugging Face Transformers, using a batch size of 12, a sequence length of 512, and a learning rate of 6e-5 for 15 epochs. For HASOC subtask A, mBERT provided F1 scores of 0.72 for Hindi, 0.86 for English, and 0.82 for German. For HASOC subtask B, the scores were 0.69 for Hindi, 0.73 for English, and 0.76 for German. The EDNIL F1 scores were 0.356, 0.493, and 0.585 for Bengali, Hindi, and English, respectively. For sentiment analysis, the scores were 0.51 and 0.58 for Malayalam and Tamil, respectively.
Lin et al. (2021) developed a multilingual classification framework for the Dravidian languages. The approach starts from a LaBSE model with script-diverse embeddings. Language-specific tokens retrieved by a masked language model (MLM), together with pseudo-target embeddings combined with adversarial noise in accordance with phonetic distance, are used to alleviate the attention skew caused by multitasking. The framework achieved a weighted F1 of 0.7170 in FIRE 2021 and 0.8617 in EACL 2021, demonstrating both task portability and robustness to typographic variation.
Rashmi et al. (2021) studied how to perform sentiment recognition in a fully unsupervised way in code-mixed YouTube comments in Dravidian languages, mainly Kannada, Malayalam, and Tamil. Their primary dataset consists of aligned Kannada-English, Tamil-English, and Malayalam-English YouTube comments made available to participants of the 2021 FIRE competition. The main goal was to identify the sentiment of each comment and classify it as positive, negative, neutral, mixed, non-target, or other. The authors first built a baseline from a handful of core machine learning classifiers: Logistic Regression, Balanced Random Forest, eXtreme Gradient Boosting (XGBoost), regular Random Forest, and Support Vector Machines. They then implemented an ensemble soft-voting classifier to improve on this baseline and make the decisions more robust. The ensemble classifier achieved 57% accuracy on the Malayalam, 60% on the Kannada, and 63% on the Tamil test sets. The macro-averaged F1 scores of 0.56, 0.58, and 0.56 on the Malayalam, Kannada, and Tamil test sets, respectively, were quite close.
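To make the soft-voting idea concrete, the following minimal sketch averages the class probabilities produced by several classifiers and picks the label with the highest mean, contrasting it with majority (hard) voting. The probability values, label names, and function names are illustrative assumptions, not outputs or code from the cited system.

```python
from collections import Counter
from typing import Dict, List

Probs = Dict[str, float]  # class label -> predicted probability


def soft_vote(per_model_probs: List[Probs]) -> str:
    """Average each class probability over all models and return the argmax."""
    labels = per_model_probs[0].keys()
    n_models = len(per_model_probs)
    averaged = {
        label: sum(p[label] for p in per_model_probs) / n_models
        for label in labels
    }
    return max(averaged, key=averaged.get)


def hard_vote(per_model_probs: List[Probs]) -> str:
    """Let each model cast one vote for its top class; return the majority."""
    votes = Counter(max(p, key=p.get) for p in per_model_probs)
    return votes.most_common(1)[0][0]


# Made-up probabilities from three hypothetical classifiers for one comment.
example = [
    {"positive": 0.50, "negative": 0.40, "neutral": 0.10},
    {"positive": 0.20, "negative": 0.70, "neutral": 0.10},
    {"positive": 0.45, "negative": 0.35, "neutral": 0.20},
]

soft_label = soft_vote(example)
hard_label = hard_vote(example)
```

In this toy case two of the three models narrowly prefer "positive", so hard voting returns "positive", while the one confident "negative" prediction tips the averaged probabilities and soft voting returns "negative". This sensitivity to confidence is the usual reason for preferring soft voting when the base classifiers give usable probability estimates.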
For sentiment analysis of Tamil-English, Malayalam-English, and Kannada-English code-mixed social media messages, Kalaivani and Thenmozhi (2021) described their approach to the first task of the Dravidian-CodeMix-FIRE2021 competition. The problem was to classify sentiment at the message level, and for that they fine-tuned mBERT using the ktrain library. To improve the dataset, additional context-sensitive transliteration and translation procedures were designed. The work used the Dravidian-CodeMix-FIRE2021 data, which consists of code-mixed YouTube comments. The final assessment reported weighted average F1 scores of 0.603 for Tamil, 0.698 for Malayalam, and 0.595 for Kannada across all tasks on the code-mixed streams. The mBERT configuration yielded closely competitive scores: weighted F1 of 0.60 (0.59 Precision, 0.60 Recall) for Tamil, 0.72 (0.72 Precision, 0.72 Recall) for Malayalam, and 0.61 (0.61 Precision, 0.61 Recall) for Kannada.
Jayaraman et al. (2021) examined offensive language classification in Tamil and Malayalam code-mixed text using the HASOC-Dravidian CodeMix Shared Task datasets. The Tamil data contains 3999 comments, 1980 of which are offensive, while the Malayalam portion contains 2047 offensive comments. The authors applied two techniques, BiLSTM and Naive Bayes. With BiLSTM, evaluated on F1, recall, and precision, their models ranked 16th for Tamil and 11th for Malayalam. The F1 scores recorded for the JBTTM team were 0.503 and 0.580 for Tamil and Malayalam, respectively. The paper also cites training validation accuracies of 0.8296 for Tamil and 0.5558 for Malayalam for the BiLSTM, which exceeded Naive Bayes; these are for pre-competition training runs. The JBTTM team also reported these metrics on the official shared task platform.
Bharathi and Samyuktha (2021) observed that social networking has become an indispensable platform, leading to an increase in oversharing and cyberbullying. Their study compared and analysed methods for comment-level text polarity classification using the Dravidian-CodeMix-FIRE2021 dataset. The authors used TF-IDF, count vectorizer, and multilingual transformer embeddings as feature extraction methods. With the extracted features, they trained several classifiers, including a Multi-Layer Perceptron, a Support Vector Machine, and a Random Forest. The models were tested on Tamil-English, Kannada-English, and Malayalam-English code-mixed datasets, resulting in F1 scores of 0.588, 0.690, and 0.630, respectively.
Pavan Kumar et al. (2021) presented the AmritaCENNLP group's entry for the sentiment-analysis subtask of the Dravidian-CodeMix-FIRE2021 challenge, which evaluates performance on code-mixed Malayalam, Tamil, and Kannada texts. Their submission tested three deep-learning architectures: a CNN paired with an LSTM (Model-1), a Bi-LSTM-only architecture (Model-2), and a shallow DNN with a single hidden layer (Model-3). The training data came from domain-specific sets of social-media fragments, comprising imbalanced samples drawn from YouTube comments and Facebook postings across the Malayalam-English, Tamil-English, and Kannada-English pairings. To counteract the negative impact of this imbalance, the authors explored a weighted loss strategy. Models 1 and 3 performed best on the Malayalam dataset, while Model-2 was the best-performing model for the Tamil and Kannada datasets. Remarkably, the Bi-LSTM model achieved an accuracy of 94.82% on the Malayalam-English dataset, with precision and recall of 88.81% and 84.80%, respectively, and an AUC of 98.06%. On the Tamil-English dataset, the accuracy was 84.39%, with a precision of 70.37%, a recall of 37.78%, and an AUC of 83.89%. On the Kannada-English dataset, Model-3 gave an accuracy of 98.96%, a precision of 97.62%, a recall of 97.19%, and an AUC of 99.92%. Testing performance also varied, with Model-3 yielding precision 0.6303, recall 0.6346, and F1 0.5995 for Malayalam-English; Model-2, for Kannada-English, yielded precision 0.5062, recall 0.5455, and F1 0.5193.
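One common way to realise a weighted loss for imbalanced classes is sketched below. The cited paper does not state which weighting scheme was used, so the inverse-frequency heuristic `n / (k * count)`, the function names, and the toy label distribution are all assumptions for illustration only.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def inverse_frequency_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Weight each class by n / (k * count): rare classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


def weighted_cross_entropy(
    probs: List[Dict[str, float]],
    labels: Sequence[str],
    weights: Dict[str, float],
) -> float:
    """Cross-entropy where each example's loss is scaled by its class weight."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Toy, made-up label distribution: 8 majority examples and 2 minority examples.
toy_labels = ["positive"] * 8 + ["negative"] * 2
toy_weights = inverse_frequency_weights(toy_labels)
```

With these toy labels the minority class receives a weight of 2.5 and the majority class 0.625, so a mistake on a minority example contributes four times as much to the loss as a mistake on a majority example.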
Chen and Kong (2021) also investigated offensive language detection in Dravidian languages, using code-mixed YouTube comments in Tamil-English, Malayalam-English, and Kannada-English. Their method combines multilingual BERT and TextCNN, with BERT performing deep semantic feature extraction while TextCNN conducts context-specific shallow classification. The top three BERT layers are concatenated, weighted, and trimmed before being fed into TextCNN, whose output is passed to a classifier. The architecture achieved micro-averaged F1 scores of 0.93, 0.74, and 0.64 for Malayalam, Tamil, and Kannada, respectively. The considerable difference in the model's applicability across languages, and the need to improve the Kannada score, is therefore clear.
Huang and Bai (2021) described the HUB team's participation in the Dravidian-LangTech – EACL2021 shared task, which involved the automatic identification and categorization of offensive content in code-mixed social media comments in Dravidian languages. The dataset consisted of YouTube comments and posts in Tamil, Malayalam, and Kannada and covered several classification tasks: the Malayalam dataset used a five-way classification, whereas the Kannada and Tamil datasets used a six-way classification. The core technique was a multilingual BERT model combined with the TF-IDF algorithm to mitigate the effects of code-mixing. The system processed text through the multilingual BERT model, applied TF-IDF for text encoding, fed these into a CNN block, and used two linear classifiers, with the final output being the arithmetic average of their results. On the validation set, the model achieved an F1 score, Precision, and Recall of 0.91, 0.91, and 0.92 for Malayalam; 0.78, 0.78, and 0.79 for Tamil; and 0.70, 0.71, and 0.73 for Kannada, respectively. For the test set, the results were 0.91 F1, 0.89 Precision, and 0.93 Recall for Malayalam; 0.74 F1, 0.73 Precision, and 0.78 Recall for Tamil; and 0.64 F1, 0.65 Precision, and 0.69 Recall for Kannada.
Bellamkonda et al. (2022) attempted automatic humour detection in Telugu text. To that end, the authors collected and annotated Telugu tweets, classifying them into two categories: humorous and non-humorous. They experimented with several transformer models, including Multilingual BERT, Multilingual DistilBERT, and XLM-RoBERTa, and found that XLM-RoBERTa outperformed the other models, achieving an F1-score of 0.82 with 81.5% accuracy.
Chakravarthi et al. (2023) reported on a shared task on sarcasm identification, whose primary objective was to detect instances of sarcasm in a dataset of code-mixed Tamil-English and Malayalam-English comments sourced from social media platforms. Eleven teams participated, each with a distinct approach. Team ABC used LinearSVC, Random Forest, and KNN in a stacking classifier and obtained a weighted average F1 score of 0.73 for Tamil and 0.72 for Malayalam. On macro-F1, Team SSNCSE obtained 0.74 and 0.73 for Malayalam and Tamil, respectively, using a count vectorizer and an MLP classifier with Logistic Regression. The evaluations point to the difficulty of sarcasm detection in Tamil in comparison to Malayalam, which motivates better multilingual systems for robust sentiment analysis of code-mixed text. The results show that the task is non-trivial; nevertheless, the competing systems were functional, which illustrates the task's real-world potential.
Sreelakshmi et al. (2024) advance CodeMix hate-speech and offensive-language (HOS) detection within Dravidian languages. Experiments incorporated six openly available datasets covering Kannada-English, Malayalam-English, and Tamil-English. The study seeks a single, tested pre-trained embedding that efficiently serves HOS detection across these tongues. Consequently, an exhaustive comparison of multilingual transformer embeddings—BERT, DistilBERT, LaBSE, MuRIL, XLM, IndicBERT, and FNET—was performed on HOS downstream tasks. MuRIL's embedding, integrated within a Radial Basis Function (RBF) kernel Support Vector Machine, demonstrated persistently superior results across all datasets. Supplementary results achieved a peak accuracy of 96% in Malayalam (DravidianLangTech 2021), 72% in Tamil, and 66% in Kannada. Cross-validation of the results from HASOC showed 68% in Malayalam (HASOC 2021) and 76% in Tamil. Also, HASOC 2020 results achieved 92% in Malayalam, which served as a benchmark for replication.
Namburu et al. (2024) tested whether a Graph Neural Network (GNN) is helpful for topic classification of Telugu news articles compared with other available options. The authors created a dataset of 17,312 instances from a leading Telugu daily newspaper, in which article headlines were recorded as feature vectors with corresponding classification labels. Preprocessing reduced the dataset to 15,515 records with a significant topic imbalance, the most prevalent topic being national news. In the experiments, Support Vector Machines, Naïve Bayes, K-Nearest Neighbours, Random Forest, and a Bidirectional Long Short-Term Memory (LSTM) network were trained as representative classical and modern machine learning techniques. A variant of the GNN was also trained, and the results showed that the Bidirectional LSTM outperformed the GNN by 20 percent overall. The classical algorithms also outperformed the GNN, albeit narrowly. This indicates that the GNN architecture needs improvement to be competitive for Telugu, and the study encourages further work on GNN architectures and feature engineering.
Recent work on classification in Dravidian languages reveals a clear trend: deep learning, particularly transformer-based models, dominates the analysis of social media text (especially informal, code-mixed, and low-resource text) in sentiment analysis, hate speech detection, and humour detection. Fine-tuned multilingual transformers such as mBERT, XLM-RoBERTa, LaBSE, and MuRIL have been shown in various competitions (e.g., FIRE, HASOC, EACL) to outperform the previously dominant handcrafted-feature models. Additional techniques, including selective translation, adversarial example generation, and hybrid architectures (with multiple decoders), help push the state of the art, making it possible to build high-performance classifiers even in low-data scenarios. Interestingly, and contrary to the trend, simple classifiers using BiLSTM layers have also shown good performance. Overall, while imbalanced data and code-mixing remain active obstacles, these works indicate that competitive models can be obtained for Dravidian text classification through tailored pre-processing and advanced multilingual models.
The studies summarised in Table 1 span sentiment analysis, offensive language detection, humour detection, and other code-mixed Dravidian tasks. They show the popularity of transformer-based methods; however, none of them addresses melancholic emotion classification in classical Tamil literature, which is the primary focus of this research.
Table 1. Summary of classification works on Dravidian languages.
Kuralagam, the first Thirukkural search framework, was created by Elanchezhiyan et al. (2011). Kuralagam differs from other systems in the way it indexes Thirukkural documents: other systems index documents by keywords, whereas Kuralagam indexes Thirukkural by concepts and relations. For training and testing, data from the 1,330 Thirukkural couplets, with four different scholarly glosses, were collected. The system is built using CoReX and a new ranking module. CoReX parses documents semantically, so that each token is related to the rest of the document rather than treated as a string. The ranking module determines how salient each concept is, as well as how good each Thirukkural is. The system achieved a Mean Average Precision (MAP) score of 0.83, whereas systems based on traditional string-word matching reached a MAP of only 0.52. These systems fall short of Kuralagam because of how they index documents, which is also reflected in the precision at ranks 5, 10, and 20 (P@5, P@10, P@20).
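The retrieval metrics mentioned above, precision at rank k (P@k) and Mean Average Precision (MAP), can be sketched as follows. The relevance lists are made-up examples, and average precision is normalised here by the number of relevant items that appear in the ranked list, which assumes every relevant item is retrieved somewhere in the ranking.

```python
from typing import List, Sequence


def precision_at_k(relevance: Sequence[int], k: int) -> float:
    """Fraction of the top-k results that are relevant (1) rather than not (0)."""
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[int]) -> float:
    """Mean of the precision values taken at each rank holding a relevant hit."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(runs: List[Sequence[int]]) -> float:
    """Average of the per-query average precision values."""
    return sum(average_precision(r) for r in runs) / len(runs)


# Made-up ranked relevance judgements for two hypothetical queries.
runs = [
    [1, 0, 1, 0, 0],  # relevant results at ranks 1 and 3
    [0, 1, 0, 0, 1],  # relevant results at ranks 2 and 5
]
map_score = mean_average_precision(runs)
```

For the first query the average precision is (1/1 + 2/3) / 2, and for the second it is (1/2 + 2/5) / 2; MAP is the mean of the two. Unlike P@k, MAP rewards systems that place relevant couplets near the top of the ranking.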
Madhavan et al. (2012) developed a grammar-driven approach for the classification of Tamil metrical structures by focusing on identifying the relevant ‘paa’ for each poem. Tamil metrical scholarship was used to build the Context Free Grammar (CFG) rule set. After the poem is parsed using CFG, the poem is reformulated into an intermediate representation and a ‘venpa’ class is assigned. The current prototype, limited to ‘venpa’ forms, attained classification precision above 90%, and exclusive testing on Thirukkural pairs returned error-free predictions. By condensing the parsing sequence from three to two passes, the authors shrink latency further, with the architecture permitting additional consolidation.
Subalalitha (2019) designed a template-based architecture for information extraction for Kuruntokai, a significant Tamil poetry collection. Mostly, the framework focuses on the retrieval of historical data, and the retrieval of particular instances of plants, animals, foods, serving items and water markers, and the simultaneous harvesting of different N-grams—Noun Unigrams, Verb Unigrams, Adjective-Noun Bigrams, and Adverb-Verb Bigrams—extracted by a Tamil Morphological Analyser. A notable aspect is the framework's proficiency with parallel Tamil-English corpora. Experiments on a balanced corpus of about 200 Kuruntokai poems in both languages resulted in a significant mean precision of 88.8%. It is important to note the N-grams were more accurate than the templates created.
Anita and Subalalitha (2021) applied a new approach to information retrieval by using semantic analysis of Tamil literary texts to develop search system applications. The analysis and representation of semantics through discourse processing of the text structure is more sophisticated than simple contextual pattern recognition. The evaluation uses the classical Tamil works Thirukkural and Naladiyar. Thirukkural consists of 1330 couplets of 7 words each, and Naladiyar consists of 400 quatrains of 15 words each. Across both sources, the MAP score obtained is 89%, compared with 56% for Google Tamil search and 62% for a keyword-based search system.
Anita and Subalalitha (2022a) designed a systematic classifier with the primary goal of enhancing the searchability and functional utility of the Thirukkural corpus for Tamil literature scholars. They regrouped all 1,330 couplets of the text into 10 new, higher-order organizational units, which they labeled ‘superclasses’. Each superclass was partitioned into two practical action labels (To Do and Not To Do) that reflect the instructional, actionable nature of the text. Couplets were assigned to superclasses using the Multinomial Naïve Bayes (MNB) algorithm and to subclasses using specific morphological characteristics of the Tamil language. The complete Thirukkural corpus served as the experimental base, with 80% (1,064 couplets) set aside for the learning phase and the remaining 20% (266 couplets) reserved for validation. The overall classification yields a balanced F-score of 82.33%. Subsequently, a rudimentary Information Retrieval (IR) engine, derived from the classification, was constructed, and it delivered a mean average precision of 89%. This result is higher than Google search, which achieved 59%, and a keyword-based search, which scored 68%.
Anita and Subalalitha (2022b) proposed a method for interpreting the contextual meaning of Tamil proverbs, which are often misinterpreted and underused by younger generations. The researchers developed a balanced dataset of 400 concrete meaning definitions and 400 contextual prose variants in which the proverbs appear, teaching the model to infer contextual definitions. The solution builds on the multilingual base version of the Bidirectional Encoder Representations from Transformers (BERT) architecture, which generates embeddings at both the token level and the sentence level. Using cosine distance, the method identifies the sentence most similar to the surrounding prose context and returns the corresponding definition. Evaluated with standard information retrieval metrics, the method obtained 92% precision, 87% recall and an F1 score of 89%, confirming that the transformer-based contextual method is reliable for Tamil proverbs.
Anita and Subalalitha (2022b) have established an approach to automate semantic interpretation for classical Tamil literature. They highlight the ethical aspects, subtle divinity and philosophy of Tamil classical writings, and the value these ancient writings may offer present-day society. The structure and form of Tamil literary documents differ greatly from standardized written forms, or even a conventional essay, so discourse and semantic features are realized unevenly and unsystematically. Existing NLP techniques and annotated data resources appear ill-equipped to address these characteristics and need to be adapted to perform these Tamil literature tasks more accurately. The authors tackle this complex problem through a set of processing steps including, but not limited to, discourse parsing, coarse topic classification, discourse-based clustering, and selective data retrieval, each tailored to Tamil lexicon and syntax.
In terms of computational methods for Tamil literature, progress has been made in developing tools for ancient works such as the Thirukkural and Kuruntokai. Texts have been grouped and classified, poems have been classified, sentiments have been analysed, proverbs have been interpreted, lexicons have been validated, information extraction and retrieval have been addressed, and discourse parsing has been carried out. Researchers have applied rule-based systems, context-free grammars (CFGs), Rhetorical Structure Theory (RST), standard machine learning methods (Naive Bayes, K-means, etc.), and deep learning systems (BERT, BiLSTM) to handle the intricacies and morphology of the Tamil language. Of particular interest, BERT-based systems have performed well at classification and emotional discrimination and have been able to capture complex, nuanced Tamil literary and cultural traditions. This motivates applying models such as BERT to literary studies for detecting the complex, nuanced literary melancholy present in Kuruntokai.
As summarized in Table 2, studies in Tamil classical literature relate to information retrieval, discourse parsing, lexical analysis, semantic interpretation, and building a bilingual lexicon. Nevertheless, no study, to our knowledge, deals with the classification of melancholic emotions in Sangam poetry such as the Kuruntokai. This is a gap in the literature that this research attempts to address. Unlike these studies, this research classifies melancholic emotions in the Kuruntokai and focuses on a comparative analysis, both statistical and qualitative, of poem-based and urai-based representations.
Summary of Computational Works on Tamil Literature.
In this research, Kuruntokai poems are classified into two categories, namely Lamentation and Consolation, since almost all Kuruntokai poems express one of these two feelings.
Dataset Details
Kuruntokai consists of short Sangam-era poems that predominantly express emotional states related to love, separation, and reunion. It has 401 poems, each paired with its corresponding traditional Tamil urai (explanatory prose). Each poem consists of 4 to 8 lines, except poems 307 and 391, which have nine lines. The dataset was prepared in two parallel forms: (i) poem-only text, and (ii) urai-based explanatory text, enabling a direct comparison of emotion classification performance across representations.
In the dataset, poems were assigned two classes of emotions: Lamentation and Consolation. 204 poems were assigned the Lamentation label, whereas 197 poems were assigned the Consolation label, yielding a dataset that was moderately balanced.
The dataset was divided into 70% for training and 15% each for validation and testing. The split was stratified by class in order to keep the class distribution consistent across the splits.
Emotion annotation followed the rules and guidelines of the classical Tamil literary tradition. Each Kuruntokai poem was labelled as either Lamentation or Consolation, depending on the emotion on which the speaker primarily focuses, in alignment with classical Sangam poetics.
The annotation process was especially concerned with the following requirements of the task: Lamentation was assigned to poems expressing grief, longing, separation, emotional distress, or abandonment. Consolation was assigned to poems conveying reassurance, emotional relief, hope, reconciliation, or emotional stability.
To minimize subjectivity arising from metaphorical language, authoritative Tamil urai interpretations were used as primary references during labelling. In cases where the poem's emotional state was ambiguous, the urai served as a decisive interpretive aid by explicitly articulating the underlying emotional context and causal factors. Each label was finalized only after ensuring consistency between the poetic expression and its explanatory prose. This annotation strategy ensures linguistically informed and culturally grounded emotion labels, making the dataset suitable for supervised emotion classification in classical Tamil literature.
The dataset is created with all Kuruntokai songs and manually classified labels. For instance, the Kuruntokai song 37 shown in Figure 1 is labelled as Consolation, and the Kuruntokai song 122 shown in Figure 2 is labelled as Lamentation. Similarly, all the Kuruntokai songs are labelled as Lamentation or Consolation.

Example 1.

Example 2.
Since the dataset was built in-house, its labels were validated using inter-rater reliability (Gwet, 2008), which measures the degree of agreement among the labels assigned by human experts. For our dataset, three human experts annotated the emotions in the Kuruntokai songs.
Table 3 describes the percentage agreement for the categories of the Kuruntokai songs. The classes Lamentation and Consolation are represented by 0 and 1, respectively. All three annotators classified song 1 as 0 (Lamentation), so the agreement is 100%. For song 3, two annotators chose 1 (Consolation) and one annotator chose 0 (Lamentation), so the agreement is 66.67%. The agreement among all three annotators is calculated in the same way for every Kuruntokai song, and an overall inter-rater reliability of 91.77% is obtained. The following subsection describes the fine-tuning of the mBERT model.
Percentage Agreement Across Multiple Annotators.
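The per-song agreement figures quoted above (100% when all three annotators agree, 66.67% when two of three agree) are consistent with measuring the share of annotators who chose the majority label. The following is a minimal sketch under that assumption; the function names and the three example ratings are illustrative placeholders, not the actual annotation data.

```python
from collections import Counter

def item_agreement(labels):
    """Share of annotators who chose the most frequent label for one song."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

def percentage_agreement(ratings):
    """Mean per-song agreement over all songs, expressed as a percentage."""
    scores = [item_agreement(labels) for labels in ratings]
    return 100.0 * sum(scores) / len(scores)

# Placeholder ratings from three annotators (0 = Lamentation, 1 = Consolation).
example_ratings = [
    (0, 0, 0),  # all three agree
    (1, 1, 0),  # two of three agree
    (1, 1, 1),  # all three agree
]
overall = percentage_agreement(example_ratings)
print(round(overall, 2))
```

Raw percentage agreement does not correct for chance agreement; chance-corrected coefficients would need a different computation than the one sketched here.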
The proposed work aims to classify melancholic emotions in Kuruntokai poems using a transfer learning-based fine-tuning of Multilingual BERT (mBERT). mBERT was selected in this research due to its multilingual pre-training across more than 100 languages, including Tamil, which enables effective knowledge transfer in low-resource and classical language settings (Devlin et al., 2019, Pires et al., 2019). Its WordPiece subword tokenization is well suited for handling the morphological richness and inflectional complexity of Tamil, allowing the model to capture sub-lexical and contextual patterns that are essential for interpreting poetic language. Moreover, mBERT provides contextualized embeddings that help identify implicit emotional cues expressed through metaphor, repetition, and symbolic constructs common in Kuruntokai (Chakravarthi et al., 2022). As a widely adopted and well-established multilingual transformer, it serves as a strong and fair baseline for comparison with classical machine learning and recurrent neural models under consistent experimental conditions. Finally, mBERT offers a favourable balance between performance and computational cost, making it practical for reproducible experiments on limited annotated datasets and modest GPU resources. The methodology involves the significant steps: dataset preparation, preprocessing, model selection, training strategy, and emotion classification, which is shown in Figure 3.

Block-level architecture of the proposed emotion classification framework.
This research uses a dataset of 401 poems from the Kuruntokai, each annotated with the emotional class Lamentation or Consolation. Since the annotation was done by experts in Tamil literature, linguistic and cultural accuracy was preserved.
To prepare the dataset for model consumption, each row in the Song column received an emotion tag in the Label column. The emotion categories were converted to integer identifiers for the convenience of deep-learning libraries: Lamentation = 0, Consolation = 1. A stratified split of poems into training, validation and testing sets was then conducted, reserving 70% for model training, 15% for validation and 15% for testing, with both emotional categories maintained in the same proportions.
This attentive preparation established a durable base for subsequent training, while the stratification further minimised the introduction of systematic bias during fine-tuning.
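The stratified 70/15/15 split described above can be sketched without any external library as follows. This is only an illustration: the function name `stratified_split`, the random seed, and the rounding rule are assumptions, and the study's own split may have been produced with a different tool.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists that keep per-class proportions."""
    rng = random.Random(seed)  # assumed seed, for repeatability only
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Class counts from the dataset description: 204 Lamentation (0), 197 Consolation (1).
labels = [0] * 204 + [1] * 197
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting each class separately is what keeps the Lamentation to Consolation ratio nearly identical in all three subsets.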
Model Selection and Justification
The task of emotion classification in Tamil poetry requires a model capable of handling both contextual semantics and linguistic diversity. For this purpose, this research employs bert-base-multilingual-cased, a variant of Multilingual BERT pre-trained on 104 languages, including Tamil.
The choice of mBERT is justified by several factors: Multilingual capability – mBERT is trained on large multilingual corpora, making it robust in handling Tamil text and its morphological variations. Contextual embeddings – unlike traditional embeddings (e.g., Word2Vec or GloVe), BERT generates context-dependent embeddings, which are essential for capturing poetic constructs and metaphorical language in Kuruntokai. Transfer learning advantage – fine-tuning mBERT requires fewer resources than training a deep neural network from scratch, while still leveraging its pre-trained knowledge.
For the classification task, this research adopts the TFBertForSequenceClassification architecture, which integrates a linear classification head on top of the BERT encoder. This head produces logits for the two emotion classes, enabling efficient fine-tuning for the specific downstream task of melancholic emotion detection. As the optimization dynamics and convergence properties of transformer-based models such as mBERT are well studied in prior literature, this work focuses on empirical performance, linguistic suitability, and statistical robustness rather than theoretical convergence analysis.
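To make the role of the two-class logits concrete, the sketch below shows how a pair of logits could be turned into class probabilities and a predicted label using the encoding Lamentation = 0, Consolation = 1. The logit values and helper names are illustrative assumptions; in practice the logits come from the fine-tuned model's classification head.

```python
import math

LABELS = {0: "Lamentation", 1: "Consolation"}

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]

def predict(logits):
    """Return the predicted class name and its probability."""
    probabilities = softmax(logits)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return LABELS[best], probabilities[best]

example_logits = [1.2, -0.4]  # illustrative values, not model output
label, confidence = predict(example_logits)
print(label, round(confidence, 3))
```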
Preprocessing
Morphological Analysis
The Tamil Computing Lab, Anna University, Chennai's Morphological Analyser tool (Anandan et al., 2002) is used to separate the root words from the Tamil words. For instance, the word
has the root word
, which is given by the Morphological Analyser tool, which is shown below.
This is the output of the Morphological Analyser tool. The tool handles ordinary Tamil text well but not Tamil literary text, which is organised into cīrs instead of words. A cīr might consist of a single word or a group of words. It can also contain part of a word, with the remainder appended to the next cīr to sustain style and prosody (Adigalasiriyar, 1985). Such units are not handled by the Tamil Morphological Analyser (Anandan et al., 2002). To the best of our knowledge, no Tamil morphological analyser can handle this type of literary text, because Tamil literary texts are distinctive and lack linguistic resources for computational processing. The proposed approach addresses this problem with the word reformation algorithm proposed by Anita and Subalalitha (2021), which identifies and separates words from cīrs for further processing.
Word Reformation Algorithm
The Tamil Morphological Analyser fails to identify morphemes in some poetic words because it operates on word tokens separated by white space. The word reformation algorithm (Anita & Subalalitha, 2021) is therefore used. This algorithm takes the word on which the morphological analyser reports an error and merges it with the previous word.
For instance, for the phrase
, the morphological analyser throws the error. The word reformation algorithm combines the two words and forms the word
. The analyser now splits the merged word into three words
(Iva. pulampu akala – She is moaning loudly). It is used to classify the poem as the Lamentation class.
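The merge-and-retry idea behind word reformation can be sketched as below. This is a toy illustration, not the published algorithm: `toy_analyse` is a hypothetical lookup table standing in for the morphological analyser, and the tokens are English placeholders rather than Tamil cīrs.

```python
def reform_words(tokens, analyse):
    """Merge each token the analyser rejects into the preceding token."""
    reformed = []
    for token in tokens:
        if analyse(token) is None and reformed:
            # The fragment is treated as the tail of the previous word.
            reformed[-1] = reformed[-1] + token
        else:
            reformed.append(token)
    return reformed

# Hypothetical analyser: returns a morpheme list, or None on failure.
TOY_LEXICON = {
    "she": ["she"],
    "walk": ["walk"],
    "walking": ["walk", "ing"],
}

def toy_analyse(word):
    return TOY_LEXICON.get(word)

tokens = ["she", "walk", "ing"]  # "ing" alone cannot be analysed
merged = reform_words(tokens, toy_analyse)
morphemes = [m for word in merged for m in (toy_analyse(word) or [word])]
print(merged, morphemes)
```

After merging, the analyser is run again on the reformed words, which mirrors the step in which the combined form is split into its constituent words.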
Input Tokenisation
Preprocessing of poems is also performed through the BertTokenizer associated with the bert-base-multilingual-cased model. Tokenisation transforms each poem into a numerical format suitable for model input. The process includes: Subword tokenisation using WordPiece: Each poem is decomposed into smaller units, allowing the model to handle rare and compound Tamil words. Padding and truncation: To maintain consistent input shapes, sequences are truncated or padded to a maximum length of 128 tokens. This length was chosen to balance computational efficiency and adequate poem coverage. Attention mask creation: A binary mask is generated to differentiate between actual tokens and padded tokens, ensuring that padding does not influence contextual learning.
Finally, the tokenised data is converted into TensorFlow datasets with a batch size of 16, enabling efficient training and validation.
After WordPiece tokenization using mBERT, Kuruntokai poems contain on average 70–80 tokens, while their corresponding urai explanations contain approximately 100–120 tokens, motivating the choice of a maximum sequence length of 128 tokens to minimise truncation.
These preprocessing steps are specifically designed for classical Tamil poetry, as they preserve poetic structure and semantic nuance while effectively handling cīr-based constructions, rich morphology, and metaphorical language that are not adequately supported by standard normalization or stemming techniques.
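The padding, truncation, attention-mask, and batching steps can be illustrated on plain lists of token ids. This is a simplified sketch: the padding id of 0 and the helper names are assumptions, and a real tokenizer additionally preserves the special start and end tokens when truncating, which is omitted here.

```python
MAX_LEN = 128  # maximum sequence length stated in the text
PAD_ID = 0     # assumed padding id; read it from the real tokenizer in practice

def pad_or_truncate(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Return (input_ids, attention_mask), both exactly max_len long."""
    kept = token_ids[:max_len]
    padding = max_len - len(kept)
    input_ids = kept + [pad_id] * padding
    attention_mask = [1] * len(kept) + [0] * padding  # 1 = real token, 0 = padding
    return input_ids, attention_mask

def batches(items, batch_size=16):
    """Yield consecutive batches of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

short_poem = list(range(1, 76))   # 75 placeholder token ids
long_urai = list(range(1, 151))   # 150 placeholder token ids
short_ids, short_mask = pad_or_truncate(short_poem)
long_ids, long_mask = pad_or_truncate(long_urai)
print(len(short_ids), sum(short_mask), len(long_ids), sum(long_mask))
```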
Model Training Strategy
The fine-tuning of mBERT is carried out using a carefully designed training strategy to optimise performance while preventing overfitting. Key aspects include: Optimiser: The AdamWeightDecay optimiser with a learning rate of 3e-5 is employed. This choice stabilises learning and helps maintain pre-trained knowledge while adapting to the new classification task. Loss function: Sparse Categorical Crossentropy (from_logits=True) is used to compute classification loss, aligning with TensorFlow's efficient handling of integer-encoded labels. Metrics: Accuracy is monitored throughout training to track classification performance. Class imbalance handling: Since the dataset contains nearly balanced but not perfectly equal samples for both classes, class weights are computed and applied during training. This ensures that both Lamentation and Consolation are given proportional importance in model updates. Regularisation: Early stopping is applied with a patience of 2 epochs, monitoring validation accuracy. This prevents overfitting by halting training when performance plateaus. Epochs: Training is conducted for a maximum of 5 epochs, after which the best model is retained using ModelCheckpoint. The final model is saved in the modern Keras format (.keras) for portability and inference.
This strategy ensures stable and efficient convergence, enabling the model to generalise well on unseen poems.
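Two parts of this strategy, class weighting and early stopping, can be sketched in plain Python. The weighting formula below is the common "balanced" heuristic and is an assumption, since the exact formula is not stated; the weights are computed from the full-dataset counts for illustration, whereas in training they would be derived from the training split. The validation accuracies are placeholders, not reported results.

```python
def balanced_class_weights(label_counts):
    """Balanced heuristic: weight_c = n_samples / (n_classes * n_c)."""
    total = sum(label_counts.values())
    n_classes = len(label_counts)
    return {label: total / (n_classes * count)
            for label, count in label_counts.items()}

def best_epoch_with_early_stopping(val_accuracies, patience=2):
    """Return (best_epoch_index, epochs_run) when monitoring validation accuracy."""
    best_epoch, best_value, waited, epochs_run = 0, float("-inf"), 0, 0
    for epoch, value in enumerate(val_accuracies):
        epochs_run = epoch + 1
        if value > best_value:
            best_epoch, best_value, waited = epoch, value, 0
        else:
            waited += 1
            if waited >= patience:  # no improvement for `patience` epochs
                break
    return best_epoch, epochs_run

weights = balanced_class_weights({0: 204, 1: 197})  # 0 = Lamentation, 1 = Consolation
history = [0.61, 0.68, 0.67, 0.66, 0.70]            # illustrative values only
best, ran = best_epoch_with_early_stopping(history)
print(weights, best, ran)
```

With near-equal class counts the two weights stay close to 1, so weighting only nudges the loss slightly towards the smaller Consolation class.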
Result and Discussion
Kuruntokai poems are classified using a fine-tuned mBERT model. The proposed work is evaluated using precision, recall, F1-score and accuracy. The result is compared with various machine learning and deep learning algorithms. Table 4 shows the comparison of fine-tuned mBERT with multiple algorithms.
Evaluation on various Models Based on Poems.
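For reference, the four evaluation metrics can be computed from true and predicted labels as in the sketch below. Macro averaging over the two classes is assumed here, and the label lists are small placeholders rather than the study's predictions.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall and F1 when `positive` is treated as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def evaluate(y_true, y_pred, classes=(0, 1)):
    """Macro-averaged precision, recall, F1, plus overall accuracy."""
    scores = [per_class_metrics(y_true, y_pred, c) for c in classes]
    macro = [sum(s[i] for s in scores) / len(scores) for i in range(3)]
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"precision": macro[0], "recall": macro[1],
            "f1": macro[2], "accuracy": accuracy}

y_true = [0, 0, 0, 0, 1, 1, 1, 1]  # placeholder gold labels
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]  # placeholder predictions
report = evaluate(y_true, y_pred)
print(report)
```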
Figures 4–7 compare the various models on different evaluation metrics. It can be observed that the deep learning models, namely the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM), perform well compared with machine learning models such as Naïve Bayes and Logistic Regression. The fine-tuned mBERT model outperforms the other machine learning and deep learning models.

Comparison of models on precision.

Comparison of models on recall.

Comparison of models on F1-score.

Comparison of models on accuracy.
Kuruntokai poems are classified based on
(Urai – Explanation) using a fine-tuned mBERT model. The proposed work is evaluated using precision, recall, F1-score and accuracy. The result is compared with various machine learning and deep learning algorithms. Table 5 shows the comparison of fine-tuned mBERT with multiple algorithms.
Evaluation on various Models for Classification Based on Urai.
Figures 8–11 compare the various models based on
(Urai – Explanation) on different evaluation metrics. It can be observed that the deep learning models, namely the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM), perform well compared with machine learning models such as Naïve Bayes and Logistic Regression. The fine-tuned mBERT model outperforms the other machine learning and deep learning models.

Comparison of models on precision for classification based on Urai.

Comparison of models on recall for classification based on Urai.

Comparison of models on F1-score for classification based on Urai.

Comparison of models on accuracy for classification based on Urai.
Table 6 presents the performance comparison of the fine-tuned mBERT model on poem-based and urai-based emotion classification. The results show that urai-based classification outperforms poem-based classification on all the evaluation metrics. The macro-averaged F1-score improves by around 7 percentage points. This shows that the urai is a more descriptive and dependable source for emotion classification than the poem itself.
Performance Comparison of Fine-Tuned mBERT on Poem-Based and Urai-Based Emotion Classification.
To gain more insight into the differences in performance, we compare poems and their urai from a qualitative standpoint. The Kuruntokai poems encode emotions implicitly through metaphor and culturally grounded conventions, without explicit markers of emotion. In contrast, urai texts articulate the poet's intent and use explicit emotion descriptors, causal connections, and illustrative narrative. This provides richer context and makes affective signals easier to capture, reducing ambiguity and the chance of misclassification. For instance, a couple of poems that were misclassified as Consolation were correctly classified as Lamentation in their urai form, because the explanatory terms name emotional states such as longing, separation, and distress. These observations explain the substantial role of explanatory texts in modelling emotion in classical Tamil literature and why urai-based classification performs better.
To further illustrate this effect, Kuruntokai Song 4, shown in Figure 12, is examined as a representative qualitative example. In the original poem, the emotional state is conveyed predominantly through metaphorical and evocative expressions, such as burning tears and the repeated lament
. While these poetic devices strongly suggest emotional distress, the poem does not explicitly name either the emotion or its cause. Consequently, the emotional signal remains implicit and requires cultural and literary interpretation to infer that the speaker is experiencing lamentation due to separation from her lover.

Example 3
In contrast, the corresponding urai renders the emotion explicit by introducing clear affective vocabulary and causal explanation, including phrases such as
(Enecam varukiatu – my heart grieves) and
. These additions explicitly identify both the emotional state (lamentation) and its cause (separation), thereby removing interpretive ambiguity.
As a result, while the poem-based representation poses challenges for automatic classification due to its symbolic and condensed nature, the urai-based representation provides linguistically explicit cues that are more readily captured by transformer-based models. This example explains why the fine-tuned mBERT model consistently performs better on urai texts: the explanatory prose supplies semantically rich and emotionally grounded vocabulary that enables more reliable emotion recognition. This qualitative analysis complements the quantitative results and highlights the crucial role of urai in modelling emotions in classical Tamil poetry.
It can be observed that classification based on urai performs better than classification based on poems. This is because the urai uses explanatory Tamil words, whereas the poem uses Tamil literary words, which leads the models to perform better on urai than on poems.
To strengthen the reliability of the experimental findings and address the need for real-world validation, additional statistical analyses and implementation details are incorporated. For statistical validation, confidence intervals, significance testing, and stability analysis are reported for mBERT and representative baseline models. This focused analysis highlights reliability and robustness of the proposed approach, while detailed mean performance for all evaluated models is already provided in earlier result tables.
Implementation Details
Table 7 provides the major configuration settings used for fine-tuning and evaluating the models.
Experimental Setup and Implementation Details.
These details demonstrate that the experiments were performed under controlled and reproducible conditions rather than purely simulation-based settings. As shown in Table 7, the fine-tuning configuration for mBERT follows standard best practices for low-resource literary text classification. Although WordPiece subword tokenisation may fragment morphologically rich Tamil words, this fragmentation enables the model to capture sub-lexical patterns and contextual cues, which is particularly important for Kuruntokai poems where emotions are often expressed implicitly through metaphor, inflection, and poetic constructs rather than explicit affective terms.
All transformer-based experiments were conducted on an NVIDIA T4 GPU using Google Colab Pro. Fine-tuning the mBERT model required approximately 6–8 min per epoch, with total training time remaining under one hour for five epochs, reflecting the computational feasibility of the proposed approach for low-resource literary datasets.
From a computational perspective, the fine-tuned mBERT model follows the standard transformer architecture, where self-attention incurs a time and memory complexity of $O(n^2)$ in the input sequence length $n$ (here at most 128 tokens).
Key hyperparameters such as learning rate, batch size, number of epochs, and maximum sequence length were selected following standard recommendations for fine-tuning multilingual transformer models on low-resource datasets. Preliminary trials indicated that moderate variations around these values did not lead to substantial performance gains, while aggressive tuning increased the risk of overfitting; therefore, stability was prioritized over exhaustive hyperparameter optimization.
Performance metrics were computed across five repeated runs and used to derive 95% Confidence Intervals (CIs), which are shown in Table 8. These intervals provide statistical support for the observed performance differences.
Performance with 95% Confidence Intervals.
The narrow confidence intervals for mBERT show that its performance is stable and not a result of random variations.
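As an illustration of how a 95% confidence interval can be derived from five repeated runs, the sketch below uses the Student-t interval, mean ± t · s / √n, with the two-sided critical value for 4 degrees of freedom. Whether the reported intervals use this exact construction is an assumption, and the per-run scores are placeholders, not the study's results.

```python
import math
import statistics

T_CRITICAL_DF4 = 2.776  # two-sided 95% Student-t critical value, 4 degrees of freedom

def confidence_interval_95(scores, t_critical=T_CRITICAL_DF4):
    """Return (mean, sample_std, (low, high)) for a small set of repeated runs."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    margin = t_critical * std / math.sqrt(len(scores))
    return mean, std, (mean - margin, mean + margin)

run_scores = [0.70, 0.72, 0.71, 0.69, 0.73]  # placeholder F1-scores from five runs
mean_score, std_score, (ci_low, ci_high) = confidence_interval_95(run_scores)
print(round(mean_score, 4), round(std_score, 4), round(ci_low, 4), round(ci_high, 4))
```

The same sample standard deviation is the quantity used to judge run-to-run stability.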
To verify whether improvements of mBERT over classical models are statistically meaningful rather than coincidental, paired t-tests were conducted on the F1-scores over repeated runs and are given in Table 9.
Paired t-Test p-Values.
These results confirm that the observed performance gains of mBERT are statistically significant.
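A paired t-test over repeated runs compares two models through their per-run score differences. The sketch below computes the t statistic and checks it against the two-sided 5% critical value for 4 degrees of freedom instead of computing an exact p-value, which would need a t-distribution routine from a statistics library. The scores are placeholders, not the study's F1-scores.

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """t = mean(d) / (std(d) / sqrt(n)) for paired differences d = a - b."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # undefined if all differences are equal
    return mean_diff / (std_diff / math.sqrt(len(diffs)))

T_CRITICAL_DF4 = 2.776  # two-sided 5% critical value for n - 1 = 4 degrees of freedom

model_a = [0.74, 0.76, 0.75, 0.77, 0.75]  # placeholder scores, e.g. a transformer
model_b = [0.70, 0.71, 0.72, 0.71, 0.70]  # placeholder scores, e.g. a baseline
t_value = paired_t_statistic(model_a, model_b)
significant = abs(t_value) > T_CRITICAL_DF4
print(round(t_value, 2), significant)
```

Pairing is appropriate only when run i of both models shares the same data split or seed; otherwise an unpaired test would be the safer choice.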
Standard deviations across repeated runs were also computed to quantify model stability and are shown in Table 10.
Performance Stability Across 5 Runs.
These low standard deviation scores further support that the results are stable and reproducible.
This research presented a detailed and systematic methodology for classifying the melancholic emotions of Lamentation and Consolation in the classical Tamil anthology Kuruntokai. By fine-tuning the multilingual BERT (mBERT) model, the research demonstrates that the contextualized embeddings of transformers improve emotion classification in classical Tamil text. Traditional classifiers such as Naïve Bayes, Logistic Regression, and SVM performed moderately, the recurrent deep learning classifiers achieved higher accuracy, and mBERT outperformed all baselines. These findings hold in a low-resource literary domain that is morphologically rich and highly metaphorical.
The research analysed the challenges of classical poetry, where emotions are expressed symbolically and can be lost in the ambiguity of cultural convention. Both the quantitative experiments and the qualitative analysis showed that explanatory prose offers greater contextual and affective clarity and therefore lower emotional ambiguity. This evidence underlines the necessity of explanatory prose as an interpretive aid for classical Tamil literature.
This work has a number of limitations that further research could address. The dataset is relatively small and the emotion classification is binary, which loses the full range of emotions found in Sangam poetry. Furthermore, poetic compression and figurative language remain challenging even for transformer models.
Future work will integrate transformer explainability techniques such as attention visualization and token attribution to better understand which poetic and lexical features contribute most to the detection of melancholic emotions in classical Tamil texts. The proposed method is intended as a framework for further research; real-time, large-scale, and distributed inference are left as future concerns.
Future work will also seek to enrich the corpus with other Sangam texts and to expand the emotion classification scheme beyond binary classification. Indic-specific transformer models such as IndicBERT and MuRIL, along with the multilingual XLM-RoBERTa, will be investigated more deeply to make the system more interpretable and robust. The system is designed predominantly for academic and scholarly use, whereas large-scale operationalization and real-time use will be the subject of future work.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
