Abstract
Automated traffic violation detection is a critical component of intelligent transportation systems, aimed at enhancing road safety and minimizing human oversight. This paper proposes a real-time deep learning-based framework capable of detecting multiple traffic violations, including helmet and seatbelt non-usage, triple riding, and mobile phone use, while performing automatic license plate recognition. The system integrates five independently trained YOLOv8 models on specialized datasets and uses Keras-OCR (Optical Character Recognition) for license number extraction, triggered conditionally upon violation detection to optimize computational efficiency. The proposed system achieved an average mAP@50 of 97.9%, with a precision of 96.9% and recall of 95.9%, outperforming existing models in both speed and accuracy. The challenges faced while developing the system, namely low-light conditions, motion blur, and license plate variability, are discussed in this paper alongside legal and ethical considerations. Future work will focus on enhancing OCR robustness, supporting complex violations through temporal analysis, and deploying on edge-computing platforms within smart city infrastructures.
Keywords
Introduction
Recent advancements in deep learning and computer vision have brought major improvements to automated visual analysis, opening up new possibilities across fields such as surveillance, healthcare, manufacturing, and intelligent transportation systems. At the heart of many of these applications lies object detection, which involves accurately identifying and localizing multiple objects within images or video streams (Charran & Dubey, 2022; Mustafa & Karabatak, 2024). Thanks to the evolution of detection algorithms, particularly those built on convolutional neural networks (CNNs), it is now possible to achieve real-time detection without significantly compromising precision.
In the field of smart transportation, there has been growing interest in using these technologies for automated traffic monitoring and violation detection. Traditional traffic enforcement methods, which mostly depend on manual observation and static surveillance systems, are often limited by human error, slow response times, and high operational costs. In contrast, computer vision-based solutions offer scalable, continuous, and highly accurate monitoring capabilities (Bose et al., 2023; Jiang et al., 2023; Kielty et al., 2023; Sutikno & Kusumaningrum, 2023). These systems can automatically identify traffic violations, record evidence, and support enforcement actions, contributing to safer roads and more effective regulation (Chandravanshi et al., 2021; Du et al., 2024; Fathi et al., 2024; Hosseini & Fathi, 2023; Saravanan & Rajini, 2024).
Several object detection architectures have been explored for traffic surveillance, including Faster R-CNN, SSD, and the YOLO (You Only Look Once) family of models (Liu et al., 2024; Pan et al., 2022; Said et al., 2024; Sarhan et al., 2024). Among these, YOLO models have become particularly popular because of their ability to deliver real-time performance with strong accuracy, making them well-suited for the dynamic and often resource-constrained environments encountered in traffic monitoring (Hu et al., 2023; Moussaoui et al., 2024; Pan et al., 2023; Sudharson et al., 2023). Additionally, advances in optical character recognition (OCR) techniques have made it possible to automatically extract license plate information, a key step in linking detected violations to specific vehicles (Gu et al., 2024; Kwak & Kim, 2023; Wang et al., 2023; Wei et al., 2024).
In this study, we propose a complete system for automated traffic violation detection and license plate recognition using deep learning techniques. Five YOLOv8 models were individually trained on task-specific datasets, including the Seat Belt Detection dataset by Ff, the Seat Belt Detection API dataset, FB-YOLOv7, the Indian Helmet Detection Dataset, Traffic Violation Detection (TVD 2), and license plate datasets such as the Indian Vehicle License Plate Dataset and the Indian License Plates with Labels, sourced from Kaggle and Roboflow. These models were developed to detect specific violations: seatbelt non-usage, helmet non-usage, triple riding on two-wheelers, mobile phone usage while driving, and license plate localization. All five models process the same input stream simultaneously to detect violations in real time. The novelty of this work lies in leveraging KerasOCR for license plate recognition, which combines end-to-end text detection and recognition under varying lighting, angles, and partial occlusions, outperforming other OCR approaches that require extra preprocessing steps such as grayscaling and binarization. When a violation is identified, the system crops the license plate region and activates KerasOCR to extract the license plate text. This approach ensures efficient use of computational resources by invoking OCR only when necessary. The performance of the proposed system is evaluated based on accuracy, precision, and recall, with a focus on achieving reliable results across a variety of environmental conditions.
Related Work
Charran and Dubey (2022) proposed a violation detection framework using YOLOv4 and DeepSORT for two-wheeler violations such as no-helmet, triple riding, wheeling, no-parking, and mobile phone usage. Tesseract OCR was employed for license plate recognition. Their system achieved a mAP of 98.09% for violation detection and 99.41% accuracy for automatic number plate recognition (ANPR). Mustafa and Karabatak (2024) presented a car model and license plate recognition system utilizing MobileNetV2, YOLOx, YOLOv4-tiny, PaddleOCR, and SVTR-tiny. The dataset comprised 1000 real-time captured images under various weather conditions. Their system achieved 97.5% accuracy in car model identification and license plate recognition, with Grad-CAM used to validate model focus.
Jiang et al. (2023) enhanced YOLOv5 for helmet detection by integrating a Dilated Convolution in Coordinate Attention (DICA) layer and a Rebuild Bidirectional Feature Pyramid Network (Re-BiFPN). Using the Helmet Wearing dataset for Non-motor Drivers (HWND) containing 2096 images, the hybrid model achieved 93.5% precision, 91.1% recall, and 94.3% mAP, with DICA and Re-BiFPN contributing significant performance gains. Bose et al. (2023) proposed a real-time deep learning system for detecting two-wheeler violations such as no-helmet, triple riding, wheeling, and stoppie using YOLOv8. They introduced a low-light video enhancement module using Otsu's thresholding followed by CLAHE to improve image visibility under poor lighting conditions. The system achieved 98.2% average precision, 97.5% recall, and 97.05% accuracy, detecting 172 violations out of 188, with a 60% faster processing time compared to other state-of-the-art models.
Kielty et al. (2023) introduced a hybrid architecture combining MobileNetV2, a 2D CNN, a self-attention module, and a two-layer Bi-LSTM network to detect yawning and seatbelt status from video frames. Using a self-captured dataset with 26,386 frames, their system achieved 100% accuracy in detecting seat belt fastening and unfastening states. Sutikno and Kusumaningrum (2023) proposed a YOLOv8-based system for detecting drivers and passengers without seatbelts. Their method included three subsystems: windshield detection, passenger classification, and seatbelt classification. Datasets collected from CCTV and cameras in Indonesia included Dataset1 (3289 images) for windshield detection, Dataset2 (972 images) for passenger classification, and Dataset3 (1684 images) for seatbelt classification. YOLOv8s achieved a mAP of 0.960 for windshield detection and up to 89.23% and 88.46% accuracy for passenger and seat belt classification, respectively. Fathi et al. (2024) developed a two-stage approach using YOLOv8, SSD, and Faster R-CNN models for Iranian motorcycle license plate detection and character recognition. The Iranian Motorcycle License Plate (IMLP) dataset and a derived Iranian Motorcycle License Plate Digits (IMLPD) dataset were used. YOLOv8 achieved the best performance with 98.5% license plate detection accuracy and 99% license plate character recognition accuracy under varying lighting and angle conditions.
Hosseini and Fathi (2023) used YOLOv5s and a ResNet34 hybrid model with SPP, PMT, and TPP layers to detect vehicle occupancy and seat belt usage. YOLOv5s detected the windshield, followed by ResNet34 for classifying occupancy and seatbelt status. Evaluated on a dataset of over 3500 images from the Traffic Transport Organization, the method achieved 99.7% accuracy for windshield and occupant detection, and 98.9% for seatbelt violation detection. Shashirangana et al. (2021) proposed an optimized decision-based solution for automated license plate recognition (ALPR) that operates on edge devices at night. However, the ALPR model is computationally intensive and time-consuming, and may not run on edge devices with restricted resources.
Shashirangana et al. also presented a comprehensive survey of models and methods for license plate detection. The survey examines the limitations of ALPR models with respect to efficiency, computing cost, and robustness and adaptability to natural conditions. The challenges identified include the computational cost and time required to handle large-scale license plate problems, and the hardware required to carry out the processing on low-power devices (Shashirangana et al., 2020).
Proposed Method
The proposed method integrates state-of-the-art deep learning techniques to automate traffic violation detection and license plate recognition. The approach leverages multiple YOLOv8 models for violation detection, combined with Keras-OCR for accurate license plate text recognition, ensuring real-time performance and high precision. Unlike traditional systems that rely on static surveillance and manual observation, our method utilizes a robust pipeline that processes video streams to detect violations such as seatbelt and helmet non-usage, and to recognize license plates for further enforcement actions. The proposed framework is designed to handle dynamic, real-world traffic conditions, addressing challenges like occlusions, environmental variability, and motion blur. The proposed system offers a scalable solution for automated enforcement while minimizing false positives and negatives, making it adaptable for real-world deployment across various traffic monitoring applications. The algorithm for the proposed model is as follows:
function TRAFFIC.VIOLATION.MONITORING(V)
    Initialize YOLOv8 models:
        Model1: Seatbelt Detection
        Model2: Helmet, Phone Usage and Triple Riding Detection
        Model3: License Plate Localization
    Initialize OCR module (KerasOCR)
    for each frame f from video/image stream do
        Resize and normalize frame f
    end for
    for each preprocessed frame f do
        Run Model1 to detect seatbelt violations
        Run Model2 to detect helmet, phone usage and triple riding
        Run Model3 to detect and localize license plates
    end for
    for each detected violation v do
        Crop license plate region using detected coordinates
        Extract text using KerasOCR
        Record:
            Violation Type
            Timestamp
            License Plate Text
            Cropped Image
        Store all data in persistent storage
    end for
    Return recorded violations with metadata
end function
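The control flow of the algorithm, in particular the conditional invocation of OCR, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the detectors and the OCR step are passed in as plain callables, frames are represented as nested lists, and the names `Detection`, `ViolationRecord`, `monitor_frame`, and `read_plate` are hypothetical stand-ins for the YOLOv8 models and KerasOCR. The sketch also assumes that the first localized plate belongs to the violating vehicle.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels
Frame = List[List[int]]          # toy stand-in for an image: rows of pixel values


@dataclass
class Detection:
    label: str  # e.g. "no_helmet", "no_seatbelt", "license_plate"
    box: Box


@dataclass
class ViolationRecord:
    violation_type: str
    timestamp: str
    plate_text: Optional[str]
    plate_box: Optional[Box]


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the rectangle described by box out of the frame."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def monitor_frame(
    frame: Frame,
    violation_detectors: Sequence[Callable[[Frame], List[Detection]]],
    plate_detector: Callable[[Frame], List[Detection]],
    read_plate: Callable[[Frame], str],
) -> List[ViolationRecord]:
    """Run every violation detector; localize and read the plate only on a hit."""
    violations = [d for detect in violation_detectors for d in detect(frame)]
    if not violations:
        return []  # no violation: plate localization and OCR are skipped

    plates = plate_detector(frame)
    # Assumption: the first localized plate belongs to the violating vehicle.
    plate_box = plates[0].box if plates else None
    plate_text = read_plate(crop(frame, plate_box)) if plate_box is not None else None
    now = datetime.now(timezone.utc).isoformat()
    return [ViolationRecord(v.label, now, plate_text, plate_box) for v in violations]
```

Because the OCR callable is only reached after at least one detector reports a violation, frames without violations incur no recognition cost, which is the efficiency argument made for conditional triggering.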
Dataset Collection
To develop a robust multi-class traffic violation detection system, five separate YOLOv8 models were trained, each tailored to a specific type of violation using dedicated datasets. For seatbelt detection, two annotated datasets were employed: the FB-YOLOv7 dataset and the Seatbelt Detection API dataset. These datasets include diverse real-world images capturing drivers and passengers with and without seatbelts under varying lighting and environmental conditions. Helmet usage detection utilized the Indian Helmet Detection Dataset, which comprises annotated images of two-wheeler riders labelled based on helmet presence. For detecting triple riding, the Traffic Violation Detection (TVD 2) dataset was used, featuring images of two-wheelers carrying three or more individuals. Mobile phone usage detection relied on a custom, manually annotated dataset created from open-source video frames and photographs showing drivers using mobile devices while driving. License plate localization and recognition were supported by the Indian Vehicle License Plate Dataset and the Indian License Plates with Labels dataset. Both offer a large volume of annotated license plate images from different Indian states, covering various plate formats and environmental settings. All datasets were sourced from publicly available platforms such as Roboflow, Kaggle, and GitHub.
Exploratory Data Analysis
An initial exploratory data analysis (EDA) was performed to assess dataset quality and distribution. The analysis revealed class imbalances, notably in the triple riding and mobile phone usage categories, which had fewer instances compared to more common violations like helmet and seatbelt non-compliance. To address this issue and minimize model bias, a comprehensive suite of data augmentation techniques was applied. These included geometric transformations (e.g., rotation and flipping), photometric adjustments (e.g., brightness and contrast modifications), and cropping. These augmentations effectively increased the representation and diversity of minority classes, thereby enhancing model generalization. This facilitated the development of models capable of recognizing violations occurring across various regions of the input images, thus improving contextual adaptability.
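The geometric and photometric augmentations described above can be illustrated with a small, dependency-free sketch. It assumes grayscale images stored as nested lists and YOLO-format boxes (class, normalized centre x, centre y, width, height); the function names and the brightness range are illustrative choices, not values taken from the paper. The point of the example is that a geometric transform must be applied to the annotations as well as to the pixels.

```python
import random
from typing import List, Tuple

Image = List[List[int]]                           # grayscale rows, values 0-255
YoloBox = Tuple[int, float, float, float, float]  # (class, x_c, y_c, w, h), normalized


def hflip(image: Image, boxes: List[YoloBox]) -> Tuple[Image, List[YoloBox]]:
    """Mirror the image left-right and mirror each box centre accordingly."""
    flipped = [row[::-1] for row in image]
    new_boxes = [(c, 1.0 - x, y, w, h) for c, x, y, w, h in boxes]
    return flipped, new_boxes


def adjust_brightness(image: Image, factor: float) -> Image:
    """Scale pixel intensities and clamp the result to the valid 0-255 range."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in image]


def augment(image: Image, boxes: List[YoloBox], rng: random.Random):
    """Random flip plus random brightness change (probabilities are illustrative)."""
    if rng.random() < 0.5:
        image, boxes = hflip(image, boxes)
    return adjust_brightness(image, rng.uniform(0.7, 1.3)), boxes
```

Applying `augment` repeatedly to images of an under-represented class, such as triple riding, yields additional training samples whose labels remain consistent with the transformed pixels.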
Data Preprocessing
All input frames - whether captured from continuous surveillance video or submitted as standalone images - were uniformly resized to 640 × 640 pixels to meet YOLOv8's input resolution requirements. Following resizing, pixel intensity normalization was applied, converting raw pixel values to a standardized numerical range (Equation (1)). This normalization reduced variability across the dataset, provided uniformity and improved visibility across images, and improved training stability.
Compared to YOLOv8's default [0–1] scaling, mean–std normalization reduced illumination variance and stabilized gradients, leading to faster convergence and better generalization in challenging traffic scenes. Together, these preprocessing steps ensured consistency in data fed into the detection pipeline, thereby enhancing both model performance and training efficiency.
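Equation (1) is not reproduced in this text. The mean-std normalization referred to above is conventionally written as follows, where x is a raw pixel value and μ and σ are the mean and standard deviation of pixel intensities over the training images. Whether these statistics were computed per channel or globally is not stated, so the per-channel reading is an assumption.

```latex
x' = \frac{x - \mu}{\sigma}
```

By contrast, the default [0–1] scaling corresponds to x' = x / 255 for 8-bit images.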
The YOLOv8 architecture consists of a C2f-based backbone for efficient feature extraction, a PAN-FPN neck to fuse multi-scale features, and a decoupled head for classification and box regression. Each YOLOv8 model used in this system employs a deep convolutional neural network for hierarchical feature extraction. Initial layers capture low-level patterns such as edges and contours, while deeper layers extract higher-level semantic features useful for detecting traffic violations. During training, a composite loss function is used to optimize three aspects simultaneously: accuracy of object localization, correctness of classification, and confidence in object presence (Equation (3)).
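Equation (3) is not reproduced in this text. A generic weighted-sum form consistent with the three aspects listed above would be the following, where the subscripts denote the localization, classification, and confidence terms. The weights λ are hypothetical symbols introduced here for illustration, and the exact terms used by the authors may differ.

```latex
\mathcal{L}_{\text{total}} = \lambda_{\text{box}}\,\mathcal{L}_{\text{box}} + \lambda_{\text{cls}}\,\mathcal{L}_{\text{cls}} + \lambda_{\text{obj}}\,\mathcal{L}_{\text{obj}}
```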
To enhance convergence during training, cosine learning rate scheduling is applied, which gradually reduces the learning rate following a cosine function over the course of training. This approach avoids abrupt drops and helps the model settle into a smoother minimum. The learning rate at epoch t is calculated using the formula (Equation (5)).
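Equation (5) is not reproduced in this text. The standard cosine annealing rule, which matches the description above, can be sketched as follows. The function name and the floor value `lr_min` are assumptions; the initial learning rate of 0.01 and the 50-epoch budget are the values reported for the violation models.

```python
import math


def cosine_lr(epoch: int, total_epochs: int,
              lr_max: float = 0.01, lr_min: float = 0.0) -> float:
    """Cosine-annealed learning rate: lr_max at epoch 0, lr_min at total_epochs."""
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the schedule at a few points of a 50-epoch run.
    for epoch in (0, 10, 25, 40, 50):
        print(epoch, cosine_lr(epoch, total_epochs=50))
```

The rate changes slowly near the start and end of training and fastest in the middle, which is what avoids the abrupt drops of a step schedule.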
Thus, the mathematical model of YOLOv8, with its optimized architecture and anchor-free design, enables robust and effective detection of traffic violations under diverse real-world conditions.
Once a violation is detected in a given frame, that frame is passed to a separate YOLOv8 model trained specifically for license plate localization. This model identifies the spatial region of the license plate and outputs the coordinates of a bounding box around it. Using these coordinates, the license plate region is cropped from the original frame and sent to an OCR module built using KerasOCR, a pre-trained pipeline consisting of a convolutional feature extractor for detecting text and a CRNN model with CTC decoding for text recognition. Its text detector implements CRAFT (Character Region Awareness for Text detection).
Deep learning-based OCR systems like KerasOCR commonly use Connectionist Temporal Classification (CTC) loss for training, which enables sequence-to-sequence learning without needing pre-segmented characters. The CTC loss is defined in Equation (6).
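Equation (6) is not reproduced in this text. In its standard formulation, the CTC loss for an input sequence x of length T and a target label sequence y is written as below, where π ranges over all frame-level alignments (including blanks) that the collapsing function B maps to y. The notation is the conventional one and is assumed rather than taken from the paper.

```latex
p(\mathbf{y} \mid \mathbf{x}) = \sum_{\pi \in \mathcal{B}^{-1}(\mathbf{y})} \prod_{t=1}^{T} p(\pi_t \mid \mathbf{x}),
\qquad
\mathcal{L}_{\text{CTC}} = -\log p(\mathbf{y} \mid \mathbf{x})
```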
Since the model may output blank tokens or repeated characters, CTC decoding includes a collapsing step to remove duplicates and blanks, refining the final prediction. To further improve recognition accuracy, a post-processing module applies rule-based corrections to address frequent misclassifications (e.g., ‘S’ → ‘5’, ‘O’ → ‘0’). The output is validated against regular expressions based on the syntax of Indian license plates to ensure syntactic correctness.
This enables the automatic linking of detected violations to specific vehicles based on their license plates.
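The collapsing, correction, and validation steps described above can be sketched as follows. This is an illustration under stated assumptions: the blank symbol, the plate pattern (two-letter state code, one or two district digits, up to three series letters, four-digit number), and the position-aware use of the S/5 and O/0 substitutions are choices made for the example, not the authors' exact rules.

```python
import re
from itertools import groupby
from typing import List, Optional

BLANK = "-"  # placeholder for the CTC blank token (assumption)

# Assumed plate layout, e.g. MH12AB1234.
PLATE_RE = re.compile(r"^[A-Z]{2}[0-9]{1,2}[A-Z]{0,3}[0-9]{4}$")

TO_DIGIT = str.maketrans({"S": "5", "O": "0"})   # fixes for digit positions
TO_LETTER = str.maketrans({"5": "S", "0": "O"})  # fixes for letter positions


def ctc_collapse(tokens: List[str]) -> str:
    """Greedy CTC decoding: merge consecutive repeats, then drop blanks."""
    merged = [tok for tok, _ in groupby(tokens)]
    return "".join(tok for tok in merged if tok != BLANK)


def correct_plate(text: str) -> Optional[str]:
    """Apply position-aware character fixes, then validate the result."""
    text = re.sub(r"[^A-Z0-9]", "", text.upper())
    if len(text) < 6:
        return None
    # The first two characters should be letters, the last four digits.
    fixed = text[:2].translate(TO_LETTER) + text[2:-4] + text[-4:].translate(TO_DIGIT)
    return fixed if PLATE_RE.match(fixed) else None
```

For example, the frame-level output `M M - H - 1 1 - 1` collapses to `MH11`, and a raw reading of `MH12AB12S4` is corrected to `MH12AB1254` before validation. Readings that still fail the pattern return `None`, so they can be flagged for manual review instead of being stored as a plate number.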
The proposed system deploys five specialized YOLOv8 models running in parallel on a unified video input stream, with each model focused on a distinct category of traffic violation. When any violation is detected, an Optical Character Recognition (OCR) module is conditionally triggered to extract and decode the vehicle's license plate from the same frame. A violation report is then generated, containing the type of violation, the detected license plate number, bounding box coordinates, and timestamp. For data storage, all records are stored in a structured SQL database. The modular design of the system facilitates scalability, allowing easy integration of additional violation types and policy-based extensions.
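A violation report of the kind described above could be persisted as in the following sketch, which uses SQLite from the Python standard library. The table name, column names, and helper functions are hypothetical; the paper states only that records are stored in a structured SQL database.

```python
import sqlite3
from typing import List, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS violations (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    violation_type TEXT NOT NULL,
    plate_text     TEXT,
    x1 INTEGER, y1 INTEGER, x2 INTEGER, y2 INTEGER,
    detected_at    TEXT NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_violation(conn: sqlite3.Connection, violation_type: str,
                   plate_text: Optional[str], box: Tuple[int, int, int, int],
                   detected_at: str) -> int:
    """Insert one report and return its row id."""
    cur = conn.execute(
        "INSERT INTO violations "
        "(violation_type, plate_text, x1, y1, x2, y2, detected_at) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (violation_type, plate_text, *box, detected_at),
    )
    conn.commit()
    return cur.lastrowid


def violations_for_plate(conn: sqlite3.Connection, plate_text: str) -> List[tuple]:
    """Return (violation_type, detected_at) pairs recorded for one plate."""
    rows = conn.execute(
        "SELECT violation_type, detected_at FROM violations "
        "WHERE plate_text = ? ORDER BY detected_at",
        (plate_text,),
    )
    return rows.fetchall()
```

Adding a new violation type needs no schema change in this layout, since the type is stored as a value and not as a column, which fits the modular design described above.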
Architectural Summary
The overall architecture is designed for real-time, scalable traffic violation detection and license plate recognition (Figures 1 and 2). The three models for helmet violations, triple riding, and phone usage are combined to detect two-wheeler violations in parallel, ensuring high-speed inference without compromising accuracy. The other two models handle seatbelt violation detection and license plate detection. All of these models run in parallel on real-time video feeds captured at traffic signals. When a violation is detected, the cropped license plate image is sent to the KerasOCR model for license plate character recognition. This modular framework permits easy integration of new violation categories and supports deployment in a wide range of environments, including roadside surveillance systems, smart city networks, and automated tolling infrastructure. Through optimized preprocessing, balanced dataset training, and selective OCR invocation, the system achieves both efficiency and reliability in real-world traffic monitoring scenarios. Figure 1 illustrates the workflow of the entire process.

Figure 1. Proposed System Workflow.

Figure 2. Proposed YOLOv8 + KerasOCR Architecture.
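The parallel execution of several detectors on one frame, as described in the architectural summary, might be organized as in the sketch below. It assumes each model is wrapped in a callable that takes a frame and returns a list of detections; the function name and the use of a thread pool are illustrative choices, and a GPU deployment could instead batch frames or use separate processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def run_detectors_parallel(
    frame: Any,
    detectors: Dict[str, Callable[[Any], List[Any]]],
) -> Dict[str, List[Any]]:
    """Submit the same frame to every detector and collect results by name."""
    if not detectors:
        return {}
    with ThreadPoolExecutor(max_workers=len(detectors)) as pool:
        futures = {name: pool.submit(fn, frame) for name, fn in detectors.items()}
        # result() blocks until that detector finishes and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}
```

Keying the results by detector name lets a new violation model be added by registering one more entry in the dictionary, without changing the dispatch code.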
The proposed system utilizes a diverse set of publicly available and custom-curated datasets to train and evaluate deep learning models for traffic violation detection and license plate recognition. These datasets include annotated images covering a wide range of violations such as seatbelt non-usage, helmet violations, mobile phone usage, and triple riding, captured under various environmental conditions and camera angles. Specialized license plate datasets further support accurate localization and OCR-based recognition across different Indian plate formats. Together, these datasets comprise 15,024 images (3583 license plate images, 5443 seatbelt violation images, and 5998 two-wheeler detection images), ensuring robust model performance, adaptability to real-world scenarios, and compliance with region-specific traffic regulations.
Seat Belt Detection Dataset by Ff
The Seat Belt Detection Dataset by Ff, available on Images.CV, consists of annotated images that focus on detecting seatbelt usage by vehicle occupants. These images are captured under various environmental conditions and from different angles, making the dataset robust for real-world applications. It includes labelled data highlighting both the presence and absence of seatbelts, facilitating accurate training for object detection models aimed at road safety compliance.
Seat Belt Detection API Dataset
This dataset, sourced from Roboflow Universe and published by SACAIM, contains several images annotated for detecting seatbelt usage. It includes labels such as ‘seat-belt’, ‘noseat-belt’, and ‘number-plate’, making it suitable for multi-object detection tasks. The dataset encompasses a variety of vehicle interior scenarios, contributing to the development of systems that can detect violations like seatbelt non-usage in real time under varying lighting and positioning conditions.
FB-YOLOv7 Dataset
The FB-YOLOv7 dataset is a custom collection developed specifically for training YOLOv7 models on multiple types of traffic violations. It includes 3583 license plate images, 5443 seatbelt images, and 5998 two-wheeler images. The dataset includes diverse annotations for violations such as helmet non-usage, triple riding, and mobile phone usage while driving. The images are primarily sourced from Indian roads, ensuring relevance to regional traffic enforcement systems. It plays a crucial role in training models to perform multi-class detection of common traffic violations.
Indian Helmet Detection Dataset
Available on Kaggle, the Indian Helmet Detection Dataset includes numerous annotated images focused on identifying helmet usage among two-wheeler riders. Although the exact number of images is unspecified, it contains varied examples captured in different lighting and traffic conditions. Each image is labelled to indicate whether riders and pillion passengers are wearing helmets, making it highly effective for training detection systems aimed at improving compliance with helmet laws in densely populated regions.
Traffic Violation Detection (TVD 2)
The TVD 2 dataset, sourced from Roboflow Universe, contains annotated images used to detect a range of traffic violations, including mobile phone use while driving, triple riding, and other unsafe behaviours. Designed for instance segmentation tasks, it supports real-time object detection applications. The dataset includes images with varying backgrounds and camera angles, providing a broad base for developing robust traffic surveillance models capable of identifying multiple infractions simultaneously.
Indian Vehicle License Plate Dataset
This dataset, hosted on GitHub by Data Cluster Labs, contains images of Indian vehicle license plates collected from more than 20 states. It features high-resolution vehicle images with diverse plate designs, fonts, and orientations. Annotations are provided in multiple formats such as COCO, YOLO, and PASCAL VOC, making it adaptable for various detection pipelines. The dataset's diversity and scale make it ideal for license plate localization and recognition tasks in intelligent traffic systems.
Indian License Plates with Labels
Collected from sources like Google Images and published on Kaggle, the Indian License Plates with Labels dataset includes a broad range of labelled license plate images. While the exact number of images is not specified, each image includes both bounding box annotations and corresponding alphanumeric labels. The dataset reflects the variety in Indian license plate formats and styles, supporting tasks like optical character recognition (OCR) and end-to-end license plate recognition. Its real-world variability enhances model generalizability across different regions.
The above-mentioned datasets are publicly available and can be used by any individual or organisation for research purposes.
Experimental Results
The proposed model, integrating YOLOv8 for object detection and KerasOCR for license plate recognition, was evaluated on multiple datasets using performance metrics such as mAP50, mAP50-95, precision, and recall to measure the efficiency of the model both quantitatively and qualitatively.
The YOLOv8 models were trained to perform three main tasks: (i) detection of two-wheeler violations such as no helmet, triple riding, phone usage, and wheeling, (ii) seatbelt violation detection, and (iii) license plate detection. Each model was trained on its respective dataset, which was divided into training, validation, and test sets in an 80:10:10 split. For violation-related tasks (helmet, seatbelt, and others), training was conducted for up to 50 epochs using stochastic gradient descent (SGD) with an initial learning rate of 0.01, a batch size of 16, cosine learning rate scheduling, and early stopping. In contrast, the license plate detection model reached convergence much earlier, with performance saturating around 20 epochs. Model convergence was monitored using validation mAP and key loss metrics including box loss, objectness loss, and classification loss.
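The 80:10:10 partition described above can be reproduced with a short, seeded shuffle-and-slice routine. The function name, the fixed seed, and the choice to give rounding leftovers to the test set are assumptions made for this sketch; the paper does not state how the split was generated.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    items: Sequence[T],
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle deterministically, then slice into train, validation, and test."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    # Any remainder from rounding down ends up in the test set.
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )
```

Applied to the 5443 seatbelt images, this yields 4354 training, 544 validation, and 545 test samples, and fixing the seed keeps the three subsets disjoint and reproducible across training runs.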
Two-Wheeler Violation Detection
In the case of two-wheeler violation detection, the model's performance is assessed using mAP50 and mAP50-95, as shown in Figure 3. The left graph illustrates the model's ability to detect violations with an IoU threshold of 0.5, with the mAP50 value consistently increasing over epochs, indicating improved accuracy. The right graph, showing mAP50-95, presents a more stringent evaluation across IoU thresholds from 0.5 to 0.95, with a consistent rise in values, suggesting better detection under challenging scenarios. Both graphs show an upward trend, indicating that the model's detection of two-wheeler violations improves steadily as training progresses.

Figure 3. Graphical Analysis of mAP50 and mAP50-95 vs Epochs.
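Both mAP50 and mAP50-95 rest on the intersection-over-union (IoU) between a predicted box and a ground-truth box. The sketch below shows the IoU computation and the matching rule at a given threshold. Boxes are assumed to be (x1, y1, x2, y2) corner coordinates, and the function names are illustrative.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def area(box: Box) -> float:
    """Area of a box; degenerate boxes count as zero."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two boxes."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """A prediction counts as correct when its IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


# mAP50 uses the single threshold 0.5; mAP50-95 averages AP over these ten.
MAP50_95_THRESHOLDS: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]
```

A prediction that overlaps its target with an IoU of about 0.67 is therefore a true positive for mAP50 but a miss at the 0.75 threshold, which is why mAP50-95 is the stricter of the two metrics.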
The model's performance is also evaluated using Precision and Recall metrics, as shown in Figure 4. The left graph illustrates a steady rise in Precision, indicating that the model progressively reduces false positives throughout training. The right graph depicts a smoother rise in Recall, indicating that the model steadily improves at detecting violations, with only a few violations being missed. Both graphs demonstrate an upward trend, signifying that the model is enhancing both its ability to correctly identify violations and its capacity to detect as many violations as possible as training progresses.

Figure 4. Graphical Analysis of Precision and Recall vs Epochs.
As shown in Figure 5, the model detects no-helmet violations for both riders on a motorcycle, showcasing its ability to detect multiple violators in a single pass. The model also detected phone usage and no-helmet violations committed by the same person, indicating that it can detect multiple violations for the same person even under challenging conditions like motion blur. The model effectively localizes and labels each violation, indicating its robustness in real-world scenarios.

Figure 5. Two-Wheeler Violation Detection with Bounding Boxes.
Figure 6 illustrates the detection of wheeling and triple riding. In the first image, the rider is clearly performing a bike stunt by lifting the front wheel of the motorcycle, indicating abnormal traffic behaviour. In the second image, three individuals are seated on a two-wheeler, exceeding the passenger limit. This shows the model's capability to accurately detect such abnormal behaviour by traffic rule violators in real-time scenarios.

Figure 6. Two-Wheeler Violation Detection with Bounding Boxes.
The Precision-Recall curve in Figure 7 exhibits a favourable pattern, with precision initially close to 1 and recall steadily increasing. A distinct bump near the top-right corner signifies that the model achieves a strong balance between precision and recall. Although a slight decline in precision follows, the overall trend reflects the model's capacity to detect a higher number of true no-seatbelt violations while maintaining high accuracy, indicating robust and reliable performance in real-world scenarios.

Figure 7. The P-R Curve for Seatbelt Detection.
Figure 8 illustrates the plotting of precision vs confidence threshold which depicts a steady rise in precision as confidence increases. This trend indicates that the model effectively reduces false positives, resulting in higher precision. The subsequent flattening of the curve reflects consistent, high-confidence predictions, highlighting the model's reliability in maintaining accuracy at stricter decision boundaries.

Figure 8. Plotting of Precision vs Confidence Threshold.
Figure 9 shows the plot of recall vs confidence threshold, depicting a slight initial drop from the maximum recall, followed by a gentle bump near the top-right corner. As the confidence increases further, the recall gradually declines. This trend suggests that while the model initially captures most violations, increasing the confidence threshold leads to fewer detections, indicating a trade-off between detection rate and prediction certainty.

Figure 9. Plotting of Recall vs Confidence Threshold.
Figure 10 depicts the F1 score vs confidence threshold curve, which rises to a peak at a confidence threshold of around 0.5. This pattern indicates that moderate confidence thresholds strike a better balance between precision and recall: at lower thresholds the model might suffer from false positives, and at higher thresholds it might miss some detections. The peak of the curve thus reflects the threshold that maximizes the model's overall detection performance.

Figure 10. Plotting of F1 Score vs Confidence Threshold.
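The precision, recall, and F1 curves against the confidence threshold can be derived from a list of scored detections as in the sketch below. It assumes each detection has already been marked correct or incorrect by IoU matching; the function names and the toy data in the usage note are illustrative and are not results from the paper.

```python
from typing import List, Sequence, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard definitions; empty denominators are reported as 0.0 here."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def sweep_thresholds(
    scored: Sequence[Tuple[float, bool]],  # (confidence, is_correct) per detection
    n_ground_truth: int,
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float, float]]:
    """Return (threshold, precision, recall, f1) for each threshold."""
    results = []
    for t in thresholds:
        kept = [correct for conf, correct in scored if conf >= t]
        tp = sum(kept)
        fp = len(kept) - tp
        fn = n_ground_truth - tp
        results.append((t, *precision_recall_f1(tp, fp, fn)))
    return results


def best_threshold(results: Sequence[Tuple[float, float, float, float]]) -> float:
    """Threshold with the highest F1 score."""
    return max(results, key=lambda row: row[3])[0]
```

On a toy set of seven detections against four ground-truth objects, raising the threshold removes false positives first (precision rises) and then starts discarding true positives (recall falls), so F1 peaks at an intermediate threshold, which is the shape described for Figures 8 to 10.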
In Figure 11, we can see that, in spite of motion blur, the model performs well in detecting seatbelt usage for both passengers and drivers. In the image, the system accurately detects the windshield and identifies a seatbelt violation for a passenger in one car; in the second vehicle, both the driver and passenger are correctly flagged for seatbelt violations, demonstrating the model's ability to detect smaller objects like seatbelts with high precision.

Figure 11. Seatbelt Violation Detection with Bounding Boxes.
The two graphs in Figure 12 depict the model's performance and detection capabilities across the epochs. The mAP50 graph shows a consistent increase with intermittent spikes, reflecting the model's growing detection accuracy and ability to handle varying license plate sizes. The right-side graph follows a similar trend, with mAP50-95 fluctuating, highlighting the model's adaptation in detecting smaller or more challenging objects and its fine-tuning over time.

Figure 12. Plotting of mAP50 and mAP50-95 vs Epochs.
The precision graph in Figure 13 exhibits noticeable fluctuations with large peaks, followed by a gradual settling, indicating occasional inconsistencies in detection but overall high precision towards the end. In contrast, the recall graph demonstrates a more consistent upward trend with smaller variations, suggesting steady improvements in identifying relevant objects, ultimately stabilizing at a high value. This comparison highlights the model's ability to fine-tune precision with occasional fluctuations, while recall steadily improves with fewer variations.

Figure 13. Comparison of Precision and Recall Across Epochs.
The image in Figure 14 presents four real-time captured license plates, each accurately localized within bounding boxes by the model. The detections demonstrate the model's robustness and precision in identifying license plates under varying real-world conditions, highlighting its effectiveness in real-time traffic monitoring scenarios.

Figure 14. License Plate Recognition with Bounding Boxes.
The YOLOv8 models trained across these three primary tasks can be run in parallel on real-time traffic video feeds. This setup allows the system to detect multiple objects in a single pass, making it suitable for continuous monitoring in busy traffic scenarios.
There are various versions of YOLOv8 such as YOLOv8n (nano), YOLOv8s (small), YOLOv8m (medium) and YOLOv8l (large). An experimental analysis is done in Figures 15 and 16 to find out the performance of various variants of YOLOv8.

Plotting of mAP vs Epochs for YOLOv8 Variants.

Plotting of Box Loss vs Epochs for YOLOv8 Variants.
The mAP vs. Epochs graph shows the detection accuracy of four YOLOv8 variants (n, s, m, and l) over 20 training epochs. All models converge rapidly to high mAP values, with YOLOv8n achieving the highest final mAP of 0.994. This suggests that the lightweight YOLOv8n is highly effective at localizing license plates, possibly because its reduced model complexity and efficient convergence help prevent overfitting on simpler tasks such as license plate detection.
In contrast, the Box Loss vs. Epochs graph reveals that YOLOv8s achieves the lowest final box loss, indicating more precise bounding box localization. While YOLOv8n maintains competitive performance, YOLOv8s appears to offer better spatial precision. Hence, YOLOv8n provides the best overall detection performance, whereas YOLOv8s excels in box-level accuracy. Therefore, if the goal is high accuracy with minimal computational cost, YOLOv8n is preferable, but for applications where precise localization is critical, YOLOv8s may be the optimal choice.
Table 1 presents a detailed evaluation across the ten target classes of the three YOLOv8 object detection models: with helmet, without helmet, triple riding, using mobile, motorcycle, person-no seatbelt, person-seatbelt, seatbelt, windshield and license plate. Precision, recall, mAP50, and mAP50-95 were calculated for each class, providing insight into the models' ability to localize and classify both broad and fine-grained features. The two-wheeler violations were detected with an average precision of 97.8%, the seatbelt violations with an average precision of 94.1%, and license plates with an average precision of 99.4%.
Performance Metrics for 10 Classes of YOLOv8 Models.
The bar graph in Figure 17 illustrates the model's performance across seven classes: helmet, without helmet, triple riding, using mobile, motorcycle, seatbelt, and license plate. Each bar shows the number of correct and incorrect predictions for a class, in green and red respectively. Incorrect predictions are minimal, and the majority of predictions are accurate, as reflected by the dominance of the green segments in each bar. This highlights the model's ability to detect and classify small and overlapping objects across various traffic scenarios.

Number of Correct vs Incorrect Predictions per Class.
The YOLOv8 models were run in parallel on live webcam feed (Figure 18) where a person riding a motorcycle was captured without a helmet. The YOLOv8 model successfully identified the rider and classified the violation. As part of the pipeline, the system then proceeded to detect the motorcycle's license plate. The YOLOv8 model's accurate detection of the vehicle and rider, followed by the license plate, was crucial in ensuring that the subsequent ALPR (Automatic License Plate Recognition) phase by Keras-OCR could be applied effectively.

No Helmet Violation and License Plate Detection.
After the models were run in parallel on the live traffic video feed, the license plate regions detected by the YOLOv8 model were passed directly to the Keras-OCR pipeline for recognition. A key novelty of the proposed system lies in leveraging Keras-OCR's ability to process natural scene images without additional preprocessing, which keeps the computational cost low and makes it well suited to real-time applications. This streamlined approach enables direct extraction of license plate text from real-world frames, preserving recognition accuracy even under challenging conditions such as variable lighting, occlusions, or motion blur. The performance was validated both visually and quantitatively, demonstrating reliable text extraction across diverse license plate styles and environments. Figure 19 shows the OCR-generated output text extracted from the license plate detected in Figure 18.

Keras-OCR Generated Text from Detected License Plate.
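The conditional OCR step can be sketched as follows. This is a hedged illustration, not the authors' code: VIOLATION_LABELS, plate_text and read_plate_if_violation are hypothetical names, the choice of which Table 1 classes count as violations is an assumption, and recognize stands in for a wrapper around Keras-OCR that is assumed to return a list of (word, box) pairs, with each box given as four (x, y) corner points.

```python
# Assumed subset of the Table 1 class names that should trigger OCR.
VIOLATION_LABELS = {
    "without helmet",
    "triple riding",
    "using mobile",
    "person-no seatbelt",
}


def plate_text(words):
    """Join recognized words left to right, ordered by each box's smallest x."""
    ordered = sorted(words, key=lambda item: min(point[0] for point in item[1]))
    return "".join(word for word, _ in ordered).upper()


def read_plate_if_violation(detections, plate_crop, recognize):
    """Run OCR on the plate crop only when a violation class was detected."""
    if not any(d["label"] in VIOLATION_LABELS for d in detections):
        return None  # No violation, so the OCR cost is skipped entirely.
    return plate_text(recognize(plate_crop))
```

Skipping recognition when no violation is present is what keeps the OCR stage off the per-frame path. The left-to-right ordering assumes a single-row plate; two-row plates would also need the words grouped by vertical position.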
The models were evaluated for inference speed and pipeline latency on an NVIDIA Tesla T4 GPU (585 MHz, 16 GB) on Google Colab. Training took 5 min for the YOLOv8 model detecting two-wheeler violations, 6 min for the model detecting seatbelt violations, and 3 min for the model detecting license plates. The average inference speed was approximately 120 FPS when processing a live webcam feed, enabling timely detection and recognition of traffic violations. The end-to-end processing time from violation detection to OCR text extraction was around 12 s per incident. This figure includes violation detection, license plate detection, and text recognition, as well as the time needed to cover the incident, capture images, and access the database module through API calls. This efficiency demonstrates the system's potential for deployment in real-time traffic monitoring and road safety systems.
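The two figures above describe different quantities, and the arithmetic below makes the relationship explicit. The helper names are hypothetical; only the 120 FPS and 12 s values come from the text.

```python
def frame_budget_ms(fps):
    """Average time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def frames_elapsed(fps, seconds):
    """Number of frames the detector processes during a given interval."""
    return round(fps * seconds)
```

At 120 FPS the detectors spend about 8.3 ms on each frame, while a 12 s incident interval corresponds to 1440 frames of the feed. The per-incident time is therefore dominated by the steps that follow detection rather than by detection itself.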
To evaluate the effectiveness of the proposed method, we conducted a comparative benchmark analysis against existing state-of-the-art methods. The evaluation covered three key safety compliance categories: helmet detection, seatbelt detection, and license plate detection, as shown in Table 2. The models were assessed using the common performance indicator mAP.
Comparison with Other State of Art Models with Proposed Work.
The experimental results demonstrate that the proposed method consistently achieves superior performance across all evaluated categories: helmet detection, seatbelt detection, and license plate detection. In particular, it outperformed existing approaches in terms of mean Average Precision (mAP), indicating higher overall accuracy in both object localization and classification. Compared to the methods proposed by Jiang et al. (2023), Pan et al. (2022), Said et al. (2024) and Gu et al. (2024), each of which targets a specific detection task, our model achieved a notable improvement in mAP scores, highlighting its robustness and reliability in real-world traffic surveillance scenarios. However, this performance could be improved further by adopting efficient hardware-oriented models. For example, Jithmi Shashirangana et al. implemented FB-Net (Facebook-Berkeley-Nets) and PC-DARTS (Partially Connected Differentiable Architecture Search) along with an intensity IR illuminator to achieve an accuracy of 98%.
While the proposed traffic violation detection and license plate recognition system shows strong performance, several limitations remain. Environmental factors like poor lighting, adverse weather, and occlusions can reduce detection accuracy. Motion blur from fast-moving vehicles and overlapping objects may cause misclassifications or missed detections. Keras-OCR struggles with non-standard license plates, varied fonts, skewed angles, and damaged surfaces, affecting recognition accuracy. The models' generalization is limited without diverse training datasets covering various vehicle types, rider postures, and helmet styles. Integration with external APIs, such as those from transport authorities, presents challenges related to secure access, regulatory compliance, and latency. Managing large volumes of video and violation data also demands optimized, scalable database solutions. Legal and ethical concerns around personal data capture must be addressed to ensure privacy and compliance. Additionally, occasional false positives and false negatives can impact the system's enforcement reliability.
Future work should focus on improving robustness to environmental variability such as different lighting conditions and climates, enhancing OCR performance on diverse plates, expanding dataset diversity, optimizing scalability, and strengthening data privacy measures. Techniques like multimodal fusion and adaptive thresholding may further boost real-world performance. Ensemble methods could reduce the number of misdetections. More advanced data augmentation methods and motion-aware object detection techniques will also be explored to maintain model accuracy.
Conclusion and Future Work
The proposed automated traffic violation detection and license plate recognition system, built on YOLOv8 and Keras-OCR, demonstrates high effectiveness in identifying multiple types of road safety violations. The system achieved an average precision of 0.969, showing strong detection capabilities across varied traffic scenarios while remaining computationally efficient. The integration of diverse, real-world datasets further enhances its generalizability to practical environments. Overall, the solution offers a scalable and reliable framework for intelligent traffic monitoring and enforcement, with significant potential for deployment in smart city infrastructures and regional transport authority systems to promote safer roads.
Future work will focus on integrating super-resolution techniques to enhance OCR accuracy under challenging conditions such as low light, rain, and fog. The system will also be expanded to detect more complex traffic violations, including signal jumping and wrong-lane driving, through temporal video analysis. Deploying the system within smart city infrastructures using edge computing will be explored to minimize latency and improve real-time responsiveness, aligning with the broader goal of promoting safer roads.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
