Abstract
This study introduces a vision–language pipeline that detects risky driving behaviors and generates emotionally expressive responses to support driver awareness and comfort. Although vision–language models have advanced perception and reasoning in autonomous driving, existing systems rarely consider the emotional dimension or real-world user experience. The proposed system, Keep Yelling Assistant (KYA), detects high-risk driving maneuvers such as sudden cut-ins in real time and then produces emotional responses, tailored to driver preferences, through a large language model (LLM). The framework comprises two core modules. The vision module uses You Only Look Once (YOLO) v8 variants to detect nearby vehicles and identify risky behaviors. Key driving metrics, including relative distance, speed, and projected reach time, are extracted and normalized to produce a structured behavior log. The language module combines this log with a user-defined emotional tone setting (e.g., neutral, humorous, analytical) and generates verbal reactions using state-of-the-art LLMs (ChatGPT-4o, Claude 3, Gemini 2.5, and Copilot). We evaluated the system using dashcam videos containing risky driving behaviors and a user study involving 108 participants. Participants selected preferred response styles, and the LLMs were evaluated on emotional alignment. All models received favorable ratings, though preferences varied across personas. Notably, the combination of YOLOv8s and ChatGPT-4o achieved the highest score, 4.29 out of 5.00. By integrating real-world perception with emotionally adaptive dialogue, KYA advances emotionally intelligent in-vehicle artificial intelligence and highlights new opportunities to improve safety, trust, and driver comfort in conventional and autonomous vehicles.
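The hand-off between the two modules can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the formula for projected reach time, the normalization ranges, the log schema, or the prompt wording, so everything below is an assumption made for illustration. Reach time is taken to be distance divided by closing speed, normalization is taken to be clipped min-max scaling, and the names `BehaviorLog`, `build_log`, and `build_prompt` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


def projected_reach_time(distance_m: float, closing_speed_mps: float) -> Optional[float]:
    """Seconds until the gap closes, assuming constant closing speed.

    Returns None when the gap is not shrinking (assumed convention).
    """
    if closing_speed_mps <= 0:
        return None
    return distance_m / closing_speed_mps


def min_max(value: float, lo: float, hi: float) -> float:
    """Clip value to [lo, hi], then scale it to [0, 1]."""
    clipped = max(lo, min(hi, value))
    return (clipped - lo) / (hi - lo)


@dataclass
class BehaviorLog:
    # Hypothetical schema; the paper's actual log fields are not given.
    event: str
    distance_norm: float
    closing_speed_norm: float
    reach_time_s: Optional[float]


def build_log(event: str, distance_m: float, closing_speed_mps: float) -> BehaviorLog:
    """Turn raw per-vehicle measurements into a structured log entry."""
    return BehaviorLog(
        event=event,
        # Ranges of 0-50 m and 0-20 m/s are illustrative assumptions.
        distance_norm=min_max(distance_m, 0.0, 50.0),
        closing_speed_norm=min_max(closing_speed_mps, 0.0, 20.0),
        reach_time_s=projected_reach_time(distance_m, closing_speed_mps),
    )


def build_prompt(log: BehaviorLog, tone: str) -> str:
    """Combine the log with the user's tone setting into an LLM prompt."""
    return (
        f"React in a {tone} tone to this driving event:\n"
        + json.dumps(asdict(log))
    )


if __name__ == "__main__":
    log = build_log("cut_in", distance_m=10.0, closing_speed_mps=5.0)
    print(build_prompt(log, "humorous"))
```

In this sketch the prompt string is the only thing that reaches the language model, so the tone setting and the behavior log stay decoupled from whichever LLM backend is used.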
