A Confidence-Weighted Memory-Augmented Multimodal Voice Agent for Real-Time Emotion-Aware Interaction

IJEETC 2026 Vol.15(3): 205-220
doi: 10.18178/ijeetc.15.3.205-220

Vijaya Bharathi A.* and Prashant Nitnaware

Department of Computer Engineering, Pillai College of Engineering, Panvel, India
Email: jagan21it@student.mes.ac.in (V.B.A.), pnitnaware@mes.ac.in (P.N.)
*Corresponding author

Manuscript received March 11, 2026; revised April 17, 2026; accepted May 2, 2026

Abstract—Emotion-aware conversational systems are critical to the development of human-centric Artificial Intelligence (AI). Single-modality systems often fail to capture both the acoustic patterns and the textual meaning associated with emotional expression. Previous emotion-aware models have also struggled to model cross-modal dependencies and temporal information accurately under real-time constraints, resulting in poor recognition results. This study proposes an emotion-aware conversational framework that integrates offline-trained speech emotion recognition with a multimodal chatbot pipeline. In the offline phase, three pretrained speech models, Wav2Vec2 (waveform to vectors version 2), Hidden-Unit Bidirectional Encoder Representations from Transformers (HuBERT), and Whisper, were fine-tuned on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) for multi-class emotion classification. Among them, the fine-tuned Whisper model outperformed the other two and was selected as the primary acoustic emotion encoder. The trained Whisper model was then deployed within a real-time chatbot architecture, where speech input is transcribed using automatic speech recognition and combined with acoustic emotion predictions. A confidence-weighted multimodal fusion strategy integrates audio-based and text-based emotion cues, followed by an Adaptive Confidence-Weighted Temporal Memory (ACWTM) module that models short-term emotional continuity. The proposed model attained 87.3% accuracy, outperforming the unimodal baselines. The novelty of the study lies in two new modules, the Adaptive Confidence-Weighted Decision Fusion Module (ACWDFM) and ACWTM, which play key roles in improving emotion recognition. The framework achieves enhanced robustness and stable conversational coherence, offering a scalable basis for real-time empathetic human-AI interaction.
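The abstract names a confidence-weighted fusion step and a confidence-weighted temporal memory but gives no formulas. The Python sketch below is one plausible reading, not the authors' implementation. It assumes that each modality outputs a probability vector over the same emotion classes, that "confidence" is the largest class probability, and that the memory is an exponential moving average whose update rate scales with that confidence. The names `fuse`, `TemporalMemory`, and `max_gain`, and the three-class toy inputs, are hypothetical.

```python
from typing import List, Optional, Sequence


def normalize(probs: Sequence[float]) -> List[float]:
    """Rescale non-negative scores so they sum to 1."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    return [p / total for p in probs]


def fuse(audio_probs: Sequence[float], text_probs: Sequence[float]) -> List[float]:
    """Blend two class distributions, weighting each by its own confidence.

    Assumption: confidence is the maximum class probability of a modality.
    """
    if len(audio_probs) != len(text_probs):
        raise ValueError("both modalities must cover the same classes")
    audio = normalize(audio_probs)
    text = normalize(text_probs)
    audio_conf, text_conf = max(audio), max(text)
    audio_weight = audio_conf / (audio_conf + text_conf)
    # Convex combination, so the result is still a probability distribution.
    return [audio_weight * a + (1.0 - audio_weight) * t
            for a, t in zip(audio, text)]


class TemporalMemory:
    """Short-term emotional state kept as a confidence-scaled moving average.

    Assumption: max_gain caps how far a single utterance can move the state.
    """

    def __init__(self, max_gain: float = 0.5) -> None:
        if not 0.0 < max_gain <= 1.0:
            raise ValueError("max_gain must be in (0, 1]")
        self.max_gain = max_gain
        self.state: Optional[List[float]] = None

    def update(self, probs: Sequence[float]) -> List[float]:
        current = normalize(probs)
        if self.state is None:
            # First utterance: nothing to smooth against yet.
            self.state = current
            return self.state
        # Confident predictions move the state more than uncertain ones.
        alpha = self.max_gain * max(current)
        self.state = [alpha * c + (1.0 - alpha) * s
                      for c, s in zip(current, self.state)]
        return self.state


if __name__ == "__main__":
    # Toy three-class example (hypothetical labels and values).
    labels = ["angry", "happy", "neutral"]
    memory = TemporalMemory(max_gain=0.5)
    turns = [
        ([0.90, 0.05, 0.05], [0.40, 0.30, 0.30]),
        ([0.30, 0.40, 0.30], [0.20, 0.50, 0.30]),
    ]
    for audio, text in turns:
        smoothed = memory.update(fuse(audio, text))
        print(labels[smoothed.index(max(smoothed))],
              [round(p, 3) for p in smoothed])
```

Under these assumptions, a confident acoustic prediction dominates an uncertain textual one within a turn, and a single low-confidence turn cannot abruptly overturn the emotional state accumulated over earlier turns.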

Index Terms—artificial intelligence, conversational system, emotion-aware interaction, memory-augmented voice agent, multimodal voice agent

Cite: Vijaya Bharathi A. and Prashant Nitnaware, "A Confidence-Weighted Memory-Augmented Multimodal Voice Agent for Real-Time Emotion-Aware Interaction," International Journal of Electrical and Electronic Engineering & Telecommunications, vol. 15, no. 3, pp. 205-220, 2026. doi: 10.18178/ijeetc.15.3.205-220
