Introduction: Voice analysis for medical diagnostics is advancing rapidly with artificial intelligence (AI) and machine learning (ML). This study uses ML models to distinguish vocal fold mucosal lesions from laryngeal immobility in patients with dysphonia. Non-invasive voice analysis could streamline diagnostic workflows in ENT practices, supplementing invasive endoscopic evaluation and helping to guide treatment strategies more effectively.
Materials and Methods: The study included dysphonic patients from a tertiary voice disorders and head-and-neck surgery center and a control group of healthy subjects. Patients diagnosed with unilateral laryngeal immobility or vocal fold mucosal lesions were selected; those with other voice-affecting conditions were excluded. Sustained-vowel recordings were captured using high-quality audio equipment, and vocal parameters such as fundamental frequency, jitter, and shimmer were extracted for analysis. A machine learning pipeline was developed in Python, featuring data normalization, hyperparameter tuning, and augmentation via SMOTE to balance class distributions. A stacking classifier combined multiple base algorithms (Random Forest, SVM, XGBoost), with a meta-model on top to improve predictive accuracy. Performance was evaluated using precision, recall, and area under the ROC curve (AUC).
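The pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's code: the features are synthetic placeholders generated with scikit-learn (no study data is used), the helper `smote_like_oversample` is a hypothetical hand-written approximation of SMOTE, `GradientBoostingClassifier` stands in for XGBoost so that the example depends only on NumPy and scikit-learn, and logistic regression is an assumed choice of meta-model. Hyperparameter tuning is omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def smote_like_oversample(X, y, k=5, seed=0):
    """Hypothetical SMOTE-style helper: add synthetic minority samples by
    interpolating between a sample and one of its k nearest same-class
    neighbours. Assumes every class has at least two samples."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = target - count
        if n_new == 0:
            continue
        X_cls = X[y == cls]
        n_neighbors = min(k, len(X_cls) - 1)
        nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X_cls)
        # Column 0 is the query point itself, so it is dropped.
        neighbours = nn.kneighbors(X_cls, return_distance=False)[:, 1:]
        base = rng.integers(0, len(X_cls), size=n_new)
        picked = neighbours[base, rng.integers(0, n_neighbors, size=n_new)]
        gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
        X_parts.append(X_cls[base] + gap * (X_cls[picked] - X_cls[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Synthetic stand-in for three acoustic features (e.g. F0, jitter, shimmer),
# with roughly a 2:1 class imbalance. These are NOT patient measurements.
X, y = make_classification(n_samples=66, n_features=3, n_informative=3,
                           n_redundant=0, weights=[2 / 3, 1 / 3],
                           random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Normalize using training statistics only, then oversample the training
# split only, so that no synthetic sample leaks into the evaluation data.
scaler = StandardScaler().fit(X_train)
X_bal, y_bal = smote_like_oversample(scaler.transform(X_train), y_train)

# Base learners feed their out-of-fold predictions to the meta-model.
model = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC()),
        ("gb", GradientBoostingClassifier(random_state=0)),  # XGBoost stand-in
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
model.fit(X_bal, y_bal)

X_test_scaled = scaler.transform(X_test)
y_pred = model.predict(X_test_scaled)
y_score = model.predict_proba(X_test_scaled)[:, 1]

precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)
auc = roc_auc_score(y_test, y_score)
print(f"precision={precision:.2f} recall={recall:.2f} AUC={auc:.2f}")
```

Oversampling only the training split is a deliberate choice in this sketch: if synthetic samples are generated before the data are split, near-copies of test samples can end up in training and inflate the reported metrics.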
Results: The study analyzed 66 patients (44 with laryngeal immobility, 22 with mucosal lesions) and 30 healthy subjects. Data augmentation increased the usable dataset to 1144 samples. Under cross-validation, the machine learning model differentiated between the two conditions with promising accuracy.
Conclusion: Machine learning shows strong potential for distinguishing vocal fold mucosal lesions from laryngeal immobility. This non-invasive method could improve diagnostic accuracy in clinical settings and offer earlier intervention options, especially in regions with limited specialist access. Further refinement could make it a valuable tool in public health.
KEYWORDS
Dysphonia; Vocal Cord Paralysis; Voice Disorders; Vocal Cord Lesions; Acoustic Analysis; Artificial Intelligence; Machine Learning; Laryngology; Early Detection; Head and Neck Cancers