Multimodal AI for Robust Human-Computer Interaction
Completed:
Built a robust voice interface for Edge/IoT devices (wearables, hearables), using AI/ML to deliver clear speech capture in noisy, real-world environments.
- Multimodal Sensor Fusion: Engineered an AI model that intelligently fuses data from multiple specialized sensors, producing a clear signal where any single sensor would fail (see the fusion sketch after this list).
- Generative Audio Enhancement: Applied a generative AI model to reconstruct the full signal spectrum from the fused sensor data, dramatically improving speech clarity for better HCI (see the reconstruction sketch after this list).
- Forward-Looking R&D: Conceptualized and explored a novel approach that uses LLMs for real-time transcription, then leverages the resulting text to contextually guide and refine the final speech enhancement (see the pipeline sketch after this list).
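
A minimal sketch of what learned, gated fusion of two sensor streams could look like, assuming per-frame features from an air microphone and a bone-conduction IMU; GatedSensorFusion, the dimensions, and the gating scheme are illustrative assumptions, not the project's actual model:

```python
# Minimal gated-fusion sketch (illustrative; not the project's model).
# Assumes per-frame features from an air mic and a bone-conduction IMU.
import torch
import torch.nn as nn

class GatedSensorFusion(nn.Module):
    """Fuse two per-frame feature streams with a learned, per-frame gate.

    The gate lets the model lean on the noise-robust sensor when the
    acoustic channel degrades, and on the mic when conditions are clean.
    """

    def __init__(self, mic_dim: int, imu_dim: int, out_dim: int):
        super().__init__()
        self.mic_proj = nn.Linear(mic_dim, out_dim)   # project mic features
        self.imu_proj = nn.Linear(imu_dim, out_dim)   # project IMU features
        self.gate = nn.Sequential(                    # per-frame mixing weight
            nn.Linear(mic_dim + imu_dim, out_dim),
            nn.Sigmoid(),
        )

    def forward(self, mic: torch.Tensor, imu: torch.Tensor) -> torch.Tensor:
        # mic: (batch, frames, mic_dim), imu: (batch, frames, imu_dim)
        g = self.gate(torch.cat([mic, imu], dim=-1))
        return g * self.mic_proj(mic) + (1.0 - g) * self.imu_proj(imu)

# Example: 80-band mel features from the mic, 24-dim motion features.
fusion = GatedSensorFusion(mic_dim=80, imu_dim=24, out_dim=128)
fused = fusion(torch.randn(2, 100, 80), torch.randn(2, 100, 24))
print(fused.shape)  # torch.Size([2, 100, 128])
```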
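
One way the generative reconstruction stage could be structured, assuming the fused features feed a decoder that regresses a full-band log-magnitude spectrogram; SpectrumReconstructor, the layer sizes, and the suggested losses are assumptions for illustration only:

```python
# Illustrative spectrum-reconstruction decoder (assumed architecture).
import torch
import torch.nn as nn

class SpectrumReconstructor(nn.Module):
    """Map fused sensor features to a full-band log-magnitude spectrogram."""

    def __init__(self, in_dim: int = 128, n_bins: int = 257):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_dim, 256, kernel_size=5, padding=2),  # local temporal context
            nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(256, n_bins, kernel_size=1),             # per-frame spectrum
        )

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # fused: (batch, frames, in_dim) -> (batch, frames, n_bins)
        return self.net(fused.transpose(1, 2)).transpose(1, 2)

# Training could pair this with an L1 regression loss against clean
# spectrograms, or an adversarial loss for a more generative objective.
model = SpectrumReconstructor()
spec = model(torch.randn(2, 100, 128))
print(spec.shape)  # torch.Size([2, 100, 257])
```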
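
A high-level sketch of the explored text-guided concept: audio frames cross-attend over token embeddings of a draft transcript, so recognized words can steer the final enhancement. TextConditionedEnhancer is hypothetical, and a real system would substitute an actual real-time ASR/LLM front end for the stand-in transcript embeddings:

```python
# Hypothetical text-conditioned enhancer (concept sketch only).
import torch
import torch.nn as nn

class TextConditionedEnhancer(nn.Module):
    """Refine noisy speech features by cross-attending to transcript tokens."""

    def __init__(self, feat_dim: int = 128, text_dim: int = 64, n_heads: int = 4):
        super().__init__()
        # Cross-attention: audio frames query the transcript embeddings.
        self.attn = nn.MultiheadAttention(
            feat_dim, n_heads, kdim=text_dim, vdim=text_dim, batch_first=True
        )
        self.out = nn.Linear(feat_dim, feat_dim)

    def forward(self, noisy: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # noisy: (batch, frames, feat_dim); text: (batch, tokens, text_dim),
        # where `text` would come from a real-time ASR/LLM transcript.
        ctx, _ = self.attn(noisy, text, text)
        return self.out(noisy + ctx)  # residual refinement

# Stand-in tensors: 100 audio frames attending over 12 transcript tokens.
enhancer = TextConditionedEnhancer()
refined = enhancer(torch.randn(2, 100, 128), torch.randn(2, 12, 64))
print(refined.shape)  # torch.Size([2, 100, 128])
```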
Tech & Skills
- Core Competencies: Multimodal AI, Sensor Fusion, Edge/IoT Systems, HCI, AI/ML, LLM