Research
My research leverages computer vision, machine learning, and audio signal processing to build intelligent information systems for applications in AR/VR, robotics, and AI4Science. I'm also interested in exploring the potential of large language models (LLMs) and vision-language models (VLMs) to further enhance these capabilities.
Audio Geolocation: A Natural Sounds Benchmark
Mustafa Chasmai, Wuao Liu, Subhransu Maji, Grant Van Horn
Under Review, 2025
project page / code / arXiv
Using a vision-inspired approach, we tackle the novel challenge of predicting the geographic location of audio recordings by leveraging species sounds and multimodal retrieval techniques.
Can Large Language Models Reason About Goal-Oriented Tasks?
Filippos Bellos, Yayuan Li, Wuao Liu, Jason J. Corso
ACL Workshop, 2024
project page / video / pdf
We study how well LLMs can complete a sequence of steps to achieve a certain goal, such as making a sandwich or repairing a bicycle tire.
Thanks to Jon Barron for providing this amazing template. Thanks to my friend Xiaofeng Lin for his help building this webpage.