Research
I'm interested in Computer Vision, Multimodal Machine
Learning, and Robotics. My research has
covered Multi-Object Tracking (MOT), multimodal human language sequences, and human gaze
tracking.
I want to one day build a robot/agent that understands intriguing human behaviors and
communicates naturally with humans.
I also have some experience in data visualization.
|
|
MTAG: Modal-Temporal Attention Graph for Unaligned Human Multimodal Language
Sequences
Jianing Yang*,
Yongxin Wang*,
Ruitao Yi,
Yuying Zhu,
Azaan Rehman,
Amir Zadeh,
Soujanya Poria,
Louis-Philippe Morency
Annual Conference of the North American Chapter of the Association for Computational
Linguistics (NAACL-HLT), 2021
code /
bibtex
Modal-Temporal Graph for analysing unaligned human language sequences.
(* indicates equal contribution)
|
|
Joint Object Detection and Multi-Object Tracking with Graph Neural Networks
Yongxin Wang,
Kris M. Kitani,
Xinshuo Weng
International Conference on Robotics and Automation (ICRA), 2021
code /
website /
slides /
bibtex
Joint detection and association using Graph Neural Networks. Listed as GSDT on MOTChallenge.
|
|
GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D
Multi-Feature Learning
Xinshuo Weng,
Yongxin Wang,
Kris M. Kitani
Computer Vision and Pattern Recognition (CVPR), 2020
code /
website /
slides /
bibtex
State-of-the-art 3D MOT performance on the KITTI dataset.
|
|
Detecting Attended Visual Targets in Video
Eunji Chong,
Yongxin Wang,
Nataniel Ruiz,
James M. Rehg
Computer Vision and Pattern Recognition (CVPR), 2020
code /
dataset /
bibtex
Predicting where people are looking in videos.
|
|
Connecting Gaze, Scene, and Attention:
Generalized Attention Estimation via Joint
Modeling of Gaze and Scene Saliency
Eunji Chong,
Nataniel Ruiz,
Yongxin Wang,
Yun Zhang,
Agata Rozga,
James M. Rehg
European Conference on Computer Vision (ECCV), 2018
poster /
bibtex
Predicting where people are looking.
|
|
TypoTweet Maps: Characterizing Urban Areas
through Typographic Social Media Visualization
Alex Godwin,
Yongxin Wang,
John T. Stasko
European Conference on Visualization (EuroVis), 2017
bibtex
Visualizing social media data in a typographic map.
|
|