Yongxin (Richard) Wang

I am an Applied Scientist at Amazon AWS AI on the Rekognition team, which provides image analysis services to customers. I primarily work on topics related to face recognition.

Prior to Amazon, I obtained my Master of Science in Computer Vision (MSCV) degree at the Robotics Institute of Carnegie Mellon University, where I worked with Prof. Kris Kitani on Multi-Object Tracking (MOT) and Prof. Louis-Philippe Morency on Multimodal Machine Learning (MMML). I obtained my Bachelor's degrees from the Georgia Institute of Technology with a double major in Computer Science and Industrial Engineering, where I worked with Prof. Jim Rehg on deep learning based human gaze analysis.

yongxinw [at] amazon.com  /  CV  /  Google Scholar  /  LinkedIn

profile photo

I'm interested in Computer Vision, Multimodal Machine Learning, and Robotics. My research has covered Multi-Object Tracking (MOT), multimodal human language sequences, and human gaze tracking. I want to one day build a robot/agent that understands intriguing human behaviors and communicates naturally with humans. I also have some experience in data visualization.

MTAG: Modal-Temporal Attention Graph for Unaligned Human Multimodal Language Sequences
Jianing Yang*, Yongxin Wang*, Ruitao Yi, Yuying Zhu, Azaan Rehman, Amir Zadeh, Soujanya Poria, Louis-Philippe Morency
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), 2021
code / bibtex

A modal-temporal attention graph for analyzing unaligned human language sequences.

(* indicates equal contribution)

Joint Object Detection and Multi-Object Tracking with Graph Neural Networks
Yongxin Wang, Kris M. Kitani, Xinshuo Weng
International Conference on Robotics and Automation (ICRA) 2021
code / website / slides / bibtex

Joint detection and association using Graph Neural Networks. Named GSDT on MOTChallenge.

GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning
Xinshuo Weng, Yongxin Wang, Kris M. Kitani
Computer Vision and Pattern Recognition (CVPR), 2020
code / website / slides / bibtex

State-of-the-art 3D MOT performance on the KITTI dataset.

Detecting Attended Visual Targets in Video
Eunji Chong, Yongxin Wang, Nataniel Ruiz, James M. Rehg
Computer Vision and Pattern Recognition (CVPR), 2020
code / dataset / bibtex

Predicting where people are looking in videos.

Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency
Eunji Chong, Nataniel Ruiz, Yongxin Wang, Yun Zhang, Agata Rozga, James M. Rehg
European Conference on Computer Vision (ECCV), 2018
poster / bibtex

Predicting where people are looking in images.

TypoTweet Maps: Characterizing Urban Areas through Typographic Social Media Visualization
Alex Godwin, Yongxin Wang, John T. Stasko
European Conference on Visualization (EuroVis), 2017

Visualizing social media data in a typographic map.

Amazon Rekognition, Mar. 2020 - Present

Applied Scientist

Working on face recognition services and technologies.

Carnegie Mellon University, Jan. 2019 - Present

Research Assistant with Prof. Kris Kitani

Worked on simultaneous detection and association with Graph Neural Networks for Multi-Object Tracking.

Carnegie Mellon University, Aug. 2019 - Present

Research Assistant with Prof. Louis-Philippe Morency

Worked on modeling multimodal temporal language sequences with Graph Neural Networks.

Amazon Rekognition, May 2019 - Aug. 2019

Applied Scientist Intern with Dr. Wei Xia

Worked on high-resolution face synthesis with disentangled control over facial identity and attributes.

Georgia Institute of Technology, Jan. 2017 - May 2018

Research Assistant Intern with Prof. Jim Rehg

Worked on gaze target prediction in images and videos.

Template from here