Yongxin (Richard) Wang

I am a second-year Master of Science in Computer Vision (MSCV) student at the Robotics Institute of Carnegie Mellon University. I work with Prof. Kris Kitani on Multi-target Pedestrian Tracking, and with Prof. Louis-Philippe Morency on Multimodal Social Intelligence Modeling. I am interested in computer vision, multimodal machine learning, and human and social situation understanding.

In Summer 2019, I interned as an applied scientist at Amazon Rekognition in Seattle, advised by Dr. Wei Xia. We worked on high-resolution controllable face image generation. Prior to CMU, I obtained my Bachelor's Degrees from Georgia Institute of Technology with double majors in Computer Science and Industrial Engineering. I also had the fortune to work with Prof. Jim Rehg at the Center for Behavioral Imaging.

yongxinw@andrew.cmu.edu  /  CV  /  Google Scholar  /  LinkedIn


I'm interested in computer vision and multimodal machine learning. Much of my research aims to understand human behavior through these two fields. In the past, I've worked on human gaze tracking, pedestrian tracking, and social intelligence modeling. I also have some experience in data visualization.

Detecting Attended Visual Targets in Video
Eunji Chong, Yongxin Wang, Nataniel Ruiz, James M. Rehg
Computer Vision and Pattern Recognition (CVPR), 2020

Predicting where people are looking in videos.

Graph Neural Network for 3D Multi-Object Tracking with Multi-Feature Learning
Xinshuo Weng, Yongxin Wang, Kris M. Kitani
Computer Vision and Pattern Recognition (CVPR), 2020

Graph Neural Network for Simultaneous Detection and Association for Multi-Object Tracking
Yongxin Wang, Xinshuo Weng, Kris M. Kitani
In Progress

Simultaneous pedestrian detection and tracking using Graph Neural Networks

Modeling Social Intelligence: Multimodal Social Question Answering using Graph Neural Networks
Amir Zadeh, Yongxin Wang, Jianing Yang, Yuying Zhu, Ruitao Yi, Louis-Philippe Morency
In Progress

Using Graph Neural Networks for video question answering grounded in social space

Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency
Eunji Chong, Nataniel Ruiz, Yongxin Wang, Yun Zhang, Agata Rozga, James M. Rehg
European Conference on Computer Vision (ECCV), 2018
poster / bibtex

Predicting where people are looking.

TypoTweet Maps: Characterizing Urban Areas through Typographic Social Media Visualization
Alex Godwin, Yongxin Wang, John T. Stasko
European Conference on Visualization (EuroVis), 2017

Visualizing social media data in a typographic map.

Carnegie Mellon University, Jan. 2019 - Present

Research Assistant with Prof. Kris Kitani

Worked on simultaneous detection and association with Graph Neural Networks for Multi-Object Tracking

Carnegie Mellon University, Aug. 2019 - Present

Research Assistant with Prof. Louis-Philippe Morency

Worked on social intelligence modeling with Graph Neural Networks on the Social-IQ dataset

Amazon Rekognition, May 2019 - Aug. 2019

Applied Scientist Intern with Dr. Wei Xia

Worked on high-resolution face synthesis with disentangled control through facial identity and attributes

Georgia Institute of Technology, Jan. 2017 - May 2018

Research Assistant Intern with Prof. Jim Rehg

Worked on gaze target prediction in images and videos
