| 
                
                
                    | 
                            Yongxin (Richard) Wang
                         
                            I am an Applied Scientist at Amazon. I work on Amazon's suite of multimodal foundation models, with a focus on the safety and alignment of generative models. We recently launched Amazon Titan (2023) and Amazon Nova (2024).
                            Previously, I was an Applied Scientist at Amazon AutoGluon, working on an AutoML platform that allows users to train and evaluate ML models in just 3 lines of code. Before that, I was an Applied Scientist at AWS AI on the Rekognition team, which provides image
                            analysis services to customers, where I primarily worked on topics related to face recognition. Prior to Amazon, I obtained my Master
                            of Science in Computer Vision (MSCV)
                            degree at the Robotics Institute of Carnegie Mellon University, where I worked with
                            Prof. Kris Kitani on Multi-Object Tracking
                            (MOT), and Prof. Louis-Philippe Morency
                            on Multimodal Machine Learning. I obtained my Bachelor's degrees from Georgia Institute of Technology
                            with double majors in Computer Science and Industrial Engineering, where I also worked with
                            Prof. Jim Rehg on deep learning based human gaze analysis.
                         
                            yongxinw [at] amazon.com  / 
                            CV  / 
                            Google Scholar
                             / 
                             LinkedIn 
                         |   |  
                
                    
                        |  | The Amazon Nova family of models: Technical report and model card Amazon Artificial General Intelligence, 2024
 
 |  
                    |  | Unsupervised and semi-supervised bias benchmarking in face recognition Alexandra Chouldechova, Siqi Deng, Yongxin Wang, Wei Xia, Pietro Perona
 European Conference on Computer Vision (ECCV), 2022
 
 
 |  
                    |  | PSS: Progressive Sample Selection for Open-World Visual Representation Learning Tianyue Cao, Yongxin Wang, Yifan Xing, Tianjun Xiao, Tong He, Zheng Zhang, Hao Zhou, Joseph Tighe
 European Conference on Computer Vision (ECCV), 2022
 
 
 |  
                    |  | Learning hierarchical graph neural networks for image clustering Yifan Xing, Tong He, Tianjun Xiao, Yongxin Wang, Yuanjun Xiong, Wei Xia, David Wipf, Zheng Zhang, Stefano Soatto
 International Conference on Computer Vision (ICCV), 2021
 
 
 |  
                    |  | MTAG: Modal-Temporal Attention Graph for Unaligned Human Multimodal Language
                                Sequences Jianing Yang*,
                        Yongxin Wang*,
                        Ruitao Yi,
                        Yuying Zhu,
                        Azaan Rehman,
                        Amir Zadeh,
                        Soujanya Poria,
                        Louis-Philippe Morency
 Annual Conference of the North American Chapter of the Association for Computational
                            Linguistics (NAACL-HLT), 2021
 code /
                        bibtex
 
 Modal-Temporal Graph for analysing unaligned human language sequences.  (* indicates equal contribution) |  
                    |  | Joint Object Detection and Multi-Object Tracking with Graph Neural Networks Yongxin Wang,
                        Kris M. Kitani,
                        Xinshuo Weng
 International Conference on Robotics and Automation (ICRA), 2021
 code /
                        website /
                        slides /
                        bibtex
 
 Joint detection and association using Graph Neural Networks. Named GSDT on MOTChallenge.  |  
                    |  | GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D
                                Multi-Feature Learning Xinshuo Weng,
                        Yongxin Wang,
                        Kris M. Kitani
 Computer Vision and Pattern Recognition (CVPR), 2020
 code /
                        website /
                        slides /
                        bibtex
 State-of-the-art performance in 3D MOT on the KITTI dataset. |  
                    |  | Detecting Attended Visual Targets in Video Eunji Chong,
                        Yongxin Wang,
                        Nataniel Ruiz,
                        James M. Rehg
 Computer Vision and Pattern Recognition (CVPR), 2020
 code /
                        dataset /
                        bibtex
 Predicting where people are looking in videos. |  
                    |  | Connecting Gaze, Scene, and Attention:
                                Generalized Attention Estimation via Joint
                                Modeling of Gaze and Scene Saliency Eunji Chong,
                        Nataniel Ruiz,
                        Yongxin Wang,
                        Yun Zhang,
                        Agata Rozga,
                        James M. Rehg
 European Conference on Computer Vision (ECCV), 2018
 poster /
                        bibtex
 
 Predicting where people are looking in images. |  
                    |  | TypoTweet Maps: Characterizing Urban Areas
                                through Typographic Social Media Visualization Alex Godwin,
                        Yongxin Wang,
                        John T. Stasko
 European Conference on Visualization (EuroVis), 2017
 bibtex
 
 Visualizing social media data on a typographic map. |  
                
                    
                    
                        |  | Amazon AGI, Jun. 2023 - Present 
 
                                Applied Scientist
                             Launched Amazon Titan (2023) and Amazon Nova (2024) suites of foundation models, including Amazon’s
                                large language models (LLMs), image generation models, and video generation models. Responsible for R&D to improve the performance, safety, and transparency of generative AI models. |  
                        |  | Amazon AutoGluon, Oct. 2022 - Jun. 2023 
 
                                Applied Scientist
                             Amazon's open-source AutoML framework that allows users to train and evaluate ML models with 3 lines of code. |  
                    |  | Amazon
                            Rekognition, Mar. 2020 - Oct. 2022 
 
                            Applied Scientist
                          Launched the Celebrity Recognition V2 API and Face Embedding Model V6. |  
                    |  | Carnegie Mellon University,
                        Jan. 2019 - Mar. 2020 
 
                            Research Assistant with Prof. Kris
                            Kitani
                          Worked on simultaneous detection and association with Graph Neural Networks for Multi-Object
                            Tracking |  
                    |  | Carnegie Mellon University,
                        Aug. 2019 - Mar. 2020 
 
                            Research Assistant with Prof.
                            Louis-Philippe Morency
                          Worked on modeling multimodal temporal language sequences with Graph Neural Networks |  
                    |  | Amazon
                            Rekognition, May 2019 - Aug. 2019 
 
                            Applied Scientist Intern with Dr. Wei
                            Xia
                          Worked on high-resolution face synthesis with disentangled control through facial identity
                            and attributes |  
                    |  | Georgia Institute of
                            Technology, Jan. 2017 - May 2018 
                            Research Assistant Intern with Prof. Jim
                            Rehg
                          Worked on gaze target prediction in images and videos |  |