Yongxin (Richard) Wang

I am an Applied Scientist at Amazon, where I work on Amazon's suite of multimodal foundation models with a focus on the safety and alignment of generative models. We recently launched Amazon Titan (2023) and Amazon Nova (2024). Previously, I was an Applied Scientist on Amazon AutoGluon, an AutoML platform that allows users to train and evaluate ML models in just 3 lines of code. Before that, I was an Applied Scientist at Amazon AWS AI on the Rekognition team, which provides image analysis services to customers, where I primarily worked on topics related to face recognition.

Prior to Amazon, I obtained my Master of Science in Computer Vision (MSCV) degree at the Robotics Institute of Carnegie Mellon University, where I worked with Prof. Kris Kitani on Multi-Object Tracking (MOT) and with Prof. Louis-Philippe Morency on Multimodal Machine Learning. I obtained my Bachelor's degrees from the Georgia Institute of Technology with a double major in Computer Science and Industrial Engineering. There, I also worked with Prof. Jim Rehg on deep-learning-based human gaze analysis.

yongxinw [at] amazon.com  /  CV  /  Google Scholar  /  LinkedIn

News
  • Nov. 2024 - Amazon Nova Technical Report.
  • Nov. 2024 - Amazon Nova is launched.
  • Jan. 2024 - Patent granted for Hierarchical graph neural networks for visual clustering.
  • Nov. 2023 - Amazon Titan is launched.
  • Oct. 2022 - 2 papers accepted at ECCV 2022.
  • Oct. 2021 - 1 paper accepted at ICCV 2021.
  • Jun. 2021 - 1 paper accepted at NAACL-HLT 2021.
  • May 2021 - 1 paper accepted at ICRA 2021.
Publications
    The Amazon Nova family of models: Technical report and model card
    Amazon Artificial General Intelligence, 2024

    Unsupervised and semi-supervised bias benchmarking in face recognition
    Alexandra Chouldechova, Siqi Deng, Yongxin Wang, Wei Xia, Pietro Perona
    European Conference on Computer Vision (ECCV), 2022

    PSS: Progressive Sample Selection for Open-World Visual Representation Learning
    Tianyue Cao, Yongxin Wang, Yifan Xing, Tianjun Xiao, Tong He, Zheng Zhang, Hao Zhou, Joseph Tighe
    European Conference on Computer Vision (ECCV), 2022

    Learning hierarchical graph neural networks for image clustering
    Yifan Xing, Tong He, Tianjun Xiao, Yongxin Wang, Yuanjun Xiong, Wei Xia, David Wipf, Zheng Zhang, Stefano Soatto
    International Conference on Computer Vision (ICCV), 2021

    MTAG: Modal-Temporal Attention Graph for Unaligned Human Multimodal Language Sequences
    Jianing Yang*, Yongxin Wang*, Ruitao Yi, Yuying Zhu, Azaan Rehman, Amir Zadeh, Soujanya Poria, Louis-Philippe Morency
    Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), 2021
    code / bibtex

    Modal-Temporal Graph for analysing unaligned human language sequences.

    (* indicates equal contribution)

    Joint Object Detection and Multi-Object Tracking with Graph Neural Networks
    Yongxin Wang, Kris M. Kitani, Xinshuo Weng
    International Conference on Robotics and Automation (ICRA), 2021
    code / website / slides / bibtex

    Joint detection and association using graph neural networks. Submitted to MOTChallenge under the name GSDT.

    GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning
    Xinshuo Weng, Yongxin Wang, Kris M. Kitani
    Computer Vision and Pattern Recognition (CVPR), 2020
    code / website / slides / bibtex

    State-of-the-art 3D MOT performance on the KITTI dataset.

    Detecting Attended Visual Targets in Video
    Eunji Chong, Yongxin Wang, Nataniel Ruiz, James M. Rehg
    Computer Vision and Pattern Recognition (CVPR), 2020
    code / dataset / bibtex

    Predicting where people are looking in videos.

    Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency
    Eunji Chong, Nataniel Ruiz, Yongxin Wang, Yun Zhang, Agata Rozga, James M. Rehg
    European Conference on Computer Vision (ECCV), 2018
    poster / bibtex

    Predicting where people are looking.

    TypoTweet Maps: Characterizing Urban Areas through Typographic Social Media Visualization
    Alex Godwin, Yongxin Wang, John T. Stasko
    Eurographics Conference on Visualization (EuroVis), 2017
    bibtex

    Visualizing social media data in a typographic map.

    Experiences
    Amazon AGI, Jun. 2023 - Present

    Applied Scientist

    Launched Amazon Titan (2023) and Amazon Nova (2024) suites of foundation models, including Amazon’s large language models (LLMs), image generation models, and video generation models.

    Responsible for R&D to improve the performance, safety, and transparency of generative AI models.

    Amazon AutoGluon, Oct. 2022 - Jun. 2023

    Applied Scientist

    Amazon's open-source AutoML framework, which allows users to train and evaluate ML models with 3 lines of code.

    Amazon Rekognition, Mar. 2020 - Oct. 2022

    Applied Scientist

    Launched the Celebrity Recognition V2 API and Face Embedding Model V6.

    Carnegie Mellon University, Jan. 2019 - Mar. 2020

    Research Assistant with Prof. Kris Kitani

    Worked on simultaneous detection and association with graph neural networks for multi-object tracking.

    Carnegie Mellon University, Aug. 2019 - Mar. 2020

    Research Assistant with Prof. Louis-Philippe Morency

    Worked on modeling multimodal temporal language sequences with graph neural networks.

    Amazon Rekognition, May 2019 - Aug. 2019

    Applied Scientist Intern with Dr. Wei Xia

    Worked on high-resolution face synthesis with disentangled control over facial identity and attributes.

    Georgia Institute of Technology, Jan. 2017 - May 2018

    Research Assistant Intern with Prof. Jim Rehg

    Worked on gaze target prediction in images and videos.

