简介： 朱松纯教授出生于湖北省鄂州，全球著名计算机视觉专家，统计与应用数学家、人工智能专家。 1991年毕业于中国科技大学，1992年赴美留学，1996年获得美国哈佛大学计算机博士学位。2002年至2020年，在美国加州大学洛杉矶分校（UCLA）担任统计系与计算机系教授，UCLA视觉、认知、学习与自主机器人中心主任。在国际顶级期刊和会议上发表论文300余篇，获得计算机视觉、模式识别、认知科学领域多个国际奖项，包括3次问鼎计算机视觉领域国际最高奖项--马尔奖，赫尔姆霍茨奖等，2次担任国际计算机视觉与模式识别大会主席（CVPR2012、CVPR2019），2010-2020年 2次担任美国视觉、认知科学、人工智能等领域多大学、跨学科合作项目MURI负责人。朱松纯教授长期致力于构建计算机视觉、认知科学、乃至人工智能科学的统一数理框架。在留美28年后， 朱教授于2020年9月回国，担任北京通用人工智能研究院院长，并任北京大学讲席教授、清华大学基础科学讲席教授。
报告题目：Computer Vision: A Task-oriented and Agent-based Perspective
摘要： 随着文本模型GPT3/BERT等提出，预训练模型呈现高速发展的趋势，图像-文本联合学习的双模态模型也不断涌现，显示出在无监督情况下自动学习不同任务和快速迁移到不同领域数据的强大能力。然而，当前的预训练模型忽略了声音信息。在我们周边还包含大量的声音，其中语音不仅是人类之间交流的手段，还蕴藏着情绪和感情。本报告将介绍引入语音以后的首个图-文-音三模态大模型 “紫东太初”。模型将视觉、文本、语音不同模态通过各自编码器映射到统一语义空间，然后通过多头自注意力机制（Multi-head Self-attention）学习模态之间的语义关联以及特征对齐，形成多模态统一知识表示；既可以实现跨模态理解，还能实现跨模态生成，同时做到理解和生成认知能力的平衡；我们提出了一个基于词条级别(Token-level)、模态级别(Modality-level)以及样本级别(Sample-level)的多层次、多任务自监督学习统一框架，对更广泛、更多样的下游任务提供模型基础支撑，并特别地实现了通过语义网络以图生音、以音生图的功能。三模态大模型是迈向具有艺术创作能力、强大交互能力和任务泛化能力的通用型人工智能的一次重要的尝试。
简介： Larry S. Davis is a Distinguished University Professor in the Department of Computer Science and director of the Center for Automation Research (CfAR). His research focuses on object/action recognition/scene analysis, event and modeling recognition, image and video databases, tracking, human movement modeling, 3-D human motion capture, and camera networks. Davis is also affiliated with the Computer Vision Laboratory in CfAR. He served as chair of the Department of Computer Science from 1999 to 2012. He received his doctorate from the University of Maryland in 1976. He was named an IAPR Fellow, an IEEE Fellow, and ACM Fellow.
简介： Yoichi Sato is a professor at Institute of Industrial Science, the University of Tokyo. He received his B.S. degree from the University of Tokyo in 1990, and his MS and PhD degrees in robotics from School of Computer Science, Carnegie Mellon University in 1993 and 1997. His research interests include first-person vision, and gaze sensing and analysis, physics-based vision, and reflectance analysis. He served/is serving in several conference organization and journal editorial roles including IEEE Transactions on Pattern Analysis and Machine Intelligence, International Journal of Computer Vision, Computer Vision and Image Understanding, CVPR 2023 General Co-Chair, ICCV 2021 Program Co-Chair, ACCV 2018 General Co-Chair, ACCV 2016 Program Co-Chair and ECCV 2012 Program Co-Chair.
报告题目：Understanding Human Activities from First-Person Perspectives
摘要： Wearable cameras have become widely available as off-the-shelf products. First-person videos captured by wearable cameras provide close-up views of fine-grained human behavior, such as interaction with objects using hands, interaction with people, and interaction with the environment. First-person videos also provide an important clue to the intention of the person wearing the camera, such as what they are trying to do or what they are attended to. These advantages are unique to first-person videos, which are different from videos captured by fixed cameras like surveillance cameras. As a result, they attracted increasing interest to develop various computer vision methods using first-person videos as input. On the other hand, first-person videos pose a major challenge to computer vision due to multiple factors such as continuous and often violent camera movements, a limited field of view, and rapid illumination changes. In this talk, I will talk about our attempts to develop first-person vision methods for different tasks, including action recognition, future person localization, and gaze estimation.
简介： Michael Black received his B.Sc. from the University of British Columbia (1985), his M.S. from Stanford (1989), and his Ph.D. from Yale University (1992). After post-doctoral research at the University of Toronto, he worked at Xerox PARC as a member of research staff and area manager. From 2000 to 2010 he was on the faculty of Brown University in the Department of Computer Science (Assoc. Prof. 2000-2004, Prof. 2004-2010). He is one of the founding directors at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he leads the Perceiving Systems department. He is also a Distinguished Amazon Scholar (VP), an Honorarprofessor at the University of Tuebingen, and Adjunct Professor at Brown University. His work has won several awards including the IEEE Computer Society Outstanding Paper Award (1991), Honorable Mention for the Marr Prize (1999 and 2005), and all three major test-of-time awards including the 2010 Koenderink Prize, the 2013 Helmholtz Prize, and the 2020 Longuet-Higgins Prize. He is a foreign member of the Royal Swedish Academy of Sciences. In 2013 he co-founded Body Labs Inc., which was acquired by Amazon in 2017.