Riad I. Hammoud
Director
PlusAI, USA
Bio: Dr. Riad I. Hammoud currently works at PlusAI Inc. (Santa Clara, CA) as a Director of Autonomy, where his team is developing highly automated and self-driving vehicles. He holds a Ph.D. in Computer Vision and Robotics from INRIA (2001) and an MS in Control of Systems from UTC (France, 1997). Over the past six years, his R&D has focused on multi-modal object detection, deep learning, multi-sensor fusion, multi-object tracking, state estimation, situational awareness, and motion planning for autonomous driving. Prior to that, he worked at TuSimple, FAST Labs, Tobii-Dynavox, and Delphi Automotive Systems. He has led numerous EO/IR/aerial/ground imagery exploitation projects, and he contributed to the DARPA Robotics Challenge (DRC) while serving as a collaborative researcher at MIT. He also taught motion planning at Worcester Polytechnic Institute (2018-2021). Dr. Hammoud has served on the technical committees of several conferences, has chaired the IEEE PBVS and SPIE ATR conferences, has authored numerous conference publications and books, and has served as guest editor of several special issues of top journals.
Abstract: In this keynote talk, I will reflect on two decades of transformative progress in Perception Beyond the Visible Spectrum at CVPR (PBVS 2004-2024). I will highlight the diverse applications, benchmarks, deep learning advancements, and opportunities that have emerged from exploring the non-visible electromagnetic spectrum, including near-infrared, thermal, radar, X-ray, and multispectral imaging. Additionally, I will share key lessons learned and discuss the pivotal role of the PBVS CVPR workshop community in shaping the future of this dynamic field.
Mubarak Shah
Professor
University of Central Florida, USA
Bio: Dr. Mubarak Shah, the UCF Trustee Chair Professor, is the founding director of the Center for Research in Computer Vision at the University of Central Florida (UCF). Dr. Shah is a fellow of ACM, IEEE, AAAS, NAI, IAPR, AAIA, and SPIE. He has published extensively on topics related to human activity and action recognition, visual tracking, geo-localization, visual crowd analysis, object detection and categorization, shape from shading, and more. He has served as an ACM and IEEE Distinguished Visitor Program speaker. He is a recipient of the 2022 PAMI Mark Everingham Prize for pioneering human action recognition datasets; the 2019 ACM SIGMM Technical Achievement Award; the 2020 ACM SIGMM Test of Time Honorable Mention Award for his paper “Visual attention detection in video sequences using spatiotemporal cues”; the 2020 International Conference on Pattern Recognition (ICPR) Best Scientific Paper Award; an honorable mention for the ICCV 2005 Where Am I? Challenge Problem; the 2013 NGA Best Research Poster Presentation; second place in the Grand Challenge at the ACM Multimedia 2013 conference; and runner-up for the best paper award at the ACM Multimedia Conference in 2005 and 2010. At UCF he has received the Pegasus Professor Award; the University Distinguished Research Award; the Faculty Excellence in Mentoring Doctoral Students Award; the Faculty Excellence in Mentoring Postdoctoral Scholars Award; the Scholarship of Teaching and Learning Award; the Teaching Incentive Program Award; and the Research Incentive Award.
Abstract: I presented my talk, "Target Tracking in FLIR Imagery Using Mean-Shift and Global Motion Compensation," at the inaugural Workshop on Perception Beyond the Visible Spectrum at CVPR 2001. Since that milestone, computer vision has undergone a profound transformation, fueled primarily by the advent of deep learning. This revolution has left an indelible mark on the entire field, extending its reach beyond the boundaries of the visible spectrum. In this presentation, I aim to provide a reflective journey through the evolution of computer vision research over the past quarter-century. Moreover, I will delve into the transformative impact of deep learning on our understanding and exploration of perception, particularly in realms beyond what the human eye can see. Specifically, I will highlight our work in advancing perception technologies across the infrared (IR), synthetic aperture radar (SAR), and light detection and ranging (LiDAR) domains. Through this retrospective lens, we will explore the fascinating interplay between technological innovation and the widening horizons of computer vision.
Björn Ommer
Professor
University of Munich, Germany
Bio: Björn Ommer is a full professor at Ludwig Maximilian University of Munich, where he heads the Computer Vision & Learning Group. Previously, he was a full professor in the Department of Mathematics and Computer Science at Heidelberg University and a co-director of its Interdisciplinary Center for Scientific Computing. He received his diploma in computer science from the University of Bonn and his PhD from ETH Zurich, and he was a postdoc at UC Berkeley. Björn serves on the Bavarian AI Council and has been an associate editor for IEEE T-PAMI. His research interests include semantic scene understanding and retrieval, generative AI and visual synthesis, self-supervised metric and representation learning, and explainable AI. Moreover, he applies this basic research in interdisciplinary projects within neuroscience and the digital humanities. His group has published a series of generative approaches, including work known as "VQGAN" and "Stable Diffusion", which are now democratizing the creation of visual content and have already opened up an abundance of new directions in research, industry, the media, and beyond.
Abstract: Recently, generative models for learning image representations have seen unprecedented progress. Approaches such as diffusion models and transformers have been widely adopted for tasks related to visual synthesis, modification, analysis, retrieval, and beyond. Despite their enormous potential, current generative approaches have specific limitations. We will discuss how recently popular strategies such as flow matching can significantly enhance efficiency and democratize AI by empowering smaller models. The main part of the talk will then investigate effective ways to utilize pretrained diffusion-based image synthesis models for different tasks and modalities. To this end, we will show how powerful generative image representations can be efficiently translated to different modalities, and present evaluations on other tasks.