Every year before the Embedded Vision Summit, I try to step back and reflect on the big picture in embedded AI and computer vision. This year, on the Summit’s 15th anniversary, two trends could not be clearer. First, AI and computer vision applications are moving from the lab to the real world, from science projects to widespread deployment. Second, multimodal AI—encompassing text, vision, audio and other sensory inputs—is revolutionizing what these systems are capable of.
The first trend—scaling—is wonderfully illustrated by Gérard Medioni’s keynote talk, “Real-World AI and Computer Vision Innovation at Scale.” Medioni was part of the team responsible for Amazon’s Just Walk Out cashier-less checkout technology, so he knows a thing or two about computer vision at scale. He will also discuss AI innovations that are improving the streaming experience for over 200 million Amazon Prime Video users worldwide.
Medioni’s talk will be followed by a panel discussion, “Edge AI and Vision at Scale: What’s Real, What’s Next, What’s Missing,” moderated by Sally Ward-Foxton of EE Times. On this panel, Medioni will be joined by distinguished experts from Waymo, Hayden AI and Meta Reality Labs to discuss how vision and AI projects can go from an idea to being used by thousands or millions of people, and the challenges that must be overcome along the way.
On that same theme, Chris Padwick of Blue River Technology (a subsidiary of John Deere) will discuss “Taking Computer Vision Products from Prototype to Robust Product.” David Selinger will relate his experiences scaling up his start-up in “Deep Sentinel: Lessons Learned Building, Operating and Scaling an Edge AI Computer Vision Company.” And Jason Fayling will talk about what is needed to use AI and vision to improve operations at car dealerships in “SKAIVISION: Transforming Automotive Dealerships with Computer Vision.”
The second trend—multimodal intelligence—is spotlighted by another keynote talk, this one from Trevor Darrell of U.C. Berkeley: “The Future of Visual AI: Efficient Multimodal Intelligence.” Darrell will discuss the integration of natural language processing and computer vision through vision-language models (VLMs) and will share his perspective on the current state and trajectory of research advancing machine intelligence. Particularly relevant to edge applications, much of his work aims to overcome obstacles, such as massive memory and compute requirements, that limit the practical applications of state-of-the-art models.
Continuing the theme of multimodal intelligence, the Summit will feature several insightful talks that dive into the integration and application of multimodal AI. Mumtaz Vauhkonen from Skyworks Solutions will present “Multimodal Enterprise-Scale Applications in the Generative AI Era,” highlighting the importance of multimodal inputs in AI problem-solving. Vauhkonen will discuss the creation of quality datasets, multimodal data fusion techniques and model pipelines essential for building scalable enterprise applications, while also addressing the challenges of bringing these applications to production.
Frantz Lohier from AWS will introduce the concept of AI agents in his talk, “Introduction to Designing with AI Agents.” Lohier will explore how these autonomous components can enhance AI development through improved decision-making and multi-agent collaboration, offering insights into the creation and integration of various types of AI agents. And Niyati Prajapati from Google will discuss “Vision LLMs in Multi-Agent Collaborative Systems: Architecture and Integration,” focusing on the use of vision LLMs to enhance the capabilities and autonomy of multi-agent systems. Prajapati will provide case studies on automated quality control and warehouse robotics, illustrating the practical applications of these advanced architectures.
Because many product developers are eager to learn the practical aspects of incorporating multimodal AI into products, I will be co-presenting a three-hour training, “Vision-Language Models for Computer Vision Applications: A Hands-On Introduction,” in collaboration with Satya Mallick, the CEO of OpenCV.org. With a focus on practical VLM techniques for real-world use cases, this class is designed for professionals looking to expand their skill set in AI-driven computer vision, particularly in systems designed for deployment at the edge.
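To give a flavor of what “hands-on” means here, below is a minimal sketch of VLM inference in Python. It is not taken from the class materials; it assumes the Hugging Face transformers and Pillow packages are installed and uses the publicly available BLIP captioning model simply as an illustrative stand-in for whatever models the training actually covers.

# Minimal VLM example (illustrative only): caption a local image with BLIP.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("example.jpg")  # any local RGB image (hypothetical filename)
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))

Even a toy example like this hints at the edge-deployment challenges the class addresses: model size, memory footprint and latency all matter far more once inference moves off the server.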
Of course, the Summit would not be the same without its Technology Exhibits, focused on the latest building block technologies for creating products that incorporate AI and vision. The more than 65 exhibitors include Network Optix, Qualcomm, BDTI, Brainchip, Cadence, Lattice, Micron, Namuga, Sony, SqueezeBits, Synopsys, VeriSilicon, 3LC, Chips&Media, Microchip, Nextchip, Nota AI and STMicroelectronics, as well as dozens of others.
Looking back on the progress in embedded AI and computer vision over the last fifteen years, I can only shake my head in wonder. Back then, the idea of a computer being able to reliably understand images was almost science fiction. Today, machines can not only understand images and other sensing modalities but also reason about them, enabling vast new classes of applications. I can barely imagine what the next fifteen years will hold!