InnovAI April Week 1 Seminar
2025/04/03 (Thu) 19:00, D407
On April 3, the InnovAI April Week 1 Seminar was held from 7:00 PM to 9:00 PM in D407. The session featured a presentation by Jihun Kim on SayCan and R3M.
SayCan leverages the semantic reasoning capabilities of large language models (LLMs) by combining them with a set of pretrained, grounded robotic skills. Instead of directly outputting actions, the LLM suggests high-level natural language steps, and a value function assesses their feasibility in the current physical context. This separation enables robots to follow long-horizon, abstract instructions grounded in real-world environments, effectively using the language model as the “brain” and the robot as the “body.”
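The selection rule described above can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: `llm_logprob`, `value_fn`, and the toy skill list are all hypothetical stand-ins for the LLM's usefulness score and the pretrained affordance (value) function, which SayCan combines multiplicatively (here, as a sum of log-scores) to pick the next skill.

```python
import math

def saycan_step(instruction, history, skills, llm_logprob, value_fn, state):
    """Pick the next skill by combining LLM usefulness with affordance.

    Illustrative sketch of SayCan's per-step selection: the LLM scores how
    useful each skill description is for the instruction, the value function
    scores how feasible it is in the current state, and the product
    (sum of logs) decides.
    """
    best_skill, best_score = None, -math.inf
    for skill in skills:
        usefulness = llm_logprob(instruction, history, skill)   # log p(skill | instruction, history)
        affordance = math.log(value_fn(state, skill))           # log p(success | state, skill)
        score = usefulness + affordance                         # log of the product
        if score > best_score:
            best_skill, best_score = skill, score
    return best_skill

# Toy stand-ins (hypothetical values): the LLM prefers "pick up sponge",
# and the value function says the sponge is reachable but the drawer is not.
skills = ["pick up sponge", "open drawer", "done"]
llm = lambda instr, hist, s: {"pick up sponge": -0.5,
                              "open drawer": -1.5,
                              "done": -3.0}[s]
vf = lambda state, s: {"pick up sponge": 0.9,
                       "open drawer": 0.1,
                       "done": 0.5}[s]

print(saycan_step("clean the table", [], skills, llm, vf, state=None))
```

Note how a skill the LLM likes can still lose if its affordance is near zero, which is exactly how the value function grounds the language model's suggestions in the physical scene.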
R3M is a visual representation learned from large-scale human video datasets (e.g., Ego4D), combining time-contrastive learning, video-language alignment, and sparsity regularization. Once pretrained, R3M can be used as a frozen visual encoder, providing rich, transferable features that enable efficient learning of diverse manipulation tasks with minimal demonstrations, both in simulation and real-world settings.
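One ingredient of the R3M objective, time-contrastive learning, can be sketched as an InfoNCE-style loss: embeddings of temporally close video frames are pulled together while temporally distant ones are pushed apart. The function below is a simplified illustration under that assumption (the actual R3M loss also includes video-language alignment and sparsity terms, which are omitted here).

```python
import numpy as np

def time_contrastive_loss(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style time-contrastive loss on frame embeddings.

    `anchor` and `positive` are embeddings of temporally close frames;
    `negatives` are embeddings of temporally distant frames. Lower loss
    means the anchor is closer (in cosine similarity) to its positive
    than to the negatives.
    """
    def sim(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    pos = np.exp(sim(anchor, positive) / tau)
    negs = sum(np.exp(sim(anchor, n) / tau) for n in negatives)
    return -np.log(pos / (pos + negs))

# Toy 2-D embeddings: a positive aligned with the anchor yields a much
# smaller loss than a misaligned one.
a = np.array([1.0, 0.0])
loss_close = time_contrastive_loss(a, np.array([1.0, 0.1]),
                                   [np.array([-1.0, 0.0]), np.array([0.0, 1.0])])
loss_far = time_contrastive_loss(a, np.array([-1.0, 0.0]),
                                 [np.array([0.0, 1.0]), np.array([0.0, -1.0])])
print(loss_close, loss_far)
```

Once a visual encoder is trained with losses like this, it can be frozen and reused: downstream manipulation policies only learn a small head on top of the fixed features, which is why few demonstrations suffice.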
Following the presentation, a Q&A session was held to discuss related topics.