Generative AI and foundation models in robotics

The use of generative AI and foundation models within modern robotics systems opens up new potential for research and industry. Adaptive robots independently create movement and action plans, recognize objects and scenes using vision language models, and execute natural language commands in real time thanks to large language models (LLMs). Companies benefit from greater flexibility, reduced downtime, and more intuitive operation—even in dynamic or unstructured environments.

Two intralogistics robots facing each other in a test hall.
© Fraunhofer IML

Relevance of Generative Artificial Intelligence in Robotics 

Automation solutions and robots have been used successfully for many years for clearly defined tasks in structured environments. However, in unstructured scenarios or when interacting with diverse objects and people, classic systems reach their limits. The underlying algorithms and AI models are usually designed for specific tasks and cannot generalize to unknown situations and objects.

Generative AI models such as large language models (LLMs) and vision language models (VLMs) extend classic AI algorithms with the ability to generalize. These so-called foundation models are trained on huge amounts of data from the internet and are able to generate new content such as texts or images from what they have learned. This gives robots a detailed understanding of their environment and enables them to plan specific actions based on this understanding.
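How an LLM's plan output might be turned into executable robot actions can be sketched in a few lines. The skill names and the line-based output format below are illustrative assumptions, not a specific system's API; a real integration would constrain the model's output format via the prompt and validate each step before execution.

```python
# Illustrative sketch: the LLM is prompted to emit one "skill target" action
# per line; the plan is accepted only if every skill is one the robot offers.
KNOWN_SKILLS = {"move_to", "pick", "place", "scan"}

def parse_plan(llm_output: str) -> list[tuple[str, str]]:
    plan = []
    for line in llm_output.strip().splitlines():
        skill, _, target = line.strip().partition(" ")
        if skill not in KNOWN_SKILLS:
            raise ValueError(f"unknown skill in generated plan: {skill!r}")
        plan.append((skill, target))
    return plan

# Hypothetical model output for the task "restock shelf 2 with box A":
llm_output = """move_to storage_area
pick box_A
move_to shelf_2
place box_A"""

steps = parse_plan(llm_output)  # [("move_to", "storage_area"), ...]
```

Rejecting plans that reference unknown skills keeps the generative model's flexibility while guaranteeing that only actions the robot actually supports reach the controller.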

By building agent systems, we are developing intelligent tools that improve both robots' understanding of situations and their ability to act. While these models often deliver impressive results in virtual applications, transferring them to the physical world—especially in robotics and specialized domains such as logistics and intralogistics—poses a particular challenge.

We combine interdisciplinary expertise in robotics, AI, and logistics and utilize our infrastructure—from motion capture in the PACE Lab to high-performance computing clusters—to successfully integrate and adapt models to specific domains, including fine-tuning. This allows us to combine the latest research trends with concrete industrial requirements.

Approaches for flexible systems 

 

  1. Agent systems for flexible task completion: Generative models enable the automatic creation of motion plans (task planning) and programs (code generation), thus allowing dynamic adaptation to new tasks.
  2. Expanded environmental understanding: Vision Language Models (VLMs) recognize scenes and objects and can describe and interpret them. This contextual awareness enables informed decision-making in complex environments.
  3. Intuitive human-robot interaction: Large language models (LLMs) enable natural language commands and support dialogue-based programming, commissioning, and fault diagnosis.
  4. Knowledge-based process support: Through the use of Retrieval-Augmented Generation (RAG), relevant process knowledge can be efficiently managed and provided in a context-specific manner.
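The retrieval step behind approach 4 can be illustrated with a minimal sketch: candidate process documents are scored against a query and the best matches are prepended to the prompt. The bag-of-words cosine similarity and the example documents here are toy stand-ins; a production RAG system would use learned vector embeddings and a vector database.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real systems use learned dense vectors.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    # Context-specific provision: the retrieved knowledge is placed in front
    # of the question so the generative model can ground its answer in it.
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Pallets in zone A are picked with the vacuum gripper.",
    "Conveyor maintenance is scheduled every Friday.",
    "Fragile goods must be picked with the two-finger gripper.",
]
prompt = build_prompt("Which gripper for fragile goods?", docs)
```

The same pattern scales to real process documentation: only the embedding function and the document store change, while the retrieve-then-prompt structure stays the same.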

Are you considering using AI-based robots? Contact us

A robot arm holding a screwdriver, with a woman wearing VR glasses in the background.
© Fraunhofer IML

Comprehensive project support 

  • From the concept idea and needs analysis to the productive deployment of your AI robotics solution
  • Continuous coordination with your departments and iterative optimization

Generative AI models

  • We integrate foundation models for autonomous robot control and take care of the fine-tuning.

Data infrastructure in the PACE Lab

  • High-precision capture of motion and image data using motion capture
  • Generation of photorealistic, synthetic datasets for robust model training

Integration of large language models, vision language models, and RAG

  • Integration of LLMs for natural language interaction 
  • Integration of VLMs for object recognition and scene interpretation in logistics processes
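Integrating a VLM for picking typically means prompting it to return a structured scene description and then selecting a target from it. The response format, labels, and thresholds below are hypothetical; they only illustrate the parse-filter-select pattern.

```python
import json

# Hypothetical VLM response: in practice, the model is prompted to answer
# with a JSON list of detections for the current camera image.
vlm_response = json.dumps([
    {"label": "cardboard_box", "confidence": 0.92, "graspable": True},
    {"label": "pallet", "confidence": 0.88, "graspable": False},
    {"label": "small_load_carrier", "confidence": 0.47, "graspable": True},
])

def select_pick_target(response: str, min_confidence: float = 0.6):
    """Return the most confident graspable detection, or None."""
    detections = json.loads(response)
    candidates = [d for d in detections
                  if d["graspable"] and d["confidence"] >= min_confidence]
    return max(candidates, key=lambda d: d["confidence"], default=None)

target = select_pick_target(vlm_response)
```

Filtering on confidence before acting is the key design point: a low-confidence detection should trigger a re-scan or a human query rather than a grasp attempt.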

Development and testing of adaptive cobotics and picking solutions 

  • Realistic test scenarios in the PACE Lab for validating mobile robots, cobots, and picking workflows 
  • Feedback loops for rapid adaptation of hardware and software components

System integration and validation with digital twins 

  • Seamless integration of AI modules with your control software and robot hardware
  • Automated simulation tests to check safety, robustness, and performance

Profile photo of Oliver Urbann

»Generative AI expands robots' understanding of their environment and enables natural interaction—almost like with a human colleague. This turns robots into universal tools, for example in order picking or in semi-public spaces such as hospitals.«

Dr. Oliver Urbann, Head of AI & Robotics Research Group

FAQ: Generative AI in robotics

  • Generative AI is used to automate control and environmental perception in robotics; in particular, it enables robot behavior to generalize to unstructured problems.

  • Vision Language Models (VLMs) recognize objects and people, while Large Language Models (LLMs) adapt movement sequences in real time. This allows robots to navigate safely through changing scenarios.

  • Generative AI creates programs and plans on the fly, reducing manual parameterization and minimizing failure risks for unknown tasks.

  • Many existing models have been trained on general Internet data and are therefore often not directly transferable to specific industrial domains—domain-specific knowledge is lacking.

    Fraunhofer IML supports companies in selecting suitable models with its domain knowledge and has extensive experience in integrating different AI models into existing systems. The PACE Lab also provides a test environment in which functions can be tested and domain-specific data can be collected for fine-tuning generative models.

  • Task planning refers to the automatic generation of a sequence of work steps or motions for a robot. The system plans in real time how to optimally implement a given task, taking into account resources, spatial conditions, and objectives.

  • Code generation describes the automated process in which AI independently creates program code based on a defined task or description. In robotics, this means that control and sequence programs for robots are derived directly from generative models without the need for manual programming. This reduces the integration effort and dependence on system integrators.
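Because generated programs are executed on physical hardware, code generation is usually paired with a validation step. A minimal sketch, assuming a Python-scriptable robot with illustrative function names: the generated snippet is parsed and rejected unless it only calls whitelisted robot functions and contains no imports.

```python
import ast

# Guard for LLM-generated robot code: parse the snippet and allow only calls
# to whitelisted robot skills. The skill names are illustrative assumptions.
ALLOWED_CALLS = {"move_to", "open_gripper", "close_gripper"}

def validate_generated_code(source: str) -> bool:
    """Return True only if every call targets an allowed robot function."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            # Reject attribute calls (e.g. os.system) and unknown functions.
            if not (isinstance(node.func, ast.Name)
                    and node.func.id in ALLOWED_CALLS):
                return False
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            return False  # generated code may not import modules
    return True

generated = "move_to('shelf_2')\nclose_gripper()"
ok = validate_generated_code(generated)
```

A whitelist like this is deliberately conservative: it may reject harmless code, but it ensures that nothing outside the robot's sanctioned skill set is ever executed.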