The Physical AI Data Bottleneck

A Four-Part Challenge: the limits of simulation, missing tactile feedback, the Sim-to-Real gap, and the constraints of teleoperation

Julia Kim

Co-founder & CEO

The race for physical artificial intelligence is already underway, and robotics companies are scaling faster than ever.

Yet nearly every team runs into the same four-part problem: a difficult puzzle with no good options.

🤨 You need real-world data collected as robots move and interact in physical environments, but gathering it is costly, time-consuming, and excruciatingly slow to scale.

🤨 You fall back on teleoperation, but the data is noisy, stripped of intent, and limited by human skill and sensory feedback.

🤨 You turn to simulation, but it isn't real, and the resulting gap between simulation and reality widens as soon as physics gets complex.

🤨 And throughout all of this, the most important channel of all, tactile feedback, is ignored, leaving robots clumsy and devoid of feeling.

These four problems are not separate. They are one systemic failure: a bottleneck that demands a new approach to gathering, scaling, and integrating data from the physical world.

Limits of Teleoperation Data

There are significant problems with teleoperated demonstrations, in which a human controls a robot to gather training data.

First, such data frequently fails to capture the policy intent behind actions. The robot records only what was done, not the underlying purpose or reasoning, so the expert's true intent is hard to recover. Fatigue and trial-and-error also lead human teleoperators to perform exploratory or corrective actions that are not part of an optimal strategy. The collected trajectories are therefore often noisy or suboptimal, obscuring the "why" behind each action.

Second, teleoperation datasets often lack sensory data. In many teleop setups the human operator receives little feedback from the robot: most systems in use today have no haptic or tactile channel, so the sense of force or touch is simply absent from the collected data. Even in surgical telerobots, this lack of force and tactile data is a well-known problem that leaves the robot's logs without important contact cues. Policies learned from pure teleop data may therefore struggle with contact-rich tasks, because the demonstrations never conveyed the feel or difficulty of grasping an object.

Lastly, teleoperation is labor-intensive and depends on specialized hardware. Each teleoperated dataset requires skilled humans and often custom rigs (such as VR consoles or motion controllers mapped to the robot), which makes scaling up the quantity and diversity of data very expensive.
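Before summarizing, a minimal sketch helps make the first two limitations concrete. The schema below is hypothetical, not drawn from any real robotics stack, but it shows the structural problem: actions and pixels get recorded, while intent and touch have nowhere to live.

```python
from dataclasses import dataclass

# Hypothetical teleop log record; field names are illustrative only.
@dataclass
class TeleopFrame:
    timestamp: float
    camera_rgb: bytes             # what the robot saw
    joint_positions: list[float]  # proprioception
    operator_action: list[float]  # what the operator commanded
    # Missing: an `intent` field. Was this motion goal-directed,
    # exploratory, or a fatigue-induced correction? The log can't say.
    # Missing: tactile / force-torque channels. The feel of contact
    # was never sensed, so it can never be learned from.
```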

In conclusion, the limitations of teleoperation can be summarized as follows:

  • No policy intent. Robots can imitate motion, but not motivation. Seeing how an action was carried out, rather than why, makes the expert's reasoning hard to infer.

  • Suboptimal and noisy. Fatigued human operators make corrections and exploratory movements that obscure the optimal strategy.

  • Missing senses. Most teleop setups lack tactile or force feedback. Robots cannot learn friction, weight, or texture if they never experience contact.

  • Not scalable. Because teleoperation relies on trained operators and specialized hardware rigs, it is too costly and slow to diversify on a large scale.

The Cost of Real Data

Building general-purpose, dexterous robots takes extensive and varied real-world experience, but scaling that experience is practically impossible.

Unlike text or images, there is no internet of robot experiences to scrape.

Every hour of real robot data requires physical interaction, which is costly, time-consuming, and sometimes dangerous.

Researchers at Toyota Research Institute (TRI) note that every human environment is different. To generalize, a robot would have to encounter thousands of homes, objects, and edge cases, which is not feasible for any single lab or fleet.

The "enough" scale is startling. Despite requiring more than 700 tasks and 130,000 teleoperated episodes, Google's RT-1 project only covered a small portion of real-world situations.

Even at this level, physical data collection wears robots down, demands ongoing maintenance, and requires specialized setups that prevent mass deployment.

Salesforce Ventures noted that real-world robot data is still essentially unscalable, a costly procedure that lacks an analog to the abundance of crowdsourced data on the internet.

In conclusion, real data comes from real robots, and it carries the following limitations:

  • Slow. Collecting data through physical interaction is costly, time-consuming, and resource-heavy.

  • The “Home” Problem. True generalization requires exposure to countless unique environments — something no single lab or fleet can cover.

  • The “Enough” Problem. Even large-scale efforts like RT-1 barely scratch the surface of what’s needed for real-world diversity.

  • Hardware Bottleneck. Robots wear down quickly under continuous workloads, driving up maintenance and operational costs.

Missing Tactile Data

If digital AI was built on vision, physical AI is still missing its sense of touch.

Most roboticists agree that without tactile feedback, robots will never match human dexterity.

Today's humanoids have been compared to "adult-sized toddlers": they can walk, but they cannot feel. Humans are adept manipulators because we perceive texture, slip, resistance, and pressure, down to the smallest imperfections.

Modern robots prioritize precise motors but lack tactile and proprioceptive feedback, the very channels that give human manipulation its resilience.

Researchers at TRI and Google DeepMind note that while video and web-scale data can accelerate pretraining, they cannot replace grounded physical experience. As Russ Tedrake of TRI puts it, "Video alone isn't enough for embodied learning without real interaction."

Limits of Simulation: The Sim-to-Real Gap

Simulation and synthetic data have become essential for robotics research — but they’re no substitute for the real world.

Simulators can generate millions of virtual trials safely and cheaply. They’re invaluable for testing and covering rare or dangerous scenarios.

But even the best simulators remain approximations of reality. Subtle errors in modeling friction, deformable materials, or contact forces lead to failures once robots meet real physics.

The result is the well-known sim-to-real gap — controllers that work flawlessly in sim but break on deployment.

This is especially critical for contact-rich tasks, where the smallest deviation in force or shape can mean failure.
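As a toy illustration, with invented numbers, consider a grasp policy that picks its grip force from the friction coefficient the simulator assumed. A modest friction mismatch is enough to make the grasp fail on real hardware.

```python
# Toy model of the sim-to-real gap (all numbers are invented).
MU_SIM = 0.60    # friction coefficient assumed by the simulator
MU_REAL = 0.48   # friction actually encountered on the real object
WEIGHT_N = 4.0   # object weight in newtons

def required_grip_force(mu: float, weight: float) -> float:
    """Minimum normal force so friction (mu * force) supports the weight."""
    return weight / mu

# The policy applies just enough force for the simulated friction...
grip = required_grip_force(MU_SIM, WEIGHT_N)  # ~6.67 N

# ...but real friction is 20% lower, so the object slips.
holds = MU_REAL * grip >= WEIGHT_N
print(f"grip={grip:.2f} N  holds={holds}")    # holds=False
```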

Simulation is powerful — but only as a complement. As NVIDIA and Google emphasize, the future lies in hybridization: using simulation to scale breadth, and real, sensor-rich data to ground accuracy.

To sum up:

  • The Sim-to-Real Gap. Even slight mismatches in friction or material deformation cause policies to “break” once deployed to physical robots.

  • A Supporting Role. Simulation is powerful for testing and augmentation — but it must complement, not replace, real interaction data.

The Summary

Viewed together, these challenges reveal a single truth: the robotics industry isn’t facing four independent problems. It’s confronting one systemic infrastructure gap.

Each current path represents a failed trade-off. Simulation sacrifices realism. Teleoperation sacrifices scalability. Real-world data sacrifices speed and diversity.

To unlock progress, we don’t need another model—we need infrastructure. We need a foundation that lets robots feel, fail, and learn at scale.

The solution, then, is clear. We must shift our focus from models to the systems that feed them. The next frontier is Building the Foundation for Physical Intelligence.

Building the Foundation for Physical Intelligence

What's required is an integrated system designed to solve the four-part data bottleneck simultaneously, not in isolation. We must treat data collection not as a one-off task, but as a continuous, industrial-scale process.

This Data Engine is the true foundation for physical intelligence. Here is how it works.

  0. SET POLICY — Define the Goal

Every system needs direction. Before a single piece of data is generated, scaled, or evaluated, we must first define what success looks like.

Setting clear policy goals—what the robot should achieve, under what conditions, and why—ensures that every piece of collected data has purpose and relevance. This "Step 0" establishes the intent foundation that guides the entire engine.
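As a minimal sketch, a "Step 0" goal spec might look like the snippet below. Every field name is hypothetical; the point is that success criteria, operating conditions, and rationale are written down before any data is collected.

```python
# Hypothetical "Step 0" goal spec; field names are illustrative only.
TASK_POLICY = {
    "task": "open_drawer",
    "success": {                      # what the robot should achieve
        "drawer_open_fraction": 0.9,  # at least 90% open
        "max_applied_force_n": 20.0,  # without exceeding safe force
    },
    "conditions": {                   # under what conditions
        "lighting": ["bright", "dim"],
        "drawer_load": ["empty", "loaded"],
    },
    "why": "contact-rich pull where tactile feedback is decisive",
}
```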

  1. GENERATE — Start with Reality

The foundation must begin with high-fidelity, sensor-rich physical data. This is the non-negotiable ground truth.

This is where robots experience true physics—the friction of a drawer, the slip of a grasped object, the subtle force required to plug in a cable. In this step, we capture the critical tactile and force data that simulation misses and teleoperation often ignores.
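A sensor-rich frame might be sketched as follows, with tactile and force channels as first-class fields rather than afterthoughts. The schema is hypothetical, not a reference format, and can be contrasted with the teleop log sketched earlier.

```python
from dataclasses import dataclass

# Hypothetical schema for one frame of ground-truth physical data.
@dataclass
class SensorFrame:
    timestamp: float
    camera_rgb: bytes                # vision
    joint_positions: list[float]     # proprioception
    fingertip_pressure: list[float]  # tactile: per-taxel pressure
    wrist_force_torque: list[float]  # 6-axis force/torque at the wrist
    slip_detected: bool              # derived contact cue
```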

  2. SCALE — Amplify with Human Intent

High-quality physical data is precious, but it is painfully slow to obtain. To solve this, the engine must connect this real-world data to a global Human-in-the-Loop network.

This isn’t the blind teleoperation of the past, which creates noisy, low-value data. This is a system for targeted, high-intent interaction. Operators don’t just drive the robot—they teach it. They correct its failures, confirm its successes, and embed a layer of reasoning and intent that raw logs alone cannot capture.
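In pseudocode, the difference from blind teleoperation is that the operator intervenes on the policy's own rollouts and records why, in the spirit of DAgger-style data aggregation. Everything here (`policy`, `robot`, `operator`) is a hypothetical stand-in, a sketch rather than an implementation.

```python
# Sketch of targeted human-in-the-loop collection (DAgger-flavored).
# `policy`, `robot`, and `operator` are hypothetical stand-ins.
def collect_with_intent(policy, robot, operator, dataset, steps=1000):
    obs = robot.reset()
    for _ in range(steps):
        action = policy(obs)
        if operator.flags_failure(obs, action):
            # The operator overrides with a correction and records the
            # reason, turning a raw log into a high-intent example.
            action = operator.correct(obs)
            dataset.append((obs, action, operator.intent_label()))
        else:
            # Confirmed successes are labeled too: cheap positive signal.
            dataset.append((obs, action, "confirmed"))
        obs = robot.step(action)
```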

  3. EVALUATE — Close the Loop

With a foundation of real, intent-driven data, simulation's role is transformed. It becomes a powerful verification layer, not a flawed replacement for reality.

Millions of real-world task variations can be replayed in simulation to uncover edge cases and validate robustness. Models trained on real data can be stress-tested in synthetic environments. This powerful feedback loop transforms the "sim-to-real" gap into a "real-to-sim-to-real" flywheel—dramatically accelerating iteration, safety, and reliability.
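A sketch of that verification layer: replay a task under randomized physics and harvest the failures as edge cases to target with fresh real-world data. Here `rollout_fn` is a hypothetical callable standing in for a simulator replay of a real trajectory.

```python
import random

def stress_test(rollout_fn, trials=1000):
    """Replay a real-world task in sim under randomized physics.

    `rollout_fn(physics) -> bool` is a hypothetical callable that
    replays the trajectory under `physics` and reports success.
    """
    failures = []
    for _ in range(trials):
        physics = {
            "friction": random.uniform(0.3, 0.9),
            "object_mass_kg": random.uniform(0.2, 2.0),
        }
        if not rollout_fn(physics):
            # Each failure is an edge case to feed back into GENERATE.
            failures.append(physics)
    return failures
```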

This four-step process—Set, Generate, Scale, and Evaluate—is not a linear path but a continuous cycle. It is the engine that will build the foundation for the next era of robotics.