I spent time reflecting on Jensen Huang’s recent keynote, and one idea in particular stood out to me: structured data is the foundation of trustworthy AI.
It’s a simple idea—but it cuts to the core of what’s changing right now in AI.
For years, we’ve had no shortage of data. In fact, most enterprises are overwhelmed by it. But raw data—especially unstructured data like video—doesn’t automatically translate into intelligence. It doesn’t drive decisions. And it certainly doesn’t build trust.
What Jensen is pointing to is a shift: AI is moving from pattern recognition to reasoning. And reasoning systems—whether we call them Physical AI, VLMs, or agentic AI—depend on structured, contextualized data they can reliably interpret.
In NVIDIA terms, this is about building a full-stack pipeline—from sensors to accelerated compute to AI models—that turns real-world signals into actionable understanding.
In business terms, it’s much more straightforward:
If your data isn’t structured, your AI won’t be trusted. And if it isn’t trusted, it won’t be used.
This is where I see a strong alignment with what we’ve been building at Vaidio.
Most organizations already have massive sensor networks deployed—millions of cameras capturing operations, safety events, customer behavior, and workflows every second. But that data is largely unusable in its raw form.
What we do is convert that video into structured, searchable, and actionable data.
We identify objects, behaviors, movements, dwell times, and interactions—and turn those into metadata that systems can query, analyze, and act on in real time.
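To make the idea concrete, here is a minimal sketch of what "video as structured, queryable data" can look like. The schema and field names below are illustrative assumptions for this post, not Vaidio's actual metadata format.

```python
from dataclasses import dataclass

# Hypothetical event schema -- field names are assumptions for
# illustration, not a real product's metadata format.
@dataclass
class VideoEvent:
    camera_id: str
    timestamp: float        # seconds since epoch
    object_type: str        # e.g. "person", "vehicle"
    behavior: str           # e.g. "loitering", "entering"
    dwell_seconds: float

events = [
    VideoEvent("cam-01", 1700000000.0, "person", "loitering", 95.0),
    VideoEvent("cam-01", 1700000030.0, "vehicle", "parked", 20.0),
    VideoEvent("cam-02", 1700000060.0, "person", "entering", 3.0),
]

# Once video is metadata, questions become ordinary queries:
# "which cameras saw a person dwelling longer than a minute?"
long_dwell = sorted(
    {e.camera_id for e in events
     if e.object_type == "person" and e.dwell_seconds > 60}
)
print(long_dwell)  # ['cam-01']
```

The point of the sketch is the shift in interface: once events are records rather than pixels, any downstream system can query, analyze, and act on them in real time.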
But more importantly, this isn’t just about analytics.
It’s about creating a visual data layer—a foundation of structured, high-quality data that AI systems can reason over. The same way transactional systems rely on structured databases, Physical AI systems will rely on structured representations of the real world.
But structure alone isn’t enough.
Structured data tells you what happened. What’s emerging now—and what NVIDIA is actively building—is the ability to understand what that data means in context.
At GTC, NVIDIA introduced new building blocks like the Blueprint for Video Search and Summarization (VSS)—a framework for creating interactive AI agents that can search, summarize, and reason over massive amounts of video. Under the hood, this combines Vision Language Models (VLMs), large language models for search and orchestration, and reasoning models like Cosmos Reason VLM.
That stack matters.
Because VLMs don’t replace structured data; they depend on it. They sit on top of structured visual data and bring reasoning to it: connecting events across time, interpreting intent, and enabling systems to answer higher-order questions like "What changed? Is this expected? What should happen next?"
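The "connecting events across time" step can be illustrated with a toy example. A VLM or LLM layer would answer a question like "What changed?" in natural language; here the same idea is reduced to set arithmetic over structured records. The `(object_type, behavior)` tuples are a simplified, hypothetical schema, not any real system's API.

```python
# Toy illustration of reasoning over structured events: compare two
# time windows and report what changed. A VLM/LLM layer would do this
# flexibly in natural language; structured data is what makes the
# comparison well-defined in the first place.

def what_changed(before, after):
    """Return events that appeared and disappeared between two windows."""
    appeared = sorted(set(after) - set(before))
    disappeared = sorted(set(before) - set(after))
    return appeared, disappeared

before = [("person", "walking"), ("vehicle", "parked")]
after = [("person", "walking"), ("person", "loitering")]

appeared, disappeared = what_changed(before, after)
print(appeared)      # [('person', 'loitering')]
print(disappeared)   # [('vehicle', 'parked')]
```

This is the division of labor in miniature: the structured layer makes "before" and "after" comparable at all; the reasoning layer decides whether the change is expected and what should happen next.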
There’s an important distinction here.
Platforms like Vaidio create the structured data foundation by transforming video into reliable, queryable information. NVIDIA’s VLM-driven frameworks then operate on top of that foundation—adding context, reasoning, and enabling a new class of interactive, agent-driven applications.
In simple terms: perception platforms like Vaidio turn video into structured data, reasoning layers interpret that data in context, and agents act on the result. That progression is what turns raw observation into real-world outcomes, and it’s exactly the direction NVIDIA is accelerating toward.
Structured data is what makes AI trustworthy—but it’s also what makes AI actionable.
Once video is transformed into structured intelligence, it doesn’t just sit in a dashboard. It can trigger workflows, automate decisions, and integrate directly into operational systems.
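A minimal sketch of that event-to-action path: structured events matched against rules that trigger workflows. The rule conditions and action names below are hypothetical, chosen only to illustrate the pattern.

```python
# Minimal sketch of event-driven automation: a structured event is
# matched against rules, and each matching rule names a workflow to
# trigger. Rule conditions and action names are hypothetical.

def dispatch(event, rules):
    """Return the actions whose conditions match this event."""
    return [action for condition, action in rules if condition(event)]

rules = [
    (lambda e: e["behavior"] == "loitering" and e["dwell_seconds"] > 60,
     "notify_security"),
    (lambda e: e["object_type"] == "vehicle" and e["zone"] == "fire_lane",
     "create_compliance_ticket"),
]

event = {"object_type": "person", "behavior": "loitering",
         "dwell_seconds": 95, "zone": "lobby"}
print(dispatch(event, rules))  # ['notify_security']
```

In a real deployment the actions would be calls into security, compliance, or operational systems rather than strings, but the shape is the same: structured input in, automated decision out.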
This is where the conversation shifts from analytics to agentic automation.
Instead of a human watching video and deciding what to do, systems can detect the event, trigger the right workflow, and feed the outcome directly into operational systems.
At Vaidio, we see this every day—video becoming part of the enterprise data fabric, feeding systems that drive security, compliance, and operational outcomes in real time.
That’s the bridge between perception and action.
NVIDIA’s vision for Physical AI—where AI understands and interacts with the physical world—depends on closing that loop.
You need sensors that capture the physical world, a structured data layer that makes those signals interpretable, and reasoning systems that can act on them.
Without that data layer, Physical AI remains theoretical.
With it—and with VLM-based systems like those NVIDIA is introducing—you can begin to orchestrate real-world systems across facilities, cities, and industries.
Systems that don’t just analyze what’s happening, but actively participate in how operations run.
That’s where agentic systems become real.
What Jensen articulated in a single sentence is something we see playing out across our customers today.
The organizations that succeed with AI aren’t the ones with the most data. They’re the ones that can structure it, operationalize it, and trust it.
At Vaidio, we believe the future of AI won’t be built on models alone. It will be built on data layers that translate the physical world into something AI can understand, and now, increasingly, reason about and act on.
That’s the real foundation of Physical AI—and where the next wave of value will be created.