Training data from environments built with scanned real-world products. Real geometry. Real physics. Real materials. Complete ground truth with every render.
25,000+ real-world products. 5M+ generated visual states. 150+ structured metadata attributes.
RGB, depth maps, surface normals, semantic segmentation, instance segmentation, and material properties, all generated automatically from physics-accurate environments. Every label is computed from scene geometry at render time, not annotated after the fact.
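To make the pixel-aligned ground truth concrete, here is a minimal Python sketch of loading one sample. The directory layout and file names (rgb.png, depth.npy, and so on) are illustrative assumptions, not the actual delivery format.

```python
from pathlib import Path

import numpy as np
from PIL import Image

def load_sample(sample_dir: str) -> dict:
    """Load one render plus its pixel-aligned labels as numpy arrays."""
    d = Path(sample_dir)
    return {
        "rgb": np.asarray(Image.open(d / "rgb.png")),           # (H, W, 3) uint8
        "depth": np.load(d / "depth.npy"),                      # (H, W) float32, meters
        "normals": np.load(d / "normals.npy"),                  # (H, W, 3) unit vectors
        "semantic": np.asarray(Image.open(d / "semantic.png")), # (H, W) class ids
        "instance": np.asarray(Image.open(d / "instance.png")), # (H, W) instance ids
    }

# Example (assuming the sample directory exists):
# sample = load_sample("renders/chair_0001/view_00")
# Labels are computed from scene geometry, so depth and masks align with the
# RGB by construction; there is no post-hoc annotation noise to reconcile.
```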
Lifestyle renders in realistic 3D environments. Studio/silo shots with controlled lighting. Dimension views with orthographic projection. 360-degree turntables with 36 to 72 frames. PBR material captures at macro scale. All from the same underlying geometry and materials.
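One way to picture this: each output type is a recipe applied to the same canonical asset. The field names below are illustrative assumptions, not a real API; only the output types and the 36-to-72 turntable range come from the description above.

```python
ASSET = "sofa_SKU123"  # hypothetical asset id

recipes = [
    {"type": "lifestyle", "environment": "living_room_04", "lens_mm": 35},
    {"type": "studio_silo", "background": "white", "lighting": "three_point"},
    {"type": "dimension", "projection": "orthographic", "units": "cm"},
    {"type": "turntable", "frames": 36, "elevation_deg": 15},  # 36-72 frames
    {"type": "material_macro", "resolution": "4k"},
]

for recipe in recipes:
    # One asset, many recipes: outputs stay consistent across view types
    # because they share the same geometry and materials by construction.
    print(f"render {ASSET} as {recipe['type']}")
```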
Every environment is fully 3D, spatially accurate, and built with real-world scale. Lighting metadata (type, intensity, color temperature) and spatial properties (scene bounds, floor plane, world up axis, units) included.
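For illustration, a hypothetical scene-metadata record carrying these lighting and spatial fields; the key names are assumptions, not the actual schema.

```python
import json

scene_meta = json.loads("""
{
  "lighting": [
    {"type": "area", "intensity_lumens": 1200, "color_temperature_k": 5600},
    {"type": "hdri", "intensity_ev": 0.0, "color_temperature_k": 6500}
  ],
  "spatial": {
    "scene_bounds_m": [[-2.5, 0.0, -3.0], [2.5, 2.8, 3.0]],
    "floor_plane": {"normal": [0, 1, 0], "offset": 0.0},
    "world_up": "+Y",
    "units": "meters"
  }
}
""")

# Real-world scale means bounds are in physical units, usable directly for
# camera placement or metric depth supervision.
mins, maxs = scene_meta["spatial"]["scene_bounds_m"]
print("room height (m):", maxs[1] - mins[1])
```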
3D assets with watertight meshes and collision metadata. PBR materials at 4K resolution. Multi-angle render outputs including lifestyle, turntable, and transparent backgrounds. Configuration graphs with parent-child relationships and SKU-to-variant mappings. Rich metadata with 100 to 150 structured attributes per scene. Visual preference signals with pairwise rankings and quality ratings.
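A minimal sketch of the configuration-graph idea: parent-child component links plus a SKU-to-variant mapping with per-SKU overrides. The structure and field names are illustrative assumptions about the delivered format.

```python
config_graph = {
    "sofa_base": {"children": ["cushion_set", "leg_set"], "material": "oak"},
    "cushion_set": {"children": [], "material": "linen_gray"},
    "leg_set": {"children": [], "material": "brushed_steel"},
}
sku_to_variant = {
    "SKU-10423": {"root": "sofa_base", "overrides": {"cushion_set": "velvet_navy"}},
}

def resolve_materials(sku: str) -> dict:
    """Walk the component tree, applying per-SKU material overrides."""
    variant = sku_to_variant[sku]
    overrides = variant["overrides"]
    resolved, stack = {}, [variant["root"]]
    while stack:
        node = stack.pop()
        resolved[node] = overrides.get(node, config_graph[node]["material"])
        stack.extend(config_graph[node]["children"])
    return resolved

print(resolve_materials("SKU-10423"))
# {'sofa_base': 'oak', 'leg_set': 'brushed_steel', 'cushion_set': 'velvet_navy'}
```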
A continuous 7-stage pipeline: source inputs, 3D modeling and QC, canonical object model, render recipes, synthetic generation, quality control, and versioned releases. Not a one-off data dump.
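To make the stage ordering concrete, a toy sketch modeling the seven stages as an ordered enum; this is an illustration of the continuous, versioned flow, not actual pipeline code, and the release-tag format is invented.

```python
from enum import Enum

class Stage(Enum):
    SOURCE_INPUTS = 1
    MODELING_AND_QC = 2
    CANONICAL_OBJECT_MODEL = 3
    RENDER_RECIPES = 4
    SYNTHETIC_GENERATION = 5
    QUALITY_CONTROL = 6
    VERSIONED_RELEASE = 7

def run_release(batch_id: str) -> str:
    """Each batch flows through every stage; the output is a versioned release."""
    for stage in Stage:
        print(f"[{batch_id}] stage {stage.value}: {stage.name.lower()}")
    return f"{batch_id}@v1"  # hypothetical release tag

print(run_release("batch_0042"))
```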
Well-typed object graph with core entities: Product, SKU/Variant, Component, Material, and RenderRecipe. Stable IDs, timestamps, and clear relationships designed for ML pipelines.
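A hedged sketch of that object graph as Python dataclasses. The entity names, stable IDs, and timestamps come from the description above; the individual fields are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Material:
    id: str                   # stable across releases
    name: str

@dataclass
class Component:
    id: str
    material_id: str          # reference into Material

@dataclass
class RenderRecipe:
    id: str
    kind: str                 # e.g. "lifestyle", "turntable"

@dataclass
class Variant:                # a purchasable SKU
    id: str
    sku: str
    component_ids: list[str] = field(default_factory=list)

@dataclass
class Product:
    id: str                   # stable across releases
    name: str
    variant_ids: list[str] = field(default_factory=list)
    recipe_ids: list[str] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)
```

Explicit ID references rather than nested objects keep the graph flat and joinable, which is what ML pipelines typically want when materializing training tables.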
Structured metadata covering geometry (~25 attributes), materials (~30), camera (~15), lighting (~35), environment (~20), and references (~15). Available as JSON or Parquet.
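A minimal sketch of querying that metadata with pandas. The column names and the group-prefix convention (e.g. "lighting.color_temperature_k") are assumptions; only the JSON/Parquet delivery and the attribute groups come from the description above.

```python
import pandas as pd

# In practice: meta = pd.read_parquet("scene_metadata.parquet")
meta = pd.DataFrame({
    "scene_id": ["s1", "s2", "s3"],
    "lighting.color_temperature_k": [3200, 5600, 6500],
    "geometry.bbox_height_m": [0.9, 2.1, 0.45],
})

# Select one attribute group by prefix, e.g. the ~35 lighting attributes.
lighting_cols = [c for c in meta.columns if c.startswith("lighting.")]

# Filter scenes to a lighting condition for a targeted training split.
warm = meta[meta["lighting.color_temperature_k"] < 4000]
print(warm["scene_id"].tolist())  # ['s1']
```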
Imagine.io provides real commercial products, clean mesh topology, full scene graphs, rich structured metadata, regeneratable renders, and human refinement traces. Open 3D datasets, scraped images, and academic synthetic data lack most of these capabilities.
Perception model training (object detection, instance segmentation, depth estimation). World model pre-training (multi-view, temporal consistency). VLA and foundation model training (scene graphs, action annotations). Vision foundation models (image generation, multimodal, preference signals).
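As one concrete example of the perception-training path, detection targets fall directly out of the instance masks, since bounding boxes follow exactly from per-pixel instance ids. The mask format assumed here (a 2D integer array of instance ids, 0 for background) is illustrative.

```python
import numpy as np

def masks_to_boxes(instance_map: np.ndarray) -> dict[int, tuple[int, int, int, int]]:
    """Return {instance_id: (x_min, y_min, x_max, y_max)} in pixel coordinates."""
    boxes = {}
    for inst_id in np.unique(instance_map):
        if inst_id == 0:
            continue  # skip background
        ys, xs = np.nonzero(instance_map == inst_id)
        boxes[int(inst_id)] = (int(xs.min()), int(ys.min()),
                               int(xs.max()), int(ys.max()))
    return boxes

demo = np.zeros((64, 64), dtype=np.int32)
demo[10:30, 20:40] = 1  # one synthetic instance
print(masks_to_boxes(demo))  # {1: (20, 10, 39, 29)}
```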