ICML 2026

Trajectory Consistency for One-Step Generation on Euler Mean Flows

Direct dataset supervision for long flow maps, without Jacobian-vector products.

Euler Mean Flow one-step generation results across images and 3D tasks
Euler Mean Flows support one-step and few-step generation across latent-space images, pixel-space images, SDF-based 3D geometry, and point clouds. The 2D SDF validation is shown last.

Highlights

01

Long flow maps receive direct data supervision without JVPs.

02

Euler Mean Flow replaces the hard-to-supervise trajectory-consistency constraint with a local linear surrogate.

03

First \(x_1\)-prediction formulation enabling one-step pixel-space generation in this framework.

Abstract

We propose Euler Mean Flows (EMF), a flow-based generative framework for one-step and few-step generation that enforces long-range trajectory consistency with minimal sampling cost. EMF replaces the hard-to-supervise trajectory consistency constraint with a principled linear surrogate that enables direct data supervision for long-horizon flow-map compositions.

The resulting JVP-free training framework supports both \(u\)-prediction and \(x_1\)-prediction variants, avoiding explicit Jacobian computations while reducing memory and compute. We validate EMF on image synthesis, particle-based geometry generation, and functional generation tasks, with improved stability and sample quality under fixed sampling budgets.

Core Motivation

Can long flow maps receive direct data supervision without using JVPs?

The goal is not merely to improve MeanFlow. The central question is whether long flow maps can be trained from dataset supervision while avoiding the Jacobian-vector products required by continuous formulations.

Trajectory consistency is indirect

Trajectory consistency is a fundamental property of flow maps: composing shorter maps should agree with the corresponding long map.

\[ \mathcal L^{TC} = \mathbb E\left[ \left\| \psi^\theta_{t\to r}(x_t) - \psi^\theta_{s\to r}\!\left( \psi^\theta_{t\to s}(x_t)\right) \right\|^2 \right]. \]

However, this loss only enforces consistency between model outputs. It does not directly ensure that long maps match the target distribution.
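The semigroup property itself is easy to verify numerically. Below is a minimal numpy sketch (the function `psi` and all values are illustrative assumptions, not the paper's code): for the linear interpolation path conditioned on an endpoint \(x_1\), the conditional flow map has the closed form \(\psi_{t\to r}(x)=x+(r-t)\frac{x_1-x}{1-t}\), and composing two short maps exactly reproduces the long map.

```python
import numpy as np

def psi(x, t, r, x1):
    """Exact conditional flow map of the linear path x_t = (1-t)x0 + t*x1:
    psi_{t->r}(x) = x + (r - t) * (x1 - x) / (1 - t)."""
    return x + (r - t) * (x1 - x) / (1.0 - t)

rng = np.random.default_rng(0)
x1 = rng.normal(size=4)   # dataset endpoint sample
x = rng.normal(size=4)    # current state at time t
t, s, r = 0.1, 0.4, 0.8

long_map = psi(x, t, r, x1)                  # one long map t -> r
composed = psi(psi(x, t, s, x1), s, r, x1)   # two short maps t -> s -> r
assert np.allclose(long_map, composed)       # trajectory consistency holds exactly
```

For a learned \(\psi^\theta\) the loss above only penalizes disagreement between the two sides; nothing in it references the data, which is exactly the gap EMF targets.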

EMF answer

EMF combines the semigroup view with an Euler-style local linear approximation. This injects the directly computable conditional velocity \(u_t(x_t\mid x_1)=\frac{x_1-x_t}{1-t}\) into long-map learning, without JVPs and with a simple fixed \(\Delta t\).

Prior paths leave a gap

Progressive extension

ShortCut, PSD, and SplitMeanFlow extend short maps to long maps through consistency. Their long maps are learned from model outputs, not direct dataset supervision.

\[ \left\| u^\theta_{t\to t}(x) - u_t(x\mid x_1) \right\|^2,\qquad u_t(x\mid x_1)=\frac{x_1-x}{1-t},\;x_1\in\mathcal D \] \[ \left\| u^\theta_{t\to t+2d}(x_t) - \frac{1}{2} \left[ u^\theta_{t\to t+d}(x_t) + u^\theta_{t+d\to t+2d}(x_{t+d}) \right] \right\|^2 \]

Thus \(u^\theta_{t\to t+2d}(x)\) is extended from previous model predictions, so long maps get no direct supervision from \(\mathcal D\).
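This can be seen in a short numpy sketch of the two-segment consistency target (the toy model `u_theta` is a hypothetical stand-in for the learned network): the long-map target is assembled entirely from the model's own shorter predictions, and no dataset sample \(x_1\) appears.

```python
import numpy as np

def u_theta(x, t, r):
    """Hypothetical stand-in for the learned average-velocity network."""
    return 0.5 * x + (r - t)  # arbitrary toy function, for illustration only

x_t = np.array([1.0, -2.0])
t, d = 0.2, 0.1

# The midpoint state is produced by the model itself, not by data:
x_mid = x_t + d * u_theta(x_t, t, t + d)

# Consistency target for the long map u_{t -> t+2d}: an average of two
# shorter model predictions. No dataset endpoint x1 appears anywhere.
target = 0.5 * (u_theta(x_t, t, t + d) + u_theta(x_mid, t + d, t + 2 * d))
residual = u_theta(x_t, t, t + 2 * d) - target
```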

Continuous methods

MeanFlow, ESD, and LSD use data-based conditional velocities for direct supervision, but require JVPs, which increase memory and compute, can destabilize training, and limit applicability in derivative-fragile settings such as sparse point cloud architectures.

\[ \left\| u^\theta_{t\to r}(x) - \operatorname{sg}\!\left[ u_t(x\mid x_1) + (r-t) \left( u_t(x\mid x_1)\partial_x u^\theta_{t\to r}(x) + \partial_t u^\theta_{t\to r}(x) \right) \right] \right\|^2 \] \[ u_t(x\mid x_1)=\frac{x_1-x}{1-t},\qquad x_1\in\mathcal D . \]

This gives direct dataset supervision, but the \(\partial_x u^\theta\) and \(\partial_t u^\theta\) terms introduce derivative computation and JVP-style overhead.

AlphaFlow combines the two directions, but still relies on JVPs.

Comparison of Euler Mean Flow with continuous-equation-based and progressive-extension methods
EMF targets the missing quadrant: direct dataset supervision for long flow maps without continuous-equation JVPs.
Formulation

Euler Mean Flow

EMF linearizes the semigroup trajectory objective and turns it into a supervised target for \(u^\theta_{t\to r}\).

Local Euler surrogate

\[ u_{t\to r}(x_t) \approx u_{t\to t}(x_t) + (r-t-\Delta t) \frac{ u_{t+\Delta t\to r}(x_{t+\Delta t}) - u_{t\to r}(x_t)} {\Delta t}. \]

Replacing the instantaneous velocity with \(u_t(x\mid x_1)=\frac{x_1-x}{1-t}\) gives a learnable conditional target with explicit dataset supervision.

\(u\)-prediction loss

\[ \begin{aligned} \mathcal L^{E}(\theta) &= \mathbb E\Big[ \big\| u^\theta_{t\to r}(x) - \big( u_t(x\mid x_1) + (r-t-\Delta t)_+ \operatorname{sg} \frac{ u^\theta_{t+\Delta t\to r}(x') - u^\theta_{t\to r}(x)} {\Delta t} \big) \big\|^2 \Big],\\ x'&=x+\operatorname{sg}(\Delta t\,u^\theta_{t\to t}(x)). \end{aligned} \]
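The target above can be sketched in a few lines of numpy (a toy `u_theta` stands in for the network; the stop-gradient is implicit here because numpy carries no autograd state). Note that when \(r-t\le\Delta t\) the gate \((r-t-\Delta t)_+\) vanishes and the target reduces to the conditional velocity, i.e., plain flow matching on short maps.

```python
import numpy as np

def u_theta(x, t, r):
    """Hypothetical toy average-velocity model (stands in for the network)."""
    return 0.3 * x + (r - t)

def emf_u_target(x, x1, t, r, dt, u_theta):
    """EMF u-prediction target: conditional velocity plus a gated,
    stop-gradient finite-difference correction (sg is implicit in numpy)."""
    u_cond = (x1 - x) / (1.0 - t)          # direct dataset supervision
    x_next = x + dt * u_theta(x, t, t)     # Euler step with the model's own velocity
    diff = (u_theta(x_next, t + dt, r) - u_theta(x, t, r)) / dt
    return u_cond + max(r - t - dt, 0.0) * diff   # (r - t - dt)_+ gate

rng = np.random.default_rng(1)
x, x1 = rng.normal(size=3), rng.normal(size=3)
t, r, dt = 0.2, 0.9, 0.05
target = emf_u_target(x, x1, t, r, dt, u_theta)
loss = np.mean((u_theta(x, t, r) - target) ** 2)
```

No derivative of `u_theta` is ever taken: the correction term needs only two extra forward evaluations.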

\(x_1\)-prediction parameterization

\[ \tilde x_{t\to r}(x) = (1-t) \frac{\phi_{t\to r}(x)-x}{r-t} + x. \]

When \(r=t\), the conditional endpoint target becomes \(\tilde x_t(x\mid x_1)=x_1\), which is especially useful for pixel-space and SDF generation.
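As a quick sanity check, one can verify that if \(\phi_{t\to r}\) is the exact conditional flow map of the linear path (an assumption for illustration; `x1_param` below is a hypothetical helper), the reparameterization returns the data endpoint \(x_1\) exactly, for any \(r\):

```python
import numpy as np

def x1_param(phi_val, x, t, r):
    """Endpoint reparameterization: tilde_x_{t->r}(x) = (1-t)*(phi - x)/(r - t) + x."""
    return (1.0 - t) * (phi_val - x) / (r - t) + x

rng = np.random.default_rng(2)
x, x1 = rng.normal(size=3), rng.normal(size=3)
t, r = 0.3, 0.7

# Exact conditional flow map of the linear path (illustrative assumption)
phi_val = x + (r - t) * (x1 - x) / (1.0 - t)
assert np.allclose(x1_param(phi_val, x, t, r), x1)  # recovers the data endpoint
```

This is why the parameterization yields targets on the data scale, which matters when the output space is raw pixels or SDF values.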

\(x_1\)-prediction loss

\[ \begin{aligned} \mathcal L^{E'}(\theta) &= \mathbb E\Big[ \big\| \tilde x^\theta_{t\to r}(x) - \big( \tilde x_t(x\mid x_1) + (r-t-\Delta t)_+ \frac{1-t}{1-r} \operatorname{sg} \frac{ \tilde x^\theta_{t+\Delta t\to r}(x') - \tilde x^\theta_{t\to r}(x)} {\Delta t} \big) \big\|^2 \Big],\\ x'&=x+\operatorname{sg}\!\left( \Delta t\frac{\tilde x^\theta_{t\to t}(x)-x}{1-t} \right). \end{aligned} \]
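Mirroring the \(u\)-prediction sketch, the \(x_1\)-prediction target can be written as follows (again with a hypothetical toy model `x1_theta` and an implicit stop-gradient, since numpy has no autograd). When the gate \((r-t-\Delta t)_+\) is zero, the target collapses to plain endpoint regression onto \(x_1\).

```python
import numpy as np

def x1_theta(x, t, r):
    """Hypothetical toy endpoint-prediction model."""
    return 0.8 * x + 0.1 * (r - t)

def emf_x1_target(x, x1, t, r, dt, x1_theta):
    """EMF x1-prediction target: data endpoint plus a (1-t)/(1-r)-scaled,
    gated finite-difference correction (stop-gradient implicit in numpy)."""
    # Euler step using the velocity implied by the endpoint prediction at r = t
    x_next = x + dt * (x1_theta(x, t, t) - x) / (1.0 - t)
    diff = (x1_theta(x_next, t + dt, r) - x1_theta(x, t, r)) / dt
    scale = (1.0 - t) / (1.0 - r)
    return x1 + max(r - t - dt, 0.0) * scale * diff

rng = np.random.default_rng(3)
x, x1 = rng.normal(size=3), rng.normal(size=3)
t, r, dt = 0.1, 0.8, 0.05
target = emf_x1_target(x, x1, t, r, dt, x1_theta)
loss = np.mean((x1_theta(x, t, r) - target) ** 2)
```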

Tasks

Applications

EMF is evaluated across latent image generation, pixel-space image generation, functional SDF generation, and sparse point cloud generation.

Latent Space

Latent-Space Image Generation

EMF supports standard latent-space image generation with a DiT backbone and a pre-trained VAE latent representation. We show class-conditional ImageNet generation with one, two, and four sampling steps under the same condition.

Latent-space ImageNet generation results from Euler Mean Flows
Latent-space ImageNet generation. Rows show 1-step, 2-step, and 4-step samples with the same class condition.

Pixel Space

Pixel-Space Image Generation

The \(x_1\)-prediction variant enables stable one-step pixel-space generation with JiT, which directly processes image patches without a VAE latent space.

Image generation results from Euler Mean Flows
One-step pixel-space image generation with the \(x_1\)-prediction variant and the JiT backbone.

Functional Geometry

SDF-Based 3D Shape Generation

EMF also applies to functional SDF generation. The model learns a one-step map over signed-distance functions and can generate 3D shapes from sparse geometric conditioning.

One-step SDF-based 3D shape generation results
SDF-based 3D shape generation, including results conditioned on sparse surface samples.

Sparse Geometry

Point Cloud Generation

EMF is JVP-free, which makes it compatible with sparse point cloud architectures based on PVCNNs. We apply it to LION-style latent point cloud generation on ShapeNet categories.

One-step point cloud generation results from Euler Mean Flows
One-step point cloud generation results on ShapeNet.

Validation

2D SDF Stability

This validation isolates the stability issue in a 2D MNIST SDF setting. It compares \(u\)-prediction and \(x_1\)-prediction under the same functional-generation setup.

2D MNIST SDF validation comparing u-prediction and x1-prediction
The 2D SDF experiment shows why \(x_1\)-prediction is important for stable functional generation.

Paper

Paper

The compiled paper is embedded below for quick reading.

Citation

Final author metadata can be filled in after the ICML record is available.

@inproceedings{eulermeanflows2026,
  author    = {Li, Zhiqi and Sun, Yuchen and Turk, Greg and Zhu, Bo},
  title     = {Trajectory Consistency for One-Step Generation on Euler Mean Flows},
  booktitle = {International Conference on Machine Learning},
  year      = {2026}
}