Minghao Fu

Stable Causal Graph ⇔ Complete Nonlinear ICA:
Even in Your Cyclic Graph / Dense Generation!

These are some of my computation drafts exploring the equivalence between stable causal graphs and complete nonlinear ICA (beyond the acyclic/lower-triangular case in Fu et al., 2025). The key insight is that functional equivalence extends to the non-DAG setting, so the two concepts remain equivalent even with cyclic structures.

Original Handwritten Draft (PDF)

Setup

Consider a structural causal model with exogenous variables $e = (e_1, e_2, e_3)$, latent sources $s = (s_1, s_2, s_3)$, and observed variables $x = (x_1, x_2, x_3)$, where each layer is generated through an invertible mechanism:

$$e \;\xrightarrow{s = f_s(e)}\; s \;\xrightarrow{x = f_x(s)}\; x$$

By the chain rule on the Jacobians:

$$J_x(e) = J_x(s) \cdot J_s(e) = \frac{\partial x}{\partial s} \cdot \frac{\partial s}{\partial e}$$

Since all mappings are invertible, the Jacobian $J_x(e)$ is invertible. If the causal graphs over $s$ and $x$ are DAGs with adjacency matrices $B_1$ and $B_2$, we can write:

$$J_x(e) = J_x(s) \cdot J_s(e) = (I - B_2)^{-1} \cdot (I - B_1)^{-1}$$

Figure: The two-stage generative process. Exogenous noise $e$ maps to latent sources $s$ through $f_s$, and the sources map to observations $x$ through $f_x$. Each stage carries a causal graph (adjacency $B_1$ over $s$, $B_2$ over $x$), and the stage Jacobians are $J_s(e)=(I-B_1)^{-1}$ and $J_x(s)=(I-B_2)^{-1}$. The full mixing Jacobian is their product in chain-rule order, $J_x(e)=(I-B_2)^{-1}(I-B_1)^{-1}$.

This gives us the functional equivalence between the mixing function's Jacobian and the causal structure.
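As a sanity check, here is a minimal NumPy sketch of the linear special case, where each stage solves $s = B_1 s + e$ and $x = B_2 x + s$. The weights in `B1` and `B2` and the helper `numerical_jacobian` are made up for illustration. It compares a finite-difference Jacobian of the composed map with the closed form in the chain-rule order $J_x(s) \cdot J_s(e)$.

```python
import numpy as np

# Hypothetical DAG adjacency matrices (strictly lower triangular).
# Convention assumed here: B[i, j] != 0 means an edge from node j to node i.
B1 = np.array([[0.0, 0.0, 0.0],
               [0.5, 0.0, 0.0],
               [0.0, -0.3, 0.0]])
B2 = np.array([[0.0, 0.0, 0.0],
               [1.2, 0.0, 0.0],
               [0.7, 0.4, 0.0]])
I = np.eye(3)


def f_s(e):
    # Linear stage: s = B1 s + e, i.e. s = (I - B1)^{-1} e
    return np.linalg.solve(I - B1, e)


def f_x(s):
    # Linear stage: x = B2 x + s, i.e. x = (I - B2)^{-1} s
    return np.linalg.solve(I - B2, s)


def numerical_jacobian(f, z, eps=1e-6):
    # Central finite differences, one input coordinate at a time.
    cols = []
    for k in range(z.size):
        dz = np.zeros_like(z)
        dz[k] = eps
        cols.append((f(z + dz) - f(z - dz)) / (2 * eps))
    return np.stack(cols, axis=1)


e = np.array([0.3, -1.0, 2.0])
J_num = numerical_jacobian(lambda z: f_x(f_s(z)), e)

# Chain-rule order: J_x(e) = J_x(s) @ J_s(e)
J_closed = np.linalg.inv(I - B2) @ np.linalg.inv(I - B1)

print(np.allclose(J_num, J_closed, atol=1e-6))
```

In the nonlinear case the same check would apply pointwise, with $B_1$ and $B_2$ depending on the evaluation point.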

Note: self-loops can be handled following Lacerda et al., 2012. If the system is not stable (e.g., $e_3 \to x_1$ explodes), identifiability breaks down.
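To make "stable" concrete, the sketch below assumes the usual criterion for linear cyclic models $x = Bx + e$: the spectral radius of $B$ is below 1, so the series $I + B + B^2 + \dots$ converges to $(I - B)^{-1}$. The 3-cycle weights and the helper names are hypothetical, and the nonlinear setting may need a pointwise or uniform version of this condition.

```python
import numpy as np


def spectral_radius(B):
    # Largest eigenvalue modulus of B.
    return float(np.max(np.abs(np.linalg.eigvals(B))))


def neumann_sum(B, n_terms):
    # Partial sum I + B + B^2 + ... with n_terms terms.
    total = np.zeros_like(B)
    term = np.eye(B.shape[0])
    for _ in range(n_terms):
        total = total + term
        term = term @ B
    return total


# Hypothetical 3-cycle x1 -> x2 -> x3 -> x1; B[i, j] is the weight of j -> i.
B_stable = np.array([[0.0, 0.0, 0.5],
                     [0.8, 0.0, 0.0],
                     [0.0, 0.9, 0.0]])
# Same cycle with a stronger x3 -> x1 edge, so the cycle gain exceeds 1.
B_unstable = np.array([[0.0, 0.0, 1.5],
                       [0.8, 0.0, 0.0],
                       [0.0, 0.9, 0.0]])

I = np.eye(3)
for name, B in [("stable", B_stable), ("unstable", B_unstable)]:
    rho = spectral_radius(B)
    converged = np.allclose(neumann_sum(B, 200), np.linalg.inv(I - B))
    print(name, "spectral radius < 1:", rho < 1, "| series matches inverse:", converged)
```

In the unstable case $I - B$ is still invertible as a matrix, but the feedback loop does not settle, so the inverse no longer describes an equilibrium of the system.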

Main Result: The Equivalence

Direction 1: Complete Nonlinear ICA $\Rightarrow$ Stable Causal Graph

Suppose we have a complete nonlinear ICA model $x = g(e)$ where $J_g(e)$ is square and invertible. Since $J_g(e)$ is an invertible matrix, it always admits an LUP decomposition:

$$J_g(e) = L \cdot U \cdot P$$

where:

  • $L$: lower triangular matrix (non-zero diagonal)
  • $U$: upper triangular matrix (non-zero diagonal)
  • $P$: permutation matrix

Once the scaling and the permutation $P$ are fixed, the LUP decomposition is unique. Because both $L$ and $U$ have non-zero diagonals, each can be rescaled to unit diagonal and written in the form $(I - \cdot)^{-1}$:

$$J_s(e) = (I - U')^{-1} \quad \text{(up to a permutation } P' \text{)}$$

$$J_x(s) = (I - L')^{-1}$$

where $U'$ is strictly upper triangular and $L'$ is strictly lower triangular (both are DAG adjacency matrices). Since $J_s(e)$ and $J_x(s)$ are bounded, the resulting causal system is stable. Self-loops can be freely introduced following Lacerda et al., 2012.

Figure: An invertible square Jacobian always factors as $J_g(e)=L\,U\,P$, where the permutation $P$ relabels the sources. Fixing the scaling and the permutation $P$ makes this factorization unique. The upper-triangular $U$ becomes a DAG over the sources, $J_s(e)=(I-U')^{-1}$, and the lower-triangular $L$ becomes a DAG over the observations, $J_x(s)=(I-L')^{-1}$. Two triangular factors are two stacked DAGs, which is exactly a stable causal graph.
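The construction can be tried numerically. The sketch below is only an illustration: `lu_row_pivot` and `lup_to_dags` are hypothetical helpers, the matrix `J` is made up, and the column-permuted form $J = L\,U\,P$ is obtained by running a textbook partial-pivoting LU on $J^\top$. The diagonal matrix `D` collects the scaling that the text says must be fixed.

```python
import numpy as np


def lu_row_pivot(M):
    # Textbook LU with partial pivoting; returns P, L, U with P @ M == L @ U.
    n = M.shape[0]
    U = M.astype(float).copy()
    L = np.eye(n)
    P = np.eye(n)
    for k in range(n - 1):
        p = k + int(np.argmax(np.abs(U[k:, k])))
        if p != k:
            U[[k, p], :] = U[[p, k], :]
            P[[k, p], :] = P[[p, k], :]
            L[[k, p], :k] = L[[p, k], :k]
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, :] -= L[i, k] * U[k, :]
    return P, L, np.triu(U)


def lup_to_dags(J):
    # P_r @ J.T == L_r @ U_r  implies  J == U_r.T @ L_r.T @ P_r
    P_r, L_r, U_r = lu_row_pivot(J.T)
    L = U_r.T                      # lower triangular, non-zero diagonal
    U = L_r.T                      # upper triangular, unit diagonal
    P = P_r                        # column permutation
    D = np.diag(np.diag(L))        # the scaling to be fixed
    L_unit = L @ np.linalg.inv(D)  # lower triangular, unit diagonal
    n = J.shape[0]
    L_prime = np.eye(n) - np.linalg.inv(L_unit)  # strictly lower triangular
    U_prime = np.eye(n) - np.linalg.inv(U)       # strictly upper triangular
    return L_prime, U_prime, D, P


# Hypothetical invertible Jacobian; the zero in the corner forces a pivot.
J = np.array([[0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0],
              [3.0, 0.0, 1.0]])
L_prime, U_prime, D, P = lup_to_dags(J)

I3 = np.eye(3)
J_rebuilt = np.linalg.inv(I3 - L_prime) @ D @ np.linalg.inv(I3 - U_prime) @ P
print(np.allclose(J_rebuilt, J))
```

Here `L_prime` and `U_prime` play the roles of $L'$ and $U'$, and the product `inv(I - L') @ D @ inv(I - U') @ P` recovers `J`. A different pivoting rule would give a different $P$ and hence different triangular factors, which is why the permutation has to be fixed for uniqueness.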

Direction 2: Stable Causal Graph $\Rightarrow$ Complete Nonlinear ICA

Conversely, if we have a stable causal graph, the Jacobian of the full mapping $x = g(e)$ takes the form $J_x(s) \cdot J_s(e) = (I - B_2)^{-1}(I - B_1)^{-1}$, which is invertible. This is precisely the structure of a complete nonlinear ICA model.
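One way to see the invertibility, sketched for the case where $B_1$ and $B_2$ are DAG adjacency matrices: under a topological ordering each $B_i$ is strictly triangular, so $I - B_i$ is unit triangular up to a simultaneous row and column permutation, and

$$\det(I - B_i) = 1 \;\;\Longrightarrow\;\; \det J_g(e) = \det (I - B_1)^{-1} \cdot \det (I - B_2)^{-1} = 1 \neq 0.$$

In the cyclic case the determinant need not equal 1. Assuming the linear notion of stability (spectral radius of $B_i$ below 1), $1$ is not an eigenvalue of $B_i$, so $I - B_i$ is still invertible and the same conclusion follows.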

TL;DR

Two days of "multiplying matrices" led to:

$$\boxed{\text{Stable Causal Graph} \iff \text{Complete Nonlinear ICA}}$$

Example: A Chain Graph

Consider a simple chain $x_1 \to x_2 \to x_3$ with adjacency matrix:

$$B = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$

Its Markov equivalence class contains DAGs with the same set of conditional independencies. The corresponding "super matrix" (union of all edges in the equivalence class) is:

$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$

Figure: Markov equivalence class of the chain (same skeleton, no collider at $x_2$). The chain $x_1\to x_2\to x_3$ shares its skeleton with two other orientations, $x_1 \leftarrow x_2 \to x_3$ and $x_1 \leftarrow x_2 \leftarrow x_3$, which imply the same conditional independencies. These three DAGs form one Markov equivalence class. Their union of edges is the undirected "super-graph", i.e. the skeleton.

This illustrates that while the causal direction within a Markov equivalence class is ambiguous from observational data alone, the stable causal graph structure still guarantees complete nonlinear ICA identifiability.
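The "super matrix" can be reproduced with a few lines of NumPy. This is a small sketch using the convention that `B[i, j] = 1` encodes an edge $x_{j+1} \to x_{i+1}$, which matches the chain matrix above. The helper names `adjacency` and `skeleton` are introduced here for illustration only.

```python
import numpy as np


def adjacency(edges, n=3):
    # edges are (parent, child) pairs with 0-based node indices.
    B = np.zeros((n, n), dtype=int)
    for parent, child in edges:
        B[child, parent] = 1
    return B


def skeleton(B):
    # Undirected version of the graph: drop edge directions.
    return ((B + B.T) > 0).astype(int)


# The three members of the Markov equivalence class of the chain.
chain = adjacency([(0, 1), (1, 2)])          # x1 -> x2 -> x3
fork = adjacency([(1, 0), (1, 2)])           # x1 <- x2 -> x3
reverse_chain = adjacency([(2, 1), (1, 0)])  # x1 <- x2 <- x3
members = [chain, fork, reverse_chain]

# Union of all edges in the class, clipped back to 0/1 entries.
super_matrix = np.clip(sum(members), 0, 1)
print(super_matrix)

# The collider x1 -> x2 <- x3 has the same skeleton but is not in the class,
# because it implies different conditional independencies.
collider = adjacency([(0, 1), (2, 1)])
print(np.array_equal(skeleton(collider), skeleton(chain)))
```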