cloudposterior: Cloud Execution

import marimo as mo

Run PyMC MCMC sampling on cloud VMs with one line of code. This notebook demonstrates remote execution using the classic Radon dataset from Gelman & Hill (2006).

Run this notebook locally (Jupyter or marimo) to watch the live progress display animate in-cell. Some outputs don’t render in GitHub’s notebook viewer.

import numpy as np
import pandas as pd
import pymc as pm
import arviz as az

Start fresh

Clear any cached results and this project’s Modal volume so the notebook runs cold and reproducibly. (Shown in marimo; hidden in the rendered notebook.)

import shutil
from pathlib import Path

import cloudposterior as cp

# Wipe the local result cache + this project's Modal volume so the example
# starts cold. The sampling cells below use `cp`, so marimo runs this first.
shutil.rmtree(Path(".cloudposterior"), ignore_errors=True)
cp.cleanup_volumes()

Data

919 household radon measurements across 85 Minnesota counties (Gelman & Hill, 2006).

df = pd.read_csv(pm.get_data("radon.csv"))

county_names = df.county.unique()
county_idx = df.county_code.values
log_radon = df.log_radon.values
floor = df.floor.values

print(f"{len(df)} observations, {len(county_names)} counties")
919 observations, 85 counties

Model

A hierarchical varying-intercepts model with a non-centered parameterization. Each county gets its own intercept, partially pooled toward a shared mean, and floor level (basement vs first floor) enters as a single coefficient shared across all counties.

with pm.Model(name="radon_intercepts", coords={"county": county_names}) as radon:
    mu_a = pm.Normal("mu_a", mu=0, sigma=5)
    sigma_a = pm.HalfNormal("sigma_a", sigma=2)
    a_raw = pm.Normal("a_raw", mu=0, sigma=1, dims="county")
    a = pm.Deterministic("a", mu_a + sigma_a * a_raw, dims="county")
    b_floor = pm.Normal("b_floor", mu=0, sigma=5)
    mu = a[county_idx] + b_floor * floor
    sigma_y = pm.HalfNormal("sigma_y", sigma=2)
    pm.Normal("obs", mu=mu, sigma=sigma_y, observed=log_radon)
pm.model_to_graphviz(radon)
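To make the non-centered parameterization and the `a[county_idx]` indexing concrete, here is a minimal NumPy sketch of the same arithmetic outside PyMC. The parameter values, the four-county toy data, and the 0 = basement / 1 = first floor coding are assumptions chosen for illustration, not estimates from the radon data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values chosen for illustration only; they are not estimates.
mu_a, sigma_a, b_floor = 1.5, 0.3, -0.7
n_counties = 4

# Non-centered form: draw standard-normal offsets, then shift and scale.
a_raw = rng.normal(0.0, 1.0, size=n_counties)
a = mu_a + sigma_a * a_raw  # one intercept per county

# Toy observations: county index and floor indicator
# (assumed coding: 0 = basement, 1 = first floor).
county_idx = np.array([0, 0, 1, 2, 3, 3])
floor = np.array([0, 1, 0, 0, 1, 1])

# Integer indexing expands each county's intercept to its observations.
mu = a[county_idx] + b_floor * floor
```

Sampling `a_raw` on a unit scale and rescaling it afterwards gives the same distribution for `a` as drawing it directly from Normal(mu_a, sigma_a), but it usually gives the sampler an easier geometry when `sigma_a` is small.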

Remote execution

cp.cloud() intercepts pm.sample() and runs it on a cloud VM. The model is uploaded to a volume on first run. Resources (CPU cores, memory) are auto-sized to your model.

with cp.cloud(radon, remote=True):
    idata = pm.sample(draws=2000, tune=1000, chains=4)
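One way a context manager can intercept a function call is to swap the function out on entry and restore it on exit. The sketch below shows that general pattern on a stand-in object; `fake_pm`, `redirect_sample`, and `pretend_remote` are hypothetical names, and this is not a description of how cloudposterior is actually implemented.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Stand-in for a library module exposing a sample() function (hypothetical).
fake_pm = SimpleNamespace(sample=lambda draws: f"local:{draws}")


@contextmanager
def redirect_sample(module, runner):
    """Temporarily route module.sample through runner, then restore it."""
    original = module.sample
    module.sample = lambda *args, **kwargs: runner(original, *args, **kwargs)
    try:
        yield
    finally:
        module.sample = original  # always restore, even if the body raises


def pretend_remote(original, *args, **kwargs):
    # A real backend would serialize the call and ship it elsewhere;
    # here we only retag the result to show the call was intercepted.
    return "remote:" + original(*args, **kwargs).split(":")[1]


with redirect_sample(fake_pm, pretend_remote):
    inside = fake_pm.sample(draws=10)   # goes through pretend_remote
outside = fake_pm.sample(draws=10)      # original behavior is back
```

The `try`/`finally` is the important part of the pattern: the original function is restored even when the body of the `with` block fails.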

Diagnostics

az.summary(idata, filter_vars="like", var_names=["mu_a", "sigma_a", "b_floor", "sigma_y"])
                             mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
radon_intercepts::mu_a      1.492  0.050   1.402    1.587      0.001    0.001    1997.0    3559.0    1.0
radon_intercepts::b_floor  -0.664  0.069  -0.797   -0.536      0.001    0.001    6298.0    5719.0    1.0
radon_intercepts::sigma_a   0.322  0.045   0.234    0.402      0.001    0.001    1677.0    2974.0    1.0
radon_intercepts::sigma_y   0.727  0.018   0.693    0.760      0.000    0.000    8036.0    5877.0    1.0
az.plot_trace(idata, filter_vars="like", var_names=["mu_a", "sigma_a", "b_floor", "sigma_y"])[0, 0].figure
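For intuition about the r_hat column, the sketch below computes the classic Gelman-Rubin statistic by comparing between-chain and within-chain variance. This is a simplified version for illustration: ArviZ's default is a rank-normalized split R-hat, so the numbers will not match `az.summary` exactly. The function name `basic_rhat` and the simulated chains are assumptions, not part of the notebook.

```python
import numpy as np


def basic_rhat(chains: np.ndarray) -> float:
    """Classic Gelman-Rubin R-hat for an array of shape (n_chains, n_draws)."""
    n_chains, n_draws = chains.shape
    chain_means = chains.mean(axis=1)
    within = chains.var(axis=1, ddof=1).mean()       # W: mean within-chain variance
    between = n_draws * chain_means.var(ddof=1)      # B: between-chain variance
    pooled = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(pooled / within))


rng = np.random.default_rng(1)

# Four well-mixed chains drawn from the same distribution.
mixed = rng.normal(size=(4, 2000))

# Same draws, but the last chain is shifted to mimic a chain stuck elsewhere.
stuck = mixed + np.array([[0.0], [0.0], [0.0], [3.0]])

rhat_mixed = basic_rhat(mixed)
rhat_stuck = basic_rhat(stuck)
```

Values close to 1, as in the summary table above, indicate that the chains agree with one another; the shifted chain in `stuck` pushes the statistic well above 1.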

GPU acceleration with JAX

For models that benefit from GPU acceleration, pass nuts_sampler="numpyro" to sample with JAX via NumPyro. When cloudposterior detects a JAX-based sampler, it automatically provisions a GPU container and installs jax[cuda12], so no extra configuration is needed.

with cp.cloud(radon, remote=True):
    idata_jax = pm.sample(draws=2000, tune=1000, chains=4, nuts_sampler="numpyro")

Cleanup

Model payloads are stored in a project-scoped volume. Delete it when you’re done.

cp.cleanup_volumes()