cloudposterior: Caching

import marimo as mo

cloudposterior automatically caches sampling results so you never re-run the same model twice. This is useful even without cloud execution – just wrap your model in cp.cloud() and re-running a notebook cell returns the cached result instantly.

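Two caching modes:

- In-memory (the default): a re-run in the same kernel session returns the cached result.
- Disk (cache="disk"): results are saved to .cloudposterior/ and persist across kernel restarts.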

import numpy as np
import pandas as pd
import pymc as pm
import arviz as az

Start fresh

Wipe the local disk cache so the disk-caching demo below is a genuine miss each run – reproducible and clean to re-run. (Shown in marimo; hidden in the rendered notebook. Remove this cell to watch the cache survive a restart.)

import shutil
from pathlib import Path

import cloudposterior as cp

# Clear .cloudposterior/ so the disk-cache cells below start cold. The
# sampling cells use `cp`, so marimo runs this first.
shutil.rmtree(Path(".cloudposterior"), ignore_errors=True)

Setup

This notebook reuses the Radon model from basics.ipynb.

df = pd.read_csv(pm.get_data('radon.csv'))
with pm.Model(name='radon_intercepts', coords={'county': df.county.unique()}) as radon:
    _mu_a = pm.Normal('mu_a', mu=0, sigma=5)
    _sigma_a = pm.HalfNormal('sigma_a', sigma=2)
    _a_raw = pm.Normal('a_raw', mu=0, sigma=1, dims='county')
    _a = pm.Deterministic('a', _mu_a + _sigma_a * _a_raw, dims='county')
    b_floor = pm.Normal('b_floor', mu=0, sigma=5)
    _mu = _a[df.county_code.values] + b_floor * df.floor.values
    _sigma_y = pm.HalfNormal('sigma_y', sigma=2)
    pm.Normal('obs', mu=_mu, sigma=_sigma_y, observed=df.log_radon.values)

Local caching (no cloud needed)

You don’t need cloud execution to use caching. Just wrap your model in cp.cloud() – it intercepts pm.sample() and caches the result. Sampling runs locally with PyMC’s normal output. The second time you run the same cell, the result is returned from cache.

This is useful when you’re iterating on analysis code downstream of sampling – you don’t want to re-sample every time you tweak a plot.

# First run: samples normally (PyMC progress bar shown)
with cp.cloud(radon):
    _idata = pm.sample(draws=2000, tune=1000, chains=4)

Sampler Progress

Total Chains: 4 | Active Chains: 0 | Finished Chains: 4

Draws  Divergences  Step Size  Gradients/Draw
3000   0            0.45       15
3000   0            0.46       7
3000   0            0.46       7
3000   0            0.45       7

# Re-run: instant (in-memory cache hit)
with cp.cloud(radon):
    _idata = pm.sample(draws=2000, tune=1000, chains=4)
cached result

Disk caching

With cache="disk", results are saved to .cloudposterior/ and persist across kernel restarts – normally, restarting and re-running returns the result instantly without sampling. (Our Start fresh cell wipes that directory on load so this demo always runs cold; remove it to see persistence across a restart.)

The first run below samples and writes the cache file; the second is an instant disk hit. The cache key includes the model structure, observed data, and all sampling parameters – changing any of these triggers a new sample.

with cp.cloud(radon, cache='disk'):
    _idata = pm.sample(draws=2000, tune=1000, chains=4)

Sampler Progress

Total Chains: 4 | Active Chains: 0 | Finished Chains: 4

Draws  Divergences  Step Size  Gradients/Draw
3000   0            0.48       7
3000   0            0.45       7
3000   0            0.49       7
3000   0            0.47       15

with cp.cloud(radon, cache='disk'):
    _idata = pm.sample(draws=2000, tune=1000, chains=4)
cached result
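
To make the cache-key idea concrete, here is a minimal sketch of how such a key could be derived. It is not cloudposterior's actual implementation: the function name toy_cache_key, the plain-text stand-in for the model structure, and the choice of SHA-256 truncated to 8 hex characters are all assumptions made for illustration.

```python
import hashlib
import json
from array import array


def toy_cache_key(model_structure, observed, **sample_kwargs):
    """Hypothetical key: hash of model text, observed data, and sampler settings."""
    digest = hashlib.sha256()
    digest.update(model_structure.encode("utf-8"))
    digest.update(b"\x00")
    # Pack the observed values as raw doubles so any change in the data changes the key.
    digest.update(array("d", observed).tobytes())
    digest.update(b"\x00")
    # sort_keys makes the key independent of keyword-argument order.
    digest.update(json.dumps(sample_kwargs, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()[:8]


structure = "obs ~ Normal(a[county] + b_floor * floor, sigma_y)"  # illustrative stand-in
y = [0.8, 1.1, 0.3]

base = toy_cache_key(structure, y, draws=2000, tune=1000, chains=4)
same = toy_cache_key(structure, y, chains=4, tune=1000, draws=2000)
more_draws = toy_cache_key(structure, y, draws=4000, tune=1000, chains=4)
new_data = toy_cache_key(structure, [v + 1.0 for v in y], draws=2000, tune=1000, chains=4)

print(base == same)        # identical inputs give the same key
print(base == more_draws)  # a changed sampling parameter gives a different key
print(base == new_data)    # changed observed data gives a different key
```

Under these assumptions, identical inputs map to the same entry, while any change to the model text, the data, or the sampling parameters produces a new key and therefore a cache miss.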

Forcing a re-run with overwrite

A cache hit returns the stored result. To recompute deliberately – you changed something the cache key doesn’t capture, or just want a fresh sample – pass overwrite=True: it ignores the cached entry, re-runs, and replaces it. (Contrast cache=False, which skips the cache entirely and saves nothing.)

# Ignore the cached result, re-sample, and overwrite the stored entry.
with cp.cloud(radon, cache="disk", overwrite=True):
    _idata = pm.sample(draws=2000, tune=1000, chains=4)

Sampler Progress

Total Chains: 4 | Active Chains: 0 | Finished Chains: 4

Draws  Divergences  Step Size  Gradients/Draw
3000   0            0.51       7
3000   0            0.49       7
3000   0            0.46       7
3000   0            0.43       7
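
The difference between a normal hit, overwrite=True, and cache=False can be sketched with a toy get-or-compute cache. Everything here is hypothetical (ToyCache, fetch, and the boolean cache flag are illustrative simplifications, not cloudposterior internals); it only mirrors the behaviour described above.

```python
class ToyCache:
    """Hypothetical get-or-compute cache with hit, overwrite, and bypass modes."""

    def __init__(self):
        self._store = {}

    def fetch(self, key, compute, cache=True, overwrite=False):
        if not cache:
            # Bypass: no lookup, and nothing is saved.
            return compute()
        if key in self._store and not overwrite:
            # Ordinary cache hit: return the stored result without recomputing.
            return self._store[key]
        # Miss, or overwrite requested: recompute and replace the stored entry.
        result = compute()
        self._store[key] = result
        return result


calls = []


def fake_sample():
    """Stand-in for an expensive sampling run; records each call."""
    calls.append(1)
    return f"trace-{len(calls)}"


store = ToyCache()
first = store.fetch("radon", fake_sample)                   # miss: computes
hit = store.fetch("radon", fake_sample)                     # hit: no compute
fresh = store.fetch("radon", fake_sample, overwrite=True)   # recomputes and replaces
bypass = store.fetch("radon", fake_sample, cache=False)     # recomputes, stores nothing
after = store.fetch("radon", fake_sample)                   # hit on the overwritten entry

print(first, hit, fresh, bypass, after)
# prints: trace-1 trace-1 trace-2 trace-3 trace-2
```

Note that the bypassed run leaves the stored entry untouched, so the final lookup still returns the result written by the overwrite.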

Cache layout

The disk cache uses human-readable directory names with a hash suffix for uniqueness:

.cloudposterior/
├── radon_intercepts/
│   └── draws2000_tune1000_chains4-a3f7b2c9.nc
└── radon_slopes/
    └── draws2000_tune1000_chains4-7c2e5fa8.nc

Model names come from pm.Model(name="radon_intercepts"). The hash suffix ensures that runs with different non-displayed parameters (like random_seed) get separate cache files.
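
Because the layout is plain directories and files, it can be inspected with the standard library. The helper below is a small sketch, not part of cloudposterior; list_cache_entries is a hypothetical name, and it assumes only the directory structure shown above (one folder per model, .nc files inside).

```python
from pathlib import Path


def list_cache_entries(root):
    """Map each model directory under `root` to its sorted .nc file names."""
    entries = {}
    root = Path(root)
    if not root.is_dir():
        # No cache directory yet (for example, right after wiping it).
        return entries
    for model_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        entries[model_dir.name] = sorted(f.name for f in model_dir.glob("*.nc"))
    return entries


# Print what is currently cached, one model per line.
for model_name, files in list_cache_entries(".cloudposterior").items():
    print(model_name, files)
```

If the cache directory does not exist, the helper returns an empty mapping and the loop prints nothing.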

Model iteration

Caching works naturally with model iteration. Each model variant gets its own cache entry. Switching back to a previous model returns the cached result.

with pm.Model(name='radon_slopes', coords={'county': df.county.unique()}) as radon_slopes:
    _mu_a = pm.Normal('mu_a', mu=0, sigma=5)
    _sigma_a = pm.HalfNormal('sigma_a', sigma=2)
    _a_raw = pm.Normal('a_raw', mu=0, sigma=1, dims='county')
    _a = pm.Deterministic('a', _mu_a + _sigma_a * _a_raw, dims='county')
    mu_b = pm.Normal('mu_b', mu=0, sigma=5)
    sigma_b = pm.HalfNormal('sigma_b', sigma=2)
    b_raw = pm.Normal('b_raw', mu=0, sigma=1, dims='county')
    b = pm.Deterministic('b', mu_b + sigma_b * b_raw, dims='county')
    _mu = _a[df.county_code.values] + b[df.county_code.values] * df.floor.values
    _sigma_y = pm.HalfNormal('sigma_y', sigma=2)
    pm.Normal('obs', mu=_mu, sigma=_sigma_y, observed=df.log_radon.values)
# New model -> samples fresh
with cp.cloud(radon_slopes, cache="disk"):
    idata_slopes = pm.sample(draws=2000, tune=1000, chains=4)

Sampler Progress

Total Chains: 4 | Active Chains: 0 | Finished Chains: 4

Draws  Divergences  Step Size  Gradients/Draw
3000   0            0.40       15
3000   0            0.43       15
3000   0            0.39       15
3000   0            0.39       7

az.summary(idata_slopes, filter_vars="like", var_names=["mu_a", "sigma_a", "mu_b", "sigma_b", "sigma_y"])
                         mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
radon_slopes::mu_a      1.490  0.051   1.395    1.588      0.001    0.001    4218.0    4841.0    1.0
radon_slopes::mu_b     -0.649  0.082  -0.805   -0.498      0.001    0.001    8036.0    5397.0    1.0
radon_slopes::sigma_a   0.323  0.045   0.238    0.405      0.001    0.000    2289.0    4221.0    1.0
radon_slopes::sigma_b   0.257  0.126   0.004    0.450      0.003    0.001    1533.0    1634.0    1.0
radon_slopes::sigma_y   0.721  0.018   0.686    0.755      0.000    0.000    8113.0    6224.0    1.0