{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7a7e8cae",
   "metadata": {},
   "source": [
    "# 10 — Peak and Minima Counts\n",
    "\n",
    "**Purpose:** Evaluate non-Gaussian structure in CIB and tSZ patches using peak and\n",
    "minima counts following Sabyr, Hill & Haiman (2024), arXiv:2410.21247.\n",
    "\n",
    "This notebook computes counts of local maxima (peaks) and local minima as a function\n",
    "of amplitude threshold ν = T/σ for Agora, DDPM, and Gaussian baseline maps. This\n",
    "statistic is sensitive to the abundance of clusters and voids at different signal\n",
    "levels and has been shown to be highly informative for cosmological inference.\n",
    "\n",
    "The pipeline:\n",
    "1. Smooth patches at three angular scales θ_s = 1, 2.5, 5 arcmin\n",
    "2. Identify local maxima and minima using scipy's maximum/minimum filter\n",
    "3. Bin counts as a function of ν ∈ [-4σ, 4σ] in 30 bins\n",
    "4. Compare Agora, DDPM, and Gaussian distributions\n",
    "\n",
    "**No additional packages required** beyond numpy and scipy.\n",
    "\n",
    "**Inputs:**\n",
    "- Test maps: `data/low_pass/2mJy/*.npy`\n",
    "- DDPM samples: `data/low_pass/2mJy/new_samples_*.npy`\n",
    "- Norm params: `data/low_pass/2mJy/norm_params_2mJy.npy`\n",
    "\n",
    "**Outputs:** `plots/peak_minima_counts.pdf`\n",
    "\n",
    "**Key module functions:**\n",
    "- `foregrounds_diffusion.peak_counts.compute_peak_minima_counts`\n",
    "\n",
    "**Paper reference:** Sabyr, Hill & Haiman (2024), arXiv:2410.21247"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c4d10616",
   "metadata": {},
   "source": "Imports and physical constants.\npixel_res_arcmin is used by the Gaussian smoothing kernel inside\ncompute_peak_minima_counts to convert FWHM in arcmin to pixels.\n## 1  Configuration\n\nThe peak and minima count statistic follows Sabyr, Hill & Haiman (2024,\narXiv:2410.21247).  Maps are smoothed at three angular scales before peak\nfinding to probe structure at different physical sizes:\n- **θ_s = 1′** — angular scale of typical tSZ cluster cores at z ~ 0.5.\n- **θ_s = 2.5′** — intermediate scale probing extended cluster gas profiles.\n- **θ_s = 5′** — large-scale SZ signal from merging clusters and the\n  warm-hot intergalactic medium (WHIM).\n\nThreshold arrays `THRESHOLDS_PEAKS` and `THRESHOLDS_MINIMA` are in units of\nthe per-patch standard deviation σ (i.e. ν = T/σ), so they are directly\ncomparable across different sample populations with different normalisation."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dce17ba8",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from pathlib import Path\n",
    "\n",
    "from foregrounds_diffusion.peak_counts import compute_peak_minima_counts\n",
    "\n",
    "PTSRC           = 2\n",
    "PIXEL_RES_ARCMIN = 1.40625   # 6 deg / 256 px\n",
    "N_MAPS          = 120        # number of test maps to use\n",
    "\n",
    "# Smoothing scales matching Sabyr et al. (2024)\n",
    "SMOOTHING_SCALES = [1.0, 2.5, 5.0]  # arcmin FWHM\n",
    "\n",
    "# Threshold ranges in units of σ\n",
    "THRESHOLDS_PEAKS  = np.linspace(-1, 5, 30)\n",
    "THRESHOLDS_MINIMA = np.linspace(-5, 1, 30)\n",
    "\n",
    "PATCHES_DIR  = Path(f'data/low_pass/{PTSRC}mJy')\n",
    "PROJECT_ROOT = Path('/home/apb86/cmb_foregrounds_diffusion')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a1b1f37",
   "metadata": {},
   "source": "Load normalised CIB (_zero suffix = min-max to [0,1]) and tSZ (_norm suffix\n= z-score).  norm_params stores (cib_mean, cib_std, tsz_mean, tsz_std) in\nphysical µK, computed over the full training set during patch extraction.\n## 2  Load and denormalise maps\n\nLoad normalised patches and the DDPM samples.  Denormalise CIB from min-max\n(divide by max − min, then multiply back) and tSZ from z-score (multiply by σ\nand add µ).  Both channels must be in the same physical units (µK) so that\nthe per-patch threshold ν = T/σ is meaningful and comparable."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "131690cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "# ---------------------------------------------------------------------------\n",
    "# Load and denormalise maps\n",
    "# ---------------------------------------------------------------------------\n",
    "norm_params = np.load(PATCHES_DIR / f'norm_params_{PTSRC}mJy.npy')\n",
    "cib_mean, cib_std, tsz_mean, tsz_std = norm_params\n",
    "print(f'CIB — mean: {cib_mean:.4f} µK  std: {cib_std:.4f} µK')\n",
    "print(f'tSZ — mean: {tsz_mean:.4f} µK  std: {tsz_std:.4f} µK')\n",
    "\n",
    "cib_maps = np.load(PATCHES_DIR / f'CIB_map_150GHz_256_st6_zscore_{PTSRC}mJy_lp.npy')\n",
    "tsz_maps = np.load(PATCHES_DIR / f'tSZ3_map_150GHz_256_st6_zscore_{PTSRC}mJy_lp.npy')\n",
    "\n",
    "# 80/20 split (same seed as training)\n",
    "rng = np.random.default_rng(seed=42)\n",
    "indices = rng.permutation(len(cib_maps))\n",
    "test_idx = indices[int(0.8 * len(cib_maps)):]\n",
    "\n",
    "# Denormalise to physical units (µK)\n",
    "cib_test = cib_maps[test_idx, :, :, 0] * cib_std + cib_mean  # (N, H, W)\n",
    "tsz_test = tsz_maps[test_idx, :, :, 0] * tsz_std + tsz_mean\n",
    "\n",
    "print(f'Test set: {len(cib_test)} patches')\n",
    "print(f'CIB pixel range: [{cib_test.min():.2f}, {cib_test.max():.2f}] µK')\n",
    "print(f'tSZ pixel range: [{tsz_test.min():.2f}, {tsz_test.max():.2f}] µK')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2859951e",
   "metadata": {},
   "source": "DDPM samples are stored channels-first: shape (N, 2, H, W).\nGaussian baseline maps (channels-first, same shape) were generated by\ndrawing realisations from the Agora-measured power spectrum.\nDenormalise both back to µK before computing statistics.\n## 3  Load DDPM samples and Gaussian baseline\n\nThe Gaussian baseline is a set of maps generated by drawing independent\nFourier coefficients from the AGORA power spectrum (via `make_gaussian_realisation`).\nThese serve as a null hypothesis: a model that matches only the two-point\nstatistics.  Any excess in Agora peak counts relative to the Gaussian baseline\nreflects non-Gaussian structure (tSZ cluster peaks, CIB source clustering)."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "760484b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load DDPM samples and denormalise\n",
    "ddpm_raw = np.load(\n",
    "    PROJECT_ROOT / 'data' / 'low_pass' / f'{PTSRC}mJy' /\n",
    "    f'new_samples_cib_tsz_{PTSRC}mJy_zero_norm_6x6_w_au_lp.npy'\n",
    ")  # (N, 2, H, W)\n",
    "\n",
    "ddpm_cib = ddpm_raw[:, 0] * cib_std + cib_mean  # (N, H, W)\n",
    "ddpm_tsz = ddpm_raw[:, 1] * tsz_std + tsz_mean\n",
    "print(f'DDPM samples: {len(ddpm_cib)} patches')\n",
    "\n",
    "# Load Gaussian baseline\n",
    "gauss_maps = np.load(PATCHES_DIR / f'gaussian_cib_tsz_{PTSRC}mJy_lp.npy')  # (N, 2, H, W)\n",
    "gauss_cib  = gauss_maps[:, 0] if gauss_maps.shape[1] == 2 else gauss_maps[:, :, :, 0]\n",
    "gauss_tsz  = gauss_maps[:, 1] if gauss_maps.shape[1] == 2 else gauss_maps[:, :, :, 1]\n",
    "print(f'Gaussian samples: {len(gauss_cib)} patches')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "264bda69",
   "metadata": {},
   "source": "compute_peak_minima_counts:\n  maps_nhw           : (N, H, W) array in physical units (µK)\n  thresholds_peaks   : 1D array of ν = T/σ values for peak binning\n  thresholds_minima  : 1D array of ν values for minima binning (typically < 0)\n  smoothing_scales_arcmin: list of FWHM values for the pre-smoothing Gaussian\n  Returns dict keyed by scale → {'peaks': (N, T), 'minima': (N, T)}\n## 4  Compute peak and minima counts\n\n`compute_peak_minima_counts` returns a nested dict keyed by smoothing scale:\n```python\nresult = {\n    1.0:  {'peaks': ndarray(N, 30), 'minima': ndarray(N, 30)},\n    2.5:  {'peaks': ...,            'minima': ...},\n    5.0:  {'peaks': ...,            'minima': ...},\n}\n```\nEach `(N, 30)` array gives the count of local maxima/minima in threshold bin j\nfor map i, normalised by the patch area in arcmin².  The computation takes\n≈ 5–10 min for 120 maps × 3 scales; reduce `N_MAPS` for a quick test."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e893c2a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# ---------------------------------------------------------------------------\n",
    "# Compute peak and minima counts at all smoothing scales\n",
    "# NOTE: This cell takes ~5-10 minutes for 120 maps × 3 scales\n",
    "# ---------------------------------------------------------------------------\n",
    "N = min(N_MAPS, len(cib_test), len(ddpm_cib), len(gauss_cib))\n",
    "print(f'Using {N} maps per source')\n",
    "\n",
    "print('\\nComputing Agora CIB counts...')\n",
    "agora_cib_counts = compute_peak_minima_counts(\n",
    "    cib_test[:N], THRESHOLDS_PEAKS, THRESHOLDS_MINIMA,\n",
    "    smoothing_scales_arcmin=SMOOTHING_SCALES)\n",
    "\n",
    "print('\\nComputing Agora tSZ counts...')\n",
    "agora_tsz_counts = compute_peak_minima_counts(\n",
    "    tsz_test[:N], THRESHOLDS_PEAKS, THRESHOLDS_MINIMA,\n",
    "    smoothing_scales_arcmin=SMOOTHING_SCALES)\n",
    "\n",
    "print('\\nComputing DDPM CIB counts...')\n",
    "ddpm_cib_counts = compute_peak_minima_counts(\n",
    "    ddpm_cib[:N], THRESHOLDS_PEAKS, THRESHOLDS_MINIMA,\n",
    "    smoothing_scales_arcmin=SMOOTHING_SCALES)\n",
    "\n",
    "print('\\nComputing DDPM tSZ counts...')\n",
    "ddpm_tsz_counts = compute_peak_minima_counts(\n",
    "    ddpm_tsz[:N], THRESHOLDS_PEAKS, THRESHOLDS_MINIMA,\n",
    "    smoothing_scales_arcmin=SMOOTHING_SCALES)\n",
    "\n",
    "print('\\nComputing Gaussian CIB counts...')\n",
    "gauss_cib_counts = compute_peak_minima_counts(\n",
    "    gauss_cib[:N], THRESHOLDS_PEAKS, THRESHOLDS_MINIMA,\n",
    "    smoothing_scales_arcmin=SMOOTHING_SCALES)\n",
    "\n",
    "print('\\nComputing Gaussian tSZ counts...')\n",
    "gauss_tsz_counts = compute_peak_minima_counts(\n",
    "    gauss_tsz[:N], THRESHOLDS_PEAKS, THRESHOLDS_MINIMA,\n",
    "    smoothing_scales_arcmin=SMOOTHING_SCALES)\n",
    "\n",
    "print('Done.')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c93ba44",
   "metadata": {},
   "source": [
    "## 5  Plot peak and minima counts\n",
    "\n",
    "Six-panel grid: three rows (smoothing scales θ_s = 1, 2.5, 5′) × four columns\n",
    "(CIB peaks, CIB minima, tSZ peaks, tSZ minima).  Agora (blue solid), DDPM\n",
    "(orange solid), and Gaussian (green dashed).  Error bands show the standard\n",
    "error on the mean across the N test maps.\n",
    "\n",
    "The key diagnostic is whether the DDPM's peak count distribution falls\n",
    "between the Agora and Gaussian curves (partial non-Gaussianity recovery) or\n",
    "matches Agora (full recovery).  For tSZ at small scales, a prominent peak\n",
    "excess above the Gaussian baseline traces the cluster mass function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "32f0f0c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# ---------------------------------------------------------------------------\n",
    "# Plot: peaks and minima for each smoothing scale, CIB and tSZ\n",
    "# ---------------------------------------------------------------------------\n",
    "fig, axes = plt.subplots(\n",
    "    len(SMOOTHING_SCALES), 4,\n",
    "    figsize=(18, 4 * len(SMOOTHING_SCALES))\n",
    ")\n",
    "\n",
    "sources = [\n",
    "    ('Agora',    'C0', '-',  agora_cib_counts, agora_tsz_counts),\n",
    "    ('DDPM',     'C1', '-',  ddpm_cib_counts,  ddpm_tsz_counts),\n",
    "    ('Gaussian', 'C2', '--', gauss_cib_counts, gauss_tsz_counts),\n",
    "]\n",
    "\n",
    "col_titles = [\n",
    "    'CIB Peaks', 'CIB Minima',\n",
    "    'tSZ Peaks', 'tSZ Minima',\n",
    "]\n",
    "\n",
    "for row, fwhm in enumerate(SMOOTHING_SCALES):\n",
    "    for col, (title, counts_cib, counts_tsz, thresholds) in enumerate([\n",
    "        ('CIB Peaks',  'peaks',  'peaks',  THRESHOLDS_PEAKS),\n",
    "        ('CIB Minima', 'minima', 'minima', THRESHOLDS_MINIMA),\n",
    "        ('tSZ Peaks',  'peaks',  'peaks',  THRESHOLDS_PEAKS),\n",
    "        ('tSZ Minima', 'minima', 'minima', THRESHOLDS_MINIMA),\n",
    "    ]):\n",
    "        ax = axes[row, col]\n",
    "        is_tsz = col >= 2\n",
    "        stat   = 'peaks' if col % 2 == 0 else 'minima'\n",
    "        thresh = THRESHOLDS_PEAKS if col % 2 == 0 else THRESHOLDS_MINIMA\n",
    "\n",
    "        for label, color, ls, cib_c, tsz_c in sources:\n",
    "            counts = tsz_c[fwhm][stat] if is_tsz else cib_c[fwhm][stat]\n",
    "            mean = counts.mean(axis=0)\n",
    "            std  = counts.std(axis=0)\n",
    "            ax.plot(thresh, mean, color=color, ls=ls, lw=1.5, label=label)\n",
    "            ax.fill_between(thresh, mean - std, mean + std,\n",
    "                            alpha=0.2, color=color)\n",
    "\n",
    "        ax.set_xlabel(r'Threshold $\\nu = T/\\sigma$')\n",
    "        ax.set_ylabel('Count per map')\n",
    "        ax.set_title(f'{col_titles[col]} — θ_s = {fwhm}\\'')\n",
    "        if col == 0 and row == 0:\n",
    "            ax.legend(fontsize=9)\n",
    "\n",
    "plt.suptitle('Peak and Minima Counts', fontsize=14, y=1.01)\n",
    "plt.tight_layout()\n",
    "Path('plots').mkdir(exist_ok=True)\n",
    "plt.savefig('plots/peak_minima_counts.pdf', dpi=200, bbox_inches='tight')\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "04446faf",
   "metadata": {},
   "source": [
    "## 6  Residuals: (Agora − DDPM) / σ_Agora\n",
    "\n",
    "Standardised residuals reveal whether any excess in Agora peak counts is\n",
    "reproduced by the DDPM, in units of the Agora sample variance.  Values within\n",
    "±2 indicate agreement at the 2σ level.  Systematic offsets at particular\n",
    "threshold values would identify where the DDPM's cluster population statistics\n",
    "deviate from the AGORA simulation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "47ca1cc8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# ---------------------------------------------------------------------------\n",
    "# Residuals: (Agora - DDPM) / sigma_Agora\n",
    "# ---------------------------------------------------------------------------\n",
    "fig, axes = plt.subplots(len(SMOOTHING_SCALES), 2, figsize=(12, 4 * len(SMOOTHING_SCALES)))\n",
    "\n",
    "for row, fwhm in enumerate(SMOOTHING_SCALES):\n",
    "    for col, (label, agora_c, ddpm_c, thresh) in enumerate([\n",
    "        ('CIB peaks', agora_cib_counts[fwhm]['peaks'],\n",
    "                      ddpm_cib_counts[fwhm]['peaks'],  THRESHOLDS_PEAKS),\n",
    "        ('tSZ peaks', agora_tsz_counts[fwhm]['peaks'],\n",
    "                      ddpm_tsz_counts[fwhm]['peaks'],  THRESHOLDS_PEAKS),\n",
    "    ]):\n",
    "        ax = axes[row, col]\n",
    "        sigma = agora_c.std(axis=0) + 1e-10\n",
    "        resid = (agora_c.mean(axis=0) - ddpm_c.mean(axis=0)) / sigma\n",
    "        ax.plot(thresh, resid, color='C0', lw=1.5)\n",
    "        ax.axhline(0,  color='k',   lw=0.8, ls='--')\n",
    "        ax.axhline(1,  color='gray', lw=0.6, ls=':')\n",
    "        ax.axhline(-1, color='gray', lw=0.6, ls=':')\n",
    "        ax.set_xlabel(r'Threshold $\\nu$')\n",
    "        ax.set_ylabel(r'$(N^\\mathrm{Agora} - N^\\mathrm{DDPM})/\\sigma$')\n",
    "        ax.set_title(f'{label} residual — θ_s = {fwhm}\\'')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.savefig('plots/peak_minima_residuals.pdf', dpi=200, bbox_inches='tight')\n",
    "plt.show()\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.11.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}