{ "cells": [ { "cell_type": "markdown", "id": "4a0ae326", "metadata": {}, "source": [ "# 01 — Halo Catalogue\n", "\n", "**Purpose:** Build the filtered MDPL2 halo catalogue used for cluster masking.\n", "\n", "This notebook loads the raw MDPL2 lightcone slice files (`haloslc_rot_*.npz`), filters\n", "to clusters with M500c ≥ 3×10¹⁴ M☉ (the paper's cluster masking threshold), concatenates\n", "all slices into a single catalogue, and saves it as a `.npy` file for use in\n", "`02_masking.ipynb`.\n", "\n", "**Inputs:**\n", "- Raw halo lightcone slices: `haloslc/haloslc_rot_*.npz` (on remote cluster)\n", "\n", "**Outputs:**\n", "- Filtered halo catalogue: `data/halo_catalogue/halo_catalogue_m500gt3e14.npz`\n", "\n", "**Key module functions:** none — this notebook only uses `numpy` and standard I/O.\n", "\n", "**Paper reference:** §2 (cluster masking, M500c ≥ 3×10¹⁴ M☉ threshold)." ] }, { "cell_type": "markdown", "id": "b90d1e43", "metadata": {}, "source": [ "## 1 Configuration\n", "\n", "Key parameters that govern which clusters end up in the catalogue. The mass\n", "threshold M500c ≥ 3×10¹⁴ M☉ corresponds to the cluster population for which\n", "the tSZ signal is detectable at NSIDE = 2048 and whose angular extent is large\n", "enough to bias the DDPM's training distribution if left unmasked (§2.1 of the\n", "paper). The output catalogue is consumed by `get_apodised_mdpl2_cluster_mask`\n", "in notebook 02." ] }, { "cell_type": "code", "execution_count": 5, "id": "9e376dfd", "metadata": {}, "outputs": [], "source": [ "import glob\n", "import numpy as np\n", "from pathlib import Path\n", "\n", "HALO_DIR = \"~/rds/hpc-work/haloslc\"\n", "M500C_THRESHOLD = 3e14 # M_sun — paper's cluster masking threshold\n", "OUT_PATH = \"~/rds/hpc-work/halo_catalogue/halo_catalogue_m500gt3e14\"\n" ] }, { "cell_type": "markdown", "id": "e1d41e90", "metadata": {}, "source": [ "## 2 Discover lightcone slice files\n", "\n", "The MDPL2 halo lightcone is stored as a set of `.npz` slice files, each\n", "covering a narrow comoving distance shell of the simulation box. Sorting by\n", "filename yields slices in order of increasing redshift. The total number of\n", "slices depends on the redshift range of the box; typically ~20–30 slices span\n", "0 < z < 3." ] }, { "cell_type": "code", "execution_count": 6, "id": "2bd2f1e4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found 0 lightcone slice files\n" ] }, { "ename": "IndexError", "evalue": "list index out of range", "output_type": "error", "traceback": [ "\u001b[31m---------------------------------------------------------------------------\u001b[39m", "\u001b[31mIndexError\u001b[39m Traceback (most recent call last)", "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[6]\u001b[39m\u001b[32m, line 6\u001b[39m\n\u001b[32m 3\u001b[39m \u001b[38;5;28mprint\u001b[39m(\u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33mFound \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mlen\u001b[39m(slice_files)\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m lightcone slice files\u001b[39m\u001b[33m\"\u001b[39m)\n\u001b[32m 5\u001b[39m \u001b[38;5;66;03m# Inspect the first file to understand its structure\u001b[39;00m\n\u001b[32m----> \u001b[39m\u001b[32m6\u001b[39m sample_data = np.load(\u001b[43mslice_files\u001b[49m\u001b[43m[\u001b[49m\u001b[32;43m0\u001b[39;49m\u001b[43m]\u001b[49m, allow_pickle=\u001b[38;5;28;01mTrue\u001b[39;00m)\n\u001b[32m 7\u001b[39m \u001b[38;5;28mprint\u001b[39m(\u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33mKeys: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mlist\u001b[39m(sample_data.keys())\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m\"\u001b[39m)\n\u001b[32m 8\u001b[39m first_arr = sample_data[\u001b[38;5;28mlist\u001b[39m(sample_data.keys())[\u001b[32m0\u001b[39m]]\n", "\u001b[31mIndexError\u001b[39m: list index out of range" ] } ], "source": [ "# Each .npz slice covers a thin redshift shell of the MDPL2 lightcone.\n", "# Sorted glob gives slices in order of increasing z for reproducibility.\n", "# Discover all lightcone slice files\n", "slice_files = sorted(glob.glob(f\"{HALO_DIR}/haloslc_rot_*.npz\"))\n", "print(f\"Found {len(slice_files)} lightcone slice files\")\n", "\n", "# Inspect the first file to understand its structure\n", "sample_data = np.load(slice_files[0], allow_pickle=True)\n", "print(f\"Keys: {list(sample_data.keys())}\")\n", "first_arr = sample_data[list(sample_data.keys())[0]]\n", "print(f\"Shape: {first_arr.shape} dtype: {first_arr.dtype}\")\n", "print(\"Columns assumed: ra, dec, z, m200c, m500c, vlos, vtht, vphi\")\n" ] }, { "cell_type": "markdown", "id": "6c4e3be4", "metadata": {}, "source": [ "## 3 Load and filter halos\n", "\n", "Iterate over every slice, extract the halo arrays, and keep only those with\n", "M500c above the threshold. Each `.npz` file stores columns in a fixed order;\n", "the key fields are right ascension (deg), declination (deg), M500c (M☉), and\n", "redshift z. Memory usage is modest because we discard the low-mass majority\n", "immediately after reading each slice." ] }, { "cell_type": "code", "execution_count": 7, "id": "f451fd99", "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "need at least one array to concatenate", "output_type": "error", "traceback": [ "\u001b[31m---------------------------------------------------------------------------\u001b[39m", "\u001b[31mValueError\u001b[39m Traceback (most recent call last)", "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[7]\u001b[39m\u001b[32m, line 11\u001b[39m\n\u001b[32m 8\u001b[39m mask = m500c_col >= M500C_THRESHOLD\n\u001b[32m 9\u001b[39m all_halos.append(arr[mask])\n\u001b[32m---> \u001b[39m\u001b[32m11\u001b[39m catalogue = \u001b[43mnp\u001b[49m\u001b[43m.\u001b[49m\u001b[43mconcatenate\u001b[49m\u001b[43m(\u001b[49m\u001b[43mall_halos\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43maxis\u001b[49m\u001b[43m=\u001b[49m\u001b[32;43m0\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[32m 12\u001b[39m \u001b[38;5;28mprint\u001b[39m(\u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33mTotal halos with M500c >= \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mM500C_THRESHOLD\u001b[38;5;132;01m:\u001b[39;00m\u001b[33m.0e\u001b[39m\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m M_sun: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mlen\u001b[39m(catalogue)\u001b[38;5;132;01m:\u001b[39;00m\u001b[33m,\u001b[39m\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m\"\u001b[39m)\n\u001b[32m 13\u001b[39m \u001b[38;5;28mprint\u001b[39m(\u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33mCatalogue shape: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mcatalogue.shape\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m\"\u001b[39m)\n", "\u001b[31mValueError\u001b[39m: need at least one array to concatenate" ] } ], "source": [ "# Load every slice, filter by M500c >= threshold, and concatenate\n", "all_halos = []\n", "for fpath in slice_files:\n", " data = np.load(fpath, allow_pickle=True)\n", " arr = data[list(data.keys())[0]] # shape (N_halos, 8)\n", " # column layout: ra, dec, z, m200c, m500c, vlos, vtht, vphi\n", " m500c_col = arr[:, 4]\n", " mask = m500c_col >= M500C_THRESHOLD\n", " all_halos.append(arr[mask])\n", " \n", "catalogue = np.concatenate(all_halos, axis=0)\n", "print(f\"Total halos with M500c >= {M500C_THRESHOLD:.0e} M_sun: {len(catalogue):,}\")\n", "print(f\"Catalogue shape: {catalogue.shape}\")\n", "print(f\"Redshift range: {catalogue[:, 2].min():.3f} – {catalogue[:, 2].max():.3f}\")\n", "print(f\"M500c range: {catalogue[:, 4].min():.2e} – {catalogue[:, 4].max():.2e} M_sun\")\n" ] }, { "cell_type": "markdown", "id": "72c2413e", "metadata": {}, "source": [ "## 4 Save filtered catalogue\n", "\n", "Write the concatenated catalogue to a single `.npy` array for fast reloading\n", "downstream. The shape is `(N_clusters, 4)` with columns `[RA_deg, Dec_deg,\n", "M500c_Msun, z]`. On a typical MDPL2 run the 3×10¹⁴ M☉ cut leaves ≈ 500–800\n", "clusters within the AGORA footprint." ] }, { "cell_type": "code", "execution_count": null, "id": "064268d8", "metadata": {}, "outputs": [], "source": [ "Path(OUT_PATH).parent.mkdir(parents=True, exist_ok=True)\n", "np.save(OUT_PATH, catalogue)\n", "print(f\"Saved catalogue → {OUT_PATH}\")\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.13" } }, "nbformat": 4, "nbformat_minor": 5 }