preprocessing — Normalisation and patch extraction
==================================================

``preprocessing`` handles everything between raw HEALPix FITS files and
the training-ready ``.npy`` arrays consumed by the model.  It also provides
the inverse normalisation functions needed to convert DDPM samples back to
physical units after generation.

.. rubric:: Pipeline context

The full preprocessing workflow runs across three tutorial notebooks:

1. :doc:`../tutorials/01_halo_catalogue` — build the filtered MDPL2 halo catalogue
2. :doc:`../tutorials/02_masking` — apply point-source and cluster masks
3. :doc:`../tutorials/03_patch_extraction` — extract patches, filter, normalise, save

.. rubric:: Normalisation

Each channel uses its own normalisation scheme and a matching inverse:

.. list-table::
   :header-rows: 1

   * - Channel
     - Function
     - Inverse
     - Normalised values
   * - CIB
     - :func:`apply_maxmin_normalization`
     - :func:`renormalize_dm_maps`
     - [0, 1]
   * - tSZ
     - :func:`apply_stdnorm`
     - :func:`denormalize_dm_maps`
     - z-score (μ=0, σ=1)
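
A minimal NumPy sketch of the two schemes and their inverses follows. The function names are hypothetical stand-ins, not the package's :func:`apply_maxmin_normalization` or :func:`apply_stdnorm`. The sketch also assumes the statistics are computed once over the whole array rather than per patch, and the real functions may differ on either point.

.. code-block:: python

    import numpy as np


    def maxmin_normalise(x):
        """Hypothetical: scale x linearly into [0, 1]; return it with (min, max)."""
        x_min, x_max = float(x.min()), float(x.max())
        return (x - x_min) / (x_max - x_min), (x_min, x_max)


    def maxmin_inverse(y, x_min, x_max):
        """Hypothetical: map values in [0, 1] back to [x_min, x_max]."""
        return y * (x_max - x_min) + x_min


    def zscore_normalise(x):
        """Hypothetical: shift and scale x to zero mean and unit standard deviation."""
        mu, sigma = float(x.mean()), float(x.std())
        return (x - mu) / sigma, (mu, sigma)


    def zscore_inverse(z, mu, sigma):
        """Hypothetical: restore the original mean and spread."""
        return z * sigma + mu


    rng = np.random.default_rng(0)
    cib = rng.lognormal(size=(4, 64, 64))          # stand-in CIB patches, arbitrary units
    tsz = rng.normal(0.0, 5.0, size=(4, 64, 64))   # stand-in tSZ patches, arbitrary units

    cib_n, (lo, hi) = maxmin_normalise(cib)
    tsz_n, (mu, sigma) = zscore_normalise(tsz)

    # Each inverse recovers the original array.
    assert np.allclose(maxmin_inverse(cib_n, lo, hi), cib)
    assert np.allclose(zscore_inverse(tsz_n, mu, sigma), tsz)

Either way, the inverse is only exact if the statistics saved at preprocessing time are the ones passed back in after sampling.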

After sampling, convert DDPM output back to physical units:

.. code-block:: python

    import numpy as np

    from foregrounds_diffusion.preprocessing import denormalize_dm_maps

    samples = np.load("samples.npy")           # (N, 2, 256, 256), channels-first
    norm = np.load("norm_params_2mJy.npy")     # [cib_mean, cib_std, tsz_mean, tsz_std]
    samples_phys = denormalize_dm_maps(samples, *norm)
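
The four statistics in the example above suggest a per-channel z-score inverse. A minimal sketch of that operation on a channels-first batch is shown below. ``denormalise_two_channel`` is a hypothetical name, not the package's :func:`denormalize_dm_maps`, whose signature and behaviour may differ. The sketch also assumes channel 0 is CIB and channel 1 is tSZ, in the same order as the ``norm`` array.

.. code-block:: python

    import numpy as np


    def denormalise_two_channel(samples, cib_mean, cib_std, tsz_mean, tsz_std):
        """Hypothetical: invert a per-channel z-score on an (N, 2, H, W) array.

        Assumes channel 0 is CIB and channel 1 is tSZ.
        """
        # Shape (1, 2, 1, 1) so each channel's statistics broadcast over N, H and W.
        mean = np.array([cib_mean, tsz_mean]).reshape(1, 2, 1, 1)
        std = np.array([cib_std, tsz_std]).reshape(1, 2, 1, 1)
        return samples * std + mean


    norm = np.array([3.0, 0.5, -2.0, 4.0])   # illustrative [cib_mean, cib_std, tsz_mean, tsz_std]
    samples = np.zeros((8, 2, 16, 16))        # all-zero z-scores
    phys = denormalise_two_channel(samples, *norm)

    # A z-score of 0 maps back to each channel's own mean.
    assert np.allclose(phys[:, 0], 3.0) and np.allclose(phys[:, 1], -2.0)

Reshaping the statistics to ``(1, 2, 1, 1)`` lets NumPy broadcasting apply each channel's mean and standard deviation to every sample and pixel at once, without a Python loop.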

.. rubric:: Dataset splitting

:func:`split_data_to_tensors` performs an 80/10/10 train/val/test split,
converts from channels-last ``(N, H, W, C)`` to channels-first
``(N, C, H, W)``, and returns ``torch.Tensor`` objects:

.. code-block:: python

    import numpy as np

    from foregrounds_diffusion.preprocessing import split_data_to_tensors

    data = np.load("CIB_map_150GHz_256_st6_zscore_2mJy_lp.npy")  # (N, H, W, 1)
    train, val, test = split_data_to_tensors(data)
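
The split and axis reordering can be sketched in plain NumPy as below. ``split_channels_first`` is a hypothetical helper. It splits contiguously without shuffling and returns NumPy arrays. By contrast, :func:`split_data_to_tensors` returns ``torch.Tensor`` objects and may shuffle or round differently. Wrapping each part with ``torch.from_numpy`` would produce tensors.

.. code-block:: python

    import numpy as np


    def split_channels_first(data, train_frac=0.8, val_frac=0.1):
        """Hypothetical: contiguous 80/10/10 split of (N, H, W, C) into channels-first parts."""
        # Move the channel axis: (N, H, W, C) -> (N, C, H, W).
        data_cf = np.ascontiguousarray(np.transpose(data, (0, 3, 1, 2)))
        n = data_cf.shape[0]
        n_train = int(train_frac * n)
        n_val = int(val_frac * n)
        train = data_cf[:n_train]
        val = data_cf[n_train:n_train + n_val]
        test = data_cf[n_train + n_val:]   # remainder, so no sample is dropped
        return train, val, test


    data = np.random.default_rng(0).normal(size=(100, 32, 32, 1))
    train, val, test = split_channels_first(data)
    print(train.shape, val.shape, test.shape)
    # (80, 1, 32, 32) (10, 1, 32, 32) (10, 1, 32, 32)

Because the training and validation sizes are truncated to integers, any leftover samples fall into the test set.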

.. rubric:: API

.. automodule:: foregrounds_diffusion.preprocessing
   :members:
   :show-inheritance:
   :member-order: bysource