Calibration
===========

MNiShed calibration is built around :func:`~mnished.run_and_score`,
which runs the model with a given parameter set, applies optional Nash-cascade
routing and a scoring window, and returns a :class:`~mnished.CalibResult`
containing all goodness-of-fit metrics.  Integration with an external optimizer
(e.g. `Dakota <https://dakota.sandia.gov>`_) is achieved by writing a thin
driver that reads parameter values from the optimizer's output file and calls
``run_and_score`` with those values.

.. contents:: On this page
   :local:
   :depth: 2

run_and_score()
---------------

.. autofunction:: mnished.run_and_score
   :no-index:

CalibResult
-----------

.. autoclass:: mnished.CalibResult
   :no-inherited-members:
   :no-index:

The fields of ``CalibResult`` are:

.. list-table::
   :widths: 20 15 55
   :header-rows: 1

   * - Field
     - Type
     - Description
   * - ``score``
     - float
     - Composite goodness-of-fit score from the metric chosen in
       ``metric=``.  Higher is better (all metrics return higher values
       for better fits).  ``np.nan`` if no observed discharge falls in
       the scoring window.
   * - ``aic``
     - float
     - Akaike Information Criterion computed on log-transformed discharge.
       Lower is better.  See :ref:`aic-model-comparison`.
   * - ``bfi_obs``
     - float
     - Observed baseflow index (Eckhardt recursive-filter method,
       α = 0.98, BFI_max = 0.80).
   * - ``bfi_mod``
     - float
     - Modeled baseflow index, computed identically to ``bfi_obs``.
   * - ``kge_logfdc``
     - float
     - KGE on the log-transformed flow-duration curve.  Sensitive to
       reservoir partitioning and long-term storage behaviour.
   * - ``fdc_obs``
     - pd.Series
     - Observed FDC at 99 exceedance percentiles (log-space).
   * - ``fdc_mod``
     - pd.Series
     - Modeled FDC at 99 exceedance percentiles (log-space).
   * - ``final_states``
     - dict
     - Reservoir water depths (mm) at the end of the run.  Pass as
       ``initial_states=`` in the next decade run to chain simulations.
   * - ``buckets``
     - :class:`~mnished.Buckets`
     - The fully initialised and run model object.  Access
       ``result.buckets.hydrodata`` for the full simulated time series.

Goodness-of-Fit Metrics
------------------------

The ``metric`` argument to :func:`~mnished.run_and_score` selects the
composite score maximised during calibration and returned as
``CalibResult.score``.  The AIC is always computed and returned separately
regardless of the chosen metric.

Available metrics
^^^^^^^^^^^^^^^^^

.. list-table::
   :widths: 30 15 45
   :header-rows: 1

   * - ``metric`` key
     - Score range
     - Best suited for
   * - ``KGE``
     - −∞ to 1
     - Overall hydrograph shape; balanced sensitivity to timing, magnitude,
       and variability.
   * - ``NSE``
     - −∞ to 1
     - Peak-flow emphasis; dominated by high-flow events.
   * - ``logKGE``
     - −∞ to 1
     - Low-flow and baseflow behaviour; down-weights large peak events.
   * - ``KGE_logKGE``
     - −∞ to 1
     - Balanced calibration target: average of KGE and logKGE.
       Recommended for most temperate catchments.
   * - ``KGE_logKGE_logFDC``
     - −∞ to 1
     - As above but also includes the flow-duration curve.  Useful
       when the full range of flows is of interest.
   * - ``KGE_logKGE_logFDC_BFI``
     - −∞ to 1
     - As above plus a baseflow-index match term
       (:math:`1 - |\text{BFI}_\text{mod}/\text{BFI}_\text{obs} - 1|`),
       averaged over all four.  Use when the groundwater partitioning
       (baseflow fraction) must be matched alongside the hydrograph shape.
   * - ``logKGE_logFDC_BFI``
     - −∞ to 1
     - Low-flow focus: averages logKGE, the log-FDC KGE, and the
       baseflow-index match, with no high-flow KGE term.  Use for
       baseflow- and drought-dominated studies.

**Recommendation:** ``KGE_logKGE`` is the default and recommended metric for
calibrating multi-reservoir models in temperate climates.  It weights peak and
low flow equally and does not over-emphasise the largest flood events the way
NSE or raw KGE do.  When the flow-duration curve (groundwater contribution
to dry-season flow) is a specific focus, use ``KGE_logKGE_logFDC``.

Baseflow index (BFI) is reported in all ``CalibResult`` objects as a
diagnostic regardless of the chosen calibration metric.  Use it to verify
that the modeled groundwater contribution is physically plausible.

.. _aic-model-comparison:

AIC and model comparison
^^^^^^^^^^^^^^^^^^^^^^^^

The Akaike Information Criterion allows comparison of models with different
numbers of free parameters:

.. math::

    \text{AIC} = n \ln\!\left(\frac{\text{RSS}_{\ln Q}}{n}\right) + 2k

where :math:`n` is the number of scored days, :math:`\text{RSS}_{\ln Q}` is the
residual sum of squares on log-transformed discharge, and :math:`k` is the count
of free calibration parameters.  Lower AIC indicates a better trade-off between
fit quality and model complexity.

The AIC is evaluated on **log-transformed discharge** because the residuals of
raw discharge are dominated by large flood events, which gives a misleading
picture of overall model skill.  Log-transformation gives roughly equal weight
to all flow magnitudes.

**Counting free parameters (k):**
Only parameters that are actively calibrated (not fixed) contribute to :math:`k`.
In ``run_and_score``, :math:`k` is accumulated automatically for each parameter
that is passed as a non-trivial argument (e.g. each element of ``recession_coeff`` and
``f_to_discharge`` that is free).  Fixed structural choices (e.g. ``b_deep = 1``
by convention, not optimization) do not add to :math:`k`.

Some arguments are lists that may hold a mix of fixed and calibrated entries
— the junction parameters (``leakance_R``, ``H_threshold``), the multipath
drain (``multipath_threshold``, ``multipath_timescale``), and the
``recession_exponents``.  For these, the free count cannot be inferred from
the values alone, so it is given explicitly through a companion
``*_calibrated`` argument.  Set ``leakance_R_calibrated``,
``H_threshold_calibrated``, ``multipath_calibrated``, and
``recession_exponents_calibrated`` to the number of entries in the
corresponding list that the optimizer is free to vary; only that many are
added to :math:`k`.

The ET scaling factor added by ``enforce_water_balance='water-year'`` is a
hidden degree of freedom: one multiplier per water year is computed from the
data, effectively adding :math:`N_{\text{years}}` free parameters.
``enforce_water_balance='global'`` computes a single multiplier for the full
record, adding only 1 degree of freedom (not counted in :math:`k` because it is
derived analytically from the data, not optimized).  For multi-model AIC
comparisons, use ``enforce_water_balance='global'`` to avoid this hidden
per-year penalty.

**Interpreting ΔAIC:**

.. list-table::
   :widths: 20 60
   :header-rows: 1

   * - :math:`\Delta\text{AIC}`
     - Interpretation
   * - 0–2
     - Substantial support for both models; the simpler one is preferred
       on parsimony grounds.
   * - 2–4
     - Moderate evidence favouring the lower-AIC model.
   * - 4–10
     - Considerably less support for the higher-AIC model.
   * - > 10
     - The higher-AIC model is effectively ruled out.

Practical Calibration Workflow
-------------------------------

Scoring window
^^^^^^^^^^^^^^

By default, ``run_and_score`` scores over all time steps that have observed
discharge.  Use ``start`` and ``end`` (YYYY-MM-DD strings or ``datetime``
objects) to restrict the scoring window:

.. code-block:: python

    result = run_and_score(
        'config.yml',
        recession_coeff=[671, 22, 50881],
        f_to_discharge=[0.553, 0.446, 1.0],
        recession_exponents=[4.62, 2.203, 1.0],
        start='1993-01-01',   # exclude first two years of spin-up uncertainty
        end='2011-09-30',
    )

Leaving a 1–2 year buffer at the start of the record is good practice when the
deep reservoir is slow (:math:`\tau` ≫ record length) and spin-up may not fully
equilibrate it.

Spin-up
^^^^^^^

``spin_up_cycles=None`` (the default) triggers automatic calculation:
``ceil(τ_max / record_length)``.  This is adequate when initial conditions are
set to the analytical steady-state (the default behaviour in ``run_and_score``).
Set ``spin_up_cycles=0`` when chaining decade runs — the ``final_states`` from
the previous run supply well-initialised starting conditions, so additional
spin-up is not needed.

Nash-cascade routing
^^^^^^^^^^^^^^^^^^^^

The model outputs point-source discharge from each reservoir.  To simulate
channel travel time (appropriate for large basins), ``routing_N`` and
``routing_K`` apply Nash-cascade attenuation after the reservoir cascade:

.. code-block:: python

    result = run_and_score(
        'config.yml',
        ...,
        routing_N=2,     # number of routing reservoirs (integer ≥ 1)
        routing_K=0.5,   # routing timescale (days)
    )

For small catchments (<500 km²), routing adds negligible improvement and
should be omitted.  For large basins, fit one routing step (N=1–3) with a
timescale of order (basin_length / wave_speed).  Routing parameters are
calibration parameters and add to :math:`k` if non-trivial.

Chaining decade runs
^^^^^^^^^^^^^^^^^^^^

To simulate non-stationary behaviour (changing land use, long-term climate
trends), chain runs across decades, passing end-state storage from one decade
as the initial state of the next:

.. code-block:: python

    result_d1 = run_and_score('config_1991_2000.yml', ..., spin_up_cycles=1)
    result_d2 = run_and_score(
        'config_2001_2010.yml',
        ...,
        initial_states=result_d1.final_states,
        spin_up_cycles=0,   # no spin-up — initial states already equilibrated
    )

The ``final_states`` dict has keys matching ``reservoir_order`` in the driver
config (e.g. ``{'soil': H_soil, 'intermediate': H_int, 'deep': H_deep}``).

Calibrating initial storage after spin-up
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``initial_states`` sets the reservoir storage *before* spin-up, which then
re-equilibrates it.  When a decade's initial condition is itself poorly
constrained — for example, when little forcing is available ahead of the
scored window to drive a meaningful spin-up — it can instead be treated as a
calibration target.  ``post_spinup_states`` injects reservoir depths *after*
spin-up completes and *before* the scored run begins, overriding the spin-up
end state:

.. code-block:: python

    result = run_and_score(
        'config.yml',
        ...,
        start='2001-01-01', end='2010-12-31',                  # decade mode
        post_spinup_states={'reservoirs': [None, None, H0_deep]},  # H0_deep from optimizer
        post_spinup_k=1,                                       # one free initial-storage parameter
    )

Only the ``'reservoirs'`` list is used, and any entry left as ``None`` keeps
that reservoir at its spin-up end state — here, only the deep reservoir's
initial storage is calibrated.  ``post_spinup_states`` is applied only in
decade mode (when ``start`` is set) and is ignored for a full-record run.
Set ``post_spinup_k`` to the number of injected depths the optimizer varies,
so they are counted as degrees of freedom in the AIC.

Suggested Parameter Sets
-------------------------

The following configurations are illustrative starting points derived from
calibration studies on temperate agricultural catchments in the upper Midwest
(USA).  They should be treated as starting points for a calibration, not as
universal defaults — the correct parameter values depend on the specific basin.

Three-reservoir temperate agricultural basin
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This structure (soil → intermediate → deep) is appropriate for glaciated
lowland catchments with tile drainage, shallow Quaternary aquifers, and
regional bedrock groundwater.  It is the recommended starting structure for
AIC-based exploration.

.. code-block:: yaml

    reservoirs:
        recession_coefficients:
            - 671      # soil: calibrate; log10 bounds [1.0, 4.5]
            - 22       # intermediate: calibrate; log10 bounds [1.0, 4.5]
            - 50000    # deep: calibrate; log10 bounds [3.5, 5.0]
        exfiltration_fractions:
            - 0.55     # soil → stream vs. percolating to intermediate
            - 0.45     # intermediate → stream vs. recharging deep
            - 1.0      # deep → all to stream
        maximum_effective_depths__mm:
            - .inf
            - .inf
            - .inf
        recession_exponents:
            - 4.62     # soil: calibrate [1.5, 6.0]; encodes tile-drain nonlinearity
            - 2.203    # intermediate: fix at B–N lower-envelope value (basin-specific)
            - 1.0      # deep: fix at 1.0 (linear; confined aquifer)

    snowmelt:
        PDD_melt_factor: 5.1   # mm SWE °C⁻¹ day⁻¹; calibrate [0.1, 10.0]
        snow_insulation_k: 0.0
        fdd_threshold: .inf    # frozen ground disabled; b_soil absorbs the signal

    modules:
        snowpack:          true
        frozen_ground:     false   # b_soil > 1 absorbs spring-pulse signal
        rain_on_snow:      true
        direct_runoff:     false
        dtr_fgi_decay:     false
        et_water_stress:   false
        et_reservoir_draw: true

    general:
        enforce_water_balance: global   # single WB multiplier; no hidden per-year d.o.f.
        et_alpha: 1.0

**Key design choices and rationale:**

- ``frozen_ground: false`` — the calibrated soil nonlinearity (:math:`b \approx 3`–5
  for tile-drained basins) reproduces the same spring fast-flow signal as frozen
  infiltration blockage.  Activating both simultaneously leads to equifinality.
  Use frozen ground only when independent soil temperature or frost-depth data
  warrant it; see :doc:`model_description` for the equifinality argument.

- ``recession_b_intermediate`` fixed at the basin-specific Brutsaert–Nieber
  lower-envelope value — the B–N lower envelope represents the slowest observable
  recession, which corresponds to the intermediate Quaternary zone (MRT of days to
  weeks).  See :doc:`recession_analysis` for the derivation and
  :class:`~mnished.BrutsaertNieber` for how to compute the estimate.

- ``recession_b_deep = 1.0`` fixed — the deep reservoir (MRT of decades to
  centuries) produces discharge too small and too slow to constrain :math:`b`
  from streamflow.  The linear approximation (Darcy flow in a confined or
  semi-confined matrix) is physically appropriate.

- ``enforce_water_balance: global`` — a single ET multiplier applied to the
  full record.  Recommended for AIC-based model comparison because
  ``'water-year'`` mode adds one hidden degree of freedom per year.

- ``et_reservoir_draw: true`` — ET drawn from post-cascade soil storage, giving
  a temporal buffer equal to the soil MRT without any additional free parameter.

**Free parameters for this structure (k = 7):**

.. list-table::
   :widths: 35 20 35
   :header-rows: 1

   * - Parameter
     - Suggested bounds
     - Notes
   * - ``PDD_melt_factor``
     - [0.1, 10.0]
     - Sensitive to climate; 1–6 mm SWE °C⁻¹ d⁻¹ typical
   * - ``log10(τ_soil)``
     - [1.0, 4.5]
     - Unconstrained; MRT more informative than raw τ
   * - ``recession_b_soil``
     - [1.5, 6.0]
     - High b encodes tile drainage or surface nonlinearity
   * - ``f_exfiltration_soil``
     - [0.01, 0.99]
     - —
   * - ``log10(τ_intermediate)``
     - [1.0, 4.5]
     - Unconstrained; allows model to self-discover timescale ordering
   * - ``f_exfiltration_intermediate``
     - [0.01, 0.99]
     - —
   * - ``log10(τ_deep)``
     - [3.5, 5.0]
     - ~1000–100,000 days; well-constrained range for bedrock GW

.. _tile-drain-degeneracy:

Tile drain parameter degeneracy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When the explicit tile-drain module (``f_tile``, ``tau_tile``) is used instead of
letting :math:`b_{\text{soil}} > 1` absorb tile-drain signals implicitly, a
steady-state parameter degeneracy limits interpretation.

**At steady state**, the tile-drain flux is independent of ``tau_tile``:

.. math::

   Q_{\text{tile}} = f_{\text{tile}} \cdot Q_{\text{to\_next}}

where :math:`Q_{\text{to\_next}}` is the downward percolation from the soil
reservoir.  The tile sub-reservoir reaches steady state with ``tau_tile``
controlling only the *lag* (and thus the shape of the recession limb), not
the long-term mean flux.

**Consequence for calibration**: a water-balance calibration that targets
decadal-mean fluxes cannot distinguish ``f_tile`` from the product
:math:`f_{\text{tile}} \cdot \tau_{\text{tile}}`.  Specifically:

.. math::

   f_{\text{tile}} \cdot \tau_{\text{tile}} \;\propto\; A_t

where :math:`A_t` is the areal fraction of the watershed covered by tile drains
(Hooghoudt 1940).  The calibrated ``f_tile`` alone encodes both areal extent
*and* the inverse of drain spacing squared — it is not a clean estimate of
tiled area unless ``tau_tile`` is independently constrained from site data
(e.g. drain spacing surveys, water-table monitoring, or hydrograph recession
analysis targeting the tile-drain recession component).

**Time-varying interpretation**: agricultural tile drainage in the Upper Midwest
expanded and intensified progressively through the 20th century.  Two physical
processes drive this:

- **Areal expansion** (new tiling): increases ``f_tile`` — more of the watershed
  routes water through drains.
- **Intensification** (closer drain spacing): decreases ``tau_tile`` ∝ drain
  spacing² (Hooghoudt 1940), accelerating the tile-drain response without
  necessarily changing ``f_tile``.

A per-decade transient calibration that holds ``tau_tile`` fixed and varies only
``f_tile`` therefore conflates these two processes: a trend in calibrated
``f_tile`` may reflect new tiling, closer spacing, or both.  Interpreting the
trend physically requires additional constraint on drain spacing or tile
installation records.

**Practical guidance**: unless detailed tile-drain records are available,
treat calibrated ``f_tile`` as a bulk "effective tile-drain efficiency" rather
than an areal coverage fraction.  Document ``tau_tile`` as a fixed structural
assumption when reporting results.

**Breaking the degeneracy — a path to physical interpretability**: if
``tau_tile`` can be independently constrained (from tile-drain mapping,
water-table monitoring, or recession decomposition), the degeneracy collapses
and ``f_tile`` recovers its intended physical meaning — the areal fraction of
the catchment that drains through tiles.  A calibration run that fixes
``tau_tile`` at a physically motivated value and calibrates only ``f_tile``
per decade would then produce a direct, comparable time series of effective
tile-drain coverage, with trend slopes interpretable as rates of agricultural
drainage expansion.  This is a scientifically meaningful improvement over the
implicit nonlinear-:math:`b` proxy approach, provided the ``tau_tile``
constraint is defensible.

Note that the parameter names ``f_tile`` and ``tau_tile`` in the model code
retain the same names regardless of whether they are calibrated or externally
constrained.  When ``tau_tile`` is fixed, ``f_tile`` gains the interpretation
of areal coverage; when both are free, they are individually unidentifiable
from water-balance data alone.  The documentation (or the params.yml
``description`` field) should state which regime applies for a given run.

Two-reservoir baseline
^^^^^^^^^^^^^^^^^^^^^^

A simpler structure for data-sparse basins or as a structural null hypothesis
before adding a third reservoir.  AIC comparison against the three-reservoir
structure indicates whether the additional reservoir is warranted.

.. code-block:: yaml

    reservoirs:
        recession_coefficients:
            - 30       # soil/shallow: calibrate
            - 5000     # groundwater: calibrate
        exfiltration_fractions:
            - 0.6      # calibrate
            - 1.0
        maximum_effective_depths__mm:
            - .inf
            - .inf
        recession_exponents:
            - 2.0      # soil: starting point from B–N; calibrate
            - 1.0      # groundwater: fix linear

    general:
        enforce_water_balance: global

This structure has k = 5 free parameters (τ × 2, f × 1, b_soil × 1,
PDD_melt_factor × 1).  If the three-reservoir structure gives ΔAIC < 2 compared
to this baseline, the additional reservoir is not justified by the data.