gwtransport#

gwtransport: A Python package for solving one-dimensional groundwater transport problems.

advection#

Advection Analysis for 1D Aquifer Systems.

This module provides functions to analyze compound transport by advection in aquifer systems. It includes tools for computing concentrations of the extracted water based on the concentration of the infiltrating water, extraction data and aquifer properties.

The model assumes requires the groundwaterflow to be reduced to a 1D system. On one side, water with a certain concentration infiltrates (‘cin’), the water flows through the aquifer and the compound of interest flows through the aquifer with a retarded velocity. The water is extracted (‘cout’).

Main functions: - infiltration_to_extraction: Compute the concentration of the extracted water by shifting cin with its residence time. This corresponds to a convolution operation. - gamma_infiltration_to_extraction: Similar to infiltration_to_extraction, but for a gamma distribution of aquifer pore volumes. - distribution_infiltration_to_extraction: Similar to infiltration_to_extraction, but for an arbitrary distribution of aquifer pore volumes.

gwtransport.advection.infiltration_to_extraction(cin_series, flow_series, aquifer_pore_volume, retardation_factor=1.0, cout_index='cin')[source]#

Compute the concentration of the extracted water by shifting cin with its residence time.

The compound is retarded in the aquifer with a retardation factor. The residence time is computed based on the flow rate of the water in the aquifer and the pore volume of the aquifer.

This function represents infiltration to extraction modeling (equivalent to convolution).

Parameters:
  • cin_series (pandas.Series) – Concentration of the compound in the infiltrating water [ng/m3]. The cin_series should be the average concentration of a time bin. The index should be a pandas.DatetimeIndex and is labeled at the end of the time bin.

  • flow_series (pandas.Series) – Flow rate of water in the aquifer [m3/day]. The flow_series should be the average flow of a time bin. The index should be a pandas.DatetimeIndex and is labeled at the end of the time bin.

  • aquifer_pore_volume (float) – Pore volume of the aquifer [m3].

  • cout_index (str, optional) – The index of the output series. Can be ‘cin’, ‘flow’, or ‘cout’. Default is ‘cin’. - ‘cin’: The output series will have the same index as cin_series. - ‘flow’: The output series will have the same index as flow_series. - ‘cout’: The output series will have the same index as cin_series + residence_time.

Returns:

Concentration of the compound in the extracted water [ng/m3]. Returns pandas.Series when cout_index is ‘cin’ or ‘flow’, or numpy.ndarray when cout_index is ‘cout’.

Return type:

pandas.Series or numpy.ndarray

Examples

Basic usage with single pore volume:

>>> import pandas as pd
>>> import numpy as np
>>> from gwtransport.advection import infiltration_to_extraction
>>>
>>> # Create input data
>>> dates = pd.date_range(start="2020-01-01", end="2020-01-10", freq="D")
>>> cin = pd.Series(np.ones(len(dates)), index=dates)
>>> flow = pd.Series(np.ones(len(dates)) * 100, index=dates)  # 100 m3/day
>>>
>>> # Single aquifer pore volume
>>> aquifer_pore_volume = 500.0  # m3
>>>
>>> # Run infiltration to extraction model (default returns on cin index)
>>> cout = infiltration_to_extraction(cin, flow, aquifer_pore_volume)
>>> len(cout) == len(cin)
True

Different output index options:

>>> # Output on cin index (default)
>>> cout_cin = infiltration_to_extraction(
...     cin, flow, aquifer_pore_volume, cout_index="cin"
... )
>>>
>>> # Output on flow index
>>> cout_flow = infiltration_to_extraction(
...     cin, flow, aquifer_pore_volume, cout_index="flow"
... )
>>>
>>> # Output on shifted time index (cin + residence_time)
>>> cout_shifted = infiltration_to_extraction(
...     cin, flow, aquifer_pore_volume, cout_index="cout"
... )

With retardation factor:

>>> # Compound moves twice as slowly due to sorption
>>> cout = infiltration_to_extraction(
...     cin, flow, aquifer_pore_volume, retardation_factor=2.0
... )
gwtransport.advection.extraction_to_infiltration(cout, flow, aquifer_pore_volume, retardation_factor=1.0, resample_dates=None)[source]#

Compute the concentration of the infiltrating water by shifting cout with its residence time.

This function represents extraction to infiltration modeling (equivalent to deconvolution).

Parameters:
  • cout (pandas.Series) – Concentration of the compound in the extracted water [ng/m3].

  • flow (pandas.Series) – Flow rate of water in the aquifer [m3/day].

  • aquifer_pore_volume (float) – Pore volume of the aquifer [m3].

Returns:

Concentration of the compound in the infiltrating water [ng/m3].

Return type:

numpy.ndarray

gwtransport.advection.gamma_infiltration_to_extraction(*, cin, flow, tedges, cout_tedges, alpha=None, beta=None, mean=None, std=None, n_bins=100, retardation_factor=1.0)[source]#

Compute the concentration of the extracted water by shifting cin with its residence time.

The compound is retarded in the aquifer with a retardation factor. The residence time is computed based on the flow rate of the water in the aquifer and the pore volume of the aquifer. The aquifer pore volume is approximated by a gamma distribution, with parameters alpha and beta.

This function represents infiltration to extraction modeling (equivalent to convolution).

Provide either alpha and beta or mean and std.

Parameters:
  • cin (pandas.Series) – Concentration of the compound in infiltrating water or temperature of infiltrating water.

  • cin_tedges (pandas.DatetimeIndex) – Time edges for the concentration data. Used to compute the cumulative concentration. Has a length of one more than cin.

  • cout_tedges (pandas.DatetimeIndex) – Time edges for the output data. Used to compute the cumulative concentration. Has a length of one more than flow.

  • flow (pandas.Series) – Flow rate of water in the aquifer [m3/day].

  • flow_tedges (pandas.DatetimeIndex) – Time edges for the flow data. Used to compute the cumulative flow. Has a length of one more than flow.

  • alpha (float, optional) – Shape parameter of gamma distribution of the aquifer pore volume (must be > 0)

  • beta (float, optional) – Scale parameter of gamma distribution of the aquifer pore volume (must be > 0)

  • mean (float, optional) – Mean of the gamma distribution.

  • std (float, optional) – Standard deviation of the gamma distribution.

  • n_bins (int) – Number of bins to discretize the gamma distribution.

  • retardation_factor (float) – Retardation factor of the compound in the aquifer.

Returns:

Concentration of the compound in the extracted water [ng/m3] or temperature.

Return type:

numpy.ndarray

Examples

Basic usage with alpha and beta parameters:

>>> import pandas as pd
>>> import numpy as np
>>> from gwtransport.utils import compute_time_edges
>>> from gwtransport.advection import gamma_infiltration_to_extraction
>>>
>>> # Create input data with aligned time edges
>>> dates = pd.date_range(start="2020-01-01", end="2020-01-20", freq="D")
>>> tedges = compute_time_edges(
...     tedges=None, tstart=None, tend=dates, number_of_bins=len(dates)
... )
>>>
>>> # Create output time edges (can be different alignment)
>>> cout_dates = pd.date_range(start="2020-01-05", end="2020-01-15", freq="D")
>>> cout_tedges = compute_time_edges(
...     tedges=None, tstart=None, tend=cout_dates, number_of_bins=len(cout_dates)
... )
>>>
>>> # Input concentration and flow (same length, aligned with tedges)
>>> cin = pd.Series(np.ones(len(dates)), index=dates)
>>> flow = pd.Series(np.ones(len(dates)) * 100, index=dates)  # 100 m3/day
>>>
>>> # Run gamma_infiltration_to_extraction with alpha/beta parameters
>>> cout = gamma_infiltration_to_extraction(
...     cin=cin,
...     cin_tedges=tedges,
...     cout_tedges=cout_tedges,
...     flow=flow,
...     flow_tedges=tedges,  # Must be identical to cin_tedges
...     alpha=10.0,
...     beta=10.0,
...     n_bins=5,
... )
>>> cout.shape
(11,)

Using mean and std parameters instead:

>>> cout = gamma_infiltration_to_extraction(
...     cin=cin,
...     cin_tedges=tedges,
...     cout_tedges=cout_tedges,
...     flow=flow,
...     flow_tedges=tedges,
...     mean=100.0,
...     std=20.0,
...     n_bins=5,
... )

With retardation factor:

>>> cout = gamma_infiltration_to_extraction(
...     cin=cin,
...     cin_tedges=tedges,
...     cout_tedges=cout_tedges,
...     flow=flow,
...     flow_tedges=tedges,
...     alpha=10.0,
...     beta=10.0,
...     retardation_factor=2.0,  # Doubles residence time
... )
gwtransport.advection.gamma_extraction_to_infiltration(cout, flow, alpha, beta, n_bins=100, retardation_factor=1.0)[source]#

Compute the concentration of the infiltrating water by shifting cout with its residence time.

This function represents extraction to infiltration modeling (equivalent to deconvolution).

Parameters:
  • cout (pandas.Series) – Concentration of the compound in the extracted water [ng/m3].

  • flow (pandas.Series) – Flow rate of water in the aquifer [m3/day].

  • alpha (float) – Shape parameter of gamma distribution of the aquifer pore volume (must be > 0)

  • beta (float) – Scale parameter of gamma distribution of the aquifer pore volume (must be > 0)

  • n_bins (int) – Number of bins to discretize the gamma distribution.

  • retardation_factor (float) – Retardation factor of the compound in the aquifer.

Returns:

This function is not yet implemented.

Return type:

NotImplementedError

gwtransport.advection.distribution_infiltration_to_extraction(*, cin, flow, tedges, cout_tedges, aquifer_pore_volumes, retardation_factor=1.0)[source]#

Compute the concentration of the extracted water using flow-weighted advection.

This function implements an infiltration to extraction advection model where cin and flow values correspond to the same aligned time bins defined by tedges.

The algorithm: 1. Computes residence times for each pore volume at cout time edges 2. Calculates infiltration time edges by subtracting residence times 3. Determines temporal overlaps between infiltration and cin time windows 4. Creates flow-weighted overlap matrices normalized by total weights 5. Computes weighted contributions and averages across pore volumes

Parameters:
  • cin (ArrayLike) – Concentration values of infiltrating water or temperature [concentration units]. Length must match the number of time bins defined by tedges.

  • flow (ArrayLike) – Flow rate values in the aquifer [m3/day]. Length must match cin and the number of time bins defined by tedges.

  • tedges (pandas.DatetimeIndex) – Time edges defining bins for both cin and flow data. Has length of len(cin) + 1 and len(flow) + 1.

  • cout_tedges (pandas.DatetimeIndex) – Time edges for output data bins. Has length of desired output + 1. Can have different time alignment and resolution than tedges.

  • aquifer_pore_volumes (ArrayLike) – Array of aquifer pore volumes [m3] representing the distribution of residence times in the aquifer system.

  • retardation_factor (float, optional) – Retardation factor of the compound in the aquifer (default 1.0). Values > 1.0 indicate slower transport due to sorption/interaction.

Returns:

Flow-weighted concentration in the extracted water. Same units as cin. Length equals len(cout_tedges) - 1. NaN values indicate time periods with no valid contributions from the infiltration data.

Return type:

numpy.ndarray

Raises:

ValueError – If tedges length doesn’t match cin/flow arrays plus one, or if infiltration time edges become non-monotonic (invalid input conditions).

Examples

Basic usage with pandas Series:

>>> import pandas as pd
>>> import numpy as np
>>> from gwtransport.utils import compute_time_edges
>>> from gwtransport.advection import distribution_infiltration_to_extraction
>>>
>>> # Create input data
>>> dates = pd.date_range(start="2020-01-01", end="2020-01-20", freq="D")
>>> tedges = compute_time_edges(
...     tedges=None, tstart=None, tend=dates, number_of_bins=len(dates)
... )
>>>
>>> # Create output time edges (different alignment)
>>> cout_dates = pd.date_range(start="2020-01-05", end="2020-01-15", freq="D")
>>> cout_tedges = compute_time_edges(
...     tedges=None, tstart=None, tend=cout_dates, number_of_bins=len(cout_dates)
... )
>>>
>>> # Input concentration and flow
>>> cin = pd.Series(np.ones(len(dates)), index=dates)
>>> flow = pd.Series(np.ones(len(dates)) * 100, index=dates)  # 100 m3/day
>>>
>>> # Define distribution of aquifer pore volumes
>>> aquifer_pore_volumes = np.array([50, 100, 200])  # m3
>>>
>>> # Run distribution_infiltration_to_extraction
>>> cout = distribution_infiltration_to_extraction(
...     cin=cin,
...     flow=flow,
...     tedges=tedges,
...     cout_tedges=cout_tedges,
...     aquifer_pore_volumes=aquifer_pore_volumes,
... )
>>> cout.shape
(11,)

Using array inputs instead of pandas Series:

>>> # Convert to arrays
>>> cin_values = cin.values
>>> flow_values = flow.values
>>>
>>> cout = distribution_infiltration_to_extraction(
...     cin=cin_values,
...     flow=flow_values,
...     tedges=tedges,
...     cout_tedges=cout_tedges,
...     aquifer_pore_volumes=aquifer_pore_volumes,
... )

With retardation factor:

>>> cout = distribution_infiltration_to_extraction(
...     cin=cin,
...     flow=flow,
...     tedges=tedges,
...     cout_tedges=cout_tedges,
...     aquifer_pore_volumes=aquifer_pore_volumes,
...     retardation_factor=2.0,  # Compound moves twice as slowly
... )

Using single pore volume:

>>> single_volume = np.array([100])  # Single 100 m3 pore volume
>>> cout = distribution_infiltration_to_extraction(
...     cin=cin,
...     flow=flow,
...     tedges=tedges,
...     cout_tedges=cout_tedges,
...     aquifer_pore_volumes=single_volume,
... )
gwtransport.advection.distribution_extraction_to_infiltration(*, cout, flow, tedges, cin_tedges, aquifer_pore_volumes, retardation_factor=1.0)[source]#

Compute the concentration of the infiltrating water from extracted water (deconvolution).

This function implements an extraction to infiltration advection model (inverse of distribution_infiltration_to_extraction) where cout and flow values correspond to the same aligned time bins defined by tedges.

SYMMETRIC RELATIONSHIP: - distribution_infiltration_to_extraction: cin + tedges → cout + cout_tedges - distribution_extraction_to_infiltration: cout + tedges → cin + cin_tedges

The algorithm (symmetric to distribution_infiltration_to_extraction): 1. Computes residence times for each pore volume at cint time edges 2. Calculates extraction time edges by adding residence times (reverse of infiltration_to_extraction) 3. Determines temporal overlaps between extraction and cout time windows 4. Creates flow-weighted overlap matrices normalized by total weights 5. Computes weighted contributions and averages across pore volumes

Parameters:
  • cout (ArrayLike) – Concentration values of extracted water [concentration units]. Length must match the number of time bins defined by tedges.

  • flow (ArrayLike) – Flow rate values in the aquifer [m3/day]. Length must match cout and the number of time bins defined by tedges.

  • tedges (pandas.DatetimeIndex) – Time edges defining bins for both cout and flow data. Has length of len(cout) + 1 and len(flow) + 1.

  • cin_tedges (pandas.DatetimeIndex) – Time edges for output (infiltration) data bins. Has length of desired output + 1. Can have different time alignment and resolution than tedges.

  • aquifer_pore_volumes (ArrayLike) – Array of aquifer pore volumes [m3] representing the distribution of residence times in the aquifer system.

  • retardation_factor (float, optional) – Retardation factor of the compound in the aquifer (default 1.0). Values > 1.0 indicate slower transport due to sorption/interaction.

Returns:

Flow-weighted concentration in the infiltrating water. Same units as cout. Length equals len(cin_tedges) - 1. NaN values indicate time periods with no valid contributions from the extraction data.

Return type:

numpy.ndarray

Raises:

ValueError – If tedges length doesn’t match cout/flow arrays plus one, or if extraction time edges become non-monotonic (invalid input conditions).

Examples

Basic usage with pandas Series:

>>> import pandas as pd
>>> import numpy as np
>>> from gwtransport.utils import compute_time_edges
>>> from gwtransport.advection import distribution_extraction_to_infiltration
>>>
>>> # Create input data
>>> dates = pd.date_range(start="2020-01-01", end="2020-01-20", freq="D")
>>> tedges = compute_time_edges(
...     tedges=None, tstart=None, tend=dates, number_of_bins=len(dates)
... )
>>>
>>> # Create output time edges (different alignment)
>>> cint_dates = pd.date_range(start="2019-12-25", end="2020-01-15", freq="D")
>>> cin_tedges = compute_time_edges(
...     tedges=None, tstart=None, tend=cint_dates, number_of_bins=len(cint_dates)
... )
>>>
>>> # Input concentration and flow
>>> cout = pd.Series(np.ones(len(dates)), index=dates)
>>> flow = pd.Series(np.ones(len(dates)) * 100, index=dates)  # 100 m3/day
>>>
>>> # Define distribution of aquifer pore volumes
>>> aquifer_pore_volumes = np.array([50, 100, 200])  # m3
>>>
>>> # Run distribution_extraction_to_infiltration
>>> cin = distribution_extraction_to_infiltration(
...     cout=cout,
...     flow=flow,
...     tedges=tedges,
...     cin_tedges=cin_tedges,
...     aquifer_pore_volumes=aquifer_pore_volumes,
... )
>>> cin.shape
(22,)

Using array inputs instead of pandas Series:

>>> # Convert to arrays
>>> cout_values = cout.values
>>> flow_values = flow.values
>>>
>>> cin = distribution_extraction_to_infiltration(
...     cout=cout_values,
...     flow=flow_values,
...     tedges=tedges,
...     cin_tedges=cin_tedges,
...     aquifer_pore_volumes=aquifer_pore_volumes,
... )

With retardation factor:

>>> cin = distribution_extraction_to_infiltration(
...     cout=cout,
...     flow=flow,
...     tedges=tedges,
...     cin_tedges=cin_tedges,
...     aquifer_pore_volumes=aquifer_pore_volumes,
...     retardation_factor=2.0,  # Compound moves twice as slowly
... )

Using single pore volume:

>>> single_volume = np.array([100])  # Single 100 m3 pore volume
>>> cin = distribution_extraction_to_infiltration(
...     cout=cout,
...     flow=flow,
...     tedges=tedges,
...     cin_tedges=cin_tedges,
...     aquifer_pore_volumes=single_volume,
... )

deposition#

Deposition analysis for 1D aquifer systems.

Analyze compound transport by deposition in aquifer systems with tools for computing concentrations and deposition rates based on aquifer properties.

The model assumes 1D groundwater flow where compound deposition occurs along the flow path, enriching the water. Follows advection module patterns for consistency.

Functions#

extraction_to_deposition : Compute deposition rates from concentration changes deposition_to_extraction : Compute concentrations from deposition rates

gwtransport.deposition.compute_deposition_weights(*, flow_values, tedges, cout_tedges, aquifer_pore_volume, porosity, thickness, retardation_factor=1.0)[source]#

Compute deposition weights for concentration-deposition convolution.

Parameters:
  • flow_values (ArrayLike) – Flow rates in aquifer [m3/day]. Length must equal len(tedges) - 1.

  • tedges (pandas.DatetimeIndex) – Time bin edges for flow data.

  • cout_tedges (pandas.DatetimeIndex) – Time bin edges for output concentration data.

  • aquifer_pore_volume (float) – Aquifer pore volume [m3].

  • porosity (float) – Aquifer porosity [dimensionless].

  • thickness (float) – Aquifer thickness [m].

  • retardation_factor (float, optional) – Compound retardation factor, by default 1.0.

Returns:

Deposition weights matrix with shape (len(cout_tedges) - 1, len(tedges) - 1). May contain NaN values where residence time cannot be computed.

Return type:

numpy.ndarray

Notes

The returned weights matrix may contain NaN values in locations where the residence time calculation fails or is undefined. This typically occurs when flow conditions result in invalid or non-physical residence times.

gwtransport.deposition.deposition_to_extraction(*, dep, flow, tedges, cout_tedges, aquifer_pore_volume, porosity, thickness, retardation_factor=1.0)[source]#

Compute concentrations from deposition rates (convolution).

Parameters:
  • dep (ArrayLike) – Deposition rates [ng/m2/day]. Length must equal len(tedges) - 1.

  • flow (ArrayLike) – Flow rates in aquifer [m3/day]. Length must equal len(tedges) - 1.

  • tedges (pandas.DatetimeIndex) – Time bin edges for deposition and flow data.

  • cout_tedges (pandas.DatetimeIndex) – Time bin edges for output concentration data.

  • aquifer_pore_volume (float) – Aquifer pore volume [m3].

  • porosity (float) – Aquifer porosity [dimensionless].

  • thickness (float) – Aquifer thickness [m].

  • retardation_factor (float, optional) – Compound retardation factor, by default 1.0.

Returns:

Concentration changes [ng/m3] with length len(cout_tedges) - 1.

Return type:

numpy.ndarray

Examples

>>> import pandas as pd
>>> import numpy as np
>>> dates = pd.date_range("2020-01-01", "2020-01-10", freq="D")
>>> tedges = pd.date_range("2019-12-31 12:00", "2020-01-10 12:00", freq="D")
>>> cout_tedges = pd.date_range("2020-01-03 12:00", "2020-01-12 12:00", freq="D")
>>> dep = np.ones(len(dates))
>>> flow = np.full(len(dates), 100.0)
>>> cout = deposition_to_extraction(
...     dep=dep,
...     flow=flow,
...     tedges=tedges,
...     cout_tedges=cout_tedges,
...     aquifer_pore_volume=500.0,
...     porosity=0.3,
...     thickness=10.0,
... )
gwtransport.deposition.extraction_to_deposition(*, flow, tedges, cout, cout_tedges, aquifer_pore_volume, porosity, thickness, retardation_factor=1.0, nullspace_objective='squared_differences')[source]#

Compute deposition rates from concentration changes (deconvolution).

The solution for the deposition is fundamentally underdetermined, as multiple deposition histories can lead to the same concentration. This function computes a least-squares solution and then selects a specific solution from the nullspace of the problem based on the provided objective.

Parameters:
  • flow (ArrayLike) – Flow rates in aquifer [m3/day]. Length must equal len(tedges) - 1. Must not contain NaN values.

  • tedges (pandas.DatetimeIndex) – Time bin edges for deposition and flow data. Length must equal len(flow) + 1.

  • cout (ArrayLike) – Concentration changes in extracted water [ng/m3]. Length must equal len(cout_tedges) - 1. May contain NaN values, which will be excluded from the computation along with corresponding rows in the weight matrix.

  • cout_tedges (pandas.DatetimeIndex) – Time bin edges for output concentration data. Length must equal len(cout) + 1.

  • aquifer_pore_volume (float) – Aquifer pore volume [m3].

  • porosity (float) – Aquifer porosity [dimensionless].

  • thickness (float) – Aquifer thickness [m].

  • retardation_factor (float, optional) – Compound retardation factor, by default 1.0. Values > 1.0 indicate slower transport due to sorption/interaction.

  • nullspace_objective (str or callable, optional) –

    Objective function to minimize in the nullspace. Options:

    • ”squared_differences” : Minimize sum of squared differences between adjacent deposition rates (default, provides smooth solutions)

    • ”summed_differences” : Minimize sum of absolute differences between adjacent deposition rates (promotes sparse/piecewise constant solutions)

    • callable : Custom objective function with signature objective(coeffs, x_ls, nullspace_basis)

    Default is “squared_differences”.

Returns:

Mean deposition rates [ng/m2/day] between tedges. Length equals len(tedges) - 1.

Return type:

numpy.ndarray

Raises:

ValueError – If input dimensions are incompatible, if flow contains NaN values, or if the optimization fails.

Notes

This function solves the inverse problem of determining deposition rates from observed concentration changes. Since multiple deposition histories can produce the same concentration pattern, regularization in the nullspace is used to select a physically meaningful solution.

The algorithm:

  1. Validates input dimensions and checks for NaN values in flow

  2. Computes deposition weight matrix relating deposition to concentration

  3. Identifies valid rows (no NaN in weights or concentration data)

  4. Solves the underdetermined system using nullspace regularization

  5. Returns the regularized deposition rate solution

Examples

Basic usage with default squared differences regularization:

>>> import pandas as pd
>>> import numpy as np
>>> from gwtransport.deposition import extraction_to_deposition
>>>
>>> # Create input data
>>> dates = pd.date_range("2020-01-01", "2020-01-10", freq="D")
>>> tedges = pd.date_range("2019-12-31 12:00", "2020-01-10 12:00", freq="D")
>>> cout_tedges = pd.date_range("2020-01-03 12:00", "2020-01-12 12:00", freq="D")
>>>
>>> # Flow and concentration data
>>> flow = np.full(len(dates), 100.0)  # m3/day
>>> cout = np.ones(len(cout_tedges) - 1) * 10.0  # ng/m3
>>>
>>> # Compute deposition rates
>>> dep = extraction_to_deposition(
...     flow=flow,
...     tedges=tedges,
...     cout=cout,
...     cout_tedges=cout_tedges,
...     aquifer_pore_volume=500.0,
...     porosity=0.3,
...     thickness=10.0,
... )
>>> print(f"Deposition rates shape: {dep.shape}")
>>> print(f"Mean deposition rate: {np.nanmean(dep):.2f} ng/m2/day")

With summed differences regularization for sparse solutions:

>>> dep_sparse = extraction_to_deposition(
...     flow=flow,
...     tedges=tedges,
...     cout=cout,
...     cout_tedges=cout_tedges,
...     aquifer_pore_volume=500.0,
...     porosity=0.3,
...     thickness=10.0,
...     nullspace_objective="summed_differences",
... )

With custom regularization objective:

>>> def l2_norm_objective(coeffs, x_ls, nullspace_basis):
...     x = x_ls + nullspace_basis @ coeffs
...     return np.sum(x**2)  # Minimize L2 norm of solution
>>>
>>> dep_l2 = extraction_to_deposition(
...     flow=flow,
...     tedges=tedges,
...     cout=cout,
...     cout_tedges=cout_tedges,
...     aquifer_pore_volume=500.0,
...     porosity=0.3,
...     thickness=10.0,
...     nullspace_objective=l2_norm_objective,
... )

Handling missing concentration data:

>>> # Concentration with some NaN values
>>> cout_nan = cout.copy()
>>> cout_nan[2:4] = np.nan  # Missing data for some time periods
>>>
>>> dep_robust = extraction_to_deposition(
...     flow=flow,
...     tedges=tedges,
...     cout=cout_nan,
...     cout_tedges=cout_tedges,
...     aquifer_pore_volume=500.0,
...     porosity=0.3,
...     thickness=10.0,
... )
gwtransport.deposition.spinup_duration(*, flow, flow_tedges, aquifer_pore_volume, retardation_factor)[source]#

Compute the spinup duration for deposition modeling.

The spinup duration is the residence time at the first time step, representing the time needed for the system to become fully informed. Before this duration, the extracted concentration lacks complete deposition history.

Parameters:
  • flow (numpy.ndarray) – Flow rate of water in the aquifer [m3/day].

  • flow_tedges (pandas.DatetimeIndex) – Time edges for the flow data.

  • aquifer_pore_volume (float) – Pore volume of the aquifer [m3].

  • retardation_factor (float) – Retardation factor of the compound in the aquifer [dimensionless].

Returns:

Spinup duration in days.

Return type:

float

diffusion#

Module that implements diffusion.

gwtransport.diffusion.infiltration_to_extraction(cin, flow, aquifer_pore_volume, diffusivity=0.1, retardation_factor=1.0, aquifer_length=80.0, porosity=0.35)[source]#

Compute the diffusion of a compound during 1D transport in the aquifer.

This function represents infiltration to extraction modeling (equivalent to convolution).

Parameters:
  • cin (pandas.Series) – Concentration or temperature of the compound in the infiltrating water [ng/m3].

  • flow (pandas.Series) – Flow rate of water in the aquifer [m3/day].

  • aquifer_pore_volume (float) – Pore volume of the aquifer [m3].

  • diffusivity (float, optional) – diffusivity of the compound in the aquifer [m2/day]. Default is 0.1.

  • retardation_factor (float, optional) – Retardation factor of the compound in the aquifer [dimensionless]. Default is 1.0.

  • aquifer_length (float, optional) – Length of the aquifer [m]. Default is 80.0.

  • porosity (float, optional) – Porosity of the aquifer [dimensionless]. Default is 0.35.

Returns:

Concentration of the compound in the extracted water [ng/m3].

Return type:

numpy.ndarray

gwtransport.diffusion.extraction_to_infiltration(cout, flow, aquifer_pore_volume, diffusivity=0.1, retardation_factor=1.0, aquifer_length=80.0, porosity=0.35)[source]#

Compute the reverse diffusion of a compound during 1D transport in the aquifer.

This function represents extraction to infiltration modeling (equivalent to deconvolution).

Parameters:
  • cout (pandas.Series) – Concentration or temperature of the compound in the extracted water [ng/m3].

  • flow (pandas.Series) – Flow rate of water in the aquifer [m3/day].

  • aquifer_pore_volume (float) – Pore volume of the aquifer [m3].

  • diffusivity (float, optional) – diffusivity of the compound in the aquifer [m2/day]. Default is 0.1.

  • retardation_factor (float, optional) – Retardation factor of the compound in the aquifer [dimensionless]. Default is 1.0.

  • aquifer_length (float, optional) – Length of the aquifer [m]. Default is 80.0.

  • porosity (float, optional) – Porosity of the aquifer [dimensionless]. Default is 0.35.

Returns:

Concentration of the compound in the infiltrating water [ng/m3].

Return type:

numpy.ndarray

Notes

Extraction to infiltration diffusion (deconvolution) is mathematically ill-posed and requires regularization to obtain a stable solution.

gwtransport.diffusion.compute_sigma_array(flow, aquifer_pore_volume, diffusivity=0.1, retardation_factor=1.0, aquifer_length=80.0, porosity=0.35)[source]#

Compute sigma values for diffusion based on flow and aquifer properties.

Parameters:
  • flow (pandas.Series) – Flow rate of water in the aquifer [m3/day].

  • aquifer_pore_volume (float) – Pore volume of the aquifer [m3].

  • diffusivity (float, optional) – diffusivity of the compound in the aquifer [m2/day]. Default is 0.1.

  • retardation_factor (float, optional) – Retardation factor of the compound in the aquifer [dimensionless]. Default is 1.0.

  • aquifer_length (float, optional) – Length of the aquifer [m]. Default is 80.0.

  • porosity (float, optional) – Porosity of the aquifer [dimensionless]. Default is 0.35.

Returns:

Array of sigma values for diffusion.

Return type:

numpy.ndarray

gwtransport.diffusion.convolve_diffusion(input_signal, sigma_array, truncate=4.0)[source]#

Apply Gaussian filter with position-dependent sigma values.

This function extends scipy.ndimage.gaussian_filter1d by allowing the standard deviation (sigma) of the Gaussian kernel to vary at each point in the signal. It implements the filter using a sparse convolution matrix where each row represents a Gaussian kernel with a locally-appropriate standard deviation.

Parameters:
  • input_signal (numpy.ndarray) – One-dimensional input array to be filtered.

  • sigma_array (numpy.ndarray) – One-dimensional array of standard deviation values, must have same length as input_signal. Each value specifies the Gaussian kernel width at the corresponding position.

  • truncate (float, optional) – Truncate the filter at this many standard deviations. Default is 4.0.

Returns:

The filtered input signal. Has the same shape as input_signal.

Return type:

numpy.ndarray

Notes

At the boundaries, the outer values are repeated to avoid edge effects. Equal to mode=`nearest` in scipy.ndimage.gaussian_filter1d.

The function constructs a sparse convolution matrix where each row represents a position-specific Gaussian kernel. The kernel width adapts to local sigma values, making it suitable for problems with varying diffusivitys or time steps.

For diffusion problems, the local sigma values can be calculated as: sigma = sqrt(2 * diffusivity * dt) / dx where diffusivity is the diffusivity, dt is the time step, and dx is the spatial step size.

The implementation uses sparse matrices for memory efficiency when dealing with large signals or when sigma values vary significantly.

See also

scipy.ndimage.gaussian_filter1d

Fixed-sigma Gaussian filtering

scipy.sparse

Sparse matrix implementations

Examples

>>> # Create a sample signal
>>> x = np.linspace(0, 10, 1000)
>>> signal = np.exp(-((x - 3) ** 2)) + 0.5 * np.exp(-((x - 7) ** 2) / 0.5)
>>> # Create position-dependent sigma values
>>> diffusivity = 0.1  # diffusivity
>>> dt = 0.001 * (1 + np.sin(2 * np.pi * x / 10))  # Varying time steps
>>> dx = x[1] - x[0]
>>> sigma_array = np.sqrt(2 * diffusivity * dt) / dx
>>> # Apply the filter
>>> filtered = convolve_diffusion(signal, sigma_array)
gwtransport.diffusion.deconvolve_diffusion(output_signal, sigma_array, truncate=4.0)[source]#

Apply Gaussian deconvolution with position-dependent sigma values.

This function extends scipy.ndimage.gaussian_filter1d by allowing the standard deviation (sigma) of the Gaussian kernel to vary at each point in the signal. It implements the filter using a sparse convolution matrix where each row represents a Gaussian kernel with a locally-appropriate standard deviation.

Parameters:
  • output_signal (numpy.ndarray) – One-dimensional input array to be filtered.

  • sigma_array (numpy.ndarray) – One-dimensional array of standard deviation values, must have same length as output_signal. Each value specifies the Gaussian kernel width at the corresponding position.

  • truncate (float, optional) – Truncate the filter at this many standard deviations. Default is 4.0.

Returns:

The filtered output signal. Has the same shape as output_signal.

Return type:

numpy.ndarray

gwtransport.diffusion.create_example_data(nx=1000, domain_length=10.0, diffusivity=0.1)[source]#

Create example data for demonstrating variable-sigma diffusion.

Parameters:
  • nx (int, optional) – Number of spatial points. Default is 1000.

  • domain_length (float, optional) – Domain length. Default is 10.0.

  • diffusivity (float, optional) – diffusivity. Default is 0.1.

Returns:

Notes

This function creates a test case with: - A signal composed of two Gaussian peaks - Sinusoidally varying time steps - Corresponding sigma values for diffusion

gamma#

Functions for working with gamma distributions.

gwtransport.gamma.parse_parameters(alpha=None, beta=None, mean=None, std=None)[source]#

Parse parameters for gamma distribution.

Either alpha and beta or mean and std must be provided.

Parameters:
  • alpha (float, optional) – Shape parameter of gamma distribution (must be > 0)

  • beta (float, optional) – Scale parameter of gamma distribution (must be > 0)

  • mean (float, optional) – Mean of the gamma distribution.

  • std (float, optional) – Standard deviation of the gamma distribution.

Returns:

Shape and scale parameters of the gamma distribution.

Return type:

tuple

Raises:

ValueError – If both alpha and beta are None or if both mean and std are None. If alpha or beta are not positive.

gwtransport.gamma.mean_std_to_alpha_beta(mean, std)[source]#

Convert mean and standard deviation of gamma distribution to shape and scale parameters.

Parameters:
  • mean (float) – Mean of the gamma distribution.

  • std (float) – Standard deviation of the gamma distribution.

Returns:

Shape and scale parameters of the gamma distribution.

Return type:

tuple

gwtransport.gamma.alpha_beta_to_mean_std(alpha, beta)[source]#

Convert shape and scale parameters of gamma distribution to mean and standard deviation.

Parameters:
  • alpha (float) – Shape parameter of the gamma distribution.

  • beta (float) – Scale parameter of the gamma distribution.

Returns:

Mean and standard deviation of the gamma distribution.

Return type:

tuple

gwtransport.gamma.bins(*, alpha=None, beta=None, mean=None, std=None, n_bins=None, quantile_edges=None)[source]#

Divide gamma distribution into bins and compute various bin properties.

If n_bins is provided, the gamma distribution is divided into n_bins equal-mass bins. If quantile_edges is provided, the gamma distribution is divided into bins defined by the quantile edges. The quantile edges must be in the range [0, 1] and of size n_bins + 1. The first and last quantile edges must be 0 and 1, respectively.

Parameters:
  • alpha (float, optional) – Shape parameter of gamma distribution (must be > 0)

  • beta (float, optional) – Scale parameter of gamma distribution (must be > 0)

  • mean (float, optional) – Mean of the gamma distribution.

  • std (float, optional) – Standard deviation of the gamma distribution.

  • n_bins (int, optional) – Number of bins to divide the gamma distribution (must be > 1)

  • quantile_edges (ArrayLike, optional) – Quantile edges for binning. Must be in the range [0, 1] and of size n_bins + 1. The first and last quantile edges must be 0 and 1, respectively. If provided, n_bins is ignored.

Returns:

  • lower_bound: lower bounds of bins (first one is 0)

  • upper_bound: upper bounds of bins (last one is inf)

  • edges: bin edges (lower_bound[0], upper_bound[0], …, upper_bound[-1])

  • expected_values: expected values in bins. Is what you would expect to observe if you repeatedly sampled from the probability distribution, but only considered samples that fall within that particular bin

  • probability_mass: probability mass in bins

Return type:

dict of arrays with keys

gwtransport.gamma.bin_masses(alpha, beta, bin_edges)[source]#

Calculate probability mass for each bin in gamma distribution.

Is the area under the gamma distribution PDF between the bin edges.

Parameters:
  • alpha (float) – Shape parameter of gamma distribution (must be > 0)

  • beta (float) – Scale parameter of gamma distribution (must be > 0)

  • bin_edges (ArrayLike) – Bin edges. Array of increasing values of size len(bins) + 1. Must be > 0.

Returns:

Probability mass for each bin

Return type:

array

logremoval#

Functions for calculating log removal efficiency in water treatment systems.

This module provides utilities to calculate log removal values for different configurations of water treatment systems, including both basic log removal calculations and parallel flow arrangements where multiple treatment processes operate simultaneously on different fractions of the total flow.

Log removal is a standard measure in water treatment that represents the reduction of pathogen concentration on a logarithmic scale. For example, a log removal of 3 represents a 99.9% reduction in pathogen concentration.

Functions#

residence_time_to_log_removal : Calculate log removal from residence times and removal rate parallel_mean : Calculate weighted average log removal for parallel flow systems gamma_pdf : Compute PDF of log removal given gamma-distributed residence time gamma_cdf : Compute CDF of log removal given gamma-distributed residence time gamma_mean : Compute mean log removal for gamma-distributed residence time gamma_find_flow_for_target_mean : Find flow rate for target mean log removal

Notes

For systems in series, log removals are typically summed directly, while for parallel systems, a weighted average based on flow distribution is required. The parallel_mean function supports multi-dimensional arrays via the axis parameter and performs minimal validation for improved performance.

gwtransport.logremoval.residence_time_to_log_removal(residence_times, log_removal_rate)[source]#

Compute log removal given residence times and a log removal rate.

This function calculates the log removal efficiency based on the residence times of water in a treatment system and the log removal rate coefficient.

The calculation uses the formula: Log Removal = log_removal_rate * log10(residence_time)

Parameters:
  • residence_times (ArrayLike) – Array of residence times (in consistent units, e.g., hours, days). Must be positive values.

  • log_removal_rate (float) – Log removal rate coefficient that relates residence time to log removal efficiency. Units should be consistent with residence_times.

Returns:

log_removals – Array of log removal values corresponding to the input residence times. Same shape as input residence_times.

Return type:

numpy.ndarray

Notes

Log removal is a logarithmic measure of pathogen reduction: - Log 1 = 90% reduction - Log 2 = 99% reduction - Log 3 = 99.9% reduction

The log removal rate coefficient determines how effectively the treatment system removes pathogens per unit log time.

Examples

>>> import numpy as np
>>> residence_times = np.array([1.0, 10.0, 100.0])
>>> log_removal_rate = 2.0
>>> residence_time_to_log_removal(residence_times, log_removal_rate)
array([0.   , 2.   , 4.   ])
>>> # Single residence time
>>> residence_time_to_log_removal(5.0, 1.5)
1.0484550065040283
>>> # 2D array of residence times
>>> residence_times_2d = np.array([[1.0, 10.0], [100.0, 1000.0]])
>>> residence_time_to_log_removal(residence_times_2d, 1.0)
array([[0., 1.],
       [2., 3.]])
gwtransport.logremoval.parallel_mean(log_removals, flow_fractions=None, axis=None)[source]#

Calculate the weighted average log removal for a system with parallel flows.

This function computes the overall log removal efficiency of a parallel filtration system. If flow_fractions is not provided, it assumes equal distribution of flow across all paths.

The calculation uses the formula:

Total Log Removal = -log10(sum(F_i * 10^(-LR_i)))

Where: - F_i = fraction of flow through system i (decimal, sum to 1.0) - LR_i = log removal of system i

Parameters:
  • log_removals (ArrayLike) – Array of log removal values for each parallel flow. Each value represents the log10 reduction of pathogens. For multi-dimensional arrays, the parallel mean is computed along the specified axis.

  • flow_fractions (ArrayLike, optional) – Array of flow fractions for each parallel flow. Must sum to 1.0 along the specified axis and have compatible shape with log_removals. If None, equal flow distribution is assumed (default is None).

  • axis (int, optional) – Axis along which to compute the parallel mean for multi-dimensional arrays. If None, the array is treated as 1-dimensional (default is None).

Returns:

The combined log removal value for the parallel system. If log_removals is multi-dimensional and axis is specified, returns an array with the specified axis removed.

Return type:

float or ArrayLike

Notes

This function performs minimal input validation to reduce complexity. NumPy will handle most error cases naturally through broadcasting and array operations.

Notes

Log removal is a logarithmic measure of pathogen reduction: - Log 1 = 90% reduction - Log 2 = 99% reduction - Log 3 = 99.9% reduction

For parallel flows, the combined removal is typically less effective than the best individual removal but better than the worst.

Examples

>>> import numpy as np
>>> # Three parallel streams with equal flow and log removals of 3, 4, and 5
>>> log_removals = np.array([3, 4, 5])
>>> parallel_mean(log_removals)
3.431798275933005
>>> # Two parallel streams with weighted flow
>>> log_removals = np.array([3, 5])
>>> flow_fractions = np.array([0.7, 0.3])
>>> parallel_mean(log_removals, flow_fractions)
3.153044674980176
>>> # Multi-dimensional array: parallel mean along axis 1
>>> log_removals_2d = np.array([[3, 4, 5], [2, 3, 4]])
>>> parallel_mean(log_removals_2d, axis=1)
array([3.43179828, 2.43179828])

See also

For, log

gwtransport.logremoval.gamma_pdf(r, rt_alpha, rt_beta, log_removal_rate)[source]#

Compute the probability density function (PDF) of log removal given a gamma distribution for the residence time.

gamma(rt_alpha, rt_beta) = gamma(apv_alpha, apv_beta / flow)

Parameters:
  • r (ArrayLike) – Log removal values at which to compute the PDF.

  • rt_alpha (float) – Shape parameter of the gamma distribution for residence time.

  • rt_beta (float) – Scale parameter of the gamma distribution for residence time.

  • log_removal_rate (float) – Coefficient for log removal calculation (R = log_removal_rate * log10(T)).

Returns:

pdf – PDF values corresponding to the input r values.

Return type:

numpy.ndarray

gwtransport.logremoval.gamma_cdf(r, rt_alpha, rt_beta, log_removal_rate)[source]#

Compute the cumulative distribution function (CDF) of log removal given a gamma distribution for the residence time.

gamma(rt_alpha, rt_beta) = gamma(apv_alpha, apv_beta / flow)

Parameters:
  • r (ArrayLike) – Log removal values at which to compute the CDF.

  • alpha (float) – Shape parameter of the gamma distribution for residence time.

  • beta (float) – Scale parameter of the gamma distribution for residence time.

  • log_removal_rate (float) – Coefficient for log removal calculation (R = log_removal_rate * log10(T)).

Returns:

cdf – CDF values corresponding to the input r values.

Return type:

numpy.ndarray

gwtransport.logremoval.gamma_mean(rt_alpha, rt_beta, log_removal_rate)[source]#

Compute the mean of the log removal distribution given a gamma distribution for the residence time.

gamma(rt_alpha, rt_beta) = gamma(apv_alpha, apv_beta / flow)

Parameters:
  • rt_alpha (float) – Shape parameter of the gamma distribution for residence time.

  • rt_beta (float) – Scale parameter of the gamma distribution for residence time.

  • log_removal_rate (float) – Coefficient for log removal calculation (R = log_removal_rate * log10(T)).

Returns:

mean – Mean value of the log removal distribution.

Return type:

float

gwtransport.logremoval.gamma_find_flow_for_target_mean(target_mean, apv_alpha, apv_beta, log_removal_rate)[source]#

Find the flow rate flow that produces a specified target mean log removal given a gamma distribution for the residence time.

gamma(rt_alpha, rt_beta) = gamma(apv_alpha, apv_beta / flow)

Parameters:
  • target_mean (float) – Target mean log removal value.

  • apv_alpha (float) – Shape parameter of the gamma distribution for residence time.

  • apv_beta (float) – Scale parameter of the gamma distribution for pore volume.

  • log_removal_rate (float) – Coefficient for log removal calculation (R = log_removal_rate * log10(T)).

Returns:

flow – Flow rate that produces the target mean log removal.

Return type:

float

Notes

This function uses the analytical solution derived from the mean formula. From E[R] = (log_removal_rate/ln(10)) * (digamma(alpha) + ln(beta) - ln(Q)), we can solve for Q to get: flow = beta * exp(ln(10)*target_mean/log_removal_rate - digamma(alpha))

residence_time#

Residence time of a compound in the aquifer.

This module provides functions to compute the residence time of a compound in the aquifer. The residence time is the time it takes for the compound to travel from the infiltration point to the extraction point. The compound is retarded in the aquifer with a retardation factor.

Main functions:

  • residence_time: Compute the residence time of a retarded compound in the aquifer at indices.

  • residence_time_mean: Compute the mean residence time of a retarded compound in the aquifer between specified time edges.

gwtransport.residence_time.residence_time(flow=None, flow_tedges=None, aquifer_pore_volume=None, index=None, retardation_factor=1.0, direction='extraction_to_infiltration', *, return_pandas_series=False)[source]#

Compute the residence time of retarded compound in the water in the aquifer.

Parameters:
  • flow (ArrayLike) – Flow rate of water in the aquifer [m3/day]. The length of flow should match the length of flow_tedges minus one.

  • flow_tedges (pandas.DatetimeIndex) – Time edges for the flow data. Used to compute the cumulative flow. Has a length of one more than flow.

  • aquifer_pore_volume (float or ArrayLike of float) – Pore volume of the aquifer [m3].

  • index (pandas.DatetimeIndex, optional) – Index at which to compute the residence time. If left to None, the index of flow is used. Default is None.

  • retardation_factor (float) – Retardation factor of the compound in the aquifer [dimensionless].

  • direction ({'extraction_to_infiltration', 'infiltration_to_extraction'}, optional) – Direction of the flow calculation: * ‘extraction_to_infiltration’: Extraction to infiltration modeling - how many days ago was the extracted water infiltrated * ‘infiltration_to_extraction’: Infiltration to extraction modeling - how many days until the infiltrated water is extracted Default is ‘extraction_to_infiltration’.

  • return_pandas_series (bool, optional) – If True, return a pandas Series with the residence time at the index provided. Only supported for a single aquifer pore volume. This parameter is deprecated and will be removed in a future version.

Returns:

Residence time of the retarded compound in the aquifer [days].

Return type:

numpy.ndarray

gwtransport.residence_time.residence_time_mean(flow, flow_tedges, tedges_out, aquifer_pore_volume, *, direction='extraction_to_infiltration', retardation_factor=1.0)[source]#

Compute the mean residence time of a retarded compound in the aquifer between specified time edges.

This function calculates the average residence time of a retarded compound in the aquifer between specified time intervals. It can compute both extraction to infiltration modeling (extraction direction: when was extracted water infiltrated) and infiltration to extraction modeling (infiltration direction: when will infiltrated water be extracted).

The function handles time series data by computing the cumulative flow and using linear interpolation and averaging to determine mean residence times between the specified time edges.

Parameters:
  • flow (ArrayLike) – Flow rate of water in the aquifer [m3/day]. Should be an array of flow values corresponding to the intervals defined by flow_tedges.

  • flow_tedges (ArrayLike) – Time edges for the flow data, as datetime64 objects. These define the time intervals for which the flow values are provided.

  • tedges_out (ArrayLike) – Output time edges as datetime64 objects. These define the intervals for which the mean residence times will be calculated.

  • aquifer_pore_volume (float or ArrayLike) – Pore volume of the aquifer [m3]. Can be a single value or an array of values for multiple pore volume scenarios.

  • direction ({'extraction_to_infiltration', 'infiltration_to_extraction'}, optional) – Direction of the flow calculation: * ‘extraction_to_infiltration’: Extraction to infiltration modeling - how many days ago was the extracted water infiltrated * ‘infiltration_to_extraction’: Infiltration to extraction modeling - how many days until the infiltrated water is extracted Default is ‘extraction_to_infiltration’.

  • retardation_factor (float, optional) – Retardation factor of the compound in the aquifer [dimensionless]. A value greater than 1.0 indicates that the compound moves slower than water. Default is 1.0 (no retardation).

Returns:

Mean residence time of the retarded compound in the aquifer [days] for each interval defined by tedges_out. The first dimension corresponds to the different pore volumes and the second to the residence times between tedges_out.

Return type:

numpy.ndarray

Notes

  • The function converts datetime objects to days since the start of the time series.

  • For extraction_to_infiltration direction, the function computes how many days ago water was infiltrated.

  • For infiltration_to_extraction direction, the function computes how many days until water will be extracted.

  • The function uses linear interpolation for computing residence times at specific points and linear averaging for computing mean values over intervals.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> # Create sample flow data
>>> flow_dates = pd.date_range(start="2023-01-01", end="2023-01-10", freq="D")
>>> flow_values = np.full(len(flow_dates) - 1, 100.0)  # Constant flow of 100 m³/day
>>> pore_volume = 200.0  # Aquifer pore volume in m³
>>> # Calculate mean residence times
>>> mean_times = residence_time_mean(
...     flow=flow_values,
...     flow_tedges=flow_dates,
...     tedges_out=flow_dates,
...     aquifer_pore_volume=pore_volume,
...     direction="extraction_to_infiltration",
... )
>>> # With constant flow of 100 m³/day and pore volume of 200 m³,
>>> # mean residence time should be approximately 2 days
>>> print(mean_times)  # Output: [np.nan, np.nan, 2.0, 2.0, 2.0, ..., 2.0]
gwtransport.residence_time.fraction_explained(rt=None, flow=None, flow_tedges=None, aquifer_pore_volume=None, index=None, retardation_factor=1.0, direction='extraction_to_infiltration')[source]#

Compute the fraction of the aquifer that is informed with respect to the retarded flow.

Parameters:
  • rt (numpy.ndarray, optional) – Pre-computed residence time array [days]. If not provided, it will be computed.

  • flow (ArrayLike, optional) – Flow rate of water in the aquifer [m3/day]. The length of flow should match the length of flow_tedges minus one.

  • flow_tedges (pandas.DatetimeIndex, optional) – Time edges for the flow data. Used to compute the cumulative flow. Has a length of one more than flow. Inbetween neighboring time edges, the flow is assumed constant.

  • aquifer_pore_volume (float or ArrayLike of float, optional) – Pore volume of the aquifer [m3].

  • index (pandas.DatetimeIndex, optional) – Index at which to compute the fraction. If left to None, the index of flow is used. Default is None.

  • retardation_factor (float or ArrayLike of float, optional) – Retardation factor of the compound in the aquifer [dimensionless].

  • direction ({'extraction_to_infiltration', 'infiltration_to_extraction'}, optional) – Direction of the flow calculation: * ‘extraction_to_infiltration’: Extraction to infiltration modeling - how many days ago was the extracted water infiltrated * ‘infiltration_to_extraction’: Infiltration to extraction modeling - how many days until the infiltrated water is extracted Default is ‘extraction_to_infiltration’.

  • return_pandas_series (bool, optional) – If True, return a pandas Series with the residence time at the index provided. Only supported for a single aquifer pore volume. This parameter is deprecated and will be removed in a future version.

Returns:

Fraction of the aquifer that is informed with respect to the retarded flow.

Return type:

numpy.ndarray

utils#

General utilities for the 1D groundwater transport model.

gwtransport.utils.linear_interpolate(x_ref, y_ref, x_query, left=None, right=None)[source]#

Linear interpolation on monotonically increasing data.

Parameters:
  • x_ref (ArrayLike) – Reference vector with sorted x-values.

  • y_ref (ArrayLike) – Reference vector with y-values.

  • x_query (ArrayLike) – Query x-values. Array may have any shape.

  • left (float, optional) – Value to return for x_query < x_ref[0]. - If left is set to None, x_query = x_ref[0]. - If left is set to a float, such as np.nan, this value is returned.

  • right (float, optional) – Value to return for x_query > x_ref[-1]. - If right is set to None, x_query = x_ref[-1]. - If right is set to a float, such as np.nan, this value is returned.

Returns:

Interpolated y-values.

Return type:

numpy.ndarray

gwtransport.utils.interp_series(series, index_new, **interp1d_kwargs)[source]#

Interpolate a pandas.Series to a new index.

Parameters:
  • series (pandas.Series) – Series to interpolate.

  • index_new (pandas.DatetimeIndex) – New index to interpolate to.

  • interp1d_kwargs (dict, optional) – Keyword arguments passed to scipy.interpolate.interp1d. Default is {}.

Returns:

Interpolated series.

Return type:

pandas.Series

gwtransport.utils.diff(a, alignment='centered')[source]#

Compute the cell widths for a given array of cell coordinates.

If alignment is “centered”, the coordinates are assumed to be centered in the cells. If alignment is “left”, the coordinates are assumed to be at the left edge of the cells. If alignment is “right”, the coordinates are assumed to be at the right edge of the cells.

Parameters:

a (ArrayLike) – Input array.

Returns:

Array with differences between elements.

Return type:

numpy.ndarray

gwtransport.utils.linear_average(x_data, y_data, x_edges, extrapolate_method='nan')[source]#

Compute the average value of a piecewise linear time series between specified x-edges.

Parameters:
  • x_data (ArrayLike) – x-coordinates of the time series data points, must be in ascending order

  • y_data (ArrayLike) – y-coordinates of the time series data points

  • x_edges (ArrayLike) – x-coordinates of the integration edges. Can be 1D or 2D. - If 1D: shape (n_edges,). Can be 1D or 2D. - If 1D: shape (n_edges,), must be in ascending order - If 2D: shape (n_series, n_edges), each row must be in ascending order - If 2D: shape (n_series, n_edges), each row must be in ascending order

  • extrapolate_method (str, optional) – Method for handling extrapolation. Default is ‘nan’. - ‘outer’: Extrapolate using the outermost data points. - ‘nan’: Extrapolate using np.nan. - ‘raise’: Raise an error for out-of-bounds values.

Returns:

2D array of average values between consecutive pairs of x_edges. Shape is (n_series, n_bins) where n_bins = n_edges - 1. If x_edges is 1D, n_series = 1.

Return type:

numpy.ndarray

Examples

>>> x_data = [0, 1, 2, 3]
>>> y_data = [0, 1, 1, 0]
>>> x_edges = [0, 1.5, 3]
>>> linear_average(x_data, y_data, x_edges)
array([[0.667, 0.667]])
>>> x_edges_2d = [[0, 1.5, 3], [0.5, 2, 3]]
>>> linear_average(x_data, y_data, x_edges_2d)
array([[0.667, 0.667], [0.75, 0.5]])
gwtransport.utils.partial_isin(bin_edges_in, bin_edges_out)[source]#

Calculate the fraction of each input bin that overlaps with each output bin.

This function computes a matrix where element (i, j) represents the fraction of input bin i that overlaps with output bin j. The computation uses vectorized operations to avoid loops.

Parameters:
  • bin_edges_in (ArrayLike) – 1D array of input bin edges in ascending order. For n_in bins, there should be n_in+1 edges.

  • bin_edges_out (ArrayLike) – 1D array of output bin edges in ascending order. For n_out bins, there should be n_out+1 edges.

Returns:

overlap_matrix – Dense matrix of shape (n_in, n_out) where n_in is the number of input bins and n_out is the number of output bins. Each element (i, j) represents the fraction of input bin i that overlaps with output bin j. Values range from 0 (no overlap) to 1 (complete overlap).

Return type:

numpy.ndarray

Notes

  • Both input arrays must be sorted in ascending order

  • The function leverages the sorted nature of both inputs for efficiency

  • Uses vectorized operations to handle large bin arrays efficiently

  • All overlaps sum to 1.0 for each input bin when output bins fully cover input range

Examples

>>> bin_edges_in = np.array([0, 10, 20, 30])
>>> bin_edges_out = np.array([5, 15, 25])
>>> partial_isin(bin_edges_in, bin_edges_out)
array([[0.5, 0.0],
       [0.5, 0.5],
       [0.0, 0.5]])
gwtransport.utils.time_bin_overlap(tedges, bin_tedges)[source]#

Calculate the fraction of each time bin that overlaps with each time range.

This function computes an array where element (i, j) represents the fraction of time bin j that overlaps with time range i. The computation uses vectorized operations to avoid loops.

Parameters:
  • tedges (ArrayLike) – 1D array of time bin edges in ascending order. For n bins, there should be n+1 edges.

  • bin_tedges (list of tuples) – List of tuples where each tuple contains (start_time, end_time) defining a time range.

Returns:

overlap_array – Array of shape (len(bin_tedges), n_bins) where n_bins is the number of time bins. Each element (i, j) represents the fraction of time bin j that overlaps with time range i. Values range from 0 (no overlap) to 1 (complete overlap).

Return type:

numpy.ndarray

Notes

  • tedges must be sorted in ascending order

  • Uses vectorized operations to handle large arrays efficiently

  • Time ranges in bin_tedges can be in any order and can overlap

Examples

>>> tedges = np.array([0, 10, 20, 30])
>>> bin_tedges = [(5, 15), (25, 35)]
>>> time_bin_overlap(tedges, bin_tedges)
array([[0.5, 0.5, 0.0],
       [0.0, 0.0, 0.5]])
gwtransport.utils.generate_failed_coverage_badge()[source]#

Generate a badge indicating failed coverage.

Return type:

None

gwtransport.utils.combine_bin_series(a, a_edges, b, b_edges, extrapolation=0.0)[source]#

Combine two binned series onto a common set of unique edges.

This function takes two binned series (a and b) with their respective bin edges and creates new series (c and d) that are defined on a combined set of unique edges from both input edge arrays.

Parameters:
  • a (ArrayLike) – Values for the first binned series.

  • a_edges (ArrayLike) – Bin edges for the first series. Must have len(a) + 1 elements.

  • b (ArrayLike) – Values for the second binned series.

  • b_edges (ArrayLike) – Bin edges for the second series. Must have len(b) + 1 elements.

  • extrapolation (str or float, optional) – Method for handling combined bins that fall outside the original series ranges. - ‘nearest’: Use the nearest original bin value - float value (e.g., np.nan, 0.0): Fill with the specified value (default: 0.0)

Return type:

tuple[ndarray, ndarray, ndarray, ndarray]

Returns:

  • c (numpy.ndarray) – Values from series a mapped to the combined edge structure.

  • c_edges (numpy.ndarray) – Combined unique edges from a_edges and b_edges.

  • d (numpy.ndarray) – Values from series b mapped to the combined edge structure.

  • d_edges (numpy.ndarray) – Combined unique edges from a_edges and b_edges (same as c_edges).

Notes

The combined edges are created by taking the union of all unique values from both a_edges and b_edges, sorted in ascending order. The values are then broadcasted/repeated for each combined bin that falls within the original bin’s range.

gwtransport.utils.compute_time_edges(tedges, tstart, tend, number_of_bins)[source]#

Compute time edges for binning data based on provided time parameters.

This function creates a DatetimeIndex of time bin edges from one of three possible input formats: explicit edges, start times, or end times. The resulting edges define the boundaries of time intervals for data binning.

Define either explicit time edges, or start and end times for each bin and leave the others at None.

Parameters:
  • tedges (pandas.DatetimeIndex or None) – Explicit time edges for the bins. If provided, must have one more element than the number of bins (n_bins + 1). Takes precedence over tstart and tend.

  • tstart (pandas.DatetimeIndex or None) – Start times for each bin. Must have the same number of elements as the number of bins. Used when tedges is None.

  • tend (pandas.DatetimeIndex or None) – End times for each bin. Must have the same number of elements as the number of bins. Used when both tedges and tstart are None.

  • number_of_bins (int) – The expected number of time bins. Used for validation against the provided time parameters.

Returns:

Time edges defining the boundaries of the time bins. Has one more element than number_of_bins.

Return type:

pandas.DatetimeIndex

Raises:

ValueError – If tedges has incorrect length (not number_of_bins + 1). If tstart has incorrect length (not equal to number_of_bins). If tend has incorrect length (not equal to number_of_bins). If none of tedges, tstart, or tend are provided.

Notes

  • When using tstart, the function assumes uniform spacing and extrapolates the final edge based on the spacing between the last two start times.

  • When using tend, the function assumes uniform spacing and extrapolates the first edge based on the spacing between the first two end times.

  • All input time data is converted to pandas.DatetimeIndex for consistency.

gwtransport.utils.get_soil_temperature(station_number=260, *, interpolate_missing_values=True)[source]#

Download soil temperature data from the KNMI and return it as a pandas DataFrame.

The data is available for the following KNMI weather stations: - 260: De Bilt, the Netherlands (vanaf 1981) - 273: Marknesse, the Netherlands (vanaf 1989) - 286: Nieuw Beerta, the Netherlands (vanaf 1990) - 323: Wilhelminadorp, the Netherlands (vanaf 1989)

TB1 = grondtemperatuur op 5 cm diepte (graden Celsius) tijdens de waarneming TB2 = grondtemperatuur op 10 cm diepte (graden Celsius) tijdens de waarneming TB3 = grondtemperatuur op 20 cm diepte (graden Celsius) tijdens de waarneming TB4 = grondtemperatuur op 50 cm diepte (graden Celsius) tijdens de waarneming TB5 = grondtemperatuur op 100 cm diepte (graden Celsius) tijdens de waarneming TNB2 = minimum grondtemperatuur op 10 cm diepte in de afgelopen 6 uur (graden Celsius) TNB1 = minimum grondtemperatuur op 5 cm diepte in de afgelopen 6 uur (graden Celsius) TXB1 = maximum grondtemperatuur op 5 cm diepte in de afgelopen 6 uur (graden Celsius) TXB2 = maximum grondtemperatuur op 10 cm diepte in de afgelopen 6 uur (graden Celsius)

Parameters:
  • station_number (int, {260, 273, 286, 323}) – The KNMI station number for which to download soil temperature data. Default is 260 (De Bilt).

  • interpolate_missing_values (bool, optional) – If True, missing values are interpolated and recent NaN values are extrapolated with the previous value. If False, missing values remain as NaN. Default is True.

Returns:

DataFrame containing soil temperature data in Celsius with a DatetimeIndex. Columns include TB1, TB2, TB3, TB4, TB5, TNB1, TNB2, TXB1, TXB2.

Return type:

pandas.DataFrame

Notes

  • KNMI: Royal Netherlands Meteorological Institute

  • The timeseries may contain NaN values for missing data.

gwtransport.utils.solve_underdetermined_system(coefficient_matrix, rhs_vector, *, nullspace_objective='squared_differences', optimization_method='BFGS')[source]#

Solve an underdetermined linear system with nullspace regularization.

For an underdetermined system Ax = b where A has more columns than rows, multiple solutions exist. This function computes a least-squares solution and then selects a specific solution from the nullspace based on a regularization objective.

Parameters:
  • coefficient_matrix (ArrayLike) – Coefficient matrix of shape (m, n) where m < n (underdetermined). May contain NaN values in some rows, which will be excluded from the system.

  • rhs_vector (ArrayLike) – Right-hand side vector of length m. May contain NaN values corresponding to NaN rows in coefficient_matrix, which will be excluded from the system.

  • nullspace_objective (str or callable, optional) –

    Objective function to minimize in the nullspace. Options:

    • ”squared_differences” : Minimize sum of squared differences between adjacent elements: sum((x[i+1] - x[i])**2)

    • ”summed_differences” : Minimize sum of absolute differences between adjacent elements: sum(|x[i+1] - x[i]|)

    • callable : Custom objective function with signature objective(coeffs, x_ls, nullspace_basis) where:

      • coeffs : optimization variables (nullspace coefficients)

      • x_ls : least-squares solution

      • nullspace_basis : nullspace basis matrix

    Default is “squared_differences”.

  • optimization_method (str, optional) – Optimization method passed to scipy.optimize.minimize. Default is “BFGS”.

Returns:

Solution vector that minimizes the specified nullspace objective. Has length n (number of columns in coefficient_matrix).

Return type:

numpy.ndarray

Raises:

ValueError – If optimization fails, if coefficient_matrix and rhs_vector have incompatible shapes, or if an unknown nullspace objective is specified.

Notes

The algorithm follows these steps:

  1. Remove rows with NaN values from both coefficient_matrix and rhs_vector

  2. Compute least-squares solution: x_ls = pinv(valid_matrix) @ valid_rhs

  3. Compute nullspace basis: N = null_space(valid_matrix)

  4. Find nullspace coefficients: coeffs = argmin objective(x_ls + N @ coeffs)

  5. Return final solution: x = x_ls + N @ coeffs

For the built-in objectives:

  • “squared_differences” provides smooth solutions, minimizing rapid changes

  • “summed_differences” provides sparse solutions, promoting piecewise constant behavior

Examples

Basic usage with default squared differences objective:

>>> import numpy as np
>>> from gwtransport.utils import solve_underdetermined_system
>>>
>>> # Create underdetermined system (2 equations, 4 unknowns)
>>> matrix = np.array([[1, 2, 1, 0], [0, 1, 2, 1]])
>>> rhs = np.array([3, 4])
>>>
>>> # Solve with squared differences regularization
>>> x = solve_underdetermined_system(matrix, rhs)
>>> print(f"Solution: {x}")
>>> print(f"Residual: {np.linalg.norm(matrix @ x - rhs):.2e}")

With summed differences objective:

>>> x_sparse = solve_underdetermined_system(
...     matrix, rhs, nullspace_objective="summed_differences"
... )

With custom objective function:

>>> def custom_objective(coeffs, x_ls, nullspace_basis):
...     x = x_ls + nullspace_basis @ coeffs
...     return np.sum(x**2)  # Minimize L2 norm
>>>
>>> x_custom = solve_underdetermined_system(
...     matrix, rhs, nullspace_objective=custom_objective
... )

Handling NaN values:

>>> # System with missing data
>>> matrix_nan = np.array([
...     [1, 2, 1, 0],
...     [np.nan, np.nan, np.nan, np.nan],
...     [0, 1, 2, 1],
... ])
>>> rhs_nan = np.array([3, np.nan, 4])
>>>
>>> x_nan = solve_underdetermined_system(matrix_nan, rhs_nan)