gwtransport#
gwtransport: A Python package for solving one-dimensional groundwater transport problems.
advection#
Advection Analysis for 1D Aquifer Systems.
This module provides functions to analyze compound transport by advection in aquifer systems. It includes tools for computing concentrations of the extracted water based on the concentration of the infiltrating water, extraction data and aquifer properties.
The model assumes requires the groundwaterflow to be reduced to a 1D system. On one side, water with a certain concentration infiltrates (‘cin’), the water flows through the aquifer and the compound of interest flows through the aquifer with a retarded velocity. The water is extracted (‘cout’).
Main functions: - infiltration_to_extraction: Compute the concentration of the extracted water by shifting cin with its residence time. This corresponds to a convolution operation. - gamma_infiltration_to_extraction: Similar to infiltration_to_extraction, but for a gamma distribution of aquifer pore volumes. - distribution_infiltration_to_extraction: Similar to infiltration_to_extraction, but for an arbitrary distribution of aquifer pore volumes.
- gwtransport.advection.infiltration_to_extraction(cin_series, flow_series, aquifer_pore_volume, retardation_factor=1.0, cout_index='cin')[source]#
Compute the concentration of the extracted water by shifting cin with its residence time.
The compound is retarded in the aquifer with a retardation factor. The residence time is computed based on the flow rate of the water in the aquifer and the pore volume of the aquifer.
This function represents infiltration to extraction modeling (equivalent to convolution).
- Parameters:
cin_series (pandas.Series) – Concentration of the compound in the infiltrating water [ng/m3]. The cin_series should be the average concentration of a time bin. The index should be a pandas.DatetimeIndex and is labeled at the end of the time bin.
flow_series (pandas.Series) – Flow rate of water in the aquifer [m3/day]. The flow_series should be the average flow of a time bin. The index should be a pandas.DatetimeIndex and is labeled at the end of the time bin.
aquifer_pore_volume (float) – Pore volume of the aquifer [m3].
cout_index (str, optional) – The index of the output series. Can be ‘cin’, ‘flow’, or ‘cout’. Default is ‘cin’. - ‘cin’: The output series will have the same index as cin_series. - ‘flow’: The output series will have the same index as flow_series. - ‘cout’: The output series will have the same index as cin_series + residence_time.
- Returns:
Concentration of the compound in the extracted water [ng/m3].
- Return type:
Examples
Basic usage with single pore volume:
>>> import pandas as pd >>> import numpy as np >>> from gwtransport.advection import infiltration_to_extraction >>> >>> # Create input data >>> dates = pd.date_range(start="2020-01-01", end="2020-01-10", freq="D") >>> cin = pd.Series(np.ones(len(dates)), index=dates) >>> flow = pd.Series(np.ones(len(dates)) * 100, index=dates) # 100 m3/day >>> >>> # Single aquifer pore volume >>> aquifer_pore_volume = 500.0 # m3 >>> >>> # Run infiltration to extraction model (default returns on cin index) >>> cout = infiltration_to_extraction(cin, flow, aquifer_pore_volume) >>> len(cout) == len(cin) True
Different output index options:
>>> # Output on cin index (default) >>> cout_cin = infiltration_to_extraction( ... cin, flow, aquifer_pore_volume, cout_index="cin" ... ) >>> >>> # Output on flow index >>> cout_flow = infiltration_to_extraction( ... cin, flow, aquifer_pore_volume, cout_index="flow" ... ) >>> >>> # Output on shifted time index (cin + residence_time) >>> cout_shifted = infiltration_to_extraction( ... cin, flow, aquifer_pore_volume, cout_index="cout" ... )
With retardation factor:
>>> # Compound moves twice as slowly due to sorption >>> cout = infiltration_to_extraction( ... cin, flow, aquifer_pore_volume, retardation_factor=2.0 ... )
- gwtransport.advection.extraction_to_infiltration(cout, flow, aquifer_pore_volume, retardation_factor=1.0, resample_dates=None)[source]#
Compute the concentration of the infiltrating water by shifting cout with its residence time.
This function represents extraction to infiltration modeling (equivalent to deconvolution).
- Parameters:
cout (pandas.Series) – Concentration of the compound in the extracted water [ng/m3].
flow (pandas.Series) – Flow rate of water in the aquifer [m3/day].
aquifer_pore_volume (float) – Pore volume of the aquifer [m3].
- Returns:
Concentration of the compound in the infiltrating water [ng/m3].
- Return type:
- gwtransport.advection.gamma_infiltration_to_extraction(*, cin, flow, tedges, cout_tedges, alpha=None, beta=None, mean=None, std=None, n_bins=100, retardation_factor=1.0)[source]#
Compute the concentration of the extracted water by shifting cin with its residence time.
The compound is retarded in the aquifer with a retardation factor. The residence time is computed based on the flow rate of the water in the aquifer and the pore volume of the aquifer. The aquifer pore volume is approximated by a gamma distribution, with parameters alpha and beta.
This function represents infiltration to extraction modeling (equivalent to convolution).
Provide either alpha and beta or mean and std.
- Parameters:
cin (pandas.Series) – Concentration of the compound in infiltrating water or temperature of infiltrating water.
cin_tedges (pandas.DatetimeIndex) – Time edges for the concentration data. Used to compute the cumulative concentration. Has a length of one more than cin.
cout_tedges (pandas.DatetimeIndex) – Time edges for the output data. Used to compute the cumulative concentration. Has a length of one more than flow.
flow (pandas.Series) – Flow rate of water in the aquifer [m3/day].
flow_tedges (pandas.DatetimeIndex) – Time edges for the flow data. Used to compute the cumulative flow. Has a length of one more than flow.
alpha (float, optional) – Shape parameter of gamma distribution of the aquifer pore volume (must be > 0)
beta (float, optional) – Scale parameter of gamma distribution of the aquifer pore volume (must be > 0)
mean (float, optional) – Mean of the gamma distribution.
std (float, optional) – Standard deviation of the gamma distribution.
n_bins (int) – Number of bins to discretize the gamma distribution.
retardation_factor (float) – Retardation factor of the compound in the aquifer.
- Returns:
Concentration of the compound in the extracted water [ng/m3] or temperature.
- Return type:
Examples
Basic usage with alpha and beta parameters:
>>> import pandas as pd >>> import numpy as np >>> from gwtransport.utils import compute_time_edges >>> from gwtransport.advection import gamma_infiltration_to_extraction >>> >>> # Create input data with aligned time edges >>> dates = pd.date_range(start="2020-01-01", end="2020-01-20", freq="D") >>> tedges = compute_time_edges( ... tedges=None, tstart=None, tend=dates, number_of_bins=len(dates) ... ) >>> >>> # Create output time edges (can be different alignment) >>> cout_dates = pd.date_range(start="2020-01-05", end="2020-01-15", freq="D") >>> cout_tedges = compute_time_edges( ... tedges=None, tstart=None, tend=cout_dates, number_of_bins=len(cout_dates) ... ) >>> >>> # Input concentration and flow (same length, aligned with tedges) >>> cin = pd.Series(np.ones(len(dates)), index=dates) >>> flow = pd.Series(np.ones(len(dates)) * 100, index=dates) # 100 m3/day >>> >>> # Run gamma_infiltration_to_extraction with alpha/beta parameters >>> cout = gamma_infiltration_to_extraction( ... cin=cin, ... cin_tedges=tedges, ... cout_tedges=cout_tedges, ... flow=flow, ... flow_tedges=tedges, # Must be identical to cin_tedges ... alpha=10.0, ... beta=10.0, ... n_bins=5, ... ) >>> cout.shape (11,)
Using mean and std parameters instead:
>>> cout = gamma_infiltration_to_extraction( ... cin=cin, ... cin_tedges=tedges, ... cout_tedges=cout_tedges, ... flow=flow, ... flow_tedges=tedges, ... mean=100.0, ... std=20.0, ... n_bins=5, ... )
With retardation factor:
>>> cout = gamma_infiltration_to_extraction( ... cin=cin, ... cin_tedges=tedges, ... cout_tedges=cout_tedges, ... flow=flow, ... flow_tedges=tedges, ... alpha=10.0, ... beta=10.0, ... retardation_factor=2.0, # Doubles residence time ... )
- gwtransport.advection.gamma_extraction_to_infiltration(cout, flow, alpha, beta, n_bins=100, retardation_factor=1.0)[source]#
Compute the concentration of the infiltrating water by shifting cout with its residence time.
This function represents extraction to infiltration modeling (equivalent to deconvolution).
- Parameters:
cout (pandas.Series) – Concentration of the compound in the extracted water [ng/m3].
flow (pandas.Series) – Flow rate of water in the aquifer [m3/day].
alpha (float) – Shape parameter of gamma distribution of the aquifer pore volume (must be > 0)
beta (float) – Scale parameter of gamma distribution of the aquifer pore volume (must be > 0)
n_bins (int) – Number of bins to discretize the gamma distribution.
retardation_factor (float) – Retardation factor of the compound in the aquifer.
- Returns:
This function is not yet implemented.
- Return type:
- gwtransport.advection.distribution_infiltration_to_extraction(*, cin, flow, tedges, cout_tedges, aquifer_pore_volumes, retardation_factor=1.0)[source]#
Compute the concentration of the extracted water using flow-weighted advection.
This function implements an infiltration to extraction advection model where cin and flow values correspond to the same aligned time bins defined by tedges.
The algorithm: 1. Computes residence times for each pore volume at cout time edges 2. Calculates infiltration time edges by subtracting residence times 3. Determines temporal overlaps between infiltration and cin time windows 4. Creates flow-weighted overlap matrices normalized by total weights 5. Computes weighted contributions and averages across pore volumes
- Parameters:
cin (array-like) – Concentration values of infiltrating water or temperature [concentration units]. Length must match the number of time bins defined by tedges.
flow (array-like) – Flow rate values in the aquifer [m3/day]. Length must match cin and the number of time bins defined by tedges.
tedges (pandas.DatetimeIndex) – Time edges defining bins for both cin and flow data. Has length of len(cin) + 1 and len(flow) + 1.
cout_tedges (pandas.DatetimeIndex) – Time edges for output data bins. Has length of desired output + 1. Can have different time alignment and resolution than tedges.
aquifer_pore_volumes (array-like) – Array of aquifer pore volumes [m3] representing the distribution of residence times in the aquifer system.
retardation_factor (float, optional) – Retardation factor of the compound in the aquifer (default 1.0). Values > 1.0 indicate slower transport due to sorption/interaction.
- Returns:
Flow-weighted concentration in the extracted water. Same units as cin. Length equals len(cout_tedges) - 1. NaN values indicate time periods with no valid contributions from the infiltration data.
- Return type:
- Raises:
ValueError – If tedges length doesn’t match cin/flow arrays plus one, or if infiltration time edges become non-monotonic (invalid input conditions).
Examples
Basic usage with pandas Series:
>>> import pandas as pd >>> import numpy as np >>> from gwtransport.utils import compute_time_edges >>> from gwtransport.advection import distribution_infiltration_to_extraction >>> >>> # Create input data >>> dates = pd.date_range(start="2020-01-01", end="2020-01-20", freq="D") >>> tedges = compute_time_edges( ... tedges=None, tstart=None, tend=dates, number_of_bins=len(dates) ... ) >>> >>> # Create output time edges (different alignment) >>> cout_dates = pd.date_range(start="2020-01-05", end="2020-01-15", freq="D") >>> cout_tedges = compute_time_edges( ... tedges=None, tstart=None, tend=cout_dates, number_of_bins=len(cout_dates) ... ) >>> >>> # Input concentration and flow >>> cin = pd.Series(np.ones(len(dates)), index=dates) >>> flow = pd.Series(np.ones(len(dates)) * 100, index=dates) # 100 m3/day >>> >>> # Define distribution of aquifer pore volumes >>> aquifer_pore_volumes = np.array([50, 100, 200]) # m3 >>> >>> # Run distribution_infiltration_to_extraction >>> cout = distribution_infiltration_to_extraction( ... cin=cin, ... flow=flow, ... tedges=tedges, ... cout_tedges=cout_tedges, ... aquifer_pore_volumes=aquifer_pore_volumes, ... ) >>> cout.shape (11,)
Using array inputs instead of pandas Series:
>>> # Convert to arrays >>> cin_values = cin.values >>> flow_values = flow.values >>> >>> cout = distribution_infiltration_to_extraction( ... cin=cin_values, ... flow=flow_values, ... tedges=tedges, ... cout_tedges=cout_tedges, ... aquifer_pore_volumes=aquifer_pore_volumes, ... )
With retardation factor:
>>> cout = distribution_infiltration_to_extraction( ... cin=cin, ... flow=flow, ... tedges=tedges, ... cout_tedges=cout_tedges, ... aquifer_pore_volumes=aquifer_pore_volumes, ... retardation_factor=2.0, # Compound moves twice as slowly ... )
Using single pore volume:
>>> single_volume = np.array([100]) # Single 100 m3 pore volume >>> cout = distribution_infiltration_to_extraction( ... cin=cin, ... flow=flow, ... tedges=tedges, ... cout_tedges=cout_tedges, ... aquifer_pore_volumes=single_volume, ... )
- gwtransport.advection.distribution_extraction_to_infiltration(*, cout, flow, tedges, cin_tedges, aquifer_pore_volumes, retardation_factor=1.0)[source]#
Compute the concentration of the infiltrating water from extracted water (deconvolution).
This function implements an extraction to infiltration advection model (inverse of distribution_infiltration_to_extraction) where cout and flow values correspond to the same aligned time bins defined by tedges.
SYMMETRIC RELATIONSHIP: - distribution_infiltration_to_extraction: cin + tedges → cout + cout_tedges - distribution_extraction_to_infiltration: cout + tedges → cin + cin_tedges
The algorithm (symmetric to distribution_infiltration_to_extraction): 1. Computes residence times for each pore volume at cint time edges 2. Calculates extraction time edges by adding residence times (reverse of infiltration_to_extraction) 3. Determines temporal overlaps between extraction and cout time windows 4. Creates flow-weighted overlap matrices normalized by total weights 5. Computes weighted contributions and averages across pore volumes
- Parameters:
cout (array-like) – Concentration values of extracted water [concentration units]. Length must match the number of time bins defined by tedges.
flow (array-like) – Flow rate values in the aquifer [m3/day]. Length must match cout and the number of time bins defined by tedges.
tedges (pandas.DatetimeIndex) – Time edges defining bins for both cout and flow data. Has length of len(cout) + 1 and len(flow) + 1.
cin_tedges (pandas.DatetimeIndex) – Time edges for output (infiltration) data bins. Has length of desired output + 1. Can have different time alignment and resolution than tedges.
aquifer_pore_volumes (array-like) – Array of aquifer pore volumes [m3] representing the distribution of residence times in the aquifer system.
retardation_factor (float, optional) – Retardation factor of the compound in the aquifer (default 1.0). Values > 1.0 indicate slower transport due to sorption/interaction.
- Returns:
Flow-weighted concentration in the infiltrating water. Same units as cout. Length equals len(cin_tedges) - 1. NaN values indicate time periods with no valid contributions from the extraction data.
- Return type:
- Raises:
ValueError – If tedges length doesn’t match cout/flow arrays plus one, or if extraction time edges become non-monotonic (invalid input conditions).
Examples
Basic usage with pandas Series:
>>> import pandas as pd >>> import numpy as np >>> from gwtransport.utils import compute_time_edges >>> from gwtransport.advection import distribution_extraction_to_infiltration >>> >>> # Create input data >>> dates = pd.date_range(start="2020-01-01", end="2020-01-20", freq="D") >>> tedges = compute_time_edges( ... tedges=None, tstart=None, tend=dates, number_of_bins=len(dates) ... ) >>> >>> # Create output time edges (different alignment) >>> cint_dates = pd.date_range(start="2019-12-25", end="2020-01-15", freq="D") >>> cin_tedges = compute_time_edges( ... tedges=None, tstart=None, tend=cint_dates, number_of_bins=len(cint_dates) ... ) >>> >>> # Input concentration and flow >>> cout = pd.Series(np.ones(len(dates)), index=dates) >>> flow = pd.Series(np.ones(len(dates)) * 100, index=dates) # 100 m3/day >>> >>> # Define distribution of aquifer pore volumes >>> aquifer_pore_volumes = np.array([50, 100, 200]) # m3 >>> >>> # Run distribution_extraction_to_infiltration >>> cin = distribution_extraction_to_infiltration( ... cout=cout, ... flow=flow, ... tedges=tedges, ... cin_tedges=cin_tedges, ... aquifer_pore_volumes=aquifer_pore_volumes, ... ) >>> cin.shape (22,)
Using array inputs instead of pandas Series:
>>> # Convert to arrays >>> cout_values = cout.values >>> flow_values = flow.values >>> >>> cin = distribution_extraction_to_infiltration( ... cout=cout_values, ... flow=flow_values, ... tedges=tedges, ... cin_tedges=cin_tedges, ... aquifer_pore_volumes=aquifer_pore_volumes, ... )
With retardation factor:
>>> cin = distribution_extraction_to_infiltration( ... cout=cout, ... flow=flow, ... tedges=tedges, ... cin_tedges=cin_tedges, ... aquifer_pore_volumes=aquifer_pore_volumes, ... retardation_factor=2.0, # Compound moves twice as slowly ... )
Using single pore volume:
>>> single_volume = np.array([100]) # Single 100 m3 pore volume >>> cin = distribution_extraction_to_infiltration( ... cout=cout, ... flow=flow, ... tedges=tedges, ... cin_tedges=cin_tedges, ... aquifer_pore_volumes=single_volume, ... )
deposition#
Deposition Analysis for 1D Aquifer Systems.
This module provides functions to analyze compound deposition and concentration in aquifer systems. It includes tools for computing deposition rates, concentration changes, and related coefficients based on extraction data and aquifer properties.
The model assumes requires the groundwaterflow to be reduced to a 1D system. On one side, water with a certain concentration infiltrates (‘cin’), the water flows through the aquifer and the compound of interest flows through the aquifer with a retarded velocity. During the soil passage, deposition enriches the water with the compound. The water is extracted (‘cout’) and the concentration increase due to deposition is called ‘dcout’.
Main functions: - extraction_to_infiltration: Calculate deposition rates from extraction data (extraction to infiltration modeling, equivalent to deconvolution) - infiltration_to_extraction: Determine concentration changes due to deposition (infiltration to extraction modeling, equivalent to convolution) - deposition_coefficients: Generate coefficients for deposition modeling
- gwtransport.deposition.extraction_to_infiltration(cout, flow, aquifer_pore_volume, porosity, thickness, retardation_factor, nullspace_objective='squared_lengths')[source]#
Compute the deposition given the added concentration of the compound in the extracted water.
- Parameters:
cout (pandas.Series) – Concentration of the compound in the extracted water [ng/m3].
flow (pandas.Series) – Flow rate of water in the aquifer [m3/day].
aquifer_pore_volume (float) – Pore volume of the aquifer [m3].
porosity (float) – Porosity of the aquifer [dimensionless].
thickness (float) – Thickness of the aquifer [m].
retardation_factor (float) – Retardation factor of the compound in the aquifer [dimensionless].
nullspace_objective (str or callable, optional) – Objective to minimize in the nullspace. If a string, it should be either “squared_lengths” or “summed_lengths”. If a callable, it should take the form objective(x, xLS, colsOfNullspace). Default is “squared_lengths”.
- Returns:
Deposition of the compound in the aquifer [ng/m2/day].
- Return type:
- gwtransport.deposition.infiltration_to_extraction(dcout_index, deposition, flow, aquifer_pore_volume, porosity, thickness, retardation_factor)[source]#
Compute the increase in concentration of the compound in the extracted water by the deposition.
This function represents infiltration to extraction modeling (equivalent to convolution).
- Parameters:
dcout_index (pandas.Series) – Concentration of the compound in the extracted water [ng/m3].
deposition (pandas.Series) – Deposition of the compound in the aquifer [ng/m2/day].
flow (pandas.Series) – Flow rate of water in the aquifer [m3/day].
aquifer_pore_volume (float) – Pore volume of the aquifer [m3].
porosity (float) – Porosity of the aquifer [dimensionless].
thickness (float) – Thickness of the aquiifer [m].
- Returns:
Concentration of the compound in the extracted water [ng/m3].
- Return type:
- gwtransport.deposition.deposition_coefficients(dcout_index, flow, aquifer_pore_volume, porosity, thickness, retardation_factor)[source]#
Compute the coefficients of the deposition model.
- Parameters:
dcout_index (pandas.Series) – Concentration of the compound in the extracted water [ng/m3].
flow (pandas.Series) – Flow rate of water in the aquifer [m3/day].
aquifer_pore_volume (float) – Pore volume of the aquifer [m3].
porosity (float) – Porosity of the aquifer [dimensionless].
thickness (float) – Thickness of the aquifer [m].
retardation_factor (float) – Retardation factor of the compound in the aquifer [dimensionless].
- Returns:
numpy.ndarray – Coefficients of the deposition model [m2/day].
pandas.DataFrame – Dataframe containing the residence time of the retarded compound in the aquifer [days].
pandas.DatetimeIndex – Datetime index of the deposition.
- gwtransport.deposition.dcout_date_range_from_dcout_index(dcout_index)[source]#
Compute the date range of the concentration of the compound in the extracted water.
- Parameters:
dcout_index (pandas.DatetimeIndex) – Index of the concentration of the compound in the extracted water.
- Returns:
Date range of the concentration of the compound in the extracted water.
- Return type:
- gwtransport.deposition.deposition_index_from_dcout_index(dcout_index, flow, aquifer_pore_volume, retardation_factor)[source]#
Compute the index of the deposition from the concentration of the compound in the extracted water index.
Creates and index for each day, starting at the first day that contributes to the first dcout measurement, and ending at the last day that contributes to the last dcout measurement. The times are floored to the beginning of the day.
- Parameters:
dcout_index (pandas.DatetimeIndex) – Index of the concentration of the compound in the extracted water.
flow (pandas.Series) – Flow rate of water in the aquifer [m3/day].
aquifer_pore_volume (float) – Pore volume of the aquifer [m3].
retardation_factor (float) – Retardation factor of the compound in the aquifer [dimensionless].
- Returns:
Index of the deposition.
- Return type:
diffusion#
Module that implements diffusion.
- gwtransport.diffusion.infiltration_to_extraction(cin, flow, aquifer_pore_volume, diffusivity=0.1, retardation_factor=1.0, aquifer_length=80.0, porosity=0.35)[source]#
Compute the diffusion of a compound during 1D transport in the aquifer.
This function represents infiltration to extraction modeling (equivalent to convolution).
- Parameters:
cin (pandas.Series) – Concentration or temperature of the compound in the infiltrating water [ng/m3].
flow (pandas.Series) – Flow rate of water in the aquifer [m3/day].
aquifer_pore_volume (float) – Pore volume of the aquifer [m3].
diffusivity (float, optional) – diffusivity of the compound in the aquifer [m2/day]. Default is 0.1.
retardation_factor (float, optional) – Retardation factor of the compound in the aquifer [dimensionless]. Default is 1.0.
aquifer_length (float, optional) – Length of the aquifer [m]. Default is 80.0.
porosity (float, optional) – Porosity of the aquifer [dimensionless]. Default is 0.35.
- Returns:
Concentration of the compound in the extracted water [ng/m3].
- Return type:
- gwtransport.diffusion.extraction_to_infiltration(cout, flow, aquifer_pore_volume, diffusivity=0.1, retardation_factor=1.0, aquifer_length=80.0, porosity=0.35)[source]#
Compute the reverse diffusion of a compound during 1D transport in the aquifer.
This function represents extraction to infiltration modeling (equivalent to deconvolution).
- Parameters:
cout (pandas.Series) – Concentration or temperature of the compound in the extracted water [ng/m3].
flow (pandas.Series) – Flow rate of water in the aquifer [m3/day].
aquifer_pore_volume (float) – Pore volume of the aquifer [m3].
diffusivity (float, optional) – diffusivity of the compound in the aquifer [m2/day]. Default is 0.1.
retardation_factor (float, optional) – Retardation factor of the compound in the aquifer [dimensionless]. Default is 1.0.
aquifer_length (float, optional) – Length of the aquifer [m]. Default is 80.0.
porosity (float, optional) – Porosity of the aquifer [dimensionless]. Default is 0.35.
- Returns:
Concentration of the compound in the infiltrating water [ng/m3].
- Return type:
Notes
Extraction to infiltration diffusion (deconvolution) is mathematically ill-posed and requires regularization to obtain a stable solution.
- gwtransport.diffusion.compute_sigma_array(flow, aquifer_pore_volume, diffusivity=0.1, retardation_factor=1.0, aquifer_length=80.0, porosity=0.35)[source]#
Compute sigma values for diffusion based on flow and aquifer properties.
- Parameters:
flow (pandas.Series) – Flow rate of water in the aquifer [m3/day].
aquifer_pore_volume (float) – Pore volume of the aquifer [m3].
diffusivity (float, optional) – diffusivity of the compound in the aquifer [m2/day]. Default is 0.1.
retardation_factor (float, optional) – Retardation factor of the compound in the aquifer [dimensionless]. Default is 1.0.
aquifer_length (float, optional) – Length of the aquifer [m]. Default is 80.0.
porosity (float, optional) – Porosity of the aquifer [dimensionless]. Default is 0.35.
- Returns:
Array of sigma values for diffusion.
- Return type:
array
- gwtransport.diffusion.convolve_diffusion(input_signal, sigma_array, truncate=4.0)[source]#
Apply Gaussian filter with position-dependent sigma values.
This function extends scipy.ndimage.gaussian_filter1d by allowing the standard deviation (sigma) of the Gaussian kernel to vary at each point in the signal. It implements the filter using a sparse convolution matrix where each row represents a Gaussian kernel with a locally-appropriate standard deviation.
- Parameters:
input_signal (ndarray) – One-dimensional input array to be filtered.
sigma_array (ndarray) – One-dimensional array of standard deviation values, must have same length as input_signal. Each value specifies the Gaussian kernel width at the corresponding position.
truncate (float, optional) – Truncate the filter at this many standard deviations. Default is 4.0.
- Returns:
The filtered input signal. Has the same shape as input_signal.
- Return type:
ndarray
Notes
At the boundaries, the outer values are repeated to avoid edge effects. Equal to mode=`nearest` in scipy.ndimage.gaussian_filter1d.
The function constructs a sparse convolution matrix where each row represents a position-specific Gaussian kernel. The kernel width adapts to local sigma values, making it suitable for problems with varying diffusivitys or time steps.
For diffusion problems, the local sigma values can be calculated as: sigma = sqrt(2 * diffusivity * dt) / dx where diffusivity is the diffusivity, dt is the time step, and dx is the spatial step size.
The implementation uses sparse matrices for memory efficiency when dealing with large signals or when sigma values vary significantly.
See also
scipy.ndimage.gaussian_filter1d
Fixed-sigma Gaussian filtering
scipy.sparse
Sparse matrix implementations
Examples
>>> # Create a sample signal >>> x = np.linspace(0, 10, 1000) >>> signal = np.exp(-((x - 3) ** 2)) + 0.5 * np.exp(-((x - 7) ** 2) / 0.5)
>>> # Create position-dependent sigma values >>> diffusivity = 0.1 # diffusivity >>> dt = 0.001 * (1 + np.sin(2 * np.pi * x / 10)) # Varying time steps >>> dx = x[1] - x[0] >>> sigma_array = np.sqrt(2 * diffusivity * dt) / dx
>>> # Apply the filter >>> filtered = convolve_diffusion(signal, sigma_array)
- gwtransport.diffusion.deconvolve_diffusion(output_signal, sigma_array, truncate=4.0)[source]#
Apply Gaussian deconvolution with position-dependent sigma values.
This function extends scipy.ndimage.gaussian_filter1d by allowing the standard deviation (sigma) of the Gaussian kernel to vary at each point in the signal. It implements the filter using a sparse convolution matrix where each row represents a Gaussian kernel with a locally-appropriate standard deviation.
- Parameters:
output_signal (ndarray) – One-dimensional input array to be filtered.
sigma_array (ndarray) – One-dimensional array of standard deviation values, must have same length as output_signal. Each value specifies the Gaussian kernel width at the corresponding position.
truncate (float, optional) – Truncate the filter at this many standard deviations. Default is 4.0.
- Returns:
The filtered output signal. Has the same shape as output_signal.
- Return type:
ndarray
- gwtransport.diffusion.create_example_data(nx=1000, domain_length=10.0, diffusivity=0.1)[source]#
Create example data for demonstrating variable-sigma diffusion.
- Parameters:
- Returns:
x (ndarray) – Spatial coordinates.
signal (ndarray) – Initial signal (sum of two Gaussians).
sigma_array (ndarray) – Array of sigma values varying in space.
dt (ndarray) – Array of time steps varying in space.
Notes
This function creates a test case with: - A signal composed of two Gaussian peaks - Sinusoidally varying time steps - Corresponding sigma values for diffusion
gamma#
Functions for working with gamma distributions.
- gwtransport.gamma.parse_parameters(alpha=None, beta=None, mean=None, std=None)[source]#
Parse parameters for gamma distribution.
Either alpha and beta or mean and std must be provided.
- Parameters:
- Returns:
Shape and scale parameters of the gamma distribution.
- Return type:
- Raises:
ValueError – If both alpha and beta are None or if both mean and std are None. If alpha or beta are not positive.
- gwtransport.gamma.mean_std_to_alpha_beta(mean, std)[source]#
Convert mean and standard deviation of gamma distribution to shape and scale parameters.
- gwtransport.gamma.alpha_beta_to_mean_std(alpha, beta)[source]#
Convert shape and scale parameters of gamma distribution to mean and standard deviation.
- gwtransport.gamma.bins(*, alpha=None, beta=None, mean=None, std=None, n_bins=None, quantile_edges=None)[source]#
Divide gamma distribution into bins and compute various bin properties.
If n_bins is provided, the gamma distribution is divided into n_bins equal-mass bins. If quantile_edges is provided, the gamma distribution is divided into bins defined by the quantile edges. The quantile edges must be in the range [0, 1] and of size n_bins + 1. The first and last quantile edges must be 0 and 1, respectively.
- Parameters:
alpha (float, optional) – Shape parameter of gamma distribution (must be > 0)
beta (float, optional) – Scale parameter of gamma distribution (must be > 0)
mean (float, optional) – Mean of the gamma distribution.
std (float, optional) – Standard deviation of the gamma distribution.
n_bins (int, optional) – Number of bins to divide the gamma distribution (must be > 1)
quantile_edges (array-like, optional) – Quantile edges for binning. Must be in the range [0, 1] and of size n_bins + 1. The first and last quantile edges must be 0 and 1, respectively. If provided, n_bins is ignored.
- Returns:
lower_bound: lower bounds of bins (first one is 0)
upper_bound: upper bounds of bins (last one is inf)
edges: bin edges (lower_bound[0], upper_bound[0], …, upper_bound[-1])
expected_value: expected values in bins. Is what you would expect to observe if you repeatedly sampled from the probability distribution, but only considered samples that fall within that particular bin
probability_mass: probability mass in bins
- Return type:
dict of arrays with keys
logremoval#
Functions for calculating log removal efficiency in water treatment systems.
This module provides utilities to calculate log removal values for different configurations of water treatment systems, including both basic log removal calculations and parallel flow arrangements where multiple treatment processes operate simultaneously on different fractions of the total flow.
Log removal is a standard measure in water treatment that represents the reduction of pathogen concentration on a logarithmic scale. For example, a log removal of 3 represents a 99.9% reduction in pathogen concentration.
Functions#
residence_time_to_log_removal : Calculate log removal from residence times and removal rate parallel_mean : Calculate weighted average log removal for parallel flow systems gamma_pdf : Compute PDF of log removal given gamma-distributed residence time gamma_cdf : Compute CDF of log removal given gamma-distributed residence time gamma_mean : Compute mean log removal for gamma-distributed residence time gamma_find_flow_for_target_mean : Find flow rate for target mean log removal
Notes
For systems in series, log removals are typically summed directly, while for parallel systems, a weighted average based on flow distribution is required. The parallel_mean function supports multi-dimensional arrays via the axis parameter and performs minimal validation for improved performance.
- gwtransport.logremoval.residence_time_to_log_removal(residence_times, log_removal_rate)[source]#
Compute log removal given residence times and a log removal rate.
This function calculates the log removal efficiency based on the residence times of water in a treatment system and the log removal rate coefficient.
The calculation uses the formula: Log Removal = log_removal_rate * log10(residence_time)
- Parameters:
residence_times (array_like) – Array of residence times (in consistent units, e.g., hours, days). Must be positive values.
log_removal_rate (float) – Log removal rate coefficient that relates residence time to log removal efficiency. Units should be consistent with residence_times.
- Returns:
log_removals – Array of log removal values corresponding to the input residence times. Same shape as input residence_times.
- Return type:
ndarray
Notes
Log removal is a logarithmic measure of pathogen reduction: - Log 1 = 90% reduction - Log 2 = 99% reduction - Log 3 = 99.9% reduction
The log removal rate coefficient determines how effectively the treatment system removes pathogens per unit log time.
Examples
>>> import numpy as np >>> residence_times = np.array([1.0, 10.0, 100.0]) >>> log_removal_rate = 2.0 >>> residence_time_to_log_removal(residence_times, log_removal_rate) array([0. , 2. , 4. ])
>>> # Single residence time >>> residence_time_to_log_removal(5.0, 1.5) 1.0484550065040283
>>> # 2D array of residence times >>> residence_times_2d = np.array([[1.0, 10.0], [100.0, 1000.0]]) >>> residence_time_to_log_removal(residence_times_2d, 1.0) array([[0., 1.], [2., 3.]])
- gwtransport.logremoval.parallel_mean(log_removals, flow_fractions=None, axis=None)[source]#
Calculate the weighted average log removal for a system with parallel flows.
This function computes the overall log removal efficiency of a parallel filtration system. If flow_fractions is not provided, it assumes equal distribution of flow across all paths.
The calculation uses the formula:
Total Log Removal = -log10(sum(F_i * 10^(-LR_i)))
Where: - F_i = fraction of flow through system i (decimal, sum to 1.0) - LR_i = log removal of system i
- Parameters:
log_removals (array_like) – Array of log removal values for each parallel flow. Each value represents the log10 reduction of pathogens. For multi-dimensional arrays, the parallel mean is computed along the specified axis.
flow_fractions (array_like, optional) – Array of flow fractions for each parallel flow. Must sum to 1.0 along the specified axis and have compatible shape with log_removals. If None, equal flow distribution is assumed (default is None).
axis (int, optional) – Axis along which to compute the parallel mean for multi-dimensional arrays. If None, the array is treated as 1-dimensional (default is None).
- Returns:
The combined log removal value for the parallel system. If log_removals is multi-dimensional and axis is specified, returns an array with the specified axis removed.
- Return type:
float or array_like
Notes
This function performs minimal input validation to reduce complexity. NumPy will handle most error cases naturally through broadcasting and array operations.
Notes
Log removal is a logarithmic measure of pathogen reduction: - Log 1 = 90% reduction - Log 2 = 99% reduction - Log 3 = 99.9% reduction
For parallel flows, the combined removal is typically less effective than the best individual removal but better than the worst.
Examples
>>> import numpy as np >>> # Three parallel streams with equal flow and log removals of 3, 4, and 5 >>> log_removals = np.array([3, 4, 5]) >>> parallel_mean(log_removals) 3.431798275933005
>>> # Two parallel streams with weighted flow >>> log_removals = np.array([3, 5]) >>> flow_fractions = np.array([0.7, 0.3]) >>> parallel_mean(log_removals, flow_fractions) 3.153044674980176
>>> # Multi-dimensional array: parallel mean along axis 1 >>> log_removals_2d = np.array([[3, 4, 5], [2, 3, 4]]) >>> parallel_mean(log_removals_2d, axis=1) array([3.43179828, 2.43179828])
See also
For
,log
- gwtransport.logremoval.gamma_pdf(r, rt_alpha, rt_beta, log_removal_rate)[source]#
Compute the probability density function (PDF) of log removal given a gamma distribution for the residence time.
gamma(rt_alpha, rt_beta) = gamma(apv_alpha, apv_beta / flow)
- Parameters:
r (array_like) – Log removal values at which to compute the PDF.
rt_alpha (float) – Shape parameter of the gamma distribution for residence time.
rt_beta (float) – Scale parameter of the gamma distribution for residence time.
log_removal_rate (float) – Coefficient for log removal calculation (R = log_removal_rate * log10(T)).
- Returns:
pdf_values – PDF values corresponding to the input r values.
- Return type:
ndarray
- gwtransport.logremoval.gamma_cdf(r, rt_alpha, rt_beta, log_removal_rate)[source]#
Compute the cumulative distribution function (CDF) of log removal given a gamma distribution for the residence time.
gamma(rt_alpha, rt_beta) = gamma(apv_alpha, apv_beta / flow)
- Parameters:
r (array_like) – Log removal values at which to compute the CDF.
alpha (float) – Shape parameter of the gamma distribution for residence time.
beta (float) – Scale parameter of the gamma distribution for residence time.
log_removal_rate (float) – Coefficient for log removal calculation (R = log_removal_rate * log10(T)).
- Returns:
cdf_values – CDF values corresponding to the input r values.
- Return type:
ndarray
- gwtransport.logremoval.gamma_mean(rt_alpha, rt_beta, log_removal_rate)[source]#
Compute the mean of the log removal distribution given a gamma distribution for the residence time.
gamma(rt_alpha, rt_beta) = gamma(apv_alpha, apv_beta / flow)
- Parameters:
- Returns:
mean – Mean value of the log removal distribution.
- Return type:
- gwtransport.logremoval.gamma_find_flow_for_target_mean(target_mean, apv_alpha, apv_beta, log_removal_rate)[source]#
Find the flow rate flow that produces a specified target mean log removal given a gamma distribution for the residence time.
gamma(rt_alpha, rt_beta) = gamma(apv_alpha, apv_beta / flow)
- Parameters:
target_mean (float) – Target mean log removal value.
apv_alpha (float) – Shape parameter of the gamma distribution for residence time.
apv_beta (float) – Scale parameter of the gamma distribution for pore volume.
log_removal_rate (float) – Coefficient for log removal calculation (R = log_removal_rate * log10(T)).
- Returns:
flow – Flow rate that produces the target mean log removal.
- Return type:
Notes
This function uses the analytical solution derived from the mean formula. From E[R] = (log_removal_rate/ln(10)) * (digamma(alpha) + ln(beta) - ln(Q)), we can solve for Q to get: flow = beta * exp(ln(10)*target_mean/log_removal_rate - digamma(alpha))
residence_time#
Residence time of a compound in the aquifer.
This module provides functions to compute the residence time of a compound in the aquifer. The residence time is the time it takes for the compound to travel from the infiltration point to the extraction point. The compound is retarded in the aquifer with a retardation factor.
Main functions:
residence_time: Compute the residence time of a retarded compound in the aquifer at indices.
residence_time_mean: Compute the mean residence time of a retarded compound in the aquifer between specified time edges.
- gwtransport.residence_time.residence_time(flow=None, flow_tedges=None, aquifer_pore_volume=None, index=None, retardation_factor=1.0, direction='extraction_to_infiltration', *, return_pandas_series=False)[source]#
Compute the residence time of retarded compound in the water in the aquifer.
- Parameters:
flow (array-like) – Flow rate of water in the aquifer [m3/day]. The length of flow should match the length of flow_tedges minus one.
flow_tedges (pandas.DatetimeIndex) – Time edges for the flow data. Used to compute the cumulative flow. Has a length of one more than flow.
aquifer_pore_volume (float or array-like of float) – Pore volume of the aquifer [m3].
index (pandas.DatetimeIndex, optional) – Index at which to compute the residence time. If left to None, the index of flow is used. Default is None.
retardation_factor (float) – Retardation factor of the compound in the aquifer [dimensionless].
direction ({'extraction_to_infiltration', 'infiltration_to_extraction'}, optional) – Direction of the flow calculation: * ‘extraction_to_infiltration’: Extraction to infiltration modeling - how many days ago was the extracted water infiltrated * ‘infiltration_to_extraction’: Infiltration to extraction modeling - how many days until the infiltrated water is extracted Default is ‘extraction_to_infiltration’.
return_pandas_series (bool, optional) – If True, return a pandas Series with the residence time at the index provided. Only supported for a single aquifer pore volume. This parameter is deprecated and will be removed in a future version.
- Returns:
Residence time of the retarded compound in the aquifer [days].
- Return type:
- gwtransport.residence_time.residence_time_mean(flow, flow_tedges, tedges_out, aquifer_pore_volume, *, direction='extraction_to_infiltration', retardation_factor=1.0)[source]#
Compute the mean residence time of a retarded compound in the aquifer between specified time edges.
This function calculates the average residence time of a retarded compound in the aquifer between specified time intervals. It can compute both extraction to infiltration modeling (extraction direction: when was extracted water infiltrated) and infiltration to extraction modeling (infiltration direction: when will infiltrated water be extracted).
The function handles time series data by computing the cumulative flow and using linear interpolation and averaging to determine mean residence times between the specified time edges.
- Parameters:
flow (array-like) – Flow rate of water in the aquifer [m3/day]. Should be an array of flow values corresponding to the intervals defined by flow_tedges.
flow_tedges (array-like) – Time edges for the flow data, as datetime64 objects. These define the time intervals for which the flow values are provided.
tedges_out (array-like) – Output time edges as datetime64 objects. These define the intervals for which the mean residence times will be calculated.
aquifer_pore_volume (float or array-like) – Pore volume of the aquifer [m3]. Can be a single value or an array of values for multiple pore volume scenarios.
direction ({'extraction_to_infiltration', 'infiltration_to_extraction'}, optional) – Direction of the flow calculation: * ‘extraction_to_infiltration’: Extraction to infiltration modeling - how many days ago was the extracted water infiltrated * ‘infiltration_to_extraction’: Infiltration to extraction modeling - how many days until the infiltrated water is extracted Default is ‘extraction_to_infiltration’.
retardation_factor (float, optional) – Retardation factor of the compound in the aquifer [dimensionless]. A value greater than 1.0 indicates that the compound moves slower than water. Default is 1.0 (no retardation).
- Returns:
Mean residence time of the retarded compound in the aquifer [days] for each interval defined by tedges_out. The first dimension corresponds to the different pore volumes and the second to the residence times between tedges_out.
- Return type:
Notes
The function converts datetime objects to days since the start of the time series.
For extraction_to_infiltration direction, the function computes how many days ago water was infiltrated.
For infiltration_to_extraction direction, the function computes how many days until water will be extracted.
The function uses linear interpolation for computing residence times at specific points and linear averaging for computing mean values over intervals.
Examples
>>> import pandas as pd >>> import numpy as np >>> # Create sample flow data >>> flow_dates = pd.date_range(start="2023-01-01", end="2023-01-10", freq="D") >>> flow_values = np.full(len(flow_dates) - 1, 100.0) # Constant flow of 100 m³/day >>> pore_volume = 200.0 # Aquifer pore volume in m³ >>> # Calculate mean residence times >>> mean_times = residence_time_mean( ... flow=flow_values, ... flow_tedges=flow_dates, ... tedges_out=flow_dates, ... aquifer_pore_volume=pore_volume, ... direction="extraction_to_infiltration", ... ) >>> # With constant flow of 100 m³/day and pore volume of 200 m³, >>> # mean residence time should be approximately 2 days >>> print(mean_times) # Output: [np.nan, np.nan, 2.0, 2.0, 2.0, ..., 2.0]
utils#
General utilities for the 1D groundwater transport model.
- gwtransport.utils.linear_interpolate(x_ref, y_ref, x_query, left=None, right=None)[source]#
Linear interpolation on monotonically increasing data.
- Parameters:
x_ref (array-like) – Reference vector with sorted x-values.
y_ref (array-like) – Reference vector with y-values.
x_query (array-like) – Query x-values. Array may have any shape.
left (float, optional) – Value to return for x_query < x_ref[0]. - If left is set to None, x_query = x_ref[0]. - If left is set to a float, such as np.nan, this value is returned.
right (float, optional) – Value to return for x_query > x_ref[-1]. - If right is set to None, x_query = x_ref[-1]. - If right is set to a float, such as np.nan, this value is returned.
- Returns:
Interpolated y-values.
- Return type:
array
- gwtransport.utils.interp_series(series, index_new, **interp1d_kwargs)[source]#
Interpolate a pandas.Series to a new index.
- Parameters:
series (pandas.Series) – Series to interpolate.
index_new (pandas.DatetimeIndex) – New index to interpolate to.
interp1d_kwargs (dict, optional) – Keyword arguments passed to scipy.interpolate.interp1d. Default is {}.
- Returns:
Interpolated series.
- Return type:
- gwtransport.utils.diff(a, alignment='centered')[source]#
Compute the cell widths for a given array of cell coordinates.
If alignment is “centered”, the coordinates are assumed to be centered in the cells. If alignment is “left”, the coordinates are assumed to be at the left edge of the cells. If alignment is “right”, the coordinates are assumed to be at the right edge of the cells.
- Parameters:
a (array-like) – Input array.
- Returns:
Array with differences between elements.
- Return type:
array
- gwtransport.utils.linear_average(x_data: Sequence[float] | npt.NDArray[np.float64], y_data: Sequence[float] | npt.NDArray[np.float64], x_edges: Sequence[float] | npt.NDArray[np.float64], extrapolate_method: str = 'nan') npt.NDArray[np.float64] [source]#
Compute the average value of a piecewise linear time series between specified x-edges.
- Parameters:
x_data (array-like) – x-coordinates of the time series data points, must be in ascending order
y_data (array-like) – y-coordinates of the time series data points
x_edges (array-like) – x-coordinates of the integration edges. Can be 1D or 2D. - If 1D: shape (n_edges,). Can be 1D or 2D. - If 1D: shape (n_edges,), must be in ascending order - If 2D: shape (n_series, n_edges), each row must be in ascending order - If 2D: shape (n_series, n_edges), each row must be in ascending order
extrapolate_method (str, optional) – Method for handling extrapolation. Default is ‘nan’. - ‘outer’: Extrapolate using the outermost data points. - ‘nan’: Extrapolate using np.nan. - ‘raise’: Raise an error for out-of-bounds values.
- Returns:
2D array of average values between consecutive pairs of x_edges. Shape is (n_series, n_bins) where n_bins = n_edges - 1. If x_edges is 1D, n_series = 1.
- Return type:
Examples
>>> x_data = [0, 1, 2, 3] >>> y_data = [0, 1, 1, 0] >>> x_edges = [0, 1.5, 3] >>> linear_average(x_data, y_data, x_edges) array([[0.667, 0.667]])
>>> x_edges_2d = [[0, 1.5, 3], [0.5, 2, 3]] >>> linear_average(x_data, y_data, x_edges_2d) array([[0.667, 0.667], [0.75, 0.5]])
- gwtransport.utils.partial_isin(bin_edges_in, bin_edges_out)[source]#
Calculate the fraction of each input bin that overlaps with each output bin.
This function computes a matrix where element (i, j) represents the fraction of input bin i that overlaps with output bin j. The computation uses vectorized operations to avoid loops.
- Parameters:
bin_edges_in (array_like) – 1D array of input bin edges in ascending order. For n_in bins, there should be n_in+1 edges.
bin_edges_out (array_like) – 1D array of output bin edges in ascending order. For n_out bins, there should be n_out+1 edges.
- Returns:
overlap_matrix – Dense matrix of shape (n_in, n_out) where n_in is the number of input bins and n_out is the number of output bins. Each element (i, j) represents the fraction of input bin i that overlaps with output bin j. Values range from 0 (no overlap) to 1 (complete overlap).
- Return type:
Notes
Both input arrays must be sorted in ascending order
The function leverages the sorted nature of both inputs for efficiency
Uses vectorized operations to handle large bin arrays efficiently
All overlaps sum to 1.0 for each input bin when output bins fully cover input range
Examples
>>> bin_edges_in = np.array([0, 10, 20, 30]) >>> bin_edges_out = np.array([5, 15, 25]) >>> partial_isin(bin_edges_in, bin_edges_out) array([[0.5, 0.0], [0.5, 0.5], [0.0, 0.5]])
- gwtransport.utils.generate_failed_coverage_badge()[source]#
Generate a badge indicating failed coverage.
- gwtransport.utils.combine_bin_series(a, a_edges, b, b_edges, extrapolation=0.0)[source]#
Combine two binned series onto a common set of unique edges.
This function takes two binned series (a and b) with their respective bin edges and creates new series (c and d) that are defined on a combined set of unique edges from both input edge arrays.
- Parameters:
a (array-like) – Values for the first binned series.
a_edges (array-like) – Bin edges for the first series. Must have len(a) + 1 elements.
b (array-like) – Values for the second binned series.
b_edges (array-like) – Bin edges for the second series. Must have len(b) + 1 elements.
extrapolation (str or float, optional) – Method for handling combined bins that fall outside the original series ranges. - ‘nearest’: Use the nearest original bin value - float value (e.g., np.nan, 0.0): Fill with the specified value (default: 0.0)
- Returns:
c (numpy.ndarray) – Values from series a mapped to the combined edge structure.
c_edges (numpy.ndarray) – Combined unique edges from a_edges and b_edges.
d (numpy.ndarray) – Values from series b mapped to the combined edge structure.
d_edges (numpy.ndarray) – Combined unique edges from a_edges and b_edges (same as c_edges).
Notes
The combined edges are created by taking the union of all unique values from both a_edges and b_edges, sorted in ascending order. The values are then broadcasted/repeated for each combined bin that falls within the original bin’s range.
- gwtransport.utils.compute_time_edges(tedges, tstart, tend, number_of_bins)[source]#
Compute time edges for binning data based on provided time parameters.
This function creates a DatetimeIndex of time bin edges from one of three possible input formats: explicit edges, start times, or end times. The resulting edges define the boundaries of time intervals for data binning.
Define either explicit time edges, or start and end times for each bin and leave the others at None.
- Parameters:
tedges (pandas.DatetimeIndex or None) – Explicit time edges for the bins. If provided, must have one more element than the number of bins (n_bins + 1). Takes precedence over tstart and tend.
tstart (pandas.DatetimeIndex or None) – Start times for each bin. Must have the same number of elements as the number of bins. Used when tedges is None.
tend (pandas.DatetimeIndex or None) – End times for each bin. Must have the same number of elements as the number of bins. Used when both tedges and tstart are None.
number_of_bins (int) – The expected number of time bins. Used for validation against the provided time parameters.
- Returns:
Time edges defining the boundaries of the time bins. Has one more element than number_of_bins.
- Return type:
- Raises:
ValueError – If tedges has incorrect length (not number_of_bins + 1). If tstart has incorrect length (not equal to number_of_bins). If tend has incorrect length (not equal to number_of_bins). If none of tedges, tstart, or tend are provided.
Notes
When using tstart, the function assumes uniform spacing and extrapolates the final edge based on the spacing between the last two start times.
When using tend, the function assumes uniform spacing and extrapolates the first edge based on the spacing between the first two end times.
All input time data is converted to pandas.DatetimeIndex for consistency.
- gwtransport.utils.get_soil_temperature(station_number: int = 260, *, interpolate_missing_values: bool = True) DataFrame [source]#
Download soil temperature data from the KNMI and return it as a pandas DataFrame.
The data is available for the following KNMI weather stations: - 260: De Bilt, the Netherlands (vanaf 1981) - 273: Marknesse, the Netherlands (vanaf 1989) - 286: Nieuw Beerta, the Netherlands (vanaf 1990) - 323: Wilhelminadorp, the Netherlands (vanaf 1989)
TB1 = grondtemperatuur op 5 cm diepte (graden Celsius) tijdens de waarneming TB2 = grondtemperatuur op 10 cm diepte (graden Celsius) tijdens de waarneming TB3 = grondtemperatuur op 20 cm diepte (graden Celsius) tijdens de waarneming TB4 = grondtemperatuur op 50 cm diepte (graden Celsius) tijdens de waarneming TB5 = grondtemperatuur op 100 cm diepte (graden Celsius) tijdens de waarneming TNB2 = minimum grondtemperatuur op 10 cm diepte in de afgelopen 6 uur (graden Celsius) TNB1 = minimum grondtemperatuur op 5 cm diepte in de afgelopen 6 uur (graden Celsius) TXB1 = maximum grondtemperatuur op 5 cm diepte in de afgelopen 6 uur (graden Celsius) TXB2 = maximum grondtemperatuur op 10 cm diepte in de afgelopen 6 uur (graden Celsius)
- Parameters:
station_number (int, {260, 273, 286, 323}) – The KNMI station number for which to download soil temperature data. Default is 260 (De Bilt).
interpolate_missing_values (bool, optional) – If True, missing values are interpolated and recent NaN values are extrapolated with the previous value. If False, missing values remain as NaN. Default is True.
- Returns:
DataFrame containing soil temperature data in Celsius with a DatetimeIndex. Columns include TB1, TB2, TB3, TB4, TB5, TNB1, TNB2, TXB1, TXB2.
- Return type:
Notes
KNMI: Royal Netherlands Meteorological Institute
The timeseries may contain NaN values for missing data.