Note

This notebook is located in the ./examples directory of the gwtransport repository.

Aquifer Characterization Using Temperature Response#

Learning Objectives#

  • Understand how temperature can be used as a natural tracer for aquifer characterization

  • Learn inverse modeling techniques for estimating aquifer properties

  • Apply gamma distribution models to represent aquifer heterogeneity

  • Interpret thermal breakthrough curves in hydrogeological contexts

Overview#

This notebook demonstrates inverse modeling to estimate aquifer pore volume distribution from temperature breakthrough curves. Temperature acts as a conservative tracer with known thermal retardation, allowing characterization of flow paths and residence times. Transport is modeled with the diffusion_fast module, which accounts for advection, microdispersion, and molecular diffusion.

Applications#

  • Groundwater vulnerability assessment

  • Residence time distribution analysis

  • Contaminant transport forecasting

  • Aquifer heterogeneity characterization

Key Assumptions#

  • Stationary pore volume distribution (steady streamlines)

  • Thermal retardation factor = 2.0 (typical for saturated media)

  • Transport includes advection, microdispersion, and molecular diffusion via diffusion_fast

Background Reading#

Theoretical Background#

Thermal Transport in Groundwater#

The residence time for thermal transport relates to the aquifer pore volume distribution through:

\[t_{residence} = \frac{V_{pore} \cdot R_f}{Q}\]

where \(V_{pore}\) is the pore volume [m³], \(R_f\) the thermal retardation factor [-], and \(Q\) the flow rate [m³/day]. See residence time.

Gamma Distribution Model#

Aquifer heterogeneity is represented using a gamma distribution for pore volumes, characterized by shape (\(\alpha\)) and scale (\(\beta\)) parameters, or equivalently by mean and standard deviation.

See concepts and assumptions for further background.

[1]:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import gamma as gamma_dist

from gwtransport import diffusion_fast
from gwtransport import gamma as gamma_utils
from gwtransport.examples import generate_temperature_example_data
from gwtransport.residence_time import fraction_explained, residence_time
from gwtransport.utils import step_plot_coords

# Set random seed for reproducibility
np.random.seed(42)
plt.style.use("seaborn-v0_8-whitegrid")

print("Libraries imported successfully")
Libraries imported successfully

1. Synthetic Data Generation#

We generate realistic temperature and flow time series to demonstrate the inverse modeling approach. The synthetic data includes seasonal patterns and realistic variability.

[2]:
# Generate 6 years of daily data with seasonal patterns
df, tedges = generate_temperature_example_data(
    date_start="2020-01-01",
    date_end="2025-05-31",
    flow_mean=120.0,  # Base flow rate [m³/day]
    flow_amplitude=40.0,  # Seasonal flow variation [m³/day]
    flow_noise=5.0,  # Random daily fluctuations [m³/day]
    cin_method="soil_temperature",  # Use real soil temperature data
    aquifer_pore_volume_gamma_mean=8000.0,  # True mean pore volume [m³]
    aquifer_pore_volume_gamma_std=400.0,  # True standard deviation [m³]
    measurement_noise=0.1,  # Measurement noise for temperatures. Set to zero for perfect fit.[°C]
)

print("Dataset Summary:")
print(f"Period: {df.index[0].date()} to {df.index[-1].date()}")
print(f"Mean flow: {df['flow'].mean():.1f} m³/day")
print(f"Mean infiltration temperature: {df['cin'].mean():.1f} °C")
print(f"Mean extraction temperature: {df['cout'].mean():.1f} °C")
print(f"True mean pore volume: {df.attrs['aquifer_pore_volume_gamma_mean']:.1f} m³")
print(f"True std deviation: {df.attrs['aquifer_pore_volume_gamma_std']:.1f} m³")
Dataset Summary:
Period: 2020-01-01 to 2025-05-31
Mean flow: 110.6 m³/day
Mean infiltration temperature: 11.8 °C
Mean extraction temperature: 12.9 °C
True mean pore volume: 8000.0 m³
True std deviation: 400.0 m³

2. Parameter Estimation via Optimization#

We implement inverse modeling to estimate gamma distribution parameters using nonlinear least squares optimization. A spin-up period is excluded to allow thermal breakthrough to stabilize. We use :func:~gwtransport.residence_time.fraction_explained to compute the fraction of pore volumes with valid residence times, and exclude the period where the model output is not yet fully informed by the input signal.

[3]:
# Compute fraction_explained to determine the spin-up period.
# We use the initial parameter guesses (p0) to estimate when the model output
# is fully informed by the input signal. After fitting, we verify that the
# spin-up period was sufficient.
p0_mean, p0_std = 7500.0, 450.0
p0_bins = gamma_utils.bins(mean=p0_mean, std=p0_std, n_bins=25)
rt = residence_time(
    flow=df.flow,
    flow_tedges=tedges,
    aquifer_pore_volumes=p0_bins["expected_values"],
    retardation_factor=2.0,  # Thermal retardation
    direction="extraction_to_infiltration",
)
frac = fraction_explained(rt=rt)

# Find the first date where the model output is fully explained
fully_explained_mask = frac >= 0.99
first_fully_explained = df.index[fully_explained_mask][0]

# Define training dataset excluding the spin-up period
train_data = df.loc[fully_explained_mask].cout
train_data = train_data.dropna()
train_length = len(train_data)

print(f"Spin-up period ends: {first_fully_explained.date()}")
print(f"Training dataset: {train_length} days")
print(f"Training period: {train_data.index[0].date()} to {train_data.index[-1].date()}")
Spin-up period ends: 2020-08-03
Training dataset: 1763 days
Training period: 2020-08-03 to 2025-05-31
[4]:
def objective(_xdata, mean, std):
    """Infiltration to extraction model for temperature breakthrough with gamma-distributed pore volumes."""
    print(f"Optimizing: mean={mean:.1f} m³, std={std:.1f} m³")

    cout = diffusion_fast.gamma_infiltration_to_extraction(
        cin=df.cin,
        flow=df.flow,
        tedges=tedges,
        cout_tedges=tedges,
        mean=mean,  # Mean pore volume [m³]
        std=std,  # Standard deviation [m³]
        n_bins=25,  # Discretization resolution
        retardation_factor=df.attrs["retardation_factor"],  # Thermal retardation factor
        mean_longitudinal_dispersivity=df.attrs["longitudinal_dispersivity"],  # Longitudinal dispersivity [m]
        mean_molecular_diffusivity=df.attrs["molecular_diffusivity"],  # Molecular diffusion coefficient [m²/day]
        mean_streamline_length=df.attrs["streamline_length"],  # Streamline length [m]
    )

    # Return training period data (excluding spin-up)
    return cout[fully_explained_mask]
[5]:
# Nonlinear least squares optimization
print("Starting parameter optimization...")

(mean, std), pcov = curve_fit(
    objective,
    df.index,
    train_data.values,
    p0=(7500.0, 450.0),  # Initial parameter estimates [m³]
    bounds=([5000, 200], [10000, 600]),  # Physical constraints [m³]
    method="trf",  # Trust Region Reflective algorithm
    max_nfev=250,  # Limit function evaluations
)

print("\nOptimization completed!")
Starting parameter optimization...
Optimizing: mean=7500.0 m³, std=450.0 m³
/Users/bdestombe/Projects/gwtransport/gwtransport/src/gwtransport/diffusion_fast.py:596: UserWarning: Using multiple aquifer_pore_volumes with non-zero mean_longitudinal_dispersivity. Both represent spreading from velocity heterogeneity at different scales.

This is appropriate when:
  - APVD comes from streamline analysis (explicit geometry)
  - You want to add unresolved microdispersion and molecular diffusion

This may double-count spreading when:
  - APVD was calibrated from measurements (microdispersion already included)

For gamma-parameterized APVD, consider using the 'equivalent APVD std' approach
from 05_Diffusion_Dispersion.ipynb to combine both effects in the faster advection
module. Note: this variance combination is only valid for continuous (gamma)
distributions, not for discrete streamline volumes.
Suppress this warning with suppress_dispersion_warning=True.
  return infiltration_to_extraction(
Optimizing: mean=7500.0 m³, std=450.0 m³
Optimizing: mean=7500.0 m³, std=450.0 m³
Optimizing: mean=7893.0 m³, std=477.1 m³
Optimizing: mean=7893.0 m³, std=477.1 m³
Optimizing: mean=7893.0 m³, std=477.1 m³
Optimizing: mean=7989.0 m³, std=449.0 m³
Optimizing: mean=7989.0 m³, std=449.0 m³
Optimizing: mean=7989.0 m³, std=449.0 m³
Optimizing: mean=8001.6 m³, std=427.1 m³
Optimizing: mean=8001.6 m³, std=427.1 m³
Optimizing: mean=8001.6 m³, std=427.1 m³
Optimizing: mean=8002.3 m³, std=419.2 m³
Optimizing: mean=8002.3 m³, std=419.2 m³
Optimizing: mean=8002.3 m³, std=419.2 m³
Optimizing: mean=8002.8 m³, std=413.9 m³
Optimizing: mean=8002.8 m³, std=413.9 m³
Optimizing: mean=8002.8 m³, std=413.9 m³
Optimizing: mean=8003.1 m³, std=408.7 m³
Optimizing: mean=8003.1 m³, std=408.7 m³
Optimizing: mean=8003.1 m³, std=408.7 m³
Optimizing: mean=8002.6 m³, std=405.8 m³
Optimizing: mean=8002.6 m³, std=405.8 m³
Optimizing: mean=8002.6 m³, std=405.8 m³
Optimizing: mean=8002.7 m³, std=401.9 m³
Optimizing: mean=8002.7 m³, std=401.9 m³
Optimizing: mean=8002.7 m³, std=401.9 m³
Optimizing: mean=8002.9 m³, std=396.1 m³
Optimizing: mean=8002.9 m³, std=396.1 m³
Optimizing: mean=8002.9 m³, std=396.1 m³
Optimizing: mean=8002.5 m³, std=392.5 m³
Optimizing: mean=8002.5 m³, std=392.5 m³
Optimizing: mean=8002.5 m³, std=392.5 m³
Optimizing: mean=8002.6 m³, std=388.8 m³
Optimizing: mean=8002.6 m³, std=388.8 m³
Optimizing: mean=8002.6 m³, std=388.8 m³
Optimizing: mean=8002.6 m³, std=387.0 m³
Optimizing: mean=8002.6 m³, std=387.0 m³
Optimizing: mean=8002.6 m³, std=387.0 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.6 m³, std=387.2 m³
Optimizing: mean=8002.6 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³
Optimizing: mean=8002.5 m³, std=387.3 m³

Optimization completed!
[6]:
# Generate model predictions using optimized parameters
df["cout_modeled"] = diffusion_fast.gamma_infiltration_to_extraction(
    cin=df.cin,
    flow=df.flow,
    tedges=tedges,
    cout_tedges=tedges,
    mean=mean,  # Fitted mean pore volume
    std=std,  # Fitted standard deviation
    n_bins=250,  # High computational resolution
    retardation_factor=df.attrs["retardation_factor"],  # Thermal retardation
    mean_longitudinal_dispersivity=df.attrs["longitudinal_dispersivity"],  # Longitudinal dispersivity [m]
    mean_molecular_diffusivity=df.attrs["molecular_diffusivity"],  # Molecular diffusion coefficient [m²/day]
    mean_streamline_length=df.attrs["streamline_length"],  # Streamline length [m]
)

# Report optimization results with uncertainty estimates
print("Parameter Estimation Results:")
print(f"Mean pore volume: {mean:.1f} ± {pcov[0, 0] ** 0.5:.1f} m³")
print(f"Standard deviation: {std:.1f} ± {pcov[1, 1] ** 0.5:.1f} m³")
print(f"Coefficient of variation: {std / mean:.2f}")

# Compare with true values
true_mean = df.attrs["aquifer_pore_volume_gamma_mean"]
true_std = df.attrs["aquifer_pore_volume_gamma_std"]
print(f"\nTrue values: {true_mean:.1f} m³ (mean), {true_std:.1f} m³ (std)")
print(
    f"Relative error: {abs(mean - true_mean) / true_mean * 100:.1f}% (mean), {abs(std - true_std) / true_std * 100:.1f}% (std)"
)
Parameter Estimation Results:
Mean pore volume: 8002.5 ± 1.6 m³
Standard deviation: 387.3 ± 4.2 m³
Coefficient of variation: 0.05

True values: 8000.0 m³ (mean), 400.0 m³ (std)
Relative error: 0.0% (mean), 3.2% (std)

3. Model Validation and Visualization#

We compare observed and modeled temperature breakthrough curves to assess model performance.

[7]:
fig, (ax1, ax2) = plt.subplots(figsize=(12, 8), nrows=2, ncols=1, sharex=True)

# Flow rate subplot - convert to step format
xstep_flow, ystep_flow = step_plot_coords(tedges, df.flow)
ax1.set_title("Temperature-Based Aquifer Characterization", fontsize=14, fontweight="bold")
ax1.plot(xstep_flow, ystep_flow, label="Discharge rate", color="steelblue", alpha=0.8, linewidth=1.2)
ax1.set_ylabel("Discharge [m³/day]")
ax1.legend()
ax1.grid(True, alpha=0.3)

# Temperature subplot - convert all series to step format
xstep_temp, ystep_infiltration = step_plot_coords(tedges, df.cin)
_, ystep_extraction = step_plot_coords(tedges, df.cout)
_, ystep_modeled = step_plot_coords(tedges, df.cout_modeled)

ax2.plot(xstep_temp, ystep_infiltration, label="Recharge temperature", color="orange", alpha=0.8, linewidth=1.2)
ax2.plot(xstep_temp, ystep_extraction, label="Discharge temperature (observed)", color="red", alpha=0.8, linewidth=1.2)
ax2.plot(
    xstep_temp,
    ystep_modeled,
    label="Discharge temperature (modeled)",
    color="green",
    alpha=0.8,
    linewidth=1.2,
    linestyle="--",
)

ax2.set_xlabel("Date")
ax2.set_ylabel("Temperature [°C]")
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
../_images/examples_01_Aquifer_Characterization_Temperature_12_0.png

4. Pore Volume Distribution Analysis#

We visualize the fitted gamma distribution representing spatial heterogeneity in pore volume. Each bin represents a different flow path through the aquifer.

[8]:
# Discretize gamma distribution into flow path bins
n_bins = 10  # Reduced for visualization clarity
alpha, beta = gamma_utils.mean_std_loc_to_alpha_beta(mean=mean, std=std)
gbins = gamma_utils.bins(alpha=alpha, beta=beta, n_bins=n_bins)

print(f"Gamma Distribution Parameters: alpha={alpha:.1f}, beta={beta:.1f}")
print(f"Discretized into {n_bins} equiprobable bins:")
print("-" * 80)
print(f"{'Bin':3s} {'Lower [m³]':10s} {'Upper [m³]':10s} {'E[V|bin]':10s} {'P(bin)':10s}")
print("-" * 80)

for i in range(n_bins):
    upper = f"{gbins['upper_bound'][i]:.1f}" if not np.isinf(gbins["upper_bound"][i]) else "∞"
    lower = f"{gbins['lower_bound'][i]:.1f}"
    expected = f"{gbins['expected_values'][i]:.1f}"
    prob = f"{gbins['probability_mass'][i]:.3f}"
    print(f"{i:3d} {lower:10s} {upper:10s} {expected:10s} {prob:10s}")
Gamma Distribution Parameters: alpha=426.9, beta=18.7
Discretized into 10 equiprobable bins:
--------------------------------------------------------------------------------
Bin Lower [m³] Upper [m³] E[V|bin]   P(bin)
--------------------------------------------------------------------------------
  0 0.0        7510.3     7337.0     0.100
  1 7510.3     7674.9     7598.7     0.100
  2 7674.9     7795.0     7737.0     0.100
  3 7795.0     7898.6     7847.6     0.100
  4 7898.6     7996.3     7947.6     0.100
  5 7996.3     8094.8     8045.2     0.100
  6 8094.8     8201.0     8146.9     0.100
  7 8201.0     8326.5     8261.4     0.100
  8 8326.5     8502.7     8407.6     0.100
  9 8502.7     ∞          8696.2     0.100
[9]:
# Plot the gamma distribution and bins
x = np.linspace(0, 1.1 * gbins["expected_values"][-1], 1000)
y = gamma_dist.pdf(x, alpha, scale=beta)

fig, ax = plt.subplots(figsize=(12, 8))
ax.set_title(
    f"Fitted Pore Volume Distribution\n(alpha={alpha:.1f}, beta={beta:.1f}, mean={mean:.0f} m³, std={std:.0f} m³)",
    fontsize=14,
    fontweight="bold",
)

ax.plot(x, y, label="Probability density function", color="navy", alpha=0.8, linewidth=2.5)
pdf_at_lower_bound = gamma_dist.pdf(gbins["lower_bound"], alpha, scale=beta)
ax.vlines(
    gbins["lower_bound"],
    0,
    pdf_at_lower_bound,
    color="red",
    alpha=0.6,
    linewidth=1.5,
    label="Bin boundaries",
)

ax.set_xlabel("Pore Volume [m³]")
ax.set_ylabel("Probability Density [m^-3]")
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
../_images/examples_01_Aquifer_Characterization_Temperature_15_0.png

Results & Discussion#

The inverse modeling successfully recovered the aquifer pore volume distribution parameters. The fitted gamma distribution captures the heterogeneity in flow paths through the aquifer.

Practical Applications#

  • Contaminant Transport: Use fitted parameters to predict pollutant breakthrough

  • Well Field Design: Optimize extraction rates based on residence time requirements

  • Vulnerability Assessment: Identify fast flow paths that may compromise water quality

Key Takeaways#

  • Temperature as Natural Tracer: Temperature provides valuable information about aquifer properties without artificial injection

  • Inverse Modeling: Optimization techniques can extract quantitative aquifer parameters from field observations

  • Gamma Distribution: Effective model for representing aquifer heterogeneity in engineering applications

  • Thermal Retardation: Must account for heat exchange when using temperature data

  • Spin-up Period: Use :func:~gwtransport.residence_time.fraction_explained to determine when the model output is fully informed by the input signal