PyHydroGeophysX.model_output package

Submodules

PyHydroGeophysX.model_output.base module

Base classes for model output processing.

class PyHydroGeophysX.model_output.base.HydroModelOutput(model_directory: str)[source]

Bases: ABC

Base class for all hydrological model outputs.

calculate_saturation(water_content: ndarray, porosity: float | ndarray) ndarray[source]

Calculate saturation from water content and porosity.

Parameters:
  • water_content – Water content array

  • porosity – Porosity value(s)

Returns:

Saturation array

abstract get_timestep_info() List[Tuple][source]

Get information about each timestep.

Returns:

List of timestep information tuples

abstract load_time_range(start_idx: int = 0, end_idx: int | None = None, **kwargs) ndarray[source]

Load data for a range of timesteps.

Parameters:
  • start_idx – Starting timestep index

  • end_idx – Ending timestep index (exclusive)

  • **kwargs – Additional parameters specific to the model type

Returns:

Data array for the specified timestep range

abstract load_timestep(timestep_idx: int, **kwargs) ndarray[source]

Load data for a specific timestep.

Parameters:
  • timestep_idx – Index of the timestep to load

  • **kwargs – Additional parameters specific to the model type

Returns:

Data array for the specified timestep
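
The three abstract methods above define the interface that every model-specific reader fills in. The following is a minimal sketch of that pattern, not the package's own code: OutputInterfaceSketch is a hypothetical stand-in that mirrors the documented method signatures so the example runs without the package installed, and InMemoryOutput is an illustrative subclass that serves data from an array already held in memory.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple

import numpy as np


class OutputInterfaceSketch(ABC):
    """Hypothetical stand-in mirroring the abstract methods documented above."""

    @abstractmethod
    def get_timestep_info(self) -> List[Tuple]:
        ...

    @abstractmethod
    def load_timestep(self, timestep_idx: int, **kwargs) -> np.ndarray:
        ...

    @abstractmethod
    def load_time_range(self, start_idx: int = 0,
                        end_idx: Optional[int] = None, **kwargs) -> np.ndarray:
        ...


class InMemoryOutput(OutputInterfaceSketch):
    """Illustrative subclass: data is a (nt, nlay, nrow, ncol) array."""

    def __init__(self, data, times):
        self.data = np.asarray(data, dtype=float)
        self.times = list(times)

    def get_timestep_info(self):
        # One (index, time) tuple per timestep.
        return [(i, t) for i, t in enumerate(self.times)]

    def load_timestep(self, timestep_idx, **kwargs):
        return self.data[timestep_idx]

    def load_time_range(self, start_idx=0, end_idx=None, **kwargs):
        # end_idx is exclusive, matching the documented convention.
        return self.data[start_idx:end_idx]


out = InMemoryOutput(np.zeros((4, 2, 3, 3)), times=[0.0, 1.0, 2.0, 3.0])
subset = out.load_time_range(1, 3)   # timesteps 1 and 2
```

Because end_idx is exclusive, load_time_range(1, 3) returns two timesteps, the same convention as Python slicing.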

PyHydroGeophysX.model_output.modflow_output module

Module for processing MODFLOW model outputs.

class PyHydroGeophysX.model_output.modflow_output.MODFLOWPorosity(model_directory: str, model_name: str)[source]

Bases: HydroModelOutput

Class for processing porosity data from MODFLOW simulations.

get_timestep_info() List[Tuple][source]

Get information about each timestep in the model. Returns a minimal placeholder since porosity doesn’t vary with time.

Returns:

List with single dummy timestep info

load_porosity() ndarray[source]

Load porosity data from MODFLOW model (supports both MODFLOW 6 and earlier versions).

Returns:

3D array of porosity values (nlay, nrow, ncol)

load_time_range(start_idx: int = 0, end_idx: int | None = None, **kwargs) ndarray[source]

Load porosity for a range of timesteps. Since porosity is typically constant, this returns a stack of identical arrays.

Parameters:
  • start_idx – Starting timestep index (unused)

  • end_idx – Ending timestep index (unused)

Returns:

4D array of porosity values (nt, nlay, nrow, ncol) where all timesteps are identical

load_timestep(timestep_idx: int, **kwargs) ndarray[source]

Load porosity for a specific timestep. Note: For MODFLOW, porosity is typically constant over time, so this returns the same array regardless of timestep.

Parameters:

timestep_idx – Index of the timestep (unused)

Returns:

3D array of porosity values

class PyHydroGeophysX.model_output.modflow_output.MODFLOWWaterContent(model_directory: str, idomain: ndarray)[source]

Bases: HydroModelOutput

Class for processing water content data from MODFLOW simulations.

get_timestep_info() List[Tuple[int, int, float, float]][source]

Get information about each timestep in the WaterContent file.

Returns:

List of tuples (kstp, kper, pertim, totim) for each timestep

load_time_range(start_idx: int = 0, end_idx: int | None = None, nlay: int = 3) ndarray[source]

Load water content for a range of timesteps.

Parameters:
  • start_idx – Starting timestep index (default: 0)

  • end_idx – Ending timestep index (exclusive, default: None loads all)

  • nlay – Number of layers in the model (default: 3)

Returns:

Water content array with shape (timesteps, nlay, nrows, ncols)

load_timestep(timestep_idx: int, nlay: int = 3) ndarray[source]

Load water content for a specific timestep.

Parameters:
  • timestep_idx – Index of the timestep to load

  • nlay – Number of layers in the model

Returns:

Water content array with shape (nlay, nrows, ncols)

PyHydroGeophysX.model_output.modflow_output.binaryread(file, vartype, shape=(1,), charlen=16)[source]

Uses NumPy to read from a binary file. This was found to be faster than the struct approach and is used as the default.

Parameters:
  • file – Open file object in binary read mode

  • vartype – Variable type to read

  • shape – Shape of the data to read (default: (1,))

  • charlen – Length of character strings (default: 16)

Returns:

The read data

PyHydroGeophysX.model_output.parflow_output module

Module for processing ParFlow model outputs.

This module provides classes to handle specific types of ParFlow outputs, such as saturation and porosity, by reading ParFlow Binary Files (PFB). It relies on the parflow Python package for PFB reading capabilities.

class PyHydroGeophysX.model_output.parflow_output.ParflowOutput(model_directory: str, run_name: str)[source]

Bases: HydroModelOutput

Base class for processing ParFlow model outputs.

This class handles common ParFlow output functionalities, such as identifying available timesteps and interfacing with the parflow Python package for reading PFB files. Specific data types (like saturation, porosity) should be handled by subclasses.

get_pfb_dimensions(pfb_file_path: str) Tuple[int, int, int][source]

Reads a PFB file and returns its data dimensions (nz, ny, nx).

Parameters:

pfb_file_path (str) – The full path to the PFB file.

Returns:

The dimensions of the data in the PFB file, typically in (nz, ny, nx) order for ParFlow.

Return type:

Tuple[int, int, int]

Raises:
  • FileNotFoundError – If the pfb_file_path does not exist.

  • Exception – If self.read_pfb (from parflow.tools.io) fails to read the file.

class PyHydroGeophysX.model_output.parflow_output.ParflowPorosity(model_directory: str, run_name: str)[source]

Bases: ParflowOutput

Processes porosity data from ParFlow simulations. Porosity in ParFlow is typically static (time-invariant) and stored in a single PFB file (e.g., <run_name>.out.porosity.pfb or similar).

get_timestep_info() List[Tuple[int, float]][source]

Returns timestep information, typically based on other ParFlow outputs (like saturation) as porosity itself is static.

Returns:

A list of (timestep_number, time_value) tuples, derived from self.available_timesteps.

Return type:

List[Tuple[int, float]]

load_mask() ndarray[source]

Load the domain mask data from a ParFlow model. The mask file (.out.mask.pfb) indicates active (1) and inactive (0) cells.

Returns:

A 3D NumPy array representing the domain mask (nz, ny, nx).

Values are typically 0 or 1.

Return type:

np.ndarray

Raises:
  • FileNotFoundError – If no standard mask PFB file can be found.

  • ValueError – If there’s an error reading or processing the PFB file.
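
One common use of such a mask is to blank out inactive cells before plotting or averaging. The sketch below assumes only what is stated above (a 3D array of 0/1 values in (nz, ny, nx) order); apply_domain_mask is a hypothetical helper, not part of the package.

```python
import numpy as np


def apply_domain_mask(values, mask):
    """Return a float copy of values with inactive cells (mask == 0) set to NaN."""
    values = np.asarray(values, dtype=float)
    return np.where(np.asarray(mask) > 0, values, np.nan)


mask = np.array([[[1, 0], [1, 1]]])                  # (nz, ny, nx) = (1, 2, 2)
saturation = np.array([[[0.2, 0.9], [0.4, 0.6]]])    # same shape as the mask
masked = apply_domain_mask(saturation, mask)
```

Broadcasting means the same helper also accepts a 4D (nt, nz, ny, nx) values array with a 3D mask.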

load_porosity() ndarray[source]

Load the static porosity data from the ParFlow model.

It searches for common ParFlow porosity filename patterns within the model directory.

Returns:

A 3D NumPy array of porosity values (nz, ny, nx).

Return type:

np.ndarray

Raises:
  • FileNotFoundError – If no standard porosity PFB file can be found.

  • ValueError – If there’s an error reading or processing the PFB file.

load_time_range(start_idx: int = 0, end_idx: int | None = None, **kwargs: Any) ndarray[source]

Load porosity data for a conceptual range of timesteps. Since porosity is time-invariant, this method returns a 4D array where the static 3D porosity data is repeated along the time axis.

The number of repetitions along the time axis (nt) is determined by the length of self.available_timesteps (discovered from saturation/pressure files) if end_idx is None, or by min(end_idx - start_idx, len(available_timesteps)). A minimum of 1 repetition is ensured if any timesteps are notionally available.

Parameters:
  • start_idx (int, optional) – Starting timestep index (used to determine nt). Defaults to 0.

  • end_idx (Optional[int], optional) – Ending timestep index (exclusive, used for nt). Defaults to None (use all available timesteps).

  • **kwargs (Any) – Additional keyword arguments (not used).

Returns:

A 4D NumPy array of porosity values (nt, nz, ny, nx).

All slices along the time dimension are identical.

Return type:

np.ndarray
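
The repetition rule described above can be sketched as follows. This is an illustration of the stated logic under simple assumptions, not the package's implementation; repeat_static_field and n_available are hypothetical names, with n_available standing in for len(self.available_timesteps).

```python
import numpy as np


def repeat_static_field(field_3d, n_available, start_idx=0, end_idx=None):
    """Repeat a static (nz, ny, nx) field along a new leading time axis."""
    if end_idx is None:
        nt = n_available
    else:
        nt = min(end_idx - start_idx, n_available)
    if n_available > 0:
        nt = max(nt, 1)      # at least one copy when timesteps exist
    nt = max(nt, 0)          # never a negative repeat count
    return np.repeat(np.asarray(field_3d)[np.newaxis, ...], nt, axis=0)


porosity = np.full((2, 3, 4), 0.35)                        # (nz, ny, nx)
stack_all = repeat_static_field(porosity, n_available=10)
stack_part = repeat_static_field(porosity, 10, start_idx=2, end_idx=5)
```

If memory is a concern, np.broadcast_to(field_3d, (nt,) + field_3d.shape) gives a read-only view with the same shape instead of a full copy.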

load_timestep(timestep_idx: int, **kwargs: Any) ndarray[source]

Load porosity data. For ParFlow, porosity is typically time-invariant. This method returns the static porosity array, ignoring timestep_idx.

Parameters:
  • timestep_idx (int) – Index of the timestep (ignored, as porosity is static).

  • **kwargs (Any) – Additional keyword arguments (not used).

Returns:

A 3D NumPy array of porosity values (nz, ny, nx).

Return type:

np.ndarray

class PyHydroGeophysX.model_output.parflow_output.ParflowSaturation(model_directory: str, run_name: str)[source]

Bases: ParflowOutput

Processes saturation data from ParFlow simulations (.out.satur.*.pfb files).

get_timestep_info() List[Tuple[int, float]][source]

Provides information about available ParFlow timesteps.

For ParFlow, the timestep number from the filename often directly corresponds to the simulation time (e.g., if output is every 1 hour, timestep 24 is 24 hours). This method returns a list of tuples: (timestep_number, simulation_time). Currently, simulation_time is simply cast from timestep_number. For more complex timing setups, an accurate mapping would require parsing ParFlow timing files.

Returns:

A list where each tuple is (timestep_number, time_value). time_value is the float representation of timestep_number.

Return type:

List[Tuple[int, float]]
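
As an illustration of the mapping described above, the sketch below extracts timestep numbers from saturation filenames and casts each to a float. It assumes filenames of the form <run_name>.out.satur.<digits>.pfb; the helper name and the digits-only pattern are assumptions, and the package may discover timesteps differently.

```python
import re


def timesteps_from_filenames(filenames, run_name):
    """Return sorted (timestep_number, time_value) tuples parsed from filenames."""
    pattern = re.compile(re.escape(run_name) + r"\.out\.satur\.(\d+)\.pfb")
    steps = []
    for name in filenames:
        match = pattern.fullmatch(name)
        if match:
            steps.append(int(match.group(1)))
    # time_value is just the timestep number as a float, as described above.
    return [(step, float(step)) for step in sorted(steps)]


files = [
    "demo.out.satur.00002.pfb",
    "demo.out.satur.00000.pfb",
    "demo.out.press.00001.pfb",       # pressure file, ignored
    "demo.out.satur.00001.pfb.dist",  # not a plain PFB file, ignored
]
info = timesteps_from_filenames(files, "demo")
```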

load_time_range(start_idx: int = 0, end_idx: int | None = None, **kwargs: Any) ndarray[source]

Load saturation data for a specified range of zero-based timestep indices.

Parameters:
  • start_idx (int, optional) – Starting zero-based timestep index. Defaults to 0.

  • end_idx (Optional[int], optional) – Ending zero-based timestep index (exclusive). If None, loads up to the last available timestep. Defaults to None.

  • **kwargs (Any) – Additional keyword arguments (not used).

Returns:

A 4D NumPy array of saturation values (num_timesteps, nz, ny, nx).

Returns an empty 4D array if the range is invalid or no data is found.

Return type:

np.ndarray

Raises:

ValueError – If no timesteps are available, or if the specified range is invalid (e.g., start_idx out of bounds, end_idx <= start_idx leading to empty range).

load_timestep(timestep_idx: int, **kwargs: Any) ndarray[source]

Load saturation data for a specific, zero-based timestep index.

Parameters:
  • timestep_idx (int) – The zero-based index of the timestep to load from the list of available timesteps discovered during initialization.

  • **kwargs (Any) – Additional keyword arguments (not used by this method).

Returns:

A 3D NumPy array of saturation values (nz, ny, nx).

Return type:

np.ndarray

Raises:

ValueError – If no timesteps are available or if timestep_idx is out of range.

PyHydroGeophysX.model_output.water_content module

Module for handling MODFLOW Unsaturated-Zone Flow (UZF) package water content data.

This module provides a class MODFLOWWaterContent (apparently a duplicate of, or very similar to, the class of the same name in modflow_output.py) for reading binary ‘WaterContent’ files produced by MODFLOW’s UZF package. It also includes a utility for calculating saturation.

class PyHydroGeophysX.model_output.water_content.MODFLOWWaterContent(sim_ws: str, idomain: ndarray)[source]

Bases: object

Processes water content data from MODFLOW’s UZF (Unsaturated-Zone Flow) package.

This class reads the binary ‘WaterContent’ file output by MODFLOW when the UZF package is active and output is requested. It maps the 1D array of UZF cell water contents back to a 2D or 3D grid based on the provided idomain. It also includes a method to calculate saturation from water content and porosity.

sim_ws

Path to the simulation workspace.

Type:

str

idomain

The 2D idomain array used for mapping UZF cells.

Type:

np.ndarray

nrows

Number of rows in the model grid.

Type:

int

ncols

Number of columns in the model grid.

Type:

int

iuzno_dict_rev

Reverse lookup dictionary mapping sequential UZF cell number to (row, col) index.

Type:

Dict[int, Tuple[int,int]]

nuzfcells_2d

Number of active UZF cells in the 2D plane (derived from idomain).

Type:

int

calculate_saturation(water_content: ndarray, porosity: float | ndarray) ndarray[source]

Calculate volumetric saturation from water content and porosity.

Saturation (S) is computed as S = water_content / porosity. The result is clipped to the range [0.0, 1.0].

Parameters:
  • water_content (np.ndarray) – NumPy array of water content values. Can be for a single timestep (e.g., [nlay, nrow, ncol]) or multiple timesteps (e.g., [time, nlay, nrow, ncol]).

  • porosity (Union[float, np.ndarray]) – Porosity of the medium. Can be a scalar (uniform porosity) or a NumPy array. If an array, its dimensions must be compatible with water_content (e.g., matching spatial dimensions for broadcasting across time if needed).

Returns:

NumPy array of calculated saturation values, same shape as water_content, with values clipped between 0 and 1.

Return type:

np.ndarray

Raises:
  • ValueError – If porosity is an array and its dimensions are incompatible with water_content for element-wise division.

  • TypeError – If inputs are not of expected types (NumPy arrays, float).
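
The formula S = water_content / porosity with clipping to [0.0, 1.0] can be sketched in a few lines of NumPy. This is a minimal illustration of the documented behaviour, not the method's source; saturation_sketch is a hypothetical name, and the sketch performs no input validation.

```python
import numpy as np


def saturation_sketch(water_content, porosity):
    """Illustrative only: S = water_content / porosity, clipped to [0, 1]."""
    wc = np.asarray(water_content, dtype=float)
    phi = np.asarray(porosity, dtype=float)
    # Broadcasting lets a scalar or a (nlay, nrow, ncol) porosity divide a
    # (time, nlay, nrow, ncol) water content array.
    return np.clip(wc / phi, 0.0, 1.0)


# Two timesteps on a 1 x 1 x 2 grid.
wc = np.array([[[[0.10, 0.30]]], [[[0.20, 0.50]]]])   # shape (2, 1, 1, 2)
phi = np.array([[[0.40, 0.40]]])                       # shape (1, 1, 2)
sat = saturation_sketch(wc, phi)
```

In the second timestep the water content 0.50 exceeds the porosity 0.40, so the raw ratio 1.25 is clipped to 1.0.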

get_timestep_info() List[Tuple[int, int, float, float]][source]

Reads the ‘WaterContent’ file to extract header information for each timestep.

Returns:

A list of tuples, where each tuple contains (kstp, kper, pertim, totim) for a timestep:
  • kstp (int): Timestep number within the stress period.

  • kper (int): Stress period number.

  • pertim (float): Time within the current stress period.

  • totim (float): Total simulation time.

Return type:

List[Tuple[int, int, float, float]]
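
A typical use of these tuples is to find the zero-based index of the timestep closest to a target simulation time and then pass that index to load_timestep. The sketch below works on a hand-written list in the documented (kstp, kper, pertim, totim) layout; index_nearest_time is a hypothetical helper and the values are made up for illustration.

```python
def index_nearest_time(timestep_info, target_totim):
    """Return the index of the tuple whose totim (4th field) is closest to target."""
    return min(
        range(len(timestep_info)),
        key=lambda i: abs(timestep_info[i][3] - target_totim),
    )


# (kstp, kper, pertim, totim), illustrative values only
info = [(1, 1, 1.0, 1.0), (2, 1, 2.0, 2.0), (1, 2, 1.0, 3.0)]
idx = index_nearest_time(info, 2.8)
```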

load_time_range(start_idx: int = 0, end_idx: int | None = None, nlay_uzf: int = 3) ndarray[source]

Load water content data for a specified range of timesteps from the ‘WaterContent’ file.

Parameters:
  • start_idx (int, optional) – Zero-based starting timestep index. Defaults to 0.

  • end_idx (Optional[int], optional) – Zero-based ending timestep index (exclusive). If None, loads all timesteps from start_idx to the end of the file. Defaults to None.

  • nlay_uzf (int, optional) – The number of unsaturated zone layers in the UZF model. This dictates how many data values are read per active UZF cell at each timestep. Defaults to 3.

Returns:

A 4D NumPy array of water content values, with shape (num_timesteps_loaded, nlay_uzf, nrows, ncols). Returns an empty 4D array (shape (0, nlay_uzf, nrows, ncols)) if no timesteps are loaded or if an error occurs during initial file access.

Return type:

np.ndarray

load_timestep(timestep_idx: int, nlay_uzf: int = 3) ndarray[source]

Load water content data for a single, specific timestep.

Parameters:
  • timestep_idx (int) – The zero-based index of the timestep to load from the ‘WaterContent’ file.

  • nlay_uzf (int, optional) – The number of unsaturated zone layers simulated in UZF, which determines how many values are stored per (row, col) UZF cell. Defaults to 3. This must match the UZF package configuration.

Returns:

A 3D NumPy array of water content values with shape (nlay_uzf, nrows, ncols).

Values for inactive grid cells (where idomain <= 0) will be NaN.

Return type:

np.ndarray

Raises:
  • IndexError – If timestep_idx results in no data being loaded (e.g., out of bounds).

  • RuntimeError – If data loading for the specific timestep fails unexpectedly.
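
The mapping from a flat record of UZF cell values back to a grid, with NaN in inactive cells, can be sketched as below. The sketch rests on two assumptions that are not confirmed by this page: active cells are numbered in row-major order over the 2D idomain, and the flat record stores all cells of the first UZF layer, then all cells of the second, and so on. map_uzf_to_grid is a hypothetical helper.

```python
import numpy as np


def map_uzf_to_grid(values_1d, idomain_2d, nlay_uzf):
    """Scatter flat UZF values onto a (nlay_uzf, nrows, ncols) grid of NaNs."""
    idomain_2d = np.asarray(idomain_2d)
    nrows, ncols = idomain_2d.shape
    # Reverse lookup: sequential cell number -> (row, col), assumed row-major.
    active = [(r, c) for r in range(nrows) for c in range(ncols)
              if idomain_2d[r, c] > 0]
    cell_to_rc = dict(enumerate(active))
    # Assumed layout: one block of len(active) values per UZF layer.
    per_layer = np.asarray(values_1d, dtype=float).reshape(nlay_uzf, len(active))
    grid = np.full((nlay_uzf, nrows, ncols), np.nan)
    for k in range(nlay_uzf):
        for n, (r, c) in cell_to_rc.items():
            grid[k, r, c] = per_layer[k, n]
    return grid


idomain = np.array([[1, 0], [1, 1]])     # one inactive cell at (0, 1)
flat = np.arange(6) / 10.0               # 2 layers x 3 active cells
grid = map_uzf_to_grid(flat, idomain, nlay_uzf=2)
```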

PyHydroGeophysX.model_output.water_content.binaryread(file_obj: Any, vartype: type | List[Tuple[str, str]], shape: Tuple[int, ...] = (1,), charlen: int = 16) bytes | ndarray | void[source]

Reads data from an open binary file using numpy.fromfile or file.read.

Designed for MODFLOW binary output files, handling various data types.

Parameters:
  • file_obj – Open file object in binary read mode.

  • vartype – Variable type to read (e.g., np.float64, str, or structured dtype list).

  • shape (Tuple[int, ...], optional) – Desired output shape for standard numpy dtypes. Defaults to (1,).

  • charlen (int, optional) – Length for string types if vartype is str. Defaults to 16.

Returns:

Data read from file. bytes for str type, np.ndarray for standard dtypes, np.void for structured.

Return type:

Union[bytes, np.ndarray, np.void]

Raises:

EOFError – If EOF is reached unexpectedly while reading data for standard dtypes.
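
The reading pattern described above (raw bytes for text fields, numpy.fromfile for numeric data, EOFError on a short read) can be sketched as follows. read_values is a hypothetical, simplified stand-in that does not handle structured dtypes, and the demo file layout is invented for illustration; it is not the MODFLOW binary format.

```python
import os
import struct
import tempfile

import numpy as np


def read_values(file_obj, vartype, shape=(1,), charlen=16):
    """Read charlen bytes for str, otherwise prod(shape) values of vartype."""
    if vartype is str:
        return file_obj.read(charlen)
    count = int(np.prod(shape))
    data = np.fromfile(file_obj, dtype=vartype, count=count)
    if data.size < count:
        raise EOFError("reached end of file before reading all requested values")
    return data.reshape(shape)


# Demo file: a 16-byte text label followed by six little-endian float64 values.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(b"WATER-CONTENT".ljust(16))
    f.write(struct.pack("<6d", 0.1, 0.2, 0.3, 0.4, 0.5, 0.6))

with open(path, "rb") as f:
    label = read_values(f, str)                      # bytes, length 16
    values = read_values(f, "<f8", shape=(2, 3))     # ndarray, shape (2, 3)
```

The explicit "<f8" dtype keeps the byte order of the read consistent with the little-endian struct.pack call that wrote the demo data.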

Module contents

Module for processing model outputs from various hydrological models.

class PyHydroGeophysX.model_output.HydroModelOutput(model_directory: str)[source]

Bases: ABC

Base class for all hydrological model outputs.

calculate_saturation(water_content: ndarray, porosity: float | ndarray) ndarray[source]

Calculate saturation from water content and porosity.

Parameters:
  • water_content – Water content array

  • porosity – Porosity value(s)

Returns:

Saturation array

abstract get_timestep_info() List[Tuple][source]

Get information about each timestep.

Returns:

List of timestep information tuples

abstract load_time_range(start_idx: int = 0, end_idx: int | None = None, **kwargs) ndarray[source]

Load data for a range of timesteps.

Parameters:
  • start_idx – Starting timestep index

  • end_idx – Ending timestep index (exclusive)

  • **kwargs – Additional parameters specific to the model type

Returns:

Data array for the specified timestep range

abstract load_timestep(timestep_idx: int, **kwargs) ndarray[source]

Load data for a specific timestep.

Parameters:
  • timestep_idx – Index of the timestep to load

  • **kwargs – Additional parameters specific to the model type

Returns:

Data array for the specified timestep

class PyHydroGeophysX.model_output.MODFLOWPorosity(model_directory: str, model_name: str)[source]

Bases: HydroModelOutput

Class for processing porosity data from MODFLOW simulations.

get_timestep_info() List[Tuple][source]

Get information about each timestep in the model. Returns a minimal placeholder since porosity doesn’t vary with time.

Returns:

List with single dummy timestep info

load_porosity() ndarray[source]

Load porosity data from MODFLOW model (supports both MODFLOW 6 and earlier versions).

Returns:

3D array of porosity values (nlay, nrow, ncol)

load_time_range(start_idx: int = 0, end_idx: int | None = None, **kwargs) ndarray[source]

Load porosity for a range of timesteps. Since porosity is typically constant, this returns a stack of identical arrays.

Parameters:
  • start_idx – Starting timestep index (unused)

  • end_idx – Ending timestep index (unused)

Returns:

4D array of porosity values (nt, nlay, nrow, ncol) where all timesteps are identical

load_timestep(timestep_idx: int, **kwargs) ndarray[source]

Load porosity for a specific timestep. Note: For MODFLOW, porosity is typically constant over time, so this returns the same array regardless of timestep.

Parameters:

timestep_idx – Index of the timestep (unused)

Returns:

3D array of porosity values

class PyHydroGeophysX.model_output.MODFLOWWaterContent(model_directory: str, idomain: ndarray)[source]

Bases: HydroModelOutput

Class for processing water content data from MODFLOW simulations.

get_timestep_info() List[Tuple[int, int, float, float]][source]

Get information about each timestep in the WaterContent file.

Returns:

List of tuples (kstp, kper, pertim, totim) for each timestep

load_time_range(start_idx: int = 0, end_idx: int | None = None, nlay: int = 3) ndarray[source]

Load water content for a range of timesteps.

Parameters:
  • start_idx – Starting timestep index (default: 0)

  • end_idx – Ending timestep index (exclusive, default: None loads all)

  • nlay – Number of layers in the model (default: 3)

Returns:

Water content array with shape (timesteps, nlay, nrows, ncols)

load_timestep(timestep_idx: int, nlay: int = 3) ndarray[source]

Load water content for a specific timestep.

Parameters:
  • timestep_idx – Index of the timestep to load

  • nlay – Number of layers in the model

Returns:

Water content array with shape (nlay, nrows, ncols)

class PyHydroGeophysX.model_output.ParflowPorosity(model_directory: str, run_name: str)[source]

Bases: ParflowOutput

Processes porosity data from ParFlow simulations. Porosity in ParFlow is typically static (time-invariant) and stored in a single PFB file (e.g., <run_name>.out.porosity.pfb or similar).

get_timestep_info() List[Tuple[int, float]][source]

Returns timestep information, typically based on other ParFlow outputs (like saturation) as porosity itself is static.

Returns:

A list of (timestep_number, time_value) tuples, derived from self.available_timesteps.

Return type:

List[Tuple[int, float]]

load_mask() ndarray[source]

Load the domain mask data from a ParFlow model. The mask file (.out.mask.pfb) indicates active (1) and inactive (0) cells.

Returns:

A 3D NumPy array representing the domain mask (nz, ny, nx).

Values are typically 0 or 1.

Return type:

np.ndarray

Raises:
  • FileNotFoundError – If no standard mask PFB file can be found.

  • ValueError – If there’s an error reading or processing the PFB file.

load_porosity() ndarray[source]

Load the static porosity data from the ParFlow model.

It searches for common ParFlow porosity filename patterns within the model directory.

Returns:

A 3D NumPy array of porosity values (nz, ny, nx).

Return type:

np.ndarray

Raises:
  • FileNotFoundError – If no standard porosity PFB file can be found.

  • ValueError – If there’s an error reading or processing the PFB file.

load_time_range(start_idx: int = 0, end_idx: int | None = None, **kwargs: Any) ndarray[source]

Load porosity data for a conceptual range of timesteps. Since porosity is time-invariant, this method returns a 4D array where the static 3D porosity data is repeated along the time axis.

The number of repetitions along the time axis (nt) is determined by the length of self.available_timesteps (discovered from saturation/pressure files) if end_idx is None, or by min(end_idx - start_idx, len(available_timesteps)). A minimum of 1 repetition is ensured if any timesteps are notionally available.

Parameters:
  • start_idx (int, optional) – Starting timestep index (used to determine nt). Defaults to 0.

  • end_idx (Optional[int], optional) – Ending timestep index (exclusive, used for nt). Defaults to None (use all available timesteps).

  • **kwargs (Any) – Additional keyword arguments (not used).

Returns:

A 4D NumPy array of porosity values (nt, nz, ny, nx).

All slices along the time dimension are identical.

Return type:

np.ndarray

load_timestep(timestep_idx: int, **kwargs: Any) ndarray[source]

Load porosity data. For ParFlow, porosity is typically time-invariant. This method returns the static porosity array, ignoring timestep_idx.

Parameters:
  • timestep_idx (int) – Index of the timestep (ignored, as porosity is static).

  • **kwargs (Any) – Additional keyword arguments (not used).

Returns:

A 3D NumPy array of porosity values (nz, ny, nx).

Return type:

np.ndarray

class PyHydroGeophysX.model_output.ParflowSaturation(model_directory: str, run_name: str)[source]

Bases: ParflowOutput

Processes saturation data from ParFlow simulations (.out.satur.*.pfb files).

get_timestep_info() List[Tuple[int, float]][source]

Provides information about available ParFlow timesteps.

For ParFlow, the timestep number from the filename often directly corresponds to the simulation time (e.g., if output is every 1 hour, timestep 24 is 24 hours). This method returns a list of tuples: (timestep_number, simulation_time). Currently, simulation_time is simply cast from timestep_number. For more complex timing setups, an accurate mapping would require parsing ParFlow timing files.

Returns:

A list where each tuple is (timestep_number, time_value). time_value is the float representation of timestep_number.

Return type:

List[Tuple[int, float]]

load_time_range(start_idx: int = 0, end_idx: int | None = None, **kwargs: Any) ndarray[source]

Load saturation data for a specified range of zero-based timestep indices.

Parameters:
  • start_idx (int, optional) – Starting zero-based timestep index. Defaults to 0.

  • end_idx (Optional[int], optional) – Ending zero-based timestep index (exclusive). If None, loads up to the last available timestep. Defaults to None.

  • **kwargs (Any) – Additional keyword arguments (not used).

Returns:

A 4D NumPy array of saturation values (num_timesteps, nz, ny, nx).

Returns an empty 4D array if the range is invalid or no data is found.

Return type:

np.ndarray

Raises:

ValueError – If no timesteps are available, or if the specified range is invalid (e.g., start_idx out of bounds, end_idx <= start_idx leading to empty range).

load_timestep(timestep_idx: int, **kwargs: Any) ndarray[source]

Load saturation data for a specific, zero-based timestep index.

Parameters:
  • timestep_idx (int) – The zero-based index of the timestep to load from the list of available timesteps discovered during initialization.

  • **kwargs (Any) – Additional keyword arguments (not used by this method).

Returns:

A 3D NumPy array of saturation values (nz, ny, nx).

Return type:

np.ndarray

Raises:

ValueError – If no timesteps are available or if timestep_idx is out of range.

PyHydroGeophysX.model_output.binaryread(file, vartype, shape=(1,), charlen=16)[source]

Uses NumPy to read from a binary file. This was found to be faster than the struct approach and is used as the default.

Parameters:
  • file – Open file object in binary read mode

  • vartype – Variable type to read

  • shape – Shape of the data to read (default: (1,))

  • charlen – Length of character strings (default: 16)

Returns:

The read data