PyHydroGeophysX.data_access package

Submodules

PyHydroGeophysX.data_access.accessors module

Accessor abstraction for hydro-model data.

Two concrete implementations and one abstract base:

  • LocalHydroAccessor – reads directly from a local directory.

  • HttpHydroAccessor – downloads files on demand from an HTTP base URL.

  • BaseHydroAccessor – abstract base defining the interface.

Usage

>>> acc = LocalHydroAccessor("/path/to/data")
>>> ok, summary, errors = acc.validate()
>>> local_dir = acc.materialize(["Watercontent.npy", "top.txt"], "/tmp/work")

class PyHydroGeophysX.data_access.accessors.BaseHydroAccessor

Bases: ABC

Abstract base for hydro-data access.

abstract list_available_items() → Dict[str, Any]

Return available timesteps / variables / files.

Return type:

dict with keys like files, timesteps, variables.

abstract materialize(required_files: List[str], target_dir: str) → str

Ensure required_files are available in target_dir.

For local accessors this may be a no-op (return source dir). For HTTP accessors this downloads missing files.

Return type:

str – local directory path containing the files.

abstract validate() → Tuple[bool, Dict[str, Any], List[str]]

Check whether the data source is valid.

Returns:

  • ok (bool)

  • summary (dict) – Keys: snapshot_count, water_shape, porosity_shape, bot_shape, grid_info, path.

  • errors (list[str])
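
The interface above can be illustrated with a small stand-in. This is a sketch, not the library's code: SketchAccessor only mirrors the three documented abstract methods, and InMemoryAccessor is a hypothetical subclass invented here to show what a conforming implementation has to provide. The summary dict uses only a subset of the documented keys.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class SketchAccessor(ABC):
    """Stand-in that mirrors the documented abstract interface (assumption)."""

    @abstractmethod
    def list_available_items(self) -> Dict[str, Any]: ...

    @abstractmethod
    def materialize(self, required_files: List[str], target_dir: str) -> str: ...

    @abstractmethod
    def validate(self) -> Tuple[bool, Dict[str, Any], List[str]]: ...


class InMemoryAccessor(SketchAccessor):
    """Hypothetical accessor backed by a dict of file name -> bytes."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = dict(files)

    def list_available_items(self) -> Dict[str, Any]:
        return {"files": sorted(self._files), "timesteps": [], "variables": []}

    def materialize(self, required_files: List[str], target_dir: str) -> str:
        missing = [name for name in required_files if name not in self._files]
        if missing:
            raise FileNotFoundError(f"not available: {missing}")
        os.makedirs(target_dir, exist_ok=True)
        for name in required_files:
            with open(os.path.join(target_dir, name), "wb") as fh:
                fh.write(self._files[name])
        return target_dir

    def validate(self) -> Tuple[bool, Dict[str, Any], List[str]]:
        errors = [] if self._files else ["no files available"]
        summary = {"snapshot_count": 0, "path": "<memory>"}
        return (not errors, summary, errors)


if __name__ == "__main__":
    acc = InMemoryAccessor({"top.txt": b"1 2 3\n"})
    ok, summary, errors = acc.validate()
    print(ok, errors, acc.materialize(["top.txt"], tempfile.mkdtemp()))
```

Because all three methods are abstract, instantiating the base directly raises TypeError; only subclasses that implement every method can be created.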

class PyHydroGeophysX.data_access.accessors.HttpHydroAccessor(manifest_entry: Dict[str, Any], cache_dir: str | None = None)

Bases: BaseHydroAccessor

Download hydro data on demand from an HTTP base URL.

Parameters:
  • manifest_entry (dict) – A single dataset entry from manifest.json.

  • cache_dir (str, optional) – Directory for caching downloaded files. Defaults to a temp directory.

clear_cache() → int

Remove all cached files. Returns the number of files removed.
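
A cache-clearing routine with this contract could look roughly like the following. It is a sketch under assumptions: clear_cache_dir is a hypothetical helper name, and the real method may lay out or track its cache differently.

```python
import os
import tempfile


def clear_cache_dir(cache_dir: str) -> int:
    """Delete every regular file under cache_dir and return how many were removed."""
    if not os.path.isdir(cache_dir):
        return 0
    removed = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            os.remove(os.path.join(root, name))
            removed += 1
    return removed


if __name__ == "__main__":
    demo = tempfile.mkdtemp()
    for name in ("a.npy", "b.npy"):
        with open(os.path.join(demo, name), "wb") as fh:
            fh.write(b"\x00")
    print(clear_cache_dir(demo))
```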

list_available_items() → Dict[str, Any]

Return available timesteps / variables / files.

Return type:

dict with keys like files, timesteps, variables.

materialize(required_files: List[str], target_dir: str) → str

Download required files to target_dir, using the cache.
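
The download-through-a-cache pattern described here can be sketched as below. The function name, the base_url argument, and the injectable fetch callable are all assumptions made for illustration, and file names are assumed to be flat (no subdirectories); the library's actual cache layout and URL construction are not documented on this page.

```python
import os
import shutil
import tempfile
from typing import Callable, List
from urllib.request import urlopen


def fetch_with_urllib(url: str) -> bytes:
    """Default fetcher: read the whole response body over HTTP."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def materialize_with_cache(
    base_url: str,
    required_files: List[str],
    target_dir: str,
    cache_dir: str,
    fetch: Callable[[str], bytes] = fetch_with_urllib,
) -> str:
    """Copy required_files into target_dir, downloading only cache misses."""
    os.makedirs(cache_dir, exist_ok=True)
    os.makedirs(target_dir, exist_ok=True)
    for name in required_files:
        cached = os.path.join(cache_dir, name)
        if not os.path.exists(cached):
            # Cache miss: download once and store the bytes in the cache.
            data = fetch(base_url.rstrip("/") + "/" + name)
            with open(cached, "wb") as fh:
                fh.write(data)
        # Cache hit (or freshly filled): copy into the working directory.
        shutil.copy2(cached, os.path.join(target_dir, name))
    return target_dir
```

Passing the fetcher as a parameter keeps the sketch testable without network access; a second call for the same file is served from the cache and triggers no download.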

validate() → Tuple[bool, Dict[str, Any], List[str]]

Check whether the data source is valid.

Returns:

  • ok (bool)

  • summary (dict) – Keys: snapshot_count, water_shape, porosity_shape, bot_shape, grid_info, path.

  • errors (list[str])

class PyHydroGeophysX.data_access.accessors.LocalHydroAccessor(root_path: str)

Bases: BaseHydroAccessor

Read hydro data from a local filesystem directory.

list_available_items() → Dict[str, Any]

Return available timesteps / variables / files.

Return type:

dict with keys like files, timesteps, variables.

materialize(required_files: List[str], target_dir: str) → str

Ensure required_files are available in target_dir.

For local accessors this may be a no-op (return source dir). For HTTP accessors this downloads missing files.

Return type:

str – local directory path containing the files.
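
For the local case, the "no-op" behaviour mentioned above might reduce to a presence check that returns the source directory. The sketch below assumes exactly that; materialize_local is a hypothetical name, and the real method may instead copy or link files into target_dir.

```python
import os
import tempfile
from typing import List


def materialize_local(root_path: str, required_files: List[str], target_dir: str) -> str:
    """Return root_path if every required file is already there.

    target_dir is accepted for interface compatibility but unused in this
    sketch, because nothing needs to be copied (an assumption).
    """
    missing = [
        name for name in required_files
        if not os.path.isfile(os.path.join(root_path, name))
    ]
    if missing:
        raise FileNotFoundError(f"missing in {root_path}: {missing}")
    return root_path


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    with open(os.path.join(root, "top.txt"), "w", encoding="utf-8") as fh:
        fh.write("0.0\n")
    print(materialize_local(root, ["top.txt"], "/tmp/work") == root)
```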

validate() → Tuple[bool, Dict[str, Any], List[str]]

Check whether the data source is valid.

Returns:

  • ok (bool)

  • summary (dict) – Keys: snapshot_count, water_shape, porosity_shape, bot_shape, grid_info, path.

  • errors (list[str])

PyHydroGeophysX.data_access.accessors.get_manifest_entry(dataset_id: str, manifest_path: str | None = None) → Dict[str, Any] | None

Return a single manifest entry by id, or None.
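
The lookup can be pictured as a linear search over the manifest's entries. This sketch assumes that the datasets key holds a list of dicts and that each entry carries its identifier under an "id" key; neither detail is confirmed on this page, and find_entry is a hypothetical name.

```python
from typing import Any, Dict, Optional


def find_entry(manifest: Dict[str, Any], dataset_id: str) -> Optional[Dict[str, Any]]:
    """Return the first entry whose "id" matches dataset_id, or None."""
    for entry in manifest.get("datasets", []):
        if entry.get("id") == dataset_id:
            return entry
    return None


if __name__ == "__main__":
    # Illustrative manifest; the ids and URL are made up for this example.
    manifest = {
        "datasets": [
            {"id": "demo-a", "base_url": "http://example.invalid/a"},
            {"id": "demo-b", "base_url": "http://example.invalid/b"},
        ]
    }
    print(find_entry(manifest, "demo-b"))
    print(find_entry(manifest, "unknown"))
```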

PyHydroGeophysX.data_access.accessors.load_manifest(manifest_path: str | None = None) → Dict[str, Any]

Load the dataset manifest JSON.

Parameters:

manifest_path (str, optional) – Path to manifest.json. Defaults to datasets/manifest.json relative to the repository root (two levels up from this file).

Returns:

Parsed manifest with a datasets key.

Return type:

dict
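
One way to read the default-path rule is shown below. The sketch assumes that "two levels up from this file" means two directories above the one containing the module, so a module at <repo>/PyHydroGeophysX/data_access/accessors.py resolves to <repo>/datasets/manifest.json. Both helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def default_manifest_path(module_file: str) -> Path:
    """Locate datasets/manifest.json relative to the assumed repository root."""
    # parents[0] is the module's directory; parents[2] is two levels above it.
    repo_root = Path(module_file).parents[2]
    return repo_root / "datasets" / "manifest.json"


def load_manifest_sketch(manifest_path: str) -> Dict[str, Any]:
    """Parse the manifest JSON and check that it has a "datasets" key."""
    with open(manifest_path, "r", encoding="utf-8") as fh:
        manifest = json.load(fh)
    if "datasets" not in manifest:
        raise KeyError("manifest has no 'datasets' key")
    return manifest


if __name__ == "__main__":
    print(default_manifest_path("/repo/PyHydroGeophysX/data_access/accessors.py"))
```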

Module contents

Data access abstraction for PyHydroGeophysX.

Provides accessor classes that unify local filesystem and HTTP-based data loading for the Hydro-to-Geophysics workflow.

class PyHydroGeophysX.data_access.BaseHydroAccessor

Bases: ABC

Abstract base for hydro-data access.

abstract list_available_items() → Dict[str, Any]

Return available timesteps / variables / files.

Return type:

dict with keys like files, timesteps, variables.

abstract materialize(required_files: List[str], target_dir: str) → str

Ensure required_files are available in target_dir.

For local accessors this may be a no-op (return source dir). For HTTP accessors this downloads missing files.

Return type:

str – local directory path containing the files.

abstract validate() → Tuple[bool, Dict[str, Any], List[str]]

Check whether the data source is valid.

Returns:

  • ok (bool)

  • summary (dict) – Keys: snapshot_count, water_shape, porosity_shape, bot_shape, grid_info, path.

  • errors (list[str])

class PyHydroGeophysX.data_access.HttpHydroAccessor(manifest_entry: Dict[str, Any], cache_dir: str | None = None)

Bases: BaseHydroAccessor

Download hydro data on demand from an HTTP base URL.

Parameters:
  • manifest_entry (dict) – A single dataset entry from manifest.json.

  • cache_dir (str, optional) – Directory for caching downloaded files. Defaults to a temp directory.

clear_cache() → int

Remove all cached files. Returns the number of files removed.

list_available_items() → Dict[str, Any]

Return available timesteps / variables / files.

Return type:

dict with keys like files, timesteps, variables.

materialize(required_files: List[str], target_dir: str) → str

Download required files to target_dir, using the cache.

validate() → Tuple[bool, Dict[str, Any], List[str]]

Check whether the data source is valid.

Returns:

  • ok (bool)

  • summary (dict) – Keys: snapshot_count, water_shape, porosity_shape, bot_shape, grid_info, path.

  • errors (list[str])

class PyHydroGeophysX.data_access.LocalHydroAccessor(root_path: str)

Bases: BaseHydroAccessor

Read hydro data from a local filesystem directory.

list_available_items() → Dict[str, Any]

Return available timesteps / variables / files.

Return type:

dict with keys like files, timesteps, variables.

materialize(required_files: List[str], target_dir: str) → str

Ensure required_files are available in target_dir.

For local accessors this may be a no-op (return source dir). For HTTP accessors this downloads missing files.

Return type:

str – local directory path containing the files.

validate() → Tuple[bool, Dict[str, Any], List[str]]

Check whether the data source is valid.

Returns:

  • ok (bool)

  • summary (dict) – Keys: snapshot_count, water_shape, porosity_shape, bot_shape, grid_info, path.

  • errors (list[str])

PyHydroGeophysX.data_access.get_manifest_entry(dataset_id: str, manifest_path: str | None = None) → Dict[str, Any] | None

Return a single manifest entry by id, or None.

PyHydroGeophysX.data_access.load_manifest(manifest_path: str | None = None) → Dict[str, Any]

Load the dataset manifest JSON.

Parameters:

manifest_path (str, optional) – Path to manifest.json. Defaults to datasets/manifest.json relative to the repository root (two levels up from this file).

Returns:

Parsed manifest with a datasets key.

Return type:

dict