API Reference
sea_ice_toolbox
- class sea_ice_toolbox.SeaIceToolbox(P_json=None, P_CICE_grid=None, sim_name=None, dt0_str=None, dtN_str=None, list_of_BorC2T=None, iceh_frequency=None, ice_concentration_threshold=None, ice_speed_threshold=None, ice_type=None, mean_period=None, bin_win_days=None, bin_min_days=None, extra_cice_vars=None, hemisphere=None, P_log=None, log_level=None, dask_memory_limit=None, overwrite_zarr=False, overwrite_saved_figs=False, save_new_figs=True, show_figs=False, delete_original_cice_iceh_nc=False, client=None, force_recompile_ice_in=False, **kwargs)[source]
Bases:
SeaIceClassification,SeaIceMetrics,SeaIcePlotter,SeaIceIcebergs,SeaIceObservations,SeaIceACCESS,SeaIceGridWork,SeaIceRegridder,SeaIceFast,SeaIceCICEUnified AFIM toolbox for processing and analysing Antarctic sea ice from CICE.
SeaIceToolbox composes several functional mixins into a single interface for reading CICE output, classifying fast/pack ice, computing metrics, regridding, working with grounded-iceberg masks, and producing plots.
Inheritance / Composition
SeaIceToolbox inherits from the following modules:
SeaIceClassification : classification masks and simulation I/O helpers
SeaIceMetrics : time-series and spatial metrics, skill statistics
SeaIcePlotter : PyGMT maps and time series plotting utilities
SeaIceIcebergs : grounded-iceberg thinning/masking and GI datasets
SeaIceObservations : Fraser et al. (2020) and NSIDC observational utilities
SeaIceACCESS : ACCESS-OM related helpers (where applicable)
SeaIceGridWork : grid geometry, landmask application, hemisphere slicing
SeaIceRegridder : B-grid → T-grid and swath/gridded regridding utilities
Configuration
The toolbox is configured primarily by an AFIM JSON file. The constructor loads that JSON, sets commonly used directories (simulation output, zarr, metrics, figures), defines hemisphere behaviour, and stores thresholds used throughout the pipeline.
- param P_json:
Path to the AFIM configuration JSON. If None, a project default path is used.
- type P_json:
str or pathlib.Path, optional
- param P_CICE_grid:
Override path to the CICE grid file; otherwise uses config entry CICE_dict[‘P_G’].
- type P_CICE_grid:
str or pathlib.Path, optional
- param sim_name:
Simulation name (must exist under the configured AFIM output root).
- type sim_name:
str
- param dt0_str:
Inclusive analysis window bounds in
YYYY-MM-DD.- type dt0_str:
str, optional
- param dtN_str:
Inclusive analysis window bounds in
YYYY-MM-DD.- type dtN_str:
str, optional
- param list_of_BorC2T:
Speed/vector products to use (e.g., [“Tb”], [“Ta”,”Tx”], or [“B”]).
- type list_of_BorC2T:
list[str], optional
- param iceh_frequency:
Which CICE history cadence to use for iceh inputs.
- type iceh_frequency:
{“hourly”,”daily”,”monthly”,”yearly”}, optional
- param ice_concentration_threshold:
Concentration threshold used for masking / metrics (default from config; often 0.15).
- type ice_concentration_threshold:
float, optional
- param ice_speed_threshold:
Speed threshold (m/s) below which ice is treated as “fast” (default from config).
- type ice_speed_threshold:
float, optional
- param ice_type:
Ice classification type(s) to process (e.g., “FI”, “PI”, “SI”).
- type ice_type:
str or list[str], optional
- param mean_period:
Rolling mean window length (days) used in some classification products.
- type mean_period:
int, optional
- param bin_win_days:
Binary-days window length and minimum count used for persistence classification.
- type bin_win_days:
int, optional
- param bin_min_days:
Binary-days window length and minimum count used for persistence classification.
- type bin_min_days:
int, optional
- param extra_cice_vars:
Additional variables to include when loading/processing beyond cice_vars_reqd.
- type extra_cice_vars:
list[str] or bool, optional
- param hemisphere:
Hemisphere selector. Common aliases are accepted (e.g., “south”, “SH”, “nh”).
- type hemisphere:
str, optional
- param P_log:
Log file path to attach a FileHandler to.
- type P_log:
str or pathlib.Path, optional
- param log_level:
Python logging level for the toolbox logger.
- type log_level:
int or str, optional
- param dask_memory_limit:
Memory limit (informational here unless you create the client externally).
- type dask_memory_limit:
str, optional
- param overwrite_zarr:
If True, overwrite Zarr groups when writing classification/metrics products.
- type overwrite_zarr:
bool, optional
- param overwrite_saved_figs:
If True, overwrite existing figure files.
- type overwrite_saved_figs:
bool, optional
- param save_new_figs:
Figure saving and interactive display toggles.
- type save_new_figs:
bool, optional
- param show_figs:
Figure saving and interactive display toggles.
- type show_figs:
bool, optional
- param delete_original_cice_iceh_nc:
If True, delete original NetCDF inputs after conversion to monthly Zarr (where implemented).
- type delete_original_cice_iceh_nc:
bool, optional
- param client:
Dask client to use. In this implementation, a client must already exist and be passed in.
- type client:
dask.distributed.Client, optional
- param force_recompile_ice_in:
Force regeneration of derived simulation metadata (e.g., parsing/rebuilding ice_in JSON).
- type force_recompile_ice_in:
bool, default False
- param **kwargs:
Additional keyword arguments are attached to the instance so mixins can access specialised tunables without changing the constructor signature.
- param Attributes Set (high-level):
- param —————————:
- param - Configuration dicts:
- type - Configuration dicts:
self.CICE_dict, self.GI_dict, self.NSIDC_dict, etc.
- param - Directory paths:
- type - Directory paths:
self.D_sim, self.D_zarr, self.D_metrics, etc.
- param - Hemisphere metadata:
- type - Hemisphere metadata:
self.hemisphere_dict, self.hemisphere
- param - Thresholds/settings:
- type - Thresholds/settings:
self.ispd_thresh, self.icon_thresh, self.mean_period, etc.
- param - State flags:
- type - State flags:
self.grid_loaded, self.reG_weights_defined, etc.
Notes
The constructor performs substantial configuration and path normalisation and
therefore has side effects (file IO, logger configuration). - Grids and regridders are not loaded/built until needed (load_cice_grid, define_reG_weights, etc.).
- compute_doy_climatology(da, leap_year=None, time_coord=None)[source]
Compute day-of-year (DOY) climatology statistics from a time series.
This method calculates the climatological mean, minimum, maximum, and standard deviation of a given time series DataArray, grouped by day-of-year. The result is returned as a dictionary of Pandas Series indexed by a datetime index constructed using a reference leap_year.
- Parameters:
da (xarray.DataArray) – The input time series with a time coordinate. Can be daily or sub-daily resolution, but must be regular and span multiple years for meaningful climatology.
leap_year (int, optional) – The reference leap year to use when reconstructing the datetime index for the output. Defaults to self.leap_year if not provided.
time_coord (str, optional) – The name of the time coordinate in da. Defaults to self.CICE_dict[‘time_dim’] if not specified.
- Returns:
A dictionary with keys: - ‘mean’ : pd.Series of climatological mean values - ‘min’ : pd.Series of climatological minimum values - ‘max’ : pd.Series of climatological maximum values - ‘std’ : pd.Series of climatological standard deviation
Each Series is indexed by datetime values (from the specified leap_year) corresponding to days 1–366.
- Return type:
dict
Notes
The output includes 366 days if data contains leap years; otherwise, it includes up to 365.
The use of a leap year for index construction ensures that the DOY mapping to dates is valid,
especially for plotting or seasonal alignment. - The DataArray is fully loaded into memory before processing.
- compute_rolling_mean_on_dataset(ds, mean_period=None)[source]
Apply a centered temporal rolling mean to a dataset.
This method smooths temporal noise by computing a centered moving average across the time dimension, typically for use in rolling fast ice classification.
- INPUTS:
ds : xarray.Dataset; input dataset with a time dimension. mean_period : int, optional; rolling window size in days. Defaults to self.mean_period.
- OUTPUTS:
xarray.Dataset; dataset with all variables averaged over the specified rolling window.
- cosine_vector_similarity(uo, vo, um, vm, eps=1e-12)[source]
Compute the cosine similarity between two vector fields (e.g., observed vs. modelled velocity vectors).
This metric quantifies the directional alignment of the two vector fields without considering magnitude. A value of: - +1.0 means the vectors point in exactly the same direction, - 0.0 means the vectors are orthogonal (90° apart), - -1.0 means the vectors point in opposite directions.
The cosine similarity is computed as the dot product of the two vectors, divided by the product of their magnitudes.
- Parameters:
uo (xarray.DataArray) – Components of the observed vector field (e.g., u and v velocity components) in units of m/s.
vo (xarray.DataArray) – Components of the observed vector field (e.g., u and v velocity components) in units of m/s.
um (xarray.DataArray) – Components of the modelled vector field (same units as uo, vo).
vm (xarray.DataArray) – Components of the modelled vector field (same units as uo, vo).
eps (float, optional) – Small constant to prevent division by zero in regions where either vector magnitude is near-zero. Default is 1e-12.
- Returns:
Cosine similarity between vectors, dimensionless, in the range [-1, 1].
- Return type:
xarray.DataArray
Notes
NaNs are returned where either the observed or modelled vector magnitude is near-zero.
This metric is scale-invariant — it compares direction only, not speed.
Particularly useful for evaluating sea ice drift direction skill, regardless of speed bias.
- count_zarr_files(path)[source]
Count and log the number of files under a directory tree (e.g., a Zarr store).
- Parameters:
path (str or pathlib.Path) – Directory to count files under.
Notes
This counts filesystem entries, not Zarr logical arrays. It is useful for quick sanity checks and estimating metadata overhead.
- create_empty_valid_DS_dictionary(valid_zarr_DS_list=None)[source]
Create a nested dictionary template for collecting datasets by category.
- Parameters:
valid_zarr_DS_list (list[str], optional) – List of dataset keys to initialise per outer key. Defaults to self.valid_ice_types.
- Returns:
A defaultdict where each new outer key maps to a dict of empty lists: { <outer_key>: {<ds_key_1>: [], <ds_key_2>: [], …} }.
- Return type:
collections.defaultdict
Notes
This is a lightweight utility for accumulating per-year/per-month objects before concatenation.
- create_monthly_strings(dt0_str=None, dtN_str=None)[source]
Return sorted unique YYYY-MM strings between two dates (inclusive).
- Parameters:
dt0_str (str, optional) – Start/end dates in
YYYY-MM-DD. Defaults to self.dt0_str and self.dtN_str.dtN_str (str, optional) – Start/end dates in
YYYY-MM-DD. Defaults to self.dt0_str and self.dtN_str.
- Returns:
Sorted list of unique month strings, e.g., [“1993-01”, “1993-02”, …].
- Return type:
list[str]
Notes
Uses a daily pandas date range and then de-duplicates by month.
- define_datetime_vars(dt0_str=None, dtN_str=None)[source]
Define date range attributes from start and end date strings.
- Parameters:
dt0_str (str, optional) – Start date string (e.g., ‘1994-01-01’). Defaults to self.dt0_str.
dtN_str (str, optional) – End date string (e.g., ‘1999-12-31’). Defaults to self.dtN_str.
Sets
----
self.dt0 (pd.Timestamp) – Parsed start date.
self.dtN (pd.Timestamp) – Parsed end date.
self.yrs_mos (np.ndarray) – Array of ‘YYYY-MM’ strings for each month in the range.
self.ymd_strs (np.ndarray) – Array of ‘YYYY-MM-DD’ strings for each day in the range.
- define_hemisphere(hemisphere)[source]
Initialise hemisphere configuration used across grid slicing and plotting.
Accepts common hemisphere aliases (e.g., “south”, “SH”, “nh”) and maps them to the internal self.hemisphere_dict configuration (slicing indices and labels). Also converts nj_slice from a (start, stop) tuple into a Python slice.
- Parameters:
hemisphere (str) – Hemisphere selector. Accepted aliases include: - North: “north”, “northern”, “nh”, “n” - South: “south”, “southern”, “sh”, “s”
Sets
----
hemisphere_dict (dict) – Hemisphere metadata dictionary loaded from the JSON config, with nj_slice stored as a Python slice.
hemisphere – Canonical lowercase hemisphere string used internally.
- Raises:
ValueError – If hemisphere does not match any accepted alias.
Notes
Downstream methods rely on self.hemisphere_dict[‘nj_slice’] for slicing and self.hemisphere_dict[‘abbreviation’] for path construction.
- define_month_first_last_dates(year_month_str)[source]
Given a ‘YYYY-MM’ string, return the first and last dates of that month.
- Parameters:
year_month_str (str) – Year and month string in ‘YYYY-MM’ format.
- Returns:
First and last day of the month in ‘YYYY-MM-DD’ format.
- Return type:
tuple of str
- define_toolbox_paths(sim_name=None, D_sim=None, D_zarr=None, D_graph=None, ice_type=None, BorC2T_type=None, ispd_thresh=None, bin_win_days=None, bin_min_days=None, mean_period=None)[source]
Define canonical paths for ice-history, classified products, and metrics.
- Layout:
- <D_zarr>/
iceh_daily.zarr/<YYYY-MM>/ iceh_monthly.zarr/<YYYY-MM>/ <HEM>/
- SI/
raw.zarr mets.zarr
- MIZ/
raw.zarr mets.zarr
- ispd_thresh_<thr>/
- FI/
- <BorC2T>/
raw.zarr mets.zarr bin-win-XX_bin-min-YY/
raw.zarr mets.zarr
- roll-days-ZZ/
raw.zarr mets.zarr
- PI/
- <BorC2T>/
raw.zarr mets.zarr bin-win-XX_bin-min-YY/
raw.zarr mets.zarr
- roll-days-ZZ/
raw.zarr mets.zarr
- dict_to_ds(data_dict)[source]
Convert a dictionary of DataArrays into an xarray.Dataset.
- Parameters:
data_dict (dict) – Mapping of variable name -> xarray.DataArray (or array-like compatible).
- Returns:
Dataset with keys from data_dict as variables.
- Return type:
xarray.Dataset
- get_dir_size(path)[source]
Compute and log the total on-disk size of a directory tree.
- Parameters:
path (str or pathlib.Path) – Directory to scan recursively.
Notes
Size is computed from file sizes returned by Path.rglob and may be slow on
large filesystems. - The result is logged in GiB (1024**3).
- interpret_ice_speed_threshold(ispd_thresh=None, lat_thresh=-60)[source]
Translate the ice speed threshold into intuitive grid-scale metrics.
Computes: - meters per day at the given speed, - the median grid-cell edge length south of lat_thresh on the model
grid (self.CICE_dict[‘P_G’]),
displacement as a fraction of a grid cell per day,
days required to traverse a grid cell at the threshold speed.
- Parameters:
ispd_thresh (float, optional) – Threshold in m/s. Defaults to
self.ispd_thresh.lat_thresh (float, default -60) – Latitude (degrees) used to select the polar region for the median cell size.
- Returns:
Summary metrics with keys:
{'ice_speed_thresh_m_per_s', 'displacement_m_per_day', 'median_grid_cell_length_m', 'percent_displacement_per_day', 'days_per_grid_cell'}.- Return type:
dict
Notes
Uses CICE grid variables from
self.CICE_dict['P_G'](radians) and
converts to degrees for masking. - Logs a human-readable summary via self.logger.
- parse_simulation_metadata(force_recompile=False)[source]
Parse or load simulation metadata for CICE AFIM runs.
If a cached JSON config file exists in the simulation directory and force_recompile=False, this method will load and return the metadata from that file. Otherwise, it will parse the ice_diag.d file and store the extracted parameters in a structured JSON file for future use.
- Parameters:
force_recompile (bool) – If True, re-parse the raw ice_diag.d file even if a cached JSON exists.
- Returns:
Dictionary of extracted parameters including inferred grounded iceberg info.
- Return type:
dict
- radians_to_degrees(da)[source]
Convert radians to degrees.
- Parameters:
da (array-like) – Values in radians (NumPy array, xarray DataArray, or scalar).
- Returns:
Values converted to degrees.
- Return type:
same type as da
- setup_logging(logfile=None, log_level=20)[source]
Configure the toolbox logger with a stream handler and optional file handler.
This method creates (or reuses) a named logger, sets the logging level, and ensures that (a) a console StreamHandler exists and (b) exactly one FileHandler is attached when logfile is provided.
- Parameters:
logfile (str or pathlib.Path, optional) – File to write log messages to. If None, only console logging is configured.
log_level (int, default logging.INFO) – Logging level for the logger and its handlers.
Notes
Existing FileHandlers are removed and closed before attaching a new one.
self.logger.propagate is set False to avoid duplicate logs when parent
loggers are configured elsewhere.
- slice_hemisphere(var)[source]
Apply the configured hemisphere slice to a Dataset/DataArray (and corners).
Uses self.hemisphere_dict[‘nj_slice’] on the cell-centre row dimension (e.g., nj). If a matching corner dimension is present (e.g., nj_b), expands the stop bound by +1 to include the boundary row.
- Parameters:
var (xr.Dataset | xr.DataArray | dict[str, xr.Dataset | xr.DataArray]) – Object(s) to slice. Dict values are sliced if their dims match.
- Returns:
Sliced object of the same type as the input.
- Return type:
xr.Dataset | xr.DataArray | dict
Notes
Cell-centre dimension name is taken from self.CICE_dict[“y_dim”].
Corner dim is assumed to be ‘nj_b’. If you parameterise corners in your config, adapt _indexers_for() accordingly.
- summary()[source]
Log a concise summary of key configuration and runtime settings.
This is primarily a convenience method for sanity checking during interactive work and batch runs. It writes configuration metadata to self.logger.
Notes
No values are returned.
Assumes self.logger has already been configured and that core attributes
(sim_name, date bounds, thresholds, toggles) are present.
- vector_angle_diff(uo, vo, um, vm)[source]
Compute the signed angular difference (in radians) between two vector fields.
This metric measures the angle by which the modelled vector differs from the observed vector. The difference is returned as a signed value in the range [-π, π], where: - 0 indicates perfect alignment, - +pi/2 indicates model is rotated 90° counter-clockwise from observation, - -pi/2 indicates 90° clockwise rotation, - +/-pi indicates vectors are anti-parallel.
- Parameters:
uo (xarray.DataArray) – Components of the observed vector field (e.g., u and v velocity components), in m/s.
vo (xarray.DataArray) – Components of the observed vector field (e.g., u and v velocity components), in m/s.
um (xarray.DataArray) – Components of the modelled vector field (in m/s).
vm (xarray.DataArray) – Components of the modelled vector field (in m/s).
- Returns:
Signed angular difference (model - obs) in radians, bounded in [-pi, pi].
- Return type:
xarray.DataArray
Notes
The angular difference is computed using arctan2 on the vector components.
Use np.rad2deg() to convert output to degrees, if desired.
This metric is useful for directional error analysis in drift or flow fields.
- class sea_ice_toolbox.SeaIceToolboxManager(P_log, n_workers: int = 4, n_threads: int = 1, mem_lim: str = '16GB', process: bool = True, D_dask: str = None)[source]
Bases:
objectFactory for creating SeaIceToolbox instances backed by a shared Dask client.
SeaIceToolboxManager maintains a class-level Dask client so that multiple toolboxes (e.g., multiple simulations or multiple analyses) reuse a single scheduler and worker pool. This is particularly useful on HPC login/compute nodes where repeatedly creating local clusters is slow and can fragment resources.
- Parameters:
P_log (str or pathlib.Path) – Path to a log file. The created toolboxes attach a file handler to this path.
n_workers (int, default 4) – Number of Dask workers for the shared LocalCluster.
n_threads (int, default 1) – Threads per worker.
mem_lim (str, default "16GB") – Worker memory limit passed to Dask (e.g., “8GB”, “16GB”).
process (bool, default True) – Whether to use separate worker processes (True) or threads-only workers (False).
D_dask (str, optional) – Local directory for Dask worker spill and temporary files. If None, uses DASK_TEMPORARY_DIRECTORY or the system temp directory.
Notes
The shared client is cached on the class (SeaIceToolboxManager._shared_client).
shutdown() closes the shared client and attempts to close file handlers.
get_toolbox(sim_name, **kwargs) returns a preconfigured SeaIceToolbox
attached to the shared client and log file.
See also
SeaIceToolboxUnified AFIM sea-ice analysis class composed of multiple SeaIce* mixins.
- get_toolbox(sim_name, **kwargs)[source]
Create a SeaIceToolbox for a given simulation using the shared Dask client.
- Parameters:
sim_name (str) – Simulation name (must correspond to a configured simulation directory).
**kwargs – Additional keyword arguments forwarded to SeaIceToolbox(…). Use this to override date ranges, thresholds, plotting options, etc.
- Returns:
Toolbox instance bound to the shared Dask client and configured log file.
- Return type:
- Raises:
ValueError – If the shared Dask client has not been created successfully.
- shutdown()[source]
Shut down the shared Dask client and close file log handlers.
Closes the class-level Dask client if it exists and resets the cached reference. Also attempts to close any logging.FileHandler instances attached to the SeaIceToolbox logger.
Notes
Intended for interactive sessions to avoid orphaned local clusters.
If multiple loggers/handlers are in use, only file handlers on the named
logger are closed.
sea_ice_gridwork
- class sea_ice_gridwork.SeaIceGridWork(**kwargs)[source]
Bases:
objectGrid utilities and CICE grid loaders for AFIM sea-ice workflows.
SeaIceGridWork centralizes common spatial operations used across the AFIM toolbox, including longitude normalization, landmask application, construction of cell-corner and face-coordinate geometry, and loading CICE grid metadata (T-grid and U-grid) from the model grid file.
The class is intended to be mixed into higher-level analysis classes (e.g., toolbox/plotter/regridder) and therefore typically relies on attributes defined elsewhere, such as: - self.CICE_dict : configuration dictionary (paths, dimension names, etc.) - self.logger : logger instance - self.use_gi : bool; whether grounded-iceberg modified landmask is active - self.P_KMT_org, self.P_KMT_mod : paths to original/modified landmask files - self.slice_hemisphere(…) : optional helper to subset to SH/NH - self.radians_to_degrees(…) : helper for coordinate conversion - state flags: self.grid_loaded, self.bgrid_loaded, self.cgrid_loaded
Notes
CICE grids are curvilinear; longitudes require seam-aware handling. The class
provides robust longitude wrapping and a circular mean for longitudes. - load_cice_grid sets multiple datasets as attributes; it is a loader with side effects and does not return the grids explicitly.
- build_F2_GI_from_df(df, P_grid: str, P_mask: str = None, P_topog: str = None, nx: int = 1440, ny: int = 1152, lat_subset_max: float = -30.0, restrict_to_ocean: bool = True, method: str = 'simple-geometry', length_scale: str = 'perimeter', base_area_m2: float = None, C_gi: float = 1.0, proj_crs: str = 'EPSG:3031', max_map_dist_km: float = 50.0, eps_cluster_km: float = 15.0, min_cluster_size: int = 3, cluster_buffer_m: float = None, cluster_amplification: float = 0.0, weight_by: str = 'perim')[source]
- build_F2_GI_from_gpkg(P_gpkg, layer='grounded_icebergs', P_grid=None, P_mask=None, P_topog=None, nx=1440, ny=1152, method='simple-geometry', length_scale='perimeter', C_gi=1.0, proj_crs='EPSG:3031', max_map_dist_km=50.0, dedup_uid=True, **kwargs)[source]
- build_F2_form_factors_from_high_res_coast(P_shp: str = None, P_grid: str = None, P_out: str = None, proj_crs: str = 'EPSG:3031', chunk_segments: int = 2000000, coast_write_stride: int = 25, lat_subset_max: float = -30.0, netcdf_compression: int = 4, use_coastal_ocean_kdtree: bool = True, coast_buffer_cells: int = 1, max_assign_km: float = 50.0, P_mask: str | None = None, nx: int | None = 1440, ny: int | None = 1152)[source]
Compute Liu et al. (2022) coastal form factors (F2x, F2y) on the CICE T-grid from a high-resolution polygon coastline.
- The cell-based form factors (Liu et al. 2022, Eqs. 9–10) are computed as:
F2x(i,j) = Σ_n | l_n cos(θ_n) | / dx(i,j) F2y(i,j) = Σ_n | l_n sin(θ_n) | / dy(i,j)
where each coastline segment n has geodesic length l_n (WGS84) and bearing projected into the local model axes using:
θ_n = α_n - anglet(i,j),
with α_n the segment angle in Earth-referenced east-north coordinates and anglet the local model grid rotation (radians).
- Parameters:
P_shp (str, optional) – Path to the high-resolution coastline/land polygons. If None, uses self.CICE_dict[“P_high_res_coast”].
P_grid (str, optional) – Path to the CICE grid file. If None, uses self.CICE_dict[“P_G”].
P_out (str, optional) – Output NetCDF path. If None, uses self.CICE_dict[“P_F2_coast”].
proj_crs (str, default="EPSG:3031") – Projected CRS used to assign each coastline segment midpoint to the nearest T-cell via KDTree (meters).
chunk_segments (int, default=2_000_000) – Process coastline segments in chunks to limit memory use.
coast_write_stride (int, default=25) – Subsampling stride for storing coastline vertices in the output dataset for provenance/plotting. Set <=0 or None to disable.
lat_subset_max (float, default=-30.0) – Only T-cells with latitude <= this threshold are included in the KDTree (a performance and relevance filter for Antarctic use).
netcdf_compression (int, default=4) – NetCDF zlib compression level for output variables.
use_coastal_ocean_kdtree (bool, default=True) – If True and an ocean mask can be inferred from the grid file, restrict the KDTree to ocean cells within coast_buffer_cells of land (to avoid mapping segments to interior ice-shelf/grounding-line regions).
coast_buffer_cells (int, default=1) – Buffer (in grid cells) used to define the coastal-ocean band via binary dilation of the land mask.
max_assign_km (float, default=50.0) – Reject segment-to-cell assignments whose KDTree distance exceeds this threshold (in km). Use None to disable.
- Returns:
ds_out – Dataset containing: - F2x(nj,ni), F2y(nj,ni) : float32, unitless - lon(ncoast), lat(ncoast) : thinned coastline vertices (float32) with provenance attributes describing inputs and settings.
- Return type:
xarray.Dataset
- Raises:
ValueError – If required paths are not provided via arguments or configuration.
RuntimeError – If no grid cells satisfy the latitude subset filter (likely units issue).
Notes
This computes the cell-based integrals (Liu f2^u and f2^v). Conversion to
C-grid u/v points (Liu Eqs. 11–12) is intentionally deferred (e.g., Fortran). - Segment lengths and azimuths are computed geodesically on WGS84 (pyproj.Geod). - Nearest-cell assignment uses segment midpoints in the projected CRS.
- build_grid_corners(lat_rads, lon_rads, grid_res=0.25, source_in_radians=True)[source]
Construct corner coordinates from cell-centre coordinates using simple averaging.
This routine builds (ny+1, nx+1) corner arrays from (ny, nx) cell-centre arrays via arithmetic averaging. It is a legacy/simple alternative to build_grid_faces.
- Parameters:
lat_rads (array-like) – 2-D arrays of cell-centre lat/lon. Interpreted as radians when source_in_radians=True, else as degrees.
lon_rads (array-like) – 2-D arrays of cell-centre lat/lon. Interpreted as radians when source_in_radians=True, else as degrees.
grid_res (float, default 0.25) – Scaling factor applied to the interior arithmetic mean. Historically, this function used grid_res as a multiplier; for conventional corner averaging this should be 0.25.
source_in_radians (bool, default True) – If True, interpret inputs as radians and convert to degrees.
- Returns:
lon_b, lat_b – Corner longitudes/latitudes with shape (ny+1, nx+1) in degrees.
- Return type:
numpy.ndarray
Notes
Longitudes are normalized using normalise_longitudes after construction.
This method uses arithmetic averaging for longitude and is not seam-safe near
the dateline. Prefer build_grid_faces when longitude seam handling matters.
- build_grid_faces(lon, lat, source_in_radians=False)[source]
Construct corner and face-coordinate geometry from cell-centre lon/lat fields.
Given 2-D cell-centre longitudes and latitudes (typically the CICE T-grid centres), this routine constructs: - cell corner coordinates (
lon_b,lat_b) with shape (nj+1, ni+1) - vertical face-centre coordinates (lon_e,lat_e) with shape (nj, ni+1) - horizontal face-centre coordinates (lon_n,lat_n) with shape (nj+1, ni)Longitudes are handled using a circular mean to avoid dateline artifacts.
- Parameters:
lon (array-like) – 2-D arrays of cell-centre coordinates with shape (nj, ni). Units are degrees unless source_in_radians=True.
lat (array-like) – 2-D arrays of cell-centre coordinates with shape (nj, ni). Units are degrees unless source_in_radians=True.
source_in_radians (bool, default False) – If True, inputs are in radians and will be converted to degrees before geometry construction.
- Returns:
lon_b, lat_b (numpy.ndarray) – Corner coordinates with shape (nj+1, ni+1) in degrees.
lon_e, lat_e (numpy.ndarray) – Vertical face-centre coordinates with shape (nj, ni+1) in degrees.
lon_n, lat_n (numpy.ndarray) – Horizontal face-centre coordinates with shape (nj+1, ni) in degrees.
Notes
Interior corners are computed as the mean of the four surrounding cell centres.
Longitude uses a circular mean; latitude uses an arithmetic mean. - Edge corners are computed from adjacent two-cell averages (more consistent than copying boundary cells). - The four outermost corners are set to the nearest cell-centre values.
- compute_regular_grid_area(da)[source]
Compute cell areas (km^2) for a regular lat/lon grid (1D coords).
Assumes 1D, monotonically increasing latitude and longitude coordinates with uniform longitudinal spacing. Uses spherical geometry to integrate the area between latitude edges for each latitude band, broadcasting across longitudes.
- Parameters:
da (xr.DataArray) – Any DataArray that carries 1D coordinates named
'lat'and'lon'. Values are not used; only the coordinates are read.- Returns:
2D area array with dims (
lat,lon) in km², aligned to the provided coordinate vectors.- Return type:
xr.DataArray
Notes
Earth radius is fixed at 6,371,000 m.
For non-uniform lon spacing or curvilinear grids, use model grid areas
(e.g., CICE tarea) rather than this helper.
- compute_tcell_dxdy_m(force=False, cache=True)[source]
Compute approximate T-cell metric lengths dx and dy (meters).
The method estimates T-cell sizes using distances between adjacent face-center coordinates derived from grid corner/face metadata (self.G_e, self.G_n). Great- circle (ellipsoidal) distances are computed on the WGS84 ellipsoid using pyproj.Geod.inv.
- Parameters:
force (bool, default=False) – If True, recompute dx/dy even if cached values exist.
cache (bool, default=True) – If True, store results in self._dxdy_cache and reuse on subsequent calls.
- Returns:
dx_m (numpy.ndarray) – Zonal metric length between adjacent vertical-face centers, shape (nj, ni), in meters (float64).
dy_m (numpy.ndarray) – Meridional metric length between adjacent horizontal-face centers, shape (nj, ni), in meters (float64).
- Raises:
ImportError – If pyproj is not available.
Notes
Distances are returned as absolute values; non-finite or non-positive values
are set to NaN. - This is an approximate metric consistent with face-center spacing; it is appropriate for normalising form-factor sums and related diagnostics.
- define_regular_G(grid_res, region=[0, 360, -90, 0], spatial_dim_names=('nj', 'ni'))[source]
Create a regular (lat, lon) destination grid as an xarray.Dataset.
- Parameters:
grid_res (float) – Grid resolution in degrees.
region (list[float], default [0, 360, -90, 0]) – [lon_min, lon_max, lat_min, lat_max] in degrees.
spatial_dim_names (tuple[str, str], default ("nj","ni")) – Output dimension names.
- Returns:
Dataset containing 2D lon/lat coordinates on a regular grid.
- Return type:
xr.Dataset
Notes
Longitudes are generated in the same numeric range as region.
Intended as a destination grid for xESMF regridding of persistence-style maps.
- get_static_grid(var)[source]
Return a static (time-independent) slice of a grid-like DataArray.
If the input has a “time” dimension, returns isel(time=0), otherwise returns the object unchanged.
- Parameters:
var (xarray.DataArray) – DataArray that may or may not have a “time” dimension.
- Returns:
Time-sliced or original DataArray.
- Return type:
xarray.DataArray
- load_cice_grid(P_grid=None, P_mask_org=None, P_mask_mod=None, slice_hem=False, build_faces=True, nx=None, ny=None)[source]
Canonical attribute builder. Must keep self.G_t/self.G_u/self.G_e/self.G_n stable for downstream. :contentReference[oaicite:11]{index=11}
- load_super_grid(P_grid=None, lon_type='0-360', nx=None, ny=None, nx_in=None, ny_in=None)[source]
Backwards-compatible public wrapper. Do NOT change return signature (F2 builders rely on it). :contentReference[oaicite:10]{index=10}
- map_points_to_tgrid(lon_deg, lat_deg, proj_crs: str = 'EPSG:3031', max_dist_km: float = None)[source]
Map geographic lon/lat points to the nearest CICE T-cell.
Points are projected into proj_crs and mapped to the nearest projected T-cell center using a cached KDTree.
- Parameters:
lon_deg (array-like) – Longitudes and latitudes in degrees (EPSG:4326).
lat_deg (array-like) – Longitudes and latitudes in degrees (EPSG:4326).
proj_crs (str, default="EPSG:3031") – Projected CRS used for KDTree search (meters).
max_dist_km (float, optional) – If provided, points farther than this distance (km) from the nearest T-cell center are flagged invalid.
- Returns:
ii (numpy.ndarray) – Nearest-cell i indices (int64).
jj (numpy.ndarray) – Nearest-cell j indices (int64).
dist_m (numpy.ndarray) – Distance (meters) from each point to the nearest T-cell center.
valid (numpy.ndarray) – Boolean mask indicating finite distances and (if provided) max_dist_km compliance.
Notes
Requires self.G_t to be loaded; if absent, load_cice_grid is called.
- map_xy_to_tgrid(x_m, y_m, proj_crs: str = 'EPSG:3031', max_dist_km: float = None)[source]
Map projected x/y points to the nearest CICE T-cell using a cached KDTree.
- Parameters:
x_m (array-like) – Point coordinates in meters in the CRS specified by proj_crs.
y_m (array-like) – Point coordinates in meters in the CRS specified by proj_crs.
proj_crs (str, default="EPSG:3031") – CRS of input x/y coordinates. Must match the CRS used to build the KDTree.
max_dist_km (float, optional) – If provided, points farther than this distance (km) from the nearest T-cell center are flagged invalid.
- Returns:
ii (numpy.ndarray) – Nearest-cell i indices (int64).
jj (numpy.ndarray) – Nearest-cell j indices (int64).
dist_m (numpy.ndarray) – Distance (meters) from each point to the nearest T-cell center.
valid (numpy.ndarray) – Boolean mask indicating finite distances and (if provided) max_dist_km compliance.
Notes
Requires self.G_t to be loaded; if not present, load_cice_grid is called with build_faces=True to ensure consistent grid metadata.
- normalise_longitudes(lon, to='0-360', eps=1e-12)[source]
Normalize longitudes to a consistent wrap convention.
This routine wraps longitude values either to: -
"0-360": [0, 360) -"-180-180": [-180, 180)It supports scalars, NumPy arrays, and xarray objects (DataArray / Dataset variables). NaNs are preserved.
- Parameters:
lon (scalar, numpy.ndarray, or xarray.DataArray) – Longitude values in degrees east. May be any numeric dtype; NaNs pass through.
to ({"0-360", "-180-180"}, default "0-360") – Target longitude convention.
eps (float, default 1e-12) – Numerical tolerance used to collapse boundary values (e.g., 360→0, 180→-180) when floating-point rounding produces near-boundary values.
- Returns:
Longitude values wrapped to the requested convention.
- Return type:
same type as lon
- Raises:
ValueError – If to is not one of {“0-360”, “-180-180”}.
Notes
For the “-180-180” convention, values exactly equal to +180 (within eps) are
mapped to -180 for consistency with a half-open interval [-180, 180).
- read_grounded_iceberg_csv(P_csv, lon_col: str = 'lon', lat_col: str = 'lat', normalise_lon_to: str = '0-360')[source]
Read a grounded-iceberg (GI) point CSV and standardise columns.
The CSV must contain longitude and latitude columns (degrees). Optional columns are preserved when present (e.g., area_m2, C_gi, orient_deg, t_start, t_end). Longitudes are normalised to the chosen convention.
- Parameters:
P_csv (str or pathlib.Path) – Path to the CSV file.
lon_col (str, default="lon") – Name of the longitude column in the CSV.
lat_col (str, default="lat") – Name of the latitude column in the CSV.
normalise_lon_to ({"0-360","-180-180"}, default="0-360") – Longitude convention applied after reading.
- Returns:
df – DataFrame with at least columns: - lon : float (degrees) - lat : float (degrees) and any recognised optional columns preserved.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If required columns are missing.
Notes
Rows with missing or non-numeric lon/lat are dropped. Latitudes are bounded to [-90, 90] as a basic sanity check.
- read_grounded_iceberg_gpkg(P_gpkg: str, layer: str = 'grounded_icebergs', dedup_uid: bool = True, uid_col: str = 'Global_UID', x_col: str = 'Easting_3031', y_col: str = 'Northing_3031', lon_col: str = 'Longitude', lat_col: str = 'Latitude', area_km2_col: str = 'Area_Mean_km2', use_attr_area: bool = True, normalise_lon_to: str = '0-360')[source]
Read a grounded-iceberg polygon GeoPackage and derive mapping-ready attributes.
The GeoPackage is expected to contain grounded iceberg polygons (typically in EPSG:3031). This method computes perimeter and area from geometry, optionally prefers an attribute-provided mean area (e.g., Area_Mean_km2), and returns a tabular representation suitable for mapping to the model grid.
- Parameters:
P_gpkg (str or pathlib.Path) – Path to the GeoPackage.
layer (str, default="grounded_icebergs") – Layer name to read.
dedup_uid (bool, default=True) – If True, aggregate multiple detections sharing uid_col into one record (mean position; median size metrics).
uid_col (str, default="Global_UID") – Unique identifier column for deduplication.
x_col (str) – Column names containing projected coordinates in EPSG:3031 meters.
y_col (str) – Column names containing projected coordinates in EPSG:3031 meters.
lon_col (str) – Optional geographic lon/lat columns (degrees) for metadata/debug.
lat_col (str) – Optional geographic lon/lat columns (degrees) for metadata/debug.
area_km2_col (str, default="Area_Mean_km2") – Attribute column containing mean iceberg area in km^2.
use_attr_area (bool, default=True) – If True and area_km2_col exists, use it (falling back to geometry area when missing/invalid). If False, use geometry area only.
normalise_lon_to ({"0-360","-180-180"}, default="0-360") – Longitude convention applied to the lon column (if present).
- Returns:
df – Table with columns: - uid, x_m, y_m, lon, lat - area_m2, perim_m - bed_depth (if present), plus additional metadata columns when available
- Return type:
pandas.DataFrame
Notes
Rows without finite x/y coordinates are dropped. Perimeter and geometry area are computed in the native CRS units (meters for EPSG:3031).
- reapply_landmask(DS, apply_unmodified=False)[source]
Apply the CICE land mask to all spatial variables in a dataset.
This method masks land points by setting values to NaN for all variables that contain both spatial dimensions
("nj", "ni"). The mask is constructed from either the original landmask (kmt_org) or the grounded-iceberg modified landmask (kmt_mod), depending on configuration.- Parameters:
DS (xarray.Dataset) – Dataset containing sea-ice fields. Any variable whose dimensions include both “nj” and “ni” will be masked.
apply_unmodified (bool, default False) – If True, always apply the original landmask (P_KMT_org) even when self.use_gi is True. If False (default) and self.use_gi is True, apply the modified landmask (P_KMT_mod).
- Returns:
xarray.Dataset – The same dataset instance with land cells set to NaN for spatial variables.
Side Effects
————
Opens the landmask dataset from disk via xarray.open_dataset.
Modifies DS in place by assigning masked variables.
Notes
The landmask is assumed to be encoded such that ocean == 1 and land == 0
(or equivalent). The mask applied is
kmt == 1. - Non-spatial variables (without both “nj” and “ni”) are left unchanged.
- write_F2_with_GI(P_F2_coast: str = None, P_GI: str = None, P_GI_CSV: str = None, P_out: str = None, P_grid: str = None, P_mask: str = None, P_topog: str = None, nx: int = None, ny: int = None, method: str = 'simple-geometry', base_area_m2: float = None, C_gi: float = 1.0, eps_cluster_km: float = 15.0, min_cluster_size: int = 3, cluster_buffer_m: float = None, cluster_amplification: float = 0.0, weight_by: str = 'perim', length_scale: str = 'perimeter', dedup_uid: bool = True, overwrite: bool = False)[source]
Combine coast-only F2 with grounded-iceberg (GI) contributions and write NetCDF.
sea_ice_classification
- class sea_ice_classification.SeaIceClassification(sim_name=None, logger=None, **kwargs)[source]
Bases:
objectClassify Antarctic landfast ice, pack ice, and total sea ice from CICE output.
- This class implements a threshold-based classification framework driven by:
sea-ice concentration (aice), and
sea-ice speed magnitude on the model’s analysis grid (nominally the CICE T-grid).
It supports multiple velocity staggering strategies for constructing a T-grid speed magnitude (ispd_T) from the native model velocity components:
Staggering → T-grid strategies
The strategy is controlled by self.BorC2T_type, which can be a string (e.g. “Tb”) or an iterable (e.g. [“Ta”, “Tb”]). Supported tokens are:
B-grid derived modes (legacy / B-grid CICE) - “Ta”: Convert corner-staggered B-grid velocities to a T-grid speed using a 2×2
box mean where NaNs propagate (area-like average; intended to preserve holes).
- “Tb”: Convert corner-staggered B-grid velocities to a T-grid speed using a 2×2
box mean with NaNs → 0 (no-slip-like fill; tends to remove coastal holes).
“Tx”: Regrid B-grid velocities to T-grid using precomputed weights (e.g., xESMF).
C-grid derived mode (new / C-grid CICE) - “Tc”: Reconstruct T-grid (cell-centre) velocities directly from C-grid edge
- components:
uvelE, uvelN, vvelE, vvelN → (velE_T, velN_T) → ispd_T
This mode uses a dedicated C-grid-to-T-grid reconstruction (C2T) and is conceptually distinct from the B-grid averaging/regridding modes. Because “Tc” is a physically different reconstruction (not merely an alternate smoothing), it is enforced as exclusive: if “Tc” is selected, it must be the only token.
Classification products
- For a given date range [dt0, dtN] the class can produce:
Daily masks: computed directly from daily aice and ispd_T.
Persistence (binary-day) masks: computed from the extended-range daily mask using a sliding window length W (days) and minimum count M (days in window).
Optional rolling-mean masks: computed from a rolling-mean speed field and the same concentration + speed thresholds.
Definitions
Using thresholds icon_thresh (concentration) and ispd_thresh (speed):
- Fast ice (FI):
(aice > icon_thresh) AND (0 < ispd_T <= ispd_thresh)
- Pack ice (PI):
(aice > icon_thresh) AND (ispd_T > ispd_thresh)
- Total sea ice (SI):
(aice > icon_thresh)
- Persistence (“binary-day”) classification:
A day is classed as persistent if at least bin_min_days out of bin_win_days days in a centred sliding window are classed as FI/PI.
Required configuration and dependencies
The class expects the broader AFIM stack to provide the following attributes:
Core configuration - CICE_dict : dict
- Must include (names may vary across your stack):
“time_dim”, “y_dim”, “x_dim”, “y_dim_length”, “x_dim_length”, “drop_coords”, “FI_chunks”, and optionally “wrap_x”, “y_chunk”, “x_chunk”.
- icon_threshfloat
Concentration threshold (e.g., 0.15).
- ispd_threshfloat
Speed threshold for FI/PI discrimination (e.g., 1e-4 m/s).
- bin_win_daysint
Persistence window length W.
- bin_min_daysint
Minimum count M within window.
- mean_periodint
Rolling mean window length for optional rolling output.
- BorC2T_typestr | list[str] | set[str]
Mode tokens controlling compute_ispdT().
External methods (implemented elsewhere in your toolbox) - load_cice_grid() - define_reG_weights() - load_cice_zarr(…) - slice_hemisphere(…) - reG(…) or B2Tx(…) (depending on your architecture) - B2Ta(…), B2Tb(…), B2Tx(…) (if Ta/Tb/Tx are used) - C2T(…) (required for Tc) - _apply_thresholds(…), _rolling_mean(…), _with_T_coords(…) - classify_binary_days_fast_ice(…), classify_binary_days_pack_ice(…) - _wrap_x_last_equals_first(…) (if wrap_x=True)
Notes
Many operations are Dask-friendly; however, persistence and rolling calculations benefit from careful chunking along time.
The high-level classifiers extend the requested time window to avoid edge artefacts when applying centred rolling/persistence windows, then crop back to [dt0, dtN].
For C-grid runs (“Tc”), velocities are expected as edge components (E/N) and are reconstructed rather than averaged/regridded. This typically reduces the systematic “west-of-land NaN/zero” artefacts seen in B-grid corner-staggered products.
- B2Ta(uB, vB, y_len, x_len)[source]
Compute T-cell speed magnitude by B->T averaging (method A) of u and v.
This routine: 1) Averages corner-staggered u and v to T-cell centers using Bavg_methA. 2) Computes speed magnitude: sqrt(uT^2 + vT^2).
- Parameters:
uB (numpy.ndarray) – Corner-staggered velocity components with shape (time, y, x).
vB (numpy.ndarray) – Corner-staggered velocity components with shape (time, y, x).
y_len (int) – Target T-grid spatial dimensions.
x_len (int) – Target T-grid spatial dimensions.
- Returns:
sT – T-centered speed magnitude with shape (time, y_len, x_len).
- Return type:
numpy.ndarray
Notes
Because Bavg_methA propagates NaNs, this speed product will also be NaN where any of the contributing corners are NaN.
- B2Tb(uB, vB, y_len, x_len, wrap_x)[source]
Compute T-cell speed magnitude by B->T averaging (method B) of u and v.
This routine: 1) Averages corner-staggered u and v to T-cell centers using Bavg_methB
(NaNs treated as 0.0 to enforce no-slip at masked corners).
Computes speed magnitude: sqrt(uT^2 + vT^2).
- Parameters:
uB (numpy.ndarray) – Corner-staggered velocity components with shape (time, y, x).
vB (numpy.ndarray) – Corner-staggered velocity components with shape (time, y, x).
y_len (int) – Target T-grid spatial dimensions.
x_len (int) – Target T-grid spatial dimensions.
wrap_x (bool) – Whether to wrap the x-direction cyclically during averaging.
- Returns:
sT – T-centered speed magnitude with shape (time, y_len, x_len).
- Return type:
numpy.ndarray
Notes
This method is typically more robust near land masks because NaNs are treated as 0.0 prior to averaging.
- B2Tx(uB_da, vB_da)[source]
Regrid B-grid velocity components to T points via xESMF and compute speed.
This method uses a pre-defined regridder self.reG (constructed by define_reG_weights / _ensure_reG_defined) to map B-grid corner fields onto the T-grid. Prior to regridding, NaNs are replaced by 0.0 to enforce a no-slip-style boundary condition.
- Parameters:
uB_da (xarray.DataArray) – Corner-staggered velocity components. Must be compatible with the configured xESMF regridder. Typically dims are (time, y, x) or similar.
vB_da (xarray.DataArray) – Corner-staggered velocity components. Must be compatible with the configured xESMF regridder. Typically dims are (time, y, x) or similar.
- Returns:
ispd_Tx – T-grid speed magnitude computed as hypot(uT, vT), dtype float32. Uses xr.apply_ufunc(…, dask=”parallelized”) for Dask compatibility.
- Return type:
xarray.DataArray
Notes
This is distinct from the simple 2x2 averaging methods (B2Ta, B2Tb);
it uses spatial regridding (often bilinear) and may yield smoother fields. - Because NaNs are converted to 0.0, masked/corner land values do not propagate into neighbouring T cells in the same way as NaN-preserving averages.
- Bavg_methA(B_component: ndarray, y_len, x_len) ndarray[source]
Average a corner-staggered (B-grid) field onto T-cell centers via a 2x2 mean.
This method computes the simple arithmetic mean of the four corner values surrounding each T-cell:
avg = (v00 + v01 + v10 + v11) / 4
It then pads the result back to the target T-grid shape and optionally wraps the x-direction by copying the first column into the last column.
- Parameters:
B_component (numpy.ndarray) – Corner-staggered field with shape (time, y, x). Typically a velocity component (u or v) on B-grid corners.
y_len (int) – Target T-grid y dimension length. (Note: current implementation reads self.CICE_dict[“y_dim_length”] and ignores the passed value.)
x_len (int) – Target T-grid x dimension length. (Note: current implementation reads self.CICE_dict[“x_dim_length”] and ignores the passed value.)
- Returns:
avg_T – T-centered field with shape (time, y_len, x_len). Padding is filled with NaN. dtype is unchanged until returned (typically float).
- Return type:
numpy.ndarray
Notes
This method does not apply a no-slip treatment: NaNs in the input propagate
into the 2x2 average. - X wrapping is implemented by setting the last x column equal to the first, which is appropriate for cyclic/periodic global grids. - If the input has insufficient halo/extent for the 2x2 operation, the output will contain NaNs introduced by padding.
- Bavg_methB(B_component, y_len, x_len, wrap_x)[source]
Average a corner-staggered (B-grid) field onto T-cell centers using a 2x2 mean with no-slip handling at land corners (NaNs treated as zero).
The method forms the mean of the four corner values surrounding each T-cell, but first converts NaNs to 0.0 to enforce a no-slip-style boundary at masked corners:
out = 0.25 * (nan_to_num(v00) + nan_to_num(v01) + nan_to_num(v10) + nan_to_num(v11))
- Parameters:
B_component (array-like) – Corner-staggered field with shape (time, y, x), e.g., uvel or vvel on a CICE B-grid.
y_len (int) – Target T-grid y dimension length.
x_len (int) – Target T-grid x dimension length.
wrap_x (bool) – If True, enforce cyclic boundary in x by copying the first column to the last column after padding.
- Returns:
out_T – T-centered field with shape (time, y_len, x_len), dtype float32.
- Return type:
numpy.ndarray
Notes
NaN-to-zero treatment is intentional and corresponds to a no-slip behaviour
at land/corner masks; this differs from Bavg_methA, where NaNs propagate. - Padding (if needed) is filled with NaN after averaging. - For cyclic grids, the wrap is applied after padding and before trimming.
- C2T(uvelE, uvelN, vvelE, vvelN, y_len: int | None = None, x_len: int | None = None, wrap_x: bool | None = None, combine: str = 'mean', nan_to_zero: bool = True)[source]
Reconstruct T-grid east/north velocity components from C-grid edge fields.
Inputs are assumed to be physical components in the local east/north basis: - uvelE, uvelN: (E,N) components defined on the U-stagger (x-faces) - vvelE, vvelN: (E,N) components defined on the V-stagger (y-faces)
The mapping is performed by edge-to-center averaging: - U-stagger → T via direction=”x” - V-stagger → T via direction=”y”
Then the two estimates are combined to produce a single (E,N) pair at T.
- Parameters:
uvelE (xarray.DataArray or array-like) – C-grid velocity components on the U and V staggers.
uvelN (xarray.DataArray or array-like) – C-grid velocity components on the U and V staggers.
vvelE (xarray.DataArray or array-like) – C-grid velocity components on the U and V staggers.
vvelN (xarray.DataArray or array-like) – C-grid velocity components on the U and V staggers.
y_len (int, optional) – Target T-grid spatial sizes. Defaults to self.CICE_dict[“*_dim_length”].
x_len (int, optional) – Target T-grid spatial sizes. Defaults to self.CICE_dict[“*_dim_length”].
wrap_x (bool, optional) – Seam handling in x. Defaults to self.CICE_dict.get(“wrap_x”, True).
combine ({"mean","uv"}, default "mean") – How to combine U- and V-stagger estimates at T: - “mean”: E_T = 0.5*(E_from_U + E_from_V), N_T = 0.5*(N_from_U + N_from_V) - “uv” : E_T = E_from_U, N_T = N_from_V (classic C-grid intuition)
nan_to_zero (bool, default True) – If True, treat NaNs as 0 prior to averaging (no-slip-like).
- Returns:
velE_T, velN_T – Eastward and northward velocity components at T-cell centers.
- Return type:
same type as inputs
Notes
If you only need speed, you can compute hypot(velE_T, velN_T) downstream using xr.apply_ufunc(np.hypot, …) for Dask-parallel execution.
- Cavg_meth(C_component, direction: str, y_len: int | None = None, x_len: int | None = None, wrap_x: bool | None = None, nan_to_zero: bool = True)[source]
Average a C-grid edge-centered field onto T-cell centers.
This is the C-grid analogue of your B-grid corner-to-center averaging, but uses a 1D average along the staggered direction:
direction=”x”: average along x (typical for U-grid / east-west faces)
direction=”y”: average along y (typical for V-grid / north-south faces)
Supports both xarray.DataArray (Dask-friendly) and NumPy arrays.
- Parameters:
C_component (xarray.DataArray or array-like) – C-grid edge-centered field. Expected to have time as the first dimension and the last two dims as (y, x) in some order consistent with typical CICE output. For xarray input, dims are inferred from the last two dims.
direction ({"x","y"}) – Stagger direction to average over: - “x”: average adjacent points in x to map U-grid → T-grid - “y”: average adjacent points in y to map V-grid → T-grid
y_len (int, optional) – Target T-grid spatial sizes. Defaults to self.CICE_dict[“y_dim_length”] and self.CICE_dict[“x_dim_length”].
x_len (int, optional) – Target T-grid spatial sizes. Defaults to self.CICE_dict[“y_dim_length”] and self.CICE_dict[“x_dim_length”].
wrap_x (bool, optional) – Whether to enforce cyclic wrap in the x-direction (seam handling). Defaults to self.CICE_dict.get(“wrap_x”, True).
nan_to_zero (bool, default True) – If True, treat NaNs as 0.0 before averaging (no-slip-like treatment near land/masks). If False, NaNs propagate into the average.
- Returns:
out_T – T-centered field with shape (time, y_len, x_len), float32 for NumPy input. For xarray input, coords/dims are preserved as much as possible.
- Return type:
same type as input
Notes
Size handling: - If the staggered dimension has length (target+1), a straightforward adjacent slice-average is used and yields the target length. - If it has length (target), we average using roll/shift:
wrap_x=True uses roll (periodic)
wrap_x=False uses shift (non-periodic; last column becomes NaN)
- classify_binary_days_mask(I_mask: DataArray, ice_type: str | None = None, bin_win_days: int | None = None, bin_min_days: int | None = None, time_dim: str | None = None, centered: bool = True) Dataset[source]
Compute a generic binary-day persistence mask using xarray rolling counts.
In each rolling window of length W, the mask is set to 1 if at least M days are True/1 within that window:
mask_bin(t) = 1 if sum(mask[t-window]) >= M else 0
- Parameters:
I_mask (xarray.DataArray) – Input boolean or 0/1 mask with dimensions (time, y, x).
mask_name (str) – Name of the output variable (e.g., “FI_mask”, “PI_mask”).
bin_win_days (int, optional) – Window length W. Defaults to self.bin_win_days when None.
bin_min_days (int, optional) – Minimum days M. Defaults to self.bin_min_days when None.
time_dim (str, optional) – Name of the time dimension. Defaults to self.CICE_dict[“time_dim”].
centered (bool, default=True) – If True, uses a centered window. If False, uses a trailing window.
- Returns:
ds – Dataset containing one uint8 variable named mask_name.
- Return type:
xarray.Dataset
Notes
Time chunking is adjusted so that time chunks are not larger than the
rolling window, which generally improves Dask rolling performance. - Rolling uses min_periods=M, so partial windows can contribute when at least M samples are available; this differs from a strict “full-window only” approach. - If the input has zero time length, returns an empty dataset (with logging).
- classify_fast_ice(dt0_str: str = None, dtN_str: str = None, bin_win_days: int = None, bin_min_days: int = None, enable_rolling_output: bool = False, roll_win_days: int = None, time_dim_name: str = None)[source]
Classify fast ice (FI) using concentration + speed thresholds with optional persistence/rolling products.
Fast ice is defined where concentration exceeds icon_thresh and the T-grid speed magnitude is small but non-zero:
FI := (aice > icon_thresh) AND (0 < ispd_T <= ispd_thresh)
The method computes products over an extended time range to avoid edge artefacts for centred windows (binary-day persistence and optional rolling-mean speed). It then crops the outputs back to the requested interval [dt0_str, dtN_str].
C-grid “Tc” behaviour (important)
- If BorC2T_type includes “Tc”, the classification uses C-grid edge velocities:
uvelE, uvelN, vvelE, vvelN
- and reconstructs T-grid velocities using C2T. In this mode:
define_reG_weights() is not called (no B→T regridding required),
the velocity dataset passed to compute_ispdT() must contain the four C-grid variables, and
“Tc” must be exclusive (it cannot be combined with Ta/Tb/Tx).
- param dt0_str:
Start and end dates (inclusive). Must be parseable by pandas, e.g. “YYYY-MM-DD”.
- type dt0_str:
str
- param dtN_str:
Start and end dates (inclusive). Must be parseable by pandas, e.g. “YYYY-MM-DD”.
- type dtN_str:
str
- param bin_win_days:
Persistence window length W (days). Defaults to self.bin_win_days.
- type bin_win_days:
int, optional
- param bin_min_days:
Minimum number of FI days M required within the window. Defaults to self.bin_min_days.
- type bin_min_days:
int, optional
- param enable_rolling_output:
If True, compute a rolling-mean speed field and corresponding FI mask.
- type enable_rolling_output:
bool, default False
- param roll_win_days:
Rolling mean window length (days). Defaults to self.mean_period.
- type roll_win_days:
int, optional
- param time_dim_name:
Name of the time dimension. Defaults to self.CICE_dict[“time_dim”].
- type time_dim_name:
str, optional
- returns:
FI_dly (xr.Dataset) – Daily fast-ice mask dataset with variable “FI_mask” over [dt0_str, dtN_str].
FI_bin (xr.Dataset) – Binary-day (persistence) fast-ice mask dataset with variable “FI_mask” over [dt0_str, dtN_str]. Computed from the extended-range daily mask and then cropped.
FI_roll (xr.Dataset or None) – Rolling-mean fast-ice mask dataset with variable “FI_mask” over [dt0_str, dtN_str], or None if enable_rolling_output is False.
ispd_out (dict) –
- Speed diagnostics:
“daily”: xr.DataArray “ispd_T” over the extended range (with T coords)
“rolly”: xr.DataArray “ispd_T_roll” over the extended range (if enabled), else None
- raises ValueError:
If “Tc” is selected together with any of {“Ta”, “Tb”, “Tx”}.
- raises ValueError:
If required C-grid velocity variables are missing when “Tc” is selected.
- raises Exception:
Propagates exceptions from dataset loading, coordinate attachment, thresholding, or persistence/rolling helpers.
Notes
Implementation outline
Determine selection set from BorC2T_type and enforce Tc exclusivity.
Load grid and, if needed (non-Tc), load/define regridding weights.
- Extend requested time range by half the maximum of:
persistence window (W/2)
rolling window (R/2, if enabled)
- Load required variables across the extended range:
Tc: aice + (uvelE,uvelN,vvelE,vvelN)
else: aice + (uvel,vvel)
Chunk for rolling-friendly time blocks.
Compute ispd_T once on the extended range and persist.
- Compute:
daily FI mask on extended range, then crop
binary-day FI mask on extended daily mask, then crop
optional rolling FI mask on rolled speed, then crop
- classify_ice(ice_type: str = None, dt0_str: str = None, dtN_str: str = None, bin_win_days: int = None, bin_min_days: int = None, enable_rolling_output: bool = False, roll_win_days: int = None, time_dim_name: str = None)[source]
Classify ocean ice as fast ice (FI), pack ice (PI) and both of them as sea ice (SI) using concentration (+ speed for fast and pack) thresholds with optional persistence/rolling products.
Pack ice is defined where concentration exceeds icon_thresh and the T-grid speed magnitude exceeds the fast-ice speed threshold:
PI := (aice > icon_thresh) AND (ispd_T > ispd_thresh)
Fast ice is defined where concentration exceeds icon_thresh and the T-grid speed magnitude is small but non-zero:
FI := (aice > icon_thresh) AND (0 < ispd_T <= ispd_thresh)
C-grid “Tc” behaviour
- If BorC2T_type includes “Tc”, the classification uses C-grid edge velocities:
uvelE, uvelN, vvelE, vvelN
and reconstructs T-grid velocities using C2T inside compute_ispdT(). In this mode: - define_reG_weights() is not called, - “Tc” must be exclusive (cannot be combined with Ta/Tb/Tx).
The method computes products over an extended time range to avoid edge artefacts for centred windows (binary-day persistence and optional rolling-mean speed), then crops the outputs back to the requested interval [dt0_str, dtN_str].
- returns:
PI_dly (xr.Dataset) – Daily pack-ice mask dataset with variable “PI_mask” over [dt0_str, dtN_str].
PI_bin (xr.Dataset) – Binary-day (persistence) pack-ice mask dataset with variable “PI_mask” over [dt0_str, dtN_str]. Computed from the extended daily mask and then cropped.
PI_roll (xr.Dataset or None) – Rolling-mean pack-ice mask dataset with variable “I_mask” over [dt0_str, dtN_str], or None if enable_rolling_output is False.
ispd_out (dict) – Speed diagnostics: - “daily”: xr.DataArray “ispd_T” over the extended range (with T coords) - “rolly”: xr.DataArray “ispd_T_roll” over the extended range (if enabled), else None
- raises ValueError:
If “Tc” is selected together with any of {“Ta”, “Tb”, “Tx”}.
- raises ValueError:
If required C-grid velocity variables are missing when “Tc” is selected.
- classify_marginal_ice_zone(dt0_str: str, dtN_str: str, miz_width_km: float = 100.0, cell_km: float | None = None, edge_aice_thresh: float | None = None, aice_min: float | None = None, aice_max: float | None = 0.8, exclude_polynyas: bool = True, bin_win_days: int | None = None, bin_min_days: int | None = None, enable_rolling_output: bool = False, roll_win_days: int | None = None, time_dim_name: str | None = None)[source]
Classify Marginal Ice Zone (MIZ) using an ice-edge distance-band method.
Default definition (recommended): - Ice edge: aice > edge_aice_thresh (defaults to icon_thresh, usually 0.15) - MIZ: ice cells within miz_width_km of the open-ocean ice edge - Optional aice band constraint: aice_min <= aice <= aice_max (defaults to <=0.80) - Optional exclusion of polynyas: open water not boundary-connected does NOT create local MIZ
- Returns:
MIZ_dly (xr.Dataset) – Daily MIZ mask dataset with variable “MIZ_mask” over [dt0_str, dtN_str].
MIZ_bin (xr.Dataset or None) – Binary-day (persistence) MIZ mask dataset (same variable name) over [dt0_str, dtN_str], or None if bin_win_days/bin_min_days are not provided.
MIZ_roll (xr.Dataset or None) – Rolling-time-smoothed MIZ mask dataset over [dt0_str, dtN_str], or None if disabled. Rolling is applied to aice (not distance), then MIZ is re-computed.
diag (dict) – Diagnostics: includes “width_cells”, “cell_km”, “edge_aice_thresh”, “aice_min/max”.
- classify_sea_ice(dt0_str: str = None, dtN_str: str = None, time_dim_name: str = None)[source]
Classify total sea ice (SI) using only a concentration threshold.
Total sea ice is defined as:
SI := (aice > icon_thresh)
This classifier does not require velocity variables and therefore does not produce persistence (binary-day) or rolling-mean products.
- Parameters:
dt0_str (str) – Start and end dates (inclusive). Must be parseable by pandas, e.g. “YYYY-MM-DD”.
dtN_str (str) – Start and end dates (inclusive). Must be parseable by pandas, e.g. “YYYY-MM-DD”.
time_dim_name (str, optional) – Name of the time dimension. Defaults to self.CICE_dict[“time_dim”].
- Returns:
SI_dly (xr.Dataset) – Daily sea-ice mask dataset with variable “SI_mask” over [dt0_str, dtN_str].
SI_bin (None) – Always None (no persistence product for SI).
SI_roll (None) – Always None (no rolling product for SI).
ispd_out (dict) – Empty dict, returned for API consistency with FI/PI classifiers.
Notes
Loads only aice across the requested period.
Applies hemisphere slicing via slice_hemisphere() prior to thresholding.
Uses chunking consistent with FI/PI products to keep downstream workflows uniform.
- compute_ispdT(uvel, vvel=None) DataArray[source]
Compute T-grid sea-ice speed magnitude (ispd_T) using the configured strategy.
This method converts native velocity components to a speed magnitude on the analysis grid (nominally T-grid). The conversion mode(s) are determined by self.BorC2T_type, which is normalised to a set of tokens via _b2t_selection_set().
Supported modes
B-grid derived modes (can be combined and averaged): - “Ta”: Uses self.B2Ta(uvel, vvel, y_len, x_len) to produce a T-grid-like speed
from corner-staggered B-grid components via a 2×2 mean where NaNs propagate.
“Tb”: Uses self.B2Tb(uvel, vvel, y_len, x_len, wrap_x=…) to produce a T-grid-like speed via a 2×2 mean where NaNs are converted to 0 (no-slip-like fill).
“Tx”: Uses self.B2Tx(uvel, vvel) to regrid B-grid components to T-grid (e.g., via weights).
If multiple of {Ta, Tb, Tx} are selected, the method returns the mean of the members (skipna=True) to form a composite speed estimate.
C-grid derived mode (exclusive): - “Tc”: Reconstructs T-grid velocities directly from C-grid edge components using
- self.C2T(uvelE, uvelN, vvelE, vvelN, …), then computes:
ispd_T = hypot(velE_T, velN_T)
“Tc” must be the only selected token. It is not averaged with Ta/Tb/Tx, because it represents a different staggering and reconstruction pathway.
- param uvel:
Velocity input. Interpretation depends on mode:
- For Ta/Tb/Tx:
uvel is the B-grid corner-staggered u-component DataArray (typically uvel). vvel must be provided as the matching v-component DataArray.
- For Tc:
uvel must be a Dataset (or dict-like) containing the four required C-grid edge components:
“uvelE”, “uvelN”, “vvelE”, “vvelN”
In Tc mode, the vvel argument is ignored.
- type uvel:
xr.DataArray, xr.Dataset, or mapping-like
- param vvel:
B-grid corner-staggered v-component. Required for Ta/Tb/Tx; ignored for Tc.
- type vvel:
xr.DataArray, optional
- returns:
- T-grid speed magnitude (float32). Name will reflect the mode:
“ispd_Tc” for Tc
“ispd_Ta”/”ispd_Tb”/”ispd_Tx” when only one B-grid member is selected
composite mean when multiple B-grid members are selected
- rtype:
xr.DataArray
- raises ValueError:
If Tc is requested but uvel does not provide the required variables.
- raises ValueError:
If Tc is selected together with any of Ta/Tb/Tx.
- raises ValueError:
If required C-grid variables (“uvelE”, “uvelN”, “vvelE”, “vvelN”) are missing.
- raises AttributeError:
If a requested conversion method (e.g., B2Ta/B2Tb/B2Tx/C2T) is not defined.
Notes
In Tc mode, wrap_x handling is applied (if self.CICE_dict[“wrap_x”] is True) using _wrap_x_last_equals_first() so that the last x-column matches the first.
Speed is computed using np.hypot via xr.apply_ufunc(…, dask=”parallelized”).
- load_classified_ice(sim_name: str = None, ice_type: str = None, ispd_thresh: float = None, class_method: str = 'binary-days', BorC2T_type: str = None, variables: list = None, dt0_str: str = None, dtN_str: str = None, D_zarr: str = None, chunks: dict = None, persist: bool = False, bin_win_days: int = None, bin_min_days: int = None, mean_period: int = None)[source]
Load classified ice products from yearly-grouped Zarr stores.
sea_ice_observations
- class sea_ice_observations.SeaIceObservations(**kwargs)[source]
Bases:
objectObservational sea-ice I/O and derived diagnostics for AFIM workflows.
This class centralises reading, standardisation, and metric generation for several observational products used throughout AFIM/SeaIceToolbox analyses:
- NSIDC (e.g., G02202 v4) sea-ice concentration:
local file discovery and loading over time windows (daily or monthly layouts),
masking of flagged concentration values,
computation of hemispheric sea-ice area (SIA) and extent (SIE) time series.
- AF2020 (Fraser et al. 2020) landfast sea-ice:
reading FIA CSV time series and constructing DOY climatologies,
regridding original AF2020 raster products to the CICE T-grid with xESMF,
convenience regional fast-ice area time series from AF2020-like gridded datasets.
- ESA-CCI / AWI CryoSat/Envisat/Sentinel sea-ice thickness (SIT):
robust path discovery across multiple institutional directory layouts,
indexing and de-duplication (one file per day or month),
L2P swath → daily gridded SIT via pyresample neighbour queries with optional Gaussian weighting and explicit uncertainty propagation,
L3 gridded monthly SIT assembly (no resampling), with optional SIC thresholding, QC flag masking, and hemispheric weighted means.
Expected configuration and dependencies
This class is intended to be used inside the AFIM stack and expects several attributes to exist on self:
Required attributes (typical) - logger : logging.Logger
Used for progress and warnings.
- dt0_str, dtN_strstr
Default analysis window in “YYYY-MM-DD” format.
- hemisphere_dictdict
Must include ‘abbreviation’ (e.g., “SH” or “NH”).
- NSIDC_dictdict
- Paths and dataset conventions for NSIDC loading and metrics. Typical keys:
“D_original”
“G02202_v4_file_format”
“file_versions” (dict ver -> “YYYY-MM-DD” start date, or None)
“SIC_name” (e.g., “cdr_seaice_conc”)
“cdr_seaice_conc_flags” (list of flag integer codes)
“P_cell_area” (cell area file path)
“projection_string” (Proj4 string)
- AF_FI_dictdict
- Paths and naming for AF2020. Typical keys:
“P_AF2020_csv”
“D_AF2020_db_org”
“AF_reG_weights”
“variable_name”, “lat_coord_name”, “lon_coord_name”, “time_coord_name”
“threshold_value”
“P_reG_reg_weights” (optional)
- CICE_dictdict
- CICE grid conventions and coordinate names (for mapping masks/regridding):
‘spatial_dims’, ‘three_dims’
‘lon_coord_name’, ‘lat_coord_name’
‘tcoord_names’ (lon, lat)
‘time_dim’
‘P_reG_reg_weights’ (optional)
- Sea_Ice_Obs_dictdict
- Root directories for institutions:
‘ESA-CCI’, ‘AWI’
- icon_threshfloat
Sea-ice concentration threshold for extent masking (fraction).
- SIC_scalefloat
Unit scaling between dataset SIC and area weighting.
- normalise_longitudes(lon, to=…)callable
Longitude wrapping helper used across multiple routines.
- define_cice_grid(…)callable
Must be available for AF2020 conservative regridding to the CICE T-grid.
- load_cice_grid(…) / define_cice_grid(…) conventions
Must be consistent with your xESMF usage.
Notes on time and units
All internal timestamp comparisons for file filtering are performed in UTC.
NSIDC SIA/SIE are computed using an explicit cell-area product and your configured concentration scaling.
- SIT uncertainty propagation uses standard-error style aggregation:
SE_cell = sqrt(sum(w^2 * unc^2)) / sum(w) SE_hemi = sqrt(sum(W^2 * SE_cell^2)) / sum(W)
where W is cos(latitude) and w is neighbour weight.
Implementation note (RTD/autodoc)
Keep docstrings on public methods comprehensive and consistent with your JSON keys. Methods with a leading underscore are treated as internal helpers.
- AF2020_clim_to_model_time(model: DataArray, clim: DataArray) DataArray[source]
Expand a DOY-based climatology to match a model time axis.
- Parameters:
model (xr.DataArray) – Model time series containing a time coordinate (datetime-like).
clim (xr.DataArray) – Climatology indexed by doy (and optionally region).
- Returns:
Climatology values mapped onto model.time using each date’s day-of-year.
- Return type:
xr.DataArray
Notes
Leap days: if the observational climatology does not define DOY=366, callers should drop Feb 29 from the model calendar or define a mapping strategy (e.g., nearest/pad) consistent with the analysis.
- AF2020_interpolate_FIA_clim(csv_df, full_doy=array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365]))[source]
Interpolate an AF2020 DOY climatology to a complete 1..365 DOY axis.
- Parameters:
csv_df (pandas.DataFrame) – DataFrame containing at least: - ‘DOY_start’ : day-of-year index values - ‘Circumpolar’ : FIA values (km²)
full_doy (array-like, default np.arange(1, 366)) – Target DOY coordinate to interpolate onto.
- Returns:
Interpolated climatology with coordinate: - doy : day-of-year Values are in 10³ km².
- Return type:
xr.DataArray
- AF2020_regional_fast_ice_area(ds: Dataset, region=(52.5, 97.5, -70.5, -64), lon_name='longitude', lat_name='latitude', area_name='area', fast_name='Fast_Ice_Time_series', threshold=4, lon_wrap='0-360')[source]
Compute a regional fast-ice area time series from a gridded observation product.
This routine is designed for AF2020-like datasets that provide: - curvilinear or projected 2-D lon/lat fields, and - a per-cell area field (m²), and - a fast-ice indicator variable over time.
- The region is defined by a geographic bounding box:
region = (lon_min, lon_max, lat_min, lat_max)
Longitude seam-crossing is supported (e.g., lon_min=350, lon_max=20 in 0–360 convention). The computation is performed as:
FIA(t) = Σ_{cells in region} [ fast(t,cell) * area(cell) ]
where fast(t,cell) is a boolean mask defined by ds[fast_name] >= threshold.
- Parameters:
ds (xarray.Dataset) –
Dataset containing at minimum: - ds[lon_name] : 2-D longitude field with a time dimension (time, Y, X) or
broadcastable; only time=0 is used for the static grid.
- ds[lat_name]2-D latitude field with a time dimension (time, Y, X) or
broadcastable; only time=0 is used for the static grid.
- ds[area_name]2-D grid-cell area in m² with a time dimension (time, Y, X)
or broadcastable; only time=0 is used.
ds[fast_name] : fast-ice indicator variable with a time dimension.
region (tuple[float, float, float, float], default (52.5, 97.5, -70.5, -64)) – Geographic region bounds as (lon_min, lon_max, lat_min, lat_max), in degrees.
lon_name (str, default "longitude") – Name of the longitude variable/coordinate in ds.
lat_name (str, default "latitude") – Name of the latitude variable/coordinate in ds.
area_name (str, default "area") – Name of the per-cell area variable/coordinate in ds (assumed m²).
fast_name (str, default "Fast_Ice_Time_series") – Name of the fast-ice indicator variable in ds.
threshold (float, default 4) – Threshold applied to ds[fast_name] to produce a boolean fast-ice mask. Cells with ds[fast_name] >= threshold are treated as fast ice.
lon_wrap ({"0-360", "-180-180"}, default "0-360") – Longitude convention used for the region definition and for wrapping lon2d.
- Returns:
1-D time series of regional fast-ice area in 1e3 km² (i.e., thousands of km²), named “FIA_1e3-km2”. The time coordinate is inherited from ds[fast_name].
- Return type:
xarray.DataArray
Notes
Lon/lat/area are read from time=0 only to avoid unnecessary time broadcasting.
The regional mask is optionally cropped to the minimal bounding index range to
reduce computation. - If region crosses the longitude seam (lon_min > lon_max), the longitude mask is built using an OR condition to include both sides.
Examples
>>> fia = obs.AF2020_regional_fast_ice_area(ds, region=(350, 20, -75, -65), lon_wrap="0-360") >>> fia.plot()
- NSIDC_coordinate_transformation(ds)[source]
Add geographic longitude/latitude coordinates to an NSIDC grid.
NSIDC concentration products are commonly stored in a Polar Stereographic projection with Cartesian coordinates (e.g., xgrid, ygrid, in meters). This routine converts the projected coordinates to WGS84 longitude/latitude and attaches them to the dataset.
- Parameters:
ds (xr.Dataset) – Input dataset containing projected coordinate variables. Expected variables: - xgrid : (x) or (y, x) coordinate(s) in meters - ygrid : (y) or (y, x) coordinate(s) in meters
- Returns:
Dataset with added variables: - lon : (y, x) longitude in degrees east - lat : (y, x) latitude in degrees north
- Return type:
xr.Dataset
- Raises:
KeyError – If xgrid/ygrid variables are missing.
AssertionError – If xgrid and ygrid shapes are incompatible for transformation.
ImportError – If pyproj is not installed.
Notes
Uses self.NSIDC_dict[“projection_string”] (Proj4) for the source CRS.
Uses EPSG specified in self.sea_ice_dic[“projection_wgs84”] (typically 4326) for the target CRS.
This method assumes x/y describe the native NSIDC grid in meters.
- build_AF2020_grid_corners(lat_c, lon_c)[source]
Construct (ny+1, nx+1) corner arrays from (ny, nx) center arrays.
This helper builds approximate corner coordinates by padding center fields and averaging surrounding values. Corner arrays are commonly required for conservative remapping in xESMF.
- Parameters:
lat_c (np.ndarray) – 2D arrays (ny, nx) of center latitude/longitude in degrees.
lon_c (np.ndarray) – 2D arrays (ny, nx) of center latitude/longitude in degrees.
- Returns:
(lat_b, lon_b) – 2D arrays (ny+1, nx+1) of corner latitude/longitude.
- Return type:
tuple[np.ndarray, np.ndarray]
Notes
This approach assumes a locally smooth grid. For strongly curvilinear grids, corner construction should ideally use native corner metadata when available.
Output longitudes are wrapped to [0, 360).
- build_sea_ice_satellite_paths(dt0_str: str, dtN_str: str, hemisphere: str | None = None, institutions: Iterable[str] | None = ('ESA', 'AWI'), sensors: Iterable[str] | None = None, levels: Iterable[str] | None = None, versions: Iterable[str] | None = None, root_esa: str | None = None, root_awi: str | None = None) List[Path][source]
Build a de-duplicated, time-filtered list of NetCDF files for satellite sea-ice products across ESA and/or AWI directory layouts.
- Time filtering:
Returns files whose date/month lies within the closed interval
[dt0_str, dtN_str], interpreted in UTC. - Daily products (e.g., ESA L2P) are filtered by date parsed from the filename (YYYYMMDD). Monthly products (e.g., ESA L3C or AWI monthly layout) are filtered by overlap between the (year, month) and [dt0, dtN].
- Supported directory patterns (walked automatically):
- ESA “flat” layout:
<root_esa>/<L2P|L3C>/<sensor>/<hemisphere>/*.nc (hemisphere directory may be ‘NH’/’SH’ or lowercase) - L2P: filenames contain YYYYMMDD - L3C: filenames often contain YYYYMM
- ESA versioned layout:
<root_esa>/<L2P|L3C>/<sensor>/<version>/<HEMISPHERE>/<YYYY>/<MM>/*.nc (HEMISPHERE can be ‘NH’/’SH’ or lowercase; <MM> subdirs may be absent)
- AWI release layout:
<root_awi>/<l2p_release|l3cp_release>/<hemisphere>/<sensor>/<YYYY>[/<MM>]/*.nc
- Parameters:
dt0_str – Start and end of the desired time window (inclusive). Any pandas-parsable string is accepted (e.g., “2002-01-01”, “2012-12-31”). Interpreted as UTC.
dtN_str – Start and end of the desired time window (inclusive). Any pandas-parsable string is accepted (e.g., “2002-01-01”, “2012-12-31”). Interpreted as UTC.
hemisphere – Hemisphere selector. If None, uses self.hemisphere_dict[‘abbreviation’]. Case is normalized; both ‘sh’/’nh’ and ‘SH’/’NH’ are supported in dir scans.
institutions – Iterable of institutions to include (case-insensitive). Valid values include “ESA” and/or “AWI”. If None or empty, both are searched.
sensors – Optional iterable of sensor names to filter (case-insensitive, matched to directory names under the institution layout).
levels – Optional iterable of level strings. ESA: “L2P”, “L3C”. AWI: “l2p_release”, “l3cp_release”. If “l2p”/”l3c” are provided for AWI they are mapped to “l2p_release”/”l3cp_release”.
versions – ESA only: one or more version directory names (e.g., “v2.0”, “v3.0”). Case-insensitive. Ignored for AWI.
root_esa – Root directories for ESA and AWI datasets.
root_awi – Root directories for ESA and AWI datasets.
- Returns:
Sorted unique list of file paths within the requested time window that satisfy the filters.
- Return type:
List[Path]
Notes
Filename parsing assumes dates follow YYYYMMDD (daily) or YYYYMM (monthly)
within 2000–2099. Files without such tokens are kept only when they live in a year/month directory that overlaps [dt0, dtN]. - If institutions is None or empty, both ESA and AWI are scanned. - All timestamps are treated as UTC to avoid tz-comparison issues.
Examples
>>> paths = self.build_sea_ice_satellite_paths( ... "2002-01-01", "2002-12-31", ... hemisphere="sh", ... institutions=["ESA", "AWI"], ... sensors=["cryosat2", "envisat"], ... levels=["L2P","L3C","l2p_release"] ... ) >>> len(paths), paths[:3]
- compute_NSIDC_metrics(dt0_str=None, dtN_str=None, local_load_dir=None, monthly_files=False, P_zarr=None, overwrite=False)[source]
Compute and persist NSIDC hemispheric sea-ice area (SIA) and extent (SIE).
This routine loads NSIDC concentration for the specified window, masks out flagged concentration values, applies a concentration threshold to define “ice-covered” cells, and then computes hemispheric aggregates using the NSIDC cell-area product.
- Parameters:
dt0_str (str, optional) – Start/end dates in “YYYY-MM-DD” format. Defaults to self.dt0_str and self.dtN_str.
dtN_str (str, optional) – Start/end dates in “YYYY-MM-DD” format. Defaults to self.dt0_str and self.dtN_str.
local_load_dir (str or pathlib.Path, optional) – Directory containing the NSIDC source NetCDFs. If omitted, uses your configured self.NSIDC_dict[“D_original”] hemisphere sub-tree.
monthly_files (bool, default False) – If True, compute metrics using monthly inputs; otherwise daily inputs.
P_zarr (str or pathlib.Path, optional) – Output Zarr path. Defaults to <local_load_dir>/zarr.
overwrite (bool, default False) – If True, recompute even if the Zarr store already exists.
- Returns:
Dataset with time series: - SIA(time) : sea-ice area (units determined by your scaling) - SIE(time) : sea-ice extent (same area units) plus a time coordinate.
- Return type:
xr.Dataset
- Raises:
FileNotFoundError – If no NSIDC source files are found for the requested window.
KeyError – If required variables/paths are missing from your NSIDC configuration.
Notes
Flags are defined in self.NSIDC_dict[“cdr_seaice_conc_flags”] and removed by setting affected concentrations to NaN before aggregation.
Area is loaded from self.NSIDC_dict[“P_cell_area”].
Extent mask is aice > self.icon_thresh.
- dedup_daily_one_file_per_day(df: DataFrame, prefer: str = 'last') DataFrame[source]
Select exactly one daily file per UTC day from an indexed path table.
- Parameters:
df (pandas.DataFrame) – Output of index_satellite_paths(…).
prefer ({"first","last"}, default "last") – If multiple files map to the same day, select the first/last after sorting.
- Returns:
Subset of df restricted to kind == “daily” with unique ‘date’.
- Return type:
pandas.DataFrame
Notes
Non-daily entries are discarded.
The returned frame preserves the same columns as the input df.
- dedup_monthly_one_file_per_month(df: DataFrame, prefer: str = 'last') DataFrame[source]
Select exactly one file per (year, month) from an indexed path table.
This is intended for monthly products, but will also accept daily files that encode a YYYYMM token.
- Parameters:
df (pandas.DataFrame) – Output of index_satellite_paths(…).
prefer ({"first","last"}, default "last") – If multiple files map to the same (year, month), select the first/last after sorting.
- Returns:
Subset with unique (year, month) combinations.
- Return type:
pandas.DataFrame
Notes
Rows without parsed (year, month) are dropped.
- static ease2_sh_50km()[source]
Convenience grid roughly matching AWI SH L3C (216×216 @ ~50 km, lon0=0).
- index_satellite_paths(paths: Iterable[Path]) DataFrame[source]
Build a tidy index of candidate satellite files with parsed dates for QA/QC and selection.
This function inspects each file name and classifies it as: - “daily” : contains a YYYYMMDD token (date parsed as that day) - “monthly” : contains a YYYYMM token (date set to first of that month) - “unknown” : no parseable token (date set to NaT)
- Parameters:
paths (iterable of pathlib.Path) – Input file paths.
- Returns:
A stable-sorted table with columns: - path : Path - kind : {“daily”,”monthly”,”unknown”} - date : UTC Timestamp (daily) or first-of-month (monthly) or NaT - year, month, day : ints (or None)
- Return type:
pandas.DataFrame
Notes
Sorting is stable and deterministic to make downstream “first”/”last” selection repeatable.
- l2p_to_daily_grid_pyresample(paths: List[Path], hem: str | None = None, area_def=None, roi_km: float = 75.0, neighbours: int = 16, epsilon: float = 0.0, gaussian_sigma_km: float | None = None, lat_cut: float | None = None, time_at_noon: bool = True) Dataset[source]
Resample ESA/AWI L2P swath files to a daily gridded product on an equal-area polar grid using pyresample nearest-neighbour sets.
The algorithm: 1) Group all valid swath samples by UTC day (per file) using time variable. 2) For each day:
Build a SwathDefinition from lon/lat points.
Use pyresample.kd_tree.get_neighbour_info to find up to neighbours candidates within roi_km of each grid cell.
Combine neighbour SIT & uncertainty with uniform or Gaussian weights.
Assemble (time, y, x) arrays and compute hemispheric daily means with cos(latitude) weights.
- Parameters:
paths (list of Path) – L2P NetCDF files. Each must include variables: sea_ice_thickness, sea_ice_thickness_uncertainty, lon, lat, time.
hem ({'sh','nh'}, optional) – Hemisphere. Defaults to self.hemisphere_dict[‘abbreviation’] when present, else ‘sh’.
area_def (pyresample.geometry.AreaDefinition, optional) – Target equal-area grid. If None, uses ease2_sh_50km() for SH or ease2_nh_25km() for NH.
roi_km (float, default 75.0) – Radius of influence for neighbour search (kilometres).
neighbours (int, default 16) – Max neighbours per grid cell from KD-tree.
epsilon (float, default 0.0) – Search tolerance passed to pyresample (trade accuracy vs. speed).
gaussian_sigma_km (float or None, default None) – If set, apply Gaussian weights with this sigma (km); otherwise uniform.
lat_cut (float or None, default None) – Optional extra swath mask: for SH keep lat <= lat_cut; for NH keep lat >= lat_cut. Useful for excluding marginal seas.
time_at_noon (bool, default True) – If True, time coordinate is day+12:00Z; else midnight.
- Returns:
Dimensions: time, y, x Coordinates: lon(y,x), lat(y,x), time Data variables:
SIT(time,y,x) [m] SIT_unc(time,y,x) [m] propagated standard error u_cell(time,y,x) [m] per-cell uncertainty mean (neighbour-weighted) n_cell(time,y,x) [#] count of contributing neighbours per cell SIT_hem(time) [m] hemispheric mean SIT_hem_unc(time) [m] propagated SE for hemispheric mean SIT_hem_u(time) [m] hemispheric mean of per-cell uncertainty SIT_hem_n(time) [#] total contributing neighbours (sum over grid)
- Return type:
xarray.Dataset
Notes
Hemispheric means use cos(lat) weights on the target grid after resampling.
Propagated SE follows: sqrt(sum(w^2 * se^2)) / sum(w).
Files missing required variables are skipped (warning).
- l3c_paths_to_monthly_grid(paths: list[Path], hem: str | None = None, sic_thresh_percent: float | None = 15.0, mask_strategy: str = 'none', include_snow: bool = True, include_flags: bool = True, min_obs: int = 1, time_midpoint: bool = True) Dataset[source]
Assemble ESA L3C / AWI l3cp_release files (already on LAEA grids) into a single (time, y, x) dataset with hemispheric monthly means.
This function mirrors the L2P driver logic, but no resampling is done: it reads gridded variables straight from the files and standardizes dims.
- Parameters:
paths (list[Path]) – Monthly L3 NetCDF paths (ESA L3C or AWI l3cp_release).
hem ({'sh','nh'}, optional) – Hemisphere hint (controls sign in optional lat_cut if you add one later). Defaults to self.hemisphere_dict[‘abbreviation’] if present.
sic_thresh_percent (float or None, default 15.0) – If set, mask cells with sea_ice_concentration < threshold (percent).
mask_strategy ({"none","quality","status","both"}, default "none") –
“quality” keeps cells with quality_flag == 0
”status” keeps cells with status_flag == 0
”both” requires both conditions
include_snow (bool, default True) – Include snow_depth and snow_depth_uncertainty when present.
include_flags (bool, default True) – Include flag fields (quality_flag, status_flag, region_code) when present.
min_obs (int, default 1) – If a count variable exists (e.g., “n_observations”), require count >= min_obs.
time_midpoint (bool, default True) – If time_bnds variable exists, use midpoint as the time coordinate.
- Returns:
Dimensions: time, y, x Coordinates: lon(y,x), lat(y,x), time Data variables:
SIT(time,y,x) [m] SIT_unc(time,y,x) [m] per-cell uncertainty (SE) SIC(time,y,x) [%] if present snow_depth(time,y,x) [m] if requested/present snow_depth_unc(time,y,x) [m] if requested/present quality_flag/status_flag/region_code if requested/present SIT_hem(time) [m] hemispheric mean SIT_hem_unc(time) [m] propagated SE = sqrt(sum(w^2 * se^2))/sum(w) SIT_hem_u(time) [m] hemispheric mean of per-cell uncertainty SIT_hem_n(time) [#] count of valid contributing cells
- Return type:
xarray.Dataset
- static laea_area_def(hem: str, nx: int, ny: int, dx_km: float, lon0: float = 0.0)[source]
Build a Lambert Azimuthal Equal-Area grid centered on the pole.
- Parameters:
hem ({'sh','nh'}) – Hemisphere.
nx (int) – Grid shape (x, y).
ny (int) – Grid shape (x, y).
dx_km (float) – Grid spacing in kilometers (projected meters).
lon0 (float, default 0.0) – Central longitude (°E). ESA products commonly use 0.
- Return type:
pyresample.geometry.AreaDefinition
- load_AF2020_FIA_summary(repeat_climatology: bool = True, start: str = '1994-01-01', end: str = '1999-12-31') Dataset[source]
Load AF2020 landfast sea-ice area (FIA) time series and build a DOY climatology.
This method reads the AF2020 FIA CSV, standardises sector names, produces a circumpolar daily time series, and computes a day-of-year (DOY) climatology. Optionally, the climatology is repeated over an arbitrary target time range for direct comparison against model series.
- Parameters:
repeat_climatology (bool, default True) – If True, returns a repeated climatology series (FIA_clim_repeat) over the specified [start, end] window. If False, returns only the raw FIA series and DOY climatology.
start (str, default ("1994-01-01", "1999-12-31")) – Date window used when repeating the climatology (daily frequency).
end (str, default ("1994-01-01", "1999-12-31")) – Date window used when repeating the climatology (daily frequency).
- Returns:
Variables: - FIA_obs(time, region) : observed FIA time series (typically “circumpolar”) - FIA_clim(doy, region) : DOY climatology - FIA_clim_repeat(time, region) : climatology mapped to [start, end] (optional)
- Return type:
xr.Dataset
- Raises:
FileNotFoundError – If the CSV path does not exist.
ValueError – If expected columns required for climatology construction are missing.
Notes
Units are converted from km² to 10³ km² by dividing by 1000.
Leap-day handling: this implementation drops DOY 366 when repeating unless explicitly handled; ensure consistency with your model calendar.
- load_local_NSIDC(dt0_str=None, dtN_str=None, local_directory=None, monthly_files=False)[source]
Load NSIDC sea-ice concentration NetCDF files from a local directory tree.
The loader constructs expected filenames for each day (or month) between dt0_str and dtN_str, selects the appropriate file version based on your configured version start dates, and uses xarray.open_mfdataset to open all available files in one dataset.
- Parameters:
dt0_str (str, optional) – Start/end dates in “YYYY-MM-DD” format. If omitted, defaults to self.dt0_str and self.dtN_str.
dtN_str (str, optional) – Start/end dates in “YYYY-MM-DD” format. If omitted, defaults to self.dt0_str and self.dtN_str.
local_directory (str or pathlib.Path, optional) – Root directory containing NSIDC files. If omitted, defaults to Path(self.NSIDC_dict[“D_original”], self.hemisphere, freq_str).
monthly_files (bool, default False) – If True, treat the dataset as monthly and search one file per month. If False, treat the dataset as daily.
- Returns:
Multi-file dataset combined “by_coords”.
- Return type:
xr.Dataset
- Raises:
FileNotFoundError – If no matching files are found within the requested window.
Notes
A small preprocess hook promotes time as a coordinate when datasets contain a tdim dimension.
Missing files are logged as warnings and skipped.
- make_daily_gridded_SIT(dt0_str: str, dtN_str: str, hemisphere: str | None = None, institutions: Iterable[str] | None = ('ESA', 'AWI'), sensors: Iterable[str] | None = None, levels: Iterable[str] | None = None, versions: Iterable[str] | None = None, root_esa: Path | None = None, root_awi: Path | None = None, roi_km: float = 75.0, neighbours: int = 16, epsilon: float = 0.0, gaussian_sigma_km: float | None = None, lat_cut: float | None = None, time_at_noon: bool = True, prefer: str = 'last', area_def=None) Dataset[source]
Create a daily gridded sea-ice thickness (SIT) dataset over a time window in one call.
This convenience wrapper performs: 1) File discovery via build_sea_ice_satellite_paths(…) across ESA and/or AWI layouts. 2) Indexing and de-duplication to one candidate file per UTC day. 3) Swath-to-grid resampling with l2p_to_daily_grid_pyresample(…).
- Parameters:
dt0_str (str) – Start/end of the time window (inclusive). Any pandas-parsable date string.
dtN_str (str) – Start/end of the time window (inclusive). Any pandas-parsable date string.
hemisphere ({"sh","nh"} or None, optional) – Hemisphere selector; defaults to self.hemisphere_dict[‘abbreviation’] when present.
institutions (iterable of str or None, default ("ESA","AWI")) – Institutions to search (case-insensitive).
sensors (iterable of str or None, optional) – Optional filters passed to build_sea_ice_satellite_paths(…).
levels (iterable of str or None, optional) – Optional filters passed to build_sea_ice_satellite_paths(…).
versions (iterable of str or None, optional) – Optional filters passed to build_sea_ice_satellite_paths(…).
root_esa (pathlib.Path or None, optional) – Root directories. If None, falls back to configured defaults (module globals or class config).
root_awi (pathlib.Path or None, optional) – Root directories. If None, falls back to configured defaults (module globals or class config).
roi_km (float, default 75.0) – Radius of influence for neighbour search (km).
neighbours (int, default 16) – Maximum neighbours per target grid cell.
epsilon (float, default 0.0) – KD-tree epsilon tolerance (speed/accuracy trade-off).
gaussian_sigma_km (float or None, optional) – If set, apply Gaussian distance weights with this sigma (km). If None, uniform weights.
lat_cut (float or None, optional) – Additional latitude filter applied to swath points (hemisphere-aware).
time_at_noon (bool, default True) – If True, output time coordinate is day+12:00Z; otherwise midnight.
prefer ({"first","last"}, default "last") – If multiple files map to a day, choose first/last after deterministic sorting.
area_def (pyresample.geometry.AreaDefinition or None, optional) – Target grid. If None, uses internal EASE/LAEA-like convenience grids.
- Returns:
Daily gridded dataset with dimensions (time, y, x) including: - SIT, SIT_unc, u_cell, n_cell (gridded) - SIT_hem, SIT_hem_unc, SIT_hem_u, SIT_hem_n (hemispheric aggregates)
- Return type:
xarray.Dataset
- make_monthly_gridded_SIT_L3(dt0_str: str, dtN_str: str, hemisphere: str | None = None, institutions: Iterable[str] | None = ('ESA', 'AWI'), sensors: Iterable[str] | None = None, levels: Iterable[str] | None = ('L3C', 'l3cp_release'), versions: Iterable[str] | None = None, root_esa: Path | None = None, root_awi: Path | None = None, prefer: str = 'last', sic_thresh_percent: float | None = 15.0, mask_strategy: str = 'none', include_snow: bool = True, include_flags: bool = True, min_obs: int = 1, time_midpoint: bool = True) Dataset[source]
Create a monthly gridded SIT dataset from Level-3 products (ESA L3C / AWI l3cp_release).
This convenience wrapper performs: 1) File discovery via build_sea_ice_satellite_paths(…). 2) Indexing and de-duplication to one file per (year, month). 3) Assembly into a single (time, y, x) dataset using l3c_paths_to_monthly_grid(…).
- Parameters:
dt0_str (str) – Start/end of the time window (inclusive).
dtN_str (str) – Start/end of the time window (inclusive).
hemisphere ({"sh","nh"} or None, optional) – Hemisphere selector; defaults to self.hemisphere_dict[‘abbreviation’] when present.
institutions (iterable of str or None, optional) – Filters passed to build_sea_ice_satellite_paths(…). By default, levels targets monthly L3 products (e.g., (“L3C”,”l3cp_release”)).
sensors (iterable of str or None, optional) – Filters passed to build_sea_ice_satellite_paths(…). By default, levels targets monthly L3 products (e.g., (“L3C”,”l3cp_release”)).
levels (iterable of str or None, optional) – Filters passed to build_sea_ice_satellite_paths(…). By default, levels targets monthly L3 products (e.g., (“L3C”,”l3cp_release”)).
versions (iterable of str or None, optional) – Filters passed to build_sea_ice_satellite_paths(…). By default, levels targets monthly L3 products (e.g., (“L3C”,”l3cp_release”)).
root_esa (pathlib.Path or None, optional) – Root directories. If None, falls back to configured defaults (module globals or class config).
root_awi (pathlib.Path or None, optional) – Root directories. If None, falls back to configured defaults (module globals or class config).
prefer ({"first","last"}, default "last") – If multiple files map to a month, choose first/last after deterministic sorting.
sic_thresh_percent (float or None, default 15.0) – If set, mask cells with SIC below this threshold (percent).
mask_strategy ({"none","quality","status","both"}, default "none") – Optional QC masking using quality_flag and/or status_flag when present.
include_snow (bool, default True) – Include snow depth fields when present.
include_flags (bool, default True) – Include flag fields (quality/status/region code) when present.
min_obs (int, default 1) – If a per-cell observation count exists, require count >= min_obs.
time_midpoint (bool, default True) – If time bounds are present, set time coordinate to the midpoint.
- Returns:
Monthly gridded dataset with dimensions (time, y, x), including: - SIT, SIT_unc, optional SIC/snow/flags - SIT_hem* aggregates and optional SIC-weighted SIT aggregates (SIT_wgt*)
- Return type:
xarray.Dataset
sea_ice_metrics
- class sea_ice_metrics.SeaIceMetrics(**kwargs)[source]
Bases:
objectCompute diagnostic and skill metrics for processed sea-ice classifications.
This class aggregates common post-processing calculations used in AFIM workflows, including:
building standardised input dictionaries for metrics computations (fast ice / pack ice / total sea ice),
computing hemisphere-integrated time series (area, volume, thickness, rates),
computing spatial summary fields (temporal mean thickness; spatial rates),
computing inter-comparison skill statistics against observations,
computing seasonal statistics (timing, growth/retreat slopes, extrema),
computing persistence-distance statistics relative to an Antarctic coastline,
loading previously computed metrics from Zarr and padding/reindexing time.
The principal entry point is compute_sea_ice_metrics(), which ingests a dictionary of masks and core prognostic/tendency fields (already prepared elsewhere), computes a suite of metrics, and optionally writes the results to a consolidated Zarr archive.
Expected external configuration
This class is designed to operate inside the broader AFIM stack and expects several attributes to exist on self, typically injected via kwargs or a toolbox manager:
- loggerlogging.Logger
For progress/debug logging.
- CICE_dictdict
- Must contain at least:
“time_dim” and “spatial_dims”
“drop_coords” (list of coordinate vars to drop before computations)
“FI_chunks” (chunking spec for small 1D/2D outputs)
- dt0_str, dtN_strstr
Analysis window bounds (“YYYY-MM-DD”).
- D_metricsstr | Path
Output directory for metrics Zarr archives.
- metrics_namestr
Suffix used when constructing default metrics filenames.
- ice_typestr
e.g., “FI”, “FI_Tb_bin”, etc. Cleaned internally to one of {“FI”,”PI”,”SI”}.
- FIC_scalefloat
Scaling factor applied to area (commonly used to convert m² → km² or similar).
- sim_configdict
Metadata inserted into output dataset attributes and/or scalars.
- AF_FI_dictdict
Must include path to AF2020 fast-ice area observations (e.g., “P_AF2020_FIA”).
- BAS_dictdict
Must include coastline shapefile path (e.g., “P_Ant_Cstln”) for persistence distance.
- BorC2T_type, ispd_thresh, D_zarr, etc.
Used by load_computed_metrics() to locate metrics on disk.
Required external methods (defined elsewhere in your stack)
compute_hemisphere_ice_area(…)
compute_hemisphere_ice_volume(…)
compute_hemisphere_ice_thickness(…)
compute_hemisphere_variable_aggregate(…)
compute_NSIDC_metrics() (for PI/SI observational comparisons)
persistence_stability_index(…)
load_cice_grid(slice_hem=True)
_prepare_BAS_coast(…)
compute_grounded_iceberg_area() (optional, for GI area accounting)
Notes
Many methods are written to be safe with Dask-backed inputs. Where appropriate, the implementation forces local computation to avoid shipping large task graphs.
Output writing uses Zarr v2 (explicitly zarr_format=2) for compatibility.
- antarctic_year_labels(time_da: DataArray) DataArray[source]
Construct Antarctic-year labels for a time coordinate (Jul→Jun “years”).
Antarctic-year (AY) grouping is commonly defined such that AY YYYY runs from July YYYY through June YYYY+1. This helper returns an integer label “ayear” for each timestamp:
months Jan–Jun are assigned to the previous calendar year,
months Jul–Dec are assigned to the current calendar year.
- Parameters:
time_da (xr.DataArray) – Time coordinate DataArray supporting .dt.year and .dt.month.
- Returns:
Integer DataArray named “ayear” aligned with time_da.
- Return type:
xr.DataArray
Examples
2000-06-15 -> ayear = 1999
2000-07-01 -> ayear = 2000
- compute_area_weighted_strength_hpa(SIC: DataArray, HI: DataArray, IS: DataArray, A: DataArray, spatial_dim_names: list | None = None, sic_threshold: float | None = None, hmin: float = 0.05, assume_IS_units: str = 'N/m', mask_invalid: bool = True) DataArray[source]
Compute an area-weighted mean ice “pressure-equivalent” in hectopascals (hPa).
This is a robust hemispheric aggregate designed to avoid two common pathologies: 1) summing a pressure-like field instead of averaging (domain-size dependence), 2) division-by-small-thickness spikes (handled via hmin and masking).
The calculation proceeds as: - Define a valid-ice mask: SIC > sic_threshold AND HI > hmin AND finite(IS) - Convert the internal strength variable to a pressure-like field in Pa:
if IS is in N/m, then P = IS / HI (Pa)
if IS is already in Pa, then P = IS
- Compute an area-weighted mean:
P_mean = sum(P * A) / sum(A) over the masked region
Convert Pa -> hPa (divide by 100)
- Parameters:
SIC (xr.DataArray) – Sea ice concentration [0–1].
HI (xr.DataArray) – Sea ice thickness [m].
IS (xr.DataArray) – Internal ice strength field. Expected units depend on assume_IS_units: - “N/m”: common for CICE strength diagnostics; converted to Pa via division by HI. - “Pa” : if your diagnostic is already pressure-like.
A (xr.DataArray) – Grid-cell area [m^2], broadcastable to SIC/HI/IS.
spatial_dim_names (list[str], optional) – Spatial dimensions to reduce over. Defaults to self.CICE_dict[“spatial_dims”].
sic_threshold (float, optional) – Concentration threshold for including cells. Defaults to self.icon_thresh.
hmin (float, default 0.05) – Minimum thickness (m) required to include a cell. Prevents spikes from dividing by very small HI and excludes near-zero ice.
assume_IS_units ({"N/m","Pa"}, default "N/m") – Controls conversion to Pa.
mask_invalid (bool, default True) – If True, masks non-finite values after calculation.
- Returns:
Area-weighted mean pressure-equivalent in hPa.
- Return type:
xr.DataArray
Notes
This is an analysis diagnostic: it translates a CICE internal strength-like quantity
into a pressure-equivalent for hemispheric comparison. Interpret with care. - If you want the same mask to be applied across multiple experiments, pass a common mask or use a common SIC/HI selection upstream.
- compute_hemisphere_area_weighted_stress(tau: DataArray, A: DataArray, spatial_dim_names: list | None = None, mask: DataArray | None = None, out_units: str = 'Pa', return_abs: bool = True, min_area_m2: float = 0.0, name: str | None = None) Dataset[source]
Compute a hemispheric, area-weighted aggregate of a stress component (or magnitude).
Designed for stress-like diagnostics such as lateral-drag stresses (e.g., KuxE, KuyN) whose native units are N/m^2 (= Pa). This yields a hemispheric mean stress time series with sensible physical units (Pa, hPa, or kPa) and a robust treatment of masks.
The aggregate is an area-weighted mean over the valid region:
<tau>_A(t) = sum( tau(t,…) * A(…) ) / sum( A(…) )
If return_abs=True, this also returns the area-weighted mean of |tau|, which is often more interpretable for signed stress components that alternate in sign over space.
- Parameters:
tau (xr.DataArray) – Stress component field in N/m^2 (Pa). Must include a time dimension and spatial dimensions in spatial_dim_names.
A (xr.DataArray) – Grid-cell area (m^2) corresponding to tau’s grid (e.g., earea for E-grid, narea for N-grid). Broadcastable to tau’s spatial dims.
spatial_dim_names (list[str], optional) – Spatial dimensions to reduce over (e.g., [“nj”, “ni”]). Defaults to self.CICE_dict[“spatial_dims”].
mask (xr.DataArray, optional) – Boolean mask (True=keep) broadcastable to tau. Use this to restrict aggregation to a region (e.g., fast-ice mask, SIC threshold mask, coastal mask, etc.). If None, uses all finite tau cells.
out_units ({"Pa","hPa","kPa"}, default "Pa") – Unit for the returned time series.
return_abs (bool, default True) – If True, return both signed mean stress and mean absolute stress.
min_area_m2 (float, default 0.0) – If >0, timesteps where the valid area sum is <= min_area_m2 are masked (set to NaN). Helpful to prevent noisy values when the valid mask becomes tiny.
name (str, optional) – Name root used for output variables. Defaults to tau.name if present, else “tau”.
- Returns:
Dataset containing: - f”{name}_mean” : area-weighted mean stress (signed) - f”{name}_abs_mean” : area-weighted mean |stress| (if return_abs=True) - “valid_area_m2” : total valid area used in weighting
Units of *_mean variables are Pa/hPa/kPa as requested.
- Return type:
xr.Dataset
Notes
N/m^2 is exactly Pascal. So converting to hPa uses /100, and to kPa uses /1000.
Prefer |tau| (absolute mean) when comparing “strength” of drag across experiments,
because signed components can cancel spatially. - If you want a vector magnitude across components (e.g., sqrt(Kux^2+Kuy^2)), compute that first and pass as tau.
- compute_hemisphere_ice_area(SIC, A, ice_area_scale=None, spatial_dim_names=None, sic_threshold=None, add_grounded_iceberg_area=None, grounded_iceberg_area=None, region=None)[source]
Compute the total sea ice area (IA) by integrating sea ice concentration (SIC) over the grid area, and optionally including the grounded iceberg area (GI_total_area).
This method computes the sea ice area by multiplying the sea ice concentration (SIC) by the grid cell area (GC_area), summing the result over the specified spatial dimensions, and optionally adding the grounded iceberg area. The final result is then scaled by an optional ice area scale factor.
Parameters:
- SICxarray.DataArray
Sea ice concentration field with dimensions (nj, ni) representing the concentration of sea ice at each grid point.
- Axarray.DataArray
Grid cell area with the same dimensions (nj, ni) as SIC, representing the area of each grid cell in square meters.
- ice_area_scalefloat, optional
A scale factor for the sea ice area calculation. If not provided, the default scale is used from the FIC_scale attribute.
- spatial_dim_nameslist of str, optional
The dimension names over which to sum the sea ice area (typically latitude and longitude). If not provided, the default spatial dimensions are used from the CICE_dict attribute.
- sic_thresholdfloat, optional
Minimum SIC value to be included in the area calculation. Points with SIC below this threshold are excluded. Defaults to self.icon_thresh.
- add_grounded_iceberg_areabool, optional
A flag indicating whether to include the grounded iceberg area in the ice area calculation. Defaults to the class attribute use_gi.
- grounded_iceberg_areafloat, optional
The grounded iceberg area in square meters to be added to the sea ice area. If not provided, the grounded iceberg area is computed using compute_grounded_iceberg_area().
Returns:
- IAxarray.DataArray
The total sea ice area, including the optional grounded iceberg area, with the same spatial dimensions as SIC and GC_area. The value is returned after summing over the spatial dimensions and scaling by the provided ice_area_scale.
Notes:
The grounded iceberg area is added to the sea ice area calculation if add_grounded_iceberg_area is True. If no grounded iceberg area
is provided, the method will compute it using the compute_grounded_iceberg_area() method. - The sea ice area is computed as the sum of SIC * GC_area across the specified spatial dimensions, and then scaled by ice_area_scale.
- compute_hemisphere_ice_area_rate(DAT: DataArray, IA: DataArray, A: DataArray, spatial_dim_names: list | None = None, mode: str = 'fractional', out_units: str = 'per_day', IA_min_m2: float = 50000000000.0, mask_invalid: bool = True) DataArray[source]
Compute a hemisphere-aggregated ice-area tendency rate with optional denominator gating.
This function aggregates a gridded area-tendency field over the hemisphere and returns either:
- an absolute integrated tendency:
d(Area)/dt [km^2/s] or [km^2/day]
- a fractional tendency (normalised by instantaneous integrated ice area):
(1/Area) d(Area)/dt [1/s] or [1/day]
The fractional form is often the most comparable across experiments, but it can become numerically unstable when the integrated ice area is small. To mitigate this, a minimum area floor IA_min_m2 is applied in fractional mode (values are masked where IA <= IA_min_m2).
- Parameters:
DAT (xr.DataArray) – Local ice-area tendency field, typically in [1/s] (e.g., dynamic area tendency), with dimensions including time and the spatial dimensions in spatial_dim_names. NOTE: DAT is assumed to act on the same area basis as A (grid-cell area).
IA (xr.DataArray) – Hemisphere-integrated ice area time series in [m^2], used as the denominator in fractional mode. Must share the same time coordinate as DAT.
A (xr.DataArray) – Grid-cell area in [m^2], broadcastable to the spatial dimensions of DAT.
spatial_dim_names (list[str], optional) – Spatial dimensions to sum over (e.g., [“nj”, “ni”]). Defaults to self.CICE_dict[“spatial_dims”].
mode ({"fractional","absolute"}, default "fractional") –
“fractional”: return (DAT*A).sum / IA -> [1/s] or [1/day]
”absolute” : return (DAT*A).sum -> [m^2/s] converted to [km^2/s] or [km^2/day]
out_units ({"per_day","per_second"}, default "per_day") – Output time unit scaling. For “per_day”, multiplies by 86400.
IA_min_m2 (float, default 5e10) – Minimum integrated ice area (m^2) required to compute fractional rates. Values where IA <= IA_min_m2 are masked (set to NaN) in fractional mode. A typical order-of-magnitude choice is 1e10–1e11 m^2 (~10^4–10^5 km^2).
mask_invalid (bool, default True) – If True, masks non-finite values after calculation (NaN/inf).
- Returns:
Hemisphere-aggregated area tendency rate. Units: - mode=”absolute” : km^2/s or km^2/day - mode=”fractional”: 1/s or 1/day
- Return type:
xr.DataArray
Notes
This function replaces the previous hard-coded /1e9 scaling and instead returns
physically interpretable units (km^2/time for absolute, 1/time for fractional). - If your IA is not in m^2, convert it before calling this function. - If DAT is not in 1/s, units will differ accordingly; the calculations here assume the common CICE convention for area tendencies.
- compute_hemisphere_ice_strength(SIC, HI, IS, spatial_dim_names: list = None, sic_threshold: float = None, ice_strength_scale: float = 100)[source]
Compute hemispheric mean sea ice pressure in hectopascals (hPa), based on masked concentration, thickness, strength, and area fields.
This function calculates an area-weighted mean ice thickness (m) by dividing total ice volume by total ice-covered area. It then converts internal ice strength (N/m) into an equivalent pressure field (Pa = N/m²), and finally into hectopascals (hPa). Grid cells below a threshold sea ice concentration are masked from the analysis.
INPUTS:
SIC : xr.DataArray; Sea ice concentration (unitless, typically 0–1), per grid cell. HI : xr.DataArray; Sea ice thickness in meters, per grid cell. IS : xr.DataArray; Internal ice strength in N/m, per grid cell. A : xr.DataArray; Grid cell area in m². spatial_dim_names : list, optional; Names of the spatial dimensions to reduce over (e.g., [“nj”, “ni”]).
If not provided, defaults to values in self.CICE_dict[‘spatial_dims’].
sic_threshold : float, optional; Threshold for masking sea ice concentration (default: self.icon_thresh).
OUTPUTS:
- Pxr.DataArray; Sea ice pressure in hectopascals (hPa), with masked cells excluded
and computed as (IS / mean_thickness) / 100.
Notes
The output pressure is an approximation derived from CICE’s internal strength field and is not a direct model output.
Use with caution where sea ice thickness is very small or near-zero.
- compute_hemisphere_ice_thickness(SIC, HI, A, spatial_dim_names=None, sic_threshold=None)[source]
Compute average sea ice thickness weighted by grid cell area and sea ice concentration.
This method calculates the domain-averaged sea ice thickness as the ratio of the total sea ice volume to the total sea ice area. A SIC threshold is used to exclude low-concentration grid cells from both the numerator and denominator.
- Parameters:
HI (xarray.DataArray) – Sea ice thickness in meters.
SIC (xarray.DataArray) – Sea ice concentration (unitless, typically between 0 and 1).
GC_area (xarray.DataArray) – Grid cell area in square meters (m²).
spatial_dim_names (list of str, optional) – Names of spatial dimensions over which to compute the sums. Defaults to self.CICE_dict[‘spatial_dims’].
sic_threshold (float, optional) – Minimum SIC value to include in the average. Grid cells with SIC ≤ sic_threshold are excluded. Defaults to self.icon_thresh.
- Returns:
Domain-averaged sea ice thickness, computed as the total volume divided by total area.
- Return type:
float or xarray.DataArray
Notes
Computed as: ∑(HI × area) / ∑(SIC × area), with masking by sic_threshold.
Excludes cells with SIC below threshold from both numerator and denominator.
Units are in meters.
- compute_hemisphere_ice_volume(SIC, HI, A, add_grounded_iceberg_area=None, ice_volume_scale=1000000000000.0, spatial_dim_names=None, sic_threshold=None, grounded_iceberg_area=None)[source]
Compute total sea ice volume by integrating sea ice concentration, thickness, and grid cell area.
This method calculates the total sea ice volume as the sum of the product of sea ice concentration (SIC), thickness (HI), and grid cell area (GC_area) across the model domain. A SIC threshold can be applied to exclude grid cells with low concentrations. Optionally includes grounded iceberg volume and applies a scaling factor for unit conversion.
- Parameters:
SIC (xarray.DataArray) – Sea ice concentration (unitless, typically between 0 and 1).
HI (xarray.DataArray) – Sea ice thickness in meters.
A (xarray.DataArray) – Grid cell area in square meters (m²).
ice_volume_scale (float, optional) – Scale factor for the output volume. Default is 1e12, converting m³ to 1000 km³.
spatial_dim_names (list of str, optional) – Names of spatial dimensions to sum over (e.g., [‘nj’, ‘ni’]). Defaults to self.CICE_dict[‘spatial_dims’].
sic_threshold (float, optional) – Minimum SIC value to be included in the volume calculation. Grid cells with SIC ≤ sic_threshold are masked. Defaults to self.icon_thresh.
- Returns:
Total sea ice volume, optionally including grounded ice and scaled (e.g., in 1000 km³ if default scale is used).
- Return type:
float
Notes
SIC, HI, and GC_area must share the same grid and be broadcastable.
Grid cells below sic_threshold are excluded from the volume calculation.
If self.use_gi is True, a grounded iceberg area is converted to volume and included.
- compute_hemisphere_ice_volume_rate(DVT: DataArray, IV: DataArray | None, A: DataArray, SIC: DataArray | None = None, spatial_dim_names: list | None = None, mode: str = 'fractional', out_units: str = 'per_day', IV_min_m3: float = 50000000000.0, sic_threshold: float | None = None, mask_invalid: bool = True, assume_units: str = 'auto') DataArray[source]
Compute a hemisphere-aggregated ice-volume tendency rate with sensible units and optional denominator gating.
This function aggregates a gridded “volume tendency” field (CICE history typically stores dvidtt / dvidtd in cm/day, representing an equivalent thickness tendency) into:
- an absolute integrated volume tendency:
dV/dt [km^3/s] or [km^3/day]
- a fractional (normalised) tendency:
(1/V) dV/dt [1/s] or [1/day]
The fractional form is often the most comparable across experiments, but it can become numerically unstable when the integrated ice volume is small. To mitigate this, a minimum volume floor IV_min_m3 is applied in fractional mode (values are masked where IV <= IV_min_m3).
- Parameters:
DVT (xr.DataArray) – Local ice “volume tendency” field. For standard CICE diagnostics: - dvidtt / dvidtd typically have units “cm/day” (equivalent thickness tendency). Must have dims including time and the spatial dims in spatial_dim_names.
IV (xr.DataArray or None) – Hemisphere-integrated ice volume time series in [m^3], used as the denominator in fractional mode. Must share the same time coordinate as DVT. Required if mode=”fractional”; can be None if mode=”absolute”.
A (xr.DataArray) – Grid-cell area in [m^2], broadcastable to the spatial dimensions of DVT. (For CICE dvidtt/dvidtd: thickness tendency * area -> volume tendency.)
SIC (xr.DataArray, optional) – Sea-ice concentration used to mask contributions. If provided, cells with SIC <= sic_threshold are excluded. If None, no SIC masking is applied.
spatial_dim_names (list[str], optional) – Spatial dimensions to sum over (e.g., [“nj”,”ni”]). Defaults to self.CICE_dict[“spatial_dims”].
mode ({"fractional","absolute"}, default "fractional") –
“fractional”: return (dV/dt)/IV -> [1/s] or [1/day]
”absolute” : return dV/dt -> [km^3/s] or [km^3/day]
out_units ({"per_day","per_second"}, default "per_day") – Output time unit. “per_day” scales by 86400 relative to per-second base.
IV_min_m3 (float, default 5e10) – Minimum integrated ice volume (m^3) required to compute fractional rates. Values where IV <= IV_min_m3 are masked (set to NaN) in fractional mode. Order-of-magnitude guidance: - fast-ice V ~ 1e10–1e11 m^3 (10–100 km^3) - sea-ice V ~ 1e12–1e13 m^3 (1000–10000 km^3)
sic_threshold (float, optional) – SIC threshold for masking. Defaults to self.icon_thresh when SIC is provided.
mask_invalid (bool, default True) – If True, mask non-finite results (NaN/inf) after calculation.
assume_units ({"auto","cm/day","m/day","m/s"}, default "auto") – How to interpret DVT units. If “auto”, tries to parse DVT.attrs[“units”]. If parsing fails and you have self.thickness_tendency_to_mps, it will be used as fallback.
- Returns:
Hemisphere-aggregated volume tendency rate time series. Units: - mode=”absolute” : km^3/s or km^3/day - mode=”fractional”: 1/s or 1/day
- Return type:
xr.DataArray
Notes
- For CICE dvidtt/dvidtd (cm/day): the physically consistent absolute tendency is:
dV/dt [m^3/day] = (DVT * 1e-2) * A summed over space
then converted to km^3/day by dividing by 1e9. - This function does NOT assume any hard-coded scaling like /1e9*86400 unless justified by units.
- compute_hemisphere_lateral_drag_stress(KuxE: DataArray | None = None, KuxN: DataArray | None = None, KuyE: DataArray | None = None, KuyN: DataArray | None = None, earea: DataArray | None = None, narea: DataArray | None = None, spatial_dim_names: list | None = None, maskE: DataArray | None = None, maskN: DataArray | None = None, out_units: str = 'Pa', min_area_m2: float = 0.0) Dataset[source]
Convenience wrapper to compute hemispheric aggregates for lateral-drag stress components.
This computes area-weighted mean and mean-absolute stresses for any subset of: KuxE, KuxN, KuyE, KuyN
using their appropriate grid-cell area weights: - E-grid stresses weighted by earea - N-grid stresses weighted by narea
- Parameters:
KuxE (xr.DataArray, optional) – Stress component fields (Pa) on E or N grids.
KuxN (xr.DataArray, optional) – Stress component fields (Pa) on E or N grids.
KuyE (xr.DataArray, optional) – Stress component fields (Pa) on E or N grids.
KuyN (xr.DataArray, optional) – Stress component fields (Pa) on E or N grids.
earea (xr.DataArray, optional) – Grid-cell areas (m^2) for E- and N-grids.
narea (xr.DataArray, optional) – Grid-cell areas (m^2) for E- and N-grids.
spatial_dim_names (list[str], optional) – Spatial reduction dims. Defaults to self.CICE_dict[“spatial_dims”].
maskE (xr.DataArray, optional) – Optional boolean masks for E-grid and N-grid fields.
maskN (xr.DataArray, optional) – Optional boolean masks for E-grid and N-grid fields.
out_units ({"Pa","hPa","kPa"}, default "Pa") – Units for output time series.
min_area_m2 (float, default 0.0) – Gating threshold for valid area per timestep.
- Returns:
Dataset containing time series for each provided field: - <name>_mean - <name>_abs_mean - <name>_valid_area_m2
Plus, if both x and y are present on the same grid, a magnitude aggregate: - K_uE_mag_mean / abs_mean (from KuxE,KuyE) - K_uN_mag_mean / abs_mean (from KuxN,KuyN)
- Return type:
xr.Dataset
Notes
For interpretation across experiments, the absolute mean and the magnitude
are usually the most stable.
- compute_hemisphere_variable_aggregate(da, time_coord_name='time', flat_mean=False, da2=None, scale=None)[source]
Compute a time-aggregated field from a DataArray across the specified time dimension.
This method calculates the mean of the input variable over time, with an optional normalisation step based on the maximum value across the time dimension. Designed for Dask-backed DataArrays to support scalable computation.
- Parameters:
da (xarray.DataArray) – Input data array to aggregate over time. Typically a geophysical variable such as sea ice thickness (m), strength (N/m), or velocity (m/s), with time as one of the dimensions.
time_coord_name (str, optional) – Name of the time dimension to aggregate over. Default is ‘time’.
flat_mean (bool, optional) – If True, the data has simple mean over time, Default is False.
- Returns:
A time-aggregated version of the input array. If normalise_max is True, the result represents the normalised mean occurrence across the time dimension; otherwise, it is the simple time mean.
- Return type:
xarray.DataArray
Notes
If the specified time_coord_name is not found in the input DataArray,
the original array is returned with a warning. - If normalise_max is enabled, the output represents a relative frequency-like measure, especially useful for thresholded or binary input fields (e.g., fast ice presence). - Assumes that all non-time dimensions are to be preserved in the output.
- compute_sea_ice_metrics(da_dict, P_mets_zarr, ice_type=None, dt0_str=None, dtN_str=None, ice_area_scale=None)[source]
- compute_seasonal_statistics(da, stat_name: str = 'FIA', window: int = 15, polyorder: int = 2, min_onset_doy: int = 50, min_retreat_doy: int = 240, growth_range: tuple[int, int] = (71, 273), retreat_early_range: tuple[int, int] = (273, 330), retreat_late_range: tuple[tuple[int, int], tuple[int, int]] = ((331, 365), (1, 71)), scaling_factor: float = 1000000.0, drop_leap_day: bool = True, return_per_year: bool = False)[source]
Compute seasonal timing and rate statistics from a daily 1D time series.
- This method is optimised for long daily time series (multi-year) by:
loading the full 1D series once,
computing Savitzky–Golay derivatives once per year,
using NumPy slicing for seasonal windows (minimising xarray overhead).
- It returns summary statistics (mean and standard deviation across years) for:
annual maxima and minima,
day-of-year (DOY) of maxima and minima,
onset timing (first positive derivative after min_onset_doy),
retreat timing (last negative curvature after min_retreat_doy),
duration (onset → retreat, wrap-aware),
growth slope (within growth_range),
retreat slope(s): early, late, and combined.
- Parameters:
da (xr.DataArray) – Daily 1D time series with dimension “time” (e.g., ice area in km²).
stat_name (str, default "FIA") – Label used for logging and downstream reporting.
window (int, default 15) – Savitzky–Golay filter window length (days). Will be coerced to an odd integer and adjusted upward if too small for polyorder.
polyorder (int, default 2) – Polynomial order for Savitzky–Golay derivatives.
min_onset_doy (int, default 50) – Earliest DOY at which onset is permitted (limits spurious early detections).
min_retreat_doy (int, default 240) – Earliest DOY at which retreat is permitted.
growth_range (tuple[int, int], default (71, 273)) – DOY window used to estimate growth slope.
retreat_early_range (tuple[int, int], default (273, 330)) – DOY window used to estimate early retreat slope within the same year.
retreat_late_range (tuple[tuple[int,int], tuple[int,int]], default ((331,365), (1,71))) –
- Late retreat window that may span the year boundary:
first tuple: late-year DOY window in current year
second tuple: early-year DOY window in next year
scaling_factor (float, default 1e6) – Multiplier applied to slope estimates to yield convenient units (e.g., km²/day).
drop_leap_day (bool, default True) – If True, removes Feb 29 to simplify DOY-consistent filtering.
return_per_year (bool, default False) – If True, also return a per-year table (pandas.DataFrame) with raw annual values.
- Returns:
summary (dict[str, float | None]) – Dictionary of mean/std summary metrics across years.
per_year (pandas.DataFrame, optional) – Returned only if return_per_year=True. Indexed by year, containing annual extrema, DOYs, and slope estimates.
- Raises:
ValueError – If da does not have a “time” dimension.
Notes
The routine iterates over years except the final year (by design) because the late-retreat window can require samples from the subsequent year.
Slope estimates use a lightweight least-squares implementation (_safe_slope) instead of np.polyfit to reduce overhead.
- compute_sector_FIA(FI_mask, area_grid, sector_defs, GI_area=False)[source]
Backwards-compatible wrapper for fast-ice area (FIA) sector computation.
This method preserves the original FI-specific API by delegating to the ice-type-agnostic compute_sector_ice_area().
- Parameters:
FI_mask (xr.DataArray) – Fast-ice mask on the same grid as area_grid.
area_grid (xr.DataArray) – Grid-cell area field with lon/lat coordinates.
sector_defs (dict) – Sector definition dictionary (see compute_sector_ice_area()).
GI_area (bool, default False) – If True, add grounded iceberg area to the domain total.
- Returns:
Sector areas and domain total, as returned by compute_sector_ice_area().
- Return type:
(xr.DataArray, float)
- compute_sector_ice_area(I_mask, area_grid, sector_defs, GI_area=False)[source]
Compute ice area per geographic sector and a domain total from an arbitrary ice mask.
This method generalises sector-based area computations for any ice type (FI/PI/SI) by integrating a boolean mask over an area grid within sector bounding boxes.
- Parameters:
I_mask (xr.DataArray) – Ice mask (boolean or 0/1) on the same grid as area_grid.
area_grid (xr.DataArray) – Grid-cell area field (typically m²). Must have coordinate variables lon and lat (either as DataArray attributes or attached coords) compatible with the sector definitions.
sector_defs (dict) –
- Sector definition dictionary. For each sector key, expects:
sector_defs[sec_name][“geo_region”] == (lon_min, lon_max, lat_min, lat_max)
GI_area (bool, default False) – If True, include grounded iceberg area in the domain total. This uses compute_grounded_iceberg_area() (called internally) and adds the result (converted to km²) to the total.
- Returns:
IA_da (xr.DataArray) – Sector ice areas (same units as area_grid integration), indexed by “sector”.
IA_tot (float) – Domain total ice area (sum of sectors) plus optional grounded iceberg contribution.
Notes
Current implementation assumes sectors do not cross the dateline in a way that requires wrap-aware longitude masking. If any sector bounds cross the dateline, adjust the masking logic accordingly.
If I_mask or area_grid are Dask-backed, each sector sum is computed independently; this is typically acceptable because the number of sectors is small.
- compute_skill_statistics(mod, obs, min_points=100)[source]
Compute model-vs-observation skill statistics for a time series.
The method attempts a direct time-series comparison by intersecting model and observation timestamps. If there are fewer than min_points intersecting times, it falls back to a climatological comparison using day-of-year means.
- Parameters:
mod (xr.DataArray) – Modelled time series (must have a “time” coordinate for direct alignment).
obs (xr.DataArray) – Observational time series. Ideally shares the same time basis as mod.
min_points (int, default 100) – Minimum number of intersecting timestamps required to compute direct time-series statistics. Otherwise, a climatology comparison is used.
- Returns:
- Dictionary of skill metrics:
Bias
RMSE
MAE
Corr
SD_Model
SD_Obs
Returns NaNs for all metrics if time alignment fails and climatology cannot be formed.
- Return type:
dict
Notes
Uses _safe_load_array_to_memory() to compute Dask-backed series locally and minimise graph shipping.
Climatology fallback uses groupby(“time.dayofyear”).mean(“time”) for both series.
- extrema_means_antarctic_year(da_ts: DataArray)[source]
Compute mean maxima/minima across Antarctic years for a 1D time series.
- Parameters:
da_ts (xr.DataArray) – 1D time series with dimension “time” (e.g., fast-ice area in km²).
- Returns:
- Dictionary containing:
”max_mean”: mean of per-AY maxima (xr.DataArray scalar)
”min_mean”: mean of per-AY minima (xr.DataArray scalar)
”max_ts” : per-AY maxima time series (dims: ayear)
”min_ts” : per-AY minima time series (dims: ayear)
- Return type:
dict
Notes
Uses antarctic_year_labels() and xarray groupby reductions.
For short 1D series, it is acceptable to fully load/compute in memory.
- load_computed_metrics(class_method: str = 'binary-days', BorC2T_type: str = None, ice_type: str = None, ispd_thresh: float = None, bin_min_days: int = None, bin_win_days: int = None, mean_period: int = None, zarr_directory: str = None, clip_to_self: bool = True, time_dim: str = 'time')[source]
Load a previously computed metrics Zarr store and optionally clip/pad to the current analysis window.
- metrics_data_dict(I_mask, I_data, A, *, ice_type=None, required=None, optional=None)[source]
Build a standardised input dictionary for FI/PI/SI metrics.
This method: - validates required variables exist in I_data, - adds optional variables if present, - always includes the ice mask and T-grid area (tarea), - and (NEW) auto-adds any referenced cell-area measures (e.g., earea/narea)
- when a variable declares them via CF cell_measures, e.g.:
cell_measures: “area: earea”
This is needed for lateral-drag stress diagnostics (KuxE/KuxN/KuyE/KuyN).
- Parameters:
I_mask (xr.DataArray) – Ice-type mask (e.g., FI/PI/SI), typically with dims (time, nj, ni) or (nj, ni).
I_data (xr.Dataset or dict-like) – Source dataset/dictionary containing model variables.
A (xr.DataArray) – T-grid cell area (m^2), stored into the output dict as “tarea”.
ice_type (str, optional) – Ice type key (“FI”,”PI”,”SI”). Defaults to self.ice_type.
required (list[str], optional) – Required variable names that must exist in I_data.
optional (list[str], optional) – Optional variable names to include if present in I_data.
- Returns:
Dictionary containing mask, tarea, and required/optional fields.
- Return type:
dict
- Raises:
KeyError – If any required variables are missing.
Notes
NEW: If a present variable declares an area measure via cell_measures,
and that area variable exists in I_data, it is added automatically. This supports stress diagnostics with earea/narea weights.
- persistence_ice_distance_mean_max(ice_prstnc, persistence_threshold: float = 0.8, path_coast_shape: str = None, crs_out: str = 'EPSG:3031')[source]
Compute mean and maximum distance from the Antarctic coastline for persistent ice.
This method measures how far persistent ice (e.g., persistent fast ice) extends from the coastline by:
thresholding a persistence field ice_prstnc in [0, 1],
projecting grid coordinates into a metric CRS (default EPSG:3031),
querying nearest coastline points via a pre-built KD-tree, and
returning mean and maximum distances (km) of persistent grid cells.
- Parameters:
ice_prstnc (xr.DataArray) – Persistence field in [0, 1], typically computed from a binary mask over a seasonal window (e.g., May–Oct). Must be on the same spatial grid as self.G_t.
persistence_threshold (float, default 0.8) – Cells with persistence >= threshold are treated as “persistent” for distance computations.
path_coast_shape (str, optional) – Path to a coastline shapefile. Defaults to self.BAS_dict[“P_Ant_Cstln”].
crs_out (str, default "EPSG:3031") – Output CRS used for distance computations. Must be a metric CRS.
- Returns:
- Dictionary with keys:
”persistence_mean_distance”: mean distance (km)
”persistence_max_distance” : max distance (km)
- Return type:
dict
- Raises:
KeyError – If longitude/latitude fields cannot be found in self.G_t.
ValueError – If ice_prstnc shape does not match the loaded grid coordinates.
Exception – Propagates errors from coordinate transforms or KD-tree construction.
Notes
The coastline KD-tree is cached on self as _coast_kdtree and rebuilt only if crs_out changes.
The method forces local computation of ice_prstnc (via .compute()) using the threaded scheduler to avoid distributing a large Dask graph.
- persistence_stability_index(I_mask, A, persistence_threshold: float = 0.8, winter_months: tuple = (5, 6, 7, 8, 9, 10), area_scale: float = 1000000000.0) dict[source]
Compute FIPSI (Fast-Ice Persistence Stability Index) from a binary fast-ice mask.
Definition
- FIPSI = (area of grid cells whose WINTER fast-ice persistence ≥ threshold)
/ (area of grid cells that have fast ice at least once in WINTER),
where persistence is the fraction of winter days (e.g., May–Oct) that a cell is fast.
- param I_mask:
Binary fast-ice mask (0/1) with a ‘time’ dimension; if a Dataset is given, a variable named ‘FI_mask’ is used.
- type I_mask:
xarray.DataArray or xarray.Dataset
- param A:
Grid-cell area on the same grid; if a Dataset is given, the first of {‘tarea’,’area’,’TAREA’} is used. If A has a ‘time’ dim, the first slice is used.
- type A:
xarray.DataArray or xarray.Dataset
- param persistence_threshold:
Minimum winter persistence to classify a cell as persistent (default 0.8).
- type persistence_threshold:
float, optional
- param winter_months:
Months to include for the winter persistence (default May–Oct).
- type winter_months:
tuple[int], optional
- returns:
- {‘persistence_stability_index’: float, # FIPSI in [0,1] or NaN if undefined
‘area_persistent_winter’: float, # area of persistent winter FI (units of A) ‘area_ever_winter’: float # area with any winter FI (units of A)}
- rtype:
dict
Notes
Uses the class’s spatial_dims to sum areas.
Assumes I_mask is truly binary (0/1). For classification outputs that are boolean, they will be cast to float internally.
Designed to be Dask-friendly: only two area reductions are computed eagerly.
sea_ice_plotter
- class sea_ice_plotter.SeaIcePlotter(**kwargs)[source]
Bases:
objectPlotting utilities for AFIM / CICE sea-ice diagnostics.
This class groups helper methods that prepare gridded xarray.DataArray fields for plotting (primarily with PyGMT), including:
Writing custom CPT (colour palette table) files for continuous or categorical difference fields (e.g., obs–model agreement maps).
Building “alpha/transparency” layers that modulate opacity by confidence or persistence strength.
Resolving 2D lon/lat coordinates from a DataArray via explicit coordinate names, inferred coordinate names, or fallback class grid objects (B-grid / T-grid).
Converting a 2D DataArray into 1D lon/lat/z vectors + masks suitable for pygmt.Figure.plot / pygmt.Figure.grdimage workflows.
Notes
Several methods assume the class has access to grid datasets such as self.G_t and self.G_u, and a method self.load_bgrid(slice_hem=True) that populates them.
The methods here are designed to be safe with Dask-backed DataArrays: they will compute only when necessary and will avoid materialising coordinate grids unless required.
CPT writing methods create plain text files on disk and return the resolved path.
Expected Attributes / Methods
- _hex_to_rgbcallable
Helper that converts hex colour strings (e.g. “#2CA25F”) to integer RGB triplets.
- load_bgridcallable
Loads grid coordinate datasets (at least self.G_t and/or self.G_u).
- G_t, G_uxarray.Dataset
Grid datasets holding 2D lon/lat fields under keys like “lon” and “lat”.
- normalise_longitudescallable
Converts longitudes to a target convention (“0-360” or “-180-180”).
- create_IBCSO_bath()[source]
Extract and save a masked IBCSO bathymetry layer for Antarctic plotting.
This method loads the IBCSO v2.0 dataset (as a NetCDF raster), extracts the seafloor depth (negative elevations), masks out land areas (positive or zero), and saves a cleaned version to NetCDF for use in plotting.
The result is saved at the path specified in self.config[‘pygmt_dict’][‘P_IBCSO_bath’].
- Return type:
None
Notes
Input file is assumed to be a NetCDF raster with variable band_data.
Only band=0 is used (i.e., the main bathymetric band).
Output variable is named bath and excludes attributes and encodings for simplicity.
This method is typically called once to prepare the bathymetry layer before reuse.
See also
-,-
- create_cbar_frame(series, label, units=None, extend_cbar=False, max_ann_steps=10)[source]
Construct a GMT-style colorbar annotation frame string for PyGMT plotting.
This utility generates a clean, readable colorbar frame string using adaptive step sizing based on the data range. An optional second axis label (e.g., for units) and extension arrows can be included.
- Parameters:
series (list or tuple of float) – Data range for the colorbar as [vmin, vmax].
label (str) – Label text for the colorbar (e.g., “Fast Ice Persistence”).
units (str, optional) – Units label to be shown along the secondary (y) axis (e.g., “1/100”).
extend_cbar (bool, optional) – Whether to append extension arrows to the colorbar (+e). Default is False.
max_ann_steps (int, default=10) – Desired maximum number of major annotations (controls tick spacing).
- Returns:
GMT-format colorbar annotation string (e.g., “a0.1f0.02+lLabel”), or a list of two strings if units is provided.
- Return type:
str or list of str
Notes
Tick spacing is determined using a scaled logarithmic rounding to ensure clean steps (e.g., 0.1, 0.2, 0.5).
If extend_cbar is True, +e is added to indicate out-of-bounds extension arrows.
Compatible with pygmt.Figure.colorbar(frame=…).
Examples
>>> self.create_cbar_frame([0.01, 1.0], "Persistence", units="1/100") ['a0.1f0.02+lPersistence', 'y+l 1/100']
- extract_min_max_dates(ts_dict, keys2plot=None, primary_key='FIA', time_coord='time')[source]
Extract the minimum and maximum datetime values from time series data.
This method scans through a dictionary of time series objects (either xarray.DataArrays, xarray.Datasets, or nested dictionaries containing a DataArray under primary_key), and returns the earliest and latest valid time values found across all selected entries.
- Parameters:
ts_dict (dict) – A dictionary where each value is either: - an xarray.DataArray with a time_coord coordinate, or - a dictionary containing a DataArray under the key primary_key. Keys typically represent simulation or experiment identifiers.
keys2plot (list of str, optional) – If provided, restricts the operation to keys within this list. Keys not in keys2plot will be skipped.
primary_key (str, default 'FIA') – The key to use when accessing nested dictionaries within ts_dict. Ignored if the value is already a DataArray or Dataset.
time_coord (str, default 'time') – The name of the time coordinate to extract from each DataArray.
- Returns:
tmin (pandas.Timestamp) – The earliest datetime found across all valid time series.
tmax (pandas.Timestamp) – The latest datetime found across all valid time series.
Notes
Entries in ts_dict with missing or malformed time coordinates are skipped.
The key “AF2020” is always excluded.
If no valid entries remain after filtering, a warning is logged and None is returned.
- generate_regional_annotation_stats(da, region, lon_coord_name, lat_coord_name, var_name, *, aice_da=None, u_da=None, v_da=None, area_unit='1e6km2', vol_unit='1e3km3', vel_unit='cm/s', loc_ndp=3)[source]
- get_meridian_center_from_geographic_extent(geographic_extent)[source]
Determine the optimal central meridian for PyGMT polar stereographic projections.
Given a geographic extent in longitude/latitude format, this method calculates the central meridian (longitude) to use in ‘S<lon>/<lat>/<width>’ PyGMT projection strings. It accounts for dateline wrapping and ensures the plot is centered visually.
- Parameters:
geographic_extent (list of float) – Geographic region as [min_lon, max_lon, min_lat, max_lat]. Accepts longitudes in either [-180, 180] or [0, 360] and gracefully handles dateline crossing.
- Returns:
Central meridian (longitude) in the range [-180, 180].
- Return type:
float
Notes
- If the computed center falls outside the intended range (e.g., due to dateline wrapping),
the method rotates the meridian 180° to better align the figure.
The result is saved to self.plot_meridian_center for reuse.
Used in PyGMT stereographic projections like ‘S{lon}/-90/30c’.
See also
-,-
- load_IBCSO_bath()[source]
Load masked IBCSO bathymetry dataset prepared by create_IBCSO_bath().
Returns the bath variable from the NetCDF file specified in self.config[‘pygmt_dict’][“P_IBCSO_bath”].
- Returns:
2D array of bathymetry values (only ocean depths, in meters). Values are negative below sea level, NaN over land.
- Return type:
xarray.DataArray
Notes
This method assumes that create_IBCSO_bath() has already been run.
Designed for use in PyGMT background plotting or masking.
See also
-,-
- load_ice_shelves()[source]
Load and preprocess Antarctic ice shelf polygons for PyGMT plotting.
This method reads a shapefile containing Antarctic coastal geometries, filters for polygons classified as ice shelves (POLY_TYPE == ‘S’), ensures valid geometry, reprojects them to WGS84 (EPSG:4326), and applies a zero-width buffer to clean topology issues.
- Returns:
Cleaned and reprojected geometries of Antarctic ice shelves.
- Return type:
geopandas.GeoSeries
Notes
The input shapefile path is read from self.config[‘pygmt_dict’][‘P_coast_shape’].
This method is typically used to overlay ice shelf boundaries in PyGMT plots.
The returned geometries can be passed directly to pygmt.Figure.plot().
See also
-,-
- pygmt_FIP_diff_four_panel(da_diff, *, sector_names, weight_da=None, fig_size=('24c', '17c'), margins=['0.15c', '0.15c'], basemap_frame='af', G_pt_size='0.08', plot_GI=True, GI_color='#0072B2', GI_size='0.08', P_png=None, show_fig=False, cbar_width='12c', cbar_bframe='n', left_label='model-dominant', mid_label='agreement', right_label='obs-dominant')[source]
- pygmt_FIP_figure(plot_data=None, *, obs=None, mod=None, fig=None, panel=None, add_colorbar=True, return_cmap=False, panel_title=None, show_fig=False, P_png=None, region=(0, 360, -90, -62), projection='S0.0/-90.0/50/?', basemap_frame=('af',), land_color='#666666', water_color='#BABCDE', shoreline_pen='1/0.25p', G_pt_marker='c', G_pt_size='0.05', G_pt_unit='c', mode='auto', color_mode='auto', cmap=None, series=[0, 1], tricolor=True, P_tricolor_cpt=None, tricolor_kwargs=None, P_cat_cpt=None, cat_labels=('Agreement', 'Model-dominant', 'Obs-dominant'), cat_colors=('#FDAE61', '#2CA25F', '#2171B5'), cbar_pos='JBC+w15c/0.45c+mc+h', cbar_frame=('xa1f0.5',), cat_cbar_frame=('+lFIP difference category',), transparency=None, weight_da=None, transparency_from='auto', n_transparency_bins=5, plot_GI=False, GI_color='#E349D0', GI_marker='s', GI_size='0.01', GI_pt_unit='c', plot_bathymetry=False)[source]
expressly for plotting a map of fast ice persistence that is either a difference (model - observations), or a fast ice persistence map of either model or observation data. If a difference map then it can either be a continuous colourbar or a categorical colourbar.
- pygmt_da_prep(da: DataArray, *, bcoords: bool = False, tcoords: bool = True, lon_coord_name: str | None = None, lat_coord_name: str | None = None, region: tuple[float, float, float, float] | None = None, lon_wrap: str = 'auto', extra_mask: DataArray = None, mask_zero: bool | None = None, z_clip: tuple[float, float] | None = None, z_range_mask: tuple[float, float] | None = None, dtype: str = 'float32', infer_coords: bool = True, return_mask: bool = True, return_flat_index: bool = True)[source]
Prepare a 2D DataArray for PyGMT plotting by returning 1D lon/lat/z vectors.
This routine standardises a gridded field into a structure that PyGMT commonly expects for point plotting or scattered gridding. It also constructs masks for:
finite values,
optional masking of structural zeros,
optional clipping/masking by z-range,
optional geographic subsetting by region (dateline-safe),
optional additional user-provided mask.
- Parameters:
da (xr.DataArray) – Input data field. It may contain singleton dimensions; these will be squeezed. After squeeze, the data must be 2D.
bcoords (bool, default False) – Use B-grid coordinates if lon/lat are not directly available on da.
tcoords (bool, default True) – Use T-grid coordinates if lon/lat are not directly available on da.
lon_coord_name (str or None, optional) – Explicit coordinate names to use for longitude and latitude in da. If provided, both must be provided.
lat_coord_name (str or None, optional) – Explicit coordinate names to use for longitude and latitude in da. If provided, both must be provided.
region (tuple[float, float, float, float], optional) – Geographic bounding box (xmin, xmax, ymin, ymax) applied as a mask. This masking is dateline-safe: if xmin > xmax, the region is interpreted as crossing the dateline and uses (lon >= xmin) OR (lon <= xmax).
lon_wrap ({"auto", "0-360", "-180-180"}, default "auto") – Longitude convention to enforce on the resolved lon grid. If “auto” and region is provided, the method will choose a convention consistent with the region bounds; otherwise it leaves longitudes unchanged.
extra_mask (array-like, xr.DataArray, callable, optional) – Additional mask to apply. If callable, it is invoked as extra_mask(da2) and must return a boolean mask with the same shape as the 2D data. If array-like, it is converted to a boolean array and combined with the existing mask.
mask_zero (bool or None, optional) –
- Whether to mask zeros (treat as missing) before flattening.
If None, uses _auto_mask_zero(da2) to decide.
If True, masks values close to zero (|z| <= 1e-8).
If False, preserves zeros.
z_clip (tuple[float, float], optional) – If provided, clip z values into [z_clip[0], z_clip[1]] prior to masking. This is a hard clip (values outside become boundary values).
z_range_mask (tuple[float, float], optional) – If provided, apply an additional mask retaining only values within [lo, hi].
dtype (str, default "float32") – Target dtype used when materialising arrays (especially helpful for memory).
infer_coords (bool, default True) – If True, allow _resolve_lonlat_2d() to infer lon/lat from common coordinate names on da when explicit names are not provided.
return_mask (bool, default True) – If True, include mask2d (boolean 2D mask) in the returned dict.
return_flat_index (bool, default True) – If True, include flat_idx (indices into z2d.ravel() where mask is True) in the returned dict.
- Returns:
- Always includes:
”lon” : 1D np.ndarray of longitudes (masked + flattened)
”lat” : 1D np.ndarray of latitudes (masked + flattened)
”z” : 1D np.ndarray of values (masked + flattened)
”shape” : tuple of original 2D shape for sanity checks
- Optionally includes:
”mask2d” : 2D boolean mask (if return_mask=True)
”flat_idx” : 1D integer indices into z2d.ravel() (if return_flat_index=True)
- Return type:
dict
- Raises:
ValueError – If da cannot be reduced to a 2D field (after squeeze).
ValueError – If coordinate resolution fails or extra_mask has a shape mismatch.
Notes
If da is Dask-backed, the data are computed only once, and coerced to dtype.
If da has dims (“nj”,”ni”) in a different order, the method transposes to (“nj”,”ni”) to keep indexing consistent with AFIM grid conventions.
- pygmt_fastice_panel(fast_ice_variable: str = 'FIA', sim_name: str = None, roll_days: int = 0, fig_width: str = None, fig_height: str = None, ylim: tuple = None, frame_bndy: str = None, yaxis_pri: str = None, xaxis_pri: str = 'a1Of15Dg', leg_pos: str = None, leg_box: str = '+gwhite+p.5p', spat_var_style: str = None, GI_plot_style: str = 'c0.05c', GI_fill_color: str = '#BA561A', plot_GI: bool = False, min_max_trans_val: int = 80, yshift_top: str = None, yshift_bot: str = None, bottom_frame_bndy: str = 'WSne', bottom_yaxis: str = None, bottom_xaxis: str = None, land_clr: str = '#D1DDE0', water_clr: str = '#EDF2F5', coast_pen: str = '1/0.5p,black', cbar_pos: str = None, lon_coord_name: str = None, lat_coord_name: str = None, cmap: str = None, series: list = None, cbar_frame: str = None, ANT_IS_pen: str = '0.2p,black', ANT_IS_color: str = '#C1CED6', font_annot_pri: str = '24p,Times-Roman', font_annot_sec: str = '16p,Times-Roman', font_lab: str = '22p,Times-Bold', line_pen: str = '2p', grid_pen_pri: str = '.5p', grid_pen_sec: str = '.25p', fmt_geo_map: str = 'D:mm', P_png: str = None, save_fig: bool = None, overwrite_fig: bool = None, show_fig: bool = None)[source]
Unified PyGMT fast-ice panel plotter.
- Set fast_ice_variable to:
“FIA” (or “fia”) -> plots FIA time series + FIP maps
“FIT” (or “fit”) -> plots FIT time series + FIHI maps
Everything else (loading, styling, legends, grounded iceberg overlay, etc.) follows the same code path with variable-specific defaults injected via a small configuration dictionary.
The rest of the arguments let you override those defaults if you need to.
- pygmt_map_plot_multi_var_8sectors(das, var_names, sim_name=None, time_stamp=None, tit_str=None, panel_titles=None, plot_GI=False, diff_plots=None, cmaps=None, series_list=None, reverse_list=None, cbar_labels=None, cbar_units_list=None, extend_cbars=False, cbar_positions=None, lon_coord_names=None, lat_coord_names=None, use_bcoords=False, use_tcoords=False, fig_size=None, var_sq_size=0.2, GI_sq_size=0.1, GI_fill_color='red', plot_iceshelves=True, plot_bathymetry=True, land_color=None, water_color=None, P_png=None, var_out=None, overwrite_fig=None, show_fig=None, xshift='w+1c')[source]
8-sector plot with 2–3 panels laid out left-to-right using shift_origin.
Performance: compute PyGMT plot point clouds ONCE per panel (u1/u2/du), then sector-filter the point clouds cheaply (no repeated Dask compute).
- pygmt_map_plot_one_var(da, var_name, aux_ds=None, sim_name=None, plot_regions=None, regional_dict=None, hemisphere='south', time_stamp=None, tit_str=None, plot_GI=False, cmap=None, series=None, reverse=None, cbar_label=None, cbar_units=None, extend_cbar=False, cbar_position=None, lon_name=None, lat_name=None, fig_size=None, var_sq_size=0.2, GI_sq_size=0.1, GI_fill_color='red', plot_iceshelves=True, plot_bathymetry=True, add_stat_annot=False, land_color=None, water_color=None, P_png=None, var_out=None, overwrite_fig=None, show_fig=None)[source]
Generate a PyGMT figure showing a variable (e.g., FIA, FIP, differences) as a spatial map over Antarctic regions or hemispheric view, optionally including grounded icebergs, ice shelf outlines, and bathymetry.
This flexible mapping function supports multiple types of visualizations: - Scalar field plots (e.g., fast ice persistence, concentration) - Binary or categorical masks (e.g., agreement maps, simulation masks) - Difference plots (e.g., observation minus simulation)
- Parameters:
da (xarray.DataArray) – Input 2D (or broadcastable) data array to be plotted, e.g., fast ice persistence.
var_name (str) – Name of the variable to be plotted. Used to look up color map settings and labels.
aux_ds (optional Dataset containing auxiliaries (e.g. aice) aligned with da)
sim_name (str, optional) – Simulation name used for file naming and figure annotation. Defaults to self.sim_name.
plot_regions (int or None, optional) – Number of regional plots: - 8 : Antarctic 8-sector view - 2 : East/West sectors - None : Full hemisphere plot (default)
regional_dict (dict, optional) – Custom dictionary of plotting regions (overrides built-ins if provided).
hemisphere (str, default="south") – Hemisphere name for hemispheric plot. Typically “south”.
time_stamp (str, optional) – Timestamp string for file naming. Defaults to self.dt0_str.
tit_str (str, optional) – Title string to display on the figure.
plot_GI (bool, default=False) – Whether to overlay grounded iceberg locations using load_GI_lon_lats().
cmap (str, optional) – Color map name for continuous scalar plots.
series (list of float, optional) – Min, max, and increment for the colorbar (e.g., [0, 1, 0.1]).
reverse (bool, optional) – Whether to reverse the colormap.
cbar_label (str, optional) – Label to show on the colorbar.
cbar_units (str, optional) – Units for the colorbar (shown on secondary axis).
extend_cbar (bool, default=False) – Whether to add extension arrows to colorbar.
cbar_position (str, optional) – PyGMT-compatible position string for placing the colorbar.
lon_name (str, optional) – Longitude coordinate name in da. Defaults to self.pygmt_dict.
lat_name (str, optional) – Latitude coordinate name in da. Defaults to self.pygmt_dict.
fig_size (float, optional) – Figure width in centimeters for hemispheric plot.
var_sq_size (float, default=0.2) – Marker size (in cm) for main plotted variable (for scatter plotting).
GI_sq_size (float, default=0.1) – Marker size (in cm) for grounded iceberg overlay.
GI_fill_color (str, default="red") – Color to use for grounded iceberg markers.
plot_iceshelves (bool, default=True) – Whether to overlay Antarctic ice shelf outlines.
plot_bathymetry (bool, default=True) – Whether to plot IBCSO bathymetry using shaded relief (grdimage).
add_stat_annot (bool, default=False) – Whether to annotate figure with basic regional statistics.
land_color (str, optional) – Color for land. Defaults to self.pygmt_dict[‘land_color’].
water_color (str, optional) – Color for ocean/water. Defaults to self.pygmt_dict[‘water_color’].
P_png (pathlib.Path, optional) – File path to save the figure. If None, path is generated automatically.
var_out (str, optional) – Output variable name used in file naming. Defaults to var_name.
overwrite_fig (bool, optional) – Whether to overwrite existing figure if it exists.
show_fig (bool, optional) – Whether to display the figure interactively.
- Returns:
The method generates and optionally saves or displays a PyGMT figure.
- Return type:
None
Notes
Supports 8-region, 2-region, or full hemisphere plotting via plot_regions.
For “diff” plots (where ‘diff’ in var_name), colors are assigned categorically (e.g., agreement/simulation/observation).
Bathymetry and ice shelf overlays are loaded from NetCDF and shapefiles respectively.
The method gracefully skips plotting if required coordinates or data are missing.
See also
-,-,-,-,-Examples
>>> self.pygmt_map_plot_one_var(FIP_DA, "FIP", plot_regions=8, show_fig=True)
- pygmt_timeseries(ts_dict, comp_name: str = 'test', primary_key: str = 'FIA', smooth: str | int | None = None, clim_smooth: int | None = None, climatology: bool = False, ylabel: str = None, ylim: tuple = [0, 1000], yaxis_pri: int = None, ytick_pri: int = 100, ytick_sec: int = 50, projection: str = None, fig_width: str = None, fig_height: str = None, xaxis_pri: str = None, xaxis_sec: str = None, frame_bndy: str = 'WS', line_colors: dict[str, str] | None = None, legend_labels: dict[str, str] | None = None, legend_pos: str = None, legend_box: str = '+gwhite+p0.5p', fmt_dt_pri: str = None, fmt_dt_sec: str = None, fmt_dt_map: str = None, fnt_type: str = 'Helvetica', fnt_wght_lab: str = '20p', fnt_wght_ax: str = '18p', line_pen: str = '1p', grid_wght_pri: str = '.25p', grid_wght_sec: str = '.1p', P_png: str = None, time_coord: str = 'time', time_coord_alt: str = 'date', keys2plot: list = None, repeat_keys: list[str] | None = None, repeat_policy: str = 'inside_others', repeat_ref_keys: list[str] | None = None, clip_x_axis: bool = False, zero_line: bool = False, zero_line_level: float = 0.0, zero_line_pen: str = '2p,black', save_fig: bool = None, show_fig: bool = None)[source]
Plot time series of a primary variable (e.g., FIA) for a set of simulations or observations.
- Parameters:
ts_dict (dict) – Dictionary of xarray DataArrays keyed by simulation or dataset name.
comp_name (str) – Name for the comparison (used in figure title and filename).
primary_key (str) – Key used to extract the variable from each dataset (except ‘AF2020’).
climatology (bool) – If True, plot daily climatology with fill and mean lines.
ylim (tuple or None) – Y-axis limits. If None, inferred from data with 5% padding.
ylabel (str) – Label for the Y-axis.
ytick_inc (int) – Interval between Y-axis ticks.
xaxis_pri (str) – GMT frame settings for primary and secondary axes (when climatology is False).
xaxis_sec (str) – GMT frame settings for primary and secondary axes (when climatology is False).
P_png (str or None) – Optional full path to save figure. If None, default filename is constructed.
legend_box (str) – GMT legend box styling.
line_pen (str) – Line thickness for plotted time series.
time_coord (str) – Name of time coordinate in each DataArray.
keys2plot (list or None) – If provided, only datasets with keys in this list are plotted.
show_fig (bool or None) – If True, show figure interactively. Defaults to self.show_fig.
repeat_keys (list[str] or None) – Keys in ts_dict to plot as a repeated day-of-year climatology when climatology=False.
repeat_policy ({"outside_others","fill_gaps","always"}) –
“outside_others”: keep original values where other series exist in time,
but replace values outside that union with day-of-year climatology of the nominated series (good for fair comparison). - “fill_gaps”: use climatology only where the nominated series has no data, keep original where it does. - “always”: ignore original values and plot the repeated climatology across the full x-range.
repeat_ref_keys (list[str] or None) – If provided, these keys define the “others” time span for the “outside_others” policy. By default, it’s all plotted series except the current one.
- tricolor_cpt_and_alpha(P_cpt, obs, mod, *, vmin: float = -1.0, vmid: float = 0.0, vmax: float = 1.0, cmin: str = '#2CA25F', cmid: str = '#FDAE61', cmax: str = '#2171B5', diff_mode: str = 'obs-mod', alpha_mode: str = 'max', alpha_power: float = 1.0, alpha_floor: float = 0.0, nan_transparency: float = 100.0)[source]
Build a tricolour CPT plus a transparency (alpha) layer from obs/model fields.
- This helper is designed for “difference + confidence” map products where:
colour encodes the signed difference (obs − model or model − obs), clipped into a fixed range [vmin, vmax], and
transparency encodes the strength/credibility of the signal (e.g., based on persistence or agreement magnitude), computed from the obs/model magnitudes.
- The function returns:
cpt_path: path to the written tricolour CPT
z: the clipped difference array
t: transparency percent array (0 = opaque, 100 = fully transparent)
- Parameters:
P_cpt (str or pathlib.Path) – Output path for the CPT file written by write_tricolour_cpt().
obs (array-like) – Observed and modelled fields on a common grid. These are converted to numpy.ndarray(dtype=float) internally and must be broadcast-compatible.
mod (array-like) – Observed and modelled fields on a common grid. These are converted to numpy.ndarray(dtype=float) internally and must be broadcast-compatible.
vmin (float, optional) – Colour scale limits and midpoint. The returned difference z is clipped into [vmin, vmax].
vmid (float, optional) – Colour scale limits and midpoint. The returned difference z is clipped into [vmin, vmax].
vmax (float, optional) – Colour scale limits and midpoint. The returned difference z is clipped into [vmin, vmax].
cmin (str, optional) – Hex colours for the CPT anchor points at vmin/vmid/vmax.
cmid (str, optional) – Hex colours for the CPT anchor points at vmin/vmid/vmax.
cmax (str, optional) – Hex colours for the CPT anchor points at vmin/vmid/vmax.
diff_mode ({"obs-mod", "mod-obs"}, optional) –
- Sign convention for the difference:
”obs-mod”: z = obs - mod
”mod-obs”: z = mod - obs
Choose a convention consistent with your map interpretation (e.g. green as model-dominant vs obs-dominant).
alpha_mode ({"max", "mean"}, optional) –
- How to compute the “alpha strength” a from obs and mod:
”max”: a = max(obs, mod)
”mean”: a = 0.5 * (obs + mod)
In both cases, a is clipped to [0, 1].
alpha_power (float, optional) – Exponent applied to a after clipping. Values > 1 sharpen opacity toward high-confidence regions; values < 1 broaden opacity.
alpha_floor (float, optional) –
- Minimum opacity floor expressed in alpha-space. When > 0, rescales:
a <- alpha_floor + (1 - alpha_floor) * a
This prevents regions from becoming too transparent.
nan_transparency (float, optional) – Transparency percentage used where either z or t becomes non-finite (NaN/inf). Default 100 = fully transparent.
- Returns:
- cpt_pathstr
Path to the written CPT file.
- znp.ndarray
Difference field, clipped to [vmin, vmax].
- tnp.ndarray
Transparency percent in [0, 100], where 0 is opaque.
- Return type:
(str, np.ndarray, np.ndarray)
- Raises:
ValueError – If diff_mode is not one of {“obs-mod”, “mod-obs”}.
ValueError – If alpha_mode is not one of {“max”, “mean”}.
Notes
Transparency is computed as: t = (1 - a) * 100, so larger a yields more opaque pixels (smaller transparency percentage).
This method does not attempt to regrid or align obs/model; inputs must already be on a common grid and comparable.
- write_tricolor_category_cpt(P_cpt, *, int0=(0.0, 1.0), int1=(1.0, 2.0), int2=(2.0, 3.0), hex0='#FDAE61', hex1='#2CA25F', hex2='#2171B5', background_rgb='255/255/255', foreground_rgb='0/0/0', nan_rgb='255/255/255') str[source]
Create and write a 3-class categorical CPT file.
- This is intended for integer-coded classification maps, commonly:
0 : agreement
1 : model-dominant
2 : observation-dominant
Each class is given a constant colour across its interval. The default colour scheme matches the continuous tricolour palette:
agreement: orange
model-dominant: green
obs-dominant: blue
- Parameters:
P_cpt (str or pathlib.Path) – Target output path for the categorical CPT file. Parent directories are created if needed.
int0 (tuple[float, float], optional) – Numeric intervals for categories 0, 1, and 2 respectively. Defaults are (0,1), (1,2), (2,3), which works well with integer-coded rasters.
int1 (tuple[float, float], optional) – Numeric intervals for categories 0, 1, and 2 respectively. Defaults are (0,1), (1,2), (2,3), which works well with integer-coded rasters.
int2 (tuple[float, float], optional) – Numeric intervals for categories 0, 1, and 2 respectively. Defaults are (0,1), (1,2), (2,3), which works well with integer-coded rasters.
hex0 (str, optional) – Hex colours for category 0, 1, and 2.
hex1 (str, optional) – Hex colours for category 0, 1, and 2.
hex2 (str, optional) – Hex colours for category 0, 1, and 2.
background_rgb (str, optional) – GMT background (B record) colour as “R/G/B”.
foreground_rgb (str, optional) – GMT foreground (F record) colour as “R/G/B”.
nan_rgb (str, optional) – GMT NaN (N record) colour as “R/G/B”.
- Returns:
String path to the written CPT file.
- Return type:
str
Notes
The CPT uses constant colours per interval, which is typically what you want for categorical class rasters.
This function assumes self._hex_to_rgb() returns integer RGB triplets.
- write_tricolour_cpt(P_cpt, *, vmin: float = -1.0, vmid: float = 0.0, vmax: float = 1.0, cmin: str = '#2CA25F', cmid: str = '#FDAE61', cmax: str = '#2171B5', background_rgb: str = '255/255/255', foreground_rgb: str = '0/0/0', nan_rgb: str = '255/255/255') str[source]
Create and write a 3-colour continuous CPT file for difference fields.
This writes a simple two-segment CPT suitable for continuous data where values below/above a midpoint should transition through three anchor colours:
[vmin, vmid] mapped cmin → cmid
[vmid, vmax] mapped cmid → cmax
The output is a GMT CPT text file, usable directly by PyGMT (e.g. cmap=P_cpt).
- Parameters:
P_cpt (str or pathlib.Path) – Target output path for the CPT file. Parent directories are created if needed.
vmin (float, optional) – Data range breakpoints. vmid defines the central colour transition point.
vmid (float, optional) – Data range breakpoints. vmid defines the central colour transition point.
vmax (float, optional) – Data range breakpoints. vmid defines the central colour transition point.
cmin (str, optional) – Hex colours used at vmin, vmid, and vmax respectively. Example: “#2CA25F”.
cmid (str, optional) – Hex colours used at vmin, vmid, and vmax respectively. Example: “#2CA25F”.
cmax (str, optional) – Hex colours used at vmin, vmid, and vmax respectively. Example: “#2CA25F”.
background_rgb (str, optional) – GMT background colour specification used for values below vmin (B record), formatted as “R/G/B”.
foreground_rgb (str, optional) – GMT foreground colour specification used for values above vmax (F record), formatted as “R/G/B”.
nan_rgb (str, optional) – GMT NaN colour specification (N record), formatted as “R/G/B”.
- Returns:
String path to the written CPT file.
- Return type:
str
Notes
This function assumes self._hex_to_rgb() returns (r, g, b) integers in [0, 255].
- GMT expects CPT lines of the form:
z0 r/g/b z1 r/g/b
plus optional B/F/N records for background/foreground/NaN.
sea_ice_regridder
- class sea_ice_regridder.SeaIceRegridder(**kwargs)[source]
Bases:
objectRegridding and geometric utilities for Antarctic sea-ice analysis workflows.
This class provides two broad groups of functionality:
CICE grid-to-grid remapping (B-grid → T-grid) - xESMF-based remapping using persistent weight files (recommended when you need
a consistent, reproducible mapping operator).
a lightweight, Dask-safe 2×2 corner averaging operator as a fast alternative when an exact xESMF regrid is unnecessary.
Swath-to-grid remapping and EPSG:3031 helpers - projection of lat/lon swaths to Antarctic polar stereographic (EPSG:3031), - extent unioning/snap-to-grid utilities for building a common analysis grid, - nearest-neighbour resampling of swath data onto an AreaDefinition grid, - convenience functions for adding lon/lat coordinates to EPSG:3031 grids, - geographic subsetting of curvilinear model grids using lon/lat masks that
handle dateline/seam crossing.
Expected external configuration
This class is designed to sit inside the broader AFIM stack. The following attributes/methods are expected to exist on self (typically injected by kwargs or provided by a base class/toolbox manager):
Attributes - logger : logging.Logger
For progress/debug logging.
- CICE_dictdict
- Must contain keys used by the B→T routines, typically:
“x_dim”, “y_dim”
“x_dim_length”, “y_dim_length”
“bcoord_names” (e.g., [“ULON”,”ULAT”])
“P_reG_u2t_weights” (path to xESMF weight file)
- G_u, G_tdict-like or xr.Dataset
CICE source and target grid definitions used by xESMF.
- kmt_orgxr.Dataset or dict-like
Must include “kmt_org” used as the target grid mask for xESMF weight generation.
Methods - load_cice_grid(…)
Must populate self.G_u and self.G_t (and typically self.kmt_org).
- define_reG_weights()
Creates self.reG (xESMF regridder) and sets self.reG_weights_defined.
- normalise_longitudes(lon, wrap=…)
Used by EPSG:3031 and geographic subsetting utilities.
Notes
xESMF weight generation is usually expensive; this class is structured to reuse persistent weight files where possible.
Longitudes are explicitly wrapped before building swath definitions and before geographic masking to avoid seam artefacts.
- add_lonlat_from_epsg3031(ds, x_name='x', y_name='y', wrap='0..360', out_dtype='float32')[source]
Add 2D lon/lat coordinate fields to a dataset defined on an EPSG:3031 grid.
This function broadcasts the 1D projected x/y coordinates to 2D, converts them to lon/lat using EPSG:3031 → EPSG:4326, wraps longitudes to the requested convention, and attaches the results as dataset coordinates.
- Parameters:
ds (xr.Dataset) – Dataset with 1D projected coordinate dimensions x_name and y_name.
x_name (str, default ("x","y")) – Names of the projected coordinate dimensions in ds.
y_name (str, default ("x","y")) – Names of the projected coordinate dimensions in ds.
wrap ({"0..360","-180..180"}, default "0..360") – Longitude wrapping convention for the output coordinate.
out_dtype (str, default "float32") – Output dtype for lon/lat coordinates. Use None to keep float64.
- Returns:
- Dataset with added coordinates:
lon(y, x)
lat(y, x)
- Return type:
xr.Dataset
- Raises:
ValueError – If x_name or y_name are not present as dataset dimensions.
Notes
Uses xr.apply_ufunc(…, dask=”parallelized”) so it can operate on Dask-backed x/y coordinates without materialising large intermediate arrays.
- define_reG_regular_weights(da, G_res=0.15, region=[0, 360, -90, 0], AF2020=False, variable_name=None, lon_coord_name=None, lat_coord_name=None, time_coord_name=None, spatial_dim_names=None, reG_method='bilinear', periodic=False, reuse_weights=True, P_weights=None)[source]
Define an xESMF regridder from a curvilinear source grid to a regular lat/lon grid.
- Parameters:
da (xr.Dataset or xr.DataArray) – Source object from which lat/lon coordinates are extracted.
G_res (float, default 0.15) – Regular grid resolution (degrees).
region (list[float], default [0,360,-90,0]) – Regular grid bounding box [lon_min, lon_max, lat_min, lat_max].
AF2020 (bool, default False) – If True, use AF2020 naming and default weights path; else CICE/model.
reG_method (str, default "bilinear") – xESMF method (e.g., “bilinear”, “nearest_s2d”, “conservative”).
periodic (bool, default False) – Set True for global periodic longitude grids.
reuse_weights (bool, default True) – If True, reuse an existing weights file if present.
P_weights (str or Path, optional) – Override weights filename. Defaults to AF2020/CICE configured weights path.
- Returns:
Configured regridder from source to regular grid.
- Return type:
xesmf.Regridder
Notes
Source/target grids are constructed from 2D lat/lon arrays.
Conservative regridding requires grid corner information; this helper builds only center coordinates unless you extend it.
- define_reG_weights()[source]
Define and store an xESMF regridder to remap CICE B-grid (U-point) data to the T-grid.
This method constructs/loads the source (U-point) and target (T-point) CICE grids, attaches a land/sea mask to the target grid, and then instantiates an xESMF Regridder object. If a weight file already exists, it is reused; otherwise, new weights are generated and written to disk.
Regridding configuration
Source grid : self.G_u (B-grid / U-point style grid definition)
Target grid : self.G_t (T-grid / cell-center grid definition)
Target mask : self.kmt_org[“kmt_org”] stored as G_t[“mask”]
Method : “bilinear”
periodic : True (assumes global periodicity in longitude)
ignore_degenerate : True (skip degenerate cells)
extrap_method : “nearest_s2d” (nearest-neighbour extrapolation source→dest)
Weight file : self.CICE_dict[“P_reG_u2t_weights”]
reuse_weights: True if the file exists, else False (weights created)
Side effects
Sets self.reG to the instantiated xESMF regridder.
Sets self.reG_weights_defined = True upon success.
- rtype:
None
- raises KeyError:
If required keys are missing from self.CICE_dict (e.g., weight path).
- raises AttributeError:
If load_cice_grid() does not populate self.G_u, self.G_t, or self.kmt_org.
- raises ImportError:
If xesmf is not available in the runtime environment.
- raises Exception:
Propagates xESMF errors arising from grid definitions or weight I/O.
Notes
The grid definitions must include valid “lon”/”lat” (and ideally corner) information for the selected method.
For conservative methods you would typically require corner arrays; for bilinear, centers are sufficient, but corner metadata may still improve robustness.
- grid_coords_from_area(area_def, pixel_size=5000)[source]
Construct 1D x/y coordinate arrays (cell centers) from a PyResample AreaDefinition.
- Parameters:
area_def (pyresample.geometry.AreaDefinition) – Target grid definition in EPSG:3031.
pixel_size (float, default 5000) – Grid resolution (meters).
- Returns:
(x, y) – 1D arrays of cell-center coordinates (meters). x increases eastward. y decreases from top to bottom (north → south) consistent with the area extent definition.
- Return type:
tuple[np.ndarray, np.ndarray]
- make_area_definition(extent, pixel_size=5000, area_id='epsg3031_5km_union')[source]
Create a PyResample AreaDefinition for a regular EPSG:3031 grid.
- Parameters:
extent (list[float]) – [xmin, ymin, xmax, ymax] in EPSG:3031 meters. Typically produced by to_3031_extent(), union_extents(), and snap_extent_to_grid().
pixel_size (float, default 5000) – Grid resolution (meters).
area_id (str, default "epsg3031_5km_union") – Identifier string for the AreaDefinition.
- Returns:
Regular projected grid definition suitable for PyResample resampling.
- Return type:
pyresample.geometry.AreaDefinition
Notes
Width/height are computed from the extent and rounded to integer pixel counts.
The returned extent is adjusted so that xmax/ymax align exactly with width/height.
- pygmt_regrid(da, lon, lat, grid_res=None, region=None, search_radius='200k')[source]
Regrid a 2D data array using PyGMT’s nearneighbor interpolation.
This method applies PyGMT’s nearneighbor algorithm to interpolate scattered data values (da) onto a regular grid based on specified longitude and latitude arrays. The input is masked to ignore NaNs or non-finite values.
- Parameters:
da (xarray.DataArray) – 2D array of data values to interpolate (e.g., sea ice thickness).
lon (xarray.DataArray or np.ndarray) – Longitude values corresponding to da, same shape.
lat (xarray.DataArray or np.ndarray) – Latitude values corresponding to da, same shape.
grid_res (str or float, optional) – Grid spacing for the output grid (e.g., “0.5”, “10m”). Required by PyGMT.
region (list or tuple, optional) – Bounding box for the output grid in the form [west, east, south, north].
search_radius (str or float, default "200k") – Search radius for PyGMT’s nearneighbor (e.g., “100k” for 100 km).
- Returns:
gridded – Gridded output with interpolated values over the defined region.
- Return type:
xarray.DataArray
Notes
All non-finite values in da are excluded prior to interpolation.
PyGMT must be properly installed and configured with GMT for this to work.
- reG_bgrid_to_tgrid_xesmf(da, coord_names=None)[source]
Regrid a B-grid DataArray to the T-grid using the pre-defined xESMF regridder.
The input DataArray must carry longitude/latitude coordinate variables. If coord_names is not provided, this method uses the configured coordinate names in self.CICE_dict[“bcoord_names”] (commonly [“ULON”,”ULAT”]).
- Parameters:
da (xr.DataArray) – B-grid variable to regrid. Must include coordinate variables corresponding to longitude and latitude (either 1D or 2D, depending on your grid definition).
coord_names (list[str], optional) – Names of the coordinates on da that represent lon/lat. If omitted, defaults to self.CICE_dict[“bcoord_names”].
- Returns:
Regridded DataArray on the T-grid. Returns None if required coordinates are missing or if regridding fails.
- Return type:
xr.DataArray or None
- Raises:
RuntimeError – If self.reG is not defined. Call _ensure_reG_defined() or define_reG_weights() first (or ensure your workflow does so).
Notes
The method renames the provided coordinate variables to the xESMF-conventional names “lon” and “lat” before applying self.reG(…).
Error handling is intentionally conservative: failures are logged and None is returned to allow calling workflows to skip problematic variables gracefully.
- resample_swath_to_area(src_da, lat2d, lon2d, area_def, radius=10000, fill_value=nan, pixel_size=5000)[source]
Nearest-neighbour resample a 2D swath to an EPSG:3031 AreaDefinition grid.
- Parameters:
src_da (xr.DataArray) – 2D swath data array (e.g., satellite swath variable). Must be aligned with lat2d/lon2d in shape.
lat2d (array-like) – 2D latitude/longitude arrays (degrees) describing the swath geolocation.
lon2d (array-like) – 2D latitude/longitude arrays (degrees) describing the swath geolocation.
area_def (pyresample.geometry.AreaDefinition) – Target grid definition (EPSG:3031).
radius (float, default 10000) – Radius of influence in meters used for nearest-neighbour resampling.
fill_value (scalar, default np.nan) – Fill value assigned where no source points fall within radius.
pixel_size (float, default 5000) – Target grid resolution used to construct the output x/y coordinates.
- Returns:
Resampled 2D field on the target grid with dims (“y”,”x”) and coordinates “x” and “y” in meters, plus metadata indicating EPSG:3031.
- Return type:
xr.DataArray
Notes
Longitudes are wrapped to [-180, 180) before constructing the SwathDefinition. This is critical to avoid dateline discontinuities in the KD-tree search.
nprocs=0 uses serial execution; set >0 to parallelise if appropriate.
- simple_spatial_averaging_bgrid_to_tgrid(var)[source]
Compute a Dask-safe 2×2 unweighted average from B-grid corner points to T-grid centers.
This provides a lightweight alternative to xESMF remapping when a simple local averaging is sufficient. The operation is performed by slicing four corner-shifted views (v00, v01, v10, v11), averaging them, padding back to the original grid size, and applying a cyclic wrap to the last x column (last equals first).
- Parameters:
var (xr.DataArray) – Input array on the B-grid. May be 2D (nj, ni) or 3D (time, nj, ni) depending on your configuration. Dimension names are taken from self.CICE_dict[“y_dim”] and self.CICE_dict[“x_dim”].
- Returns:
Averaged field on the T-grid with the same nominal shape as the configured target sizes (y_dim_length, x_dim_length). NaNs are used where padding is required.
- Return type:
xr.DataArray
- Raises:
KeyError – If required dimension names/lengths are missing from self.CICE_dict.
AssertionError – If the final array shape does not match the configured target sizes.
Notes
This is not a conservative remap; it is a local arithmetic average.
The last x column is set equal to the first to maintain periodicity.
Indexes for spatial dims are dropped to avoid downstream surprises when mixing integer and coordinate-based selection.
- snap_extent_to_grid(extent, pixel_size)[source]
Snap an extent to a regular grid defined by pixel_size.
- The snapped extent is expanded (never shrunk) such that:
xmin, ymin are floored to the nearest multiple of pixel_size
xmax, ymax are ceiled to the nearest multiple of pixel_size
- Parameters:
extent (list[float]) – [xmin, ymin, xmax, ymax] in meters.
pixel_size (float) – Grid resolution (meters).
- Returns:
Snapped extent [xmin, ymin, xmax, ymax] in meters.
- Return type:
list[float]
- subset_by_lonlat_box(da: DataArray, lon_range, lat_range, lon_name='TLON', lat_name='TLAT', jdim='nj', idim='ni', wrap='-180-180', crop=True)[source]
Subset a curvilinear grid DataArray by a geographic lon/lat bounding box.
This method constructs a boolean mask from 2D lon/lat coordinates and applies it to da using .where(mask). It supports bounding boxes that cross the longitude seam/dateline by interpreting a range where lon_min > lon_max as a wrapped interval.
- Parameters:
da (xr.DataArray) – DataArray on a curvilinear grid, typically with dims (time, nj, ni) or (nj, ni). Must include 2D coordinate fields lon_name and lat_name.
lon_range (tuple[float, float]) – (lon_min, lon_max) in degrees. If the range crosses the seam, use a wrapped specification (e.g., (350, 20) for 0..360 convention).
lat_range (tuple[float, float]) – (lat_min, lat_max) in degrees.
lon_name (str, default ("TLON","TLAT")) – Names of 2D longitude/latitude coordinate variables in da.coords.
lat_name (str, default ("TLON","TLAT")) – Names of 2D longitude/latitude coordinate variables in da.coords.
jdim (str, default ("nj","ni")) – Names of the two spatial dimensions.
idim (str, default ("nj","ni")) – Names of the two spatial dimensions.
wrap ({"0-360","-180-180"}, default "-180-180") – Longitude wrap convention to apply prior to masking.
crop (bool, default True) – If True, crops the output to the minimal bounding index box that contains at least one valid cell (reduces storage/plotting cost). Cropping is done by finding any-True rows/columns in the 2D mask.
- Returns:
Subsetted DataArray. Values outside the lon/lat box are masked (NaN). If crop=True, spatial dims are also index-cropped to a tight bounding box.
- Return type:
xr.DataArray
Notes
The masking step (da.where(mask)) is Dask-friendly.
The cropping step currently uses .values on 1D any-masks; for very large grids this may trigger compute. In typical Antarctic subsetting use-cases, this is acceptable because the mask reduction is cheap relative to the full field.
Requires self.normalise_longitudes(…) to exist and to accept the wrap argument.
- to_3031_extent(lat2d, lon2d, buffer_m=20000)[source]
Project a swath’s latitude/longitude coordinates to EPSG:3031 and return an extent.
This helper converts 2D lat/lon arrays to Antarctic polar stereographic coordinates (EPSG:3031), then returns an axis-aligned bounding box:
[xmin, ymin, xmax, ymax]
expanded by an optional buffer (meters).
- Parameters:
lat2d (array-like) – 2D arrays of latitude and longitude (degrees). Must be broadcast-compatible.
lon2d (array-like) – 2D arrays of latitude and longitude (degrees). Must be broadcast-compatible.
buffer_m (float, default 20000) – Buffer added to each side of the extent (meters).
- Returns:
Extent [xmin, ymin, xmax, ymax] in EPSG:3031 meters (including buffer).
- Return type:
list[float]
Notes
Longitudes are wrapped to [-180, 180) prior to projection to avoid dateline issues.
Non-finite projected points are excluded when computing min/max.
sea_ice_icebergs
- class sea_ice_icebergs.SeaIceIcebergs(**kwargs)[source]
Bases:
object- check_GI_coverage(da, varname='GI_counts', lon_name='lon', lat_name='lat')[source]
Assess grounded iceberg (GI) or landmask coverage across Antarctica.
- Parameters:
da (xr.DataArray) – The GI or KMT data.
varname (str) – Determines masking logic: ‘GI_counts’ uses >0, anything else uses >1.
lon_name (str, default 'lon') – Name of the longitude coordinate.
lat_name (str, default 'lat') – Name of the latitude coordinate.
- Returns:
report – Dictionary of summary statistics and Antarctic sector coverage.
- Return type:
dict
- compute_grounded_iceberg_area(region=None, scale=1000000.0)[source]
Returns: - dict of km^2 by region if region is provided - total GI area (m^2) if region is None (or km^2 if you divide by scale)
- load_existing_thinned()[source]
Load previously saved thinned grounded iceberg data and modified KMT file.
Sets the attributes self.GI_thin_da, self.GI_thin_mask, and self.GI_KMT if their respective NetCDF files exist.
- modify_landmask_with_grounded_icebergs(p_min=0.1, p_max=0.9, scaling_exponent=1.5)[source]
Apply probabilistic thinning to the grounded iceberg mask based on iceberg counts.
Parameters:
- p_minfloat
Minimum probability of retaining a cell with low counts.
- p_maxfloat
Maximum probability of retaining a cell with high counts.
- scaling_exponentfloat
Controls steepness of probability scaling with count.
Sets:
self.GI_thin_da : xarray.DataArray self.GI_thin_mask : ndarray (boolean) self.GI_thin_fact : float (fraction thinned out)
- process_raw_grounded_iceberg_database(GI_P_raw=None)[source]
Load and spatially map raw grounded iceberg locations from CSV to the model grid.
Uses a KD-tree nearest-neighbor search to assign iceberg coordinates to grid cells. Sets self.GI_clean with attached (nj, ni) indices.
- regrid_gi_to_fip_bool(gi_mask_da: DataArray, gi_lon2d: DataArray, gi_lat2d: DataArray, fip_lon2d: DataArray, fip_lat2d: DataArray, dilate_pixels: int = 1)[source]
Regrids a boolean GI mask from the model grid onto the FIP grid via nearest-neighbour on great-circle distance (xesmf ‘nearest_s2d’). Returns a boolean mask on the FIP grid. Optionally dilates by N pixels on the target grid to include coast-adjacent cells.
sea_ice_ACCESS
- class sea_ice_ACCESS.AccessOM3Run(run_dir: str | Path)[source]
Bases:
objectAdapter for ACCESS-OM3 Payu/NUOPC output structure -> AFIM-consumable paths. Works without copying data (symlinks recommended).
- class sea_ice_ACCESS.SeaIceACCESS(**kwargs)[source]
Bases:
objectInterface to the ACCESS-NRI Intake catalog for loading ACCESS-OM* CICE daily output and writing it into an AFIM-friendly monthly Zarr layout.
- This class is intended to:
Query the ACCESS-NRI Intake catalog for available CICE variables for a given ACCESS experiment.
Load a daily dataset (1day frequency) for a selected set of variables and time window.
Standardise basic indexing/coordinates (e.g., ensure nj/ni coordinates exist) so downstream merging/processing is reliable.
Persist the daily time series to monthly Zarr groups (one directory per month).
Notes
The Intake catalog must be installed and the ACCESS-NRI catalog must be available as intake.cat.access_nri.
This class assumes CICE daily output is labelled as a daily mean associated with the following day (a common convention); write_ACCESS_to_monthly_zarr() shifts timestamps back by 1 day to align with the intended averaging period.
Dask is optional, but recommended for performance. If self.client is not set, operations may still work but can be slower and may run locally.
Expected Attributes
- loggerlogging.Logger
Logger used for status and warnings.
- clientdask.distributed.Client or None
Dask client (optional). If None, a warning is emitted at initialisation.
- AOM2_dictdict
Configuration dictionary expected to contain at least experiment (str) when experiment is not passed explicitly to methods.
- dt0_str, dtN_strstr
Default ISO date strings (e.g. “1993-01-01”) used when method arguments are None.
- cice_vars_reqdlist[str]
Required CICE variable names you want to extract from the loaded dataset.
- cice_var_listlist[str]
Variable names used to query the catalog. This can match cice_vars_reqd, or be a broader list (e.g., including coordinates/auxiliary vars).
- D_zarrstr or pathlib.Path
Output directory where monthly Zarr groups are written.
- raises ValueError:
If self.cice_vars_reqd is not defined at initialisation.
- list_access_cice_variables(experiment=None)[source]
List available daily CICE variables in the ACCESS-NRI Intake catalog.
This method queries the ACCESS-NRI catalog for the specified experiment and returns the unique set of variable names found at daily frequency (frequency=”1day”).
- Parameters:
experiment (str, optional) – ACCESS experiment identifier (e.g., “access-om2-01-iaf”). If None, the method falls back to self.AOM2_dict[‘experiment’].
- Returns:
Sorted list of unique CICE variable names available for the experiment at 1-day frequency. Returns an empty list if no matching results are found.
- Return type:
list[str]
- Raises:
ValueError – If experiment is not provided and self.AOM2_dict[‘experiment’] is missing.
RuntimeError – If the ACCESS-NRI catalog is not available as intake.cat.access_nri.
Notes
The ACCESS-NRI Intake catalog must be installed and configured (i.e., the catalog is discoverable under intake.cat).
Catalog entries sometimes store variable names as a list per asset. This method flattens those lists into a unique set.
- load_ACCESS_OM_CICE(experiment=None, dt0_str=None, dtN_str=None)[source]
Load daily CICE output from the ACCESS-NRI Intake catalog as an xarray.Dataset.
The method searches the ACCESS-NRI catalog for a daily (frequency=”1day”) CICE dataset matching self.cice_var_list, opens it via to_dataset_dict(), extracts the variables listed in self.cice_vars_reqd (if present), and subsets the time dimension to the requested window.
- Parameters:
experiment (str, optional) – ACCESS experiment identifier. If None, defaults to self.AOM2_dict[‘experiment’].
dt0_str (str, optional) – Inclusive start date string (ISO-like), e.g. “1993-01-01”. If None, defaults to self.dt0_str.
dtN_str (str, optional) – Inclusive end date string (ISO-like), e.g. “1993-12-31”. If None, defaults to self.dtN_str.
- Returns:
Dataset containing the extracted variables on the time subset. The dataset is merged from individual variables to ensure consistent coordinate labelling.
- Return type:
xr.Dataset
- Raises:
RuntimeError – If the ACCESS-NRI catalog is not available as intake.cat.access_nri.
KeyError – If experiment is None and self.AOM2_dict[‘experiment’] is missing.
ValueError – If the catalog search returns zero or multiple daily datasets (expects exactly one).
- Warns:
UserWarning (via logger) – If any of self.cice_vars_reqd are missing from the loaded dataset.
Notes
Uses use_cftime=True for robust non-standard calendars.
Adds explicit nj and ni coordinates (integer labels) to each extracted variable to avoid merge failures or downstream ambiguity when datasets lack labelled indices.
The catalog search uses self.cice_var_list; variables actually returned are then filtered to self.cice_vars_reqd (if present).
- write_ACCESS_to_monthly_zarr(DS, overwrite=False)[source]
Write a daily ACCESS-CICE Dataset into monthly Zarr groups.
- The input dataset is assumed to have a daily time axis. The method:
Creates/ensures the output directory exists (self.D_zarr).
Shifts timestamps back by one day to correct the common “daily mean labelled by the following day” convention in some CICE outputs.
Splits the dataset into monthly subsets (YYYY-MM).
Writes each month to iceh_YYYY-MM.zarr under self.D_zarr.
- Parameters:
DS (xr.Dataset) – Daily CICE dataset with a time dimension. Expected to also have spatial dimensions nj and ni.
overwrite (bool, default False) – If True, existing monthly Zarr directories are overwritten. If False, existing outputs are skipped.
- Returns:
None – This method writes Zarr stores to disk and does not return a value.
Side Effects
————
Writes monthly Zarr groups to – <self.D_zarr>/iceh_YYYY-MM.zarr
- Raises:
OSError – If the output directory cannot be created.
Exception – Any exception raised by xarray.Dataset.to_zarr() will be logged and propagated unless caught (current implementation logs errors per month).
Notes
Time shift: time = time - 1 day. If your upstream data is already correctly labelled, remove or disable this adjustment.
Chunking: this method applies a default chunking {“time”: -1, “nj”: 540, “ni”: 1440}. Adjust to match your grid shape and storage/performance requirements.