climakitae.util package

Submodules#

climakitae.util.cluster module#

class climakitae.util.cluster.Cluster(address=None, proxy_address=None, public_address=None, auth=None, cluster_options=None, shutdown_on_close=True, asynchronous=False, loop=None, **kwargs)#

Bases: GatewayCluster

A dask-gateway cluster allowing one cluster per user. Instead of always creating new clusters, connect to a previously running user cluster, and attempt to limit users to a single cluster.

get_client(set_as_default=True) Client#

Get a dask client connected to the cluster.

Examples

>>> from climakitae.util.cluster import Cluster
>>> cluster = Cluster() # Create cluster
>>> cluster.adapt(minimum=0, maximum=8) # Specify the number of workers to use
>>> client = cluster.get_client()
>>> cluster  # Output cluster information

extra_packages = ['git+https://github.com/cal-adapt/climakitae.git']#

get_client(set_as_default: bool = True) Client#

Get client

Returns:

distributed.client.Client

climakitae.util.colormap module#

climakitae.util.colormap.read_ae_colormap(cmap: str = 'ae_orange', cmap_hex: bool = False) LinearSegmentedColormap | list#

Read in AE colormap by name

Parameters:
  • cmap (str) – one of [“ae_orange”, “ae_diverging”, “ae_blue”, “ae_diverging_r”, “categorical_cb”]

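  • cmap_hex (boolean) – if True, return the colors as a list of hex strings; if False, return a colormap object

Returns:

LinearSegmentedColormap | list – the requested colormap, or its colors as a list when cmap_hex is True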

climakitae.util.generate_gwl_tables module#

Utility for generating warming level reference data in ../data/.

To run, type "python generate_gwl_tables.py" at the command line and wait for the printed model outputs showing progress.

class climakitae.util.generate_gwl_tables.GWLGenerator(df: DataFrame, catalog_cesm: esm_datastore, sims_on_aws: dict = None)#

Bases: object

Class for generating Global Warming Level (GWL) reference data. Encapsulates the parameters and methods needed for GWL calculations.

df#

DataFrame containing metadata for CMIP6 simulations

Type:

pandas.DataFrame

sims_on_aws#

DataFrame listing available simulations on AWS

Type:

pandas.DataFrame

fs#

S3 file system object for accessing AWS data

Type:

s3fs.S3FileSystem

ens_mem_cesm#

List of realizations for CESM2

Type:

dict

cesm2_lens#

CESM2 LENS data loaded from catalog

Type:

Dataset

set_cesm2_lens()#

Pull subset of CESM2 model data.

get_sims_on_aws() pandas.DataFrame#

Generates a DataFrame listing all relevant CMIP6 simulations available on AWS.

build_timeseries(model_config: dict) xarray.Dataset#

Builds an xarray Dataset with a time dimension, containing the concatenated historical and SSP time series for all specified scenarios of a given model and ensemble member.

buildDFtimeSeries_cesm2(model_config: dict) xarray.Dataset#

Builds a global temperature time series for the CESM2 model across the specified scenarios.

get_gwl(smoothed: pandas.DataFrame, degree: float) pandas.DataFrame#

Computes the timestamp when a given GWL is first reached.

get_gwl_table_for_single_model_and_ensemble(model_config: dict, reference_period: dict) tuple[pandas.DataFrame, pandas.DataFrame]#

Generates a GWL table for a single model and ensemble member.

get_gwl_table(model_config: dict, reference_period: dict) tuple[pandas.DataFrame, pandas.DataFrame]#

Generates GWL tables for a model across all its ensemble members.

get_table_one_cesm2(model_config: dict, reference_period: dict) tuple[pandas.DataFrame, pandas.DataFrame]#

Generates a GWL table for one member of CESM2.

get_table_cesm2(model_config: dict, reference_period: dict) tuple[pandas.DataFrame, pandas.DataFrame]#

Generates a GWL table for the CESM2 model.

Examples

>>> import intake
>>> import pandas as pd
>>> from climakitae.util.generate_gwl_tables import GWLGenerator
>>> df = pd.read_csv("https://cmip6-pds.s3.amazonaws.com/pangeo-cmip6.csv")
>>> catalog_cesm = intake.open_esm_datastore(
...     "https://raw.githubusercontent.com/NCAR/cesm2-le-aws/main/intake-catalogs/aws-cesm2-le.json"
... )
>>> gwl_generator = GWLGenerator(df, catalog_cesm)
>>> models = ["EC-Earth3"]
>>> reference_periods = [{"start_year": "19810101", "end_year": "20101231"}]
>>> gwl_generator.generate_gwl_file(models, reference_periods)

buildDFtimeSeries_cesm2(model_config: dict[str, Any]) Dataset#

Builds a global temperature time series by weighting latitudes and averaging longitudes for the CESM2 model across specified scenarios from 1980 to 2100.

Parameters:

model_config (dict) – Dictionary containing ‘variable’, ‘model’, ‘ens_mem’, and ‘scenarios’ keys

Returns:

xarray.Dataset – A dataset containing the global temperature time series for each scenario.

build_timeseries(model_config: dict[str, Any]) Dataset#

Builds an xarray Dataset with a time dimension, containing the concatenated historical and SSP time series for all specified scenarios of a given model and ensemble member. Works for every model (GCM) in the models list, all of which appear in the current data catalog of WRF downscaling.

Parameters:

model_config (dict) – Dictionary containing ‘variable’, ‘model’, ‘ens_mem’, and ‘scenarios’ keys

Returns:

xarray.Dataset – A dataset with time as the dimension, containing the appended historical and SSP time series.

generate_gwl_file(models: list[str], reference_periods: list[dict])#

Generates global warming level (GWL) reference files for specified models.

Parameters:
  • models (list) – List of model names to process

  • reference_periods (list) – List of dictionaries with ‘start_year’ and ‘end_year’ keys

static get_gwl(smoothed: DataFrame, degree: float) DataFrame#

Computes the timestamp when a given GWL is first reached. Takes a smoothed time series of global mean temperature of different scenarios for a model and returns a table indicating the timestamp at which the specified warming level is reached.

Parameters:
  • smoothed (pandas.DataFrame) – A DataFrame containing a global mean temperature time series for a model for multiple scenarios.

  • degree (float) – The global warming level to detect, e.g., 1.5, 2, etc.

Returns:

pandas.DataFrame – A table containing timestamps for when each scenario first crosses the specified warming level.
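
The first-crossing logic described above can be sketched in plain Python. This is an illustrative sketch only, not the climakitae implementation: the name first_crossing is hypothetical, a dict of lists stands in for the smoothed DataFrame, and "reached" is assumed to mean greater than or equal to the requested level.

```python
def first_crossing(times, series, degree):
    """Return, per scenario, the first timestamp whose value reaches `degree`.

    `times` is the shared time axis and `series` maps scenario name -> values.
    Scenarios that never reach the level map to None.
    """
    result = {}
    for scenario, values in series.items():
        result[scenario] = next(
            (t for t, v in zip(times, values) if v >= degree), None
        )
    return result


# Hypothetical smoothed warming (degrees C above the reference period).
times = ["2030-01", "2040-01", "2050-01", "2060-01"]
smoothed = {
    "ssp245": [1.2, 1.4, 1.6, 1.9],  # never reaches 2.0
    "ssp585": [1.3, 1.7, 2.2, 2.9],  # reaches 2.0 at the third step
}
crossings = first_crossing(times, smoothed, degree=2.0)
print(crossings)
```

Returning None for a scenario that never crosses keeps the output shape the same for every scenario, which makes it easy to tabulate.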

get_gwl_table(model_config: dict[str, Any], reference_period: dict[str, str]) tuple[DataFrame, DataFrame]#

Generates a GWL table for a given model.

Parameters:
  • model_config (dict) – Dictionary containing ‘variable’, ‘model’, ‘ens_mem’, and ‘scenarios’ keys

  • reference_period (dict) – Dictionary containing ‘start_year’ and ‘end_year’ keys

Returns:

tuple – A DataFrame containing warming levels and a DataFrame with global mean temperature time series. To be exported into gwl_[time period]ref.csv and gwl_[time period]ref_timeidx.csv.

get_gwl_table_for_single_model_and_ensemble(model_config: dict[str, Any], reference_period: dict[str, str]) tuple[DataFrame, DataFrame]#

Generates a GWL table for a single model and ensemble member.

Loops through various global warming levels from climakitae.core.constants for the requested model/variant and scenarios.

Parameters:
  • model_config (dict) – Dictionary containing ‘variable’, ‘model’, ‘ens_mem’, and ‘scenarios’ keys

  • reference_period (dict) – Dictionary containing ‘start_year’ and ‘end_year’ keys

Returns:

tuple – A DataFrame containing warming levels and a DataFrame with global mean temperature time series.

get_sims_on_aws() DataFrame#

Generates a pandas DataFrame listing all relevant CMIP6 simulations available on AWS.

This function filters the input DataFrame df and identifies and lists CMIP6 model simulations for historical and various SSP (Shared Socioeconomic Pathway) scenarios. It only includes models that have both historical and at least one SSP ensemble member. Additionally, it ensures that only historical ensemble members with variants in at least one SSP are kept.

Returns:

pandas.DataFrame – A DataFrame indexed by model names (source_id) and columns corresponding to scenarios (‘historical’, ‘ssp585’, ‘ssp370’, ‘ssp245’, ‘ssp126’). Each cell contains a list of ensemble member IDs available on AWS for that model and scenario.
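
The filtering rules above can be illustrated with a small, self-contained sketch. It is not the library's code: filter_simulations and the nested-dict input are hypothetical stand-ins for the catalog DataFrame, and the sketch assumes that a model left with no matching historical member is dropped.

```python
SSPS = ["ssp585", "ssp370", "ssp245", "ssp126"]


def filter_simulations(members):
    """Keep models that have historical data and at least one SSP member.

    `members` maps model name -> {scenario: [ensemble member IDs]}.
    A historical member is kept only if the same variant appears in some SSP.
    """
    kept = {}
    for model, by_scenario in members.items():
        # Collect every variant that appears in any SSP scenario.
        ssp_members = set()
        for ssp in SSPS:
            ssp_members.update(by_scenario.get(ssp, []))
        historical = [
            m for m in by_scenario.get("historical", []) if m in ssp_members
        ]
        if historical:
            kept[model] = {"historical": historical}
            for ssp in SSPS:
                kept[model][ssp] = list(by_scenario.get(ssp, []))
    return kept


# Hypothetical catalog contents.
catalog = {
    "ModelA": {"historical": ["r1i1p1f1", "r2i1p1f1"], "ssp585": ["r1i1p1f1"]},
    "ModelB": {"historical": ["r1i1p1f1"]},  # no SSP runs, so it is dropped
}
table = filter_simulations(catalog)
print(table)
```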

get_table_cesm2(model_config: dict[str, Any], reference_period: dict[str, str]) tuple[DataFrame, DataFrame]#

Generates a GWL table for the CESM2 model.

Parameters:
  • model_config (dict) – Dictionary containing ‘variable’, ‘model’, ‘ens_mem’, and ‘scenarios’ keys

  • reference_period (dict) – Dictionary containing ‘start_year’ and ‘end_year’ keys

Returns:

tuple – A DataFrame of warming levels and a DataFrame of global mean temperature time series for the CESM2 model.

get_table_one_cesm2(model_config: dict, reference_period: dict) tuple[DataFrame, DataFrame]#

Generates a GWL lookup table for one ensemble member of CESM2.

Parameters:
  • model_config (dict) – Dictionary containing ‘variable’, ‘model’, ‘ens_mem’, and ‘scenarios’ keys

  • reference_period (dict) – Dictionary containing ‘start_year’ and ‘end_year’ keys

Returns:

tuple – A DataFrame of warming levels and a DataFrame of global mean temperature time series.

climakitae.util.generate_gwl_tables.main(_kTest=False)#

Generates global warming level (GWL) reference files for all available CMIP6 GCMs and CESM2-LENS.

This includes:
  • Connecting to AWS S3 storage to access CMIP6 and CESM2-LENS data.

  • Filtering and processing data to create global temperature time series.

  • Generating and saving warming level tables in CSV format for different reference periods.

climakitae.util.generate_gwl_tables.make_weighted_timeseries(temp: DataArray) DataArray#

Creates a spatially-weighted single-dimension time series of global temperature.

The function weights the latitude grids by size and averages across all longitudes, resulting in a single time series object.

Parameters:

temp (xarray.DataArray) – An xarray DataArray of global temperature with latitude and longitude coordinates.

Returns:

xarray.DataArray – A time series of global temperature that is spatially weighted across latitudes and averaged across all longitudes.

Raises:
  • ValueError – If the DataArray doesn’t contain recognizable latitude/longitude coordinates.
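
As a rough illustration of the weighting step, the sketch below averages one time step over longitude and then takes a latitude-weighted mean. It assumes cos(latitude) weights, a common proxy for grid-cell area on a regular grid; the documentation above does not state which weights the library uses, and weighted_global_mean is a hypothetical name.

```python
import math


def weighted_global_mean(field, lats):
    """Area-weighted mean of a single time step.

    `field` is a list of rows (one per latitude); each row holds one value
    per longitude. Weights are cos(latitude), an assumption of this sketch.
    """
    weights = [math.cos(math.radians(lat)) for lat in lats]
    zonal_means = [sum(row) / len(row) for row in field]  # average over longitude
    return sum(w * z for w, z in zip(weights, zonal_means)) / sum(weights)


lats = [-60.0, 0.0, 60.0]
step = [
    [270.0, 272.0],  # 60S
    [300.0, 302.0],  # equator
    [274.0, 276.0],  # 60N
]
# Applying the function to every time step would yield the full time series.
timeseries = [weighted_global_mean(step, lats)]
print(timeseries)
```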

climakitae.util.unit_conversions module#

Calculates alternative units for variables with multiple commonly used units, following NWS conversions for pressure and wind speed.

climakitae.util.unit_conversions.convert_units(da: DataArray, selected_units: str) DataArray#

Converts units for any variable

Parameters:
  • da (DataArray) – data

  • selected_units (str) – selected units of data, from selections.units

Returns:

da (DataArray) – data with converted units and updated units attribute

References

  • Wind speed: https://www.weather.gov/media/epz/wxcalc/windConversion.pdf

  • Pressure: https://www.weather.gov/media/epz/wxcalc/pressureConversion.pdf
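
A minimal sketch of table-driven unit conversion on plain numbers is shown below. The unit strings, the CONVERSIONS table, and convert_value are assumptions made for illustration and may not match the names in selections.units; the factors are standard conversion values.

```python
# Each entry maps (from_units, to_units) to a conversion function.
CONVERSIONS = {
    ("m/s", "mph"): lambda v: v * 2.23694,
    ("m/s", "knots"): lambda v: v * 1.94384,
    ("hPa", "inHg"): lambda v: v * 0.02953,
    ("K", "degC"): lambda v: v - 273.15,
    ("K", "degF"): lambda v: (v - 273.15) * 9 / 5 + 32,
}


def convert_value(value, from_units, to_units):
    """Convert a single number between units, raising on unknown pairs."""
    if from_units == to_units:
        return value
    try:
        return CONVERSIONS[(from_units, to_units)](value)
    except KeyError:
        raise ValueError(f"No conversion from {from_units} to {to_units}") from None


wind_mph = convert_value(10.0, "m/s", "mph")
temp_f = convert_value(273.15, "K", "degF")
print(wind_mph, temp_f)
```

Keeping the factors in a lookup table means a new unit pair is a one-line addition rather than another branch.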

climakitae.util.unit_conversions.get_unit_conversion_options() dict#

Get dictionary of unit conversion options offered for each unit

climakitae.util.utils module#

climakitae.util.utils.add_dummy_time_to_wl(wl_da: DataArray, freq_name='daily') DataArray#

Replace the [hours/days/months]_from_center or time_delta dimension in a DataArray returned from WarmingLevels with a dummy time index for calculations with tools that require a time dimension.

Parameters:
  • wl_da (DataArray) – The input Warming Levels DataArray. It is expected to have a time-based dimension which typically includes “from_center” in its name or time_delta indicating the time dimension in relation to the year that the given warming level is reached per simulation.

  • freq_name (str, optional) – The frequency name to use when time_delta is the time dimension. Options are “hourly”, “daily”, or “monthly”. Default is “daily”.

Returns:

DataArray – A modified version of the input DataArray with the original time dimension replaced by a dummy time series. The new dimension will be named “time”.

Notes

  • The function looks for the dimension name containing “from_center” to identify the time-based dimension.

  • It supports creating dummy time series with frequencies of hours, days, or months, based on the prefix of the dimension name.

  • The dummy time series starts from “2000-01-01”.
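
The notes above can be sketched with the standard library. The helper dummy_time_index is hypothetical and only shows how a placeholder index starting at 2000-01-01 might be generated for each supported frequency; the library itself operates on xarray objects.

```python
from datetime import datetime, timedelta


def dummy_time_index(n_steps, freq_name="daily", start=datetime(2000, 1, 1)):
    """Build `n_steps` placeholder timestamps beginning at `start`."""
    if freq_name == "hourly":
        return [start + timedelta(hours=i) for i in range(n_steps)]
    if freq_name == "daily":
        return [start + timedelta(days=i) for i in range(n_steps)]
    if freq_name == "monthly":
        # Step month by month (first of each month), rolling the year over.
        return [
            datetime(
                start.year + (start.month - 1 + i) // 12,
                (start.month - 1 + i) % 12 + 1,
                1,
            )
            for i in range(n_steps)
        ]
    raise ValueError(f"Unsupported freq_name: {freq_name}")


centered_offsets = [-2, -1, 0, 1, 2]  # e.g. values along days_from_center
time = dummy_time_index(len(centered_offsets), "daily")
print(time[0], time[-1])
```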

climakitae.util.utils.area_average(dset: Dataset) Dataset#

Weighted area-average

Parameters:

dset (Dataset) – one dataset from the catalog

Returns:

Dataset – sub-setted output data

climakitae.util.utils.clip_gpd_to_shapefile(gdf: GeoDataFrame, shapefile: GeoDataFrame) GeoDataFrame#

Use a shapefile to select an area subset of a geodataframe. Used to subset stationlist to shapefile area.

Parameters:
  • gdf (gpd.GeoDataFrame) – Data to be clipped.

  • shapefile (gpd.GeoDataFrame) – Shapefile must include valid CRS.

Returns:

clipped (gpd.GeoDataFrame) – Subsetted geodataframe within shapefile area of interest.

climakitae.util.utils.clip_to_shapefile(data: Dataset | DataArray, shapefile_path: str, feature: tuple[str, Any] = (), name: str = 'user-defined', **kwargs) Dataset | DataArray#

Use a shapefile to select an area subset of AE data.

By default, this function will clip the data to the area covered by all features in the shapefile. To clip to specific features, use the feature keyword.

Parameters:
  • data (xr.Dataset | xr.DataArray) – Data to be clipped.

  • shapefile_path (str) – Filepath to shapefile. Shapefile must include valid CRS.

  • feature (tuple(str, str | int | float | list)) – Tuple containing attribute name and value(s) for target feature(s) (optional).

  • name (str) – Location name to record in data attributes if ‘feature’ parameter is not set (optional).

  • **kwargs – Additional arguments to pass to the rioxarray clip function

Returns:

clipped (xr.Dataset | xr.DataArray) – Returns same type as ‘data’, but grid is clipped to shapefile feature(s).

climakitae.util.utils.combine_hdd_cdd(data: DataArray) DataArray#

Drops specific unneeded coords from HDD/CDD data, independent of station or gridded data source

Parameters:

data (DataArray)

Returns:

data (DataArray)

climakitae.util.utils.compute_annual_aggreggate(data: DataArray, name: str, num_grid_cells: int) DataArray#

Calculates the annual sum of HDD and CDD

Parameters:
  • data (DataArray)

  • name (str)

  • num_grid_cells (int)

Returns:

annual_ag (DataArray)

climakitae.util.utils.compute_multimodel_stats(data: DataArray) DataArray#

Calculates model mean, min, max, median across simulations

Parameters:

data (DataArray)

Returns:

stats_concat (DataArray)
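
To make the four statistics concrete, here is a small sketch that computes them per time step across simulations. It uses plain lists instead of a DataArray, and multimodel_stats is a hypothetical name rather than the library function.

```python
from statistics import mean, median


def multimodel_stats(simulations):
    """Per-time-step mean, min, max, and median across simulations.

    `simulations` maps simulation name -> list of values on a shared time axis.
    """
    per_step = list(zip(*simulations.values()))  # regroup values by time step
    return {
        "mean": [mean(vals) for vals in per_step],
        "min": [min(vals) for vals in per_step],
        "max": [max(vals) for vals in per_step],
        "median": [median(vals) for vals in per_step],
    }


sims = {
    "sim_a": [1.0, 2.0, 3.0],
    "sim_b": [2.0, 4.0, 6.0],
    "sim_c": [6.0, 6.0, 9.0],
}
stats = multimodel_stats(sims)
print(stats)
```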

climakitae.util.utils.convert_to_local_time(data: DataArray | Dataset, lon: float = <object object>, lat: float = <object object>) DataArray | Dataset#

Convert time dimension from UTC to local time for the grid or station.

Parameters:
  • data (xr.DataArray | xr.Dataset) – Input data.

  • lon (float) – Mean longitude of dataset if no lat/lon coordinates

  • lat (float) – Mean latitude of dataset if no lat/lon coordinates

Returns:

xr.DataArray | xr.Dataset – Data with converted time coordinate.

climakitae.util.utils.downscaling_method_as_list(downscaling_method: str) list[str]#

Convert string-based radio button values to a Python list.

Parameters:

downscaling_method (str) – one of “Dynamical”, “Statistical”, or “Dynamical+Statistical”

Returns:

method_list (list) – one of [“Dynamical”], [“Statistical”], or [“Dynamical”, “Statistical”]
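
The mapping above amounts to splitting the selection on "+". The one-liner below is a sketch under that assumption; method_as_list is a hypothetical name and the library may implement the mapping differently.

```python
def method_as_list(downscaling_method):
    """Split a "+"-joined selection into its component methods."""
    return downscaling_method.split("+")


both = method_as_list("Dynamical+Statistical")
single = method_as_list("Dynamical")
print(both, single)
```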

climakitae.util.utils.downscaling_method_to_activity_id(downscaling_method: str, reverse: bool = False) str#

Convert downscaling method to activity id to match catalog names

Parameters:
  • downscaling_method (str) – Downscaling method

  • reverse (boolean, optional) – Set reverse=True to get downscaling method from input activity_id. Defaults to False.

Returns:

str

climakitae.util.utils.get_closest_gridcell(data: Dataset | DataArray, lat: float, lon: float, print_coords: bool = True) DataArray | None#

From input gridded data, get the closest gridcell to a lat, lon coordinate pair.

This function first transforms the lat, lon coordinates to the gridded data’s projection. Then, it uses xarray’s built-in method .sel to get the nearest gridcell.

Parameters:
  • data (xr.DataArray | xr.Dataset) – Gridded data

  • lat (float) – Latitude of coordinate pair

  • lon (float) – Longitude of coordinate pair

  • print_coords (bool, optional) – Print closest coordinates? Defaults to True. Set to False for backend use.

Returns:

xr.DataArray | None – Grid cell closest to input lat,lon coordinate pair

See also

xr.DataArray.sel

climakitae.util.utils.get_closest_gridcells(data: Dataset, lats: Iterable[float] | float, lons: Iterable[float] | float) Dataset | DataArray | None#

Find the nearest grid cell(s) for given latitude and longitude coordinates.

If the dataset uses (x, y) coordinates, lat/lon values are transformed to match its projection. The function then selects the closest grid cell using sel() or get_indexer(), ensuring the selection is within an appropriate tolerance.

Parameters:
  • data (xr.DataArray | xr.Dataset) – Gridded dataset with (x, y) or (lat, lon) dimensions.

  • lats (float | Iterable[float]) – Latitude coordinate(s).

  • lons (float | Iterable[float]) – Longitude coordinate(s).

Returns:

xr.Dataset | xr.DataArray | None – Nearest grid cell(s) or None if no valid match is found.

Notes

  • If (x, y) dimensions exist, lat/lon coordinates are projected using pyproj.Transformer.

  • The search tolerance is derived from the dataset resolution.

  • Returns None if no grid cells are within tolerance.

See also

xr.DataArray.sel, pyproj.Transformer
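
The nearest-cell selection with a tolerance can be sketched without xarray. The example assumes evenly spaced coordinates, takes the grid spacing as the tolerance, and omits the projection step; nearest_index and closest_gridcell are hypothetical names.

```python
def nearest_index(coords, target):
    """Index of the coordinate nearest `target`, or None if beyond tolerance.

    The tolerance is the grid spacing, assuming evenly spaced coordinates.
    """
    tolerance = abs(coords[1] - coords[0])
    idx = min(range(len(coords)), key=lambda i: abs(coords[i] - target))
    if abs(coords[idx] - target) > tolerance:
        return None
    return idx


def closest_gridcell(grid, lats, lons, lat, lon):
    """Return the value of the grid cell nearest (lat, lon), or None."""
    i = nearest_index(lats, lat)
    j = nearest_index(lons, lon)
    if i is None or j is None:
        return None
    return grid[i][j]


lats = [34.0, 35.0, 36.0]
lons = [-120.0, -119.0, -118.0]
grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
inside = closest_gridcell(grid, lats, lons, 35.2, -118.1)
outside = closest_gridcell(grid, lats, lons, 50.0, -118.1)  # far outside the grid
print(inside, outside)
```

Returning None when the point is out of tolerance avoids silently snapping a distant location to an edge cell.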

climakitae.util.utils.julianDay_to_date(julday: int, year: int = None, return_type: str = 'str', str_format: str = '%b-%d') str | datetime | date#

Convert julian day of year to a date object or formatted string.

Parameters:
  • julday (int) – Julian day (day of year)

  • year (int, optional) – Year to use. If None, uses current year or a leap year (2024) based on needs. Default is None.

  • return_type (str, optional) – Type of return value: - “str”: formatted string (default) - “datetime”: datetime object - “date”: date object

  • str_format (str, optional) – String format of output date when return_type is “str”. Default is “%b-%d” which outputs format like “Jan-01”.

Returns:

date (str, datetime.datetime, or datetime.date) – Julian day converted to specified format or object

Examples

>>> julianDay_to_date(1)
'Jan-01'
>>> julianDay_to_date(32, year=2023, return_type="date")
datetime.date(2023, 2, 1)
>>> julianDay_to_date(60, year=2024, str_format="%Y-%m-%d")
'2024-02-29'

climakitae.util.utils.read_csv_file(rel_path: str, index_col: str = <object object>, parse_dates: bool = False) DataFrame#

Read CSV file into pandas DataFrame

Parameters:
  • rel_path (str) – path to CSV file relative to this util python file

  • index_col (str) – CSV column to index DataFrame on

  • parse_dates (boolean) – Whether to have pandas parse the date strings

Returns:

DataFrame

climakitae.util.utils.readable_bytes(b: int) str#

Return the given bytes as a human friendly KB, MB, GB, or TB string.

Parameters:

b (int) – number of bytes

Returns:

str

Notes

Code from Stack Overflow (https://stackoverflow.com/questions/12523586/python-format-size-application-converting-b-to-kb-mb-gb-tb)
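
For orientation, a byte formatter of this kind might look like the sketch below. It assumes binary (1024-based) units and a two-decimal output format; neither detail is confirmed by the documentation above, and human_readable_bytes is a hypothetical name.

```python
def human_readable_bytes(num_bytes):
    """Format a byte count using binary (1024-based) units."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the number below 1024,
        # or at the largest unit available.
        if size < 1024 or unit == units[-1]:
            break
        size /= 1024
    return f"{num_bytes} B" if unit == "B" else f"{size:.2f} {unit}"


small = human_readable_bytes(512)
medium = human_readable_bytes(1536)
large = human_readable_bytes(5 * 1024**3)
print(small, medium, large)
```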

climakitae.util.utils.reproject_data(xr_da: DataArray, proj: str = 'EPSG:4326', fill_value: float = nan) DataArray#

Reproject xr.DataArray using rioxarray.

Parameters:
  • xr_da (DataArray) – 2-or-3-dimensional DataArray, with 2 spatial dimensions

  • proj (str) – proj to use for reprojection (default to “EPSG:4326”– lat/lon coords)

  • fill_value (float) – fill value (default to np.nan)

Returns:

data_reprojected (DataArray) – 2-or-3-dimensional reprojected DataArray

Raises:
  • ValueError – if input data does not have spatial coords x,y

  • ValueError – if input data has more than 5 dimensions

climakitae.util.utils.resolution_to_gridlabel(resolution: str, reverse: bool = False) str#

Convert resolution format to grid_label format matching catalog names.

Parameters:
  • resolution (str) – resolution

  • reverse (boolean, optional) – Set reverse=True to get resolution format from input grid_label. Default to False

Returns:

str

climakitae.util.utils.scenario_to_experiment_id(scenario: str, reverse: bool = False) str#

Convert scenario format to experiment_id format matching catalog names.

Parameters:
  • scenario (str)

  • reverse (boolean, optional) – Set reverse=True to get scenario format from input experiment_id. Default to False

Returns:

str

climakitae.util.utils.stack_sims_across_locs(ds)#

climakitae.util.utils.summary_table(data: Dataset) DataFrame#

Helper function to organize a dataset object into a pandas DataFrame for ease of use.

Parameters:

data (Dataset)

Returns:

df (DataFrame) – df is organized so that the simulations are stacked in individual columns by year/time

climakitae.util.utils.timescale_to_table_id(timescale: str, reverse: bool = False) str#

Convert timescale format to table_id format matching catalog names.

Parameters:
  • timescale (str)

  • reverse (boolean, optional) – Set reverse=True to get timescale format from input table_id. Default to False

Returns:

str

climakitae.util.utils.trendline(data: Dataset, kind: str = 'mean') Dataset#

Calculates the trendline of the multi-model mean or median.

Parameters:
  • data (Dataset)

  • kind (str, optional) – Options are ‘mean’ and ‘median’

Returns:

trendline (Dataset)

Note

1. Development note: If an additional option to trendline ‘kind’ is required, compute_multimodel_stats must be modified to update optionality.
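
As an illustration of the idea, the sketch below fits a straight line to a multi-model mean. It assumes the trendline is a linear least-squares fit, which the documentation above does not state, and linear_trend is a hypothetical helper.

```python
from statistics import mean


def linear_trend(years, values):
    """Least-squares straight line through (years, values), evaluated at each year."""
    x_bar, y_bar = mean(years), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values)) / sum(
        (x - x_bar) ** 2 for x in years
    )
    intercept = y_bar - slope * x_bar
    return [slope * x + intercept for x in years]


years = [2000, 2001, 2002, 2003]
sims = {"sim_a": [1.0, 2.0, 2.0, 4.0], "sim_b": [3.0, 2.0, 4.0, 4.0]}
# Average across simulations first, then fit the line to that mean.
multimodel_mean = [mean(vals) for vals in zip(*sims.values())]
trend = linear_trend(years, multimodel_mean)
print(trend)
```

Swapping mean for statistics.median in the averaging step would give the median-based variant.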

climakitae.util.utils.write_csv_file(df: DataFrame, rel_path: str) None#

Write CSV file from pandas DataFrame

Parameters:
  • df (DataFrame) – pandas DataFrame to write out

  • rel_path (str) – path to CSV file relative to this util python file

Returns:

None

Module contents#