climakitae.util package#
Submodules#
climakitae.util.cluster module#
- class climakitae.util.cluster.Cluster(address=None, proxy_address=None, public_address=None, auth=None, cluster_options=None, shutdown_on_close=True, asynchronous=False, loop=None, **kwargs)#
Bases: GatewayCluster
A dask-gateway cluster allowing one cluster per user. Instead of always creating new clusters, connect to a previously running user cluster, and attempt to limit users to a single cluster.
- get_client(set_as_default=True) Client#
Get a dask client connected to the cluster.
Examples
>>> from climakitae.util.cluster import Cluster
>>> cluster = Cluster()  # Create cluster
>>> cluster.adapt(minimum=0, maximum=8)  # Specify the number of workers to use
>>> client = cluster.get_client()
>>> cluster  # Output cluster information
- extra_packages = ['git+https://github.com/cal-adapt/climakitae.git']#
climakitae.util.colormap module#
- climakitae.util.colormap.read_ae_colormap(cmap: str = 'ae_orange', cmap_hex: bool = False) LinearSegmentedColormap | list#
Read in AE colormap by name
- Parameters:
- Returns:
One of either:
cmap_data (matplotlib.colors.LinearSegmentedColormap) – used for matplotlib (if cmap_hex == False)
cmap_data (list) – used for hvplot maps (if cmap_hex == True)
climakitae.util.generate_gwl_tables module#
Util for generating warming level reference data in ../data/
To run, type: <<python generate_gwl_tables.py>> in the command line and wait for printed model outputs showing progress.
- class climakitae.util.generate_gwl_tables.GWLGenerator(df: DataFrame, catalog_cesm: esm_datastore, sims_on_aws: dict = None)#
Bases: object
Class for generating Global Warming Level (GWL) reference data. Encapsulates the parameters and methods needed for GWL calculations.
- df#
DataFrame containing metadata for CMIP6 simulations
- Type:
- sims_on_aws#
DataFrame listing available simulations on AWS
- Type:
- fs#
S3 file system object for accessing AWS data
- Type:
s3fs.S3FileSystem
- set_cesm2_lens()#
Pull subset of CESM2 model data.
- get_sims_on_aws() pandas.DataFrame#
Generates a DataFrame listing all relevant CMIP6 simulations available on AWS.
- build_timeseries(model_config: dict) xarray.Dataset#
Builds an xarray Dataset with a time dimension, containing the concatenated historical and SSP time series for all specified scenarios of a given model and ensemble member.
- buildDFtimeSeries_cesm2(model_config: dict) xarray.Dataset#
- get_gwl(smoothed: pandas.DataFrame, degree: float) pandas.DataFrame#
Computes the timestamp when a given GWL is first reached.
- get_gwl_table_for_single_model_and_ensemble(model_config: dict, reference_period: dict) tuple[pandas.DataFrame, pandas.DataFrame]#
Generates a GWL table for a single model and ensemble member.
- get_gwl_table(model_config: dict, reference_period: dict) tuple[pandas.DataFrame, pandas.DataFrame]#
Generates GWL tables for a model across all its ensemble members.
- get_table_one_cesm2(model_config: dict, reference_period: dict) : Generates a GWL table for one member of CESM2.
- get_table_cesm2(model_config: dict, reference_period: dict) : Generates a GWL table for the CESM2 model.
Examples
>>> import intake
>>> import pandas as pd
>>> from climakitae.util.generate_gwl_tables import GWLGenerator
>>> df = pd.read_csv("https://cmip6-pds.s3.amazonaws.com/pangeo-cmip6.csv")
>>> catalog_cesm = intake.open_esm_datastore(
...     "https://raw.githubusercontent.com/NCAR/cesm2-le-aws/main/intake-catalogs/aws-cesm2-le.json"
... )
>>> gwl_generator = GWLGenerator(df, catalog_cesm)
>>> models = ["EC-Earth3"]
>>> reference_periods = [{"start_year": "19810101", "end_year": "20101231"}]
>>> gwl_generator.generate_gwl_file(models, reference_periods)
- buildDFtimeSeries_cesm2(model_config: dict[str, Any]) Dataset#
Builds a global temperature time series by weighting latitudes and averaging longitudes for the CESM2 model across specified scenarios from 1980 to 2100.
- Parameters:
model_config (dict) – Dictionary containing ‘variable’, ‘model’, ‘ens_mem’, and ‘scenarios’ keys
- Returns:
xarray.Dataset – A dataset containing the global temperature time series for each scenario.
- build_timeseries(model_config: dict[str, Any]) Dataset#
Builds an xarray Dataset with a time dimension, containing the concatenated historical and SSP time series for all specified scenarios of a given model and ensemble member. Works for all of the models(/GCMs) in the list models, which appear in the current data catalog of WRF downscaling.
- Parameters:
model_config (dict) – Dictionary containing ‘variable’, ‘model’, ‘ens_mem’, and ‘scenarios’ keys
- Returns:
xarray.Dataset – A dataset with time as the dimension, containing the appended historical and SSP time series.
- generate_gwl_file(models: list[str], reference_periods: list[dict])#
Generates global warming level (GWL) reference files for specified models.
- static get_gwl(smoothed: DataFrame, degree: float) DataFrame#
Computes the timestamp when a given GWL is first reached. Takes a smoothed time series of global mean temperature of different scenarios for a model and returns a table indicating the timestamp at which the specified warming level is reached.
- Parameters:
smoothed (pandas.DataFrame) – A DataFrame containing a global mean temperature time series for a model for multiple scenarios.
degree (float) – The global warming level to detect, e.g., 1.5, 2, etc.
- Returns:
pandas.DataFrame – A table containing timestamps for when each scenario first crosses the specified warming level.
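The "first crossing" idea can be sketched without pandas. This is only an illustration under stated assumptions: the helper name first_crossing, the dict-based input, and the sample values are hypothetical, and the real method may handle edge cases (such as scenarios that never reach the level) differently.

```python
from typing import Optional


def first_crossing(series: dict[str, float], degree: float) -> Optional[str]:
    """Return the first timestamp whose value reaches `degree`, else None.

    `series` maps ISO-formatted timestamps to smoothed warming values (deg C).
    """
    for timestamp in sorted(series):  # ISO date strings sort chronologically
        if series[timestamp] >= degree:
            return timestamp
    return None


# Hypothetical smoothed warming relative to a reference period, per scenario.
smoothed = {
    "ssp245": {"2030-01-01": 1.31, "2040-01-01": 1.52, "2050-01-01": 1.80},
    "ssp585": {"2030-01-01": 1.49, "2040-01-01": 2.05, "2050-01-01": 2.70},
}

crossings = {name: first_crossing(values, 1.5) for name, values in smoothed.items()}
```

In this sketch a scenario that never reaches the requested level maps to None.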
- get_gwl_table(model_config: dict[str, Any], reference_period: dict[str, str]) tuple[DataFrame, DataFrame]#
Generates a GWL table for a given model.
- Parameters:
- Returns:
tuple – A DataFrame containing warming levels and a DataFrame with global mean temperature time series. To be exported into gwl_[time period]ref.csv and gwl_[time period]ref_timeidx.csv.
- get_gwl_table_for_single_model_and_ensemble(model_config: dict[str, Any], reference_period: dict[str, str]) tuple[DataFrame, DataFrame]#
Generates a GWL table for a single model and ensemble member.
Loops through various global warming levels from climakitae.core.constants for the requested model/variant and scenarios.
- Parameters:
- Returns:
tuple – A DataFrame containing warming levels and a DataFrame with global mean temperature time series.
- get_sims_on_aws() DataFrame#
Generates a pandas DataFrame listing all relevant CMIP6 simulations available on AWS.
This function filters the input DataFrame df and identifies and lists CMIP6 model simulations for historical and various SSP (Shared Socioeconomic Pathway) scenarios. It only includes models that have both historical and at least one SSP ensemble member. Additionally, it ensures that only historical ensemble members with variants in at least one SSP are kept.
- Returns:
pandas.DataFrame – A DataFrame indexed by model names (source_id) and columns corresponding to scenarios (‘historical’, ‘ssp585’, ‘ssp370’, ‘ssp245’, ‘ssp126’). Each cell contains a list of ensemble member IDs available on AWS for that model and scenario.
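The filtering rules described above can be sketched with plain dictionaries. This is a simplified illustration, not the method's implementation: filter_sims, the model names, and the member lists are hypothetical, and the real method works on the CMIP6 catalog DataFrame.

```python
SSPS = ("ssp585", "ssp370", "ssp245", "ssp126")


def filter_sims(members_by_model):
    """Keep models that have historical runs and at least one SSP member."""
    kept = {}
    for model, scenarios in members_by_model.items():
        historical = scenarios.get("historical", [])
        ssp_members = {m for s in SSPS for m in scenarios.get(s, [])}
        if not historical or not ssp_members:
            continue  # a model needs both historical and SSP data
        row = {s: list(scenarios.get(s, [])) for s in SSPS}
        # Keep only historical members whose variant appears in some SSP.
        row["historical"] = [m for m in historical if m in ssp_members]
        kept[model] = row
    return kept


catalog = {
    "ModelA": {"historical": ["r1i1p1f1", "r2i1p1f1"], "ssp370": ["r1i1p1f1"]},
    "ModelB": {"historical": ["r1i1p1f1"]},  # no SSP runs, so it is dropped
}
table = filter_sims(catalog)
```

Here ModelB is excluded entirely, and ModelA keeps only the historical member that also has an SSP run.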
- get_table_cesm2(model_config: dict[str, Any], reference_period: dict[str, str]) tuple[DataFrame, DataFrame]#
Generates a GWL table for the CESM2 model.
- Parameters:
- Returns:
tuple – A DataFrame of warming levels and a DataFrame of global mean temperature time series for the CESM2 model.
- climakitae.util.generate_gwl_tables.main(_kTest=False)#
Generates global warming level (GWL) reference files for all available CMIP6 GCMs and CESM2-LENS.
This includes:
  - Connecting to AWS S3 storage to access CMIP6 and CESM2-LENS data.
  - Filtering and processing data to create global temperature time series.
  - Generating and saving warming level tables in CSV format for different reference periods.
- climakitae.util.generate_gwl_tables.make_weighted_timeseries(temp: DataArray) DataArray#
Creates a spatially-weighted single-dimension time series of global temperature.
The function weights the latitude grids by size and averages across all longitudes, resulting in a single time series object.
- Parameters:
temp (xarray.DataArray) – An xarray DataArray of global temperature with latitude and longitude coordinates.
- Returns:
xarray.DataArray – A time series of global temperature that is spatially weighted across latitudes and averaged across all longitudes.
- Raises:
ValueError – If the DataArray doesn’t contain recognizable latitude/longitude coordinates.
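A minimal sketch of latitude weighting for a single time step, using nested lists instead of an xarray.DataArray. It assumes that "weighting by size" means cosine-of-latitude weights, which is a common choice for regular grids but is not confirmed by the text above; weighted_global_mean and the sample grid are hypothetical.

```python
import math


def weighted_global_mean(field, lats):
    """Average a [lat][lon] grid, weighting each latitude row by cos(lat)."""
    # Assumption: grid-cell area is proportional to cos(latitude).
    weights = [math.cos(math.radians(lat)) for lat in lats]
    # Unweighted mean across longitudes for each latitude row.
    row_means = [sum(row) / len(row) for row in field]
    return sum(w * m for w, m in zip(weights, row_means)) / sum(weights)


lats = [-60.0, 0.0, 60.0]
# One time step of hypothetical temperatures (K); rows are latitudes.
field = [
    [270.0, 272.0],
    [300.0, 300.0],
    [270.0, 272.0],
]
global_mean = weighted_global_mean(field, lats)
```

Because the equatorial row carries twice the weight of the rows at 60 degrees, the weighted mean is higher than the plain average of the three row means.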
climakitae.util.unit_conversions module#
Calculates alternative units for variables with multiple commonly used units, following NWS conversions for pressure and wind speed.
- climakitae.util.unit_conversions.convert_units(da: DataArray, selected_units: str) DataArray#
Converts units for any variable
- Parameters:
- Returns:
da (DataArray) – data with converted units and updated units attribute
References
Wind speed: https://www.weather.gov/media/epz/wxcalc/windConversion.pdf
Pressure: https://www.weather.gov/media/epz/wxcalc/pressureConversion.pdf
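As an illustration of factor-based conversion, the sketch below converts wind speed from m/s and pressure from Pa. The factors come from the exact unit definitions (1 knot = 1852 m per hour, 1 mph = 0.44704 m/s, 1 hPa = 100 Pa); the unit strings, the convert helper, and the rounding behavior are assumptions and may differ from what convert_units accepts or produces.

```python
# Multiplicative factors from the base unit to each target unit.
WIND_FACTORS_FROM_MS = {
    "m/s": 1.0,
    "knots": 3600.0 / 1852.0,  # 1 knot is exactly 1852 m per hour
    "mph": 1.0 / 0.44704,      # 1 mph is exactly 0.44704 m/s
}
PRESSURE_FACTORS_FROM_PA = {
    "Pa": 1.0,
    "hPa": 0.01,               # 1 hPa is exactly 100 Pa
}


def convert(value: float, selected_units: str, factors: dict) -> float:
    """Scale `value` from the base unit into `selected_units`."""
    try:
        return value * factors[selected_units]
    except KeyError:
        raise ValueError(f"unsupported units: {selected_units!r}") from None


wind_knots = convert(10.0, "knots", WIND_FACTORS_FROM_MS)
wind_mph = convert(10.0, "mph", WIND_FACTORS_FROM_MS)
pressure_hpa = convert(101325.0, "hPa", PRESSURE_FACTORS_FROM_PA)
```

Keeping the factors in lookup tables makes unsupported unit strings fail loudly instead of silently returning unconverted data.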
climakitae.util.utils module#
- climakitae.util.utils.add_dummy_time_to_wl(wl_da: DataArray, freq_name='daily') DataArray#
Replace the [hours/days/months]_from_center or time_delta dimension in a DataArray returned from WarmingLevels with a dummy time index for calculations with tools that require a time dimension.
- Parameters:
wl_da (DataArray) – The input Warming Levels DataArray. It is expected to have a time-based dimension which typically includes “from_center” in its name or time_delta indicating the time dimension in relation to the year that the given warming level is reached per simulation.
freq_name (str, optional) – The frequency name to use when time_delta is the time dimension. Options are “hourly”, “daily”, or “monthly”. Default is “daily”.
- Returns:
DataArray – A modified version of the input DataArray with the original time dimension replaced by a dummy time series. The new dimension will be named “time”.
Notes
The function looks for the dimension name containing “from_center” to identify the time-based dimension.
It supports creating dummy time series with frequencies of hours, days, or months, based on the prefix of the dimension name.
The dummy time series starts from “2000-01-01”.
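The notes above can be illustrated with a standard-library sketch that builds a dummy time index starting at 2000-01-01. The helper dummy_time_index is hypothetical, and placing monthly steps at month starts is an assumption; the real function operates on an xarray DataArray and replaces its dimension in place of returning a list.

```python
from datetime import datetime, timedelta


def dummy_time_index(n_steps: int, freq_name: str = "daily") -> list[datetime]:
    """Build n_steps timestamps from 2000-01-01 at the given frequency."""
    start = datetime(2000, 1, 1)
    if freq_name == "hourly":
        return [start + timedelta(hours=i) for i in range(n_steps)]
    if freq_name == "daily":
        return [start + timedelta(days=i) for i in range(n_steps)]
    if freq_name == "monthly":
        # Month starts: advance the month counter and carry into the year.
        return [datetime(2000 + i // 12, 1 + i % 12, 1) for i in range(n_steps)]
    raise ValueError(f"unknown freq_name: {freq_name!r}")


offsets = [-2, -1, 0, 1]  # e.g. a hypothetical months_from_center coordinate
times = dummy_time_index(len(offsets), "monthly")
```

The offsets themselves are discarded; only their count and frequency determine the dummy index.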
- climakitae.util.utils.clip_gpd_to_shapefile(gdf: GeoDataFrame, shapefile: GeoDataFrame) GeoDataFrame#
Use a shapefile to select an area subset of a geodataframe. Used to subset stationlist to shapefile area.
- Parameters:
gdf (gpd.GeoDataFrame) – Data to be clipped.
shapefile (gpd.GeoDataFrame) – Shapefile must include valid CRS.
- Returns:
clipped (gpd.GeoDataFrame) – Subsetted geodataframe within shapefile area of interest.
- climakitae.util.utils.clip_to_shapefile(data: Dataset | DataArray, shapefile_path: str, feature: tuple[str, Any] = (), name: str = 'user-defined', **kwargs) Dataset | DataArray#
Use a shapefile to select an area subset of AE data.
By default, this function will clip the data to the area covered by all features in the shapefile. To clip to specific features, use the feature keyword.
- Parameters:
data (xr.Dataset | xr.DataArray) – Data to be clipped.
shapefile_path (str) – Filepath to shapefile. Shapefile must include valid CRS.
feature (tuple(str, str | int | float | list)) – Tuple containing attribute name and value(s) for target feature(s) (optional).
name (str) – Location name to record in data attributes if ‘feature’ parameter is not set (optional).
**kwargs – Additional arguments to pass to the rioxarray clip function
- Returns:
clipped (xr.Dataset | xr.DataArray) – Returns same type as ‘data’, but grid is clipped to shapefile feature(s).
- climakitae.util.utils.combine_hdd_cdd(data: DataArray) DataArray#
Drops specific unneeded coords from HDD/CDD data, independent of station or gridded data source
- climakitae.util.utils.compute_annual_aggreggate(data: DataArray, name: str, num_grid_cells: int) DataArray#
Calculates the annual sum of HDD and CDD
- climakitae.util.utils.compute_multimodel_stats(data: DataArray) DataArray#
Calculates model mean, min, max, median across simulations
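A standard-library sketch of reducing several simulations to summary statistics at each time step. The helper multimodel_stats and the sample values are hypothetical; the real function works on an xarray DataArray, and the name of its simulation dimension is not stated here.

```python
from statistics import mean, median


def multimodel_stats(values_by_sim):
    """Reduce per-simulation series to mean, min, max, median per time step."""
    # zip(*...) regroups the data: one tuple of simulation values per time step.
    steps = list(zip(*values_by_sim.values()))
    return {
        "mean": [mean(s) for s in steps],
        "min": [min(s) for s in steps],
        "max": [max(s) for s in steps],
        "median": [median(s) for s in steps],
    }


sims = {"sim_a": [1.0, 2.0], "sim_b": [2.0, 4.0], "sim_c": [6.0, 3.0]}
stats = multimodel_stats(sims)
```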
- climakitae.util.utils.convert_to_local_time(data: DataArray | Dataset, lon: float = <object object>, lat: float = <object object>) DataArray | Dataset#
Convert time dimension from UTC to local time for the grid or station.
- climakitae.util.utils.downscaling_method_as_list(downscaling_method: str) list[str]#
Function to convert string based radio button values to python list.
- climakitae.util.utils.downscaling_method_to_activity_id(downscaling_method: str, reverse: bool = False) str#
Convert downscaling method to activity id to match catalog names
- climakitae.util.utils.get_closest_gridcell(data: Dataset | DataArray, lat: float, lon: float, print_coords: bool = True) DataArray | None#
From input gridded data, get the closest gridcell to a lat, lon coordinate pair.
This function first transforms the lat,lon coords to the gridded data’s projection. Then, it uses xarray’s built in method .sel to get the nearest gridcell.
- Parameters:
- Returns:
xr.DataArray | None – Grid cell closest to input lat,lon coordinate pair
See also
xr.DataArray.sel
- climakitae.util.utils.get_closest_gridcells(data: Dataset, lats: Iterable[float] | float, lons: Iterable[float] | float) Dataset | DataArray | None#
Find the nearest grid cell(s) for given latitude and longitude coordinates.
If the dataset uses (x, y) coordinates, lat/lon values are transformed to match its projection. The function then selects the closest grid cell using sel() or get_indexer(), ensuring the selection is within an appropriate tolerance.
- Parameters:
data (xr.DataArray | xr.Dataset) – Gridded dataset with (x, y) or (lat, lon) dimensions.
lats (float | Iterable[float]) – Latitude coordinate(s).
lons (float | Iterable[float]) – Longitude coordinate(s).
- Returns:
xr.Dataset | xr.DataArray | None – Nearest grid cell(s) or None if no valid match is found.
Notes
If (x, y) dimensions exist, lat/lon coordinates are projected using pyproj.Transformer.
The search tolerance is derived from the dataset resolution.
Returns None if no grid cells are within tolerance.
See also
xr.DataArray.sel, pyproj.Transformer
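Nearest-neighbor selection with a tolerance can be sketched on one-dimensional coordinate lists. This is a simplified illustration: nearest_index is hypothetical, the projection step is omitted, and setting the tolerance equal to the grid spacing is an assumption about how the tolerance is "derived from the dataset resolution".

```python
def nearest_index(coords, target, tolerance):
    """Index of the coordinate closest to target, or None if beyond tolerance."""
    best = min(range(len(coords)), key=lambda i: abs(coords[i] - target))
    return best if abs(coords[best] - target) <= tolerance else None


lats = [32.0, 32.5, 33.0, 33.5]
lons = [-120.0, -119.5, -119.0]
# Assumption: the tolerance equals the grid spacing along each axis.
lat_tol = abs(lats[1] - lats[0])
lon_tol = abs(lons[1] - lons[0])

inside = (nearest_index(lats, 32.6, lat_tol), nearest_index(lons, -119.4, lon_tol))
outside = nearest_index(lats, 40.0, lat_tol)  # far outside the grid
```

A point well outside the grid yields None, mirroring the documented behavior of returning None when no grid cell is within tolerance.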
- climakitae.util.utils.julianDay_to_date(julday: int, year: int = None, return_type: str = 'str', str_format: str = '%b-%d') str | datetime | date#
Convert julian day of year to a date object or formatted string.
- Parameters:
julday (int) – Julian day (day of year)
year (int, optional) – Year to use. If None, uses current year or a leap year (2024) based on needs. Default is None.
return_type (str, optional) – Type of return value:
  - “str”: formatted string (default)
  - “datetime”: datetime object
  - “date”: date object
str_format (str, optional) – String format of output date when return_type is “str”. Default is “%b-%d” which outputs format like “Jan-01”.
- Returns:
date (str, datetime.datetime, or datetime.date) – Julian day converted to specified format or object
Examples
>>> julianDay_to_date(1)
'Jan-01'
>>> julianDay_to_date(32, year=2023, return_type="date")
datetime.date(2023, 2, 1)
>>> julianDay_to_date(60, year=2024, str_format="%Y-%m-%d")
'2024-02-29'
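The year matters because day 60 and later fall on different calendar dates in leap and non-leap years. The sketch below shows the underlying arithmetic with the standard library only; day_of_year_to_date is a hypothetical helper, not part of climakitae.

```python
from datetime import date, timedelta


def day_of_year_to_date(julday: int, year: int) -> date:
    """Convert a 1-based day of year into a calendar date."""
    return date(year, 1, 1) + timedelta(days=julday - 1)


leap = day_of_year_to_date(60, 2024)      # 2024 is a leap year
non_leap = day_of_year_to_date(60, 2023)  # 2023 is not
label = leap.strftime("%b-%d")            # same pattern as the default str_format
```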
- climakitae.util.utils.read_csv_file(rel_path: str, index_col: str = <object object>, parse_dates: bool = False) DataFrame#
Read CSV file into pandas DataFrame
- climakitae.util.utils.readable_bytes(b: int) str#
Return the given bytes as a human friendly KB, MB, GB, or TB string.
- Parameters:
b (int) – Size in bytes.
- Returns:
str – Human-friendly size string.
Code from Stack Overflow (https://stackoverflow.com/questions/12523586/python-format-size-application-converting-b-to-kb-mb-gb-tb)
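One way such a formatter can work is sketched below. It assumes 1024-based units and two decimal places; readable_bytes_sketch is a hypothetical stand-in, and the exact output format of readable_bytes may differ.

```python
def readable_bytes_sketch(b: int) -> str:
    """Format a byte count using 1024-based KB, MB, GB, and TB units."""
    size = float(b)
    for unit in ("bytes", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024  # move up to the next larger unit
    return f"{size:.2f} TB"


small = readable_bytes_sketch(1536)
large = readable_bytes_sketch(5 * 1024**4)
```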
- climakitae.util.utils.reproject_data(xr_da: DataArray, proj: str = 'EPSG:4326', fill_value: float = nan) DataArray#
Reproject xr.DataArray using rioxarray.
- Parameters:
- Returns:
data_reprojected (DataArray) – 2-or-3-dimensional reprojected DataArray
- Raises:
ValueError – if input data does not have spatial coords x,y
ValueError – if input data has more than 5 dimensions
- climakitae.util.utils.resolution_to_gridlabel(resolution: str, reverse: bool = False) str#
Convert resolution format to grid_label format matching catalog names.
- climakitae.util.utils.scenario_to_experiment_id(scenario: str, reverse: bool = False) str#
Convert scenario format to experiment_id format matching catalog names.
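The reverse flag shared by these converters can be pictured as a two-way dictionary lookup. The sketch below is illustrative only: the display labels on the left are hypothetical placeholders, the catalog IDs on the right are taken from the scenario names used elsewhere on this page, and to_catalog_name is not a climakitae function.

```python
# Hypothetical display labels mapped to catalog experiment IDs.
SCENARIO_TO_EXPERIMENT = {
    "Historical": "historical",
    "SSP 2-4.5": "ssp245",
    "SSP 3-7.0": "ssp370",
}


def to_catalog_name(value: str, mapping: dict, reverse: bool = False) -> str:
    """Look up `value` in `mapping`, or in its inverse when reverse=True."""
    if reverse:
        mapping = {v: k for k, v in mapping.items()}
    return mapping[value]


forward = to_catalog_name("SSP 3-7.0", SCENARIO_TO_EXPERIMENT)
backward = to_catalog_name("ssp370", SCENARIO_TO_EXPERIMENT, reverse=True)
```

Inverting the mapping on demand keeps a single source of truth, provided the mapping is one-to-one.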
- climakitae.util.utils.stack_sims_across_locs(ds)#
- climakitae.util.utils.summary_table(data: Dataset) DataFrame#
Helper function to organize dataset object into a pandas dataframe for ease.
- climakitae.util.utils.timescale_to_table_id(timescale: str, reverse: bool = False) str#
Convert timescale format to table_id format matching catalog names.
- climakitae.util.utils.trendline(data: Dataset, kind: str = 'mean') Dataset#
Calculates trendline of the multi-model mean or median.
- Parameters:
data (Dataset)
kind (str, optional) – Options are ‘mean’ and ‘median’
- Returns:
trendline (Dataset)
Note
1. Development note: If an additional option for trendline ‘kind’ is required, compute_multimodel_stats must be modified to support it.
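As a rough illustration, the sketch below fits a straight line to a multi-model mean. It assumes the trendline is an ordinary least-squares linear fit, which the description above does not confirm; linear_trend and the sample data are hypothetical.

```python
def linear_trend(years, values):
    """Evaluate the ordinary least-squares line through (year, value) points."""
    n = len(years)
    x_bar = sum(years) / n
    y_bar = sum(values) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values)) / sum(
        (x - x_bar) ** 2 for x in years
    )
    intercept = y_bar - slope * x_bar
    return [slope * x + intercept for x in years]


years = [2000, 2001, 2002, 2003]
sims = {"sim_a": [1.0, 2.0, 3.0, 4.0], "sim_b": [3.0, 4.0, 5.0, 6.0]}
# kind='mean': average across simulations at each year before fitting.
multimodel_mean = [sum(vals) / len(vals) for vals in zip(*sims.values())]
trend = linear_trend(years, multimodel_mean)
```

Swapping the averaging step for a median would correspond to kind=‘median’ under the same assumption.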