climakitae.explore package


Submodules#

climakitae.explore.agnostic module#

Backend for agnostic tools.

climakitae.explore.agnostic.agg_area_subset_sims(area_subset, cached_area, downscaling_method, variable, agg_func, units, years, months=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], wrf_timescale='monthly')#

This function combines all available WRF or LOCA simulation data, filtered by area_subset (a string key from Boundaries.boundary_dict()) and by cached_area (one of the areas within that area_subset). It then extracts the data across all SSP pathways for the given years and months and runs the passed-in agg_func on it. Three values are returned: a dict mapping statistic names to single-simulation xr.DataArray objects (e.g. the median), a dict mapping statistic names to xr.DataArray objects that contain multiple simulations (e.g. the middle 10%), and an xr.DataArray of the simulations’ aggregated values sorted in ascending order.

Parameters:
  • area_subset (str) – Describes the category of the boundaries of interest (e.g. “CA Electric Load Serving Entities (IOU & POU)”)

  • cached_area (str) – Describes the specific area of interest (e.g. “Southern California Edison”)

  • agg_func (str) – The metric to aggregate the simulations by.

  • years (tuple) – The lower and upper year bounds (inclusive) to extract simulation data by.

  • months (list, optional) – Specific months of interest. The default is all months.

Returns:

  • single_stats (dict of str: DataArray) – Dictionary mapping string names of statistics to single simulation xr.DataArray objects.

  • multiple_stats (dict of str: DataArray) – Dictionary mapping string names of statistics to multiple simulations xr.DataArray objects.

  • results (DataArray) – Aggregated results of running the given aggregation function on the area of interest. Results are also sorted in ascending order.
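
The shape of the three return values can be pictured with a minimal, self-contained sketch. It assumes each simulation has already been reduced to a single aggregated number; the simulation names, the values, and the statistic labels ("min", "median", "max", "middle 10%") are hypothetical and are not taken from climakitae, which returns xr.DataArray objects rather than plain tuples.

```python
# Hypothetical aggregated value per simulation (e.g. a mean over the chosen years/months).
agg_values = {
    "sim_a": 291.4, "sim_b": 289.9, "sim_c": 293.1, "sim_d": 290.6, "sim_e": 292.0,
    "sim_f": 288.7, "sim_g": 294.3, "sim_h": 291.0, "sim_i": 290.1, "sim_j": 292.8,
}

# results: (simulation, value) pairs sorted by value in ascending order.
results = sorted(agg_values.items(), key=lambda item: item[1])
n = len(results)

# single_stats: statistic name -> one simulation.
single_stats = {
    "min": results[0],
    "median": results[(n - 1) // 2],  # lower median when the count is even
    "max": results[-1],
}

# multiple_stats: statistic name -> several simulations.
# "middle 10%" keeps the central tenth of the sorted simulations (at least one).
width = max(1, round(0.10 * n))
start = (n - width) // 2
multiple_stats = {"middle 10%": results[start:start + width]}
```

Keeping the sorted list as the third return value lets a caller derive further statistics without re-running the aggregation.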

climakitae.explore.agnostic.agg_lat_lon_sims(lat: float | Tuple[float, float], lon: float | Tuple[float, float], downscaling_method, variable, agg_func, units, years, months=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], wrf_timescale='monthly')#

Gets aggregated WRF or LOCA simulation data for a lat/lon coordinate or lat/lon range for a given metric and timeframe (years, months). It combines all selected simulation data that is filtered by lat/lon, years, and specific months across SSP pathways and runs the passed in metric on all of the data. The results are then returned in ascending order, along with dictionaries mapping specific statistic names to the simulation objects themselves.

Parameters:
  • lat (float or tuple of float) – Latitude of the location of interest, or a latitude range.

  • lon (float or tuple of float) – Longitude of the location of interest, or a longitude range.

  • agg_func (str) – The function to aggregate the simulations by.

  • years (tuple) – The lower and upper year bounds (inclusive) to subset simulation data by.

  • months (list, optional) – Specific months of interest. The default is all months.

Returns:

  • single_stats (dict of str: DataArray) – Dictionary mapping string names of statistics to single simulation xr.DataArray objects.

  • multiple_stats (dict of str: DataArray) – Dictionary mapping string names of statistics to multiple simulations xr.DataArray objects.

  • results (DataArray) – Aggregated results of running the given aggregation function on the lat/lon gridcell of interest. Results are also sorted in ascending order.

climakitae.explore.agnostic.create_lookup_tables()#

Create lookup tables for converting between warming level and time.

Returns:

dict of pandas.DataFrame – A dictionary containing two dataframes: “time lookup table”, which maps warming levels to their occurrence times for each GCM simulation we catalog, and “warming level lookup table”, which contains yearly warming levels for those simulations.

climakitae.explore.agnostic.get_available_units(variable, downscaling_method, wrf_timescale='monthly')#

Get the other units available for the given variable.

climakitae.explore.agnostic.show_available_vars(downscaling_method, wrf_timescale='monthly')#

Function that shows the available variables based on the input downscaling method.

climakitae.explore.agnostic.warm_level_to_month(time_df, scenario, warming_level)#

Given warming level, give month.

climakitae.explore.agnostic.year_to_warm_levels(warm_df, scenario, year)#

Given year, give warming levels and their median.
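
As a rough illustration of the two lookups, the sketch below uses plain dictionaries in place of the DataFrames. The table layout, the simulation names, and all values are invented for the example, and the helper names are hypothetical; the real functions also take a scenario argument that is omitted here.

```python
from statistics import median

# Hypothetical "time lookup table": simulation -> {warming level: month it is reached}.
time_lookup = {
    "sim_a": {1.5: "2031-04", 2.0: "2046-09"},
    "sim_b": {1.5: "2028-11", 2.0: "2043-02"},
}

# Hypothetical "warming level lookup table": year -> {simulation: warming level}.
warming_lookup = {
    2040: {"sim_a": 1.8, "sim_b": 1.9},
    2050: {"sim_a": 2.2, "sim_b": 2.4},
}


def level_to_months(table, warming_level):
    """Return the month in which each simulation reaches the warming level."""
    return {
        sim: times[warming_level]
        for sim, times in table.items()
        if warming_level in times
    }


def year_to_levels(table, year):
    """Return each simulation's warming level in a given year, plus the median."""
    levels = table[year]
    return levels, median(levels.values())


months = level_to_months(time_lookup, 2.0)
levels, med = year_to_levels(warming_lookup, 2050)
```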

climakitae.explore.standard_year_profile module#

Calculates the Standard Year Climate Profiles using a warming level approach and designated quantiles. The historical baseline for relative profile computation is a warming level of 1.2°C. A user-specified warming level is calculated relative to this baseline unless the “no_delta” option is set to True, in which case the raw profile(s) for the requested warming level(s) are returned.

climakitae.explore.standard_year_profile.compute_profile(data: DataArray, days_in_year: int = 365, q=0.5) DataFrame

Calculates the standard year climate profile for warming level data using 8760 analysis.

This function handles the global warming levels approach using the time_delta coordinate. It processes all 30 years of warming level data centered on the year a warming level is reached, computes the specified quantile for each hour of the year across all years, selects the actual data value closest to that quantile (not interpolated), and returns a characteristic profile of 8760 hours (one year) for each warming level and simulation combination.

Parameters:
  • data (DataArray) – Hourly baseline-subtracted data for one variable with warming_level, time_delta, and simulation dimensions. Expected to contain ~30 years (262,800 hours) of data for each warming level and simulation.

  • days_in_year (int, optional) – Either 366 or 365, depending on whether or not the year is a leap year. Defaults to 365.

  • q (float, optional) – Quantile value for selecting representative values (0.0 to 1.0). Default is 0.5 (median).

Returns:

DataFrame – Standard year table for each warming level and simulation, with days of year as the index and hour of day as the columns. Multi-index columns include Hour, Warming_Level, and Simulation dimensions.
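
The "closest actual value" step described above can be sketched for a single hour of the year. This is only an illustration in plain Python under assumed inputs: the helper names and sample values are hypothetical, and the linear-interpolation quantile is an assumption, not a statement about how climakitae computes it.

```python
def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an ascending list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def representative_value(values, q=0.5):
    """Pick the observed value closest to the q-th quantile (no interpolation)."""
    target = quantile(sorted(values), q)
    return min(values, key=lambda v: abs(v - target))


# Hypothetical values for one hour of the year, one per year of data.
hour_values = [61.0, 58.5, 64.2, 60.1, 62.3]

median_value = representative_value(hour_values, q=0.5)
upper_value = representative_value(hour_values, q=0.9)
```

Repeating this selection for each of the 8760 hours gives one profile; the result always consists of values that actually occur in the data.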

climakitae.explore.standard_year_profile.export_profile_to_csv(profile: ~pandas.core.frame.DataFrame, variable: str, q: float, global_warming_levels: list[float], station_name: str = <object object>, cached_area: str = <object object>, latitude: float | int = <object object>, longitude: float | int = <object object>, no_delta: bool = False)

Export profile to csv file with a descriptive file name.

Each warming level is saved in a separate file.

Parameters:
  • profile (DataFrame) – Standard year profile with MultiIndex columns

  • variable (str) – Name of variable used in profile

  • q (float) – Quantile used in profile

  • global_warming_levels (list[float]) – List of global warming levels in profile

  • latitude (float | int, optional) – Latitude coordinate from profile location

  • longitude (float | int, optional) – Longitude coordinate from profile location

  • station_name (str, optional) – Name of HadISD station or custom location used in profile

  • cached_area (str, optional) – Name of cached area used in profile

  • coord_name (str, optional) – Name of location, used with latitude and longitude

  • no_delta (bool, optional) – True if no_delta=True when generating profile

Notes

station_name can be used with latitude and longitude as long as station_name is not set to a HadISD station.

If cached_area is set along with latitude and longitude, latitude and longitude take priority and cached_area will be dropped.

climakitae.explore.standard_year_profile.get_climate_profile(**kwargs) DataFrame

High-level function to compute standard year climate profiles using warming level data.

This function retrieves climate data and computes standard year profiles using the 8760 analysis approach. It combines data retrieval and profile computation in a single call.

Parameters:

**kwargs (dict) – Keyword arguments for data selection. Allowed keys:
  • variable (Optional) : str, default “Air Temperature at 2m”

  • resolution (Optional) : str, default “3 km”

  • warming_level (Required) : List[float], default [1.2]

  • cached_area (Optional) : str or List[str]

  • units (Optional) : str, default “degF”

  • latitude (Optional) : float or tuple

  • longitude (Optional) : float or tuple

  • stations (Optional) : list[str], default None

  • days_in_year (Optional) : int, default 365

  • q (Optional) : float | list[float], default 0.5, quantile for profile calculation

  • no_delta (Optional) : bool, default False; if True, do not apply baseline subtraction and return the raw future profile

Returns:

DataFrame – Standard year table for each warming level, with days of year as the index and hour of day as the columns. If multiple warming levels exist, they will be included as additional column levels. Units and metadata are preserved in the DataFrame’s attrs dictionary.

Examples

>>> profile = get_climate_profile(
...     variable="Air Temperature at 2m",
...     warming_level=[1.5, 2.0, 3.0],
...     units="degF"
... )
>>> profile = get_climate_profile(warming_level=[2.0])

climakitae.explore.standard_year_profile.get_profile_metadata(profile_df: DataFrame) dict

Extract all metadata from a climate profile DataFrame.

Parameters:

profile_df (DataFrame) – Climate profile DataFrame with metadata stored in attrs

Returns:

dict – Dictionary containing all available metadata

Examples

>>> profile = get_climate_profile(variable="Air Temperature at 2m", warming_level=[2.0])
>>> metadata = get_profile_metadata(profile)
>>> print(f"Variable: {metadata.get('variable_name')}")
>>> print(f"Units: {metadata.get('units')}")
>>> print(f"Method: {metadata.get('method')}")

climakitae.explore.standard_year_profile.get_profile_units(profile_df: DataFrame) str

Extract units information from a climate profile DataFrame.

Parameters:

profile_df (DataFrame) – Climate profile DataFrame with units stored in attrs

Returns:

str – Units string, or ‘Unknown’ if not found

Examples

>>> profile = get_climate_profile(variable="Air Temperature at 2m", warming_level=[2.0])
>>> units = get_profile_units(profile)
>>> print(f"Temperature units: {units}")

climakitae.explore.standard_year_profile.retrieve_profile_data(**kwargs: any) Tuple[Dataset, Dataset]

Backend function for retrieving data needed for computing climate profiles.

Reads in the full hourly data for the 8760 analysis, including all warming levels.

Parameters:

**kwargs (dict) – Keyword arguments for data selection. Allowed keys:
  • variable (Optional) : str, default “Air Temperature at 2m”

  • resolution (Optional) : str, default “3 km”

  • warming_levels (Optional) : List[float], default [1.2]

  • cached_area (Optional) : str or List[str]

  • latitude (Optional) : float or tuple

  • longitude (Optional) : float or tuple

  • stations (Optional) : list[str], default None

  • units (Optional) : str, default “degF”

  • no_delta (Optional) : bool, default False; if True, do not retrieve historical data and return the raw future profile

Returns:

Tuple[xr.Dataset, xr.Dataset] – (historic_data, future_data) - Historical data at 1.2°C warming, and future data at specified warming levels.

Raises:

ValueError – If invalid parameter keys are provided.

Example

>>> historic_data, future_data = retrieve_profile_data(
...     variable="Air Temperature at 2m",
...     resolution="45 km",
...     scenario=["SSP 3-7.0"],
...     warming_level=[1.5, 2.0, 3.0],
...     units="degF"
... )
>>> historic_data, future_data = retrieve_profile_data(
...     warming_level=[2.0]
... )

Notes

Historical data is always retrieved for warming level = 1.2°C. Future data uses user-specified warming levels or defaults.

The function prioritizes location parameters in the following order:
  1. cached_area
  2. latitude/longitude
  3. stations

Each parameter overrides the lower-priority ones if provided. If cached_area is given, lat/lon and stations are ignored. If lat/lon are given, stations are ignored. Stations are used only if neither cached_area nor lat/lon is provided.

If no location parameters are provided, a warning is issued about retrieving the entire CA dataset.
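
The priority order in these notes amounts to a short chain of checks. The helper below is hypothetical and only mirrors the documented order; its name, its return format, and the sample location values are assumptions made for illustration.

```python
def resolve_location(cached_area=None, latitude=None, longitude=None, stations=None):
    """Return which location parameter wins under the documented priority."""
    if cached_area is not None:
        return ("cached_area", cached_area)
    if latitude is not None and longitude is not None:
        return ("lat_lon", (latitude, longitude))
    if stations:
        return ("stations", stations)
    # No location given: the documented case in which a warning is issued.
    return ("entire_domain", None)


# cached_area overrides everything else.
choice_all = resolve_location(
    cached_area="Example Area", latitude=34.05, longitude=-118.25,
    stations=["Example Station"],
)
# lat/lon override stations.
choice_point = resolve_location(
    latitude=34.05, longitude=-118.25, stations=["Example Station"]
)
# stations are used only when nothing else is provided.
choice_station = resolve_location(stations=["Example Station"])
```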

climakitae.explore.standard_year_profile.set_profile_metadata(profile_df: DataFrame, metadata: dict) None

Set or update metadata in a climate profile DataFrame.

Parameters:
  • profile_df (DataFrame) – Climate profile DataFrame to update

  • metadata (dict) – Dictionary containing metadata key-value pairs to set

Returns:

None – The function modifies the DataFrame in place

Examples

>>> profile = get_climate_profile(variable="Air Temperature at 2m", warming_level=[2.0])
>>> new_metadata = {
...     "source": "Custom Dataset",
...     "author": "Jane Doe",
...     "notes": "This profile was generated for testing purposes."
... }
>>> set_profile_metadata(profile, new_metadata)
>>> print(profile.attrs)

climakitae.explore.threshold_tools module#

Helper functions for performing analyses related to thresholds

climakitae.explore.threshold_tools.calculate_ess(data: ~xarray.core.dataarray.DataArray, nlags: int = <object object>) DataArray#

Function for calculating the effective sample size (ESS) of the provided data.

Parameters:
  • data (DataArray) – Input array is assumed to be timeseries data with potential autocorrelation.

  • nlags (int, optional) – Number of lags to use in the autocorrelation function, defaults to the length of the timeseries.

Returns:

DataArray – Effective sample size. Returned as a DataArray object so it can be utilized by xr.groupby and xr.resample.
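
For intuition, one common textbook estimate of the effective sample size divides the series length by one plus twice the sum of the autocorrelations up to nlags. The sketch below implements that estimate in plain Python; it is an assumption that this resembles the library's calculation, and the helper names are hypothetical.

```python
def autocorr(x, lag):
    """Sample autocorrelation of a non-constant series at the given lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom


def effective_sample_size(x, nlags):
    """ESS = n / (1 + 2 * sum of autocorrelations), capped at n."""
    n = len(x)
    denom = 1 + 2 * sum(autocorr(x, k) for k in range(1, nlags + 1))
    if denom <= 0:
        # Strong negative autocorrelation: fall back to the nominal size.
        return float(n)
    return min(float(n), n / denom)


# A steadily increasing series is positively autocorrelated,
# so it carries fewer than 5 independent samples.
ess = effective_sample_size([1.0, 2.0, 3.0, 4.0, 5.0], nlags=1)
```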

climakitae.explore.threshold_tools.exceedance_plot_subtitle(exceedance_count: DataArray) str#

Function to build the exceedance plot subtitle.

Helper function for making the subtitle for exceedance plots.

Parameters:

exceedance_count (xarray.DataArray)

Returns:

string

Examples

‘Number of hours per year’
‘Number of 4-hour events per 3-months’
‘Number of days per year with conditions lasting at least 4-hours’

climakitae.explore.threshold_tools.exceedance_plot_title(exceedance_count: DataArray) str#

Function to build title for exceedance plots

Helper function for making the title for exceedance plots.

Parameters:

exceedance_count (xarray.DataArray)

Returns:

string

Examples

‘Air Temperature at 2m: events above 35C’
‘Precipitation (total): events below 10mm’

climakitae.explore.threshold_tools.get_block_maxima(da_series: ~xarray.core.dataarray.DataArray, extremes_type: str = 'max', duration: tuple[int, str] = <object object>, groupby: tuple[int, str] = <object object>, grouped_duration: tuple[int, str] = <object object>, check_ess: bool = True, block_size: int = 1) DataArray#

Function that converts data into block maximums, defaulting to annual maximums (default block size = 1 year).

Takes input array and resamples by taking the maximum value over the specified block size.

Optional arguments duration, groupby, and grouped_duration define the type of event to find the annual maximums of. These correspond to the event types defined in the get_exceedance_count function.

Parameters:
  • da_series (xarray.DataArray) – DataArray from retrieve

  • extremes_type (str) – Either “max” or “min”. Defaults to “max”.

  • duration (tuple) – length of extreme event, specified as (4, ‘hour’)

  • groupby (tuple) – group over which to look for max occurrence, specified as (1, ‘day’)

  • grouped_duration (tuple) – length of event after grouping, specified as (5, ‘day’)

  • check_ess (boolean) – optional flag specifying whether to check the effective sample size (ESS) within the blocks of data and issue a warning if the average ESS is too small. The warning can be silenced with check_ess=False.

  • block_size (int) – block size in years. default is 1 year.

Returns:

xarray.DataArray
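
The core idea of block maxima, grouping a series into blocks of whole years and keeping one extreme per block, can be sketched without xarray. The function below is a hypothetical stand-in that works on (year, value) pairs with made-up values; it ignores the duration, groupby, and grouped_duration event options and the ESS check.

```python
from collections import defaultdict


def block_maxima(records, block_size=1, extremes_type="max"):
    """records: list of (year, value). Returns {block_start_year: extreme}."""
    first_year = min(year for year, _ in records)
    blocks = defaultdict(list)
    for year, value in records:
        # Map each year to the first year of the block it falls in.
        start = first_year + ((year - first_year) // block_size) * block_size
        blocks[start].append(value)
    pick = max if extremes_type == "max" else min
    return {start: pick(vals) for start, vals in sorted(blocks.items())}


records = [
    (2001, 35.2), (2001, 38.9), (2002, 36.4),
    (2002, 33.0), (2003, 40.1), (2004, 37.5),
]
annual_max = block_maxima(records)
two_year_min = block_maxima(records, block_size=2, extremes_type="min")
```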

climakitae.explore.threshold_tools.get_exceedance_count(da: ~xarray.core.dataarray.DataArray, threshold_value: float, duration1: tuple[int, str] = <object object>, period: tuple[int, str] = (1, 'year'), threshold_direction='above', duration2: tuple[int, str] = <object object>, groupby: tuple[int, str] = <object object>, smoothing: int = <object object>) DataArray#

Calculate the number of occurrences of exceeding the specified threshold within each period.

Returns an xarray.DataArray with the same coordinates as the input data except for the time dimension, which will be collapsed to one value per period (equal to the number of event occurrences in each period).

Parameters:
  • da (xarray.DataArray) – array of some climate variable. Can have multiple scenarios, simulations, or x and y coordinates.

  • threshold_value (float) – value against which to test exceedance

  • duration1 (tuple[int, str]) – length of exceedance in order to qualify as an event (before grouping)

  • period (tuple[int, str]) – amount of time across which to sum the number of occurrences, default is (1, “year”). Specified as a tuple: (x, time) where x is an integer, and time is one of: [“day”, “month”, “year”]

  • threshold_direction (str) – either “above” or “below”, default is above.

  • duration2 (tuple[int, str]) – length of exceedance in order to qualify as an event (after grouping)

  • groupby (tuple[int, str]) – see examples for explanation. Typical grouping could be (1, “day”)

  • smoothing (int) – option to average the result across multiple periods with a rolling average; value is either UNSET or the number of timesteps to use as the window size

Returns:

xarray.DataArray
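
In its simplest form (no duration, grouping, or smoothing, and a period of one year), the count reduces to tallying the timesteps on the chosen side of the threshold within each year. The sketch below shows only that simplest case on invented (year, value) pairs; the function is a hypothetical stand-in, not the library implementation.

```python
from collections import Counter


def exceedance_count(records, threshold_value, threshold_direction="above"):
    """records: iterable of (year, value). Returns {year: number of exceedances}."""
    counts = Counter()
    for year, value in records:
        if threshold_direction == "above":
            exceeded = value > threshold_value
        else:
            exceeded = value < threshold_value
        counts[year] += int(exceeded)  # years with no exceedance still get a 0 entry
    return dict(counts)


records = [(2001, 34.0), (2001, 36.5), (2001, 37.1), (2002, 33.2), (2002, 35.9)]
above = exceedance_count(records, threshold_value=35.0)
below = exceedance_count(records, threshold_value=35.0, threshold_direction="below")
```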

climakitae.explore.threshold_tools.get_ks_stat(bms: DataArray, distr: str = 'gev', multiple_points: bool = True) Dataset#

Function to perform a KS test on the input DataArray.

Creates a dataset of KS test D-statistics and p-values from an input block maximum series.

Parameters:
  • bms (xarray.DataArray) – Block maximum series, can be output from the function get_block_maxima()

  • distr (str) – name of distribution to use

  • multiple_points (boolean) – Whether or not the data contains multiple points (has x, y dimensions)

Returns:

xarray.Dataset

climakitae.explore.threshold_tools.get_return_period(bms: DataArray, return_value: float, distr: str = 'gev', bootstrap_runs: int = 100, conf_int_lower_bound: float = 2.5, conf_int_upper_bound: float = 97.5, multiple_points: bool = True, dropna_time: bool = False) Dataset#

Creates xarray Dataset with return periods and confidence intervals from maximum series.

Parameters:
  • bms (xarray.DataArray) – Block maximum series, can be output from the function get_block_maxima()

  • return_value (float) – The threshold value for which to calculate the return period of occurrence

  • distr (str) – The type of extreme value distribution to fit

  • bootstrap_runs (int) – Number of bootstrap samples

  • conf_int_lower_bound (float) – Confidence interval lower bound

  • conf_int_upper_bound (float) – Confidence interval upper bound

  • multiple_points (boolean) – Whether or not the data contains multiple points (has x, y dimensions)

  • dropna_time (boolean) – Whether to drop NaNs along the time axis

Returns:

xarray.Dataset – Dataset with return periods and confidence intervals

climakitae.explore.threshold_tools.get_return_prob(bms: DataArray, threshold: float, distr: str = 'gev', bootstrap_runs: int = 100, conf_int_lower_bound: float = 2.5, conf_int_upper_bound: float = 97.5, multiple_points: bool = True, extremes_type: str = 'max', dropna_time: bool = False) Dataset#

Creates xarray Dataset with return probabilities and confidence intervals from maximum series.

Parameters:
  • bms (xarray.DataArray) – Block maximum series, can be output from the function get_block_maxima()

  • threshold (float) – The threshold value for which to calculate the probability of exceedance

  • distr (str) – The type of extreme value distribution to fit

  • bootstrap_runs (int) – Number of bootstrap samples

  • conf_int_lower_bound (float) – Confidence interval lower bound

  • conf_int_upper_bound (float) – Confidence interval upper bound

  • multiple_points (boolean) – Whether or not the data contains multiple points (has x, y dimensions)

  • dropna_time (boolean) – Whether to drop NaNs along the time axis

Returns:

xarray.Dataset – Dataset with return probabilities and confidence intervals

climakitae.explore.threshold_tools.get_return_value(bms: DataArray, return_period: float = 10, distr: str = 'gev', bootstrap_runs: int = 100, conf_int_lower_bound: float = 2.5, conf_int_upper_bound: float = 97.5, multiple_points: bool = True, extremes_type: str = 'max', dropna_time: bool = False) Dataset#

Creates xarray Dataset with return values and confidence intervals from maximum series.

Parameters:
  • bms (xarray.DataArray) – Block maximum series, can be output from the function get_block_maxima()

  • return_period (float) – The recurrence interval (in years) for which to calculate the return value

  • distr (str) – The type of extreme value distribution to fit

  • bootstrap_runs (int) – Number of bootstrap samples

  • conf_int_lower_bound (float) – Confidence interval lower bound

  • conf_int_upper_bound (float) – Confidence interval upper bound

  • multiple_points (boolean) – Whether or not the data contains multiple points (has x, y dimensions)

  • dropna_time (boolean) – Whether to drop NaNs along the time axis

Returns:

xarray.Dataset – Dataset with return values and confidence intervals
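
Return values, return periods, and return probabilities are three views of the same fitted distribution. For a GEV distribution with location mu, scale sigma, and shape xi, the T-year return value is the quantile that is exceeded with probability 1/T in any one block. The sketch below uses the standard closed-form GEV expressions with made-up parameters; it skips the fitting and bootstrap steps entirely, and the function names are hypothetical.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """GEV cumulative distribution function (xi = 0 is the Gumbel case)."""
    if xi == 0:
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1 + xi * (z - mu) / sigma
    if t <= 0:
        # Outside the support: below it when xi > 0, above it when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-(t ** (-1 / xi)))


def gev_return_value(return_period, mu, sigma, xi):
    """Value exceeded with probability 1 / return_period per block."""
    y = -math.log(1 - 1 / return_period)
    if xi == 0:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1 - y ** (-xi))


# Hypothetical fitted parameters for annual maximum temperature.
mu, sigma, xi = 38.0, 1.5, 0.1

z10 = gev_return_value(10, mu, sigma, xi)    # 1-in-10-year value
z100 = gev_return_value(100, mu, sigma, xi)  # 1-in-100-year value

prob_exceed = 1 - gev_cdf(z10, mu, sigma, xi)  # return probability for z10
period = 1 / prob_exceed                       # return period for z10
```

The round trip (value to probability to period) recovers the original 10-year period, which is a useful sanity check on any fitted distribution.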

climakitae.explore.thresholds module#

climakitae.explore.thresholds.get_threshold_data(selections: DataParameters) DataArray

This function pulls data from the catalog and reads it into memory

Parameters:

selections (DataParameters) – object holding user’s selections

Returns:

data (DataArray) – data to use for creating postage stamp data

climakitae.explore.timeseries module#

class climakitae.explore.timeseries.TimeSeries(data: DataArray)#

Bases: object

Holds the instance of TimeSeriesParameters that is used for the following purposes: 1) to display a panel that previews various time-series transforms (explore), and 2) to save the transform represented by the current state of that preview into a new variable (output_current).

Parameters:

data (DataArray) – Time series array with no spatial coordinates.

choices#

Param object containing time series data and analysis parameters.

Type:

TimeSeriesParameters

output_current() DataArray#

Output the current attributes of the class to a DataArray object. Allows the data to be easily accessed by the user after modifying the attributes directly in the explore panel, for example.

Returns:

DataArray

class climakitae.explore.timeseries.TimeSeriesParameters(dataset: DataArray, **params)#

Bases: Parameterized

Class of python Param to hold parameters for Time Series.

Parameters:
  • dataset (DataArray) – Timeseries data

  • **params – Additional arguments to initialize Param class.

data#

The time series data provided to the class.

Type:

DataArray

anomaly#

True to transform timeseries into anomalies (default True).

Type:

bool, optional

extremes#

List of extremes quantities to compute (options “Max”, “Min”, “Percentile”).

Type:

list[str], optional

num_timesteps#

Number of timesteps for rolling mean calculations (default 0).

Type:

int, optional

percentile#

Percentile to calculate when using the “Percentile” option in extremes (range 0-1).

Type:

int | float, optional

reference_range#

Reference date range (default 1981-01-01 to 2010-12-31).

Type:

tuple[dt.datetime, dt.datetime]

remove_seasonal_cycle#

True to remove the seasonal cycle from the timeseries (default False).

Type:

bool, optional

resample_window#

Size of resample window (between 1-30, inclusive).

Type:

int, optional

separate_seasons#

True to disaggregate into four seasons (default False).

Type:

bool, optional

smoothing#

Set to “Running Mean” for smoothing (default “None”).

Type:

str, optional

transform_data(self)#

Transform timeseries dataset using user parameters.

anomaly = True#
extremes = []#
name = 'TimeSeriesParameters'#
num_timesteps = 0#
percentile = 0#
reference_range = (datetime.datetime(1981, 1, 1, 0, 0), datetime.datetime(2010, 12, 31, 0, 0))#
remove_seasonal_cycle = False#
resample_period = 'years'#
resample_window = 1#
separate_seasons = False#
smoothing = 'None'#
transform_data() DataArray#

Transform timeseries based on parameters. Returns a dataset that has been transformed in the ways that the params indicate, ready to plot in the preview window (“view” method of this class), or be saved out.

Returns:

DataArray – Transformed result.
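
One of the transforms controlled by these parameters is the running mean over num_timesteps. The sketch below shows a plain trailing-window version on a list; how the library aligns the window and treats the edges of the series is not stated here, so this behavior is an assumption and the function is a hypothetical stand-in.

```python
def running_mean(values, num_timesteps):
    """Mean over each full window of num_timesteps consecutive values."""
    if num_timesteps <= 1:
        return list(values)  # nothing to smooth
    return [
        sum(values[i:i + num_timesteps]) / num_timesteps
        for i in range(len(values) - num_timesteps + 1)
    ]


series = [1, 2, 3, 4, 5]
smoothed = running_mean(series, 3)
unchanged = running_mean(series, 0)
```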

update_anom()#
update_seasonal_cycle()#

climakitae.explore.uncertainty module#

class climakitae.explore.uncertainty.CmipOpt(variable: str = 'tas', area_subset: str = 'states', location: str = 'California', timescale: str = 'monthly', area_average: bool = True)#

Bases: object

A class for holding relevant data options for cmip preprocessing

variable#

variable name, cf-compliant (or cmip6 variable name)

Type:

str

area_subset#

geographic boundary name (states/counties)

Type:

str

location#

geographic area name (name of county/state)

Type:

str

timescale#

frequency of data

Type:

str

area_average#

average computed across domain

Type:

bool

_cmip_clip()#

CMIP6-specific subsetting

climakitae.explore.uncertainty.calc_anom(ds_yr: Dataset, base_start: int, base_end: int) Dataset#

Calculates the difference relative to a historical baseline.

First calculates a baseline per simulation using input (base_start, base_end). Then calculates the anomaly from baseline per simulation.

Parameters:
  • ds_yr (Dataset) – must be the output from cmip_annual

  • base_start (int) – start year of baseline to calculate

  • base_end (int) – end year of the baseline to calculate

Returns:

Dataset – Anomaly data calculated with input baseline start and end
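
The two steps described above (a baseline per simulation, then the difference from it) can be sketched with nested dictionaries standing in for the Dataset. The simulation names and values are invented, and the function name is hypothetical.

```python
def calc_anomaly(series_by_sim, base_start, base_end):
    """series_by_sim: {simulation: {year: value}}. Returns the same shape as anomalies."""
    anomalies = {}
    for sim, yearly in series_by_sim.items():
        # Step 1: baseline mean over the inclusive year range, per simulation.
        base = [v for y, v in yearly.items() if base_start <= y <= base_end]
        baseline = sum(base) / len(base)
        # Step 2: anomaly of every year relative to that simulation's baseline.
        anomalies[sim] = {y: v - baseline for y, v in yearly.items()}
    return anomalies


data = {
    "sim_a": {1981: 14.0, 1982: 15.0, 2050: 17.5},
    "sim_b": {1981: 13.0, 1982: 13.0, 2050: 16.0},
}
anom = calc_anomaly(data, base_start=1981, base_end=1982)
```

Because each simulation is compared with its own baseline, systematic offsets between models drop out of the anomalies.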

climakitae.explore.uncertainty.cmip_mmm(ds: Dataset) Dataset#

Calculate the CMIP6 multi-model mean by collapsing across simulations.

Parameters:

ds (Dataset) – Input data, multiple simulations

Returns:

Dataset – Mean across input data taken on simulation dim

climakitae.explore.uncertainty.get_ensemble_data(variable: str, selections: DataParameters, cmip_names: list[str], warm_level: float = 3.0)#

Returns processed data from multiple CMIP6 models for uncertainty analysis.

Searches the CMIP6 data catalog for data from models that have specific ensemble member id in the historical and ssp370 runs. Preprocessing includes subsetting for specific location and dropping the member_id for easier analysis.

Gets future data at the warming level range. Slices the historical period to 1981-2010.

Parameters:
  • variable (str) – Name of variable

  • selections (DataParameters) – Data and location settings

  • cmip_names (list[str]) – Name of CMIP6 simulations

  • warm_level (float, optional) – Global warming level to use, default to 3.0

Returns:

list[xr.Dataset]

climakitae.explore.uncertainty.get_ks_pval_df(sample1: Dataset, sample2: Dataset, sig_lvl: float = 0.05) DataFrame#

Performs a Kolmogorov-Smirnov test at all lat, lon points

Parameters:
  • sample1 (Dataset) – first sample for comparison

  • sample2 (Dataset) – sample against which to compare sample1

  • sig_lvl (float) – alpha level for statistical significance

Returns:

DataFrame – columns are lat, lon, and p_value; only retains spatial points where p_value < sig_lvl
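
At a single grid point, the two-sample Kolmogorov-Smirnov statistic is the largest gap between the two empirical cumulative distribution functions. The sketch below computes only that D-statistic in plain Python with a hypothetical helper; it does not compute the p-value that the function filters on.

```python
def ks_statistic(sample1, sample2):
    """Largest absolute difference between the two empirical CDFs."""
    s1, s2 = sorted(sample1), sorted(sample2)
    d = 0.0
    # The gap can only change at observed values, so check each of them.
    for x in s1 + s2:
        cdf1 = sum(v <= x for v in s1) / len(s1)
        cdf2 = sum(v <= x for v in s2) / len(s2)
        d = max(d, abs(cdf1 - cdf2))
    return d


disjoint = ks_statistic([1, 2, 3], [4, 5, 6])           # no overlap at all
identical = ks_statistic([1, 2, 3], [1, 2, 3])          # same distribution
shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])      # partial overlap
```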

climakitae.explore.uncertainty.get_warm_level(warm_level: float | int, ds: Dataset, multi_ens: bool = False, ipcc: bool = True) Dataset | None#

Subsets projected data centered to the year that the selected warming level is reached for a particular simulation/member_id

Parameters:
  • warm_level (float | int) – options: 1.5, 2.0, 3.0, 4.0

  • ds (Dataset) – Can only have one ‘simulation’ coordinate

  • multi_ens (bool, default False) – Set to True if passing a simulation with multiple member_id

  • ipcc (bool, default True) – Set to False if not performing warming level analysis with respect to IPCC standard baseline (1850-1900)

Returns:

Dataset – Subset of projected data -14/+15 years from warming level threshold
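
The -14/+15 year window can be illustrated with a yearly series held in a dictionary. The sketch assumes both bounds are inclusive, which yields a 30-year slice; the crossing year, the values, and the helper name are hypothetical.

```python
def warming_level_window(crossing_year, yearly_values):
    """Keep the years from crossing_year - 14 through crossing_year + 15."""
    start, end = crossing_year - 14, crossing_year + 15
    return {y: v for y, v in yearly_values.items() if start <= y <= end}


# Hypothetical yearly series covering 2015-2099.
yearly = {year: 0.02 * (year - 2000) for year in range(2015, 2100)}

# Suppose this simulation reaches the selected warming level in 2045.
window = warming_level_window(2045, yearly)
```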

climakitae.explore.uncertainty.grab_multimodel_data(copt: CmipOpt, alpha_sort: bool = False) Dataset#

Returns processed data from multiple CMIP6 models for uncertainty analysis.

Searches the CMIP6 data catalog for data from models that have specific ensemble member id in the historical and ssp370 runs. Preprocessing includes subsetting for specific location and dropping the member_id for easier analysis.

Parameters:
  • copt (CmipOpt) – Selections: variable, area_subset, location, area_average, timescale

  • alpha_sort (bool, default False) – Set to True if sorting model names alphabetically is desired

Returns:

Dataset – Processed CMIP6 models concatenated into a single ds

climakitae.explore.uncertainty.weighted_temporal_mean(ds: DataArray) DataArray#

Weight by days in each month

Function for calculating annual averages, adapted from NCAR: https://ncar.github.io/esds/posts/2021/yearly-averages-xarray/

Parameters:

ds (xarray.DataArray)

Returns:

xarray.DataArray
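
Weighting by days in each month means a 31-day month counts slightly more than a 28-day one when monthly values are averaged into an annual value. A minimal sketch for a single year, using only the standard library and a hypothetical function name:

```python
import calendar


def weighted_annual_mean(year, monthly_values):
    """Average 12 monthly values, weighting each by the days in its month."""
    days = [calendar.monthrange(year, month)[1] for month in range(1, 13)]
    return sum(d * v for d, v in zip(days, monthly_values)) / sum(days)


# A constant series is unchanged by the weighting.
constant = weighted_annual_mean(2021, [5.0] * 12)

# Only February is non-zero: its weight is 28/365 in a non-leap year.
feb_only = [0.0, 365.0] + [0.0] * 10
result = weighted_annual_mean(2021, feb_only)
```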

climakitae.explore.vulnerability module#

Tools for CAVA vulnerability assessment pilot

class climakitae.explore.vulnerability.CavaParams(*, approach, batch_mode, distr, downscaling_method, event_duration, export_method, file_format, file_name, heat_idx_threshold, historical_data, input_locations, metric_calc, one_in_x, percentile, season, separate_files, ssp_data, time_end_year, time_start_year, units, variable, warming_level, wrf_bias_adjust, name)#

Bases: Parameterized

Climate Analysis and Vulnerability Assessment Parameters Class.

This class defines and validates parameters for climate vulnerability analysis, supporting various climate variables, metrics, and analysis approaches.

input_locations#

Input coordinates that must have ‘lat’ and ‘lon’ columns with numeric data.

Type:

DataFrame

time_start_year#

Start year for the analysis period. Must be between 1900 and 2100.

Type:

int, default 1981

time_end_year#

End year for the analysis period. Must be between 1900 and 2100.

Type:

int, default 2010

units#

Units for temperature measurement.

Type:

str, default "Celsius"

variable#

Climate variable to analyze. Options include “Air Temperature at 2m”, “Precipitation (total)”, “NOAA Heat Index”, “Effective Temperature”.

Type:

str, default "Air Temperature at 2m"

metric_calc#

Statistical metric calculation method. Options: “min”, “max”, “mean”, “median”.

Type:

str, default "max"

percentile#

Percentile value for calculation. Must be between 0 and 100 if specified.

Type:

float or None, default None

heat_idx_threshold#

Threshold value for heat index calculations.

Type:

float or None, default None

one_in_x#

Return period(s) for extreme event analysis (e.g., 1-in-100 year event). Can be a single value or list of values.

Type:

int, float, list or None, default None

season#

Season to analyze. Options: “summer”, “winter”, “all”.

Type:

str, default "all"

downscaling_method#

Climate model downscaling method. Options: “Dynamical”, “Statistical”.

Type:

str, default "Dynamical"

approach#

Analysis approach. Options: “Time” (time period), “Warming Level” (temperature target).

Type:

str, default "Time"

warming_level#

Global warming level in degrees Celsius. Must be between 0 and 7.

Type:

float, default 1.0

wrf_bias_adjust#

Whether to apply bias adjustment to WRF model data.

Type:

bool, default True

historical_data#

Type of historical data. Options: “Historical Climate”, “Historical Reconstruction”.

Type:

str, default "Historical Climate"

ssp_data#

Shared Socioeconomic Pathway scenarios to use. Options: “SSP 2-4.5”, “SSP 3-7.0”, “SSP 5-8.5”.

Type:

list, default ["SSP 3-7.0"]

export_method#

Data export method. Options: “raw”, “calculate”, “both”, “None”.

Type:

str, default "both"

separate_files#

Whether to save climate variables in separate files.

Type:

bool, default False

file_format#

Output file format.

Type:

str, default "NetCDF"

batch_mode#

Whether to run in batch processing mode.

Type:

bool, default False

distr#

Statistical distribution for extreme value analysis. Options: “gev” (Generalized Extreme Value), “genpareto” (Generalized Pareto), “gamma”.

Type:

str, default "gev"

event_duration#

Duration and unit for event analysis. Format: (duration, unit). Unit options: “hour”, “day”.

Type:

tuple, default (1, "day")

file_name#

Base name for output files, no extension (e.g., “output”, not “output.nc”).

Type:

str

validate_params()#

Validate all parameters for consistency and compatibility.

get_names()#

Generate standardized names and metadata for data processing.

Raises:

ValueError – If parameter validation fails. Common validation errors include:

  • Missing or non-numeric lat/lon columns in input_locations

  • Invalid time range (start year > end year)

  • Incompatible parameter combinations (e.g., multiple threshold parameters)

  • Unsupported variable-downscaling method combinations

  • Invalid approach-data type combinations

Notes

The class enforces several validation rules:

  • Only one threshold parameter (heat_idx_threshold, percentile, or one_in_x) can be specified

  • Historical Reconstruction data requires time-based approach and end year <= 2022

  • NOAA Heat Index and Effective Temperature cannot use Statistical downscaling

  • Dynamical downscaling with time-based approach only supports SSP 3-7.0

  • Event duration currently supports only “hour” or “day” units

Examples

>>> import pandas as pd
>>> locations = pd.DataFrame({'lat': [34.05, 36.17], 'lon': [-118.25, -115.14]})
>>> params = CavaParams(
...     input_locations=locations,
...     time_start_year=2020,
...     time_end_year=2050,
...     variable="Air Temperature at 2m",
...     percentile=95,
...     metric_calc="max"
... )
approach = 'Time'#
batch_mode = False#
distr = 'gev'#
downscaling_method = 'Dynamical'#
event_duration = (1, 'day')#
export_method = 'both'#
file_format = 'NetCDF'#
file_name = None#
get_names()#

Generate names and metadata for climate data processing.

Returns:

dict – A dictionary containing the following keys:

  • ssp_selected (str or list) – The SSP (Shared Socioeconomic Pathway) data selected for analysis.

  • variable (str) – The climate variable being analyzed, potentially adjusted based on downscaling method and metric calculation.

  • variable_type (str) – Type of variable, either “Variable” for raw climate variables or “Derived Index” for calculated indices.

  • var_name (str) – Human-readable description of the calculation being performed, including variable, metric, time period/warming level, and season.

  • raw_name (str) – Standardized name for raw data storage.

  • calc_name (str) – Standardized name for calculated data storage.

Notes

The function handles three main calculation types based on instance attributes:

  • Percentile calculations (when self.percentile is not None)

  • Heat index threshold calculations (when self.heat_idx_threshold is not None)

  • Return period calculations (when self.one_in_x is not None)

For LOCA2 statistical downscaling with air temperature, the variable name is adjusted based on whether maximum or minimum metric calculation is selected.

The approach can be either “Time” (using start/end years) or “Warming Level” (using a specific warming level in degrees Celsius).

Raises:
  • ValueError – If metric_calc is not “max” or “min” for LOCA2 air temperature data.

  • ValueError – If approach is not “Time” or “Warming Level”.

  • ValueError – If an unsupported variable is used for 1-in-X calculations.

heat_idx_threshold = None#
historical_data = 'Historical Climate'#
input_locations = None#
metric_calc = 'max'#
name = 'CavaParams'#
one_in_x = None#
percentile = None#
season = 'all'#
separate_files = False#
ssp_data = ['SSP 3-7.0']#
time_end_year = 2010#
time_start_year = 1981#
units = 'Celsius'#
validate_params()#

Validate the parameters for vulnerability analysis.

Parameters:

None

Returns:

None

Raises:

ValueError – If any validation check fails. The error message lists all validation failures found during the validation process.

Notes

The method validates the following conditions:

  • Input locations DataFrame contains required ‘lat’ and ‘lon’ columns with numeric data types (float64 or int64)

  • Time range validity (start year must be <= end year)

  • Historical reconstruction data constraints:
    • End year must be <= 2022

    • Only time-based approach is supported

  • Mutual exclusivity of threshold parameters (only one of heat_idx_threshold, percentile, or one_in_x can be specified)

  • At least one threshold parameter must be specified

  • Metric calculation (‘min’ or ‘max’) compatibility with percentile calculations

  • Variable and downscaling method compatibility (NOAA Heat Index and Effective Temperature cannot use Statistical downscaling)

  • SSP data constraints for WRF/Dynamical downscaling (only SSP 3-7.0 allowed for time-based approach)

Side Effects#

Converts self.one_in_x to a list if it’s not None and not already a list.
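
The checks and the side effect described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function names and the use of a dict in place of the class attributes are assumptions, and the metric/percentile compatibility check and the lat/lon dtype check are left out.

```python
def collect_validation_errors(params):
    """Return a list of human-readable validation failures.

    `params` is a plain dict standing in for the class attributes; this
    helper is hypothetical and only mirrors the documented rules.
    """
    errors = []

    if params["time_start_year"] > params["time_end_year"]:
        errors.append("time_start_year must be <= time_end_year")

    if params["historical_data"] == "Historical Reconstruction":
        if params["time_end_year"] > 2022:
            errors.append("Historical Reconstruction requires end year <= 2022")
        if params["approach"] != "Time":
            errors.append("Historical Reconstruction requires the Time approach")

    # Exactly one threshold parameter may be set (mutually exclusive, and
    # at least one is required).
    thresholds = ("heat_idx_threshold", "percentile", "one_in_x")
    n_set = sum(params.get(name) is not None for name in thresholds)
    if n_set != 1:
        errors.append("exactly one of " + ", ".join(thresholds) + " must be set")

    derived = ("NOAA Heat Index", "Effective Temperature")
    if params["variable"] in derived and params["downscaling_method"] == "Statistical":
        errors.append(params["variable"] + " cannot use Statistical downscaling")

    if (
        params["downscaling_method"] == "Dynamical"
        and params["approach"] == "Time"
        and any(ssp != "SSP 3-7.0" for ssp in params["ssp_data"])
    ):
        errors.append("Dynamical downscaling with Time approach only supports SSP 3-7.0")

    return errors


def validate(params):
    """Raise one ValueError listing every failure, then normalise one_in_x."""
    errors = collect_validation_errors(params)
    if errors:
        raise ValueError("; ".join(errors))
    # Side effect: wrap a scalar return period in a list.
    if params.get("one_in_x") is not None and not isinstance(params["one_in_x"], list):
        params["one_in_x"] = [params["one_in_x"]]
    return params


example = {
    "time_start_year": 1981,
    "time_end_year": 2010,
    "historical_data": "Historical Climate",
    "approach": "Time",
    "variable": "Air Temperature at 2m",
    "downscaling_method": "Dynamical",
    "ssp_data": ["SSP 3-7.0"],
    "heat_idx_threshold": None,
    "percentile": None,
    "one_in_x": 100,
}
validated = validate(dict(example))
print(validated["one_in_x"])
```

Collecting every failure before raising means a caller sees all problems in one message rather than fixing them one at a time.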

variable = 'Air Temperature at 2m'#
warming_level = 1.0#
wrf_bias_adjust = True#
climakitae.explore.vulnerability.cava_data(input_locations, variable, units=None, approach='Time', downscaling_method='Dynamical', time_start_year=1981, time_end_year=2010, historical_data='Historical Climate', ssp_data=<object object>, warming_level=1.5, metric_calc='max', heat_idx_threshold=None, one_in_x=None, event_duration=(1, 'day'), percentile=None, season='all', wrf_bias_adjust=True, export_method='both', separate_files=True, file_format='NetCDF', batch_mode=False, distr='gev', file_name=None)#

Retrieve, process, and export climate data based on inputs.

Designed for CAVA reports.

Parameters:
  • input_locations (pandas.DataFrame) – Input locations containing ‘lat’ and ‘lon’ columns.

  • variable (str) – Type of climate variable to retrieve and calculate.

  • units (str, optional) – Units for the retrieved data.

  • approach (str) – Approach to follow, default is “Time”.

  • downscaling_method (str) – Method of downscaling, default is “Dynamical”.

  • time_start_year (int, optional) – Starting year for data selection.

  • time_end_year (int, optional) – Ending year for data selection.

  • historical_data (str, optional) – Type of historical data, default is “Historical Climate”.

  • ssp_data (list of str, optional) – Shared Socioeconomic Pathway scenarios to use, default is [“SSP 3-7.0”].

  • warming_level (float, optional) – Global warming level in degrees Celsius, default is 1.5.

  • metric_calc (str, optional) – Metric calculation type (e.g., ‘mean’, ‘max’, ‘min’) for supported metrics. Default is “max”.

  • heat_idx_threshold (float) – Heat index threshold for counting events.

  • one_in_x (int, optional) – Return period for 1-in-X events.

  • event_duration (tuple[int, str], optional) – Duration of event and time unit (e.g., (1, “day”)).

  • percentile (int, optional) – Percentile for calculating “likely” event occurrence.

  • season (str, optional) – Season to subset time dimension on (e.g., ‘summer’, ‘winter’, ‘all’). Default is ‘all’.

  • wrf_bias_adjust (bool, optional) – Flag to subset the WRF data for the bias-adjusted models. Default is True.

  • export_method (str, optional) – Export method, options are ‘raw’, ‘calculate’, ‘both’, default is ‘both’.

  • separate_files (bool, optional) – Whether to separate climate variable information into separate files, default is True.

  • file_format (str, optional) – Export file format options.

  • batch_mode (bool) – Whether to process the points in batch mode instead of iterating through them one at a time.

  • distr (str) – Name of the distribution to use for extreme value analysis (“gev”, “genpareto”, or “gamma”). Default is “gev”.

  • file_name (str, optional) – Base name for output files, no extension (e.g., “output”, not “output.nc”).

Returns:

xarray.DataArray – Computed climate metrics for input locations.

Raises:

ValueError – If input coordinates lack ‘lat’ and ‘lon’ columns or if ‘lat’/’lon’ columns are not of type float64 or int64.

climakitae.explore.vulnerability_table module#

climakitae.explore.vulnerability_table.create_vul_table(example_loc, percentile, heat_idx_threshold, one_in_x)#

Creates a vulnerability assessment table and exports the table to CSV.

climakitae.explore.warming module#

Helper functions for performing analyses related to global warming levels, along with backend code for building the warming levels GUI.

class climakitae.explore.warming.WarmingLevelChoose(*args, **params)#

Bases: DataParameters

Class for selecting data at specific warming levels in climate datasets. This class extends DataParameters to provide functionality for choosing and analyzing data around specific global warming levels (GWLs). It allows users to specify a time window around the warming level and whether to return anomalies relative to a historical reference period.

window#

Number of years on either side of the global warming level (GWL). The default is 15 years (i.e., a 30-year window centered on the GWL).

Type:

param.Integer

anom#

Whether to return data as anomalies (difference from historical reference period). Options are “Yes” or “No”.

Type:

param.Selector

warming_levels#

Available warming levels for selection.

Type:

list

months#

Available months (1-12) for selection.

Type:

numpy.ndarray

load_data#

Whether to load data as it’s being computed. Setting to False allows for batch processing or working with smaller chunks of data.

Type:

bool

_anom_allowed()#

Controls whether the anomaly option is required based on the downscaling method.

anom = 'Yes'#
name = 'WarmingLevelChoose'#
window = 15#
class climakitae.explore.warming.WarmingLevels#

Bases: object

A container for all of the warming levels-related functionality:

  • A pared-down Select panel, under “choose_data”

  • A “calculate” step where most of the waiting occurs

  • An optional “visualize” panel, as an instance of WarmingLevelVisualize

  • Postage stamps from the visualize “main” tab, accessible via “gwl_snapshots”

  • Data sliced around the GWL window, retrieved from “sliced_data”

calculate()#

Calculate the warming levels for the selected parameters.

This function retrieves the data from the catalog, slices it according to the warming levels, and stores the results in the sliced_data and gwl_snapshots attributes.

catalog_data = <xarray.DataArray ()> Size: 8B array(nan)#
find_warming_slice(level: str, gwl_times: DataFrame) DataArray#

Find the warming slice data for the current level from the catalog data.

Parameters:
  • level (str) – The warming level to find the slice for.

  • gwl_times (DataFrame) – The DataFrame containing the warming level times.

Returns:

DataArray – The warming slice data for the specified level.

gwl_snapshots = <xarray.DataArray ()> Size: 8B array(nan)#
sliced_data = <xarray.DataArray ()> Size: 8B array(nan)#
climakitae.explore.warming.clean_list(data: Dataset, gwl_times: DataFrame) Dataset#

Filters an xarray dataset to retain only simulations with valid warming level data.

This function removes simulations from the dataset that do not have corresponding entries in the provided lookup table (gwl_times). It ensures that only valid simulations are included for further analysis.

Parameters:
  • data (Dataset) – An xarray dataset containing a dimension all_sims, which represents simulation metadata.

  • gwl_times (DataFrame) – A pandas DataFrame acting as a lookup table. Its index should contain valid simulation metadata (e.g., simulation string, ensemble, and scenario).

Returns:

Dataset – A filtered xarray dataset containing only simulations with valid warming level data.

Raises:
  • AttributeError – If data does not have a dimension named all_sims.

  • KeyError – If process_item fails to find a simulation in the gwl_times index.
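
The filtering idea can be shown without xarray or pandas. In this minimal sketch, `keep_valid_sims` is a hypothetical stand-in: a list of tuples plays the role of the all_sims coordinate, and a second list plays the role of the gwl_times index.

```python
def keep_valid_sims(sim_keys, lookup_index):
    """Keep only simulation keys that appear in the lookup table index.

    Hypothetical helper for illustration; not the climakitae implementation.
    """
    valid = set(lookup_index)
    # Preserve the original ordering of the simulations.
    return [key for key in sim_keys if key in valid]


sims = [
    ("sim1", "ensemble1", "ssp370"),
    ("sim2", "ensemble1", "ssp370"),
    ("sim3", "ensemble2", "ssp585"),
]
gwl_index = [("sim1", "ensemble1", "ssp370"), ("sim3", "ensemble2", "ssp585")]
kept = keep_valid_sims(sims, gwl_index)
print(kept)
```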

climakitae.explore.warming.clean_warm_data(warm_data: DataArray) DataArray#

Cleans warming level data by removing invalid simulations and timestamps.

This function performs the following cleaning steps:

  1. Removes simulations where the warming level is not crossed (i.e., centered_year is null).

  2. (Optional) Removes timestamps at the end to account for leap years.

  3. (Optional) Removes simulations that exceed the year 2100.

Parameters:

warm_data (DataArray) – An xarray DataArray containing warming level data. It is expected to have a centered_year attribute and dimensions like all_sims and time.

Returns:

DataArray – The cleaned xarray DataArray with invalid simulations and timestamps removed.

Raises:

AttributeError – If warm_data does not have the required attributes or dimensions.

climakitae.explore.warming.get_sliced_data(y: DataArray, level: str, years: DataFrame, months: Iterable = array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]), window: int = 15, anom: str = 'No') DataArray#

Calculate warming level anomalies.

Parameters:
  • y (DataArray) – Data to compute warming level anomalies for, one simulation at a time via groupby.

  • level (str) – Warming level amount

  • years (DataFrame) – Lookup table for the date a given simulation reaches each warming level.

  • months (np.ndarray) – Months to include in a warming level slice.

  • window (int, optional) – Number of years on each side of the central warming level date. Defaults to 15. For example, a 15-year window spans the 15 years before the date the warming level is reached and the 15 years after it: if a warming level is reached in 2030, the window is (2015, 2045).

  • anom (str) – Whether to return anomalies, “Yes” or “No”. Default is “No”.

Returns:

DataArray
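
The window arithmetic described for the window parameter can be sketched as follows. Both helpers are hypothetical and not part of climakitae. Treating both end years as inclusive is an assumption made for this illustration.

```python
def warming_window(center_year, window=15):
    """Return (start_year, end_year) for a window centred on center_year."""
    return (center_year - window, center_year + window)


def in_window(year, month, center_year, window=15, months=tuple(range(1, 13))):
    """Check whether a (year, month) pair falls inside the slice.

    Assumption: both end years are treated as inclusive here.
    """
    start, end = warming_window(center_year, window)
    return start <= year <= end and month in months


print(warming_window(2030))  # (2015, 2045)
print(in_window(2044, 7, 2030, months=(6, 7, 8)))  # True
```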

climakitae.explore.warming.process_item(y: DataArray) tuple[str, str, str]#

Extracts and processes simulation metadata from an xarray DataArray.

This function retrieves identifiers for a simulation, including the simulation string, ensemble, and scenario, and returns them as a tuple. The scenario string is processed using the scenario_to_experiment_id function to standardize its format.

Parameters:

y (DataArray) – An xarray DataArray containing metadata about a simulation. It is expected to have simulation and scenario attributes that can be accessed using .item().

Returns:

tuple[str, str, str] – A tuple containing:

  • sim_str (str) – The second part of the simulation string.

  • ensemble (str) – The third part of the simulation string.

  • scenario (str) – The processed scenario identifier.

Raises:
  • AttributeError – If y does not have simulation or scenario attributes.

  • ValueError – If the simulation string cannot be split into three parts.

Examples

>>> import xarray as xr
>>> y = xr.DataArray(attrs={
...     "simulation": "Dynamical_sim1_ensemble1",
...     "scenario": "Historical + ssp585"
... })
>>> process_item(y)
('sim1', 'ensemble1', 'ssp585')
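
The string handling implied by the example above can be sketched in plain Python. `split_simulation` is a hypothetical helper, not the library function. Taking the text after the last “+” in the scenario is an assumption made to match the example, and it does not reproduce scenario_to_experiment_id.

```python
def split_simulation(simulation, scenario):
    """Split identifiers the way the documented example suggests.

    Hypothetical re-creation for illustration only.
    """
    parts = simulation.split("_")
    if len(parts) != 3:
        raise ValueError(
            "expected three '_'-separated parts, got %r" % simulation
        )
    # The first part (the downscaling method) is discarded.
    _, sim_str, ensemble = parts
    # Assumption: the experiment id is the text after the last "+".
    experiment = scenario.split("+")[-1].strip()
    return sim_str, ensemble, experiment


result = split_simulation("Dynamical_sim1_ensemble1", "Historical + ssp585")
print(result)  # ('sim1', 'ensemble1', 'ssp585')
```
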
climakitae.explore.warming.relabel_axis(all_sims_dim: Iterable) List[str]#

Converts an iterable of tuples into a list of strings by concatenating the first two elements of each tuple with an underscore (_).

This function is designed to simplify dimension names, particularly for compatibility with plotting libraries like hvplot, which may not handle tuple-based dimension names well.

Parameters:

all_sims_dim (Iterable) – An iterable containing elements that can be converted into tuples. Each element is expected to have a .values.item() method to extract the tuple.

Returns:

List[str] – A list of strings where each string is formed by concatenating the first two elements of the tuples in all_sims_dim with an underscore (_).

Raises:
  • AttributeError – If an element in all_sims_dim does not have a .values.item() method.

  • IndexError – If a tuple in all_sims_dim does not have at least two elements.

Examples

>>> import xarray as xr
>>> # The input `all_sims_dim` is typically an xarray.DataArray
>>> # representing a coordinate, often created by stacking dimensions.
>>> # For example, if `ds` is a Dataset with 'simulation' and 'scenario'
>>> # coordinates, then `ds.stack(all_sims=('simulation', 'scenario'))['all_sims']`
>>> # would be such an input.
>>>
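>>> # Create an example of such an xarray.DataArray. A 1-D object array
>>> # keeps each tuple as a single element instead of expanding the
>>> # pairs into a second dimension.
>>> import numpy as np
>>> simulation_scenario_pairs = np.empty(2, dtype=object)
>>> simulation_scenario_pairs[0] = ('ModelA_run1', 'SSP1-2.6')
>>> simulation_scenario_pairs[1] = ('ModelB_run2', 'SSP5-8.5')
>>> # This DataArray holds the coordinate values (the tuples).
>>> all_sims_coordinate = xr.DataArray(
...     data=simulation_scenario_pairs,
...     dims=['all_sims'],
...     name='all_sims_stacked_coordinate'
... )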
>>> # The function iterates over `all_sims_coordinate`. Each element `one`
>>> # (as in the function's loop) is a 0-D xarray.DataArray containing one tuple.
>>> # For the first pair, `one.values.item()` would yield ('ModelA_run1', 'SSP1-2.6').
>>> relabel_axis(all_sims_coordinate)
['ModelA_run1_SSP1-2.6', 'ModelB_run2_SSP5-8.5']

Module contents#

climakitae.explore.warming_levels()#

Top-level alias for the WarmingLevels class. This is the typical way to instantiate the class.