climakitae.core package#

Submodules#

climakitae.core.boundaries module#

class climakitae.core.boundaries.Boundaries(boundary_catalog)#

Bases: object

Get geospatial polygon data from the S3-stored parquet catalog. Used to access boundaries for subsetting data by state, county, etc.

_cat#

Parquet boundary catalog instance

Type:

intake.catalog.Catalog

_us_states#

Table of US state names and geometries

Type:

DataFrame

_ca_counties#

Table of California county names and geometries, sorted alphabetically by county name

Type:

DataFrame

_ca_watersheds#

Table of California watershed names and geometries, sorted alphabetically by watershed name

Type:

DataFrame

_ca_utilities#

Table of California IOUs and POUs, names and geometries

Type:

DataFrame

_ca_forecast_zones#

Table of California Demand Forecast Zones

Type:

DataFrame

_ca_electric_balancing_areas#

Table of Electric Balancing Areas

Type:

DataFrame

_get_us_states(self)#

Returns a dict of state abbreviations and indices

_get_ca_counties(self)#

Returns a dict of California counties and their indices

_get_ca_watersheds(self)#

Returns a dict for CA watersheds and their indices

_get_forecast_zones(self)#

Returns a dict for CA electricity demand forecast zones

_get_ious_pous(self)#

Returns a dict for CA electric load serving entities IOUs & POUs

_get_electric_balancing_areas(self)#

Returns a dict for CA Electric Balancing Authority Areas

boundary_dict()#

Return a dict of the other boundary dicts, used to populate ck.Select.

This returns a dictionary of lookup dictionaries for each set of geoparquet files that the user might be choosing from. It is used to populate the DataParameters cached_area dynamically as the category in the area_subset parameter changes.

Returns:

dict
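
To illustrate the shape of this return value, the following minimal sketch builds a dictionary of lookup dictionaries from plain lists of names. The category keys reuse two of the area_subset names documented for get_subsetting_options; the helper build_lookup, the feature names and their indices are hypothetical stand-ins, not the contents of the real parquet catalog or the library's implementation.

```python
def build_lookup(names):
    """Map each feature name to its row index after sorting alphabetically."""
    return {name: idx for idx, name in enumerate(sorted(names))}


# Hypothetical sample data; the real tables are read from parquet files.
boundary_lookups = {
    "CA counties": build_lookup(["Alameda County", "Yolo County", "Kern County"]),
    "CA watersheds": build_lookup(["Watershed B", "Watershed A"]),
}

# Changing the category changes which lookup supplies the cached_area choices.
category = "CA counties"
cached_area_options = list(boundary_lookups[category])
print(cached_area_options)
```

Keeping one lookup per category means a selection widget only has to swap dictionaries when the area_subset category changes.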

load()#

Read parquet files and set class attributes.

climakitae.core.data_export module#

climakitae.core.data_export.export(data: DataArray | Dataset, filename: str = 'dataexport', format: str = 'NetCDF', mode: str = 'local')#

Save xarray data as NetCDF, Zarr, or CSV in the current working directory. Zarr output can optionally be streamed to an AWS S3 scratch bucket instead, in which case a download URL is given. NetCDF can only be written to the HUB user partition, and only if the file fits. Zarr can be written either to the HUB user partition or to the S3 scratch bucket, depending on the mode option.

Parameters:
  • data (xr.DataArray | xr.Dataset) – Data to export, as output by e.g. DataParameters.retrieve().

  • filename (str, optional) – Output file name (without file extension, i.e. “my_filename” instead of “my_filename.nc”). The default is “dataexport”.

  • format (str, optional) – File format (“Zarr”, “NetCDF”, “CSV”). The default is “NetCDF”.

  • mode (str, optional) – Save location logic for Zarr file (“local”, “s3”). The default is “local”.

Returns:

None
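
As a rough illustration of how the options above combine, the sketch below checks a request against the documented choices before passing it on. build_export_kwargs and export_checked are hypothetical helpers written for this example; they only mirror the rules stated above and do not reproduce any validation done inside climakitae.

```python
VALID_FORMATS = ("Zarr", "NetCDF", "CSV")
VALID_MODES = ("local", "s3")


def build_export_kwargs(filename="dataexport", format="NetCDF", mode="local"):
    """Check an export request against the documented options."""
    if format not in VALID_FORMATS:
        raise ValueError(f"format must be one of {VALID_FORMATS}")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    # The S3 scratch bucket is described for Zarr output only.
    if mode == "s3" and format != "Zarr":
        raise ValueError("mode='s3' applies to Zarr exports only")
    return {"filename": filename, "format": format, "mode": mode}


def export_checked(data, **kwargs):
    """Validate the request, then hand it to climakitae's export()."""
    # Imported lazily so the helper above works without climakitae installed.
    from climakitae.core.data_export import export

    export(data, **build_export_kwargs(**kwargs))
```

For example, export_checked(data, filename="my_filename", format="Zarr", mode="s3") would request a streamed Zarr export, assuming climakitae is installed and the S3 scratch bucket is reachable.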

climakitae.core.data_export.remove_zarr(filename: str)#

Helper function that removes a Zarr directory structure. Because the Zarr format is a directory tree, it is not easily removed using the JupyterHUB GUI. This function deletes the entire directory tree.

Parameters:

filename (str) – Output Zarr file name (without file extension, i.e. “my_filename” instead of “my_filename.zarr”).
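
The idea of deleting a whole directory tree can be sketched with the standard library alone. remove_tree below is a hypothetical stand-in that uses shutil.rmtree; it is not climakitae's implementation, and the base_dir argument is an assumption added so the demonstration can run in a throwaway directory.

```python
import shutil
import tempfile
from pathlib import Path


def remove_tree(filename, base_dir="."):
    """Delete <base_dir>/<filename>.zarr and everything beneath it."""
    target = Path(base_dir) / f"{filename}.zarr"
    if not target.is_dir():
        raise FileNotFoundError(f"{target} is not a directory")
    shutil.rmtree(target)
    return target


# Demonstration in a temporary directory so nothing real is deleted.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "my_filename.zarr" / "group"
    store.mkdir(parents=True)
    (store / "chunk.0").write_bytes(b"\x00")
    removed = remove_tree("my_filename", base_dir=tmp)
    still_there = removed.exists()
```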

climakitae.core.data_export.write_tmy_file(filename_to_export: str, df: DataFrame, years: Tuple[int, int], location_name: str, station_code: int, stn_lat: float, stn_lon: float, stn_state: str, stn_elev: float = 0.0, file_ext: str = 'tmy')#

Export TMY data as either a .epw or a .tmy file.

Parameters:
  • filename_to_export (str) – Filename string, constructed with station name and simulation

  • df (DataFrame) – Dataframe of TMY data to export

  • years (Tuple[int, int]) – Tuple containing climatology start and end years

  • location_name (str) – Location name string, often station name

  • station_code (int) – Station code

  • stn_lat (float) – Station latitude

  • stn_lon (float) – Station longitude

  • stn_state (str) – State of station location

  • stn_elev (float, optional) – Elevation of station, default is 0.0

  • file_ext (str, optional) – File extension for export, either “tmy” or “epw”. The default is “tmy”.

Returns:

None

climakitae.core.data_interface module#

This module provides the core data interface to access climate data. It contains several key components:

1. VariableDescriptions: A singleton class to load and provide access to available climate variables.

2. DataInterface: A singleton class that manages connections to the data catalog, boundary data, and stations.

3. DataParameters: A parameterized class that handles data selection, filtering, and retrieval.

The module also includes several utility functions to:
  • Get available data options and subsetting options

  • Handle spatial subsetting by different boundaries (states, counties, watersheds, etc.)

  • Retrieve data with simplified parameter specification

  • Validate user inputs and provide helpful error messages

  • Convert between different naming conventions in the catalog

This interface serves as the foundation for both programmatic access to climate data and the interactive GUI selection interface.

class climakitae.core.data_interface.DataParameters(*, _data_warning, _station_data_info, all_touched, approach, area_average, area_subset, cached_area, data_type, downscaling_method, enable_hidden_vars, extended_description, latitude, longitude, resolution, scenario_historical, scenario_ssp, simulation, stations, time_slice, timescale, units, variable, variable_id, variable_type, warming_level, warming_level_months, warming_level_window, name)#

Bases: Parameterized

Python param object to hold data parameters for use in panel GUI.

Call DataParameters when you want to select and retrieve data from the climakitae data catalog without using the ckg.Select GUI. ckg.Select uses this class to store selections and retrieve data.

DataParameters calls DataInterface, a singleton class that makes the connection to the intake-esm data store in S3 bucket.

Attributes:
  • unit_options_dict (dict) – options dictionary for converting unit to other units

  • area_subset (str) – dataset to use from Boundaries for sub area selection

  • cached_area (list of strs) – one or more features from area_subset datasets to use for selection

  • latitude (tuple) – latitude range of selection box

  • longitude (tuple) – longitude range of selection box

  • variable_type (str) – toggle raw or derived variable selection

  • default_variable (str) – initial variable to have selected in widget

  • time_slice (tuple) – year range to select

  • resolution (str) – resolution of data to select (“3 km”, “9 km”, “45 km”)

  • timescale (str) – frequency of dataset (“hourly”, “daily”, “monthly”)

  • scenario_historical (list of strs) – historical scenario selections

  • area_average (str) – whether to compute area average (“Yes”, “No”)

  • downscaling_method (str) – whether to choose WRF or LOCA2 data or both (“Dynamical”, “Statistical”, “Dynamical+Statistical”)

  • data_type (str) – whether to choose gridded or station based data (“Gridded”, “Stations”)

  • stations (list of strs) – list of stations that can be filtered by cached_area

  • _station_data_info (str) – informational statement when station data selected with data_type

  • scenario_ssp (list of strs) – list of future climate scenarios selected (availability depends on other params)

  • simulation (list of strs) – list of simulations (models) selected (availability depends on other params)

  • variable (str) – variable long display name

  • units (str) – current unit abbreviation of the data (native or converted)

  • enable_hidden_vars (boolean) – enable selection of variables that are hidden from the GUI?

  • extended_description (str) – extended description of the data variable

  • variable_id (list of strs) – list of variable ids that match the variable (WRF and LOCA2 can have different codes for same type of variable)

  • historical_climate_range_wrf (tuple) – time range of historical WRF data

  • historical_climate_range_loca (tuple) – time range of historical LOCA2 data

  • historical_climate_range_wrf_and_loca (tuple) – time range of historical WRF and LOCA2 data combined

  • historical_reconstruction_range (tuple) – time range of historical reanalysis data

  • ssp_range (tuple) – time range of future scenario SSP data

  • _info_about_station_data (str) – warning message about station data

  • _data_warning (str) – warning about selecting unavailable data combination

  • data_interface (DataInterface) – data connection singleton class that provides data

  • _data_catalog (intake_esm.source.ESMDataSource) – shorthand alias to DataInterface.data_catalog

  • _variable_descriptions (pd.DataFrame) – shorthand alias to DataInterface.variable_descriptions

  • _stations_gdf (gpd.GeoDataFrame) – shorthand alias to DataInterface.stations_gdf

  • _geographies (Boundaries) – shorthand alias to DataInterface.geographies

  • _geography_choose (dict) – shorthand alias to Boundaries.boundary_dict()

  • _warming_level_times (pd.DataFrame) – shorthand alias to DataInterface.warming_level_times

  • colormap (str) – default colormap to render the currently selected data

  • scenario_options (list of strs) – list of available scenarios (historical and ssp) for selection

  • variable_options_df (pd.DataFrame) – filtered variable descriptions for the downscaling_method and timescale

  • warming_level (array) – global warming level(s)

  • warming_level_window (integer) – years around Global Warming Level (+/-) (e.g. 15 means a 30yr window)

  • approach (str, “Warming Level” or “Time”) – how do you want the data to be retrieved?

  • warming_level_months (array) – months of year to use for computing warming levels; defaults to the entire calendar year: 1,2,3,4,5,6,7,8,9,10,11,12

  • all_touched (boolean) – spatial subset option for within or touching selection

retrieve(config: str = None, merge: bool = True) DataArray | Dataset | List[DataArray]#

Retrieve data from catalog

By default, the current DataParameters selections determine which data are retrieved. The data are read from the AWS S3 bucket and returned as a lazily loaded dask array. This is a user-facing wrapper for read_catalog_from_select.

Returns:

data_return (xr.DataArray | xr.Dataset | List[xr.DataArray]) – DataArray or Dataset object
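
A minimal sketch of the non-GUI workflow is given below: set the desired parameters on a selections object, then call retrieve(). apply_selections and retrieve_with_choices are hypothetical helpers written for this example; the parameter names and values come from the documented options, and a SimpleNamespace stands in for DataParameters so the sketch runs without climakitae or network access.

```python
from types import SimpleNamespace


def apply_selections(selections, **choices):
    """Set each chosen parameter on a selections object and return it."""
    for name, value in choices.items():
        setattr(selections, name, value)
    return selections


# Values taken from the documented options for each parameter.
choices = {
    "downscaling_method": "Dynamical",
    "resolution": "9 km",
    "timescale": "monthly",
    "area_average": "No",
}

# Stand-in object so the sketch can be exercised offline.
demo = apply_selections(SimpleNamespace(), **choices)


def retrieve_with_choices(**choices):
    """Same steps against the real class; requires climakitae and S3 access."""
    from climakitae.core.data_interface import DataParameters

    selections = apply_selections(DataParameters(), **choices)
    return selections.retrieve()
```

Because the availability of some options depends on other parameters, the order in which values are assigned may matter with the real class; the sketch does not attempt to handle that.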

class climakitae.core.data_interface.DataInterface#

Bases: object

Load data connections into memory once

This is a singleton class called by the various Param classes to connect to the local data and to the intake data catalog and parquet boundary catalog. The class attributes are read only so that the data does not get changed accidentally.
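
The general pattern described here (load once, expose read-only attributes) can be sketched as follows. CatalogConnections is a hypothetical class for illustration only; it is one common way to write a singleton in Python and is not taken from the climakitae source.

```python
class CatalogConnections:
    """Hypothetical singleton holding a resource that is loaded once."""

    _instance = None
    load_count = 0

    def __new__(cls):
        # Create the single instance on first use, then keep returning it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._variable_descriptions = cls._load()
            cls.load_count += 1
        return cls._instance

    @staticmethod
    def _load():
        # Placeholder for an expensive read (CSV, catalog, parquet, ...).
        return ("variable A", "variable B")

    @property
    def variable_descriptions(self):
        # No setter is defined, so assignment raises AttributeError.
        return self._variable_descriptions


first = CatalogConnections()
second = CatalogConnections()

try:
    first.variable_descriptions = ()
    assignment_blocked = False
except AttributeError:
    assignment_blocked = True
```

Both names refer to the same object, the placeholder load runs once, and the property cannot be reassigned.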

variable_descriptions#

variable descriptions pandas data frame

Type:

DataFrame

stations#

station locations pandas data frame

Type:

pd.DataFrame

stations_gdf#

station locations geopandas data frame

Type:

gpd.GeoDataFrame

data_catalog#

intake ESM data catalog

Type:

intake_esm.source.ESMDataSource

boundary_catalog#

parquet boundary catalog

Type:

intake.catalog.Catalog

geographies#

boundary dictionaries class

Type:

Boundaries

warming_level_times#

table of when each simulation/scenario reaches each warming level

Type:

DataFrame

property boundary_catalog#

Get the boundary catalog

property data_catalog#

Get the data catalog

property geographies#

Get the geographies object

property stations#

Get the stations dataframe

property stations_gdf#

Get the stations geopandas dataframe

property variable_descriptions#

Get the variable descriptions dataframe

property warming_level_times#

Get the warming level times dataframe

class climakitae.core.data_interface.VariableDescriptions#

Bases: object

Load Variable Descriptions CSV only once

This is a singleton class that needs to be called separately from DataInterface because variable descriptions are used without DataInterface in ck.view. Also ck.view is loaded on package load so this avoids loading boundary data when not needed.

variable_descriptions#

pandas dataframe that stores available data variables usable with the package

Type:

DataFrame

load()#

Read the variable descriptions csv into class variable.

climakitae.core.data_interface.get_data(variable: str, resolution: str, timescale: str, downscaling_method: str = 'Dynamical', data_type: str = 'Gridded', approach: str = 'Time', scenario: str | list[str] = None, units: str = None, warming_level: list[float] = None, area_subset: str = 'none', latitude: tuple[float, float] = None, longitude: tuple[float, float] = None, cached_area: list[str] = None, area_average: str = None, time_slice: tuple = None, stations: list[str] = None, warming_level_window: int = None, warming_level_months: list[int] = None, all_touched=False, enable_hidden_vars: bool = False, **kwargs) DataArray#

Retrieve formatted data from the Analytics Engine data catalog using a simple function.

Contrasts with DataParameters().retrieve(), which retrieves data from the user inputs in climakitaegui’s selections GUI.

Parameters:
  • variable (str) – String name of climate variable

  • resolution (str, one of [“3 km”, “9 km”, “45 km”]) – Resolution of data in kilometers

  • timescale (str, one of [“hourly”, “daily”, “monthly”]) – Temporal frequency of dataset

  • downscaling_method (str, one of [“Dynamical”, “Statistical”, “Dynamical+Statistical”], optional) – Downscaling method of the data: WRF (“Dynamical”), LOCA2 (“Statistical”), or both (“Dynamical+Statistical”). Default to “Dynamical”.

  • data_type (str, one of [“Gridded”, “Stations”], optional) – Whether to choose gridded data or weather station data. Default to “Gridded”.

  • approach (one of [“Time”, “Warming Level”], optional) – Default to “Time”.

  • scenario (str or list of str, optional) – SSP scenario [“SSP 3-7.0”, “SSP 2-4.5”, “SSP 5-8.5”] and/or historical data selection [“Historical Climate”, “Historical Reconstruction”]. If approach = “Time”, you need to set a valid option. If approach = “Warming Level”, scenario is ignored.

  • units (str, optional) – Variable units. Defaults to native units of data.

  • area_subset (str, optional) – Area category, e.g. “CA counties”. Defaults to entire domain (“none”).

  • cached_area (list, optional) – Area, e.g. “Alameda county”. Defaults to entire domain ([“entire domain”]).

  • area_average (one of [“Yes”, “No”], optional) – Take an average over spatial domain? Default to “No”.

  • latitude (None or tuple of float, optional) – Tuple of valid latitude bounds. Default to entire domain.

  • longitude (None or tuple of float, optional) – Tuple of valid longitude bounds. Default to entire domain.

  • time_slice (tuple, optional) – Time range for retrieved data. Only valid for approach = “Time”.

  • stations (list of str, optional) – Which weather stations to retrieve data for. Only valid for data_type = “Stations”. Default to all stations.

  • warming_level (list of float, optional) – Must be one of the warming levels available in climakitae.core.constants. Only valid for approach = “Warming Level” and data_type = “Stations”.

  • warming_level_window (int in range (5,25), optional) – Years around Global Warming Level (+/-) (e.g. 15 means a 30yr window). Only valid for approach = “Warming Level” and data_type = “Stations”.

  • warming_level_months (list of int, optional) – Months of year for which to perform warming level computation. Default to all months in a year: [1,2,3,4,5,6,7,8,9,10,11,12]. For example, you may want to set warming_level_months=[12,1,2] to perform the analysis for the winter season. Only valid for approach = “Warming Level” and data_type = “Stations”.

  • all_touched (boolean) – spatial subset option for within or touching selection

  • enable_hidden_vars (boolean, optional) – Return all variables, including the ones in which “show” is set to False? Default to False.

  • kwargs (dict) – Additional keyword arguments to pass to DataParameters()

Returns:

data (xr.DataArray)

Errors aren’t raised by the function. Instead, an informative message is printed and the function returns None. This is because the AE Jupyter Hub raises a strange Pieces Mismatch Error for some bad inputs; that error is ignored and a more informative message is printed in its place.
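
Because a failed request returns None rather than raising, calling code may want to check the result explicitly. The sketch below shows one way to do that. fetch_or_raise and fetch_from_catalog are hypothetical wrappers written for this example, and the variable name in the request is an assumed placeholder; the other argument values are drawn from the documented options.

```python
def fetch_or_raise(get_data_fn, **kwargs):
    """Call a get_data-style function and fail loudly if it returns None."""
    result = get_data_fn(**kwargs)
    if result is None:
        raise RuntimeError(f"no data returned for arguments: {sorted(kwargs)}")
    return result


# The variable name is a hypothetical placeholder; use get_data_options()
# to look up the names that are actually available.
request = {
    "variable": "Air Temperature at 2m",
    "resolution": "9 km",
    "timescale": "monthly",
    "scenario": ["Historical Climate"],
    "area_subset": "CA counties",
    "cached_area": ["Alameda county"],
    "area_average": "Yes",
}


def fetch_from_catalog(**kwargs):
    """Real call; requires climakitae and access to the data catalog."""
    from climakitae.core.data_interface import get_data

    return fetch_or_raise(get_data, **kwargs)


# Stand-in functions so the sketch can be exercised offline.
ok = fetch_or_raise(lambda **kw: "data", **request)
try:
    fetch_or_raise(lambda **kw: None, **request)
    failed = False
except RuntimeError:
    failed = True
```

With climakitae installed, fetch_from_catalog(**request) would perform the real retrieval and raise if nothing came back.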

climakitae.core.data_interface.get_data_options(variable: str = None, downscaling_method: str = None, resolution: str = None, timescale: str = None, scenario: str | list[str] = None, tidy: bool = True, enable_hidden_vars: bool = False) DataFrame#

Get data options, in the same format as the Select GUI, given a set of possible inputs. Allows the user to access the data using the same language as the GUI, bypassing the sometimes unintuitive naming in the catalog. If no function inputs are provided, the function returns the entire AE catalog that is available via the Select GUI.

Parameters:
  • variable (str, optional) – Default to None

  • downscaling_method (str, optional) – Default to None

  • resolution (str, optional) – Default to None

  • timescale (str, optional) – Default to None

  • scenario (str or list, optional) – Default to None

  • tidy (boolean, optional) – Format the pandas dataframe? This creates a DataFrame with a MultiIndex that makes it easier to parse the options. Default to True

  • enable_hidden_vars (boolean, optional) – Return all variables, including the ones in which “show” is set to False? Default to False

Returns:

cat_subset (DataFrame) – Catalog options for user-provided inputs

climakitae.core.data_interface.get_subsetting_options(area_subset: str = 'all') DataFrame#

Get all geometry options for spatial subsetting. Options match those in the selections GUI.

Parameters:

area_subset (str) – One of “all”, “states”, “CA counties”, “CA Electricity Demand Forecast Zones”, “CA watersheds”, “CA Electric Balancing Authority Areas”, “CA Electric Load Serving Entities (IOU & POU)”, “Stations”. Defaults to “all”, which shows all the geometry options with area_subset as a multiindex.

Returns:

geom_df (DataFrame) – Geometry options. If area_subset is anything other than “all”, only the options for that area_subset are returned; e.g. if area_subset = “states”, only the options for states are returned.
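
Since area_subset accepts only a fixed set of strings, a small check before the call can catch typos early. The sketch below is a hypothetical convenience wrapper, not part of climakitae; the list of options is copied from the parameter description above.

```python
AREA_SUBSET_OPTIONS = (
    "all",
    "states",
    "CA counties",
    "CA Electricity Demand Forecast Zones",
    "CA watersheds",
    "CA Electric Balancing Authority Areas",
    "CA Electric Load Serving Entities (IOU & POU)",
    "Stations",
)


def check_area_subset(area_subset="all"):
    """Return area_subset unchanged if it is one of the documented options."""
    if area_subset not in AREA_SUBSET_OPTIONS:
        raise ValueError(
            f"{area_subset!r} is not a documented option; "
            f"choose one of {AREA_SUBSET_OPTIONS}"
        )
    return area_subset


def list_geometries(area_subset="all"):
    """Real lookup; requires climakitae and access to the boundary catalog."""
    from climakitae.core.data_interface import get_subsetting_options

    return get_subsetting_options(area_subset=check_area_subset(area_subset))
```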

climakitae.core.data_load module#

climakitae.core.data_load.area_subset_geometry(selections: DataParameters) list[Polygon] | None#

Get geometry to perform area subsetting with.

Parameters:

selections (DataParameters) – object holding user’s selections

Returns:

ds_region (shapely.geometry) – geometry to use for subsetting

climakitae.core.data_load.load(xr_da: DataArray, progress_bar: bool = False) DataArray#

Read lazily loaded dask array into memory for faster access

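Parameters:
  • xr_da (DataArray) – Lazily loaded data to read into memory

  • progress_bar (bool, optional) – Whether to show a progress bar while loading. The default is False.

Returns: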

da_computed (DataArray)

climakitae.core.data_load.read_catalog_from_select(selections: DataParameters) DataArray#

The primary data loading method, called first by DataParameters.retrieve. It returns a DataArray (which can be quite large) containing everything requested by the user, as stored in ‘selections’.

Parameters:

selections (DataParameters) – object holding user’s selections

Returns:

da (DataArray) – output data

climakitae.core.paths module#

This module defines package level paths

Module contents#