climakitae.core package

climakitae.core package#

Submodules#

climakitae.core.boundaries module#

class climakitae.core.boundaries.Boundaries(boundary_catalog)#

Bases: object

Get geospatial polygon data from the S3 stored parquet catalog. Used to access boundaries for subsetting data by state, county, etc.

_cat#

Parquet boundary catalog instance

Type:: intake.catalog.Catalog

_us_states#

Table of US state names and geometries

Type:: DataFrame

_ca_counties#

Table of California county names and geometries, sorted alphabetically by county name

Type:: DataFrame

_ca_watersheds#

Table of California watershed names and geometries, sorted alphabetically by watershed name

Type:: DataFrame

_ca_utilities#

Table of California IOUs and POUs, names and geometries

Type:: DataFrame

_ca_forecast_zones#

Table of California Demand Forecast Zones

Type:: DataFrame

_ca_electric_balancing_areas#

Table of Electric Balancing Areas

Type:: DataFrame

_get_us_states(self)#: Returns a dict of state abbreviations and indices

_get_ca_counties(self)#: Returns a dict of California counties and their indices

_get_ca_watersheds(self)#: Returns a dict for CA watersheds and their indices

_get_forecast_zones(self)#: Returns a dict for CA electricity demand forecast zones

_get_ious_pous(self)#: Returns a dict for CA electric load serving entities IOUs & POUs

_get_electric_balancing_areas(self)#: Returns a dict for CA Electric Balancing Authority Areas

boundary_dict()#

Return a dict of the other boundary dicts, used to populate ck.Select.

This returns a dictionary of lookup dictionaries for each set of geoparquet files that the user might be choosing from. It is used to populate the DataParameters cached_area dynamically as the category in the area_subset parameter changes.

Returns:: dict

load()#: Reads the parquet files and sets class attributes.
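The "dictionary of lookup dictionaries" that boundary_dict() describes can be pictured with plain Python dicts. This is a minimal sketch only: the category names come from the area_subset options documented for get_subsetting_options, while the feature names, the index values, and the helper lookup_index are hypothetical and are not part of climakitae.

```python
# Hypothetical stand-in for the dict returned by Boundaries.boundary_dict():
# each boundary category maps to a lookup dict of feature name -> row index.
boundary_lookup = {
    "none": {"entire domain": 0},
    "states": {"CA": 0, "NV": 1, "OR": 2},
    "CA counties": {"Alameda County": 0, "Alpine County": 1},
}


def lookup_index(category: str, feature: str) -> int:
    """Return the row index for a feature within a boundary category."""
    try:
        return boundary_lookup[category][feature]
    except KeyError as err:
        raise KeyError(f"Unknown category or feature: {err}") from err


# Changing the category changes which feature names are valid choices,
# which is how cached_area can follow the area_subset selection.
print(sorted(boundary_lookup["states"]))
print(lookup_index("CA counties", "Alpine County"))
```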

climakitae.core.data_export module#

climakitae.core.data_export.export(data: DataArray | Dataset, filename: str = 'dataexport', format: str = 'NetCDF', mode: str = 'local')#

Save xarray data as NetCDF, Zarr, or CSV in the current working directory. Zarr output can optionally be streamed to an AWS S3 scratch bucket instead, in which case a download URL is provided. NetCDF can only be written to the HUB user partition, and only if the file fits. Zarr can be written either to the HUB user partition or to the S3 scratch bucket, as selected by the mode option.

Parameters:

data (xr.DataArray | xr.Dataset) – Data to export, as output by e.g. DataParameters.retrieve().
filename (str, optional) – Output file name (without file extension, i.e. “my_filename” instead of “my_filename.nc”). The default is “dataexport”.
format (str, optional) – File format (“Zarr”, “NetCDF”, “CSV”). The default is “NetCDF”.
mode (str, optional) – Save location logic for Zarr file (“local”, “s3”). The default is “local”.

Returns:

None
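The parameter rules above can be checked before calling export. The following is a sketch under stated assumptions: check_export_args and export_checked are hypothetical helpers, not part of climakitae, and the rule that mode “s3” is rejected for non-Zarr formats is an interpretation of the description above rather than documented library behaviour.

```python
VALID_FORMATS = ("Zarr", "NetCDF", "CSV")
VALID_MODES = ("local", "s3")


def check_export_args(filename: str, format: str = "NetCDF", mode: str = "local") -> dict:
    """Validate export arguments and return them as keyword arguments."""
    if format not in VALID_FORMATS:
        raise ValueError(f"format must be one of {VALID_FORMATS}, got {format!r}")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # Assumption: S3 streaming is only meaningful for Zarr output.
    if mode == "s3" and format != "Zarr":
        raise ValueError("mode='s3' only applies to Zarr exports")
    # The file name is expected without an extension.
    if filename.endswith((".nc", ".zarr", ".csv")):
        raise ValueError("filename should not include a file extension")
    return {"filename": filename, "format": format, "mode": mode}


def export_checked(data, **kwargs) -> None:
    """Validate the arguments, then hand off to climakitae's export."""
    # Imported lazily so the checks above can be used without climakitae.
    from climakitae.core.data_export import export

    export(data, **check_export_args(**kwargs))


print(check_export_args("my_filename", format="Zarr", mode="s3"))
```

With climakitae installed, export_checked(data, filename="my_filename", format="NetCDF") would then write the file in the current working directory.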

climakitae.core.data_export.remove_zarr(filename: str)#

Helper function that removes a Zarr directory structure. Because a Zarr store is a directory tree, it is not easily removed through the JupyterHub GUI. This function deletes the entire directory tree.

Parameters:: filename (str) – Output Zarr file name (without file extension, i.e. “my_filename” instead of “my_filename.zarr”).
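Deleting a whole directory tree can be done with the standard library. The sketch below illustrates the idea only; remove_zarr_sketch and its parent argument are hypothetical and do not reproduce the actual implementation of remove_zarr.

```python
import shutil
import tempfile
from pathlib import Path


def remove_zarr_sketch(filename: str, parent: Path) -> None:
    """Delete the directory tree '<parent>/<filename>.zarr' if it exists."""
    target = parent / f"{filename}.zarr"
    if target.is_dir():
        shutil.rmtree(target)


# Build a throwaway directory tree that mimics a Zarr store, then remove it.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "my_filename.zarr" / "tas"
    store.mkdir(parents=True)
    (store / "0.0.0").write_bytes(b"chunk")
    remove_zarr_sketch("my_filename", Path(tmp))
    print((Path(tmp) / "my_filename.zarr").exists())  # → False
```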

climakitae.core.data_export.write_tmy_file(filename_to_export: str, df: DataFrame, years: Tuple[int, int], location_name: str, station_code: int, stn_lat: float, stn_lon: float, stn_state: str, stn_elev: float = 0.0, file_ext: str = 'tmy')#

Exports TMY data as either a .epw or a .tmy file.

Parameters:

filename_to_export (str) – Filename string, constructed with station name and simulation
df (DataFrame) – Dataframe of TMY data to export
years (Tuple[int, int]) – Tuple containing climatology start and end years
location_name (str) – Location name string, often station name
station_code (int) – Station code
stn_lat (float) – Station latitude
stn_lon (float) – Station longitude
stn_state (str) – State of station location
stn_elev (float, optional) – Elevation of station, default is 0.0
file_ext (str, optional) – File extension for export, either “tmy” or “epw”. The default is “tmy”.

Returns:

None

climakitae.core.data_interface module#

This module provides the core data interface to access climate data. It contains several key components:

1. VariableDescriptions: A singleton class to load and provide access to available climate variables.
2. DataInterface: A singleton class that manages connections to the data catalog, boundary data, and stations.
3. DataParameters: A parameterized class that handles data selection, filtering, and retrieval.

The module also includes several utility functions to:

- Get available data options and subsetting options
- Handle spatial subsetting by different boundaries (states, counties, watersheds, etc.)
- Retrieve data with simplified parameter specification
- Validate user inputs and provide helpful error messages
- Convert between different naming conventions in the catalog

This interface serves as the foundation for both programmatic access to climate data and the interactive GUI selection interface.

class climakitae.core.data_interface.DataParameters(*, _data_warning, _station_data_info, all_touched, approach, area_average, area_subset, cached_area, data_type, downscaling_method, enable_hidden_vars, extended_description, latitude, longitude, resolution, scenario_historical, scenario_ssp, simulation, stations, time_slice, timescale, units, variable, variable_id, variable_type, warming_level, warming_level_months, warming_level_window, name)#

Bases: Parameterized

Python param object to hold data parameters for use in panel GUI.

Call DataParameters when you want to select and retrieve data from the climakitae data catalog without using the ckg.Select GUI. ckg.Select uses this class to store selections and retrieve data.

DataParameters calls DataInterface, a singleton class that makes the connection to the intake-esm data store in an S3 bucket.

unit_options_dict (dict) – options dictionary for converting unit to other units
area_subset (str) – dataset to use from Boundaries for sub area selection
cached_area (list of strs) – one or more features from area_subset datasets to use for selection
latitude (tuple) – latitude range of selection box
longitude (tuple) – longitude range of selection box
variable_type (str) – toggle raw or derived variable selection
default_variable (str) – initial variable to have selected in widget
time_slice (tuple) – year range to select
resolution (str) – resolution of data to select (“3 km”, “9 km”, “45 km”)
timescale (str) – frequency of dataset (“hourly”, “daily”, “monthly”)
scenario_historical (list of strs) – historical scenario selections
area_average (str) – whether to compute area average (“Yes”, “No”)
downscaling_method (str) – whether to choose WRF or LOCA2 data or both (“Dynamical”, “Statistical”, “Dynamical+Statistical”)
data_type (str) – whether to choose gridded or station based data (“Gridded”, “Stations”)
stations (list of strs) – list of stations that can be filtered by cached_area
_station_data_info (str) – informational statement when station data selected with data_type
scenario_ssp (list of strs) – list of future climate scenarios selected (availability depends on other params)
simulation (list of strs) – list of simulations (models) selected (availability depends on other params)
variable (str) – variable long display name
units (str) – abbreviation of the data's current units (native or converted)
enable_hidden_vars (boolean) – enable selection of variables that are hidden from the GUI?
extended_description (str) – extended description of the data variable
variable_id (list of strs) – list of variable ids that match the variable (WRF and LOCA2 can have different codes for same type of variable)
historical_climate_range_wrf (tuple) – time range of historical WRF data
historical_climate_range_loca (tuple) – time range of historical LOCA2 data
historical_climate_range_wrf_and_loca (tuple) – time range of historical WRF and LOCA2 data combined
historical_reconstruction_range (tuple) – time range of historical reanalysis data
ssp_range (tuple) – time range of future scenario SSP data
_info_about_station_data (str) – warning message about station data
_data_warning (str) – warning about selecting unavailable data combination
data_interface (DataInterface) – data connection singleton class that provides data
_data_catalog (intake_esm.source.ESMDataSource) – shorthand alias to DataInterface.data_catalog
_variable_descriptions (pd.DataFrame) – shorthand alias to DataInterface.variable_descriptions
_stations_gdf (gpd.GeoDataFrame) – shorthand alias to DataInterface.stations_gdf
_geographies (Boundaries) – shorthand alias to DataInterface.geographies
_geography_choose (dict) – shorthand alias to Boundaries.boundary_dict()
_warming_level_times (pd.DataFrame) – shorthand alias to DataInterface.warming_level_times
colormap (str) – default colormap to render the currently selected data
scenario_options (list of strs) – list of available scenarios (historical and ssp) for selection
variable_options_df (pd.DataFrame) – filtered variable descriptions for the downscaling_method and timescale
warming_level (array) – global warming level(s)
warming_level_window (integer) – years around Global Warming Level (+/-), e.g. 15 means a 30yr window
approach (str, “Warming Level” or “Time”) – how the data should be retrieved
warming_level_months (array) – months of year to use for computing warming levels; defaults to the entire calendar year: 1,2,3,4,5,6,7,8,9,10,11,12
all_touched (boolean) – spatial subset option for within or touching selection

retrieve(config: str = None, merge: bool = True) → DataArray | Dataset | List[DataArray]#

Retrieve data from catalog

By default, the current DataParameters selections determine which data are retrieved. The data are read from the AWS S3 bucket and returned as a lazily loaded dask array. This user-facing method is a wrapper for read_catalog_from_select.

Returns:: data_return (xr.DataArray | xr.Dataset | List[xr.DataArray]) – DataArray or Dataset object
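A typical non-GUI workflow sets attributes on a DataParameters object and then calls retrieve(). The sketch below shows one way this might look. The attribute names come from the list above, but the specific values (in particular the variable display name and the year range) are illustrative assumptions, and retrieve_selection is a hypothetical helper rather than part of climakitae.

```python
# Selections to apply, in order; the specific values are illustrative assumptions.
settings = {
    "downscaling_method": "Dynamical",
    "timescale": "monthly",
    "resolution": "45 km",
    "variable": "Air Temperature at 2m",  # assumed variable display name
    "scenario_historical": ["Historical Climate"],
    "time_slice": (1990, 2000),  # assumed to fall inside the historical range
    "area_average": "Yes",
}


def retrieve_selection(settings: dict):
    """Apply settings to a DataParameters object and retrieve lazily loaded data."""
    # Imported inside the function so the settings can be inspected
    # without climakitae installed.
    from climakitae.core.data_interface import DataParameters

    selections = DataParameters()
    for name, value in settings.items():
        setattr(selections, name, value)
    return selections.retrieve()


# data = retrieve_selection(settings)  # requires climakitae and catalog access
```

Because some options depend on others (for example, scenario and simulation availability), the order in which attributes are set may matter; the order used here is a guess.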

class climakitae.core.data_interface.DataInterface#

Bases: object

Load data connections into memory once

This is a singleton class called by the various Param classes to connect to the local data, the intake data catalog, and the parquet boundary catalog. The class attributes are read-only so that the data does not get changed accidentally.

variable_descriptions#

variable descriptions pandas data frame

Type:: DataFrame

stations#

station locations pandas data frame

Type:: pd.DataFrame

stations_gdf#

station locations geopandas data frame

Type:: gpd.GeoDataFrame

data_catalog#

intake ESM data catalog

Type:: intake_esm.source.ESMDataSource

boundary_catalog#

parquet boundary catalog

Type:: intake.catalog.Catalog

geographies#

boundary dictionaries class

Type:: Boundaries

warming_level_times#

table of when each simulation/scenario reaches each warming level

Type:: DataFrame

property boundary_catalog#: Get the boundary catalog

property data_catalog#: Get the data catalog

property geographies#: Get the geographies object

property stations#: Get the stations dataframe

property stations_gdf#: Get the stations geopandas dataframe

property variable_descriptions#: Get the variable descriptions dataframe

property warming_level_times#: Get the warming level times dataframe
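The combination of a singleton and read-only properties can be sketched in a few lines. This is a generic illustration of the pattern, not the DataInterface implementation: the class CatalogConnection, its load_count counter, and the placeholder data are all hypothetical.

```python
class CatalogConnection:
    """Hypothetical singleton that loads its data once and exposes it read-only."""

    _instance = None
    load_count = 0  # counts how many times the (pretend) expensive load ran

    def __new__(cls):
        # Create and populate the single shared instance on first use only.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._variable_descriptions = cls._load()
        return cls._instance

    @classmethod
    def _load(cls):
        cls.load_count += 1
        # Placeholder standing in for a table read from disk or S3.
        return ({"variable": "example", "unit": "K"},)

    @property
    def variable_descriptions(self):
        """Read-only access: there is no setter, so assignment fails."""
        return self._variable_descriptions


first = CatalogConnection()
second = CatalogConnection()
print(first is second)               # → True
print(CatalogConnection.load_count)  # → 1
```

Assigning to first.variable_descriptions raises AttributeError, which is what protects the shared data from accidental changes.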

class climakitae.core.data_interface.VariableDescriptions#

Bases: object

Load Variable Descriptions CSV only once

This is a singleton class that needs to be called separately from DataInterface because variable descriptions are used without DataInterface in ck.view. Also, ck.view is loaded on package load, so this avoids loading boundary data when it is not needed.

variable_descriptions#

pandas dataframe that stores available data variables usable with the package

Type:: DataFrame

load()#: Read the variable descriptions csv into class variable.

climakitae.core.data_interface.get_data(variable: str, resolution: str, timescale: str, downscaling_method: str = 'Dynamical', data_type: str = 'Gridded', approach: str = 'Time', scenario: str | list[str] = None, units: str = None, warming_level: list[float] = None, area_subset: str = 'none', latitude: tuple[float, float] = None, longitude: tuple[float, float] = None, cached_area: list[str] = None, area_average: str = None, time_slice: tuple = None, stations: list[str] = None, warming_level_window: int = None, warming_level_months: list[int] = None, all_touched=False, enable_hidden_vars: bool = False, **kwargs) → DataArray#

Retrieve formatted data from the Analytics Engine data catalog using a simple function.

Contrasts with DataParameters().retrieve(), which retrieves data from the user inputs in climakitaegui’s selections GUI.

Parameters:

variable (str) – String name of climate variable
resolution (str, one of [“3 km”, “9 km”, “45 km”]) – Resolution of data in kilometers
timescale (str, one of [“hourly”, “daily”, “monthly”]) – Temporal frequency of dataset
downscaling_method (str, one of [“Dynamical”, “Statistical”, “Dynamical+Statistical”], optional) – Downscaling method of the data: WRF (“Dynamical”), LOCA2 (“Statistical”), or both (“Dynamical+Statistical”). Default to “Dynamical”.
data_type (str, one of [“Gridded”, “Stations”], optional) – Whether to choose gridded data or weather station data. Default to “Gridded”.
approach (one of [“Time”, “Warming Level”], optional) – Default to “Time”.
scenario (str or list of str, optional) – SSP scenario [“SSP 3-7.0”, “SSP 2-4.5”, “SSP 5-8.5”] and/or historical data selection [“Historical Climate”, “Historical Reconstruction”]. If approach = “Time”, you need to set a valid option. If approach = “Warming Level”, scenario is ignored.
units (str, optional) – Variable units. Defaults to native units of data.
area_subset (str, optional) – Area category, e.g. “CA counties”. Defaults to entire domain (“none”).
cached_area (list, optional) – Area, e.g. “Alameda county”. Defaults to entire domain ([“entire domain”]).
area_average (one of [“Yes”, “No”], optional) – Take an average over spatial domain? Default to “No”.
latitude (None or tuple of float, optional) – Tuple of valid latitude bounds. Default to entire domain.
longitude (None or tuple of float, optional) – Tuple of valid longitude bounds. Default to entire domain.
time_slice (tuple, optional) – Time range for retrieved data. Only valid for approach = “Time”.
stations (list of str, optional) – Which weather stations to retrieve data for. Only valid for data_type = “Stations”. Default to all stations.
warming_level (list of float, optional) – Must be one of the warming levels available in climakitae.core.constants. Only valid for approach = “Warming Level” and data_type = “Stations”.
warming_level_window (int in range (5,25), optional) – Years around Global Warming Level (+/-), e.g. 15 means a 30yr window. Only valid for approach = “Warming Level” and data_type = “Stations”.
warming_level_months (list of int, optional) – Months of year for which to perform warming level computation. Default to all months in a year: [1,2,3,4,5,6,7,8,9,10,11,12]. For example, you may want to set warming_level_months=[12,1,2] to perform the analysis for the winter season. Only valid for approach = “Warming Level” and data_type = “Stations”.
all_touched (boolean) – Spatial subset option for within or touching selection
enable_hidden_vars (boolean, optional) – Return all variables, including the ones in which “show” is set to False? Default to False.
kwargs (dict) – Additional keyword arguments to pass to DataParameters()

Returns:

data (xr.DataArray)

The function does not raise errors. Instead, it prints an informative message and returns None. This is because the AE Jupyter Hub raises a strange Pieces Mismatch Error for some bad inputs; that error is suppressed and a more informative message is printed in its place.
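Because get_data reports problems by printing a message and returning None, calling code may want to turn that None into an exception. The wrapper below is one possible sketch: get_data_or_raise is a hypothetical helper, not part of climakitae, and the values in request (notably the variable display name, the year range, and the county spelling) are illustrative assumptions.

```python
def get_data_or_raise(get_data_func, **kwargs):
    """Call a get_data-style function and raise if it returns None."""
    data = get_data_func(**kwargs)
    if data is None:
        raise ValueError(f"No data returned for arguments: {kwargs}")
    return data


# Example request using the parameter names documented above.
request = {
    "variable": "Air Temperature at 2m",  # assumed variable display name
    "resolution": "45 km",
    "timescale": "monthly",
    "downscaling_method": "Dynamical",
    "approach": "Time",
    "scenario": ["Historical Climate"],
    "time_slice": (1990, 2000),  # assumed to be a valid historical range
    "area_subset": "CA counties",
    "cached_area": ["Alameda County"],  # assumed spelling of the county entry
    "area_average": "Yes",
}

# With climakitae installed and catalog access available:
# from climakitae.core.data_interface import get_data
# data = get_data_or_raise(get_data, **request)
```

Passing the function in as an argument keeps the wrapper usable in tests, where a stand-in callable can replace the real catalog access.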

climakitae.core.data_interface.get_data_options(variable: str = None, downscaling_method: str = None, resolution: str = None, timescale: str = None, scenario: str | list[str] = None, tidy: bool = True, enable_hidden_vars: bool = False) → DataFrame#

Get data options, in the same format as the Select GUI, given a set of possible inputs. Allows the user to access the data using the same language as the GUI, bypassing the sometimes unintuitive naming in the catalog. If no function inputs are provided, the function returns the entire AE catalog that is available via the Select GUI.

Parameters:

variable (str, optional) – Default to None
downscaling_method (str, optional) – Default to None
resolution (str, optional) – Default to None
timescale (str, optional) – Default to None
scenario (str or list, optional) – Default to None
tidy (boolean, optional) – Format the pandas dataframe? This creates a DataFrame with a MultiIndex that makes it easier to parse the options. Default to True
enable_hidden_vars (boolean, optional) – Return all variables, including the ones in which “show” is set to False? Default to False

Returns:

cat_subset (DataFrame) – Catalog options for user-provided inputs

climakitae.core.data_interface.get_subsetting_options(area_subset: str = 'all') → DataFrame#

Get all geometry options for spatial subsetting. Options match those in the selections GUI.

Parameters:: area_subset (str) – One of “all”, “states”, “CA counties”, “CA Electricity Demand Forecast Zones”, “CA watersheds”, “CA Electric Balancing Authority Areas”, “CA Electric Load Serving Entities (IOU & POU)”, “Stations”. Defaults to “all”, which shows all the geometry options with area_subset as a multiindex.
Returns:: geom_df (DataFrame) – Geometry options. If an area_subset other than “all” is provided, only the options for that area_subset are shown, e.g. if area_subset = “states”, only the options for states are returned.
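The two option helpers are often used together before calling get_data. The sketch below assumes the function signatures shown above; list_options and AREA_SUBSET_OPTIONS are hypothetical names introduced for illustration, and the filter values passed to get_data_options are arbitrary examples.

```python
# Accepted area_subset values, as listed in the parameter description above.
AREA_SUBSET_OPTIONS = (
    "all",
    "states",
    "CA counties",
    "CA Electricity Demand Forecast Zones",
    "CA watersheds",
    "CA Electric Balancing Authority Areas",
    "CA Electric Load Serving Entities (IOU & POU)",
    "Stations",
)


def list_options(area_subset: str = "all"):
    """Return (data options, geometry options) as two DataFrames."""
    # Check the input first so a typo fails fast, before any catalog access.
    if area_subset not in AREA_SUBSET_OPTIONS:
        raise ValueError(f"area_subset must be one of {AREA_SUBSET_OPTIONS}")

    # Imported lazily; requires climakitae and catalog access.
    from climakitae.core.data_interface import (
        get_data_options,
        get_subsetting_options,
    )

    data_options = get_data_options(downscaling_method="Dynamical", timescale="monthly")
    geometry_options = get_subsetting_options(area_subset=area_subset)
    return data_options, geometry_options


# data_options, geometry_options = list_options("states")
```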

climakitae.core.data_load module#

climakitae.core.data_load.area_subset_geometry(selections: DataParameters) → list[Polygon] | None#

Get geometry to perform area subsetting with.

Parameters:: selections (DataParameters) – object holding user’s selections
Returns:: ds_region (shapely.geometry) – geometry to use for subsetting

climakitae.core.data_load.load(xr_da: DataArray, progress_bar: bool = False) → DataArray#

Read lazily loaded dask array into memory for faster access

Parameters:

xr_da (DataArray)
progress_bar (boolean)

Returns:

da_computed (DataArray)

climakitae.core.data_load.read_catalog_from_select(selections: DataParameters) → DataArray#

The primary data loading method, and the first one called by DataParameters.retrieve. It returns a DataArray (which can be quite large) containing everything requested by the user, as stored in ‘selections’.

Parameters:: selections (DataParameters) – object holding user’s selections
Returns:: da (DataArray) – output data

climakitae.core.paths module#

This module defines package level paths
