climakitae.core package#

Submodules#

climakitae.core.boundaries module#

class climakitae.core.boundaries.Boundaries(boundary_catalog)#

Bases: object

Get geospatial polygon data from the S3 stored parquet catalog. Used to access boundaries for subsetting data by state, county, etc.

_cat#

Parquet boundary catalog instance

Type:

intake.catalog.Catalog

_us_states#

Table of US state names and geometries

Type:

DataFrame

_ca_counties#

Table of California county names and geometries, sorted alphabetically by county name

Type:

DataFrame

_ca_watersheds#

Table of California watershed names and geometries, sorted alphabetically by watershed name

Type:

DataFrame

_ca_utilities#

Table of California IOUs and POUs, names and geometries

Type:

DataFrame

_ca_forecast_zones#

Table of California Demand Forecast Zones

Type:

DataFrame

_ca_electric_balancing_areas#

Table of Electric Balancing Areas

Type:

DataFrame

_get_us_states(self)#

Returns a dict of state abbreviations and indices

_get_ca_counties(self)#

Returns a dict of California counties and their indices

_get_ca_watersheds(self)#

Returns a dict for CA watersheds and their indices

_get_forecast_zones(self)#

Returns a dict for CA electricity demand forecast zones

_get_ious_pous(self)#

Returns a dict for CA electric load serving entities IOUs & POUs

_get_electric_balancing_areas(self)#

Returns a dict for CA Electric Balancing Authority Areas

boundary_dict()#

Return a dict of the other boundary dicts, used to populate ck.Select.

This returns a dictionary of lookup dictionaries for each set of geoparquet files that the user might be choosing from. It is used to populate the DataParameters cached_area dynamically as the category in the area_subset parameter changes.

Returns:

dict
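The nested structure described above can be pictured with a small sketch. The category keys reuse `area_subset` names that appear on this page, but the feature names and index values are made-up placeholders rather than real catalog contents, and `options_for` is a hypothetical helper, not part of climakitae.

```python
# Illustrative only: a dictionary of lookup dictionaries, one per boundary category.
# Feature names and integer indices below are placeholders, not real catalog values.
boundary_lookups = {
    "none": {"entire domain": 0},
    "states": {"CA": 0, "NV": 1, "OR": 2},
    "CA counties": {"Alameda County": 0, "Alpine County": 1},
}


def options_for(area_subset, lookups=boundary_lookups):
    """Return the selectable feature names for one boundary category."""
    if area_subset not in lookups:
        raise KeyError(f"Unknown area_subset: {area_subset!r}")
    return list(lookups[area_subset])


# Changing the category changes the list of features offered for cached_area.
print(options_for("states"))
print(options_for("CA counties"))
```

Keeping one lookup per category means the list of selectable features can be swapped by a single key access whenever the category changes.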

load()#

Read parquet files and set class attributes.

climakitae.core.data_export module#

climakitae.core.data_export.export(data, filename='dataexport', format='NetCDF', mode='auto')#

Save xarray data as either a NetCDF or CSV file in the current working directory, or stream the export file to an AWS S3 scratch bucket and provide a download URL. By default, the output destination is chosen automatically based on whether the file is small enough to fit in the HUB user partition; this can be overridden with the mode parameter.

Parameters:
  • data (DataArray or Dataset) – Data to export, as output by e.g. climakitae.Select().retrieve().

  • filename (str, optional) – Output file name (without file extension, i.e. “my_filename” instead of “my_filename.nc”). The default is “dataexport”.

  • format (str, optional) – File format (“NetCDF” or “CSV”). The default is “NetCDF”.

  • mode (str, optional) – Save location logic for the NetCDF file (“auto”, “local”, “s3”). The default is “auto”.
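The "auto" behavior can be thought of as a size check against the available disk space. The sketch below is an assumption about how such a rule could look, not climakitae's actual implementation; `choose_destination` and its arguments are hypothetical names.

```python
import shutil


def choose_destination(data_nbytes, free_nbytes, mode="auto"):
    """Pick "local" or "s3" for an export of data_nbytes bytes.

    Hypothetical helper: climakitae's real decision rule may differ.
    """
    if mode not in ("auto", "local", "s3"):
        raise ValueError(f"mode must be 'auto', 'local', or 's3', got {mode!r}")
    if mode != "auto":
        return mode  # an explicit choice overrides the size check
    return "local" if data_nbytes < free_nbytes else "s3"


# Free space on the partition holding the current working directory.
free = shutil.disk_usage(".").free
print(choose_destination(10 * 1024**2, free))  # a 10 MiB export
```

Passing `mode="local"` or `mode="s3"` skips the size comparison entirely, which mirrors the override described for the mode parameter.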

climakitae.core.data_export.write_tmy_file(filename_to_export, df, location_name, station_code, stn_lat, stn_lon, stn_state, stn_elev=0.0, file_ext='tmy')#

Export TMY data as either a .epw or .tmy file.

Parameters:
  • filename_to_export (str) – Filename string, constructed with station name and simulation

  • df (DataFrame) – Dataframe of TMY data to export

  • location_name (str) – Location name string, often station name

  • station_code (int) – Station code

  • stn_lat (float) – Station latitude

  • stn_lon (float) – Station longitude

  • stn_state (str) – State of station location

  • stn_elev (float, optional) – Elevation of station, default is 0.0

  • file_ext (str, optional) – File extension for export; options are “tmy” and “epw”. The default is “tmy”.

Returns:

None

climakitae.core.data_interface module#

class climakitae.core.data_interface.DataParameters(**params)#

Bases: Parameterized

Python param object to hold data parameters for use in panel GUI.

Call DataParameters when you want to select and retrieve data from the climakitae data catalog without using the ckg.Select GUI. ckg.Select uses this class to store selections and retrieve data.

DataParameters calls DataInterface, a singleton class that makes the connection to the intake-esm data store in the S3 bucket.

unit_options_dict: dict

options dictionary for converting unit to other units

area_subset: str

dataset to use from Boundaries for sub area selection

cached_area: list of strs

one or more features from area_subset datasets to use for selection

latitude: tuple

latitude range of selection box

longitude: tuple

longitude range of selection box

variable_type: str

toggle raw or derived variable selection

default_variable: str

initial variable to have selected in widget

time_slice: tuple

year range to select

resolution: str

resolution of data to select (“3 km”, “9 km”, “45 km”)

timescale: str

frequency of dataset (“hourly”, “daily”, “monthly”)

scenario_historical: list of strs

historical scenario selections

area_average: str

whether to compute area average (“Yes”, “No”)

downscaling_method: str

whether to choose WRF or LOCA2 data or both (“Dynamical”, “Statistical”, “Dynamical+Statistical”)

data_type: str

whether to choose gridded or station based data (“Gridded”, “Station”)

station: list of strs

list of stations that can be filtered by cached_area

_station_data_info: str

informational statement when station data selected with data_type

scenario_ssp: list of strs

list of future climate scenarios selected (availability depends on other params)

simulation: list of strs

list of simulations (models) selected (availability depends on other params)

variable: str

variable long display name

units: str

current unit abbreviation of the data (native or converted)

enable_hidden_vars: boolean

enable selection of variables that are hidden from the GUI?

extended_description: str

extended description of the data variable

variable_id: list of strs

list of variable ids that match the variable (WRF and LOCA2 can have different codes for same type of variable)

historical_climate_range_wrf: tuple

time range of historical WRF data

historical_climate_range_loca: tuple

time range of historical LOCA2 data

historical_climate_range_wrf_and_loca: tuple

time range of historical WRF and LOCA2 data combined

historical_reconstruction_range: tuple

time range of historical reanalysis data

ssp_range: tuple

time range of future scenario SSP data

_info_about_station_data: str

warning message about station data

_data_warning: str

warning about selecting unavailable data combination

data_interface: DataInterface

data connection singleton class that provides data

_data_catalog: intake_esm.source.ESMDataSource

shorthand alias to DataInterface.data_catalog

_variable_descriptions: pd.DataFrame

shorthand alias to DataInterface.variable_descriptions

_stations_gdf: gpd.GeoDataFrame

shorthand alias to DataInterface.stations_gdf

_geographies: Boundaries

shorthand alias to DataInterface.geographies

_geography_choose: dict

shorthand alias to Boundaries.boundary_dict()

_warming_level_times: pd.DataFrame

shorthand alias to DataInterface.warming_level_times

colormap: str

default colormap to render the currently selected data

scenario_options: list of strs

list of available scenarios (historical and ssp) for selection

variable_options_df: pd.DataFrame

filtered variable descriptions for the downscaling_method and timescale

warming_level: array

global warming level(s)

warming_level_window: integer

years around Global Warming Level (+/-), e.g. 15 means a 30yr window

approach: str, “Warming Level” or “Time”

how do you want the data to be retrieved?

warming_level_months: array

months of year to use for computing warming levels; defaults to the entire calendar year: 1,2,3,4,5,6,7,8,9,10,11,12

retrieve(config=None, merge=True)#

Retrieve data from catalog

By default, the current DataParameters settings determine the data retrieved. To retrieve data using the settings in a configuration csv file instead, set config to the local filepath of the csv. The data are read from the AWS S3 bucket and returned as a lazily loaded dask array. This is a user-facing wrapper for read_catalog_from_csv and read_catalog_from_select.

Parameters:
  • config (str, optional) – Local filepath to configuration csv file. Defaults to None, in which case the settings in selections are used.

  • merge (bool, optional) – If config is provided and multiple datasets are desired, merge them to form a single object? Defaults to True.

Returns:

  • DataArray – Lazily loaded dask array. Default if no config file is provided.

  • Dataset – If multiple rows are in the csv, each row is a data_variable. Only an option if a config file is provided.

  • list of DataArray – If multiple rows are in the csv and merge=False, multiple DataArrays are returned in a single list. Only an option if a config file is provided.

class climakitae.core.data_interface.DataInterface#

Bases: object

Load data connections into memory once

This is a singleton class called by the various Param classes to connect to the local data and to the intake data catalog and parquet boundary catalog. The class attributes are read only so that the data does not get changed accidentally.
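The singleton-plus-read-only-attributes idea can be illustrated with a generic Python sketch. This is not the DataInterface source; `ReadOnlySingleton` and its placeholder data are hypothetical and only show one common way to get this behavior.

```python
class ReadOnlySingleton:
    """Minimal singleton whose data is exposed through a read-only property."""

    _instance = None

    def __new__(cls):
        # Create the instance (and load its data) only on the first call.
        if cls._instance is None:
            instance = super().__new__(cls)
            instance._variable_descriptions = {"T2": "placeholder description"}
            cls._instance = instance
        return cls._instance

    @property
    def variable_descriptions(self):
        # No setter is defined, so assigning to this name raises AttributeError.
        return self._variable_descriptions


a = ReadOnlySingleton()
b = ReadOnlySingleton()
print(a is b)

try:
    a.variable_descriptions = {}
except AttributeError:
    print("variable_descriptions is read-only")
```

Every caller receives the same object, so expensive connections are made once, and the missing setter guards against accidental reassignment of the shared data.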

variable_descriptions#

variable descriptions pandas data frame

Type:

DataFrame

stations#

station locations pandas data frame

Type:

pd.DataFrame

stations_gdf#

station locations geopandas data frame

Type:

gpd.GeoDataFrame

data_catalog#

intake ESM data catalog

Type:

intake_esm.source.ESMDataSource

boundary_catalog#

parquet boundary catalog

Type:

intake.catalog.Catalog

geographies#

boundary dictionaries class

Type:

Boundaries

warming_level_times#

table of when each simulation/scenario reaches each warming level

Type:

DataFrame

property boundary_catalog#

property data_catalog#

property geographies#

property stations#

property stations_gdf#

property variable_descriptions#

property warming_level_times#

class climakitae.core.data_interface.VariableDescriptions#

Bases: object

Load Variable Descriptions CSV only once

This is a singleton class that needs to be called separately from DataInterface because variable descriptions are used without DataInterface in ck.view. Also ck.view is loaded on package load so this avoids loading boundary data when not needed.

variable_descriptions#

pandas dataframe that stores available data variables usable with the package

Type:

DataFrame

load()#

Read the variable descriptions csv into class variable.

climakitae.core.data_interface.get_data(variable, downscaling_method, resolution, timescale, approach='Time', scenario=None, units=None, warming_level=None, area_subset='none', latitude=None, longitude=None, cached_area=['entire domain'], area_average=None, time_slice=None, warming_level_window=None, warming_level_months=None)#

Retrieve formatted data from the Analytics Engine data catalog using a simple function.

Contrasts with DataParameters().retrieve(), which retrieves data from the user inputs in climakitaegui’s selections GUI.

variable: str

String name of climate variable

downscaling_method: str, one of [“Dynamical”, “Statistical”, “Dynamical+Statistical”]

Downscaling method of the data: WRF (“Dynamical”), LOCA2 (“Statistical”), or both (“Dynamical+Statistical”)

resolution: str, one of [“3 km”, “9 km”, “45 km”]

Resolution of data in kilometers

timescale: str, one of [“hourly”, “daily”, “monthly”]

Temporal frequency of dataset

approach: one of [“Time”, “Warming Level”], optional

Default to “Time”

scenario: str or list of str, optional

SSP scenario and/or historical data selection (“Historical Climate”, “Historical Reconstruction”). If approach = “Time”, you need to set a valid option. If approach = “Warming Level”, scenario is ignored.

units: str, optional

Variable units. Defaults to native units of data

area_subset: str, optional

Area category, e.g. “CA counties”. Defaults to entire domain (“none”).

cached_area: list, optional

Area, e.g. “Alameda county”. Defaults to entire domain ([“entire domain”]).

area_average: one of [“Yes”, “No”], optional

Take an average over spatial domain? Default to “No”.

latitude: None or tuple of float, optional

Tuple of valid latitude bounds. Defaults to entire domain.

longitude: None or tuple of float, optional

Tuple of valid longitude bounds. Defaults to entire domain.

time_slice: tuple, optional

Time range for retrieved data. Only valid for approach = “Time”.

warming_level: list of float, optional

Each value must be one of [1.5, 2.0, 2.5, 3.0, 4.0]. Only valid for approach = “Warming Level”.

warming_level_window: int in range (5,25), optional

Years around Global Warming Level (+/-)

(e.g. 15 means a 30yr window)

Only valid for approach = “Warming Level”

warming_level_months: list of int, optional

Months of year for which to perform warming level computation. Defaults to all months in a year: [1,2,3,4,5,6,7,8,9,10,11,12]. For example, you may want to set warming_level_months=[12,1,2] to perform the analysis for the winter season. Only valid for approach = “Warming Level”.
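The arithmetic behind warming_level_window and warming_level_months can be sketched in plain Python. Both helpers are hypothetical, the center year 2040 is an arbitrary example, and the half-open span convention is an assumption; climakitae's own slicing may treat the endpoints differently.

```python
def window_years(center_year, warming_level_window=15):
    """Return (start, end) years spanning +/- warming_level_window around center_year.

    Assumption: a half-open span [start, end), so window=15 covers 30 years.
    """
    return center_year - warming_level_window, center_year + warming_level_window


def keep_months(timestamps, warming_level_months):
    """Keep only (year, month) pairs whose month is in warming_level_months."""
    wanted = set(warming_level_months)
    return [(year, month) for year, month in timestamps if month in wanted]


start, end = window_years(2040, 15)
print(end - start)  # 30

monthly = [(2040, m) for m in range(1, 13)]
print(keep_months(monthly, [12, 1, 2]))  # [(2040, 1), (2040, 2), (2040, 12)]
```

Note that the month filter keeps entries in their original time order, so a winter selection within one calendar year returns January and February before December.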

data: xr.DataArray

Errors aren’t raised by the function. Instead, an informative message is printed and the function returns None. This is because the AE Jupyter Hub raises a strange Pieces Mismatch Error for some bad inputs; that error is ignored and a more informative message is printed in its place.
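A call might be prepared along the lines of the sketch below. The option lists are copied from the parameter descriptions on this page; `VALID`, `check_request`, and `fetch` are hypothetical helpers, and the variable name and time_slice values are assumptions that should be checked against get_data_options() before use. The climakitae import is kept inside `fetch` so the validation part runs even where the package is not installed.

```python
VALID = {
    "downscaling_method": ["Dynamical", "Statistical", "Dynamical+Statistical"],
    "resolution": ["3 km", "9 km", "45 km"],
    "timescale": ["hourly", "daily", "monthly"],
    "approach": ["Time", "Warming Level"],
}


def check_request(**kwargs):
    """Raise ValueError if a keyword uses a value not listed on this page."""
    for name, allowed in VALID.items():
        if name in kwargs and kwargs[name] not in allowed:
            raise ValueError(f"{name}={kwargs[name]!r} is not one of {allowed}")
    return kwargs


def fetch(**kwargs):
    """Validate the request, then call get_data if climakitae is installed."""
    request = check_request(**kwargs)
    try:
        from climakitae.core.data_interface import get_data
    except ImportError:
        return None  # climakitae is not available in this environment
    data = get_data(**request)
    if data is None:
        # get_data prints a message and returns None instead of raising.
        print("No data returned; check the message printed above.")
    return data


request = check_request(
    variable="Air Temperature at 2m",  # assumed variable name
    downscaling_method="Dynamical",
    resolution="45 km",
    timescale="monthly",
    scenario="Historical Climate",
    time_slice=(1990, 2000),  # assumed to fall inside the historical range
)
print(request["resolution"])
```

Because get_data returns None rather than raising, callers should test the result for None before using it, as `fetch(**request)` does.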

climakitae.core.data_interface.get_data_options(variable=None, downscaling_method=None, resolution=None, timescale=None, scenario=None, tidy=True)#

Get data options, in the same format as the Select GUI, given a set of possible inputs. Allows the user to access the data using the same language as the GUI, bypassing the sometimes unintuitive naming in the catalog. If no function inputs are provided, the function returns the entire AE catalog that is available via the Select GUI.

Parameters:
  • variable (str, optional) – Default to None

  • downscaling_method (str, optional) – Default to None

  • resolution (str, optional) – Default to None

  • timescale (str, optional) – Default to None

  • scenario (str or list, optional) – Default to None

  • tidy (boolean, optional) – Format the pandas dataframe? This creates a DataFrame with a MultiIndex that makes it easier to parse the options. Default to True

Returns:

cat_subset (DataFrame) – Catalog options for user-provided inputs

climakitae.core.data_interface.get_subsetting_options(area_subset='all')#

Get all geometry options for spatial subsetting. Options match those in the selections GUI.

Parameters:

area_subset (str) – One of “all”, “states”, “CA counties”, “CA Electricity Demand Forecast Zones”, “CA watersheds”, “CA Electric Balancing Authority Areas”, “CA Electric Load Serving Entities (IOU & POU)”. Defaults to “all”, which shows all the geometry options with area_subset as a multiindex.

Returns:

geom_df (DataFrame) – Geometry options. If area_subset is not “all”, only the options for that area_subset are shown, e.g. if area_subset = “states”, only the options for states are returned.

climakitae.core.data_load module#

climakitae.core.data_load.area_subset_geometry(selections)#

Get geometry to perform area subsetting with.

Parameters:

selections (DataParameters) – object holding user’s selections

Returns:

ds_region (shapely.geometry) – geometry to use for subsetting

climakitae.core.data_load.load(xr_da, progress_bar=False)#

Read lazily loaded dask array into memory for faster access

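Parameters:
  • xr_da (DataArray) – Lazily loaded dask array to read into memory

  • progress_bar (bool, optional) – Whether to display a progress bar while loading. Default is False.

Returns: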

da_computed (DataArray)

climakitae.core.data_load.read_catalog_from_csv(selections, csv, merge=True)#

Retrieve user data selections from csv input.

Allows user to bypass ck.Select() GUI and allows developers to pre-set inputs in a csv file for ease of use in a notebook.

Parameters:
  • selections (DataParameters) – Data settings (variable, unit, timescale, etc).

  • csv (str) – Filepath to local csv file.

  • merge (bool, optional) – If multiple datasets desired, merge to form a single object? Default to True.

Returns:

  • One of the following, depending on csv input and merge:

  • xr_ds (Dataset) – if multiple rows are in the csv, each row is a data_variable

  • xr_da (DataArray) – if csv only has one row

  • xr_list (list of xr.DataArrays) – if multiple rows are in the csv and merge=False, multiple DataArrays are returned in a single list.
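The three return shapes can be mimicked with plain Python objects. This sketch assumes, as the description of the merge parameter suggests, that the list form corresponds to merge=False; `combine_results` is a hypothetical stand-in and strings replace the real DataArrays.

```python
def combine_results(results, merge=True):
    """Mimic the documented return shapes using plain Python objects.

    results: one retrieved item per csv row (stand-ins for DataArrays).
    """
    if len(results) == 1:
        return results[0]  # single row: return the item itself
    if merge:
        # Stand-in for a merged Dataset: one named entry per row.
        return {f"row_{i}": item for i, item in enumerate(results)}
    return list(results)  # multiple rows, no merge: a plain list


print(combine_results(["tas"]))
print(combine_results(["tas", "pr"], merge=True))
print(combine_results(["tas", "pr"], merge=False))
```

Code that consumes the result therefore needs to handle a single object, a merged container, or a list, depending on the csv contents and the merge flag.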

climakitae.core.data_load.read_catalog_from_select(selections)#

The primary and first data loading method, called by DataParameters.retrieve. It returns a DataArray (which can be quite large) containing everything requested by the user (which is stored in ‘selections’).

Parameters:

selections (DataParameters) – object holding user’s selections

Returns:

da (DataArray) – output data

climakitae.core.paths module#

This module defines package level paths

Module contents#