climakitae.core package#
Submodules#
climakitae.core.boundaries module#
- class climakitae.core.boundaries.Boundaries(boundary_catalog)#
Bases:
object
Get geospatial polygon data from the S3 stored parquet catalog. Used to access boundaries for subsetting data by state, county, etc.
- _cat#
Parquet boundary catalog instance
- Type:
intake.catalog.Catalog
- _ca_counties#
Table of California county names and geometries, sorted alphabetically by county name
- Type:
- _ca_watersheds#
Table of California watershed names and geometries, sorted alphabetically by watershed name
- Type:
- _get_us_states(self)#
Returns a dict of state abbreviations and indices
- _get_ca_counties(self)#
Returns a dict of California counties and their indices
- _get_ca_watersheds(self)#
Returns a dict for CA watersheds and their indices
- _get_forecast_zones(self)#
Returns a dict for CA electricity demand forecast zones
- _get_ious_pous(self)#
Returns a dict for CA electric load serving entities IOUs & POUs
- _get_electric_balancing_areas(self)#
Returns a dict for CA Electric Balancing Authority Areas
- boundary_dict()#
Return a dict of the other boundary dicts, used to populate ck.Select.
This returns a dictionary of lookup dictionaries for each set of geoparquet files that the user might be choosing from. It is used to populate the DataParameters cached_area dynamically as the category in the area_subset parameter changes.
- Returns:
boundary_dict (dict) – Dictionary of lookup dictionaries, one per boundary category
- load()#
Read parquet files and set class attributes.
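The lookup returned by boundary_dict() can be pictured as a nested mapping of category to {feature name: index}. A minimal sketch with hypothetical entries (not the real catalog contents) of how such a lookup repopulates cached_area choices:

```python
# Hypothetical entries sketching the shape of Boundaries.boundary_dict():
# one {name: index} lookup per area_subset category.
boundary_lookups = {
    "states": {"CA": 5, "NV": 28},
    "CA counties": {"Alameda County": 0, "Butte County": 3},
}

def cached_area_options(area_subset):
    # Mimic how cached_area choices repopulate when area_subset changes.
    return sorted(boundary_lookups.get(area_subset, {}))
```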
climakitae.core.data_export module#
- climakitae.core.data_export.export(data, filename='dataexport', format='NetCDF', mode='auto')#
Save xarray data as either a NetCDF or CSV in the current working directory, or stream the export file to an AWS S3 scratch bucket and provide a download URL. By default, the code automatically determines the output destination based on whether the file is small enough to fit in the HUB user partition; this can be overridden using the mode parameter.
- Parameters:
data (DataArray or Dataset) – Data to export, as output by e.g. climakitae.Select().retrieve().
filename (str, optional) – Output file name (without file extension, i.e. “my_filename” instead of “my_filename.nc”). The default is “dataexport”.
format (str, optional) – File format (“NetCDF” or “CSV”). The default is “NetCDF”.
mode (str, optional) – Save location logic for NetCDF file (“auto”, “local”, “s3”). The default is “auto”.
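The “auto” mode decision described above can be sketched as a simple size check. This is an illustrative stand-in, not the package's implementation, and the partition limit below is an assumed value:

```python
def choose_destination(nbytes, mode="auto", limit_bytes=10 * 2**30):
    # "local" and "s3" force the destination; "auto" picks local only
    # when the export fits under the (assumed) user-partition limit.
    if mode in ("local", "s3"):
        return mode
    return "local" if nbytes <= limit_bytes else "s3"
```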
- climakitae.core.data_export.write_tmy_file(filename_to_export, df, location_name, station_code, stn_lat, stn_lon, stn_state, stn_elev=0.0, file_ext='tmy')#
Export TMY data as either a .epw or .tmy file
- Parameters:
filename_to_export (str) – Filename string, constructed with station name and simulation
df (DataFrame) – Dataframe of TMY data to export
location_name (str) – Location name string, often the station name
station_code (int) – Station code
stn_lat (float) – Station latitude
stn_lon (float) – Station longitude
stn_state (str) – State of station location
stn_elev (float, optional) – Elevation of station; default is 0.0
file_ext (str, optional) – File extension for export; default is “tmy”, options are “tmy” and “epw”
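The file_ext handling above can be sketched with a small helper (the helper name is hypothetical, not part of the package):

```python
def tmy_output_name(filename_to_export, file_ext="tmy"):
    # Only the two documented extensions, "tmy" and "epw", are accepted.
    if file_ext not in ("tmy", "epw"):
        raise ValueError('file_ext must be "tmy" or "epw"')
    return f"{filename_to_export}.{file_ext}"
```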
- Returns:
climakitae.core.data_interface module#
- class climakitae.core.data_interface.DataParameters(**params)#
Bases:
Parameterized
- Python param object to hold data parameters for use in panel GUI.
Call DataParameters when you want to select and retrieve data from the climakitae data catalog without using the ckg.Select GUI. ckg.Select uses this class to store selections and retrieve data.
DataParameters calls DataInterface, a singleton class that makes the connection to the intake-esm data store in S3 bucket.
- unit_options_dict: dict
options dictionary for converting units to other units
- area_subset: str
dataset to use from Boundaries for sub area selection
- cached_area: list of strs
one or more features from area_subset datasets to use for selection
- latitude: tuple
latitude range of selection box
- longitude: tuple
longitude range of selection box
- variable_type: str
toggle raw or derived variable selection
- default_variable: str
initial variable to have selected in widget
- time_slice: tuple
year range to select
- resolution: str
resolution of data to select (“3 km”, “9 km”, “45 km”)
- timescale: str
frequency of dataset (“hourly”, “daily”, “monthly”)
- scenario_historical: list of strs
historical scenario selections
- area_average: str
whether to compute area average (“Yes”, “No”)
- downscaling_method: str
whether to choose WRF or LOCA2 data or both (“Dynamical”, “Statistical”, “Dynamical+Statistical”)
- data_type: str
whether to choose gridded or station based data (“Gridded”, “Station”)
- station: list of strs
list of stations that can be filtered by cached_area
- _station_data_info: str
informational statement when station data selected with data_type
- scenario_ssp: list of strs
list of future climate scenarios selected (availability depends on other params)
- simulation: list of strs
list of simulations (models) selected (availability depends on other params)
- variable: str
variable long display name
- units: str
unit abbreviation currently of the data (native or converted)
- enable_hidden_vars: boolean
whether to enable selection of variables that are hidden from the GUI
- extended_description: str
extended description of the data variable
- variable_id: list of strs
list of variable ids that match the variable (WRF and LOCA2 can have different codes for same type of variable)
- historical_climate_range_wrf: tuple
time range of historical WRF data
- historical_climate_range_loca: tuple
time range of historical LOCA2 data
- historical_climate_range_wrf_and_loca: tuple
time range of historical WRF and LOCA2 data combined
- historical_reconstruction_range: tuple
time range of historical reanalysis data
- ssp_range: tuple
time range of future scenario SSP data
- _info_about_station_data: str
warning message about station data
- _data_warning: str
warning about selecting unavailable data combination
- data_interface: DataInterface
data connection singleton class that provides data
- _data_catalog: intake_esm.source.ESMDataSource
shorthand alias to DataInterface.data_catalog
- _variable_descriptions: pd.DataFrame
shorthand alias to DataInterface.variable_descriptions
- _stations_gdf: gpd.GeoDataFrame
shorthand alias to DataInterface.stations_gdf
- _geographies: Boundaries
shorthand alias to DataInterface.geographies
- _geography_choose: dict
shorthand alias to Boundaries.boundary_dict()
- _warming_level_times: pd.DataFrame
shorthand alias to DataInterface.warming_level_times
- colormap: str
default colormap to render the currently selected data
- scenario_options: list of strs
list of available scenarios (historical and ssp) for selection
- variable_options_df: pd.DataFrame
filtered variable descriptions for the downscaling_method and timescale
- warming_level: array
global warming level(s)
- warming_level_window: integer
years around Global Warming Level (+/-); e.g. 15 means a 30-year window
- approach: str, “Warming Level” or “Time”
how do you want the data to be retrieved?
- warming_level_months: array
months of year to use for computing warming levels; defaults to the entire calendar year: 1,2,3,4,5,6,7,8,9,10,11,12
- retrieve(config=None, merge=True)#
Retrieve data from catalog
By default, DataParameters determines the data retrieved. To retrieve data using the settings in a configuration csv file, set config to the local filepath of the csv. The data is grabbed from the AWS S3 bucket and returned as a lazily loaded dask array. This user-facing function wraps read_catalog_from_csv and read_catalog_from_select.
- Parameters:
- Returns:
DataArray – Lazily loaded dask array. The default if no config file is provided.
Dataset – If multiple rows are in the csv, each row is a data_variable. Only an option if a config file is provided.
list of DataArray – If multiple rows are in the csv and merge=False, multiple DataArrays are returned in a single list. Only an option if a config file is provided.
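The return-type logic above can be sketched with a pure-Python stand-in (real calls return xarray objects; the function name here is hypothetical):

```python
def combine_retrieved(named_arrays, merge=True):
    # named_arrays: list of (data_variable, array-like) pairs, one per
    # csv row. One row -> the array itself; several rows -> a single
    # Dataset-like mapping when merge=True, else a list of arrays.
    if len(named_arrays) == 1:
        return named_arrays[0][1]
    if merge:
        return dict(named_arrays)
    return [arr for _, arr in named_arrays]
```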
- class climakitae.core.data_interface.DataInterface#
Bases:
object
Load data connections into memory once
This is a singleton class called by the various Param classes to connect to the local data and to the intake data catalog and parquet boundary catalog. The class attributes are read only so that the data does not get changed accidentally.
- stations#
station locations pandas data frame
- Type:
pd.DataFrame
- stations_gdf#
station locations geopandas data frame
- Type:
gpd.GeoDataFrame
- data_catalog#
intake ESM data catalog
- Type:
intake_esm.source.ESMDataSource
- boundary_catalog#
parquet boundary catalog
- Type:
intake.catalog.Catalog
- geographies#
boundary dictionaries class
- Type:
Boundaries
- warming_level_times#
table of when each simulation/scenario reaches each warming level
- Type:
- property boundary_catalog#
- property data_catalog#
- property geographies#
- property stations#
- property stations_gdf#
- property variable_descriptions#
- property warming_level_times#
- class climakitae.core.data_interface.VariableDescriptions#
Bases:
object
Load the Variable Descriptions CSV only once
This is a singleton class that needs to be called separately from DataInterface because variable descriptions are used without DataInterface in ck.view. Also, ck.view is loaded on package load, so this avoids loading boundary data when it is not needed.
- variable_descriptions#
pandas dataframe that stores available data variables usable with the package
- Type:
- load()#
Read the variable descriptions csv into a class variable.
- climakitae.core.data_interface.get_data(variable, downscaling_method, resolution, timescale, approach='Time', scenario=None, units=None, warming_level=None, area_subset='none', latitude=None, longitude=None, cached_area=['entire domain'], area_average=None, time_slice=None, warming_level_window=None, warming_level_months=None)#
- Retrieve formatted data from the Analytics Engine data catalog using a simple function.
Contrasts with DataParameters().retrieve(), which retrieves data from the user inputs in climakitaegui’s selections GUI.
- variable: str
String name of climate variable
- downscaling_method: str, one of [“Dynamical”, “Statistical”, “Dynamical+Statistical”]
Downscaling method of the data: WRF (“Dynamical”), LOCA2 (“Statistical”), or both “Dynamical+Statistical”
- resolution: str, one of [“3 km”, “9 km”, “45 km”]
Resolution of data in kilometers
- timescale: str, one of [“hourly”, “daily”, “monthly”]
Temporal frequency of dataset
- approach: one of [“Time”, “Warming Level”], optional
Default to “Time”
- scenario: str or list of str, optional
SSP scenario and/or historical data selection (“Historical Climate”, “Historical Reconstruction”). If approach = “Time”, you need to set a valid option. If approach = “Warming Level”, scenario is ignored.
- units: str, optional
Variable units. Defaults to native units of data
- area_subset: str, optional
Area category, e.g. “CA counties”. Defaults to entire domain (“none”).
- cached_area: list, optional
Area, e.g. “Alameda county”. Defaults to entire domain ([“entire domain”]).
- area_average: one of [“Yes”,”No”], optional
Take an average over spatial domain? Default to “No”.
- latitude: None or tuple of float, optional
Tuple of valid latitude bounds Default to entire domain
- longitude: None or tuple of float, optional
Tuple of valid longitude bounds Default to entire domain
- time_slice: tuple, optional
Time range for retrieved data Only valid for approach = “Time”
- warming_level: list of float, optional
Must be one of [1.5, 2.0, 2.5, 3.0, 4.0] Only valid for approach = “Warming Level”
- warming_level_window: int in range (5,25), optional
Years around Global Warming Level (+/-); e.g. 15 means a 30-year window. Only valid for approach = “Warming Level”.
- warming_level_months: list of int, optional
Months of year for which to perform the warming level computation. Defaults to all months in a year: [1,2,3,4,5,6,7,8,9,10,11,12]. For example, set warming_level_months=[12,1,2] to perform the analysis for the winter season. Only valid for approach = “Warming Level”.
- Returns:
data (xr.DataArray) – The requested data
Errors aren’t raised by the function. Instead, an informative message is printed and the function returns None. This is because the AE Jupyter Hub raises a confusing Pieces Mismatch error for some bad inputs; that error is suppressed and a more informative message is printed instead.
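The warming-level input constraints listed above can be sketched as a small validation helper (hypothetical, not part of the package; the window bounds are assumed inclusive):

```python
VALID_LEVELS = (1.5, 2.0, 2.5, 3.0, 4.0)

def check_warming_level_args(levels, window=15, months=None):
    # Mirrors the documented constraints: levels from the allowed set,
    # window in range (5, 25), months within 1..12.
    if any(lv not in VALID_LEVELS for lv in levels):
        return False
    if not (5 <= window <= 25):
        return False
    if months is not None and any(m not in range(1, 13) for m in months):
        return False
    return True
```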
- climakitae.core.data_interface.get_data_options(variable=None, downscaling_method=None, resolution=None, timescale=None, scenario=None, tidy=True)#
Get data options, in the same format as the Select GUI, given a set of possible inputs. Allows the user to access the data using the same language as the GUI, bypassing the sometimes unintuitive naming in the catalog. If no function inputs are provided, the function returns the entire AE catalog that is available via the Select GUI
- Parameters:
variable (str, optional) – Default to None
downscaling_method (str, optional) – Default to None
resolution (str, optional) – Default to None
timescale (str, optional) – Default to None
scenario (str, optional) – Default to None
tidy (boolean, optional) – Format the pandas dataframe? This creates a DataFrame with a MultiIndex that makes it easier to parse the options. Default to True
- Returns:
cat_subset (DataFrame) – Catalog options for the user-provided inputs
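The filtering behaviour can be sketched with plain dicts, where any argument left as None places no constraint on that field. The catalog rows below are hypothetical, and the real function returns a pandas DataFrame rather than a list:

```python
# Hypothetical catalog rows standing in for the AE data catalog.
catalog_rows = [
    {"variable": "Air Temperature at 2m", "timescale": "monthly", "resolution": "45 km"},
    {"variable": "Precipitation (total)", "timescale": "daily", "resolution": "9 km"},
]

def data_options(rows, **filters):
    # Drop None-valued filters, then keep rows matching all the rest.
    active = {k: v for k, v in filters.items() if v is not None}
    return [r for r in rows if all(r.get(k) == v for k, v in active.items())]
```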
- climakitae.core.data_interface.get_subsetting_options(area_subset='all')#
Get all geometry options for spatial subsetting. Options match those in selections GUI
- Parameters:
area_subset (str) – One of “all”, “states”, “CA counties”, “CA Electricity Demand Forecast Zones”, “CA watersheds”, “CA Electric Balancing Authority Areas”, “CA Electric Load Serving Entities (IOU & POU)”. Defaults to “all”, which shows all the geometry options with area_subset as a multiindex.
- Returns:
geom_df (DataFrame) – Geometry options. Shows only the options for one area_subset if an input other than “all” is provided; e.g. if area_subset = “states”, only the options for states are returned.
climakitae.core.data_load module#
- climakitae.core.data_load.area_subset_geometry(selections)#
Get geometry to perform area subsetting with.
- Parameters:
selections (DataParameters) – object holding user’s selections
- Returns:
ds_region (shapely.geometry) – geometry to use for subsetting
- climakitae.core.data_load.load(xr_da, progress_bar=False)#
Read lazily loaded dask array into memory for faster access
- climakitae.core.data_load.read_catalog_from_csv(selections, csv, merge=True)#
Retrieve user data selections from csv input.
Allows the user to bypass the ck.Select() GUI and allows developers to pre-set inputs in a csv file for ease of use in a notebook.
- Parameters:
- Returns:
One of the following, depending on the csv input and merge:
xr_ds (Dataset) – if multiple rows are in the csv, each row is a data_variable
xr_da (DataArray) – if the csv only has one row
xr_list (list of xr.DataArray) – if multiple rows are in the csv and merge=False, multiple DataArrays are returned in a single list
- climakitae.core.data_load.read_catalog_from_select(selections)#
The primary and first data loading method, called by DataParameters.retrieve. It returns a DataArray (which can be quite large) containing everything requested by the user, as stored in selections.
- Parameters:
selections (DataParameters) – object holding user’s selections
- Returns:
da (DataArray) – output data
climakitae.core.paths module#
This module defines package-level paths.