climakitae.core package#
Submodules#
climakitae.core.boundaries module#
- class climakitae.core.boundaries.Boundaries(boundary_catalog)#
Bases: object
Get geospatial polygon data from the S3-stored parquet catalog. Used to access boundaries for subsetting data by state, county, etc.
- _cat#
Parquet boundary catalog instance
- Type:
intake.catalog.Catalog
- _ca_counties#
Table of California county names and geometries, sorted alphabetically by county name
- Type:
- _ca_watersheds#
Table of California watershed names and geometries, sorted alphabetically by watershed name
- Type:
- _get_us_states(self)#
Returns a dict of state abbreviations and indices
- _get_ca_counties(self)#
Returns a dict of California counties and their indices
- _get_ca_watersheds(self)#
Returns a dict for CA watersheds and their indices
- _get_forecast_zones(self)#
Returns a dict for CA electricity demand forecast zones
- _get_ious_pous(self)#
Returns a dict for CA electric load serving entities IOUs & POUs
- _get_electric_balancing_areas(self)#
Returns a dict for CA Electric Balancing Authority Areas
- boundary_dict()#
Return a dict of the other boundary dicts, used to populate ck.Select.
This returns a dictionary of lookup dictionaries for each set of geoparquet files that the user might be choosing from. It is used to populate the DataParameters cached_area dynamically as the category in the area_subset parameter changes.
- Returns:
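As a rough illustration of the "dictionary of lookup dictionaries" that boundary_dict() describes, the sketch below builds such a structure by hand. The category names are taken from the area_subset options documented for get_subsetting_options; the feature names and index values are hypothetical placeholders, not values read from the real catalog, and both helper functions are invented for this example.

```python
# Hypothetical stand-in for the value returned by Boundaries.boundary_dict().
# Outer keys are boundary categories; each inner dict maps a feature name to
# the index of that feature in the corresponding boundary table.
example_boundary_dict = {
    "states": {"CA": 0, "NV": 1, "OR": 2},  # made-up indices
    "CA counties": {"Alameda County": 0, "Alpine County": 1},  # made-up indices
}


def features_for(category: str, boundaries: dict) -> list:
    """Return the selectable feature names for one boundary category."""
    return sorted(boundaries[category])


def index_of(category: str, feature: str, boundaries: dict) -> int:
    """Look up the table index of one feature within a category."""
    return boundaries[category][feature]


county_choices = features_for("CA counties", example_boundary_dict)
alameda_index = index_of("CA counties", "Alameda County", example_boundary_dict)
```

A structure like this lets the list of selectable features be swapped out whenever the chosen category changes, which is the role the text above gives boundary_dict() for cached_area.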
- load()#
Read parquet files and set class attributes.
climakitae.core.data_export module#
- climakitae.core.data_export.export(data: DataArray | Dataset, filename: str = 'dataexport', format: str = 'NetCDF', mode: str = 'local')#
Save xarray data as NetCDF, Zarr, or CSV in the current working directory. Zarr output can optionally be streamed to an AWS S3 scratch bucket instead, in which case a download URL is given. NetCDF can only be written to the HUB user partition, and only if it fits. Zarr can be written either to the HUB user partition or to the S3 scratch bucket, depending on the mode option.
- Parameters:
data (xr.DataArray | xr.Dataset) – Data to export, as output by e.g. DataParameters.retrieve().
filename (str, optional) – Output file name (without file extension, i.e. “my_filename” instead of “my_filename.nc”). The default is “dataexport”.
format (str, optional) – File format (“Zarr”, “NetCDF”, “CSV”). The default is “NetCDF”.
mode (str, optional) – Save location logic for Zarr file (“local”, “s3”). The default is “local”.
- Returns:
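The sketch below shows one way to sanity-check arguments before calling export. It is not part of climakitae: check_export_args is a hypothetical helper, and the rule that mode "s3" only applies to Zarr is an assumption read from the parameter description above, not confirmed behaviour of the library.

```python
VALID_FORMATS = ("Zarr", "NetCDF", "CSV")  # from the `format` parameter docs
VALID_MODES = ("local", "s3")  # from the `mode` parameter docs


def check_export_args(filename: str = "dataexport",
                      format: str = "NetCDF",
                      mode: str = "local") -> dict:
    """Hypothetical pre-flight check; returns kwargs suitable for export()."""
    if format not in VALID_FORMATS:
        raise ValueError(f"format must be one of {VALID_FORMATS}")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    # Assumption: the docs describe `mode` as save-location logic for Zarr only.
    if mode == "s3" and format != "Zarr":
        raise ValueError("mode='s3' is only documented for format='Zarr'")
    # The docs ask for a file name without an extension.
    if filename.endswith((".nc", ".zarr")):
        raise ValueError("pass the file name without its extension")
    return {"filename": filename, "format": format, "mode": mode}


export_kwargs = check_export_args("my_filename", format="Zarr", mode="s3")

# With climakitae installed, the call itself would then look like:
#   from climakitae.core.data_export import export
#   export(data, **export_kwargs)
```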
- climakitae.core.data_export.remove_zarr(filename: str)#
Helper function that removes a Zarr directory structure. Because the Zarr format is a directory tree, it is not easily removed using the JupyterHub GUI. This function simply deletes the entire directory tree.
- Parameters:
filename (str) – Output Zarr file name (without file extension, i.e. “my_filename” instead of “my_filename.zarr”).
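To make "deletes an entire directory tree" concrete, here is a minimal standard-library sketch of the same idea. It is not the climakitae implementation: remove_tree and its base_dir parameter are hypothetical, and appending ".zarr" to the name is an assumption based on the parameter description above.

```python
import shutil
import tempfile
from pathlib import Path


def remove_tree(filename: str, base_dir: str = ".") -> bool:
    """Delete `<base_dir>/<filename>.zarr` and everything beneath it.

    Returns True if a directory was removed, False if none was found.
    """
    target = Path(base_dir) / f"{filename}.zarr"
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True


# Demonstration on a throwaway directory that mimics a nested store layout.
with tempfile.TemporaryDirectory() as tmp:
    chunk_dir = Path(tmp) / "my_filename.zarr" / "variable"
    chunk_dir.mkdir(parents=True)
    (chunk_dir / "0.0.0").write_text("chunk")
    removed = remove_tree("my_filename", base_dir=tmp)
    still_there = (Path(tmp) / "my_filename.zarr").exists()
    removed_again = remove_tree("my_filename", base_dir=tmp)
```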
- climakitae.core.data_export.write_tmy_file(filename_to_export: str, df: DataFrame, years: Tuple[int, int], location_name: str, station_code: int, stn_lat: float, stn_lon: float, stn_state: str, stn_elev: float = 0.0, file_ext: str = 'tmy')#
Export TMY data as either a .epw or a .tmy file.
- Parameters:
filename_to_export (str) – Filename string, constructed with station name and simulation
df (DataFrame) – Dataframe of TMY data to export
years (Tuple[int,int]) – Tuple containing climatology start and end years
location_name (str) – Location name string, often station name
station_code (int) – Station code
stn_lat (float) – Station latitude
stn_lon (float) – Station longitude
stn_state (str) – State of station location
stn_elev (float, optional) – Elevation of station, default is 0.0
file_ext (str, optional) – File extension for export, default is “tmy”, options are “tmy” and “epw”
- Returns:
climakitae.core.data_interface module#
This module provides the core data interface to access climate data. It contains several key components:
1. VariableDescriptions: A singleton class to load and provide access to available climate variables.
2. DataInterface: A singleton class that manages connections to the data catalog, boundary data, and stations.
3. DataParameters: A parameterized class that handles data selection, filtering, and retrieval.
The module also includes several utility functions to:
- Get available data options and subsetting options
- Handle spatial subsetting by different boundaries (states, counties, watersheds, etc.)
- Retrieve data with simplified parameter specification
- Validate user inputs and provide helpful error messages
- Convert between different naming conventions in the catalog
This interface serves as the foundation for both programmatic access to climate data and the interactive GUI selection interface.
- class climakitae.core.data_interface.DataParameters(*, _data_warning, _station_data_info, all_touched, approach, area_average, area_subset, cached_area, data_type, downscaling_method, enable_hidden_vars, extended_description, latitude, longitude, resolution, scenario_historical, scenario_ssp, simulation, stations, time_slice, timescale, units, variable, variable_id, variable_type, warming_level, warming_level_months, warming_level_window, name)#
Bases: Parameterized
Python param object to hold data parameters for use in panel GUI.
Call DataParameters when you want to select and retrieve data from the climakitae data catalog without using the ckg.Select GUI. ckg.Select uses this class to store selections and retrieve data.
DataParameters calls DataInterface, a singleton class that makes the connection to the intake-esm data store in the S3 bucket.
- unit_options_dict : dict
options dictionary for converting a unit to other units
- area_subset : str
dataset to use from Boundaries for sub area selection
- cached_area : list of strs
one or more features from area_subset datasets to use for selection
- latitude : tuple
latitude range of selection box
- longitude : tuple
longitude range of selection box
- variable_type : str
toggle raw or derived variable selection
- default_variable : str
initial variable to have selected in widget
- time_slice : tuple
year range to select
- resolution : str
resolution of data to select (“3 km”, “9 km”, “45 km”)
- timescale : str
frequency of dataset (“hourly”, “daily”, “monthly”)
- scenario_historical : list of strs
historical scenario selections
- area_average : str
whether to compute area average (“Yes”, “No”)
- downscaling_method : str
whether to choose WRF or LOCA2 data or both (“Dynamical”, “Statistical”, “Dynamical+Statistical”)
- data_type : str
whether to choose gridded or station based data (“Gridded”, “Stations”)
- stations : list of strs
list of stations that can be filtered by cached_area
- _station_data_info : str
informational statement when station data selected with data_type
- scenario_ssp : list of strs
list of future climate scenarios selected (availability depends on other params)
- simulation : list of strs
list of simulations (models) selected (availability depends on other params)
- variable : str
variable long display name
- units : str
unit abbreviation currently of the data (native or converted)
- enable_hidden_vars : boolean
enable selection of variables that are hidden from the GUI?
- extended_description : str
extended description of the data variable
- variable_id : list of strs
list of variable ids that match the variable (WRF and LOCA2 can have different codes for same type of variable)
- historical_climate_range_wrf : tuple
time range of historical WRF data
- historical_climate_range_loca : tuple
time range of historical LOCA2 data
- historical_climate_range_wrf_and_loca : tuple
time range of historical WRF and LOCA2 data combined
- historical_reconstruction_range : tuple
time range of historical reanalysis data
- ssp_range : tuple
time range of future scenario SSP data
- _info_about_station_data : str
warning message about station data
- _data_warning : str
warning about selecting unavailable data combination
- data_interface : DataInterface
data connection singleton class that provides data
- _data_catalog : intake_esm.source.ESMDataSource
shorthand alias to DataInterface.data_catalog
- _variable_descriptions : pd.DataFrame
shorthand alias to DataInterface.variable_descriptions
- _stations_gdf : gpd.GeoDataFrame
shorthand alias to DataInterface.stations_gdf
- _geographies : Boundaries
shorthand alias to DataInterface.geographies
- _geography_choose : dict
shorthand alias to Boundaries.boundary_dict()
- _warming_level_times : pd.DataFrame
shorthand alias to DataInterface.warming_level_times
- colormap : str
default colormap to render the currently selected data
- scenario_options : list of strs
list of available scenarios (historical and ssp) for selection
- variable_options_df : pd.DataFrame
filtered variable descriptions for the downscaling_method and timescale
- warming_level : array
global warming level(s)
- warming_level_window : integer
years around Global Warming Level (+/-), e.g. 15 means a 30yr window
- approach : str, “Warming Level” or “Time”
how the data should be retrieved
- warming_level_months : array
months of year to use for computing warming levels; defaults to the entire calendar year: 1,2,3,4,5,6,7,8,9,10,11,12
- all_touched : boolean
spatial subset option for within or touching selection
- retrieve(config: str = None, merge: bool = True) → DataArray | Dataset | List[DataArray]#
Retrieve data from catalog
By default, the current DataParameters selections determine which data is retrieved. The data is read from the AWS S3 bucket and returned as a lazily loaded dask array. This is a user-facing wrapper around read_catalog_from_select.
- Returns:
data_return (xr.DataArray | xr.Dataset | List[xr.DataArray]) – DataArray or Dataset object
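The warming_level_window attribute listed above is given as a plus/minus number of years, so "15 means a 30yr window". The arithmetic can be sketched as below. The function name, the example centre year, and the choice of which endpoint is inclusive are assumptions made for illustration; climakitae's own convention may differ.

```python
def window_years(center_year: int, warming_level_window: int = 15) -> tuple:
    """Return (first_year, last_year, n_years) for a +/- window.

    Assumed convention: the window starts `warming_level_window` years before
    the centre year and covers 2 * warming_level_window calendar years.
    """
    first_year = center_year - warming_level_window
    last_year = center_year + warming_level_window - 1
    n_years = last_year - first_year + 1
    return first_year, last_year, n_years


# Hypothetical centre year, e.g. the year a simulation reaches a warming level.
first, last, length = window_years(2040, warming_level_window=15)
```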
- class climakitae.core.data_interface.DataInterface#
Bases: object
Load data connections into memory once
This is a singleton class called by the various Param classes to connect to the local data and to the intake data catalog and parquet boundary catalog. The class attributes are read-only so that the data does not get changed accidentally.
- stations#
station locations pandas data frame
- Type:
pd.DataFrame
- stations_gdf#
station locations geopandas data frame
- Type:
gpd.GeoDataFrame
- data_catalog#
intake ESM data catalog
- Type:
intake_esm.source.ESMDataSource
- boundary_catalog#
parquet boundary catalog
- Type:
intake.catalog.Catalog
- geographies#
boundary dictionaries class
- Type:
Boundaries
- warming_level_times#
table of when each simulation/scenario reaches each warming level
- Type:
- property boundary_catalog#
Get the boundary catalog
- property data_catalog#
Get the data catalog
- property geographies#
Get the geographies object
- property stations#
Get the stations dataframe
- property stations_gdf#
Get the stations geopandas dataframe
- property variable_descriptions#
Get the variable descriptions dataframe
- property warming_level_times#
Get the warming level times dataframe
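The pattern described for DataInterface (load once, then expose read-only properties) can be sketched in plain Python as follows. This is a generic illustration, not the climakitae source: the class name and the stand-in station data are hypothetical.

```python
class CatalogConnections:
    """Hypothetical singleton, loosely modelled on the DataInterface description."""

    _instance = None

    def __new__(cls):
        # Build the single shared instance on first use only.
        if cls._instance is None:
            instance = super().__new__(cls)
            # Stand-in for the expensive one-time loading of catalogs and tables.
            instance._stations = ("station A", "station B")
            cls._instance = instance
        return cls._instance

    @property
    def stations(self):
        """Read-only access: there is no setter, so assignment fails."""
        return self._stations


first = CatalogConnections()
second = CatalogConnections()

try:
    first.stations = ()
    assignment_blocked = False
except AttributeError:
    assignment_blocked = True
```

Every call to the constructor hands back the same object, so the loading step runs once per session, and the missing setter guards against accidental reassignment.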
- class climakitae.core.data_interface.VariableDescriptions#
Bases: object
Load Variable Descriptions CSV only once
This is a singleton class that needs to be called separately from DataInterface because variable descriptions are used without DataInterface in ck.view. Because ck.view is loaded on package load, this also avoids loading boundary data when it is not needed.
- variable_descriptions#
pandas dataframe that stores available data variables usable with the package
- Type:
- load()#
Read the variable descriptions CSV into a class variable.
- climakitae.core.data_interface.get_data(variable: str, resolution: str, timescale: str, downscaling_method: str = 'Dynamical', data_type: str = 'Gridded', approach: str = 'Time', scenario: str | list[str] = None, units: str = None, warming_level: list[float] = None, area_subset: str = 'none', latitude: tuple[float, float] = None, longitude: tuple[float, float] = None, cached_area: list[str] = None, area_average: str = None, time_slice: tuple = None, stations: list[str] = None, warming_level_window: int = None, warming_level_months: list[int] = None, all_touched=False, enable_hidden_vars: bool = False, **kwargs) → DataArray#
Retrieve formatted data from the Analytics Engine data catalog using a simple function.
Contrasts with DataParameters().retrieve(), which retrieves data from the user inputs in climakitaegui’s selections GUI.
- Parameters:
- variable : str
String name of climate variable
- resolution : str, one of [“3 km”, “9 km”, “45 km”]
Resolution of data in kilometers
- timescale : str, one of [“hourly”, “daily”, “monthly”]
Temporal frequency of dataset
- downscaling_method : str, one of [“Dynamical”, “Statistical”, “Dynamical+Statistical”], optional
Downscaling method of the data: WRF (“Dynamical”), LOCA2 (“Statistical”), or both (“Dynamical+Statistical”). Default to “Dynamical”.
- data_type : str, one of [“Gridded”, “Stations”], optional
Whether to choose gridded data or weather station data. Default to “Gridded”.
- approach : one of [“Time”, “Warming Level”], optional
Default to “Time”
- scenario : str or list of str, optional
SSP scenario [“SSP 3-7.0”, “SSP 2-4.5”, “SSP 5-8.5”] and/or historical data selection [“Historical Climate”, “Historical Reconstruction”]. If approach = “Time”, you need to set a valid option. If approach = “Warming Level”, scenario is ignored.
- units : str, optional
Variable units. Defaults to native units of data.
- area_subset : str, optional
Area category, e.g. “CA counties”. Defaults to entire domain (“none”).
- cached_area : list, optional
Area, e.g. “Alameda county”. Defaults to entire domain ([“entire domain”]).
- area_average : one of [“Yes”, “No”], optional
Take an average over spatial domain? Default to “No”.
- latitude : None or tuple of float, optional
Tuple of valid latitude bounds. Default to entire domain.
- longitude : None or tuple of float, optional
Tuple of valid longitude bounds. Default to entire domain.
- time_slice : tuple, optional
Time range for retrieved data. Only valid for approach = “Time”.
- stations : list of str, optional
Which weather stations to retrieve data for. Only valid for data_type = “Stations”. Default to all stations.
- warming_level : list of float, optional
Must be one of the warming levels available in climakitae.core.constants. Only valid for approach = “Warming Level” and data_type = “Stations”.
- warming_level_window : int in range (5,25), optional
Years around Global Warming Level (+/-), e.g. 15 means a 30yr window. Only valid for approach = “Warming Level” and data_type = “Stations”.
- warming_level_months : list of int, optional
Months of year for which to perform warming level computation. Default to all months in a year: [1,2,3,4,5,6,7,8,9,10,11,12]. For example, you may want to set warming_level_months=[12,1,2] to perform the analysis for the winter season. Only valid for approach = “Warming Level” and data_type = “Stations”.
- all_touched : boolean
Spatial subset option for within or touching selection
- enable_hidden_vars : boolean, optional
Return all variables, including the ones in which “show” is set to False? Default to False.
- kwargs : dict
Additional keyword arguments to pass to DataParameters()
- Returns:
data : xr.DataArray
- Notes:
The function does not raise errors. Instead, it prints an informative message and returns None. This is because the AE Jupyter Hub raises an unhelpful Pieces Mismatch Error for some bad inputs; that error is suppressed and a more informative message is printed in its place.
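A hedged sketch of how a get_data call might be assembled from the options documented above. build_get_data_kwargs is a hypothetical helper, not part of climakitae, and the variable name is an assumed example that should be checked against get_data_options(). The actual call is left as a comment because it needs climakitae installed and access to the data catalog.

```python
RESOLUTIONS = ("3 km", "9 km", "45 km")
TIMESCALES = ("hourly", "daily", "monthly")
APPROACHES = ("Time", "Warming Level")


def build_get_data_kwargs(variable, resolution, timescale, approach="Time",
                          scenario=None, time_slice=None, **extra):
    """Hypothetical helper: validate documented options, return kwargs."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    if timescale not in TIMESCALES:
        raise ValueError(f"timescale must be one of {TIMESCALES}")
    if approach not in APPROACHES:
        raise ValueError(f"approach must be one of {APPROACHES}")
    kwargs = {"variable": variable, "resolution": resolution,
              "timescale": timescale, "approach": approach}
    if approach == "Time":
        # The docs say a valid scenario is needed for the time-based approach
        # and that scenario is ignored for the warming-level approach.
        if scenario is None:
            raise ValueError("approach='Time' needs a scenario")
        kwargs["scenario"] = scenario
        if time_slice is not None:
            kwargs["time_slice"] = time_slice
    kwargs.update(extra)
    return kwargs


kwargs = build_get_data_kwargs(
    variable="Air Temperature at 2m",  # assumed name; verify with get_data_options()
    resolution="9 km",
    timescale="monthly",
    scenario=["Historical Climate"],
    time_slice=(1981, 2010),
    area_subset="CA counties",
    cached_area=["Alameda county"],
)

# With climakitae installed, the retrieval itself would then look like:
#   from climakitae.core.data_interface import get_data
#   data = get_data(**kwargs)
#   if data is None:
#       ...  # get_data prints a message instead of raising on bad inputs
```

Because get_data returns None rather than raising, checking the result before using it is the main safeguard on the caller's side.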
- climakitae.core.data_interface.get_data_options(variable: str = None, downscaling_method: str = None, resolution: str = None, timescale: str = None, scenario: str | list[str] = None, tidy: bool = True, enable_hidden_vars: bool = False) → DataFrame#
Get data options, in the same format as the Select GUI, given a set of possible inputs. This lets the user access the data using the same language as the GUI, bypassing the sometimes unintuitive naming in the catalog. If no function inputs are provided, the function returns the entire AE catalog that is available via the Select GUI.
- Parameters:
variable (str, optional) – Default to None
downscaling_method (str, optional) – Default to None
resolution (str, optional) – Default to None
timescale (str, optional) – Default to None
tidy (boolean, optional) – Format the pandas dataframe? This creates a DataFrame with a MultiIndex that makes it easier to parse the options. Default to True
enable_hidden_vars (boolean, optional) – Return all variables, including the ones in which “show” is set to False? Default to False
- Returns:
cat_subset (DataFrame) – Catalog options for user-provided inputs
- climakitae.core.data_interface.get_subsetting_options(area_subset: str = 'all') → DataFrame#
Get all geometry options for spatial subsetting. Options match those in the selections GUI.
- Parameters:
area_subset (str) – One of “all”, “states”, “CA counties”, “CA Electricity Demand Forecast Zones”, “CA watersheds”, “CA Electric Balancing Authority Areas”, “CA Electric Load Serving Entities (IOU & POU)”, “Stations”. Defaults to “all”, which shows all the geometry options with area_subset as a multiindex.
- Returns:
geom_df (DataFrame) – Geometry options. Shows only the options for one area_subset if an input other than “all” is provided, e.g. if area_subset = “states”, only the options for states will be returned.
climakitae.core.data_load module#
- climakitae.core.data_load.area_subset_geometry(selections: DataParameters) → list[Polygon] | None#
Get geometry to perform area subsetting with.
- Parameters:
selections (DataParameters) – object holding user’s selections
- Returns:
ds_region (shapely.geometry) – geometry to use for subsetting
- climakitae.core.data_load.load(xr_da: DataArray, progress_bar: bool = False) → DataArray#
Read lazily loaded dask array into memory for faster access
- climakitae.core.data_load.read_catalog_from_select(selections: DataParameters) → DataArray#
The primary and first data loading method, called by DataParameters.retrieve. It returns a DataArray (which can be quite large) containing everything requested by the user, as stored in ‘selections’.
- Parameters:
selections (DataParameters) – object holding user’s selections
- Returns:
da (DataArray) – output data
climakitae.core.paths module#
This module defines package-level paths.