Working with our Data

Data from the Cal-Adapt Analytics Engine can be retrieved, subsetted, visualized, and exported using the climakitae library. Visit the Analytics Engine website to see more information about the various datasets available in our catalog.

Retrieve and subset the data

In this section we will detail the various methods to retrieve and subset the catalog data.

Use the ck.Select() panel GUI

If you are working in a Jupyter notebook environment, you can view and set your data and location options in the climakitae.Select() GUI (graphical user interface). This GUI also provides a visual overview of the datasets available in the AE data catalog. Using this GUI, you can choose which dataset to retrieve (variable, time slice, resolution, and so on) and the location for which to retrieve it:

import climakitae as ck     # Import the package
selections = ck.Select()    # Initialize a Select object
selections.show()           # Display the GUI in the notebook.

After setting your options with the widgets (buttons, sliders, etc.) in the GUI, you can retrieve the data with climakitae.Select.retrieve():

data = selections.retrieve()

Directly modifying the location and selections attributes

The climakitae.Select() object stores the data selections used to retrieve data. These attributes can be easily modified in the climakitae.Select() GUI (see above), but can also be modified directly in code. This is trickier than using the GUI, but can make notebooks more reproducible.

For example, if you want to set the location to the LA Metro demand forecast zone, you would use the following code:

selections.area_subset = "CA Electricity Demand Forecast Zones"
selections.cached_area = "LA Metro"

To compute an area average over that entire region, you can modify the area_average attribute of the selections object:

selections.area_average = "Yes"

To set the variable to Air Temperature at 2m and retrieve the data in units of degrees Fahrenheit:

selections.variable = "Air Temperature at 2m"
selections.units = "degF"

Similarly, to set the model resolution, timescale, time slice, and scenario:

selections.scenario_ssp = "SSP 3-7.0 -- Business as Usual"
selections.scenario_historical = "Historical Climate"
selections.resolution = "9 km"
selections.time_slice = (2005, 2025)
selections.timescale = "hourly"

You must set these attributes using the formatting and naming conventions exactly as they appear in the climakitae.Select() GUI. For example, you must set timescale to "hourly", not "Hourly". Only scenario_ssp, scenario_historical, and time_slice accept multiple values.
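Because values must match exactly, it can help to check them before assigning them. The sketch below is a hypothetical pre-check and not part of climakitae: `KNOWN_OPTIONS` and `check_option` are names introduced here for illustration, and the option lists contain only the values shown on this page rather than the full catalog.

```python
# Hypothetical pre-check, not part of climakitae. KNOWN_OPTIONS contains only
# the values shown on this page; the real catalog offers more choices.
KNOWN_OPTIONS = {
    "timescale": ["hourly"],
    "resolution": ["9 km"],
    "units": ["degF"],
    "area_average": ["Yes"],
}


def check_option(attribute, value):
    """Return value if it matches a known option exactly, else raise ValueError."""
    options = KNOWN_OPTIONS.get(attribute)
    if options is None:
        # Nothing to compare against, so pass the value through unchanged.
        return value
    if value in options:
        return value
    # Look for an option that differs only in capitalization or outer whitespace.
    for option in options:
        if option.lower() == value.strip().lower():
            raise ValueError(
                f"{attribute}={value!r} is not valid; did you mean {option!r}?"
            )
    raise ValueError(f"{attribute}={value!r} is not one of {options}")


print(check_option("timescale", "hourly"))  # exact match is accepted
try:
    check_option("timescale", "Hourly")     # wrong capitalization
except ValueError as err:
    print(err)
```

In practice the option lists would need to be filled in from the choices displayed in the GUI.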

Lastly, you’ll need to retrieve the data:

data = selections.retrieve()
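For reproducible notebooks, one option is to keep all of the settings above in a single dictionary and apply them in a loop. This is a minimal sketch under stated assumptions: `apply_settings` is a hypothetical helper, and a `SimpleNamespace` stands in for the object returned by ck.Select() so that the example runs without climakitae installed. The dictionary keeps the order used in the examples above.

```python
from types import SimpleNamespace

# All values are copied from the examples on this page.
SETTINGS = {
    "area_subset": "CA Electricity Demand Forecast Zones",
    "cached_area": "LA Metro",
    "area_average": "Yes",
    "variable": "Air Temperature at 2m",
    "units": "degF",
    "scenario_ssp": "SSP 3-7.0 -- Business as Usual",
    "scenario_historical": "Historical Climate",
    "resolution": "9 km",
    "time_slice": (2005, 2025),
    "timescale": "hourly",
}


def apply_settings(selections, settings):
    """Assign each setting as an attribute, in the order given."""
    for name, value in settings.items():
        setattr(selections, name, value)
    return selections


# Stand-in object so the sketch runs on its own. In a notebook you would
# pass the object created by ck.Select() instead.
selections = apply_settings(SimpleNamespace(), SETTINGS)
print(selections.cached_area)
```

Keeping the dictionary in one notebook cell makes it easy to see, and to rerun, the exact selections behind a result.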

Use a csv config file

The climakitae.core.DataParams.retrieve() method can also retrieve data using settings stored in a csv configuration file. To do so, set the config argument to the local filepath of the csv. The return type depends on the number of rows in the csv. If the csv has one row, a single xarray.DataArray object is returned. If the csv has multiple rows, each row is treated as a separate dataset request: set the merge argument to False to return a list of xarray.DataArray objects, or to True (the default) to return a single xarray.Dataset object.

The csv file must be configured in a particular way for the function to read it properly. The row values must match valid options in our data catalog, and the headers of the csv must be labelled exactly as they are in the following example:

| variable | units | scenario_historical | scenario_ssp | area_average | timescale | resolution | time_slice | area_subset | cached_area |
|---|---|---|---|---|---|---|---|---|---|
| Air Temperature at 2m | degF | Historical Climate | SSP 3-7.0 -- Business as Usual | Yes | hourly | 9 km | (2005, 2025) | states | CA |
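A configuration file with these headers can be written with Python's standard csv module. The sketch below uses the headers and row values from the example above, with scenario_ssp spelled as in the attribute example earlier on this page. The file name `my_config.csv` and the temporary directory are hypothetical choices made for illustration.

```python
import csv
import os
import tempfile

# Column headers and row values copied from the example above.
HEADERS = [
    "variable", "units", "scenario_historical", "scenario_ssp", "area_average",
    "timescale", "resolution", "time_slice", "area_subset", "cached_area",
]
ROW = {
    "variable": "Air Temperature at 2m",
    "units": "degF",
    "scenario_historical": "Historical Climate",
    "scenario_ssp": "SSP 3-7.0 -- Business as Usual",
    "area_average": "Yes",
    "timescale": "hourly",
    "resolution": "9 km",
    "time_slice": "(2005, 2025)",
    "area_subset": "states",
    "cached_area": "CA",
}

# Hypothetical file name, written to a temporary directory for this sketch.
config_path = os.path.join(tempfile.mkdtemp(), "my_config.csv")

with open(config_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=HEADERS)
    writer.writeheader()
    # The csv module quotes "(2005, 2025)" because it contains a comma.
    writer.writerow(ROW)

# Read the file back to confirm the headers and the number of data rows.
with open(config_path, newline="") as f:
    rows = list(csv.DictReader(f))

print(list(rows[0].keys()) == HEADERS)
print(len(rows))
```

With climakitae installed, the text above indicates that this path would be passed as the config argument of retrieve(); that call is left out so the sketch runs on its own. Since the file has a single row, the description above implies a single xarray.DataArray would be returned.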

Read the data into memory

Retrieved data remains lazily loaded as Dask arrays until you choose to read it into memory. For better performance, read your data into memory before plotting it, exporting it, or performing certain computations. To read the data into memory, use the climakitae.load() method.

data = selections.retrieve()
data = ck.load(data)
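Before reading data into memory, it can be worth estimating how large the result will be. The following back-of-the-envelope sketch counts the hourly time steps in the (2005, 2025) time slice used earlier. It assumes that the end year is inclusive, that the data uses a standard Gregorian calendar, and that each value takes 4 bytes; none of these is confirmed on this page. `hourly_steps` and `estimate_megabytes` are hypothetical helpers, not climakitae functions.

```python
from datetime import datetime
from math import prod


def hourly_steps(start_year, end_year):
    """Count hours from Jan 1 of start_year through Dec 31 of end_year."""
    span = datetime(end_year + 1, 1, 1) - datetime(start_year, 1, 1)
    return span.days * 24


def estimate_megabytes(shape, bytes_per_value=4):
    """Estimate array size in MB (10**6 bytes) from its dimension lengths."""
    return prod(shape) * bytes_per_value / 1e6


steps = hourly_steps(2005, 2025)
print(steps)                         # 184080 hourly time steps
print(estimate_megabytes((steps,)))  # size of one area-averaged time series
```

For gridded data, add the grid dimensions to the shape tuple. On an actual xarray object, the nbytes attribute reports the size directly.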

Create a quick visualization of the data

Once you’ve retrieved the data and read it into memory, you can generate a quick visualization of the data using the climakitae.view() method. An appropriate visualization will be automatically generated depending on the dimensionality of the input data.

ck.view(data)

You can also set the colormap and size of the output visualization using the function arguments; see the documentation in the API for more information.

Export the data

To save data as a file, use the climakitae.export() method and provide the following:

  • data to export – an xarray.DataArray or xarray.Dataset object, as output by e.g. selections.retrieve()

  • output file name (without file extension)

  • file format (“NetCDF” or “CSV”)

We recommend NetCDF, which suits data and outputs from the Analytics Engine well – it efficiently stores large data containing multiple variables and dimensions. Metadata will be retained in NetCDF files.

CSV can also store Analytics Engine data with any number of variables and dimensions. It works best for smaller data with fewer dimensions. The output file will be compressed to ensure efficient storage. Metadata will be preserved in a separate file.

CSV stores data in tabular format. Rows will be indexed by the index coordinate(s) of the DataArray or Dataset (e.g. scenario, simulation, time). Columns will be formed by the data variable(s) and non-index coordinate(s).
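The tabular layout described above can be illustrated with the standard csv module. This sketch does not call climakitae.export(); it only mimics the row and column structure, and the exact column names and ordering that export writes may differ. The simulation names, timestamps, and temperature values are made-up placeholders.

```python
import csv
import io
from itertools import product

# Index coordinates: the names follow the text above, the values are made up.
simulations = ["sim_a", "sim_b"]
times = ["2005-01-01 00:00", "2005-01-01 01:00"]

# Placeholder values for one data variable, keyed by (simulation, time).
air_temperature = {
    key: 50.0 + i for i, key in enumerate(product(simulations, times))
}

buffer = io.StringIO()
writer = csv.writer(buffer)
# Index coordinates come first, followed by one column per data variable.
writer.writerow(["simulation", "time", "Air Temperature at 2m"])
for (simulation, time), value in air_temperature.items():
    writer.writerow([simulation, time, value])

print(buffer.getvalue())
```

Each combination of index coordinates produces one row, which is why the row count grows quickly as dimensions are added.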

ck.export(data, "my_filename", "NetCDF")