climakitae.new_core package#
Subpackages#
- climakitae.new_core.data_access package
- Submodules
- climakitae.new_core.data_access.boundaries module
- Classes
Boundaries, Boundaries._cat, Boundaries.Properties, Boundaries._us_states, Boundaries._ca_counties, Boundaries._ca_watersheds, Boundaries._ca_utilities, Boundaries._ca_forecast_zones, Boundaries._ca_electric_balancing_areas, Boundaries.boundary_dict(), Boundaries.preload_all(), Boundaries.clear_cache(), Boundaries.validate_catalog(), Boundaries.get_memory_usage(), Boundaries.load()
- climakitae.new_core.data_access.data_access module
- Classes
DataCatalog, DataCatalog.catalog_key, DataCatalog.Properties, DataCatalog.data, DataCatalog.boundary, DataCatalog.boundaries, DataCatalog.renewables, DataCatalog.set_catalog_key(), DataCatalog.set_catalog(), DataCatalog.get_data(), DataCatalog.__new__(), DataCatalog.list_clip_boundaries(), DataCatalog.merge_catalogs(), DataCatalog.print_clip_boundaries(), DataCatalog.reset()
- Module contents
- climakitae.new_core.param_validation package
- Submodules
- climakitae.new_core.param_validation.abc_param_validation module
- Classes
- Functions
- Module Variables
ParameterValidator, ParameterValidator.catalog_path, ParameterValidator.catalog, ParameterValidator.all_catalog_keys, ParameterValidator.catalog_df, ParameterValidator.is_valid_query(), ParameterValidator.populate_catalog_keys(), ParameterValidator.load_catalog_df()
register_catalog_validator(), register_processor_validator()
- climakitae.new_core.param_validation.concat_param_validator module
- climakitae.new_core.param_validation.data_param_validator module
- climakitae.new_core.param_validation.filter_unadjusted_models_param_validator module
- climakitae.new_core.param_validation.param_validation_tools module
- climakitae.new_core.param_validation.renewables_param_validator module
- climakitae.new_core.param_validation.update_attributes_param_validator module
- Module contents
- climakitae.new_core.processors package
- Submodules
- climakitae.new_core.processors.abc_data_processor module
- climakitae.new_core.processors.concatenate module
- climakitae.new_core.processors.filter_unadjusted_models module
FilterUnAdjustedModels, FilterUnAdjustedModels.execute(), FilterUnAdjustedModels.update_context(), FilterUnAdjustedModels.set_data_accessor(), FilterUnAdjustedModels._contains_unadjusted_models(), FilterUnAdjustedModels._remove_unadjusted_models()
- climakitae.new_core.processors.processor_utils module
- climakitae.new_core.processors.template module
- climakitae.new_core.processors.update_attributes module
- Module contents
Clip, Concat, FilterUnAdjustedModels, FilterUnAdjustedModels.execute(), FilterUnAdjustedModels.update_context(), FilterUnAdjustedModels.set_data_accessor(), FilterUnAdjustedModels._contains_unadjusted_models(), FilterUnAdjustedModels._remove_unadjusted_models(), TimeSlice, UpdateAttributes, WarmingLevel
Submodules#
climakitae.new_core.dataset module#
Dataset Processing Pipeline Module
This module provides the core Dataset class that implements a flexible, pipeline-based approach for climate data processing. The Dataset class serves as a central orchestrator that coordinates data access, parameter validation, and a series of processing steps.
Classes#
- Dataset
A pipeline-based data processing class that supports method chaining for building complex data workflows.
Key Features#
- Pipeline Architecture: Execute sequential processing steps on climate data
- Method Chaining: Fluent interface for building complex data workflows
- Parameter Validation: Integrated validation system for query parameters
- Data Access Integration: Pluggable data catalog system for various data sources
- Error Handling: Comprehensive error handling with meaningful error messages
Usage Example#
```python
from climakitae.new_core.dataset import Dataset
from climakitae.new_core.data_access import DataCatalog
from climakitae.new_core.param_validation import ParameterValidator
from climakitae.new_core.processors import TimeSliceProcessor, ClipProcessor

# my_data_catalog and my_validator are assumed to be existing
# DataCatalog and ParameterValidator instances.

# Create a dataset processing pipeline
dataset = (
    Dataset()
    .with_catalog(my_data_catalog)
    .with_param_validator(my_validator)
    .with_processing_step(TimeSliceProcessor("2010-01-01", "2020-12-31"))
    .with_processing_step(ClipProcessor(bounds=((32, 42), (-125, -115))))
)

# Execute the pipeline
result = dataset.execute({"variable": "temperature", "grid_label": "d03"})
```
Pipeline Processing#
The Dataset class executes processing in the following order:
1. Parameter Validation: Validates input parameters using the configured validator
2. Data Access: Retrieves raw data using the configured data catalog
3. Processing Steps: Applies each processing step in sequence
4. Result Return: Returns the final processed xarray.Dataset
Each processing step receives the output of the previous step, allowing for complex data transformations and filtering operations.
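The order above can be sketched with plain-Python stand-ins. This is a minimal illustration and not the climakitae implementation: `run_pipeline`, `validate`, `fetch`, `drop_first`, and `double` are hypothetical names, and plain lists stand in for xarray objects.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-ins: none of these names come from climakitae.
Step = Callable[[List[float], Dict[str, Any]], List[float]]


def run_pipeline(
    params: Dict[str, Any],
    validate: Callable[[Dict[str, Any]], Optional[Dict[str, Any]]],
    fetch: Callable[[Dict[str, Any]], List[float]],
    steps: List[Step],
) -> List[float]:
    """Validate, fetch, then feed each step the previous step's output."""
    query = validate(params)        # 1. parameter validation
    if query is None:
        return []                   # empty result instead of an exception
    data = fetch(query)             # 2. data access
    context: Dict[str, Any] = {}    # shared context that steps may modify
    for step in steps:              # 3. processing steps, in insertion order
        data = step(data, context)
    return data                     # 4. result return


def validate(params):
    return params if "variable" in params else None


def fetch(query):
    return [1.0, 2.0, 3.0, 4.0]


def drop_first(data, context):
    context["dropped"] = 1
    return data[1:]


def double(data, context):
    return [2 * x for x in data]


result = run_pipeline({"variable": "t2"}, validate, fetch, [drop_first, double])
print(result)  # [4.0, 6.0, 8.0]
```

Swapping the two steps would give a different result, which is why the insertion order of steps matters.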
Error Handling#
The class provides comprehensive error handling:
- TypeError: Raised for incorrect component types (validators, catalogs, processors)
- ValueError: Raised for missing required components
- AttributeError: Raised for components missing required methods
- RuntimeError: Raised for pipeline execution failures
Notes
- Processing steps are executed in the order they are added to the pipeline
- The context dictionary is passed through all processing steps and may be modified
- Steps that require data access can set needs_catalog = True to receive the data accessor
- Validation failures return an empty xarray.Dataset rather than raising exceptions
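The `needs_catalog` note can be illustrated with a small, self-contained sketch. `attach_accessor`, `PlainStep`, and `LookupStep` are hypothetical names, and how the real pipeline hands over the accessor is an assumption here; only the flag name and `set_data_accessor` come from the text above.

```python
from typing import Any, Iterable, List


class PlainStep:
    """Hypothetical step that does not need the data accessor."""

    needs_catalog = False

    def set_data_accessor(self, accessor: Any) -> None:
        raise AssertionError("should not be called for this step")


class LookupStep:
    """Hypothetical step that asks for the data accessor."""

    needs_catalog = True

    def __init__(self) -> None:
        self.data_accessor = None

    def set_data_accessor(self, accessor: Any) -> None:
        self.data_accessor = accessor


def attach_accessor(steps: Iterable[Any], accessor: Any) -> List[str]:
    """Give the accessor only to steps that opted in via needs_catalog."""
    wired = []
    for step in steps:
        if getattr(step, "needs_catalog", False):
            step.set_data_accessor(accessor)
            wired.append(type(step).__name__)
    return wired


catalog = object()  # placeholder for a data catalog
lookup = LookupStep()
wired = attach_accessor([PlainStep(), lookup], catalog)
print(wired)  # ['LookupStep']
```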
- class climakitae.new_core.dataset.Dataset#
Bases: object
A pipeline-based data processing class for climate data workflows.
The Dataset class serves as a central orchestrator that coordinates data access, parameter validation, and sequential processing steps. It implements a fluent interface pattern allowing method chaining for building complex data workflows.
- Parameters:
None
- data_access#
The data catalog instance used for retrieving raw data from various sources.
- Type:
DataCatalog or UNSET
- parameter_validator#
The parameter validator instance used for validating query parameters.
- Type:
ParameterValidator or UNSET
- processing_pipeline#
A list of processing steps to be executed sequentially on the data.
- Type:
list of DataProcessor or UNSET
- execute(parameters=UNSET)#
Execute the complete data processing pipeline and return the result.
- with_param_validator(parameter_validator)#
Set the parameter validator for the dataset (method chaining).
- with_catalog(catalog)#
Set the data catalog for the dataset (method chaining).
- with_processing_step(step)#
Add a processing step to the pipeline (method chaining).
- Raises:
TypeError – If provided components don’t match expected types or lack required methods.
ValueError – If required components are missing during execution.
RuntimeError – If the processing pipeline encounters execution errors.
Notes
- Processing steps are executed in the order they are added to the pipeline
- The context dictionary is passed through all processing steps and may be modified
- Steps that require data access can set needs_catalog = True to receive the data accessor
- Validation failures return an empty xarray.Dataset rather than raising exceptions
- All components (validator, catalog, processors) must implement their respective interfaces
See also
DataCatalogInterface for data access components
ParameterValidatorInterface for parameter validation components
DataProcessorInterface for data processing components
- execute(parameters: Dict[str, Any] = UNSET) Dataset#
Execute the dataset processing pipeline.
- Parameters:
parameters (Dict[str, Any], optional) – Parameters to pass to the processing pipeline.
- Returns:
Dataset – Result of the processing pipeline.
- with_catalog(catalog: DataCatalog) Dataset#
Set a new data catalog.
- Parameters:
catalog (DataCatalog) – Data catalog to set for the dataset.
- Returns:
Dataset – The current instance of Dataset, allowing method chaining.
- Raises:
TypeError – If the catalog is not an instance of DataCatalog.
AttributeError – If the catalog does not have a ‘get_data’ method.
TypeError – If the ‘get_data’ method is not callable.
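The three checks listed under Raises can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `CatalogBase`, `check_catalog`, and the example classes are hypothetical, and only the exception types and the `get_data` name are taken from the text above.

```python
class CatalogBase:
    """Hypothetical base class standing in for the catalog type."""


def check_catalog(catalog):
    """Apply the type, attribute, and callability checks in that order."""
    if not isinstance(catalog, CatalogBase):
        raise TypeError("catalog must be a CatalogBase instance")
    if not hasattr(catalog, "get_data"):
        raise AttributeError("catalog must define 'get_data'")
    if not callable(catalog.get_data):
        raise TypeError("'get_data' must be callable")
    return catalog


class GoodCatalog(CatalogBase):
    def get_data(self, query):
        return []


class NoMethodCatalog(CatalogBase):
    pass


class BadAttrCatalog(CatalogBase):
    get_data = "not callable"


def outcome(candidate):
    """Return 'ok' or the name of the exception raised by check_catalog."""
    try:
        check_catalog(candidate)
        return "ok"
    except (TypeError, AttributeError) as err:
        return type(err).__name__


outcomes = [
    outcome(c)
    for c in (GoodCatalog(), object(), NoMethodCatalog(), BadAttrCatalog())
]
print(outcomes)  # ['ok', 'TypeError', 'AttributeError', 'TypeError']
```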
- with_param_validator(parameter_validator: ParameterValidator) Dataset#
Set a new parameter validator.
- with_processing_step(step: DataProcessor) Dataset#
Add a new processing step to the pipeline.
- Parameters:
step (DataProcessor) – Processing step to add to the pipeline. Must have ‘execute’ and ‘update_context’ methods.
- Returns:
Dataset – The current instance of Dataset, allowing method chaining.
- Raises:
TypeError – If the step is not an instance of DataProcessor.
AttributeError – If the step does not have ‘execute’, ‘update_context’, or ‘set_data_accessor’ methods.
TypeError – If the step is not callable.
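A processing step that would satisfy the method requirements above might look like the sketch below. The method names come from the text; the argument lists, the `ScaleStep` class, and the `has_required_methods` helper are assumptions made for illustration, since the signatures are not given here.

```python
from typing import Any, Dict, List

REQUIRED_METHODS = ("execute", "update_context", "set_data_accessor")


class ScaleStep:
    """Hypothetical step; argument lists are assumed, not taken from climakitae."""

    def __init__(self, factor: float) -> None:
        self.factor = factor
        self.data_accessor = None

    def execute(self, result: List[float], context: Dict[str, Any]) -> List[float]:
        # Record what was done, then return the transformed data.
        self.update_context(context)
        return [x * self.factor for x in result]

    def update_context(self, context: Dict[str, Any]) -> None:
        context.setdefault("history", []).append(f"scaled by {self.factor}")

    def set_data_accessor(self, accessor: Any) -> None:
        self.data_accessor = accessor


def has_required_methods(step: Any) -> bool:
    """True if every required method exists and is callable."""
    return all(callable(getattr(step, name, None)) for name in REQUIRED_METHODS)


context: Dict[str, Any] = {}
scaled = ScaleStep(2.0).execute([1.0, 2.5], context)
print(scaled)              # [2.0, 5.0]
print(context["history"])  # ['scaled by 2.0']
```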
climakitae.new_core.dataset_factory module#
DatasetFactory Module.
This module provides a factory class for creating climate data processing components and complete datasets with appropriate validation and processing pipelines. It serves as the central orchestrator for constructing validators, processors, and data access objects based on data type, analytical approach, and user requirements.
The factory pattern implemented here simplifies the instantiation of complex component combinations while maintaining flexibility for different climate data scenarios including gridded versus station-based observations, time-based versus warming-level analysis approaches, and different data catalogs and processing requirements.
Key Features#
- Dynamic component registration and discovery
- Automatic processing pipeline construction
- Catalog-based data source management
- Extensible validator and processor registries
See also
climakitae.new_core.dataset.Dataset: Dataset container class
climakitae.new_core.data_access.DataCatalog: Data catalog management
climakitae.new_core.param_validation.abc_param_validator: Parameter validation framework
climakitae.new_core.processors.abc_data_processor: Data processing framework
Notes
This module follows the factory design pattern to encapsulate the complex logic of creating appropriate combinations of data access, validation, and processing components based on user queries from the ClimateData UI.
- class climakitae.new_core.dataset_factory.DatasetFactory#
Bases: object
Factory for creating Dataset objects with appropriate catalogs, validators, and processors.
This factory translates UI queries from the ClimateData interface into fully configured Dataset objects with the correct combination of data catalogs for accessing climate data, parameter validators for query validation, and processing steps for data transformation.
The factory uses registries to maintain extensible collections of components and automatically determines the appropriate combination based on query parameters.
- Parameters:
catalog_path (str, optional) – Path to the catalog configuration CSV file. Default is ‘climakitae/data/catalogs.csv’.
- _catalog_df#
DataFrame containing catalog metadata loaded from CSV.
- Type:
- _processing_step_registry#
Registry mapping processing step names to DataProcessor classes.
- Type:
- register_catalog(key, catalog)#
Register a data catalog with the factory.
- register_validator(key, validator_class)#
Register a parameter validator with the factory.
- register_processing_step(step_type, step_class)#
Register a processing step with the factory.
- create_validator(val_reg_key)#
Create a parameter validator based on registry key.
- create_dataset(ui_query)#
Create a Dataset based on a UI query from ClimateData.
- get_catalog_options(key, query=None)#
Get available options for a specific catalog.
- get_validators()#
Get a list of available validators.
- get_processors()#
Get a list of available processors.
Examples
Creating a basic dataset:
>>> factory = DatasetFactory()
>>> query = {'data_type': 'gridded', 'variable': 'precipitation'}
>>> dataset = factory.create_dataset(query)
Registering custom components:
>>> factory = DatasetFactory()
>>> factory.register_validator('custom_type', CustomValidator)
>>> factory.register_processing_step('custom_process', CustomProcessor)
Notes
The factory automatically handles the selection of appropriate processing steps based on the query parameters. Some processing steps are mandatory and will be added automatically even if not explicitly requested.
See also
Dataset: The main dataset container class
DataCatalog: Data access abstraction
ParameterValidator: Base class for parameter validation
DataProcessor: Base class for data processing steps
- create_dataset(ui_query: Dict[str, Any]) Dataset#
Create a Dataset based on a UI query from ClimateData.
This method orchestrates the creation of a complete Dataset by:
1. Determining the appropriate catalog based on query parameters
2. Creating and configuring the parameter validator
3. Adding the necessary processing steps in the correct order
- Parameters:
ui_query (dict) – Query dictionary from the ClimateData UI containing at minimum ‘data_type’ (str, the type of climate data). Additional keys depend on the specific data type and analysis.
- Returns:
Dataset – Properly configured Dataset instance ready for data retrieval and processing.
- Raises:
ValueError – If required query parameters are missing, invalid, or if no appropriate catalog can be determined.
RuntimeError – If dataset creation fails due to internal errors.
Notes
The method automatically adds mandatory processing steps such as concatenation and attribute updates even if not specified in the query.
Processing steps are applied in priority order, with preprocessing steps (like bias correction) applied before postprocessing steps.
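One way to picture the two notes above is a planner that adds mandatory steps and sorts by priority. This is only a sketch: the step names, the numeric priorities, and the `plan_steps` helper are all hypothetical, and the actual ordering rules used by the factory are not specified here.

```python
from typing import Dict, Iterable, List

# Hypothetical priorities: lower numbers run earlier.
PRIORITY: Dict[str, int] = {
    "bias_correction": 0,      # preprocessing
    "concat": 10,              # mandatory
    "time_slice": 20,
    "update_attributes": 100,  # mandatory, runs last
}
MANDATORY = ("concat", "update_attributes")
DEFAULT_PRIORITY = 50  # assumed priority for steps not listed above


def plan_steps(requested: Iterable[str]) -> List[str]:
    """Add mandatory steps, then order by (priority, name) for determinism."""
    names = set(requested) | set(MANDATORY)
    return sorted(names, key=lambda n: (PRIORITY.get(n, DEFAULT_PRIORITY), n))


plan = plan_steps(["time_slice", "bias_correction"])
print(plan)  # ['bias_correction', 'concat', 'time_slice', 'update_attributes']
```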
See also
Dataset: The returned dataset class
create_validator: Method for creating parameter validators
- create_validator(val_reg_key: str) ParameterValidator#
Create a parameter validator based on data_type and approach.
- Parameters:
val_reg_key (str) – Key for the validator (data_type_approach).
- Returns:
ParameterValidator – An appropriate parameter validator.
- Raises:
ValueError – If no validator is registered for the given combination
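The register-then-create flow can be sketched with a small registry. `ValidatorRegistry`, `BaseValidator`, and `GriddedTimeValidator` are hypothetical stand-ins and the key format is only an example; the sketch shows the general registry pattern, not the factory's internals.

```python
from typing import Dict, List, Type


class BaseValidator:
    """Hypothetical validator base class."""

    def is_valid_query(self, query: dict) -> bool:
        return True


class GriddedTimeValidator(BaseValidator):
    def is_valid_query(self, query: dict) -> bool:
        return "variable" in query


class ValidatorRegistry:
    """Maps string keys to validator classes and instantiates on demand."""

    def __init__(self) -> None:
        self._validators: Dict[str, Type[BaseValidator]] = {}

    def register_validator(self, key: str, validator_class: Type[BaseValidator]) -> None:
        if not key:
            raise ValueError("key must be a non-empty string")
        if not (isinstance(validator_class, type)
                and issubclass(validator_class, BaseValidator)):
            raise TypeError("validator_class must subclass BaseValidator")
        self._validators[key] = validator_class

    def create_validator(self, val_reg_key: str) -> BaseValidator:
        if val_reg_key not in self._validators:
            raise ValueError(f"no validator registered for {val_reg_key!r}")
        return self._validators[val_reg_key]()

    def get_validators(self) -> List[str]:
        return sorted(self._validators)


registry = ValidatorRegistry()
registry.register_validator("gridded_time", GriddedTimeValidator)
validator = registry.create_validator("gridded_time")
print(registry.get_validators())  # ['gridded_time']
```

Storing classes rather than instances means each call to `create_validator` returns a fresh object, so state from one query cannot leak into the next.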
- get_boundaries(boundary_type: str) List[str]#
Get a list of available boundary datasets.
- Parameters:
boundary_type (str) – The type of boundary datasets to retrieve. If the type is not found in the cache, returns all available boundary types.
- Returns:
List[str] – List of available boundary datasets for the specified type, or all available boundary types if the specified type is not found.
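The fallback behaviour described above can be sketched with a plain dictionary. The `lookup_boundaries` helper, the cache layout, and the placeholder names are assumptions for illustration only; they are not real boundary data.

```python
from typing import Dict, List


def lookup_boundaries(cache: Dict[str, List[str]], boundary_type: str) -> List[str]:
    """Return the entries for a known type, else the list of known types."""
    if boundary_type in cache:
        return list(cache[boundary_type])
    return list(cache)  # fallback: all available boundary types


# Placeholder values, not real boundary names.
cache = {
    "counties": ["county_a", "county_b"],
    "watersheds": ["watershed_a"],
}

known = lookup_boundaries(cache, "counties")
unknown = lookup_boundaries(cache, "not_a_type")
print(known)    # ['county_a', 'county_b']
print(unknown)  # ['counties', 'watersheds']
```

Because an unknown type returns type names rather than raising, a caller that needs to tell the two cases apart has to compare the result against the known types.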
- get_catalog_options(key: str, query: dict[str, Any] | object = <object object>) List[str]#
Get available options for a specific catalog.
- Parameters:
- Returns:
List[str] – List of available options for the specified catalog.
- get_processors() List[str]#
Get a list of available processors.
- Returns:
List[str] – List of available processors.
- get_stations() List[str]#
Get a list of available station datasets.
- Returns:
List[str] – List of available station datasets.
- get_validators() List[str]#
Get a list of available validators.
- Returns:
List[str] – List of available validators.
- register_catalog(key: str, catalog: DataCatalog)#
Register a data catalog with the factory.
- Parameters:
key (str) – Identifier for the catalog. Should correspond to data_type, installation, or other distinguishing characteristics.
catalog (DataCatalog) – Catalog implementation to register for the given key.
- Raises:
TypeError – If catalog is not an instance of DataCatalog.
ValueError – If key is empty or None.
Examples
>>> factory = DatasetFactory()
>>> custom_catalog = DataCatalog()
>>> factory.register_catalog('wind_data', custom_catalog)
See also
DataCatalog: Base catalog class
- register_processing_step(step_type: str, step_class)#
Register a processing step with the factory.
- Parameters:
step_type (str) – Identifier for the processing step.
step_class (class) – Processing step class to register.
- register_validator(key: str, validator_class: Type[ParameterValidator])#
Register a parameter validator with the factory.
- Parameters:
key (str) – Identifier for the validator (approach, data_type combination).
validator_class (Type[ParameterValidator]) – Validator class to register.
- reset()#
Reset the factory state, clearing all registered catalogs, validators, and processors.
This method is useful for reinitializing the factory without creating a new instance.
climakitae.new_core.user_interface module#
Climate Data Interface Module for Accessing Climate Data.
This module provides a high-level interface for accessing climate data through the ClimateData class. It implements a fluent interface pattern that allows users to chain method calls to configure data queries.
The module facilitates retrieving climate data with various parameters such as catalogs, installations, activities, institutions, sources, experiments, variables, and processing options. It implements a factory pattern for creating appropriate datasets and validators based on specified parameters.
- Example Usage:
>>> data = ClimateData()
>>> result = (data.catalog("renewables")
...     .installation("pv_utility")
...     .activity_id("CMIP6")
...     .variable("tasmax")
...     .table_id("day")
...     .grid_label("d03")
...     .get())
- class climakitae.new_core.user_interface.ClimateData#
Bases: object
A fluent interface for accessing climate data.
This class provides a chainable interface for setting parameters and retrieving climate data. It uses a factory pattern to create datasets and validators based on the specified parameters. The class is designed to be chainable, allowing users to set multiple parameters in a single expression.
The interface supports various climate data sources and allows for flexible querying with different combinations of parameters. All methods return the instance itself to enable method chaining.
Parameters supported in queries:
- catalog: The data catalog to use (e.g., “renewable energy generation”, “cadcat”)
- installation: The installation type (e.g., “pv_utility”, “wind_offshore”)
- activity_id: The activity identifier (e.g., “WRF”, “LOCA2”)
- institution_id: The institution identifier (e.g., “CNRM”, “DWD”)
- source_id: The source identifier (e.g., “GCM”, “RCM”, “Station”)
- experiment_id: The experiment identifier (e.g., “historical”, “ssp245”)
- table_id: The temporal resolution (e.g., “1hr”, “day”, “mon”)
- grid_label: The spatial resolution (e.g., “d01”, “d02”, “d03”)
- variable_id: The climate variable (e.g., “tasmax”, “pr”, “cf”)
- processes: Dictionary of data processing operations to apply
- catalog(catalog: str) ClimateData#
Set the data catalog to use.
- installation(installation: str) ClimateData#
Set the installation type.
- activity_id(activity_id: str) ClimateData#
Set the activity identifier.
- institution_id(institution_id: str) ClimateData#
Set the institution identifier.
- source_id(source_id: str) ClimateData#
Set the source identifier.
- experiment_id(experiment_id: str | list[str]) ClimateData#
Set the experiment identifier(s).
- table_id(table_id: str) ClimateData#
Set the temporal resolution.
- grid_label(grid_label: str) ClimateData#
Set the spatial resolution.
- variable(variable: str) ClimateData#
Set the climate variable to retrieve.
- processes(processes: Dict[str, str | Iterable]) ClimateData#
Set processing operations to apply to the data.
- Utility methods for exploring available options:
- show_*_options() methods display available values for each parameter.
- show_query() displays the current query configuration.
- show_all_options() displays all available options for exploration.
- Returns:
DataArray or None – The retrieved climate data as a lazy-loaded xarray DataArray, or None if the query fails or required parameters are missing.
- Raises:
ValueError – If required parameters are missing or invalid during validation.
Exception – If there is an error during data retrieval or processing.
Examples
Basic usage with method chaining:
>>> cd = ClimateData()
>>> data = (cd
...     .catalog("cadcat")
...     .activity_id("WRF")
...     .experiment_id("historical")
...     .table_id("1hr")
...     .grid_label("d02")
...     .variable("prec")
...     .get()
... )
Exploring available options:
>>> cd = ClimateData()
>>> cd.show_catalog_options()
>>> cd.catalog("cadcat").show_variable_options()
Using with processing:
>>> processes = {"spatial_avg": "region", "temporal_avg": "monthly"}
>>> data = (ClimateData()
...     .catalog("climate")
...     .variable("pr")
...     .processes(processes)
...     .get())
- activity_id(activity_id: str) ClimateData#
Set the activity identifier for the query.
- Parameters:
activity_id (str) – The activity ID (e.g., “CMIP6”, “CORDEX”).
- Returns:
ClimateData – The current instance for method chaining.
- catalog(catalog: str) ClimateData#
Set the data catalog to use for the query.
- Parameters:
catalog (str) – The name of the catalog (e.g., “renewables”, “climate”).
- Returns:
ClimateData – The current instance for method chaining.
- copy_query() Dict[str, Any]#
Get a copy of the current query parameters.
- Returns:
Dict[str, Any] – A copy of the current query parameters.
- experiment_id(experiment_id: str | list[str]) ClimateData#
Set the experiment identifier for the query.
- Parameters:
experiment_id (str or list of str) – The experiment ID or IDs (e.g., “historical”, “ssp245”).
- Returns:
ClimateData – The current instance for method chaining.
- get() Any | None#
Execute the configured query and retrieve climate data.
Validates required parameters, creates the appropriate dataset using the factory pattern, executes the query, and resets the query state for the next use.
- Returns:
Optional[xr.DataArray] – The retrieved climate data as a lazy-loaded xarray DataArray, or None if the query fails or validation errors occur.
- Raises:
ValueError – If required parameters are missing during validation.
Exception – If there are errors during dataset creation or execution.
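The chain, execute, and reset cycle described for get() can be sketched with a toy builder. `MiniQuery` is a hypothetical stand-in, the required keys are assumed, and resetting after a failed call is also an assumption; the real class retrieves data rather than returning the query dictionary.

```python
from typing import Any, Dict, Optional


class MiniQuery:
    """Hypothetical fluent query builder; not the ClimateData implementation."""

    REQUIRED = ("catalog", "variable")  # assumed required keys

    def __init__(self) -> None:
        self._query: Dict[str, Any] = {}

    def _set(self, key: str, value: Any) -> "MiniQuery":
        self._query[key] = value
        return self  # returning self is what enables chaining

    def catalog(self, name: str) -> "MiniQuery":
        return self._set("catalog", name)

    def variable(self, name: str) -> "MiniQuery":
        return self._set("variable", name)

    def copy_query(self) -> Dict[str, Any]:
        return dict(self._query)

    def load_query(self, params: Dict[str, Any]) -> "MiniQuery":
        self._query.update(params)
        return self

    def reset(self) -> "MiniQuery":
        self._query.clear()
        return self

    def get(self) -> Optional[Dict[str, Any]]:
        try:
            if any(key not in self._query for key in self.REQUIRED):
                return None  # missing parameters: nothing to retrieve
            return dict(self._query)  # stand-in for the retrieved data
        finally:
            self.reset()  # assumed: state is cleared after every call


cd = MiniQuery()
saved = cd.catalog("cadcat").variable("pr").copy_query()
first = cd.get()                      # runs the query, then resets
after_reset = cd.copy_query()         # {} because get() cleared the state
second = cd.load_query(saved).get()   # restore the saved query and rerun
incomplete = cd.catalog("cadcat").get()
print(first, after_reset, incomplete)
```

Because the state is cleared after each call, copying the query before calling get() is the way to rerun the same request later.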
- grid_label(grid_label: str) ClimateData#
Set the spatial resolution identifier for the query.
- Parameters:
grid_label (str) – The spatial resolution (e.g., “d01”, “d02”, “d03”).
- Returns:
ClimateData – The current instance for method chaining.
- installation(installation: str) ClimateData#
Set the installation type for the query.
- Parameters:
installation (str) – The installation type (e.g., “pv_utility”, “wind_offshore”).
- Returns:
ClimateData – The current instance for method chaining.
- institution_id(institution_id: str) ClimateData#
Set the institution identifier for the query.
- Parameters:
institution_id (str) – The institution ID (e.g., “CNRM”, “DWD”).
- Returns:
ClimateData – The current instance for method chaining.
- load_query(query_params: Dict[str, Any]) ClimateData#
Load query parameters from a dictionary.
- Parameters:
query_params (Dict[str, Any]) – Dictionary of query parameters to load.
- Returns:
ClimateData – The current instance with loaded parameters.
- processes(processes: Dict[str, str | Iterable]) ClimateData#
Set processing operations to apply to the retrieved data.
- Parameters:
processes (Dict[str, Union[str, Iterable]]) – A dictionary of processing operations and their parameters.
- Returns:
ClimateData – The current instance for method chaining.
- reset() ClimateData#
Manually reset the query parameters.
- Returns:
ClimateData – The current instance with reset parameters.
- source_id(source_id: str) ClimateData#
Set the source identifier for the query.
- Parameters:
source_id (str) – The source ID (e.g., “GCM”, “RCM”, “Station”).
- Returns:
ClimateData – The current instance for method chaining.
- table_id(table_id: str) ClimateData#
Set the temporal resolution identifier for the query.
- Parameters:
table_id (str) – The temporal resolution (e.g., “1hr”, “day”, “mon”).
- Returns:
ClimateData – The current instance for method chaining.
- variable(variable: str) ClimateData#
Set the climate variable to retrieve.
- Parameters:
variable (str) – The variable identifier (e.g., “tasmax”, “pr”, “cf”).
- Returns:
ClimateData – The current instance for method chaining.