Source code for energy_analysis_toolbox.thermosensitivity.thermosensitivity

r"""Process the thermosensitivity data.

# Available classes

## ThermoSensitivity

Class to compute the thermosensitivity of a building.
Needs a time series of energy consumption and outdoor temperature.

## CategoricalThermoSensitivity

Class to compute the thermosensitivity of a building with labeled periods.
Needs a time series of energy consumption, outdoor temperature, and labeled periods.

The labeled periods are resampled to the same frequency as the energy and temperature
data by taking the most common category in the period.

Currently, the class only calibrates one base temperature for all the categories
aggregated.

# Implementation details

## Resampling frequency

The energy and temperature data are resampled at a given frequency.
The degree days are computed at the same frequency.

## Thermo Sensitivity

The thermo-sensitivity is modelled as a linear regression between the energy
consumption and the degree days.

    .. math::

        E ~ E0 + TS \times DegreeDays

The degree days are computed from the temperature data and the base temperature.

    .. math::

        DegreeDays = \\int max(0, BaseTemperature - T(t)) dt

Different methods are available to compute the degree days:

- Integral: sum the difference between the base temperature and the temperature.
    .. math::
        DegreeDays = \\sum_{t=0}^N max(0, BaseTemperature - T(t))

- Mean: sum the difference between the base temperature and the mean temperature.
    .. math::
        DegreeDays = max(0, BaseTemperature - \\bar{T} )

- MinMax: sum the difference between the base temperature and the mean temperature
  computed as the mean of the minimum and maximum temperature.

    .. math::
        DegreeDays = max(0, BaseTemperature - \\frac{T_{min} + T_{max}}{2} )

See the `dd_compute` function in the `energy_analysis_toolbox.weather.degree_days`
module.

Over a long period, the data can present a thermosensitivity with different types of
degree days:

- Heating: the energy consumption increases when the temperature decreases. Usually,
  the base temperature is around 18°C.
- Cooling: the energy consumption increases when the temperature increases. Usually,
  the base temperature is around 24°C.

## Auto-calibration

Two aspects of the thermosensitivity can be automatically detected:

- The degree days type: heating, cooling, or both.
- The base temperature.

### Degree days type

Each building, depending of the installed systems, can have different thermosensitivity
types :
- use of heating systems (heating degree days, during the winter)
- use of cooling systems (cooling degree days, during the summer)
- use of both systems (heating and cooling degree days)

The degree days type can be automatically detected by setting the `degree_days_type`
parameter to ``"auto"``. The method will compute the Spearman correlation between the
energy and the temperature for the periods with the mean temperature below and above
the intersaison mean temperature (default is 20°C).

### Base temperature

The base temperature can be calibrated by minimizing the mean squared error between the
data and the model.

Each degree days type has a specific base temperature that is determined by analyzing
the data over the corresponding periods. The heating (resp. cooling) base temperature
is calibrated by minimizing the mean squared error between the energy and the heating
(resp. cooling) degree days for the periods with the mean temperature below (resp.
above) the intersaison mean temperature.

The optimization is done with the `scipy.optimize.minimize_scalar` function with the
`bounded` method.

"""

import logging
from functools import cached_property
from typing import TYPE_CHECKING, Literal, TypeVar, cast

import numpy as np
import pandas as pd
import statsmodels
from scipy.optimize import minimize_scalar
from scipy.stats import spearmanr
from statsmodels.api import OLS

from energy_analysis_toolbox.energy.resample import to_freq as energy_to_freq
from energy_analysis_toolbox.logger import init_logging
from energy_analysis_toolbox.weather.degree_days import (
    dd_compute,
    dd_types,
    literal_computation_dd_types,
    literal_dd_types,
    literal_valid_dd_types,
)

if TYPE_CHECKING:
    from pandas.core.resample import Resampler


ThermosensitivityInstance = TypeVar(
    "ThermosensitivityInstance",
    bound="ThermoSensitivity",
)



[docs]
class ThermoSensitivity:
    """Class to compute the thermosensitivity of a building.

    Examples
    --------
    >>> from energy_analysis_toolbox.thermosensitivity import ThermoSensitivity
    >>> import pandas as pd
    >>> import numpy as np
    >>> np.random.seed(0)
    >>> temperature_data = 15 + 2*pd.Series(np.random.rand(366),
    ... index=pd.date_range("2020-01-01", "2020-12-31", freq="D"))
    >>> energy_data = 10 + (16 - temperature_data).clip(0) * 5 + np.random.rand(366)
    >>> ts = ThermoSensitivity(energy_data, temperature_data, degree_days_type="auto")
    >>> ts.fit()
    >>> ts
    ThermoSensitivity(frequency=1D,
            degree_days_type=heating,
            degree_days_base_temperature={"heating": 15.98},
            degree_days_computation_method="integral",
            interseason_mean_temperature=20)
    <BLANKLINE>
                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                 energy   R-squared:                       0.969
    Model:                            OLS   Adj. R-squared:                  0.969
    No. Observations:                 366   F-statistic:                 1.137e+04
    Covariance Type:            nonrobust   Prob (F-statistic):          1.25e-276
    =======================================================================================
                              coef    std err          t      P>|t|      [0.025   \
                                  0.975]
    ---------------------------------------------------------------------------------------
    heating_degree_days     5.1177      0.048    106.638      0.000       5.023    \
        5.212
    Intercept              10.5120      0.019    539.733      0.000      10.474    \
        10.550
    =======================================================================================
    <BLANKLINE>
    Notes:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly
    specified.

    As you can see from the example above:

    - the type of thermosensitivity is automatically detected (heating in this case)
    - the base temperature is calibrated to 15.98 (true value is 16)
    - the model is fitted with a R-squared of 0.969
    - the heating degree days coefficient is 5.1177 (true value is 5)
    - the intercept is 10.5120 (true value is 10)

    """

    target_name = "energy"
    temperature_name = "temperature"

    def __init__(
        self,
        energy_data: pd.Series,
        temperature_data: pd.Series,
        frequency: str = "1D",
        degree_days_type: literal_dd_types = "heating",
        degree_days_base_temperature: dict | None = None,
        degree_days_computation_method: literal_computation_dd_types = "integral",
        interseason_mean_temperature: float = 20,
        base_logger_name: str | None = None,
        min_logger_level_stdout: int | str = logging.ERROR,
    ) -> None:
        """Initialize a ``ThermoSensitivity`` instance.

        Parameters
        ----------
        energy_data : pd.Series
            Time series of energy consumption data for the building.
        temperature_data : pd.Series
            Time series of outdoor temperature data.
        frequency : str, optional
            Frequency for resampling the data (default is "1D").
            Options include:
            - "1D": daily resampling
            - "7D": weekly resampling
            - "1W-MON": weekly resampling starting on Monday.
        degree_days_type : str, optional
            Type of degree days to compute (default is "heating").
            Options are:
            - "heating": heating degree days.
            - "cooling": cooling degree days.
            - "both": both heating and cooling degree days.
            - "auto": automatically detect the degree days type.
        degree_days_base_temperature : dict, optional
            Base temperature(s) for degree day calculations (default is an empty dict).
            Should include keys "heating" and/or "cooling".
        degree_days_computation_method : str, optional
            Method to compute degree days (default is "integral").
            Options are:
            - "integral": integral calculation.
            - "mean": mean temperature calculation.
            - "min_max": min-max temperature calculation.
            - "pro": energy-professionals calculation.
        interseason_mean_temperature : float, optional
            Mean temperature to differentiate heating and cooling periods (default 20).
        base_logger_name : str, optional
            Name of the logger. By default, it is the class name. All following
            following instances receive a unique identifier, based on the first
            one, with the pattern:

            - ``<base_logger_name>``
            - ``<base_logger_name>_1``
            - ...

        min_logger_level_stdout: str, int, optional
            Minimum logger level below which no message is transferred to stdout
            (i.e. not printed). Default is ``"ERROR"``.

        Raises
        ------
        ValueError
            If the degree days type is not valid or if required parameters are missing.

        Example
        -------
        >>> ts = ThermoSensitivity(energy_data, temperature_data,
            degree_days_type="auto")

        """
        self._init_logger(
            base_logger_name=base_logger_name,
            min_logger_level_stdout=min_logger_level_stdout,
        )
        self._energy_data = energy_data
        self._temperature_data = temperature_data
        self._frequency = frequency
        self._aggregated_data: pd.DataFrame | None = None
        self.degree_days_type = degree_days_type
        self.degree_days_base_temperature = degree_days_base_temperature or {}
        self.degree_days_computation_method = degree_days_computation_method
        self.interseason_mean_temperature = interseason_mean_temperature
        self.predictors: list[str] = []
        self._model = None
        self._validate_data()
        self._post_init()

    def _init_logger(
        self,
        base_logger_name: str | None,
        min_logger_level_stdout: str | int = logging.ERROR,
    ) -> None:
        """Initialize the logger for the current instance of the class.

        This method configures a logger specific to the instance of the class. If a
        `base_logger_name` is provided, it will be used as the logger's name; otherwise,
        the class name will be used. The logger is set up to manage output levels,
        allowing only messages with a severity level above the specified minimum to be
        printed to standard output.

        Parameters
        ----------
        base_logger_name : str, optional
            The name of the logger. If not provided, the logger will default to the
            name of the class. Subsequent instances will receive unique logger names
            by appending an incrementing counter to the base name.

        min_logger_level_stdout : str or int, optional
            The minimum severity level for messages to be printed to standard output.
            Messages below this level will not be displayed. The default is
            ``"ERROR"``, meaning only errors and critical messages will
            be printed. Acceptable values can be specified as strings (e.g.,
            "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL") or as corresponding
            integer constants from the `logging` module. If you want to have details
            on what's going on in the code, set this to ``"INFO"`` or ``"WARNING"``.

        Returns
        -------
        None
            This method does not return a value; it initializes the logger instance
            for use in logging messages throughout the class.

        Example
        -------
        >>> ts = ThermoSensitivity(energy_data, temperature_data)
        >>> ts._init_logger(base_logger_name="my_custom_logger",
        ... min_logger_level_stdout=logging.ERROR)

        Notes
        -----
        - Ensuring unique logger names prevents conflicts when multiple instances of
          the class are used simultaneously.
        - Log messages can be directed to different outputs (e.g., console, file) based
          on logger configuration.

        """
        if not base_logger_name:
            base_logger_name = self.__class__.__name__
        self.logger = init_logging(
            base_logger_name=base_logger_name,
            min_logger_level_stdout=min_logger_level_stdout,
        )

    @property
    def frequency(
        self,
    ) -> str:
        """The frequency of the resampled data.

        The property is unmutable. To change the frequency, create a new object.
        """
        return self._frequency

    @property
    def energy_data(
        self,
    ) -> pd.Series:
        """The energy data of the building.

        The property is unmutable. To change the energy data, create a new object.
        """
        return self._energy_data

    @property
    def temperature_data(
        self,
    ) -> pd.Series:
        """The outdoor temperature data.

        The property is unmutable. To change the temperature data, create a new object.
        """
        return self._temperature_data

    @cached_property
    def resampled_energy(
        self,
    ) -> pd.Series:
        """The energy data resampled at the given frequency.

        Uses the `to_freq` function from the `energy_analysis_toolbox.energy.resample`
        module to convert the energy data to the desired frequency.

        This property is cached to avoid recomputing it multiple times.
        """
        energy = self.energy_data.copy()
        last_period = energy.index[-1] - energy.index[-2]
        energy[energy.index[-1] + last_period] = 0
        return energy_to_freq(
            energy,
            self.frequency,
        ).rename(self.target_name)

    @cached_property
    def resampled_temperature(
        self,
    ) -> pd.Series:
        """The temperature data resampled at the given frequency.

        Average the temperature data over the given frequency.

        This property is cached to avoid recomputing it multiple times.
        """
        return (
            self.temperature_data.resample(self.frequency)
            .mean()
            .rename(self.temperature_name)
        )

    @cached_property
    def resampled_energy_temperature(
        self,
    ) -> pd.DataFrame:
        """The resampled energy and temperature data.

        The DataFrame contains the resampled energy and temperature data.
        Periods with missing values are removed.
        """
        return pd.concat(
            [self.resampled_energy, self.resampled_temperature],
            axis=1,
        ).dropna(how="any", axis=0)

    @property
    def model(
        self,
    ) -> statsmodels.regression.linear_model.RegressionResults:
        """The thermosensitivity model.

        A `statsmodels.regression.linear_model.RegressionResults` object.

        Raises
        ------
        ValueError
            If the model is not fitted. Use the `fit` method to train the model.

        """
        if self._model is None:
            err = "Model not fitted. Please run the `fit` method."
            raise ValueError(err)
        return self._model

    @property
    def aggregated_data(
        self,
    ) -> pd.DataFrame:
        """The aggregated data used to fit the model.

        The Data is a DataFrame resampled at the provided Frequency with the following
        columns:
        - "energy": the total energy at the frequency.
        - "temperature": the mean temperature at the frequency.
        - (Optional) "heating_degree_days": the heating degree days at the frequency.
        - (Optional) "cooling_degree_days": the cooling degree days at the frequency.

        Raises
        ------
        ValueError
            If the data is not aggregated. Use the `fit` method to aggregate the data.

        """
        if self._aggregated_data is None:
            err = "Data not aggregated. Please run the `fit` method."
            raise ValueError(err)
        return self._aggregated_data

    @aggregated_data.setter
    def aggregated_data(
        self,
        value: pd.DataFrame,
    ) -> None:
        """Set the aggregated data, and reset the model."""
        self._aggregated_data = value
        self._model = None

    def _validate_data(
        self,
    ) -> None:
        """Check the validity of the parameters.

        Raises
        ------
        ValueError
            If the degree days type is not valid.
        ValueError
            If the base temperature is not specified for the heating or cooling degree
            days. While not empty.

        """
        if self.degree_days_type not in dd_types:
            err = f"Invalid degree days type. Must be one of {dd_types}."
            raise ValueError(err)
        if self.degree_days_base_temperature != {}:
            if self.degree_days_type in ["heating", "both"]:
                try:
                    self.degree_days_base_temperature["heating"]
                except KeyError:
                    err = (
                        "Base temperature for heating degree days must be specified."
                        "\n Example: degree_days_base_temperature={'heating': 18, "
                        "'cooling': 24}"
                    )
                    raise ValueError(err) from None
            elif self.degree_days_type in ["cooling", "both"]:
                try:
                    self.degree_days_base_temperature["cooling"]
                except KeyError:
                    err = (
                        "Base temperature for cooling degree days must be specified."
                        "\n Example: degree_days_base_temperature={'heating': 18, "
                        "'cooling': 24}"
                    )
                    raise ValueError(err) from None

    def _post_init(
        self,
    ) -> None:
        """End the initialization process.

        If the degree days type is set to ``"auto"``, the method will detect the degree
        days type. See :meth:`_detect_degree_days_type`. After the detection, the
        predictors will be set.
        """
        self._detect_degree_days_type()
        if self.degree_days_type == "both":
            self.predictors = ["heating_degree_days", "cooling_degree_days"]
        elif self.degree_days_type in ["heating", "cooling"]:
            self.predictors = [f"{self.degree_days_type}_degree_days"]

    def _aggregate_data(
        self,
        degree_days_base_temperature: dict,
    ) -> None:
        """Compute the degree days and aggregate the data.

        Store the aggregated data in the `aggregated_data` property.
        """
        degree_days = self._calculate_degree_days(degree_days_base_temperature)
        self.aggregated_data = pd.concat(
            [
                self.resampled_energy_temperature,
                degree_days,
            ],
            axis=1,
        )

    def _detect_degree_days_type(
        self,
        significance_level: float = 0.05,
    ) -> None:
        """Estimate the degree days type if it is set to ``"auto"``.

        It will compute the Spearman correlation (with the alternative hypothesis)
        between the energy and the temperature. If the p-value is below the
        significance level, the degree days type will be set to ``"heating"`` or
        ``"cooling"`` or ``"both"``.

        If no significant correlation is found, the method will raise a ValueError.

        - **Heating**: The energy consumption is negatively correlated with the
          temperature for the periods with the mean temperature below the intersaison
          mean temperature.
        - **Cooling**: The energy consumption is positively correlated with the
          temperature for the periods with the mean temperature above the intersaison
          mean temperature.

        Note:
        ----
        The Spearman correlation is a non-parametric test that measures the strength
        and direction of the monotonic relationship between two variables. The relation
        is not necessarily linear, and the test does not assume that the data is
        normally distributed.

        .. warning::
            If the data contains multiple groups, the Simpson's paradox may occur. The
            Paradox states that the correlation observed in the aggregated data may not
            hold when the data is split into subgroups. See the
            `CategoricalThermoSensitivity` class for a solution.

            References: `wikipedia <https://en.wikipedia.org/wiki/Simpson%27s_paradox>`_

        """
        if self.degree_days_type == "auto":
            heating_mask = (
                self.resampled_energy_temperature[self.temperature_name]
                < self.interseason_mean_temperature
            )
            min_nb_points_heating = 10
            if sum(heating_mask) <= min_nb_points_heating:
                warn = (
                    "Not enough data for the heating period."
                    f"Number of data points: {sum(heating_mask)}"
                )
                self.logger.warning(warn)
                # too few point to do any test
                heating_sp = 1
            else:
                heating_sp = spearmanr(
                    self.resampled_energy_temperature.loc[
                        heating_mask,
                        [self.target_name, self.temperature_name],
                    ],
                    alternative="less",
                ).pvalue
            cooling_mask = (
                self.resampled_energy_temperature[self.temperature_name]
                > self.interseason_mean_temperature
            )
            min_nb_points_cooling = 10
            if sum(cooling_mask) <= min_nb_points_cooling:
                warn = (
                    "Not enough data for the cooling period."
                    f"Number of data points: {sum(heating_mask)}"
                )
                self.logger.warning(warn)
                # too few point to do any test
                cooling_sp = 1
            else:
                data_to_test = self.resampled_energy_temperature.loc[
                    cooling_mask,
                    [self.target_name, self.temperature_name],
                ]
                cooling_sp = spearmanr(data_to_test, alternative="greater").pvalue
            if heating_sp < significance_level and cooling_sp < significance_level:
                self.degree_days_type = "both"
            elif heating_sp < significance_level:
                self.degree_days_type = "heating"
            elif cooling_sp < significance_level:
                self.degree_days_type = "cooling"
            else:
                err = (
                    "Cannot detect the degree days type. Please specify it manually.\n"
                    f"{cooling_sp=} \n"
                    f"{heating_sp=} \n"
                    f"{self.resampled_energy_temperature=} \n"
                    f"{self.resampled_energy=} \n"
                    f"{self.resampled_temperature=} \n"
                )
                raise ValueError(err)

    def _calculate_degree_days(
        self,
        degree_days_base_temperature: dict,
    ) -> pd.DataFrame:
        """Compute the degree days.

        Parameters
        ----------
        degree_days_base_temperature : dict
            Base temperature for the computation of the degree days.
            Must be a dictionary with the keys ``"heating"`` and/or ``"cooling"``.
            Example: degree_days_base_temperature={'heating': 18, 'cooling': 24}

        Returns
        -------
        pd.DataFrame
            DataFrame with the heating and/or cooling degree days
            sampled at the given frequency.

        """
        for dd_type in degree_days_base_temperature:
            if self.degree_days_type in [dd_type, "both"]:
                degree_days = [
                    dd_compute(
                        self.temperature_data,
                        degree_days_base_temperature[dd_type],
                        dd_type=dd_type,
                        method=self.degree_days_computation_method,
                    )
                    .resample(self.frequency)
                    .sum()
                    for dd_type in degree_days_base_temperature
                ]
        return pd.concat(degree_days, axis=1)


[docs]
    def calibrate_base_temperature(
        self,
        dd_type: literal_dd_types = "heating",
        t0: float | None = None,
        xatol: float = 1e-1,
    ) -> float:
        """Calibrate the base temperature for the specified degree days type.

        This method optimizes the base temperature used to compute the degree days
        by minimizing the mean squared error between the energy consumption data and
        the degree days model. The optimization is done using the
        `scipy.optimize.minimize_scalar` function with a bounded method.

        Parameters
        ----------
        dd_type : str, optional
            The type of degree days to calibrate, must be one of the following:
            - "heating": to calibrate the base temperature for heating degree days.
            - "cooling": to calibrate the base temperature for cooling degree days.
            The default is "heating".

        t0 : float, optional
            The initial guess for the base temperature.
            If not provided, the default initial guess is 16°C for heating or 24°C for
            cooling.

        xatol : float, optional
            The absolute error tolerance for the optimization.
            This controls how precise the optimized base temperature needs to be.
            Default is 1e-1 (0.1°C).

        Returns
        -------
        float
            The optimized base temperature for the specified degree days type.

        Raises
        ------
        ValueError
            If the `dd_type` is invalid (not one of "heating" or "cooling").

        Example
        -------
        >>> ts.calibrate_base_temperature(dd_type="heating", t0=15, xatol=0.05)

        """
        if dd_type not in ["heating", "cooling"]:
            err = "Invalid degree days type. Must be one of 'heating' or 'cooling'."
            raise ValueError(err)
        if t0 is None:
            t0 = 16 if dd_type == "heating" else 24
        if dd_type == "heating":
            mask = (
                self.resampled_energy_temperature[self.temperature_name]
                < self.interseason_mean_temperature
            )
            bounds = (10.0, self.interseason_mean_temperature)
        elif dd_type == "cooling":
            mask = (
                self.resampled_energy_temperature[self.temperature_name]
                > self.interseason_mean_temperature
            )
            bounds = (self.interseason_mean_temperature, 30.0)
        else:
            err = "Invalid degree days type. Must be one of 'heating' or 'cooling'."
            raise ValueError(err)
        res = minimize_scalar(
            self.loss_function,
            args=(
                dd_type,
                self.resampled_energy_temperature[self.target_name],
                self.temperature_data,
                self.frequency,
                mask,
                self.degree_days_computation_method,
            ),
            bounds=bounds,
            method="bounded",
            options={
                "xatol": xatol,
            },
        )
        return res.x



[docs]
    def calibrate_base_temperatures(
        self,
        t0_heating: float | None = None,
        t0_cooling: float | None = None,
        xatol: float = 1e-1,
    ) -> None:
        """Calibrate the base temperatures for both heating and cooling degree days.

        This method optimizes the base temperatures for heating and/or cooling degree
        days by minimizing the mean squared error between the energy consumption data
        and the degree days model. The method will calibrate the base temperatures based
        on the detected or specified `degree_days_type`.

        If the `degree_days_type` is "heating", only the heating base temperature is
        calibrated. If it is "cooling", only the cooling base temperature is calibrated.
        If it is "both", both base temperatures are calibrated.

        Parameters
        ----------
        t0_heating : float, optional
            The initial guess for the heating base temperature. If not provided, the
            default
            initial guess is 16°C.

        t0_cooling : float, optional
            The initial guess for the cooling base temperature. If not provided, the
            default initial guess is 24°C.

        xatol : float, optional
            The absolute error tolerance for the optimization.
            This controls how precise the optimized base temperatures need to be.
            Default is 1e-1 (0.1°C).

        Returns
        -------
        None
            The method updates the `degree_days_base_temperature` attribute with the
            optimized base temperatures for heating and/or cooling.

        Raises
        ------
        ValueError
            If the `degree_days_type` is invalid.

        Example
        -------
        >>> ts.calibrate_base_temperatures(t0_heating=15, t0_cooling=25, xatol=0.05)

        """
        types_to_calibrate = []
        if self.degree_days_type in ["heating", "both"]:
            types_to_calibrate.append("heating")
        if self.degree_days_type in ["cooling", "both"]:
            types_to_calibrate.append("cooling")
        for dd_type in types_to_calibrate:
            if dd_type == "heating":
                t0 = t0_heating
            elif dd_type == "cooling":
                t0 = t0_cooling
            topt = self.calibrate_base_temperature(
                dd_type=cast(Literal["heating", "cooling"], dd_type),
                t0=t0,
                xatol=xatol,
            )
            self.degree_days_base_temperature[dd_type] = topt


    def _fit_thermosensitivity(
        self,
    ) -> None:
        """Fit the thermosensitivity model with interactions for categories.

        This method performs the following steps:

        1. Drops rows with missing values from the aggregated data.
        2. Creates a regression model where the dependent variable is energy consumption
          and the independent variables are the degree days (heating, cooling, or both)
          and category interactions.
        3. Encodes the category labels using one-hot encoding.
        4. Constructs interaction terms between the degree days and the one-hot encoded
          categories.
        5. Fits the Ordinary Least Squares (OLS) model to the data with the degree days
          and interaction terms.

        The fitted model is stored in the ``_model`` attribute.

        """
        data = self.aggregated_data.dropna(how="any", axis=0)
        y = data[self.target_name].copy()
        x = data[self.predictors].copy()
        x["Intercept"] = 1  # add constant
        self._model = OLS(y, x).fit()


[docs]
    def fit(
        self: ThermosensitivityInstance,
    ) -> ThermosensitivityInstance:
        """Train the model.

        This method will:

        1. Calibrate the base temperature if it is not set.
           See :meth:`calibrate_base_temperature`.
        2. Aggregate the data. This consists of resampling the energy and temperature
           data and the computation of the degree days. See :meth:`_aggregate_data`.
        3. Fit the thermosensitivity model.
           See :meth:`_fit_thermosensitivity`.

        """
        self.calibrate_base_temperatures()
        self._aggregate_data(self.degree_days_base_temperature)
        self._fit_thermosensitivity()
        return self


    def __repr__(
        self,
    ) -> str:
        """Return the representation of the object."""
        class_name = self.__class__.__name__
        header = f"""{class_name}(frequency={self.frequency},
        degree_days_type={self.degree_days_type},
        degree_days_base_temperature={ {k:round(v, 2) for k,v
            in self.degree_days_base_temperature.items()} },
        degree_days_computation_method={self.degree_days_computation_method},
        interseason_mean_temperature={self.interseason_mean_temperature})"""
        if self._model is not None:
            message = f"{header}\n\n{self.model.summary(slim=True)}"
        else:
            message = header
        return message


[docs]
    def loss_function(
        self,
        t0: float,
        dd_type: literal_valid_dd_types,
        resampled_energy: pd.Series,
        raw_temperature: pd.Series,
        frequency: str,
        mask: pd.Series | None = None,
        degree_days_computation_method: literal_computation_dd_types = "integral",
    ) -> float:
        """Loss function for the optimization of the base temperature.

        Compute the mean squared error (MSE) between the observed energy data and
        the energy model based on degree days. This function is used for calibrating the
        base temperature in the thermosensitivity model by finding the base temperature
        that minimizes this error.

        The degree days are calculated based on the input temperature data, resampled
        to the desired frequency, and compared against the resampled energy data. The
        MSE is used as the objective function to optimize the base temperature.

        Parameters
        ----------
        t0 : float
            Base temperature used to compute degree days.
        dd_type : literal_dd_types
            Type of degree days to compute. Must be one of:
            - "heating": degree days when temperatures are below the base temperature.
            - "cooling": degree days when temperatures are above the base temperature.
        resampled_energy : pd.Series
            The resampled time series of energy data to be modeled.
        raw_temperature : pd.Series
            The original time series of temperature data, which is used to compute
            degree days based on the base temperature.
        frequency : str
            The frequency to resample the data (e.g., "1D" for daily, "7D" for weekly).
        mask : pd.Series or None, optional
            A boolean mask to filter the data before fitting the model. This allows for
            focusing on specific periods (e.g., only heating or cooling periods).
            Default is None.
        degree_days_computation_method : literal_computation_dd_types, optional
            The method used to compute degree days. Must be one of:
            - "integral": sum the difference between the base temperature and actual
            temperature.
            - "mean": use the mean temperature for degree day calculations.
            - "min_max": use the average of daily minimum and maximum temperatures.
            Default is "integral".

        Returns
        -------
        float
            The mean squared error (MSE) of the residuals between the observed
            energy data and the modeled energy data based on degree days.

        Example
        -------
        >>> loss = loss_function(
                t0=18.0,
                dd_type="heating",
                resampled_energy=energy_data,
                raw_temperature=temperature_data,
                frequency="1D",
                degree_days_computation_method="integral",
            )
        >>> print(f"Calculated loss: {loss}")

        """
        degree_days = dd_compute(
            temperature=raw_temperature,
            reference=t0,
            dd_type=dd_type,
            method=degree_days_computation_method,
        )
        degree_days_resampled = (
            degree_days.resample(frequency).sum().rename("degree_days")
        )
        data = pd.concat([resampled_energy, degree_days_resampled], axis=1).dropna(
            how="any",
            axis=0,
        )
        data["Intercept"] = 1
        if mask is not None:
            data = data[mask]
        model = OLS(
            data[resampled_energy.name],
            data[["degree_days", "Intercept"]],
        ).fit()
        log_message = f"{t0=:.4f}, {model.mse_resid:.2f}, {model.mse_total:.2f}"
        self.logger.info(log_message)
        return model.mse_resid





[docs]
class CategoricalThermoSensitivity(
    ThermoSensitivity,
):
    """Class to compute the thermosensitivity of a building with labeled periods.

    Based on the `ThermoSensitivity` class.

    """

    categories_name = "category"

    def __init__(
        self,
        energy_data: pd.Series,
        temperature_data: pd.Series,
        categories: pd.Series,
        frequency: str = "1D",
        degree_days_type: literal_dd_types = "heating",
        degree_days_base_temperature: dict | None = None,
        degree_days_computation_method: literal_computation_dd_types = "integral",
        interseason_mean_temperature: float = 20,
        base_logger_name: str | None = None,
        min_logger_level_stdout: int | str = logging.ERROR,
    ) -> None:
        """Return a ``CategoricalThermoSensitivity`` instance.

        Parameters
        ----------
        energy_data : pd.Series
            Energy data of the building.
        temperature_data : pd.Series
            Outdoor temperature data.
        categories : pd.Series
            Labels for the periods.
        frequency : str, optional
            Frequency for the analysis the data, by default "1D".
            If "1D", the data will be resampled daily.
            If "7D", the data will be resampled weekly.
            If "1W-MON", the data will be resampled weekly starting on Monday.
        degree_days_type : str, optional
            Type of degree days to compute. Must be one of the following:

            - "heating": compute only heating degree days (temperature below a
              threshold).
            - "cooling": compute only cooling degree days (temperature above a
              threshold).
            - "both": compute both heating and cooling degree days.
            - "auto": automatically detect the degree days type. See the
              `detect_degree_days_type` method.

        degree_days_base_temperature : dict, optional
            Base temperature for the computation of the degree days, by default {}.
            Must be a dictionary with the keys ``"heating"`` and/or ``"cooling"``.
            Example: degree_days_base_temperature={'heating': 18, 'cooling': 24}
        degree_days_computation_method : str, optional
            Method to compute the degree days, by default ``"integral"``. Possibilities
            are:

            - "integral": integral the degree days above or below the base temperature.
            - "mean": sum the difference between the base temperature and the mean
              temperature.
            - "min_max": sum the difference between the base temperature and the mean
              temperature computed as the mean of the minimum and maximum temperature.

        interseason_mean_temperature : float, optional
            Mean temperature to detect the heating and cooling periods, by default 20.
            Used only:

            - to detect the degree days type automatically. See the
              `detect_degree_days_type` method.
            - to estimate the base temperature. See the `calibrate_base_temperature`
              method.

        base_logger_name : str, optional
            Name of the logger. By default, it is the class name. All following
            following instances receive a unique identifier, based on the first
            one, with the pattern:

            - ``<base_logger_name>``
            - ``<base_logger_name>_1``
            - ...

        min_logger_level_stdout : str or int, optional
            The minimum severity level for messages to be printed to standard output.
            Messages below this level will not be displayed. The default is
            ``"ERROR"``, meaning only errors and critical messages will
            be printed. Acceptable values can be specified as strings (e.g.,
            "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL") or as corresponding
            integer constants from the `logging` module. If you want to have details
            on what's going on in the code, set this to ``"INFO"`` or ``"WARNING"``.

        """
        degree_days_base_temperature = {} or degree_days_base_temperature
        self._categories = categories
        super().__init__(
            energy_data=energy_data,
            temperature_data=temperature_data,
            frequency=frequency,
            degree_days_type=degree_days_type,
            degree_days_base_temperature=degree_days_base_temperature,
            degree_days_computation_method=degree_days_computation_method,
            interseason_mean_temperature=interseason_mean_temperature,
            base_logger_name=base_logger_name,
            min_logger_level_stdout=min_logger_level_stdout,
        )

    @cached_property
    def resampled_categories(
        self,
    ) -> pd.Series:
        """Resample categories at the specified frequency.

        This method resamples the categorical data (self.categories) at the given
        frequency (self.frequency) and computes the most common category within each
        resampled period. If a period contains no data, it returns None for that
        period. The method uses an efficient aggregation approach by leveraging `agg()`
        and `value_counts()` to find the most frequent category.

        The result is a `pd.Series` with the same frequency as the resampling,
        containing the most common category for each resampled time window.

        This property is cached to avoid recomputing it multiple times, improving
        performance in case of repeated access.

        Returns
        -------
        pd.Series
            A Series where each value corresponds to the most common category in the
            resampled period, indexed by the resampled time index.

        Example
        -------
        >>> self.frequency = '1D'  # Daily resampling
        >>> self.categories = pd.Series(['A', 'B', 'A', 'A', 'B'],
        index=pd.date_range('2023-01-01', periods=5))
        >>> resampled = self.resampled_categories
        >>> print(resampled)
        2023-01-01    A
        2023-01-02    A
        2023-01-03    A
        2023-01-04    B
        Freq: D, dtype: object

        .. note::
            In case there are multiple categories for one resampled period, the
            category assigned to the resampled period is the most common one.

        """
        categories: pd.Series = self.categories
        resampler: Resampler = categories.resample(self.frequency)
        resampled_categories: pd.Series = resampler.agg(
            lambda x: x.value_counts().idxmax() if not x.empty else None,
        )
        return resampled_categories.rename(self.categories_name)

    @property
    def categories(
        self,
    ) -> pd.Series:
        """The categories of the periods."""
        return self._categories

    @categories.setter
    def categories(
        self,
        value: pd.Series,
    ) -> None:
        """Set the categories attribute.

        Parameters
        ----------
        value : pd.Series
            A pandas Series to assign as the categories.

        """
        self._categories = value

    @cached_property
    def resampled_energy_temperature_category(
        self,
    ) -> pd.DataFrame:
        """The resampled energy, temperature and category data.

        The DataFrame contains the resampled energy and temperature data.
        Periods with missing values are removed.
        """
        return pd.concat(
            [
                self.resampled_energy,
                self.resampled_temperature,
                self.resampled_categories,
            ],
            axis=1,
        ).dropna(how="any", axis=0)

    def _aggregate_data(
        self,
        degree_days_base_temperature: dict,
    ) -> None:
        """Compute the degree days and aggregate the data.

        Store the aggregated data in the `aggregated_data` property.
        """
        degree_days = self._calculate_degree_days(degree_days_base_temperature)
        self.aggregated_data = pd.concat(
            [
                self.resampled_energy_temperature_category,
                degree_days,
            ],
            axis=1,
        )

    def _detect_degree_days_type(
        self,
        significance_level: float = 0.05,
    ) -> None:
        """Detect the degree days type (heating, cooling, or both).

        Detection of the degree-days types is based on correlations between
        energy consumption and temperature, across different categories.

        This method automatically detects the type of degree days (heating, cooling, or
        both) if the `degree_days_type` is set to "auto". It manages potential
        Simpson's paradox by calculating the Spearman correlation for each category in
        the data, rather than relying on aggregated data. The detection is performed by
        evaluating the correlation between energy consumption and temperature in each
        resampled category.

        If the Spearman correlation between energy consumption and temperature is
        significant for any category:

        - A negative correlation for periods below the interseason mean temperature
          indicates "heating" degree days.
        - A positive correlation for periods above the interseason mean temperature
          indicates "cooling" degree days.
        - If both correlations are significant, the system is assumed to have both
          heating and cooling sensitivity.

        Parameters
        ----------
        significance_level : float, optional
            The threshold for the p-value to consider the correlation as statistically
            significant. The default is 0.05 (5% significance level).

        Returns
        -------
        None
            The method updates the `degree_days_type` attribute of the class, setting
            it to either "heating", "cooling", or "both" based on the correlation
            results. If no significant correlation is found, a `ValueError` is raised.

        Raises
        ------
        ValueError
            If no significant correlation can be detected for either heating or cooling
            degree days, the method raises a `ValueError` and suggests manual
            specification of the `degree_days_type`.

        Notes
        -----
        - **Spearman Correlation**: This non-parametric correlation test is used to
          determine monotonic relationships between energy consumption and temperature.
          It is robust to non-linear relationships and does not assume normality.
        - **Simpson's Paradox**: By evaluating correlations within each category, the
          method avoids potential misleading conclusions that might arise from
          aggregated data. Simpson's Paradox occurs when trends that appear in
          different groups of data disappear or reverse when the data is combined.

        See Also
        --------
        :py:func:`ThermoSensitivity._detect_degree_days_type` : Base implementation for
        non-categorical data.

        Example
        -------
        >>> self._detect_degree_days_type(significance_level=0.05)
        Detected degree days type: heating

        """
        if self.degree_days_type == "auto":
            heating_sp = 1
            cooling_sp = 1
            for cat in list(self.resampled_categories.unique()):
                log_message = f"{cat=}"
                self.logger.info(log_message)
                heating_mask = (
                    self.resampled_energy_temperature_category[self.temperature_name]
                    < self.interseason_mean_temperature
                )
                cat_mask = (
                    self.resampled_energy_temperature_category[self.categories_name]
                    == cat
                )
                tmp_heating_sp = spearmanr(
                    self.resampled_energy_temperature_category.loc[
                        heating_mask & cat_mask,
                        [self.target_name, self.temperature_name],
                    ],
                    alternative="less",
                )
                possib_heat_sp_no_nan = [
                    el for el in [tmp_heating_sp.pvalue, heating_sp] if ~np.isnan(el)
                ]
                heating_sp = min(possib_heat_sp_no_nan)
                cooling_mask = (
                    self.resampled_temperature > self.interseason_mean_temperature
                )
                tmp_cooling_sp = spearmanr(
                    self.resampled_energy_temperature_category.loc[
                        cooling_mask & cat_mask,
                        [self.target_name, self.temperature_name],
                    ],
                    alternative="greater",
                )
                log_message = f"{tmp_cooling_sp}"
                self.logger.info(log_message)
                possib_cool_sp_no_nan = [
                    el for el in [tmp_cooling_sp.pvalue, cooling_sp] if ~np.isnan(el)
                ]
                cooling_sp = min(possib_cool_sp_no_nan)
            if heating_sp < significance_level and cooling_sp < significance_level:
                self.degree_days_type = "both"
            elif heating_sp < significance_level:
                self.degree_days_type = "heating"
            elif cooling_sp < significance_level:
                self.degree_days_type = "cooling"
            else:
                err = (
                    "Cannot detect the degree days type. Please specify it manually. "
                    f"For info, {cooling_sp=}, {heating_sp=}"
                )
                raise ValueError(err)

    def _fit_thermosensitivity(
        self,
    ) -> None:
        """Fit the thermosensitivity model with interactions for categories.

        This method performs the following steps:

        1. Drops rows with missing values from the aggregated data.
        2. Creates a regression model where the dependent variable is energy consumption
          and the independent variables are the degree days (heating, cooling, or both)
          and category interactions.
        3. Encodes the category labels using one-hot encoding.
        4. Constructs interaction terms between the degree days and the one-hot encoded
          categories.
        5. Fits the Ordinary Least Squares (OLS) model to the data with the degree days
          and interaction terms.

        The fitted model is stored in the ``_model`` attribute.

        """
        data = self.aggregated_data.dropna(how="any", axis=0)
        y = data[self.target_name].copy()
        x = data[self.predictors].copy()
        x["Intercept"] = 1  # add constant
        one_hot_encoding = pd.get_dummies(data["category"], dtype=int)
        # create interaction terms
        interactions = None
        for col in x.columns:
            for cat in one_hot_encoding.columns:
                tmp_int = x[col] * one_hot_encoding[cat]
                tmp_int.name = f"{col}:{cat}"
                if interactions is None:
                    interactions = tmp_int
                else:
                    interactions = pd.concat([interactions, tmp_int], axis=1)
        self._model = OLS(y, interactions).fit()