Using `eat` for thermosensitivity analysis: Real data#

This notebook uses data from the real world. It comes from a school building. Hence, it shows a large variety of occupancy:

usual school days
usual weekends
Wednesdays can differ from the other days
Bank holidays
school holidays either closed, or with another occupancy (think of a summer camp)

[1]:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

[2]:

files = ["data/example_data_1.parquet", "data/example_data_2.parquet"]

for file in files:
    data = pd.read_parquet(file)
    data = data.resample("1D").agg({"energy": "sum", "temperature": "mean"})
    fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(10, 5))
    data[["energy"]].plot(ax=ax1)
    data[["temperature"]].plot(ax=ax1, secondary_y=True)
    ax1.set_ylabel("Energy consumption (kWh)")
    ax1.right_ax.set_ylabel("mean Daily Temperature (°C)")
    data.plot.scatter(x="temperature", y="energy", ax=ax2)
    ax2.set_xlabel("mean Daily Temperature (°C)")
    ax2.set_ylabel("Energy consumption (kWh)")
    ax1.set_title("Energy consumption and temperature over time")
    ax2.set_title("Energy consumption vs temperature")
    fig.suptitle(
        "Synthetic data for energy consumption " + file.split("/")[-1].split(".")[0]
    )
    fig.tight_layout()

../_images/user_guide_thermosensitivity_real_2_0.png

../_images/user_guide_thermosensitivity_real_2_1.png

Thermo-sensitivity analysis#

The main objective of the analysis is to explain the impact of the temperature over the energy.

Each category of occupancy can be modeled with a thermosensitivity model by: - finding the threshold temperatures for heating and cooling - finding the three coefficients \(E_0\), \(TS_{cooling}\) and \(TS_{heating}\) such that

\[E = E_0 + TS_{cooling} * DD_{cooling} + TS_{heating} * DD_{heating}\]

But this must be done for all the different (unlabeled) categories.

Modeling the Weekly energy#

The first step is to model the weekly energy consumption. This is done by passing the frequencu “7D” to the ThermoSensitivity class.

Processing weekly data is a good way to get a first idea of the thermosensitivity of the building without being too much influenced by the daily variations.

[3]:

from energy_analysis_toolbox.thermosensitivity import ThermoSensitivity

[4]:

data = pd.read_parquet(files[0])
start = data.index[0]
first_monday = start + pd.DateOffset(weekday=0, hour=0, minute=0, second=0)
data = data.loc[first_monday:]
data.head()

[4]:

	duration	energy	holidays	temperature
timestamp
2015-05-04 00:00:00+02:00	600000.0	600000.0	True	15.800000
2015-05-04 00:10:00+02:00	600000.0	1200000.0	True	15.766667
2015-05-04 00:20:00+02:00	600000.0	1200000.0	True	15.733333
2015-05-04 00:30:00+02:00	600000.0	1200000.0	True	15.700000
2015-05-04 00:40:00+02:00	600000.0	600000.0	True	15.666667

[5]:

ts = ThermoSensitivity(
    energy_data=data["energy"],
    temperature_data=data["temperature"],
    frequency="7D",
    degree_days_type="auto",
    degree_days_computation_method="integral",
)

[6]:

ts.fit()
ts.model.summary()

[6]:

OLS Regression Results
Dep. Variable:	energy	R-squared:	0.568
Model:	OLS	Adj. R-squared:	0.566
Method:	Least Squares	F-statistic:	421.2
Date:	Fri, 11 Oct 2024	Prob (F-statistic):	2.21e-60
Time:	20:44:14	Log-Likelihood:	-7243.9
No. Observations:	323	AIC:	1.449e+04
Df Residuals:	321	BIC:	1.450e+04
Df Model:	1
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[0.025	0.975]
heating_degree_days	6.991e+07	3.41e+06	20.523	0.000	6.32e+07	7.66e+07
Intercept	1.089e+09	1.01e+08	10.746	0.000	8.9e+08	1.29e+09

Omnibus:	46.497	Durbin-Watson:	0.965
Prob(Omnibus):	0.000	Jarque-Bera (JB):	130.523
Skew:	-0.649	Prob(JB):	4.54e-29
Kurtosis:	5.831	Cond. No.	40.7

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

[7]:

ts.aggregated_data

[7]:

	energy	temperature	heating_degree_days
timestamp
2015-05-04 00:00:00+02:00	8.418000e+08	18.067510	2.365944
2015-05-11 00:00:00+02:00	1.519200e+09	19.319345	1.547856
2015-05-18 00:00:00+02:00	1.975200e+09	17.227480	4.530394
2015-05-25 00:00:00+02:00	1.726800e+09	17.525794	2.079876
2015-06-01 00:00:00+02:00	1.925400e+09	22.076042	0.111308
...	...	...	...
2021-09-06 00:00:00+02:00	1.674000e+09	22.365526	0.000000
2021-09-13 00:00:00+02:00	1.678800e+09	21.557242	0.000000
2021-09-20 00:00:00+02:00	1.786200e+09	19.917758	0.473862
2021-09-27 00:00:00+02:00	1.790400e+09	19.841270	0.058568
2021-10-04 00:00:00+02:00	1.202400e+09	17.835594	0.995487

336 rows × 3 columns

[8]:

predictions = ts.model.predict()

fig, ax = plt.subplots()
aggregated_data = ts.aggregated_data.dropna(how="any", axis=0)
ax.plot(aggregated_data.index, aggregated_data["energy"], label="True")
ax.plot(aggregated_data.index, predictions, label="Predicted")
ax.legend()
ax.set_title("Predicted vs true energy consumption")
ax.set_ylabel("Energy consumption (kWh)")
ax.set_xlabel("Date")

[8]:

Text(0.5, 0, 'Date')

../_images/user_guide_thermosensitivity_real_10_1.png

[9]:

fig, ax = plt.subplots()

ax.scatter(aggregated_data["temperature"], aggregated_data["energy"], label="True")
ax.scatter(aggregated_data["temperature"], predictions, label="Predicted")
ax.legend()
ax.set_xlabel("mean weekly Temperature (°C)")
ax.set_ylabel("Energy consumption (kWh)")
ax.set_title("Energy consumption vs temperature")

[9]:

Text(0.5, 1.0, 'Energy consumption vs temperature')

../_images/user_guide_thermosensitivity_real_11_1.png

[10]:

data2 = pd.read_parquet(files[1])
start = data2.index[0]
first_monday = start + pd.DateOffset(weekday=0, hour=0, minute=0, second=0)
data2 = data2.loc[first_monday:]
ts2 = ThermoSensitivity(
    energy_data=data2["energy"],
    temperature_data=data2["temperature"],
    frequency="7D",
    degree_days_type="auto",
    degree_days_computation_method="integral",  # here the provided data is already in daily frequency. Hence the only mean
)
ts2.fit()
predictions2 = ts2.model.predict()

fig, ax = plt.subplots()
aggregated_data = ts2.aggregated_data.dropna(how="any", axis=0)
ax.plot(aggregated_data.index, aggregated_data["energy"], label="True")
ax.plot(aggregated_data.index, predictions2, label="Predicted")
ax.legend()
ax.set_title("Predicted vs true energy consumption")
ax.set_ylabel("Energy consumption (kWh)")
ax.set_xlabel("Date")

fig, ax = plt.subplots()
ax.scatter(aggregated_data["temperature"], aggregated_data["energy"], label="True")
ax.scatter(aggregated_data["temperature"], predictions2, label="Predicted")
ax.legend()
ax.set_xlabel("mean weekly Temperature (°C)")
ax.set_ylabel("Energy consumption (kWh)")
ax.set_title("Energy consumption vs temperature")

[10]:

Text(0.5, 1.0, 'Energy consumption vs temperature')

../_images/user_guide_thermosensitivity_real_12_1.png

../_images/user_guide_thermosensitivity_real_12_2.png

Conclusion of Weekly Analysis#

As you can see above, the model is able to fit the data quite well. The model is able to capture the heating and cooling needs of the building.

One limitation that we can observe is the drop in energy consumption during the summer. This is due to the fact that the building is closed during the summer. This is a limitation of the model as it is not able to capture the occupancy of the building.

Modeling the Daily energy#

The next step is to model the daily energy consumption. This is done by passing the frequency “1D” to the ThermoSensitivity class.

However, the daily data is more complex to model as it is influenced by the occupancy of the building.

[11]:

from energy_analysis_toolbox.thermosensitivity.thermosensitivity import (
    CategoricalThermoSensitivity,
)

data = pd.read_parquet(files[0]).loc[
    "2016-02-01":"2018-10-01"
]  # Selecting dates where the behaviour is more stable
start = data.index[0]
first_monday = start + pd.DateOffset(weekday=0, hour=0, minute=0, second=0)
data = data.loc[first_monday:]


def open_close_categoriser(series: pd.Series):
    """Return a series of categories based on the day of the week of the index"""
    timestamps = series.index
    return_data = pd.Series(
        data=[
            "Open" if timestamp.weekday() < 5 else "Closed" for timestamp in timestamps
        ],
        index=series.index,
    )
    return return_data


categories = open_close_categoriser(data)
display(categories.head())

ts_cat = CategoricalThermoSensitivity(
    energy_data=data["energy"],
    temperature_data=data["temperature"],
    categories=categories,
    frequency="1D",
    degree_days_type="auto",
    degree_days_computation_method="mean",
)
ts_cat.fit()

display(ts_cat.model.summary())
predictions = ts_cat.model.predict()
fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(10, 5))
aggregated_data = ts_cat.aggregated_data.dropna(how="any", axis=0)
aggregated_data[["energy"]].plot(ax=ax1)
ax1.plot(aggregated_data.index, predictions, label="Predicted")

aggregated_data.plot.scatter(x="temperature", y="energy", ax=ax2)
ax2.scatter(aggregated_data["temperature"], predictions, label="Predicted", color="red")
ax2.legend()
fig.suptitle("Model with Weekend/Weekday categorisation")

timestamp
2016-02-01 00:00:00+01:00    Open
2016-02-01 00:10:00+01:00    Open
2016-02-01 00:20:00+01:00    Open
2016-02-01 00:30:00+01:00    Open
2016-02-01 00:40:00+01:00    Open
dtype: object

OLS Regression Results
Dep. Variable:	energy	R-squared:	0.522
Model:	OLS	Adj. R-squared:	0.520
Method:	Least Squares	F-statistic:	352.9
Date:	Fri, 11 Oct 2024	Prob (F-statistic):	6.83e-155
Time:	20:44:16	Log-Likelihood:	-20126.
No. Observations:	974	AIC:	4.026e+04
Df Residuals:	970	BIC:	4.028e+04
Df Model:	3
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[0.025	0.975]
heating_degree_days:Closed	1.855e+06	4.2e+06	0.442	0.659	-6.39e+06	1.01e+07
heating_degree_days:Open	6.648e+07	2.57e+06	25.819	0.000	6.14e+07	7.15e+07
Intercept:Closed	6.558e+07	1.75e+07	3.756	0.000	3.13e+07	9.98e+07
Intercept:Open	2.124e+08	1.11e+07	19.157	0.000	1.91e+08	2.34e+08

Omnibus:	124.025	Durbin-Watson:	1.065
Prob(Omnibus):	0.000	Jarque-Bera (JB):	375.035
Skew:	-0.632	Prob(JB):	3.65e-82
Kurtosis:	5.765	Cond. No.	8.88

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

[11]:

Text(0.5, 0.98, 'Model with Weekend/Weekday categorisation')

../_images/user_guide_thermosensitivity_real_15_3.png

[12]:

data = pd.read_parquet(files[0]).loc[
    "2016-02-01":"2018-10-01"
]  # Selecting dates where the behaviour is more stable


def dayofweek_categoriser(series: pd.Series):
    """Return a series of categories based on the day of the week of the index"""
    timestamps = series.index
    return_data = pd.Series(
        data=[timestamp.weekday() for timestamp in timestamps],
        index=series.index,
    )
    return return_data


categories = dayofweek_categoriser(data)
display(categories.head())

ts_cat = CategoricalThermoSensitivity(
    energy_data=data["energy"],
    temperature_data=data["temperature"],
    categories=categories,
    frequency="1D",
    degree_days_type="auto",
    degree_days_computation_method="mean",
)
ts_cat.fit()

display(ts_cat.model.summary())
predictions = ts_cat.model.predict()
fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(10, 5))
aggregated_data = ts_cat.aggregated_data.dropna(how="any", axis=0)
aggregated_data[["energy"]].plot(ax=ax1)
ax1.plot(aggregated_data.index, predictions, label="Predicted")

aggregated_data.plot.scatter(x="temperature", y="energy", ax=ax2)
sns.scatterplot(
    x=aggregated_data["temperature"], y=predictions, hue=aggregated_data["category"]
)
ax2.legend()
fig.suptitle("Model with Day of Week categorisation (Monday, etc.)")

timestamp
2016-02-01 00:00:00+01:00    0
2016-02-01 00:10:00+01:00    0
2016-02-01 00:20:00+01:00    0
2016-02-01 00:30:00+01:00    0
2016-02-01 00:40:00+01:00    0
dtype: int64

OLS Regression Results
Dep. Variable:	energy	R-squared:	0.601
Model:	OLS	Adj. R-squared:	0.595
Method:	Least Squares	F-statistic:	111.0
Date:	Fri, 11 Oct 2024	Prob (F-statistic):	6.43e-181
Time:	20:44:16	Log-Likelihood:	-20039.
No. Observations:	974	AIC:	4.011e+04
Df Residuals:	960	BIC:	4.017e+04
Df Model:	13
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[0.025	0.975]
heating_degree_days:0.0	8.133e+07	5.36e+06	15.171	0.000	7.08e+07	9.18e+07
heating_degree_days:1.0	7.282e+07	5.28e+06	13.788	0.000	6.25e+07	8.32e+07
heating_degree_days:2.0	3.684e+07	5.1e+06	7.229	0.000	2.68e+07	4.68e+07
heating_degree_days:3.0	7.318e+07	5.25e+06	13.931	0.000	6.29e+07	8.35e+07
heating_degree_days:4.0	7.303e+07	5.48e+06	13.315	0.000	6.23e+07	8.38e+07
heating_degree_days:5.0	1.907e+06	5.43e+06	0.351	0.725	-8.75e+06	1.26e+07
heating_degree_days:6.0	1.8e+06	5.49e+06	0.328	0.743	-8.97e+06	1.26e+07
Intercept:0.0	2.402e+08	2.28e+07	10.552	0.000	1.96e+08	2.85e+08
Intercept:1.0	2.414e+08	2.27e+07	10.613	0.000	1.97e+08	2.86e+08
Intercept:2.0	1.054e+08	2.27e+07	4.637	0.000	6.08e+07	1.5e+08
Intercept:3.0	2.33e+08	2.28e+07	10.215	0.000	1.88e+08	2.78e+08
Intercept:4.0	2.32e+08	2.29e+07	10.143	0.000	1.87e+08	2.77e+08
Intercept:5.0	6.622e+07	2.27e+07	2.919	0.004	2.17e+07	1.11e+08
Intercept:6.0	6.494e+07	2.27e+07	2.863	0.004	2.04e+07	1.09e+08

Omnibus:	330.600	Durbin-Watson:	0.781
Prob(Omnibus):	0.000	Jarque-Bera (JB):	1675.453
Skew:	-1.482	Prob(JB):	0.00
Kurtosis:	8.701	Cond. No.	5.86

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

[12]:

Text(0.5, 0.98, 'Model with Day of Week categorisation (Monday, etc.)')

../_images/user_guide_thermosensitivity_real_16_3.png

[13]:

data2 = pd.read_parquet(files[1]).loc[
    "2017-06-01":"2019-06-01"
]  # Selecting dates where the behaviour is more stable


def open_close_categoriser(series: pd.Series):
    """Return a series of categories based on the day of the week of the index"""
    timestamps = series.index
    return_data = pd.Series(
        data=[
            "Open" if timestamp.weekday() < 5 else "Closed" for timestamp in timestamps
        ],
        index=series.index,
    )
    return return_data


categories = open_close_categoriser(data2)
display(categories.head())

ts_cat = CategoricalThermoSensitivity(
    energy_data=data2["energy"],
    temperature_data=data2["temperature"],
    categories=categories,
    frequency="1D",
    degree_days_type="auto",
    degree_days_computation_method="mean",
)
ts_cat.fit()

display(ts_cat.model.summary())
predictions = ts_cat.model.predict()
fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(10, 5))
aggregated_data = ts_cat.aggregated_data.dropna(how="any", axis=0)
aggregated_data[["energy"]].plot(ax=ax1)
ax1.plot(aggregated_data.index, predictions, label="Predicted")

aggregated_data.plot.scatter(x="temperature", y="energy", ax=ax2)
ax2.scatter(aggregated_data["temperature"], predictions, label="Predicted", color="red")
ax2.legend()
fig.suptitle("Model with Weekend/Weekday categorisation")

timestamp
2017-06-01 00:00:00+02:00    Open
2017-06-01 00:10:00+02:00    Open
2017-06-01 00:20:00+02:00    Open
2017-06-01 00:30:00+02:00    Open
2017-06-01 00:40:00+02:00    Open
dtype: object

OLS Regression Results
Dep. Variable:	energy	R-squared:	0.588
Model:	OLS	Adj. R-squared:	0.586
Method:	Least Squares	F-statistic:	346.1
Date:	Fri, 11 Oct 2024	Prob (F-statistic):	1.47e-139
Time:	20:44:17	Log-Likelihood:	-15439.
No. Observations:	731	AIC:	3.089e+04
Df Residuals:	727	BIC:	3.090e+04
Df Model:	3
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[0.025	0.975]
heating_degree_days:Closed	2.833e+07	6.95e+06	4.078	0.000	1.47e+07	4.2e+07
heating_degree_days:Open	1.062e+08	4.23e+06	25.095	0.000	9.79e+07	1.15e+08
Intercept:Closed	2.67e+08	3.29e+07	8.113	0.000	2.02e+08	3.32e+08
Intercept:Open	5.864e+08	2.11e+07	27.728	0.000	5.45e+08	6.28e+08

Omnibus:	30.135	Durbin-Watson:	0.894
Prob(Omnibus):	0.000	Jarque-Bera (JB):	79.470
Skew:	0.099	Prob(JB):	5.54e-18
Kurtosis:	4.603	Cond. No.	10.6

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

[13]:

Text(0.5, 0.98, 'Model with Weekend/Weekday categorisation')

../_images/user_guide_thermosensitivity_real_17_3.png

[14]:

data2

[14]:

	duration	energy	holidays	temperature
timestamp
2017-06-01 00:00:00+02:00	600000.0	1800000.0	False	20.200000
2017-06-01 00:10:00+02:00	600000.0	2400000.0	False	20.133333
2017-06-01 00:20:00+02:00	600000.0	2400000.0	False	20.066667
2017-06-01 00:30:00+02:00	600000.0	1800000.0	False	20.000000
2017-06-01 00:40:00+02:00	600000.0	2400000.0	False	19.933333
...	...	...	...	...
2019-06-01 23:10:00+02:00	600000.0	1200000.0	False	17.483333
2019-06-01 23:20:00+02:00	600000.0	1800000.0	False	17.366667
2019-06-01 23:30:00+02:00	600000.0	1800000.0	False	17.250000
2019-06-01 23:40:00+02:00	600000.0	1200000.0	False	17.133333
2019-06-01 23:50:00+02:00	600000.0	1800000.0	False	17.016667

105256 rows × 4 columns

[15]:

def dayofweek_categoriser(series: pd.Series):
    """Return a series of categories based on the day of the week of the index"""
    timestamps = series.index
    return_data = pd.Series(
        data=[timestamp.weekday() for timestamp in timestamps],
        index=series.index,
    )
    return return_data


categories = dayofweek_categoriser(data2)
display(categories.head())


ts_cat = CategoricalThermoSensitivity(
    energy_data=data2["energy"],
    temperature_data=data2["temperature"],
    categories=categories,
    frequency="1D",
    degree_days_type="auto",
    degree_days_computation_method="mean",
)
ts_cat.fit()

display(ts_cat.model.summary())
predictions = ts_cat.model.predict()
fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(10, 5))
aggregated_data = ts_cat.aggregated_data.dropna(how="any", axis=0)
aggregated_data[["energy"]].plot(ax=ax1)
ax1.plot(aggregated_data.index, predictions, label="Predicted")

aggregated_data.plot.scatter(x="temperature", y="energy", ax=ax2)
ax2.scatter(aggregated_data["temperature"], predictions, label="Predicted", color="red")
ax2.legend()
fig.suptitle("Model with day of week categorisation (Monday, etc.)")

timestamp
2017-06-01 00:00:00+02:00    3
2017-06-01 00:10:00+02:00    3
2017-06-01 00:20:00+02:00    3
2017-06-01 00:30:00+02:00    3
2017-06-01 00:40:00+02:00    3
dtype: int64

OLS Regression Results
Dep. Variable:	energy	R-squared:	0.638
Model:	OLS	Adj. R-squared:	0.632
Method:	Least Squares	F-statistic:	97.27
Date:	Fri, 11 Oct 2024	Prob (F-statistic):	1.89e-148
Time:	20:44:18	Log-Likelihood:	-15392.
No. Observations:	731	AIC:	3.081e+04
Df Residuals:	717	BIC:	3.088e+04
Df Model:	13
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[0.025	0.975]
heating_degree_days:0.0	1.128e+08	9.15e+06	12.320	0.000	9.48e+07	1.31e+08
heating_degree_days:1.0	1.058e+08	8.86e+06	11.950	0.000	8.85e+07	1.23e+08
heating_degree_days:2.0	9.898e+07	8.71e+06	11.358	0.000	8.19e+07	1.16e+08
heating_degree_days:3.0	1.096e+08	8.72e+06	12.564	0.000	9.24e+07	1.27e+08
heating_degree_days:4.0	1.093e+08	9.29e+06	11.762	0.000	9.11e+07	1.28e+08
heating_degree_days:5.0	2.839e+07	9.33e+06	3.043	0.002	1.01e+07	4.67e+07
heating_degree_days:6.0	2.828e+07	9.22e+06	3.067	0.002	1.02e+07	4.64e+07
Intercept:0.0	6.692e+08	4.47e+07	14.964	0.000	5.81e+08	7.57e+08
Intercept:1.0	6.575e+08	4.51e+07	14.591	0.000	5.69e+08	7.46e+08
Intercept:2.0	3.184e+08	4.52e+07	7.044	0.000	2.3e+08	4.07e+08
Intercept:3.0	6.277e+08	4.42e+07	14.211	0.000	5.41e+08	7.14e+08
Intercept:4.0	6.426e+08	4.41e+07	14.565	0.000	5.56e+08	7.29e+08
Intercept:5.0	2.676e+08	4.36e+07	6.138	0.000	1.82e+08	3.53e+08
Intercept:6.0	2.663e+08	4.43e+07	6.017	0.000	1.79e+08	3.53e+08

Omnibus:	58.978	Durbin-Watson:	0.679
Prob(Omnibus):	0.000	Jarque-Bera (JB):	247.407
Skew:	-0.216	Prob(JB):	1.89e-54
Kurtosis:	5.817	Cond. No.	7.14

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

[15]:

Text(0.5, 0.98, 'Model with day of week categorisation (Monday, etc.)')

../_images/user_guide_thermosensitivity_real_19_3.png

Adding the school holidays#

The previous categorisation of occupancy is not enough to explain the daily energy consumption. We need to add the school holidays to the model.

[16]:

%pip install vacances-scolaires-france jours-feries-france

Looking in indexes: https://pypi.org/simple, https://gitlab.ecoco2.com/api/v4/groups/recherche/-/packages/pypi/simple
Requirement already satisfied: vacances-scolaires-france in /home/ubuntu/mambaforge/envs/eat/lib/python3.11/site-packages (0.10.0)
Requirement already satisfied: jours-feries-france in /home/ubuntu/mambaforge/envs/eat/lib/python3.11/site-packages (0.7.0)
Note: you may need to restart the kernel to use updated packages.

[17]:

from jours_feries_france import JoursFeries
from vacances_scolaires_france import SchoolHolidayDates

jf = JoursFeries()
shd = SchoolHolidayDates()


def is_bank_holiday(date):
    return jf.is_bank_holiday(date)


def is_holiday(date):
    return shd.is_holiday_for_zone(date, "B")


def school_open_close(series: pd.Series):
    """Return a series of categories based on the day of the week of the index"""
    timestamps = series.index
    is_open = []
    for timestamp in timestamps:
        if timestamp.weekday() > 4:
            is_open.append("weekend")
        elif is_bank_holiday(timestamp.date()):
            is_open.append("weekend")
        elif is_holiday(timestamp.date()):
            is_open.append("holiday")
        else:
            is_open.append("Open")
    return pd.Series(data=is_open, index=series.index)

[18]:

data = pd.read_parquet(files[0]).loc[
    "2016-02-01":"2018-10-01"
]  # Selecting dates where the behaviour is more stable
categories = school_open_close(data)
display(categories.head())
display(categories.value_counts())

timestamp
2016-02-01 00:00:00+01:00    Open
2016-02-01 00:10:00+01:00    Open
2016-02-01 00:20:00+01:00    Open
2016-02-01 00:30:00+01:00    Open
2016-02-01 00:40:00+01:00    Open
dtype: object

Open       66240
weekend    43338
holiday    30672
Name: count, dtype: int64

[19]:

ts_holidays = CategoricalThermoSensitivity(
    energy_data=data["energy"],
    temperature_data=data["temperature"],
    categories=categories,
    frequency="1D",
    degree_days_type="auto",
    degree_days_computation_method="mean",
)
ts_holidays.fit()
display(ts_holidays.model.summary())

OLS Regression Results
Dep. Variable:	energy	R-squared:	0.819
Model:	OLS	Adj. R-squared:	0.818
Method:	Least Squares	F-statistic:	877.3
Date:	Fri, 11 Oct 2024	Prob (F-statistic):	0.00
Time:	20:44:41	Log-Likelihood:	-19653.
No. Observations:	974	AIC:	3.932e+04
Df Residuals:	968	BIC:	3.935e+04
Df Model:	5
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[0.025	0.975]
heating_degree_days:Open	8.125e+07	1.96e+06	41.464	0.000	7.74e+07	8.51e+07
heating_degree_days:holiday	8.248e+06	2.87e+06	2.874	0.004	2.62e+06	1.39e+07
heating_degree_days:weekend	4.009e+06	2.52e+06	1.589	0.112	-9.42e+05	8.96e+06
Intercept:Open	2.965e+08	8.88e+06	33.375	0.000	2.79e+08	3.14e+08
Intercept:holiday	6.924e+07	1.12e+07	6.181	0.000	4.73e+07	9.12e+07
Intercept:weekend	6.59e+07	1.03e+07	6.403	0.000	4.57e+07	8.61e+07

Omnibus:	123.564	Durbin-Watson:	2.009
Prob(Omnibus):	0.000	Jarque-Bera (JB):	621.575
Skew:	-0.462	Prob(JB):	1.06e-135
Kurtosis:	6.803	Cond. No.	7.90

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

[20]:

fig, ax1 = plt.subplots(1, 1, figsize=(10, 5))
aggregated_data = ts_holidays.aggregated_data.dropna(how="any", axis=0)
aggregated_data[["energy"]].plot(ax=ax1)
predictions = ts_holidays.model.predict()

ax1.plot(aggregated_data.index, predictions, label="Predicted")

n_cat = len(categories.unique())
fig, axarr = plt.subplots(1, n_cat, figsize=(5 * n_cat, 10))
axarr = axarr.flatten() if n_cat > 1 else [axarr]
for i, cat in enumerate(categories.unique()):
    mask = aggregated_data["category"] == cat
    ax = axarr[i]
    ax.scatter(
        aggregated_data[mask]["temperature"],
        aggregated_data[mask]["energy"],
        label="True",
    )
    ax.scatter(
        aggregated_data[mask]["temperature"], predictions[mask], label="Predicted"
    )
    ax.set_title(f"Energy consumption vs temperature \n for {cat}")
    ax.set_xlabel("mean Daily Temperature (°C)")
    ax.set_ylabel("Energy consumption (kWh)")
    ax.legend()

../_images/user_guide_thermosensitivity_real_25_0.png

../_images/user_guide_thermosensitivity_real_25_1.png

[21]:

categories2 = school_open_close(data2)
display(categories2.head())

timestamp
2017-06-01 00:00:00+02:00    Open
2017-06-01 00:10:00+02:00    Open
2017-06-01 00:20:00+02:00    Open
2017-06-01 00:30:00+02:00    Open
2017-06-01 00:40:00+02:00    Open
dtype: object

[22]:

ts_holidays2 = CategoricalThermoSensitivity(
    energy_data=data2["energy"],
    temperature_data=data2["temperature"],
    categories=categories2,
    frequency="1D",
    degree_days_type="auto",
    degree_days_computation_method="mean",
)
ts_holidays2.fit()
display(ts_holidays2.model.summary())

fig, ax1 = plt.subplots(1, 1, figsize=(10, 5))
aggregated_data = ts_holidays2.aggregated_data.dropna(how="any", axis=0)
aggregated_data[["energy"]].plot(ax=ax1)
predictions = ts_holidays2.model.predict()

ax1.plot(aggregated_data.index, predictions, label="Predicted")

n_cat = len(categories2.unique())
fig, axarr = plt.subplots(1, n_cat, figsize=(5 * n_cat, 10))
axarr = axarr.flatten() if n_cat > 1 else [axarr]
for i, cat in enumerate(categories2.unique()):
    mask = aggregated_data["category"] == cat
    ax = axarr[i]
    ax.scatter(
        aggregated_data[mask]["temperature"],
        aggregated_data[mask]["energy"],
        label="True",
    )
    ax.scatter(
        aggregated_data[mask]["temperature"], predictions[mask], label="Predicted"
    )
    ax.set_title(f"Energy consumption vs temperature \n for {cat}")
    ax.set_xlabel("mean Daily Temperature (°C)")
    ax.set_ylabel("Energy consumption (kWh)")
    ax.legend()

OLS Regression Results
Dep. Variable:	energy	R-squared:	0.787
Model:	OLS	Adj. R-squared:	0.784
Method:	Least Squares	F-statistic:	332.5
Date:	Fri, 11 Oct 2024	Prob (F-statistic):	3.11e-236
Time:	20:44:58	Log-Likelihood:	-15199.
No. Observations:	731	AIC:	3.042e+04
Df Residuals:	722	BIC:	3.046e+04
Df Model:	8
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[0.025	0.975]
heating_degree_days:Open	1.177e+08	3.85e+06	30.561	0.000	1.1e+08	1.25e+08
heating_degree_days:holiday	5.941e+07	6.18e+06	9.606	0.000	4.73e+07	7.16e+07
heating_degree_days:weekend	3.016e+07	5.08e+06	5.942	0.000	2.02e+07	4.01e+07
cooling_degree_days:Open	3.163e+08	8.13e+07	3.893	0.000	1.57e+08	4.76e+08
cooling_degree_days:holiday	3.339e+07	1.85e+07	1.800	0.072	-3.02e+06	6.98e+07
cooling_degree_days:weekend	1.371e+07	2.5e+07	0.548	0.584	-3.54e+07	6.28e+07
Intercept:Open	7.199e+08	2.05e+07	35.128	0.000	6.8e+08	7.6e+08
Intercept:holiday	3.007e+08	3.35e+07	8.974	0.000	2.35e+08	3.67e+08
Intercept:weekend	2.615e+08	2.5e+07	10.463	0.000	2.12e+08	3.11e+08

Omnibus:	96.161	Durbin-Watson:	1.242
Prob(Omnibus):	0.000	Jarque-Bera (JB):	390.747
Skew:	0.542	Prob(JB):	1.41e-85
Kurtosis:	6.414	Cond. No.	31.0

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

../_images/user_guide_thermosensitivity_real_27_1.png

../_images/user_guide_thermosensitivity_real_27_2.png

Conclusion of Daily Analysis#

Using no category, the model is not able to capture the occupancy of the building. Using the day of week or the weekend is better, as more data is available for these categories, but their are some large discrepancies between the model and the data.

This may be due to the fact that the building is a school building and the school holidays are not taken into account.

Automatic Day categorisation#

The following section shows how to use the automatic days categorisation.

[23]:

from pandas import DataFrame
from energy_analysis_toolbox.thermosensitivity.daily_analysis import (
    AutoCategoricalThermoSensitivity,
)


ts_auto = AutoCategoricalThermoSensitivity(
    energy_data=data["energy"],
    temperature_data=data["temperature"],
    degree_days_type="auto",
    degree_days_computation_method="integral",
)
ts_auto.fit()
ts_auto

[23]:

AutoCategoricalThermoSensitivity(frequency=1D,
        degree_days_type=heating,
        degree_days_base_temperature={'heating': np.float64(17.18)},
        degree_days_computation_method=integral,
        interseason_mean_temperature=20)

                            OLS Regression Results
==============================================================================
Dep. Variable:                 energy   R-squared:                       0.606
Model:                            OLS   Adj. R-squared:                  0.601
No. Observations:                 974   F-statistic:                     113.5
Covariance Type:            nonrobust   Prob (F-statistic):          1.02e-183
=================================================================================================
                                    coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------------
heating_degree_days:Friday     6.519e+07   4.79e+06     13.601      0.000    5.58e+07    7.46e+07
heating_degree_days:Monday     7.269e+07   4.72e+06     15.399      0.000    6.34e+07     8.2e+07
heating_degree_days:Saturday   1.876e+06   4.76e+06      0.394      0.693   -7.46e+06    1.12e+07
heating_degree_days:Sunday     1.747e+06   4.83e+06      0.362      0.718   -7.73e+06    1.12e+07
heating_degree_days:Thursday   6.525e+07   4.64e+06     14.063      0.000    5.61e+07    7.44e+07
heating_degree_days:Tuesday    6.511e+07   4.65e+06     13.991      0.000     5.6e+07    7.42e+07
heating_degree_days:Wednesday  3.289e+07   4.53e+06      7.265      0.000     2.4e+07    4.18e+07
Intercept:Friday               2.018e+08    2.4e+07      8.393      0.000    1.55e+08    2.49e+08
Intercept:Monday               2.043e+08   2.41e+07      8.493      0.000    1.57e+08    2.52e+08
Intercept:Saturday             6.484e+07   2.39e+07      2.716      0.007     1.8e+07    1.12e+08
Intercept:Sunday               6.367e+07    2.4e+07      2.650      0.008    1.65e+07    1.11e+08
Intercept:Thursday             2.014e+08   2.41e+07      8.361      0.000    1.54e+08    2.49e+08
Intercept:Tuesday                2.1e+08    2.4e+07      8.755      0.000    1.63e+08    2.57e+08
Intercept:Wednesday             8.94e+07    2.4e+07      3.719      0.000    4.22e+07    1.37e+08
=================================================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

[24]:

ts_auto.merge_and_fit()
ts_auto

[24]:

AutoCategoricalThermoSensitivity(frequency=1D,
        degree_days_type=heating,
        degree_days_base_temperature={'heating': np.float64(17.18)},
        degree_days_computation_method=integral,
        interseason_mean_temperature=20)

                            OLS Regression Results
==============================================================================
Dep. Variable:                 energy   R-squared:                       0.604
Model:                            OLS   Adj. R-squared:                  0.602
No. Observations:                 974   F-statistic:                     295.8
Covariance Type:            nonrobust   Prob (F-statistic):          4.58e-192
======================================================================================================================
                                                         coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------------------------------
heating_degree_days:Monday-Tuesday-Thursday-Friday  6.705e+07   2.34e+06     28.593      0.000    6.24e+07    7.16e+07
heating_degree_days:Saturday-Sunday                 1.813e+06   3.38e+06      0.536      0.592   -4.83e+06    8.45e+06
heating_degree_days:Wednesday                       3.289e+07   4.52e+06      7.281      0.000     2.4e+07    4.18e+07
Intercept:Monday-Tuesday-Thursday-Friday            2.045e+08    1.2e+07     17.047      0.000    1.81e+08    2.28e+08
Intercept:Saturday-Sunday                           6.425e+07   1.69e+07      3.802      0.000    3.11e+07    9.74e+07
Intercept:Wednesday                                  8.94e+07    2.4e+07      3.727      0.000    4.23e+07    1.36e+08
======================================================================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

[25]:

fig, ax1 = plt.subplots(1, 1, figsize=(10, 5))
aggregated_data = ts_auto.aggregated_data.dropna(how="any", axis=0)
aggregated_data[["energy"]].plot(ax=ax1)
predictions = ts_auto.model.predict()

ax1.plot(aggregated_data.index, predictions, label="Predicted")

n_cat = len(ts_auto.resampled_categories.unique())
fig, axarr = plt.subplots(1, n_cat, figsize=(5 * n_cat, 10), sharey=True)
axarr = axarr.flatten() if n_cat > 1 else [axarr]
for i, cat in enumerate(ts_auto.resampled_categories.unique()):
    mask = aggregated_data["category"] == cat
    ax = axarr[i]
    ax.scatter(
        aggregated_data[mask]["temperature"],
        aggregated_data[mask]["energy"],
        label="True",
    )
    ax.scatter(
        aggregated_data[mask]["temperature"], predictions[mask], label="Predicted"
    )
    ax.set_title(f"Energy consumption vs temperature \n for {cat}")
    ax.set_xlabel("mean Daily Temperature (°C)")
    if i == 0:
        ax.set_ylabel("Energy consumption (kWh)")
    ax.legend()

../_images/user_guide_thermosensitivity_real_32_0.png

../_images/user_guide_thermosensitivity_real_32_1.png

[26]:

ts_auto2 = AutoCategoricalThermoSensitivity(
    energy_data=data2["energy"],
    temperature_data=data2["temperature"],
    degree_days_type="auto",
    degree_days_computation_method="integral",
)
ts_auto2.fit()
ts_auto2.merge_and_fit()
ts_auto2

[26]:

AutoCategoricalThermoSensitivity(frequency=1D,
        degree_days_type=heating,
        degree_days_base_temperature={'heating': np.float64(17.97)},
        degree_days_computation_method=integral,
        interseason_mean_temperature=20)

                            OLS Regression Results
==============================================================================
Dep. Variable:                 energy   R-squared:                       0.629
Model:                            OLS   Adj. R-squared:                  0.627
No. Observations:                 731   F-statistic:                     246.3
Covariance Type:            nonrobust   Prob (F-statistic):          1.33e-153
======================================================================================================================
                                                         coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------------------------------
heating_degree_days:Monday-Tuesday-Thursday-Friday  9.722e+07   4.06e+06     23.941      0.000    8.92e+07    1.05e+08
heating_degree_days:Saturday-Sunday                 2.474e+07   5.91e+06      4.186      0.000    1.31e+07    3.63e+07
heating_degree_days:Wednesday                        8.77e+07   7.91e+06     11.090      0.000    7.22e+07    1.03e+08
Intercept:Monday-Tuesday-Thursday-Friday            6.002e+08   2.39e+07     25.115      0.000    5.53e+08    6.47e+08
Intercept:Saturday-Sunday                           2.554e+08   3.35e+07      7.627      0.000     1.9e+08    3.21e+08
Intercept:Wednesday                                  2.76e+08   4.85e+07      5.686      0.000    1.81e+08    3.71e+08
======================================================================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

[27]:

fig, ax1 = plt.subplots(1, 1, figsize=(10, 5))
aggregated_data = ts_auto2.aggregated_data.dropna(how="any", axis=0)
aggregated_data[["energy"]].plot(ax=ax1)
predictions = ts_auto2.model.predict()

ax1.plot(aggregated_data.index, predictions, label="Predicted")

n_cat = len(ts_auto2.resampled_categories.unique())
fig, axarr = plt.subplots(1, n_cat, figsize=(5 * n_cat, 10), sharey=True)
axarr = axarr.flatten() if n_cat > 1 else [axarr]
for i, cat in enumerate(ts_auto2.resampled_categories.unique()):
    mask = aggregated_data["category"] == cat
    ax = axarr[i]
    ax.scatter(
        aggregated_data[mask]["temperature"],
        aggregated_data[mask]["energy"],
        label="True",
    )
    ax.scatter(
        aggregated_data[mask]["temperature"], predictions[mask], label="Predicted"
    )
    ax.set_title(f"Energy consumption vs temperature \n for {cat}")
    ax.set_xlabel("mean Daily Temperature (°C)")
    if i == 0:
        ax.set_ylabel("Energy consumption (kWh)")
    ax.legend()

../_images/user_guide_thermosensitivity_real_34_0.png

../_images/user_guide_thermosensitivity_real_34_1.png

Conclusion of Automatic Day categorisation#

As you can see, the model is able to detect that the building has three different occupancy categories: - usual school days Monday, Tuesday, Thursday, Friday - usual weekends Saturday, Sunday - Wednesdays

However, the effect of the holidays is not taken into account in this implementation.

Conclusion#

As you can see, in contrast with synthetic data, the different days behaviors due to different building occupancy leads to multiple thermosensitivity model.

Discussion / Next steps#

The next step would be to improve the classifications. For instance, we can re-label the periods according to the observed energy consumption. One way to do so is the following: - compute the predicted energy consumption of the period for each category - estimate the probability of each category given the energy consumption of the period and the standard deviation of the energy consumption of the category - re-label the periods with the category with the highest probability (Max a posteriori estimation)

However, this cannot be used to detect unexpected occupancy, on the contrary.

Another way would be an unsupervised classification of the periods. But for now this is a be too complex for the model.

Using eat for thermosensitivity analysis: Real data#

Thermo-sensitivity analysis#

Modeling the Weekly energy#

Conclusion of Weekly Analysis#

Modeling the Daily energy#

Adding the school holidays#

Conclusion of Daily Analysis#

Automatic Day categorisation#

Conclusion of Automatic Day categorisation#

Conclusion#

Discussion / Next steps#

Using `eat` for thermosensitivity analysis: Real data#