energy_analysis_toolbox.timeseries.extract_features.basics module
Contains basic tools to extract features in timeseries.
- energy_analysis_toolbox.timeseries.extract_features.basics.index_to_timesteps(time_indexes: DatetimeIndex, last_step: float | None = None) → array
Return the array of interval durations of a DatetimeIndex.
- Parameters:
time_indexes (pd.DatetimeIndex) – A sequence of time-steps in chronological order.
last_step (float, optional) – Duration of the last time-step in the series, in seconds. The default is None, meaning that the duration of the second-to-last time-step is reused.
- Raises:
EATEmptyDataError – If time_indexes is empty.
EATUndefinedTimestepError – If time_indexes contains only one element and last_step is None.
EATInvalidTimestepDurationError – If last_step < 0.
- Returns:
np.array – The series of durations in (s). The element i is the duration before the next index.
Important
The input time_indexes should contain at least two elements (one if the duration of the last time-step is provided).
See also
energy_analysis_toolbox.timeseries.extract_features.basics.timestep_durations, which works on timeseries by applying this function to its index.
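The behaviour documented for index_to_timesteps can be illustrated with a minimal standalone sketch. This is not the library's implementation: the function name durations_from_index is hypothetical, and plain ValueError stands in for the EAT exception classes, whose import path is not shown on this page.

```python
import numpy as np
import pandas as pd


def durations_from_index(time_indexes, last_step=None):
    """Hypothetical stand-in for index_to_timesteps (illustration only)."""
    if len(time_indexes) == 0:
        # The documented function raises EATEmptyDataError here.
        raise ValueError("time_indexes is empty")
    if last_step is not None and last_step < 0:
        # The documented function raises EATInvalidTimestepDurationError here.
        raise ValueError("last_step must be >= 0")
    # Differences between consecutive timestamps, converted to seconds.
    steps = np.diff(time_indexes.values) / np.timedelta64(1, "s")
    if last_step is None:
        if steps.size == 0:
            # The documented function raises EATUndefinedTimestepError here.
            raise ValueError("a single index needs an explicit last_step")
        # Reuse the duration of the second-to-last time-step.
        last_step = steps[-1]
    return np.append(steps, last_step)


index = pd.DatetimeIndex(
    ["2018-07-06 05:00:00", "2018-07-06 05:05:00", "2018-07-06 05:15:00"]
)
durations = durations_from_index(index)
```

With this index, the first two durations are the gaps of 300 s and 600 s, and the last one repeats the 600 s gap because no last_step is given.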
- energy_analysis_toolbox.timeseries.extract_features.basics.intervals_over(series: Series, low_tshd: float, *, return_positions: bool = False) → DataFrame
Return the limits of the overconsumption intervals during which the series values are over low_tshd.
- Parameters:
series (pd.Series) – The series in which intervals of consecutive values over low_tshd are searched.
low_tshd (float) – Lower threshold on the series values for interval extraction: runs of consecutive elements strictly over this threshold are searched.
return_positions (bool, default False) – If True, return a second DataFrame with interval bounds as integer positions in the provided series instead of labels.
- Returns:
overconsumption (pd.DataFrame) – The dataframe of overconsumption intervals, with two columns: EATK.start_f and EATK.end_. Each row contains:
in EATK.start_f, the label of the first instant of an interval in which the series values are > low_tshd;
in EATK.end_, the label of the first instant after this interval,
such that the row describes the interval as [start, end[.
iloc_bounds (pd.DataFrame, optional) – DataFrame with the same structure as overconsumption, except that the interval starts and ends are integer positions (“ilocs”) in the original series instead of labels.
Notes
The algorithm used in this function is the following.
[1] First, locate the instants when the value of series changes either:
from a value <= low_tshd to a value > low_tshd,
from a value > low_tshd to a value <= low_tshd.
These shifts are interval limits.
[2] The limits of the timeseries are special cases. These bounds are considered as shifts if the value is over the threshold at these positions.
Recall that the intervals are open on the right side:
an interval start is the index of a positive change in values (first index with a value > low_tshd),
an interval end is the index of a negative change (first index with a value <= low_tshd).
[3] Accordingly, with [1] and [2], shifts with even indices are interval starts while those with odd ones are interval ends.
[4] The dataframe of interval bounds is assembled and returned. The ilocs are returned as well if return_positions is True.
Example
>>> import numpy as np
>>> import pandas as pd
>>> from energy_analysis_toolbox.timeseries.extract_features.basics import intervals_over
>>> time_begin = pd.Timestamp("2018-07-06 05:00:00")
>>> time_range = pd.date_range(
...     start=time_begin, periods=8, inclusive='left', freq=pd.DateOffset(seconds=300))
>>> series = pd.Series(np.array([0, 1, 1.5, 2., 2., 3., 1.5, 0.]), index=time_range)
>>> intervals_over(series, 1.5)
                start                 end
0 2018-07-06 05:15:00 2018-07-06 05:30:00
>>> intervals_over(series, 42)
Empty DataFrame
Columns: [start, end]
Index: []
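Steps [1] to [3] of the Notes can be sketched with NumPy as follows. This is an illustrative re-implementation under stated assumptions, not the library's code: the function name over_threshold_positions is hypothetical, and it returns only integer positions. How the library labels the end of an interval that runs to the very last element is not specified here, so the mapping to labels is shown only for the example series, where the interval ends inside the series.

```python
import numpy as np
import pandas as pd


def over_threshold_positions(series, low_tshd):
    """Hypothetical sketch of the shift-detection steps (illustration only)."""
    # 1 where the value is strictly over the threshold, 0 elsewhere.
    over = (series.to_numpy() > low_tshd).astype(int)
    # Pad with zeros so that series limits over the threshold count as shifts [2].
    padded = np.concatenate(([0], over, [0]))
    # Non-zero differences locate the shifts [1].
    shifts = np.flatnonzero(np.diff(padded))
    # Even-indexed shifts are starts, odd-indexed ones are ends [3].
    return pd.DataFrame({"start": shifts[0::2], "end": shifts[1::2]})


time_range = pd.date_range(
    "2018-07-06 05:00:00", periods=8, freq=pd.Timedelta(seconds=300)
)
series = pd.Series([0, 1, 1.5, 2.0, 2.0, 3.0, 1.5, 0.0], index=time_range)

positions = over_threshold_positions(series, 1.5)
# Map positions to labels; valid here because every end lies inside the series.
labels = pd.DataFrame(
    {
        "start": series.index[positions["start"].to_numpy()],
        "end": series.index[positions["end"].to_numpy()],
    }
)
```

Because the padded mask starts and ends with zero, the number of shifts is always even, which is what makes the even/odd split in step [3] safe. An end position equal to len(series) means the interval is still open at the last element.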
- energy_analysis_toolbox.timeseries.extract_features.basics.timestep_durations(timeseries: Series, last_step: float | None = None) → Series
Return the series of timestep durations of a timeseries.
- Parameters:
timeseries (pd.Series) – A timeseries in chronological order.
last_step (float, optional) – Duration of the last time-step in the series, in seconds. The default is None, meaning that the duration of the second-to-last time-step is reused.
- Raises:
EATEmptyDataError : – If the series is empty.
EATUndefinedTimestepError : – If the series contains only one element and last_step is None.
- Returns:
pd.Series – The series of durations in (s). The element at iloc i is the duration before the next index in the series.
Important
The input timeseries should contain at least two elements (one if the duration of the last time-step is provided).
See also
energy_analysis_toolbox.timeseries.extract_features.basics.index_to_timesteps, which works directly from the series index.
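As a rough illustration of the documented return value, the sketch below builds a series of durations aligned on the input index using pandas only. The name durations_series is hypothetical and this is not the library's implementation; error handling is omitted, so it assumes a non-empty series with at least two elements when last_step is None.

```python
import pandas as pd


def durations_series(timeseries, last_step=None):
    """Hypothetical stand-in for timestep_durations (illustration only)."""
    # Gap to the next label, in seconds; the last element has no successor (NaN).
    gaps = timeseries.index.to_series().diff().shift(-1).dt.total_seconds()
    # Reuse the second-to-last duration unless an explicit last_step is given.
    gaps.iloc[-1] = gaps.iloc[-2] if last_step is None else last_step
    return gaps


timeseries = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.DatetimeIndex(
        ["2018-07-06 05:00:00", "2018-07-06 05:05:00", "2018-07-06 05:15:00"]
    ),
)
durations = durations_series(timeseries)
```

The result keeps the index of the input, so the element at iloc i is the duration, in seconds, until the next label.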