energy_analysis_toolbox.timeseries.extract_features.basics module#

Contains basic tools to extract features in timeseries.

energy_analysis_toolbox.timeseries.extract_features.basics.index_to_timesteps(time_indexes: DatetimeIndex, last_step: float | None = None) array[source]#

Return the array of interval durations of a DatetimeIndex.

Parameters:
  • time_indexes (pd.DatetimeIndex) – A sequence of time-steps in chronological order.

  • last_step (float, optional) – Duration of the last time-step in the series in (s). The default is None meaning that the same duration as the former-last one is used.

Raises:
  • EATEmptyDataError : – If time_indexes is empty.

  • EATUndefinedTimestepError : – If time_indexes contains only one element and last_step is None.

  • EATInvalidTimestepDurationError : – If last_step < 0.

Returns:

np.array – The series of durations in (s). The element i is the duration before the next index.

Important

The input time_indexes should contain at least two elements (one if the duration of the last time-step is provided).

See also

energy_analysis_toolbox.timeseries.extract_features.basics.timestep_durations which works on timeseries by applying this function to its index.

energy_analysis_toolbox.timeseries.extract_features.basics.intervals_over(series: Series, low_tshd: float, *, return_positions: bool = False) DataFrame[source]#

Return the overconsumption limits when the series values are over low_tshd.

Parameters:
  • series (pd.Series) – The series in which overconsumption of consecutive values over low_tshd are searched.

  • low_tshd (float) – Lower threshold on the values in the series for interval extractions : consecutive values of the series elements over (strict) this threshold are searched.

  • return_positions (bool, default False) – If True, return a second Dataframe with interval bounds as integer positions in the provided series instead of the labels. Default is False.

Returns:

  • overconsumption (pd.DataFrame) – The dataframe of overconsumption, with two columns: EATK.start_f and EATK.end_. Each row contains :

    • in EATK.start_f the label of the first instant of an interval when the values in the series are > low_tshd.

    • in EATK.end_ the label of the first instant after this interval

    such that the row describes interval as [start, end[.

  • iloc_bounds (pd.Dataframe, optional) – Dataframe with same structure as overconsumption, except that the values for the start and ends of overconsumption are integer positions (“ilocs”) in the original series instead of labels.

Notes

The algorithm used in this function is the following.

[1] First, locate the instants when the value of series changes either:

  • from a value =< low_tshd to value > low_tshd

  • from a value > low_tshd to value =< low_tshd

These shifts are interval limits.

[2] The limits of the timeseries are special cases. These bounds are considered as shifts if the value is over the threshold at these positions.

Recalling the indexation is for overconsumption which are open on the right side :

  • an interval start is the index of a positive change in values (first index of value > low_tshd),

  • an interval end is the index of a negative change (first index =< low_tshd value).

[3] Accordingly, with [1] and [2], shifts with even indices are interval starts while those with odd ones are interval ends.

[4] The dataframe of interval bounds is assembled and returned. The ilocs are returned as well in case return_positions is True.

Example

>>> time_begin = pd.Timestamp("2018-07-06 05:00:00")
>>> time_range = pd.date_range(
        start=time_begin,
        periods=8,
        inclusive='left',
        freq=pd.DateOffset(seconds=300))
>>> series = pd.Series(np.array([0, 1, 1.5, 2., 2., 3., 1.5, 0.]), index=time_range)
>>> intervals_over(series, 1.5)
                start                 end
    0 2018-07-06 05:15:00 2018-07-06 05:30:00
>>> intervals_over(series, 42)
    Empty DataFrame
    Columns: [start, end]
    Index: []
energy_analysis_toolbox.timeseries.extract_features.basics.timestep_durations(timeseries: Series, last_step: float | None = None) Series[source]#

Return the series of timestep durations of a timeseries.

Parameters:
  • timeseries (pd.Series) – A timeseries in chronological order.

  • last_step (float, optional) – Duration of the last time-step in the series in (s). The default is None meaning that the same duration as the former-last one is used.

Raises:
  • EATEmptyDataError : – If the series is empty.

  • EATUndefinedTimestepError : – If the series contains only one element and last_step is None.

Returns:

pd.Series – The series of durations in (s). The element at iloc i is the duration before the next index in the series.

Important

The input timeseries should contain at least two elements (one if the duration of the last time-step is provided).

See also

energy_analysis_toolbox.timeseries.extract_features.basics.index_to_timesteps which works directly from the series index.