energy_analysis_toolbox.timeseries.resample.index_transformation module

Transforms indices of a time series to a new index according to a given function.

energy_analysis_toolbox.timeseries.resample.index_transformation.data_to_datetimeindex(data: Series | DataFrame | DatetimeIndex) → DatetimeIndex

Convert the data to DatetimeIndex.

Used to allow the use of the same functions for Series, DataFrame and DatetimeIndex.

Parameters:

data (pd.Series | pd.DataFrame | pd.DatetimeIndex) – A pandas object

Returns:

pd.DatetimeIndex – Return data if already an index, else the index of the data

Raises:

ValueError – If the data cannot be converted to a pandas.DatetimeIndex.

energy_analysis_toolbox.timeseries.resample.index_transformation.estimate_timestep(data: Series | DataFrame | DatetimeIndex, method: str = 'median') → float

Return an estimation of the sampling period of a time series.

Note

Each method has its own advantages and drawbacks. The best method depends on the data. For instance:

  • if the data is regularly spaced, the mode is the best choice.

  • if the data is irregularly spaced, the kde is the best choice.

  • the median is not sensitive to outliers, and is a good choice if the data is irregularly spaced and has outliers.

  • the mean is almost never a good choice.

Parameters:
  • data (pd.Series, pd.DataFrame, pd.DatetimeIndex) – the data to analyse. Must have (or be) a DatetimeIndex.

  • method ({'mean', 'median', 'mode', 'kde'}, optional) – the method used to compute the expected timestep. Defaults to ‘median’.

Returns:

float – the expected timestep of the data in (s).

Raises:

ValueError – If the method is not one of {‘mean’, ‘median’, ‘mode’, ‘kde’}.
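The trade-offs listed in the note above can be checked on a small, made-up sample. The sketch below is plain Python and does not call energy_analysis_toolbox: timestep_estimates is a hypothetical helper and the timestamps are invented for illustration. It only shows why a single hole pulls the mean away from the true sampling period while the median and the mode stay put.

```python
from datetime import datetime, timedelta
from statistics import fmean, median, mode


def timestep_estimates(timestamps):
    """Return the mean, median and mode of the gaps between timestamps, in seconds.

    Hypothetical helper for illustration only, not part of the toolbox.
    """
    ordered = sorted(timestamps)
    gaps = [
        (later - earlier).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return {"mean": fmean(gaps), "median": median(gaps), "mode": mode(gaps)}


# Invented data: one sample every 60 s, with a single 10-minute hole.
start = datetime(2024, 1, 1)
stamps = [start + timedelta(seconds=s) for s in (0, 60, 120, 180, 780, 840)]
estimates = timestep_estimates(stamps)
print(estimates)  # {'mean': 168.0, 'median': 60.0, 'mode': 60.0}
```

With one 600 s gap among four 60 s gaps, the mean reports 168 s, which matches none of the actual steps.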

energy_analysis_toolbox.timeseries.resample.index_transformation.fill_data_holes(data: T, method: str = 'mode', security_factor: float = 2, fill_value: float = <NA>) → T

Return the data with new entries where the interval is too long.

Note

The new index entries are created using the expected timestep determined by method. The duration between the last new entry of a hole and the next (existing) entry is less than or equal to the expected timestep.

Parameters:
  • data (pd.Series | pd.DataFrame) – The data to process. Must have a DatetimeIndex.

  • method ({'mean', 'median', 'mode', 'kde'}, optional) – The method used to estimate the expected timestep, by default “mode”. See estimate_timestep for more details.

  • security_factor (float, optional) – The factor used to determine whether a timestep is too long compared to the expected timestep, by default 2.

  • fill_value (float, optional) – The value of the newly created entries, by default pd.NA

Returns:

pd.Series | pd.DataFrame – A copy of data with the newly created entries, sorted by index.

energy_analysis_toolbox.timeseries.resample.index_transformation.fill_missing_entries(data: Series | DataFrame, sampling_period: float, security_factor: float = 2, fill_value: float = <NA>) → Series | DataFrame

Fill the data with new entries where the interval is too long.

Note

The duration between the last new entry of a hole and the next (existing) entry is less than or equal to the sampling_period.

Parameters:
  • data (pd.Series | pd.DataFrame) – The data to process. Must have a DatetimeIndex.

  • sampling_period (float) – The expected sampling period in (s).

  • security_factor (float, optional) – The factor used to decide when a timestep is too long compared to the sampling_period: sampling_period * security_factor is the maximum duration (excluded) between two entries, so any gap that reaches this bound is treated as a hole. By default 2.

  • fill_value (float, optional) – The value of the newly created entries, by default pd.NA

Returns:

pd.Series | pd.DataFrame – A copy of data with the newly created entries, sorted by index.
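The filling rule described above can be sketched in plain Python. This is not the toolbox implementation: fill_gaps is a hypothetical helper, it works on a dict of timestamps instead of a pandas object, and it assumes that new entries are laid out from the entry preceding the hole in steps of sampling_period. That layout is one reading of the note that is consistent with the last new entry being at most one sampling_period before the next existing entry.

```python
from datetime import datetime, timedelta


def fill_gaps(records, sampling_period, security_factor=2.0, fill_value=None):
    """Return a sorted copy of records with placeholder entries inside long gaps.

    Hypothetical helper: `records` maps naive datetimes to values and
    `sampling_period` is expressed in seconds.
    """
    step = timedelta(seconds=sampling_period)
    # Gaps must stay strictly below this bound, so reaching it marks a hole.
    limit = timedelta(seconds=sampling_period * security_factor)
    ordered = sorted(records)
    filled = dict(records)
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous >= limit:
            new_stamp = previous + step
            while new_stamp < current:
                filled[new_stamp] = fill_value
                new_stamp += step
    return dict(sorted(filled.items()))


# Invented data: samples at 0 s, 60 s and 300 s, expected period 60 s.
start = datetime(2024, 1, 1)
records = {start + timedelta(seconds=s): float(s) for s in (0, 60, 300)}
filled = fill_gaps(records, sampling_period=60)
offsets = [(stamp - start).total_seconds() for stamp in filled]
print(offsets)  # [0.0, 60.0, 120.0, 180.0, 240.0, 300.0]
```

The 60 s gap is below the 120 s bound and is left alone. The 240 s gap receives three placeholder entries, each holding fill_value.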

energy_analysis_toolbox.timeseries.resample.index_transformation.index_to_freq(index: DatetimeIndex, freq: str | Timedelta | None, origin: str | Timestamp | None = None, last_step_duration: float | None = None) → DatetimeIndex

Return the expected index from resampling a time series to a given frequency.

Parameters:
  • index (pd.DatetimeIndex) – the index of the data to resample

  • freq (str, pd.Timedelta) – the freq to which the series is resampled. Must be a valid pandas frequency.

  • origin ({None, 'floor', 'ceil', pd.Timestamp}) –

    The origin to use for the target resampling range. The following values are possible:

    • None : the default. Use the first index of the data as the starting point.

    • 'floor' : use the first index of the data, floored to the passed freq resolution.

    • 'ceil' : use the first index of the data, ceiled to the passed freq resolution.

    • a pd.Timestamp : use the passed timestamp as the starting point. The code tries to localize the value to the timezone of the first index in the data. Accordingly:

      • if the passed value is time-naive, it is localized to the timezone of the data;

      • if the data is time-naive, the timezone of the passed value is ignored and it is processed as if it were time-naive.

  • last_step_duration (float, optional) – the duration of the last step of the resampling in (s). If None, the duration of the last time step of the original index is used. Used to deduce the end of the resampling range.

Returns:

pd.DatetimeIndex – The resulting index of the resampling. Empty if the passed index is empty.
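To make the origin options concrete, here is a plain-Python sketch of how a target index could be derived. It rests on several assumptions that this page does not confirm: floor_to, ceil_to and target_index are hypothetical helpers, timestamps are time-naive, flooring is done relative to the Unix epoch, and the end of the range (last index plus last step duration) is treated as exclusive.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def floor_to(stamp, step):
    """Round a naive timestamp down to a multiple of `step` counted from the epoch."""
    return EPOCH + ((stamp - EPOCH) // step) * step


def ceil_to(stamp, step):
    """Round a naive timestamp up to a multiple of `step` counted from the epoch."""
    floored = floor_to(stamp, step)
    return floored if floored == stamp else floored + step


def target_index(index, step, origin=None, last_step_duration=None):
    """Build the assumed resampling grid for a sorted list of naive datetimes."""
    if not index:
        return []
    if origin is None:
        begin = index[0]
    elif origin == "floor":
        begin = floor_to(index[0], step)
    elif origin == "ceil":
        begin = ceil_to(index[0], step)
    else:
        begin = origin  # an explicit starting timestamp
    if last_step_duration is None:
        # Reuse the last step of the original index (needs at least two entries).
        last_step = index[-1] - index[-2]
    else:
        last_step = timedelta(seconds=last_step_duration)
    end = index[-1] + last_step  # assumed exclusive
    grid, current = [], begin
    while current < end:
        grid.append(current)
        current += step
    return grid


quarter = timedelta(minutes=15)
index = [datetime(2024, 1, 1, 10, 7, 30) + k * quarter for k in range(4)]
floored_grid = target_index(index, quarter, origin="floor")
ceiled_grid = target_index(index, quarter, origin="ceil")
default_grid = target_index(index, quarter)
```

Under these assumptions the 'floor' grid starts at 10:00 and has five entries, the 'ceil' grid starts at 10:15 and has four, and the default origin reproduces the original index.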

energy_analysis_toolbox.timeseries.resample.index_transformation.max_kde_time_step(data: Series | DataFrame | DatetimeIndex) → float

Return the most probable timestep of a time series.

Note

It differs from the mode in that the distribution of timesteps is first estimated using a kernel density estimate (KDE). The timestep at which this estimated density is maximal is then returned.

Warning

The KDE cannot be estimated if the data is regularly spaced. In this case, use another method.

Parameters:

data (pd.Series, pd.DataFrame, pd.DatetimeIndex) – the data to analyse. Must have (or be) a DatetimeIndex.

Returns:

float – the most probable timestep of the data (the maximum of the KDE) in (s).
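The idea behind the note and the warning can be illustrated without any dependency. The sketch below is not the estimator used by the toolbox, which this page does not specify: kde_peak is a hypothetical helper that uses Gaussian kernels, a normal-reference rule-of-thumb bandwidth, and a simple grid search for the maximum. It also shows why perfectly regular data is a problem: the spread of the timesteps is zero, so the bandwidth collapses.

```python
import math
from statistics import stdev


def kde_peak(gaps, grid_points=200):
    """Return the grid value where a Gaussian KDE of `gaps` is highest.

    Hypothetical helper for illustration; `gaps` are timesteps in seconds.
    """
    spread = stdev(gaps)
    if spread == 0:
        raise ValueError("identical gaps: the bandwidth is zero and the KDE is undefined")
    # Rule-of-thumb bandwidth (an assumption, not the toolbox's choice).
    bandwidth = 1.06 * spread * len(gaps) ** (-1 / 5)
    low, high = min(gaps), max(gaps)
    grid = [low + (high - low) * i / (grid_points - 1) for i in range(grid_points)]

    def density(x):
        # Unnormalised sum of kernels; normalising would not move the maximum.
        return sum(math.exp(-0.5 * ((x - gap) / bandwidth) ** 2) for gap in gaps)

    return max(grid, key=density)


# Invented data: timesteps jittering around 60 s, plus one 600 s hole.
gaps = [58.0, 59.0, 59.5, 60.0, 60.5, 61.0, 62.0, 600.0]
peak = kde_peak(gaps)

# Regularly spaced data: every gap is identical, so the estimate fails.
try:
    kde_peak([60.0] * 5)
    regular_ok = True
except ValueError:
    regular_ok = False
```

On this sample the peak lands close to 60 s even though no value repeats, which is the situation where a plain mode is uninformative.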

energy_analysis_toolbox.timeseries.resample.index_transformation.mean_time_step(data: Series | DataFrame | DatetimeIndex) → float

Return the mean timestep of a time series.

Parameters:

data (pd.Series, pd.DataFrame, pd.DatetimeIndex) – the data to analyse. Must have (or be) a DatetimeIndex.

Returns:

float – the mean timestep of the data in (s).

energy_analysis_toolbox.timeseries.resample.index_transformation.median_time_step(data: Series | DataFrame | DatetimeIndex) → float

Return the median timestep of a time series.

Parameters:

data (pd.Series, pd.DataFrame, pd.DatetimeIndex) – the data to analyse. Must have (or be) a DatetimeIndex.

Returns:

float – the median timestep of the data in (s).

energy_analysis_toolbox.timeseries.resample.index_transformation.mode_time_step(data: Series | DataFrame | DatetimeIndex) → float

Return the mode timestep of a time series.

Warning

The mode is the most frequent value. If there are several values with the same frequency, the first one is returned. If the values vary slightly around a central value, the mode is not representative of the data.

Parameters:

data (pd.Series, pd.DataFrame, pd.DatetimeIndex) – the data to analyse. Must have (or be) a DatetimeIndex.

Returns:

float – the mode timestep of the data in (s).
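The last sentence of the warning is easy to reproduce with the standard library. The values below are invented, and statistics.mode stands in for whatever mode computation the toolbox performs, so only the qualitative effect carries over: when timesteps jitter around a central value, no jittered value repeats, and the only repeated value may be an outlier.

```python
from statistics import median, mode

# Invented timesteps in seconds: jitter around 60 s, plus two identical 300 s holes.
jittery_gaps = [59.8, 60.1, 59.9, 60.2, 300.0, 300.0]

most_frequent = mode(jittery_gaps)
central = median(jittery_gaps)
print(most_frequent)  # 300.0
```

The mode picks 300 s because it is the only value that occurs twice, while the median stays close to 60 s.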

energy_analysis_toolbox.timeseries.resample.index_transformation.tz_convert_or_localize(timeseries: Series | DataFrame, tz: str | BaseTzInfo | None) → Series | DataFrame

Assign the requested timezone to the index of a timeseries.

Parameters:
  • timeseries (pd.Series or pd.DataFrame) – Timeseries to convert.

  • tz (str or pytz.timezone or None) – Timezone to assign to the index of the timeseries.

Returns:

pd.Series or pd.DataFrame – Timeseries with the requested timezone assigned to its index.

Note

This function is just syntactic sugar to avoid dealing with the TypeError when applying tz_convert to a time-naive timeseries.

Important

When localizing a time-naive timeseries, the ambiguous and nonexistent arguments are set to True and ‘NaT’ respectively. This means that ambiguous times are localized to the beginning of the DST period and non-existent times are converted to ‘NaT’.
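The localize-or-convert branching can be sketched on single timestamps with the standard library. This is only an analogy: convert_or_localize is a hypothetical helper, it handles one datetime rather than a pandas index, it does not cover tz=None, and it uses a fixed UTC offset, so the DST handling described above (ambiguous and nonexistent times) does not come into play.

```python
from datetime import datetime, timedelta, timezone


def convert_or_localize(stamp, tz):
    """Attach `tz` to a naive timestamp, or convert an aware one to `tz`.

    Hypothetical helper for illustration only.
    """
    if stamp.tzinfo is None:
        # Time-naive: keep the wall-clock time and attach the timezone.
        return stamp.replace(tzinfo=tz)
    # Time-aware: keep the instant and express it in the new timezone.
    return stamp.astimezone(tz)


utc_plus_one = timezone(timedelta(hours=1))
naive = datetime(2024, 1, 1, 12, 0)
aware = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

localized = convert_or_localize(naive, utc_plus_one)  # 12:00 at UTC+01:00
converted = convert_or_localize(aware, utc_plus_one)  # 13:00 at UTC+01:00
```

Localizing keeps the clock reading and changes the instant it refers to. Converting keeps the instant and changes the clock reading.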