Aligning shifted series with MP #1051

joshualeond · 2024-12-12T22:01:51Z


Hi, I've just come across this package and it looks great! I'm curious whether the matrix profile would be a viable way to align shifted time series.

Here's an example of what I mean:

import numpy as np
import polars as pl
import matplotlib.pyplot as plt

def generate_data():
  np.random.seed(42)
  timestamps = np.linspace(0, 10, 500)
  base_signal = np.sin(2 * np.pi * timestamps)

  shifts = [0, 0.5, 1, -0.3]
  sensor_ids = [f"sensor_{i}" for i in range(len(shifts))]
  data = []
  for shift, sensor_id in zip(shifts, sensor_ids):
    shift_samples = int(shift * 60)
    signal = np.roll(base_signal, shift_samples) + np.random.normal(0, 0.1, len(base_signal))
    for ts, val in zip(timestamps, signal):
      data.append({"timestamp": ts, "value": val, "sensor_id": sensor_id})

  df = pl.DataFrame(data)
  return df, timestamps, sensor_ids

def plot_signals(df, title):
  plt.figure(figsize = (10, 6))
  sensor_ids = df["sensor_id"].unique().to_list()
  for sensor_id in sensor_ids:
    sensor_data = df.filter(pl.col("sensor_id") == sensor_id)
    plt.plot(sensor_data["timestamp"].to_numpy(), sensor_data["value"].to_numpy(), label = sensor_id)
  plt.title(title)
  plt.xlabel("Timestamp")
  plt.ylabel("Value")
  plt.legend()
  plt.show()

df, timestamps, sensor_ids = generate_data()
plot_signals(df, "Original Time Series")

Is it possible to use the matrix profile (perhaps the multivariate mstump approach) to align these series? Or is this a place where dynamic time warping (DTW) would be better suited?


seanlaw (Maintainer) · 2024-12-13T02:17:22Z

@joshualeond Thank you for your question and welcome to the STUMPY community. I think the answer depends on your data and what you are hoping to accomplish. This (contrived) example is basically a phase shift, so you might be better off computing autocorrelation to determine which shift gives you the best alignment. If you performed an AB-join, matrix profiles would only tell you, locally, where the nearest neighbor of each subsequence is across the two time series. And unless your time series experiences some compression/expansion, I'm not sure that DTW would be helpful either. So the answer is really, "it depends".
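
For the pure phase-shift case, a minimal sketch of the correlation idea might look like the following. It uses cross-correlation (the two-series counterpart of the autocorrelation mentioned above) via `np.correlate`. The function name `estimate_lag` and the synthetic test data are hypothetical, introduced only for illustration, and the approach assumes both series are evenly sampled at the same rate.

```python
import numpy as np

def estimate_lag(reference, target):
    """Estimate how many samples `target` lags behind `reference`.

    Positive: `target` is a delayed copy; negative: it runs ahead.
    """
    ref = reference - reference.mean()
    tgt = target - target.mean()
    # mode="full" evaluates every overlap; output index k maps to lag k - (len(ref) - 1)
    xcorr = np.correlate(tgt, ref, mode="full")
    lags = np.arange(-(len(ref) - 1), len(tgt))
    return int(lags[np.argmax(xcorr)])

# Hypothetical test data: a non-periodic signal and a copy delayed by 25 samples
rng = np.random.default_rng(0)
reference = rng.normal(size=300)
target = np.concatenate((rng.normal(size=25), reference[:-25]))

lag = estimate_lag(reference, target)
# Drop the first `lag` samples of the delayed series to line it up with the reference
aligned = target[lag:]
```

Note that for a strictly periodic signal, such as the sine wave in the example, the lag is only identifiable up to a multiple of the period, so several lags can score almost equally well.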

2 replies

joshualeond Dec 13, 2024
Author

Thanks for taking the time to review this Sean!

Interesting point about the autocorrelation; I should consider that. Yes, the example is contrived and is a simplified version of what I'm actually trying to solve with stumpy. My real-world data is messier, but in general you can "see" by eye how shifting the signals left/right would align them closely, and that's what I'm trying to do in a clean, programmatic way.

I played with this some more, and using stumpy.mass seems to get me close to what I'm looking for. Now I need to run it against my real examples to see how well it works.

Is the AB-join similar to combining stumpy.mass and np.argmin as I'm doing here?

import numpy as np
import polars as pl
import matplotlib.pyplot as plt
import stumpy

def generate_data():
    np.random.seed(42)
    timestamps = np.linspace(0, 10, 500)
    base_signal = np.sin(2 * np.pi * timestamps)

    shifts = [0, 0.5, 1, -0.3]
    sensor_ids = [f"sensor_{i}" for i in range(len(shifts))]
    data = []
    for shift, sensor_id in zip(shifts, sensor_ids):
        shift_samples = int(shift * 60)
        signal = np.roll(base_signal, shift_samples) + np.random.normal(0, 0.1, len(base_signal))
        for ts, val in zip(timestamps, signal):
            data.append({"timestamp": ts, "value": val, "sensor_id": sensor_id})

    df = pl.DataFrame(data)
    return df, timestamps, sensor_ids

def plot_signals(df, title):
    plt.figure(figsize=(10, 6))
    sensor_ids = df["sensor_id"].unique().to_list()
    for sensor_id in sensor_ids:
        sensor_data = df.filter(pl.col("sensor_id") == sensor_id)
        plt.plot(sensor_data["timestamp"].to_numpy(), sensor_data["value"].to_numpy(), label=sensor_id)
    plt.title(title)
    plt.xlabel("Timestamp")
    plt.ylabel("Value")
    plt.legend()
    plt.show()

df, timestamps, sensor_ids = generate_data()
plot_signals(df, "Original Time Series")

def shift_signal(signal, shift_amount):
    """
    Shift a signal to the left or right without wrapping.
    If shifted left and points run out, the series ends early.
    If shifted right, the beginning is filled with NaNs.
    """
    if shift_amount > 0:  # Shift right
        return np.concatenate((np.full(shift_amount, np.nan), signal[:-shift_amount]))
    elif shift_amount < 0:  # Shift left
        shift_amount = abs(shift_amount)
        return signal[shift_amount:]
    return signal  # No shift


def align_and_facet_plot_all(reference_sensor_id, df, timestamps, window_size=50):
    # Prepare signals and sensor IDs
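    # maintain_order=True keeps the sensor order (and thus the colors) stable between runs
    sensor_ids = df["sensor_id"].unique(maintain_order=True).to_list()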
    reference_signal = df.filter(pl.col("sensor_id") == reference_sensor_id)["value"].to_numpy()

    # Define a consistent style for each sensor
    styles = {sensor_id: {"color": plt.cm.tab10(i)} for i, sensor_id in enumerate(sensor_ids)}

    fig, axes = plt.subplots(2, 1, figsize=(12, 10), sharex=True)

    # Plot all original signals on the top facet
    for sensor_id in sensor_ids:
        sensor_signal = df.filter(pl.col("sensor_id") == sensor_id)["value"].to_numpy()
        axes[0].plot(
            timestamps[:len(sensor_signal)],
            sensor_signal,
            label=f"Original: {sensor_id}",
            **styles[sensor_id]
        )

    axes[0].set_title("Original Signals")
    axes[0].set_ylabel("Value")
    axes[0].legend()

    # Plot the reference signal and aligned signals on the bottom facet
    axes[1].plot(
        timestamps[:len(reference_signal)],
        reference_signal,
        label=f"Reference: {reference_sensor_id}",
        linewidth=2,
        **styles[reference_sensor_id]
    )

    for sensor_id in sensor_ids:
        if sensor_id == reference_sensor_id:
            continue

        target_signal = df.filter(pl.col("sensor_id") == sensor_id)["value"].to_numpy()

        # Compute distance profile and find optimal shift
        distance_profile = stumpy.mass(reference_signal[:window_size], target_signal)
        optimal_shift_index = np.argmin(distance_profile)

        # Align the target signal using clipping instead of wrapping
        aligned_signal = shift_signal(target_signal, -optimal_shift_index)

        # Plot the aligned signal
        axes[1].plot(
            timestamps[:len(aligned_signal)],
            aligned_signal,
            label=f"Aligned: {sensor_id}",
            **styles[sensor_id]
        )

    axes[1].set_title("Aligned Signals (Shifted)")
    axes[1].set_xlabel("Timestamp")
    axes[1].set_ylabel("Value")
    axes[1].legend()

    plt.tight_layout()
    plt.show()

# Example: Align and facet plot for sensor_0 with all other sensors
align_and_facet_plot_all("sensor_0", df, timestamps, window_size=450)

seanlaw Dec 13, 2024
Maintainer

It's a little hard to tell what you are asking, but AB-joins are somewhat different from using stumpy.mass.

For an AB-join, you have time series A, time series B, and a window size m. For every sliding window "subsequence" in A, you are hunting for its one-nearest (subsequence) neighbor in B. Note that the nearest neighbor can exist ANYWHERE in B. For example, the first subsequence in A can have its nearest neighbor located at the END of B while the second subsequence in A (which is shifted over by one index value) can have its nearest neighbor located at the BEGINNING of B. In other words, there is no guaranteed "ordering" in the nearest neighbors (they are where they are and we can't make any assumptions).

In the case of stumpy.mass, you have a window size m and you select a single subsequence of size m from time series A (i.e., A[i : i + m], where i is some start index). You then compute the distance profile using stumpy.mass and use argmin to find its one nearest neighbor in B. So, in essence, you only care about A[i : i + m] and how it compares to B. Wherever that nearest neighbor is, you'll shift the ENTIRE time series A by the same amount. Basically, you are "overfitting" A by shifting it relative to A[i : i + m]. If you know a "good" A[i : i + m] to choose, then stumpy.mass is probably "good enough".
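
To make the single-query approach concrete, here is a minimal sketch. By default, `stumpy.mass(Q, T)` returns a z-normalized Euclidean distance profile; the `distance_profile` function below is a slow, NumPy-only stand-in for that quantity so the example has no extra dependencies. The names `distance_profile` and `shift_from_query` and the synthetic data are hypothetical, and the sketch assumes no window is constant (zero standard deviation).

```python
import numpy as np

def distance_profile(query, series):
    """Naive z-normalized Euclidean distance from `query` to every window of `series`."""
    m = len(query)
    q = (query - query.mean()) / query.std()
    windows = np.lib.stride_tricks.sliding_window_view(series, m)
    mu = windows.mean(axis=1, keepdims=True)
    sigma = windows.std(axis=1, keepdims=True)  # assumed non-zero for every window
    return np.linalg.norm((windows - mu) / sigma - q, axis=1)

def shift_from_query(a, b, i, m):
    """Shift implied by matching the single subsequence a[i : i + m] against b."""
    j = int(np.argmin(distance_profile(a[i : i + m], b)))
    return j - i  # positive: a must move right to line up with b

# Hypothetical test data: b is a copy of a delayed by 25 samples
rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = np.concatenate((rng.normal(size=25), a[:-25]))

shift_start = shift_from_query(a, b, i=0, m=50)
shift_middle = shift_from_query(a, b, i=100, m=50)
```

On clean data like this, every choice of `i` implies the same shift. On messier data the result depends on which `A[i : i + m]` you pick, which is the "overfitting" risk described above.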

In the AB-join, you are finding ALL nearest neighbors and NOT assuming that any particular A[i : i + m] is more/less important than the others. But then you fall into the issue that:

- Some subsequences in A do not have a good match in B.
- Some subsequences in A may need to be shifted to the right (along B) while OTHER subsequences in A may need to be shifted to the left (along B). So which way "should" you shift? Taking an average shift is bad too, because opposing shifts can cancel each other and you may end up not shifting at all.
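
One way to inspect these issues is to look at the shift implied by every nearest-neighbor pair rather than a single one. The sketch below is a brute-force, NumPy-only stand-in for the nearest-neighbor indices that an AB-join such as `stumpy.stump(T_A, m, T_B, ignore_trivial=False)` would provide. The helper names, the synthetic data, and the idea of comparing the most common shift with the mean are assumptions made for illustration, not something prescribed in this thread.

```python
import numpy as np

def znorm_windows(series, m):
    """All length-m sliding windows of `series`, each z-normalized."""
    windows = np.lib.stride_tricks.sliding_window_view(series, m)
    mu = windows.mean(axis=1, keepdims=True)
    sigma = windows.std(axis=1, keepdims=True)  # assumed non-zero for every window
    return (windows - mu) / sigma

def ab_join_shifts(a, b, m):
    """For each subsequence of a, the offset (j - i) to its nearest neighbor in b."""
    wa = znorm_windows(a, m)
    wb = znorm_windows(b, m)
    # For z-normalized windows, the smallest Euclidean distance
    # corresponds to the largest Pearson correlation
    corr = wa @ wb.T / m
    nn_idx = np.argmax(corr, axis=1)
    return nn_idx - np.arange(len(nn_idx))

# Hypothetical test data: b is a copy of a delayed by 25 samples
rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = np.concatenate((rng.normal(size=25), a[:-25]))

shifts = ab_join_shifts(a, b, m=50)
values, counts = np.unique(shifts, return_counts=True)
most_common_shift = int(values[np.argmax(counts)])
mean_shift = float(shifts.mean())
```

Here the last few subsequences of `a` have no true counterpart in `b`, so their nearest neighbors land at arbitrary positions and pull `mean_shift` away from the true delay, while `most_common_shift` is unaffected. Whether a mode, a median, or something else is appropriate depends on the data.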

Again, I don't know your data so you may not have any of these issues. I'm trying not to make too many assumptions and wanted to provide some perspective for you to consider.

Answer selected by joshualeond
