Introduction

Our aim is to evaluate whether a machine can detect a recurring sequential pattern within a univariate time series (i.e., a single vector of observations that are temporally ordered and autocorrelated). On the surface, such a series is just a long vector of numbers, which could reflect continuous measurements of temperature, pupil size, etc. Here are the libraries we’ll use:

library(TraMineR)
library(zoo)
library(tidyverse)
library(TSMining)

We’ll start by creating some dummy data in which we embed a repeating pattern, such as a specific number sequence. Here it’s the integers 1 to 10 repeated 10 times. We’ll call that signal1. We’ll also create a different number pattern (21, 22, 23, 24, 25, 24, 22, 21) repeated 10 times and call that signal2. Each of these signals is built from one short pattern that recurs. Time series researchers call such snippets ‘subsequences’. Motifs are repeating subsequences.

signal1 <- data.frame(rep(seq(1, 10), 10))
colnames(signal1) <- "obs"
signal2 <- data.frame(rep(c(21, 22, 23, 24, 25, 24, 22, 21), 10))
colnames(signal2) <- "obs"
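
To make ‘subsequence’ and ‘motif’ concrete, here is a minimal base-R sketch that slides a fixed-width window over a toy vector and counts how often each exact window recurs. The helper name count_windows and the window width of 10 are illustrative assumptions. Real motif detection, including the TSMining routine used later, matches windows approximately rather than exactly.

```r
# Hypothetical helper: count exact repeats of every length-`width` subsequence.
count_windows <- function(x, width) {
  starts <- seq_len(length(x) - width + 1)
  # Collapse each window into a string key so identical windows can be tallied
  keys <- vapply(
    starts,
    function(i) paste(x[i:(i + width - 1)], collapse = "-"),
    character(1)
  )
  sort(table(keys), decreasing = TRUE)
}

toy <- rep(1:10, 10)                      # same values as signal1$obs
counts <- count_windows(toy, width = 10)
head(counts, 3)
```

In this toy case the window that starts at 1 occurs 10 times, and each of the nine shifted versions of it occurs 9 times, so every subsequence here qualifies as a motif.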

What’s a signal without noise? Let’s embed our two motifs in noise. We’ll create the noise by randomly sampling 200 integers from 1 to 40 with replacement, then stack signal1, the noise, and signal2 (in that order) into a single vector, convert that to a time series object, and plot it. Because no seed is set, the exact noise values will differ from run to run.

noise <- data.frame(sample(1:40, 200, replace = TRUE))
colnames(noise) <- "obs"
both <- rbind(signal1, noise, signal2)
both.ts <- as.ts(both)
plot.ts(both.ts, col = "red")
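
Since the ground truth matters later, it helps to record where each segment sits in the combined series. The sketch below rebuilds the same layout from plain vectors and derives the index range of each block. The seed value 123 is an arbitrary assumption added for reproducibility and is not the one behind the results reported here. The names sig1, noi, sig2, and layout are likewise illustrative.

```r
set.seed(123)  # arbitrary seed, assumed here only for reproducibility

sig1 <- rep(1:10, 10)                                # 100 observations
noi  <- sample(1:40, 200, replace = TRUE)            # 200 observations
sig2 <- rep(c(21, 22, 23, 24, 25, 24, 22, 21), 10)   # 80 observations

# Segment lengths in the same order used by rbind(signal1, noise, signal2)
lens   <- c(signal1 = length(sig1), noise = length(noi), signal2 = length(sig2))
ends   <- cumsum(lens)
starts <- ends - lens + 1

layout <- data.frame(segment = names(lens), start = starts, end = ends)
layout
```

With the vectors as defined in the code above, signal1 occupies positions 1 to 100, the noise 101 to 300, and signal2 301 to 380.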

We have reached the critical point: can an agnostic (unsupervised) machine learning algorithm detect and extract the two motifs using univariate motif detection? We won’t tell it what to look for (that’s the unsupervised part), but we know the ground truth in advance. It needs to detect two motifs... and so it does.

both.look <- Func.motif(ts = both.ts, global.norm = TRUE, local.norm = FALSE, window.size = 20, 
    overlap = 0, w = 6, a = 5, mask.size = 5, max.dist.ratio = 1.2, count.ratio.1 = 1.1, 
    count.ratio.2 = 1.1)
length(both.look$Indices)
## [1] 2
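
For intuition about the w and a arguments: as we understand the TSMining documentation, Func.motif discretizes each window with symbolic aggregate approximation (SAX), where w is the word size and a is the alphabet size. The sketch below is a simplified base-R illustration of that idea and is not TSMining’s internal code. The helper name sax_word is hypothetical, and unlike the call above (which uses global normalization) it z-normalizes each window on its own.

```r
# Simplified SAX sketch (an assumption-laden illustration, not TSMining's code).
sax_word <- function(x, w, a) {
  # 1. z-normalize the window; a flat window maps to all zeros
  s <- sd(x)
  z <- if (s == 0) rep(0, length(x)) else (x - mean(x)) / s
  # 2. Piecewise aggregate approximation: average within w roughly equal chunks
  chunk <- ceiling(seq_along(z) * w / length(z))
  paa <- as.numeric(tapply(z, chunk, mean))
  # 3. Map each chunk mean to one of `a` letters using equiprobable
  #    breakpoints of the standard normal distribution
  breaks <- qnorm(seq_len(a - 1) / a)
  paste(letters[findInterval(paa, breaks) + 1], collapse = "")
}

# A rising window (like signal1) and a peaked window (like signal2), 20 points each
rising <- sax_word(1:20, w = 6, a = 5)
peak   <- sax_word(rep(c(21, 22, 23, 24, 25, 24, 22, 21), 3)[1:20], w = 6, a = 5)
c(rising = rising, peak = peak)
```

Each 20-point window is reduced to a 6-letter word over a 5-letter alphabet, so windows with a similar shape tend to receive the same or nearby words, which is what makes approximate matching cheap.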

Next step: have TSMining label the points that belong to each motif, then use ggplot2 to color the two motifs within the original time series.

col.motifs <- Func.visual.SingleMotif(single.ts = both.ts, window.size = 20, 
    motif.indices = both.look$Indices)

# Determine the total number of motifs discovered
n <- length(unique(col.motifs$data.1$Y))

# plot it
ggplot(data = col.motifs$data.1) + geom_line(aes(x = seq_len(nrow(col.motifs$data.1)), 
    y = obs)) + geom_point(aes(x = seq_len(nrow(col.motifs$data.1)), y = obs, color = Y, 
    shape = Y)) + scale_shape_manual(values = seq_len(n))
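
Because we know the ground truth, the plot can be complemented with a numeric check. This sketch assumes, based on our reading of the TSMining documentation, that both.look$Indices is a list holding one vector of window start positions per motif. The helper name label_segment is hypothetical, and the example start positions are made up for illustration; they are not actual Func.motif output.

```r
# Hypothetical helper: label each start index by the segment it falls in,
# using the layout signal1 = 1-100, noise = 101-300, signal2 = 301-380.
label_segment <- function(start) {
  as.character(cut(start,
                   breaks = c(0, 100, 300, 380),
                   labels = c("signal1", "noise", "signal2")))
}

# Made-up start positions for illustration only (NOT real Func.motif output)
example.indices <- list(c(1, 21, 41), c(301, 321, 341))
lapply(example.indices, function(idx) table(label_segment(idx)))

# With the real result, the equivalent call would be:
# lapply(both.look$Indices, function(idx) table(label_segment(idx)))
```

If detection worked as intended, the start positions of one motif should fall mostly in signal1 and those of the other mostly in signal2, with few or none in the noise. Note that a window starting near a segment boundary can straddle two segments.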