1 Introduction

Our aim is to evaluate whether a machine can detect a recurring sequential pattern within a univariate time series, i.e., a single vector of temporally ordered observations that looks like a long string of numbers. The observations could be continuous measurements of air temperature, pupil size, heart rate, reaction time, and so on. Most such data are autocorrelated: where the series is today predicts where it will be tomorrow.
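To make the autocorrelation point concrete, here is a minimal sketch using only base R (the stats package). It simulates an autoregressive AR(1) series, in which each value is 0.8 times the previous value plus fresh noise, and compares its lag-1 autocorrelation with that of pure white noise. The coefficient 0.8, the seed, and the series length are arbitrary illustrative choices, not values used elsewhere in this tutorial.

```r
# Base R only: arima.sim() and acf() come from the stats package
set.seed(42)
n <- 500

# AR(1) process (illustrative choice): x[t] = 0.8 * x[t-1] + e[t]
ar1 <- arima.sim(model = list(ar = 0.8), n = n)

# White noise: independent draws with no memory of the previous value
wn <- rnorm(n)

# Lag-1 sample autocorrelation; element 1 of $acf is lag 0, element 2 is lag 1
lag1 <- function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]

r_ar1 <- lag1(ar1)
r_wn <- lag1(wn)
round(c(ar1 = r_ar1, white.noise = r_wn), 2)
```

A lag-1 value near 1 means neighboring observations move together; a value near 0 means today says little about tomorrow.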

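# load some libraries
library(zoo)        # time series utilities
library(tidyverse)  # includes ggplot2, used for plotting below
library(TSMining)   # Func.motif() and Func.visual.SingleMotif()
library(ngram)      # n-gram tools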

2 Simulating a noisy time series with embedded signals

We’ll start by simulating data that contain a repeating pattern, such as a specific number sequence. Here it is the sequence 1-10 repeated 10 times; we’ll call that signal1. We’ll also create a different number pattern (21, 22, 23, 24, 25, 24, 22, 21), repeat it 10 times, and call that signal2. Each of these signals constitutes its own small pattern, or motif. Time series researchers call such snippets ‘subsequences’, and a motif is a subsequence that recurs. For example, after a giant volcanic eruption we might expect a few years of cooling air temperatures followed by a gradual return to baseline. That motif is a meaningful pattern nested within a climate time series.

signal1 <- data.frame(rep(seq(1, 10), 10))
colnames(signal1) <- "obs"
signal2 <- data.frame(rep(c(21, 22, 23, 24, 25, 24, 22, 21), 10))
colnames(signal2) <- "obs"
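A subsequence is simply a run of consecutive observations. The sketch below, written in base R and not taken from any motif package, uses `embed()` to list every length-`m` subsequence of the signal1 values as the rows of a matrix, then counts how many of those windows exactly equal the 1-10 pattern. The helper name `all_subsequences` is hypothetical.

```r
# Hypothetical helper: every length-m subsequence of x, one per row.
# embed() returns each window newest-value-first, so reverse the columns.
all_subsequences <- function(x, m) {
  embed(x, m)[, m:1, drop = FALSE]
}

x <- rep(seq(1, 10), 10)          # same values as signal1$obs
m <- 10
windows <- all_subsequences(x, m)
nrow(windows)                     # 91 = length(x) - m + 1 sliding windows

# Which windows reproduce the 1-10 motif exactly?
target <- seq(1, 10)
is_motif <- apply(windows, 1, function(w) all(w == target))
which(is_motif)                   # 1 11 21 31 41 51 61 71 81 91
sum(is_motif)                     # 10 copies of the motif
```

Exact matching only works because this toy signal has no noise; real motif finders compare windows approximately, which is what the algorithm in the next section does.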

What’s a signal without noise? Let’s embed our two motifs in a noise distribution. We’ll create the noise by randomly sampling 200 integers from 1 to 40 with replacement, then stack signal1, the noise, and signal2 (in that order) into a single vector. Finally, we convert that vector to a time series object and plot it.

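set.seed(1234)  # make the random noise reproducible
noise <- data.frame(sample(1:40, 200, replace = TRUE))
colnames(noise) <- "obs"
# stack the pieces: signal1 (points 1-100), noise (101-300), signal2 (301-380)
both <- rbind(signal1, noise, signal2)
both.ts <- as.ts(both)
plot.ts(both.ts, col = "red")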
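Because we built the series ourselves, we know exactly where each piece sits. The sketch below rebuilds the same layout with plain vectors (`c()` in place of `rbind()` on data frames, which gives the same sequence of values) and records a ground-truth label for every time point, so the motif search in the next section can be checked against it. The variable names `truth`, `runs`, and `ends` are illustrative only.

```r
set.seed(1234)
s1 <- rep(seq(1, 10), 10)                             # signal1: 100 points
noise <- sample(1:40, 200, replace = TRUE)            # noise: 200 points
s2 <- rep(c(21, 22, 23, 24, 25, 24, 22, 21), 10)      # signal2: 80 points
x <- c(s1, noise, s2)

# One label per time point, in the same order as rbind(signal1, noise, signal2)
truth <- rep(c("signal1", "noise", "signal2"),
             times = c(length(s1), length(noise), length(s2)))

# Run-length encoding turns the labels into start/end index ranges
runs <- rle(truth)
ends <- cumsum(runs$lengths)
data.frame(segment = runs$values, start = ends - runs$lengths + 1, end = ends)
```

The table shows signal1 at points 1-100, noise at 101-300, and signal2 at 301-380, which is where a successful motif search should place its hits.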

3 Check for motifs

We have now reached the critical test: can an agnostic (unsupervised) machine learning algorithm detect and extract the two motifs we embedded (signal1 and signal2) using univariate motif detection? We won’t tell it what to look for (that’s the unsupervised part), but we know the ground truth in advance: it should find two motifs. And it does.

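both.look <- Func.motif(ts = both.ts, global.norm = TRUE, local.norm = FALSE,
    window.size = 20,   # length of each sliding window
    overlap = 0,        # no overlap between adjacent windows
    w = 6, a = 5,       # SAX word size (PAA segments) and alphabet size
    mask.size = 5,      # mask size for the random projection step
    max.dist.ratio = 1.2, count.ratio.1 = 1.1, count.ratio.2 = 1.1)  # thresholds; see ?Func.motif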
length(both.look$Indices)
## [1] 2
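Func.motif() does not compare raw numbers directly. As far as we can tell from the TSMining documentation, it discretizes each sliding window with SAX (Symbolic Aggregate approXimation), where `w` is the number of PAA segments (the word length) and `a` is the alphabet size, and then uses random projection to find windows whose words keep colliding. The sketch below is a from-scratch SAX step only, not TSMining's internal code, with two stated assumptions: it z-normalizes each window (the textbook SAX choice, whereas the call above sets `global.norm = TRUE, local.norm = FALSE`), and it uses a simplified PAA that assigns every point to exactly one frame. The function name `sax_word` is hypothetical.

```r
# Hypothetical from-scratch SAX sketch (not TSMining's implementation).
# Simplification: each point belongs to exactly one PAA frame; full PAA splits
# boundary points fractionally when the window length is not a multiple of w.
sax_word <- function(x, w, a) {
  z <- (x - mean(x)) / sd(x)                     # z-normalize the window
  n <- length(z)
  frame <- ceiling(seq_len(n) * w / n)           # frame index 1..w for each point
  paa <- as.numeric(tapply(z, frame, mean))      # mean value of each frame
  breaks <- qnorm(seq_len(a - 1) / a)            # equiprobable N(0, 1) cut points
  paste(letters[findInterval(paa, breaks) + 1], collapse = "")
}

window <- 1:20                                   # a rising window, length 20 as above
w1 <- sax_word(window, w = 6, a = 5)
w1

# z-normalization removes offset and scale, so a stretched and shifted copy
# of the same shape produces the same word
w2 <- sax_word(3 * window + 7, w = 6, a = 5)
identical(w1, w2)
```

Turning each window into a short word is what lets two noisy copies of the same shape be matched cheaply: they only need to share (most of) a handful of letters rather than every raw value.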

Next, we use TSMining’s Func.visual.SingleMotif() to label the points that belong to each motif, and ggplot2 to mark (color) the two motifs within the original time series.

col.motifs <- Func.visual.SingleMotif(single.ts = both.ts, window.size = 20, 
    motif.indices = both.look$Indices)

# Count the distinct labels in Y (used to assign one point shape per label)
n <- length(unique(col.motifs$data.1$Y))

# plot the series, marking each point by its label
plot.dat <- col.motifs$data.1
plot.dat$idx <- seq_len(nrow(plot.dat))
ggplot(data = plot.dat, aes(x = idx, y = obs)) +
    geom_line() +
    geom_point(aes(color = Y, shape = Y)) +
    scale_shape_manual(values = seq_len(n))