Dynamic Time Warping explained using Python and HAR dataset

This article was published as a part of the Data Science Blogathon

Introduction to Dynamic Time Warping

Time series classification is a very common task. You will have data from various domains such as signal processing, IoT, human activity, and more, and the ultimate aim is to train a model that can predict the class of any time series with near-perfect accuracy. The dataset should contain labeled time sequences so that the model can learn to predict the class of a time series accurately.

One classic solution to this problem is the K-Nearest Neighbors algorithm. In this article, we are going to skip the usual approach of Euclidean distance and use the Dynamic Time Warping (DTW) metric instead. This method takes into consideration that two time series being compared might vary in length and speed.

Cover Image | Dynamic Time Warping

The approach is not that complex, but it is quite specific to the DTW algorithm. A common mistake is to tune the hyperparameters of the KNN model while simply overlooking the DTW part; one should not do that. One major disadvantage of the algorithm is that its running time grows quadratically with sequence length, so huge datasets with long sequences may not be trainable in a reasonable amount of time. Some additional tweaks can be applied to speed the algorithm up, which we will discuss later.

How Does Dynamic Time Warping Work?

DTW, or Dynamic Time Warping, is a dynamic programming algorithm: one big problem is subdivided into subproblems wherever possible, and the solutions to those subproblems are stored for later reuse instead of being computed again. This is the same dynamic programming idea used in data structures and algorithms, and it is the basic principle behind DTW.
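To make the idea concrete, here is a minimal, unoptimised sketch of the DTW recurrence in plain Python/NumPy. The function name 'dtw_distance' is our own, for illustration only; the dtaidistance library used below implements a much faster version of the same recurrence in C.

import numpy as np

def dtw_distance(s1, s2):
    # Naive DTW: each cell of the cost matrix reuses the stored
    # solutions of its three neighbouring subproblems.
    n, m = len(s1), len(s2)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (s1[i - 1] - s2[j - 1]) ** 2
            # Take the cheapest of the three already-solved subproblems
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])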

“Those who cannot remember the past are condemned to repeat it.” – George Santayana

In this article, we will focus on the data science side of the algorithm and use Python as our base language to understand how it works. We will use the library module known as “dtaidistance”, which will help us calculate the distance between two sine waves that are phase-shifted from each other.

To install the package, use the following code :

!pip install dtaidistance

Keep the exclamation mark for a notebook environment, or remove it if you are installing via the command prompt.

Code :

from dtaidistance import dtw
from dtaidistance import dtw_visualisation as dtwvis
from random import sample
import numpy as np

# Two sine waves, the second phase-shifted by 1 radian
x = np.arange(0, 20, .5)
s1 = np.sin(x)
s2 = np.sin(x - 1)

# Optimal point-to-point alignment between the two series
path = dtw.warping_path(s1, s2)
dtwvis.plot_warping(s1, s2, path)

# DTW distance between the two series
distance = dtw.distance(s1, s2)

Output :

Optimal Warping distance between 2 sine waves

The figure above shows the optimal alignment between the points of the two sine waves. The dynamic programming matrix, also known as the cost matrix, can be plotted as well; it shows all the warping paths, as in the image below.

Code :

# Accumulated-cost matrix, constrained to a window of 20 around the diagonal
d, paths = dtw.warping_paths(s1, s2, window=20)
# Cheapest path through the matrix
best_path = dtw.best_path(paths)
dtwvis.plot_warpingpaths(s1, s2, paths, best_path)

Output :

The Cost Matrix

A matrix of this kind is also constructed when implementing another algorithm, Needleman-Wunsch, which is used to align protein and nucleotide sequences in bioinformatics. The Levenshtein distance is computed in a similar fashion. Both of these algorithms belong to the family of dynamic programming algorithms.

In the second figure, each cell holds a number that represents the distance between two compared data points, one from each sequence. A darker colour signifies a lower distance, and the most optimal path can be retrieved; it is drawn as the red line in the image. The time complexity is O(mn), where m and n are the lengths of the respective sequences, i.e., a quadratic cost. Since real-world sequences are commonly long and the KNN model is not fully optimised, the overall model can take quite some time to train.

In our case, we are fully aware of the context of our problem and how the Dynamic Time Warping algorithm will behave, so we can optimise the model to reduce the total training time. In most cases, the optimal path runs along the diagonal of the matrix, as it does in our example. Massive phase shifts are also uncommon in the real world, so the series are expected to stay roughly aligned, or else the cost becomes too high. All of this can be inferred during the dataset exploration phase, and the important frequencies can be extracted with a Fourier transform to make sure there are no problems in the data.
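As a quick illustration of that last point, here is a hedged sketch of such a frequency check on the sine wave from earlier, using plain NumPy; the interpretation of the spectrum is up to you.

# Inspect the dominant frequency of s1; d=0.5 matches the sampling step used above
freqs = np.fft.rfftfreq(len(s1), d=0.5)
spectrum = np.abs(np.fft.rfft(s1))
dominant = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC component
print('Dominant frequency: %.3f cycles per unit' % dominant)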

In the last code block, we displayed the matrix using the ‘warping_paths()’ function, which by default uses the argument ‘use_pruning=True’, so the top-right and bottom-left corners are pruned for optimisation. This means that beyond a certain cost the values are not calculated, since doing so would be CPU-intensive and we have already found our desired result. In versions of the module before 2.3.2, you had to pass this parameter to the function explicitly, as sketched below. Now we will move on to a real-world scenario for our model.
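On those older versions, the explicit call would look like this minimal sketch (parameter names as given in the dtaidistance documentation):

# Explicitly enable pruning alongside the window constraint
d, paths = dtw.warping_paths(s1, s2, window=20, use_pruning=True)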

Real Case Study to Understand Dynamic Time Warping

We will use the Human Activity Recognition (HAR) dataset, which can be downloaded from the UCI Machine Learning Repository and contains pre-labeled time series data. The data was captured with a smartphone carried by a person, and each action is classified into one of six specific activities: WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, or LAYING. All of these are inferred from the angular velocity and linear acceleration obtained from the smartphone's hardware. Every observation consists of a 561-feature vector with time- and frequency-domain variables, plus a label that denotes the activity of the person at that moment. Our goal is to build a model that can predict the activity of a person from their smartphone data.
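If you do not have the files locally, here is a small download sketch; the archive URL is an assumption based on the UCI repository layout and may have changed, so adjust it if needed.

import os
import urllib.request
import zipfile

# Hypothetical download helper; the URL below is an assumption and may move
URL = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00240/UCI%20HAR%20Dataset.zip'
if not os.path.isdir('UCI HAR Dataset'):
    urllib.request.urlretrieve(URL, 'har.zip')
    with zipfile.ZipFile('har.zip') as zf:
        zf.extractall('.')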

Code :

# Open the raw text files from the extracted dataset
x_train_file = open(r'UCI HAR Dataset/train/X_train.txt', 'r')
y_train_file = open(r'UCI HAR Dataset/train/y_train.txt', 'r')
x_test_file = open(r'UCI HAR Dataset/test/X_test.txt', 'r')
y_test_file = open(r'UCI HAR Dataset/test/y_test.txt', 'r')

x_train = []
y_train = []
x_test = []
y_test = []

# Mapping from numeric class to activity name
labels = {1:'WALKING', 2:'WALKING UPSTAIRS', 3:'WALKING DOWNSTAIRS', 4:'SITTING', 5:'STANDING', 6:'LAYING'}

# Each row of X_*.txt is a whitespace-separated 561-feature vector;
# each row of y_*.txt is a single integer label
for x in x_train_file: x_train.append([float(ts) for ts in x.split()])
for y in y_train_file: y_train.append(int(y.rstrip('\n')))
for x in x_test_file: x_test.append([float(ts) for ts in x.split()])
for y in y_test_file: y_test.append(int(y.rstrip('\n')))

x_train = np.array(x_train)
y_train = np.array(y_train)
x_test = np.array(x_test)
y_test = np.array(y_test)

# One plot colour per class
colors = ['#D62728','#2C9F2C','#FD7F23','#1F77B4','#9467BD', '#8C564A','#7F7F7F','#1FBECF','#E377C2','#BCBD27']
idx = 0
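A quick sanity check on the loaded arrays; the expected shapes below follow the dataset's own documentation.

print(x_train.shape, y_train.shape)  # expected: (7352, 561) (7352,)
print(x_test.shape, y_test.shape)    # expected: (2947, 561) (2947,)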

Next, we are going to test the two methods. This part is important: one run shows the execution time without optimisation, and the other shows it with optimisation.

Code without Optimisation :

for r in range(len(x_test)):
    distance = dtw.distance(x_train[idx], x_test[r])

Execution Time of cell :

Wall time: 25min 16s

Code with Optimisation :

for r in range(len(x_test)):
    distance = dtw.distance(x_train[idx], x_test[r], window=20, use_pruning=True)

Execution Time of cell :

Wall time: 1min 42s

As you can see, the difference is significant on my laptop, running Jupyter Notebook natively on an 8th-generation i5 chip. We will therefore use a KNN algorithm with k = 20 and a window constraint for the Dynamic Time Warping function. The ‘idx’ variable stores the index of the chosen time series in our test set.

We will now define a function for our model. It takes the number of neighbours for the KNN and the index of the respective time series in our test set, and it returns one of the following labels:

Labels = { WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING }

Functional Code :

def classifyNN(k: int, idx: int) -> str:
    idxs = range(0, x_train.shape[0])
    n = x_train.shape[0]
    distances = []
    counters = {}
    c = 1
    max_value = 0
    # DTW distance from the chosen test series to every training series
    for r in range(n):
        distances.append(dtw.distance(x_test[idx], x_train[idxs[r]], window=10, use_pruning=True))
    # Indices of the k nearest training series
    NN = sorted(range(len(distances)), key=lambda i: distances[i], reverse=False)[:k]
    for l in labels.values():
        counters[l] = 0
    # Count the label frequencies among the nearest neighbours
    for r in NN:
        l = labels[y_train[r]]
        counters[l] += 1
        if counters[l] > max_value:
            max_value = counters[l]
        # print('NN(%d) has label %s' % (c, l))
        c += 1
    # Labels with the maximum frequency
    keys = [key for key in counters if counters[key] == max_value]
    # Return one random label in case of a tie between multiple labels
    return sample(keys, 1)[0]

Test Cases

Here we demonstrate a few sample test cases, but feel free to experiment with any ‘idx’ value and find the corresponding label for that test case.

1. First Test Case: Standing

Code for displaying the data :

import matplotlib.pyplot as plt

k = 20
idx = 3
# Plot the chosen test series, coloured by its true label
plt.plot(x_test[idx], label=labels[y_test[idx]], color=colors[y_test[idx]-1], linewidth=2)
plt.xlabel('Samples @50Hz')
plt.legend(loc='upper left')
plt.tight_layout()

Output :

Visualization of Standing variable

Code for finding the Test Label :

%%time
classifyNN(k,idx)

Output :

Wall time: 3min 13s
'STANDING'

2. Second Test Case: Sitting

Code for Displaying the data :

k=20
idx=200
plt.plot(x_test[idx], label=labels[y_test[idx]], color=colors[y_test[idx]-1], linewidth=2)
plt.xlabel('Samples @50Hz')
plt.legend(loc='upper left')
plt.tight_layout()

Output :

Visualization of Sitting variable

Code for finding the Test Label :

%%time
classifyNN(k,idx)

Output :

Wall time: 3min 15s
'SITTING'

3. Third Test Case: Walking

Code for Displaying the data :

k=20
idx=401
plt.plot(x_test[idx], label=labels[y_test[idx]], color=colors[y_test[idx]-1], linewidth=2)
plt.xlabel('Samples @50Hz')
plt.legend(loc='upper left')
plt.tight_layout()

Output :

Visualization of Walking variable

Code for finding the Test Label :

%%time
classifyNN(k,idx)

Output :

Wall time: 3min 9s
'WALKING'

So in all these examples, the algorithm was able to label the time series correctly and achieves great accuracy. The windowing functionality also greatly reduces training time; if you do not believe me, change the window argument manually and see the difference in the time required to train the model.

Is Parallelisation a possibility for optimization of Dynamic Time Warping?

Well, the next natural question here is about parallelisation. When NumPy arrays and pandas DataFrames struggled with huge datasets and results were needed in a shorter time, Spark was created, among other reasons, to make the best use of a machine with multiple processors.

Similarly, we can see that in its current form Dynamic Time Warping is very hard to parallelise; I will update this in the next article if I find any methods. You can apply other optimisations to the algorithm to reduce its complexity, but those will also affect the accuracy of the model. So the challenge remains to reduce the time complexity without much reduction in the accuracy of the model.

One possibility is to speed up the process by computing the pairwise DTW distances of all the time series in the dataset, which does achieve a level of parallelism. This helps us construct the distance matrix faster. Increasing the pruning also allows training to skip more data points, although that may not be the optimal solution in all cases.
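Here is a hedged sketch of that idea using dtaidistance's distance-matrix API (function and parameter names per its documentation; the subset size is arbitrary and only for illustration):

import numpy as np
from dtaidistance import dtw

# Pairwise DTW distances over a small subset, computed in optimised C code
series = np.vstack([x_train[:50], x_test[:50]])
ds = dtw.distance_matrix_fast(series, window=20)
print(ds.shape)  # (100, 100)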

Conclusion

Time series classification can be made very effective by combining the KNN and DTW algorithms. The dataset used here is just one of many scenarios where this model can classify data into various labels; by switching the input source to live data or an input stream, you can determine what activity a user is performing in real time. Many applications do this already. StepSetGo, for example, is an app that pays you in its own currency for walking; using this model with data retrieved directly from the smartphone sensors, you can determine the user's activity at that moment.

Thank you for reading till the end. See you in my next article.

Arnab Mondal

Python Developer & Data Engineer | Freelance Tech Writer

Link to my other articles

Link to the Colab Notebook

Image Sources:

  1. Image 1: https://unsplash.com/photos/BXOXnQ26B7o
  2. Image 2: https://unsplash.com/photos/ePAL0f6jjr0

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
