Introduction to Timeseries Analysis using Python, Numpy only.

Published in Becoming Human: Artificial Intelligence Magazine, Dec 20, 2018

We will cover moving average, alternative line smoothing without averaging periods, detecting outliers, noise filtering and ARIMA.

TL;DR, straight to the repository: https://github.com/huseinzol05/Machine-Learning-Numpy/tree/master/timeseries

We love to understand things from the basics, without depending too much on a lot of libraries, and ‘Numpy only’ makes a nice clickbait title. But as the title says, I promise I will use Numpy only, with some help from matplotlib for time series visualization and seaborn for nice visualization (I mean it).

In this story, I will use Tesla stock market data! No particular reason why; I just want to use it. I downloaded it from here: https://finance.yahoo.com/quote/TSLA/history?p=TSLA

Moving Average

We always hear from people, especially people who study the stock market,

“If you want to understand the stock market, please study moving averages. By overlapping many N-period moving averages, you can tell whether this stock is going to go sky high!”

Not exactly, obviously. A moving average is simply the average (mean) over a certain N periods. If my N is 3 and my periods are daily, I average 3 days including the current period, (t-2 + t-1 + t) / 3, simple as that.


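In Python, we can write it like this:

import numpy as np

def moving_average(signal, period):
    signal = np.asarray(signal, dtype=float)
    buffer = [np.nan] * (period - 1)
    for i in range(period - 1, len(signal)):
        # mean of the window ending at (and including) the current period t
        buffer.append(signal[i - period + 1 : i + 1].mean())
    return buffer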

The reason I put NaN there is to make sure matplotlib does not plot anything for those first periods, where there is not yet a full window to average.
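As a cross-check of the (t-2 + t-1 + t) / 3 formula, here is a minimal vectorized sketch. It is an alternative I am assuming, not the repository's code: the helper name moving_average_vectorized is hypothetical, and it uses np.convolve with a uniform kernel to get the same trailing means (including the current period) without a Python-level loop. The prices are a made-up toy series.

```python
import numpy as np

def moving_average_vectorized(signal, period):
    """Trailing N-period mean that includes the current point, NaN-padded to len(signal)."""
    signal = np.asarray(signal, dtype=float)
    # uniform kernel of length `period`; mode="valid" keeps only full windows
    kernel = np.ones(period) / period
    valid = np.convolve(signal, kernel, mode="valid")
    # pad the front so that output index t lines up with input index t
    return np.concatenate([np.full(period - 1, np.nan), valid])

prices = np.array([10.0, 11.0, 12.0, 13.0, 20.0])  # toy series, not real data
print(moving_average_vectorized(prices, 3))  # nan, nan, 11.0, 12.0, 15.0
```

Because mode="valid" only returns full windows, the NaN padding plays the same role as in the loop version: matplotlib leaves a gap for the first N-1 periods.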

What can we observe from a moving average? A trend!

If my N is 40 and my periods are daily, the moving average tells us what happened over the last 40 days. Look at the yellow line between 50 and 100 on the x-axis: even though there is a sudden drop (I call these sudden downs and ups), it recovers at around 70-ish on the x-axis. The red line, still at around 70-ish, is not really affected by that sudden drop. So what we can say is that TESLA usually recovers from a sudden up or down within around 11–14 periods.

But the problem with the moving average is that it does not care much about the current period, t: every point in the window gets the same weight. As we always say, move on from the past, but do not totally forget it.

Linearly Weighted Moving Average

Linearly Weighted Moving Average is a method of calculating the momentum of the price of an asset over a given period of time. This method weights recent data more heavily than older data, and is used to analyze trends.

If my N is 3 and my periods are daily, ((t-2) × 1 + (t-1) × 2 + t × 3) / (1 + 2 + 3).

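In Python, we can write it like this:

import numpy as np

def linear_weight_moving_average(signal, period):
    signal = np.asarray(signal, dtype=float)
    # weights 1, 2, ..., period: the oldest point gets 1, the current point t gets period
    weights = np.arange(1, period + 1)
    # NaN for the first period - 1 points, where no full window exists yet
    buffer = [np.nan] * (period - 1)
    for i in range(period - 1, len(signal)):
        window = signal[i - period + 1 : i + 1]  # window ends at (and includes) t
        buffer.append((window * weights).sum() / weights.sum())
    return buffer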

It is not very different from the normal moving average, but what we can observe here is whether the current period, t, is tending to increase or not. That tendency is driven mostly by the freshest periods, roughly the most recent quarter of the window, we can say.
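To make the "tendency" concrete, here is a small worked comparison on made-up prices (a hypothetical toy series, not TSLA data) where the last price jumps. The helper lwma_vectorized is a hypothetical name, and the sketch assumes NumPy 1.20 or newer for sliding_window_view. With N = 3, the last simple moving average is (12 + 13 + 20) / 3 = 15, while the linearly weighted one is (12 × 1 + 13 × 2 + 20 × 3) / 6 = 98 / 6 ≈ 16.33, so the weighted version reacts more to the fresh jump.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lwma_vectorized(signal, period):
    signal = np.asarray(signal, dtype=float)
    # weights 1, 2, ..., period: oldest point in the window gets 1, the current point gets period
    weights = np.arange(1, period + 1)
    # each row is one window, ordered oldest -> newest
    windows = sliding_window_view(signal, period)
    valid = windows @ weights / weights.sum()
    # NaN-pad the first period - 1 points, like the loop version
    return np.concatenate([np.full(period - 1, np.nan), valid])

prices = np.array([10.0, 11.0, 12.0, 13.0, 20.0])  # toy series, not real data
sma = sliding_window_view(prices, 3).mean(axis=1)
lwma = lwma_vectorized(prices, 3)[2:]
print(sma)   # last value 15.0
print(lwma)  # last value 98 / 6, about 16.33
```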

Moving on from moving averages!

Alternative line smoothing

Sometimes we just want to filter out noisy spikes in a time series without having to remove any periods. That is the curse of the moving average: we have to drop the first N periods.

Anchor based

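This method takes part of the previous smoothed value (t-1) plus part of the current value t, with a given ratio: smoothed[t] = weight × smoothed[t-1] + (1 - weight) × signal[t]. That is all.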

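In Python, we can write it like this:

def anchor(signal, weight):
    buffer = []
    # seed with the first value, so the first smoothed point equals signal[0]
    last = signal[0]
    for i in signal:
        # keep `weight` of the previous smoothed value, take (1 - weight) of the current one
        smoothed_val = last * weight + (1 - weight) * i
        buffer.append(smoothed_val)
        last = smoothed_val
    return buffer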

Line smoothing usually does not have a strong purpose for the stock market, but for signal processing, it does.
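The anchor recurrence, smoothed[t] = weight × smoothed[t-1] + (1 - weight) × signal[t], is the same recurrence as simple exponential smoothing. A quick way to see what weight does is to feed it a step from 0 to 10 (a toy input made up for illustration). The sketch below re-implements the function with a preallocated NumPy array (the name anchor_array is hypothetical); note that it returns one value per input, so no early periods are dropped.

```python
import numpy as np

def anchor_array(signal, weight):
    signal = np.asarray(signal, dtype=float)
    smoothed = np.empty_like(signal)
    last = signal[0]  # seed with the first value, so smoothed[0] == signal[0]
    for t, value in enumerate(signal):
        # keep `weight` of the previous smoothed value, take (1 - weight) of the new one
        last = weight * last + (1 - weight) * value
        smoothed[t] = last
    return smoothed

step = [0.0, 10.0, 10.0, 10.0]
print(anchor_array(step, 0.5))  # 0.0, 5.0, 7.5, 8.75
print(anchor_array(step, 0.9))  # slower: 0.0, 1.0, 1.9, 2.71
```

A higher weight keeps more of the past, so the line is smoother but lags further behind the real signal.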

Detecting outliers

When we talk about outliers in a time series, we mean sudden, huge up and down spikes.

Figure: normal distribution and its scales, including Z-scores (source: https://vignette.wikia.nocookie.net/psychology/images/b/bb/Normal_distribution_and_scales.gif/revision/latest?cb=20060916084308)

This is a normal distribution. Look at the ‘Z scores’: we assume the outliers in TESLA are the points with a Z-score below -2 or above +2.

First, we need to standardize our time series into Z-scores,

std_signal = (signal - np.mean(signal)) / np.std(signal)

In Python, we can write it like this:

import numpy as np

def detect(signal, threshold = 2.0):
    # signal is expected to be standardized already (std_signal above)
    detected = []
    for i in range(len(signal)):
        if np.abs(signal[i]) > threshold:
            detected.append(i)
    return detected

The red ‘x’ markers are the detected outliers, about 3% of the points.
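Here is a compact sketch that combines the scaling step and detect into one vectorized function (zscore_outliers is a hypothetical name), run on a toy series of twenty 10s with one artificial spike of 30. The mean is 11 and the population standard deviation (np.std's default) is sqrt(19) ≈ 4.36, so the spike's Z-score is 19 / sqrt(19) ≈ 4.36, well above the +2 threshold, while every other point sits at about -0.23.

```python
import numpy as np

def zscore_outliers(signal, threshold=2.0):
    signal = np.asarray(signal, dtype=float)
    # standardize: subtract the mean, divide by the population standard deviation
    z = (signal - signal.mean()) / signal.std()
    # indices whose absolute Z-score is above the threshold
    return np.flatnonzero(np.abs(z) > threshold), z

prices = np.full(20, 10.0)
prices[10] = 30.0  # one artificial spike
idx, z = zscore_outliers(prices)
print(idx)              # [10]
print(round(z[10], 2))  # 4.36
```

One caveat: a big spike also inflates the standard deviation it is measured against, so on short series a single outlier can partly hide itself.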

Noise filtering

I would rather call it noise removal and get, because we will be able to plot both the smoothed signal and the noise signal.

To filter out the noise,

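import numpy as np

def removal(signal, repeat):
    # work on a float copy so integer input is not truncated by the division
    copy_signal = np.array(signal, dtype=float)
    for j in range(repeat):
        # replace every interior point with the mean of its neighbours;
        # updates happen in place, so index i - 2 already holds its new value
        for i in range(2, len(signal)):
            copy_signal[i - 1] = (copy_signal[i - 2] + copy_signal[i]) / 2
    return copy_signal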

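That is, t-1 = (t-2 + t) / 2: each point is replaced with the average of its two neighbours, and the whole pass is repeated as many times as the repeat argument says.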

To get the noise,

def get(original_signal, removed_signal):
    # the noise is whatever the smoothing removed: original minus smoothed, point by point
    return np.asarray(original_signal, dtype=float) - np.asarray(removed_signal, dtype=float)

And this is how to use it,

removed_signal = removal(signal, 30)
noise = get(signal, removed_signal)

Look here: in the original time series, the value around 200 on the x-axis is greater than the value around 70-ish. But based on the noise we got from the removal, the noise around 70-ish on the x-axis is greater than the noise around 200.
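Since the plot is not reproduced here, a toy check may help: a straight line plus a +1/-1 zig-zag (made-up data). The sketch re-implements removal on a float copy (removal_sketch is a hypothetical name), smoothing every interior point, and gets the noise as a plain subtraction, which is what get computes element by element. Three things should hold: the first and last points never move, smoothed + noise gives back the original, and the smoothed line wiggles less, measured here by total variation (the sum of absolute step-to-step changes; total_variation is also a hypothetical helper).

```python
import numpy as np

def removal_sketch(signal, repeat):
    smoothed = np.array(signal, dtype=float)
    for _ in range(repeat):
        # every interior point becomes the mean of its neighbours (in place, left to right)
        for i in range(2, len(smoothed)):
            smoothed[i - 1] = (smoothed[i - 2] + smoothed[i]) / 2
    return smoothed

def total_variation(x):
    # sum of absolute step-to-step changes: a simple "how wiggly is it" measure
    return np.abs(np.diff(x)).sum()

t = np.arange(11)
signal = t + (-1.0) ** t        # straight line plus a +1/-1 zig-zag (toy data)
smoothed = removal_sketch(signal, 30)
noise = signal - smoothed       # what get(signal, smoothed) computes element by element

print(smoothed[0], smoothed[-1])                           # endpoints unchanged: 1.0 11.0
print(total_variation(signal), total_variation(smoothed))  # 20.0, then something smaller
```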

ARIMA, Autoregressive Integrated Moving Average

The good thing about ARIMA is that we can use it to forecast the future trend based on the historical trend. It is very classic, and most people don’t know exactly how it works, but they use it daily!

There are 3 important parameters you need to know about ARIMA, ARIMA(p, d, q):

‘p’ for the order (number of time lags).
‘d’ for degree of differencing.
‘q’ for the order of the moving-average.

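In the simplified implementation below, ‘p’ is how many past periods we look back.
‘d’ is the step (skip) used when walking through the differences.
‘q’ is how many periods go into each moving average.

Note that this is a simplified, ARIMA-inspired recurrence: in the standard ARIMA model, d is the number of times the series is differenced before fitting, and the moving-average part is a linear combination of past forecast errors rather than a rolling mean of the prices.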

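In Python, we can write it like this:

import numpy as np

def moving_average(signal, period):
    # unpadded: one mean per window signal[i - period : i], so len(signal) - period values
    buffer = []
    for i in range(period, len(signal)):
        buffer.append(signal[i - period : i].mean())
    return buffer

def auto_regressive(signal, p, d, q, future_count = 10):
    """
    p = the order (number of time lags)
    d = degree of differencing
    q = the order of the moving-average
    """
    buffer = np.copy(signal).tolist()
    for i in range(future_count):
        # moving averages over the last p values (observed or already forecast)
        ma = moving_average(np.array(buffer[-p:]), q)
        # start from the latest value, then for every d-th n subtract
        # buffer[-1 - n] (counted back from the newest) minus ma[n]
        forecast = buffer[-1]
        for n in range(0, len(ma), d):
            forecast -= buffer[-1 - n] - ma[n]
        # the forecast is appended, so the next step can build on it
        buffer.append(forecast)
    return buffer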

Let’s say my moving average period is 3 (q = 3) and d = 1:
t+1 = t - (t-2 - MA[t-1]) - (t-3 - MA[t-2]) - (t-4 - MA[t-3])

Simple as that.

Let’s say I want to forecast 15 periods ahead,

future_count = 15
predicted_15 = auto_regressive(signal,15,1,2,future_count)
predicted_30 = auto_regressive(signal,30,1,2,future_count)

Look at the differences between ARIMA 15 MA and ARIMA 30 MA: ARIMA 15 MA learns from the last 15 days, and ARIMA 30 MA learns from the last 30 days.
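For comparison, here is how the standard reading of p and d works, as a minimal NumPy-only sketch. It is an assumed illustration, not the method above and not any library's API: difference the series d times with np.diff, fit an autoregression on the last p differenced values by ordinary least squares (np.linalg.lstsq), forecast recursively, then undo the differencing with cumulative sums. It deliberately leaves out the q part (the moving average of past forecast errors), so it is really ARIMA(p, d, 0); the name ari_forecast is hypothetical. On a perfectly linear toy series (2t + 5, made up), one round of differencing gives a constant, so the forecast simply continues the line.

```python
import numpy as np

def ari_forecast(signal, p, d, steps):
    """ARIMA(p, d, 0) sketch: difference d times, fit AR(p) by least squares,
    forecast recursively, then undo the differencing. No MA (error) terms."""
    x = np.asarray(signal, dtype=float)
    # remember the last value at each differencing level, to integrate back later
    tails = []
    y = x
    for _ in range(d):
        tails.append(y[-1])
        y = np.diff(y)
    # design matrix: intercept plus the p previous differenced values (newest first)
    rows = [np.r_[1.0, y[k - p:k][::-1]] for k in range(p, len(y))]
    X, target = np.array(rows), y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    # recursive multi-step forecast on the differenced scale
    history = list(y)
    for _ in range(steps):
        lags = np.array(history[-p:][::-1])
        history.append(coef[0] + coef[1:] @ lags)
    forecast = np.array(history[len(y):])
    # undo differencing: cumulative sums anchored at the last value of each level
    for last in reversed(tails):
        forecast = last + np.cumsum(forecast)
    return forecast

t = np.arange(30)
trend = 2.0 * t + 5.0  # perfectly linear toy series: its first difference is constant
print(ari_forecast(trend, p=2, d=1, steps=3))  # continues the line: about 65, 67, 69
```

Here a larger p just means more columns in the least-squares design matrix; adding the q part would require estimating past forecast errors, which is where real ARIMA fitting (usually by maximum likelihood) gets more involved.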

We are done for now!

Yep, that’s all I want to cover for now. Hopefully you can now understand what market analysts are basically telling you; maybe it used to sound like jargon, but no more!

Feel free to contact me at husein.zol05@gmail.com