My New Genetic Algorithm For Time Series


I developed a new algorithm for time series forecasting. It is basically an elimination algorithm that finds the points which best fit both the dataset as a whole and its final points. Then, based on that assumption, the gene with the lowest error on the last known points is selected for forecasting.

Main Idea

Here is the main idea: the algorithm looks for the gene whose first n values have the minimum error against the last n observed data points. Each gene holds 2*n randomly generated values, and the assumption is that if the first n values fit well, the remaining n values will also fit well.
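To make the idea concrete, here is a minimal conceptual sketch (with illustrative names that are not part of the article's code): a gene is scored only on its first n values against the last n observed points, and its second half becomes the forecast if it wins.

from sklearn.metrics import mean_absolute_error

def gene_score(gene, last_n_observed):
    # error of the first n gene values against the last n observed points
    n = len(last_n_observed)
    return mean_absolute_error(gene[:n], last_n_observed)

def gene_forecast(gene, n):
    # the second half of a 2*n-value gene is used as the forecast
    return gene[n:2*n]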

I will walk through the algorithm step by step.

First, you can find the dataset here:

These are the libraries I used:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn import metrics
from scipy import stats
from scipy import spatial
import time
from sklearn.metrics import mean_absolute_error
import random
import math

I mainly use NumPy and pandas.

data = pd.read_csv('monthlyBeer.csv')

I load the data into the data variable.

First, I rename the columns so they are easier to work with.

data.columns = ['Month','beerProduction']

Next, I create the variables used to build the population. Before that, I fill any NA values:

data = data.bfill()

Then come the variables for the population:

beerProduction_mean = data.beerProduction.mean()
beerProduction_DiffMean = data.beerProduction.diff().mean()
# square root of the variance of the differences, i.e. their standard deviation
beerProduction_Diff_Var = (np.var(data.beerProduction.diff()))**(1/2)
nDay = 60

In particular, the nDay variable is used by every function in the genetic algorithm system.

Justify Function

This function adjusts the past data so it becomes similar to the last data points. I do this because the system learns from past data. You can change it to use only the last n points if you prefer.

def justice_data(dataFrame, Series, day_range):
    for i in range(dataFrame.shape[0] - day_range*4):
        # mean of the most recent 2*day_range points
        values_mean = Series[dataFrame.shape[0] - day_range*2:dataFrame.shape[0]].values.mean()
        # shift each earlier window so its mean matches the recent mean
        Series[i:i + day_range*2] = Series[i:i + day_range*2] + (values_mean - Series[i:i + day_range*2].mean())

    return dataFrame

Here I simply shift the other data points by the difference between their mean and the mean of the last points, as illustrated below.
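As a toy illustration (made-up numbers, not the article's data), this is what shifting one window toward the mean of the most recent window looks like:

import numpy as np

window = np.array([90.0, 100.0, 110.0])           # an older window, mean 100
recent_mean = 150.0                               # mean of the last points
shifted = window + (recent_mean - window.mean())  # shift so the mean becomes 150
print(shifted)                                    # [140. 150. 160.]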

Population Creation

This function creates genes whose values are drawn uniformly around the series mean, within a range set by the mean of the differences.

def createPopulation(adet, day_range, mean, diffmean):
    Population = []
    for i in range(adet):
        gen = []
        # each gene holds 2*day_range values drawn uniformly around the series mean
        for j in range(day_range*2):
            gen.append(random.randint(int(mean - diffmean) - 1, int(mean + diffmean) + 1))
        Population.append(gen)

    return Population

adet (Turkish for "count") is the population size; I use 2000 and hold the genes in a Python list.
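The pool itself is referenced later as genHavuzu, but its creation call is not shown in the article; presumably it looks something like this:

# Assumed call (not shown explicitly in the article): build the initial pool
# of 2000 genes, each holding 2*nDay values.
genHavuzu = createPopulation(2000, nDay, beerProduction_mean, beerProduction_DiffMean)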

Training the Population

Before any mutation or crossover, I select the genes that fit the data windows best according to the MAE score.

def train(dataFrame, Series, GenHavuzu, day_range):
    seçili_genler = []

    # slide a 2*day_range window over the data and record every gene that
    # improves on the running minimum MAE for that window
    for i in range(dataFrame.shape[0] - day_range*2):
        values = Series[i:i + day_range*2].values
        min_mae = mean_absolute_error(GenHavuzu[0], values)
        for gen in GenHavuzu:
            mae = mean_absolute_error(gen, values)
            if mae < min_mae:
                min_mae = mae
                seçili_genler.append([GenHavuzu.index(gen), i])

    return seçili_genler

After defining the functions, I adjust the data:

for i in range(100):
    data_train = justice_data(data, data.beerProduction, nDay)

Note that because justice_data modifies the series in place and returns the same DataFrame, data and data_train end up being the same object. You can see the result by plotting:

plt.plot(data_train.beerProduction,color = 'green')
plt.plot(data.beerProduction,color ='red')
plt.ylabel('simulation result of ratios')
plt.show()

I train a pool of 2000 genes for the later steps. Genes that pass the elimination are appended to the pool, so in the end I will select the genes whose index is above 2000.
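The selection result is used below under the name seçili_Genler; the call that produces it is not shown in the article, but presumably it is something like:

# Assumed call (not shown in the article): run the selection step over the pool.
seçili_Genler = train(data, data.beerProduction, genHavuzu, nDay)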

topList = []
value = 0
i = 0
j = 0
# for each window index i, keep the last (best) gene index recorded for it
while i <= len(seçili_Genler):
    try:
        if seçili_Genler[j][1] == i:
            value = seçili_Genler[j][0]
            j += 1
        else:
            topList.append(value)
            print(value)
            i += 1
    except IndexError:
        break

This step collects, for each data window, the index of the best-fitting gene into topList.

For the probability part, I store the values like this:

öncelikliListe, öncelikliListe_counts = np.unique(topList,return_counts=True)
modifiedGen = []
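To see how these counts turn into selection probabilities, here is a toy example (illustrative values, not the article's data):

import numpy as np

topList_example = [3, 3, 7, 3, 12, 7]         # winning gene indices
idx, counts = np.unique(topList_example, return_counts=True)
probs = counts / counts.sum()                 # weights used by np.random.choice
print(idx)    # [ 3  7 12]
print(probs)  # [0.5        0.33333333 0.16666667]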

Then the crossover function comes:

def crossover(Series, day_range):
    global data
    global öncelikliListe
    global öncelikliListe_counts
    global genHavuzu
    global topList
    global modifiedGen
    genHavuzu = np.array(genHavuzu)
    for i in range(data.shape[0] - day_range*4, data.shape[0] - day_range*2):
        öncelikliListe, öncelikliListe_counts = np.unique(topList, return_counts=True)
        values = Series[i:i + day_range*2].values
        run = True
        batch_threshold = 20
        batch = 0
        while run:
            if batch >= batch_threshold:
                run = False
            # pick four parent genes, weighted by how often each was selected
            genHavuzu_Selected = np.random.choice(öncelikliListe, 4, p=öncelikliListe_counts/sum(öncelikliListe_counts))
            oldGen_1_15 = np.random.choice(genHavuzu[genHavuzu_Selected[0]-1], int(day_range/2))
            oldGen_2_15 = np.random.choice(genHavuzu[genHavuzu_Selected[1]-1], int(day_range/2))
            oldGen_3_15 = np.random.choice(genHavuzu[genHavuzu_Selected[2]-1], int(day_range/2))
            oldGen_4_15 = np.random.choice(genHavuzu[genHavuzu_Selected[3]-1], int(day_range/2))
            # child gene: four randomly drawn pieces concatenated to full gene length
            modifiedGen = np.concatenate((oldGen_1_15, oldGen_2_15, oldGen_3_15, oldGen_4_15), axis=None)
            target = mean_absolute_error(modifiedGen, values)
            val_1 = genHavuzu[genHavuzu_Selected[0]-1]
            val_2 = genHavuzu[genHavuzu_Selected[1]-1]
            val_3 = genHavuzu[genHavuzu_Selected[2]-1]
            val_4 = genHavuzu[genHavuzu_Selected[3]-1]
            thr_1 = mean_absolute_error(val_1, values)
            thr_2 = mean_absolute_error(val_2, values)
            thr_3 = mean_absolute_error(val_3, values)
            thr_4 = mean_absolute_error(val_4, values)
            # keep the child only if it beats all four parents on this window
            if target < thr_1 and target < thr_2 and target < thr_3 and target < thr_4:
                print("Completed")
                genHavuzu = np.vstack((genHavuzu, modifiedGen))
                topList.append(len(genHavuzu))
            batch += 1

Here I select four genes from genHavuzu ("gene pool" in Turkish) using the probability weights and splice random pieces of them together; each piece holds day_range/2 values, so the four pieces together match the 2*day_range gene length. If the new gene beats all four parents, I add it to the gene pool.

crossover(data.beerProduction,nDay)

Lastly, the priority list is updated again:

öncelikliListe, öncelikliListe_counts = np.unique(topList,return_counts=True)
modifiedGen = []

Mutation Function

This function mutates a gene: it picks 10 of its values at random and replaces every position holding one of those values with a value drawn from the series.

def mutation_gen(gen, Series):
    # pick 10 values from the gene; every position holding one of them
    # is replaced by a random value drawn from the series
    x = np.random.choice(gen, 10)
    for i in range(len(gen)):
        if gen[i] in x:
            gen[i] = np.random.choice(Series.values)

    return gen

It selects randomly from the series values.
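For example, a single gene from the pool can be mutated like this (a copy is passed in, so the pool itself is not changed):

# Example usage: mutate a copy of the first gene in the pool.
mutated = mutation_gen(list(genHavuzu[0]), data.beerProduction)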

Modification Function

def modification(Series, day_range):
    global data
    global öncelikliListe
    global öncelikliListe_counts
    global genHavuzu
    global topList
    global modifiedGen
    mutated_kromozom = np.zeros(day_range*2)
    nonMutated_kromozom = np.zeros(day_range*2)
    genHavuzu = np.array(genHavuzu)
    for i in range(data.shape[0] - day_range*4, data.shape[0] - day_range*2):
        öncelikliListe, öncelikliListe_counts = np.unique(topList, return_counts=True)
        values = Series[i:i + day_range*2].values
        run = True
        batch_threshold = 100
        batch = 0
        while run:
            if batch >= batch_threshold:
                run = False
            # pick one gene, weighted by selection frequency, and mutate a copy of it
            genHavuzu_Selected = np.random.choice(öncelikliListe, 1, p=öncelikliListe_counts/sum(öncelikliListe_counts))
            mutated_kromozom = mutation_gen(list(genHavuzu[genHavuzu_Selected[0]-1]), Series)
            nonMutated_kromozom = genHavuzu[genHavuzu_Selected[0]-1]
            thr_1 = mean_absolute_error(mutated_kromozom, values)
            org_1 = mean_absolute_error(nonMutated_kromozom, values)
            batch += 1
            # keep the mutant only if it beats the original gene on this window
            if thr_1 < org_1:
                print("Completed")
                genHavuzu = np.vstack((genHavuzu, mutated_kromozom))
                topList.append(len(genHavuzu))

However, I could not properly update öncelikliListe with the new priorities here, so this step still tends to select the genes that survived the first elimination.

modification(data.beerProduction,nDay)

Then I take the genes added after the initial pool, i.e. everything beyond the first 2000:

seçilmişGenler = genHavuzu[2000:len(genHavuzu)]  # seçilmişGenler means "selected genes"

Maximum Fit Selection

This function selects the minimum-error gene against the last n data points, to be used for the final prediction.

def select_the_max(Series, day_range):
    global seçilmişGenler
    seçilmişGenler = list(seçilmişGenler)
    target = Series[data.shape[0] - day_range:data.shape[0]]
    # start with the first selected gene as the current best
    min_mae = mean_absolute_error(seçilmişGenler[0][0:day_range], target)
    lastGen = seçilmişGenler[0]
    for gen in seçilmişGenler:
        mae = mean_absolute_error(gen[0:day_range], target)
        if mae < min_mae:
            min_mae = mae
            lastGen = gen
    return lastGen, min_mae

I store the selected gene in the sonGen variable and its error in Hata ("error" in Turkish):

sonGen,Hata = select_the_max(data.beerProduction,nDay)

Last Value Modification

The selected gene is modified further whenever a mutation fits the data better. Again, all 2*n values of the gene are mutated, but only the first n are checked against the data.

def sonDegerModifikasyon(Series, day_range):
    global sonGen
    Last_gen = np.zeros(day_range*2)
    target = Series[data.shape[0] - day_range:data.shape[0]]
    for i in range(100000):
        # mutate a copy of the best gene; keep the best mutant found so far,
        # judged only on the first day_range values (those aligned with real data)
        mutated_kromozom = mutation_gen(list(sonGen), Series)
        val_1 = sonGen[0:day_range]
        org_1 = mean_absolute_error(val_1, target)
        thr_1 = mean_absolute_error(mutated_kromozom[0:day_range], target)
        if thr_1 < org_1:
            if mean_absolute_error(mutated_kromozom[0:day_range], target) < mean_absolute_error(Last_gen[0:day_range], target):
                Last_gen = mutated_kromozom
                print("Completed")

    return Last_gen

Last Eliminated Gen with Modification

The final gene is produced by this last modification step:

Son_elenmiş_gen = sonDegerModifikasyon(data.beerProduction,nDay)

Plot of the Fitted Last Values

plt.plot(Son_elenmiş_gen,color = 'green')
plt.plot(data['beerProduction'].values[data.shape[0]-nDay:data.shape[0]],color ='red')
plt.ylabel('simulation result of ratios')
plt.show()

The MAE over the last n data points is:

target = data.beerProduction[data.shape[0]-nDay:data.shape[0]]
val_1 = Son_elenmiş_gen[0:nDay]
org_1 = mean_absolute_error(val_1,target)
org_1
12.7552167839006

So the algorithm does a good job of fitting the past data to the last n points.

Last Words

There is actually one step further: testing the forecast against actual data, i.e. holding out the last n data points as a validation set and measuring the real error there. I leave this to you; a rough sketch is given below. Mail me or comment here if you find anything significant.
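As a sketch of what that validation could look like (the helper name run_pipeline is hypothetical and not defined in this article), one could hold out the last nDay points, rerun the whole pipeline on the rest, and score the forecast half of the resulting gene:

# Hedged sketch with assumed names: run_pipeline is a hypothetical function that
# repeats all the steps above on the shortened data and returns the final gene.
def validate_last_window(data, nDay, run_pipeline):
    holdout = data.beerProduction.values[-nDay:]        # the actual "future" values
    final_gene = run_pipeline(data.iloc[:-nDay], nDay)  # gene of length 2*nDay
    forecast = final_gene[nDay:2*nDay]                  # second half = forecast
    return mean_absolute_error(holdout, forecast)

Thanks for reading.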
