Game experients

In [34]:
import numpy as np
import matplotlib.pyplot as plt
In [36]:
plt.rcParams['figure.figsize'] = (12.0, 8.0)

Just some fun with binary conversions

In [2]:
def IntToBin(number):
    
    # this function returns this list
    out = []
    
    run = number 
    while(run > 0):
        run , resto = divmod(run,2)
        out.insert(0,resto)
        
    return out    
In [3]:
def BinToInt(lista):
    
    out = 0 
    expoente = len(lista) - 1
    
    for i in range(len(lista)):
        out += lista[i] * (2**expoente)
        expoente -= 1
    
    return out    

One common operation between numbers, specially simple if they are already in binary format, is the xor!

In [4]:
def XorLists(lista_a , lista_b):
    
    # we copy the lists and help_a will be the longer one        
    if len(lista_a) > len(lista_b):
        help_a = lista_a
        help_b = lista_b
    else :
        help_a = lista_b
        help_b = lista_a
        
    # we add 0's in the front of the shorter until both have equal length    
    while (len(help_b) < len(help_a)):
        help_b.insert(0,0)
        
    # now we perform the xor over the correspondent elements of both lists
    return list(map(lambda x , y : x ^ y , help_a , help_b))        
In [5]:
lista100 = IntToBin(100)
lista12343 = IntToBin(12343)

xorlista = XorLists(lista100 , lista12343)
BinToInt(xorlista)
Out[5]:
12371

What we have done corresponds indeed to the usual concept of xor between two integers in Python.

In [6]:
print(100 ^ 12343)
print(12343 ^ 100)
12371
12371

Application :: How fast can we make powers of integers?

In [7]:
expoente = IntToBin(100)
expoente
Out[7]:
[1, 1, 0, 0, 1, 0, 0]
In [8]:
3**BinToInt(expoente)
Out[8]:
515377520732011331036461129765621272702107522001
In [9]:
base = 3
resultado = 1

for i in range(len(expoente)- 1 , -1 , -1 ):
    if (expoente[i] == 1):
        resultado *= base
    base *= base
    print(base , resultado)
9 1
81 1
6561 81
43046721 81
1853020188851841 81
3433683820292512484657849089281 150094635296999121
11790184577738583171520872861412518665678211592275841109096961 515377520732011331036461129765621272702107522001

What numbers are given as signed powers of 2?

In [50]:
def signedCodification(x):
    out = -1
    if (x > 0.5):
        out = 1
    return int(out)
In [51]:
cumsum = 0 

ngames = 2000
game_length = 10
values = []

for i in range(ngames):
    cumsum = 0
    for j in range(game_length):
        cumsum += signedCodification(random.random())*2**j
        values.append(cumsum) 
    
after_values = list(set([abs(x) for x in values]))
after_values.sort()
In [52]:
print(after_values)
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399, 401, 403, 405, 407, 409, 411, 413, 415, 417, 419, 421, 423, 425, 427, 429, 431, 433, 435, 437, 439, 441, 443, 445, 447, 449, 451, 453, 455, 457, 459, 461, 463, 465, 467, 469, 471, 473, 475, 477, 479, 481, 483, 485, 487, 489, 491, 493, 495, 497, 499, 501, 503, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529, 531, 533, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, 563, 565, 567, 569, 571, 573, 575, 577, 579, 581, 583, 585, 587, 589, 591, 593, 595, 597, 599, 601, 603, 605, 607, 609, 611, 613, 615, 617, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 681, 683, 685, 687, 689, 691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 815, 817, 819, 821, 823, 825, 827, 829, 831, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 891, 893, 895, 897, 899, 901, 903, 905, 907, 909, 911, 913, 915, 917, 919, 921, 923, 925, 927, 929, 931, 933, 935, 937, 939, 941, 943, 945, 947, 949, 951, 953, 955, 957, 959, 961, 963, 965, 967, 969, 971, 973, 975, 977, 979, 981, 983, 985, 987, 989, 991, 993, 995, 997, 999, 1001, 1003, 1005, 1007, 1009, 1011, 1013, 1015, 1017, 1019, 1021, 1023]

Is indeed very easy to see that we can get any odd integer using this method. Basically every odd number, because is an integer, has a binary expansion with the last term always equal to 1.

In [53]:
IntToBin(19)
Out[53]:
[1, 0, 0, 1, 1]
In [54]:
BinToInt([1,1,-1,-1,1])
Out[54]:
19
In [55]:
IntToBin(245)
Out[55]:
[1, 1, 1, 1, 0, 1, 0, 1]
In [57]:
BinToInt([1, 1, 1, 1, 1, -1, 1, -1])
Out[57]:
245

\textbf{Question:} can you generalize the idea of this algorithm and from that generalization create a signed binary expansion for any odd integer?

Double or Nothing or Half or Nothing

In the first version of the game we assume that the number of draws is fixed. Suppose that $n=10$.

In [10]:
def HalfGameRun(ndraws):
    
    def codification(x):
        if (x > 0.5): 
            return 1 
        else: 
            return -1 
    
    tmp = np.array(list(map(lambda x : codification(x) , np.random.uniform(size=ndraws))))
    
    out = np.multiply(np.array([1.0/2**i for i in range(ndraws)]),tmp)  

    return out

def DoubleGameRun(ndraws):
    
    def codification(x):
        if (x > 0.5): 
            return 1 
        else: 
            return -1 
    
    tmp = np.array(list(map(lambda x : codification(x) , np.random.uniform(size=ndraws))))
    
    out = np.multiply(np.array([2**i for i in range(ndraws)]),tmp)  

    return out
In [11]:
np.sum(DoubleGameRun(100))
Out[11]:
-297119446206652438270140514331
In [12]:
ndraws = 100
nsimul = 100000

simulation = np.array([])

for i in range(nsimul):
    simulation = np.append(simulation,np.sum(DoubleGameRun(ndraws)))
In [13]:
print(np.mean(simulation))
print(np.std(simulation))
4.724352561605003e+26
7.329813234206436e+29

Get Even Game

This game is very simple and it is based in a simple repetition of Heads and Tails. You start with €1, you choose one of the possible results (we will suppose Heads) and you have to pay €1 each time you make a play. If the result of the draw is Heads you receive €2, otherwise you receive nothing. The game stops when you run out of cash. Wins the player who gets a longer series.

After $n$ draws, if positive, your cash will be $2S_n-n$, where $S_n=X_0+\dots+X_{n-1}$ and $X_k=1$ if the result is Heads and $0$ otherwise. Again, we start by generating a random number in the interval $[0,1)$ and convert it to $0$ or $1$.

In [14]:
def codification(x):
    if (x > 0.5): 
        return 1 
    else: 
        return 0 

Because we do not know in advance how many numbers we will need to generate, the most efficient method is to use the \textbf{random} library.

In [15]:
import random

One simple experiment of the game will be given by the following code.

In [16]:
ndraws = 1
currentvalue = 2*codification(random.random()) - 1

while(currentvalue != 0):
    currentvalue += 2*codification(random.random()) - 1
    ndraws += 1
    
print("The number of draws until we comeback to zero: %d" %ndraws)
The number of draws until we comeback to zero: 46

Putting everything in a cycle we will simulate the game several times.

In [63]:
nsimuls = 10000
resultados = []

for i in range(nsimuls):
    currentvalue = 2 * codification(random.random()) - 1
    ndraws = 1
    
    while (currentvalue != 0):
        currentvalue += 2 * codification(random.random()) - 1
        ndraws += 1
    
    resultados.append(ndraws)
In [64]:
print(min(resultados),max(resultados))
2 118232978
In [19]:
# in order the algorithm works perfectly you need to sort the list
resultados.sort()

array_resultados = np.array([[],[]],dtype=np.int64).transpose()

number = resultados[0]
count = 1 ;

for i in range(0,len(resultados)):
    if (resultados[i]==number):
        count += 1
    else :
        array_resultados = np.append(array_resultados,[[number,count]],axis = 0)
        number = resultados[i]
        count = 1
In [65]:
print(array_resultados)
[[        2      4978]
 [        4      1266]
 [        6       636]
 ...
 [  5997378         1]
 [ 11384444         1]
 [125943938         1]]
In [21]:
# array_resultados = np.array([[],[]],dtype=np.int64).transpose()

# for k in range(min(resultados),max(resultados)+1):
#     count_k = resultados.count(k)
#     #len([i for i, j in enumerate(resultados) if j == k])
#     if (count_k > 0):
#         array_resultados = np.append(array_resultados,[[k,count_k]],axis = 0)
        
# print(array_resultados.shape)
In [66]:
tamanho = int(array_resultados.shape[0]/10)
coord_x = array_resultados[0:tamanho,0]
coord_y = (1.0/nsimuls) * array_resultados[0:tamanho,1]
# the series to which we want to make a comparison
comp_y = np.array([1.0/i**2 for i in coord_x])

plt.figure(figsize=(12,8))
plt.plot(coord_x,coord_y)
plt.plot(coord_x,comp_y)
plt.show()

Reading Results From File

In [23]:
file_results = open("GetEvenSimulations.txt" , "r")
In [24]:
resultados = np.array([int(x) for x in file_results])
In [25]:
resultados[0:10]
Out[25]:
array([ 4,  6,  2,  2,  2,  6, 50, 76,  2,  6])
In [26]:
print(min(resultados),max(resultados))
2 1584856
In [27]:
m = max(resultados)
[i for i, j in enumerate(resultados) if j == m]
Out[27]:
[1316]
In [28]:
resultados[30928]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-28-95d556328dcc> in <module>
----> 1 resultados[30928]

IndexError: index 30928 is out of bounds for axis 0 with size 1695
In [ ]:
bignumbers = [x for x in resultados if x > 1000000]
In [ ]:
print(len(bignumbers))
In [ ]:
 

Online Algorithm to Compute the mean

Suppose that we have a sample, call it $(x_i)$ with $i\in\{0,\dots,N-1\}$, and we want to estimate its mean value, call it mu. The usual method to do it is to use the sample mean from which we obtain\

\begin{equation}\label{eq:sgd.10} \hat\mu_N = \frac{1}{N}\sum_{i=0}^{N-1} x_i. \end{equation}

Plain and simple. Now suppose that $N$ is very large, really, really large and you have a old, really, really old computer. Your computer is so old that it does not have enough memmory to load the array on it (and you do not know how to do incremental mean). Because your sample is really big, it comes to you that perhaps you do not need to compute the sum of all elements. The idea is that under a certain degree of accuracy

$$ \hat\mu_N = \hat\mu_{N-1} = \dots = \hat\mu_{N-p}. $$

If $N$ is big then $p$ may be also big. Indeed, for $N=0,\dots,k$ we easily derive the incremental mean function as

\begin{equation}\label{eq:sgd.30} \hat\mu_N = \left(\frac{N}{N+1}\right) \hat\mu_{N-1} + \left(\frac{1}{N+1}\right)x_N. \end{equation}

The last recursive relation is translated into python using the next simple code.

In [30]:
def incremental_mean(oldmean , newvalue , npts):
    out = (npts/(npts + 1.0)) * oldmean + newvalue / (npts + 1.0)
    return out  

We will test this function using a very well known distribution...

In [31]:
# we will generate a gaussian deviates (more on this in the future) with the following parameters
data_mean = 2 
data_std = 3
data_size = 5000

data = data_mean + data_std * np.random.randn(data_size)
print(data.mean())
1.9922503525996107

We now use the incremental mean to compute the (online) version of mean computation.

In [32]:
media = 0
comprimento = data.size
cummedia = np.empty(comprimento,float)

for i in range(cummedia.size):
    newvalue = data[i]
    # the value of npts must be in accordance with the expression in incremental_mean
    npts = i 
    
    cummedia[i] = incremental_mean(media , newvalue , npts)
    media = cummedia[i]
    
print('The mean value of the array computed by an online algorithm is %.14f.\n' %cummedia[cummedia.size - 1])
The mean value of the array computed by an online algorithm is 1.99225035259961.

Because we have stored the incremental results in an array we can compute the evolution of the mean.

In [37]:
steps = np.arange(data.size)
exact_mean = data.mean() * np.ones_like(steps)

plt.plot(steps, cummedia,label='Incremental')
plt.plot(steps , exact_mean , label = 'Exact Value')
plt.legend(('Incremental', 'Exact Value'),loc='upper right')
plt.show()

The natural question to ask is: given $\varepsilon>0$, what would be the magnitude of $x_N$ such that $|\hat\mu_N-\hat\mu_{N-1}|>\varepsilon$? Straightforward computations show that

\begin{equation}\label{eq:sgd.40} |x_N-\hat\mu_{N-1}|>(N+1)\cdot\varepsilon. \end{equation}

Suppose that you computed the mean value of the array. From the observed values of the array you can forecast if it is probable that the first element that you have left will verify equation (\ref{eq:sgd.40}). For that we will compute the \textbf{Empirical Cumulative Distribution Function}.

One particular aspect that will be in the core idea of \textbf{Kolmogorov-Smirnov Test} is how the results that we have found adjust to the cumulative distribution function of the random variable that we are expecting to find.

In [44]:
# the code of an empirical cumulative distributioin function
def my_ecdf(in_array):
    cumulation = np.cumsum(np.ones_like(in_array))/in_array.size
    return cumulation
In [45]:
data_ecdf = my_ecdf(data)
coord_x = np.sort(data)

plt.plot(coord_x,data_ecdf)
plt.show()

Certainly, if the size of our sample gets smaller the fit tend to get worst.

In [49]:
data2_size = 24
data2 = data_mean + data_std * np.random.randn(data2_size)
data2_ecdf = my_ecdf(data2)
coord_x = np.sort(data2)

plt.plot(coord_x,data2_ecdf)
plt.show()

Online algorithm for the variance

Let $S_N$ be the sum of the square errors of $x_k$ in relation to the mean ${\hat\mu}_N$,

$$%\label{eq:sgd.50} S_N=\sum_{k=0}^{N-1}(x_k-\hat\mu_N)^2. $$

Straightforward computations show that $S_N$ is given by

$$\label{eq:sgd.60} S_N=\sum_{k=0}^{N-1}x_k^2 - N\cdot\hat\mu_N^2. $$

This leads to the recurrent relation over $S_N$

\begin{equation}\label{eq:sgd.70} S_N - S_{N-1} = x_N^2-\hat\mu_{N-1}^2-N(\hat\mu_{N}^2-\hat\mu_{N-1}^2) \end{equation}

From equations (\ref{eq:sgd.30}) and (\ref{eq:sgd.70}) we can deduce a much more simple expression for the recurrence

\begin{equation}\label{eq:sgd.80} S_N = S_{N-1} + (x_N-\hat\mu_{N-1})(x_N-\hat\mu_{N}). \end{equation}

The expression (\ref{eq:sgd.70}) would be useful to compute a \textbf{stability interval} for the next value of the time series, while expression (\ref{eq:sgd.80}) the essential tool the \textit{online computation} of the variance. Because the former is more challenging, we start with the latter. Keep in mind that to compute $S_N$ we need three values: $x_N$, $\mu_{N-1}$ and $\mu_N$.

In [58]:
def incremental_sn (oldmu , newmu , olds , newvalue):
    out = olds + (newvalue - newmu) * (newvalue - oldmu)
    return out
In [69]:
vararray = np.zeros(len(resultados))

mu_first = resultados[0]
mu_second = incremental_mean(mu_first, resultados[1] , 1)

s_old = 0
s_new = incremental_sn(mu_first , mu_second , s_old , resultados[1])

vararray[0] = s_new

for i in range(1 , len(resultados)):
    
    mu_second = incremental_mean(mu_first , resultados[i] , i)
    s_new = incremental_sn(mu_first , mu_second , s_old , resultados[i])
        
    vararray[i] = s_new/(i+1)
    
    s_old = s_new
    mu_first = mu_second

print("The mean of the process is %f.\n" % mu_second)
print("The comparison of the variance computed by both methods:")
print(s_new/(len(resultados)),np.var(resultados))
The mean of the process is 13948.308800.

The comparison of the variance computed by both methods:
1403305203619.055 1403305203619.0552
In [74]:
limit = int(vararray.size/2)
t = np.arange(vararray[0:limit].size)

plt.plot(t,vararray[0:limit])
plt.show()
In [ ]: