rl_data.log
random action
updating q value for state action [80, 1]
Game End Reward 9.343841360766286
random action
updating q value for state action [60, 2]
Game End Reward 39.55378395392297
random action
updating q value for state action [40, 3]
Game End Reward 9.171866128855822
random action
updating q value for state action [20, 3]
Game End Reward 21.51351721025024
greedy action
updating q value for state action [120, 6]
Game End Reward 25.97829540343922
random action
updating q value for state action [100, 3]
Game End Reward 12.642061703331333
random action
updating q value for state action [0, 3]
Game End Reward 24.0739131507851
random action
updating q value for state action [80, 1]
Game End Reward 9.77947204108604
greedy action
updating q value for state action [60, 2]
Game End Reward 10.291875822441826
random action
updating q value for state action [40, 5]
Game End Reward 26.073436509262574
random action
updating q value for state action [20, 6]
Game End Reward 26.111856563287922
random action
updating q value for state action [120, 4]
Game End Reward 20.73612301738786
random action
updating q value for state action [100, 5]
Game End Reward 38.50634490474235
random action
updating q value for state action [0, 5]
Game End Reward 21.47826750436422
greedy action
updating q value for state action [80, 1]
Game End Reward 5.739564362360942
random action
updating q value for state action [60, 2]
Game End Reward 8.723105936252374
random action
updating q value for state action [40, 1]
Game End Reward 6.57878944265639
random action
updating q value for state action [20, 6]
Game End Reward 33.45085540847236
random action
updating q value for state action [120, 2]
Game End Reward 8.701748547923351
random action
updating q value for state action [100, 6]
Game End Reward 25.635421778247775
random action
updating q value for state action [0, 4]
Game End Reward 20.377970843248697
random action
updating q value for state action [80, 6]
Game End Reward 25.945856382451446
random action
updating q value for state action [60, 2]
Game End Reward 13.346742576890506
random action
updating q value for state action [40, 5]
Game End Reward 24.583667171730816
random action
updating q value for state action [20, 6]
Game End Reward 27.43310764239817
random action
updating q value for state action [120, 6]
Game End Reward 26.735366793536237
random action
updating q value for state action [100, 4]
Game End Reward 11.87373495758947
random action
updating q value for state action [0, 2]
Game End Reward 20.796053300278896
greedy action
updating q value for state action [80, 6]
Game End Reward 37.67003492592088
random action
updating q value for state action [60, 4]
Game End Reward 23.79980936156216
greedy action
updating q value for state action [40, 5]
Game End Reward 16.492668394450746
greedy action
updating q value for state action [20, 6]
Game End Reward 27.071860908778245
random action
updating q value for state action [120, 4]
Game End Reward 28.980359785091927
random action
updating q value for state action [100, 1]
Game End Reward 35.4792182296936
random action
random action
updating q value for state action [80, 3]
Game End Reward 24.20579526448255
random action
updating q value for state action [60, 4]
Game End Reward 25.3143829085635
random action
updating q value for state action [40, 4]
Game End Reward 22.07637811064836
random action
updating q value for state action [20, 5]
Game End Reward 22.336656682670984
random action
updating q value for state action [120, 4]
Game End Reward 25.02488673335916
greedy action
updating q value for state action [100, 5]
Game End Reward 28.54866883419926
random action
updating q value for state action [0, 4]
Game End Reward 20.007031050626903
random action
updating q value for state action [80, 1]
Game End Reward 31.30026857865581
random action
updating q value for state action [60, 4]
Game End Reward 30.207075977036002
random action
updating q value for state action [40, 3]
Game End Reward 27.978445237213187
random action
updating q value for state action [20, 3]
Game End Reward 25.77975788557736
random action
updating q value for state action [120, 4]
Game End Reward 30.39988652547522
random action
updating q value for state action [100, 1]
Game End Reward 31.79688648873258
random action
updating q value for state action [0, 1]
Game End Reward 27.180126896201344
random action
updating q value for state action [80, 3]
Game End Reward 31.203872771374176
random action
updating q value for state action [60, 4]
Game End Reward 29.760871304966045
random action
updating q value for state action [40, 5]
Game End Reward 28.837195699871014
random action
updating q value for state action [20, 6]
Game End Reward 26.944549167163768
random action
updating q value for state action [120, 5]
Game End Reward 17.376569794205867
random action
updating q value for state action [100, 4]
Game End Reward 29.92905503742468
random action
updating q value for state action [0, 5]
Game End Reward 25.210638430874724
greedy action
updating q value for state action [80, 6]
Game End Reward 30.109700824071833
random action
updating q value for state action [60, 3]
Game End Reward 29.14433095146153
random action
updating q value for state action [40, 3]
Game End Reward 29.083397346197465
random action
updating q value for state action [20, 5]
Game End Reward 24.905293284846262
random action
updating q value for state action [120, 5]
Game End Reward 30.356353527545668
random action
updating q value for state action [100, 6]
Game End Reward 30.837819200314826
greedy action
updating q value for state action [0, 5]
Game End Reward 24.58165570125314
random action
updating q value for state action [80, 6]
Game End Reward 29.390692505965927
random action
updating q value for state action [60, 6]
Game End Reward 23.807046573787385
greedy action
updating q value for state action [40, 5]
Game End Reward 20.171595693181338
random action
updating q value for state action [20, 1]
Game End Reward 23.19160240161582
random action
updating q value for state action [120, 2]
Game End Reward 26.378424476952947
random action
updating q value for state action [100, 5]
Game End Reward 29.145441237004597
random action
updating q value for state action [0, 4]
Game End Reward 20.128971572407934
random action
updating q value for state action [80, 2]
Game End Reward 22.40520411676025
greedy action
updating q value for state action [60, 4]
Game End Reward 27.870844837136648
random action
updating q value for state action [40, 4]
Game End Reward 29.25228552532381
greedy action
updating q value for state action [20, 6]
Game End Reward 26.097427601287087
greedy action
updating q value for state action [120, 4]
Game End Reward 30.207829857879794
random action
updating q value for state action [100, 6]
Game End Reward 29.7040651947978
greedy action
updating q value for state action [0, 5]
Game End Reward 23.40789571349562
greedy action
updating q value for state action [80, 6]
Game End Reward 30.13456244480689
greedy action
updating q value for state action [60, 4]
Game End Reward 29.560664153409366
random action
updating q value for state action [40, 2]
Game End Reward 28.819391859280312
random action
updating q value for state action [20, 5]
Game End Reward 27.380797527532224
greedy action
updating q value for state action [120, 4]
Game End Reward 26.681551799725746
random action
random action
updating q value for state action [80, 6]
Game End Reward 34.432929097311174
random action
updating q value for state action [60, 1]
Game End Reward 42.12575528414885
random action
updating q value for state action [40, 4]
Game End Reward 35.20371037581967
random action
updating q value for state action [20, 5]
Game End Reward 33.17678539143012
random action
updating q value for state action [120, 6]
Game End Reward 40.037907321827014
random action
updating q value for state action [100, 6]
Game End Reward 40.22780601072451
greedy action
updating q value for state action [0, 5]
Game End Reward 28.298495781508805
random action
updating q value for state action [80, 1]
Game End Reward 24.423559463168363
random action
updating q value for state action [60, 3]
Game End Reward 36.16093597733551
random action
updating q value for state action [40, 6]
Game End Reward 36.48251510288803
random action
updating q value for state action [20, 1]
Game End Reward 30.958864396687893
random action
updating q value for state action [120, 6]
Game End Reward 40.8349482441104
random action
updating q value for state action [100, 6]
Game End Reward 38.93334588146719
random action
updating q value for state action [0, 6]
Game End Reward 28.54811956125932
random action
updating q value for state action [80, 1]
Game End Reward 40.23895502893038
random action
updating q value for state action [60, 3]
Game End Reward 36.795828399826675
random action
updating q value for state action [40, 5]
Game End Reward 33.94940433827703
random action
updating q value for state action [20, 2]
Game End Reward 29.724681400928116
random action
updating q value for state action [120, 6]
Game End Reward 37.75093865807377
random action
updating q value for state action [100, 6]
Game End Reward 39.43881577244646
random action
updating q value for state action [0, 6]
Game End Reward 28.093033847843778
random action
updating q value for state action [80, 6]
Game End Reward 37.57860097180023
random action
updating q value for state action [60, 1]
Game End Reward 38.286010489230044
random action
updating q value for state action [40, 4]
Game End Reward 33.43944597922111
random action
updating q value for state action [20, 5]
Game End Reward 32.02358274995137
random action
updating q value for state action [120, 6]
Game End Reward 39.18165309247461
random action
updating q value for state action [100, 1]
Game End Reward 38.708691230023035
random action
updating q value for state action [0, 1]
Game End Reward 35.744177504616836
random action
updating q value for state action [80, 3]
Game End Reward 36.56859826012678
random action
updating q value for state action [60, 5]
Game End Reward 35.59822284752898
random action
updating q value for state action [40, 1]
Game End Reward 37.5184032020784
random action
updating q value for state action [20, 1]
Game End Reward 33.25243653825826
greedy action
updating q value for state action [120, 6]
Game End Reward 39.07417228995576
greedy action
updating q value for state action [100, 6]
Game End Reward 37.656402872991094
random action
updating q value for state action [0, 2]
Game End Reward 30.159964219097624
random action
updating q value for state action [80, 3]
Game End Reward 35.811202709578446
random action
updating q value for state action [60, 1]
Game End Reward 40.052408724935596
random action
updating q value for state action [40, 3]
Game End Reward 36.97482319040136
random action
updating q value for state action [20, 3]
Game End Reward 31.323829300099682
random action
updating q value for state action [120, 5]
Game End Reward 29.977208591102315
random action
updating q value for state action [100, 5]
Game End Reward 38.76876839830445
greedy action
updating q value for state action [0, 5]
Game End Reward 26.61938060162802
random action
updating q value for state action [80, 3]
Game End Reward 39.08538674142474
random action
updating q value for state action [60, 3]
Game End Reward 37.03510534092006
random action
updating q value for state action [40, 5]
Game End Reward 35.920255685123465
random action
updating q value for state action [20, 2]
Game End Reward 34.01793117192189
greedy action
updating q value for state action [120, 6]
Game End Reward 38.97113655464847
random action
updating q value for state action [100, 4]
Game End Reward 36.68114754382258
random action
updating q value for state action [0, 1]
Game End Reward 40.07092632222562
random action
updating q value for state action [80, 5]
Game End Reward 37.338657965781145
random action
updating q value for state action [60, 6]
Game End Reward 36.94025728990717
greedy action
updating q value for state action [40, 5]
Game End Reward 34.894276912487975
random action
updating q value for state action [20, 2]
Game End Reward 35.75050735625988
greedy action
updating q value for state action [120, 6]
Game End Reward 39.5520992561246
greedy action
updating q value for state action [100, 6]
Game End Reward 38.851317543217505
random action
updating q value for state action [0, 3]
Game End Reward 31.180405791671404
greedy action
updating q value for state action [80, 6]
Game End Reward 39.42797796230083
random action
updating q value for state action [60, 6]
Game End Reward 38.04669652079543
random action
updating q value for state action [40, 5]
Game End Reward 33.10071644304248
random action
updating q value for state action [20, 5]
Game End Reward 32.62783867071031
greedy action
updating q value for state action [120, 6]
Game End Reward 39.356727302139326
greedy action
updating q value for state action [100, 6]
Game End Reward 39.01816854829009
random action
updating q value for state action [0, 5]
Game End Reward 28.576361281496272
greedy action
updating q value for state action [80, 6]
Game End Reward 38.4433779389518
greedy action
updating q value for state action [60, 4]
Game End Reward 36.8179987209884
greedy action
updating q value for state action [40, 5]
Game End Reward 36.54985873487747
greedy action
updating q value for state action [20, 5]
Game End Reward 33.93393621541946
random action
updating q value for state action [120, 4]
Game End Reward 39.69728580366836
greedy action
updating q value for state action [100, 6]
Game End Reward 39.70001285770568
random action
updating q value for state action [0, 2]
Game End Reward 28.985524873638077
random action
updating q value for state action [80, 1]
Game End Reward 39.64811415118043
random action
updating q value for state action [60, 5]
Game End Reward 33.19554570417451
random action
updating q value for state action [40, 4]
Game End Reward 36.01958423199125
random action
updating q value for state action [20, 5]
Game End Reward 32.87626663401947
greedy action
updating q value for state action [120, 6]
Game End Reward 39.17071485151373
random action
updating q value for state action [100, 6]
Game End Reward 39.202889961795144
greedy action
updating q value for state action [0, 5]
Game End Reward 29.54944241270548
greedy action
updating q value for state action [80, 6]
Game End Reward 36.67599176659837
random action
updating q value for state action [60, 3]
Game End Reward 32.10898447765345
greedy action
updating q value for state action [40, 5]
Game End Reward 36.490635886329876
greedy action
updating q value for state action [20, 5]
Game End Reward 33.60016662680288
random action
updating q value for state action [120, 4]
Game End Reward 39.308886539614264
greedy action
updating q value for state action [100, 6]
Game End Reward 38.44247925738107
greedy action
updating q value for state action [0, 5]
Game End Reward 27.827213094959287
random action
updating q value for state action [100, 4]
Game End Reward 19.806996532648824
greedy action
updating q value for state action [699.3157867655627, 6]
Game End Reward 18.00439053961515
greedy action
updating q value for state action [699.3157867655627, 6]
Game End Reward 13.414064051444504
random action
updating q value for state action [836.2510309503735, 6]
Game End Reward 38.50527733826115
random action
updating q value for state action [699.3157867655627, 1]
Game End Reward 29.04569700213011
random action
updating q value for state action [584.8035476425734, 4]
Game End Reward 29.999187133880195
random action
updating q value for state action [489.04256961953797, 4]
Game End Reward 29.84549245283512
random action
updating q value for state action [408.96235302295804, 4]
Game End Reward 28.84531293476801
random action
updating q value for state action [341.9951893353393, 2]
Game End Reward 29.626787015138916
random action
updating q value for state action [285.99382966174574, 4]
Game End Reward 23.870131875524635
random action
updating q value for state action [239.16263490008038, 3]
Game End Reward 22.363048088065316
greedy action
updating q value for state action [200.00000000000003, 6]
Game End Reward 27.202610970202407
random action
updating q value for state action [180, 6]
Game End Reward 23.841481096250405
random action
updating q value for state action [160, 1]
Game End Reward 26.842176615648285
random action
updating q value for state action [140, 1]
Game End Reward 25.86421616861459
random action
updating q value for state action [1000.0, 2]
Game End Reward 32.428821326838985
random action
updating q value for state action [836.2510309503735, 3]
Game End Reward 17.074507035071065
random action
updating q value for state action [699.3157867655627, 5]
Game End Reward 18.34999937469025
random action
updating q value for state action [584.8035476425734, 1]
Game End Reward 16.37890164033055
random action
updating q value for state action [489.04256961953797, 6]
Game End Reward 18.34348175134539
random action
updating q value for state action [408.96235302295804, 6]
Game End Reward 16.32299595174137
greedy action
updating q value for state action [341.9951893353393, 2]
Game End Reward 23.31607357307069
random action
updating q value for state action [285.99382966174574, 1]
Game End Reward 17.469664385725086
random action
updating q value for state action [239.16263490008038, 3]
Game End Reward 20.005411976366645
random action
updating q value for state action [200.00000000000003, 6]
Game End Reward 17.69879600630414
random action
updating q value for state action [180, 3]
Game End Reward 19.506276410795305
random action
updating q value for state action [160, 5]
Game End Reward 19.782007608985634
greedy action
updating q value for state action [140, 1]
Game End Reward 15.7799327553571
random action
updating q value for state action [1000.0, 3]
Game End Reward 19.91889447979771
random action
updating q value for state action [836.2510309503735, 5]
Game End Reward 16.743952707608678
greedy action
updating q value for state action [699.3157867655627, 1]
Game End Reward 16.003764138686183
random action
updating q value for state action [584.8035476425734, 6]
Game End Reward 18.920743925908898
greedy action
updating q value for state action [489.04256961953797, 4]
Game End Reward 18.828001759410096
random action
updating q value for state action [408.96235302295804, 2]
Game End Reward 17.39678237016604
random action
updating q value for state action [341.9951893353393, 2]
Game End Reward 18.60084815705456
random action
updating q value for state action [285.99382966174574, 3]
Game End Reward 18.662900307982653
greedy action
updating q value for state action [239.16263490008038, 3]
Game End Reward 19.646569667263467
random action
updating q value for state action [200.00000000000003, 1]
Game End Reward 17.15608413371681
greedy action
updating q value for state action [180, 6]
Game End Reward 17.01428572517141
random action
updating q value for state action [160, 5]
Game End Reward 16.38535842317867
greedy action
updating q value for state action [140, 1]
Game End Reward 20.238056933645034
random action
updating q value for state action [1000.0, 1]
Game End Reward 17.2255590655202
random action
updating q value for state action [836.2510309503735, 3]
Game End Reward 19.15645545792034
greedy action
updating q value for state action [699.3157867655627, 1]
Game End Reward 19.149821050398586
random action
updating q value for state action [584.8035476425734, 3]
Game End Reward 22.527344785723763
greedy action
updating q value for state action [489.04256961953797, 4]
Game End Reward 18.397330257528917
random action
updating q value for state action [408.96235302295804, 4]
Game End Reward 18.98634197110436
random action
updating q value for state action [341.9951893353393, 1]
Game End Reward 16.277432568151507
random action
updating q value for state action [285.99382966174574, 2]
Game End Reward 21.690971999144708
greedy action
updating q value for state action [239.16263490008038, 3]
Game End Reward 19.349802193868175
greedy action
updating q value for state action [200.00000000000003, 6]
Game End Reward 18.329524259268197
greedy action
updating q value for state action [180, 6]
Game End Reward 17.609010391688432
random action
updating q value for state action [160, 1]
Game End Reward 20.20931924836193
random action
updating q value for state action [140, 4]
Game End Reward 16.351481860210182
greedy action
updating q value for state action [1000.0, 2]
Game End Reward 20.23285231701408
random action
updating q value for state action [836.2510309503735, 4]
Game End Reward 15.029651898524047
random action
updating q value for state action [699.3157867655627, 1]
Game End Reward 16.89065586470211
random action
updating q value for state action [584.8035476425734, 1]
Game End Reward 13.181009810477478
greedy action
updating q value for state action [489.04256961953797, 4]
Game End Reward 17.996216380301547
greedy action
updating q value for state action [408.96235302295804, 4]
Game End Reward 21.355771684764736
random action
updating q value for state action [341.9951893353393, 1]
Game End Reward 17.770802783904028
random action
updating q value for state action [285.99382966174574, 3]
Game End Reward 23.714839817658593
random action
updating q value for state action [239.16263490008038, 2]
Game End Reward 24.00564784366897
greedy action
updating q value for state action [200.00000000000003, 6]
Game End Reward 17.67492268512957
greedy action
updating q value for state action [180, 6]
Game End Reward 23.174198079588912
random action
updating q value for state action [160, 1]
Game End Reward 16.89619924497378
random action
updating q value for state action [140, 2]
Game End Reward 19.286547369466007
greedy action
updating q value for state action [1000.0, 2]
Game End Reward 23.82112992552769
random action
updating q value for state action [836.2510309503735, 1]
Game End Reward 16.950307190697956
greedy action
updating q value for state action [699.3157867655627, 1]
Game End Reward 16.776805016210833
random action
updating q value for state action [584.8035476425734, 5]
Game End Reward 20.39770287157264
random action
updating q value for state action [489.04256961953797, 2]
Game End Reward 21.093829773782183
random action
updating q value for state action [408.96235302295804, 1]
Game End Reward 20.94548230734246
random action
updating q value for state action [341.9951893353393, 4]
Game End Reward 20.366265843146277
greedy action
updating q value for state action [285.99382966174574, 3]
Game End Reward 21.9254186881289
random action
updating q value for state action [239.16263490008038, 3]
Game End Reward 20.903398437417263
random action
updating q value for state action [200.00000000000003, 3]
Game End Reward 19.29099680980261
random action
updating q value for state action [180, 6]
Game End Reward 21.183134835652403
random action
updating q value for state action [160, 2]
Game End Reward 20.61168730807495
random action
updating q value for state action [140, 6]
Game End Reward 17.52653893326114
random action
updating q value for state action [1000.0, 1]
Game End Reward 17.02409738541832
random action
updating q value for state action [836.2510309503735, 2]
Game End Reward 21.56961130489339
random action
updating q value for state action [699.3157867655627, 3]
Game End Reward 16.670399131562295
random action
updating q value for state action [584.8035476425734, 1]
Game End Reward 12.932324926283536
random action
updating q value for state action [489.04256961953797, 2]
Game End Reward 15.567153104752592
greedy action
updating q value for state action [408.96235302295804, 4]
Game End Reward 20.23350822911776
greedy action
updating q value for state action [341.9951893353393, 2]
Game End Reward 19.425944958286582
random action
updating q value for state action [285.99382966174574, 3]
Game End Reward 16.812344865536737
random action
updating q value for state action [239.16263490008038, 1]
Game End Reward 19.496855813676746
random action
updating q value for state action [200.00000000000003, 1]
Game End Reward 14.047283899412426
random action
updating q value for state action [180, 6]
Game End Reward 17.798367959979984
random action
updating q value for state action [160, 2]
Game End Reward 11.928756269534096
random action
updating q value for state action [140, 5]
Game End Reward 12.364943822630318
random action
updating q value for state action [1000.0, 4]
Game End Reward 12.859574701517083
random action
updating q value for state action [836.2510309503735, 2]
Game End Reward 12.86751887247064
random action
greedy action
updating q value for state action [20, 5]
Game End Reward 37.612515810095275
greedy action
updating q value for state action [120, 6]
Game End Reward 37.07021082198242
random action
updating q value for state action [699.3157867655627, 1]
Game End Reward 21.110781665308835
greedy action
greedy action
updating q value for state action [699.3157867655627, 6]
Game End Reward 39.54592339250181
greedy action
updating q value for state action [60, 4]
Game End Reward 30.414523347525567
greedy action
updating q value for state action [20, 4]
Game End Reward 0.3237217547165116
random action
greedy action
updating q value for state action [408.96235302295804, 6]
Game End Reward 39.15015018302328
updating q value for state action [408.96235302295804, 6]
Game End Reward 11.549983705919024
greedy action
updating q value for state action [200.00000000000003, 6]
Game End Reward 38.27757638032055
greedy action
updating q value for state action [200.00000000000003, 6]
Game End Reward 0.45900560418337677
greedy action
updating q value for state action [0, 5]
Game End Reward 22.89970652337709
random action
updating q value for state action [0, 2]
Game End Reward 25.001924521798692
greedy action
updating q value for state action [200.00000000000003, 6]
Game End Reward 20.810752034320547
random action
updating q value for state action [0, 6]
Game End Reward 22.11197248879304
greedy action
greedy action
updating q value for state action [0, 5]
Game End Reward 3.980310560666731
random action
updating q value for state action [0, 6]
Game End Reward 97.88688411516314
updating q value for state action [0, 6]
updating q value for state action [0, 6]
Game End Reward 6.968632602729431
Game End Reward 10.839910010946383