forked from mbaityje/plankifier
Zooplankton classification software kit
---------------------------------------------------------------
contents
   name            run                        description
0  convnet         python3 convnet.py         builds a CNN and trains it on a given dataset of images
1  features        python3 features.py        builds a combined model: a CNN for the images plus an MLP for the tabular feature data
2  binary          python3 binary.py          builds a CNN for binary classification of a chosen class against a balanced mix of the other classes
3  analyze         python3 analyze.py         reads the CNN training output and visualizes the logarithmic time evolution and the val_loss differences between hyperparameters
4  binary-compare  python3 binary-compare.py  reads the binary classifier's training output and visualizes the impact of the dataset on validation
----------------------------------------------------------------
0: convnet
the most important argument is datapath; all other defaults should lead to decent results.
if an argument is changed from its default value, the change is reflected in the output name.
example of a training run with 100 epochs, the Adam optimizer with amsgrad, a specific data directory, and live training output appended to a log file in the directory the script runs in:
python3 convnet.py -datapath='~/specific/data/' -totEpochs=100 -opt='adam_2' -verbose=1 >> trainingresults.log &
argument      type   default     description
cpu           bool   False       perform training only on CPUs
gpu           bool   False       perform training on GPUs
datapath      str    './data/'   directory that must contain the classes as subdirectories, each with a 'training_images' directory inside
outpath       str    './out/'    (created) directory for the training output; a subdirectory named after the run's parameters is created inside
verbose       int    1           one of [0,1,2], amount of training documentation output
totEpochs     int    10          total number of training epochs
opt           str    'sgd_1'     choice of the minimization algorithm
bs            int    8           batch size
lr            float  0.0001      learning rate
height        int    128         image height, must equal the width
width         int    128         image width, must equal the height
depth         int    3           number of channels (3 for RGB)
testSplit     float  0.2         fraction of examples in the validation set
aug           bool   True        perform data augmentation
augtype       str    'standard'  augmentation type
augparameter  float  0           augmentation parameter when testing a single augmentation type; ignored for standard augmentation
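a minimal argparse sketch of the flags above, for illustration only (flag names and defaults are copied from the table; the actual parser in convnet.py may differ):

```python
import argparse

def build_parser():
    # Flag names, types and defaults taken from the argument table above.
    p = argparse.ArgumentParser(description='Train a CNN on a plankton image dataset')
    p.add_argument('-datapath', default='./data/',
                   help="class subdirectories, each with 'training_images' inside")
    p.add_argument('-outpath', default='./out/')
    p.add_argument('-verbose', type=int, default=1, choices=[0, 1, 2])
    p.add_argument('-totEpochs', type=int, default=10)
    p.add_argument('-opt', default='sgd_1')
    p.add_argument('-bs', type=int, default=8)
    p.add_argument('-lr', type=float, default=0.0001)
    p.add_argument('-height', type=int, default=128)
    p.add_argument('-width', type=int, default=128)
    p.add_argument('-depth', type=int, default=3)
    p.add_argument('-testSplit', type=float, default=0.2)
    return p

# Parse the example command line from above.
args = build_parser().parse_args(['-datapath=~/specific/data/', '-totEpochs=100', '-opt=adam_2'])
print(args.totEpochs, args.opt)  # 100 adam_2
```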
implemented optimizer choices (the learning rate is set for all of them via -lr):
-opt        description
'adam_1'    Adam without amsgrad, beta_1=0.9, beta_2=0.999
'adam_2'    Adam with amsgrad, beta_1=0.9, beta_2=0.999
'sgd_1'     stochastic gradient descent without Nesterov momentum
'sgd_2'     stochastic gradient descent with Nesterov momentum
'sgd_3'     stochastic gradient descent with Nesterov and momentum of 0.1
'sgd_4'     stochastic gradient descent without Nesterov and momentum of 0.1
'rmsprop'   RMSprop with rho=0.9
'adagrad'   Adagrad
'adadelta'  Adadelta with rho=0.95
'adamax'    Adamax with beta_1=0.9, beta_2=0.999
'nadam'     Nadam with beta_1=0.9, beta_2=0.999
implemented choices for individual data augmentation:
-augtype      -augparameter description
'rotate'      degree range for random rotations
'v_shift'     width shift: fraction of total width if < 1, pixels if >= 1
'h_shift'     height shift: fraction of total height if < 1, pixels if >= 1
'shear'       shear intensity (shear angle in counter-clockwise direction, in degrees)
'zoom'        range for random zoom: [lower, upper] = [1-args.augparameter, 1+args.augparameter]
'h_flip'      enables horizontal flipping, no -augparameter required
'v_flip'      enables vertical flipping, no -augparameter required
'brightness'  range for picking a brightness shift value: [lower, upper] = [args.augparameter, 1-args.augparameter]
'rescale'     multiply the data by the given value after applying all other transformations
'standard'    performs mixed augmentation with rotation_range=360, width_shift_range=0.2,
              height_shift_range=0.2, shear_range=0.3, zoom_range=0.2, horizontal_flip=True,
              vertical_flip=True; no -augparameter required
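one plausible way the table above could translate into Keras ImageDataGenerator keyword arguments, sketched as a plain lookup (the kwarg mapping is an assumption for illustration; the actual translation in convnet.py may differ):

```python
def augmentation_kwargs(augtype, augparameter=0.0):
    """Sketch: map -augtype/-augparameter to ImageDataGenerator-style kwargs.
    The 'standard' values are copied from the table above; the per-type
    mapping mirrors the table's descriptions and is an assumption."""
    p = augparameter
    table = {
        'rotate':     {'rotation_range': p},
        'v_shift':    {'width_shift_range': p},     # as described in the table
        'h_shift':    {'height_shift_range': p},    # as described in the table
        'shear':      {'shear_range': p},
        'zoom':       {'zoom_range': [1 - p, 1 + p]},
        'h_flip':     {'horizontal_flip': True},
        'v_flip':     {'vertical_flip': True},
        'brightness': {'brightness_range': [p, 1 - p]},
        'rescale':    {'rescale': p},
        'standard':   {'rotation_range': 360, 'width_shift_range': 0.2,
                       'height_shift_range': 0.2, 'shear_range': 0.3,
                       'zoom_range': 0.2, 'horizontal_flip': True,
                       'vertical_flip': True},
    }
    return table[augtype]

print(augmentation_kwargs('zoom', 0.2))
```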
----------------------------------------------------------------
1: features
this script takes not only the image data as input but also the tabular feature files, which have to be located in the class directories.
the most important argument is datapath; all other defaults should lead to decent results.
argument   type   default          description
cpu        bool   False            perform training only on CPUs
gpu        bool   False            perform training on GPUs
datapath   str    './small_data/'  directory that must contain the classes as subdirectories, each with a 'training_images' directory inside
outpath    str    './out/'         (created) directory for the training output; a subdirectory named after the run's parameters is created inside
verbose    int    1                one of [0,1,2], amount of training documentation output
totEpochs  int    10               total number of training epochs
bs         int    8                batch size
lr         float  0.0001           learning rate
height     int    128              image height, must equal the width
width      int    128              image width, must equal the height
depth      int    3                number of channels (3 for RGB)
testSplit  float  0.2              fraction of examples in the validation set
so far, only SGD is implemented and no data augmentation is performed.
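the datapath layout both scripts expect (class subdirectories, each containing a 'training_images' folder) can be checked with a small sketch like the following; this is an illustration, not the scripts' actual loader:

```python
import os
import tempfile

def find_classes(datapath):
    """Return the class names: subdirectories of datapath that contain a
    'training_images' directory, per the layout described above."""
    return sorted(
        d for d in os.listdir(datapath)
        if os.path.isdir(os.path.join(datapath, d, 'training_images'))
    )

# Build a toy layout to demonstrate.
root = tempfile.mkdtemp()
for cls in ('dinobryon', 'cyclops'):
    os.makedirs(os.path.join(root, cls, 'training_images'))
os.makedirs(os.path.join(root, 'not_a_class'))  # no training_images -> ignored

print(find_classes(root))  # ['cyclops', 'dinobryon']
```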
----------------------------------------------------------------
2: binary
this script takes the key argument, which names the class to be identified, and draws a balanced mix from all the other classes found in datapath.
for the CNN, the optimizers 'adam', 'sgd' and 'rmsprop' are implemented; for binary classification, RMSprop seems to yield the best results. binary crossentropy is the loss function.
argument   type   default      description
cpu        bool   False        perform training only on CPUs
gpu        bool   False        perform training on GPUs
datapath   str    './data/'    directory that must contain the classes as subdirectories, each with a 'training_images' directory inside
outpath    str    './out/'     (created) directory for the training output; a subdirectory named after the run's parameters is created inside
opt        str    'sgd'        choice of the minimization algorithm
totEpochs  int    10           total number of training epochs
bs         int    8            batch size
lr         float  0.0001       learning rate
height     int    128          image height, must equal the width
width      int    128          image width, must equal the height
depth      int    3            number of channels (3 for RGB)
testSplit  float  0.2          fraction of examples in the validation set
key        str    'dinobryon'  class to be identified; must be the name of a subdirectory of datapath
limit      int    0            if 0: take all images; if != 0: take only this number of images. The result is a 50/50 split of class/non-class images
number1    int    256          number of nodes in the first CNN layer
number2    int    128          number of nodes in the second CNN layer
number3    int    64           number of nodes in the third CNN layer
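the key/limit behaviour described above amounts to balanced sampling: the key class supplies the positives, and an equal number of negatives is drawn from the remaining classes. A minimal sketch of that idea (an illustration, not binary.py's actual implementation):

```python
import random

def balanced_split(images_by_class, key, limit=0, seed=0):
    """Sketch of the 50/50 class/non-class sampling described above.
    images_by_class: dict mapping class name -> list of image paths.
    limit=0 uses all key-class images; limit != 0 caps the total at limit."""
    rng = random.Random(seed)
    positives = list(images_by_class[key])
    others = [img for c, imgs in images_by_class.items() if c != key
              for img in imgs]
    n = len(positives) if limit == 0 else limit // 2
    positives = rng.sample(positives, min(n, len(positives)))
    negatives = rng.sample(others, min(len(positives), len(others)))
    return positives, negatives

# Toy data: class names follow the default key; prefixes mark the class.
data = {'dinobryon': [f'd{i}' for i in range(20)],
        'cyclops':   [f'c{i}' for i in range(30)],
        'bosmina':   [f'b{i}' for i in range(30)]}
pos, neg = balanced_split(data, key='dinobryon', limit=20)
print(len(pos), len(neg))  # 10 10
```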
----------------------------------------------------------------
3: analyze
this script takes the training output (the epoch log files) and visualizes it.
the path argument points to a folder named after the hyperparameter being varied; it should contain subdirectories for the individual runs, with the parameter value in their names, as generated by convnet.py.
the epochnumber argument is self-explanatory and not mandatory.
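extracting the per-epoch val_loss values from a training log could look like the sketch below; the log format is an assumption (Keras-style progress lines containing 'val_loss: <number>'), and the actual epoch log format written by convnet.py may differ:

```python
import os
import re
import tempfile

def read_val_loss(logfile):
    """Sketch: pull the per-epoch val_loss values out of a training log.
    Assumes Keras-style lines containing 'val_loss: <number>' (an assumption;
    the real epoch log format may differ)."""
    pattern = re.compile(r'val_loss:\s*([0-9.]+)')
    with open(logfile) as f:
        return [float(m.group(1)) for line in f
                for m in [pattern.search(line)] if m]

# Demonstrate on a fake two-epoch log.
log = os.path.join(tempfile.mkdtemp(), 'train.log')
with open(log, 'w') as f:
    f.write('Epoch 1/2\nloss: 0.9 - val_loss: 0.80\n')
    f.write('Epoch 2/2\nloss: 0.7 - val_loss: 0.65\n')
print(read_val_loss(log))  # [0.8, 0.65]
```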
----------------------------------------------------------------
4: binary-compare
this script takes the training output of the binary classifier (the epoch log files) and visualizes it.
the path argument points to a folder with subdirectories for the individual runs, named after the key and limit parameters, as generated by binary.py.
the epochnumber argument is self-explanatory and not mandatory.