Help to trim the dataset #68

Open
AlexNarbut opened this issue Apr 4, 2017 · 9 comments

@AlexNarbut

Please help me. To describe my problem: I am trying to trim the train and val datasets, because my laptop cannot handle these huge sets (it reports that I have too little RAM when computing on the CPU, or too little memory on the hard disk when using the GPU). I am on Ubuntu and tried to copy part of each dataset like this (sort by name and copy the first 1000 items):
find ./datasets/train2014 -maxdepth 1 -type f | sort |head -1000 |xargs cp -t ./datasets/train/dummy
find ./datasets/val2014 -maxdepth 1 -type f | sort | head -1000|xargs cp -t ./datasets/val/dummy
This works fine, but when I try to train the network, I get this error:

Optimize
/home/alex/torch/install/bin/lua: ...alex/torch/install/share/lua/5.1/threads/threads.lua:183: [thread 1 callback] ./datasets/style.lua:53: Error reading: /home/alex/Desktop/texture_nets/datasets/train/dummy/COCO_train2014_000000286564.jpg
stack traceback:
[C]: in function 'assert'
./datasets/style.lua:53: in function '_loadImage'
./datasets/style.lua:32: in function 'get'
./dataloader.lua:92: in function <./dataloader.lua:84>
(tail call): ?
[C]: in function 'xpcall'
...alex/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
...e/alex/torch/install/share/lua/5.1/threads/queue.lua:65: in function <...e/alex/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
...e/alex/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:15: in main chunk
stack traceback:
[C]: in function 'error'
...alex/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
./dataloader.lua:143: in function 'loop'
./dataloader.lua:62: in function 'get'
train.lua:137: in function 'opfunc'
...home/alex/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'optim_method'
train.lua:174: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: ?

Why does style.lua try to read COCO_train2014_000000286564.jpg? I don't have this file in the train directory; the last image in train is COCO_train2014_000000007510.jpg, and in val it is COCO_val2014_000000014226.jpg.

How do I trim the train and val directories CORRECTLY?

@DmitryUlyanov
Owner

DmitryUlyanov commented Apr 19, 2017

This is most likely because you have folders other than train and val in the same directory (train2014, val2014). Or try removing all files in the gen directory and running again (gen contains the cache).
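
For what it's worth, a minimal shell sketch of both fixes, assuming the layout from the commands above (datasets/ as the data root, a gen/ cache directory in the repo root, and a hypothetical ~/coco_full backup folder; adjust the paths if yours differ):

# move the full COCO folders out of the data root so only train/ and val/ remain
mkdir -p ~/coco_full
mv ./datasets/train2014 ./datasets/val2014 ~/coco_full/
# clear the cached file lists so the loader re-indexes the trimmed folders
rm -f ./gen/*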

@Vladkryvoruchko

I suggest not using such a huge number of images from the whole dataset.
I've come to the conclusion, as have other researchers, that training gives better results when it is done on 8-30 images.

@AlexNarbut
Author

Where can I find a small dataset for training, with train images and their results? Everywhere only huge datasets (8-50 GB) are offered, and I understand that I need a small one (about 100 MB). But how do I trim these big datasets correctly? The image names in datasets/train and datasets/val do NOT match, and I didn't see similar images.

@Vladkryvoruchko

@Axelkiller just use 16 random images with different structure and context.

@AlexNarbut
Author

@Vladkryvoruchko I can take random images from the train folder, but how do I take the appropriate images for the val folder?
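
For what it's worth, a minimal shell sketch of the random-subset approach for both folders, reusing the dummy layout from the commands above (it assumes shuf is available, which ships with coreutils on Ubuntu, and that the full train2014/val2014 folders still exist at these paths):

mkdir -p ./datasets/train/dummy ./datasets/val/dummy
# 16 random training images, as suggested above
find ./datasets/train2014 -maxdepth 1 -type f | shuf -n 16 | xargs cp -t ./datasets/train/dummy
# a separate random handful for validation
find ./datasets/val2014 -maxdepth 1 -type f | shuf -n 16 | xargs cp -t ./datasets/val/dummy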

@wulabs

wulabs commented May 7, 2017

What is the effect of training on different datasets, e.g. a set of portraits of people vs. landscapes, or digital art vs. real photos? How does that affect the styled images? Also, how does training on a dataset of fewer than 30 images compare against 80K+? Wouldn't that significantly overfit and hence only cater to those types of images?

@DmitryUlyanov
Owner

@wulabs I did not see much difference when the dataset was changed, but I have not experimented much with it.

@wulabs

wulabs commented May 7, 2017

So does this mean that if a training dataset of size 20 is used, fewer than 100 training iterations can give similar results? (MSCOCO 80K @ 50K iterations ~= a training dataset of 20 @ 12.5 iterations.) Can training time be reduced this way?

@DmitryUlyanov
Owner

Well, it is arguable what "good results" and "similar results" are. In texture nets v1 we experimented with training on 16 photos, and it worked better than on 80K. But with instance normalization I never tried fitting 16 photos. Report the results if you experiment with that! With a small dataset you will probably need to find a good point to stop, as it is too easy to overfit.
