Shadertoys-dataset

This repository contains the code to download, build, and update the Shadertoys dataset. The dataset consists of fragment shader programs published on Shadertoy, annotated with additional metadata for downstream filtering. Datasets are hosted on Huggingface but are no longer public. (A future release might be named Shadertoys-2 to avoid overwriting anything.)

The main use case for this dataset is various evaluation benchmarks for (code-) language models.

This project is not affiliated with Shadertoy. It makes use of the Shadertoy.com API.

To-Dos

This project is still in progress; all currently published datasets will see a major refactor.

pin and branch/archive Return Completion (shadereval-1) test set
dynamically split train/test based on shaderID hash (might not do a train split)
[~] public repository for builder scripts (you are here!)
(self-)publish TaCoS 2023 paper.
[~] redo structure
[~] add thumbnails (text2img?)
improved license detection and tagging using scancode-toolkit
potentially webscraping and tagging sources/unlisted? -> current RFC: pygfx/shadertoy#27
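The hash-based train/test split from the list above is not implemented yet. A minimal sketch of how it could work, assuming SHA-256 over the shader ID and a hypothetical `test_fraction` parameter (neither is taken from the builder scripts):

```python
import hashlib


def assign_split(shader_id: str, test_fraction: float = 0.1) -> str:
    """Deterministically map a shader ID to 'train' or 'test'.

    Hypothetical helper: the hash function and the fraction are assumptions.
    """
    digest = hashlib.sha256(shader_id.encode("utf-8")).digest()
    # Reduce the first 8 bytes of the digest to a bucket in [0, 10000).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "test" if bucket < test_fraction * 10_000 else "train"
```

Because the assignment depends only on the ID, a shader stays in the same split when the dataset is rebuilt or extended.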

Related work

shaders21k also sources shader programs from Shadertoy.com; however, it provides rendered frames for visual representation learning. It is available as an alternative to downloading from the API.
The-Stack has a GLSL subset. This data is sourced from GitHub.
The-Stack-v2 sources data from a larger archive.

Requirements

Setup

To access shader programs that are published as public+api, a Shadertoy account and an API key are required. Request a key and set up a SHADERTOY_KEY environment variable.
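A minimal sketch of reading the key from the environment and building a request URL. The helper names are hypothetical, and the base URL and `key` query parameter are assumptions about the Shadertoy API rather than something taken from download.py:

```python
import os
from urllib.parse import urlencode

# Assumed API base URL; check the Shadertoy API documentation before relying on it.
API_BASE = "https://www.shadertoy.com/api/v1"


def get_api_key() -> str:
    """Read the API key from the SHADERTOY_KEY environment variable."""
    key = os.environ.get("SHADERTOY_KEY", "")
    if not key:
        raise RuntimeError("SHADERTOY_KEY is not set")
    return key


def shader_url(shader_id: str, key: str) -> str:
    """Build the (assumed) URL for fetching a single shader by ID."""
    return f"{API_BASE}/shaders/{shader_id}?{urlencode({'key': key})}"
```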

If you want to use shaders20k (the Shadertoy subset of shaders21k), please download all_codes.zip and place it in ./data/shaders20k/.

Dependencies

For parsing shaders, tree-sitter-glsl is used.
For license detection, scancode-toolkit is used.
For testing shaders, wgpu-shadertoy is used.

Usage

Two of the three scripts are currently available. Plenty of defaults are set, and example files are provided in ./data/.

Download

$>python download.py --mode full --num_shaders 100

will download the newest 100 shaders from Shadertoy.com via the API and save them to the ./data/raw/ directory as a .jsonl file.
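The resulting .jsonl file holds one JSON object per line. A minimal sketch for loading it back into Python; `read_jsonl` is a hypothetical helper, and the actual file name under ./data/raw/ depends on the run:

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Load a .jsonl file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```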

To extract and translate shaders from the shaders20k dataset use:

$>python download.py --mode shaders20k

See download.py --help for more options, or look at the source.

Annotate

$>python annotate.py --mode "redo" --columns "license, functions"

This flattens the nested renderpasses into a single dict and adds relevant information such as licenses, function indices, and test validation. It now seems to take only a few minutes. Alternatively, the update mode allows overwriting the columns of already flattened files.
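To illustrate what flattening means here, a minimal sketch under assumptions: the input is taken to follow the Shadertoy API layout with `Shader`, `info`, and `renderpass` keys, and the output column names are hypothetical. The columns annotate.py actually produces may differ.

```python
def flatten_shader(raw: dict) -> dict:
    """Flatten one nested shader record into a single-level dict.

    The key names (Shader, info, renderpass, code) are assumptions about
    the API response; the *_code column names are made up for illustration.
    """
    shader = raw.get("Shader", raw)
    info = shader.get("info", {})
    flat = {
        "id": info.get("id"),
        "name": info.get("name"),
        "author": info.get("username"),
    }
    for render_pass in shader.get("renderpass", []):
        # One column per pass, e.g. "Buffer A" becomes "buffer_a_code".
        pass_name = render_pass.get("name") or render_pass.get("type", "pass")
        column = pass_name.lower().replace(" ", "_") + "_code"
        flat[column] = render_pass.get("code", "")
    return flat
```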

Upload (missing) /prepare

Filters and scripts to build the train/test split and upload it to Huggingface aren't written yet.

License note

The contents of this repository (builder scripts, metadata) are distributed under the Apache 2.0 license. However, the contents of the dataset itself remain under their respective licenses. We do our best to annotate licenses to allow for filtering. Please see the field license in the dataset. Some metadata (including licenses) might be out of date; therefore, we recommend checking the source.
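Filtering on the license field could look like the following sketch. It assumes the field holds SPDX-style identifiers as strings, and the allow-list is purely illustrative; neither is confirmed by the dataset itself.

```python
# Illustrative allow-list; adjust to the identifiers actually present in the data.
PERMISSIVE = frozenset({"mit", "apache-2.0", "cc0-1.0"})


def filter_by_license(records, allowed=PERMISSIVE):
    """Keep only records whose license field is in the allow-list."""
    return [
        record
        for record in records
        if str(record.get("license", "")).lower() in allowed
    ]
```

Records without a license field are dropped by this sketch, which is the conservative choice when the annotation may be missing or out of date.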
