Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance regression on GPU benchmark suite #12

Open
delcypher opened this issue Jan 20, 2016 · 2 comments
Open

Fix performance regression on GPU benchmark suite #12

delcypher opened this issue Jan 20, 2016 · 2 comments

Comments

@delcypher
Copy link
Member

Here are some notable changes for the fully explored benchmarks.

Size of uncommon results for all FULLY_EXPLORED: 4
shoc/s3d/qssab/kernel.bpl
snapshot-9-mapconstidx/merged.yml : FinalResultType.FULLY_EXPLORED (/tmp/pbs.1302100.cx1b/wd/workdir-543) (894.2382166739553 secs)
snapshot-12/merged.yml : FinalResultType.TIMED_OUT (/tmp/pbs.1296550.cx1b/wd/workdir-543) (900.0 secs)

CUDA50/6_Advanced/fastWalshTransform/fwtBatch1Kernel.bpl
snapshot-9-mapconstidx/merged.yml : FinalResultType.UNKNOWN (/tmp/pbs.1296547.cx1b/wd/workdir-247) (900.0 secs)
snapshot-12/merged.yml : FinalResultType.FULLY_EXPLORED (/tmp/pbs.1296550.cx1b/wd/workdir-247) (67.72944867641975 secs)

gpgpu-sim_ispass2009/RAY/renderPixel.bpl
snapshot-9-mapconstidx/merged.yml : FinalResultType.FULLY_EXPLORED (/tmp/pbs.1296547.cx1b/wd/workdir-385) (14.94364638502399 secs)
snapshot-12/merged.yml : FinalResultType.TIMED_OUT (/tmp/pbs.1296550.cx1b/wd/workdir-385) (900.0 secs)

shoc/devicememory/readGlobalMemoryCoalesced/kernel.bpl
snapshot-9-mapconstidx/merged.yml : FinalResultType.FULLY_EXPLORED (/tmp/pbs.1296547.cx1b/wd/workdir-498) (21.07033052512755 secs)
snapshot-12/merged.yml : FinalResultType.UNKNOWN (/tmp/pbs.1296550.cx1b/wd/workdir-498) (900.0 secs)
@delcypher
Copy link
Member Author

Note for shoc/s3d/qssab/kernel.bpl the change appears to happen between snapshot-9 and snapshot-10. snapshot-10 adds support for non-aliasing symbolic indices which isn't supposed to matter for the GPU benchmarks (the pattern we used on worked on mathematical integers). Seeing as the time is close to the timeout it's likely that adding support for non-aliasing symbolic indicies added some overhead or that the machines I used were a little bit inconsistent in their performance.

@delcypher
Copy link
Member Author

Looks like for gpgpu-sim_ispass2009/RAY/renderPixel.bpl and shoc/devicememory/readGlobalMemoryCoalesced/kernel.bpl the regression happened between snapshot10-mapsymidx and snapshot-11-effcntclone

shoc/devicememory/readGlobalMemoryCoalesced/kernel.bpl
snapshot-10-mapsymidx/merged.yml : FinalResultType.FULLY_EXPLORED (/tmp/pbs.1302112.cx1b/wd/workdir-498) (607.015669389007 secs)
snapshot-11-effctclone/merged.yml : FinalResultType.UNKNOWN (/tmp/pbs.1296549.cx1b/wd/workdir-498) (900.0 secs)

gpgpu-sim_ispass2009/RAY/renderPixel.bpl
snapshot-10-mapsymidx/merged.yml : FinalResultType.FULLY_EXPLORED (/tmp/pbs.1296548.cx1b/wd/workdir-385) (14.141802517386774 secs)
snapshot-11-effctclone/merged.yml : FinalResultType.TIMED_OUT (/tmp/pbs.1296549.cx1b/wd/workdir-385) (900.0 secs)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant