Performance difference between cuSZ and SZ #44

Open
mawpolaris opened this issue May 12, 2021 · 3 comments
Labels
doc Improvements or additions to documentation

Comments

@mawpolaris

Hi Team,

Have you observed any significant differences in compression ratio between SZ and cuSZ (with the same data, error mode, and error bound)?

@jtian0
Collaborator

jtian0 commented May 12, 2021 via email

@mawpolaris
Author

Hi Nate,

Thank you for your response. That makes sense.

To clarify, do SZ (v2.1+) and cuSZ (the current version; what is its version number?):

  1. share the same predictors? or compression quality optimizer?
  2. share the same linear scale quantization?
  3. have different Huffman encoding approaches and dictionary encoding?

@jtian0
Collaborator

jtian0 commented Jun 6, 2021

Hi @mawpolaris,

For the time being, we can say cuSZ release 0.2.2 onward (as the later updates only enhance performance). In general, SZ 2.1 is far more mature than cuSZ in that it (1) has preprocessing, more compression modes (e.g., point-wise), and autotuning, and (2) has both the Lorenzo predictor and linear regression, whereas cuSZ has only Lorenzo (we are working on new predictors).

  1. They share the same Lorenzo predictor. However, many factors affect data quality and effectively act as a quality optimizer:
  • Preprocessing, such as log transform and point-wise transform.
  • PSNR as a goal to autotune eb.
  • Initial values from which we predict border values (as if padding). cuSZ predicts from zeros, while SZ determines optimal values for, e.g., application-specific metrics. Please also note that the naive setting of zeros can result in a significantly higher PSNR than SZ (with the same eb), as pointed out in Table 8 on page 10 of this doc, but it is not necessarily better when it comes to applications: it's data dependent.
  • PSNR as a generic metric can be used this way: SZ guarantees a lower bound on PSNR when the eb is relative to the data range, e.g., 64 dB for 1e-3 and 84 dB for 1e-4.
  2. The linear scaling can be the same. SZ has an extra optimizer to decide the linear scaling range $[-r, +r]$; quantization values outside this range are treated as outliers. This is to optimize the compression ratio.
  3. Currently, the Huffman encoding is the same, except that cuSZ partitions the data (and therefore incurs overhead in padding bits and partitioning metadata).
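
To make the points above concrete, here is a minimal 1D sketch of Lorenzo prediction with zero-initialized borders, linear-scale quantization with a range $[-r, +r]$, and outlier handling. It is not the SZ or cuSZ implementation: the function names, the 1D simplification, the test signal, and the radius of 512 are all assumptions made for illustration. It also checks the PSNR relation for a range-relative eb. Since every error is at most eb, the hard bound is $-20\log_{10}(\mathrm{eb_{rel}})$, i.e. 60 dB for 1e-3. If the errors are roughly uniform in $[-\mathrm{eb}, +\mathrm{eb}]$, the RMSE is $\mathrm{eb}/\sqrt{3}$, which adds $10\log_{10} 3 \approx 4.8$ dB and appears to be where the 64 and 84 figures come from.

```python
import math
import random


def lorenzo_quantize_1d(data, eb, radius):
    """Hypothetical helper: 1D Lorenzo prediction + linear-scale quantization.

    Returns (codes, outliers). codes[i] is None where the value is stored
    verbatim as an outlier instead of as a quantization code.
    """
    codes, outliers = [], {}
    prev = 0.0  # border value predicted from zero (assumption: cuSZ-like choice)
    for i, x in enumerate(data):
        # 1D Lorenzo: the prediction is the previous reconstructed value.
        q = round((x - prev) / (2 * eb))  # quantization bin width is 2 * eb
        recon = prev + 2 * eb * q
        # Keep the code only if it is inside [-radius, +radius] and the
        # reconstruction really honors the error bound.
        if abs(q) <= radius and abs(recon - x) <= eb:
            codes.append(q)
            prev = recon
        else:
            codes.append(None)
            outliers[i] = x
            prev = x
    return codes, outliers


def lorenzo_dequantize_1d(codes, outliers, eb):
    """Inverse of lorenzo_quantize_1d: replay the predictions in order."""
    out, prev = [], 0.0
    for i, q in enumerate(codes):
        prev = outliers[i] if q is None else prev + 2 * eb * q
        out.append(prev)
    return out


def psnr(orig, recon):
    """PSNR in dB, using the value range of the original data as the peak."""
    value_range = max(orig) - min(orig)
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    if mse == 0:
        return math.inf
    return 20 * math.log10(value_range) - 10 * math.log10(mse)


# Synthetic test signal (an assumption, not data from this thread).
random.seed(0)
data = [math.sin(i / 50) + 0.01 * random.gauss(0, 1) for i in range(10000)]

rel_eb = 1e-3                            # error bound relative to the value range
eb = rel_eb * (max(data) - min(data))    # absolute error bound
codes, outliers = lorenzo_quantize_1d(data, eb, radius=512)
recon = lorenzo_dequantize_1d(codes, outliers, eb)

max_err = max(abs(a - b) for a, b in zip(data, recon))
print(f"max error {max_err:.3e} (eb {eb:.3e}), outliers {len(outliers)}")
print(f"PSNR {psnr(data, recon):.2f} dB, hard bound {-20 * math.log10(rel_eb):.2f} dB")
```

Shrinking `radius` pushes more values into `outliers`, which is the trade-off the range optimizer in item 2 is tuning. Changing the initial `prev` from zero changes only the first prediction in this 1D sketch, but in higher dimensions it affects every border element.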

I will also try to update an FAQ to address these problems.

@jtian0 jtian0 added this to todo in release 0.3 Jul 14, 2021
@jtian0 jtian0 moved this from todo to in progress in release 0.3 Aug 11, 2021
@jtian0 jtian0 added the doc Improvements or additions to documentation label Feb 28, 2022