
Stress testing best practices? #3667

Open

siennathesane opened this issue Jun 17, 2024 · 5 comments

@siennathesane

siennathesane commented Jun 17, 2024

I'm working on implementing load tests for a Raft implementation on top of Pebble, and I'm running into sync issues when inserting many large batches.

Here's a snippet of the log storage and fetching code:

type PebbleStorage struct {
	logger *logging.Logger // internal package, compatible with most logging impls
	db     *pebble.DB
}

// *raft.Log is just `github.com/hashicorp/raft`
func (p *PebbleStorage) StoreLogs(logs []*raft.Log) error {
	rangeMin := uint64(0)
	rangeMax := uint64(len(logs))

	// use all available CPUs to chunk effectively
	ranges := utilities.GenerateRanges(rangeMin, rangeMax, uint64(len(logs)/runtime.NumCPU()))

	errG := new(errgroup.Group)
	for idx := range ranges {
		errG.Go(func() error {
			batch := p.db.NewBatchWithSize(int(ranges[idx].To - ranges[idx].From))

			for index := ranges[idx].From; index < ranges[idx].To; index++ {
				log := logs[index]
				key := buildLogsKey(log.Index)

				payload, err := proto.Marshal(encodeLog(log))
				if err != nil {
					return err
				}

				if err := batch.Set(key, payload, nil); err != nil {
					return err
				}
			}

			return batch.Commit(&pebble.WriteOptions{Sync: true})
		})
	}

	if err := errG.Wait(); err != nil {
		return err
	}

	return p.db.Flush()
}

func (p *PebbleStorage) GetLog(index uint64, log *raft.Log) error {

	val, closer, err := p.db.Get(buildLogsKey(index))
	if err != nil {
		return err
	}
	defer func(closer io.Closer) {
		err := closer.Close()
		if err != nil {
			panic(errors.Join(errors.New("unable to close closer from pebble"), err))
		}
	}(closer)

	if len(val) == 0 || val == nil {
		return errors.New(fmt.Sprintf("missing log index %d key", index))
	}

	buf := make([]byte, len(val))
	copy(buf, val)

	innerLog := new(pb.Log)
	if err := proto.Unmarshal(buf, innerLog); err != nil {
		return err
	}
	if log == nil {
		*log = *new(raft.Log)
	}
	*log = *decodeLog(innerLog)

	return nil
}

Here's my stress test:

type PebbleStorageTestSuite struct {
	suite.Suite // github.com/stretchr/testify
}

func (t *PebbleStorageTestSuite) TestExtremeLogLoad() {
	const maxCount = 10_000_000

	store, err := NewPebbleStorage(t.conf)
	defer store.Close()
	t.Require().Nil(err)

	var logs []*raft.Log
	for i := uint64(0); i < maxCount; i++ {
		logs = append(logs, &raft.Log{
			Index: i,
			Term:  i,
		})
	}

	store.logger.Info("storing logs", "count", maxCount)

	err = store.StoreLogs(logs)
	t.Require().Nil(err, "there must not be an error submitting %d logs", maxCount)

	store.logger.Info("logs stored", "count", maxCount)

	ranges := utilities.GenerateRanges(0, maxCount, 10_000)

	errG := new(errgroup.Group)
	for idx := range ranges {
		errG.Go(func() error {
			for iter := ranges[idx].From; iter < ranges[idx].To; iter++ {
				var log raft.Log
				err = store.GetLog(iter, &log)
				t.Require().NoError(err, "there must not be an error fetching log index %d", iter)
				t.Require().Equal(iter, log.Index, "log index %d must match", iter)
				t.Require().Equal(iter, log.Term, "log term %d must match", iter)
			}
			return nil
		})
	}

	t.Require().NoError(errG.Wait(), "there must not be any errors fetching raft keys")
}

I'm consistently getting 10+ pebble: not found errors on keys, despite every batch committing with Sync: true plus a final flush to disk, so I suspect I'm doing something wrong. I've read the docs on large batches, but I can't tell what the problem is. Most of the keys that aren't found are the same keys on every run, so I suspected an internal sync issue, but I expected the forced sync points to rule that out.

What is the right way to sync large batch inserts for immediate reads for stress testing?
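
For reference, here's a stripped-down, single-batch version of the write-sync-read pattern I'd expect to hold; the directory name, keys, and values are placeholders rather than the real raft log encoding:

package main

import (
	"encoding/binary"
	"fmt"
	"log"

	"github.com/cockroachdb/pebble"
)

func main() {
	// Placeholder on-disk directory, just for the sketch.
	db, err := pebble.Open("stress-demo", &pebble.Options{})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Write a contiguous range of keys in one batch and commit with Sync.
	batch := db.NewBatch()
	for i := uint64(0); i < 1000; i++ {
		key := make([]byte, 8)
		binary.BigEndian.PutUint64(key, i)
		if err := batch.Set(key, []byte("payload"), nil); err != nil {
			log.Fatal(err)
		}
	}
	if err := batch.Commit(&pebble.WriteOptions{Sync: true}); err != nil {
		log.Fatal(err)
	}

	// Every key should be readable immediately after Commit returns.
	for i := uint64(0); i < 1000; i++ {
		key := make([]byte, 8)
		binary.BigEndian.PutUint64(key, i)
		val, closer, err := db.Get(key)
		if err != nil {
			log.Fatalf("index %d: %v", i, err) // pebble: not found would show up here
		}
		_ = val
		if err := closer.Close(); err != nil {
			log.Fatal(err)
		}
	}
	fmt.Println("all keys found")
}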

Jira issue: PEBBLE-40

@blathers-crl blathers-crl bot added this to Incoming in (Deprecated) Storage Jun 17, 2024
@nicktrav nicktrav added the O-community and C-question labels Jun 18, 2024
@RaduBerinde
Member

Hi,

One thing I'm seeing is that you are capturing idx inside the func of a goroutine, which until Go 1.22 was a mistake (https://go.dev/blog/loopvar-preview). What version of Go are you using?
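
For illustration (not your exact code), this is the shape of the problem: before 1.22 every iteration shared a single idx variable, so the goroutines could all observe a later value of it unless you re-declared it per iteration:

package main

import (
	"fmt"

	"golang.org/x/sync/errgroup"
)

func main() {
	ranges := []string{"a", "b", "c", "d"}

	errG := new(errgroup.Group)
	for idx := range ranges {
		// Before Go 1.22 this shadow copy was the standard fix: without it,
		// the closures below all captured the same loop variable.
		idx := idx
		errG.Go(func() error {
			fmt.Println(ranges[idx])
			return nil
		})
	}
	_ = errG.Wait()
}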

@siennathesane
Author

I'm using Go 1.22.3.

@RaduBerinde
Member

Overall the way the API is used is correct; this should be working, so there must be some detail that's wrong. Could I get a pointer to utilities.GenerateRanges? I just want to check that To is indeed exclusive, as this code assumes.

@siennathesane
Author

Sure, it's pretty straightforward.

// IteratorRange represents a range of numbers to chunk slices into.
type IteratorRange struct{ From, To uint64 }

// GenerateRanges generates a list of ranges from min to max with a batch size.
func GenerateRanges(min, max uint64, batchSize uint64) []IteratorRange {
	if batchSize == 0 {
		batchSize = 1
	}

	var ranges []IteratorRange
	// this has to be from <= max to include the last batch
	for from := min; from <= max; from += batchSize {
		// without -1 the first batch will be 10-20 and the second batch will be 20-30
		// with -1 the first batch will be 10-19 and the second batch will be 20-29
		to := (from + batchSize) - 1
		if to > max {
			to = max
		}
		ranges = append(ranges, IteratorRange{from, to})
	}

	return ranges
}

@RaduBerinde
Member

Well, it looks like To is inclusive (it's from + batchSize - 1), while the loops in StoreLogs and in the test treat it as exclusive (index < ranges[idx].To). That means the last index of every range is never written or read, which would explain the repeatable pebble: not found errors.
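
A small self-contained illustration of the mismatch (the ranges are hard-coded to what GenerateRanges(0, 10, 5) returns, per the implementation above):

package main

import "fmt"

type IteratorRange struct{ From, To uint64 }

func main() {
	// GenerateRanges(0, 10, 5) produces these ranges; To is inclusive.
	ranges := []IteratorRange{{0, 4}, {5, 9}, {10, 10}}

	// Iterating the way StoreLogs and the test do (index < To) never visits
	// the last index of any range: 4, 9, and 10 are skipped.
	for _, r := range ranges {
		for i := r.From; i < r.To; i++ {
			fmt.Print(i, " ") // prints: 0 1 2 3 5 6 7 8
		}
	}
	fmt.Println()
}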
