Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GODRIVER-2443 Use RawValue in Cursor #1411

Closed
wants to merge 15 commits into from

Conversation

prestonvasquez
Copy link
Collaborator

@prestonvasquez prestonvasquez commented Oct 5, 2023

GODRIVER-2443

Summary

See "Background &Motivation" for the explanation of the problem this PR is trying to solve.


This PR proposes changing the Cursor.Current type to bson.RawValue. This would allow us to preload anything we wanted into BatchCursor.currentBatch and would deprecate the need for bsoncore.DocumentSequence. The server response for the distinct operation could then be processed like this:

bytes, _ := bson.Marshal(resp)
values, _ := bson.Raw(mockSrvRespBytes).Lookup("values").Array().Values()

We would need to retype BatchCursor.currentBatch and CursorResponse.FirstBatch to []bson.RawValue.

With this change, the user has three options for decoding cursor.Current:

var str string
cursor.Current.Unmarshal(&str)
str, _ := cur.Current.StringValueOK()
var str string
cursor.Decode(&str)

Architecturally, this proposal can be can be represented by the following sequence diagram:

rawvalue_preloaded (4)

Background & Motivation

The goal of GODRIVER-2443 is to return a cursor from the Collection.Distinct() function, rather than []any. There are two notable problems in doing this:

The first (1) problem is that the distinct command will not return a cursor, unlike find and aggregate. Because of this, there is no need to make succeeding getMore trips to the server to load new batches of data. The Java Driver resolves this by preloading the results of the operation into a list that the user can iterate through via the cursor. The Go Driver has a document-specific solution to this problem in mongo.NewCursorFromDocuments.

The second (2) problem is that the distinct command could return non-document data, unlike find and aggregate. For example, the command db.runCommand({ distinct: "grass", key: "name" }) could result in the following server response:

{
  values: [ 3.14159265, 'red fescue', { name: 'Kentucky Bluegrass' } ],
  ...
}

Where the values that we should iterate over are

{"$numberDouble":"3.14159265"}
"red fescue"
{"name": "Kentucky Bluegrass"}

The current design for mongo.Cursor is such that the “Current” type is bson.Raw, which is an encoded BSON document. Furthermore, bsoncore.DocumentSequence is the underlying iterator that is used to load a sequence of documents for the cursor and is, of course, document-specific.

@prestonvasquez prestonvasquez changed the base branch from v1 to master October 5, 2023 01:16
Copy link
Collaborator

@matthewdale matthewdale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bsoncore.Iterator is a big improvement on the confusing DocumentSequence type! However, I'm not sure that Cursor is the right type to use for Distinct.

Comment on lines +333 to +336
var results []bson.M
if err = cur.All(context.TODO(), &results); err != nil {
log.Fatal(err)
}
Copy link
Collaborator

@matthewdale matthewdale Oct 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that the Cursor API doesn't really fit the requirements of the distinct API. In the example code here, the user must provide a Context to iterate the cursor because the Cursor might have to call getMore. However, distinct returns all results and there is no server-side cursor to iterate, so that Context and getMore logic are unnecessary. Additionally, there's no way to access the full BSON result, only one value at a time.

Looking for uses of Distinct using Sourcegraph, I found that the most common code is something like:

results, err := coll.Distinct(ctx, "myField", filter)
if err != nil {
	return err
}

reportTimes := make([]int64, len(results))
for i, result := range results {
	reportTimes[i] = result.(int64)
}

Users tend to expect the returned values to be all the same type. Handling of unexpected types varies.

A better user experience uses an API similar to SingleResult:

var res []string
err := coll.Distinct(ctx, "myField", filter).Decode(&res)
if err != nil {
	return err
}

If a user wanted to handle different types with different logic, they could call Raw instead (Raw would return a bson.RawArray, which will be added with GODRIVER-1639):

res, err := coll.Distinct(ctx, "myField", filter).Raw()
if err != nil {
	return err
}

values, err := res.Values()
if err != nil {
	return err
}

for _, val := range values {
	switch val.Type {
	// Handle different values.
	}
}

It's not clear if the actual SingleResult type can be reused for Distinct, but at least the API seems more appropriate.

@prestonvasquez
Copy link
Collaborator Author

Closing in favor of separating the iterator (GODRIVER-3049) from the requirement of returning a decodable struct from the distinct method (GODRIVER-2443, repurposed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants