-
Notifications
You must be signed in to change notification settings - Fork 887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GODRIVER-2443 Use RawValue in Cursor #1411
Conversation
…hor (mongodb#1393) Co-authored-by: Qingyang Hu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bsoncore.Iterator
is a big improvement on the confusing DocumentSequence
type! However, I'm not sure that Cursor
is the right type to use for Distinct
.
var results []bson.M | ||
if err = cur.All(context.TODO(), &results); err != nil { | ||
log.Fatal(err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that the Cursor
API doesn't really fit the requirements of the distinct
API. In the example code here, the user must provide a Context
to iterate the cursor because the Cursor
might have to call getMore
. However, distinct
returns all results and there is no server-side cursor to iterate, so that Context
and getMore
logic are unnecessary. Additionally, there's no way to access the full BSON result, only one value at a time.
Looking for uses of Distinct
using Sourcegraph, I found that the most common code is something like:
results, err := coll.Distinct(ctx, "myField", filter)
if err != nil {
return err
}
reportTimes := make([]int64, len(results))
for i, result := range results {
reportTimes[i] = result.(int64)
}
Users tend to expect the returned values to be all the same type. Handling of unexpected types varies.
A better user experience uses an API similar to SingleResult
:
var res []string
err := coll.Distinct(ctx, "myField", filter).Decode(&res)
if err != nil {
return err
}
If a user wanted to handle different types with different logic, they could call Raw
instead (Raw
would return a bson.RawArray
, which will be added with GODRIVER-1639):
res, err := coll.Distinct(ctx, "myField", filter).Raw()
if err != nil {
return err
}
values, err := res.Values()
if err != nil {
return err
}
for _, val := range values {
switch val.Type {
// Handle different values.
}
}
It's not clear if the actual SingleResult
type can be reused for Distinct
, but at least the API seems more appropriate.
Closing in favor of separating the iterator (GODRIVER-3049) from the requirement of returning a decodable struct from the distinct method (GODRIVER-2443, repurposed). |
GODRIVER-2443
Summary
See "Background &Motivation" for the explanation of the problem this PR is trying to solve.
This PR proposes changing the Cursor.Current type to
bson.RawValue
. This would allow us to preload anything we wanted intoBatchCursor.currentBatch
and would deprecate the need forbsoncore.DocumentSequence
. The server response for the distinct operation could then be processed like this:We would need to retype
BatchCursor.currentBatch
andCursorResponse.FirstBatch
to[]bson.RawValue
.With this change, the user has three options for decoding cursor.Current:
Architecturally, this proposal can be can be represented by the following sequence diagram:
Background & Motivation
The goal of GODRIVER-2443 is to return a cursor from the Collection.Distinct() function, rather than []any. There are two notable problems in doing this:
The first (1) problem is that the distinct command will not return a cursor, unlike find and aggregate. Because of this, there is no need to make succeeding getMore trips to the server to load new batches of data. The Java Driver resolves this by preloading the results of the operation into a list that the user can iterate through via the cursor. The Go Driver has a document-specific solution to this problem in mongo.NewCursorFromDocuments.
The second (2) problem is that the distinct command could return non-document data, unlike find and aggregate. For example, the command db.runCommand({ distinct: "grass", key: "name" }) could result in the following server response:
Where the values that we should iterate over are
The current design for mongo.Cursor is such that the “Current” type is bson.Raw, which is an encoded BSON document. Furthermore, bsoncore.DocumentSequence is the underlying iterator that is used to load a sequence of documents for the cursor and is, of course, document-specific.