picker

picker is composed of types for Elasticsearch 7. For now, the purpose is to provide strongly typed mappings, fields, and analyzers that can be marshaled and unmarshaled, for use with the official Go Elasticsearch client.

Aggregations are currently a map[string]interface{}. There are roughly 70 aggregations, and typing them is sort of all-or-nothing. I plan to get to them all, but it is incredibly tedious and time-consuming.

package main

import (
	"encoding/json"
	"fmt"
	"log"

	"github.com/chanced/picker"
)

func main() {
	s, err := picker.NewSearch(picker.SearchParams{
		Query: &picker.QueryParams{
			Intervals: picker.IntervalsQueryParams{
				Field: "my_text",
				Rule: picker.AllOfRuleParams{
					Ordered: true,
					Intervals: picker.Ruleset{
						picker.MatchRuleParams{
							Query:   "my favorite food",
							MaxGaps: 0,
							Ordered: true,
						},
						picker.AnyOfRuleParams{
							Intervals: picker.Ruleset{
								picker.MatchRuleParams{Query: "brisket"},
								picker.MatchRuleParams{Query: "kimchy fries"},
							},
						},
					},
				},
			},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	data, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(data))
}

Produces:

{
  "query": {
    "intervals": {
      "my_text": {
        "all_of": {
          "intervals": [
            {
              "match": {
                "query": "my favorite food",
                "max_gaps": 0,
                "ordered": true
              }
            },
            {
              "any_of": {
                "intervals": [
                  {
                    "match": {
                      "query": "brisket"
                    }
                  },
                  {
                    "match": {
                      "query": "kimchy fries"
                    }
                  }
                ]
              }
            }
          ],
          "ordered": true
        }
      }
    }
  }
}

picker will unmarshal shorthand queries (like { "term": { "my_field": "my_string" } }) but always marshals into the long form ({ "term": { "my_field": { "value": "my_string" } } }).
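
For example, decoding the shorthand form and re-encoding it should yield the long form. This is a minimal sketch; it assumes picker exports a Search type (the value built by picker.NewSearch) that can be unmarshaled directly with encoding/json:

package main

import (
	"encoding/json"
	"fmt"
	"log"

	"github.com/chanced/picker"
)

func main() {
	// assumption: picker.Search is exported and implements
	// json.Unmarshaler, so the shorthand form decodes directly
	var s picker.Search
	shorthand := []byte(`{"query":{"term":{"my_field":"my_string"}}}`)
	if err := json.Unmarshal(shorthand, &s); err != nil {
		log.Fatal(err)
	}
	long, err := json.Marshal(s)
	if err != nil {
		log.Fatal(err)
	}
	// long should now hold the long form:
	// {"query":{"term":{"my_field":{"value":"my_string"}}}}
	fmt.Println(string(long))
}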

Todo

Testing is incredibly sparse at the moment. I'm merely using this list to keep track of which structures are theoretically complete. It does not indicate that the items are done, as testing is undoubtedly lacking.

Common types

  • Binary
    Binary value encoded as a Base64 string.
  • Boolean
    true and false values.
  • Keyword
    used for structured content such as IDs, email addresses, hostnames, status codes, zip codes, or tags.
  • Constant keyword [X-Pack]
    Constant keyword is a specialization of the keyword field for the case that all documents in the index have the same value.
  • Wildcard [X-Pack]
    The wildcard field type is a specialized keyword field for unstructured machine-generated content you plan to search using grep-like wildcard and regexp queries. The wildcard type is optimized for fields with large values or high cardinality.
  • Long
    Long is a signed 64-bit integer with a minimum value of -2^63 and a maximum value of 2^63-1.
  • Integer
    Integer is a signed 32-bit integer with a minimum value of -2^31 and a maximum value of 2^31-1.
  • Byte
    Byte is a signed 8-bit integer with a minimum value of -128 and a maximum value of 127.
  • Float
    Float is a single-precision 32-bit IEEE 754 floating point number, restricted to finite values.
  • Double
    Double is a double-precision 64-bit IEEE 754 floating point number, restricted to finite values.
  • Short
    Short is a signed 16-bit integer with a minimum value of -32,768 and a maximum value of 32,767.
  • HalfFloat
    HalfFloat is a half-precision 16-bit IEEE 754 floating point number, restricted to finite values.
  • UnsignedLong
    UnsignedLong is an unsigned 64-bit integer with a minimum value of 0 and a maximum value of 2^64-1.
  • ScaledFloat
    A floating point number that is backed by a long, scaled by a fixed double scaling factor.
  • Date
    Date field type
  • Date nanoseconds
    Date nanoseconds field type
  • Alias
    Defines an alias for an existing field.

Objects and relational types

  • Object
    A JSON object.
  • Flattened
    An entire JSON object as a single field value.
  • Nested
    A JSON object that preserves the relationship between its subfields.
  • Join
    Defines a parent/child relationship for documents in the same index.

Structured data types

  • Long range
    LongRangeField is a range of signed 64-bit integers with a minimum value of -2^63 and a maximum of 2^63-1.
  • Integer range
    IntegerRangeField is a range of signed 32-bit integer values.
  • Float range
    FloatRangeField is a range of single-precision 32-bit IEEE 754 floating point values.
  • Double range
    DoubleRangeField is a range of double-precision 64-bit IEEE 754 floating point values.
  • Date range
    DateRangeField is a range of date values. Date ranges support various date formats through the format mapping parameter. Regardless of the format used, date values are parsed into an unsigned 64-bit integer representing milliseconds since the Unix epoch in UTC. Values containing the now date math expression are not supported.
  • IP range
    IPRangeField is a range of ip values supporting either IPv4 or IPv6 (or mixed) addresses.
  • IP
    IPv4 and IPv6 addresses.
  • Version [X-Pack]
    Software versions. Supports Semantic Versioning precedence rules.
  • Murmur3 [Plugin]
    Computes and stores hashes of values.

Aggregate data types

Text search types

Document ranking types

  • Dense vector [X-Pack]
    Records dense vectors of float values.
  • Sparse vector [X-Pack] [Deprecated]
    Records sparse vectors of float values.
  • Rank feature
    Records a numeric feature to boost hits at query time.
  • Rank features
    Records numeric features to boost hits at query time.

Spatial data types

  • Geo point
    Latitude and longitude points.
  • Geo shape
    Complex shapes, such as polygons.
  • Point
    Arbitrary cartesian points.
  • Shape
    Arbitrary cartesian geometries.

Other types

  • Percolator
    Indexes queries written in Query DSL.

Queries

Compound queries wrap other compound or leaf queries, either to combine their results and scores, to change their behaviour, or to switch from query to filter context.

  • Boolean
    The default query for combining multiple leaf or compound query clauses, as must, should, must_not, or filter clauses. The must and should clauses have their scores combined — the more matching clauses, the better — while the must_not and filter clauses are executed in filter context.
  • Boosting
    Return documents which match a positive query, but reduce the score of documents which also match a negative query.
  • Constant score
    A query which wraps another query, but executes it in filter context. All matching documents are given the same “constant” _score.
  • Disjunction max
    A query which accepts multiple queries, and returns any documents which match any of the query clauses. While the bool query combines the scores from all matching queries, the dis_max query uses the score of the single best-matching query clause.
  • Function score
    Modify the scores returned by the main query with functions to take into account factors like popularity, recency, distance, or custom algorithms implemented with scripting.

The full text queries enable you to search analyzed text fields such as the body of an email. The query string is processed using the same analyzer that was applied to the field during indexing.

  • Intervals
    A full text query that allows fine-grained control of the ordering and proximity of matching terms.
  • Match
    The standard query for performing full text queries, including fuzzy matching and phrase or proximity queries.
  • Match bool prefix
    Creates a bool query that matches each term as a term query, except for the last term, which is matched as a prefix query.
  • Match phrase
    Like the match query but used for matching exact phrases or word proximity matches.
  • Match phrase prefix
    Like the match_phrase query, but does a wildcard search on the final word.
  • Multi-match
    The multi-field version of the match query.
  • Common Terms [Deprecated]
    A more specialized query which gives more preference to uncommon words.
  • Query string
    Supports the compact Lucene query string syntax, allowing you to specify AND|OR|NOT conditions and multi-field search within a single query string. For expert users only.
  • Simple query string
    A simpler, more robust version of the query_string syntax suitable for exposing directly to users.

Elasticsearch supports two types of geo data: geo_point fields which support lat/lon pairs, and geo_shape fields, which support points, lines, circles, polygons, multi-polygons, etc.

  • Geo bounding box
    Finds documents with geo-points that fall into the specified rectangle.
  • Geo distance
    Finds documents with geo-points within the specified distance of a central point.
  • Geo polygon [Deprecated]
    Find documents with geo-points within the specified polygon.
  • Geo shape
    Finds documents with:
    • geo-shapes which either intersect, are contained by, or do not intersect with the specified geo-shape
    • geo-points which intersect the specified geo-shape

Shape queries [X-Pack]

As with geo_shape, Elasticsearch supports the ability to index arbitrary two-dimensional (non-geospatial) geometries, making it possible to map out virtual worlds, sporting venues, theme parks, and CAD diagrams. Elasticsearch supports two types of cartesian data: point fields, which support x/y pairs, and shape fields, which support points, lines, circles, polygons, multi-polygons, etc.

  • Shape
    Finds documents with:
    • shapes which either intersect, are contained by, are within, or do not intersect with the specified shape
    • points which intersect the specified shape
  • Nested
    Documents may contain fields of type nested. These fields are used to index arrays of objects, where each object can be queried (with the nested query) as an independent document.
  • Has child
    A join field relationship can exist between documents within a single index. The has_child query returns parent documents whose child documents match the specified query, while the has_parent query returns child documents whose parent document matches the specified query.
  • Has parent
    Returns child documents whose joined parent document matches a provided query. You can create parent-child relationships between documents in the same index using a join field mapping.
  • Parent ID
    Returns child documents joined to a specific parent document. You can use a join field mapping to create parent-child relationships between documents in the same index.
  • Match all
    The simplest query, which matches all documents, giving them all a _score of 1.0.
  • Match none
    This is the inverse of the match_all query, which matches no documents.
  • Distance feature
    A query that computes scores based on the dynamically computed distances between the origin and documents' date, date_nanos and geo_point fields. It is able to efficiently skip non-competitive hits.
  • More like this
    This query finds documents which are similar to the specified text, document, or collection of documents.
  • Percolate
    This query finds queries that are stored as documents that match with the specified document.
  • Rank feature
    A query that computes scores based on the values of numeric features and is able to efficiently skip non-competitive hits.
  • Script
    This query allows a script to act as a filter. Also see the function_score query.
  • Script score
    A query that allows the score of a sub-query to be modified with a script.
  • Wrapper
    A query that accepts other queries as json or yaml string.
  • Pinned [X-Pack]
    A query that promotes selected documents over others matching a given query.

You can use term-level queries to find documents based on precise values in structured data. Examples of structured data include date ranges, IP addresses, prices, or product IDs. Unlike full-text queries, term-level queries do not analyze search terms. Instead, term-level queries match the exact terms stored in a field.
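
Patterned on the intervals example near the top of this README, a term query might be built like this. Term and TermQueryParams are hypothetical identifiers used purely for illustration; only the marshaled long form shown earlier is confirmed:

package main

import (
	"fmt"
	"log"

	"github.com/chanced/picker"
)

func main() {
	s, err := picker.NewSearch(picker.SearchParams{
		Query: &picker.QueryParams{
			// Term and TermQueryParams are hypothetical names, patterned
			// on the intervals example above; the long-form JSON they
			// would marshal to is the confirmed
			// {"term":{"my_field":{"value":"my_string"}}} shape
			Term: picker.TermQueryParams{
				Field: "my_field",
				Value: "my_string",
			},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", s)
}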

  • Exists
    Returns documents that contain any indexed value for a field.
  • Fuzzy
    Returns documents that contain terms similar to the search term. Elasticsearch measures similarity, or fuzziness, using a Levenshtein edit distance.
  • IDs
    Returns documents based on their document IDs.
  • Prefix
    Returns documents that contain a specific prefix in a provided field.
  • Range
    Returns documents that contain terms within a provided range.
  • Regexp
    Returns documents that contain terms matching a regular expression.
  • Term
    Returns documents that contain an exact term in a provided field.
  • Terms
    Returns documents that contain one or more exact terms in a provided field.
  • Terms set
    Returns documents that contain a minimum number of exact terms in a provided field. You can define the minimum number of matching terms using a field or script.
  • Type [Deprecated]
    Returns documents of the specified type.
  • Wildcard
    Returns documents that contain terms matching a wildcard pattern.

Span queries are low-level positional queries which provide expert control over the order and proximity of the specified terms. These are typically used to implement very specific queries on legal documents or patents.

It is only allowed to set boost on an outer span query. Compound span queries, like span_near, only use the list of matching spans of inner span queries in order to find their own spans, which they then use to produce a score. Scores are never computed on inner span queries, which is the reason why boosts are not allowed: they only influence the way scores are computed, not spans.

Span queries cannot be mixed with non-span queries (with the exception of the span_multi query).

  • Span containing
    Accepts a list of span queries, but only returns those spans which also match a second span query.
  • Field masking span
    Allows queries like span-near or span-or across different fields.
  • Span first
    Accepts another span query whose matches must appear within the first N positions of the field.
  • Span multi
    Wraps a term, range, prefix, wildcard, regexp, or fuzzy query.
  • Span near
    Accepts multiple span queries whose matches must be within the specified distance of each other, and possibly in the same order.
  • Span not
    Wraps another span query, and excludes any documents which match that query.
  • Span or
    Combines multiple span queries — returns documents which match any of the specified queries.
  • Span term
    The equivalent of the term query but for use with other span queries.
  • Span within
    The result from a single span query is returned as long as its span falls within the spans returned by a list of other span queries.

Ingest processors

  • Append
    Appends one or more values to an existing array if the field already exists and it is an array. Converts a scalar to an array and appends one or more values to it if the field exists and it is a scalar. Creates an array containing the provided values if the field doesn’t exist. Accepts a single value or an array of values.
  • Bytes
    Converts a human readable byte value (e.g. 1kb) to its value in bytes (e.g. 1024). If the field is an array of strings, all members of the array will be converted.
  • Circle [X-Pack]
    Converts circle definitions of shapes to regular polygons which approximate them.
  • Community ID [X-Pack]
    Computes the Community ID for network flow data as defined in the Community ID Specification. You can use a community ID to correlate network events related to a single flow.
  • Convert
    Converts a field in the currently ingested document to a different type, such as converting a string to an integer. If the field value is an array, all members will be converted.
    The supported types include: integer, long, float, double, string, boolean, and auto.
    Specifying boolean will set the field to true if its string value is equal to true (ignore case), to false if its string value is equal to false (ignore case), or it will throw an exception otherwise.
  • CSV
    Extracts fields from CSV line out of a single text field within a document. Any empty field in CSV will be skipped.
  • Date
    Parses dates from fields, and then uses the date or timestamp as the timestamp for the document. By default, the date processor adds the parsed date as a new field called @timestamp. You can specify a different field by setting the target_field configuration parameter. Multiple date formats are supported as part of the same date processor definition. They will be used sequentially to attempt parsing the date field, in the same order they were defined as part of the processor definition.
  • Date Index Name
    The purpose of this processor is to point documents to the right time-based index, based on a date or timestamp field in a document, using date math index name support.
    The processor sets the _index metadata field with a date math index name expression, based on the provided index name prefix, a date or timestamp field in the documents being processed, and the provided date rounding.
  • Dissect
    Similar to the Grok Processor, dissect also extracts structured fields out of a single text field within a document. However unlike the Grok Processor, dissect does not use Regular Expressions. This allows dissect’s syntax to be simple and for some cases faster than the Grok Processor.
    Dissect matches a single text field against a defined pattern.
  • DotExpander
    Expands a field with dots into an object field. This processor allows fields with dots in the name to be accessible by other processors in the pipeline. Otherwise these fields can’t be accessed by any processor.
  • Drop
    Drops the document without raising any errors. This is useful to prevent the document from getting indexed based on some condition.
  • Enrich [X-Pack]
    The enrich processor can enrich documents with data from another index. See enrich data section for more information about how to set this up.
  • Fail
    Raises an exception. This is useful for when you expect a pipeline to fail and want to relay a specific message to the requester.
  • Fingerprint [X-Pack]
    Computes a hash of the document’s content. You can use this hash for content fingerprinting.
  • Foreach
    Processes elements in an array of unknown length.
  • GeoIP
    The geoip processor adds information about the geographical location of IP addresses, based on data from the MaxMind databases. This processor adds this information by default under the geoip field. The geoip processor can resolve both IPv4 and IPv6 addresses.
  • Grok
    Extracts structured fields out of a single text field within a document. You choose which field to extract matched fields from, as well as the grok pattern you expect will match. A grok pattern is like a regular expression that supports aliased expressions that can be reused.
  • Gsub
    Converts a string field by applying a regular expression and a replacement. If the field is an array of string, all members of the array will be converted. If any non-string values are encountered, the processor will throw an exception.
  • HTML Strip
    Removes HTML tags from the field. If the field is an array of strings, HTML tags will be removed from all members of the array.
  • Inference [X-Pack]
    Uses a pre-trained data frame analytics model to infer against the data that is being ingested in the pipeline.
  • Join
    Joins each element of an array into a single string using a separator character between each element. Throws an error when the field is not an array.
  • JSON
    Converts a JSON string into a structured JSON object.
  • KV
    This processor helps automatically parse messages (or specific event fields) which are of the foo=bar variety.
  • Lowercase
    Converts a string to its lowercase equivalent. If the field is an array of strings, all members of the array will be converted.
  • NetworkDirection [X-Pack]
    Calculates the network direction given a source IP address, destination IP address, and a list of internal networks.
  • Pipeline
    Executes another pipeline.
  • Remove
    Removes existing fields. If one field doesn’t exist, an exception will be thrown.
  • Rename
    Renames an existing field. If the field doesn’t exist or the new name is already used, an exception will be thrown.
  • Script
    Allows inline and stored scripts to be executed within ingest pipelines.
  • Set
    Sets one field and associates it with the specified value. If the field already exists, its value will be replaced with the provided one.
  • Set Security User
    Sets user-related details (such as username, roles, email, full_name, metadata, api_key, realm and authentication_type) from the current authenticated user to the current document by pre-processing the ingest. The api_key property exists only if the user authenticates with an API key. It is an object containing the id and name fields of the API key. The realm property is also an object with two fields, name and type. When using API key authentication, the realm property refers to the realm from which the API key is created. The authentication_type property is a string that can take value from REALM, API_KEY, TOKEN and ANONYMOUS.
  • Sort
    Sorts the elements of an array ascending or descending. Homogeneous arrays of numbers will be sorted numerically, while arrays of strings or heterogeneous arrays of strings + numbers will be sorted lexicographically. Throws an error when the field is not an array.
  • Split
    Splits a field into an array using a separator character. Only works on string fields.
  • Trim
    Trims whitespace from a field. If the field is an array of strings, all members of the array will be trimmed.
  • Uppercase
    Converts a string to its uppercase equivalent. If the field is an array of strings, all members of the array will be converted.
  • URL Decode
    URL-decodes a string. If the field is an array of strings, all members of the array will be decoded.
  • URI Parts [X-Pack]
    Parses a Uniform Resource Identifier (URI) string and extracts its components as an object. This URI object includes properties for the URI’s domain, path, fragment, port, query, scheme, user info, username, and password.
  • User Agent
    The user_agent processor extracts details from the user agent string a browser sends with its web requests. This processor adds this information by default under the user_agent field.

Aggregations

An aggregation summarizes your data as metrics, statistics, or other analytics.

Bucket aggregations don’t calculate metrics over fields like the metrics aggregations do, but instead, they create buckets of documents. Each bucket is associated with a criterion (depending on the aggregation type) which determines whether or not a document in the current context "falls" into it. In other words, the buckets effectively define document sets. In addition to the buckets themselves, the bucket aggregations also compute and return the number of documents that "fell into" each bucket.

Bucket aggregations, as opposed to metrics aggregations, can hold sub-aggregations. These sub-aggregations will be aggregated for the buckets created by their "parent" bucket aggregation.

There are different bucket aggregators, each with a different "bucketing" strategy. Some define a single bucket, some define fixed number of multiple buckets, and others dynamically create the buckets during the aggregation process.
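
Because aggregations are still raw map[string]interface{} values in picker (noted above), a bucket aggregation is currently spelled out as JSON-shaped data. A minimal sketch of a terms bucket aggregation:

package main

import (
	"encoding/json"
	"fmt"
	"log"
)

func main() {
	// a terms bucket aggregation as a raw map: one bucket will be
	// created per unique value of the "genre" field
	aggs := map[string]interface{}{
		"genres": map[string]interface{}{
			"terms": map[string]interface{}{
				"field": "genre",
			},
		},
	}
	data, err := json.Marshal(aggs)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(data)) // {"genres":{"terms":{"field":"genre"}}}
}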

  • Adjacency matrix
    A bucket aggregation returning a form of adjacency matrix. The request provides a collection of named filter expressions, similar to the filters aggregation request. Each bucket in the response represents a non-empty cell in the matrix of intersecting filters.
  • Auto-interval date histogram
    A multi-bucket aggregation similar to the Date histogram except instead of providing an interval to use as the width of each bucket, a target number of buckets is provided indicating the number of buckets needed and the interval of the buckets is automatically chosen to best achieve that target. The number of buckets returned will always be less than or equal to this target number.
  • Children
    A special single bucket aggregation that selects child documents that have the specified type, as defined in a join field.
  • Composite
    A multi-bucket aggregation that creates composite buckets from different sources.
    Unlike the other multi-bucket aggregations, you can use the composite aggregation to paginate all buckets from a multi-level aggregation efficiently. This aggregation provides a way to stream all buckets of a specific aggregation, similar to what scroll does for documents.
    The composite buckets are built from the combinations of the values extracted/created for each document and each combination is considered as a composite bucket.
  • Date histogram
    This multi-bucket aggregation is similar to the normal histogram, but it can only be used with date or date range values. Because dates are represented internally in Elasticsearch as long values, it is possible, but not as accurate, to use the normal histogram on dates as well. The main difference in the two APIs is that here the interval can be specified using date/time expressions. Time-based data requires special support because time-based intervals are not always a fixed length.
  • Date range
    A range aggregation that is dedicated for date values. The main difference between this aggregation and the normal range aggregation is that the from and to values can be expressed in Date Math expressions, and it is also possible to specify a date format by which the from and to response fields will be returned. Note that this aggregation includes the from value and excludes the to value for each range.
  • Diversified sampler
    Like the sampler aggregation this is a filtering aggregation used to limit any sub aggregations' processing to a sample of the top-scoring documents. The diversified_sampler aggregation adds the ability to limit the number of matches that share a common value such as an "author".
  • Filter
    Defines a single bucket of all the documents in the current document set context that match a specified filter. Often this will be used to narrow down the current aggregation context to a specific set of documents.
  • Filters
    Defines a multi bucket aggregation where each bucket is associated with a filter. Each bucket will collect all documents that match its associated filter.
  • Geo-distance
    A multi-bucket aggregation that works on geo_point fields and conceptually works very similarly to the range aggregation. The user can define a point of origin and a set of distance range buckets. The aggregation evaluates the distance of each document value from the origin point and determines the buckets it belongs to based on the ranges (a document belongs to a bucket if the distance between the document and the origin falls within the distance range of the bucket).
  • Geohash grid
    A multi-bucket aggregation that groups geo_point and geo_shape values into buckets that represent a grid. The resulting grid can be sparse and only contains cells that have matching data. Each cell is labeled using a geohash which is of user-definable precision.
  • Geotile grid
    A multi-bucket aggregation that groups geo_point and geo_shape values into buckets that represent a grid. The resulting grid can be sparse and only contains cells that have matching data. Each cell corresponds to a map tile as used by many online map sites. Each cell is labeled using a "{zoom}/{x}/{y}" format, where zoom is equal to the user-specified precision.
  • Global
    Defines a single bucket of all the documents within the search execution context. This context is defined by the indices and the document types you’re searching on, but is not influenced by the search query itself.
  • Histogram
    A multi-bucket values source based aggregation that can be applied on numeric values or numeric range values extracted from the documents. It dynamically builds fixed size (a.k.a. interval) buckets over the values.
  • IP range
    Just like the dedicated date range aggregation, there is also a dedicated range aggregation for IP typed fields.
  • Missing
    A field data based single bucket aggregation, that creates a bucket of all documents in the current document set context that are missing a field value (effectively, missing a field or having the configured NULL value set). This aggregator will often be used in conjunction with other field data bucket aggregators (such as ranges) to return information for all the documents that could not be placed in any of the other buckets due to missing field data values.
  • Multi Terms [X-Pack]
    A multi-bucket value source based aggregation where buckets are dynamically built - one per unique set of values. The multi terms aggregation is very similar to the terms aggregation, however in most cases it will be slower than the terms aggregation and will consume more memory. Therefore, if the same set of fields is constantly used, it would be more efficient to index a combined key for these fields as a separate field and use the terms aggregation on this field.
  • Nested
    A special single bucket aggregation that enables aggregating nested documents.
  • Parent
    A special single bucket aggregation that selects parent documents that have the specified type, as defined in a join field.
  • Range
    A multi-bucket value source based aggregation that enables the user to define a set of ranges - each representing a bucket. During the aggregation process, the values extracted from each document will be checked against each bucket range and "bucket" the relevant/matching document. Note that this aggregation includes the from value and excludes the to value for each range.
  • Rare terms
    A multi-bucket value source based aggregation which finds "rare" terms — terms that are at the long-tail of the distribution and are not frequent. Conceptually, this is like a terms aggregation that is sorted by _count ascending. As noted in the terms aggregation docs, actually ordering a terms agg by count ascending has unbounded error. Instead, you should use the rare_terms aggregation.
  • Reverse nested
    A special single bucket aggregation that enables aggregating on parent docs from nested documents. Effectively this aggregation can break out of the nested block structure and link to other nested structures or the root document, which allows nesting other aggregations that aren’t part of the nested object in a nested aggregation.
  • Sampler
    A filtering aggregation used to limit any sub aggregations' processing to a sample of the top-scoring documents.
  • Significant terms
    An aggregation that returns interesting or unusual occurrences of terms in a set.
  • Significant text
    An aggregation that returns interesting or unusual occurrences of free-text terms in a set.
  • Terms
    A multi-bucket value source based aggregation where buckets are dynamically built - one per unique value.
  • Variable width histogram
    This is a multi-bucket aggregation similar to Histogram. However, the width of each bucket is not specified. Rather, a target number of buckets is provided and bucket intervals are dynamically determined based on the document distribution. This is done using a simple one-pass document clustering algorithm that aims to obtain low distances between bucket centroids. Unlike other multi-bucket aggregations, the intervals will not necessarily have a uniform width.

The aggregations in this family compute metrics based on values extracted in one way or another from the documents that are being aggregated. The values are typically extracted from the fields of the document (using the field data), but can also be generated using scripts.

Numeric metrics aggregations are a special type of metrics aggregation which output numeric values. Some aggregations output a single numeric metric (e.g. avg) and are called single-value numeric metrics aggregation, others generate multiple metrics (e.g. stats) and are called multi-value numeric metrics aggregation. The distinction between single-value and multi-value numeric metrics aggregations plays a role when these aggregations serve as direct sub-aggregations of some bucket aggregations (some bucket aggregations enable you to sort the returned buckets based on the numeric metrics in each bucket).
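
For instance, avg (single-value) and stats (multi-value) over the same field, again written as raw maps since picker does not yet type aggregations:

package main

import (
	"encoding/json"
	"fmt"
	"log"
)

func main() {
	// avg returns a single numeric metric; stats returns several
	// (min, max, sum, count, avg) for the same field
	aggs := map[string]interface{}{
		"avg_price":   map[string]interface{}{"avg": map[string]interface{}{"field": "price"}},
		"price_stats": map[string]interface{}{"stats": map[string]interface{}{"field": "price"}},
	}
	data, err := json.Marshal(aggs)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(data))
}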

  • Avg
    A single-value metrics aggregation that computes the average of numeric values that are extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents, or be generated by a provided script.
  • Boxplot [X-Pack]
    A boxplot metrics aggregation that computes a boxplot of numeric values extracted from the aggregated documents. These values can be generated by a provided script or extracted from specific numeric or histogram fields in the documents.
  • Cardinality
    A single-value metrics aggregation that calculates an approximate count of distinct values. Values can be extracted either from specific fields in the document or generated by a script.
  • Extended stats
    A multi-value metrics aggregation that computes stats over numeric values extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents, or be generated by a provided script.
    The extended_stats aggregation is an extended version of the stats aggregation, where additional metrics are added such as sum_of_squares, variance, std_deviation and std_deviation_bounds.
  • Geo-bounds
    A metric aggregation that computes the bounding box containing all geo values for a field.
  • Geo-centroid
    A metric aggregation that computes the weighted centroid from all coordinate values for geo fields.
  • Geo-Line [X-Pack]
    The geo_line aggregation aggregates all geo_point values within a bucket into a LineString ordered by the chosen sort field. This sort can be a date field, for example. The bucket returned is a valid GeoJSON Feature representing the line geometry.
  • Matrix stats
    The matrix_stats aggregation is a numeric aggregation that computes statistics such as count, mean, variance, skewness, kurtosis, covariance, and correlation over a set of document fields.
  • Max
    A single-value metrics aggregation that keeps track of and returns the maximum value among the numeric values extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents, or be generated by a provided script.
  • Median absolute deviation
    This single-value aggregation approximates the median absolute deviation of its search results.
    Median absolute deviation is a measure of variability. It is a robust statistic, meaning that it is useful for describing data that may have outliers, or may not be normally distributed. For such data it can be more descriptive than standard deviation.
  • Min
    A single-value metrics aggregation that keeps track of and returns the minimum value among numeric values extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents, or be generated by a provided script.
  • Percentile ranks
    A multi-value metrics aggregation that calculates one or more percentile ranks over numeric values extracted from the aggregated documents. These values can be generated by a provided script or extracted from specific numeric or histogram fields in the documents.
  • Percentiles
    A multi-value metrics aggregation that calculates one or more percentiles over numeric values extracted from the aggregated documents. These values can be generated by a provided script or extracted from specific numeric or histogram fields in the documents.
  • Rate [X-Pack]
    A rate metrics aggregation can be used only inside a date_histogram and calculates a rate of documents or a field in each date_histogram bucket. The field values can be generated by a provided script or extracted from specific numeric or histogram fields in the documents.
  • Scripted metric
    A metric aggregation that executes using scripts to provide a metric output.
  • Stats
    A multi-value metrics aggregation that computes stats over numeric values extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents, or be generated by a provided script.
    The stats that are returned consist of: min, max, sum, count and avg.
  • String stats [X-Pack]
    A multi-value metrics aggregation that computes statistics over string values extracted from the aggregated documents. These values can be retrieved either from specific keyword fields in the documents or can be generated by a provided script.
  • Sum
    A single-value metrics aggregation that sums up numeric values that are extracted from the aggregated documents. These values can be extracted either from specific numeric or histogram fields in the documents, or be generated by a provided script.
  • T-test [X-Pack]
    A t_test metrics aggregation that performs a statistical hypothesis test in which the test statistic follows a Student’s t-distribution under the null hypothesis on numeric values extracted from the aggregated documents or generated by provided scripts. In practice, this will tell you if the difference between two population means is statistically significant and did not occur by chance alone.
  • Top hits
    A top_hits metric aggregator keeps track of the most relevant document being aggregated. This aggregator is intended to be used as a sub aggregator, so that the top matching documents can be aggregated per bucket.
  • Top metrics [X-Pack]
    The top_metrics aggregation selects metrics from the document with the largest or smallest "sort" value.
  • Value count
    A single-value metrics aggregation that counts the number of values that are extracted from the aggregated documents. These values can be extracted either from specific fields in the documents, or be generated by a provided script. Typically, this aggregator will be used in conjunction with other single-value aggregations. For example, when computing the avg one might be interested in the number of values the average is computed over.
  • Weighted avg
    A single-value metrics aggregation that computes the weighted average of numeric values that are extracted from the aggregated documents. These values can be extracted from specific numeric fields in the documents.

Pipeline aggregations work on the outputs produced from other aggregations rather than from document sets, adding information to the output tree. There are many different types of pipeline aggregation, each computing different information from other aggregations, but these types can be broken down into two families:

Parent A family of pipeline aggregations that is provided with the output of its parent aggregation and is able to compute new buckets or new aggregations to add to existing buckets.

Sibling Pipeline aggregations that are provided with the output of a sibling aggregation and are able to compute a new aggregation which will be at the same level as the sibling aggregation.
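
As a sketch of the sibling family, an avg_bucket pipeline aggregation references its sibling through buckets_path (again as a raw map, since aggregations are not yet typed in picker):

package main

import (
	"encoding/json"
	"fmt"
	"log"
)

func main() {
	aggs := map[string]interface{}{
		// parent bucket aggregation: one bucket per month, each holding
		// a sum sub-aggregation
		"sales_per_month": map[string]interface{}{
			"date_histogram": map[string]interface{}{
				"field":             "date",
				"calendar_interval": "month",
			},
			"aggs": map[string]interface{}{
				"sales": map[string]interface{}{
					"sum": map[string]interface{}{"field": "price"},
				},
			},
		},
		// sibling pipeline aggregation: averages the monthly sums,
		// addressed via buckets_path
		"avg_monthly_sales": map[string]interface{}{
			"avg_bucket": map[string]interface{}{
				"buckets_path": "sales_per_month>sales",
			},
		},
	}
	data, err := json.Marshal(aggs)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(data))
}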

  • Average bucket
    A sibling pipeline aggregation which calculates the mean value of a specified metric in a sibling aggregation. The specified metric must be numeric and the sibling aggregation must be a multi-bucket aggregation.
  • Bucket script
    A parent pipeline aggregation which executes a script which can perform per bucket computations on specified metrics in the parent multi-bucket aggregation. The specified metric must be numeric and the script must return a numeric value.
  • Bucket selector
    A parent pipeline aggregation which executes a script which determines whether the current bucket will be retained in the parent multi-bucket aggregation. The specified metric must be numeric and the script must return a boolean value. If the script language is expression then a numeric return value is permitted. In this case 0.0 will be evaluated as false and all other values will evaluate to true.
  • Bucket sort
    A parent pipeline aggregation which sorts the buckets of its parent multi-bucket aggregation. Zero or more sort fields may be specified together with the corresponding sort order. Each bucket may be sorted based on its _key, _count or its sub-aggregations. In addition, parameters from and size may be set in order to truncate the result buckets.
  • Cumulative cardinality [X-Pack]
    A parent pipeline aggregation which calculates the Cumulative Cardinality in a parent histogram (or date_histogram) aggregation.
  • Cumulative sum
    A parent pipeline aggregation which calculates the cumulative sum of a specified metric in a parent histogram (or date_histogram) aggregation. The specified metric must be numeric and the enclosing histogram must have min_doc_count set to 0 (default for histogram aggregations).
  • Derivative
    A parent pipeline aggregation which calculates the derivative of a specified metric in a parent histogram (or date_histogram) aggregation. The specified metric must be numeric and the enclosing histogram must have min_doc_count set to 0 (default for histogram aggregations).
  • Extended stats bucket
    A sibling pipeline aggregation which calculates a variety of stats across all buckets of a specified metric in a sibling aggregation. The specified metric must be numeric and the sibling aggregation must be a multi-bucket aggregation.
  • Inference bucket [X-Pack]
    A parent pipeline aggregation which loads a pre-trained model and performs inference on the collated result fields from the parent bucket aggregation.
  • Max bucket
    A sibling pipeline aggregation which identifies the bucket(s) with the maximum value of a specified metric in a sibling aggregation and outputs both the value and the key(s) of the bucket(s). The specified metric must be numeric and the sibling aggregation must be a multi-bucket aggregation.
  • Min bucket
    A sibling pipeline aggregation which identifies the bucket(s) with the minimum value of a specified metric in a sibling aggregation and outputs both the value and the key(s) of the bucket(s). The specified metric must be numeric and the sibling aggregation must be a multi-bucket aggregation.
  • Moving average [Deprecated]
    Deprecated in favor of the Moving function aggregation.
  • Moving function
    Given an ordered series of data, the Moving Function aggregation will slide a window across the data and allow the user to specify a custom script that is executed on each window of data. For convenience, a number of common functions are predefined such as min/max, moving averages, etc.
  • Moving percentiles [X-Pack]
    Given an ordered series of percentiles, the Moving Percentile aggregation will slide a window across those percentiles and allow the user to compute the cumulative percentile.
  • Normalize [X-Pack]
    A parent pipeline aggregation which calculates the specific normalized/rescaled value for a specific bucket value. Values that cannot be normalized will be skipped using the skip gap policy.
  • Percentiles bucket
    A sibling pipeline aggregation which calculates percentiles across all buckets of a specified metric in a sibling aggregation. The specified metric must be numeric and the sibling aggregation must be a multi-bucket aggregation.
  • Serial differencing
    Serial differencing is a technique where values in a time series are subtracted from themselves at different time lags or periods. For example, the datapoint f(x) = f(x_t) - f(x_{t-n}), where n is the period being used.
  • Stats bucket
    A sibling pipeline aggregation which calculates a variety of stats across all buckets of a specified metric in a sibling aggregation. The specified metric must be numeric and the sibling aggregation must be a multi-bucket aggregation.
  • Sum bucket
    A sibling pipeline aggregation which calculates the sum across all buckets of a specified metric in a sibling aggregation. The specified metric must be numeric and the sibling aggregation must be a multi-bucket aggregation.

Analyzers

  • Standard
    The standard analyzer divides text into terms on word boundaries, as defined by the Unicode Text Segmentation algorithm. It removes most punctuation, lowercases terms, and supports removing stop words.
  • Simple
    The simple analyzer divides text into terms whenever it encounters a character which is not a letter. It lowercases all terms.
  • Whitespace
    The whitespace analyzer divides text into terms whenever it encounters any whitespace character. It does not lowercase terms.
  • Stop
    The stop analyzer is like the simple analyzer, but also supports removal of stop words.
  • Keyword
    The keyword analyzer is a “noop” analyzer that accepts whatever text it is given and outputs the exact same text as a single term.
  • Pattern
    The pattern analyzer uses a regular expression to split the text into terms. It supports lower-casing and stop words.
  • Languages
    Elasticsearch provides many language-specific analyzers like english or french.
  • Fingerprint
    The fingerprint analyzer is a specialist analyzer which creates a fingerprint which can be used for duplicate detection.
