Embedding

Doug Blank edited this page May 27, 2023 · 7 revisions

kangas.datatypes.embedding

Embedding Objects

class Embedding(Asset)

An Embedding asset.

__init__

def __init__(embedding=None,
             name=None,
             text=None,
             color=None,
             projection="pca",
             include=True,
             file_name=None,
             metadata=None,
             source=None,
             unserialize=False,
             dimensions=PROJECTION_DIMENSIONS,
             scale=False,
             **kwargs)

Create an embedding vector.

Arguments:

  • embedding - a vector (list of numbers)

  • name - (str) the name of the set to which this embedding point belongs; it also determines the point's color if color is not given

  • text - (str) text shown when you hover over the point in the expanded view

  • color - (str) a string that represents a color for the chart, typically given as a "rrggbb" hex string where "rr" is between "00" and "ff".

  • projection - (str) the type of projection: 'pca', 'umap', or 't-sne'

  • include - (bool) whether to include this vector when determining the projection. Useful if you want to see one part of the datagrid in the projection of another.

  • dimensions - (int) maximum number of dimensions

  • scale - (bool) whether each column should be normalized

  • kwargs - (dict) optional keyword arguments passed to the projection algorithm's constructor

  • NOTE - when using 't-sne', no row may be excluded from the projection (that is, every row must have include=True), because t-SNE does not allow arbitrary mappings.
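The include argument can be read as "fit the projection on some rows, then place every row in it". The following is a minimal NumPy sketch of that idea using PCA via SVD. It is an illustration only, not Kangas's implementation; fit_pca, project, and the sample points are hypothetical names and data introduced here.

```python
import numpy as np

def fit_pca(vectors, dimensions=2):
    """Fit a PCA basis on vectors (n_samples x n_features)."""
    data = np.asarray(vectors, dtype=float)
    mean = data.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:dimensions]

def project(vectors, mean, components):
    """Map vectors into the space spanned by the fitted components."""
    return (np.asarray(vectors, dtype=float) - mean) @ components.T

# Hypothetical data: (vector, include) pairs.
points = [
    ([1.0, 0.0, 0.0], True),
    ([0.0, 1.0, 0.0], True),
    ([0.0, 0.0, 1.0], True),
    ([1.0, 1.0, 0.0], True),
    ([0.5, 0.5, 0.5], False),  # shown in the projection, but not used to fit it
]

# Fit only on the included rows, then project every row.
included = [vector for vector, include in points if include]
mean, components = fit_pca(included, dimensions=2)
projected = project([vector for vector, _ in points], mean, components)
```

PCA produces a linear map that can be applied to rows it was not fitted on, which is what makes excluding rows possible here. A method without such a reusable map cannot place excluded rows, which is consistent with the t-SNE restriction noted above.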

Example:

>>> import kangas as kg
>>> dg = kg.DataGrid()
>>> # rows is assumed to be an iterable of [label, x1, x2, ...] sequences
>>> for row in rows:
...     target = row[0]
...     dg.append([kg.Embedding(row[1:], name=target)])
>>> dg.save("embeddings.datagrid")
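The example above lets name determine each point's color. To choose colors explicitly, the color argument takes an "rrggbb" hex string. The following is a small sketch of building such strings in plain Python; rgb_to_hex and the palette labels are hypothetical and not part of Kangas.

```python
def rgb_to_hex(red, green, blue):
    """Return an "rrggbb" hex string for integer channel values in 0-255."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range: {channel}")
    # Each channel becomes two lowercase hex digits, "00" through "ff".
    return f"{red:02x}{green:02x}{blue:02x}"

# Hypothetical mapping from set name to color.
palette = {
    "cat": rgb_to_hex(255, 0, 0),
    "dog": rgb_to_hex(0, 128, 255),
}
```

A value from such a mapping could then be passed as color when constructing each Embedding, for instance color=palette[target], assuming every target label has an entry.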

log_and_serialize

def log_and_serialize(datagrid, row_id)

Override to save row_id.
