Releases: Desbordante/desbordante-core

Desbordante 2.1.0

28 Jun 19:46

Release Notes

This minor release is a necessary step toward isolating the console interface code and moving it into a separate repository. Our final goal is to create a dedicated Python package called desbordante-cli, which will be implemented purely in Python. It will depend on the core desbordante package, which contains the C++ code for pattern mining and validation.

As such, we plan to make minor releases of the core package in the future, followed by the console ones. These releases will contain fewer features, but will come out a lot more frequently. The idea here is to make a release as soon as each individual algorithm is ready rather than accumulating several of them as we did previously. Once a sufficient number of features have been accumulated, a major release will be published, primarily for promotion purposes. It will not provide any new functionality, but will include all the accumulated changes since the last major release.

Changes:

  • We have added support for a novel class of algorithms — the dynamic ones. The idea is to track changes in the dataset in order to update their result on-the-fly rather than processing the whole table again. As a result, they can be up to several orders of magnitude faster than classic (static) ones in some situations. Along with devising dynamic infrastructure, we have implemented the first dynamic algorithm — a dynamic functional dependency validator. A Python interface and an example are provided.
  • We have added support for discovery of differential dependencies. A differential dependency is a relatively novel type of pattern that is very handy for detecting a particular relationship between two column sets. It can be seen as an extension of the functional dependency that works well on dirty data. See the article about the pattern for more information. The implemented discovery algorithm (Split) comes with a Python interface and an example.
  • Discovery of association rules is now available via the Python and console interfaces. An example is also available.
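
The idea behind dynamic validation described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation or API: the class IncrementalFDChecker and its methods are hypothetical names chosen for this illustration. The sketch keeps per-group counters so that each insert or delete updates the validity of a functional dependency without rescanning the table.

```python
from collections import Counter, defaultdict


class IncrementalFDChecker:
    """Hypothetical helper: tracks whether lhs -> rhs holds under updates."""

    def __init__(self, lhs, rhs):
        self.lhs = tuple(lhs)
        self.rhs = tuple(rhs)
        # For every LHS value: how many rows carry each RHS value.
        self.groups = defaultdict(Counter)
        # Number of LHS values currently mapped to more than one RHS value.
        self.violating_groups = 0

    def _keys(self, row):
        left = tuple(row[c] for c in self.lhs)
        right = tuple(row[c] for c in self.rhs)
        return left, right

    def insert(self, row):
        left, right = self._keys(row)
        group = self.groups[left]
        before = len(group)
        group[right] += 1
        # The group starts violating the FD when a second RHS value appears.
        if before == 1 and len(group) == 2:
            self.violating_groups += 1

    def delete(self, row):
        left, right = self._keys(row)
        group = self.groups.get(left)
        if group is None or group[right] == 0:
            raise KeyError("row is not in the tracked table")
        before = len(group)
        group[right] -= 1
        if group[right] == 0:
            del group[right]
        # The group stops violating the FD when one RHS value remains.
        if before == 2 and len(group) == 1:
            self.violating_groups -= 1
        if not group:
            del self.groups[left]

    def holds(self):
        return self.violating_groups == 0


checker = IncrementalFDChecker(["zip"], ["city"])
checker.insert({"zip": "10115", "city": "Berlin"})
checker.insert({"zip": "10115", "city": "Bonn"})   # violates zip -> city
print(checker.holds())
checker.delete({"zip": "10115", "city": "Bonn"})   # violation removed
print(checker.holds())
```

Each update costs time proportional to the number of columns involved, independent of the table size, which is where the speedup over re-validating a static table comes from.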

Miscellaneous:

  • Greatly improved the metric functional dependency verification example.
  • Added approximate inclusion dependency discovery algorithms to the C++ core. A Python interface, a console interface, and an example are still in development.
  • Fixed Python bindings for association rules: the AR objects can be properly copied now.
  • Extended simple statistics module with ten string-related statistics; they are available via the Python interface.
  • Fixed a CLI-breaking bug related to the CFD discovery algorithm.
  • Improved column type deduction in the C++ core.
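
To make the notion of an approximate inclusion dependency mentioned in the list above more concrete, here is a small sketch. It assumes one common error measure (the fraction of distinct dependent values missing from the referenced column); the measure used by the core algorithms may differ, and the function names are hypothetical.

```python
def ind_error(dependent, referenced):
    """Fraction of distinct dependent values that are missing from referenced."""
    dep_values = set(dependent)
    if not dep_values:
        return 0.0
    missing = dep_values - set(referenced)
    return len(missing) / len(dep_values)


def holds_approximately(dependent, referenced, max_error):
    """True if dependent is included in referenced up to max_error."""
    return ind_error(dependent, referenced) <= max_error


# Hypothetical data: one order refers to a customer id that does not exist.
order_customer_ids = [1, 2, 2, 3, 9]
customer_ids = [1, 2, 3, 4]
print(ind_error(order_customer_ids, customer_ids))
print(holds_approximately(order_customer_ids, customer_ids, max_error=0.3))
```

An exact inclusion dependency is the special case with an error of zero.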

Desbordante 2.0.0

16 Apr 21:55

Release Notes

This major release brings a lot of improvements. Its primary focus is Desbordante’s core: we add several new primitives for pattern discovery.

Changes:

  • New feature: discovery of exact order dependencies. This primitive allows you to discover patterns related to orderings of columns, e.g. pay increasing with grade. It is available with two different axiomatizations — set-based and list-based. The latter is faster, but may miss some dependencies, while the former is more accurate, but computationally more demanding. Note that they present dependencies in different formats.
  • New feature: discovery of probabilistic functional dependencies for both existing metrics: PerTuple and PerValue. This primitive helps in discovering a special case of approximate functional dependencies that better detects multiple violations in a small set of clusters. The provided examples illustrate the differences between the existing AFD formulation and PFDs, and show some of the potential use cases of PFDs.
  • New feature: discovery of inclusion dependencies. This primitive can help users to recover primary key — foreign key relationships, or to find joinable columns in a table or a collection of tables. It is available as an exact algorithm (Spider) and as an approximate one (Faida), with Faida potentially producing errors but being much faster.
  • New feature: we extend the set of supported data types by adding graphs. We started with supporting graph functional dependencies (GFD), and Desbordante can now validate GFDs. GFDs allow users to define patterns in graphs, specifying conditions both on graph structure and node content. Graph dependencies can be a bit tricky, so we provide illustrated examples.
  • We’ve made discovery of conditional functional dependencies available in Python. This primitive can be considered as:
    1. Another way to define approximate functional dependencies, which, unlike other approaches, offers rich semantics (context), helping in understanding complex cases when the exact FD does not hold;
    2. An AFD discovery algorithm which provides control over how frequent and how consistent this pattern is;
    3. A building block for many existing data repair algorithms.
  • We’ve also made validation of approximate unique column combinations available in Python. This primitive is suitable for defining keys in tables and for detecting partial duplicates over a subset of columns. As is usually the case with any validation primitive, we additionally provide discovery of exceptions and computation of improved thresholds.
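
The validation of approximate unique column combinations described in the last item can be sketched as follows. This is an illustration, not the library's code: verify_ucc is a hypothetical name, and the pair-based error used here is one common choice that may differ from the measure Desbordante computes.

```python
from collections import defaultdict


def verify_ucc(rows, columns):
    """Return (error, exceptions) for a candidate key over `columns`.

    exceptions: lists of row indices that share the same values on `columns`.
    error: share of row pairs that collide on `columns`; it is also the
    smallest threshold at which the combination would be accepted.
    """
    clusters = defaultdict(list)
    for index, row in enumerate(rows):
        clusters[tuple(row[c] for c in columns)].append(index)

    exceptions = [ids for ids in clusters.values() if len(ids) > 1]
    n = len(rows)
    total_pairs = n * (n - 1) // 2
    violating = sum(len(ids) * (len(ids) - 1) // 2 for ids in exceptions)
    error = violating / total_pairs if total_pairs else 0.0
    return error, exceptions


# Hypothetical table with one partial duplicate on (first, last).
people = [
    {"first": "Ann", "last": "Lee", "city": "Oslo"},
    {"first": "Ann", "last": "Lee", "city": "Rome"},
    {"first": "Bob", "last": "Ray", "city": "Oslo"},
    {"first": "Cy", "last": "Fox", "city": "Kyiv"},
]
error, exceptions = verify_ucc(people, ["first", "last"])
print(error, exceptions)
```

The exceptions list points at the partial duplicates, and the returned error plays the role of the improved threshold mentioned above.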

For all introduced primitives, we provide descriptive examples. All primitives are supported in the console version of Desbordante, with the help file containing references to papers in which these primitives are described.
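
As a small illustration of one of these primitives, the sketch below checks a single-column order dependency such as "pay increasing with grade". It is a naive check written for this note, with a hypothetical function name, and it is unrelated to the discovery algorithms shipped in the release. It assumes the reading where rows ordered by the first column must also be ordered by the second, so rows that tie on the first column must tie on the second.

```python
def order_dependency_holds(rows, lhs, rhs):
    """True if ordering rows by `lhs` also orders them by `rhs`."""
    ordered = sorted(rows, key=lambda r: r[lhs])
    for prev, curr in zip(ordered, ordered[1:]):
        if prev[lhs] == curr[lhs]:
            # Ties on lhs must agree on rhs.
            if prev[rhs] != curr[rhs]:
                return False
        elif prev[rhs] > curr[rhs]:
            # A larger lhs value must not come with a smaller rhs value.
            return False
    return True


# Hypothetical salary table.
staff = [
    {"grade": 1, "pay": 100},
    {"grade": 2, "pay": 150},
    {"grade": 2, "pay": 150},
    {"grade": 3, "pay": 200},
]
print(order_dependency_holds(staff, "grade", "pay"))
```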

Miscellaneous:

  • We have established a GitHub organization and gathered all repositories related to our project in one place.
  • We have extended the coverage of the option for limiting the maximum size of the left-hand side to all functional dependency discovery algorithms. This should allow users to speed up the FD discovery if they do not need dependencies with large LHSes.
  • We’ve added many new example scripts. Since the project is currently under-documented, we hope this will be helpful for our potential users. You can see them here.
  • To improve our overall documentation level, we have also published several guides — see the papers section.

Desbordante 1.1.0

19 Feb 10:32

Release Notes

Key enhancements of this minor release concern Python bindings. Namely, we've organized our algorithms into intuitive Python submodules based on primitives and we've provided default algorithms for each one, simplifying usage.

Detailed changes are the following:

  • Every primitive available in the library now gets its own submodule; mining and verification are kept separate:
    • Every primitive’s submodule contains the structures relevant to it. For example, the UCC class may now be accessed as desbordante.ucc.UCC.
    • All algorithms for mining/verifying a primitive are located in the respective primitive’s algorithms submodule. For example, the UccVerifier algorithm may now be accessed as desbordante.ucc_verification.algorithms.UccVerifier. The same holds true for simple statistics. The algorithm to extract them may be accessed as desbordante.statistics.algorithms.DataStats.
    • Every algorithms submodule has a default algorithm for ease of use (example: desbordante.fd.algorithms.Default)
  • Restored exceptions for metric dependency verification
  • Various enhancements to the FD class:
    • FD str representation now uses column names instead of indices
    • Added FD methods to facilitate easy conversion to Python structures
    • Added hashing and equality methods (i.e. FDs can now be inserted into sets)
  • The “table” option no longer gets special treatment:
    • Removed .load_data(path, separator, has_header, **kwargs) overload
    • The option can now be set like a normal one (ex: algo.load_data(table=(path, separator, has_header), …) or algo.load_data(table=dataframe, …))
  • The names and descriptions of options available for an algorithm are now listed in its docstring
  • Fixed a bug with the error option for afd_verification
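
A short usage sketch of the reorganized bindings follows. The names desbordante.fd.algorithms.Default and load_data(table=...) come from the notes above. The execute() and get_fds() calls are not named in these notes: they are taken from the project's README and should be treated as assumptions that may differ between versions. The wrapper function discover_fds is a hypothetical helper written for this illustration.

```python
import csv
import os
import tempfile


def discover_fds(header, rows):
    """Write rows to a temporary CSV file and mine FDs with the default algorithm."""
    import desbordante  # installed with `pip install desbordante`

    with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", newline="", delete=False
    ) as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
        path = handle.name

    try:
        algo = desbordante.fd.algorithms.Default()
        # "table" is set like any other option: (path, separator, has_header).
        algo.load_data(table=(path, ",", True))
        algo.execute()  # assumption: method name as shown in the README
        # assumption: get_fds() as shown in the README; str() uses column names
        return [str(fd) for fd in algo.get_fds()]
    finally:
        os.remove(path)
```

Calling discover_fds(["zip", "city"], [["10115", "Berlin"], ["80331", "Munich"]]) should return the string forms of the discovered dependencies. A pandas DataFrame can be passed as table=dataframe instead of the tuple, as described in the list above.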

Desbordante 1.0.0

11 Dec 07:54

Release Notes

Key enhancements:

  • Python Bindings: We've added Python bindings for many patterns, allowing you to mine pattern instances and other useful information such as exceptions. In order to install the bindings, simply issue pip install desbordante.
  • Examples and Demos: To facilitate your experience with Desbordante, we've prepared a variety of code samples. Explore our example scripts in the 'examples' folder. For an interactive experience, visit our demos at https://desbordante.streamlit.app/.
  • Enhanced Console Support: The console interface has been rewritten in Python using the Python bindings. We also added help descriptions for supported patterns.
  • New Pattern Support: This version introduces support for metric functional dependencies, algebraic constraints, and unique column combinations. We've also expanded the range of simple statistics Desbordante can discover.