diff --git a/docs/gotchas.rst b/docs/gotchas.rst
index d31522d5..19efe6e8 100644
--- a/docs/gotchas.rst
+++ b/docs/gotchas.rst
@@ -12,30 +12,16 @@ problem is not addressed here.
 Bad interaction with third-party packages
 -----------------------------------------
 
-Parallel processing with generic Python objects is a difficult task, and while
-ML-Ensemble is routinely tested to function seamlessly with Scikit-learn, other machine
-learning libraries can cause bad behaviour during parallel estimations. This
-is unfortunately a fundamental problem rooted in how `Python runs processes in parallel`_,
-and in particular that Python is not thread-safe. ML-Ensemble is by configured
-to avoid such issues to the greatest extent possible, but issues can occur.
+ML-Ensemble is designed to work with any estimator that implements a minimal API, and is specifically unit tested to work with Scikit-learn. When using estimators from other libraries, the estimation can stall and fail to complete. A clear sign of this is that no Python process shows high CPU usage.
 
-In particular, ensemble can run either on multiprocessing or multithreading.
-For standard Scikit-learn use cases, the GIL_ can be released and
-multithreading used. This will speed up estimation and consume less memory.
-However, Python is not inherently thread-safe, so this strategy is not stable.
-For this reason, the safest choice to avoid corrupting the estimation process
-is to use multiprocessing instead. This requires creating sub-process to run
-each job, and so increases additional overhead both in terms of job management
-and sharing memory. As of this writing, the default setting in ML-Ensemble is
-'multiprocessing', but you can change this variable globally: see :ref:`configs`.
+Due to how `Python runs processes in parallel`_, child workers can receive a corrupted thread state that causes them to try to acquire more threads than are available, resulting in a deadlock. If this happens, please raise an issue on the Github repository.
+There are a few things to try that might alleviate the problem:
 
-In Python 3.4+, ML-Ensemble defaults to ``'forkserver'`` on unix systems
-and ``'spawn'`` on Windows for generating sub-processes. These require more
-overhead than the default ``'fork'`` method, but avoids corrupting the thread
-state and as such is much more stable against third-party conflict. These
-conflicts are caused by each worker thinking they have more threads available
-than they actually do, leading to deadlocks and race conditions. For more
-information on this issue see the `Scikit-learn FAQ`_.
+ #. ensure that all estimators in the ensemble or evaluator have ``n_jobs`` or ``nthread`` set to ``1``,
+ #. change the ``backend`` parameter to either ``threading`` or ``multiprocessing``, depending on the current setting,
+ #. try using ``multiprocessing`` together with a fork method (see :ref:`configs`).
+
+For more information on this issue see the `Scikit-learn FAQ`_.
 
 Array copying during fitting
 ----------------------------
diff --git a/docs/index.rst b/docs/index.rst
index 2bb1516a..ef73deb1 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -10,11 +10,7 @@ estimator. By leveraging API elements from deep learning libraries like
 Keras_ for building ensembles, it is straightforward to build deep ensembles
 with complex interactions.
 
-ML-Ensemble is open for contributions at all levels. There are
-some low hanging fruit to build introductory example, use cases and
-general benchmarks. If you would like to get involved, reach out to the
-project's Github_ repository. We are currently in beta testing, so please do
-report any bugs or issues by creating an issue_. If you are interested in
+ML-Ensemble is open for contributions at all levels. If you would like to get involved, reach out to the project's Github_ repository. We are currently in beta testing, so please report any bugs or issues by creating an issue_. If you are interested in
 contributing to development, see :ref:`dev` for a quick introduction to
 ensemble implementation, or check out the issue tracker.
 
diff --git a/docs/updates.rst b/docs/updates.rst
index 277f3ac6..05e73a04 100644
--- a/docs/updates.rst
+++ b/docs/updates.rst
@@ -27,7 +27,13 @@ Change log
 
 * 07/2017 Release_ of version 0.1.5.1 and 0.1.5.2
     - Bug fixes
-    - ```clear_cache`` function to check for residual caches. Safeguard against old caches not being killed.
+    - ``clear_cache`` function to check for residual caches. Safeguard against old caches not being killed.
+
+* 08/2017 Release_ of version 0.1.6
+    - Propagate sparse input features
+    - On the fly prediction array generation
+    - Threading as default backend, ``fork`` as default fork method
+    - Bug fixes
 
 .. _Release: https://github.com/flennerhag/mlens/releases
 .. _Feature propagation:
diff --git a/mlens/__init__.py b/mlens/__init__.py
index e02c1684..3f93b987 100644
--- a/mlens/__init__.py
+++ b/mlens/__init__.py
@@ -11,7 +11,7 @@
 import mlens.config
 from mlens.config import clear_cache
 
-__version__ = "0.1.5.dev0"
+__version__ = "0.1.6"
 
 __all__ = ['base',
            'utils',
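As a quick illustration of the troubleshooting list added to docs/gotchas.rst above, a minimal sketch of what those remedies can look like in user code. It assumes ``SuperLearner`` accepts ``backend``, ``n_jobs`` and ``folds`` keyword arguments; the Scikit-learn estimators and random data are purely illustrative::

    # Sketch of the remedies for stalled estimations (see docs/gotchas.rst).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from mlens.ensemble import SuperLearner

    X = np.random.rand(100, 10)
    y = np.random.randint(0, 2, 100)

    # 1. Keep each estimator single-threaded to avoid nested parallelism.
    base_learners = [RandomForestClassifier(n_estimators=50, n_jobs=1),
                     LogisticRegression()]

    # 2. If the current backend stalls, switch between 'threading' and
    #    'multiprocessing' (assumed here to be a constructor keyword).
    ensemble = SuperLearner(backend='multiprocessing', n_jobs=2, folds=2)
    ensemble.add(base_learners)
    ensemble.add_meta(LogisticRegression())

    ensemble.fit(X, y)
    print(ensemble.predict(X)[:5])

If the estimation still stalls, the remaining option from the list is to change the fork method used with ``multiprocessing`` globally, as described in :ref:`configs`.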