Bihaqo/tf-memonger

Sublinear memory optimization for deep learning: reduce GPU memory cost to train deeper nets.


A TensorFlow implementation of the idea from the paper Training Deep Nets with Sublinear Memory Cost

The code is messy and doesn't really work, but it can be a starting point for someone who wants to properly reimplement the idea in TensorFlow.

TensorFlow's gradients function takes a tensor V (e.g. the loss) as input and builds a computational graph that computes the gradient of V with respect to the parameters. The backward-pass graph uses the values of all nodes of the forward graph, so all of them have to be kept in memory until the backward pass runs.

try.ipynb contains my reimplementation of the gradients function. It also builds a graph that computes the gradient, but on the backward pass it only uses the values of the nodes listed in store_activations_set; all other required values are recomputed on the fly. For now store_activations_set is hardcoded; MemoryOptimizer.py is a (still not working) attempt to choose the contents of store_activations_set automatically by analyzing the forward computation graph.
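To make the idea concrete, here is a minimal sketch (not the repository's code) of the same mechanism for a toy fully connected network: only the segment-boundary activations are stored (they play the role of store_activations_set), each segment is rebuilt from a detached copy of its input checkpoint when the backward graph is constructed, and the gradient is threaded through the rebuilt segments via the grad_ys argument of tf.gradients. The names segment, checkpoints and ws are illustrative, not from the repository.

```python
import tensorflow as tf  # assumes TensorFlow 1.x graph mode, as used by try.ipynb

def segment(h, w):
    # One recomputable piece of the forward pass (a single layer here;
    # in a real network a segment would span several layers).
    return tf.nn.relu(tf.matmul(h, w))

x = tf.placeholder(tf.float32, [None, 784])
ws = [tf.Variable(tf.random_normal([784, 256])),
      tf.Variable(tf.random_normal([256, 256])),
      tf.Variable(tf.random_normal([256, 10]))]

# Forward pass: keep only the segment boundaries (the "checkpoints").
checkpoints = [x]
h = x
for w in ws:
    h = segment(h, w)
    checkpoints.append(h)
loss = tf.reduce_mean(h)

# Backward pass: walk the segments in reverse. Each segment is rebuilt from a
# detached copy of its input checkpoint, so its backward ops consume the
# recomputed activations rather than values stored since the forward pass.
grad_out = tf.gradients(loss, checkpoints[-1])[0]
grads_w = [None] * len(ws)
for i in reversed(range(len(ws))):
    h_in = tf.stop_gradient(checkpoints[i])   # stored checkpoint, cut off from the forward graph
    h_out = segment(h_in, ws[i])              # recompute the segment on the fly
    grad_w, grad_in = tf.gradients(h_out, [ws[i], h_in], grad_ys=grad_out)
    grads_w[i] = grad_w
    grad_out = grad_in                        # gradient flowing into the previous segment
```

In this toy example each segment is a single layer, so nothing is actually saved; the memory saving appears when a segment spans several layers whose intermediate activations are recomputed instead of being kept alive for the backward pass.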

try.ipynb builds a simple MNIST network and runs a few optimization iterations. Use TensorBoard to explore the constructed graph.
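If the notebook does not already export the graph, one possible way to write it out for TensorBoard (the log directory name here is arbitrary) is:

```python
import tensorflow as tf  # TensorFlow 1.x

# Dump the current default graph so TensorBoard can display it.
writer = tf.summary.FileWriter('./logs', tf.get_default_graph())
writer.close()
```

Then run tensorboard --logdir ./logs and open the Graphs tab.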
