
Refactor the deep-learning-from-scratch to live in python files #37

Open · 1 of 3 tasks
yacineMahdid opened this issue Apr 13, 2021 · 4 comments
Labels: enhancement (New feature or request)
yacineMahdid (Owner) commented Apr 13, 2021

Currently, most of the code lives in Jupyter notebooks; I should move most of it into .py scripts so that the code can be reused.

  • activation functions
  • deep learning framework
  • optimization algorithm
@yacineMahdid yacineMahdid added the enhancement New feature or request label Apr 13, 2021
@yacineMahdid yacineMahdid self-assigned this Apr 13, 2021
@yacineMahdid yacineMahdid moved this from To do to In progress in Deep Learning From Scratch Improved Apr 13, 2021
yacineMahdid (Owner, Author) commented Apr 14, 2021

Will need to figure out how to structure the optimizer and its step; the way my optimization functions work right now might not be optimal.

After looking at an example of how PyTorch works, it seems the way I structured it should work. I just need to have the gradient per weight and I'll be good to go.

In a nutshell, this is what we will be doing:

    with torch.no_grad():  # disable autograd tracking for the in-place update
        for param in model.parameters():
            param -= learning_rate * param.grad
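For the from-scratch version, the same update can be sketched without autograd. This is only a minimal sketch, assuming parameters are NumPy arrays paired with already-computed gradients (the names below are hypothetical, not part of the framework yet):

```python
import numpy as np

# Hypothetical parameters and their gradients; in PyTorch these would be
# tensors whose .grad attribute is populated by loss.backward().
params = [np.array([1.0, 2.0]), np.array([0.5])]
grads = [np.array([0.1, 0.2]), np.array([0.05])]

learning_rate = 1e-1
for param, grad in zip(params, grads):
    # in-place update, mirroring `param -= learning_rate * param.grad`
    param -= learning_rate * grad
```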

But we can wrap this in a class-like format, as so:

learning_rate = 1e-3
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
[...]
    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because by default, gradients are
    # accumulated in buffers (i.e., not overwritten) whenever .backward()
    # is called. Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()

This means that the optimizer will have access to the model parameters as well as the gradients. The one thing that is weird in PyTorch is that the loss has access to the model parameters.

I'll simplify this for now, since I still don't have a dynamic graph solver implemented!
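The "access" here is just shared references: because the update is done in place, the model sees the new values without any explicit hand-off. A tiny NumPy sketch of that sharing (the names are hypothetical stand-ins for real parameter objects):

```python
import numpy as np

weights = np.array([1.0, -1.0])  # the "model" parameter
held = [weights]                 # the "optimizer" stores a reference, not a copy

# an in-place step through the optimizer's reference...
held[0] -= 0.5 * np.array([0.2, -0.2])

# ...is visible through the model's name: weights is now [0.9, -0.9]
```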

yacineMahdid (Owner, Author) commented

What I should have is something like this:

optimizer = SGD(model.parameters(), optimizer_parameters...)
[...]
optimizer.zero_grad() # this will remove all the gradients accumulated
optimizer.backward() # since it already has access to the graph and to the gradients.
optimizer.step() # do one gradient descent step

yacineMahdid (Owner, Author) commented

Little correction: we shouldn't have the optimizer doing the backward pass, since this only depends on the model and not on the optimizer!

We should be doing this instead:

optimizer = SGD(model.parameters(), optimizer_parameters...)
[...]
optimizer.zero_grad() # this will remove all the gradients accumulated
model.backward() # the behavior of backward will be architecture specific
optimizer.step() # do one gradient descent step
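A minimal sketch of what that SGD class could look like in the from-scratch framework. `Parameter` is a hypothetical container pairing a value with its accumulated gradient; the real interface may differ:

```python
import numpy as np

class Parameter:
    """Hypothetical container pairing a value with its accumulated gradient."""
    def __init__(self, value):
        self.value = np.asarray(value, dtype=float)
        self.grad = np.zeros_like(self.value)

class SGD:
    def __init__(self, parameters, lr=1e-3):
        self.parameters = list(parameters)  # references shared with the model
        self.lr = lr

    def zero_grad(self):
        # remove all the gradients accumulated so far
        for p in self.parameters:
            p.grad = np.zeros_like(p.value)

    def step(self):
        # one gradient descent step: w <- w - lr * dL/dw
        for p in self.parameters:
            p.value -= self.lr * p.grad

# Usage: minimize f(w) = w^2, whose gradient is 2w.
w = Parameter([2.0])
optimizer = SGD([w], lr=0.1)
for _ in range(50):
    optimizer.zero_grad()
    w.grad = 2.0 * w.value  # stand-in for model.backward()
    optimizer.step()
# w.value has shrunk toward 0
```

The key design point is that `step()` mutates the shared parameter objects in place, so the model and optimizer never need to pass values back and forth.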

yacineMahdid (Owner, Author) commented

We should do a full run with the optimizer + activation + framework; otherwise I'm running a bit blindly if I try to code up all the optimizers first.
