Commit
update practices
Sajtospoga01 committed Dec 15, 2023
1 parent 3153be2 commit 8015951
Showing 6 changed files with 213 additions and 17 deletions.
133 changes: 133 additions & 0 deletions docs/source/conventions_and_guidelines/conventions.md
@@ -0,0 +1,133 @@
## Conventions

### Documentation
Most of the codebase runs on sphinx-autodoc, which generates documentation from the docstrings in the codebase. The main reason we use this is to have easy-to-maintain, and mostly up-to-date, documentation for all the projects in one place.
This provides:
1. A centralised place for all the documentation
2. A way to easily understand why certain things are in place
3. A way for new people to get up to speed with the project and the team

Template for the documentation generator action can be found below:
```yaml
# Simple workflow for deploying static content to GitHub Pages
name: Deploy Documentation on Pages

on:
  # Runs on pushes targeting the default branch
  push:
    branches: ["main"]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
  contents: read
  pages: write
  id-token: write

# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  # Build job: generates the Sphinx documentation
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python 3.10
        uses: actions/setup-python@v3
        with:
          python-version: "3.10"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install .[dev]
          pip install -U sphinx
          pip install furo
      - name: Build documentation
        run: |
          cd docs
          sphinx-apidoc -e -M --force -o . ../utilities/
          make html
      - name: Upload build data
        uses: actions/upload-artifact@v3
        with:
          name: documentation
          path: ./docs/build/

  deploy:
    needs: build
    environment:
      name: documentation
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Setup Pages
        uses: actions/configure-pages@v3
      - name: Download built directory
        uses: actions/download-artifact@v3
        with:
          name: documentation
          path: ./build
      - name: Upload artifact
        uses: actions/upload-pages-artifact@v1
        with:
          # Upload the downloaded HTML build output
          path: './build/html'
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v1
```

The action above can be updated to use other triggers, or to build from a project file.

>**Note:** The action above is available as a template repository (whose use is strongly encouraged); however, it should be updated to fit the project it is used for.

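For example, the trigger block can be extended so the documentation also rebuilds on pull requests or on a schedule. A sketch using standard GitHub Actions trigger syntax (the branch name and cron schedule are illustrative):

```yaml
on:
  push:
    branches: ["main"]
  pull_request:
    branches: ["main"]
  schedule:
    # Rebuild the documentation every Monday at 06:00 UTC
    - cron: "0 6 * * 1"
  workflow_dispatch:
```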
### Typing

Python typing is used to provide type hints so developers better understand what the code is doing, and to provide better static analysis of the codebase.

Typing is added to the codebase using the following syntax:
```python
def function_name(arg1: type, arg2: type) -> return_type:
...
```

Or in practice
```python
def add(a: int, b: int) -> int:
return a + b
```

Type hints can be used throughout the codebase, and they work for user-defined objects as well:
```python
class Person:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age

def get_person_age(person: Person) -> int:
    return person.age
```
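Hints also compose with container and optional types, which makes absent values explicit. A small sketch (the function and names are illustrative):

```python
from typing import Optional

def find_age(ages: dict[str, int], name: str) -> Optional[int]:
    # Returns None when the name is unknown, which the hint makes explicit
    return ages.get(name)

print(find_age({"Alice": 21}, "Alice"))  # 21
print(find_age({"Alice": 21}, "Bob"))    # None
```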

It is recommended to set your IDE's type checker to at least basic mode to help you identify common problems.

### Linting

Linting is used to provide a consistent code style across the codebase, and to help identify common problems.

The template repository provides the necessary configuration for linting, and it is recommended to use it.



8 changes: 8 additions & 0 deletions docs/source/conventions_and_guidelines/conventions_tree.rst
@@ -0,0 +1,8 @@
Conventions and guidelines
==========================

.. mdinclude:: conventions.md




10 changes: 4 additions & 6 deletions docs/source/index.rst
@@ -7,12 +7,6 @@ Welcome to teamdocs's documentation!
====================================


On this page you will find the documentation for the GU Orbit software team.
The documentation contains information about:
- The Team
- The Goal
- The Projects
- The Development procedures

.. mdinclude:: main.md

@@ -21,6 +15,10 @@ The documentation contains information about:
:caption: Table of contents:

projects/project_tree.rst
conventions_and_guidelines/conventions_tree.rst
tips_and_recommendations/tips_tree.rst




Indices and tables
33 changes: 22 additions & 11 deletions docs/source/main.md
@@ -1,15 +1,26 @@
# Hello world

a lot of other stuff here ...
```python
print("Hello world")
```
On this page you will find the documentation for the GU Orbit software team.
The documentation contains information about:
- The Team
- The Goal
- The Projects
- The Development procedures

<details>
<summary>Click to expand</summary>
## The Team

The GU Orbit software team is a team of students at the University of Glasgow, developing models, onboard processing software, and pipelines for the ASTREOUS 1 CubeSat project of GU Orbit.

## The Goal

The goal is to achieve efficient onboard processing capabilities for the ASTREOUS 1 CubeSat project of GU Orbit, providing a better overall selection of data to be downlinked.
Using deep learning, we aim to identify common features and patterns, making data selection more effective.


Before you start working on the team, familiarise yourself with the team's documentation in the following sections:
- How to contribute
- How to communicate
- How to distribute work
- How to do code reviews
- Conventions to follow

```python
print("Hello world")
```

</details>
6 changes: 6 additions & 0 deletions docs/source/tips_and_recommendations/tips_tree.rst
@@ -0,0 +1,6 @@
Tips and recommendations
========================

.. mdinclude:: training.md

.. mdinclude:: testing.md
40 changes: 40 additions & 0 deletions docs/source/tips_and_recommendations/training.md
@@ -0,0 +1,40 @@
## Training and Testing Deep Models

### Identifying common problems

In most cases it is recommended to use some sort of visualizer to monitor and log the model's performance during training.
There are many visualizers, but my recommendation is to use either:
- TensorBoard, which comes built in with TensorFlow and can only be accessed locally
- Weights and Biases, a third-party tool that can be used with any framework and is accessible from the browser

**Overfitting:** A very common problem, where the model starts fitting the training data too well and loses generalisation. It can be identified by looking at the training and validation loss: the validation loss increases while the training loss continues to decrease.
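As a rough illustration, that divergence between the two curves can be checked programmatically. A minimal, framework-agnostic sketch (the window size and loss values are illustrative):

```python
def is_overfitting(train_losses, val_losses, window=3):
    """Flag overfitting: training loss still falling while validation loss rises."""
    if len(train_losses) < window + 1:
        return False  # not enough history to judge a trend
    train_trend = train_losses[-1] - train_losses[-1 - window]
    val_trend = val_losses[-1] - val_losses[-1 - window]
    return train_trend < 0 and val_trend > 0

train = [1.0, 0.8, 0.6, 0.45, 0.35]
val = [1.1, 0.9, 0.85, 0.9, 1.0]
print(is_overfitting(train, val))  # True
```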

**Underfitting:** Another common problem, where the model is not complex enough to fit the data, or the data is too noisy for the model to generalise over it. It can be identified by large errors on both the training and validation data.

**Vanishing gradient:** A problem that occurs when the gradient of the loss function is too small, and the model can't learn. Can be identified by looking at the gradients of the model, and seeing them get smaller and smaller.

**Exploding gradient:** A problem that occurs when the gradient of the loss function is too large, and the model can't learn. Can be identified by looking at the gradients of the model, and seeing them get larger and larger.
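Both gradient problems can be caught early by logging the gradient norm at each step. A framework-agnostic sketch of the check (the thresholds are illustrative and should be tuned per model):

```python
import math

def gradient_health(grads, vanish_threshold=1e-6, explode_threshold=1e3):
    """Classify a flattened gradient vector by its L2 norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm < vanish_threshold:
        return "vanishing"
    if norm > explode_threshold:
        return "exploding"
    return "healthy"

print(gradient_health([0.01, -0.02, 0.005]))  # healthy
print(gradient_health([1e-8, -1e-9]))         # vanishing
print(gradient_health([5e3, -2e4]))           # exploding
```

In practice, TensorBoard and Weights and Biases can both plot these norms over time, which makes the trend easy to spot.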

### Selecting hyperparameters
Hyperparameters are the parameters of the model that are passed by the user such as batch size, learning rate, etc...
General rule of thumb:
- Set learning rate to 0.001
- Set batch size to the maximum that fits in memory
- Set number of epochs to 5-6 for fine tuning, 30-100 for training from scratch (depending on the size of the model and dataset)
- Set optimizer to Adam
- Set loss function to categorical crossentropy for classification, mean squared error for regression

It is also recommended to calculate accuracy during training and validation; however, for generative image tasks this becomes expensive.
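The rule of thumb above can be encoded as a default configuration object, so every experiment starts from the same baseline. A sketch (the class and field names are illustrative, not a specific library's API):

```python
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    learning_rate: float = 0.001
    batch_size: int = 32          # raise to the largest size that fits in memory
    epochs: int = 50              # 5-6 for fine-tuning, 30-100 from scratch
    optimizer: str = "adam"
    loss: str = "categorical_crossentropy"  # use MSE for regression

config = TrainingConfig(epochs=6)  # e.g. a fine-tuning run
print(config.learning_rate, config.epochs)  # 0.001 6
```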

### Data augmentation
Data augmentation is a technique used to increase the effective size of the dataset by applying transformations to the data, such as rotation, flipping, etc...
Many dataloader libraries have this built in, such as TensorFlow, PyTorch, and our utilities library.
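The idea can be shown on a tiny image represented as a nested list, where each transform yields a new training sample. A minimal sketch (real pipelines should use the dataloader's built-in, optimised ops):

```python
def horizontal_flip(image):
    # Mirror each row left-to-right
    return [row[::-1] for row in image]

def rotate_90(image):
    # Rotate the image 90 degrees clockwise
    return [list(row) for row in zip(*image[::-1])]

image = [[1, 2],
         [3, 4]]
print(horizontal_flip(image))  # [[2, 1], [4, 3]]
print(rotate_90(image))        # [[3, 1], [4, 2]]
```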

### Transfer learning
Transfer learning is a technique for reusing a model trained on one task for a different task. This is done by removing the last layer of the model, adding a new one, and training only the new layer. It is useful when the dataset is small and the model is large, as it lets the model benefit from what it learned on the larger dataset and then be fine-tuned to the new task.
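The freeze-and-replace recipe can be sketched without any framework (the `Layer` class and names are illustrative; in TensorFlow or PyTorch one would toggle the layers' own trainable flags instead):

```python
class Layer:
    def __init__(self, name, trainable=True):
        self.name = name
        self.trainable = trainable

def prepare_for_transfer(pretrained_layers, new_head):
    """Freeze all pretrained layers and append a fresh trainable head."""
    for layer in pretrained_layers:
        layer.trainable = False
    return pretrained_layers + [new_head]

# Drop the old head, keep the frozen feature extractor, train only the new head
backbone = [Layer("conv1"), Layer("conv2")]
model = prepare_for_transfer(backbone, Layer("new_head"))
print([(layer.name, layer.trainable) for layer in model])
# [('conv1', False), ('conv2', False), ('new_head', True)]
```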

### Early stopping
Early stopping is a technique used to stop training when the model stops learning. This is done by monitoring the validation loss, and stopping when it stops decreasing. This is useful to prevent overfitting, and to save time.
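The patience-based variant of this check is simple enough to sketch directly (the patience value and loss numbers are illustrative):

```python
class EarlyStopping:
    """Stop training after `patience` epochs without validation-loss improvement."""
    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        # Returns True when training should stop
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2)
for loss in [1.0, 0.8, 0.85, 0.9, 0.7]:
    if stopper.step(loss):
        print("stopping early")  # triggered after two epochs without improvement
        break
```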

### Model checkpointing
Model checkpointing is a technique used to save the model during training, so that it can be loaded later. This is useful to prevent losing progress.
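The core idea — save only when the model improves — can also be sketched without a framework (in practice the state would be written to disk with the framework's own save utilities):

```python
class BestCheckpoint:
    """Keep a copy of the model state whenever validation loss improves."""
    def __init__(self):
        self.best_loss = float("inf")
        self.best_state = None

    def update(self, val_loss, state):
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_state = dict(state)  # snapshot, not a live reference
            return True  # a new checkpoint was saved
        return False

ckpt = BestCheckpoint()
ckpt.update(0.9, {"w": 1.0})
ckpt.update(1.1, {"w": 2.0})  # worse: ignored
print(ckpt.best_loss, ckpt.best_state)  # 0.9 {'w': 1.0}
```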
