change folder name to cater windows (#13)
Co-authored-by: Emma Ai <[email protected]>
emmaai and Emma Ai authored May 27, 2021
1 parent 453860d commit 867f8af
Showing 5 changed files with 287 additions and 0 deletions.
85 changes: 85 additions & 0 deletions auxfiles/README.md
@@ -0,0 +1,85 @@
`job_normal.sh` is used to submit an individual job to the PBS queue.

`job_sub_in.sh` is used to submit jobs in bulk to the PBS queue.

`wetland.sh` is used to set up the environment to run `wit_tooling`.

`job_normal.sh`
======

usage:
---

`qsub -l ncpus=$num_cpus,mem=${mem}GB -v threads=$((num_cpus * 4)),feature=$feature,datasets=$file,aggregate=$aggregate,pdyaml=$PDYAML,shapefile=$shapefile job_normal.sh`

For `$num_cpus` and `$mem`, refer to the `qsub` manual.

`threads`, `feature`, `datasets`, `aggregate`, `pdyaml` and `shapefile` are the parameters required by `job_normal.sh`:

- `threads` is the number of threads used in `OpenMP`. We oversubscribe it to 4 times the number of CPUs to a) exploit hyper-threading and b) improve CPU utilisation, since the job is I/O bound.

- `$feature` is the parameter in `--feature-list $feature` in `wetland_brutal.py wit-cal`.

- `$datasets` is the parameter in `--datasets $datasets` in `wetland_brutal.py wit-cal`.

- `$aggregate` is the parameter in `--aggregate $aggregate` in `wetland_brutal.py wit-cal`.

- `$PDYAML` is the parameter in `--product-yaml $PDYAML` in `wetland_brutal.py wit-cal`.

- `$shapefile` is the positional argument to `wetland_brutal.py wit-cal` (see the sketch below).
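
For reference, here is how those variables feed the actual command inside `job_normal.sh` (taken from the script later in this commit):

```bash
# inside job_normal.sh: the -v variables arrive as environment variables
export OMP_NUM_THREADS=$threads
export NUMEXPR_MAX_THREADS=$threads
mpirun -np 9 -bind-to none python3 -m mpi4py.futures wetland_brutal.py wit-cal \
    --feature-list $feature --datasets $datasets --aggregate $aggregate \
    --product-yaml $pdyaml $shapefile
```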

Example
------

`qsub -N anae_1005 -l ncpus=48,mem=192GB -v threads=192,feature=anae//new/contain_1005.txt,datasets=anae//query/1005.pkl,aggregate=0,pdyaml=/g/data/u46/users/ea6141/wlinsight/fc_pd.yaml,shapefile=/g/data/r78/DEA_Wetlands/shapefiles/MDB_ANAE_Aug2017_modified_2019_SB_3577.shp job_normal.sh`

`job_sub_in.sh`
=============

usage:
-----

`./job_sub_in.sh $input $shapefile $aggregate`

- `$input` is the folder where the feature lists and query results are stored.
- `$shapefile` is the shape file with all the polygons for the job.
- `$aggregate` is the number of days per aggregation interval, if aggregation is required.

Example:

`./job_sub_in.sh sadew/ shapefiles/waterfowlandwetlands_3577.shp 15`

Note: add your work folder as a prefix to these paths if needed.

In the file:
-----------

`PDYAML` is the virtual product recipe, which should be modified to the correct path, e.g. `$your_working_folder/wit_tooling/aux/fc_pd.yaml`.

`num_thread` is calculated according to how many polygons will be parallelized, but the minimum is `9`. DO NOT change it.

`mem` is calculated according to how a job is charged on NCI. The multiplier `UMEM=4` CAN be dialed up until the total memory hits the `192GB` limit; as in the script, `UMEM=8` is used when aggregation over time slices is required.
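
A condensed sketch of this sizing logic, as it appears in `job_sub_in.sh`:

```bash
num_thread=$(cat $feature | wc -l)        # one thread per polygon in the list
[ $num_thread -lt 9 ] && num_thread=9     # never below the minimum of 9
[ $num_thread -gt $NCPUS ] && num_thread=$NCPUS   # never above the node's 48 cpus
mem=$((num_thread * UMEM))                # UMEM GB per cpu (4, or 8 when aggregating)
[ $mem -gt $MEM ] && mem=$MEM             # cap at MEM=192GB
```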


`wetland.sh`
============

usage:
-----
`source wetland.sh`

Note: you need to source `wetland.sh` to set up the correct modules before running `wit_tooling`.

In the file:
-----

`module load dea/20200316` loads the 16-03-2020 version of the datacube module; this may change if the datacube gets updated.

`module load openmpi/4.0.1` satisfies the OpenMPI requirement for the MPI runs.

`PYTHONUSERBASE` is where your customised packages are installed; `wit_tooling` will be installed in this folder if you installed it with `--user`.

`PYTHONPATH` adds the datacube-stats refactor branch to the front of your path so that it is used.

Example:
`export PYTHONUSERBASE=/g/data/r78/rjd547/python_setup/`
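
and, to match `wetland.sh`:

`export PYTHONPATH=/g/data1a/r78/rjd547/jupyter_notebooks/datacube-stats:$PYTHONPATH`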

96 changes: 96 additions & 0 deletions auxfiles/fc_pd.yaml
@@ -0,0 +1,96 @@
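# Virtual product recipe combining (juxtaposing) fractional cover, per-sensor
# pixel-quality masks, tasseled-cap indices and WOfS water observations
# across Landsat 5/7/8.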
juxtapose:
  - collate:
      - product: ls8_fc_albers
        measurements: [BS, PV, NPV]
        source_filter:
          product: ls8_level1_scene
          gqa_iterative_mean_xy: [0, 1]
        dataset_predicate: wit_tooling.ls8_on
      - product: ls7_fc_albers
        measurements: [BS, PV, NPV]
        source_filter:
          product: ls7_level1_scene
          gqa_iterative_mean_xy: [0, 1]
        dataset_predicate: wit_tooling.ls7_on
      - product: ls5_fc_albers
        measurements: [BS, PV, NPV]
        source_filter:
          product: ls5_level1_scene
          gqa_iterative_mean_xy: [0, 1]
        dataset_predicate: wit_tooling.ls5_on_1ym
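  # per-sensor pixel-quality masks: cloud, cloud shadow, saturation and contiguity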
  - collate:
      - transform: make_mask
        input:
          product: ls8_pq_albers
          fuse_func: datacube.helpers.ga_pq_fuser
        flags:
          contiguous: True
          cloud_acca: no_cloud
          cloud_fmask: no_cloud
          cloud_shadow_acca: no_cloud_shadow
          cloud_shadow_fmask: no_cloud_shadow
          blue_saturated: False
          green_saturated: False
          red_saturated: False
          nir_saturated: False
          swir1_saturated: False
          swir2_saturated: False
        mask_measurement_name: pixelquality
      - transform: make_mask
        input:
          product: ls7_pq_albers
          fuse_func: datacube.helpers.ga_pq_fuser
        flags:
          contiguous: True
          cloud_acca: no_cloud
          cloud_fmask: no_cloud
          cloud_shadow_acca: no_cloud_shadow
          cloud_shadow_fmask: no_cloud_shadow
          blue_saturated: False
          green_saturated: False
          red_saturated: False
          nir_saturated: False
          swir1_saturated: False
          swir2_saturated: False
        mask_measurement_name: pixelquality
      - transform: make_mask
        input:
          product: ls5_pq_albers
          fuse_func: datacube.helpers.ga_pq_fuser
        flags:
          contiguous: True
          cloud_acca: no_cloud
          cloud_fmask: no_cloud
          cloud_shadow_acca: no_cloud_shadow
          cloud_shadow_fmask: no_cloud_shadow
          blue_saturated: False
          green_saturated: False
          red_saturated: False
          nir_saturated: False
          swir1_saturated: False
          swir2_saturated: False
        mask_measurement_name: pixelquality
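  # tasseled-cap indices computed from NBART surface reflectance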
  - transform: wit_tooling.external_stats.TCIndex
    input:
      collate:
        - product: ls8_nbart_albers
          measurements: [blue, green, red, nir, swir1, swir2]
          source_filter:
            product: ls8_level1_scene
            gqa_iterative_mean_xy: [0, 1]
          dataset_predicate: wit_tooling.ls8_on
        - product: ls7_nbart_albers
          measurements: [blue, green, red, nir, swir1, swir2]
          source_filter:
            product: ls7_level1_scene
            gqa_iterative_mean_xy: [0, 1]
          dataset_predicate: wit_tooling.ls7_on
        - product: ls5_nbart_albers
          measurements: [blue, green, red, nir, swir1, swir2]
          source_filter:
            product: ls5_level1_scene
            gqa_iterative_mean_xy: [0, 1]
          dataset_predicate: wit_tooling.ls5_on_1ym
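  # WOfS water observations, fused across overlapping scenes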
  - product: wofs_albers
    measurements: [water]
    fuse_func: digitalearthau.utils.wofs_fuser
14 changes: 14 additions & 0 deletions auxfiles/job_normal.sh
@@ -0,0 +1,14 @@
#!/bin/bash
#PBS -P u46
#PBS -q normal
#PBS -l storage=gdata/rs0+gdata/fk4+gdata/v10+gdata/r78+gdata/u46+scratch/r78
#PBS -l walltime=2:00:00
#PBS -l jobfs=1GB
#PBS -l wd

source $HOME/setup-datacube-up2date.sh

echo $threads $feature $datasets $aggregate $pdyaml $shapefile
export OMP_NUM_THREADS=$threads
export NUMEXPR_MAX_THREADS=$threads
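# 9 MPI ranks: mpi4py.futures uses rank 0 as the master and the rest as the
# worker pool (this likely motivates the num_thread minimum of 9 in job_sub_in.sh)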
mpirun -np 9 -bind-to none python3 -m mpi4py.futures wetland_brutal.py wit-cal --feature-list $feature --datasets $datasets --aggregate $aggregate --product-yaml $pdyaml $shapefile
76 changes: 76 additions & 0 deletions auxfiles/job_sub_in.sh
@@ -0,0 +1,76 @@
#!/bin/bash
NCPUS=48
MEM=$((48*4))
# how much memory is charged per cpu
UMEM=4

# change this accordingly
PDYAML=fc_pd.yaml

# $1: folder with polygon lists and pickled datasets
# $2: shape file
# $3: interval in days, if aggregating

echo starting to process $1 $2 $3

if [ ! -s $2 ]; then
    echo shape file $2 does not exist
    exit
fi

if [ ! -d $1/query ]; then
    echo query results should be in $1/query
    exit
fi

if [ ! -d $1/new ]; then
    echo feature lists should be in $1/new
    exit
fi

PDYAML=$(readlink -f $PDYAML)
shapefile=$(readlink -f $2)
AGGREGATE=$3

for file in $1/query/*.pkl; do
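    # tile id is the digit/underscore part of the pickle filename, e.g. 1005.pkl -> 1005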
    tile_id=$(echo $file | sed 's/.*\/\([_0-9]\+\).*/\1/g')
    feature=$1/new/contain_$tile_id.txt
    aggregate=0
    if [ ! -s $feature ]; then
        feature=$1/new/intersect_$tile_id.txt
        if [ ! -s $feature ]; then
            echo feature list for $tile_id does not exist
            continue
        else
            # note: some big polygons might need this dialed up a bit;
            # we double it when aggregating over time slices
            aggregate=$AGGREGATE
            UMEM=8
        fi
    else
        aggregate=0
        UMEM=4
    fi
    num_thread=$(cat $feature | wc -l)
    if [ $num_thread -lt 9 ]; then
        num_thread=9
    else
        if [ $num_thread -gt $NCPUS ]; then
            num_thread=$NCPUS
        fi
    fi

    mem=$((num_thread * UMEM))
    if [ $mem -gt $MEM ]; then
        mem=$MEM
    fi
    echo qsub -N ${1//\/}_$tile_id -l ncpus=$num_thread,mem=${mem}GB -v threads=$((num_thread * 4)),feature=$feature,datasets=$file,aggregate=$aggregate,pdyaml=$PDYAML,shapefile=$shapefile job_normal.sh
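    # if a job with this name is already queued, chain two re-submissions
    # behind it (afterany) instead of submitting a duplicate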
    jobid=$(qselect -N ${1//\/}_$tile_id)
    if [ "$jobid" == "" ]; then
        qsub -N ${1//\/}_$tile_id -l ncpus=$num_thread,mem=${mem}GB -v threads=$((num_thread * 4)),feature=$feature,datasets=$file,aggregate=$aggregate,pdyaml=$PDYAML,shapefile=$shapefile job_normal.sh
    else
        for i in $(seq 1 2); do
            jobid=$(qsub -W depend=afterany:$jobid -N ${1//\/}_$tile_id -l ncpus=$num_thread,mem=${mem}GB -v threads=$((num_thread * 4)),feature=$feature,datasets=$file,aggregate=$aggregate,pdyaml=$PDYAML,shapefile=$shapefile job_normal.sh)
        done
    fi
done
16 changes: 16 additions & 0 deletions auxfiles/wetland.sh
@@ -0,0 +1,16 @@
#!/usr/bin/env bash

# code written by Emma A and Bex D on 25.03.2020

module use /g/data/v10/public/modules/modulefiles/

module load dea/20200316

module load openmpi/4.0.1

export PYTHONUSERBASE=/g/data/r78/rjd547/python_setup/
export PYTHONPATH=/g/data1a/r78/rjd547/jupyter_notebooks/datacube-stats:$PYTHONPATH


# to run this code, type: source wetland.sh
