🌟 Instructions for training.
You can apply InstructCV to new images by following the steps below.
Step 1. Download the pre-trained weights we provide, or download them manually from Google Drive | BaiduNet Disk:
bash scripts/download_pretain_weights.sh
Step 2. Run the following command:
python edit_cli.py --input <path_to_the_directory_you_created> --output <path_to_save> --edit <language_instructions>
# a specific example:
python edit_cli.py --input imgs/ --output outputs/ --edit "segment the cat."
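To run several instructions over the same image directory, one option is to drive `edit_cli.py` from a small script. A minimal sketch, assuming only the `--input`/`--output`/`--edit` flags shown above; the instruction list and output sub-directories are made up for illustration:

```python
import shlex
import subprocess  # used to launch edit_cli.py for each instruction

# Hypothetical instruction list; any natural-language task phrasing works.
EDITS = [
    "segment the cat.",
    "detect the cat.",
    "estimate the depth of this image.",
]

def build_command(input_dir, output_dir, edit):
    """Build the edit_cli.py invocation for one instruction."""
    return [
        "python", "edit_cli.py",
        "--input", input_dir,
        "--output", output_dir,
        "--edit", edit,
    ]

if __name__ == "__main__":
    for i, edit in enumerate(EDITS):
        cmd = build_command("imgs/", f"outputs/task_{i}/", edit)
        print(shlex.join(cmd))           # dry run: show the command
        # subprocess.run(cmd, check=True)  # uncomment to actually execute
```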
We trained our model starting from the checkpoint provided by Stable Diffusion V1.5:
# Stable Diffusion V1.5
bash scripts/download_checkpoints.sh
# The checkpoint we provide (fine-tuned on our training data for 50 epochs)
bash scripts/download_pretrained_weights.sh
python main.py --name <exp_name> --base configs/train.yaml --train --gpus 0,1,2,3,4,5,6,7
sbatch scripts/slurm_train
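Before launching multi-GPU training, it can help to sanity-check that the `--gpus` list fits the devices actually visible to the job. A small stdlib-only sketch; the parsing helpers below are mine, not part of the repo:

```python
import os

def parse_gpu_list(gpus_arg):
    """Parse a --gpus style argument like '0,1,2,3' into device indices."""
    return [int(g) for g in gpus_arg.split(",") if g.strip()]

def enough_gpus(gpus_arg, visible=None):
    """True if the requested device list fits within the visible devices.

    `visible` defaults to CUDA_VISIBLE_DEVICES; an unset variable means
    the job is unrestricted, which we treat as 'enough'.
    """
    requested = parse_gpu_list(gpus_arg)
    if visible is None:
        env = os.environ.get("CUDA_VISIBLE_DEVICES")
        if env is None:
            return True
        visible = parse_gpu_list(env)
    return len(visible) >= len(requested)
```

For the command above, `enough_gpus("0,1,2,3,4,5,6,7")` should hold before `sbatch` is worth submitting.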
Specialized model - Classification
ResNet-50 (Pretrained on ImageNet)
python baselines/classification/cls.py --model supervised --dataset pets --steps 100
python baselines/classification/cls.py --model supervised --dataset caltech --steps 100
ViT-16 (Pretrained on ImageNet-21k)
python baselines/classification/cls.py --model ViT-16 --dataset pets --steps 300
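The runner above selects the dataset by name; anyone reimplementing the supervised baseline needs the matching classifier-head sizes. The mapping below is my assumption (Oxford-IIIT Pets has 37 classes, Caltech-101 has 101 object categories), and the helper is illustrative, not the repo's code:

```python
# Assumed class counts for the classification baselines.
NUM_CLASSES = {
    "pets": 37,      # Oxford-IIIT Pets
    "caltech": 101,  # Caltech-101 object categories
}

def head_size(dataset):
    """Output dimension of the classifier head for a given --dataset value."""
    try:
        return NUM_CLASSES[dataset]
    except KeyError:
        raise ValueError(f"unknown dataset: {dataset!r}") from None
```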
Specialized model - Semantic Segmentation
SegFormer
Download the pretrained weights (SegFormer-B5) from here.
python tools/test.py local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py /path/to/checkpoint_file
Mask2Former
Download the pretrained weights (Swin-L, IN21k, 160k iterations) from here.
Specialized model - Monocular Depth Estimation
BTS
We follow the instructions here to reproduce the results.
Binsformer
We follow the instructions here to reproduce the results.
Specialized model - Object Detection
Faster R-CNN: We run Faster R-CNN models in Detectron2.
Mask R-CNN: We run Mask R-CNN models (backbone: R-101-FPN, LR schedule: 2x) in MMDetection.
DETR: We follow the instructions here to reproduce the results.
Vision generalists
Unified-IO: We use xl_1000k.bin as the pre-trained model. Inference takes ~27 s per image.
Pixel2Seq: To reproduce their results with the repo they provide, change every dict[str, tf.Tensor] annotation to dict; otherwise, dependency version differences raise errors such as "TypeError: 'type' is not subscriptable".
Change the data_root in dataset_configs.py
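The edit itself is a one-line change in `dataset_configs.py`. As a sketch, assuming `data_root` is a plain string in that file; the path below is a placeholder, not a real location:

```python
# dataset_configs.py (fragment) -- point data_root at your local dataset copy.
data_root = "/path/to/your/dataset"  # placeholder; replace with your own path
```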