ConEntail

Source code for the EACL 2023 paper "ConEntail: An Entailment-based Framework for Universal Zero and Few Shot Classification with Supervised Contrastive Pretraining".

Supervised Pretraining Data

You can either download our preprocessed supervised pretraining data (128 examples per label) from Google_Drive or build it yourself as described below. If you download the data, you don't need to install the CrossFit environment.

Create the data directory, then move the downloaded data into it:

mkdir raw_data
mkdir raw_data/gym
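
The move step itself is not shown above. A minimal sketch, assuming the download was unpacked into a directory whose path you supply through DOWNLOAD_DIR (a hypothetical variable, not part of this repository) and that raw_data/gym is the intended destination:

```shell
# Assumption: DOWNLOAD_DIR points at the unpacked download. This variable is
# hypothetical and is not defined anywhere in the repository.
mkdir -p raw_data/gym   # -p creates both levels and is safe to re-run

if [ -n "${DOWNLOAD_DIR:-}" ] && [ -d "$DOWNLOAD_DIR" ]; then
    # Copy the directory contents (not the directory itself) into raw_data/gym.
    cp -R "$DOWNLOAD_DIR"/. raw_data/gym/
else
    echo "DOWNLOAD_DIR is unset or not a directory; nothing copied." >&2
fi
```

Copying rather than moving keeps the original download intact until you have confirmed that the data loads.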

To build your own customized data, first install the CrossFit environment:

CrossFit Environment

# Create a new conda environment (optional)
conda create -n crossfit python=3.6.9
conda activate crossfit
# For building the NLP Few-shot Gym
pip install datasets==1.4.0 py7zr wget
# For reproducing the baseline methods
pip install torch==1.1.0 higher==0.2.1 scikit-learn==0.24.1 scipy==1.4.1 rouge==1.0.0
pip install git+https://github.com/huggingface/transformers.git@7b75aa9fa55bee577e2c7403301ed31103125a35

Download the datasets

conda activate crossfit
cd scripts
bash zero_para_download.sh

ConEntail Environment

Install the conda environment

conda create -n entail2 python=3.6.9
conda activate entail2
pip install -r requirements.txt
pip install -e .

The following step composes the individual datasets into a single dataset for supervised pretraining. You can skip it if you downloaded the preprocessed data. Be sure to run conda activate entail2 before running the command.

# generate the supervised pretraining dataset
python entail2/dataloader/gym2entail_multitask.py

Run

See scripts for more examples.

Training

CUDA_VISIBLE_DEVICES=0 \
python entail2/runner/runner.py \
--learning_rate 1e-5 \
--warmup_ratio 0.06 \
--train_batch_size 32 \
--num_train_epochs 10 \
--bert_name bert \
--model_name entail2 \
--use_sampler \
--mode train;

Evaluation

For evaluation, make sure you have downloaded the individual datasets, either through CrossFit or from Hugging Face datasets, and placed them in raw_data/gym. You don't need all of the datasets: as long as you have the dataset you are interested in, you can modify the scripts below for a customized evaluation.
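
To see which datasets are available locally before choosing a value for ${TASK}, something like the following may help. It assumes that each dataset occupies its own subdirectory of raw_data/gym, which this README does not state explicitly; list_tasks is a hypothetical helper, not part of the repository.

```shell
# Assumption: each dataset lives in its own subdirectory of the data directory.
# list_tasks is a hypothetical helper, not something the repository provides.
list_tasks() {
    data_dir="${1:-raw_data/gym}"
    for dir in "$data_dir"/*/; do
        # If nothing matches, the pattern stays literal and is skipped here.
        [ -d "$dir" ] || continue
        basename "$dir"
    done
}

# Print one candidate ${TASK} value per line.
list_tasks raw_data/gym
```

Any name it prints is a candidate for the --task_dir argument used in the commands below.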

For zero-shot evaluation, see example for the complete scripts; you can use a for loop to run multiple models on multiple test sets. For few-shot evaluation, see here and here.

First, generate the test sets and the zero-shot support sets (which contain only label names):

    python scripts/gen_singletask_test.py \
    --data_dir raw_data/gym \
    --task_dir ${TASK}
    python scripts/gen_singletask_zeroshot_support.py \
    --data_dir raw_data/gym \
    --task_dir ${TASK} --shots 1 --times 1

Then, you'll need to run the model on each task:

    python entail2/runner/runner.py \
    --data_dir raw_data/gym \
    --task_dir ${TASK} \
    --model ${MODEL} \
    --test_times 1 \
    --test_shots 1 \
    --mode test

To run other baselines, set the ${MODEL} variable in the scripts to one of the following:

MODELS=(efl_no_cl entail2 crossfit unifew)
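
The for loop over models and test sets mentioned above could be sketched as follows. This is an illustration built from the commands in this section, not the repository's own script: the task names are hypothetical placeholders, RUN and n_runs are variables introduced here, and it assumes every listed model can be passed to the same runner through --model. Space-separated strings are used instead of a bash array so the sketch also runs under plain sh.

```shell
# Dry run by default: with RUN unset, each command is printed, not executed.
# Export RUN="" (empty) to actually run the commands.
RUN="${RUN-echo}"

# Hypothetical task names; replace them with datasets present in raw_data/gym.
TASKS="task_a task_b"
MODELS="efl_no_cl entail2 crossfit unifew"

n_runs=0
for TASK in $TASKS; do
    # Build the test set and the zero-shot support set once per task.
    $RUN python scripts/gen_singletask_test.py \
        --data_dir raw_data/gym --task_dir "$TASK"
    $RUN python scripts/gen_singletask_zeroshot_support.py \
        --data_dir raw_data/gym --task_dir "$TASK" --shots 1 --times 1

    # Evaluate every model on this task.
    for MODEL in $MODELS; do
        $RUN python entail2/runner/runner.py \
            --data_dir raw_data/gym --task_dir "$TASK" --model "$MODEL" \
            --test_times 1 --test_shots 1 --mode test
        n_runs=$((n_runs + 1))
    done
done
echo "Scheduled $n_runs evaluation run(s)." >&2
```

Generating the support and test sets in the outer loop avoids rebuilding them for each model.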

Citation

@inproceedings{zhang2023conentail,
      title={ConEntail: An Entailment-based Framework for Universal Zero and Few Shot Classification with Supervised Contrastive Pretraining},
      author={Zhang, Ranran Haoran and Fan, Aysa Xuemo and Zhang, Rui},
      booktitle={EACL 2023},
      year={2023}
}
