Skip to content

Repository for Vectorize Fake-News/Claims through various models, e.g. Llama, Bert, tfidf and count vectorizers

License

Notifications You must be signed in to change notification settings

skmalviya/Fake_Vectorizer

Repository files navigation

Fake_Vectorizer:

Repository for Vectorize Fake-News/Claims through various models, e.g. Llama, Bert, tfidf and count vectorizers

Env requirements:

huggingfnltk==3.8.1
nltk==3.8.1
sentence-transformers==2.2.2
numpy==1.25.2
tokenizers==0.13.3
torch==2.0.1
transformers==4.31.0

Download data:

mkdir -p data/FEVER
wget https://fever.ai/download/fever/train.jsonl -O data/FEVER/train.jsonl
wget https://fever.ai/download/fever/shared_task_dev.jsonl -O data/FEVER/dev.jsonl

Run Vectorizer: It saves the embeddings in pickle file inside the Out_embeddings directory:

python Text_vectorizer_Transformer.py --data_path data/FEVER/ --model bert-base-uncased \
--output_path Out_embeddings/

Find Distance: It finds the most similar claim in training data for a given claim in dev data:

python Fake_distance.py --model bert-base-uncased --emb_path Out_embeddings/

FineTuning: It finetunes a transformer model, e.g. bert-base-uncased, roberta_base, on the training data. The finetuned model will later be used to generate the embedding and distances:

  1. Following link, masked-language modeling (MLM) using the HuggingFace Trainer function:
python FineTuning_MaskedLM.py --data_path data/FEVER/ --model bert-base-uncased --batch_size 64
  1. Following link, masked-language modeling (MLM) using the HuggingFace Trainer and DataCollatorForLanguageModeling functions:
python Finetuning_MaskedLM_data_collator.py --data_path data/FEVER/ --model bert-base-uncased \
 --batch_size 64
  1. Following link, masked-language modeling (MLM) using Accelerator library:
python Finetuning_MaskedLM_accelerator.py --data_path data/FEVER/ --model bert-base-uncased \
 --batch_size 64

About

Repository for Vectorize Fake-News/Claims through various models, e.g. Llama, Bert, tfidf and count vectorizers

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages