Bert2Bert Liputan6

This is a Bert2Bert EncoderDecoderModel trained on the canonical Liputan6 dataset. The model is based on this documentation and this notebook.

How to Use?

Install the packages

Colab:

!pip install torch
!pip install transformers[torch]
!pip install evaluate
!pip install datasets

Cmd:

pip install torch
pip install transformers[torch]
pip install evaluate
pip install datasets

Download the Model

git clone https://github.com/zanuura/Bert2Bert_Summarization_Liputan6

Import Package

from transformers import EncoderDecoderModel, AutoTokenizer, pipeline
import datasets

Load Model and Tokenizer

model = EncoderDecoderModel.from_pretrained("Bert2Bert_Summarization_Liputan6/model/") # path to the cloned model directory
tokenizer = AutoTokenizer.from_pretrained("Bert2Bert_Summarization_Liputan6/model/") # you can also load the tokenizer from bert-base-uncased

Test the Model

## Evaluate on the Liputan6 test split

## Load ROUGE for validation
## (datasets.load_metric is deprecated; evaluate.load replaces it)

import evaluate

rouge = evaluate.load("rouge")

model.to("cuda")  # move the model to the GPU before generating

def generate_summary(batch):

  inputs = tokenizer(batch['clean_article'], padding="max_length", truncation=True, max_length=512, return_tensors="pt")
  input_ids = inputs.input_ids.to("cuda")
  attention_mask = inputs.attention_mask.to("cuda")

  outputs = model.generate(input_ids, attention_mask=attention_mask)
  outputs_str = tokenizer.batch_decode(outputs, skip_special_tokens=True)

  batch['pred'] = outputs_str

  return batch

## test_data is the Liputan6 canonical test split, loaded beforehand
results = test_data.map(generate_summary, batched=True, batch_size=batch_size, remove_columns=["clean_article"])

pred_str = results['pred']
label_str = results['clean_summary']

## evaluate's rouge returns a plain F1 score per rouge type
rouge_output = rouge.compute(predictions=pred_str, references=label_str, rouge_types=["rouge2"])["rouge2"]

print(rouge_output)
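ROUGE-2 measures bigram overlap between a prediction and its reference. As a rough illustration of what the score reports, here is a toy pure-Python sketch of the bigram F1, not the `rouge_score` implementation the metric actually uses (which also applies stemming and aggregation):

```python
from collections import Counter

def rouge2_f1(pred, ref):
    # Collect bigrams from whitespace-tokenized text.
    def bigrams(text):
        toks = text.split()
        return [tuple(toks[i:i + 2]) for i in range(len(toks) - 1)]

    pred_bi, ref_bi = bigrams(pred), bigrams(ref)
    if not pred_bi or not ref_bi:
        return 0.0
    # Clipped overlap: each bigram counts at most as often as it appears
    # in both texts.
    overlap = sum((Counter(pred_bi) & Counter(ref_bi)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_bi)
    recall = overlap / len(ref_bi)
    return 2 * precision * recall / (precision + recall)

print(rouge2_f1("banjir melanda jakarta pagi ini",
                "banjir melanda jakarta hari ini"))  # → 0.5
```

Two of the four bigrams match, so precision and recall are both 0.5, giving an F1 of 0.5; a score near 0.5 on the test split would mean roughly half the generated bigrams also appear in the reference summaries.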

References:

Hope you enjoy it 😎.
