
Deep-Learning-powered-Auto-CodeLogic-generator-for-Julia

Introduction

  1. This is an automatic code generator that predicts the next sequence of code for the Julia language. The entire model is built in Python.
  2. A GPT-2 model was fine-tuned on scripts written in Julia.

Figure: The complete workflow, including data pre-processing.

Steps to get started with fine-tuning the GPT-2 model on Julia scripts

git clone https://github.com/dubeyakshat07/Deep-Learning-powered-Auto-CodeLogic-generator-for-Julia-R-and-Python.git
pip install -r requirements.txt

Now clone the following two repositories and place them under the folder /dataset/Python:

git clone https://github.com/dubeyakshat07/Script-files-for-Auto-Code-Generator.git
git clone https://github.com/RohitRathore1/Julia_Workshop.git  # the scripts are in the "Julia Workshop" subfolder
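
Once both repositories are cloned, it can help to confirm that the scripts landed where expected before running the pre-processing. The following is a minimal sketch, assuming the source files carry .jl or .py extensions and sit somewhere under dataset/Python. The helper name `count_scripts` is hypothetical and not part of this repository.

```python
from collections import Counter
from pathlib import Path


def count_scripts(root, extensions=(".jl", ".py")):
    """Count files under `root` by extension (hypothetical helper)."""
    root = Path(root)
    if not root.is_dir():
        # Nothing cloned here yet: report an empty result instead of failing.
        return {}
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in extensions:
            counts[path.suffix] += 1
    return dict(counts)


if __name__ == "__main__":
    # Assumed location, relative to the repository root.
    print(count_scripts("dataset/Python"))
```

An empty result suggests the repositories were cloned into a different folder.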

After placing the two repositories in the directory above, run the convert.py script, which performs all the pre-processing required for training the GPT-2 model:

python convert.py --segment_len 256 --stride 10 --dev_size 0.1
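
To give a feel for what these flags might control, here is a minimal sketch of sliding-window segmentation followed by a train/dev split. It assumes that --segment_len is the number of tokens per training example, --stride is the step between consecutive window starts, and --dev_size is the fraction held out for validation. These readings are assumptions, and the actual convert.py may behave differently. The function names are illustrative.

```python
import random


def segment(tokens, segment_len=256, stride=10):
    """Cut a token list into overlapping windows of `segment_len`, stepping by `stride`."""
    if len(tokens) <= segment_len:
        return [tokens]
    return [tokens[i:i + segment_len]
            for i in range(0, len(tokens) - segment_len + 1, stride)]


def train_dev_split(examples, dev_size=0.1, seed=0):
    """Shuffle the examples and hold out a `dev_size` fraction for validation."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_size)
    return shuffled[n_dev:], shuffled[:n_dev]


tokens = list(range(1246))        # stand-in for one tokenized script
windows = segment(tokens)         # 256-token windows starting every 10 tokens
train, dev = train_dev_split(windows)
print(len(windows), len(train), len(dev))  # 100 90 10
```

Under this reading, a small stride yields many heavily overlapping examples from each script, which enlarges the training set at the cost of redundancy.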

Finally, start fine-tuning the GPT-2 model by running the command below, which fine-tunes the distilgpt2 variant. The available variants, given as option name and corresponding model name, are {"distilgpt2": "distilgpt2", "gpt2": "gpt2", "gpt2_medium": "gpt2-medium", "gpt2_large": "gpt2-large"}.

python train.py --model_select distilgpt2
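
The --model_select value is a short key that maps to a model name, as in the dictionary above. The following is a minimal sketch of how such a lookup could be wired into a command-line parser. The mapping is taken from this README, while `MODEL_MAP`, `parse_args`, and the choice of distilgpt2 as the default are illustrative assumptions rather than the actual contents of train.py.

```python
import argparse

# Option name -> model name, as listed in this README.
MODEL_MAP = {
    "distilgpt2": "distilgpt2",
    "gpt2": "gpt2",
    "gpt2_medium": "gpt2-medium",
    "gpt2_large": "gpt2-large",
}


def parse_args(argv=None):
    """Parse --model_select and resolve it to a model name (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Select a GPT-2 variant to fine-tune")
    parser.add_argument("--model_select", choices=sorted(MODEL_MAP), default="distilgpt2")
    args = parser.parse_args(argv)
    args.model_name = MODEL_MAP[args.model_select]
    return args


args = parse_args(["--model_select", "gpt2_medium"])
print(args.model_name)  # gpt2-medium
```

Restricting the flag to known keys with `choices` means a typo such as gpt2-medium fails at parse time rather than partway into training.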

Predicting the next sequence of code

The fine-tuned model is saved in /model/distilgpt2_fine_tuned_coder/0_GPTSingleHead

To generate predictions, run the following block of code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead
model_dir = "/model/distilgpt2_fine_tuned_coder/0_GPTSingleHead"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)

use_cuda = torch.cuda.is_available()  # fall back to CPU when no GPU is present
context = "def factorial"
lang = "python"  # The framework is built completely upon Python

device = "cuda" if use_cuda else "cpu"
model.to(device)

# Prepend the language tag to the prompt before encoding
prefix = "<python> " if lang == "python" else "<java> "
input_ids = tokenizer.encode(prefix + context, return_tensors="pt").to(device)

outputs = model.generate(input_ids=input_ids,
                         max_length=30,
                         do_sample=True,  # temperature only takes effect when sampling
                         temperature=0.7,
                         num_return_sequences=1)

decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded)
