
FileNotFoundError: [Errno 2] No such file or directory: 'data/src-train.embed' #1

li3cmz opened this issue Dec 3, 2019 · 4 comments



li3cmz commented Dec 3, 2019

Looking forward to your reply.

gmftbyGMFTBY (Owner) commented Dec 3, 2019

Sorry, I haven't added a tutorial for this repo yet.
Please run utils.py to generate the model-ready data:

python utils.py --mode dataset --dataset xxxx

The dataset name should be the same as the folder name under the data folder.
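For example, if your corpus lives in data/dailydialog (dailydialog here is just an illustrative name), the command would be:

python utils.py --mode dataset --dataset dailydialog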

I have pushed a new commit, so you can check it.
After processing the dataset, you can run the code with:

./run.sh $CUDA_DEVICE $dataset_name
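For example (assuming the first argument is the GPU index to use), running on GPU 0 with the illustrative dailydialog dataset would be:

./run.sh 0 dailydialog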

Note that BERT-RUBER requires bert-as-service to generate the BERT contextual word embeddings.
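A minimal sketch of that step (assuming a bert-as-service server is already running, e.g. started with bert-serving-start -model_dir /path/to/bert -num_worker 1):

from bert_serving.client import BertClient

# connect to the running bert-as-service server (localhost by default)
bc = BertClient()

# encode a batch of sentences; by default this returns one pooled vector per sentence
vectors = bc.encode(['how are you ?', 'i am fine , thank you .'])
print(vectors.shape)  # e.g. (2, 768) for BERT-base

To get token-level (contextual word) embeddings instead of pooled sentence vectors, the server has to be started with -pooling_strategy NONE.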

I will update the README in a few days and create a new repo containing our proposed learning-based method (better than RUBER and BERT-RUBER).

Thank you for your attention.

gmftbyGMFTBY (Owner) commented Dec 3, 2019

As for the datasets, you can check the following benchmarks:

You need to process the dataset into the single-turn dialogue format, which consists of six plain-text files:

  • src-train.txt: the context or query of the conversation, one sentence per line
  • tgt-train.txt: the ground-truth response for the query, one sentence per line
  • src-test.txt
  • tgt-test.txt
  • src-dev.txt
  • tgt-dev.txt

After running python utils.py --mode dataset --dataset xxxx, you will get the *.embed files, which are used for training the model.
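Conceptually, that step encodes each line of the text files and saves the resulting vectors; a simplified sketch for the BERT-RUBER case (not the exact code in utils.py, and the pickle format below is only illustrative):

import pickle
from bert_serving.client import BertClient

bc = BertClient()

# read the queries, one sentence per line
with open('data/dailydialog/src-train.txt') as f:
    sentences = [line.strip() for line in f if line.strip()]

# encode them with bert-as-service into fixed-size vectors
embeddings = bc.encode(sentences)

# persist the embeddings for training (illustrative format only)
with open('data/dailydialog/src-train.embed', 'wb') as f:
    pickle.dump(embeddings, f)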

li3cmz (Author) commented Dec 3, 2019

Wow! Thank you for replying so promptly! I will give it a try!
And I will keep an eye on your new repo. Thanks~

ImmortalCi commented

@gmftbyGMFTBY @li3cmz
Hello, I want to reproduce the results and deploy this model for unreferenced text generation evaluation. This is the topic of my thesis; could you provide the processed datasets? Thanks a lot!
