RSSN for MHCH+SSA

Core implementation of EMNLP-2021 paper: A Role-Selected Sharing Network for Joint Machine-Human Chatting Handoff and Service Satisfaction Analysis

Requirements

Python 3.6 or higher
tensorflow==1.14.0
Keras==2.2.5
tqdm==4.35.0
jieba==0.39

Environment

Tesla V100 16GB GPU
CUDA 10.1

Data Format

Each json file is a data list that includes dialogue samples. The format of a dialogue sample is shown as follows:

{
   "session": [
     {
       "content": "啥时候能收到",
       "label": 0,
       "round": 1,
       "role": "c2b",
       "senti": "3"
     },
     {
       "content": "一般2到3天哦",
       "label": 0,
       "round": 2,
       "role": "b2c",
       "senti": "3"
     },
     {
       "content": "皮肤黑，不知道选哪个",
       "label": 0,
       "round": 3,
       "role": "c2b",
       "senti": "3"
     },
     {
       "content": "不知道选哪个？适合自己的才是最好的，推荐直接下单体验，7天内可无理由退货。若问题还没解决，可以请“人工”",
       "label": 1,
       "round": 4,
       "role": "b2c",
       "senti": "3"
     },
     {
       "content": "废话啊",
       "label": 1,
       "round": 5,
       "role": "b2c",
       "senti": "1"
     },
     {
       "content": "皮肤黑，我不知道买哪个, 不推荐我就不买了!",
       "label": 0,
       "round": 6,
       "role": "c2b",
       "senti": "2"
     }
   ],
   "sessionID": "4",
   "score": "1"
 }

Here we use json format to save these dialogues:

“session”: the current dialogue
- “content”: utterance content
- “label”: the current utterance is transferable or normal
- “round”: the order of the utterance in the current dialogue
- “role”: role information of the current utterance
- “senti”: sentiment of the current utterance(1 and 2 denote negative, 3 denotes neutral, 4 and 5 denote positive)
“sessionid”: ID of the current dialogue
“score”: Overall satisfaction of the current dialogue ( 1 denotes unsatisfied, 2 denotes met satisfied, 3 denotes well satisfied)

Our experiments are conducted based on two publicly available Chinese customer service dialogue datasets, namely Clothes and Makeup, collected by Song et al. (2019) from Taobao. Meanwhile, we also annotate the transferable/normal labels for both datasets according to the existing specifications (Liu et al., 2020). For the security of private information from customers, we performed the data desensitization and converted words to IDs following Song et al. (2019). The vocab.pkl contains the pre-trained glove word embeddings of token ids.

Usage

Data processing

To construct the vocabulary from the pre-trained word embeddings and corpus. For the security of private information from customers, we performed the data desensitization and converted words to IDs. We save the processed data into pickle file.

Train the model (including training, validation, and testing)

python main.py --phase train --suffix .128 --mode train --ways mt --model_name rssn --data_name clothes

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
MHCH_SSA		MHCH_SSA
config		config
resources		resources
tf_models		tf_models
LICENSE		LICENSE
Network.py		Network.py
README.md		README.md
data_loader.py		data_loader.py
data_prepare.py		data_prepare.py
main.py		main.py
requirement.txt		requirement.txt
run.sh		run.sh
utility.py		utility.py
vocab.py		vocab.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RSSN for MHCH+SSA

Requirements

Environment

Data Format

Usage

About

Languages

License

LauJames/RSSN

Folders and files

Latest commit

History

Repository files navigation

RSSN for MHCH+SSA

Requirements

Environment

Data Format

Usage

About

Resources

License

Stars

Watchers

Forks

Languages