Dataset and replication package for the paper "Where is Your App Frustrating Users?" (ICSE 2022).
- Python 3.6
- Main libraries: requirements.txt
- Pre-trained BERT
- Path: pytorch_version/prev_trained_model/
The dataset directory is provided for convenient viewing of the data. It contains both the labeled data and the unlabeled data.
The data directory when running our code: pytorch_version/CLUEdatasets/cluener/
app: App name
text: Review sentence
senti: Sentence sentiment (negative: [-5, -1], positive: [1, 5])
label: The problematic feature phrase, the beginning position and the ending position
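The fields above can be sketched as a small Python record. This is only an illustration, not the package's own loader: the `Review` type, the helper names, and the example values are hypothetical, and the sketch assumes that the positions in `label` are character indices with an inclusive ending position. Check the files in the data directory for the actual on-disk layout.

```python
from typing import NamedTuple


class Review(NamedTuple):
    app: str     # App name
    text: str    # Review sentence
    senti: int   # Sentence sentiment score
    phrase: str  # Problematic feature phrase (taken from `label`)
    start: int   # Beginning position (assumed: character index)
    end: int     # Ending position (assumed: inclusive character index)


def polarity(senti: int) -> str:
    """Map a sentiment score to its polarity using the documented ranges."""
    if -5 <= senti <= -1:
        return "negative"
    if 1 <= senti <= 5:
        return "positive"
    raise ValueError(f"sentiment score outside documented ranges: {senti}")


def span_matches(review: Review) -> bool:
    """Check that the labeled positions select exactly the labeled phrase."""
    return review.text[review.start:review.end + 1] == review.phrase


# Hypothetical example record, not taken from the dataset.
example = Review(
    app="ExampleApp",
    text="the login page keeps crashing",
    senti=-3,
    phrase="login page",
    start=4,
    end=13,
)
```

A check such as `span_matches` is a cheap way to catch off-by-one errors in the positions before training.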
run_ner_crf.sh
You can change the configuration in run_ner_crf.sh, including learning_rate, per_gpu_train_batch_size, per_gpu_eval_batch_size, num_train_epochs, etc. The other important parameters are:
overwrite_output_dir -- whether to overwrite the output directory
do_train -- whether to train the model
do_eval -- whether to evaluate the model
do_predict -- whether to predict the results for new data
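As a rough illustration of how switches like these usually behave, the sketch below declares them with `argparse`. It assumes the boolean parameters are on/off flags (present means enabled). The default values are placeholders chosen for the example and are not the values used in run_ner_crf.sh.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the parameters named above; all defaults are illustrative."""
    parser = argparse.ArgumentParser()
    # Tunable hyperparameters (placeholder defaults, not the script's own).
    parser.add_argument("--learning_rate", type=float, default=3e-5)
    parser.add_argument("--per_gpu_train_batch_size", type=int, default=8)
    parser.add_argument("--per_gpu_eval_batch_size", type=int, default=8)
    parser.add_argument("--num_train_epochs", type=float, default=3.0)
    # Boolean switches: absent means False, present means True.
    parser.add_argument("--overwrite_output_dir", action="store_true")
    parser.add_argument("--do_train", action="store_true")
    parser.add_argument("--do_eval", action="store_true")
    parser.add_argument("--do_predict", action="store_true")
    return parser


# Train and evaluate with a custom learning rate, but skip prediction.
args = build_parser().parse_args(
    ["--do_train", "--do_eval", "--learning_rate", "5e-5"]
)
```

Under this assumption, removing a flag such as `--do_train` from the command line is enough to skip that stage.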
This should be run after obtaining the results of Problematic Feature Extraction.
The script needs to be revised to match the format of your output data: assign your data to the variable domain_docs.
preprocess/clustering.py
pytorch_version/outputs/cluener_output/bert/
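One possible way to prepare the data for domain_docs is sketched below. Everything about the data shape here is an assumption: the sketch supposes the extraction results have already been loaded as dictionaries with hypothetical `app` and `phrase` keys, and that domain_docs should be a list of phrase strings. Inspect preprocess/clustering.py to confirm what it actually expects before adapting this.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def collect_phrases(predictions: Iterable[dict]) -> Dict[str, List[str]]:
    """Group extracted phrases by app, normalizing case and dropping blanks."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in predictions:
        phrase = row.get("phrase", "").strip().lower()
        if phrase:
            grouped[row["app"]].append(phrase)
    return dict(grouped)


# Hypothetical extraction results; the real output format may differ.
predictions = [
    {"app": "ExampleApp", "phrase": "Login page"},
    {"app": "ExampleApp", "phrase": "  "},
    {"app": "ExampleApp", "phrase": "push notifications"},
    {"app": "OtherApp", "phrase": "video upload"},
]

# Assumed shape: one list of phrase strings for the app being clustered.
domain_docs = collect_phrases(predictions)["ExampleApp"]
```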