Pseudo relevance feedback from user queries #35

bwbaugh · 2013-03-28T03:17:42Z

We could attempt online self-training on the user queries that are submitted to the system, whenever the classifier's confidence in the predicted label is above a certain threshold (like 80%).
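A minimal sketch of the thresholding idea above, in Python. The `predict` and `train` methods are a hypothetical interface (not taken from the project's code), and `threshold=0.8` just mirrors the 80% example:

```python
from typing import Protocol, Tuple


class OnlineClassifier(Protocol):
    """Hypothetical interface; the project's real classifier may differ."""

    def predict(self, text: str) -> Tuple[str, float]:
        """Return (label, confidence), with confidence in [0, 1]."""
        ...

    def train(self, text: str, label: str) -> None:
        """Update the model with one labeled instance."""
        ...


def maybe_self_train(classifier: OnlineClassifier, text: str,
                     threshold: float = 0.8) -> Tuple[str, float, bool]:
    """Classify a user query and self-train only on confident predictions.

    Returns (label, confidence, trained) so the caller can still answer
    the query and knows whether the classifier was updated.
    """
    label, confidence = classifier.predict(text)
    trained = confidence > threshold  # strictly above the threshold
    if trained:
        classifier.train(text, label)
    return label, confidence, trained
```

Returning the `trained` flag lets the caller decide whether the query also needs to be written to the automatically-labeled log.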

We should explicitly log every query used for training, so that the classifier can be brought back to the same state if the server is restarted. In addition to logging the document and label to an automatically-labeled-set file, we could also update the classifier in real time.
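One way the log-and-replay part could look, as a rough sketch. The JSON-lines file format, the function names, and the `classifier.train(text, label)` call are all assumptions made for illustration:

```python
import json


def log_auto_labeled(path, text, label):
    """Append one automatically labeled instance as a JSON line."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps({"text": text, "label": label}) + "\n")


def replay_auto_labeled(path, classifier):
    """Re-train the classifier from the log, in the original order.

    Returns the number of instances replayed; a missing log file
    counts as zero instances (e.g. on the very first start).
    """
    try:
        log_file = open(path, encoding="utf-8")
    except FileNotFoundError:
        return 0
    count = 0
    with log_file:
        for line in log_file:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            classifier.train(record["text"], record["label"])
            count += 1
    return count
```

Appending one record per line keeps writes cheap during request handling, and replaying in file order reproduces the same sequence of updates after a restart.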

We should probably first check that the text-label pair is unique, because I don't think we want the classifier to become biased by learning from the same instance more than once. If the text is the same but the labels differ, we might want to flag that for review.
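The uniqueness check could be sketched along these lines. `InstanceRegistry` and its return values are hypothetical names, and an in-memory dict is assumed to be large enough for the set of queries:

```python
class InstanceRegistry:
    """Track which texts were already used for self-training."""

    def __init__(self):
        self._labels = {}   # text -> label it was first trained with
        self.flagged = []   # (text, known_label, new_label) for review

    def check(self, text, label):
        """Return 'new', 'duplicate', or 'conflict' for a text-label pair.

        Only 'new' pairs should be passed on to training. A 'conflict'
        (same text, different label) is recorded in `flagged`.
        """
        known = self._labels.get(text)
        if known is None:
            self._labels[text] = label
            return "new"
        if known == label:
            return "duplicate"
        self.flagged.append((text, known, label))
        return "conflict"
```

Keeping the first label on a conflict is an arbitrary choice here; the flagged entries are meant for a human to resolve.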

This would only work if the classifier were centralized, because if we had multiple instances loaded (to increase the number of requests handled per second), each classifier would be learning from different novel data.
