Pseudo relevance feedback from user queries #35

Open
bwbaugh opened this issue Mar 28, 2013 · 0 comments

bwbaugh commented Mar 28, 2013

We could attempt online self-training on the user queries submitted to the system, using a query only when the classifier's confidence in its predicted label is above a certain threshold (like 80%).
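
A rough sketch of the threshold check, assuming the classifier can return a probability per label as a dict. The name `pseudo_label` and the 0.8 default are hypothetical and only illustrate the idea; nothing here is existing project code.

```python
from typing import Dict, Optional, Tuple

# Assumed value, taken from the "like 80%" suggestion above.
CONFIDENCE_THRESHOLD = 0.8


def pseudo_label(text: str,
                 probabilities: Dict[str, float],
                 threshold: float = CONFIDENCE_THRESHOLD) -> Optional[Tuple[str, str]]:
    """Return (text, label) when the top label is confident enough, else None.

    `probabilities` maps each label to the classifier's probability for
    `text` (an assumed output format).
    """
    if not probabilities:
        return None
    # Pick the label with the highest probability.
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    if confidence > threshold:
        return text, label
    return None
```

Queries for which this returns `None` would simply be classified as usual and never used for training.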

We should definitely log any queries used for training so that the classifier can be brought back to the same point if the server is restarted. In addition to logging the document and label to an automatically-labeled-set file, we could also update the classifier in real time.
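
One way the log-and-replay idea might look, assuming a JSON Lines file with one record per training instance. The function names, the file format, and the `train` callback are all assumptions for illustration, not existing code.

```python
import json
from typing import Callable


def log_training_instance(path: str, text: str, label: str) -> None:
    """Append one automatically labeled instance to the log file."""
    with open(path, "a", encoding="utf-8") as log_file:
        # json.dumps escapes newlines, so each record stays on one line.
        log_file.write(json.dumps({"text": text, "label": label}) + "\n")


def replay_training_log(path: str, train: Callable[[str, str], None]) -> int:
    """Feed every logged instance to `train`, in the order it was logged.

    `train` stands in for whatever incremental update the classifier
    exposes (hypothetical). Returns the number of instances replayed.
    """
    count = 0
    try:
        with open(path, encoding="utf-8") as log_file:
            for line in log_file:
                line = line.strip()
                if not line:
                    continue
                record = json.loads(line)
                train(record["text"], record["label"])
                count += 1
    except FileNotFoundError:
        # No log yet, e.g. on the very first start.
        pass
    return count
```

On startup the server would call `replay_training_log` once after loading the base classifier, and at runtime it would call `log_training_instance` before (or together with) the real-time update.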

We should probably first check that the text-label pair is unique, because I don't think we want the classifier to become biased by learning from the same instance more than once. If the text is the same but the labels differ, we might want to flag that for review.
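
A minimal sketch of that check, assuming exact string matching on the text (any normalization such as lowercasing would be an extra design decision). The class name `InstanceFilter` and the three return values are hypothetical.

```python
from typing import Dict, List, Tuple


class InstanceFilter:
    """Tracks seen texts so each text-label pair is learned at most once."""

    def __init__(self) -> None:
        # Maps each seen text to the label it was first accepted with.
        self._labels: Dict[str, str] = {}
        # (text, first_label, new_label) tuples awaiting manual review.
        self.flagged: List[Tuple[str, str, str]] = []

    def check(self, text: str, label: str) -> str:
        """Return "new", "duplicate", or "conflict" for a candidate instance."""
        previous = self._labels.get(text)
        if previous is None:
            self._labels[text] = label
            return "new"
        if previous == label:
            # Same pair seen before: skip it to avoid biasing the classifier.
            return "duplicate"
        # Same text, different label: keep the first label and flag for review.
        self.flagged.append((text, previous, label))
        return "conflict"
```

Only instances that come back as `"new"` would be logged and used for training.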

This would only work if the classifier were centralized, because if we had multiple instances loaded (to increase the number of requests handled per second) then each classifier would be learning from different novel data.
