[Paragraph Detection] Should detected paragraphs be allowed to be nested? #2

Open
chseifert opened this issue Aug 25, 2015 · 2 comments

Comments

@chseifert

Nested paragraphs are sometimes detected. See EEXCESS/jarvis#6 for an example.

@mgrani

mgrani commented Aug 26, 2015

The paragraphs also vary widely in granularity: some are short, some are larger. From my POV this is OK, but it affects query accuracy. Maybe we need to decompose paragraphs further into windows of 1-3 sentences to obtain good query terms.
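
To make the windowing idea concrete, here is a minimal sketch in Python. It is not taken from the project's code: the function names `split_sentences` and `sentence_windows` are hypothetical, and the regex-based sentence splitter is a naive assumption that will mis-split abbreviations such as "e.g.".

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def sentence_windows(paragraph, size=3):
    # Return overlapping windows of `size` consecutive sentences, shifted by one sentence.
    sentences = split_sentences(paragraph)
    if not sentences:
        return []
    if len(sentences) <= size:
        # Short paragraphs are kept whole.
        return [" ".join(sentences)]
    return [
        " ".join(sentences[i:i + size])
        for i in range(len(sentences) - size + 1)
    ]
```

Each window could then be passed to query-term extraction on its own, so that short and long paragraphs produce inputs of comparable size.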

@schloett

Jarvis uses an adaptation of an earlier version of the paragraph detection. In the current version, this bug should not be present.
But I agree that it would make sense to subdivide long paragraphs. However, I am not sure whether a fixed-length window would be appropriate or whether other features could be exploited. For finding subparagraphs, the markup might provide indicators; for query generation, filtering the keywords (removing outliers) might be another solution.
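
One possible reading of "removing outliers" is sketched below in Python. It is an illustration only: it assumes each keyword already carries a numeric relevance score (the thread does not say how keywords are scored), and the function name `filter_outlier_keywords`, the threshold `k`, and the choice of median absolute deviation are all assumptions rather than anything the current implementation does.

```python
from statistics import median


def filter_outlier_keywords(scores, k=3.0):
    # `scores` maps keyword -> numeric score (hypothetical input format).
    if not scores:
        return {}
    values = list(scores.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that is robust to outliers.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No measurable spread, so nothing can be called an outlier.
        return dict(scores)
    # Keep keywords whose score lies within k MADs of the median.
    return {kw: s for kw, s in scores.items() if abs(s - med) <= k * mad}
```

For example, with scores `{"a": 1.0, "b": 1.2, "c": 0.9, "d": 1.1, "e": 10.0}` and the default `k`, the keyword `"e"` is dropped and the other four are kept.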
