The paragraphs also differ a lot in granularity: some are short, some are longer. From my POV this is OK, but it influences query accuracy. Maybe we need to decompose paragraphs further into windows of 1-3 sentences to obtain good query terms.
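To make the windowing idea concrete, here is a rough Python sketch of splitting a paragraph into overlapping windows of a few sentences. The function names (`split_sentences`, `sentence_windows`), the regex-based sentence splitter, and the `size`/`step` parameters are all assumptions for illustration; none of this is taken from the Jarvis or EEXCESS code.

```python
import re


def split_sentences(paragraph):
    """Naive sentence splitter (assumption): break after ., ! or ? plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def sentence_windows(paragraph, size=3, step=1):
    """Return overlapping windows of `size` sentences, moving `step` sentences at a time.

    Paragraphs with at most `size` sentences come back as a single window.
    """
    sentences = split_sentences(paragraph)
    if not sentences:
        return []
    if len(sentences) <= size:
        return [" ".join(sentences)]
    return [
        " ".join(sentences[i:i + size])
        for i in range(0, len(sentences) - size + 1, step)
    ]


if __name__ == "__main__":
    text = "First point. Second point. Third point. Fourth point."
    for window in sentence_windows(text, size=2):
        print(window)
```

Each window could then be passed to the keyword extraction separately, so a long paragraph yields several focused queries instead of one diluted one. The splitter will mis-handle abbreviations such as "e.g.", so a proper sentence tokenizer would probably be needed in practice.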
Jarvis uses an adaptation of an earlier version of the paragraph detection. In the current version, this bug should not be present.
But I agree that it would make sense to subdivide long paragraphs. However, I am not sure whether a fixed-length window would be appropriate or whether other features could be exploited. For finding subparagraphs, the markup might provide indicators; for query generation, filtering the keywords (removing outliers) might be another solution.
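As one possible reading of "filtering the keywords (removing outliers)", the sketch below drops keywords whose score lies far from the median, measured in median absolute deviations (MAD). This is only an assumption about what an outlier filter could look like: the function name `filter_outlier_keywords`, the score dictionary, and the cutoff `k` are hypothetical and not part of the existing keyword extraction.

```python
from statistics import median


def filter_outlier_keywords(scores, k=3.0):
    """Keep keywords whose score is within k MADs of the median score.

    `scores` maps keyword -> numeric score (e.g. a term weight); this input
    format is an assumption made for the sketch.
    """
    if not scores:
        return []
    values = list(scores.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # No measurable spread around the median: keep everything rather than guess.
        return list(scores)
    return [kw for kw, v in scores.items() if abs(v - center) <= k * mad]


if __name__ == "__main__":
    example = {"vienna": 1.0, "museum": 1.1, "painting": 0.9, "the": 10.0}
    print(filter_outlier_keywords(example))
```

Whether outliers should be defined on scores, on frequency, or on semantic distance to the other keywords is an open question here; the MAD cutoff is just the simplest variant to try.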
Nested paragraphs are sometimes detected. See EEXCESS/jarvis#6 for an example.