s-stemmer deviates from paper? #157

markharwood · 2019-06-17T15:57:27Z

I see that bees doesn't stem to bee and tomatoes doesn't stem to tomato.

Is this misinterpreting the logic in the original paper?
I ask because I work on elasticsearch and discovered that we have a similar issue. See elastic/elasticsearch#42892 (comment) for my notes on the confusion.

Yomguithereal · 2019-06-17T17:18:57Z

Hello @markharwood. That's entirely possible, because I think I wrote my implementation while reading Lucene's, which should be the same one ES is using. Do you, by chance, have a link to the original article, or a PDF of it? As stated here, I could only find a paper referencing the algorithm and explaining its broad intentions.

markharwood · 2019-06-24T10:34:56Z

No, I only saw the same paper as you. I've just tried sending an email to the original paper author - I'm sure she'd like to see her algorithm implemented correctly too.

markharwood · 2019-07-09T10:32:29Z

I heard back from Donna, the paper author. She agrees that words like bees/employees should fall through to rule 3 and have the S removed. However, that logic would make rule 2 redundant.
Rule 1 also has some weird-looking exceptions which don't appear to relate to any common English words that I know of.
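
To make the two readings concrete, here is a minimal sketch in Python. The rule table is an assumption: it follows the three rules as they are commonly quoted from the paper, not anything stated in this thread, and the names `RULES`, `s_stem` and `fall_through` are made up for illustration. With `fall_through=False` a word that matches a rule's suffix but hits one of its exceptions is left alone (the behaviour reported above); with `fall_through=True` it moves on to the next rule.

```python
# Assumed rule table (as commonly quoted from the paper, not from this thread):
#   1. "ies" -> "y"  unless the word ends in "eies" or "aies"
#   2. "es"  -> "e"  unless the word ends in "aes", "ees" or "oes"
#   3. "s"   -> ""   unless the word ends in "us" or "ss"
RULES = [
    ("ies", ("eies", "aies"), "y"),
    ("es", ("aes", "ees", "oes"), "e"),
    ("s", ("us", "ss"), ""),
]


def s_stem(word, fall_through):
    """Apply the assumed S-stemmer rules under one of two interpretations."""
    for suffix, exceptions, replacement in RULES:
        if not word.endswith(suffix):
            continue
        if word.endswith(exceptions):
            if fall_through:
                # Exception hit: try the next, more general rule.
                continue
            # Exception hit: stop and leave the word unchanged.
            return word
        return word[: -len(suffix)] + replacement
    return word


for w in ("bees", "employees", "horses", "ponies"):
    print(w, s_stem(w, fall_through=False), s_stem(w, fall_through=True))
```

Under fall-through, any word that rule 2 would rewrite ("es" to "e") gets the same result from rule 3 (drop the "s"), which is the redundancy mentioned above. Note that in this sketch fall-through turns tomatoes into tomatoe, not tomato.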

The origins of the S-stemmer algorithm appear to be lost in time. Donna didn't author it and suggested the logic may be connected to the SMART system from way back when.

Rather than trying to resolve that, I've been working on an alternative plural stemmer for Elasticsearch here

Yomguithereal · 2019-07-09T10:45:52Z

Cool. Can you tell me when you feel your stemmer is done and merged into ES? I will then be able to replicate it here if you want. Or feel free to open a PR if you would rather do it yourself.
