Matching uppercase letters in a lowercase string #158

Open
rcongiu opened this issue Dec 27, 2023 · 2 comments

Comments

@rcongiu

rcongiu commented Dec 27, 2023

In the code that matches emoticons, line 211: https://github.com/rasbt/machine-learning-book/blob/bc27b404956c1555777282624eb5b8c50c818bfd/ch15/ch15_part2.ipynb#L211

Shouldn't this be text.upper() instead of text.lower(), since the match expression contains a capital P and D? Alternatively, make the regex ignore case: re.findall('(?::|;|=)(?:-)?(?:\)|\(|D|P)', text.lower(), flags=re.IGNORECASE).
As it is now, it looks for uppercase letters in a string that is all lowercase, so the D and P alternatives will never match.
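A minimal sketch of the mismatch described above, assuming the pattern quoted in this issue (written here as a raw string) and a hypothetical sample sentence that is not taken from the notebook. It compares matching against lowercased text with and without re.IGNORECASE:

```python
import re

# Pattern quoted in the issue, written as a raw string to avoid escape warnings.
EMOTICON_PATTERN = r'(?::|;|=)(?:-)?(?:\)|\(|D|P)'

# Hypothetical sample text, used only for illustration.
sample = "Great movie :-) loved it :D but the ending ;P was sad :("

# Case-sensitive match on lowercased text: ":d" and ";p" no longer match D/P.
lowered = re.findall(EMOTICON_PATTERN, sample.lower())

# Same lowercased text, but the pattern ignores case.
ignore_case = re.findall(EMOTICON_PATTERN, sample.lower(), flags=re.IGNORECASE)

print(lowered)      # [':-)', ':(']
print(ignore_case)  # [':-)', ':d', ';p', ':(']
```

The bracket emoticons still match after lowercasing; only the letter-based ones are lost unless the match is case-insensitive.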

@rasbt
Owner

rasbt commented Dec 28, 2023

Thanks for the comment. I think if it were all in upper case, characters like ":-)" would become ":_)" etc. Instead of doing text.lower(), I think we could match in a way that would catch things like ":-P". To preserve the original characters, it could perhaps be just:

emoticons = re.findall('(?::|;|=)(?:-)?(?:\)|\(|D|P)', text)
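As a rough sketch of this suggestion, the pattern can be applied to the unmodified text so the emoticons keep their original case. The helper name find_emoticons and the sample string are hypothetical and only serve the illustration:

```python
import re

# Pattern from the thread, written as a raw string.
EMOTICON_PATTERN = r'(?::|;|=)(?:-)?(?:\)|\(|D|P)'


def find_emoticons(text):
    """Hypothetical helper: match emoticons against the text as written."""
    return re.findall(EMOTICON_PATTERN, text)


# Hypothetical sample containing upper- and lowercase variants.
sample = "Nice :-P really nice =D :p"
found = find_emoticons(sample)
print(found)  # [':-P', '=D']
```

With this variant, lowercase forms such as ":p" are not matched; catching them as well would need re.IGNORECASE or extra alternatives in the pattern.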

@rcongiu
Author

rcongiu commented Dec 28, 2023

Thanks for the comment. I think if it were all in upper case, characters like ":-)" would become ":_)" etc.

I don't think so: .lower() and .upper() only affect actual letters and would not change the "-" to "_".
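A quick check of this point, using a hypothetical sample string: the case conversions change the letters and leave the punctuation alone.

```python
emoticon = ":-)"

# Punctuation has no case, so both conversions return the string unchanged.
upper_emoticon = emoticon.upper()
lower_emoticon = emoticon.lower()

# Hypothetical sample: only the letters change case, the "-" stays a "-".
mixed = "So funny :-P"
upper_mixed = mixed.upper()
lower_mixed = mixed.lower()

print(upper_emoticon, lower_emoticon)  # :-) :-)
print(upper_mixed)                     # SO FUNNY :-P
print(lower_mixed)                     # so funny :-p
```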
