Create requestLikeBrowser function #255
Comments
The function should also have an option to abort downloading responses whose content type does not match any of the selected ones. Then we can use this class in |
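The content-type check described in this comment could look roughly like the sketch below. It uses Node's built-in `http`/`https` modules rather than the `request` package, and the names `isAllowedContentType` and `downloadIfAllowed` are hypothetical, chosen only for illustration. Redirects are not handled here.

```javascript
const http = require('http');
const https = require('https');

// Returns true when the Content-Type header matches one of the allowed MIME types.
// Parameters such as "; charset=utf-8" are ignored.
function isAllowedContentType(contentTypeHeader, allowedTypes) {
    if (!contentTypeHeader) return false;
    const mimeType = contentTypeHeader.split(';')[0].trim().toLowerCase();
    return allowedTypes.includes(mimeType);
}

// Hypothetical helper: resolves with the response body, or rejects as soon as
// the response headers show a content type that is not in allowedTypes.
function downloadIfAllowed(url, allowedTypes = ['text/html']) {
    return new Promise((resolve, reject) => {
        const client = url.startsWith('https:') ? https : http;
        const req = client.get(url, (res) => {
            const contentType = res.headers['content-type'];
            if (!isAllowedContentType(contentType, allowedTypes)) {
                // Close the connection before the body is downloaded.
                req.destroy();
                reject(new Error(`Unsupported content type: ${contentType}`));
                return;
            }
            const chunks = [];
            res.on('data', (chunk) => chunks.push(chunk));
            res.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
            res.on('error', reject);
        });
        req.on('error', reject);
    });
}
```

Checking the header before attaching any `data` listener means that a large PDF or image costs only the response headers, not the whole body.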
Maybe we can create a |
Sure, although I wouldn't over-engineer this. Developers are familiar with the |
During this task, we might also try to fix bug #266 |
BTW the |
Also, the function should be able to report the final URL (after all redirects). |
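One way to expose the final URL is to follow redirects manually and remember the last URL requested. The sketch below assumes Node's built-in `http`/`https` modules; `resolveRedirect` and `fetchWithFinalUrl` are hypothetical names, not part of any existing API.

```javascript
const http = require('http');
const https = require('https');

// Resolves a Location header (absolute or relative) against the current URL.
function resolveRedirect(currentUrl, locationHeader) {
    return new URL(locationHeader, currentUrl).href;
}

// Hypothetical helper: follows redirects and resolves with the final URL,
// the final status code, and the body.
function fetchWithFinalUrl(url, maxRedirects = 10) {
    return new Promise((resolve, reject) => {
        const client = url.startsWith('https:') ? https : http;
        const req = client.get(url, (res) => {
            const { statusCode, headers } = res;
            const isRedirect = [301, 302, 303, 307, 308].includes(statusCode)
                && Boolean(headers.location);
            if (isRedirect) {
                res.resume(); // discard the redirect body so the socket is released
                if (maxRedirects <= 0) {
                    reject(new Error('Too many redirects'));
                    return;
                }
                const nextUrl = resolveRedirect(url, headers.location);
                resolve(fetchWithFinalUrl(nextUrl, maxRedirects - 1));
                return;
            }
            const chunks = [];
            res.on('data', (chunk) => chunks.push(chunk));
            res.on('end', () => resolve({
                finalUrl: url,
                statusCode,
                body: Buffer.concat(chunks).toString('utf8'),
            }));
            res.on('error', reject);
        });
        req.on('error', reject);
    });
}
```

A caller would read the result as `const { finalUrl, body } = await fetchWithFinalUrl(startUrl);`.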
It would be great if the new function also addressed the issue where SSL connections over proxy leak sockets in CLOSE_WAIT state, which eventually leads to EMFILE errors. See request/request#2440 for details |
Considering this would be the second monkey patch of the request package we need to do, maybe we could explore other options too and switch to a better-maintained HTTP client. |
@petrpatek @mnmkng @mtrunkat I put together the specification for this new function. It's in the form of a pull request where the new functions are defined and commented. See #353. IMHO the best way to get this done is to first implement |
Isn't this already done by @petrpatek? Can we close? |
Yes, it's in latest already. Development is still ongoing though. |
It will download HTML using the request package, but it will emulate the HTTP headers of a normal browser to reduce the chance of bot detection. Once done and tested, we should use this function in CheerioCrawler by default.

In the first version, let's just emulate Firefox with the latest user agent. In the future, we could support other browsers and user agents, so design the function in a way that lets its functionality be extended later, e.g. by giving it an options param.

Here's a code snippet that can be used as a starting point.
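The snippet the issue refers to is not included in this text. As a separate, minimal sketch of the idea described above (not the original snippet), a header builder with an extensible options param might look like this. The name `buildBrowserLikeHeaders`, its option names, and the Firefox user-agent string are all assumptions made for illustration; the user agent in particular is not necessarily the latest one.

```javascript
// Per-browser header profiles. Only Firefox is defined here; other browsers
// could be added later without changing the function signature.
const BROWSER_PROFILES = {
    firefox: {
        // Illustrative user-agent string, an assumption and likely outdated.
        userAgent: 'Mozilla/5.0 (X11; Linux x86_64; rv:66.0) Gecko/20100101 Firefox/66.0',
        headers: {
            Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            Connection: 'keep-alive',
            'Upgrade-Insecure-Requests': '1',
            // Accept-Encoding is left out on purpose: advertising compression
            // only makes sense if the caller also decompresses the response.
        },
    },
};

// Hypothetical helper: builds request headers that resemble a real browser.
function buildBrowserLikeHeaders(options = {}) {
    const {
        browser = 'firefox',
        userAgent,
        languageCode = 'en',
        countryCode = 'US',
        headers = {},
    } = options;

    const profile = BROWSER_PROFILES[browser];
    if (!profile) throw new Error(`Unsupported browser: ${browser}`);

    return {
        ...profile.headers,
        'User-Agent': userAgent || profile.userAgent,
        'Accept-Language': `${languageCode}-${countryCode},${languageCode};q=0.5`,
        ...headers, // caller-supplied headers win over the defaults
    };
}
```

The resulting object can be passed as the `headers` option of whichever HTTP client ends up being used. Keeping the browser profiles in a lookup table is one way to leave room for other browsers and user agents later.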