3.0.0: Move to using GraphQL #208
Looks like a good idea! One of the best features of x-ray is the ability to have nested selectors crawl to different pages. Is there a provision in gdom for this?
@matthewmueller Any reason why this GraphQL-ish way of doing things would be better suited to expressing crawlers than what x-ray does at the moment? Is there anything lacking in expressiveness currently?
@gebrits Expressivity-wise, I don't think it would add much, but it would greatly simplify the internals of the library, put us on a more standard course, and provide better error handling.
@Crazometer Yep, you can.
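To picture what "nested selectors that crawl to different pages" means, here is a minimal, dependency-free sketch of a schema composed in plain JS where one field is resolved against a linked page. It is not x-ray's, gdom's or Osmosis's API: `el`, `selectAll`, `pick`, `each`, `follow` and `resolve` are hypothetical helpers, the selector engine only understands `tag.class` compounds joined by descendant spaces plus an `@attr` suffix, and an in-memory `pages` map with a `load` function stands in for real HTTP.

```javascript
// Hypothetical helpers for illustration only; not x-ray's, gdom's or Osmosis's API.

// A tiny element tree stands in for a parsed HTML document.
function el(tag, cls, text = '', attrs = {}, children = []) {
  return { tag, cls, text, attrs, children };
}

// Match one compound selector such as "a", ".rank" or "td.title".
function matchesCompound(node, compound) {
  const [tag, ...classes] = compound.split('.');
  if (tag && tag !== node.tag) return false;
  return classes.every((c) => node.cls.includes(c));
}

// Descendant-only selector engine: "tr.athing a" finds every <a> under a tr.athing.
function selectAll(root, selector) {
  const parts = selector.trim().split(/\s+/);
  const results = [];
  const walk = (node, i) => {
    for (const child of node.children) {
      let next = i;
      if (matchesCompound(child, parts[i])) {
        if (i === parts.length - 1) results.push(child);
        else next = i + 1;
      }
      walk(child, next);
    }
  };
  walk(root, 0);
  return results;
}

// "sel" yields the first match's text; "sel@attr" yields one of its attributes.
function pick(scope, spec) {
  const [sel, attr] = spec.split('@');
  const node = selectAll(scope, sel)[0];
  if (!node) return null;
  return attr ? (node.attrs[attr] ?? null) : node.text;
}

// Schema combinators: a list scoped to each match, or a sub-schema resolved on a linked page.
const each = (selector, schema) => ({ kind: 'each', selector, schema });
const follow = (linkSpec, schema) => ({ kind: 'follow', linkSpec, schema });

async function resolve(scope, schema, load) {
  if (typeof schema === 'string') return pick(scope, schema);
  if (schema.kind === 'each') {
    const nodes = selectAll(scope, schema.selector);
    return Promise.all(nodes.map((node) => resolve(node, schema.schema, load)));
  }
  if (schema.kind === 'follow') {
    const url = pick(scope, schema.linkSpec);
    const page = url ? await load(url) : null;
    return page ? resolve(page, schema.schema, load) : null;
  }
  // Plain object: resolve every field against the same scope.
  const out = {};
  for (const [key, field] of Object.entries(schema)) {
    out[key] = await resolve(scope, field, load);
  }
  return out;
}

// In-memory "site": load() replaces real HTTP for this example.
const pages = {
  '/': el('html', [], '', {}, [
    el('tr', ['athing'], '', {}, [
      el('span', ['rank'], '1.'),
      el('a', ['title'], 'First', { href: '/item/1' }),
    ]),
    el('tr', ['athing'], '', {}, [
      el('span', ['rank'], '2.'),
      el('a', ['title'], 'Second', { href: '/item/2' }),
    ]),
  ]),
  '/item/1': el('html', [], '', {}, [el('p', ['body'], 'Details of first')]),
  '/item/2': el('html', [], '', {}, [el('p', ['body'], 'Details of second')]),
};
const load = async (url) => pages[url];

const schema = {
  items: each('tr.athing', {
    rank: 'span.rank',
    title: 'a.title',
    url: 'a.title@href',
    detail: follow('a.title@href', { body: 'p.body' }),
  }),
};

resolve(pages['/'], schema, load).then((data) => console.log(JSON.stringify(data, null, 2)));
```

The `follow` field is the only place a second page is loaded, so the rest of the schema stays declarative; swapping `load` for a real fetcher would not change the schema.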
Looks similar to the way Osmosis does things. IMHO it is more intuitive to compose the data structure in code rather than in a string, and it probably makes extending and debugging easier too. Here's the Osmosis equivalent of the example:

```javascript
const osmosis = require('osmosis');

osmosis('http://news.ycombinator.com')
  .set({
    items:
      osmosis.find('tr.athing')
        .set({
          rank: 'td span.rank',
          title: 'td.title a',
          sitebit: 'span.comhead a',
          url: 'td.title a@href',
          attrs: {
            score: 'span.score',
            user: 'a:eq(0)',
            comments: 'a:eq(2)'
          }
        })
  })
  .data(console.log);
```
@rchipka Thanks for chiming in! The GraphQL approach definitely puts more emphasis on using GraphiQL or another GraphQL-aware IDE, and using one of those would improve the developer experience for crawling most websites. Actually, I'm now wondering whether we should just provide primitives for this stuff rather than fancy control-flow libraries: more building blocks that you can put together to handle more complicated login flows.
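The "primitives rather than control-flow libraries" idea can be sketched as small async steps that each take and return a context object, chained by a `pipe` helper into a login flow. Everything here is an assumption for illustration: `pipe`, `visit`, `ensure`, `fakeFetchPage` and the single-cookie session model are hypothetical, and the fake site replaces real HTTP so the flow runs offline.

```javascript
// Hypothetical building blocks, not an existing x-ray API: every step is an
// async function (ctx) => ctx, and pipe() chains steps into a flow such as a login.
const pipe = (...steps) => async (ctx) => {
  for (const step of steps) ctx = await step(ctx);
  return ctx;
};

// Primitive: request a URL through the driver carried in ctx, sending the
// current cookie and remembering any new one the response sets.
const visit = (url, form) => async (ctx) => {
  const res = await ctx.fetchPage(url, { cookie: ctx.cookie, form });
  return { ...ctx, cookie: res.setCookie ?? ctx.cookie, body: res.body };
};

// Primitive: stop with a clear error instead of silently scraping the wrong page.
const ensure = (predicate, message) => async (ctx) => {
  if (!predicate(ctx.body)) throw new Error(message);
  return ctx;
};

// Fake site standing in for a real driver: /login sets a session cookie and
// /account only shows account data when that cookie is sent back.
async function fakeFetchPage(url, { cookie, form } = {}) {
  if (url === '/login' && form && form.user === 'alice' && form.pass === 'secret') {
    return { setCookie: 'session=abc', body: 'Welcome' };
  }
  if (url === '/account') {
    return { body: cookie === 'session=abc' ? 'Account: alice' : 'Please log in' };
  }
  return { body: 'Not found' };
}

// A login flow assembled purely from the primitives above.
const loginAndScrape = pipe(
  visit('/login', { user: 'alice', pass: 'secret' }),
  ensure((body) => body === 'Welcome', 'login failed'),
  visit('/account'),
  ensure((body) => body.startsWith('Account:'), 'not logged in'),
);

loginAndScrape({ fetchPage: fakeFetchPage }).then((ctx) => console.log(ctx.body)); // Account: alice
```

Because each step only sees the context, the same `visit` and `ensure` pieces can be reused in other flows, and a real HTTP or browser driver could replace `fakeFetchPage` without touching the flow itself.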
@matthewmueller No problem. I was just checking out how things are going with x-ray and this issue stood out to me. I think you should keep it as actual JS code rather than a domain-specific language in a string. You can implement a similar GraphQL look in pure JS by creating
I definitely think gdom has shown us the way forward:
Here's a basic Node implementation:
There are some improvements we could make by allowing you to swap out drivers easily (http, phantom, electron), and also around pagination.
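One way to make drivers swappable is to have the crawler depend only on a function from URL to HTML, and to drive pagination from a caller-supplied `nextUrl` extractor with a page limit. This is a minimal sketch under those assumptions: `paginate`, `memoryDriver` and the `rel="next"` markup are hypothetical, and the regex extraction is only adequate for this toy in-memory site, not for real HTML.

```javascript
// Hypothetical sketch: the crawler depends only on a driver (url) => Promise<string>,
// so an http, PhantomJS or Electron driver could be swapped in without
// touching the pagination logic.
async function paginate(driver, startUrl, { extract, nextUrl, limit = 10 }) {
  const results = [];
  const seen = new Set(); // guards against next-link cycles
  let url = startUrl;
  while (url && !seen.has(url) && seen.size < limit) {
    seen.add(url);
    const html = await driver(url);
    results.push(...extract(html));
    url = nextUrl(html, url);
  }
  return results;
}

// In-memory driver standing in for a real one in this example.
const site = {
  '/page/1': '<ul><li>a</li><li>b</li></ul><a rel="next" href="/page/2">next</a>',
  '/page/2': '<ul><li>c</li></ul><a rel="next" href="/page/3">next</a>',
  '/page/3': '<ul><li>d</li></ul>',
};
const memoryDriver = async (url) => site[url] ?? '';

const options = {
  // Collect the text of every <li> on the page.
  extract: (html) => [...html.matchAll(/<li>(.*?)<\/li>/g)].map((m) => m[1]),
  // Follow the rel="next" link if the page has one.
  nextUrl: (html) => {
    const m = html.match(/<a rel="next" href="([^"]+)"/);
    return m ? m[1] : null;
  },
};

paginate(memoryDriver, '/page/1', options).then((items) => console.log(items)); // [ 'a', 'b', 'c', 'd' ]
```

Keeping the page limit and cycle guard inside `paginate` means every driver gets the same stopping behaviour for free.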
I'm not sure I'll have time to work on this without community support, but I'd also love for someone to own this transition :-)