

3.0.0: Move to using GraphQL #208

Open
matthewmueller opened this issue Aug 31, 2016 · 7 comments

Comments

@matthewmueller
Owner

matthewmueller commented Aug 31, 2016

I definitely think gdom has shown us the way forward:

Here's a basic Node implementation:

There are some improvements we can make, such as letting you swap out drivers easily (http, phantom, electron) and handling pagination better.

I'm not sure I'll have time to work on this without community support, but I'd also love for someone to own this transition :-)
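Swappable drivers and pagination could share one small contract. The sketch below makes a hypothetical assumption: a driver is any function that takes a URL and returns a promise of the page's HTML. `memoryDriver`, `paginate` and `getNext` are illustrative names, not x-ray's actual driver API. Because `paginate` depends only on that contract, an http, phantom or electron driver could be dropped in without changing it.

```javascript
// Hypothetical driver contract (not x-ray's real driver API): a driver is any
// function (url) => Promise<string> that resolves with the page's HTML.
// An http, phantom or electron driver would all expose this same shape.

// In-memory driver that serves canned HTML keyed by URL; useful for tests.
function memoryDriver(pages) {
  return async (url) => {
    if (!(url in pages)) throw new Error(`no page for ${url}`);
    return pages[url];
  };
}

// Follow "next page" links using whichever driver is plugged in.
// getNext(html) returns the next URL or null; limit caps the number of pages.
async function paginate(driver, startUrl, getNext, limit = 10) {
  const pages = [];
  let url = startUrl;
  while (url && pages.length < limit) {
    const html = await driver(url);
    pages.push({ url, html });
    url = getNext(html);
  }
  return pages;
}

// Usage: two linked pages served from memory.
const driver = memoryDriver({
  'http://example.test/1': '<a class="next" href="http://example.test/2">next</a>',
  'http://example.test/2': '<p>last page</p>',
});

// Toy link extraction for this canned markup only; a real scraper would use a DOM parser.
const getNext = (html) => {
  const match = html.match(/class="next" href="([^"]+)"/);
  return match ? match[1] : null;
};

paginate(driver, 'http://example.test/1', getNext).then((pages) => {
  console.log(pages.map((p) => p.url));
});
```

The `limit` parameter keeps a crawl from running away on sites whose "next" links loop back on themselves.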

@jspri
Contributor

jspri commented Sep 3, 2016

Looks like a good idea!

One of the best features of x-ray is the ability for nested selectors to crawl to different pages. Is there a provision in gdom for this?

@0xgeert
Contributor

0xgeert commented Sep 5, 2016

@matthewmueller is there any reason why this GraphQL-ish way of doing things would be better suited to expressing crawlers than what x-ray does at the moment? Is anything currently lacking in expressiveness?

@matthewmueller
Owner Author

matthewmueller commented Sep 10, 2016

@gebrits expressivity-wise, I don't think it would add much, but it would greatly simplify the library's internals, put us on a more standard course, and provide better error handling.

@matthewmueller
Owner Author

@Crazometer yep, from within one page you can visit another page.
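Following links from nested selectors can be expressed as a recursive resolver: when a field says to visit a URL found on the current page, the resolver loads that page and resolves the sub-query there. This is a minimal sketch under stated assumptions. Pages are pre-parsed into plain objects as a stand-in for fetching and parsing HTML, and the `{ visit, query }` field shape, `load` and `resolve` are hypothetical, not gdom's or x-ray's API.

```javascript
// Hypothetical site: pages are pre-parsed into plain objects so the example is
// self-contained; a real implementation would fetch and parse HTML via a driver.
const site = {
  '/': { title: 'Front page', firstStory: '/item/1' },
  '/item/1': { title: 'Story 1', author: 'alice' },
};

async function load(url) {
  if (!(url in site)) throw new Error(`404 ${url}`);
  return site[url];
}

// A query maps output keys to either a field name (string) or
// { visit: fieldHoldingUrl, query: subQuery } to crawl to a linked page.
async function resolve(url, query) {
  const page = await load(url);
  const out = {};
  for (const [key, spec] of Object.entries(query)) {
    if (typeof spec === 'string') {
      out[key] = page[spec];
    } else {
      // Nested field: load the linked page and resolve the sub-query there.
      out[key] = await resolve(page[spec.visit], spec.query);
    }
  }
  return out;
}

resolve('/', {
  title: 'title',
  story: { visit: 'firstStory', query: { title: 'title', by: 'author' } },
}).then((result) => console.log(result));
```

Here the promise resolves to `{ title: 'Front page', story: { title: 'Story 1', by: 'alice' } }`. Each nested `visit` costs one extra page load, which is where a real crawler would plug in its driver and any rate limiting.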

@rchipka
Copy link

rchipka commented Nov 25, 2016

Looks similar to the way Osmosis is doing things.

IMHO it seems more intuitive to compose the data structure in code rather than in a string. Probably makes extending and debugging easier too.

Here's the Osmosis equivalent of the example:

```javascript
const osmosis = require('osmosis');

osmosis('http://news.ycombinator.com')
  .set({
    items: osmosis.find('tr.athing').set({
      rank:    'td span.rank',
      title:   'td.title a',
      sitebit: 'span.comhead a',
      url:     'td.title a@href',
      attrs: {
        score:    'span.score',
        user:     'a:eq(0)',
        comments: 'a:eq(2)'
      }
    })
  })
  .data(console.log);
```

@matthewmueller
Owner Author

@rchipka thanks for chiming in! It definitely puts more emphasis on using GraphiQL or some other GraphQL-aware IDE, and using one of those would improve the dev experience for crawling most websites.

Actually, I'm now wondering whether we should just provide primitives for this stuff rather than fancy control-flow libraries: more building blocks that you can put together to handle more complicated login flows.
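The "primitives instead of control-flow libraries" idea could look like this: every step is a plain async function from a context object to a new context, and a flow such as a login is just those steps composed in order. Everything here (`pipe`, `get`, `extractToken`, `post`, and the fake `server` object standing in for a real site) is a hypothetical illustration, not an existing x-ray API.

```javascript
// Compose async steps; each step takes a context and returns an updated one.
const pipe = (...steps) => async (ctx) => {
  for (const step of steps) ctx = await step(ctx);
  return ctx;
};

// Fake server standing in for a real site (assumption, to keep this runnable).
const server = {
  '/login': () => ({ body: 'token=abc123' }),
  '/session': (form) =>
    form.token === 'abc123' && form.user === 'me'
      ? { cookie: 'sid=42' }
      : { error: 'bad login' },
};

// Primitive steps (hypothetical names).
const get = (path) => async (ctx) => ({ ...ctx, body: server[path]().body });
const extractToken = async (ctx) => ({ ...ctx, token: ctx.body.split('=')[1] });
const post = (path, fields) => async (ctx) => {
  const res = server[path]({ ...fields, token: ctx.token });
  if (res.error) throw new Error(res.error);
  return { ...ctx, cookie: res.cookie };
};

// A login flow is just the primitives composed in order.
const login = pipe(get('/login'), extractToken, post('/session', { user: 'me' }));
login({}).then((ctx) => console.log(ctx.cookie)); // prints sid=42
```

Since each step is an ordinary function, a failed login surfaces as a rejected promise the caller can catch, and each step can be tested on its own.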

@rchipka

rchipka commented Nov 26, 2016

@matthewmueller No problem. I was just checking out how things are going on x-ray and this issue stood out to me.

I think you should keep it as actual JS code rather than a domain-specific language in a string. You can implement a similar GraphQL look in pure JS by creating x.page(), x.query(), etc.
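A minimal sketch of that suggestion, assuming `x.page()` and `x.query()` are hypothetical builders that only return plain spec objects (they are not part of x-ray). It rebuilds the Hacker News example from the Osmosis snippet and shows one debugging benefit of keeping the spec in code: it is ordinary data that can be walked and inspected. The `KIND` symbol and `selectors` helper are also illustrative.

```javascript
// Symbol tag so builder nodes can't collide with user field names.
const KIND = Symbol('kind');

// Hypothetical builders inspired by the x.page() / x.query() suggestion.
// They only describe what to scrape; nothing is fetched here.
const x = {
  page: (url, fields) => ({ [KIND]: 'page', url, fields }),
  query: (scope, fields) => ({ [KIND]: 'query', scope, fields }),
};

// Plain JS values can be named and reused across queries.
const attrs = { score: 'span.score', user: 'a:eq(0)', comments: 'a:eq(2)' };

const spec = x.page('http://news.ycombinator.com', {
  items: x.query('tr.athing', {
    rank: 'td span.rank',
    title: 'td.title a',
    sitebit: 'span.comhead a',
    url: 'td.title a@href',
    attrs,
  }),
});

// Walk the spec and collect every selector in declaration order,
// e.g. to lint or log them before running a crawl.
function selectors(node) {
  if (typeof node === 'string') return [node];
  const own = node[KIND] === 'query' ? [node.scope] : [];
  const children = node[KIND] ? node.fields : node;
  return own.concat(...Object.values(children).map(selectors));
}

console.log(selectors(spec));
```

`selectors(spec)` returns the eight selectors in declaration order, starting with `'tr.athing'`. A string-based DSL would need its own parser before this kind of inspection is possible.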


5 participants