3.0.0: Move to using GraphQL #208

matthewmueller · 2016-08-31T06:40:14Z

I definitely think gdom has shown us the way forward:

Here's a basic node implementation:

https://github.com/IamNotUrKitty/gdom-node

There are some improvements we can make by allowing you to swap out drivers easily (http, phantom, electron), and also around pagination.
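To make the driver idea concrete, here is a minimal sketch, assuming a driver is just an async function that takes a URL and resolves to an HTML string. `createCrawler` and `stubDriver` are hypothetical names for illustration only; neither exists in x-ray or gdom-node.

```javascript
// Hypothetical sketch: a "driver" is any async function (url) => HTML string.
// createCrawler and stubDriver are illustrative names, not x-ray APIs.
function createCrawler(driver) {
  return {
    // Fetch a page through whichever driver was injected and pull out <title>.
    async title(url) {
      const html = await driver(url);
      const match = /<title>([^<]*)<\/title>/i.exec(html);
      return match ? match[1].trim() : null;
    },
  };
}

// In-memory stand-in for an http, phantom, or electron driver.
const stubDriver = async (url) => `<html><title> Stub for ${url} </title></html>`;

createCrawler(stubDriver)
  .title('http://example.test')
  .then((title) => console.log(title));
```

Because the crawler only depends on the function signature, swapping in a different backend would mean passing a different function, with no change to the extraction code.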

I'm not sure I'll have time to work on this without community support, but I'd also love for someone to own this transition :-)

jspri · 2016-09-03T07:57:50Z

Looks like a good idea!

One of the best features of x-ray is the ability to have the nested selectors crawl to different pages. Is there a provision in gdom for this?

0xgeert · 2016-09-05T05:00:39Z

@matthewmueller any reason why this GraphQL-ish way of doing things would be better suited for expressing crawlers than what x-ray does at the moment? Is anything currently lacking in expressiveness?

matthewmueller · 2016-09-10T09:20:36Z

@gebrits expressivity-wise I don't think it would add much, but it would greatly simplify the internals of the library, put us on a more standard course, and provide better error handling.

matthewmueller · 2016-09-10T09:21:48Z

@Crazometer yep, you can visit within a page to another page.

rchipka · 2016-11-25T20:22:41Z

Looks similar to the way Osmosis is doing things.

IMHO it seems more intuitive to compose the data structure in code rather than in a string. Probably makes extending and debugging easier too.

Here's the Osmosis equivalent of the example:

osmosis('http://news.ycombinator.com')
  .set({
    items: osmosis.find('tr.athing')
      .set({
        rank:    'td span.rank',
        title:   'td.title a',
        sitebit: 'span.comhead a',
        url:     'td.title a@href',
        attrs: {
          score:    'span.score',
          user:     'a:eq(0)',
          comments: 'a:eq(2)'
        }
      })
  })
  .data(console.log);

matthewmueller · 2016-11-26T06:30:22Z

@rchipka thanks for chiming in! It definitely puts more emphasis on using GraphiQL or some other GraphQL-aware IDE. Using one of those would improve the dev experience for crawling most websites.

Actually now wondering if we should just be providing primitives to do this stuff, rather than some fancy control flow libraries. Basically more building blocks that you can put together to handle more complicated login flows.

rchipka · 2016-11-26T17:20:05Z

@matthewmueller No problem. I was just checking out how things are going on x-ray and this issue stood out to me.

I think you should keep it as actual JS code and not a domain-specific language in a string. You can implement a similar GraphQL look in pure JS by creating x.page(), x.query(), etc.
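A rough sketch of what that could look like, assuming `x.page()` and `x.query()` simply return plain descriptor objects. These two methods are hypothetical (they are only proposed in this thread and are not part of x-ray), and `listSelectors` is an illustrative helper.

```javascript
// Hypothetical API sketch: x.page() and x.query() are not existing x-ray methods.
const x = {
  // A page node: fetch `url`, then resolve `fields` against that document.
  page(url, fields) {
    return { type: 'page', url, fields };
  },
  // A query node: match `selector`, then resolve `fields` against each match.
  query(selector, fields) {
    return { type: 'query', selector, fields };
  },
};

// Same shape as the Hacker News example above, composed as plain JS values.
const plan = x.page('http://news.ycombinator.com', {
  items: x.query('tr.athing', {
    rank: 'td span.rank',
    title: 'td.title a',
    url: 'td.title a@href',
  }),
});

// Because the plan is ordinary data, it can be walked, logged, or validated
// before any request is made.
function listSelectors(node) {
  if (typeof node === 'string') return [node];
  const own = node.type === 'query' ? [node.selector] : [];
  // Typed nodes keep children under `fields`; plain objects are their own children.
  const children = node.type ? node.fields : node;
  return own.concat(Object.values(children).flatMap(listSelectors));
}

console.log(listSelectors(plan));
```

Nothing here fetches or parses a page; the point is only that the nested structure stays inspectable with ordinary JS tooling instead of living inside a query string.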

lathropd added the feature label Apr 2, 2019
