Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scrapy-loader-upkeep migration #12

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

BurnzZ
Copy link
Member

@BurnzZ BurnzZ commented May 23, 2020

Covers these items in #11 :

  • code migration
  • test migration

NOTES:

  • The tests for py3.5 are failing because of f-strings. Are we going to deprecate py3.5 soon?

@codecov
Copy link

codecov bot commented May 23, 2020

Codecov Report

Merging #12 into master will increase coverage by 0.14%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #12      +/-   ##
==========================================
+ Coverage   98.00%   98.14%   +0.14%     
==========================================
  Files           4        4              
  Lines         251      270      +19     
==========================================
+ Hits          246      265      +19     
  Misses          5        5              
Impacted Files Coverage Δ
itemloaders/__init__.py 97.67% <100.00%> (+0.28%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 37a63cd...dd96b7e. Read the comment docs.

@BurnzZ BurnzZ requested review from ejulio, Gallaecio and kmike May 23, 2020 07:44
@BurnzZ BurnzZ self-assigned this May 23, 2020
@BurnzZ BurnzZ added the enhancement New feature or request label May 23, 2020
@BurnzZ BurnzZ force-pushed the scrapy-loader-upkeep branch from fe2f1b8 to 1713944 Compare May 23, 2020 08:09
if parsed_data in (None, []):
parser_label += "/missing"

self.stats.inc_value(parser_label)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes itemloaders depend on scrapy.stats, at least as far as interface is concerned. Do you think we can avoid it? I can see 2 main ways:

  • ask for a different stats object here, maybe a plain dictionary; allow to pass real stats in Scrapy's itemloader
  • move stats handling to Scrapy's wrapper instead of this package

We can also start decoupling scrapy Stats from scrapy, but I'd rather not do this now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kmike , the itemloaders accept Scrapy's stats instance via DI and could work without it, as per this.

Accepting the actual Scrapy stats instance is crucial for the loader's usage to be logged at the end of a job (and be logged in Scrapy Cloud's stats-tab), like this:

2019-06-16 14:32:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{ ...
  'parser/QuotesItemLoader/author/css/1': 10,
  'parser/QuotesItemLoader/quote/css/1/missing': 10,
  'parser/QuotesItemLoader/quote/css/2': 10
  ...
}

I'm all good for making it like a defaultdict(int) to function like the stat's inc_value, as long as we can pass it out to Scrapy's logging system.

However, I might be missing something? 🤔Is there another way to circumvent injecting Scrapy's stats to the itemloaders and still log the parser's usage?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess @kmike 's point here is the implicit dependency.
Even though, scrapy is not a requirement, we require a direct dependency from scrapy..
So, maybe a solution would be to create some sort of adapter (another one).
Though, I somehow foresee a problem.

If we allow ItemLoader to be extended with upkeep, we'll need to make upkeep the default base class for scrapy, as scrapy users we'll keep using scrapy's ItemLoader which is an extension to this one.
So, if they import this extension instead of scrapy's, they'll be losing scrapy's behavior.

@kmike
Copy link
Member

kmike commented May 25, 2020

The tests for py3.5 are failing because of f-strings. Are we going to deprecate py3.5 soon?

Currently Scrapy supports Python 3.5.1+ because it is Python availlable in Ubuntu 16.04 LTS. itemloaders should follow that, because Scrapy requires itemloders. We dropped Python 3.4 support just 7 months ago.

That said, there can be an argument to switch to Python 3.6. The share of 3.5 users, as reported by https://pypistats.org/packages/scrapy, is around 2-3%, which is not that large. This is better to be discussed in Scrapy issue tracker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants