scrapy-loader-upkeep migration #12
base: master
Conversation
Codecov Report
```
@@            Coverage Diff             @@
##           master      #12      +/-   ##
==========================================
+ Coverage   98.00%   98.14%   +0.14%
==========================================
  Files           4        4
  Lines         251      270      +19
==========================================
+ Hits          246      265      +19
  Misses          5        5
```
Continue to review full report at Codecov.
Branch updated: fe2f1b8 to 1713944.
```python
if parsed_data in (None, []):
    parser_label += "/missing"

self.stats.inc_value(parser_label)
```
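For context, a minimal standalone sketch of what the excerpt above does; the class, method, and label format are assumptions pieced together from the diff and the stats shown later in the thread, not copied from the actual source:

```python
class UpkeepStatsMixin:
    """Hypothetical mixin: count each parser rule, flagging rules that matched nothing."""

    def __init__(self, stats):
        self.stats = stats  # anything exposing inc_value(key), e.g. Scrapy's StatsCollector

    def record_parser(self, loader_name, field_name, position, parsed_data):
        parser_label = "parser/{}/{}/css/{}".format(loader_name, field_name, position)
        if parsed_data in (None, []):   # the selector returned nothing
            parser_label += "/missing"  # e.g. parser/QuotesItemLoader/quote/css/1/missing
        self.stats.inc_value(parser_label)
```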
This makes itemloaders depend on scrapy.stats, at least as far as the interface is concerned. Do you think we can avoid it? I can see 2 main ways (sketched after this list):
- ask for a different stats object here, maybe a plain dictionary, and allow passing the real stats in Scrapy's itemloader
- move the stats handling to Scrapy's wrapper instead of this package

We could also start decoupling Scrapy's Stats from Scrapy, but I'd rather not do that now.
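A minimal sketch of the first option, with all class names hypothetical: itemloaders would only require a dict-like stats object, and Scrapy's wrapper would adapt the real StatsCollector (whose actual methods include get_value and set_value) behind that interface:

```python
class UpkeepItemLoader:
    """itemloaders side (hypothetical): no scrapy import; `stats` is just a mutable mapping."""

    def __init__(self, stats=None):
        self.stats = stats if stats is not None else {}

    def write_to_stats(self, key):
        self.stats[key] = self.stats.get(key, 0) + 1


class ScrapyStatsMapping:
    """Scrapy side (hypothetical): a thin dict-style facade over scrapy's StatsCollector."""

    def __init__(self, scrapy_stats):
        self._stats = scrapy_stats

    def get(self, key, default=None):
        return self._stats.get_value(key, default)

    def __setitem__(self, key, value):
        self._stats.set_value(key, value)


# In Scrapy's wrapper the loader would then be built roughly like:
# loader = UpkeepItemLoader(stats=ScrapyStatsMapping(crawler.stats))
```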
Hi @kmike, the itemloaders accept Scrapy's stats instance via dependency injection and can work without it, as per this.

Accepting the actual Scrapy stats instance is crucial for the loader's usage to be logged at the end of a job (and to show up in Scrapy Cloud's stats tab), like this:
```
2019-06-16 14:32:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{ ...
  'parser/QuotesItemLoader/author/css/1': 10,
  'parser/QuotesItemLoader/quote/css/1/missing': 10,
  'parser/QuotesItemLoader/quote/css/2': 10
  ...
}
```
I'm all good with making it behave like a defaultdict(int), functioning like the stats' inc_value, as long as we can pass the counts on to Scrapy's logging system.

However, I might be missing something 🤔 Is there another way to avoid injecting Scrapy's stats into the itemloaders and still log the parser's usage?
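A rough sketch of that defaultdict(int) idea, assuming a hypothetical flush step in the Scrapy-side wrapper; inc_value(key, count=...) on Scrapy's StatsCollector is a real call and would make the counts appear in the dumped stats:

```python
from collections import defaultdict

# The loader counts locally, with the same call shape as StatsCollector.inc_value.
local_stats = defaultdict(int)

def inc_value(key, count=1):
    local_stats[key] += count

inc_value("parser/QuotesItemLoader/author/css/1")
inc_value("parser/QuotesItemLoader/quote/css/1/missing")


def flush_to_scrapy(scrapy_stats, counts):
    # Called from scrapy-specific code (e.g. the wrapper loader), so the counts
    # end up in the job stats dumped at the end of the crawl.
    for key, count in counts.items():
        scrapy_stats.inc_value(key, count=count)
```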
I guess @kmike's point here is the implicit dependency. Even though scrapy is not listed as a requirement, we would still rely on a direct dependency on scrapy's stats interface. So maybe a solution would be to create some sort of adapter (another one).

Though I foresee a problem. If we allow ItemLoader to be extended with upkeep, we'll need to make upkeep the default base class for Scrapy, since Scrapy users will keep using Scrapy's ItemLoader, which is an extension of this one. So, if they import this extension instead of Scrapy's, they'll lose Scrapy's behavior (see the sketch below).
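To illustrate that concern, a schematic sketch; the class names are illustrative only (the real Scrapy class is scrapy.loader.ItemLoader):

```python
class ItemLoader:
    """itemloaders' base class: this is where the upkeep stats would live."""

    def add_value(self, field_name, value):
        ...  # stats bookkeeping would happen here


class ScrapyItemLoader(ItemLoader):
    """Scrapy's subclass, which layers response/selector integration on top."""

    def __init__(self, response=None, **kwargs):
        self.response = response
        super().__init__(**kwargs)


# If users import the base ItemLoader directly to get the upkeep behaviour,
# they bypass ScrapyItemLoader and lose the Scrapy-specific integration.
```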
Currently Scrapy supports Python 3.5.1+ because that is the Python version available in Ubuntu 16.04 LTS. itemloaders should follow that, because Scrapy requires itemloaders. We dropped Python 3.4 support just 7 months ago. That said, there is an argument for switching to Python 3.6: the share of 3.5 users, as reported by https://pypistats.org/packages/scrapy, is around 2-3%, which is not that large. This is better discussed in the Scrapy issue tracker.
Covers these items in #11.

NOTES:
- f-strings are used. Are we going to deprecate py3.5 soon?
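For reference, since f-strings require Python 3.6+, the 3.5-compatible alternative is str.format(); the label below is illustrative, not copied from the actual source:

```python
field_name, position = "quote", 2

# Python 3.6+ only:
label_36 = f"parser/QuotesItemLoader/{field_name}/css/{position}"

# Works on Python 3.5 as well:
label_35 = "parser/QuotesItemLoader/{}/css/{}".format(field_name, position)

assert label_36 == label_35
```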