back to legacy lot ids & add more original scrapers #4
…ts constructors + patch Aarhus scraper
+ move all original scrapers to `original/` path + move all new scrapers to `new/`
…riginal geojson file and scraped data)
Dear @jklmnn,

I also update addresses if the website supplies more complete ones, like an added zip code, and I add public or source URLs for each lot where available. I'm also a bit more strict about …

For Dresden I scraped the geo coordinates from the website when available and used the ParkAPI geojson if no coords are listed. The website coordinates have more digits, so I thought this might be a good thing. But I guess it's possible that you and other contributors have picked more useful coordinates by hand, so this needs to be reviewed (not only for Dresden).

Anyway, I do my best (to the best of my knowledge) to integrate the original scrapers and upgrade the meta-info where possible. I also wrote the Frankfurt open data people about their outage (it stopped working on 2021/12/17).

Boy, I'm really looking forward to getting this project into production! Best regards and a happy new fear
The original lot_ids consisted of digits only. Now they are "hamburg-1234". For consistency we might go back to digits only for this particular lot. We'll see.
+ add Scraper.ALLOW_SSL_FAILURE variable to allow, e.g., expired certificates (parken.heidelberg.de cert expired 2021/12/31)
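For illustration, a minimal sketch of how such a flag could be honoured in a scraper base class, assuming `requests` is used for HTTP; the attribute name `ALLOW_SSL_FAILURE` is from the commit above, everything else (class and method names) is made up for this sketch:

```python
import requests


class Scraper:
    # When True, SSL certificate problems (e.g. an expired cert) do not abort the request.
    ALLOW_SSL_FAILURE = False

    def request(self, url: str, **kwargs) -> requests.Response:
        if self.ALLOW_SSL_FAILURE:
            # Disable certificate verification for this scraper only.
            kwargs.setdefault("verify", False)
        return requests.get(url, timeout=10, **kwargs)


class HeidelbergScraper(Scraper):
    # parken.heidelberg.de's certificate expired on 2021/12/31
    ALLOW_SSL_FAILURE = True
```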
… to scrape from the site so the original geojson was just ported and finito
The sub-pages for each parking lot actually do contain the lot timestamp and num_free/capacity values even when the lot is closed. It takes a couple of extra requests, but I think it's worth it. Attention: the original lot IDs contained the characters "(", ")" and "&". I changed the name_to_legacy_id() function to remove these characters, as they are potentially bad for filenames.
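For illustration, a hypothetical helper showing the kind of character stripping meant here (the actual patch lives inside `name_to_legacy_id()` and may look different):

```python
def strip_filename_unsafe_chars(name: str) -> str:
    """Drop characters that are potentially bad for filenames before building the legacy ID."""
    for char in ("(", ")", "&"):
        name = name.replace(char, "")
    return name


# e.g. "P&C (Zentrum)" -> "PC Zentrum"
```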
are "koeln-x-y123" (taken from the Köln feature identifier) We might change this back to the original legacy IDs but it was a bit more difficult here as the lot names in the live data and the names in the original ParkAPI geojson are quite different at times.
… for live capacity and full addresses. Attention: The lot "Byk Gulden Str." is currently out of order, does not have a linked page and is not in the geojson file!
Great work!
This is generally a good idea. However, I can't say for sure if we can keep this if it goes into production. It might cause problems with legacy clients.
Yes, replacing Nones with zeros in the v1 API should be no problem. In the dumps, snapshots with None can probably just be skipped. There are incompatibilities with some lot_ids, though. And other tricky stuff ;) I'll implement the remaining scrapers and then do a scripted comparison with api.parkendd.de. Then we'll certainly have some stuff to discuss and compromises to find.
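A tiny sketch of what that None-to-zero mapping could look like when rendering a lot for the v1 API; the v1 field names `free`/`total` and the parameter names are assumptions for illustration only:

```python
from typing import Optional


def v1_lot(lot_id: str, name: str,
           num_free: Optional[int], capacity: Optional[int]) -> dict:
    """Render one lot for the v1 API; unknown (None) values become 0 instead of null."""
    return {
        "id": lot_id,
        "name": name,
        "free": num_free if num_free is not None else 0,
        "total": capacity if capacity is not None else 0,
    }
```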
The Frankfurt case: https://www.offenedaten.frankfurt.de/blog/aktualisierungverkehrsdaten From the email: "... as soon as a corresponding security patch has been applied by the vendor ..." hehe
Okaaayyyhhh, here is the first attempt at a comparison: https://github.com/defgsus/ParkAPI2/wiki/v1-api-comparison I only compared the 'city' metadata, not the lots; it's complex enough already. You can have a look if you like. I'm still preparing a more readable document with specific compatibility issues. One thing is sure: using names for IDs will remain problematic. They actually do change occasionally.
Sorry for the late reply. The problem with the lot IDs is that not all sources have real IDs, so we need to keep some kind of fallback. In the end, if there is no unique persistent ID and the data source decides to change the name, there isn't really anything we can do.
Yes, it's complicated with those IDs. I'm really just picky because of later …

For daily use it's probably no problem if a lot name changes. Apart from the fact …

With the right measures and follow-up maintenance this can be somewhat managed. When porting your scrapers I found permanent IDs on some websites, but with …

I found so many little compatibility challenges during the port that it … In the midst of it I started writing the following overview. There are things I …

General changes to scrapers (no specific order, numbers are just for communication)
Individual scraper changes
That's it for now. Please let me know what you think and let us progress, slowly...
I just checked the available cities after our current outage, and the only city I can see missing is Hanau. So after we add this I'd say we can close this issue.
All lots need to have the same ID as it was generated in the original ParkAPI by the geojson wrapper (as discussed in issue #1).

In essence that means: use `utils/strings/name_to_legacy_id` to convert the name strings, instead of `utils/strings/name_to_id`, which allows `-` separators (see the sketch below).

branch: `feature/legacy-ids`
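For illustration, a rough sketch of how the two helpers might differ; this is an assumption based on the description above, not the actual code in `utils/strings`:

```python
import re


def name_to_id(name: str) -> str:
    """New-style ID that allows "-" separators (umlaut handling etc. omitted in this sketch)."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    return slug.strip("-")


def name_to_legacy_id(name: str) -> str:
    """Legacy-style ID as the original ParkAPI geojson wrapper produced it: no separators at all."""
    return name_to_id(name).replace("-", "")


print(name_to_id("Dresden Parkhaus Altmarkt"))         # dresden-parkhaus-altmarkt
print(name_to_legacy_id("Dresden Parkhaus Altmarkt"))  # dresdenparkhausaltmarkt
```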