[facebook] add support #5626
Conversation
It looks like Facebook blocks your account for about one hour when running the extractor on too many images.
I think photos and videos have a signature in the URL; maybe Facebook can track you and ban you using this info.
I don't think I understand how you would be able to ban someone using a signature in the photo URL. I think the most reasonable option would just be to use the request cookies (which include your account IDs and such) to account-ban you. As I mentioned, it's not really a complete ban: it's only limited to some parts of the UI, and logging out (thus not sending the account request cookies) does remove the block, with the tradeoff that you can't view private or R-18 images. I still don't know if being logged out in advance prevents the ban altogether. If that's the case, I will add a warning about it.
also added author followups for singular images
After doing some more testing I can tell that not using cookies still gets you blocked from viewing images, in the sense that you are forced to log in, and it happens much faster than when using them. Also, I'd love to know if the extractor works for anyone other than me, so please feel free to let me know.
Hi, I've tested your version and it seems to be working fine for pictures. I'm planning to save quite a few images from a public Facebook page and was wondering if using one of the --sleep options could prevent you from being blocked (or if Facebook just reacts to an arbitrary number of requests, no matter the frequency). And overall: does the block mean I'm generally unable to connect to Facebook services (like using gallery-dl with it), or does it just prevent browser/account interaction? Facebook video support would be nice too. Luckily, in my case there weren't that many, so I was able to download them one by one with yt-dlp... but it doesn't support album/account video downloading (yet) the way it does for YouTube, for example.
Hi, thank you for your feedback. I'm not sure if waiting between requests would work, and even if it did, I have no idea how long the wait should be or after how many images it should start. That would require a lot of testing, and unfortunately every time I get blocked I have to wait about 6 hours to try again. To be more specific, the "block" I'm talking about only prevents you from accessing images by their URL (the way the extractor does it); you can still access them through the React user interface. As far as I can tell, the block is limited to this, and you can do everything else on Facebook. About the video support: I will keep that in mind. I'm not sure how yt-dlp downloads them, so I will check that out when I have time.
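For anyone who wants to experiment with the throttling idea, gallery-dl already exposes sleep options; whether they actually help against this particular block is untested. A hedged example (the page URL is a placeholder):

```sh
# --sleep waits between file downloads, --sleep-request between HTTP requests.
# Untested against Facebook's block; the page name below is a placeholder.
gallery-dl --sleep 2 --sleep-request 5 "https://www.facebook.com/examplepage/photos"
```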
Thanks for the info :)
also fixed some bugs
No, I'm sorry, once you get blocked the photo page doesn't load at all (assuming you're loading it by its URL), so there is no way to get the metadata and such. This is why I just added a way to continue the extraction from an image in the set/album instead of having to start from the beginning: take the photo URL and add "&setextract" to it to download the whole set from that point instead of the photo alone. The user is prompted with this URL if they get blocked while extracting.
Good idea for implementing that 👍🏻 Does it only work with the prompted URL? I just tried it up front by taking an image link and adding "&setextract" to it, but it gave me an 'unknown command' error after downloading just the singular image. Also, it seems like your video extraction only pulls the version without audio (the best one in yt-dlp's list of formats, but there it gets merged with an audio-only version by default)... so it would be best to either merge with ffmpeg by default or select the "hd" format by default, which has video+audio.
The "&setextract" feature didn't work to you because you probably passed it to gallery-dl without using the double quotes (") and the command prompt recognized the "&" as the split character between two commands (you can use the ampersand to execute two commands in one line). That would also explain why it downloaded the image, and then gave you an "unknown command" message, as you probably don't have a command assigned to the "setextract" keyword By the way, after further inspection, I don't think there's a way to make an "all profile videos" extractor, as they don't share a set id i can use to navigate though them all. Good catch for the audio thing though, i wasn't wearing headphones :) |
Okay, a few things I just realized by doing some trial and error 😅: I need to put the full link into quotation marks, since otherwise, as you said, the text after the ampersand gets detected as a secondary command, giving me a 'syntax error' and not processing anything further after downloading the image the link points to. Written that way, it avoids any errors from the link format and command logic, and now I got it to work successfully 👌🏻 Alternatively, you can also take the set id from a previous run (it's in the name of the folder your set images were saved to, e.g. the 'Timeline photos' set (all account images)) and manually append it to the 'short' image link with "&set=" in front of it:
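That alternative form might look like this (a reconstructed example with placeholder ids, standing in for the original screenshot):

```sh
# "Short" photo link plus a manually appended set id taken from an earlier run
# (the folder name gallery-dl created). Both ids below are placeholders.
gallery-dl "https://www.facebook.com/photo/?fbid=0000000000&set=a.1111111111"
```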
I'm sorry if things got confusing 😥 At least you managed to make it work now.
No problem, that's just what happens when the initial situation is slightly different :)
There, I just had to change the matching URL pattern a little. Now it works even without including the set id. Hopefully it's the same for you. I recommend avoiding this anyway, as Facebook acts a little weird when you navigate images without their set id: sometimes their sequence gets changed, or some images get skipped altogether. Or maybe it just works fine and I unintentionally bugfixed it a while ago, I don't know.
Works 🤙🏻
the extractor should be ready :)
Sorry if this has been mentioned, but this seems to only be able to process the user, not the single post (…)
@fireattack I just fixed it. Let me know if it works for you now.
I now get this error: …
@fireattack The problem is that the extractor can't really get all the images in the posts. You would have a better chance copying the set id of one of those images, which is in their URLs (in your case it's …).
Right now the extractor should get the first set id it can find in the post and try to extract that set. Sometimes the set contains way more images than the post, but you could still quit it when it's done, I guess. I could technically make it quit by itself when it has extracted the same number of images as the post contains (36+4), but I'm not sure if I want to.
I'd like to finally review and merge this PR to include it in the v1.28.0 release. Sorry for taking forever to get to this. One of the general criticisms I have is your frequent use of '.*' and '.*?'.
The URLs in the test script should be the complete list of all of them. The only exception is the …
The formatter used for archive IDs during tests ignores … (see b1985d6#diff-e74101faac0b9c340fefef8dd3418e77d07cf8ce4df6a1c5110c91853804d876R250). Just add …
I have written "most of the requested changes" in the commit message because I can't really replace the …
- more 'Sec-Fetch-…' headers
- simplify 'text.nameext_from_url()' calls
- replace 'sorted(…)[-1]' with 'max(…)'
- fix '_interval_429' usage
- use replacement fields in logging messages
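As an aside, the 'sorted(…)[-1]' to 'max(…)' change is a straightforward refactor; a minimal illustration with made-up data, not the extractor's actual code:

```python
# Picking the highest-resolution format from a list of (height, url) pairs.
formats = [(360, "sd.mp4"), (720, "hd.mp4"), (1080, "fhd.mp4")]

# Before: sorts the entire list just to take the last element (O(n log n)).
best = sorted(formats, key=lambda f: f[0])[-1]

# After: a single linear scan returns the same element (O(n)).
best = max(formats, key=lambda f: f[0])

print(best)  # (1080, 'fhd.mp4')
```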
get rid of '.*' and '.*?'
Seems to work just fine: cb286cb
I forgot to tell you about the "&setextract" function: if a photo URL ends with "&setextract", the extractor will instead go through the whole set starting from that photo. This avoids having to start from the beginning after being temporarily banned. I have added the remaining tests and adjusted the affected patterns. Everything should work now.
I'd like to test this, but the nightly builds are no good for Intel Macs :( Is there some reason they're not being built as universal? You can build one arm64 and one x86_64 and then …
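The merge step presumably being suggested here is Apple's lipo tool; a sketch under that assumption (the binary names are placeholders):

```sh
# Glue a separately built x86_64 and an arm64 binary into one universal binary.
lipo -create gallery-dl-x86_64 gallery-dl-arm64 -output gallery-dl
lipo -archs gallery-dl   # verify: should print "x86_64 arm64"
```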
Edit: So you can get the nightlies through pip, but, for some reason, if you try to do anything like … it doesn't work. Instead, you've got to use … So …
Was able to download a friend's photos successfully, although they only have 6 photos ;) EDIT: Got another friend, 329 photos. I don't seem to be banned or anything. Is there like a 1-second delay in the code already, or is FB just slow?
@MikeRich88 I'm glad to hear that it works for you. To answer your question: the extractor has to make a request to a webpage before every image (plus one at the beginning) to get the necessary data, and these requests are really expensive. Just one request is ~1.5 MB, so unless Facebook optimizes its pages, the delay will always be there.
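A minimal sketch of the access pattern being described, not the actual extractor code; the regexes and JSON key names are illustrative guesses about Facebook's embedded page data:

```python
import re
import requests

def iter_set_photos(first_photo_url):
    """Yield full-size image URLs, one heavy (~1.5 MB) page request per photo."""
    url = first_photo_url
    while url:
        html = requests.get(url).text  # the expensive per-photo page request
        # Hypothetical parsing: pull the image URI and the next-photo link out
        # of the JSON embedded in the page (these key names are assumptions).
        image = re.search(r'"image":\{"uri":"([^"]+)"', html)
        nxt = re.search(r'"nextMedia":\{"url":"([^"]+)"', html)
        if image:
            yield image.group(1).replace("\\/", "/")
        url = nxt.group(1).replace("\\/", "/") if nxt else None
```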
Fixes #470 and #2612 (probably a duplicate)
For now it supports Account Photos & Photo Albums.
The only way it can work is by making one request per post, so it's not really optimized, unfortunately.
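For reference, the URL forms this likely targets, based on the description above (the page name and set id are placeholders):

```sh
# All photos of a page/account (the "Timeline photos" set):
gallery-dl "https://www.facebook.com/examplepage/photos"
# A single photo album / set:
gallery-dl "https://www.facebook.com/media/set/?set=a.1111111111"
```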