
[BUG] Some image CDNs return different image formats to RSStT and Telegram, rendering RSStT's media file probing meaningless #600

Open
Konano opened this issue Dec 18, 2024 · 3 comments


@Konano

Konano commented Dec 18, 2024

Describe the bug

When the image is in WebP format, Telegram turns the message into a sticker, so the text attached to the message is not visible.

Screenshots

[2 screenshots omitted]

Feed Content

<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>大麦网票务 - 全国 - 演唱会 - 周杰伦</title>
<link>https://search.damai.cn/search.htm</link>
<atom:link href="" rel="self" type="application/rss+xml"/>
<description>大麦网票务 - 全国 - 演唱会 - 周杰伦 - Powered by RSSHub</description>
<generator>RSSHub</generator>
<webMaster>[email protected] (RSSHub)</webMaster>
<language>en</language>
<lastBuildDate>Wed, 18 Dec 2024 03:13:52 GMT</lastBuildDate>
<ttl>5</ttl>
<item>
<title>周杰伦2025“嘉年华”世界巡回演唱会-三亚站</title>
<description><img src="https://img.alicdn.com/bao/uploaded/https://img.alicdn.com/imgextra/i1/2251059038/O1CN01aFvJjq2GdSfD6iXR0_!!2251059038.jpg" referrerpolicy="no-referrer"><p>艺人:<span class="c4">周杰伦</span></p><p>地点:三亚 | 三亚市体育中心白鹭体育场</p><p>时间:2025.03.28-03.30</p><p>票价:600-2000</p></description>
<link>https://detail.damai.cn/item.htm?id=836833582354</link>
<guid isPermaLink="false">https://detail.damai.cn/item.htm?id=836833582354</guid>
<author>艺人:周杰伦</author>
</item>
<item>
<title>周杰伦《嘉年华》世界巡回演唱会-迪拜站</title>
<description><img src="https://img.alicdn.com/bao/uploaded/https://img.alicdn.com/imgextra/i4/2251059038/O1CN01BC2hZR2GdSf73Q97m_!!2251059038.jpg" referrerpolicy="no-referrer"><p>艺人:<span class="c4">周杰伦</span></p><p>地点:迪拜 | Coca-Cola Arena</p><p>时间:2025.01.04-01.05</p><p>票价:1400-4700</p></description>
<link>https://detail.damai.cn/item.htm?id=859409611964</link>
<guid isPermaLink="false">https://detail.damai.cn/item.htm?id=859409611964</guid>
<author>艺人:周杰伦</author>
</item>
</channel>
</rss>

Expected behavior

Ideally the image would be converted to another format; if that is not possible, post as plain text instead.

Important log

@Rongronggg9
Owner

To work around exactly this problem, RSStT has long special-cased WebP and converted it before sending.

The problem lies with this image CDN: it returns different media files depending on the User-Agent, sometimes JPEG and sometimes WebP... baffling behavior!

$ curl "$ALICDN_URL" -H 'User-Agent: Mozilla/5.0 (X11)' -v | file -
[...]
< content-type: image/jpeg
< content-length: 429315
[...]
< picasso-fmt: jpg2
[...]
/dev/stdin: JPEG image data, JFIF standard 1.01, resolution (DPI), density 72x72, segment length 16, baseline, precision 8, 1020x1360, components 3
$ curl "$ALICDN_URL" -H 'User-Agent: RSStT/2.9.0 RSS Reader (+https://git.io/RSStT)' -v | file -
[...]
< content-type: image/webp
< content-length: 358552
[...]
< picasso-fmt: jpg2webp
[...]
/dev/stdin: RIFF (little-endian) data, Web/P image, VP8 encoding, 1020x1360, Scaling: [none]x[none], YUV color, decoders should clamp

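Further testing shows that JPEG is returned whenever the UA has any one of the following traits: a desktop platform, Chrome, or Safari. Otherwise, WebP is returned.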
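The observed selection rule can be modeled roughly as below. This is a minimal sketch inferred only from the black-box curl tests above, not the CDN's actual logic; the token lists (`DESKTOP_TOKENS`, `BROWSER_TOKENS`) and the function name `alicdn_expected_format` are hypothetical, and the real server may look at more than these substrings.

```python
# Hypothetical model of the CDN's observed behavior; the real server-side
# rules are unknown and may involve more than the User-Agent.
DESKTOP_TOKENS = ("Windows NT", "Macintosh", "X11")  # assumed desktop markers
BROWSER_TOKENS = ("Chrome/", "Safari/")              # assumed browser markers


def alicdn_expected_format(user_agent: str) -> str:
    """Return the format this model predicts the CDN will serve for a UA."""
    if any(token in user_agent for token in DESKTOP_TOKENS + BROWSER_TOKENS):
        return "jpeg"
    return "webp"


if __name__ == "__main__":
    for ua in (
        "Mozilla/5.0 (X11)",
        "RSStT/2.9.0 RSS Reader (+https://git.io/RSStT)",
        "TelegramBot (like TwitterBot)",
        "Mozilla/5.0 (Android 12; Mobile; rv:68.0) Gecko/68.0 Firefox/96.0",
    ):
        print(f"{alicdn_expected_format(ua):4}  {ua}")
```

Under this model only the desktop X11 UA gets JPEG; RSStT's default UA, Telegram's fetcher UA and the Android Firefox UA from the compose template all fall on the WebP side.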

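In fact, RSStT was designed from the start to save bandwidth. It does not send media by downloading and re-uploading it; instead it passes the URL to the Telegram API and lets Telegram's servers do the download. RSStT itself fetches only the HTTP headers and a few KiB from the start of the file to determine the file type, size and resolution. Based on that information, it decides whether the media needs format or resolution conversion; if so, it uses the online conversion service provided by wsrv.nl (again without downloading anything, by assembling a URL and passing it along).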
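The probing step described above can be illustrated with a minimal sketch: identify the format from the first bytes of the file (its magic number) and, if the format needs conversion, build a wsrv.nl URL instead of downloading anything. This is not RSStT's actual code; the helper names and the policy of converting WebP to JPEG via wsrv.nl's `output=jpg` parameter are assumptions for illustration.

```python
from typing import Optional
from urllib.parse import urlencode

WSRV_BASE = "https://wsrv.nl/"  # default IMAGES_WESERV_NL per the compose template


def sniff_image_format(head: bytes) -> Optional[str]:
    """Guess the image format from the first bytes of a file."""
    if head.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "webp"
    if head[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    return None


def media_url_for_telegram(url: str, head: bytes) -> str:
    """Return the URL to hand to the Telegram API.

    Assumed policy: WebP is routed through wsrv.nl and converted to JPEG;
    every other format is passed through unchanged.
    """
    if sniff_image_format(head) == "webp":
        return WSRV_BASE + "?" + urlencode({"url": url, "output": "jpg"})
    return url
```

The weak point this issue exposes is that `head` comes from RSStT's own request: if the CDN serves Telegram a different format, the decision is made on the wrong bytes.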

As a result, with this image CDN the file type RSStT detects may differ from the one Telegram's servers actually download, which causes this problem.

Note that both RSStT's default UA (RSStT/2.9.0 RSS Reader (+https://git.io/RSStT)) and the UA Telegram's servers use to download internet resources (TelegramBot (like TwitterBot)) receive WebP. So the problem should only occur when a custom UA is configured.

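In the short term, you can work around the problem by changing your custom UA so that it avoids the traits above (e.g., an Android Firefox UA). In the long term, I will consider special-casing this domain and adding an option to specify the UA used when probing media.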
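The two long-term ideas could be combined roughly as follows. This is a minimal sketch, assuming a hypothetical per-domain list (`UA_SENSITIVE_DOMAINS`) and a hypothetical probe-UA override option; neither exists in RSStT as far as this thread shows, and the 4 KiB probe range is also an assumption. Mimicking Telegram's fetcher UA only helps if the UA is the CDN's sole deciding factor, which the follow-up below calls into question.

```python
from typing import Optional
from urllib.parse import urlsplit
from urllib.request import Request

# Hypothetical: domains whose served format depends on the requester.
UA_SENSITIVE_DOMAINS = ("alicdn.com",)
TELEGRAM_FETCH_UA = "TelegramBot (like TwitterBot)"  # UA named in this thread
PROBE_RANGE = "bytes=0-4095"  # fetch only the file head (assumed size)


def probe_user_agent(url: str, configured_ua: str,
                     probe_ua_override: Optional[str] = None) -> str:
    """Pick the UA to use when probing a media URL."""
    if probe_ua_override:  # hypothetical new option takes precedence
        return probe_ua_override
    host = urlsplit(url).hostname or ""
    if any(host == d or host.endswith("." + d) for d in UA_SENSITIVE_DOMAINS):
        # Mimic Telegram's fetcher so the probe sees the same bytes it will.
        return TELEGRAM_FETCH_UA
    return configured_ua


def build_probe_request(url: str, configured_ua: str,
                        probe_ua_override: Optional[str] = None) -> Request:
    """Build (but do not send) a ranged GET request for probing."""
    ua = probe_user_agent(url, configured_ua, probe_ua_override)
    return Request(url, headers={"User-Agent": ua, "Range": PROBE_RANGE})
```

Note that `urllib.request.Request` stores header names via `str.capitalize()`, so the UA is read back with `req.get_header("User-agent")`.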

“If that is not possible, post as plain text instead.”

The /set or /set_default command can turn off media sending, so that messages are sent as plain text.

@Konano
Author

Konano commented Dec 19, 2024

I checked, and no custom UA was configured in the deployment.

services:
  main:
    # image: rongronggg9/rss-to-telegram:dev  # stable image: rongronggg9/rss-to-telegram
    image: rongronggg9/rss-to-telegram  # stable image: rongronggg9/rss-to-telegram
    container_name: rsstt  # need to be unique
    restart: unless-stopped
    volumes:
      - ./config:/app/config
    environment:
      - TOKEN=**REDACTION** # get it from @BotFather
      - MANAGER=**REDACTION** # get it from @userinfobot, can be a list (e.g., 1234567890;987654321)

# ↓------ To disable sending via Telegraph, comment out this area ------↓ #
# Get Telegraph API access tokens: https://api.telegra.ph/createAccount?short_name=RSStT&author_name=Generated%20by%20RSStT&author_url=https%3A%2F%2Fgithub.com%2FRongronggg9%2FRSS-to-Telegram-Bot
# Refresh the page every time you get a new token.
# If you have a lot of subscriptions, make sure to get at least 5 tokens.
#                            ↓ Replace with your access tokens ↓
      - TELEGRAPH_TOKEN=
          **REDACTION**
# ↑------ To disable sending via Telegraph, comment out this area ------↑ #

# Please read https://github.com/Rongronggg9/RSS-to-Telegram-Bot/blob/master/docs/advanced-settings.md for more details.
# ↓------ Advanced settings ------↓ #
      #- ERROR_LOGGING_CHAT=-1001234567890  # default: the first MANAGER
      - MULTIUSER=0  # default: 1
      #- CRON_SECOND=30  # 0-59, default: 0
      #- DATABASE_URL=postgres://username:password@host:port/db_name  # default: sqlite://path/to/config/db.sqlite3
      #- API_ID=1025907  # get it from https://core.telegram.org/api/obtaining_api_id
      #- API_HASH=452b0359b988148995f22ff0f4229750  # get it from https://core.telegram.org/api/obtaining_api_id
      #- IMG_RELAY_SERVER=https://wsrv.nl/?url=  # default: https://rsstt-img-relay.rongrong.workers.dev/
      #- IMAGES_WESERV_NL=https://t0.nl/  # default: https://wsrv.nl/
      #- USER_AGENT=Mozilla/5.0 (Android 12; Mobile; rv:68.0) Gecko/68.0 Firefox/96.0  # default: RSStT/2.x RSS Reader
      #- IPV6_PRIOR=1  # default: 0
      #- VERIFY_TLS=0  # default: 1
      #- T_PROXY=socks5://172.17.0.1:1080  # Proxy used to connect to the Telegram API
      #- R_PROXY=socks5://172.17.0.1:1080  # Proxy used to fetch feeds
      #- PROXY_BYPASS_PRIVATE=1  # default: 0
      #- PROXY_BYPASS_DOMAINS=example.com;example.net
      #- HTTP_TIMEOUT=30  # default: 12
      #- HTTP_CONCURRENCY=0  # default: 1024
      #- HTTP_CONCURRENCY_PER_HOST=0  # default: 16
      #- TABLE_TO_IMAGE=1  # default: 0
      #- TRAFFIC_SAVING=1  # default: 0
      #- LAZY_MEDIA_VALIDATION=1  # default: 0
      #- MANAGER_PRIVILEGED=1  # default: 0
      #- NO_UVLOOP=1  # default: 0
      #- MULTIPROCESSING=1  # default: 0
      #- EXECUTOR_NICENESS_INCREMENT=5  # default: 2
      #- DEBUG=1  # debug logging, default: 0
# ↑------ Advanced settings ------↑ #

Rongronggg9 added a commit that referenced this issue Dec 23, 2024
See #600 for more details.

Signed-off-by: Rongrong <[email protected]>
@Rongronggg9
Owner

Rongronggg9 commented Dec 23, 2024

“I checked, and no custom UA was configured in the deployment.”

Then this CDN probably implements (!) some other baffling behavior as well 🤯


“In the long term, I will consider special-casing this domain”

Implemented in 4088d5d. Note that you are using the latest image; you need to switch to the dev image to test this commit.

“adding an option to specify the UA used when probing media”

TODO! 🕊 So this issue will stay open for now.

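Rongronggg9 changed the title from "[BUG] When an image is in WebP format, Telegram turns it into a sticker and the attached text is not visible" to "[BUG] Some image CDNs return different image formats to RSStT and Telegram, rendering RSStT's media file probing meaningless" on Dec 23, 2024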