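[Solution provided] Missing feature: hot-word replacement works during real-time transcription, but has no effect when an audio/video file is dragged onto the client for transcription #175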

Open
mnotchc opened this issue Nov 28, 2024 · 1 comment

mnotchc commented Nov 28, 2024

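Issue: Batch transcription lacks hot-word replacement

Problem description

While using CapsWriter-Offline, I found that hot-word replacement works correctly during real-time transcription but has no effect when batch-transcribing audio/video files.

Root cause

Reading the code shows:

  1. In real-time transcription, the main_mic() function in core_client.py calls update_hot_all() to load the hot-word dictionaries.
  2. In batch transcription, the main_file() function has no step that loads the hot-word dictionaries.
  3. client_transcribe.py does call hot_sub(), but because the hot-word dictionaries are empty, nothing gets replaced.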
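
The failure mode can be reproduced in isolation. The sketch below is not CapsWriter-Offline's code: `HOT_WORDS`, `load_hot_words`, and `substitute` are hypothetical stand-ins for the project's dictionaries, `update_hot_all()`, and `hot_sub()`, and it assumes substitution is a plain lookup over a module-level dictionary that starts out empty.

```python
from typing import Dict

# Hypothetical stand-ins, not the project's real implementation.
HOT_WORDS: Dict[str, str] = {}  # starts empty, like an unloaded hot-word dictionary


def load_hot_words(entries: Dict[str, str]) -> None:
    """Stand-in for update_hot_all(): fill the shared dictionary."""
    HOT_WORDS.clear()
    HOT_WORDS.update(entries)


def substitute(text: str) -> str:
    """Stand-in for hot_sub(): replace each known wrong form with its hot word."""
    for wrong, right in HOT_WORDS.items():
        text = text.replace(wrong, right)
    return text


raw = "open caps writer offline"

# Batch path as described above: substitution runs, but nothing was loaded.
before = substitute(raw)

# Real-time path: the dictionary is loaded first, so the same call has an effect.
load_hot_words({"caps writer": "CapsWriter"})
after = substitute(raw)

print(before)
print(after)
```

With an empty dictionary the loop body never runs, so the call succeeds silently and returns the text unchanged, which matches the symptom reported here.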

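Solution

There are two ways to fix this:

  1. Modify the source (recommended):
    In core_client.py, add hot-word loading at the start of the main_file() function:
async def main_file(files: List[Path]):
    show_file_tips()

    # Add this line to load the hot words
    update_hot_all()

    for file in files:
        # ...rest of the code unchanged
  2. Use an external script:
    Users who cannot modify core_client.exe can write a standalone script that applies hot-word replacement manually after transcription finishes. Save the following code in the CapsWriter installation directory: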
import sys
from pathlib import Path
import re
from util.client_hot_update import update_hot_all
from util.client_hot_sub import hot_sub
from util.client_cosmic import console
import srt_from_txt

def apply_hot_sub(file: Path):
    """Apply hot-word replacement to an already-transcribed file."""

    # Check that the file exists
    if not file.exists():
        console.print(f'File not found: {file}')
        return

    # Refresh the hot-word dictionaries
    update_hot_all()

    # Read the original merged text
    merge_file = file.with_suffix('.merge.txt')
    if not merge_file.exists():
        console.print(f'Merged text file not found: {merge_file}')
        return

    with open(merge_file, 'r', encoding='utf-8') as f:
        text = f.read()

    # Apply hot-word replacement
    text_new = hot_sub(text)

    # Build the line-split version
    text_split = re.sub('[,。?]', '\n', text_new)

    # Save the results
    with open(merge_file, 'w', encoding='utf-8') as f:
        f.write(text_new)

    txt_file = file.with_suffix('.txt')
    with open(txt_file, 'w', encoding='utf-8') as f:
        f.write(text_split)

    # Regenerate the srt
    srt_from_txt.one_task(txt_file)

    console.print(f'Hot-word replacement done; updated files:\n{merge_file}\n{txt_file}\n{file.with_suffix(".srt")}')

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print('Usage: python apply_hot_sub.py <file path>')
        sys.exit(1)

    file = Path(sys.argv[1])
    apply_hot_sub(file)
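
The line-splitting step in the script can be checked on its own. This is a minimal sketch using only the standard library; the helper name `split_on_punctuation` and the sample sentence are made up for illustration, while the regular expression is the one from the script above.

```python
import re


def split_on_punctuation(text: str) -> str:
    """Replace each full-width comma, period, or question mark with a newline."""
    return re.sub('[,。?]', '\n', text)


sample = '你好,世界。今天好吗?'
lines = split_on_punctuation(sample).split('\n')
print(lines)
```

Note that the punctuation is dropped rather than kept, and that text ending in one of these marks yields a trailing empty line. The script itself is run as `python apply_hot_sub.py <file path>`, per its usage message.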

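Suggestions

  1. Add the hot-word loading step in the next release.
  2. Document this limitation.
  3. Consider modularizing hot-word loading so that both transcription modes initialize it correctly.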
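
One possible shape for the third suggestion is a shared, idempotent initializer that both entry points call. The sketch below is only an illustration under assumptions: `ensure_hot_words`, `fake_loader`, `main_mic_entry`, and `main_file_entry` are hypothetical names, and the loader passed in stands in for `update_hot_all()`. It also assumes loading once per process is enough, which may not hold if the project reloads hot words when its files change.

```python
from typing import Callable, List

_loaded = False  # module-level flag: has the dictionary been loaded yet?


def ensure_hot_words(loader: Callable[[], None]) -> bool:
    """Run `loader` at most once; return True if this call did the loading."""
    global _loaded
    if _loaded:
        return False
    loader()
    _loaded = True
    return True


calls: List[str] = []


def fake_loader() -> None:
    """Hypothetical stand-in for update_hot_all(); records that it ran."""
    calls.append("loaded")


def main_mic_entry() -> None:
    """Hypothetical real-time entry point."""
    ensure_hot_words(fake_loader)


def main_file_entry() -> None:
    """Hypothetical batch entry point."""
    ensure_hot_words(fake_loader)


# Whichever mode starts first triggers the load; the other is a no-op.
main_file_entry()
main_mic_entry()
print(calls)
```

Routing both modes through one function means a new entry point cannot forget the loading step without also skipping the shared initializer.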

mnotchc commented Nov 28, 2024

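I could not get the client running from source (it kept reporting missing libraries for no apparent reason), and an exe I repackaged myself would not run either, so I had to resort to this workaround. Its merit is that it is simple and workable.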
