[ingest] Capture system-level audio with filtering #24

Closed
PSU3D0 opened this issue Jul 7, 2024 · 18 comments

@PSU3D0

PSU3D0 commented Jul 7, 2024

Meetings, videos, etc. provide valuable context. Other audio sources, such as Spotify or game audio, are mostly noise and probably not worth ingesting.

Audio APIs vary by system (e.g. ALSA on Linux), so clean capture may require a unified interface.

@louis030195
Collaborator

im using https://github.com/RustAudio/cpal it works with all OSes

just need to pick all inputs instead of just 0 like it is currently

prob ship this today/tomorrow

@louis030195
Collaborator

@PSU3D0 fyi added multi input recording (for example was recording with laptop mic + iphone mic)

still figuring out how to capture output (system audio)

RustAudio/cpal#896

@louis030195
Collaborator

looking at this too

https://github.com/helmerapp/scap/

@louis030195
Collaborator

@PSU3D0
Author

PSU3D0 commented Jul 8, 2024

The dedicated screen capture looks interesting. In particular, the ability to capture distinct "targets" means it might solve the audio filtering problem directly. Additionally, directly fetching screens of interest (Chrome, VS Code) instead of the entire screen solves a lot of other problems.

@louis030195
Collaborator

louis030195 commented Jul 9, 2024

alright found solution will push soon

RustAudio/cpal#894

@louis030195
Collaborator

@PSU3D0 i implemented it

it works on my computer

by any chance can you give it a try? would be great!

you can just run the cli it will pick your default output (speaker or headphone) & input (mic)

screenpipe

@PSU3D0
Author

PSU3D0 commented Jul 9, 2024

Checking this out! Any thoughts on difficulty of application filtering?

@PSU3D0
Author

PSU3D0 commented Jul 9, 2024

Scap looks great as it can do application-level filtering, at least on video. Looking into it further...

@louis030195
Collaborator

> Checking this out! Any thoughts on difficulty of application filtering?

no idea

i think the common use case is to do stuff with meetings, don't think many people listen to music during meeting, but we can find out at some point

@PSU3D0
Author

PSU3D0 commented Jul 9, 2024

Running into compile issues on Ubuntu 22.04 from screencapturekit

    error[E0599]: no variant or associated item named `ScreenCaptureKit` found for enum `HostId` in the current scope
      --> screenpipe-audio/src/core.rs:92:71
       |
    92 |             DeviceSpec::Output(_) => cpal::host_from_id(cpal::HostId::ScreenCaptureKit)?,
       |                                                                       ^^^^^^^^^^^^^^^^ variant or associated item not found in `HostId`

@louis030195
Collaborator

that's really strange

there is this issue in CI too but on my laptop it builds fine

@louis030195
Collaborator

i know ScreenCaptureKit does not work on linux but i clearly gate it by OS:

    let host = if cfg!(target_os = "macos") {
        match device_spec {
            // https://github.com/RustAudio/cpal/pull/894
            DeviceSpec::Output(_) => cpal::host_from_id(cpal::HostId::ScreenCaptureKit)?,
            _ => cpal::default_host(),
        }
    } else {
        cpal::default_host()
    };

@louis030195
Collaborator

ok got it going to push fix
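
A likely cause of the error above (the thread does not say what the pushed fix was): `cfg!(...)` expands to a plain `true`/`false`, so both branches of the `if` are still compiled on every platform, and `HostId::ScreenCaptureKit` has to resolve even on Linux. The `#[cfg(...)]` attribute removes the item from compilation entirely. A minimal sketch of that approach follows. `HostId` and `DeviceSpec` here are stand-in types and `pick_host` is a hypothetical helper, not the real cpal or screenpipe API:

```rust
// Stand-in for cpal::HostId: the ScreenCaptureKit variant only exists on macOS.
#[derive(Debug, PartialEq)]
enum HostId {
    #[cfg(target_os = "macos")]
    ScreenCaptureKit,
    Default,
}

// Stand-in for screenpipe's DeviceSpec (assumed shape: a device name per direction).
#[allow(dead_code)]
enum DeviceSpec {
    Input(String),
    Output(String),
}

// Compiled only on macOS, so the macOS-only variant is never seen elsewhere.
#[cfg(target_os = "macos")]
fn pick_host(spec: &DeviceSpec) -> HostId {
    match spec {
        DeviceSpec::Output(_) => HostId::ScreenCaptureKit,
        DeviceSpec::Input(_) => HostId::Default,
    }
}

// Every other platform gets this definition instead.
#[cfg(not(target_os = "macos"))]
fn pick_host(_spec: &DeviceSpec) -> HostId {
    HostId::Default
}
```

With this shape, the Linux build never sees the macOS-only variant, while call sites stay the same on both platforms.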

@louis030195
Collaborator

louis030195 commented Jul 9, 2024

done (i hope it works?)

@louis030195
Collaborator

@PSU3D0 lmk if u got it running, got it running on linux github codespace (i have mac) after a few fixes

@PSU3D0
Author

PSU3D0 commented Jul 9, 2024

Will check later today!

@louis030195
Collaborator

hey i think this is solved

whisper is using mel filters & other hacks to "focus" on the human voice + there's now an option to pick the list of audio devices u want (including screen capture using apple native stuff, not sure how it will behave on linux yet)
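
For context on the mel filter remark: a mel filterbank spaces its bands evenly on the mel scale, which gives finer resolution at the low frequencies where most speech energy sits. A rough sketch of that spacing, assuming the common HTK-style conversion formula (Whisper's actual preprocessing may use a different mel variant). The function names and the 80-band / 8000 Hz values in the test are illustrative, not taken from screenpipe:

```rust
/// Convert a frequency in Hz to mels (HTK-style formula, an assumption here).
fn hz_to_mel(hz: f64) -> f64 {
    2595.0 * (1.0 + hz / 700.0).log10()
}

/// Inverse of `hz_to_mel`.
fn mel_to_hz(mel: f64) -> f64 {
    700.0 * (10f64.powf(mel / 2595.0) - 1.0)
}

/// Edge frequencies (Hz) for `n_bands` triangular filters:
/// `n_bands + 2` points spaced evenly on the mel scale.
fn mel_band_edges(n_bands: usize, f_min: f64, f_max: f64) -> Vec<f64> {
    let (m_min, m_max) = (hz_to_mel(f_min), hz_to_mel(f_max));
    let step = (m_max - m_min) / (n_bands as f64 + 1.0);
    (0..n_bands + 2)
        .map(|i| mel_to_hz(m_min + step * i as f64))
        .collect()
}
```

Bands near 0 Hz come out much narrower than bands near `f_max`, which is the sense in which the filterbank "focuses" on the voice range.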
