
Question: use vision AI with element position and click #149

Open
guillegette opened this issue Oct 30, 2024 · 1 comment
Labels
question Further information is requested
Milestone

Comments

@guillegette

Hi team,

Sorry for creating an issue about this, please let me know if there is a better place to share thoughts and questions.

I am really enjoying this project, but I can already see many cases where the framework is not able to solve the request. Is there a reason why we couldn't use an AI vision model: give it a screenshot of the web page, ask "where is the red button?", and get coordinates back that we can then pass to the browser to click on it?
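
A minimal sketch of the flow described above, under stated assumptions rather than anything the framework provides today. `PageLike` is a hypothetical structural subset of a Playwright-style page (only `screenshot()` and `mouse.click(x, y)`), and `LocateFn` is a hypothetical wrapper around whichever vision model is used; it is assumed to return a bounding box in screenshot pixels, or `null` when nothing matches. The sketch also assumes the screenshot is captured at the device scale factor, so pixel coordinates are divided by that factor to get the CSS coordinates a mouse click expects.

```typescript
// Bounding box in screenshot pixels, as a vision model might return it (assumed shape).
interface PixelBox { x: number; y: number; width: number; height: number }

interface Point { x: number; y: number }

// Hypothetical minimal view of a Playwright-style page: only the two calls used here.
interface PageLike {
  screenshot(): Promise<Uint8Array>;
  mouse: { click(x: number, y: number): Promise<void> };
}

// Hypothetical: sends the screenshot and a prompt to a vision model,
// returns the matched element's box or null if the model finds nothing.
type LocateFn = (image: Uint8Array, prompt: string) => Promise<PixelBox | null>;

// Convert the center of a screenshot-pixel box into CSS viewport coordinates.
function boxCenterToCss(box: PixelBox, deviceScaleFactor: number): Point {
  if (deviceScaleFactor <= 0) {
    throw new Error("deviceScaleFactor must be positive");
  }
  return {
    x: (box.x + box.width / 2) / deviceScaleFactor,
    y: (box.y + box.height / 2) / deviceScaleFactor,
  };
}

// Screenshot the page, ask the model where the target is, then click its center.
// Returns the clicked point, or null when the model could not locate the target.
async function clickByVision(
  page: PageLike,
  locate: LocateFn,
  prompt: string,
  deviceScaleFactor = 1,
): Promise<Point | null> {
  const image = await page.screenshot();
  const box = await locate(image, prompt);
  if (box === null) return null;
  const point = boxCenterToCss(box, deviceScaleFactor);
  await page.mouse.click(point.x, point.y);
  return point;
}
```

With something like this, a call such as `clickByVision(page, locate, "where is the red button")` would click the center of whatever box the model returns. Coordinates are only valid for the viewport as it was when the screenshot was taken, so any scroll or layout change between the screenshot and the click would invalidate them.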

Would love to hear your thoughts about this.

@pkiv
Contributor

pkiv commented Oct 31, 2024

Hey @guillegette! Thanks for sharing. I think this would be a great modifier to the useVision argument (maybe useVisionCoordinates?). We're discussing this in the community Slack here: https://stagehand-dev.slack.com/archives/C07UCP76U8G/p1730392286230649

@filip-michalsky filip-michalsky added the question Further information is requested label Nov 4, 2024
@kamath kamath added this to Stagehand Nov 29, 2024
@kamath kamath added this to the Extensibility milestone Nov 29, 2024
@kamath kamath moved this to Todo in Stagehand Nov 29, 2024
@kamath kamath removed the status in Stagehand Nov 29, 2024

4 participants