Question: use vision AI with element position and click #149

guillegette · 2024-10-30T23:21:26Z

Hi team,

Sorry for creating an issue about this, please let me know if there is a better place to share thoughts and questions.

I am really enjoying this project, but I can already see many instances where the framework is not able to solve the request. Is there a reason why we couldn't use an AI vision model, where we give it a screenshot of the web page and ask "where is the red button", so we get coordinates back that we can then pass to the browser to click on it?

Would love to hear your thoughts about this.
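
The flow described above (screenshot, ask a vision model for a location, click at the returned coordinates) could be sketched roughly as follows. This is only an illustration under stated assumptions, not how Stagehand works: `LocateFn`, `PageLike`, `toCssPixels`, and `clickByVision` are hypothetical names, and the vision model call is left as an injected function. The `PageLike` shape mirrors Playwright's `page.screenshot()` and `page.mouse.click(x, y)`. It also assumes the screenshot is taken at device scale, so image pixels are divided by the device scale factor to get the CSS pixels that the mouse API expects.

```typescript
// Pixel coordinates inside the screenshot image, as a vision model might return them.
interface ImagePoint {
  x: number;
  y: number;
}

// Minimal subset of a browser page (assumption: Playwright's Page fits this shape).
interface PageLike {
  screenshot(): Promise<Uint8Array>;
  mouse: { click(x: number, y: number): Promise<void> };
}

// Hypothetical vision call: returns the element's center, or null if not found.
type LocateFn = (image: Uint8Array, prompt: string) => Promise<ImagePoint | null>;

// Convert screenshot pixels to CSS pixels for the mouse API.
function toCssPixels(point: ImagePoint, deviceScaleFactor: number): ImagePoint {
  if (!(deviceScaleFactor > 0)) {
    throw new Error("deviceScaleFactor must be positive");
  }
  return { x: point.x / deviceScaleFactor, y: point.y / deviceScaleFactor };
}

// Take a screenshot, ask the model where the target is, and click there.
// Returns false when the model could not locate the element.
async function clickByVision(
  page: PageLike,
  locate: LocateFn,
  prompt: string,
  deviceScaleFactor: number = 1,
): Promise<boolean> {
  const image = await page.screenshot();
  const point = await locate(image, prompt);
  if (point === null) {
    return false;
  }
  const css = toCssPixels(point, deviceScaleFactor);
  await page.mouse.click(css.x, css.y);
  return true;
}
```

A caller would pass its own `locate` implementation, for example one that sends the image and a prompt such as "where is the red button" to whichever vision model it uses and parses the coordinates out of the reply.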

pkiv · 2024-10-31T16:31:36Z

Hey @guillegette! Thanks for sharing. I think this would be a great modifier to the useVision argument (useVisionCoordinates, maybe?). We're discussing this in the community Slack here: https://stagehand-dev.slack.com/archives/C07UCP76U8G/p1730392286230649

filip-michalsky added the question label Nov 4, 2024

kamath added this to Stagehand Nov 29, 2024

kamath added this to the Extensibility milestone Nov 29, 2024

kamath mentioned this issue Nov 29, 2024

Allow Developers to change DOM parsing method #65

Open

kamath moved this to Todo in Stagehand Nov 29, 2024

kamath removed the status in Stagehand Nov 29, 2024
