Hi team,
Sorry for creating an issue about this; please let me know if there is a better place to share thoughts and questions.
I am really enjoying this project, but I can already see many cases where the framework cannot complete the request. Is there a reason we couldn't use an AI vision model for these? We would give it a screenshot of the web page and ask "where is the red button", get coordinates back, and then pass those coordinates to the browser to click on that spot.
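To make the idea concrete, here is a minimal Python sketch of the loop I have in mind. It is not based on this project's code: `Box`, `click_point`, and `click_by_description` are hypothetical names, and `locate` stands in for whatever vision model would be called. I am also assuming the model returns a bounding box in screenshot pixels, which may need scaling to CSS pixels on high-DPI displays before clicking.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Bounding box in screenshot pixels (assumed vision-model output format)."""
    left: float
    top: float
    width: float
    height: float


# Hypothetical callable types; none of these come from a real library.
Screenshot = Callable[[], bytes]                 # returns the page as image bytes
Locator = Callable[[bytes, str], Optional[Box]]  # (image, prompt) -> Box or None
Clicker = Callable[[float, float], None]         # clicks at (x, y) in CSS pixels


def click_point(box: Box, device_pixel_ratio: float = 1.0) -> Tuple[float, float]:
    """Return the centre of the box, converted from screenshot to CSS pixels."""
    if device_pixel_ratio <= 0:
        raise ValueError("device_pixel_ratio must be positive")
    x = (box.left + box.width / 2) / device_pixel_ratio
    y = (box.top + box.height / 2) / device_pixel_ratio
    return x, y


def click_by_description(
    take_screenshot: Screenshot,
    locate: Locator,
    click: Clicker,
    prompt: str,
    device_pixel_ratio: float = 1.0,
) -> bool:
    """Screenshot the page, ask the model where the element is, click its centre.

    Returns False without clicking when the model reports no match.
    """
    box = locate(take_screenshot(), prompt)
    if box is None:
        return False
    x, y = click_point(box, device_pixel_ratio)
    click(x, y)
    return True
```

If the browser were driven through Playwright's Python API, for example, `take_screenshot` could be `page.screenshot` and `click` could be `page.mouse.click`; the vision-model call itself is left open here.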
Would love to hear your thoughts about this.