Guide to Running Web Interface on AWS EC2 #87

Open
mattyjacks opened this issue Oct 21, 2024 · 3 comments

Comments

@mattyjacks

I managed to get this thing working via AWS EC2! YAY! I decided to write a little guide on it.

First, launch an Ubuntu Server 24.04 LTS instance. I use a t2.xlarge ($0.18 per hour) with 25 GB of storage; you can stop it when you're not using it to save money.
[Image: guide to creating google web scraper instance]
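To get a rough feel for what the hourly rate adds up to, here is a minimal sketch. `estimate_cost` is a hypothetical helper name, the default rate is just the $0.18 per hour quoted above, and storage charges (which continue while an instance is stopped) are not included.

```shell
# Estimate the instance cost in dollars for a number of running hours.
# Usage: estimate_cost HOURS [RATE_PER_HOUR]
# The default rate is the $0.18/hour figure quoted above (an assumption;
# check current pricing for your region).
estimate_cost() {
  awk -v h="$1" -v r="${2:-0.18}" 'BEGIN { printf "%.2f\n", h * r }'
}

# estimate_cost 8     # an 8-hour scrape session   -> 1.44
# estimate_cost 720   # left running for 30 days   -> 129.60
#
# Stopping the instance from the AWS CLI (INSTANCE_ID is a placeholder):
#   aws ec2 stop-instances --instance-ids "$INSTANCE_ID"
```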

Then connect to the instance. Using EC2 Instance Connect with the default username is fine.

Here are the commands you have to run:

git clone https://github.com/gosom/google-maps-scraper.git

sudo apt-get update

sudo apt install golang-go

sudo apt-get install libatk1.0-0 libatk-bridge2.0-0 libcups2 libatspi2.0-0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libpango-1.0-0 libcairo2 liboss4-salsa-asound2

sudo apt-get install liboss4-salsa-asound2

sudo apt-get update

sudo apt-get upgrade

sudo apt install nodejs npm

sudo npm install -g playwright

sudo apt-get install libasound2 libasound2-plugins

rm -rf ~/.cache/ms-playwright

playwright install

sudo npx playwright install-deps

uname -m

npx playwright install firefox

npx playwright install webkit

cd google-maps-scraper

go mod download

go build
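If the build or one of the playwright steps fails, a quick check of which tools actually ended up on PATH can narrow things down. This is only a sketch; `missing_cmds` is a hypothetical helper name, not something shipped with the scraper.

```shell
# Print the name of every listed command that is not found on PATH,
# one per line. Returns non-zero if anything is missing.
missing_cmds() {
  status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      status=1
    fi
  done
  return "$status"
}

# Example for the tools installed in this guide:
#   missing_cmds git go node npm npx
```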

(Set the number after -c to one less than the number of cores your EC2 instance has. The instance I chose has 4 cores, so I use 3.)

./google-maps-scraper -web -c 3
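The "one less than the number of cores" rule can also be computed instead of typed by hand. A minimal sketch, assuming `nproc` is available on the instance (it is part of coreutils on Ubuntu); `concurrency_for_cores` is a hypothetical helper name.

```shell
# Given a core count, print the value to pass to -c:
# one less than the number of cores, but never below 1.
concurrency_for_cores() {
  cores="$1"
  if [ "$cores" -gt 1 ]; then
    echo $((cores - 1))
  else
    echo 1
  fi
}

# Example on the instance itself (nproc prints the number of available cores):
#   ./google-maps-scraper -web -c "$(concurrency_for_cores "$(nproc)")"
```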

Edit the inbound security group rules of the EC2 instance to allow port 8080 from anywhere

[Image: aws ec2 edit security group rules]

Visit port 8080 on the public IP address of the EC2 instance, e.g. 54.147.206.100:8080. Be sure to use HTTP instead of HTTPS or it won't connect.
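A quick way to confirm the interface is reachable before opening a browser is to request it with curl over plain HTTP. This is a sketch only; `web_ui_url` is a hypothetical helper, and the IP address is the example one from this post.

```shell
# Build the plain-HTTP URL for the web interface.
# Usage: web_ui_url IP [PORT]   (port defaults to 8080)
web_ui_url() {
  echo "http://$1:${2:-8080}/"
}

# Example check from your own machine; prints the HTTP status code
# and gives up after 10 seconds if the port is not reachable:
#   curl -sS -o /dev/null -w '%{http_code}\n' --max-time 10 "$(web_ui_url 54.147.206.100)"
```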

[Image: aws ec2 running scraper]

Above is what the scraper looks like in action.

THANK YOU @gosom FOR YOUR WONDERFUL TOOL!

@gosom
Owner

gosom commented Oct 21, 2024

@mattyjacks it is nice that it works for you but I have a few points here:

(1) For security reasons, the webapp is NOT DESIGNED (at the moment) to be publicly available. I HIGHLY recommend you IMMEDIATELY allow ONLY your IP to access the tool until an authentication system is in place.

(2) I think it's easier to run it via a docker container.

Thank you very much for trying this on AWS.

@mattyjacks
Author

Thank you for the quick response.

1: I'll be shutting down the tool as soon as this scrape-job is finished (to save money), and when I revive it a new IP address will be assigned from Amazon anyways. I wasn't planning on sharing the IP address that would let others access it.

In response to 2: Yeah, probably. I've never used docker before, tho.

I'm overall very satisfied with the result. One huge advantage of the AWS EC2 approach is it's not tying my IP address to the scraping activity in Google's eyes. Pretty paranoid about getting banned from Google.

@gosom
Owner

gosom commented Oct 21, 2024

Even if you do not share the IP, this is still not safe. People might break into your server.

I recommend configuring the firewall to allow connections only from your IP address.
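On EC2 the firewall in question is the instance's security group, so one way to follow this advice is to replace the "anywhere" rule on port 8080 with a rule for a single address. A sketch under stated assumptions: `to_host_cidr` is a hypothetical helper, `SG_ID` is a placeholder for your security group ID, and your public IP is looked up via checkip.amazonaws.com.

```shell
# Turn a bare IPv4 address into a single-host CIDR block (x.x.x.x/32).
# Rejects anything that is not four dot-separated octets in the range 0-255.
to_host_cidr() {
  ip="$1"
  case "$ip" in
    ""|*[!0-9.]*|.*|*.|*..*) return 1 ;;
  esac
  # Split on dots into positional parameters, then restore IFS.
  old_ifs=$IFS
  IFS=.
  set -- $ip
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ "$octet" -le 255 ] || return 1
  done
  echo "$ip/32"
}

# Example (SG_ID is a placeholder, e.g. the group attached to the instance):
#   MY_IP="$(curl -s https://checkip.amazonaws.com)"
#   aws ec2 revoke-security-group-ingress \
#     --group-id "$SG_ID" --protocol tcp --port 8080 --cidr 0.0.0.0/0
#   aws ec2 authorize-security-group-ingress \
#     --group-id "$SG_ID" --protocol tcp --port 8080 \
#     --cidr "$(to_host_cidr "$MY_IP")"
```

If your home or office IP changes, the rule needs to be updated to match, otherwise the interface will stop loading for you as well.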

Additionally, you might consider using proxies if you want to mask your IP address.

In any case the tool is for educational purposes only.
