Scraping JS-rendered page content with Selenium + PhantomJS + XPath #40

lovecn · 2016-11-20T09:03:04Z

Source: http://www.zhidaow.com/post/selenium-phantomjs-xpath
sudo pip install selenium
sudo apt-get install phantomjs

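Selenium download: https://pypi.python.org/pypi/selenium#downloads
PhantomJS download: http://phantomjs.org/download.html
PhantomJS can be thought of as a browser without a user interface. It has a rendering engine (QtWebKit) and a JS engine (JavaScriptCore), and it supports DOM rendering, JS execution, network access, page screenshots, and more.

PhantomJS is used here instead of ChromeDriver or Firefox mainly because it runs silently: it works in the background and never opens a browser window.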
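from selenium import webdriver

# Initialize the browser. On Windows, pass the path to phantomjs.exe; on Linux the argument can be left empty.
browser = webdriver.PhantomJS(r'D:\phantomjs.exe')
url = 'http://www.zhidaow.com'  # URL to visit
browser.get(url)  # open the page
titles = browser.find_elements_by_xpath('//h2')  # select elements with XPath

for t in titles:  # iterate over the matches
    print(t.text)  # print the element's text
    print(t.get_attribute('class'))  # print the value of its class attribute

# Close the browser. If an exception occurs, remember to end the PhantomJS processes in Task Manager,
# because several of them may still be running and slowing the machine down.
browser.quit()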
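
The note above about leftover PhantomJS processes suggests wrapping the scraping code so that quit() always runs. The following is only a minimal sketch of that idea: `managed_browser` and `FakeBrowser` are hypothetical names introduced for illustration, and the fake class stands in for a real WebDriver so the snippet runs without Selenium installed.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(factory):
    """Create a browser via factory() and always quit it afterwards."""
    browser = factory()
    try:
        yield browser
    finally:
        # Runs even if the body raised, so no browser process is left behind.
        browser.quit()


class FakeBrowser(object):
    """Hypothetical stand-in for a WebDriver, used only to demo the helper."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


fake = FakeBrowser()
try:
    with managed_browser(lambda: fake) as browser:
        raise RuntimeError("simulated scraping failure")
except RuntimeError:
    pass

print(fake.closed)  # True: quit() ran despite the exception
```

With the setup from the snippet above, the factory would presumably be something like `lambda: webdriver.PhantomJS()`, with the get() and XPath calls placed inside the `with` body.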

from selenium import webdriver

browser = webdriver.PhantomJS(r'D:\phantomjs.exe')
url = 'http://www.aizhan.com/siteall/tuniu.com/'
browser.get(url)
# Select the first row of the table with XPath
rows = browser.find_elements_by_xpath('//*[@id="history1"]/table/tbody/tr[1]')

for t in rows:
    print(t.text)

browser.quit()
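
To see what an expression of the form `//*[@id="history1"]/table/tbody/tr[1]` matches, it can help to try it on a small static document. The sketch below swaps in the standard library's `xml.etree.ElementTree` (which supports only a subset of XPath) for Selenium, and the markup and cell values are made-up placeholders, not data from the real page.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample markup; the real page's structure may differ.
SAMPLE = """
<html><body>
  <div id="history1">
    <table><tbody>
      <tr><td>cell A</td><td>cell B</td></tr>
      <tr><td>cell C</td><td>cell D</td></tr>
    </tbody></table>
  </div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# ElementTree needs a leading "." for a search relative to the root element.
rows = root.findall(".//*[@id='history1']/table/tbody/tr[1]")

# tr[1] keeps only the first <tr> under <tbody>; collect its cell texts.
first_row_cells = [td.text for td in rows[0].findall("td")]
print(first_row_cells)  # ['cell A', 'cell B']
```

In the Selenium version, `t.text` on the matched row returns the text of all its cells together, so splitting a row into cells would need a further lookup of the `td` elements inside it.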

lovecn · 2016-11-20T09:06:07Z

PHP spider (crawler) development documentation: https://doc.phpspider.org/
