Comparison with other parsers (1.0)

Imangazaliev edited this page Mar 25, 2016 · 1 revision

Parsers

Features

DiDom

  • Ability to search inside an element

DomCrawler

  • Convenient form handling

Simple HTML DOM

  • Ability to search inside an element
  • Ability to modify the original document

Memory usage and execution speed

Test conditions:

  1. All parsers are installed via the Composer package manager
  2. Task for each parser: select all elements matching a given selector in the HTML and output their text content

Three files are parsed in total:

  1. 100 elements ~ 1000 lines ~ 60 KB
  2. 1000 elements ~ 10000 lines ~ 600 KB
  3. 5000 elements ~ 50000 lines ~ 3000 KB
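
The task from the test conditions (select elements by a selector, output their text) can be sketched roughly as follows. This is only an illustration built on PHP's bundled DOM extension, not the code of any of the compared libraries. The `extractTexts()` helper, the class-based selector and the sample markup are assumptions made for the example; the actual benchmark files and selector are not shown on this page.

```php
<?php
// Sketch only: uses PHP's built-in DOM extension, not one of the compared libraries.

/**
 * Return the trimmed text content of every element carrying the given class.
 * extractTexts() and the class-based selector are hypothetical.
 */
function extractTexts(string $html, string $class): array
{
    $document = new DOMDocument();

    // Collect libxml warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // XPath equivalent of the CSS selector ".$class".
    $xpath = new DOMXPath($document);
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }

    return $texts;
}

// Hypothetical input standing in for one of the test files.
$html = '<html><body><ul><li class="item">First</li><li class="item">Second</li><li>Other</li></ul></body></html>';

foreach (extractTexts($html, 'item') as $text) {
    echo $text, PHP_EOL;
}
```

Each library in the comparison wraps this kind of lookup in its own API, so the code placed at the parsing step of the test script differs per parser.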

Test script:

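<?php

require 'vendor/autoload.php';

// The first CLI argument is the name of the test file, without the extension
$filepath = __DIR__.'/../files/'.$argv[1].'.html';
$html = file_get_contents($filepath);

// Measurement starts after the file has been read, so only parsing is measured
$startMemory = memory_get_usage();
$startTime = microtime(true);

// parsing code

$time = microtime(true) - $startTime;
$memory = memory_get_usage() - $startMemory;

// Append the results so that repeated runs accumulate in the same files
file_put_contents(__DIR__.'/time.txt', $time . PHP_EOL, FILE_APPEND);
file_put_contents(__DIR__.'/memory.txt', $memory . PHP_EOL, FILE_APPEND);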
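
Because the script appends one value per line, repeated runs accumulate in `time.txt` and `memory.txt`. The page does not say how the published numbers were derived from these files. One plausible approach, shown here purely as an assumption, is to average the lines. The `averageMeasurements()` helper and the demonstration values are hypothetical.

```php
<?php
/**
 * Average the numeric lines of a results file.
 * averageMeasurements() is a hypothetical helper, not part of the original script.
 */
function averageMeasurements(string $path): float
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false || $lines === []) {
        throw new RuntimeException("No measurements in $path");
    }

    return array_sum(array_map('floatval', $lines)) / count($lines);
}

// Demonstration with made-up values written to a temporary file.
$path = tempnam(sys_get_temp_dir(), 'bench');
file_put_contents($path, 0.25 . PHP_EOL, FILE_APPEND);
file_put_contents($path, 0.75 . PHP_EOL, FILE_APPEND);

echo averageMeasurements($path), PHP_EOL;
unlink($path);
```

Averaging several runs smooths out one-off noise such as disk caching, at the cost of hiding the spread between runs.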

Test results

Memory consumption (bytes)

100 elements
  • Nokogiri - 125152
  • DiDom - 177024
  • Zend Dom - 217120
  • DomCrawler - 663632
  • Simple HTML DOM - 3093680
1000 elements
  • Nokogiri - 1033264
  • DiDom - 1671312
  • Zend Dom - 1674328
  • DomCrawler - 1823800
  • Simple HTML DOM - 28418904
5000 elements
  • Nokogiri - 4798024
  • DiDom - 7662488
  • Zend Dom - 7880344
  • DomCrawler - 9372056
  • Simple HTML DOM - out of memory

Time elapsed (seconds)

100 elements
  • DiDom - 0.016069
  • Nokogiri - 0.019341
  • DomCrawler - 0.032912
  • Zend Dom - 0.213485
  • Simple HTML DOM - 0.415827
1000 elements
  • DiDom - 0.2003519
  • Nokogiri - 0.2230408
  • DomCrawler - 0.2524929
  • Simple HTML DOM - 1.518197
  • Zend Dom - 5.382406
5000 elements
  • DiDom - 3.876836
  • Nokogiri - 4.875494
  • DomCrawler - 11.90424
  • Zend Dom - 179.160342
  • Simple HTML DOM - out of memory
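
To make the timings easier to compare, each one can be expressed as a multiple of the fastest parser. The sketch below does this for the 5000-element timings listed above. The `relativeToFastest()` helper is hypothetical and exists only for this illustration; it assumes PHP 7.4 or newer for the arrow function.

```php
<?php
// Elapsed seconds for the 5000-element file, copied from the list above.
$seconds = [
    'DiDom' => 3.876836,
    'Nokogiri' => 4.875494,
    'DomCrawler' => 11.90424,
    'Zend Dom' => 179.160342,
];

/**
 * Express each timing as a multiple of the fastest one.
 * relativeToFastest() is a hypothetical helper used only for this illustration.
 */
function relativeToFastest(array $timings): array
{
    $fastest = min($timings);

    // array_map() keeps the string keys when given a single array.
    return array_map(fn (float $t): float => $t / $fastest, $timings);
}

foreach (relativeToFastest($seconds) as $parser => $ratio) {
    printf("%-12s %.2fx%s", $parser, $ratio, PHP_EOL);
}
```

Simple HTML DOM is left out because it ran out of memory on this file and has no timing to compare.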