Scraping data with PHP Simple HTML DOM Parser
PHP Simple HTML DOM Parser, written for PHP 5+, lets you manipulate
HTML very easily. Because it tolerates invalid HTML, this parser is a
better choice than PHP scripts that rely on complicated regexes to
extract information from web pages.
Before you can extract anything, a DOM must be created from either a
URL or a file. The following script dumps the plain-text contents
(without tags) of a page:
echo file_get_html('http://www.yahoo.com/')->plaintext;
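To extract links and images rather than plain text, something like the sketch below should work. It builds the DOM from an inline string with str_get_html so it runs without network access; the markup and the example.com URLs are made up for illustration, and it assumes simple_html_dom.php sits next to the script.

```php
<?php
include 'simple_html_dom.php';

// Made-up markup for illustration; file_get_html($url) would build
// the same kind of DOM object from a URL or a file.
$markup = '<html><body>'
        . '<a href="http://example.com/a">First</a>'
        . '<a href="http://example.com/b">Second</a>'
        . '<img src="http://example.com/logo.png">'
        . '</body></html>';

$html = str_get_html($markup);

// Collect the href attribute of every link
$links = array();
foreach ($html->find('a') as $a) {
    $links[] = $a->href;
}

// Collect the src attribute of every image
$images = array();
foreach ($html->find('img') as $img) {
    $images[] = $img->src;
}

echo implode("\n", $links), "\n";
echo implode("\n", $images), "\n";
```

Swapping str_get_html($markup) for file_get_html with a real URL should leave the two loops unchanged.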
The parser's package files (http://simplehtmldom.sourceforge.net/)
include some scraping examples for Digg, IMDb, and Slashdot. Let’s
create one that extracts the first 10 results (titles only) for the
keyword “php” from Google:
$url = 'http://www.google.com/search?hl=en&q=php&btnG=Search';
// Create DOM from URL
$html = file_get_html($url);
// Match all 'a' tags whose class attribute equals 'l'
foreach ($html->find('a[class=l]') as $key => $info)
{
    echo ($key + 1).'. '.$info->plaintext."<br />\n";
}
NOTE: Make sure to include the parser before using any of its functions:
include 'simple_html_dom.php';
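Putting the include first, the title extraction can be wrapped in a small helper that caps the number of results explicitly. This is only a sketch: first_titles is a hypothetical name, the markup is made up to stand in for a results page, and the class "l" simply mirrors the selector used above (real result pages may use different markup).

```php
<?php
include 'simple_html_dom.php';

// Hypothetical helper: plain text of the first $limit elements
// matching $selector in an already-built DOM.
function first_titles($html, $selector, $limit = 10)
{
    $titles = array();
    foreach (array_slice($html->find($selector), 0, $limit) as $node) {
        $titles[] = trim($node->plaintext);
    }
    return $titles;
}

// Made-up markup standing in for a search results page
$html = str_get_html(
    '<a class="l" href="#1">PHP: Hypertext Preprocessor</a>'
  . '<a class="x" href="#2">Not a result</a>'
  . '<a class="l" href="#3">PHP Tutorial</a>'
);

$titles = first_titles($html, 'a[class=l]', 10);

foreach ($titles as $key => $title) {
    echo ($key + 1) . '. ' . $title . "\n";
}

// Release the memory held by the parser once the data is extracted
$html->clear();
```

Keeping the limit as a parameter avoids relying on the page returning exactly ten matches.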
For more information on using the parser, see the ‘PHP Simple HTML
DOM Parser’ manual. To download the package files, use the following
URL: http://sourceforge.net/project/showfiles.php?group_id=218559