659441806 posted on 2017-4-6 08:41:34

Parsing HTML tags with PHP Simple HTML DOM Parser

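  I tried out PHP Simple HTML DOM Parser for parsing HTML pages and it feels pretty good: it builds a DOM tree that makes it easy to pick apart the contents of an HTML document. It is handy for scraping.
  An example follows; you can also download the archive from SourceForge and look at the examples inside: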

Scraping data with PHP Simple HTML DOM Parser



  PHP Simple HTML DOM Parser, written in PHP5+, allows you to
manipulate HTML in a very easy way. Because it supports invalid HTML,
this parser is better than other PHP scripts that use complicated
regexes to extract information from web pages.
  Before getting the necessary info, a DOM should be created from
either a URL or a file. The following script extracts links & images
from a website:

// Create DOM from URL or file
$html = file_get_html('http://www.microsoft.com/');
// Extract links
foreach($html->find('a') as $element)
    echo $element->href . '<br>';
// Extract images
foreach($html->find('img') as $element)
    echo $element->src . '<br>';
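
  The same loops can be tried without a network connection. The sketch below is only an illustration: the sample markup and the $links / $images arrays are made up for the example, and it assumes simple_html_dom.php sits next to the script. It parses a string with str_get_html() and skips anchors that have no href attribute:

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
include 'simple_html_dom.php';

// Hypothetical sample markup, used instead of a live URL.
$markup = '<a href="/a.html">A</a><a>No href</a><img src="/logo.png">';

$html = str_get_html($markup);
if ($html === false) {
    exit("Could not parse the markup\n");
}

// Collect link targets, skipping anchors without an href attribute
$links = array();
foreach ($html->find('a') as $element) {
    if (isset($element->href)) {
        $links[] = $element->href;
    }
}

// Collect image sources
$images = array();
foreach ($html->find('img') as $element) {
    $images[] = $element->src;
}

print_r($links);
print_r($images);
```

  Collecting the values into arrays first, rather than echoing them inside the loop, makes it easier to filter or de-duplicate them afterwards.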

  The parser can also be used to modify HTML elements:

// Create DOM from string
$html = str_get_html('<div id="simple">Simple</div><div id="parser">Parser</div>');
$html->find('div', 1)->class = 'bar';
$html->find('div', 0)->innertext = 'Foo';
// Output: <div id="simple">Foo</div><div id="parser" class="bar">Parser</div>
echo $html;
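
  One thing to watch with the indexed form of find(): when no element exists at that index it returns null, so chaining a property assignment onto it would fail. A minimal sketch of a more defensive version, assuming simple_html_dom.php is in the same directory (the variable names $second, $missing and $result are only for illustration):

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
include 'simple_html_dom.php';

$html = str_get_html('<div id="simple">Simple</div><div id="parser">Parser</div>');

// find() with an index returns a single element, or null if there is none
$second = $html->find('div', 1);
if ($second !== null) {
    $second->class = 'bar';
}

// There are only two divs, so index 5 yields null rather than an element
$missing = $html->find('div', 5);

// save() dumps the modified DOM tree back into a string
$result = $html->save();
echo $result;
```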

  Do you wish to retrieve content without any tags?

echo file_get_html('http://www.yahoo.com/')->plaintext;
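
  plaintext is also available on individual elements, not only on the whole document. A small sketch under the same assumption that simple_html_dom.php is in the same directory; the markup and the #post id are invented for the example, and the exact whitespace in the result may differ between parser versions:

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
include 'simple_html_dom.php';

// Hypothetical fragment standing in for a fetched page
$html = str_get_html('<div id="post"><h1>Title</h1><p>Some <b>bold</b> text.</p></div>');

// Select one element by id and read its text with the tags stripped
$post = $html->find('#post', 0);
$text = ($post !== null) ? $post->plaintext : '';

echo $text;
```
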
  In the package files of this parser
(http://simplehtmldom.sourceforge.net/) you can find some scraping
examples from digg, imdb, slashdot. Let’s create one that extracts the
first 10 results (titles only) for the keyword “php” from Google:

$url = 'http://www.google.com/search?hl=en&q=php&btnG=Search';
// Create DOM from URL
$html = file_get_html($url);
// Match all 'A' tags whose class attribute equals 'l'
foreach($html->find('a[class=l]') as $key => $info)
{
    echo ($key + 1).'. '.$info->plaintext."<br />\n";
}
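
  The loop above prints every match it finds. To stop at the first 10 titles, one option is to slice the array that find() returns. This is only a sketch: the generated markup is a stand-in for a real results page, the class name 'l' is taken from the comment above, and simple_html_dom.php is assumed to be in the same directory:

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
include 'simple_html_dom.php';

// Hypothetical stand-in for a results page: 12 links with class "l"
$markup = '';
for ($i = 1; $i <= 12; $i++) {
    $markup .= '<a class="l" href="/r' . $i . '">Result ' . $i . '</a>';
}
$html = str_get_html($markup);

// find() without an index returns a plain array, so array_slice() can cap it
$titles = array();
foreach (array_slice($html->find('a[class=l]'), 0, 10) as $info) {
    $titles[] = $info->plaintext;
}

foreach ($titles as $key => $title) {
    echo ($key + 1) . '. ' . $title . "<br />\n";
}
```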
  NOTE: Make sure to include the parser before using any of its functions:

include 'simple_html_dom.php';
  For more information on using this parser, consider checking the
‘PHP Simple HTML Dom Parser’ Manual. To download the package files, use
the following URL: http://sourceforge.net/project/showfiles.php?group_id=218559