13916729435 posted on 2015-12-26 17:09:32

Parsing HTML using Perl

  For this task, we (students) were not allowed to use any of Perl's existing HTML parser modules; we had to parse the HTML file by writing our own regular expression (regex) functions.
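
  To give an idea of what "writing our own regex functions" can look like, here is a minimal sketch that pulls the tag name and attributes out of a single start tag. The function name parse_start_tag is made up for illustration, and the sketch assumes that attribute values never contain a '>' character; real-world HTML breaks that assumption fairly often.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): parse ONE start tag such as
# <a href="x" class='y' hidden> into a lowercase name and an attribute hash.
# Assumption: no '>' appears inside attribute values.
sub parse_start_tag {
    my ($tag) = @_;

    # Capture the tag name and everything up to an optional "/" and the ">".
    return unless $tag =~ m{^<([A-Za-z][A-Za-z0-9]*)\b([^>]*?)/?>$};
    my ($name, $rest) = (lc $1, $2);

    my %attr;
    # Each attribute is name, name="v", name='v', or name=v.
    while ($rest =~ /([^\s=\/]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+)))?/g) {
        my $value = defined $2 ? $2
                  : defined $3 ? $3
                  : defined $4 ? $4
                  :              '';   # bare attribute such as "hidden"
        $attr{ lc $1 } = $value;
    }
    return ($name, \%attr);
}

my ($name, $attrs) = parse_start_tag(q{<A HREF="index.html" class='nav' hidden>});
print "tag: $name\n";
print "  $_ = '$attrs->{$_}'\n" for sort keys %$attrs;
```

  Attribute names are lowercased so that HREF and href end up under the same key; a tag that does not match the pattern yields an empty list.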
  

  First I did some research:
  
  http://www.degraeve.com/tutorials/tutorial02.php
  This tutorial only covers the fundamentals, so it was not much help for this task, but it does show how to handle POST form values (splitting the data into name/value pairs, getting rid of weird characters, converting letters, etc.).
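
  As a rough illustration of that kind of form handling (this is not the tutorial's own code), the sketch below splits a URL-encoded form body into name/value pairs and decodes them. The names url_decode and parse_form are hypothetical, and the input is a hard-coded sample string rather than a real POST body read from STDIN.

```perl
use strict;
use warnings;

# Decode one URL-encoded component: '+' becomes a space and
# each %XX escape becomes the byte with that hex code.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split a form body like "name=Ann+Lee&msg=a%26b" into a hash.
# If a name occurs more than once, the last value wins.
sub parse_form {
    my ($body) = @_;
    my %form;
    for my $pair (split /[&;]/, $body) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        $form{ url_decode($name) } = url_decode($value);
    }
    return %form;
}

my %form = parse_form('name=Ann+Lee&msg=a%26b%3Dc');
print "$_ = $form{$_}\n" for sort keys %form;
# prints:
# msg = a&b=c
# name = Ann Lee
```

  Decoding happens after the split on '&' and '=', so an encoded '&' or '=' inside a value does not break the pair apart.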
  

  http://www.perlmonks.org/?node_id=585311
  Someone asked how to recover an HTML file using Perl, and one of the replies suggested using HTML::TokeParser.
  That might be overkill for this project, but it could be useful in a future task.
  

  http://www.perl.com/pub/2006/01/19/analyzing_html.html
  Again, this author (a teacher, it seems) used HTML::TreeBuilder to construct a tree structure and then did the parsing on that.
  
  http://www.foo.be/docs/tpj/issues/vol5_1/tpj0501-0003.html
  HTML::Parser
  This is another module from CPAN.
  
  Okay, it seems that lots of people have done this task before.

  
  
  I NEED TO FIND SOMETHING THAT TEACHES ME HOW TO BUILD A PARSER FROM SCRATCH!!!

  http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html

  
  Remove tags
  http://www.linuxquestions.org/questions/programming-9/perl-split-on-html-tag-89905/
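
  Here is a minimal sketch of both ideas, stripping tags and splitting on them, using plain regexes. It is not taken from the linked thread, the names strip_tags and split_on_tags are hypothetical, and it assumes simple markup: no '>' inside attribute values, and no comments, <script> blocks, or CDATA sections.

```perl
use strict;
use warnings;

# Remove everything that looks like a tag, then tidy the whitespace.
# Assumption: simple markup only (see the notes above the code).
sub strip_tags {
    my ($html) = @_;
    $html =~ s/<[^>]*>//g;    # drop <...> runs
    $html =~ s/\s+/ /g;       # collapse whitespace to single spaces
    $html =~ s/^ | $//g;      # trim one leading/trailing space
    return $html;
}

# Split the input into a flat list of tags and text chunks.
# The capturing group makes split() keep the tags in the result;
# grep drops the empty strings that appear between adjacent tags.
sub split_on_tags {
    my ($html) = @_;
    return grep { length } split /(<[^>]*>)/, $html;
}

print strip_tags('<p>Hello, <b>world</b>!</p>'), "\n";   # prints: Hello, world!

for my $piece (split_on_tags('<p>Hi <b>there</b></p>')) {
    my $kind = $piece =~ /^</ ? 'tag ' : 'text';
    print "$kind: $piece\n";
}
```

  The split version is a possible first step toward a hand-written parser: once the input is a list of tags and text chunks, each tag can be examined separately.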
  
  