Parsing HTML using Perl

13916729435 posted on 2015-12-26 17:09:32

　　For this task, we (the students) were not allowed to use any of Perl's ready-made HTML parser modules; we had to parse the HTML file by writing our own regular expression (regex) functions.
　　

　　First I did some research:
　　
　　http://www.degraeve.com/tutorials/tutorial02.php
　　This tutorial only covers the fundamentals, so it was not much help for this task, but it does show how to handle POST form values (splitting the pairs, separating name and value, getting rid of weird characters, converting letters, etc.).
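　　To make those steps concrete, here is a minimal sketch of decoding a URL-encoded form body by hand. The helper name parse_form and the sample input are my own assumptions for illustration, not code taken from the tutorial, and repeated field names simply overwrite each other here.

```perl
use strict;
use warnings;

# Hypothetical helper: decode an application/x-www-form-urlencoded string
# into a hash of name => value pairs.
sub parse_form {
    my ($body) = @_;
    my %form;
    for my $pair (split /&/, $body) {
        # Split on the first '=' only, so values may contain '='.
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                               # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX becomes the byte it encodes
        }
        $form{$name} = $value;
    }
    return %form;
}

my %form = parse_form('name=John+Doe&email=john%40example.com');
print "$_ = $form{$_}\n" for sort keys %form;
```

　　The '+' translation runs before the %XX decoding so that an encoded plus sign (%2B) survives as a literal '+'.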
　　

　　http://www.perlmonks.org/?node_id=585311
　　Someone asked how to recover an HTML file using Perl, and another poster suggested using HTML::TokeParser.
　　That might be overkill for this project, but it could be useful for a future task.
　　

　　http://www.perl.com/pub/2006/01/19/analyzing_html.html
　　Again, this author (a teacher, it seems) used HTML::TreeBuilder to build a tree structure and then parsed from that.
　　
　　http://www.foo.be/docs/tpj/issues/vol5_1/tpj0501-0003.html
　　HTML::Parser
　　This is another module from CPAN.
　　
　　Okay, it seems that lots of people have done this task before.

　　
　　
　　I NEED TO FIND SOMETHING THAT TEACHES ME HOW TO BUILD A PARSER FROM SCRATCH!!!

　　http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html
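　　Since modules are off the table for this task, one possible starting point is a small regex tokenizer that walks the string with \G and sorts each piece into a comment, a start tag, an end tag, or text. This is only a sketch under my own assumptions: tokenize_html is a made-up name, attributes are thrown away, and things like a '>' inside an attribute value or a DOCTYPE line are not handled properly, which is exactly the kind of trouble the article above warns about.

```perl
use strict;
use warnings;

# Hypothetical helper: split an HTML string into a flat list of tokens.
# Each token is [type, text], where type is comment, end, start, or text.
sub tokenize_html {
    my ($html) = @_;
    my @tokens;
    # \G with /gc resumes each match where the previous one stopped.
    while ($html =~ m{
            \G (?:
                (<!--.*?-->)                        # group 1: a whole comment
              | </\s*([A-Za-z][A-Za-z0-9]*)\s*>     # group 2: end tag name
              | <([A-Za-z][A-Za-z0-9]*)\b[^>]*>     # group 3: start tag name
              | ([^<]+|<)                           # group 4: text, or a lone angle bracket
            )
        }gcsx) {
        if    (defined $1) { push @tokens, ['comment', $1] }
        elsif (defined $2) { push @tokens, ['end',   lc $2] }
        elsif (defined $3) { push @tokens, ['start', lc $3] }
        else               { push @tokens, ['text',  $4] }
    }
    return @tokens;
}

my @tokens = tokenize_html('<p>Hello <b>world</b></p>');
print "$_->[0]: $_->[1]\n" for @tokens;
```

　　A real parser would then push start tags onto a stack and pop them on the matching end tag to build the tree; the token list is just the first step.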

　　
　　Remove tags
　　http://www.linuxquestions.org/questions/programming-9/perl-split-on-html-tag-89905/
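　　For plain tag removal, a few substitutions go a long way. The sketch below is my own assumption of how this could look (strip_tags is a made-up name, not code from the linked thread), and it will misbehave on input where a '>' appears inside an attribute value.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of an HTML string with tags removed.
sub strip_tags {
    my ($html) = @_;
    $html =~ s/<!--.*?-->//gs;                            # drop comments
    $html =~ s/<(script|style)\b[^>]*>.*?<\/\1\s*>//gis;  # drop script/style blocks and their contents
    $html =~ s/<[^>]*>//g;                                # drop every remaining tag
    $html =~ s/\s+/ /g;                                   # collapse runs of whitespace
    $html =~ s/^ | $//g;                                  # trim the ends
    return $html;
}

print strip_tags('<p>Hello, <b>world</b>!</p>'), "\n";   # prints "Hello, world!"
```

　　Tags are replaced with nothing rather than a space, so text from adjacent block elements runs together; swapping in a space is an easy change if that matters.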
　　
　　
