沈阳格力专卖店 (Shenyang Gree Store) posted on 2015-4-27 10:57:59

Python lxml Library Study Notes, Part 2

  Finding text with XPath
  Another way to extract the text content of an XML tree is XPath:
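>>> from lxml import etree
>>> html = etree.fromstring("<html><body>TEXT<br/>TAIL</body></html>")
>>> print(html.xpath("string()"))  # lxml.etree only!
TEXTTAIL
>>> print(html.xpath("//text()"))  # lxml.etree only!
['TEXT', 'TAIL']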
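To see how the two expressions relate, here is a small sketch on a made-up document (the XML string is illustrative only): `string()` gives the string-value of the context node as a single string, while `//text()` gives every text node, including tail text, as a list. Joining that list reproduces the string-value.

```python
from lxml import etree

# Illustrative sample: text spread over nested elements, plus tail text after <b>.
root = etree.fromstring("<doc><p>Hello <b>lxml</b> world</p><p>Bye</p></doc>")

# string() -> one string: all descendant text concatenated in document order.
whole = root.xpath("string()")

# //text() -> a list with one entry per text node (element text and tail text).
pieces = root.xpath("//text()")

print(whole)   # Hello lxml worldBye
print(pieces)  # ['Hello ', 'lxml', ' world', 'Bye']

# Concatenating the individual text nodes gives back the string-value.
assert "".join(pieces) == whole
```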
  If you use this often, you can wrap the expression in a reusable function:
>>> build_text_list = etree.XPath("//text()")  # lxml.etree only!
>>> print(build_text_list(html))
['TEXT', 'TAIL']
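The benefit of `etree.XPath` is that the expression is compiled once and can then be called on any number of trees. lxml also lets a compiled expression contain XPath variables (`$name`), which are filled in from keyword arguments at call time. A minimal sketch; the documents and the names `find_by_id` and `ident` are hypothetical:

```python
from lxml import etree

# Compiled once, reusable on any tree.
build_text_list = etree.XPath("//text()")

# $ident is an XPath variable; its value is supplied as a keyword argument.
find_by_id = etree.XPath("//*[@id = $ident]")

doc_a = etree.fromstring("<a>one<b id='x'>two</b></a>")
doc_b = etree.fromstring("<a><b id='y'>three</b></a>")

print(build_text_list(doc_a))  # ['one', 'two']
print(build_text_list(doc_b))  # ['three']

matches = find_by_id(doc_a, ident="x")
print([el.tag for el in matches])  # ['b']
```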
  Each text result can also give you its parent node through the getparent() method:
>>> texts = build_text_list(html)
>>> print(texts[0])
TEXT
>>> parent = texts[0].getparent()
>>> print(parent.tag)
body
>>> print(texts[1])
TAIL
>>> print(texts[1].getparent().tag)
br
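As a sketch of what getparent() makes possible, the snippet below groups every text node by the tag of the element it returns (the sample XML is illustrative). Note that the tail text `'!'` is attributed to `<b>`, the element it follows, even though it sits inside `<p>`.

```python
from collections import defaultdict

from lxml import etree

root = etree.fromstring("<p>Hello <b>world</b>!</p>")

# Map parent tag -> list of text strings returned by //text().
by_parent = defaultdict(list)
for text in root.xpath("//text()"):
    by_parent[text.getparent().tag].append(str(text))

print(dict(by_parent))  # {'p': ['Hello '], 'b': ['world', '!']}
```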
You can also find out whether it is normal text content or tail text:
>>> print(texts[0].is_text)
True
>>> print(texts[1].is_text)
False
>>> print(texts[1].is_tail)
True
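Combining `is_tail` with getparent() lets you find the element that actually encloses a piece of text: for tail text, getparent() returns the element the tail follows, so you step up one more level. A minimal sketch; `enclosing_element` is a hypothetical helper, and it would return `None` for tail text of the root element:

```python
from lxml import etree

root = etree.fromstring("<p>Hello <b>world</b>!</p>")

def enclosing_element(text):
    """Return the element whose content contains this text node (hypothetical helper)."""
    owner = text.getparent()
    # Tail text hangs off the preceding element, so its container is one level up.
    if text.is_tail:
        return owner.getparent()
    return owner

for text in root.xpath("//text()"):
    kind = "text" if text.is_text else "tail"
    print(repr(str(text)), kind, enclosing_element(text).tag)
# 'Hello ' text p
# 'world' text b
# '!' tail p
```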
  
  Tree iteration:
  Elements provide a tree iterator, iter(), for visiting the elements of a tree one by one in document order.
>>> root = etree.Element("root")
>>> etree.SubElement(root, "child").text = "Child 1"
>>> etree.SubElement(root, "child").text = "Child 2"
>>> etree.SubElement(root, "another").text = "Child 3"
>>> print(etree.tostring(root, pretty_print=True, encoding="unicode"))
<root>
  <child>Child 1</child>
  <child>Child 2</child>
  <another>Child 3</another>
</root>
>>> for element in root.iter():
...     print("%s - %s" % (element.tag, element.text))
root - None
child - Child 1
child - Child 2
another - Child 3
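"Document order" here means a depth-first walk that starts with the element itself, which a deeper tree makes easier to see. A small sketch with an illustrative tree; calling iter() on a child only covers that child's subtree:

```python
from lxml import etree

root = etree.fromstring("<a><b><c/></b><d/></a>")

# Depth-first, document order, starting with root itself.
print([el.tag for el in root.iter()])  # ['a', 'b', 'c', 'd']

# Iterating from the first child only walks its own subtree.
b = root[0]
print([el.tag for el in b.iter()])  # ['b', 'c']
```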
  If you know which tag you are interested in, pass its name to iter() to filter the results:
>>> for element in root.iter("child"):
...     print("%s - %s" % (element.tag, element.text))
child - Child 1
child - Child 2
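Recent lxml versions also accept several tag names at once; matches are still yielded in document order, not in the order the names are passed. A sketch, assuming a reasonably current lxml release:

```python
from lxml import etree

root = etree.Element("root")
etree.SubElement(root, "child").text = "Child 1"
etree.SubElement(root, "child").text = "Child 2"
etree.SubElement(root, "another").text = "Child 3"

# Several tag names in one call; output follows document order.
for element in root.iter("another", "child"):
    print("%s - %s" % (element.tag, element.text))
# child - Child 1
# child - Child 2
# another - Child 3
```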
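  By default, the iterator yields every node in the tree, including ProcessingInstruction, Comment and Entity instances. If you want to make sure that only Element objects are returned, pass the Element factory as the tag argument.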
>>> root.append(etree.Entity("#234"))
>>> root.append(etree.Comment("some comment"))
>>> for element in root.iter():
...     if isinstance(element.tag, str):  # basestring in Python 2
...         print("%s - %s" % (element.tag, element.text))
...     else:
...         print("SPECIAL: %s - %s" % (element, element.text))
root - None
child - Child 1
child - Child 2
another - Child 3
SPECIAL: &#234; - &#234;
SPECIAL: <!--some comment--> - some comment
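The `isinstance(element.tag, str)` test works because lxml gives special nodes their factory function as `.tag`. That also allows a finer classification by identity. A minimal sketch; `node_kind` is a hypothetical helper and the tree is illustrative:

```python
from lxml import etree

root = etree.Element("root")
etree.SubElement(root, "child").text = "Child 1"
root.append(etree.Entity("#234"))
root.append(etree.Comment("some comment"))
root.append(etree.ProcessingInstruction("target", "data"))

def node_kind(node):
    """Classify a node by its .tag; special nodes use their factory as the tag."""
    if node.tag is etree.Comment:
        return "comment"
    if node.tag is etree.Entity:
        return "entity"
    if node.tag is etree.ProcessingInstruction:
        return "pi"
    return "element"

print([node_kind(n) for n in root.iter()])
# ['element', 'element', 'entity', 'comment', 'pi']
```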

>>> for element in root.iter(tag=etree.Element):
...     print("%s - %s" % (element.tag, element.text))
root - None
child - Child 1
child - Child 2
another - Child 3
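A quick way to see the effect of `tag=etree.Element` is to compare node counts with and without it; with the filter, every `.tag` is a plain string, so string operations on it are safe. Illustrative tree:

```python
from lxml import etree

root = etree.Element("root")
etree.SubElement(root, "child").text = "Child 1"
root.append(etree.Comment("some comment"))

all_nodes = list(root.iter())                    # elements plus the comment
elements_only = list(root.iter(tag=etree.Element))  # elements only

print(len(all_nodes), len(elements_only))  # 3 2
print([el.tag for el in elements_only])    # ['root', 'child']
```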
>>> for element in root.iter(tag=etree.Entity):
...     print(element.text)
&#234;
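The same trick works with the other factories. For example, passing `etree.Comment` collects only comment nodes; the default parser keeps comments, so they show up in a parsed document too. Illustrative XML:

```python
from lxml import etree

root = etree.fromstring("<root><!--first--><a>x<!--second--></a></root>")

# Only comment nodes, in document order.
comments = [c.text for c in root.iter(tag=etree.Comment)]
print(comments)  # ['first', 'second']
```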