设为首页 收藏本站
查看: 710|回复: 0

[经验分享] Python自然语言处理学习笔记(13):2.5 WordNet

[复制链接]

尚未签到

发表于 2015-4-25 10:28:11 | 显示全部楼层 |阅读模式
  原创翻译,如需转载,请与博主联系:yuxcer@126.com
  新手上路,翻译不恰之处,恳请指出,不胜感谢
  2.5 WordNet
  WordNet is a semantically oriented dictionary of English, similar to a traditional thesaurus(辞典)but with a richer structure. NLTK includes the English WordNet, with 155,287 words and 117,659 synonym(同义词)sets. We’ll begin by looking at synonyms and how they are accessed in WordNet.
  Senses and Synonyms 意义和同义词
  Consider the sentence in (1a). If we replace the word motorcar in (1a) with automobile, to get (1b), the meaning of the sentence stays pretty much the same:
  (1) a. Benz is credited with the invention of the motorcar.
  b. Benz is credited with the invention of the automobile.
  Since everything else in the sentence has remained unchanged, we can conclude that the words motorcar and automobile have the same meaning, i.e., they are synonyms.
  We can explore these words with the help of WordNet:
  >>> from nltk.corpus import wordnet as wn
  >>> wn.synsets('motorcar')
  [Synset('car.n.01')]
  Thus, motorcar has just one possible meaning and it is identified as car.n.01, the first noun sense of car. The entity car.n.01 is called a synset, or “synonym set,”(同义词集)a collection of synonymous words (or “lemmas”):
  >>> wn.synset('car.n.01').lemma_names
  ['car', 'auto', 'automobile', 'machine', 'motorcar']
  Each word of a synset can have several meanings, e.g., car can also signify a train carriage, a gondola(货车), or an elevator car. However, we are only interested in the single meaning that is common to all words of this synset. Synsets also come with a prose(平凡的) definition and some example sentences:
  >>> wn.synset('car.n.01').definition
  'a motor vehicle with four wheels; usually propelled by an internal combustion engine(内燃机)'
  >>> wn.synset('car.n.01').examples
  ['he needs a car to get to work']
  Although definitions help humans to understand the intended meaning of a synset, the words of the synset are often more useful for our programs. To eliminate ambiguity, we will identify these words as car.n.01.automobile, car.n.01.motorcar, and so on. This pairing of a synset with a word is called a lemma(一个同义词集的单词配对称为词条). We can get all the lemmas for a given synset①, look up a particular lemma②, get the synset corresponding to a lemma③, and get the “name” of a lemma④:
  >>> wn.synset('car.n.01').lemmas ①
  [Lemma('car.n.01.car'), Lemma('car.n.01.auto'), Lemma('car.n.01.automobile'),
  Lemma('car.n.01.machine'), Lemma('car.n.01.motorcar')]
  >>> wn.lemma('car.n.01.automobile') ②
  Lemma('car.n.01.automobile')
  >>> wn.lemma('car.n.01.automobile').synset ③
  Synset('car.n.01')
  >>> wn.lemma('car.n.01.automobile').name ④
  'automobile'
  Unlike the words automobile and motorcar, which are unambiguous and have one synset, the word car is ambiguous, having five synsets:
  >>> wn.synsets('car')
  [Synset('car.n.01'), Synset('car.n.02'), Synset('car.n.03'), Synset('car.n.04'),
  Synset('cable_car.n.01')]
  >>> for synset in wn.synsets('car'):
  ... print synset.lemma_names
  ...
  ['car', 'auto', 'automobile', 'machine', 'motorcar']
  ['car', 'railcar', 'railway_car', 'railroad_car']
  ['car', 'gondola']
  ['car', 'elevator_car']
  ['cable_car', 'car']
  For convenience, we can access all the lemmas involving the word car as follows:
  >>> wn.lemmas('car')
  [Lemma('car.n.01.car'), Lemma('car.n.02.car'), Lemma('car.n.03.car'),
  Lemma('car.n.04.car'), Lemma('cable_car.n.01.car')]
  Your Turn: Write down all the senses of the word dish that you can think of. Now, explore this word with the help of WordNet, using the same operations shown earlier.
  The WordNet Hierarchy WordNet层次结构
  WordNet synsets correspond to abstract concepts(抽象的概念), and they don’t always have corresponding words in English. These concepts are linked together in a hierarchy. Some concepts are very general, such as Entity, State, Event; these are called unique beginnersor root synsets(根同义词集). Others, such as gas guzzler(油老虎) and hatchback(带掀式后背的小轿车), are much more specific. A small portion of a concept hierarchy is illustrated in Figure 2-8.
DSC0000.jpg
  Figure 2-8. Fragment of WordNet concept hierarchy: Nodes correspond to synsets; edges indicate the hypernym(上位词)/hyponym(下位词) relation, i.e., the relation between superordinate(同hypernym) and subordinate(从属的) concepts
  WordNet makes it easy to navigate between concepts. For example, given a concept like motorcar, we can look at the concepts that are more specific—the (immediate) hyponyms.
  >>> motorcar = wn.synset('car.n.01')
  >>> types_of_motorcar = motorcar.hyponyms()
  >>> types_of_motorcar[26]
  Synset('ambulance.n.01')
  >>> sorted([lemma.name for synset in types_of_motorcar for lemma in synset.lemmas])
  ['Model_T', 'S.U.V.', 'SUV', 'Stanley_Steamer', 'ambulance', 'beach_waggon',
  'beach_wagon', 'bus', 'cab', 'compact', 'compact_car', 'convertible',
  'coupe', 'cruiser', 'electric', 'electric_automobile', 'electric_car',
  'estate_car', 'gas_guzzler', 'hack', 'hardtop', 'hatchback', 'heap',
  'horseless_carriage', 'hot-rod', 'hot_rod', 'jalopy', 'jeep', 'landrover',
  'limo', 'limousine', 'loaner', 'minicar', 'minivan', 'pace_car', 'patrol_car',
  'phaeton', 'police_car', 'police_cruiser', 'prowl_car', 'race_car', 'racer',
  'racing_car', 'roadster', 'runabout', 'saloon', 'secondhand_car', 'sedan',
  'sport_car', 'sport_utility', 'sport_utility_vehicle', 'sports_car', 'squad_car',
  'station_waggon', 'station_wagon', 'stock_car', 'subcompact', 'subcompact_car',
  'taxi', 'taxicab', 'tourer', 'touring_car', 'two-seater', 'used-car', 'waggon',
  'wagon']
  We can also navigate up(浏览) the hierarchy by visiting hypernyms. Some words have multiple paths, because they can be classified in more than one way. There are two paths between car.n.01 and entity.n.01 because wheeled_vehicle.n.01 can be classified as both a vehicle and a container.
  >>> motorcar.hypernyms()
  [Synset('motor_vehicle.n.01')]
  >>> paths = motorcar.hypernym_paths()
  >>> len(paths)
  2
  >>> [synset.name for synset in paths[0]]
  ['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01',
  'instrumentality.n.03', 'container.n.01', 'wheeled_vehicle.n.01',
  'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']
  >>> [synset.name for synset in paths[1]]
  ['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01',
  'instrumentality.n.03', 'conveyance.n.03', 'vehicle.n.01', 'wheeled_vehicle.n.01',
  'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']
  We can get the most general hypernyms (or root hypernyms) of a synset as follows:
  >>> motorcar.root_hypernyms()
  [Synset('entity.n.01')]
  Your Turn:Try out NLTK’s convenient graphical WordNet browser: nltk.app.wordnet(). Explore the WordNet hierarchy by following the hypernym and hyponym links.
  More Lexical Relations 更多词汇关系
  Hypernyms and hyponyms are called lexical relations(词汇关系) because they relate one synset to another. These two relations navigate up and down the “is-a” hierarchy. Another important way to navigate the WordNet network is from items to their components (meronyms 部分) or to the things they are contained in (holonyms整体). For example, the parts of a tree are its trunk(树干), crown(树冠), and so on; these are the part_meronyms(). The substance(实质) a tree is made of includes heartwood(心材) and sapwood(边材), i.e., the substance_meronyms(). A collection of trees forms a forest, i.e., the member_holonyms():
  >>> wn.synset('tree.n.01').part_meronyms()
  [Synset('burl.n.02'), Synset('crown.n.07'), Synset('stump.n.01'),
  Synset('trunk.n.01'), Synset('limb.n.02')]
  >>> wn.synset('tree.n.01').substance_meronyms()
  [Synset('heartwood.n.01'), Synset('sapwood.n.01')]
  >>> wn.synset('tree.n.01').member_holonyms()
  [Synset('forest.n.01')]
  To see just how intricate(复杂的) things can get, consider the word mint, which has several closely related senses. We can see that mint.n.04 is part of mint.n.02 and the substance from which mint.n.05 is made.
  >>> for synset in wn.synsets('mint', wn.NOUN):
  ... print synset.name + ':', synset.definition
  ...
  batch.n.02: (often followed by `of') a large number or amount or extent
  mint.n.02: any north temperate(北温带)plant of the genus Mentha with aromatic leaves and
  small mauve(淡紫色) flowers
  mint.n.03: any member of the mint family of plants
  mint.n.04: the leaves of a mint plant used fresh or candied
  mint.n.05: a candy(糖果) that is flavored(加味)with a mint oil
  mint.n.06: a plant where money is coined by authority of the government
  >>> wn.synset('mint.n.04').part_holonyms()
  [Synset('mint.n.02')]
  >>> wn.synset('mint.n.04').substance_holonyms()
  [Synset('mint.n.05')]
  There are also relationships between verbs. For example, the act of walking involves the act of stepping, so walking entails(蕴涵) stepping. Some verbs have multiple entailments:
  >>> wn.synset('walk.v.01').entailments()
  [Synset('step.v.01')]
  >>> wn.synset('eat.v.01').entailments()
  [Synset('swallow.v.01'), Synset('chew.v.01')]
  >>> wn.synset('tease.v.03').entailments()
  [Synset('arouse.v.07'), Synset('disappoint.v.01')]
  Some lexical relationships hold between lemmas, e.g., antonymy(反义词组):
  >>> wn.lemma('supply.n.02.supply').antonyms()
  [Lemma('demand.n.02.demand')]
  >>> wn.lemma('rush.v.01.rush').antonyms()
  [Lemma('linger.v.04.linger')]
  >>> wn.lemma('horizontal.a.01.horizontal').antonyms()
  [Lemma('vertical.a.01.vertical'), Lemma('inclined.a.02.inclined')]
  >>> wn.lemma('staccato.r.01.staccato').antonyms() 不连续的
  [Lemma('legato.r.01.legato')] 连奏的
  You can see the lexical relations, and the other methods defined on a synset, using
  dir(). For example, try dir(wn.synset('harmony.n.02')).
  Semantic Similarity 语义相似度
  We have seen that synsets are linked by a complex network of lexical relations. Given a particular synset, we can traverse the WordNet network to find synsets with related meanings. Knowing which words are semantically related is useful for indexing a collection of texts, so that a search for a general term such as vehicle will match documents containing specific terms such as limousine(豪华轿车).
  Recall that each synset has one or more hypernym paths that link it to a root hypernym such as entity.n.01. Two synsets linked to the same root may have several hypernyms in common(共同之处) (see Figure 2-8). If two synsets share a very specific hypernym—one that is low down(实情?) in the hypernym hierarchy—they must be closely related.
  >>> right = wn.synset('right_whale.n.01')
  >>> orca = wn.synset('orca.n.01')
  >>> minke = wn.synset('minke_whale.n.01')
  >>> tortoise = wn.synset('tortoise.n.01')
  >>> novel = wn.synset('novel.n.01')
  >>> right.lowest_common_hypernyms(minke)
  [Synset('baleen_whale.n.01')]
  >>> right.lowest_common_hypernyms(orca)
  [Synset('whale.n.02')]
  >>> right.lowest_common_hypernyms(tortoise)
  [Synset('vertebrate.n.01')]
  >>> right.lowest_common_hypernyms(novel)
  [Synset('entity.n.01')]
  Of course we know that whale is very specific (and baleen whale even more so), whereas vertebrate is more general and entity is completely general. We can quantify this concept of generality by looking up the depth of each synset:
  >>> wn.synset('baleen_whale.n.01').min_depth()
  14
  >>> wn.synset('whale.n.02').min_depth()
  13
  >>> wn.synset('vertebrate.n.01').min_depth()
  8
  >>> wn.synset('entity.n.01').min_depth()
  0
  Similarity measures have been defined over the collection of WordNet synsets that incorporate this insight. For example, path_similarityassigns a score in the range 0–1 based on the shortest path that connects the concepts in the hypernym hierarchy (-1 is returned in those cases where a path cannot be found 没有路径就返回-1). Comparing a synset with itself will return 1(与自己比较返回1). Consider the following similarity scores, relating right whale(露脊鲸) to minke whale(小须鲸
  ), orca(逆戟鲸), tortoise, and novel. Although the numbers won’t mean much, they decrease as we move away from the semantic space(语义空间) of sea creatures to inanimate objects(静物).
  >>> right.path_similarity(minke)
  0.25
  >>> right.path_similarity(orca)
  0.16666666666666666
  >>> right.path_similarity(tortoise)
  0.076923076923076927
  >>> right.path_similarity(novel)
  0.043478260869565216
  Several other similarity measures are available; you can type help(wn) for more information. NLTK also includes VerbNet, a hierarchical verb lexicon linked to WordNet. It can be accessed with nltk.corpus.verb net.

运维网声明 1、欢迎大家加入本站运维交流群:群②:261659950 群⑤:202807635 群⑦870801961 群⑧679858003
2、本站所有主题由该帖子作者发表,该帖子作者与运维网享有帖子相关版权
3、所有作品的著作权均归原作者享有,请您和我们一样尊重他人的著作权等合法权益。如果您对作品感到满意,请购买正版
4、禁止制作、复制、发布和传播具有反动、淫秽、色情、暴力、凶杀等内容的信息,一经发现立即删除。若您因此触犯法律,一切后果自负,我们对此不承担任何责任
5、所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其内容的准确性、可靠性、正当性、安全性、合法性等负责,亦不承担任何法律责任
6、所有作品仅供您个人学习、研究或欣赏,不得用于商业或者其他用途,否则,一切后果均由您自己承担,我们对此不承担任何法律责任
7、如涉及侵犯版权等问题,请您及时通知我们,我们将立即采取措施予以解决
8、联系人Email:admin@iyunv.com 网址:www.yunweiku.com

所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其承担任何法律责任,如涉及侵犯版权等问题,请您及时通知我们,我们将立即处理,联系人Email:kefu@iyunv.com,QQ:1061981298 本贴地址:https://www.yunweiku.com/thread-60492-1-1.html 上篇帖子: python的接口和抽象类 下篇帖子: python threading获取线程函数返回值
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

扫码加入运维网微信交流群X

扫码加入运维网微信交流群

扫描二维码加入运维网微信交流群,最新一手资源尽在官方微信交流群!快快加入我们吧...

扫描微信二维码查看详情

客服E-mail:kefu@iyunv.com 客服QQ:1061981298


QQ群⑦:运维网交流群⑦ QQ群⑧:运维网交流群⑧ k8s群:运维网kubernetes交流群


提醒:禁止发布任何违反国家法律、法规的言论与图片等内容;本站内容均来自个人观点与网络等信息,非本站认同之观点.


本站大部分资源是网友从网上搜集分享而来,其版权均归原作者及其网站所有,我们尊重他人的合法权益,如有内容侵犯您的合法权益,请及时与我们联系进行核实删除!



合作伙伴: 青云cloud

快速回复 返回顶部 返回列表