Google's Python Class 5 (Python Dict and File)

zycchen · Posted on 2017-4-27 07:39:26

Original: http://code.google.com/edu/languages/google-python-class/dict-files.html

Python Dict and File

Dict Hash Table

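Python has an efficient hash table data structure called a "dict" (dictionary). A dict literal is written as a series of key:value pairs inside curly braces, much like JSON, e.g. dict = {key1:value1, key2:value2, ... }. The empty dict is just {}.
Data in a dict is looked up with square brackets, e.g. dict['foo']. Strings, numbers, and tuples work as keys, and any type can be a value. Other types may or may not work correctly as keys: strings, numbers, and tuples are safe because they are immutable, while mutable built-in types such as lists are not hashable and are rejected. Looking up a key that is not in the dict, as in dict['a'], raises a KeyError, so check with "in" first, or use dict.get(key), which returns None when the key is missing.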

## Can build up a dict by starting with the empty dict {}
## and storing key/value pairs into the dict like this:
## dict[key] = value-for-that-key
dict = {}
dict['a'] = 'alpha'
dict['g'] = 'gamma'
dict['o'] = 'omega'
print dict ## {'a': 'alpha', 'o': 'omega', 'g': 'gamma'}
print dict['a'] ## Simple lookup, returns 'alpha'
dict['a'] = 6 ## Store a value under key 'a' (replaces 'alpha')
'a' in dict ## True
## print dict['z'] ## Throws KeyError
if 'z' in dict: print dict['z'] ## Avoid KeyError
print dict.get('z') ## None (instead of KeyError)
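
The rules about keys and missing entries can be checked directly. The following is only a minimal sketch, written in Python 3 syntax (where print is a function) rather than the Python 2 syntax of the listings in this article. The name `grid` and its contents are made up for illustration.

```python
# Hypothetical example data: tuple keys work because tuples are immutable.
grid = {}
grid[(0, 0)] = 'origin'
grid[(1, 2)] = 'point'

# A list is mutable and therefore unhashable, so it is rejected as a key.
try:
    grid[[3, 4]] = 'bad'
    list_key_ok = True
except TypeError:
    list_key_ok = False

# get() takes an optional second argument that is returned for missing keys.
missing = grid.get((9, 9), 'not found')

print(grid[(0, 0)])
print(list_key_ok)
print(missing)
```

Passing a default to get() is often handier than testing with "in" first, because the lookup and the fallback happen in one call.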

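A for loop over a dict iterates over its keys by default. The keys come back in an arbitrary order. The methods dict.keys() and dict.values() return lists of the keys or the values. There is also an items() method, which returns a list of (key, value) tuples; this is the most efficient way to examine all the key/value data in the dict.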

## By default, iterating over a dict iterates over its keys.
## Note that the keys are in an arbitrary order.
## (The outputs shown below assume dict['a'] is still 'alpha'.)
for key in dict: print key
## prints a g o

## Exactly the same as above
for key in dict.keys(): print key
## Get the .keys() list:
print dict.keys() ## ['a', 'o', 'g']
## Likewise, there's a .values() list of values
print dict.values() ## ['alpha', 'omega', 'gamma']
## Common case -- loop over the keys in sorted order,
## accessing each key/value
for key in sorted(dict.keys()):
    print key, dict[key]

## .items() is the dict expressed as (key, value) tuples
print dict.items() ## [('a', 'alpha'), ('o', 'omega'), ('g', 'gamma')]
## This loop syntax accesses the whole dict by looping
## over the .items() tuple list, accessing one (key, value)
## pair on each iteration.
for k, v in dict.items(): print k, '>', v
## a > alpha o > omega g > gamma
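
There are "iter" variants of these methods, iterkeys(), itervalues() and iteritems(), which avoid the cost of constructing a whole new list. The saving is noticeable when the amount of data is large. However, I still recommend the plain keys() and values() methods because of their sensible names. In Python 3000 (Python 3) the plain methods no longer build a full list, so the iterkeys() variants are no longer needed.
Strategy note: from a performance point of view, the dict is one of the most effective tools you have, and you should use it wherever you can as an easy way to organize and pass around data.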
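
As an illustration of the strategy note, here is a minimal sketch of counting words with a dict, in Python 3 syntax. The function name `count_words` and the sample text are assumptions made for this example; they do not come from the original class material.

```python
def count_words(text):
    """Return a dict mapping each lowercased word to its number of occurrences."""
    counts = {}
    for word in text.lower().split():
        # get() supplies 0 the first time a word is seen.
        counts[word] = counts.get(word, 0) + 1
    return counts


counts = count_words('the cat and the hat and the bat')

# Loop over the keys in sorted order and look up each count.
for word in sorted(counts):
    print(word, counts[word])
```

Each lookup and update is a single hash table operation, so the loop stays fast even when the text is long.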

Dict Formatting

The % operator can conveniently substitute values from a dict into a string by name:

hash = {}
hash['word'] = 'garfield'
hash['count'] = 42
s = 'I want %(count)d copies of %(word)s' % hash # %d for int, %s for string
# 'I want 42 copies of garfield'
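
For comparison, the same substitution can also be written with str.format(), which is a different technique from the % operator described here. This is a small sketch in Python 3 syntax; the variable name `info` is chosen for the example (the listing above calls it `hash`, which shadows a built-in function).

```python
info = {'word': 'garfield', 'count': 42}

# The % form from the text, unchanged.
s_percent = 'I want %(count)d copies of %(word)s' % info

# str.format() with the dict unpacked into keyword arguments.
s_format = 'I want {count:d} copies of {word}'.format(**info)

print(s_percent)
print(s_format)
```

Both forms look values up by key, so a key that is missing from the dict raises a KeyError in either case.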

Del

The "del" operator deletes a variable definition. It can also delete elements or slices of a list and entries of a dict:

var = 6
del var # var no more!

list = ['a', 'b', 'c', 'd']
del list[0] ## Delete first element
del list[-2:] ## Delete last two elements
print list ## ['b']
dict = {'a':1, 'b':2, 'c':3}
del dict['b'] ## Delete 'b' entry
print dict ## {'a':1, 'c':3}
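
Like a plain lookup, "del" on a key that is not in the dict raises a KeyError. The sketch below, in Python 3 syntax, shows that behaviour next to dict.pop() with a default, which removes a key only if it is present. The name `prices` and its data are hypothetical.

```python
prices = {'a': 1, 'b': 2, 'c': 3}  # hypothetical data

del prices['b']  # removes the 'b' entry

# Deleting a missing key raises KeyError, just like a missing lookup.
try:
    del prices['z']
    deleted_missing = True
except KeyError:
    deleted_missing = False

# pop() with a default removes the key if present and never raises.
removed = prices.pop('z', None)

print(prices)
print(deleted_missing)
print(removed)
```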
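
Files

The open() function opens a file and returns a file handle that can be used to read or write it in the usual way. The code f = open('name', 'r') opens the file for reading and stores the handle in the variable f; call f.close() when you are finished. The modes 'r', 'w', and 'a' stand for read, write, and append. The special mode 'rU' is the "Universal" option for text files: it converts the different line-ending conventions so that every line ends with a simple '\n', and you do not have to handle them yourself. A for loop is the most efficient way to look at every line of a file: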

# Echo the contents of a file
f = open('foo.txt', 'rU')
for line in f: ## iterates over the lines of the file
    print line, ## trailing , so print does not add an end-of-line char
                ## since 'line' already includes the end-of-line.
f.close()

Reading one line at a time has the nice quality that not all of the file needs to fit in memory at one time -- handy if you want to look at every line in a 10 gigabyte file without using 10 gigabytes of memory. The f.readlines() method reads the whole file into memory and returns its contents as a list of its lines. The f.read() method reads the whole file into a single string, which can be a handy way to deal with the text all at once, such as with the regular expressions we'll see later.
For writing, the f.write(string) method is the easiest way to write data to an open output file. Or you can use "print" with an open file, but the syntax is nasty: "print >> f, string". In Python 3000 (Python 3), the print syntax is fixed to be a regular function call with an optional file= argument: "print(string, file=f)".
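
The reading and writing calls described above can be tried together. This is a minimal sketch in Python 3 syntax, using a "with" block so that each file is closed automatically instead of calling f.close() by hand. The scratch file path is created only for the example.

```python
import os
import tempfile

# Hypothetical scratch file, so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), 'foo.txt')

with open(path, 'w') as f:
    f.write('first line\n')
    print('second line', file=f)  # Python 3 form of "print >> f, string"

with open(path, 'r') as f:
    lines = [line for line in f]  # iterates one line at a time

with open(path, 'r') as f:
    whole = f.read()  # the whole file as a single string

print(lines)
print(repr(whole))
```

Each line keeps its trailing '\n', which is why the echo loop in the article suppresses the extra newline that print would otherwise add.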

Unicode Files

The "codecs" module provides support for reading a unicode-encoded file:

import codecs
f = codecs.open('foo.txt', 'rU', 'utf-8')
for line in f:
    # here line is a *unicode* string
    pass ## placeholder so the loop body is valid; process line here

For writing, use f.write() since print does not fully support unicode.
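
In Python 3 the built-in open() accepts an encoding argument, so the same round trip can be done without the codecs module. The sketch below shows that alternative, not the codecs approach from this article. The file path and sample text are made up for the example.

```python
import os
import tempfile

# Hypothetical scratch file and sample text containing non-ASCII characters.
path = os.path.join(tempfile.mkdtemp(), 'unicode.txt')
text = 'caf\u00e9 \u5b57\u5178\n'

with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

with open(path, 'r', encoding='utf-8') as f:
    lines = list(f)  # each line is decoded to a str

with open(path, 'rb') as f:
    raw = f.read()  # the undecoded UTF-8 bytes on disk

print(lines == [text])
print(len(raw) > len(text))
```

The byte count is larger than the character count because the non-ASCII characters take more than one byte each in UTF-8.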

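Exercise Incremental Development

When building a Python program, don't try to write the whole thing in one step. Break the work into milestones, e.g. "the first step is to extract the list of words", then improve and test the program one milestone at a time. Being able to run code right after writing it is one of Python's advantages, so make full use of it.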
