Reading and writing SQL databases with pandas

star870126 · posted 2018-10-18 13:07:28

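　　How do you read data from a database into a DataFrame?
　　The pandas.io.sql module provides sql.read_sql_query(sql_str, conn) and sql.read_sql_table(table_name, conn) for this.
　　The first runs a SQL statement; the second loads a whole table into a DataFrame (read_sql_table needs an SQLAlchemy connectable rather than a raw DBAPI connection).
　　pandas also offers a single wrapper around both: read_sql(). An example shows how it is used.
　　We will read from a SQLite database, so first import the relevant modules.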

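　　read_sql takes two main arguments: a SQL statement (SQL itself is something you may need to learn separately) and con, the database connection. It returns a DataFrame directly.
　　Printing the result shows that the data was read successfully.
　　The index_col parameter selects which column becomes the index.
　　The result is:
　　To get a multi-level index, pass a list of column names as index_col.
　　The result is:
　　Writing to the database is just as simple: first drop any existing "weather_2012" table, then save df to the "weather_2012" table.
　　MySQL is no problem either: create a connection to MySQL and use it in place of the con above; the effect is the same.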
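　　A minimal sketch of the round trip described above, using an in-memory SQLite database. Only the table name weather_2012 comes from the text; the sample rows and the column names id, date_time and temp are made up for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database, so the sketch leaves no file behind.
con = sqlite3.connect(":memory:")

# Hypothetical sample data standing in for the real weather table.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "date_time": ["2012-01-01 00:00", "2012-01-01 01:00", "2012-01-01 02:00"],
    "temp": [-1.8, -1.8, -1.5],
})

# Write: drop any existing table first, then save df.
con.execute("DROP TABLE IF EXISTS weather_2012")
df.to_sql("weather_2012", con, index=False)

# Read back: plain, with one index column, and with a two-level index.
query = "SELECT * FROM weather_2012"
plain = pd.read_sql(query, con)
by_id = pd.read_sql(query, con, index_col="id")
by_both = pd.read_sql(query, con, index_col=["id", "date_time"])

print(by_both)
con.close()
```

　　Passing if_exists="replace" to to_sql would make the explicit DROP TABLE unnecessary.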

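Supplement:
　　(1) Reading a query result into a DataFrame
　　import pandas as pd
　　import pymysql
　　conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', passwd='123456', db='db1')
　　cursor = conn.cursor()
　　# cursor.execute("DROP TABLE IF EXISTS test")  # DDL statements have to go through a cursor
　　sql = "select * from user"
　　df = pd.read_sql(sql, conn)  # read_sql already returns a DataFrame
　　aa = pd.DataFrame(df)        # redundant conversion, kept from the original example
　　print(aa)
　　(2) Storing
　　# pd.io.sql.write_frame(df, "user_copy", conn)  # unusable: write_frame has been removed from pandas
　　# On the old pandas used here, flavor='mysql' was mandatory. The flavor argument has since been removed; current versions write to MySQL through an SQLAlchemy engine, as in (7).
　　pd.io.sql.to_sql(df, "user_copy", conn, flavor='mysql', if_exists='replace')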
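　　#!/usr/bin/env python
　　# -*- coding:utf-8 -*-
　　import pandas as pd
　　import pymysql
　　conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', passwd='123456', db='db1')
　　cursor = conn.cursor()
　　# cursor.execute("DROP TABLE IF EXISTS user_copy")  # DDL statements have to go through a cursor
　　sql = "select * from user"
　　df = pd.read_sql(sql, conn, chunksize=2)  # with chunksize, read_sql returns an iterator of DataFrames
　　for piece in df:
　　    aa = pd.DataFrame(piece)
　　    # pd.io.sql.write_frame(df, "user_copy", conn)  # unusable: write_frame has been removed
　　    # flavor='mysql' was mandatory on the old pandas used here.
　　    # if_exists='replace' recreates the table for every chunk, so only the last chunk survives; (5) uses 'append'.
　　    pd.io.sql.to_sql(piece, "user_copy", conn, flavor='mysql', if_exists='replace')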
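　　(3) Adding a column based on a condition
　　piece['xb'] = list(map(lambda x: '男' if x == '123' else '女', piece['pwd']))  # '男' = male, '女' = female
　　(4) If the data contains Chinese characters, the connection must declare the character set: charset="utf8"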
　　(5) Final code (read the data in chunks and add a new column derived from an existing one)
　　#!/usr/bin/env python
　　# -*- coding:utf-8 -*-
　　import pandas as pd
　　import pymysql
　　conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', passwd='123456', db='db1', charset="utf8")
　　cursor = conn.cursor()
　　# cursor.execute("DROP TABLE IF EXISTS user_copy")  # DDL statements have to go through a cursor
　　sql = "select * from user"
　　df = pd.read_sql(sql, conn, chunksize=2)
　　for piece in df:
　　    # pd.io.sql.write_frame(df, "user_copy", conn)  # unusable: write_frame has been removed
　　    piece['xb'] = list(map(lambda x: '男' if x == '123' else '女', piece['pwd']))
　　    print(piece)
　　    # flavor='mysql' was mandatory on the old pandas used here; 'append' keeps every chunk.
　　    pd.io.sql.to_sql(piece, "user_copy", conn, flavor='mysql', if_exists='append')
　　(7) Connecting through SQLAlchemy. When the data contains Chinese, put the charset in the URL, e.g. create_engine("mysql+pymysql://root:123456@127.0.0.1:3306/jd?charset=utf8", max_overflow=5)
　　# Connect with SQLAlchemy
　　import pandas as pd
　　from sqlalchemy import create_engine
　　engine = create_engine("mysql+pymysql://root:123456@127.0.0.1:3306/db1?charset=utf8")
　　sql = "select * from user"
　　df = pd.read_sql(sql, engine, chunksize=2)
　　for piece in df:
　　    print(piece)
　　    # With an engine no flavor argument is needed.
　　    piece.to_sql("user_copy", engine, if_exists='append')
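　　The chunked read, derived column and append steps can be tried without a MySQL server. The sketch below is an assumption-laden stand-in: two in-memory SQLite connections play the roles of source and destination, and the three sample rows are invented. The table names user and user_copy and the columns pwd and xb follow the text.

```python
import sqlite3

import pandas as pd

src = sqlite3.connect(":memory:")  # stands in for the MySQL source
dst = sqlite3.connect(":memory:")  # stands in for the destination

# Hypothetical sample rows.
pd.DataFrame({
    "name": ["a", "b", "c"],
    "pwd": ["123", "456", "123"],
}).to_sql("user", src, index=False)

# Read two rows at a time, derive the new column, append to the copy.
for piece in pd.read_sql('SELECT * FROM "user"', src, chunksize=2):
    piece["xb"] = piece["pwd"].map(lambda p: "男" if p == "123" else "女")
    piece.to_sql("user_copy", dst, if_exists="append", index=False)

copied = pd.read_sql("SELECT * FROM user_copy", dst)
print(copied)
```

　　Because every chunk is appended, the copy ends up with all rows; with if_exists="replace" only the last chunk would remain.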
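　　Selecting data in pandas: iloc and loc behave differently. iloc selects by integer position and excludes the end of a slice; loc selects by index label and includes it.
　　>>> import pandas as pd
　　>>> import os
　　>>> os.chdir("D:\\")
　　>>> d = pd.read_csv("GWAS_water.qassoc", delimiter=r"\s+")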
　　>>> d.loc[1:3]
　　CHR SNP BP  NMISS BETA    SE    R2    T    P
　　1 1 .  447    44  0.1800  0.1783  0.02369  1.009  0.3185
　　2 1 .  449    44  0.2785  0.2473  0.02931  1.126  0.2665
　　3 1 .  452    44  0.1800  0.1783  0.02369  1.009  0.3185
　　>>> d.loc[0:3]
　　CHR SNP BP  NMISS BETA    SE    R2    T    P
　　0 1 .  410    44  0.2157  0.1772  0.03406  1.217  0.2304
　　1 1 .  447    44  0.1800  0.1783  0.02369  1.009  0.3185
　　2 1 .  449    44  0.2785  0.2473  0.02931  1.126  0.2665
　　3 1 .  452    44  0.1800  0.1783  0.02369  1.009  0.3185
　　>>> d.iloc[0:3]
　　CHR SNP BP  NMISS BETA    SE    R2    T    P
　　0 1 .  410    44  0.2157  0.1772  0.03406  1.217  0.2304
　　1 1 .  447    44  0.1800  0.1783  0.02369  1.009  0.3185
　　2 1 .  449    44  0.2785  0.2473  0.02931  1.126  0.2665
　　>>> d.iloc[1:3,2]
　　1 447
　　2 449
　　Name: BP, dtype: int64
　　>>> d.iloc[0:3,2]
　　0 410
　　1 447
　　2 449
　　Name: BP, dtype: int64
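　　The difference between the two slices above is easy to reproduce on a small frame. This sketch reuses the first five BP values from the output; the relabelled index (10 to 50) is an invented addition that makes the label/position distinction visible.

```python
import pandas as pd

d = pd.DataFrame({"BP": [410, 447, 449, 452, 462]})

by_label = d.loc[1:3]      # labels 1, 2 and 3: the end label is included
by_position = d.iloc[1:3]  # positions 1 and 2: the end position is excluded

# With a non-default index the two are clearly different things.
e = d.set_index(pd.Index([10, 20, 30, 40, 50]))
first_two = e.iloc[0:2]    # the first two rows, whatever their labels
up_to_20 = e.loc[10:20]    # rows labelled 10 through 20

print(by_label)
print(by_position)
```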
　　>>> d.head()
　　CHR SNP BP  NMISS BETA    SE    R2    T    P
　　0 1 .  410    44  0.2157  0.1772  0.03406  1.2170  0.2304
　　1 1 .  447    44  0.1800  0.1783  0.02369  1.0090  0.3185
　　2 1 .  449    44  0.2785  0.2473  0.02931  1.1260  0.2665
　　3 1 .  452    44  0.1800  0.1783  0.02369  1.0090  0.3185
　　4 1 .  462    44  0.2548  0.2744  0.02012  0.9286  0.3584
　　>>> d.tail(3)
　　CHR SNP       BP  NMISS BETA    SE    R2    T    P
　　418704 12 .  19345588    44 -0.2207  0.2558  0.01743 -0.8631  0.393
　　418705 12 .  19345598    44 -0.2207  0.2558  0.01743 -0.8631  0.393
　　418706 12 .  19345611    44 -0.2207  0.2558  0.01743 -0.8631  0.393
　　>>> d.describe()
　　CHR          BP    NMISS       BETA          SE  \
　　count  418707.000000  4.187070e+05  418707.0  4.186820e+05  418682.00000
　　mean       5.805738  1.442822e+07    44.0 -4.271777e-03    0.21433
　　std       3.392930  8.933882e+06    0.0  2.330019e-01    0.05190
　　min       1.000000  4.100000e+02    44.0 -1.610000e+00    0.10130
　　25%       3.000000  7.345860e+06    44.0 -1.638000e-01    0.17320
　　50%       5.000000  1.371612e+07    44.0 -1.826000e-16    0.20670
　　75%       9.000000  2.051322e+07    44.0  1.391000e-01    0.25010
　　max       12.000000  4.238896e+07    44.0  1.467000e+00    0.67580
　　R2          T          P
　　count  418682.000000  4.186820e+05  4.186820e+05
　　mean       0.026268 -1.910774e-02  4.772397e-01
　　std       0.035903  1.095115e+00  2.944290e-01
　　min       0.000000 -5.582000e+00  2.034000e-08
　　25%       0.002969 -7.955000e-01  2.179000e-01
　　50%       0.012930 -8.468000e-16  4.624000e-01
　　75%       0.035910  6.712000e-01  7.254000e-01
　　max       0.531200  6.898000e+00  1.000000e+00
　　>>> d.sort_values(by="P").iloc[0:15]
　　CHR SNP       BP  NMISS BETA    SE    R2    T          P
　　42870    1 .  32316680    44  1.1870  0.1721  0.5312  6.898  2.034000e-08
　　29301    1 .  22184568    44  1.1870  0.1721  0.5312  6.898  2.034000e-08
　　29302    1 .  22184590    44  1.1870  0.1721  0.5312  6.898  2.034000e-08
　　29306    1 .  22184654    44  1.1870  0.1721  0.5312  6.898  2.034000e-08
　　29305    1 .  22184628    44  1.1870  0.1721  0.5312  6.898  2.034000e-08
　　29304    1 .  22184624    44  1.1870  0.1721  0.5312  6.898  2.034000e-08
　　112212 3 .  14365699    44  1.4670  0.2255  0.5018  6.504  7.490000e-08
　　29254    1 .  22167448    44  1.0780  0.1723  0.4822  6.254  1.713000e-07
　　69291    2 . 9480651    44  1.1140  0.1829  0.4690  6.091  2.939000e-07
　　29299    1 .  22180991    44  0.8527  0.1458  0.4488  5.848  6.574000e-07
　　101391 3 . 6959715    44  0.6782  0.1166  0.4462  5.817  7.285000e-07
　　29333    1 .  22198267    44  0.9252  0.1616  0.4383  5.724  9.888000e-07
　　195513 5 .  20178388    44  1.0350  0.1817  0.4359  5.697  1.082000e-06
　　29295    1 .  22180901    44  0.7469  0.1320  0.4324  5.657  1.236000e-06
　　29300    1 .  22181119    44  0.7469  0.1320  0.4324  5.657  1.236000e-06
　　>>> sort_D = d.sort_values(by="P").iloc[0:5]
　　>>> m_D = d.dropna()          # remove rows that contain NA
　　>>> sort_C = d.sort_values(["P", "CHR", "BP"])
　　>>> sort_C.to_csv(file_name, sep='\t', encoding='utf-8')  # file_name: path of the output file
　　>>> d.sort_values(by="CHR", ascending=True)
　　>>> sort_D.to_csv("result.txt", sep=" ")
　　>>> sort_D.to_csv("result_no_index.txt", sep=" ", index=False)
　　Reference
　　for m, i in enumerate(range(1, 10)):
　　    for n, j in enumerate(range(m + 1, 10)):
　　        print(i * j)
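　　Installation:
　　pip install pandas
　　Import:
　　import pandas as pd
　　from pandas import Series, DataFrame
　　# Series
　　Data types: Series, DataFrame
　　Series: similar to a one-dimensional NumPy array, with an index attached
　　Initialization:
　　Method 1:
　　data = [1, 2, 3, 4, 5]  # usually a sequence
　　series_data = Series(data)  # with no index argument, the index defaults to 0, 1, 2, ...
　　Method 2:
　　indexes = ['name', 'shuxue', 'yuwen', 'huaxue', 'yingyu']
　　series_data = Series(['lizhen', 1, 2, 3, 4], index=indexes)  # the given labels become the index; index and values must have the same length
　　Method 3:
　　data = {'huaxue': 3, 'name': 'lizhen', 'shuxue': 1, 'yingyu': 4, 'yuwen': 2}
　　series_from_dict = Series(data)
　　View the index: series_data.index
　　Change a value by index label: series_data['shuxue'] = 3
　　View all values: series_data.values
　　Name the index: series_data.index.name = 'type'
　　Look up a value by index label: series_data['yuwen']
　　Get the values for several labels: series_data[['yingyu', 'yuwen']]
　　Export to another format (dict, clipboard, csv, json, string, sql):
　　series_from_dict.to_dict()
　　Adding two Series:
　　Values are aligned by index label before they are added; a label that appears in only one of the two Series gives NaN
　　The result is only meaningful for numeric values
　　Check whether an index label exists:
　　index_name in series_data  # returns True or False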
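　　A short sketch of the alignment rule for Series addition. The labels x, y, z and the values are arbitrary; add(..., fill_value=0) is shown as one way to avoid the NaN when a label is missing on one side.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["y", "z"])

total = a + b                    # "x" exists only in a, so total["x"] is NaN
filled = a.add(b, fill_value=0)  # treat a missing label as 0 instead

has_x = "x" in a                 # membership tests look at the index labels
has_w = "w" in a

print(total)
print(filled)
```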
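　　# DataFrame: similar to a table or spreadsheet
　　Initialize it with a dict of equal-length lists or NumPy arrays; an index is added automatically. Older pandas versions sorted the columns by name; current versions keep the dict's insertion order.
　　Method 1:
　　data = {'state': ['Ohio', 'Ohio', 'Ohio'],
　　        'year': [2000, 2001, 2002],
　　        'pop': [1.5, 1.7, 3.6]
　　        }
　　frame = DataFrame(data)
　　Method 2:
　　data = {'state': ['Ohio', 'Ohio', 'Ohio'],
　　        'year': [2000, 2001, 2002],
　　        'pop': [1.5, 1.7, 3.6]
　　        }
　　frame = DataFrame(data, columns=['year', 'state', 'pop', 'debt'], index=['one', 'two', 'three'])
　　# columns are shown in the order given by columns
　　# a requested column that is not in the data is filled with NaN
　　Method 3:
　　data = {'Nevada': {2001: 2.4, 2002: 2.9},
　　        'Ohio': {2000: 1.5, 2001: 1.7, 2002: 2.4},
　　        }
　　frame = DataFrame(data)
　　# outer keys become column names and inner keys become index labels; where an inner key is missing, that column gets NaN
　　Name the index: frame.index.name = 'self_index_name'
　　Name the columns: frame.columns.name = 'self_columns_name'
　　View all values: frame.values
　　View all column names: frame.columns
　　View one column: frame[column_name] or frame.column_name
　　View the first n rows: frame.head(n)
　　View the last n rows: frame.tail(n)
　　View rows by index label: frame.loc[[index_name1, index_name2]] (the older frame.ix was removed in pandas 1.0)
　　Change the values of a column: frame['column_name'] = 'new_value'
　　Note: a single value is broadcast to every row
　　A list of values must be as long as the frame has rows
　　The value may be a Series; it is aligned on the frame's index, and rows whose label is missing from the Series get NaN
　　Delete a column you do not need: del frame['column_name']
　　Note: an Index is immutable, so individual labels cannot be assigned in place; replace the whole index (frame.index = ...) or use frame.rename() instead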
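　　The column-assignment rules can be checked with the frame from Method 2. In this sketch the debt values and the extra source column are invented for illustration.

```python
import pandas as pd

data = {
    "state": ["Ohio", "Ohio", "Ohio"],
    "year": [2000, 2001, 2002],
    "pop": [1.5, 1.7, 3.6],
}
frame = pd.DataFrame(
    data,
    columns=["year", "state", "pop", "debt"],
    index=["one", "two", "three"],
)

# A Series is aligned on the index: row "two" has no value and becomes NaN.
frame["debt"] = pd.Series([1.2, 1.5], index=["one", "three"])

# A scalar is broadcast to every row (hypothetical extra column).
frame["source"] = "census"

# Rows by label.
picked = frame.loc[["one", "three"]]

# Columns are removed with del.
del frame["source"]

print(frame)
```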
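　　When working with a pandas DataFrame you often need string operations, for example checking whether a column contains certain keywords or whether its strings are shorter than 3 characters. The built-in methods of the .str accessor make this much easier.
　　Here is a walk through the methods that Series.str provides.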
　　1. cat(): concatenate strings
　　Example:
　　>>> Series(['a', 'b', 'c']).str.cat(['A', 'B', 'C'], sep=',')
　　0 a,A
　　1 b,B
　　2 c,C
　　dtype: object
　　>>> Series(['a', 'b', 'c']).str.cat(sep=',')
　　'a,b,c'
　　>>> Series(['a', 'b']).str.cat([['x', 'y'], ['1', '2']], sep=',')
　　0 a,x,1
　　1 b,y,2
　　dtype: object
　　2. split(): split strings (n limits the number of splits; -1 or 0 means no limit)
　　>>> import numpy, pandas
　　>>> s = pandas.Series(['a_b_c', 'c_d_e', numpy.nan, 'f_g_h'])
　　>>> s.str.split('_')
　　0 [a, b, c]
　　1 [c, d, e]
　　2       NaN
　　3 [f, g, h]
　　dtype: object
　　>>> s.str.split('_', n=-1)
　　0 [a, b, c]
　　1 [c, d, e]
　　2       NaN
　　3 [f, g, h]
　　dtype: object
　　>>> s.str.split('_', n=0)
　　0 [a, b, c]
　　1 [c, d, e]
　　2       NaN
　　3 [f, g, h]
　　dtype: object
　　>>> s.str.split('_', n=1)
　　0 [a, b_c]
　　1 [c, d_e]
　　2       NaN
　　3 [f, g_h]
　　dtype: object
　　>>> s.str.split('_', n=2)
　　0 [a, b, c]
　　1 [c, d, e]
　　2       NaN
　　3 [f, g, h]
　　dtype: object
　　>>> s.str.split('_', n=3)
　　0 [a, b, c]
　　1 [c, d, e]
　　2       NaN
　　3 [f, g, h]
　　dtype: object
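　　A brief sketch of two split options on the same Series. n is passed by keyword here, and expand=True, which is not used in the original transcript, is shown as a way to get one column per piece.

```python
import numpy as np
import pandas as pd

s = pd.Series(["a_b_c", "c_d_e", np.nan, "f_g_h"])

limited = s.str.split("_", n=1)       # at most one split per string
table = s.str.split("_", expand=True)  # one DataFrame column per piece

print(limited)
print(table)
```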
　　3. get(): get the character at the given position
　　>>> s.str.get(0)
　　0    a
　　1    c
　　2 NaN
　　3    f
　　dtype: object
　　>>> s.str.get(1)
　　0    _
　　1    _
　　2 NaN
　　3    _
　　dtype: object
　　>>> s.str.get(2)
　　0    b
　　1    d
　　2 NaN
　　3    g
　　dtype: object
　　4. join(): join the characters of each string with the given separator; rarely used
　　>>> s.str.join("!")
　　0 a!_!b!_!c
　　1 c!_!d!_!e
　　2       NaN
　　3 f!_!g!_!h
　　dtype: object
　　>>> s.str.join("?")
　　0 a?_?b?_?c
　　1 c?_?d?_?e
　　2       NaN
　　3 f?_?g?_?h
　　dtype: object
　　>>> s.str.join(".")
　　0 a._.b._.c
　　1 c._.d._.e
　　2       NaN
　　3 f._.g._.h
　　dtype: object
　　5. contains(): whether each string contains the pattern
　　>>> s.str.contains('d')
　　0 False
　　1    True
　　2    NaN
　　3 False
　　dtype: object
　　6. replace(): replace
　　>>> s.str.replace("_", ".")
　　0 a.b.c
　　1 c.d.e
　　2    NaN
　　3 f.g.h
　　dtype: object
　　7. repeat(): repeat
　　>>> s.str.repeat(3)
　　0 a_b_ca_b_ca_b_c
　　1 c_d_ec_d_ec_d_e
　　2             NaN
　　3 f_g_hf_g_hf_g_h
　　dtype: object
　　8. pad(): pad on the left or right (left by default)
　　>>> s.str.pad(10, fillchar="?")
　　0 ?????a_b_c
　　1 ?????c_d_e
　　2          NaN
　　3 ?????f_g_h
　　dtype: object
　　>>>
　　>>> s.str.pad(10, side="right", fillchar="?")
　　0 a_b_c?????
　　1 c_d_e?????
　　2          NaN
　　3 f_g_h?????
　　dtype: object
　　9. center(): pad on both sides so the string is centered; see the example
　　>>> s.str.center(10, fillchar="?")
　　0 ??a_b_c???
　　1 ??c_d_e???
　　2          NaN
　　3 ??f_g_h???
　　dtype: object
　　10. ljust(): left-justify, padding on the right; see the example
　　>>> s.str.ljust(10, fillchar="?")
　　0 a_b_c?????
　　1 c_d_e?????
　　2          NaN
　　3 f_g_h?????
　　dtype: object
　　11. rjust(): right-justify, padding on the left; see the example
　　>>> s.str.rjust(10, fillchar="?")
　　0 ?????a_b_c
　　1 ?????c_d_e
　　2          NaN
　　3 ?????f_g_h
　　dtype: object
　　12. zfill(): pad with zeros on the left
　　>>> s.str.zfill(10)
　　0 00000a_b_c
　　1 00000c_d_e
　　2          NaN
　　3 00000f_g_h
　　dtype: object
　　13. wrap(): insert line breaks so that no line is longer than the given width
　　>>> s.str.wrap(3)
　　0 a_b\n_c
　　1 c_d\n_e
　　2       NaN
　　3 f_g\n_h
　　dtype: object
　　14. slice(): cut each string at the given start and stop positions
　　>>> s.str.slice(1,3)
　　0    _b
　　1    _d
　　2 NaN
　　3    _g
　　dtype: object
　　15. slice_replace(): replace the characters between the given positions with the given string
　　>>> s.str.slice_replace(1, 3, "?")
　　0 a?_c
　　1 c?_e
　　2    NaN
　　3 f?_h
　　dtype: object
　　>>> s.str.slice_replace(1, 3, "??")
　　0 a??_c
　　1 c??_e
　　2    NaN
　　3 f??_h
　　dtype: object
　　16. count(): count how many times the given pattern occurs in each string
　　>>> s.str.count("a")
　　0    1
　　1    0
　　2 NaN
　　3    0
　　dtype: float64
　　17. startswith(): whether each string starts with the given string
　　>>> s.str.startswith("a")
　　0    True
　　1 False
　　2    NaN
　　3 False
　　dtype: object
　　18. endswith(): whether each string ends with the given string
　　>>> s.str.endswith("e")
　　0 False
　　1    True
　　2    NaN
　　3 False
　　dtype: object
　　19. findall(): find every match of the regular expression and return them as a list
　　>>> s.str.findall("[a-z]")
　　0 [a, b, c]
　　1 [c, d, e]
　　2       NaN
　　3 [f, g, h]
　　dtype: object
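　　20. match(): whether the given string or regular expression matches at the start of each string (use fullmatch() to require that the whole string matches)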
　　>>> s
　　0 a_b_c
　　1 c_d_e
　　2    NaN
　　3 f_g_h
　　dtype: object
　　>>> s.str.match("[d-z]")
　　0 False
　　1 False
　　2    NaN
　　3    True
　　dtype: object
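　　21. extract(): pull out the matching text; put parentheses around the part you want to capture (expand=False returns a Series for a single group)
　　>>> s.str.extract("([d-z])", expand=False)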
　　0 NaN
　　1    d
　　2 NaN
　　3    f
　　dtype: object
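　　The pattern methods differ mainly in where the match has to occur. The sketch below puts them side by side on the same Series; fullmatch() and the pattern "[a-z](_[a-z])*" are additions for comparison and assume a pandas version that provides Series.str.fullmatch.

```python
import numpy as np
import pandas as pd

s = pd.Series(["a_b_c", "c_d_e", np.nan, "f_g_h"])

at_start = s.str.match("[d-z]")              # must match at position 0
anywhere = s.str.contains("[d-z]")           # may match anywhere
whole = s.str.fullmatch("[a-z](_[a-z])*")    # must cover the whole string
first_hit = s.str.extract("([d-z])", expand=False)  # first captured match

print(pd.DataFrame({
    "match": at_start,
    "contains": anywhere,
    "fullmatch": whole,
    "extract": first_hit,
}))
```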
　　22. len(): length of each string
　　>>> s.str.len()
　　0    5
　　1    5
　　2 NaN
　　3    5
　　dtype: float64
　　23. strip(): remove leading and trailing whitespace
　　>>> Series([' jack', 'jill ', ' jesse ', 'frank']).str.strip()
　　0    jack
　　1    jill
　　2 jesse
　　3 frank
　　dtype: object
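　　24. rstrip(): remove trailing whitespace
　　25. lstrip(): remove leading whitespace
　　26. partition(): split each string into a DataFrame of exactly three parts: the text before the separator, the separator itself, and the text after it
　　27. rpartition(): the same, splitting at the last occurrence of the separator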
　　>>> s.str.partition('_')
　　0 1 2
　　0 a _  b_c
　　1 c _  d_e
　　2  NaN  NaN  NaN
　　3 f _  g_h
　　>>> s.str.rpartition('_')
　　0 1 2
　　0  a_b _ c
　　1  c_d _ e
　　2  NaN  NaN  NaN
　　3  f_g _ h
　　28. lower(): convert to lowercase
　　29. upper(): convert to uppercase
　　30. find(): position of the first occurrence of the given string, or -1 if it is absent
　　>>> s.str.find('d')
　　0 -1
　　1    2
　　2 NaN
　　3 -1
　　dtype: float64
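　　31. rfind(): position of the last occurrence of the given string, or -1 if it is absent
　　32. index(): like find(), but raises an error if the string is not present
　　33. rindex(): like rfind(), but raises an error if the string is not present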
　　>>> s.str.index('_')
　　0    1
　　1    1
　　2 NaN
　　3    1
　　dtype: float64
　　34. capitalize(): uppercase the first character
　　>>> s.str.capitalize()
　　0 A_b_c
　　1 C_d_e
　　2    NaN
　　3 F_g_h
　　dtype: object
　　35. swapcase(): swap upper and lower case
　　>>> s.str.swapcase()
　　0 A_B_C
　　1 C_D_E
　　2    NaN
　　3 F_G_H
　　dtype: object
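　　36. normalize(): return the Unicode normal form of each string; rarely needed in data analysis, so it is not covered here
　　37. isalnum(): whether every character is a letter or a digit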
　　>>> s.str.isalnum()
　　0 False
　　1 False
　　2    NaN
　　3 False
　　dtype: object
　　38. isalpha(): whether every character is a letter
　　>>> s.str.isalpha()
　　0 False
　　1 False
　　2    NaN
　　3 False
　　dtype: object
　　39. isdigit(): whether every character is a digit
　　>>> s.str.isdigit()
　　0 False
　　1 False
　　2    NaN
　　3 False
　　dtype: object
　　40. isspace(): whether every character is whitespace
　　>>> s.str.isspace()
　　0 False
　　1 False
　　2    NaN
　　3 False
　　dtype: object
　　41. islower(): whether all cased characters are lowercase
　　42. isupper(): whether all cased characters are uppercase
　　>>> s.str.islower()
　　0 True
　　1 True
　　2    NaN
　　3 True
　　dtype: object
　　>>> s.str.isupper()
　　0 False
　　1 False
　　2    NaN
　　3 False
　　dtype: object
　　43. istitle(): whether the string is titlecased, i.e. each word starts with an uppercase letter followed by lowercase letters
　　>>> s.str.istitle()
　　0 False
　　1 False
　　2    NaN
　　3 False
　　dtype: object
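　　44. isnumeric(): whether every character is numeric (includes fractions and other numeric characters)
　　45. isdecimal(): whether every character is a decimal digit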
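　　isdigit(), isdecimal() and isnumeric() agree on plain ASCII digits and differ on other Unicode characters. The sample strings below (a superscript two and a vulgar fraction) are chosen for illustration and do not come from the original post.

```python
import pandas as pd

s = pd.Series(["123", "\u00b2", "\u00bd", "abc"])  # "123", "²", "½", "abc"

decimal = list(s.str.isdecimal())  # only characters usable in base-10 numbers
digit = list(s.str.isdigit())      # also superscript digits such as "²"
numeric = list(s.str.isnumeric())  # also fractions such as "½"

print(decimal, digit, numeric)
```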
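　　Getting column data is one of the most common pandas operations, but the syntax has a few pitfalls. A summary:
　　import pandas as pd
　　data1 = pd.DataFrame(...)  # any DataFrame with 3 columns
　　data1.columns = ['a', 'b', 'c']
　　1.
　　data1['b']
　　# selects the 2nd column (column b)
　　2.
　　data1.b
　　# same as 1: selects the 2nd column (column b)
　　# b is the column name here; it must be a valid identifier without spaces. If the column name contains spaces, only form 1 works
　　3.
　　data1[data1.columns[1:]]
　　# selects all data in the 2nd and 3rd columns
　　Aside 1.
　　data1[5:10]
　　# selects the rows at positions 5 to 9 (the 6th through 10th rows), not columns
　　Aside 2.
　　data1[2]
　　# invalid: raises "KeyError: 2", because an integer is looked up as a column name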
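　　The cases above can be run against a small concrete frame. The 3x3 values are arbitrary; iloc is added as the way to pick a column by position, which is what data1[2] was presumably meant to do.

```python
import pandas as pd

data1 = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["a", "b", "c"])

col_b = data1["b"]                   # by name
same_b = data1.b                     # attribute access, identifier-like names only
last_two = data1[data1.columns[1:]]  # columns b and c
rows = data1[0:2]                    # a slice selects ROWS by position
third_col = data1.iloc[:, 2]         # a column by position

# An integer key is looked up as a column name, so this fails.
try:
    data1[2]
    missing = False
except KeyError:
    missing = True

print(last_two)
```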
　　Export MySQL data, build an Excel file with pandas, and send it by email
　　#!/usr/bin/env python
　　# -*- coding: utf-8 -*-
　　import datetime
　　import smtplib
　　from email.mime.application import MIMEApplication
　　from email.mime.multipart import MIMEMultipart
　　from email.mime.text import MIMEText
　　import MySQLdb
　　import MySQLdb.cursors
　　import pandas as pd
　　# Run a SQL statement and return the rows
　　def retsql(sql):
　　    # Arguments: host, user name, password, database name (optional).
　　    # DictCursor returns every row as a dict.
　　    db_user = MySQLdb.connect('IP', 'user name', 'password', 'database name',
　　                              cursorclass=MySQLdb.cursors.DictCursor)
　　    cursor = db_user.cursor()
　　    # Force utf-8; otherwise the result can come back garbled even when the database itself is utf-8.
　　    cursor.execute("SET NAMES utf8;")
　　    cursor.execute(sql)
　　    ret = cursor.fetchall()
　　    db_user.close()
　　    return ret
　　# Build the xlsx file
　　def retxls(ret, dt):
　　    file_name = datetime.datetime.now().strftime("/path/to/store/%Y-%m-%d-%H:%M") + dt + ".sql.xlsx"
　　    dret = pd.DataFrame.from_records(ret)
　　    # Author's note: openpyxl could fail while writing the file; pinning it with
　　    # "pip install openpyxl==1.8.6" avoided a conflict with the pandas version in use at the time.
　　    dret.to_excel(file_name, sheet_name="Sheet1", engine="openpyxl")
　　    print("Ok!!! the file in", file_name)
　　    return file_name
　　# Send the mail
　　# Arguments: subject, body text, recipient list, attachment path
　　def sendm(sub, cttstr, to_list, file):
　　    msg = MIMEMultipart()
　　    with open(file, 'rb') as fh:
　　        att = MIMEApplication(fh.read())  # binary attachment, base64-encoded, application/octet-stream
　　    att["Content-Disposition"] = 'attachment; filename="sql_query_result.xlsx"'
　　    msg['from'] = 'sender address'
　　    msg['subject'] = sub
　　    ctt = MIMEText(cttstr, 'plain', 'utf-8')
　　    msg.attach(att)
　　    msg.attach(ctt)
　　    try:
　　        server = smtplib.SMTP()
　　        # server.set_debuglevel(1)  # enable this when something goes wrong
　　        server.connect("mail.example.com", 25)
　　        server.starttls()  # only when the server uses SSL/TLS
　　        server.login("mail user name", "password")
　　        server.sendmail(msg['from'], to_list, msg.as_string())
　　        server.quit()
　　        print('ok!!!')
　　    except Exception as e:
　　        print(str(e))
　　# The SQL statement to run
　　sql = """SQL statement"""
　　# Recipients
　　to_list = ['test1@example.com',
　　           'test2@example.com']
　　# Run the SQL and keep the rows in ret
　　ret = retsql(sql)
　　# Keep the path of the generated file in retfile
　　retfile = retxls(ret, "1")
　　# Send the mail, using the SQL text as subject and body
　　sendm(sql, sql, to_list, retfile)
　　Installing and using ipython, notebook and matplotlib with Python
　　The steps below install and configure each component in turn:
　　python 3.5.2, ipython 5.1.0, jupyter notebook, matplotlib
　　1. Install Python 3.5
　　See the official documentation for details. During installation, tick the option that configures the environment variables. https://www.python.org/downloads/windows/
　　2. Upgrade pip
　　python -m pip install --upgrade pip
　　3. Install ipython with pip
　　pip.exe install ipython
　　4. Install notebook with pip
　　pip install notebook
　　5. Install the plotting library matplotlib
　　pip install matplotlib
　　pip install matplotlib --upgrade
　　6. Example
　　import numpy as np
　　import matplotlib.pyplot as plt
　　N = 5
　　menMeans = (20, 35, 30, 35, 27)
　　menStd = (2, 3, 4, 1, 2)
　　ind = np.arange(N)  # the x locations for the groups
　　width = 0.35        # the width of the bars
　　fig, ax = plt.subplots()
　　rects1 = ax.bar(ind, menMeans, width, yerr=menStd)
　　womenMeans = (25, 32, 34, 20, 25)
　　womenStd = (3, 5, 2, 3, 3)
　　rects2 = ax.bar(ind+width, womenMeans, width, yerr=womenStd)
　　# add some text for labels, title and axes ticks
　　ax.set_ylabel('Scores')
　　ax.set_title('Scores by group and gender')
　　ax.set_xticks(ind+width)
　　ax.set_xticklabels( ('G1', 'G2', 'G3', 'G4', 'G5') )
　　ax.legend( (rects1[0], rects2[0]), ('Men', 'Women') )
　　def autolabel(rects):
　　    # attach some text labels
　　    for rect in rects:
　　        height = rect.get_height()
　　        ax.text(rect.get_x()+rect.get_width()/2., 1.05*height, '%d'%int(height),
　　                ha='center', va='bottom')
　　autolabel(rects1)
　　autolabel(rects2)
　　plt.show()
　　import numpy as np
　　import matplotlib.pyplot as plt
　　x = np.arange(9)
　　y = np.sin(x)
　　plt.plot(x,y)
　　plt.show()
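　　import matplotlib.pyplot as plt
　　plt.bar(x = 0,height = 1)
　　plt.show()
　　First we import matplotlib.pyplot, then call its bar function directly, and finally display the figure with show.
　　Two parameters of bar need explaining:
　　x (named left before matplotlib 2.0): the x position of the bar. With align="edge" it is the left edge of the bar, so a value of 1 puts the left edge at x = 1.0
　　height: the height of the bar, i.e. its value on the Y axis
　　Both x and height accept a single value (one bar) or a tuple (several bars). For example:
　　import matplotlib.pyplot as plt
　　plt.bar(x = (0,1),height = (1,0.5))
　　plt.show()
　　Here x = (0,1) means there are two bars, the first positioned at 0 and the second at 1. The height parameter works the same way.
　　You may find these two bars too "fat". The width parameter of bar sets their width.
　　import matplotlib.pyplot as plt
　　plt.bar(x = (0,1),height = (1,0.5),width = 0.35)
　　plt.show()
　　Next requirement: label the x and y axes, say gender on the x axis and head count on the y axis. That is simple too:
　　import matplotlib.pyplot as plt
　　plt.xlabel(u'性别')  # "gender"
　　plt.ylabel(u'人数')  # "count"
　　plt.bar(x = (0,1),height = (1,0.5),width = 0.35)
　　plt.show()
　　Note that under Python 2.7, which I use, Chinese text must be written with the u prefix because matplotlib only accepts unicode; under Python 3 every str is already unicode and the prefix is optional. Next, let us label each bar on the x axis: the first is "男" (male), the second "女" (female).
　　import matplotlib.pyplot as plt
　　plt.xlabel(u'性别')
　　plt.ylabel(u'人数')
　　plt.xticks((0,1),(u'男',u'女'))
　　plt.bar(x = (0,1),height = (1,0.5),width = 0.35)
　　plt.show()
　　plt.xticks is used much like x and height above: with n bars you pass tuples of length n. The first tuple holds the tick positions, the second the label text. There is one problem on matplotlib versions before 2.0, where bars are aligned by their left edge by default: the labels end up offset, while ideally they would sit under the middle of each bar. You could change (0,1) to (0+0.35/2, 1+0.35/2), but that is tedious. Passing align="center" to bar (the default since matplotlib 2.0) centers each bar on its position, so the labels line up.
　　import matplotlib.pyplot as plt
　　plt.xlabel(u'性别')
　　plt.ylabel(u'人数')
　　plt.xticks((0,1),(u'男',u'女'))
　　plt.bar(x = (0,1),height = (1,0.5),width = 0.35,align="center")
　　plt.show()
　　Next we can give the chart a title, and a legend as well:
　　import matplotlib.pyplot as plt
　　plt.xlabel(u'性别')
　　plt.ylabel(u'人数')
　　plt.title(u"性别比例分析")  # "gender ratio analysis"
　　plt.xticks((0,1),(u'男',u'女'))
　　rect = plt.bar(x = (0,1),height = (1,0.5),width = 0.35,align="center")
　　plt.legend((rect,),(u"图例",))  # "legend"
　　plt.show()
　　Note the arguments of legend: they must be tuples even when there is only one entry, otherwise the legend is displayed incorrectly.
　　Next we can write the exact Y value above each bar. For that we need a general helper:
　　def autolabel(rects):
　　    for rect in rects:
　　        height = rect.get_height()
　　        plt.text(rect.get_x()+rect.get_width()/2., 1.03*height, '%s' % float(height))
　　The arguments of plt.text are the x coordinate, the y coordinate and the text to display. The calling code looks like this:
　　import matplotlib.pyplot as plt
　　def autolabel(rects):
　　    for rect in rects:
　　        height = rect.get_height()
　　        plt.text(rect.get_x()+rect.get_width()/2., 1.03*height, '%s' % float(height))
　　plt.xlabel(u'性别')
　　plt.ylabel(u'人数')
　　plt.title(u"性别比例分析")
　　plt.xticks((0,1),(u'男',u'女'))
　　rect = plt.bar(x = (0,1),height = (1,0.5),width = 0.35,align="center")
　　plt.legend((rect,),(u"图例",))
　　autolabel(rect)
　　plt.show()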
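　　For reference, here is the same chart as a sketch written against the Axes object and rendered off-screen, so it runs without a display. The Agg backend, the English labels and the in-memory PNG buffer are choices made for this sketch, not part of the original walkthrough.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove for an interactive window
import matplotlib.pyplot as plt


def autolabel(ax, rects):
    """Write each bar's height just above the bar."""
    for rect in rects:
        height = rect.get_height()
        ax.text(rect.get_x() + rect.get_width() / 2.0, 1.03 * height,
                "%s" % float(height), ha="center")


fig, ax = plt.subplots()
rects = ax.bar((0, 1), (1, 0.5), width=0.35, align="center")
ax.set_xticks((0, 1))
ax.set_xticklabels(("male", "female"))
ax.set_xlabel("gender")
ax.set_ylabel("count")
ax.set_title("Gender ratio")
autolabel(ax, rects)

# Render to an in-memory PNG instead of opening a window.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```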
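　　Every component of a matplotlib chart corresponds to an object. You can set its properties by calling the object's set_*() methods or the pyplot function setp().
　　Because matplotlib is really an object-oriented plotting library, the properties of these objects can also be read directly.
　　Configuration file
　　Drawing a figure involves configuring many object properties, such as colors, fonts and line styles. When we plotted above we did not set them one by one; most of them simply used matplotlib's defaults.
　　matplotlib stores these defaults in a configuration file named "matplotlibrc"; editing that file changes the default look of your charts. The file is read with rc_params(), which returns a configuration dictionary. When the matplotlib module is loaded it calls rc_params() and stores the result in the rcParams variable, and matplotlib draws with the settings in rcParams. You can modify this dictionary directly, and the changes apply to every plot element created afterwards.
　　Drawing multiple subplots (quick plotting)
　　The commonly used matplotlib classes are nested as Figure -> Axes -> (Line2D, Text, etc.). One Figure can contain several subplots; matplotlib represents each plotting region with an Axes object, which you can think of as a subplot.
　　subplot() quickly creates a chart with several subplots. It is called like this:
　　subplot(numRows, numCols, plotNum)
　　subplot divides the whole plotting area into numRows rows by numCols columns of equal regions and numbers them from left to right, top to bottom; the top-left region is number 1.
　　If numRows, numCols and plotNum are all smaller than 10, they can be abbreviated to a single integer: subplot(323) is the same as subplot(3,2,3).
　　subplot creates an axes object in the region given by plotNum. If the new axes overlaps one created earlier, the earlier one is deleted.
　　subplot() returns the Axes object it creates. You can keep it in a variable, use sca() to make each one the current Axes in turn, and call plot() to draw in it.
　　Drawing multiple figures (quick plotting)
　　To draw several figures at once, pass figure() an integer that identifies the Figure object. If a Figure with that number already exists, no new object is created; it simply becomes the current Figure.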
　　import numpy as np
　　import matplotlib.pyplot as plt
　　plt.figure(1)  # create figure 1
　　plt.figure(2)  # create figure 2
　　ax1 = plt.subplot(211)  # create subplot 1 in figure 2
　　ax2 = plt.subplot(212)  # create subplot 2 in figure 2
　　x = np.linspace(0, 3, 100)
　　for i in range(5):
　　    plt.figure(1)  # select figure 1
　　    plt.plot(x, np.exp(i*x/3))
　　    plt.sca(ax1)   # select subplot 1 of figure 2
　　    plt.plot(x, np.sin(i*x))
　　    plt.sca(ax2)   # select subplot 2 of figure 2
　　    plt.plot(x, np.cos(i*x))
　　plt.show()
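　　The way subplot() and sca() interact with pyplot's "current Axes" state can be verified with a small off-screen sketch. The Agg backend and the sine/cosine curves are choices made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 3, 100)

fig = plt.figure()
ax1 = plt.subplot(2, 1, 1)  # long form: rows, columns, index
ax2 = plt.subplot(212)      # short form of subplot(2, 1, 2)

plt.sca(ax1)                # make the upper Axes current
plt.plot(x, np.sin(x))
plt.sca(ax2)                # make the lower Axes current
plt.plot(x, np.cos(x))

current = plt.gca()         # the Axes that plt.plot would draw into next
```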
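　　Displaying Chinese in charts
　　The fonts in matplotlib's default configuration cannot render Chinese correctly. There are several ways to make charts display Chinese:
　　Specify the font directly in the program.
　　Modify the rcParams configuration dictionary at the start of the program.
　　Edit the configuration file.
　　A simple approach under Python 2 is to write Chinese strings as unicode, for example u'测试中文显示', save the source file as utf-8 and add the line "# coding=utf-8" at the top.
　　The Chinese display problem in matplotlib output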
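　　The second option, modifying rcParams at the start of the program, might look like the sketch below. The font name SimHei is an assumption: it has to be installed on the machine, and any installed font with Chinese glyphs can be substituted.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove for an interactive window
import matplotlib.pyplot as plt

# Put a Chinese-capable font first in the sans-serif fallback list.
# "SimHei" is an assumed font name; use one that is installed locally.
plt.rcParams["font.sans-serif"] = ["SimHei"] + list(plt.rcParams["font.sans-serif"])

# Draw the minus sign with the ASCII hyphen, which such fonts do contain.
plt.rcParams["axes.unicode_minus"] = False

fig, ax = plt.subplots()
ax.set_title("测试中文显示")  # the title is rendered with the first available font
title = ax.get_title()
```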
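　　Object-oriented plotting
　　The matplotlib API has three layers. The Artist layer handles all the high-level constructs, such as drawing and laying out figures, text and curves. Usually we only deal with Artists and do not need to care about the low-level drawing details.
　　The standard workflow for building a chart directly from Artists is:
　　Create a Figure object
　　Use the Figure to create one or more Axes or Subplot objects
　　Call methods of the Axes (and similar objects) to create the simpler Artists
　　import matplotlib.pyplot as plt
　　X1 = range(0, 50)
　　Y1 = [num**2 for num in X1]  # y = x^2
　　X2 = [0, 1]
　　Y2 = [0, 1]  # y = x
　　Fig = plt.figure(figsize=(8,4))  # Create a `figure' instance
　　Ax = Fig.add_subplot(111)  # Create an `axes' instance in the figure
　　Ax.plot(X1, Y1, X2, Y2)  # Create Line2D instances in the axes
　　Fig.show()
　　Fig.savefig("test.pdf")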
　　matplotlib also provides a module named pylab, which bundles many commonly used NumPy and pyplot functions so that you can compute and plot quickly. It is well suited to interactive use in IPython. Load it like this:
　　>>> import pylab as pl
　　1 Installing numpy and matplotlib
　　>>> import numpy
　　>>> numpy.__version__
　　>>> import matplotlib
　　>>> matplotlib.__version__
　　2 Two common plot types: line and scatter plots (the plot() command) and histograms (the hist() command)
　　2.1 Line and scatter plots
　　2.1.1 Line plots (straight lines connecting a set of x and y values)
　　import numpy as np
　　import pylab as pl
　　x = [1, 2, 3, 4, 5]
　　y = [1, 4, 9, 16, 25]
　　pl.plot(x, y)
　　pl.show()
　　2.1.2 Scatter plots
　　Change pl.plot(x, y) to pl.plot(x, y, 'o'); the markers are blue by default
　　2.2 Making things look pretty
　　2.2.1 Changing the line color
　　Red: change pl.plot(x, y, 'o') to pl.plot(x, y, 'or')
　　2.2.2 Changing the line style
　　Dashed line: plot(x, y, '--')
　　2.2.3 Changing the marker style
　　Blue star markers: plot(x, y, 'b*')
　　2.2.4 Plot and axis titles and limits
　　import numpy as np
　　import pylab as pl
　　x = [1, 2, 3, 4, 5]  # Make an array of x values
　　y = [1, 4, 9, 16, 25]  # Make an array of y values for each x value
　　pl.plot(x, y)  # use pylab to plot x and y
　　pl.title('Plot of y vs. x')  # give plot a title
　　pl.xlabel('x axis')  # make axis labels
　　pl.ylabel('y axis')
　　pl.xlim(0.0, 7.0)  # set axis limits
　　pl.ylim(0.0, 30.)
　　pl.show()  # show the plot on the screen
　　2.2.5 Plotting more than one plot on the same set of axes
　　This is straightforward: just plot them one after another:
　　import numpy as np
　　import pylab as pl
　　x1 = [1, 2, 3, 4, 5]  # Make x, y arrays for each graph
　　y1 = [1, 4, 9, 16, 25]
　　x2 = [1, 2, 4, 6, 8]
　　y2 = [2, 4, 8, 12, 16]
　　pl.plot(x1, y1, 'r')  # use pylab to plot x and y
　　pl.plot(x2, y2, 'g')
　　pl.title('Plot of y vs. x')  # give plot a title
　　pl.xlabel('x axis')  # make axis labels
　　pl.ylabel('y axis')
　　pl.xlim(0.0, 9.0)  # set axis limits
　　pl.ylim(0.0, 30.)
　　pl.show()  # show the plot on the screen
　　2.2.6 Figure legends
　　pl.legend((plot1, plot2), ('label1', 'label2'), loc='best', numpoints=1)
　　The loc argument sets where the legend is placed: 'best', 'upper right', 'upper left', 'center', 'lower left', 'lower right'.
　　If the label was already given when plotting in the current figure, e.g. plt.plot(x, z, label="cos(x2)"), you can simply call plt.legend().
　　import numpy as np
　　import pylab as pl
　　x1 = [1, 2, 3, 4, 5]  # Make x, y arrays for each graph
　　y1 = [1, 4, 9, 16, 25]
　　x2 = [1, 2, 4, 6, 8]
　　y2 = [2, 4, 8, 12, 16]
　　plot1, = pl.plot(x1, y1, 'r')  # plot() returns a list of lines; unpack the single line and give it a name
　　plot2, = pl.plot(x2, y2, 'go')
　　pl.title('Plot of y vs. x')  # give plot a title
　　pl.xlabel('x axis')  # make axis labels
　　pl.ylabel('y axis')
　　pl.xlim(0.0, 9.0)  # set axis limits
　　pl.ylim(0.0, 30.)
　　pl.legend([plot1, plot2], ('red line', 'green circles'), loc='best', numpoints=1)  # make legend
　　pl.show()  # show the plot on the screen
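　　The label-based form mentioned above avoids having to keep the line handles around. A minimal off-screen sketch, using the same data; the Agg backend and the Axes-based calls are choices made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove for an interactive window
import matplotlib.pyplot as plt

x1, y1 = [1, 2, 3, 4, 5], [1, 4, 9, 16, 25]
x2, y2 = [1, 2, 4, 6, 8], [2, 4, 8, 12, 16]

fig, ax = plt.subplots()
ax.plot(x1, y1, "r", label="red line")        # label is attached at plot time
ax.plot(x2, y2, "go", label="green circles")
legend = ax.legend(loc="best", numpoints=1)   # picks up the labels automatically

labels = [text.get_text() for text in legend.get_texts()]
```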
　　2.3 Histograms
　　import numpy as np
　　import pylab as pl
　　# make an array of random numbers with a gaussian distribution with
　　# mean = 5.0
　　# rms = 3.0
　　# number of points = 1000
　　data = np.random.normal(5.0, 3.0, 1000)
　　# make a histogram of the data array
　　pl.hist(data)
　　# make plot labels
　　pl.xlabel('data')
　　pl.show()
　　If you do not want the black outlines, change the call to pl.hist(data, histtype='stepfilled')
　　2.3.1 Setting the histogram bin width manually
　　Add these two lines:
　　bins = np.arange(-5., 16., 1.)  # a floating-point version of range
　　pl.hist(data, bins, histtype='stepfilled')
　　3 Plotting more than one axis per canvas
　　To draw several figures at once, pass figure an integer that identifies the figure. If a figure with that number already exists, no new object is created; it simply becomes the current figure.
　　fig1 = pl.figure(1)
　　pl.subplot(211)
　　subplot(211) divides the plotting area into 2 rows by 1 column and creates an axes object in region 1 (the upper one). pl.subplot(212) creates an axes object in region 2 (the lower one).
　　import numpy as np
　　import pylab as pl
　　# Use numpy to load the data contained in the file
　　# 'fakedata.txt' into a 2-D array called data
　　data = np.loadtxt('fakedata.txt')
　　# plot the first column as x, and second column as y
　　pl.plot(data[:,0], data[:,1], 'ro')
　　pl.xlabel('x')
　　pl.ylabel('y')
　　pl.xlim(0.0, 10.)
　　pl.show()
　　4.2 Writing data to a text file
　　There are many ways to write a file; only one workable way of writing a text file is shown here. See the official documentation for more.
　　import numpy as np
　　# Let's make 2 arrays (x, y) which we will write to a file
　　# x is an array containing numbers 0 to 9, with intervals of 1
　　x = np.arange(0.0, 10., 1.)
　　# y is an array containing the values in x, squared
　　y = x*x
　　print('x = ', x)
　　print('y = ', y)
　　# Now open a file to write the data to
　　# 'w' means open for 'writing'
　　file = open('testdata.txt', 'w')
　　# loop over each line you want to write to file
　　for i in range(len(x)):
　　    # make a string for each line you want to write
　　    # '\t' means 'tab'
　　    # '\n' means 'newline'
　　    # 'str()' means you are converting the quantity in brackets to a string type
　　    txt = str(x[i]) + '\t' + str(y[i]) + ' \n'
　　    # write the txt to the file
　　    file.write(txt)
　　# Close your file
　　file.close()
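　　NumPy can also write and read such a two-column file in one call each. The sketch below swaps the manual loop for np.savetxt and np.loadtxt and writes to a temporary directory so it does not touch the working directory; both choices are made for this sketch and are not the original author's method.

```python
import os
import tempfile

import numpy as np

x = np.arange(0.0, 10.0, 1.0)
y = x * x

# Write x and y as two tab-separated columns, one row per point.
path = os.path.join(tempfile.mkdtemp(), "testdata.txt")
np.savetxt(path, np.column_stack((x, y)), delimiter="\t")

# Read the file back into a 2-D array: column 0 is x, column 1 is y.
data = np.loadtxt(path)
print(data[:3])
```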
　　Example 1
　　import numpy as np
　　import matplotlib.pyplot as plt
　　plt.rcdefaults()
　　# Example data
　　people = ('Tom', 'Dick', 'Harry', 'Slim', 'Jim')
　　y_pos = np.arange(len(people))
　　performance = 3 + 10 * np.random.rand(len(people))
　　error = np.random.rand(len(people))
　　plt.barh(y_pos, performance, xerr=error, align='center')
　　plt.yticks(y_pos, people)
　　plt.xlabel('Performance')
　　plt.title('How fast do you want to go today?')
　　plt.show()
　　图例 2
　　import numpy as np
　　import matplotlib.pyplot as plt
　　import pylab
　　from matplotlib.ticker import MaxNLocator
　　grade = 2
　　day = '2014-06-22'  # Today in this year
　　numTests = 5
　　testNames = ['swap','memory', '/project', '/backup', '/root']
　　testMeta = ['', '', '', '','']
　　scores = [98,79, 39, 92,17]
　　lastweek_scores = ['97%','35%','86%','21%','70%']
　　#rankings = np.round(np.random.uniform(0, 1, numTests)*100, 0)
　　rankings = 3 + 10 * np.random.rand(numTests)
　　fig, ax1 = plt.subplots(figsize=(9, 7))
　　plt.subplots_adjust(left=0.115, right=0.88)
　　fig.canvas.set_window_title('Usage Chart')
　　pos = np.arange(numTests)+0.5 # Center bars on the Y-axis ticks

　　rects = ax1.barh(pos, scores, align='center',>　　ax1.axis([0, 100, 0, 5])
　　pylab.yticks(pos, testNames)
　　ax1.set_title('Server 18.32 Usage Chart')
　　plt.text(50, -0.5, 'date: ' + day,

　　horizontalalignment='center',>　　# Set the right-hand Y-axis ticks and labels and set X-axis tick marks at the
　　# deciles
　　ax2 = ax1.twinx()
　　ax2.plot([100, 100], [0, 5], 'white', alpha=0.1)
　　ax2.xaxis.set_major_locator(MaxNLocator(11))
　　xticks = pylab.setp(ax2, xticklabels=['0', '10', '20', '30', '40', '50', '60',
　　'70', '80', '90', '100'])
　　ax2.xaxis.grid(True, linestyle='--', which='major', color='grey',
　　alpha=0.25)
　　#Plot a solid vertical gridline to highlight the median position
　　plt.plot([50, 50], [0, 5], 'grey', alpha=0.25)
　　# Build up the score labels for the right Y-axis by first appending a carriage
　　# return to each string and then tacking on the appropriate meta information
　　# (i.e., 'laps' vs 'seconds'). We want the labels centered on the ticks, so if
　　# there is no meta info (like for pushups) then don't add the carriage return to
　　# the string
　　def withnew(i, scr):
　　if testMeta != '':
　　return '%s\n' % scr
　　else:
　　return scr
　　scoreLabels = [withnew(i, scr) for i, scr in enumerate(lastweek_scores)]
　　scoreLabels = [i+j for i, j in zip(scoreLabels, testMeta)]
　　# set the tick locations
　　ax2.set_yticks(pos)
　　# set the tick labels
　　ax2.set_yticklabels(scoreLabels)
　　# make sure that the limits are set equally on both yaxis so the ticks line up
　　ax2.set_ylim(ax1.get_ylim())
　　ax2.set_ylabel("Last Week's data",color='sienna')
　　#Make list of numerical suffixes corresponding to position in a list
　　#          0    1    2    3    4    5    6    7    8    9
　　suffixes = ['%', '%', '%', '%', '%', '%', '%', '%', '%', '%']
　　ax2.set_xlabel('Percentile Ranking Across ' + suffixes[grade]
　　+ ' Grade '  + 's')
　　# Lastly, write in the ranking inside each bar to aid in interpretation
　　for rect in rects:
　　    # Rectangle widths are floating type, so it helps to remove the
　　    # trailing decimal point and 0 by converting width to int
　　    width = int(rect.get_width())
　　    # Figure out the last digit (width modulo 10) so we can add
　　    # the appropriate numerical suffix (e.g., 1st, 2nd, 3rd, etc)
　　    lastDigit = width % 10
　　    # Note that 11, 12, and 13 are special cases
　　    if (width == 11) or (width == 12) or (width == 13):
　　        suffix = 'th'
　　    else:
　　        suffix = suffixes[lastDigit]
　　    rankStr = str(width) + suffix
　　    if (width < 5):        # The bars aren't wide enough to print the ranking inside
　　        xloc = width + 1   # Shift the text to the right side of the right edge
　　        clr = 'black'      # Black against white background
　　        align = 'left'
　　    else:
　　        xloc = 0.98*width  # Shift the text to the left side of the right edge
　　        clr = 'white'      # White on magenta
　　        align = 'right'
　　    # Center the text vertically in the bar
　　    yloc = rect.get_y()+rect.get_height()/2.0
　　    ax1.text(xloc, yloc, rankStr, horizontalalignment=align,
　　             verticalalignment='center', color=clr, weight='bold')
　　plt.show()
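　　The suffix lookup in the loop above follows the pattern used for English ordinals (1st, 2nd, 3rd, 11th), although the suffixes list in this listing holds '%' in every slot. As a minimal sketch of the ordinal variant, assuming a hypothetical helper named ordinal and a suffix table that are not part of the listing:

```python
# Hypothetical helper, not part of the listing above: build an English
# ordinal string from a non-negative integer rank.
ORDINAL_SUFFIXES = ['th', 'st', 'nd', 'rd', 'th', 'th', 'th', 'th', 'th', 'th']


def ordinal(width):
    """Return e.g. '1st', '2nd', '3rd', '11th' for a non-negative int."""
    last_digit = width % 10
    # 11, 12 and 13 are special cases; % 100 extends the check used in
    # the listing to values such as 111.
    if width % 100 in (11, 12, 13):
        suffix = 'th'
    else:
        suffix = ORDINAL_SUFFIXES[last_digit]
    return str(width) + suffix


labels = [ordinal(n) for n in (1, 2, 3, 4, 11, 12, 13, 21, 100)]
print(labels)
```

　　The listing only tests for 11, 12 and 13 exactly, which is enough for percentile values between 0 and 100.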
　　Using Python with matplotlib to chart SVN commit counts
　　Install the required packages:
　　yum install -y  numpy matplotlib
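　　matplotlib.pyplot is a collection of command-style functions that make matplotlib work like MATLAB. Each pyplot function makes some change to a figure: for example, it creates a figure, creates a plotting area in a figure, plots lines in a plotting area, or decorates the plot with labels. matplotlib.pyplot is stateful: it keeps track of the current figure and plotting area, and each new plotting function acts on that current state.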
　　import matplotlib.pyplot as plt
　　plt.plot([1,2,3,4])
　　plt.ylabel('some numbers')
　　plt.show()
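　　In the resulting figure the x-axis runs from 0 to 3 and the y-axis from 1 to 4. This is because, when plot() is given a single list or array, matplotlib treats it as a sequence of y values and generates the x values automatically. Python counts from 0, so the generated x vector has the same length as the y vector (4 here) but starts at 0, which gives x values of [0,1,2,3].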
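　　One way to confirm the automatically generated x values is to read them back from the line that plot() returns. A small sketch, assuming the non-interactive Agg backend so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt

# Only y values are given; matplotlib supplies x = 0, 1, 2, ...
line, = plt.plot([1, 2, 3, 4])

xs = [float(v) for v in line.get_xdata()]
ys = [float(v) for v in line.get_ydata()]
print(xs)
print(ys)
plt.close('all')
```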
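　　plt.plot() also accepts several sequences (tuples or lists). Each pair of sequences forms one x, y pair and draws one curve, so a single figure can hold several curves.
　　To tell the curves in one figure apart, each x, y pair can be followed by a format string that sets how the curve is drawn. The default is 'b-', a solid blue line. To draw the curve with red circle markers instead: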
　　import matplotlib.pyplot as plt
　　plt.plot([1,2,3,4],[1,4,9,16],'ro')
　　plt.axis([0,6,0,20])
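　　axis() takes a list of the form [xmin,xmax,ymin,ymax] and sets the range of the x and y axes.
　　matplotlib accepts not only sequences (lists and tuples) as arguments but also numpy arrays. In fact, all sequences are converted to numpy arrays internally.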
　　import numpy as np
　　import matplotlib.pyplot as plt
　　t=np.arange(0.,5.,0.2)
　　plt.plot(t,t,'r--',t,t**2,'bs',t,t**3,'g^')
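　　Controlling line properties
　　Lines have many properties that can be set: line width, dash style, antialiasing, and so on. There are several ways to set them:
　　1. Use keyword arguments: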
　　plt.plot(x,y,linewidth=2.0)
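　　2. Use the setter methods of a Line2D instance. plot() returns a list of lines, for example line1,line2=plot(x1,y1,x2,y2). Take the lines returned by plot() and call their setter methods to change the properties.
　　line,=plt.plot(x,y,'-')
　　line.set_antialiased(False)  # turn off antialiasing
　　3. Use the setp() command:
　　lines=plt.plot(x1,y1,x2,y2)
　　# keyword arguments
　　plt.setp(lines,color='r',linewidth=2.0)
　　# or MATLAB-style string/value pairs
　　plt.setp(lines,'color','r','linewidth',2.0)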
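　　The effect of these calls can be checked by reading the properties back with the matching getter methods. A minimal sketch with made-up data, assuming the non-interactive Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]  # illustrative data, not from the text
lines = plt.plot(x, [v for v in x], x, [v * v for v in x])

# keyword form of setp(): applies to every line in the list
plt.setp(lines, color='r', linewidth=2.0)
widths = [ln.get_linewidth() for ln in lines]
colors = [ln.get_color() for ln in lines]

# setter method on a single Line2D instance
lines[0].set_antialiased(False)
antialiased = [ln.get_antialiased() for ln in lines]

print(widths, colors, antialiased)
plt.close('all')
```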
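　　Working with multiple figures and axes
　　MATLAB and pyplot both have the concept of the current figure and the current axes. All plotting commands apply to the current axes.
　　The function gca() returns the current axes and gcf() returns the current figure.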
　　import numpy as np
　　import matplotlib.pyplot as plt
　　def f(t):
　　    return np.exp(-t) * np.cos(2*np.pi*t)
　　t1 = np.arange(0.0, 5.0, 0.1)
　　t2 = np.arange(0.0, 5.0, 0.02)
　　plt.figure(1)
　　plt.subplot(211)
　　plt.plot(t1, f(t1), 'bo', t2, f(t2), 'k')
　　plt.subplot(212)
　　plt.plot(t2, np.cos(2*np.pi*t2), 'r--')
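　　The figure() command is optional here, because figure(1) is created by default, just as subplot(111) is created by default.
　　The subplot() command specifies numrows,numcols,fignum, where fignum ranges from 1 to numrows*numcols. If numrows*numcols is less than 10, the commas in the subplot() call are optional, so subplot(2,1,1) is identical to subplot(211).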
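　　The three-digit shorthand can be read as one digit each for the number of rows, the number of columns and the index. A small sketch, using a hypothetical helper split_subplot_code (not a matplotlib function) to illustrate the decoding:

```python
def split_subplot_code(code):
    """Split a three-digit subplot code such as 211 into
    (numrows, numcols, fignum). Hypothetical helper for illustration."""
    if not 100 <= code <= 999:
        raise ValueError("expected a three-digit integer")
    numrows, rest = divmod(code, 100)
    numcols, fignum = divmod(rest, 10)
    # fignum must lie between 1 and numrows*numcols
    if not 1 <= fignum <= numrows * numcols:
        raise ValueError("fignum out of range")
    return numrows, numcols, fignum


print(split_subplot_code(211))
print(split_subplot_code(212))
```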
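　　To place an axes manually rather than on a rectangular grid, use the axes() command with axes([left,bottom,width,height]), where every value is a fraction of the figure size between 0 and 1.
　　You can create multiple figures with multiple figure() calls, and each figure can contain several axes and subplots: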
　　import matplotlib.pyplot as plt
　　plt.figure(1)             # the first figure
　　plt.subplot(211)          # the first subplot in the first figure
　　plt.plot([1,2,3])
　　plt.subplot(212)          # the second subplot in the first figure
　　plt.plot([4,5,6])
　　plt.figure(2)          # a second figure
　　plt.plot([4,5,6])          # creates a subplot(111) by default
　　plt.figure(1)             # figure 1 current; subplot(212) still current
　　plt.subplot(211)          # make subplot(211) in figure1 current

　　plt.title('Easy as 1,2,3') # subplot 211
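　　You can clear the current figure with clf() and the current axes with cla().
　　If you create many figures, you need to call close() explicitly to release the memory a figure uses; merely closing the window shown on screen does not free that memory.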
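　　pyplot keeps track of every figure it creates until that figure is closed, which get_fignums() makes visible. A minimal sketch, assuming the non-interactive Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt

plt.close('all')              # start from a clean state
plt.figure(1)
plt.plot([1, 2, 3])
plt.figure(2)
plt.plot([4, 5, 6])
open_before = plt.get_fignums()

plt.clf()                     # clears figure 2 but keeps it open
open_after_clf = plt.get_fignums()

plt.close(1)                  # closes figure 1
open_after_close = plt.get_fignums()

plt.close('all')              # closes every remaining figure
open_at_end = plt.get_fignums()
print(open_before, open_after_clf, open_after_close, open_at_end)
```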
　　Working with text
　　The text() command adds text at an arbitrary position; xlabel(), ylabel() and title() add text to the x-axis, the y-axis and the title.
　　import numpy as np
　　import matplotlib.pyplot as plt
　　mu, sigma = 100, 15
　　x = mu + sigma * np.random.randn(10000)
　　# the histogram of the data

　　n, bins, patches = plt.hist(x, 50, density=True)  # density replaces normed, which was removed in Matplotlib 3.1
　　plt.xlabel('Smarts')
　　plt.ylabel('Probability')
　　plt.title('Histogram of IQ')
　　plt.text(60, .025, r'$\mu=100,\ \sigma=15$')
　　plt.axis([40, 160, 0, 0.03])
　　plt.grid(True)
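　　Every text() command returns a matplotlib.text.Text instance. Just as with lines, you can customize its properties by passing keyword arguments to the text function or by using setp().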
　　t=plt.xlabel('my data',fontsize=14,color='red')
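　　Using mathematical expressions in text
　　matplotlib accepts TeX expressions in any text.
　　A TeX expression is enclosed in a pair of dollar signs. For example, σ_i=15 is written as the following TeX expression: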
　　plt.title(r'$\sigma_i=15$')
　　Plotting the standard normal curve with Python and matplotlib
　　import math
　　import pylab as pl
　　import numpy as np
　　def gd(x,m,s):
　　    left=1/(math.sqrt(2*math.pi)*s)
　　    right=math.exp(-math.pow(x-m,2)/(2*math.pow(s,2)))
　　    return left*right
　　def showfigure():
　　    x=np.arange(-4,5,0.1)
　　    y=[]
　　    for i in x:
　　        y.append(gd(i,0,1))
　　    pl.plot(x,y)
　　    pl.xlim(-4.0,5.0)
　　    pl.ylim(-0.2,0.5)
　　    # hide the top and right spines and move the axes to cross at the origin
　　    ax = pl.gca()
　　    ax.spines['right'].set_color('none')
　　    ax.spines['top'].set_color('none')
　　    ax.xaxis.set_ticks_position('bottom')
　　    ax.spines['bottom'].set_position(('data',0))
　　    ax.yaxis.set_ticks_position('left')
　　    ax.spines['left'].set_position(('data',0))
　　    #add param
　　    label_f1 = r"$\mu=0,\ \sigma=1$"
　　    pl.text(2.5,0.3,label_f1,fontsize=15,verticalalignment="top",
　　            horizontalalignment="left")
　　    label_f2 = r"$f(x)=\frac{1}{\sqrt{2\pi}\sigma}exp(-\frac{(x-\mu)^2}{2\sigma^2})$"
　　    pl.text(1.5,0.4,label_f2,fontsize=15,verticalalignment="top"
　　            ,horizontalalignment="left")
　　    pl.show()
　　showfigure()
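　　The density formula shown in label_f2 can be checked against the standard library. A small sketch that restates gd() and compares it with statistics.NormalDist (Python 3.8 or later); the sample points are an arbitrary choice:

```python
import math
from statistics import NormalDist


def gd(x, m, s):
    """Normal density with mean m and standard deviation s."""
    left = 1 / (math.sqrt(2 * math.pi) * s)
    right = math.exp(-math.pow(x - m, 2) / (2 * math.pow(s, 2)))
    return left * right


std = NormalDist(mu=0, sigma=1)
points = [-4 + 0.5 * k for k in range(17)]   # -4.0, -3.5, ..., 4.0
max_error = max(abs(gd(x, 0, 1) - std.pdf(x)) for x in points)
peak = gd(0, 0, 1)                            # equals 1/sqrt(2*pi)
print(max_error, peak)
```

　　The peak is a little under 0.4, which is why a y-limit of 0.5 leaves room above the curve for the text labels.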
　　Data visualization in Python with matplotlib
　　# -*- coding:UTF-8 -*-
　　import numpy as np
　　import matplotlib.pyplot as plt
　　from matplotlib.ticker import MultipleLocator
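　　import matplotlib as mpl
　　# Python 3 strings are Unicode, so the Python 2 reload(sys) / sys.setdefaultencoding('utf8') workaround is not needed
　　xmajorLocator = MultipleLocator(10* 1) # place the x-axis major ticks at multiples of 10* 1
　　ymajorLocator = MultipleLocator(0.1* 1) # place the y-axis major ticks at multiples of 0.1 * 1
　　# use a font that contains Chinese glyphs
　　mpl.rcParams['font.sans-serif'] = ['SimHei']
　　# load the data from a file
　　#data = np.loadtxt('test44.txt', delimiter=None, dtype=float )
　　#data = [[1,2],[3,4],[5,6]]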
　　data = [[1,5,10,20,30,40,50,60,70,80,90,100],[0.0201,0.0262,0.0324,0.0295,0.0221,0.0258,0.0254,0.0299,0.0275,0.0299,0.0291,0.0328],
　　[0.0193,0.0254,0.0234,0.0684,0.0693,0.0803,0.1008,0.098,0.0947,0.0934,0.1971,0.2123],[0.0209,0.1176,0.2143,0.2295,0.4176,0.5258,0.6471,0.6484,0.8193,0.829,0.832,0.943]]
　　data = np.array(data)
　　# slice the rows out of the array
　　x = data[0] # time
　　y = data[1] # y values of category 1
　　y2 = data[2] # y values of category 2
　　y3 = data[3] # y values of category 3
　　plt.figure(num=1, figsize=(8, 6))
　　ax = plt.subplot(111)
　　ax.xaxis.set_major_locator(xmajorLocator)
　　ax.yaxis.set_major_locator(ymajorLocator)
　　ax.xaxis.grid(True, which='major') # x-axis grid lines follow the major ticks
　　ax.yaxis.grid(True, which='major') # y-axis grid lines follow the major ticks
　　plt.xlabel('时间/t',fontsize='xx-large')
　　plt.ylabel('y-label',fontsize='xx-large')
　　plt.title('Title',fontsize='xx-large')
　　plt.xlim(0, 110)
　　plt.ylim(0, 1)
　　line1, = ax.plot(x, y, 'g.-',label="类别一",)
　　line2, = ax.plot(x,y2,'b*-',label="类别二",)
　　line3, = ax.plot(x,y3,'rD-',label="类别三",)
　　ax.legend((line1, line2,line3),('类别一','类别二','类别三'),loc=5) # loc may be 1, 2, 3, 4, 5 or 6, each a different position (5 is 'right')
　　plt.show()
　　Plotting the curve of x cubed with Python and matplotlib
　　import matplotlib.pyplot as plt
　　import numpy as np
　　x = np.linspace(-100,100,100)
　　y = x**3

　　plt.figure(num=3,figsize=(8,5)) # num is the figure number; figsize is (width, height) in inches
　　l1=plt.plot(x,y,'p')  # plt.plot returns a list of line handles, passed to plt.legend(handles=...) below
　　plt.xlim((-100,100))
　　plt.ylim((-100000,100000))
　　plt.xlabel('X') # x-axis label
　　plt.ylabel('Y')
　　ax = plt.gca()
　　ax.spines['right'].set_color('none')
　　ax.spines['top'].set_color('none') # hide the top and right borders
　　ax.xaxis.set_ticks_position('bottom') # put the x-axis ticks on the bottom spine
　　ax.yaxis.set_ticks_position('left')
　　ax.spines['bottom'].set_position(('data',0))  # the x-axis sits at y = 0
　　ax.spines['left'].set_position(('data',0))
　　# legend
　　# labels is a list with one entry per handle
　　plt.legend(handles=l1,labels=['y=x**3'],loc='best')  # loc = location
　　plt.show()
　　Plotting a cubic function with Python and matplotlib
　　>>> from matplotlib import pyplot as pl
　　>>> import numpy as np
　　>>> from scipy import interpolate
　　>>> x = np.linspace(-10, 5, 100)
　　>>> y = -2*x**3 + 5*x**2 + 9
　　>>> pl.figure(figsize = (8, 4))
　　>>> pl.plot(x, y, color="blue", linewidth = 1.5)
　　[<matplotlib.lines.Line2D object at 0x...>]
　　>>> pl.show()
　　pl.figure sets the size of the figure
　　pl.plot draws the curve and sets the line color and line width
　　pl.show displays the figure
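　　Generating 20 random DNA files in FASTA format with Python
　　The script generates 20 random files. Because the file names are not built from a hash, two files may end up with the same name.
　　Each file holds 30-50 sequences, and each sequence is 70-120 bases long.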
　　import os
　　import random
　　import string
　　print (dir(string))
　　letter = string.ascii_letters
　　os.chdir("D:\\")
　　bases = {1:"A", 2:"T", 3:"C", 4:"G"}
　　## Test random module , get random DNA base
　　Nth = random.randint(1,4)
　　print (bases[Nth])
　　## Create random DNA sequences
　　for i in range(20):
　　    Number_of_Seq = random.randint(30,50)
　　    filename = letter
　　    with open("Sequences"+filename + \
　　              str(Number_of_Seq)+ ".fasta", "w") as file_output:
　　        for j in range(Number_of_Seq):
　　            each_Seq=""
　　            Rand_len = random.randint(70,120)
　　            for k in range(Rand_len):
　　                Nth = random.randint(1,4)
　　                each_Seq += bases[Nth]
　　            file_output.write(">seq_"+str(Number_of_Seq)+ \
　　                              "_"+str(Rand_len)+"\n")
　　            file_output.write(each_Seq+"\n")
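　　The file-name collisions mentioned above can be avoided by putting a unique token in each name. A minimal sketch, assuming uuid4 for the token and a temporary directory for the output; the helper name random_fasta and the file-name pattern are illustrative and not taken from the script:

```python
import os
import random
import tempfile
import uuid

BASES = "ATCG"


def random_fasta(rng, min_seqs=30, max_seqs=50, min_len=70, max_len=120):
    """Return FASTA text with a random number of random DNA sequences."""
    n_seqs = rng.randint(min_seqs, max_seqs)
    records = []
    for j in range(n_seqs):
        length = rng.randint(min_len, max_len)
        seq = "".join(rng.choice(BASES) for _ in range(length))
        # header: sequence index and sequence length
        records.append(">seq_%d_%d\n%s\n" % (j, length, seq))
    return "".join(records)


rng = random.Random()
out_dir = tempfile.mkdtemp()   # assumption: write to a temporary directory
paths = []
for i in range(20):
    # a random uuid4 token makes a repeated name practically impossible
    path = os.path.join(out_dir, "Sequences_%s.fasta" % uuid.uuid4().hex)
    with open(path, "w") as fh:
        fh.write(random_fasta(rng))
    paths.append(path)
print(len(set(paths)))
```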