Python基本数据统计(三)---- Grouping & Merge
6. Grouping6.1 统计近一年每个月的开盘天数:
In : tempdf
Out:
open close high low volumemonth
2016-01-2257.28381954.01690757.77434753.11433443783400.0 1
2016-01-2553.42827253.97766454.71345553.11433418498300.0 1
...
2017-01-2075.98999876.19999776.91000475.389999 8382000.0 1
In : tempdf.groupby('month').count()
Out:
openclosehighlowvolume
month
1 19 19 19 19 19
2 20 20 20 20 20
...
12 21 21 21 21 21
In : tempdf.groupby('month').count().open
Out:
month
1 19
2 20
3 22
4 21
5 21
6 22
7 20
8 23
9 21
10 21
11 21
12 21
Name: open, dtype: int64
tempdf.groupby('month').count().open 6.2 统计近一年每个月的总成交量:(计算总和,再选出需求列)
In : tempdf.groupby('month').sum()
Out:
open close high low volume
month
1 1317.4359621311.8727161328.9477051301.086107190434600.0
2 1056.4690421058.8530171066.9957611046.069857159711400.0
3 1281.8370691286.4382201293.4233431270.692270129864200.0
4 1296.2965701299.9647891308.3677771288.042310 99855500.0
5 1332.0886831333.1829261342.9424081323.906527 96741700.0
6 1365.4021001363.3150791376.7716591350.264324121788800.0
7 1244.5623201247.5247311254.5591661237.280184 90064900.0
8 1484.6747951484.1893121492.5712081475.708341 77514100.0
9 1342.6681121341.9547641353.1504151332.294785 95572800.0
10 1331.6114931329.2811841341.2366381320.451196116243400.0
11 1459.8836711463.7272781471.1556031449.657290 99527200.0
12 1549.8897901549.3521071561.0820601538.528258 75948200.0
In : tempdf.groupby('month').sum().volume
Out:
month
1 190434600.0
2 159711400.0
3 129864200.0
4 99855500.0
5 96741700.0
6 121788800.0
7 90064900.0
8 77514100.0
9 95572800.0
10 116243400.0
11 99527200.0
12 75948200.0
Name: volume, dtype: float64
tempdf.groupby('month').sum().volume 6.3 前两例总结为df_object.groupby('month')后跟.sum().mean().min().max()
6.4 更高效地统计近一年每个月的总成交量:(先选出需求列,再计算和。处理大数据时效果明显)
In : tempdf.groupby('month').volume.sum()
Out:
month
1 190434600.0
2 159711400.0
3 129864200.0
4 99855500.0
5 96741700.0
6 121788800.0
7 90064900.0
8 77514100.0
9 95572800.0
10 116243400.0
11 99527200.0
12 75948200.0
Name: volume, dtype: float64
tempdf.groupby('month').volume.sum() 7. Merge
7.1 Append
In : quotesdf[:2].append(quotesdf['2016-12-01':'2016-12-05'])
Out:
open close high low volume
2016-01-2257.28381954.01690757.77434753.11433443783400.0
2016-01-2553.42827253.97766454.71345553.11433418498300.0
2016-12-0172.14231472.22197672.46095571.475161 4298200.0
2016-12-0272.21201671.55482372.35142971.126648 2959100.0
2016-12-0572.02282971.72410072.33150971.684269 3929100.0
quotesdf[:2].append(quotesdf['2016-12-01':'2016-12-05']) 7.2 Concat
7.2.1 两个相同逻辑结构的对象连接
In : pieces = ,tempdf]
In : pieces
Out:
[ open close high low volumemonth
2016-01-2257.28381954.01690757.77434753.11433443783400.0 1
2016-01-2553.42827253.97766454.71345553.11433418498300.0 1,
open close high low volumemonth
2017-01-1977.61000176.69000277.77999976.6100018111400.0 1
2017-01-2075.98999876.19999776.91000475.3899998382000.0 1]
In : pd.concat(pieces)
Out:
open close high low volumemonth
2016-01-2257.28381954.01690757.77434753.11433443783400.0 1
2016-01-2553.42827253.97766454.71345553.11433418498300.0 1
2017-01-1977.61000176.69000277.77999976.610001 8111400.0 1
2017-01-2075.98999876.19999776.91000475.389999 8382000.0 1
pd.concat(,tempdf]) 7.2.2 两个不同逻辑结构的对象连接
In : piece1 = quotesdf[:3]
In : piece2 = tempdf[:3]
In : piece1
Out:
open close high low volume
2016-01-2257.28381954.01690757.77434753.11433443783400.0
2016-01-2553.42827253.97766454.71345553.11433418498300.0
2016-01-2654.15425554.04633854.68402653.91880012844800.0
In : piece2
Out:
open close high low volumemonth
2016-01-2257.28381954.01690757.77434753.11433443783400.0 1
2016-01-2553.42827253.97766454.71345553.11433418498300.0 1
2016-01-2654.15425554.04633854.68402653.91880012844800.0 1
In : pd.concat(,ignore_index=True)
Out:
close high lowmonth open volume
054.01690757.77434753.114334 NaN57.28381943783400.0
153.97766454.71345553.114334 NaN53.42827218498300.0
254.04633854.68402653.918800 NaN54.15425512844800.0
354.01690757.77434753.114334 1.057.28381943783400.0
453.97766454.71345553.114334 1.053.42827218498300.0
554.04633854.68402653.918800 1.054.15425512844800.0
In : pd.concat(,ignore_index=False)
Out:
close high lowmonth open volume
2016-01-2254.01690757.77434753.114334 NaN57.28381943783400.0
2016-01-2553.97766454.71345553.114334 NaN53.42827218498300.0
2016-01-2654.04633854.68402653.918800 NaN54.15425512844800.0
2016-01-2254.01690757.77434753.114334 1.057.28381943783400.0
2016-01-2553.97766454.71345553.114334 1.053.42827218498300.0
2016-01-2654.04633854.68402653.918800 1.054.15425512844800.0
pd.concat(,ignore_index=True) 7.3 Join-- 将 与 合并
In : AKdf
Out:
volume codemonth
month
1 190434600.0AXP 1
2 159711400.0AXP 2
3 129864200.0AXP 3
4 99855500.0AXP 4
5 96741700.0AXP 5
6 121788800.0AXP 6
7 90064900.0AXP 7
8 77514100.0AXP 8
9 95572800.0AXP 9
10 116243400.0AXP 10
11 99527200.0AXP 11
12 75948200.0AXP 12
1 244121600.0 KO 1
2 304133400.0 KO 2
3 347238100.0 KO 3
4 317282000.0 KO 4
5 230074600.0 KO 5
6 265483400.0 KO 6
7 235959400.0 KO 7
8 235118300.0 KO 8
9 251007200.0 KO 9
10 264839100.0 KO 10
11 316271200.0 KO 11
12 280949400.0 KO 12
In : djidf
Out:
code name lasttrade
0 AAPL Apple Inc. 120.000
1 AXP American Express Company 76.200
2 BA The Boeing Company 159.530
3 CAT Caterpillar Inc. 94.580
4 CSCO Cisco Systems, Inc. 30.100
5 CVX Chevron Corporation 115.600
6 DD E. I. du Pont de Nemours and Company 73.030
7 DIS The Walt Disney Company 107.660
8 GE General Electric Company 30.530
9 GS The Goldman Sachs Group, Inc. 232.200
10 HD The Home Depot, Inc. 135.600
11 IBMInternational Business Machines Corporation 170.550
12INTC Intel Corporation 36.940
13 JNJ Johnson & Johnson 114.150
14 JPM JPMorgan Chase & Co. 83.670
15 KO The Coca-Cola Company 41.320
16 MCD McDonald's Corporation 122.260
17 MMM 3M Company 178.490
18 MRK Merck & Co., Inc. 62.530
19MSFT Microsoft Corporation 62.740
20 NKE NIKE, Inc. 53.200
21 PFE Pfizer Inc. 31.770
22 PG The Procter & Gamble Company 87.450
23 TRV The Travelers Companies, Inc. 118.020
24 UNH UnitedHealth Group Incorporated 158.660
25 UTX United Technologies Corporation 110.790
26 V Visa Inc. 81.840
27 VZ Verizon Communications Inc. 52.720
28 WMT Wal-Mart Stores, Inc. 67.180
29 XOM Exxon Mobil Corporation 85.890
In : pd.merge(djidf,AKdf,on='code').drop(['lasttrade'],axis=1)
Out:
code name volumemonth
0 AXPAmerican Express Company190434600.0 1
1 AXPAmerican Express Company159711400.0 2
2 AXPAmerican Express Company129864200.0 3
3 AXPAmerican Express Company 99855500.0 4
4 AXPAmerican Express Company 96741700.0 5
5 AXPAmerican Express Company121788800.0 6
6 AXPAmerican Express Company 90064900.0 7
7 AXPAmerican Express Company 77514100.0 8
8 AXPAmerican Express Company 95572800.0 9
9 AXPAmerican Express Company116243400.0 10
10AXPAmerican Express Company 99527200.0 11
11AXPAmerican Express Company 75948200.0 12
12 KO The Coca-Cola Company244121600.0 1
13 KO The Coca-Cola Company304133400.0 2
14 KO The Coca-Cola Company347238100.0 3
15 KO The Coca-Cola Company317282000.0 4
16 KO The Coca-Cola Company230074600.0 5
17 KO The Coca-Cola Company265483400.0 6
18 KO The Coca-Cola Company235959400.0 7
19 KO The Coca-Cola Company235118300.0 8
20 KO The Coca-Cola Company251007200.0 9
21 KO The Coca-Cola Company264839100.0 10
22 KO The Coca-Cola Company316271200.0 11
23 KO The Coca-Cola Company280949400.0 12
pd.merge(djidf,AKdf,on='code').drop(['lasttrade'],axis=1)+ =
页:
[1]