pandas documentation (Chinese edition)
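Binary operations between pandas data structures raise two key points of interest. The first is broadcasting behavior between higher-dimensional objects (e.g. DataFrame) and lower-dimensional ones (e.g. Series). The second is missing data in computations.

DataFrame provides the flexible arithmetic methods add(), sub(), mul(), div() and the reversed variants radd(), rsub(), ... for binary operations. For broadcasting, the main case is a Series input. The axis keyword controls whether the Series is matched against the DataFrame's index or its columns: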
In [14]: df = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
   ....:                    'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
   ....:                    'three' : pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})
   ....:

In [15]: df
Out[15]:
        one       two     three
a -1.101558  1.124472       NaN
b -0.177289  2.487104 -0.634293
c  0.462215 -0.486066  1.931194
d       NaN -0.456288 -1.222918

In [16]: row = df.iloc[1]

In [17]: column = df['two']

In [18]: df.sub(row, axis='columns')
Out[18]:
        one       two     three
a -0.924269 -1.362632       NaN
b  0.000000  0.000000  0.000000
c  0.639504 -2.973170  2.565487
d       NaN -2.943392 -0.588625

In [19]: df.sub(row, axis=1)
Out[19]:
        one       two     three
a -0.924269 -1.362632       NaN
b  0.000000  0.000000  0.000000
c  0.639504 -2.973170  2.565487
d       NaN -2.943392 -0.588625

In [20]: df.sub(column, axis='index')
Out[20]:
        one  two     three
a -2.226031  0.0       NaN
b -2.664393  0.0 -3.121397
c  0.948280  0.0  2.417260
d       NaN  0.0 -0.766631

In [21]: df.sub(column, axis=0)
Out[21]:
        one  two     three
a -2.226031  0.0       NaN
b -2.664393  0.0 -3.121397
c  0.948280  0.0  2.417260
d       NaN  0.0 -0.766631

Furthermore, you can align one level of a MultiIndexed DataFrame with a Series:
In [22]: dfmi = df.copy()

In [23]: dfmi.index = pd.MultiIndex.from_tuples([(1,'a'),(1,'b'),(1,'c'),(2,'a')],
   ....:                                        names=['first','second'])
   ....:

In [24]: dfmi.sub(column, axis=0, level='second')
Out[24]:
                   one      two     three
first second
1     a      -2.226031  0.00000       NaN
      b      -2.664393  0.00000 -3.121397
      c       0.948280  0.00000  2.417260
2     a            NaN -1.58076 -2.347391
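Here is a minimal, self-contained sketch of the same matching rules. It uses small made-up values instead of random data so every result can be checked by hand. It assumes pandas and NumPy are installed. The values are illustrative only. axis='columns' matches the Series index against the column labels. axis='index' matches it against the row labels. level='second' restricts the match to one level of a MultiIndex. A label missing on either side yields NaN.

```python
import numpy as np
import pandas as pd

# Fixed, made-up values with the same shape as df above:
# 'one' has no label 'd' and 'three' has no label 'a', so those cells are NaN.
df = pd.DataFrame(
    {
        "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
        "two": pd.Series([10.0, 20.0, 30.0, 40.0], index=["a", "b", "c", "d"]),
        "three": pd.Series([100.0, 200.0, 300.0], index=["b", "c", "d"]),
    }
)

row = df.iloc[1]    # row 'b'; its index holds the column labels
column = df["two"]  # column 'two'; its index holds the row labels

# axis='columns' (same as axis=1): match the Series index against df.columns,
# i.e. subtract row 'b' from every row.
by_row = df.sub(row, axis="columns")

# axis='index' (same as axis=0): match the Series index against df.index,
# i.e. subtract column 'two' from every column.
by_col = df.sub(column, axis="index")

# MultiIndex rows: match the Series only against the 'second' level.
dfmi = df.copy()
dfmi.index = pd.MultiIndex.from_tuples(
    [(1, "a"), (1, "b"), (1, "c"), (2, "a")], names=["first", "second"]
)
by_level = dfmi.sub(column, axis=0, level="second")

print(by_row)
print(by_col)
print(by_level)
```

With these values, by_row.loc['c', 'three'] is 200 - 100 = 100.0. In by_level, the row (2, 'a') (originally row 'd') has the 'a' entry of column (10.0) subtracted, giving 30.0 under 'two', because only the 'second' level is used for matching.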
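With Panel, describing the matching behavior is more difficult. The arithmetic methods therefore (perhaps confusingly) let you specify the broadcast axis instead. For example, suppose we want to demean the data over a particular axis, that is, subtract the mean taken along that axis. To do this, take the mean over the axis and broadcast it back over the same axis: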
In [25]: major_mean = wp.mean(axis='major')

In [26]: major_mean
Out[26]:
      Item1     Item2
A -0.878036 -0.092218
B -0.060128  0.529811
C  0.099453 -0.715139
D  0.248599 -0.186535

In [27]: wp.sub(major_mean, axis='major')
Out[27]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 5 (major_axis) x 4 (minor_axis)
Items axis: Item1 to Item2
Major_axis axis: 2000-01-01 00:00:00 to 2000-01-05 00:00:00
Minor_axis axis: A to D
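Panel was deprecated in pandas 0.20 and removed in 0.25, so the session above runs only on old versions. Here is a minimal sketch of the same demeaning on current pandas. It assumes the 3-D data is stored as a DataFrame with an (item, major) row MultiIndex and the minor axis as columns. The name long_wp and its values are hypothetical, not the wp used above. A per-item groupby mean plays the role of wp.mean(axis='major'). Broadcasting that mean back with transform and subtracting plays the role of wp.sub(..., axis='major').

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for wp: 2 items x 5 dates (major axis) x 4 columns (minor axis),
# stored "long" with an (item, major) row MultiIndex and the minor axis A..D as columns.
items = ["Item1", "Item2"]
dates = pd.date_range("2000-01-01", periods=5)
index = pd.MultiIndex.from_product([items, dates], names=["item", "major"])
long_wp = pd.DataFrame(
    np.arange(40, dtype=float).reshape(10, 4),
    index=index,
    columns=list("ABCD"),
)

# Analogue of wp.mean(axis='major'): average over the dates within each item.
# .T puts A..D in the rows and items in the columns, the same layout as Out[26].
major_mean = long_wp.groupby(level="item").mean().T

# Analogue of wp.sub(major_mean, axis='major'): broadcast each item's mean back
# over that item's dates (same shape as long_wp), then subtract elementwise.
demeaned = long_wp - long_wp.groupby(level="item").transform("mean")

print(major_mean)
print(demeaned.groupby(level="item").mean())  # zero (up to rounding) for every item and column
```

The long format is one common replacement for Panel. Another is an xarray DataArray, which is not shown here. transform("mean") is used because it returns a frame aligned row-for-row with long_wp, so the subtraction needs no further alignment.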