I feel like characterising Pandas as "improving on" Numpy/SciPy misses much of the point. Numpy/Scipy are quite focussed on efficient numeric calculation and solving numeric problems of the sort that scientists and engineers often solve. If your problem starts out with formulae and involves numerical solution from there, you're probably good with those two.
Pandas is much more aligned with problems that start with data stored in files or databases, and which contain strings as well as numbers. Consider the problem of reading data from a database query. In Pandas, you can call `read_sql_query` directly and have a usable version of the data in one line. There is no equivalent functionality in Numpy/SciPy.
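As a rough sketch, assuming pandas is installed and using Python's built-in `sqlite3` module with a made-up `readings` table as a stand-in for a real database, the one-line load looks something like this:

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory SQLite table standing in for a real database
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
con.executemany(
    "INSERT INTO readings (sensor, value) VALUES (?, ?)",
    [("a", 1.5), ("b", 2.0), ("a", 3.5)],
)
con.commit()

# One call runs the query and returns a DataFrame whose column names
# come from the SELECT, with the text and numeric columns kept side by side
df = pd.read_sql_query("SELECT sensor, value FROM readings", con)
print(df)
```

With plain Numpy you would instead iterate over a cursor, build the arrays yourself, and decide how to store the string column.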
For data containing strings, or discrete rather than continuous values, Numpy/SciPy have no equivalent to the `groupby` capability, or to the database-like joining of tables on matching values.
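A minimal sketch of both operations on string-keyed data follows. The `sales` and `stores` tables and their columns are hypothetical; `groupby` and `merge` are standard pandas methods.

```python
import pandas as pd

# Hypothetical sales records keyed by a string column
sales = pd.DataFrame(
    {"store": ["north", "south", "north", "east"], "units": [10, 4, 6, 7]}
)
# Hypothetical lookup table to join against on matching "store" values
stores = pd.DataFrame(
    {"store": ["north", "south", "east"], "region": ["N", "S", "E"]}
)

# Group rows by the string key and sum the units in each group
units_per_store = sales.groupby("store")["units"].sum()

# Database-style left join on matching values of "store";
# a left join keeps the row order of the left table
joined = sales.merge(stores, on="store", how="left")

print(units_per_store)
print(joined)
```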
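For time series, a datetime index makes the data much more convenient to work with: you can resample smoothly to different intervals, fill in missing values, and plot the series very easily.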
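A small sketch of filling and resampling on a datetime index, using made-up daily readings. Forward-filling is just one of several fill strategies pandas offers, and the 3-day bin width is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings with two missing values, indexed by date
idx = pd.date_range("2024-01-01", periods=6, freq="D")
s = pd.Series([1.0, np.nan, 3.0, 4.0, np.nan, 6.0], index=idx)

# Fill gaps by carrying the last observation forward
filled = s.ffill()

# Downsample to 3-day bins, averaging the values that fall in each bin
per_3_days = filled.resample("3D").mean()

print(per_3_days)
# filled.plot() would draw the series (requires matplotlib to be installed)
```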
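Since many of my problems start out in spreadsheets, I also really appreciate the unified interface that handles Excel files in both `.xls` and `.xlsx` formats relatively transparently.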
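Beyond that, there is a larger ecosystem, with packages like seaborn that make statistical analysis and model fitting more fluent than is possible with the basic numpy/scipy tools.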