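Summary:
If your data has duplicate column names, rename one of the columns when reading the file.
If your data contains NaN or similar values, drop them.
Then merge using the correct answer below.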
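A minimal sketch of the three steps in the summary, assuming tiny made-up inline rows in place of txt1.csv and txt2.txt. The second strike label is renamed to a hypothetical strike2 at read time, rows with a missing key are dropped, and then the frames are merged. It targets a recent pandas, which refuses duplicate names in read_csv up front instead of failing later inside merge as 0.13 did.

```python
import io
import pandas as pd

# Made-up sample rows (illustrative only), shaped like the files in the question.
underlying_csv = io.StringIO(
    "20130628,3.58\n"
    "20130701,3.69\n"
    ",3.75\n"  # a row with a missing date
)
options_csv = io.StringIO(
    "20130628,SVXY,20130817,32.5,call,39.22,32.5,0,0.005,0.136986,0.411224\n"
    "20130628,SVXY,20130817,32.5,put,0.50,32.5,0,0.005,0.136986,0.411224\n"
)

underlying = pd.read_csv(underlying_csv, names=['dt1', 'price'])
# The second 'strike' is renamed to the hypothetical 'strike2' so every label is unique.
options = pd.read_csv(options_csv, names=['dt2', 'ticker', 'maturity', 'strike', 'cP',
                                          'px', 'strike2', 'yield', 'rF', 'T', 'rlzd10'])

# Drop rows whose merge key is missing; the NaN had made dt1 a float column,
# so cast it back to integers once the NaN rows are gone.
underlying = underlying.dropna(subset=['dt1']).astype({'dt1': 'int64'})
options = options.dropna(subset=['dt2'])

merged = underlying.merge(options, left_on='dt1', right_on='dt2')
print(merged)
```

Both option rows share the date 20130628, so the inner merge returns two rows, one per call/put entry.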
This is probably a very simple question.
I have two datasets that I read in with pandas.read_csv().
My data is in two separate csv files.
I am using the following code:
import mibian
import pandas as pd
underlying = pd.read_csv("txt1.csv", names=['dt1', 'price'])
options = pd.read_csv("txt2.txt", names=['dt2', 'ticker', 'maturity', 'strike', 'cP', 'px', 'strike', 'yield', 'rF', 'T', 'rlzd10'])
merged = underlying.merge(options, left_on='dt1', right_on='dt2')
The heads of my two datasets look like this:
>>> underlying.head();
0 1
0 20040326 3.579987
1 20040329 3.690494
2 20040330 3.755247
3 20040331 3.719373
4 20040401 3.728671
and
>>> options.head();
0 1 2 3 4 5 6 7 8 9 10
0 20130628 SVXY 20130817 32.5 call 39.22 32.5 0 0.005 0.136986 0.411224
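So column 0 in both datasets is the key I want to merge on, and I want to keep all the data from both result sets.
How should I do this? All the examples I have found online require keys, but my results don't have any.
But when I do the join, I get the following error: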
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Applications/Spyder.app/Contents/Resources/lib/python2.7/spyderlib/widgets/externalshell/sitecustomize.py", line 540, in runfile
execfile(filename, namespace)
File "/Users/jasonmellone/.spyder2/.temp.py", line 12, in <module>
merged = underlying.merge(options, left_on='dt1', right_on='dt2',how='outer');
File "/Library/Python/2.7/site-packages/pandas-0.13.0-py2.7-macosx-10.9-intel.egg/pandas/core/frame.py", line 3723, in merge
suffixes=suffixes, copy=copy)
File "/Library/Python/2.7/site-packages/pandas-0.13.0-py2.7-macosx-10.9-intel.egg/pandas/tools/merge.py", line 40, in merge
return op.get_result()
File "/Library/Python/2.7/site-packages/pandas-0.13.0-py2.7-macosx-10.9-intel.egg/pandas/tools/merge.py", line 197, in get_result
result_data = join_op.get_result()
File "/Library/Python/2.7/site-packages/pandas-0.13.0-py2.7-macosx-10.9-intel.egg/pandas/tools/merge.py", line 722, in get_result
return BlockManager(result_blocks, self.result_axes)
File "/Library/Python/2.7/site-packages/pandas-0.13.0-py2.7-macosx-10.9-intel.egg/pandas/core/internals.py", line 1954, in __init__
self._set_ref_locs(do_refs=True)
File "/Library/Python/2.7/site-packages/pandas-0.13.0-py2.7-macosx-10.9-intel.egg/pandas/core/internals.py", line 2091, in _set_ref_locs
'have _ref_locs set' % (block, labels))
AssertionError: Cannot create BlockManager._ref_locs because block [IntBlock: [dt1], 1 x 372145, dtype: int64] with duplicate items [Index([u'dt1', u'price', u'dt2', u'ticker', u'maturity', u'strike', u'cP', u'px', u'strike', u'yield', u'rF', u'T', u'rlzd10'], dtype='object')] does not have _ref_locs set
I have searched my dataset and there are no duplicates.
Thanks!
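The Index printed in the AssertionError lists 'strike' twice, so the duplicate lives in the column labels rather than in the data values, which is why searching the data turned up nothing. A minimal sketch of spotting and fixing that, assuming small made-up frames and a hypothetical replacement label strike2; once the labels are unique, the outer merge from the traceback keeps every date from both sides. The indicator argument needs pandas 0.17 or later, newer than the 0.13 shown in the traceback.

```python
import pandas as pd

# Made-up frames; options repeats the 'strike' label, as the names= list in the question does.
underlying = pd.DataFrame({'dt1': [20130627, 20130628], 'price': [3.55, 3.58]})
options = pd.DataFrame([[20130628, 32.5, 'call', 32.5],
                        [20130701, 33.0, 'put', 33.0]],
                       columns=['dt2', 'strike', 'cP', 'strike'])

# columns.duplicated() flags each repeat of a label after its first occurrence.
dupes = options.columns[options.columns.duplicated()].tolist()
print(dupes)  # ['strike']

# Give the repeated column a unique (hypothetical) name before merging.
options.columns = ['dt2', 'strike', 'cP', 'strike2']

# how='outer' keeps dates found in either frame; indicator=True adds a '_merge'
# column recording whether each row matched on both sides or only one.
merged = underlying.merge(options, left_on='dt1', right_on='dt2',
                          how='outer', indicator=True)
print(merged)
```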
In svxySynthetic.csv, dt1 has unique values, but dt2 in optionsArg has duplicate values because you have both a call entry and a put entry. In fact, you only have 2411 unique dt2 values across 372032 rows of data, so how do you expect these to be merged? - EdChum
merged = options.merge(underlying, left_on='dt2', right_on='dt1', how='left')
- EdChum
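The comment's point is that the join is many-to-one: several option rows (a call and a put) share each date, while each date has a single underlying price. A minimal sketch with made-up frames, comparing the row count against the number of distinct dt2 values and then running the suggested left merge. The validate argument is an addition not in the comment; it needs pandas 0.21 or later and raises a MergeError if dt1 turned out not to be unique.

```python
import pandas as pd

# Made-up frames: one underlying price per date, but a call and a put per date in options.
underlying = pd.DataFrame({'dt1': [20130628, 20130701], 'price': [3.58, 3.69]})
options = pd.DataFrame({'dt2': [20130628, 20130628, 20130701, 20130701],
                        'cP': ['call', 'put', 'call', 'put']})

# Rows versus distinct keys: more rows than keys means dt2 repeats.
print(len(options), options['dt2'].nunique())  # 4 2

# Left merge with options on the left: each option row picks up the price for its date.
# validate='many_to_one' checks that dt1 is unique on the right-hand side.
merged = options.merge(underlying, left_on='dt2', right_on='dt1',
                       how='left', validate='many_to_one')
print(merged)
```

Every option row is kept and the matching price is repeated for the call and the put on the same date.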