How to convert a MultiIndex to string type

6

Consider the MultiIndex idx

import pandas as pd

idx = pd.MultiIndex.from_product([range(2013, 2016), range(1, 5)])

When I run

idx.to_series().str.join(' ')

I get

2013  1   NaN
      2   NaN
      3   NaN
      4   NaN
2014  1   NaN
      2   NaN
      3   NaN
      4   NaN
2015  1   NaN
      2   NaN
      3   NaN
      4   NaN
dtype: float64

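This happens because the levels have dtype int rather than str, and the join function expects str. How do I convert the whole idx to str?

Here is what I have done so far: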
join = lambda x, delim=' ': delim.join([str(y) for y in x])
idx.to_series().apply(join, delim=' ')

2013  1    2013 1
      2    2013 2
      3    2013 3
      4    2013 4
2014  1    2014 1
      2    2014 2
      3    2014 3
      4    2014 4
2015  1    2015 1
      2    2015 2
      3    2015 3
      4    2015 4
dtype: object

I suspect there is a simpler way that I am overlooking.
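
For reference, the NaN result described in the question can be reproduced on a small Series. This is a minimal sketch, assuming a reasonably recent pandas; the names str_tuples and int_tuples are made up for illustration. It relies on the documented behaviour of Series.str.join, which returns NaN for any element that contains a non-string item.

```python
import pandas as pd

# Hypothetical sample data: the same labels, once as strings and once as ints.
str_tuples = pd.Series([("2013", "1"), ("2013", "2")])
int_tuples = pd.Series([(2013, 1), (2013, 2)])

# Elements made only of strings are joined as expected.
joined_ok = str_tuples.str.join(" ")

# Elements containing ints cannot be joined, so every result is NaN.
joined_bad = int_tuples.str.join(" ")

print(joined_ok.tolist())
print(joined_bad.isna().all())
```

This is why each label has to be converted with str before joining.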
4 Answers

5
Something like this?
idx.to_series().apply(lambda x: '{0}-{1}'.format(*x))

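This is very similar to what I already have. But I like it. - piRSquared
@piRSquared Right, I didn't realize you wanted a list. You can append .values to the line above to get the same output as the accepted answer. - bananafish
See my answer below. I'm after an efficient and elegant way to convert the index to dtype str. - piRSquared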
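
A variation on the format-based approach above, sketched here for any number of levels, is to map over the MultiIndex directly instead of going through to_series(). The helper name join_levels is an assumption made for this example and is not part of pandas; MultiIndex.map itself is a real method and returns a flat Index when the function returns non-tuple values.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([range(2013, 2016), range(1, 5)])

def join_levels(midx, sep=" "):
    """Hypothetical helper: join the labels of each entry into one string."""
    # Each entry of a MultiIndex is a tuple of labels; convert each label
    # with str() before joining so that int levels are handled too.
    return midx.map(lambda labels: sep.join(str(label) for label in labels))

flat_idx = join_levels(idx)
print(list(flat_idx[:3]))  # ['2013 1', '2013 2', '2013 3']
```

The result is an Index rather than a Series, which may be closer to what the question asks for.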

3

I'm not sure this is the most elegant approach, but it should work:

idx.get_level_values(0).astype(str).values + ' ' + idx.get_level_values(1).astype(str).values

This is the closest to what I had in mind. - piRSquared
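
The one-liner above hard-codes two levels. A rough sketch of the same idea for an arbitrary number of levels might look like the following; join_level_values is a name invented for this example, and the explicit to_numpy(dtype=object) is an assumption added so that the concatenation works on plain object arrays regardless of the pandas string dtype in use.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([range(2013, 2016), range(1, 5)])

def join_level_values(midx, sep=" "):
    """Hypothetical helper: concatenate every level as strings, level by level."""
    # One object array of strings per level, aligned with the index entries.
    columns = [
        midx.get_level_values(i).astype(str).to_numpy(dtype=object)
        for i in range(midx.nlevels)
    ]
    joined = columns[0]
    for column in columns[1:]:
        # Element-wise string concatenation on object arrays.
        joined = joined + sep + column
    return joined

result = join_level_values(idx)
print(list(result[:2]))  # ['2013 1', '2013 2']
```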

1
A general solution using starmap from itertools.
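from itertools import starmap

import pandas as pd

def flat(midx, sep=''):
    # Build a template with one '{}' per level, e.g. '{}_{}' for two levels
    fstr = sep.join(['{}'] * midx.nlevels)
    # Iterating a MultiIndex yields tuples; starmap unpacks each into format()
    return pd.Index(starmap(fstr.format, midx))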

Demo

midx = pd.MultiIndex.from_product([[1, 2], [3, 4]])

flat(midx)
Index([u'13', u'14', u'23', u'24'], dtype='object')

flat(midx, '_')
Index([u'1_3', u'1_4', u'2_3', u'2_4'], dtype='object')

1

A list comprehension is the fastest:

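# Note: this variant puts level 1 first, so entries look like '1 2013'
print (['{} {}'.format(i[1], i[0]) for i in idx])
# This variant keeps the level order, so entries look like '2013 1'
print ([' '.join((str(i[0]), str(i[1]))) for i in idx])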

Timings:

In [21]: %timeit (['{} {}'.format(i[1], i[0]) for i in idx])
The slowest run took 4.68 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.51 µs per loop

In [22]: %timeit ([' '.join((str(i[0]), str(i[1]))) for i in idx])
The slowest run took 6.48 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.62 µs per loop

In [23]: %timeit (idx.get_level_values(0).astype(str).values + ' ' + idx.get_level_values(1).astype(str).values)
The slowest run took 5.91 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 215 µs per loop

In [24]: %timeit idx.to_series().apply(lambda x: '{0}-{1}'.format(*x))
The slowest run took 5.43 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 369 µs per loop

In [25]: %timeit idx.to_series().str.join(' ')
The slowest run took 5.53 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 394 µs per loop
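
The %timeit lines above are IPython magics. Outside IPython, a comparison along the same lines can be sketched with the standard-library timeit module. The dictionary of candidates and the repeat counts below are arbitrary choices for illustration, only two of the approaches are included, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

idx = pd.MultiIndex.from_product([range(2013, 2016), range(1, 5)])

# Two of the approaches from this thread, wrapped as zero-argument callables.
candidates = {
    "list comprehension": lambda: ["{} {}".format(i[0], i[1]) for i in idx],
    "to_series().apply": lambda: idx.to_series().apply(
        lambda x: "{0}-{1}".format(*x)
    ),
}

loops = 100
results = {}
for name, func in candidates.items():
    # Best of 3 runs, each timing `loops` calls; report seconds per call.
    best = min(timeit.repeat(func, number=loops, repeat=3))
    results[name] = best / loops

for name, seconds in results.items():
    print("{}: {:.1f} µs per loop".format(name, seconds * 1e6))
```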
