What is the Python equivalent of R's paste?

Question

53

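I'm new to Python. R users have a function called "paste" that concatenates the values of two or more variables in a data frame. It's very useful. For example, suppose I have this data frame: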

   categorie titre tarifMin  lieu  long   lat   img dateSortie
1      zoo,  Aquar      0.0 Aquar 2.385 48.89 ilo,0           
2      zoo,  Aquar      4.5 Aquar 2.408 48.83 ilo,0           
6      lieu  Jardi      0.0 Jardi 2.320 48.86 ilo,0           
7      lieu  Bois       0.0 Bois  2.455 48.82 ilo,0           
13     espac Canal      0.0 Canal 2.366 48.87 ilo,0           
14     espac Canal     -1.0 Canal 2.384 48.89 ilo,0           
15     parc  Le Ma     20.0 Le Ma 2.353 48.87 ilo,0

I want to create a new column in the data frame that combines the values of other columns with some custom text. In R, I can do this:

> y$thecolThatIWant=ifelse(y$tarifMin!=-1,
+                             paste("Evenement permanent  -->",y$categorie,
+                                   y$titre,"C  partir de",y$tarifMin,"€uros"),
+                             paste("Evenement permanent  -->",y$categorie,
+                                   y$titre,"sans prix indique"))

The result is:

> y
   categorie titre tarifMin  lieu  long   lat   img dateSortie
1      zoo,  Aquar      0.0 Aquar 2.385 48.89 ilo,0           
2      zoo,  Aquar      4.5 Aquar 2.408 48.83 ilo,0           
6      lieu  Jardi      0.0 Jardi 2.320 48.86 ilo,0           
7      lieu  Bois       0.0 Bois  2.455 48.82 ilo,0           
13     espac Canal      0.0 Canal 2.366 48.87 ilo,0           
14     espac Canal     -1.0 Canal 2.384 48.89 ilo,0           
15     parc  Le Ma     20.0 Le Ma 2.353 48.87 ilo,0           
                                                thecolThatIWant
1  Evenement permanent  --> zoo,  Aquar C  partir de  0.0 €uros
2  Evenement permanent  --> zoo,  Aquar C  partir de  4.5 €uros
6  Evenement permanent  --> lieu  Jardi C  partir de  0.0 €uros
7  Evenement permanent  --> lieu  Bois  C  partir de  0.0 €uros
13 Evenement permanent  --> espac Canal C  partir de  0.0 €uros
14 Evenement permanent  --> espac Canal C  partir de -1.0 €uros
15 Evenement permanent  --> parc  Le Ma C  partir de 20.0 €uros

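My question is: how can I do the same thing in Python with pandas or some other module?

What I've tried so far: I'm a very new user, so apologies for any mistakes. I tried to replicate the example in Python, assuming I start with something like this.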

table=pd.read_csv("y.csv",sep=",")
tt= table.loc[:,['categorie','titre','tarifMin','long','lat','lieu']]
table
    categorie   titre   tarifMin    long    lat     lieu
0   zoo,    Aquar   0.0     2.385   48.89   Aquar
1   zoo,    Aquar   4.5     2.408   48.83   Aquar
2   lieu    Jardi   0.0     2.320   48.86   Jardi
3   lieu    Bois    0.0     2.455   48.82   Bois
4   espac   Canal   0.0     2.366   48.87   Canal
5   espac   Canal   -1.0    2.384   48.89   Canal
6   parc    Le Ma   20.0    2.353   48.87   Le Ma

This is basically what I tried:

sc="Even permanent -->" + " "+ tt.titre+" "+tt.lieu
tt['theColThatIWant'] = sc
tt

And this is what I got:

    categorie   titre   tarifMin    long    lat     lieu    theColThatIWant
0   zoo,    Aquar   0.0     2.385   48.89   Aquar   Even permanent --> Aquar Aquar
1   zoo,    Aquar   4.5     2.408   48.83   Aquar   Even permanent --> Aquar Aquar
2   lieu    Jardi   0.0     2.320   48.86   Jardi   Even permanent --> Jardi Jardi
3   lieu    Bois    0.0     2.455   48.82   Bois    Even permanent --> Bois Bois
4   espac   Canal   0.0     2.366   48.87   Canal   Even permanent --> Canal Canal
5   espac   Canal   -1.0    2.384   48.89   Canal   Even permanent --> Canal Canal
6   parc    Le Ma   20.0    2.353   48.87   Le Ma   Even permanent --> Le Ma Le Ma

Now, if there is no vectorized operation like in R, I suppose I have to loop over the rows?

- GjT

1

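There are many ways to do this, but since most Python code isn't "vectorized", they usually involve iterators or some form of list comprehension. Please share what you've tried so far and why it didn't work. - Justin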

1

Here are some existing recipes (they don't cover paste, though): http://pandas.pydata.org/pandas-docs/dev/comparison_with_r.html - Jeff

9 Answers

20

Here is a simple implementation that works on lists and other iterables. Warning: it has only been lightly tested, and it only works on Python 3.5+:

from functools import reduce

def _reduce_concat(x, sep=""):
    return reduce(lambda x, y: str(x) + sep + str(y), x)
        
def paste(*lists, sep=" ", collapse=None):
    result = map(lambda x: _reduce_concat(x, sep=sep), zip(*lists))
    if collapse is not None:
        return _reduce_concat(result, sep=collapse)
    return list(result)

assert paste([1,2,3], [11,12,13], sep=',') == ['1,11', '2,12', '3,13']
assert paste([1,2,3], [11,12,13], sep=',', collapse=";") == '1,11;2,12;3,13'

You can also have some more fun and replicate other functions, such as paste0:

from functools import partial

paste0 = partial(paste, sep="")

Edit: here is a Repl.it project with a type-annotated version of this code.

- shadowtalker

1

Thanks! This worked very well for me, and it runs faster than the list comprehension approach I had been trying. - paulstey

1

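I think you could turn this answer into a comprehensive overview that lists alternatives for different cases, to attract more upvotes and serve as a reference for future readers. - Hack-R

This function is not a good imitation of R's behavior. Here are some simple examples where it fails:

paste("Hello", ["Ben", "Mike"]) # ['H Ben', 'e Mike'] # not what we want.
paste(["Hello"], ["Ben", "Mike"]) # ['Hello Ben'] # not what we want.
paste("a", ["Ben", "Mike"]) # ['a Ben'] # not what we want.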

- Tal Galili
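One way the failures listed in this comment could be addressed is to treat a bare string as a length-one vector and repeat shorter inputs, which is roughly what R's recycling does. The following is a minimal sketch under that assumption; `paste_recycled` and `_as_vector` are hypothetical names, and empty inputs are handled more simply than in R.

```python
from itertools import cycle, islice

def _as_vector(x):
    # Treat strings and non-iterables as length-one vectors, as R does.
    if isinstance(x, (str, bytes)) or not hasattr(x, "__iter__"):
        return [x]
    return list(x)

def paste_recycled(*args, sep=" ", collapse=None):
    vectors = [_as_vector(a) for a in args]
    # Simplification (not R's exact rule): any empty input gives an empty result.
    if not vectors or any(len(v) == 0 for v in vectors):
        return [] if collapse is None else ""
    n = max(len(v) for v in vectors)
    # cycle() repeats each shorter vector until it reaches length n.
    columns = [list(islice(cycle(v), n)) for v in vectors]
    result = [sep.join(str(item) for item in row) for row in zip(*columns)]
    return result if collapse is None else collapse.join(result)

print(paste_recycled("Hello", ["Ben", "Mike"]))
print(paste_recycled(["Hello"], ["Ben", "Mike"]))
```

With this version both calls print ['Hello Ben', 'Hello Mike'].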

6

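For this particular case, R's paste is closest to Python's format, which was added in Python 2.6. It is newer and somewhat more flexible than the older % operator.

For a pure Python answer that uses neither numpy nor pandas, here is one way to do it with your original data as a list of lists (a list of dicts would also work, but I think it would be more cluttered).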

# -*- coding: utf-8 -*-
names=['categorie','titre','tarifMin','lieu','long','lat','img','dateSortie']

records=[[
    'zoo',   'Aquar',     0.0,'Aquar',2.385,48.89,'ilo',0],[
    'zoo',   'Aquar',     4.5,'Aquar',2.408,48.83,'ilo',0],[
    'lieu',  'Jardi',     0.0,'Jardi',2.320,48.86,'ilo',0],[
    'lieu',  'Bois',      0.0,'Bois', 2.455,48.82,'ilo',0],[
    'espac', 'Canal',     0.0,'Canal',2.366,48.87,'ilo',0],[
    'espac', 'Canal',    -1.0,'Canal',2.384,48.89,'ilo',0],[
    'parc',  'Le Ma',    20.0,'Le Ma', 2.353,48.87,'ilo',0] ]

def prix(p):
    if (p != -1):
        return 'C  partir de {} €uros'.format(p)
    return 'sans prix indique'

def msg(a):
    return 'Evenement permanent  --> {}, {} {}'.format(a[0],a[1],prix(a[2]))

[m.append(msg(m)) for m in records]

from pprint import pprint

pprint(records)

This results in:

[['zoo',
  'Aquar',
  0.0,
  'Aquar',
  2.385,
  48.89,
  'ilo',
  0,
  'Evenement permanent  --> zoo, Aquar C  partir de 0.0 \xe2\x82\xacuros'],
 ['zoo',
  'Aquar',
  4.5,
  'Aquar',
  2.408,
  48.83,
  'ilo',
  0,
  'Evenement permanent  --> zoo, Aquar C  partir de 4.5 \xe2\x82\xacuros'],
 ['lieu',
  'Jardi',
  0.0,
  'Jardi',
  2.32,
  48.86,
  'ilo',
  0,
  'Evenement permanent  --> lieu, Jardi C  partir de 0.0 \xe2\x82\xacuros'],
 ['lieu',
  'Bois',
  0.0,
  'Bois',
  2.455,
  48.82,
  'ilo',
  0,
  'Evenement permanent  --> lieu, Bois C  partir de 0.0 \xe2\x82\xacuros'],
 ['espac',
  'Canal',
  0.0,
  'Canal',
  2.366,
  48.87,
  'ilo',
  0,
  'Evenement permanent  --> espac, Canal C  partir de 0.0 \xe2\x82\xacuros'],
 ['espac',
  'Canal',
  -1.0,
  'Canal',
  2.384,
  48.89,
  'ilo',
  0,
  'Evenement permanent  --> espac, Canal sans prix indique'],
 ['parc',
  'Le Ma',
  20.0,
  'Le Ma',
  2.353,
  48.87,
  'ilo',
  0,
  'Evenement permanent  --> parc, Le Ma C  partir de 20.0 \xe2\x82\xacuros']]

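Note that although I defined a list called names, it is never actually used. You could define a dictionary with the header names as keys and the field numbers (starting from 0) as values, but I didn't bother, to keep the example simple.

The functions prix and msg are fairly simple. The only tricky part is the list comprehension [m.append(msg(m)) for m in records], which iterates over all the records and modifies each one by appending a new field created by calling msg.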

- Edward
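The name-to-index dictionary mentioned in the answer above could look like the sketch below. It reuses the answer's `names`, `prix` and `msg`, but the `col` mapping, the shortened record list, and the simplified ASCII message text are assumptions made for illustration. A plain for loop replaces the list comprehension so that the in-place modification is explicit.

```python
names = ['categorie', 'titre', 'tarifMin', 'lieu', 'long', 'lat', 'img', 'dateSortie']
# Hypothetical helper: map each header name to its position in a record.
col = {name: i for i, name in enumerate(names)}

# Two records taken from the example data.
records = [
    ['zoo', 'Aquar', 4.5, 'Aquar', 2.408, 48.83, 'ilo', 0],
    ['espac', 'Canal', -1.0, 'Canal', 2.384, 48.89, 'ilo', 0],
]

def prix(p):
    # -1 is the marker for "no price given" in the question's data.
    if p != -1:
        return 'a partir de {} euros'.format(p)
    return 'sans prix indique'

def msg(rec):
    # Look fields up by name instead of by hard-coded position.
    return 'Evenement permanent --> {}, {} {}'.format(
        rec[col['categorie']], rec[col['titre']], prix(rec[col['tarifMin']]))

# Append the new field to every record in place.
for rec in records:
    rec.append(msg(rec))

print(records[0][-1])
print(records[1][-1])
```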

OK, thanks. It works that way, but I think lowtech's Python pandas version suits my purposes better. - GjT

2

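My answer is loosely based on the original question and was edited from woles's answer. I would like to illustrate these points:

% is the paste operator in Python
using apply, you can create a new value and assign it to a new column

For R users: there is no ifelse in direct form (but there are ways to replace it nicely).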

import numpy as np
import pandas as pd

dates = pd.date_range('20140412',periods=7)
df = pd.DataFrame(np.random.randn(7,4),index=dates,columns=list('ABCD'))
df['categorie'] = ['z', 'z', 'l', 'l', 'e', 'e', 'p']

def apply_to_row(x):
    ret = "this is the value i want: %f" % x['A']
    if x['B'] > 0:
        ret = "no, this one is better: %f" % x['C']
    return ret

df['theColumnIWant'] = df.apply(apply_to_row, axis = 1)
print(df)

- lowtech

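Great, that's exactly what I wanted. But I ran into a problem when I tried to paste more than two elements. It seems that def apply_to_row(x): ret = "this is the value i want: %s" % x['A'] % "euros" doesn't work. I'm looking for something else and will share it if I succeed. - GjT

@GjT It should be ret = "this is the value i want: %s euros" % x['A'] - lowtech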

numpy.where is the equivalent of R's ifelse. - xingzhi.sg
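As a rough illustration of the comment above, the R ifelse/paste call from the question could be written with numpy.where as sketched below. The three-row data frame is a made-up subset of the question's data, and the message text is simplified to plain ASCII.

```python
import numpy as np
import pandas as pd

# Small made-up subset of the question's data frame.
tt = pd.DataFrame({
    "categorie": ["zoo", "espac", "parc"],
    "titre": ["Aquar", "Canal", "Le Ma"],
    "tarifMin": [4.5, -1.0, 20.0],
})

# String columns concatenate element-wise with +, much like paste.
prefix = "Evenement permanent --> " + tt["categorie"] + " " + tt["titre"]
# Numeric columns must be cast to str before concatenation.
with_price = prefix + " a partir de " + tt["tarifMin"].astype(str) + " euros"
without_price = prefix + " sans prix indique"

# np.where(condition, a, b) picks from a where the condition holds, else from b.
tt["theColThatIWant"] = np.where(tt["tarifMin"] != -1, with_price, without_price)
print(tt["theColThatIWant"].tolist())
```

Both branches are computed for every row and np.where only selects between them, which is also how ifelse behaves in R.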

2

You can try pandas.Series.str.cat

import pandas as pd
def paste0(ss,sep=None,na_rep=None,):
    '''Analogy to R paste0'''
    ss = [pd.Series(s) for s in ss]
    ss = [s.astype(str) for s in ss]
    s = ss[0]
    res = s.str.cat(ss[1:],sep=sep,na_rep=na_rep)
    return res

pasteA=paste0

Or just sep.join()

def paste0(ss,sep=None,na_rep=None,
    castF=str, # Python 3's str is Unicode; the original used Python 2's unicode here
):
    if sep is None:
        sep=''
    res = [castF(sep).join(castF(s) for s in x) for x in zip(*ss)]
    return res
pasteB = paste0


%timeit pasteA([range(1000),range(1000,0,-1)],sep='_')
# 100 loops, best of 3: 7.11 ms per loop
%timeit pasteB([range(1000),range(1000,0,-1)],sep='_')
# 100 loops, best of 3: 2.24 ms per loop

I have used itertools to mimic recycling

import itertools
def paste0(ss,sep=None,na_rep=None,castF=str):
    '''Analogy to R paste0
    '''
    if sep is None:
        sep=u''
    # The longest input determines the length of the result.
    L = max([len(e) for e in ss])
    # cycle() recycles shorter inputs; zip replaces Python 2's itertools.izip.
    it = zip(*[itertools.cycle(e) for e in ss])
    res = [castF(sep).join(castF(s) for s in next(it) ) for i in range(L)]
    # res = pd.Series(res)
    return res

patsy might be relevant (not an experienced user myself.)

- shouldsee

1

Let's try it with the apply function.

df.apply( lambda x: str( x.loc[ desired_col ] ) + "pasting?" , axis = 1 )

You will get something similar to paste.

- 胡亦朗

1

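If you just want to paste two string columns together, you can simplify @shouldsee's answer, because you don't need to write a function. For example, in my case: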

df['newcol'] = df['id_part_one'].str.cat(df['id_part_two'], sep='_')

Both Series may need to have dtype object for this to work (I haven't verified).

- Corey Levinson
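One way to sidestep the dtype question raised in the answer above is to cast both columns to strings explicitly before calling str.cat. This is a minimal sketch: the column names come from the answer, while the data is made up and the numeric second column is an assumption chosen to show the cast.

```python
import pandas as pd

# Made-up data: one string column and one integer column.
df = pd.DataFrame({"id_part_one": ["a", "b"], "id_part_two": [1, 2]})

# The .str accessor needs string data, so cast both sides first.
df["newcol"] = df["id_part_one"].astype(str).str.cat(
    df["id_part_two"].astype(str), sep="_")
print(df["newcol"].tolist())
```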

0

There is actually a very simple way. You just need to convert your variable to a string. For example, try running this:

a = 1; b = "you are number " + str(a); b

- Roxy

0

Here is a simple example of how to achieve what you want (if I understood you correctly):

import numpy as np
import pandas as pd

dates = pd.date_range('20130101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
for row in df.itertuples():
    index, A, B, C, D = row
    print('%s Evenement permanent  --> %s , next data %s' % (index, A, B))

Output:

>>>df
                   A         B         C         D
2013-01-01 -0.400550 -0.204032 -0.954237  0.019025
2013-01-02  0.509040 -0.611699  1.065862  0.034486
2013-01-03  0.366230  0.805068 -0.144129 -0.912942
2013-01-04  1.381278 -1.783794  0.835435 -0.140371
2013-01-05  1.140866  2.755003 -0.940519 -2.425671
2013-01-06 -0.610569 -0.282952  0.111293 -0.108521

And this is what the print loop outputs:

2013-01-01 00:00:00 Evenement permanent  --> -0.400550121168 , next data -0.204032344442

2013-01-02 00:00:00 Evenement permanent  --> 0.509040318928 , next data -0.611698560541

2013-01-03 00:00:00 Evenement permanent  --> 0.366230438863 , next data 0.805067758304

2013-01-04 00:00:00 Evenement permanent  --> 1.38127775713 , next data -1.78379439485

2013-01-05 00:00:00 Evenement permanent  --> 1.14086631509 , next data 2.75500268167

2013-01-06 00:00:00 Evenement permanent  --> -0.610568516983 , next data -0.282952162792

- Michał

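I don't have an index in my data frame. I tried this, but it doesn't work. Besides, I have a condition on the values of one variable that determines which value to display. But thanks. - GjT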

- SAHIL BHANGE · Accepted Answer

59

This works much like the paste command in R:

R code:

 words = c("Here", "I","want","to","concatenate","words","using","pipe","delimeter")
 paste(words,collapse="|")

"Here|I|want|to|concatenate|words|using|pipe|delimeter"

Python:

words = ["Here", "I","want","to","concatenate","words","using","pipe","delimeter"]
"|".join(words)

Result:

'Here|I|want|to|concatenate|words|using|pipe|delimeter'

- SAHIL BHANGE

1

I'm surprised this answer isn't ranked higher. The join function is a very simple and short implementation. - bart cubrich

What if there is a number in this list, for example i=1 and words=[i, ".jpg"]? - mikey
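For the case raised in this comment, str.join raises a TypeError when an item is not a string, so each item has to be converted first. A minimal sketch using the comment's own variables (`filename` is a name added here for illustration):

```python
i = 1
words = [i, ".jpg"]

# map(str, ...) converts every item to a string before joining.
filename = "".join(map(str, words))
print(filename)
```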

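I think this answer is fine when you only need to join words with a single expression (e.g. "_"). However, for slightly more complex uses of the paste / paste0 functions, such as paste0(coeff, "(", CI_lower, ",", CI_higher, ")"), this approach no longer works. - Charlotte Deng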
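For a call like the paste0 example in this comment, one option in plain Python is an f-string inside a list comprehension over zipped inputs. This is only a sketch: the values are made up, and the lists are assumed to have equal length, since zip stops at the shortest input rather than recycling as R does.

```python
# Made-up example values; the variable names follow the comment.
coeff = [1.2, 0.5]
ci_lower = [0.9, 0.1]
ci_higher = [1.5, 0.9]

# One formatted string per position, like an element-wise paste0.
labels = [f"{c}({lo},{hi})" for c, lo, hi in zip(coeff, ci_lower, ci_higher)]
print(labels)
```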

1

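The main difference is that the R function is vectorized. If states = [TX, CA, NY] and numbers = [1, 2, 3], then paste should return ['TX1', 'CA2', 'NY3']. The problem is simpler in R because Python has more types to consider: lists, NumPy arrays and pandas Series. So if numbers is a NumPy array and states is a Series, it isn't clear what the return type should be. In this respect, R follows PEP 20's "one obvious way to do it" guideline better than Python does. - Steven Scott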
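The mixed-type case from this comment can be sketched with pandas as below. Returning a Series is a choice made in this example rather than something the library dictates, and the variable name `pasted` is added for illustration.

```python
import numpy as np
import pandas as pd

states = pd.Series(["TX", "CA", "NY"])
numbers = np.array([1, 2, 3])

# Wrap the array in a Series aligned to states, cast to str, then add element-wise.
pasted = states + pd.Series(numbers, index=states.index).astype(str)
print(pasted.tolist())
```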