Python pandas: split a string into multiple columns and extract each column's data from the split parameters

3

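I'm new to both Python and pandas. I have a column of URL path strings that I want to split into separate columns.

Each parameter in the string is separated by a semicolon.

I know there are many other answers on splitting data into columns by a delimiter, but in my case I want to create the columns dynamically and take the value for each column from the parameter itself.

Each parameter should go into the column named after it: the text before the equals sign is the column name, and the text after the equals sign is the value.

For example: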

cat=be_thnky;u1=men
cat=be_thnky;u1=custom

Should become

cat      u1
be_thnky men
be_thnky custom

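To complicate things, not every URL contains every parameter. If a parameter is missing, I want that column to contain NaN.

Some example URL path strings I am working with: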

;src=4457426;type=be_salec;cat=be_thnky;qty=1;cost=60.00;ord=50608803;gtm=G64;gcldc=*;gclaw=*;gac=UA-32723457-1:*;u1=men;u2=schoenen;u3=none;u5=VA38G1NRI;u6=80;u7=0;u8=1;u9=EUR;u10=be;u11=Suede Old Skool Shoes;u12=checkout;u13=8;u14=VNIWTYI926IW7;u15=https://www.vans.be/webapp/wcs/stores/servlet/OrderOKView?langId=-27&catalogId=11260&storeId=10167&krypto=%2B03C782RqELOiuY1L2ELV7hFeTRMquZ9Eyr1lJqmoSQhClENiUJ6feRNwwAA1ZYd4V7tkAuIwyiIrClp7QaqfLeC%2B%2FPTLl7wSF%2FCyrVWqgiSJRgAS%2BWbXohu0DG8xsdPnSXp%2F%2F4MDb%2FkbPwh%2FT5EpiEWMkGur%2Fx%2FABR7Cvs4jh345776IITNx%2FTRZZXu4zeAco5P%2FvxyqDbmwvLKpPKljf3TpU0wOCmjCDWR5r3uR3ELErPFboWuV5H24FOIy7e%2B2b6m4YhCCDuzceKa5Qllkiwc4YI6AL9rIK1T2jExde343vk%2B4FZtK6XgOMtxbwv6pBIUMX%2Bn3kbb7soGQ%2FjnEwxzxMX5P%2FdMZzts6NkskMSICB955QKsZqPLepiS%2BWY5u5%2Bs9CPjquK%2FlsXmHTi26wq1cLqeiPdyolnE2AxaswLDhQcQbvDengszkSu8U8lTDhqaAxLExYF%2BMstZtKamD14AnMElNAbjZNcTEByzYlXOi1q2FpYg0kCyoaBBBtkRInSDBZtjxNWgd9bl98qs5R2ZqCiHmtOPrfcM53V77Acxcb5wl%2FkpdKEbTGuAijHpHgxpi55kIEcEmkJjvPnW7RwxUXPiVZbFjh34PlGJ10FaGvqPwsijBpR1TXrKWV3t3Z4r03yViU6txghbNtODiQ%3D%3D&ddkey=https%3AVFCWorldpayPunchoutCallbackCmd;~oref=https://www.vans.be/webapp/wcs/stores/servlet/OrderOKView?langId=-27&catalogId=11260&storeId=10167&krypto=%2B03C782RqELOiuY1L2ELV7hFeTRMquZ9Eyr1lJqmoSQhClENiUJ6feRNwwAA1ZYd4V7tkAuIwyiIrClp7QaqfLeC%2B%2FPTLl7wSF%2FCyrVWqgiSJRgAS%2BWbXohu0DG8xsdPnSXp%2F%2F4MDb%2FkbPwh%2FT5EpiEWMkGur%2Fx%2FABR7Cvs4jh345776IITNx%2FTRZZXu4zeAco5P%2FvxyqDbmwvLKpPKljf3TpU0wOCmjCDWR5r3uR3ELErPFboWuV5H24FOIy7e%2B2b6m4YhCCDuzceKa5Qllkiwc4YI6AL9rIK1T2jExde343vk%2B4FZtK6XgOMtxbwv6pBIUMX%2Bn3kbb7soGQ%2FjnEwxzxMX5P%2FdMZzts6NkskMSICB955QKsZqPLepiS%2BWY5u5%2Bs9CPjquK%2FlsXmHTi26wq1cLqeiPdyolnE2AxaswLDhQcQbvDengszkSu8U8lTDhqaAxLExYF%2BMstZtKamD14AnMElNAbjZNcTEByzYlXOi1q2FpYg0kCyoaBBBtkRInSDBZtjxNWgd9bl98qs5R2ZqCiHmtOPrfcM53V77Acxcb5wl%2FkpdKEbTGuAijHpHgxpi55kIEcEmkJjvPnW7RwxUXPiVZbFjh34PlGJ10FaGvqPwsijBpR1TXrKWV3t3Z4r03yViU6txghbNtODiQ%3D%3D&ddkey=https%3AVFCWorldpayPunchoutCallbackCmd

and

;src=4457426;type=be_salec;cat=be_thnky;qty=1;cost=79.17;ord=50619855;gtm=G64;gac=UA-32723457-1:*;u1=custom;u2=undefined;u3=none;u5=AQNNOQ;u6=95;u7=0;u8=1;u9=EUR;u10=be;u11=Men Era Shoes;u12=checkout;u13=;u14=;u15=https://www.vans.be/webapp/wcs/stores/servlet/OrderOKView?langId=-27&catalogId=11260&storeId=10167&krypto=aaHqAAtJa9bzV4lSFEuMWqdyG11jxs2yT0UY242hWRQyCn%2Ff7AHBrF%2ByFm6GF%2BZiumn%2B6cjIaHASWHpiwsBKSa5k5fMJoyz3ex%2B8FTyDOp3WwLgA9U3ibS6gLNMEl68UQ8K7bVk%2FP1%2BC2ckY17vriakRKvUpobXypW0AvXHgHGmaleDoIOlM6dVIX1pSHBPbeKDG4JVoXbUOltTgLUcnYbojIiIGx6m%2FYlHnYjWU%2BaYQpCK%2BRBeFd%2FKyekIN9y9wQlZHHKb7pFar8c3S24tuHj%2FeDGe1jwJ0S7%2BBnUb5WloJ1SSf0LjDyFSZAWBSzhidLIRM2OWyTXJeCBdBFNSw%2BwICm6uWHKPClJD%2FRIzO4D%2F3HQyS4sOeynLgyIR6JHsCv3FH%2B%2BrINsPE0Y3eI51mpm7UEmmcLmNKiONm11LwTD1U%2FZKgnLe50naDdiYj9%2BCt7TUkNuDiOYq1jaC2yOSKcz%2BGdF2i4bgEttXJlK84ZUeCUhfvGbQNebesaoRLrGgU7FkuOhut3LQm7Lqu5lpKYSt5cV8gkGP5%2Fm%2BOa%2FzKbRNmbcwACXuZ1hBJW0alkcX%2F3hfpPiSg9UrT1uZKRwfQUpx6fHzagiSWtcWXJDYO2SfWtlfoS%2B7W%2FIvIoD1FtMbCeVC6oAvltLOnIojrW3VYh1OrFUIlXcl0XMXzCPfRz%2B2v28tFOmsucTRbixJ9WyW3WqN2h3YMHZJQoSFbpUDSN7VQkFJmC1NgHzX09u7X1AUIcwP1TmLqO034RnK6ZSfmS38NuYhWCAmPUIyopyEmxqE3M%2FzqEWjId6S1DTmaJSzo09Rx2UtLnZXMOLKXifzoN8eQy3yQvFeNsKxh3IkJxb6uifVXDBpyelQibch9gDg%3D&ddkey=https%3AVFCWorldpayPunchoutCallbackCmd;~oref=https://www.vans.be/webapp/wcs/stores/servlet/OrderOKView?langId=-27&catalogId=11260&storeId=10167&krypto=aaHqAAtJa9bzV4lSFEuMWqdyG11jxs2yT0UY242hWRQyCn%2Ff7AHBrF%2ByFm6GF%2BZiumn%2B6cjIaHASWHpiwsBKSa5k5fMJoyz3ex%2B8FTyDOp3WwLgA9U3ibS6gLNMEl68UQ8K7bVk%2FP1%2BC2ckY17vriakRKvUpobXypW0AvXHgHGmaleDoIOlM6dVIX1pSHBPbeKDG4JVoXbUOltTgLUcnYbojIiIGx6m%2FYlHnYjWU%2BaYQpCK%2BRBeFd%2FKyekIN9y9wQlZHHKb7pFar8c3S24tuHj%2FeDGe1jwJ0S7%2BBnUb5WloJ1SSf0LjDyFSZAWBSzhidLIRM2OWyTXJeCBdBFNSw%2BwICm6uWHKPClJD%2FRIzO4D%2F3HQyS4sOeynLgyIR6JHsCv3FH%2B%2BrINsPE0Y3eI51mpm7UEmmcLmNKiONm11LwTD1U%2FZKgnLe50naDdiYj9%2BCt7TUkNuDiOYq1jaC2yOSKcz%2BGdF2i4bgEttXJlK84ZUeCUhfvGbQNebesaoRLrGgU7FkuOhut3LQm7Lqu5lpKYSt5cV8gkGP5%2Fm%2BOa%2FzKbRNmbcwACXuZ1hBJW0alkcX%2F3hfpPiSg9UrT1uZKRwfQUpx6fHzagiSWtcWXJDYO2SfWtlfoS%2B7W%2FIvIoD1FtMbCeVC6oAvltLOnIojrW3VYh1OrFUIlXcl0XMXzCPfRz%2B2v28tFOmsucTRbixJ9WyW3WqN2h3YMHZJQoSFbpUDSN7VQkFJmC1NgHzX09u7X1AUIcwP1TmLqO034RnK6ZSfmS38NuYhWCAmPUIyopyEmxqE3M%2FzqEWjId6S1DTmaJSzo09Rx2UtLnZXMOLKXifzoN8eQy3yQvFeNsKxh3IkJxb6uifVXDBpyelQibch9gDg%3D&ddkey=https%3AVFCWorldpayPunchoutCallbackCmd

2 Answers

3

Here is one solution using a dictionary comprehension and pd.concat:

str1 = ';src=4457426;type=be_salec;cat=be_thnky;qty=1;cost=60.00;ord=50608803;gtm=G64;gcldc=*;gclaw=*;gac=UA-32723457-1:*;u1=men;u2=schoenen;u3=none;u5=VA38G1NRI;u6=80;u7=0;u8=1;u9=EUR;u10=be;u11=Suede Old Skool Shoes;u12=checkout;u13=8;u14=VNIWTYI926IW7;u15=https://www.vans.be/webapp/wcs/stores/servlet/OrderOKView?langId=-27&catalogId=11260&storeId=10167&krypto=%2B03C782RqELOiuY1L2ELV7hFeTRMquZ9Eyr1lJqmoSQhClENiUJ6feRNwwAA1ZYd4V7tkAuIwyiIrClp7QaqfLeC%2B%2FPTLl7wSF%2FCyrVWqgiSJRgAS%2BWbXohu0DG8xsdPnSXp%2F%2F4MDb%2FkbPwh%2FT5EpiEWMkGur%2Fx%2FABR7Cvs4jh345776IITNx%2FTRZZXu4zeAco5P%2FvxyqDbmwvLKpPKljf3TpU0wOCmjCDWR5r3uR3ELErPFboWuV5H24FOIy7e%2B2b6m4YhCCDuzceKa5Qllkiwc4YI6AL9rIK1T2jExde343vk%2B4FZtK6XgOMtxbwv6pBIUMX%2Bn3kbb7soGQ%2FjnEwxzxMX5P%2FdMZzts6NkskMSICB955QKsZqPLepiS%2BWY5u5%2Bs9CPjquK%2FlsXmHTi26wq1cLqeiPdyolnE2AxaswLDhQcQbvDengszkSu8U8lTDhqaAxLExYF%2BMstZtKamD14AnMElNAbjZNcTEByzYlXOi1q2FpYg0kCyoaBBBtkRInSDBZtjxNWgd9bl98qs5R2ZqCiHmtOPrfcM53V77Acxcb5wl%2FkpdKEbTGuAijHpHgxpi55kIEcEmkJjvPnW7RwxUXPiVZbFjh34PlGJ10FaGvqPwsijBpR1TXrKWV3t3Z4r03yViU6txghbNtODiQ%3D%3D&ddkey=https%3AVFCWorldpayPunchoutCallbackCmd;~oref=https://www.vans.be/webapp/wcs/stores/servlet/OrderOKView?langId=-27&catalogId=11260&storeId=10167&krypto=%2B03C782RqELOiuY1L2ELV7hFeTRMquZ9Eyr1lJqmoSQhClENiUJ6feRNwwAA1ZYd4V7tkAuIwyiIrClp7QaqfLeC%2B%2FPTLl7wSF%2FCyrVWqgiSJRgAS%2BWbXohu0DG8xsdPnSXp%2F%2F4MDb%2FkbPwh%2FT5EpiEWMkGur%2Fx%2FABR7Cvs4jh345776IITNx%2FTRZZXu4zeAco5P%2FvxyqDbmwvLKpPKljf3TpU0wOCmjCDWR5r3uR3ELErPFboWuV5H24FOIy7e%2B2b6m4YhCCDuzceKa5Qllkiwc4YI6AL9rIK1T2jExde343vk%2B4FZtK6XgOMtxbwv6pBIUMX%2Bn3kbb7soGQ%2FjnEwxzxMX5P%2FdMZzts6NkskMSICB955QKsZqPLepiS%2BWY5u5%2Bs9CPjquK%2FlsXmHTi26wq1cLqeiPdyolnE2AxaswLDhQcQbvDengszkSu8U8lTDhqaAxLExYF%2BMstZtKamD14AnMElNAbjZNcTEByzYlXOi1q2FpYg0kCyoaBBBtkRInSDBZtjxNWgd9bl98qs5R2ZqCiHmtOPrfcM53V77Acxcb5wl%2FkpdKEbTGuAijHpHgxpi55kIEcEmkJjvPnW7RwxUXPiVZbFjh34PlGJ10FaGvqPwsijBpR1TXrKWV3t3Z4r03yViU6txghbNtODiQ%3D%3D&ddkey=https%3AVFCWorldpayPunchoutCallbackCmd'
str2 = ';src=4457426;type=be_salec;cat=be_thnky;qty=1;cost=79.17;ord=50619855;gtm=G64;gac=UA-32723457-1:*;u1=custom;u2=undefined;u3=none;u5=AQNNOQ;u6=95;u7=0;u8=1;u9=EUR;u10=be;u11=Men Era Shoes;u12=checkout;u13=;u14=;u15=https://www.vans.be/webapp/wcs/stores/servlet/OrderOKView?langId=-27&catalogId=11260&storeId=10167&krypto=aaHqAAtJa9bzV4lSFEuMWqdyG11jxs2yT0UY242hWRQyCn%2Ff7AHBrF%2ByFm6GF%2BZiumn%2B6cjIaHASWHpiwsBKSa5k5fMJoyz3ex%2B8FTyDOp3WwLgA9U3ibS6gLNMEl68UQ8K7bVk%2FP1%2BC2ckY17vriakRKvUpobXypW0AvXHgHGmaleDoIOlM6dVIX1pSHBPbeKDG4JVoXbUOltTgLUcnYbojIiIGx6m%2FYlHnYjWU%2BaYQpCK%2BRBeFd%2FKyekIN9y9wQlZHHKb7pFar8c3S24tuHj%2FeDGe1jwJ0S7%2BBnUb5WloJ1SSf0LjDyFSZAWBSzhidLIRM2OWyTXJeCBdBFNSw%2BwICm6uWHKPClJD%2FRIzO4D%2F3HQyS4sOeynLgyIR6JHsCv3FH%2B%2BrINsPE0Y3eI51mpm7UEmmcLmNKiONm11LwTD1U%2FZKgnLe50naDdiYj9%2BCt7TUkNuDiOYq1jaC2yOSKcz%2BGdF2i4bgEttXJlK84ZUeCUhfvGbQNebesaoRLrGgU7FkuOhut3LQm7Lqu5lpKYSt5cV8gkGP5%2Fm%2BOa%2FzKbRNmbcwACXuZ1hBJW0alkcX%2F3hfpPiSg9UrT1uZKRwfQUpx6fHzagiSWtcWXJDYO2SfWtlfoS%2B7W%2FIvIoD1FtMbCeVC6oAvltLOnIojrW3VYh1OrFUIlXcl0XMXzCPfRz%2B2v28tFOmsucTRbixJ9WyW3WqN2h3YMHZJQoSFbpUDSN7VQkFJmC1NgHzX09u7X1AUIcwP1TmLqO034RnK6ZSfmS38NuYhWCAmPUIyopyEmxqE3M%2FzqEWjId6S1DTmaJSzo09Rx2UtLnZXMOLKXifzoN8eQy3yQvFeNsKxh3IkJxb6uifVXDBpyelQibch9gDg%3D&ddkey=https%3AVFCWorldpayPunchoutCallbackCmd;~oref=https://www.vans.be/webapp/wcs/stores/servlet/OrderOKView?langId=-27&catalogId=11260&storeId=10167&krypto=aaHqAAtJa9bzV4lSFEuMWqdyG11jxs2yT0UY242hWRQyCn%2Ff7AHBrF%2ByFm6GF%2BZiumn%2B6cjIaHASWHpiwsBKSa5k5fMJoyz3ex%2B8FTyDOp3WwLgA9U3ibS6gLNMEl68UQ8K7bVk%2FP1%2BC2ckY17vriakRKvUpobXypW0AvXHgHGmaleDoIOlM6dVIX1pSHBPbeKDG4JVoXbUOltTgLUcnYbojIiIGx6m%2FYlHnYjWU%2BaYQpCK%2BRBeFd%2FKyekIN9y9wQlZHHKb7pFar8c3S24tuHj%2FeDGe1jwJ0S7%2BBnUb5WloJ1SSf0LjDyFSZAWBSzhidLIRM2OWyTXJeCBdBFNSw%2BwICm6uWHKPClJD%2FRIzO4D%2F3HQyS4sOeynLgyIR6JHsCv3FH%2B%2BrINsPE0Y3eI51mpm7UEmmcLmNKiONm11LwTD1U%2FZKgnLe50naDdiYj9%2BCt7TUkNuDiOYq1jaC2yOSKcz%2BGdF2i4bgEttXJlK84ZUeCUhfvGbQNebesaoRLrGgU7FkuOhut3LQm7Lqu5lpKYSt5cV8gkGP5%2Fm%2BOa%2FzKbRNmbcwACXuZ1hBJW0alkcX%2F3hfpPiSg9UrT1uZKRwfQUpx6fHzagiSWtcWXJDYO2SfWtlfoS%2B7W%2FIvIoD1FtMbCeVC6oAvltLOnIojrW3VYh1OrFUIlXcl0XMXzCPfRz%2B2v28tFOmsucTRbixJ9WyW3WqN2h3YMHZJQoSFbpUDSN7VQkFJmC1NgHzX09u7X1AUIcwP1TmLqO034RnK6ZSfmS38NuYhWCAmPUIyopyEmxqE3M%2FzqEWjId6S1DTmaJSzo09Rx2UtLnZXMOLKXifzoN8eQy3yQvFeNsKxh3IkJxb6uifVXDBpyelQibch9gDg%3D&ddkey=https%3AVFCWorldpayPunchoutCallbackCmd'

import pandas as pd

def converter(x):
    # Split on ';', then split each 'key=value' piece on the first '=' only
    return dict(i.split('=', 1) for i in x.split(';') if '=' in i)

# One single-row DataFrame per string; concat aligns columns and fills gaps with NaN
res = pd.concat([pd.DataFrame.from_dict(converter(i), orient='index').T
                 for i in (str1, str2)])

Result:

print(res)

       src      type       cat qty   cost       ord  gtm gcldc gclaw  \
0  4457426  be_salec  be_thnky   1  60.00  50608803  G64     *     *   
0  4457426  be_salec  be_thnky   1  79.17  50619855  G64   NaN   NaN   

                                               ~oref  
0  https://www.vans.be/webapp/wcs/stores/servlet/...  
0  https://www.vans.be/webapp/wcs/stores/servlet/...  

[2 rows x 25 columns]
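
To make the missing-parameter behaviour easier to see, here is a minimal sketch on shortened strings modelled on the question's example. The names `parse_params`, `rows` and `small` are illustrative and not part of the original answer. It also passes a list of dicts straight to the `pd.DataFrame` constructor rather than concatenating one-row frames, which is an alternative to the approach above, not what the answer does.

```python
import pandas as pd

def parse_params(s):
    """Turn 'k1=v1;k2=v2' into {'k1': 'v1', 'k2': 'v2'}."""
    # Pieces without '=' (such as the empty piece before a leading ';') are skipped
    return dict(part.split('=', 1) for part in s.split(';') if '=' in part)

# Shortened, made-up inputs; the last one has no u1 parameter
rows = ['cat=be_thnky;u1=men', 'cat=be_thnky;u1=custom', ';cat=be_thnky']

# A list of dicts becomes one row per dict; keys missing from a dict become NaN
small = pd.DataFrame([parse_params(s) for s in rows])
print(small)
```

The third row has no `u1`, so that cell comes out as NaN, which is the behaviour the question asks for.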

1
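Nice dict-comprehension answer! I'd just add that this solution has another advantage over Rakesh's: fields containing more than one "=" are correctly left unsplit after the first "=" (thanks to using i.split('=', 1) rather than i.split('=')[1]). - Marco Spinaci
Thanks for your answer. I should have mentioned in my question that I have over 9000 rows of these strings to loop over, and they are already in a column of a pandas DataFrame. How can I use your function to loop over that column for every row of the DataFrame? - megatron77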
1
Never mind, that part was simple, and your solution seems to work. Thanks! - megatron77
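
For readers with the same follow-up question, one possible way to run the parser over an existing DataFrame column is sketched below. The frame `df`, its column names `id` and `url_path`, and the shortened strings are hypothetical stand-ins for the real data. `converter` is repeated so the snippet runs on its own.

```python
import pandas as pd

def converter(x):
    return dict(i.split('=', 1) for i in x.split(';') if '=' in i)

# Hypothetical frame standing in for the real one with 9000+ rows
df = pd.DataFrame({
    'id': [10, 11],
    'url_path': [';cat=be_thnky;u1=men;u13=8', ';cat=be_thnky;u1=custom'],
})

# Parse every row to a dict, then build the parameter columns in one step,
# reusing the original index so the rows line up for the join
parsed = pd.DataFrame(df['url_path'].map(converter).tolist(), index=df.index)
out = df.join(parsed)
print(out)
```

Building one frame from the list of dicts avoids creating thousands of single-row frames, and passing `index=df.index` keeps the parsed columns aligned with the source rows.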

0

You can do:

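import pandas as pd

def gen_col(u):
    """Yield one {parameter: value} dict per URL path string."""
    for i in u:
        d = {}
        # filter(None, ...) drops the empty piece produced by the leading ";"
        val = filter(None, i.split(";"))
        for j in val:
            # Caveat: a value that itself contains "=" (such as u15) is cut at its first "="
            v = j.split("=")
            d[v[0]] = v[1]
        yield d

your9000list = list(yourOtherDFWithURLS['URLCOL'].values)
df = pd.DataFrame(list(gen_col(your9000list)))
print(df)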
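
The loop above uses j.split("=") and keeps element 1, which is the point raised in the comment on the other answer. A small sketch of the difference follows, using a shortened version of the u15 field from the question; the variable names are illustrative.

```python
# Shortened u15 field from the question; the value itself contains '=' characters
field = 'u15=https://www.vans.be/webapp/wcs/stores/servlet/OrderOKView?langId=-27&catalogId=11260'

# maxsplit=1 splits on the first '=' only, so the whole URL survives
key, value = field.split('=', 1)

# Without maxsplit, element 1 stops at the next '=' and the rest of the URL is lost
truncated = field.split('=')[1]

print(key)        # u15
print(value)      # full URL, including ?langId=-27&catalogId=11260
print(truncated)  # URL cut off after "?langId"
```

If complete values are needed with the generator approach, changing the split to j.split("=", 1) would be one way to get them.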
