Python - Load zip codes into a DataFrame as strings?

Question



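I am using Pandas to load an Excel spreadsheet that contains zip codes (e.g. 32771). The zip codes are stored in the spreadsheet as 5-digit strings. When I pull them into a DataFrame with the command...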

import pandas as pd

xls = pd.ExcelFile("5-Digit-Zip-Codes.xlsx")
dfz = xls.parse('Zip Codes')

they are converted to numbers, so '00501' becomes 501.

So my question is, how do I:

a. Load the DataFrame while keeping the zip codes as the strings stored in the Excel file?

b. Convert the numbers in the DataFrame into five-digit strings, e.g. "501" becomes "00501"?

- Steve Maughan

5 Answers



You can use a custom converter to bypass Pandas' type inference. For example, if "zipcode" is the header of the column containing the zip codes:

dfz = xls.parse('Zip Codes', converters={'zipcode': lambda x:x})

This may be a bug, since the column is originally stored as strings; a related issue can be found here.

- chrisb

If you have the number 00501 with two leading zeros in Excel, it will become 501 in Pandas. - Sergey Bushmanov
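A minimal sketch of why the converter helps. Reading a real .xlsx file requires an engine such as openpyxl, so this sketch swaps in pandas.read_csv on a hypothetical in-memory CSV; read_csv accepts the same converters argument. The identity lambda hands each raw cell back unchanged, so the column is never inferred as integers.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the "Zip Codes" sheet.
CSV_TEXT = "zipcode\n00501\n32771\n"

# Default read: pandas infers an integer column and the leading zeros are lost.
inferred = pd.read_csv(io.StringIO(CSV_TEXT))
print(inferred["zipcode"].tolist())  # [501, 32771]

# Identity converter: each raw cell is passed through unchanged as a string,
# so no numeric inference is applied to this column.
kept = pd.read_csv(io.StringIO(CSV_TEXT), converters={"zipcode": lambda x: x})
print(kept["zipcode"].tolist())  # ['00501', '32771']
```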


str(my_zip).zfill(5)

or

print("{0:>05s}".format(str(my_zip)))

These are just two of several ways to do it.

- Joran Beasley
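Both one-liners work on a single value. The sketch below wraps the zfill version in a hypothetical helper named pad_zip, shows the format-spec variant alongside it, and illustrates one pitfall: a float such as 501.0 becomes the string "501.0", which is already five characters long, so zfill leaves it untouched.

```python
def pad_zip(value) -> str:
    """Hypothetical helper: left-pad a zip code to five characters with zeros."""
    return str(value).zfill(5)


my_zip = 501
print(pad_zip(my_zip))                 # 00501
print("{0:>05s}".format(str(my_zip)))  # 00501 (fill '0', right-aligned, width 5)
print(f"{my_zip:05d}")                 # 00501 (only valid for int input)
print(pad_zip("32771"))                # 32771 (already five digits, unchanged)
print(pad_zip(501.0))                  # 501.0 (float keeps ".0"; convert to int first)
```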


The pandas.read_excel documentation says that by specifying dtype as object you can keep the data exactly as it is stored in the Excel sheet: https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html

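dtype : Type name or dict of column -> type, default None. Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}. Use object to preserve data as stored in Excel and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.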

So something like the following should work:

import pandas as pd

# 'zip_code' and 'other_col' are placeholder column names
dfz = pd.read_excel("5-Digit-Zip-Codes.xlsx", dtype={'zip_code': object, 'other_col': str})

(Note: I'm not at my work computer right now, so I haven't tested this.)

- Josef Joe Samanek
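The same per-column dtype idea can be sketched without an Excel file. This assumes pandas.read_csv as a stand-in for read_excel (both accept a dtype dict) and uses a hypothetical in-memory CSV whose column names mirror the placeholders above. Only zip_code is forced to text; other_col is still inferred as integers.

```python
import io

import pandas as pd

# Hypothetical CSV text; column names mirror the placeholders in the answer.
CSV_TEXT = "zip_code,other_col\n00501,1\n32771,2\n"

# Dict form of dtype: only zip_code is read as str, other_col is inferred.
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"zip_code": str})
print(df["zip_code"].tolist())   # ['00501', '32771']
print(df["other_col"].tolist())  # [1, 2]
```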


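The previous answers correctly suggest using zfill(5). However, if your zip codes are already of float dtype for some reason (I recently came across data like this), you first need to convert them to int. Then you can use zfill(5).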

import pandas as pd

df = pd.DataFrame({'zipcode':[11.0, 11013.0]})

    zipcode
0   11.0
1   11013.0

df['zipcode'] = df['zipcode'].astype(int).astype(str).str.zfill(5)

    zipcode
0   00011
1   11013

- Sunit Gautam
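One further caveat, as a hedged sketch: astype(int) raises if the float column contains NaN. Assuming pandas 1.0 or later, the nullable Int64 and string dtypes can carry a missing value through the conversion, so the gap stays <NA> instead of raising. The data here is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical float zip codes with one missing value.
df = pd.DataFrame({"zipcode": [11.0, 11013.0, np.nan]})

# astype(int) would raise on NaN, so convert via the nullable Int64 dtype,
# then to the string dtype; the missing entry stays <NA> through zfill.
df["zipcode"] = df["zipcode"].astype("Int64").astype("string").str.zfill(5)
print(df)
```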


- unutbu · Accepted Answer

As a workaround, you can use Series.str.zfill to turn the ints into zero-padded strings of length 5:

df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)

Demo:

import pandas as pd
df = pd.DataFrame({'zipcode':['00501']})
df.to_excel('/tmp/out.xlsx')
xl = pd.ExcelFile('/tmp/out.xlsx')
df = xl.parse('Sheet1')
df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)
print(df)

yields

  zipcode
0   00501