Function to check whether the values in an object-dtype column are floats or strings

Question

python python-3.x python-2.7 user-defined-types isinstance

3

I am trying to write a function that is equivalent to Excel's isnumber[column] function.

Dataset:

feature1 feature2 feature3
  123       1.07     1
  231       2.08     3
  122        ab      4
  111       3.04     6
  555        cde     8

feature1: integer dtype
feature2: object dtype
feature3: integer dtype

I tried this code:

for item in df.feature2.iteritems():
    if isinstance(item, float):
       print('yes')
    else:
       print('no')

The result I got was:

 no
 no
 no
 no
 no

But I want the result to be:

yes
yes
no
yes
no

When I check the type of the individual feature2 values, this is what I see:

type(df.feature2[0]) = str
type(df.feature2[1]) = str
type(df.feature2[2]) = str
type(df.feature2[3]) = str
type(df.feature2[4]) = str

Clearly the values at indices 0, 1 and 3 should be floats, yet they show up as str.

What am I doing wrong?

- Sai Sumanth

5 Answers

1

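I think there are two issues to consider here:

dict methods versus DataFrame methods
The difference between dtype (array scalar types) and type (built-in Python types); see https://numpy.org/devdocs/reference/arrays.dtypes.html

First point:

.iteritems() / .items() are best known as dictionary methods (a pandas Series offers them too, yielding (index, value) pairs). Since you are dealing with dtypes, judging by the data you provided, you are most likely working with a DataFrame, where you do not need .iteritems() to loop over each value. In addition, dict.iteritems() was removed in Python 3 in favor of .items() (see the discussion: When should iteritems() be used instead of items()?).

Second point:

When you use numpy or pandas, the data types of the values loaded into a data frame are called dtypes. They need to be distinguished from what plain Python compares against directly, which Python simply calls a type. You can map dtypes to types with the table under the "Pandas Data Types" heading (reference: https://pbpython.com/pandas_dtypes.html).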
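To make the dtype/type distinction concrete, here is a minimal sketch. The three-row frame is a made-up stand-in for the question's data, not the asker's actual DataFrame.

```python
import pandas as pd

# Hypothetical frame mirroring the question's data; feature2 mixes floats and strings.
df = pd.DataFrame({
    "feature1": [123, 231, 122],
    "feature2": [1.07, 2.08, "ab"],
})

# dtype describes the column as a whole.
feature2_dtype = df["feature2"].dtype  # object, because the values are mixed

# type describes each individual Python object stored in the column.
element_types = [type(v).__name__ for v in df["feature2"]]

print(feature2_dtype, element_types)
```

The column reports a single object dtype, while the elements inside it keep their own Python types, which is what isinstance looks at.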

Now, for your question, the following code should solve your problem:

import pandas as pd

columns = ['feature1', 'feature2', 'feature3']
data = [[123, 1.07, 1],
        [231, 2.08, 3],
        [122, 'ab', 4],
        [111, 3.04, 6],
        [555, 'cde', 8]]

df = pd.DataFrame(data, columns=columns)

for value in df.feature2:
    if isinstance(value,float):
        print('yes')
    else:
        print('no')
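The isinstance check above only helps when the cells hold real Python floats. In the question, type() reports str for every value, so a closer match to Excel's ISNUMBER may be to try parsing each cell. The sketch below assumes feature2 contains strings such as '1.07' and uses pd.to_numeric with errors="coerce"; the variable names parsed and is_number are illustrative.

```python
import pandas as pd

# Assumption: feature2 holds strings, as the type() checks in the question show.
df = pd.DataFrame({"feature2": ["1.07", "2.08", "ab", "3.04", "cde"]})

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
parsed = pd.to_numeric(df["feature2"], errors="coerce")

# notna() is then True exactly where parsing succeeded.
is_number = parsed.notna()

for flag in is_number:
    print("yes" if flag else "no")
```

One caveat of this approach: cells that are already missing, or that contain the text "nan", also come out as "no".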

- MTay

0

Try this:

for i in range(len(df["feature2"])):
    test = df.loc[i,"feature2"]
    if isinstance(test, float):
        print('yes')
    else:
        print('no')

- Ellie Hanna

Note that this only tests for floats. If you want any number, whether float or int, you must change the third line to if isinstance(test, float) or isinstance(test, int): - Ellie Hanna
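The two isinstance calls in that comment can also be written as one call with a tuple of types. A small sketch follows; the helper name is_int_or_float is made up for illustration, and excluding bool is an extra choice that is not part of the original comment.

```python
def is_int_or_float(value):
    # isinstance accepts a tuple of types and returns True if any of them matches.
    # bool is a subclass of int, so it is excluded explicitly here.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


samples = [1, 1.07, "ab", True]
results = [is_int_or_float(v) for v in samples]
print(results)
```

Keep in mind that values read from an integer-dtype column come back as NumPy integers such as numpy.int64, which are not instances of the built-in int; checking against numbers.Number is one way to cover those too.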

0

You can do it like this:

from pandas import DataFrame as df

columns = ['feature1', 'feature2', 'feature3']
data = [[123, 1.07, 1],
 [231, 2.08, 3],
 [122, 'ab', 4],
 [111, 3.04, 6],
 [555, 'cde', 8]]

df_ = df(data, columns=columns)
types = []
for k in df_:
    a = set(type(m) for m in df_[k])
    if len(a) > 1:
        types.append({k: 'object'})
    else:
        types.append({k: str(list(a)[0].__name__)})

print(types)

Output:

[{'feature1': 'int'}, {'feature2': 'object'}, {'feature3': 'int'}]

- Chiheb Nexus

0

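This is because iteritems() returns a tuple, i.e. (index, value). So you are checking whether, for example, (0, 1.07) or (1, 2.08) is a float, and of course they are not.

If you change df.feature2.iteritems() to df.feature2.values, it should work :)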
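A short sketch of what the loop actually receives. It uses Series.items(), which yields the same (index, value) pairs that iteritems() did (iteritems() was removed in pandas 2.0), and a stand-in Series with real floats rather than the asker's data.

```python
import pandas as pd

feature2 = pd.Series([1.07, 2.08, "ab", 3.04, "cde"], name="feature2")

# Each item is an (index, value) tuple, so isinstance(item, float) is always False.
pairs = list(feature2.items())
first_pair = pairs[0]

# Unpacking the pair gives access to the value itself.
flags = ["yes" if isinstance(value, float) else "no" for _, value in feature2.items()]
print(flags)
```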

- hmajid2301

- Ankur Gulati · Accepted Answer

The iteritems method returns a tuple, e.g. ((123, '1.07'), 1.07). If you want to loop over each value, try the code below. You only need to remove .iteritems() and it works perfectly.

df['feature2'] = [1.07, 2.08, 'ab', 3.04, 'cde']
for item in df.feature2:
    if isinstance(item, float):
        print('yes')
    else:
        print('no')

Here is your output:

yes
yes
no
yes
no