How to sort DataFrame results by column values

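from io import StringIO

import requests
import pandas as pd

url = "https://coinmarketcap.com/new/"
# Send a browser-like User-Agent with the request
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=1)
pagedata = page.text
usecols = ["Name", "Price", "1h", "24h", "MarketCap", "Volume"]  # , "Blockchain"]

# Parse the first table on the page; StringIO is used because recent pandas
# versions deprecate passing a literal HTML string to read_html
df = pd.read_html(StringIO(pagedata))[0]
# Split the combined name cell at the embedded digits into name and symbol
df[["Name", "Symbol"]] = df["Name"].str.split(r"\d+", expand=True)
df = df.rename(columns={"Fully Diluted Market Cap": "MarketCap"})[usecols]
dfAsString = df.to_string(index=False)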

print(dfAsString)

Current output of the code (truncated):

               Name            Price      1h      24h       MarketCap       Volume
0        DollarPepe         $0.02752  22.64%  336.25%              $3     $456,913
1       Billy Token      $0.00002822  41.69%   75.80%      $1,958,942   $6,999,241
2              JEFF          $0.1946   4.42%  226.18%     $19,458,328  $19,744,583
3            PUG AI   $0.00000001459  10.80%   15.84%      $1,459,428     $239,454
4         FART COIN    $0.0000004281   1.13%   42.13%     $42,806,075      $46,604
[30 rows x 6 columns] 

How can I produce output sorted by a specific column (24h)? Desired output (truncated):
               Name            Price      1h      24h       MarketCap       Volume
0        DollarPepe         $0.02752  22.64%  336.25%              $3     $456,913
2              JEFF          $0.1946   4.42%  226.18%     $19,458,328  $19,744,583
1       Billy Token      $0.00002822  41.69%   75.80%      $1,958,942   $6,999,241
4         FART COIN    $0.0000004281   1.13%   42.13%     $42,806,075      $46,604
3            PUG AI   $0.00000001459  10.80%   15.84%      $1,459,428     $239,454
[30 rows x 6 columns]

See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html - Michael Butscher
1 Answer

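I would convert all the numeric columns in your dataframe to numeric values; then you can easily sort on them (you can always add the $ and % back when displaying).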
numcols = df.columns[df.columns != 'Name']
df[numcols] = df[numcols].apply(lambda c:pd.to_numeric(c.str.replace(r'[^\d.]|(?<!\d)\.|\.(?!\d)', '', regex=True)))
df = df.sort_values('24h', ascending=False)

Output (for your sample data):

          Name         Price     1h     24h  MarketCap    Volume
0   DollarPepe  2.752000e-02  22.64  336.25          3    456913
2         JEFF  1.946000e-01   4.42  226.18   19458328  19744583
1  Billy Token  2.822000e-05  41.69   75.80    1958942   6999241
4    FART COIN  4.281000e-07   1.13   42.13   42806075     46604
3       PUG AI  1.459000e-08  10.80   15.84    1459428    239454

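Note that the non-numeric character replacement is more complicated than the [^\d.] your sample data would suggest; this is because some of the other price values fetched from that page contain ... (presumably because they are too small to represent). Ideally you would work out how to fetch those as exact values; otherwise they can only be approximated.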
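To see what the replacement pattern does with such values, here is a minimal sketch using Python's re module directly. The pattern is the one from the code above; the helper name to_number and the sample strings are hypothetical, and the last string only imitates an abbreviated price rather than reproducing one from the page.

```python
import re

# Pattern copied from the answer: drop anything that is not a digit or a dot,
# plus any dot that is not flanked by digits on both sides.
PATTERN = re.compile(r'[^\d.]|(?<!\d)\.|\.(?!\d)')


def to_number(text: str) -> float:
    """Strip currency signs, commas, percent signs and stray dots, then parse."""
    return float(PATTERN.sub('', text))


# Hypothetical inputs; the last one imitates a price abbreviated with "...".
samples = ["$0.02752", "336.25%", "$19,458,328", "$0.0...1459"]
for s in samples:
    print(s, "->", to_number(s))
```

The abbreviated string comes out as 0.01459: the three dots are removed and the remaining digits are joined, so the result is not the real price. That is the approximation the note above warns about.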

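Alternatively, you can keep the columns as strings and sort by value by passing sort_values a key function that converts the series to float: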

df = df.sort_values('24h', ascending=False, key=lambda v:v.str.replace('%', '').astype(float))

Output:

          Name           Price      1h      24h    MarketCap       Volume
0   DollarPepe        $0.02752  22.64%  336.25%           $3     $456,913
2         JEFF         $0.1946   4.42%  226.18%  $19,458,328  $19,744,583
1  Billy Token     $0.00002822  41.69%   75.80%   $1,958,942   $6,999,241
4    FART COIN   $0.0000004281   1.13%   42.13%  $42,806,075      $46,604
3       PUG AI  $0.00000001459  10.80%   15.84%   $1,459,428     $239,454
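For anyone who wants to try the key-based sort without fetching the page, here is a small self-contained sketch. The frame is rebuilt by hand from the sample rows in the question, and the helper name percent_to_float is an assumption made for illustration. It also shows why a plain sort on the string column does not give the desired order.

```python
import pandas as pd

# Sample rows copied from the question; only the columns needed for the demo.
df = pd.DataFrame({
    "Name": ["DollarPepe", "Billy Token", "JEFF", "PUG AI", "FART COIN"],
    "24h": ["336.25%", "75.80%", "226.18%", "15.84%", "42.13%"],
})


def percent_to_float(col: pd.Series) -> pd.Series:
    """Turn strings such as '336.25%' into floats for comparison only."""
    return col.str.rstrip("%").astype(float)


# Plain string sort compares character by character, so '75.80%' beats '336.25%'.
naive = df.sort_values("24h", ascending=False)

# The key is applied only while sorting; the stored values keep their % sign.
ranked = df.sort_values("24h", ascending=False, key=percent_to_float)

print(naive.to_string(index=False))
print(ranked.to_string(index=False))
```

The key function receives the whole column as a Series and must return a Series of the same length, which is why it uses vectorized string methods rather than a per-value conversion.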

Where in the code should I insert it? - Drew Duazeh
@DrewDuazeh After df = df.rename(columns={"Fully Diluted Market Cap": "MarketCap"})[usecols]. - Nick
I get an error: AttributeError: 'DataFrame' object has no attribute 'sortvalues'. - Drew Duazeh
The MarketCap result comes out as 1.17092e+07. Is there a way to display the full value, such as $11,709,185? - Drew Duazeh
@DrewDuazeh You can use the approach from the answers to this question: https://dev59.com/KlsW5IYBdhLWcg3wZmmc - Nick
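One possible way to get that display, sketched under assumptions: this is not necessarily the method from the linked question, and the two-row frame below is made up for illustration (only the 11,709,185 figure comes from the comment). The idea is to keep the numeric column for sorting and format a copy for printing.

```python
import pandas as pd

# Hypothetical numeric frame, as it would look after the conversion to numbers.
df = pd.DataFrame({
    "Name": ["Coin A", "Coin B"],
    "MarketCap": [11709185.0, 1459428.0],
})

# Sort on the real numbers first, then format a copy purely for display.
display = df.sort_values("MarketCap", ascending=False).copy()
display["MarketCap"] = display["MarketCap"].map("${:,.0f}".format)

print(display.to_string(index=False))
```

Because the formatting is applied to a copy, df["MarketCap"] stays numeric and can still be sorted or aggregated afterwards.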