R bigrquery: Exceeded rate limits.


I am trying to download a BigQuery dataset from Google Cloud Platform into my R workspace for analysis, using the following code:

library(bigrquery)
library(DBI)
library(tidyverse)
library(dplyr)


# Connect to the public Citi Bike dataset, billing queries to my own project
con = dbConnect(
  bigquery(),
  project = "bigquery-public-data",
  dataset = "new_york_citibike",
  billing = "maanan-bigquery-in-r"
)

# Authenticate with Google
bigrquery::bq_auth()

# Lazy reference to the table; nothing is downloaded yet
my_db_pointer = tbl(con, "citibike_trips")

glimpse(my_db_pointer)

count(my_db_pointer)

# Download the entire table into R
selected = select(my_db_pointer, everything()) %>% collect()

However, when I try to run the last line to download the data, it returns the following error:
Complete
Billed: 0 B
Downloading first chunk of data.
Received 55,308 rows in the first chunk.
Downloading the remaining 58,882,407 rows in 1420 chunks of (up to) 41,481 rows.
Downloading data [=====>--------------------------------------------------------------------------------------------------]   6% ETA: 19m
Error in `signal_reason()`:
! Exceeded rate limits: Your project:453562790213 exceeded quota for tabledata.list bytes per second per project. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors [rateLimitExceeded] 
ℹ Try increasing the `page_size` value of `bq_table_download()`
Run `rlang::last_error()` to see where the error occurred.
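
The hint in the error refers to the `page_size` argument of bigrquery's `bq_table_download()`. As a rough sketch of that suggestion (the table id string and page size are placeholder values, not tested), the direct call would look something like:

bq_table_download(
  "bigquery-public-data.new_york_citibike.citibike_trips",
  page_size = 100000  # rows per API page (placeholder); larger pages mean fewer tabledata.list calls
)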

I would be very grateful if someone could help me fix this error and download the data; I need to analyze this dataset. Thank you in advance.

1 Answer

According to the documentation linked for "rateLimitExceeded", it looks like you have exceeded one of the thresholds for query jobs.
Please consider the following:
  • Check whether your project's BigQuery API has limits and quotas set up that you might be exceeding when performing this operation. To see your current quotas and limits, go to IAM & Admin > Quotas > Quotas for project "projectid" > bigquery.googleapis.com.

  • Since your chunks are about 55,308 rows each, out of 58,882,407 rows in total, it appears you are trying to download far more data than the service allows, and you may be hitting one of the following limits: the query/script execution-time limit, the maximum response size, or the maximum row size.

  • Verify that you have not hit any of the table constraints, especially the limit on table operations per day.

  • Check how many columns your rows have; there is a limit of 10,000 columns per table.

  • Also review the rest of the quota limits specified for query jobs.

  • Reduce the scope of your select, or reduce the size of your chunks. Do you really need every column of a table with millions of records? You can do something like this (see also the additional sketch after this list):

    library(bigrquery)
    
    # Authenticate first.
    # Use an explicit credentials path if the notebook runs outside GCP:
    # bigrquery::bq_auth(path = '/Users/me/restofthepath/bigquery-credentials.json')
    
    bq_table_download("my-project-id.dataset-id.table", page_size = 100)
    

    For additional details about this function, see the documentation for bq_table_download().
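
As an additional sketch of "reducing the scope of your select" (the column names, filters, and row caps below are only placeholders, not verified against citibike_trips), you can push the work down to BigQuery with dplyr so that only the rows and columns you actually need are collected, or download just part of the table via the n_max argument:

    library(bigrquery)
    library(dplyr)
    
    # Select a few columns, filter, and limit inside BigQuery,
    # then collect() only the (much smaller) result into R.
    # Column names are placeholders for illustration.
    subset_trips <- my_db_pointer %>%
      select(starttime, tripduration, start_station_name) %>%
      filter(tripduration < 3600) %>%
      head(1e6) %>%
      collect()
    
    # Or download only the first n_max rows of the table directly,
    # tuning the chunk size with page_size (values are placeholders):
    small_tbl <- bq_table_download(
      "bigquery-public-data.new_york_citibike.citibike_trips",
      n_max = 1e6,
      page_size = 20000
    )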


Hello @Saïd Maanan, were you able to solve your issue? - Betjens
Hi @Betjens, in the end I downloaded part of the data as a zip file. Beyond that, my laptop's specs are too limited for a dataset this large. But thank you very much for your help, I really appreciate it. - Saïd Maanan
Hi, I am running into this error as well. After searching for a while and finding the notice that quotas can be increased, I noticed that I am not even close to the limit: even during peak usage we only use about 40% of the quota. If the limit is not being reached, why does this error occur? - Lawrence_NT
