R bigrquery: Exceeded rate limits.


I am trying to download a BigQuery dataset from Google Cloud Platform into my R workspace for analysis, using the following code:

library(bigrquery)
library(DBI)
library(tidyverse)
library(dplyr)


# Connect to the public Citi Bike dataset, billing queries to my own project
con = dbConnect(
  bigquery(),
  project = "bigquery-public-data",
  dataset = "new_york_citibike",
  billing = "maanan-bigquery-in-r"
)

# Authenticate with Google
bigrquery::bq_auth()

# Lazy reference to the table; nothing is downloaded yet
my_db_pointer = tbl(con, "citibike_trips")

glimpse(my_db_pointer)

count(my_db_pointer)

# Download the entire table into R
selected = select(my_db_pointer, everything()) %>% collect()

However, when I try to run the last line to download the data, it returns the following error:
Complete
Billed: 0 B
Downloading first chunk of data.
Received 55,308 rows in the first chunk.
Downloading the remaining 58,882,407 rows in 1420 chunks of (up to) 41,481 rows.
Downloading data [=====>--------------------------------------------------------------------------------------------------]   6% ETA: 19m
Error in `signal_reason()`:
! Exceeded rate limits: Your project:453562790213 exceeded quota for tabledata.list bytes per second per project. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors [rateLimitExceeded] 
ℹ Try increasing the `page_size` value of `bq_table_download()`
Run `rlang::last_error()` to see where the error occurred.
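
The hint in the error refers to the `page_size` argument of bigrquery's `bq_table_download()`. As a rough sketch of that suggestion (the table id string and page size are placeholder values, not tested), the direct call would look something like:

bq_table_download(
  "bigquery-public-data.new_york_citibike.citibike_trips",
  page_size = 100000  # rows per API page (placeholder); larger pages mean fewer tabledata.list calls
)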

I would be very grateful if someone could help me fix this error and download the data; I need to analyze this dataset. Thank you in advance.

1 Answer

According to the documentation linked for "rateLimitExceeded", it looks like you have exceeded one of the thresholds for query jobs.
Please consider the following:
  • Check whether your project's BigQuery API has limits and quotas set up that you might be exceeding when performing this operation. To see your current quotas and limits, go to IAM & Admin > Quotas > Quotas for project "projectid" > bigquery.googleapis.com.

  • Since your chunks are about 55,308 rows each, out of 58,882,407 rows in total, it appears you are trying to download far more data than the service allows, and you may be hitting one of the following limits: the query/script execution-time limit, the maximum response size, or the maximum row size.

  • Verify that you have not hit any of the table constraints, especially the limit on table operations per day.

  • Check how many columns your rows have; there is a limit of 10,000 columns per table.

  • Also review the rest of the quota limits specified for query jobs.

  • Reduce the scope of your select, or reduce the size of your chunks. Do you really need every column of a table with millions of records? You can do something like this (see also the additional sketch after this list):

    library(bigrquery)
    
    # Authenticate first.
    # Use an explicit credentials path if the notebook runs outside GCP:
    # bigrquery::bq_auth(path = '/Users/me/restofthepath/bigquery-credentials.json')
    
    bq_table_download("my-project-id.dataset-id.table", page_size = 100)
    

    For additional details about this function, see the documentation for bq_table_download().
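
As an additional sketch of "reducing the scope of your select" (the column names, filters, and row caps below are only placeholders, not verified against citibike_trips), you can push the work down to BigQuery with dplyr so that only the rows and columns you actually need are collected, or download just part of the table via the n_max argument:

    library(bigrquery)
    library(dplyr)
    
    # Select a few columns, filter, and limit inside BigQuery,
    # then collect() only the (much smaller) result into R.
    # Column names are placeholders for illustration.
    subset_trips <- my_db_pointer %>%
      select(starttime, tripduration, start_station_name) %>%
      filter(tripduration < 3600) %>%
      head(1e6) %>%
      collect()
    
    # Or download only the first n_max rows of the table directly,
    # tuning the chunk size with page_size (values are placeholders):
    small_tbl <- bq_table_download(
      "bigquery-public-data.new_york_citibike.citibike_trips",
      n_max = 1e6,
      page_size = 20000
    )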


Hello @Saïd Maanan, were you able to solve your issue? - Betjens
Hi @Betjens, in the end I downloaded part of the data as a zip file. Beyond that, my laptop's specs are too limited for a dataset this large. But thank you very much for your help, I really appreciate it. - Saïd Maanan
Hi, I am running into this error as well. After searching for a while and finding the notice that quotas can be increased, I noticed that I am not even close to the limit: even during peak usage we only use about 40% of the quota. If the limit is not being reached, why does this error occur? - Lawrence_NT
