Downloading bulk reports from the Google Analytics API v4


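I'm trying to pull a 3-month report. To do that I need to make multiple requests and append the results to a list, because the API returns only 100,000 rows per request. The API returns a variable called nextPageToken, which I need to pass into the next query to get the next 100,000 rows of the report. I'm having trouble getting this to work.

Here is my code: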

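import pandas as pd
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials


def initialize_analyticsreporting():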
    '''Initializes an Analytics Reporting API V4 service object.

    Returns:
      An authorized Analytics Reporting API V4 service object.
    '''
    credentials = ServiceAccountCredentials.from_json_keyfile_name(
        KEY_FILE_LOCATION, SCOPES)

    # Build the service object.
    analytics = build('analyticsreporting', 'v4', credentials=credentials)

    return analytics


list = [] 




def get_report(analytics, pageTokenVariable):
    return analytics.reports().batchGet(
        body={
            'reportRequests': [
                {
                    'viewId': VIEW_ID,
                    'pageSize': 100000,
                    'dateRanges': [{'startDate': '90daysAgo', 'endDate': 'yesterday'}],
                    'metrics': [{'expression': 'ga:adClicks'}, {'expression': 'ga:impressions'}, {'expression': 'ga:adCost'}, {'expression': 'ga:CTR'}, {'expression': 'ga:CPC'}, {'expression': 'ga:costPerTransaction'}, {'expression': 'ga:transactions'}, {'expression': 'ga:transactionsPerSession'}, {'expression': 'ga:pageviews'}, {'expression': 'ga:timeOnPage'}],
                    "pageToken": pageTokenVariable,
                    'dimensions': [{'name': 'ga:adMatchedQuery'}, {'name': 'ga:campaign'}, {'name': 'ga:adGroup'}, {'name': 'ga:adwordsCustomerID'}, {'name': 'ga:date'}],
                    'orderBys': [{'fieldName': 'ga:impressions', 'sortOrder': 'DESCENDING'}],
                    'dimensionFilterClauses': [{

                        'filters': [{

                            'dimension_name': 'ga:adwordsCustomerID',
                            'operator': 'EXACT',
                            'expressions': 'abc',
                            'not': 'True'
                        }]
                    }],
                    'dimensionFilterClauses': [{

                        'filters': [{

                            'dimension_name': 'ga:adMatchedQuery',
                            'operator': 'EXACT',
                            'expressions': '(not set)',
                            'not': 'True'
                        }]
                    }]
                }]
        }
    ).execute()


analytics = initialize_analyticsreporting()
response = get_report(analytics, "0")

for report in response.get('reports', []):
    pagetoken = report.get('nextPageToken', None)
    print(pagetoken)
    #------printing the pagetoken here returns `100,000` which is expected

    columnHeader = report.get('columnHeader', {})
    dimensionHeaders = columnHeader.get('dimensions', [])
    metricHeaders = columnHeader.get(
        'metricHeader', {}).get('metricHeaderEntries', [])
    rows = report.get('data', {}).get('rows', [])

    for row in rows:
        # create dict for each row
        dict = {}
        dimensions = row.get('dimensions', [])
        dateRangeValues = row.get('metrics', [])

        # fill dict with dimension header (key) and dimension value (value)
        for header, dimension in zip(dimensionHeaders, dimensions):
            dict[header] = dimension

        # fill dict with metric header (key) and metric value (value)
        for i, values in enumerate(dateRangeValues):
            for metric, value in zip(metricHeaders, values.get('values')):
                # set int as int, float a float
                if ',' in value or ',' in value:
                    dict[metric.get('name')] = float(value)
                else:
                    dict[metric.get('name')] = float(value)
        list.append(dict)
      # Append that data to a list as a dictionary

# pagination function

    while pagetoken:  # This says while there is info in the nextPageToken get the data, process it and add to the list

        response = get_report(analytics, pagetoken)
        pagetoken = response['reports'][0]['nextPageToken']
        print(pagetoken)
        #------printing the pagetoken here returns `200,000` as is expected but the data being pulled is the same as for the first batch and so on. While in the loop the pagetoken is being incremented but it does not retrieve new data
        for row in rows:
                # create dict for each row
            dict = {}
            dimensions = row.get('dimensions', [])
            dateRangeValues = row.get('metrics', [])

            # fill dict with dimension header (key) and dimension value (value)
            for header, dimension in zip(dimensionHeaders, dimensions):
                dict[header] = dimension

            # fill dict with metric header (key) and metric value (value)
            for i, values in enumerate(dateRangeValues):
                for metric, value in zip(metricHeaders, values.get('values')):
                    # set int as int, float a float
                    if ',' in value or ',' in value:
                        dict[metric.get('name')] = float(value)
                    else:
                        dict[metric.get('name')] = float(value)
            list.append(dict)

df = pd.DataFrame(list)
print(df)  # Append that data to a list as a dictionary
df.to_csv('full_dataset.csv', encoding="utf-8", index=False)

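What am I doing wrong when passing the pagetoken?

Here is Google's documentation on pageToken.


Take a look at this article on pagination. You need to read nextPageToken from the report and then include it in the subsequent request as pageToken. No matter how many rows you ask for, the API returns at most 100,000 rows per request. - Matt
1 Answer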


So you are updating pagetoken in pagetoken = response['reports'][0]['nextPageToken'], but inside the while loop you should also update rows with the new data, like this:

    while pagetoken:
        response = get_report(analytics, pagetoken)
        pagetoken = response['reports'][0].get('nextPageToken')
        for report in response.get('reports', []):
            rows = report.get('data', {}).get('rows', [])
            for row in rows:
                ...  # process each row exactly as in the first loop
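
For reference, the loop above can be sketched end to end as a small helper. This is only an illustration, not the asker's or Google's code: fetch_all_rows, fake_get_report and the three-page fake data are hypothetical names made up so the loop can run without credentials, and it assumes a single entry in reportRequests so that there is only one nextPageToken to follow.

```python
def fetch_all_rows(get_report, first_token="0"):
    """Collect rows from every page by following nextPageToken.

    get_report is any callable that takes a page token and returns a
    batchGet-style response dict (a hypothetical stand-in for the
    question's get_report(analytics, token)).
    """
    all_rows = []
    token = first_token
    while token is not None:
        response = get_report(token)
        token = None
        for report in response.get("reports", []):
            # Re-read the rows from the response that was just fetched.
            all_rows.extend(report.get("data", {}).get("rows", []))
            # .get returns None on the last page, which ends the loop.
            token = report.get("nextPageToken")
    return all_rows


# Fake three-page API used only to exercise the loop (made-up data).
_FAKE_PAGES = {
    "0": {"reports": [{"data": {"rows": [{"dimensions": ["a"]}]}, "nextPageToken": "1"}]},
    "1": {"reports": [{"data": {"rows": [{"dimensions": ["b"]}]}, "nextPageToken": "2"}]},
    "2": {"reports": [{"data": {"rows": [{"dimensions": ["c"]}]}}]},
}


def fake_get_report(token):
    return _FAKE_PAGES[token]


rows = fetch_all_rows(fake_get_report)
print([r["dimensions"][0] for r in rows])  # ['a', 'b', 'c']
```

With the real service object, the same helper could presumably be called as fetch_all_rows(lambda token: get_report(analytics, token)).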

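I don't know, which is why I'm looking for an answer; the documentation doesn't explain it well. It only says to update "pageToken". - Jonas Palačionis
Added a code example. I can't test it, but I think "rows" needs to be updated, otherwise you will process the same data over and over. - Rickard Körkkö
Now I get KeyError: 'nextPageToken' at the line response = get_report(analytics, pagetoken). The data seems to be fetched correctly, it just doesn't stop when there is no nextPageToken. But why? I'm already using the while pagetoken condition. - Jonas Palačionis
Thanks, that works! So if I understand correctly: the problem was that even though I had the correct "nextPageToken", I kept processing the same report? And the difference with using "get" for the "pagetoken" is that it doesn't raise an error when the key isn't found? - Jonas Palačionis
Correct. get returns None instead of raising an exception. - Rickard Körkkö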
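
The difference between the two lookups can be seen in isolation with a plain dict. The last_page value below is made-up data standing in for a final report that carries no nextPageToken key.

```python
last_page = {"data": {"rows": []}}  # made-up final page: no nextPageToken key

# .get returns None (or a supplied default) when the key is missing.
token = last_page.get("nextPageToken")
print(token)  # None

# Subscripting the same missing key raises KeyError instead.
try:
    last_page["nextPageToken"]
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```

Because None is falsy, a while pagetoken: loop ends cleanly once .get finds no token on the last page.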
