根据属性在Google BigQuery SQL中删除重复行

5

我有一张名为result的表格 我正在使用BigQuery从GA中选择数据

SELECT
  Date,
  totals.pageviews,
  h.transaction.transactionId,
  h.item.itemQuantity,
  h.transaction.transactionRevenue,
  totals.bounces,
  fullvisitorid,
  totals.timeOnSite,
  device.browser,
  device.deviceCategory,
  trafficSource.source,
  channelGrouping,
  h.page.pagePath,
  h.eventInfo.eventCategory,
  device.operatingSystem
FROM
  `atomic-life-148403.126959513.ga_sessions_*`,
  UNNEST(hits) AS h
WHERE
  _TABLE_SUFFIX BETWEEN REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL -1 YEAR) AS STRING), '-','')
  AND CONCAT('intraday_', REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY) AS STRING), '-',''))
  ORDER BY
  date  DESC

enter image description here

有一些记录是重复的。如何从表中删除重复的记录?

我想要获得以下结果。 在此输入图片描述


1
你是想找到并删除行,还是只是想在查询结果中隐藏它们?如果是后者,请使用DISTINCT。如果是前者,那就会变得更加复杂。 - ADyson
如何仅选择不同的行?因为itemquentity和revenue是彼此分开的。 - bob90937
5个回答

4

您可以使用 ROW_NUMBER 函数

WITH CTE AS 
(SELECT *, ROW_NUMBER() OVER (PARTITION BY transactionid ORDER BY 
transactionid) ROW FROM [YourTable]) 

DELETE [YourTable] 
FROM [YourTable]
JOIN CTE ON [YourTable].transactionid ON CTE.transactionid
                              WHERE CTE.ROW > 1

3

您可以使用ROW_NUMBER()分析函数,例如:

select * from (
select *,
ROW_NUMBER() OVER(PARTITION BY transactionid ORDER BY transactionid) rownum
from result ) xxx
where rownum = 1;

2

以下内容适用于BigQuery标准SQL

#standardSQL
SELECT DISTINCT
  Date,
  totals.pageviews,
  h.transaction.transactionId,
  h.item.itemQuantity,
  h.transaction.transactionRevenue,
  totals.bounces,
  fullvisitorid,
  totals.timeOnSite,
  device.browser,
  device.deviceCategory,
  trafficSource.source,
  channelGrouping,
  h.page.pagePath,
  h.eventInfo.eventCategory,
  device.operatingSystem
FROM
  `atomic-life-148403.126959513.ga_sessions_*`,
  UNNEST(hits) AS h
WHERE
  _TABLE_SUFFIX BETWEEN REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL -1 YEAR) AS STRING), '-','')
  AND CONCAT('intraday_', REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY) AS STRING), '-',''))
  ORDER BY
  date  DESC  

您可以看到 - 我只是在您的SELECT中添加了DISTINCT - 有关BigQuery标准SQL的更多信息,请参见SELECT及其修饰符


1
使用带有所有选定列的GROUP BY应该可以消除结果中的真正重复行:
SELECT
  Date,
  totals.pageviews,
  h.transaction.transactionId,
  h.item.itemQuantity,
  h.transaction.transactionRevenue,
  totals.bounces,
  fullvisitorid,
  totals.timeOnSite,
  device.browser,
  device.deviceCategory,
  trafficSource.source,
  channelGrouping,
  h.page.pagePath,
  h.eventInfo.eventCategory,
  device.operatingSystem
FROM
  `atomic-life-148403.126959513.ga_sessions_*`,
  UNNEST(hits) AS h
WHERE
  _TABLE_SUFFIX BETWEEN REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL -1 
YEAR) AS STRING), '-','')
  AND CONCAT('intraday_', REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY) AS STRING), '-',''))
GROUP BY
  Date,
  totals.pageviews,
  h.transaction.transactionId,
  h.item.itemQuantity,
  h.transaction.transactionRevenue,
  totals.bounces,
  fullvisitorid,
  totals.timeOnSite,
  device.browser,
  device.deviceCategory,
  trafficSource.source,
  channelGrouping,
  h.page.pagePath,
  h.eventInfo.eventCategory,
  device.operatingSystem
ORDER BY
  date  DESC;

0

您可以选择唯一的行并删除其他行:

DELETE FROM MyTable
LEFT OUTER JOIN (
   SELECT DISTINCT * FROM MyTable
) as UniqueRows ON
   MyTable.KeyField= UniqueRows.KeyField
WHERE
   UniqueRows.KeyField IS NULL;

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接