Reverse geocoding: How to determine the city closest to a given (lat, lon) with BigQuery SQL?

8

I have a huge collection of points, and I want to determine the closest city to each point. How can I do this with BigQuery?

2 Answers

7

This is the best-performing query we have worked out so far:

WITH a AS (
  # a table with points around the world
  SELECT * FROM UNNEST([ST_GEOGPOINT(-70, -33), ST_GEOGPOINT(-122,37), ST_GEOGPOINT(151,-33)]) my_point
), b AS (
  # any table with cities world locations
  SELECT *, ST_GEOGPOINT(lon,lat) latlon_geo
  FROM `fh-bigquery.geocode.201806_geolite2_latlon_redux` 
)

SELECT my_point, city_name, subdivision_1_name, country_name, continent_name
FROM (
  SELECT loc.*, my_point
  FROM (
    SELECT ST_ASTEXT(my_point) my_point, ANY_VALUE(my_point) geop
      , ARRAY_AGG( # get the closest city
           STRUCT(city_name, subdivision_1_name, country_name, continent_name) 
           ORDER BY ST_DISTANCE(my_point, b.latlon_geo) LIMIT 1
        )[SAFE_OFFSET(0)] loc
    FROM a, b 
    WHERE ST_DWITHIN(my_point, b.latlon_geo, 100000)  # filter to only close cities
    GROUP BY my_point
  )
)
GROUP BY 1,2,3,4,5
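
The query above relies on ST_DWITHIN to discard distant cities before ST_DISTANCE is used for ordering. As a minimal sketch of those two functions in isolation (the coordinates below are hypothetical, picked only for illustration, and are not taken from the geolite2 table):

```sql
#standardSQL
# Hypothetical coordinates, chosen only for illustration.
WITH pts AS (
  SELECT
    ST_GEOGPOINT(-122, 37) AS my_point,          # arguments are (longitude, latitude)
    ST_GEOGPOINT(-122.4194, 37.7749) AS city_geo
)
SELECT
  ST_DISTANCE(my_point, city_geo) AS distance_m,         # distance in meters
  ST_DWITHIN(my_point, city_geo, 100000) AS within_100km # TRUE when distance <= 100,000 m
FROM pts
```

A point with no city inside the 100,000 m radius produces no row in the main query, because the WHERE clause removes every candidate pair for it.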


@hoffa If I have IP addresses instead of latitude and longitude coordinates, which public dataset table should I refer to for geocoding? - bunny
https://cloud.google.com/blog/products/data-analytics/geolocation-with-bigquery-de-identify-76-million-ip-addresses-in-20-seconds - Felipe Hoffa

1

I have a huge collection of points ...

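Felipe's solution is perfect in many ways, but I found that when you have more than just a few points to find the closest city for, and you cannot limit the search distance to about 60 miles (the 100,000 m radius in that query), the solution below works better.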

#standardSQL
WITH a AS (
  # a table with points around the world
  SELECT ST_GEOGPOINT(lon,lat) my_point
  FROM `fh-bigquery.geocode.201806_geolite2_latlon_redux`  
), b AS (
  # any table with cities world locations
  SELECT *, ST_GEOGPOINT(lon,lat) latlon_geo, ST_ASTEXT(ST_GEOGPOINT(lon,lat)) hsh 
  FROM `fh-bigquery.geocode.201806_geolite2_latlon_redux` 
)
SELECT AS VALUE 
  ARRAY_AGG(
    STRUCT(my_point, city_name, subdivision_1_name, country_name, continent_name) 
    LIMIT 1
  )[OFFSET(0)]
FROM (
  SELECT my_point, ST_ASTEXT(closest) hsh 
  FROM a, (SELECT ST_UNION_AGG(latlon_geo) arr FROM b),
  UNNEST([ST_CLOSESTPOINT(arr, my_point)]) closest
)
JOIN b 
USING(hsh)
GROUP BY ST_ASTEXT(my_point)

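Notes:

  • I am using the ST_CLOSESTPOINT function.
  • To mimic the "not just a few points" case, I use the same table for a as for b, so there are 100K points to find the closest city for, with no limit on how near or far that city may be. In this scenario the query in the original answer fails with the famous "Query exceeded resource limits" error; otherwise it shows much better, if not the best, performance, as that answer states.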
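
As a smaller illustration of the same idea, here is a sketch that applies ST_UNION_AGG and ST_CLOSESTPOINT to an inline list of cities. The city names and coordinates are hypothetical placeholders, not rows from the geolite2 table. Like the query above, it assumes the point returned by ST_CLOSESTPOINT has exactly the same text form as one of the input city points, so that the two can be matched through ST_ASTEXT.

```sql
#standardSQL
WITH cities AS (
  # Hypothetical city list; names and coordinates are illustrative only.
  SELECT 'city_a' AS city_name, ST_GEOGPOINT(-70.65, -33.45) AS latlon_geo UNION ALL
  SELECT 'city_b', ST_GEOGPOINT(-122.42, 37.77) UNION ALL
  SELECT 'city_c', ST_GEOGPOINT(151.21, -33.87)
), closest AS (
  # Merge all city points into one geography, then take the point of that
  # geography nearest to the query point.
  SELECT ST_CLOSESTPOINT(ST_UNION_AGG(latlon_geo), ST_GEOGPOINT(-122, 37)) AS geo
  FROM cities
)
SELECT city_name
FROM cities
JOIN closest
ON ST_ASTEXT(cities.latlon_geo) = ST_ASTEXT(closest.geo)  # match back to the city row
```

Merging the cities into a single geography means each query point is compared against one aggregated value instead of being cross-joined with every city row, which is presumably why this form copes better when no distance filter can be applied.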
