I have a huge collection of points - I want to determine the closest city to each point. How can I do this in BigQuery?
This is the best query we have come up with so far:
WITH a AS (
  # a table with points around the world
  SELECT * FROM UNNEST([ST_GEOGPOINT(-70, -33), ST_GEOGPOINT(-122,37), ST_GEOGPOINT(151,-33)]) my_point
), b AS (
  # any table with cities world locations
  SELECT *, ST_GEOGPOINT(lon,lat) latlon_geo
  FROM `fh-bigquery.geocode.201806_geolite2_latlon_redux`
)
SELECT my_point, city_name, subdivision_1_name, country_name, continent_name
FROM (
  SELECT loc.*, my_point
  FROM (
    SELECT ST_ASTEXT(my_point) my_point, ANY_VALUE(my_point) geop
      , ARRAY_AGG( # get the closest city
          STRUCT(city_name, subdivision_1_name, country_name, continent_name)
          ORDER BY ST_DISTANCE(my_point, b.latlon_geo) LIMIT 1
        )[SAFE_OFFSET(0)] loc
    FROM a, b
    WHERE ST_DWITHIN(my_point, b.latlon_geo, 100000) # filter to only close cities
    GROUP BY my_point
  )
)
GROUP BY 1,2,3,4,5
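Felipe's solution is perfect in many ways, but I found that when you really need the exact closest city and cannot limit the search distance to about 60 miles (the 100,000 m passed to ST_DWITHIN), the solution below works better.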
#standardSQL
WITH a AS (
  # a table with points around the world
  SELECT ST_GEOGPOINT(lon,lat) my_point
  FROM `fh-bigquery.geocode.201806_geolite2_latlon_redux`
), b AS (
  # any table with cities world locations
  SELECT *, ST_GEOGPOINT(lon,lat) latlon_geo, ST_ASTEXT(ST_GEOGPOINT(lon,lat)) hsh
  FROM `fh-bigquery.geocode.201806_geolite2_latlon_redux`
)
SELECT AS VALUE
  ARRAY_AGG(
    STRUCT(my_point, city_name, subdivision_1_name, country_name, continent_name)
    LIMIT 1
  )[OFFSET(0)]
FROM (
  SELECT my_point, ST_ASTEXT(closest) hsh
  FROM a, (SELECT ST_UNION_AGG(latlon_geo) arr FROM b),
    UNNEST([ST_CLOSESTPOINT(arr, my_point)]) closest
)
JOIN b
USING(hsh)
GROUP BY ST_ASTEXT(my_point)
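Note: in this version a and b read from the same table, so there are 100K points, each searching for its closest city among those same cities, and there is no limit on how near or far the matched city can be. In this scenario the query from the original answer ends up with the famous "Query exceeded resource limits" error. Otherwise that query performs much better, possibly even best, as that answer rightly states.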
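To make the difference between the two strategies concrete outside BigQuery, here is a minimal Python sketch. It is only an illustration under stated assumptions: the helper names (haversine_m, nearest_within, nearest_any) are hypothetical, the city coordinates are approximate, and the haversine formula is a spherical stand-in for ST_DISTANCE, not what BigQuery runs internally.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def nearest_within(point, cities, radius_m=100_000):
    """Radius-limited search (like the ST_DWITHIN filter): None if nothing is close."""
    in_range = [
        (dist, city["name"])
        for city in cities
        if (dist := haversine_m(*point, city["lon"], city["lat"])) <= radius_m
    ]
    return min(in_range)[1] if in_range else None


def nearest_any(point, cities):
    """Unrestricted search: always returns the closest city, however far away."""
    return min(
        cities, key=lambda city: haversine_m(*point, city["lon"], city["lat"])
    )["name"]


# Approximate, illustrative coordinates (lon, lat); not taken from the geolite2 table.
CITIES = [
    {"name": "Santiago", "lon": -70.65, "lat": -33.45},
    {"name": "San Francisco", "lon": -122.42, "lat": 37.77},
    {"name": "Sydney", "lon": 151.21, "lat": -33.87},
]

if __name__ == "__main__":
    near_santiago = (-70, -33)   # one of the sample points from the first query
    mid_pacific = (-140, 0)      # hypothetical point far from every city
    print(nearest_within(near_santiago, CITIES))  # Santiago
    print(nearest_within(mid_pacific, CITIES))    # None
    print(nearest_any(mid_pacific, CITIES))       # San Francisco
```

The radius-limited version drops points with no city inside the radius, which keeps the candidate set small, while the unrestricted version always returns a match at the cost of comparing against every city.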