Geopandas: how to plot countries/cities?

10

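I need to plot some data on a geographical map. Specifically, I want to highlight the countries and states that the data comes from. My dataset is: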

    Year    Country State/City
0   2009    BGR     Sofia
1   2018    BHS     New Providence
2   2002    BLZ     NaN
3   2000    CAN     California
4   2002    CAN     Ontario
... ... ... ...
250 2001    USA     Ohio
251 1998    USA     New York
252 1995    USA     Virginia
253 2011    USA     NaN
254 2019    USA     New York

To create the map, I have been using geopandas as follows:

import geopandas as gpd

shapefile = 'path/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp'
gdf = gpd.read_file(shapefile)[['ADMIN', 'ADM0_A3', 'geometry']]
gdf.columns = ['country', 'country_code', 'geometry']

Then I merged the two datasets:

merged = gdf.merge(df, left_on = 'country_code', right_on = 'Country')

and converted the data to JSON:

import json

merged_json = json.loads(merged.to_json())
#Convert to String like object.
json_data = json.dumps(merged_json)

Finally, I tried to create the plot as follows:

from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer

geosource = GeoJSONDataSource(geojson = json_data)

#Define a sequential multi-hue color palette.
palette = brewer['YlGnBu'][8]
palette = palette[::-1]
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 40)

tick_labels = {'0': '0%', '5': '5%', '10':'10%', '15':'15%', '20':'20%', '25':'25%', '30':'30%','35':'35%', '40': '>40%'}

color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 20,
border_line_color=None,location = (0,0), orientation = 'horizontal', major_label_overrides = tick_labels)

p = figure(title = 'Creation year across countries', plot_height = 600 , plot_width = 950, toolbar_location = None)
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None

p.patches('xs','ys', source = geosource,fill_color = {'field' :'per_cent_year', 'transform' : color_mapper},
          line_color = 'black', line_width = 0.25, fill_alpha = 1)

p.add_layout(color_bar, 'below')

output_notebook()

#Display figure.
show(p)

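When I run it, it prints "BokehJS 1.0.2 successfully loaded", but nothing is displayed. My expected output is one map coloured by how often each country appears (e.g. USA=5 would be darker), and another map based on state/city (New York would be darker).
Is there anything wrong with the code above?
(I can share more data/information if needed.)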

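You should try to run the simplest possible example. Some examples can be found in the bokeh gallery. If there is output, you know the problem is in your code. Otherwise you know something else is wrong. Maybe you should update your packages. The latest version is 2.2.3. - mosc9575
2 Answers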

4

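I can't see anything wrong with the plotting in the code you posted, so I think the problem may lie in your data aggregation or in the merge.

Here is a solution that first generates data similar to yours, then computes how often each country appears in the data as a percentage of the dataset size, since that is the metric you want. We'll use just a few countries as an example: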

from random import choices
import pandas as pd
import numpy as np

def generate_data():
    
    k = 100
    
    countries_of_interest = ['USA','ARG','BRA','GBR','ESP','RUS']
    countries = choices(countries_of_interest, k=k)
    
    start_yr = 2010
    end_yr = 2021
    
    return pd.DataFrame({'Country':countries, 
                         'Year':np.random.randint(start_yr, end_yr, k)},
                        index=range(len(countries)))


def aggregate_data(df):
    data = df.groupby('Country').agg('count')*100.0/len(df)
    data = data.reset_index().rename(columns={'Year':'proportion_of_dataset'})
    return data

df = generate_data()

#    Country  Year
# 0      USA  2017
# 1      GBR  2014
# 2      USA  2013
# 3      BRA  2016
# 4      BRA  2018
# ..     ...   ...
# 95     ESP  2014
# 96     USA  2015
# 97     RUS  2019
# 98     RUS  2012
# 99     RUS  2011
# 
# [100 rows x 2 columns]

data = aggregate_data(df)

#   Country  proportion_of_dataset
# 0     ARG                   20.0
# 1     BRA                   17.0
# 2     ESP                   14.0
# 3     GBR                   14.0
# 4     RUS                   19.0
# 5     USA                   16.0

Now load the shapefile with the country borders using geopandas, and rename the columns:
import geopandas as gpd

shapefile = 'path_to_shapefile_folder/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp'
gdf = gpd.read_file(shapefile)[['ADMIN', 'ADM0_A3', 'geometry']]
gdf.columns = ['country', 'country_code', 'geometry']

gdf.head()

#                        country country_code  \
# 0                         Fiji          FJI   
# 1  United Republic of Tanzania          TZA   
# 2               Western Sahara          SAH   
# 3                       Canada          CAN   
# 4     United States of America          USA   
# 
#                                             geometry  
# 0  MULTIPOLYGON (((180.00000 -16.06713, 180.00000...  
# 1  POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...  
# 2  POLYGON ((-8.66559 27.65643, -8.66512 27.58948...  
# 3  MULTIPOLYGON (((-122.84000 49.00000, -122.9742...  
# 4  MULTIPOLYGON (((-122.84000 49.00000, -120.0000...

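Now we want to merge the country polygon dataframe with our aggregated data. Note: we do a left join (on the full country polygon dataframe) so that all countries are included, even those we have no data for. Also note that we fill in the missing values for those countries by replacing NaN with zero: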

merged = gdf.merge(data, left_on = 'country_code', right_on = 'Country', how='left')
merged['proportion_of_dataset'] = merged['proportion_of_dataset'].fillna(0)
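
If the map still comes out uncoloured, it may help to check which country codes failed to match, and whether the column handed to fill_color exists at all (the question passes 'per_cent_year', which none of the snippets shown there create). A sketch using plain pandas frames as stand-ins; `shapes` and `data_demo` are hypothetical, and the same merge call should work on the real gdf. The indicator=True option of merge adds a '_merge' column saying which side each row came from:

```python
import pandas as pd

# Stand-ins for the country polygons and the aggregated data (hypothetical values).
shapes = pd.DataFrame({'country_code': ['USA', 'CAN', 'BGR']})
data_demo = pd.DataFrame({
    'Country': ['USA', 'BGR', 'XXX'],
    'proportion_of_dataset': [60.0, 30.0, 10.0],
})

# Outer join so that unmatched rows from both sides are kept and labelled.
check = shapes.merge(data_demo, left_on='country_code', right_on='Country',
                     how='outer', indicator=True)

# Codes present in the data but missing from the shapefile (likely typos or wrong code system).
unmatched_data = check.loc[check['_merge'] == 'right_only', 'Country'].tolist()

# Countries in the shapefile with no data (these get 0 after fillna).
no_data = check.loc[check['_merge'] == 'left_only', 'country_code'].tolist()

# The column name later passed to fill_color must exist in the merged frame.
plot_col = 'proportion_of_dataset'
print(unmatched_data, no_data, plot_col in check.columns)
```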

Create the GeoJSON using your code:

import json

merged_json = json.loads(merged.to_json())
json_data = json.dumps(merged_json)

Finally, we put your plotting code into a function that takes the geojson, the column to plot and the plot title as arguments:
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer

def plot_map(json_data,plot_col,title):

    geosource = GeoJSONDataSource(geojson = json_data)

    #Define a sequential multi-hue color palette.
    palette = brewer['YlGnBu'][8]
    palette = palette[::-1]
    color_mapper = LinearColorMapper(palette = palette, low = 0, high = 40)

    tick_labels = {'0': '0%', '5': '5%', '10':'10%', '15':'15%', '20':'20%', '25':'25%', '30':'30%','35':'35%', '40': '>40%'}

    color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 20,
    border_line_color=None,location = (0,0), orientation = 'horizontal', major_label_overrides = tick_labels)

    p = figure(title = title, plot_height = 600 , plot_width = 950, toolbar_location = None)
    p.xgrid.grid_line_color = None
    p.ygrid.grid_line_color = None

    p.patches('xs','ys', source = geosource,fill_color = {'field' :plot_col, 'transform' : color_mapper},
              line_color = 'black', line_width = 0.25, fill_alpha = 1)

    p.add_layout(color_bar, 'below')

    output_notebook()

    #Display figure.
    show(p)
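
The function above hard-codes high = 40 and a matching set of tick labels. If the plotted column has a different range, both could be derived from the data instead. A small sketch in plain Python; the helper names `rounded_high` and `make_tick_labels` and the step of 5 are assumptions for illustration, not part of Bokeh:

```python
import math

def rounded_high(values, step=5):
    """Round the largest value up to the next multiple of step (at least one step)."""
    return max(step, math.ceil(max(values) / step) * step)

def make_tick_labels(high, step=5):
    """Map tick positions (as strings) to percent labels, e.g. '10' -> '10%'."""
    return {str(v): f'{v}%' for v in range(0, high + step, step)}

# Example values, similar to the aggregated percentages above plus a zero-filled country.
values = [20.0, 17.0, 14.0, 14.0, 19.0, 16.0, 0.0]
high = rounded_high(values)
tick_labels = make_tick_labels(high)
print(high)
print(tick_labels)
```

The results could then replace the fixed values, i.e. LinearColorMapper(palette = palette, low = 0, high = high) and major_label_overrides = tick_labels.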

Now we just call the plotting function with the required arguments:

plot_map(json_data,'proportion_of_dataset','Dataset countries of origin')
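
For the second map in the question (by state or city), the same counting step could be applied to the 'State/City' column. This is a sketch of the aggregation only, assuming the column names from the question's dataset; `state_df` is a made-up sample, and rows with a missing state are dropped. Joining the result to polygons would need a state-level boundary file, which the question does not mention, so that step is only described in a comment:

```python
import pandas as pd

# Hypothetical sample shaped like the question's dataset.
state_df = pd.DataFrame({
    'Country': ['USA', 'USA', 'USA', 'USA', 'CAN', 'BLZ'],
    'State/City': ['Ohio', 'New York', 'Virginia', 'New York', 'Ontario', None],
})

# Count rows per (country, state) pair, ignoring rows without a state.
state_counts = (
    state_df.dropna(subset=['State/City'])
            .groupby(['Country', 'State/City'])
            .size()
            .reset_index(name='n_rows')
)

# Share of the whole dataset in percent, same metric as for the countries.
state_counts['proportion_of_dataset'] = state_counts['n_rows'] * 100.0 / len(state_df)
print(state_counts)

# Next step (not shown): left-merge a state-level GeoDataFrame with state_counts
# on its state-name column, fill NaN with 0, and pass the GeoJSON to plot_map.
```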


0

I assume you are running this in a Jupyter Notebook. Try adding the following snippet at the top of your code block.

from bokeh.resources import INLINE
import bokeh.io

bokeh.io.output_notebook(INLINE)

or, using your imports:

from bokeh.resources import INLINE
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer

output_notebook(INLINE)

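Hi Jimmar, thanks for your answer. Yes, using INLINE makes the map show up. However, I still can't get my expected output. It only reproduces the standard map, without any information from my data (the dataset I mentioned at the top of the post). - V_sqrt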
