Getting bin coordinates from hexbin in matplotlib

14
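I use matplotlib's hexbin method to compute 2D histograms on my data, but I would like to get the coordinates of the hexagon centers so that I can process the results further.
I got the values with the get_array() method, but I cannot figure out how to get the bin coordinates.
I tried to compute them from the number of bins and the extent of my data, but I don't know the exact number of bins in each direction. gridsize=(10,2) should do the trick, but it does not seem to work.
Any ideas?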

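I might be wrong, but there doesn't seem to be a method for getting the coordinates. Fortunately, it is all open source (search for "hexbin" in this file: https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/axes.py), so you can look at how the grid is calculated and replicate it in your code. Good luck! - Tobias
Hi, thanks Tobold, I will check the source code you mentioned. - user1151446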
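As a rough illustration of the "replicate the grid" suggestion, here is a minimal NumPy sketch. It assumes the hexagon centers form two interleaved rectangular lattices, one with (nx+1) x (ny+1) points on the extent corners and one with nx x ny points shifted by half a cell, which is how the hexbin source appears to lay them out. The function name hexgrid_centers is hypothetical, and the tiny padding matplotlib adds to the extent is ignored, so treat the result as an approximation to check against your matplotlib version.

```python
import numpy as np

def hexgrid_centers(gridsize, extent):
    """Approximate hexagon centers for gridsize=(nx, ny) and
    extent=(xmin, xmax, ymin, ymax). Hypothetical helper, not a matplotlib API."""
    nx, ny = gridsize
    xmin, xmax, ymin, ymax = extent
    sx = (xmax - xmin) / nx  # horizontal spacing between centers in one lattice
    sy = (ymax - ymin) / ny  # vertical spacing between centers in one lattice

    # First lattice: (nx + 1) x (ny + 1) centers starting at (xmin, ymin)
    gx1, gy1 = np.meshgrid(np.arange(nx + 1), np.arange(ny + 1), indexing="ij")
    # Second lattice: nx x ny centers shifted by half a cell in both directions
    gx2, gy2 = np.meshgrid(np.arange(nx) + 0.5, np.arange(ny) + 0.5, indexing="ij")

    cx = np.concatenate([gx1.ravel(), gx2.ravel()]) * sx + xmin
    cy = np.concatenate([gy1.ravel(), gy2.ravel()]) * sy + ymin
    return np.column_stack([cx, cy])

centers = hexgrid_centers((10, 2), (0.0, 10.0, 0.0, 2.0))
print(centers.shape)
```

Under this assumption, gridsize=(nx, ny) yields (nx+1)*(ny+1) + nx*ny hexagons in total, which may explain why the bin count per direction is not simply nx and ny.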
3 Answers

22

I think this works.

from __future__ import division
import numpy as np
import math
import matplotlib.pyplot as plt

def generate_data(n):
    """Make random, correlated x & y arrays"""
    points = np.random.multivariate_normal(mean=(0,0),
        cov=[[0.4,9],[9,10]],size=int(n))
    return points

if __name__ =='__main__':

    color_map = plt.cm.Spectral_r
    n = 1e4
    points = generate_data(n)

    xbnds = np.array([-20.0,20.0])
    ybnds = np.array([-20.0,20.0])
    extent = [xbnds[0],xbnds[1],ybnds[0],ybnds[1]]

    fig=plt.figure(figsize=(10,9))
    ax = fig.add_subplot(111)
    x, y = points.T
    # Set gridsize just to make them visually large
    image = plt.hexbin(x,y,cmap=color_map,gridsize=20,extent=extent,mincnt=1,bins='log')
    # Note: mincnt=1 only hides empty hexagons; it does not add 1 to the counts
    counts = image.get_array()
    ncnts = np.count_nonzero(np.power(10,counts))
    verts = image.get_offsets()
    for offc in range(verts.shape[0]):
        binx,biny = verts[offc][0],verts[offc][1]
        if counts[offc]:
            plt.plot(binx,biny,'k.',zorder=100)
    ax.set_xlim(xbnds)
    ax.set_ylim(ybnds)
    plt.grid(True)
    cb = plt.colorbar(image,spacing='uniform',extend='max')
    plt.show()
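For readers on Python 3 and a recent matplotlib release, a condensed version of the same idea might look like the sketch below. It assumes that get_offsets() returns one (x, y) center per displayed hexagon, in the same order as get_array(); older releases reportedly returned an empty array here. The covariance matrix, the seed, and the Agg backend are choices made for this sketch only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Correlated test data (a valid, positive-definite covariance chosen for the sketch)
rng = np.random.default_rng(0)
x, y = rng.multivariate_normal([0, 0], [[10, 9], [9, 10]], size=10000).T

fig, ax = plt.subplots()
hb = ax.hexbin(x, y, gridsize=20, extent=[-20, 20, -20, 20], mincnt=1)

centers = hb.get_offsets()  # (n_hexagons, 2) array of hexagon centers
counts = hb.get_array()     # one count per displayed hexagon

# Print the first few (center, count) pairs
for (cx, cy), c in zip(centers[:5], counts[:5]):
    print(cx, cy, c)

plt.close(fig)
```

If centers comes back empty on your installation, the get_paths() workaround discussed further down is the fallback.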



1
Could you give the version numbers of the modules this code runs on? I get no data from get_offsets(): In [1]: verts = image.get_offsets() In [2]: verts Out[2]: array([], dtype=float64). This is with matplotlib 1.0.1 and numpy 1.5.1. - Dave
1
I took the liberty of editing your post to include the image it generates. Very nice answer! - Hooked

2
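I would love to confirm that Hooked's code using get_offsets() works, but I tried several iterations of the code above to retrieve the center positions and, as Dave mentioned, get_offsets() stays empty. The workaround I found is to use image.get_paths(), which is not empty. My code takes the mean of the vertices to find each center, which makes it slightly longer, but it does work.
get_paths() returns a set of embedded x, y coordinates that can be looped over and averaged to give the center position of each hexagon.
My code is as follows: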
counts=image.get_array() #counts in each hexagon, works great
verts=image.get_offsets() #empty, don't use this
b=image.get_paths()   #this does work, gives Path([[]][]) which can be plotted

for x in range(len(b)):
    xav=np.mean(b[x].vertices[0:6,0]) #center in x (RA)
    yav=np.mean(b[x].vertices[0:6,1]) #center in y (DEC)
    plt.plot(xav,yav,'k.',zorder=100)
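To see why averaging the first six vertices gives the center, here is a small self-contained NumPy check. It builds a regular hexagon around a known center, appends an extra trailing row to mimic the closing entry a path may carry (the reason for the 0:6 slice), and averages. The helper name hexagon_center and the test hexagon are made up for this illustration.

```python
import numpy as np

def hexagon_center(vertices):
    """Mean of the first six vertices; any trailing closing entry is ignored.
    Hypothetical helper for illustration."""
    v = np.asarray(vertices, dtype=float)
    return v[:6].mean(axis=0)

# Regular hexagon of radius 1 around a known center
true_center = np.array([2.0, -1.0])
angles = np.deg2rad(np.arange(6) * 60 + 30)
hexagon = true_center + np.column_stack([np.cos(angles), np.sin(angles)])

# Mimic a closed path: six corners plus one extra placeholder row
closed = np.vstack([hexagon, [[0.0, 0.0]]])

center = hexagon_center(closed)
print(center)
```

Averaging all seven rows instead would pull the result toward the placeholder, which is why the slice matters.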

1
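I had the same problem. I think what needs to be developed is a framework with a hexagonal grid object that can then be applied to many different data sets (and it would be great if it worked in N dimensions). This is possible, and it surprises me that neither Scipy nor Numpy has anything for it (besides, there seems to be nothing else like it apart from binify).
That said, I assume you want to use a hexagonal grid to compare multiple binned data sets. That requires some common base. Here is how I got it to work with matplotlib's hexbin: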
import numpy as np
import matplotlib.pyplot as plt

def get_data (mean,cov,n=1e3):
    """
    Quick fake data builder
    """
    np.random.seed(101)
    points = np.random.multivariate_normal(mean=mean,cov=cov,size=int(n))
    x, y = points.T
    return x,y

def get_centers (hexbin_output):
    """
    about 40% faster than previous post only cause you're not calculating the 
    min/max every time 
    """
    paths = hexbin_output.get_paths()
    v = paths[0].vertices[:-1] # adds a value [0,0] to the end
    vx,vy = v.T

    idx = [3,0,5,2] # index for [xmin,xmax,ymin,ymax]    
    xmin,xmax,ymin,ymax = vx[idx[0]],vx[idx[1]],vy[idx[2]],vy[idx[3]]

    half_width_x = abs(xmax-xmin)/2.0
    half_width_y = abs(ymax-ymin)/2.0

    centers = []
    for i in range(len(paths)):
        cx = paths[i].vertices[idx[0],0]+half_width_x
        cy = paths[i].vertices[idx[2],1]+half_width_y
        centers.append((cx,cy))

    return np.asarray(centers)


# important parts ==>

class Hexagonal2DGrid (object):
    """
    Used to fix the gridsize, extent, and bins
    """
    def __init__ (self,gridsize,extent,bins=None):
        self.gridsize = gridsize
        self.extent = extent
        self.bins = bins

def hexbin (x,y,hexgrid):
    """
    To hexagonally bin the data in 2 dimensions
    """
    fig = plt.figure()
    ax = fig.add_subplot(111)

    # Note mincnt=0 so that it will return a value for every point in the 
    # hexgrid, not just those with count>mincnt

    # Basically you fix the gridsize, extent, and bins to keep them the same
    # then the resulting count array is the same
    hexbin = plt.hexbin(x,y, mincnt=0,
                        gridsize=hexgrid.gridsize, 
                        extent=hexgrid.extent,
                        bins=hexgrid.bins)
    # you could close the figure if you don't want it
    # plt.close(fig.number)

    counts = hexbin.get_array().copy() 
    return counts, hexbin

# Example ===>
if __name__ == "__main__":
    hexgrid = Hexagonal2DGrid((21,5),[-70,70,-20,20])
    x_data,y_data = get_data((0,0),[[-40,95],[90,10]])
    x_model,y_model = get_data((0,10),[[100,30],[3,30]])

    counts_data, hexbin_data = hexbin(x_data,y_data,hexgrid)
    counts_model, hexbin_model = hexbin(x_model,y_model,hexgrid)

    # if you want the centers, they will be the same for both 
    centers = get_centers(hexbin_data) 

    # if you want to ignore the cells with zeros then use the following mask. 
    # But if want zeros for some bins and not others I'm not sure an elegant way
    # to do this without using the centers
    nonzero = counts_data != 0

    # now you can compare the two data sets
    variance_data = counts_data[nonzero]
    square_diffs = (counts_data[nonzero]-counts_model[nonzero])**2
    chi2 = np.sum(square_diffs/variance_data)
    print(" chi2={}".format(chi2))
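The comparison step at the end can be isolated into a small function that works on any two count arrays binned on the same grid. This is only a sketch of the masking logic used above (skip bins where the data count is zero, use the data count as the variance). The name chi2_counts and the toy arrays are assumptions for illustration.

```python
import numpy as np

def chi2_counts(counts_data, counts_model):
    """Chi-square between two count arrays on the same grid,
    skipping bins where the data count is zero (hypothetical helper)."""
    data = np.asarray(counts_data, dtype=float)
    model = np.asarray(counts_model, dtype=float)
    if data.shape != model.shape:
        raise ValueError("both count arrays must come from the same grid")
    mask = data != 0  # avoid dividing by a zero variance
    return float(np.sum((data[mask] - model[mask]) ** 2 / data[mask]))

# Toy counts: the middle bin is skipped because the data count is zero
chi2 = chi2_counts([4, 0, 9], [2, 5, 6])
print(" chi2={}".format(chi2))
```

The shape check is there because the comparison is only meaningful when both arrays were produced with the same gridsize, extent, and mincnt settings.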
