How to compute Spearman correlation in TensorFlow

Question

I need to compute both the Pearson and the Spearman correlation and use them as metrics in TensorFlow.

For Pearson, it's trivial:

tf.contrib.metrics.streaming_pearson_correlation(y_pred, y_true)

But for Spearman, I'm clueless!

What I tried:

From this answer:

    samples = 1
    predictions_rank = tf.nn.top_k(y_pred, k=samples, sorted=True, name='prediction_rank').indices
    real_rank = tf.nn.top_k(y_true, k=samples, sorted=True, name='real_rank').indices
    rank_diffs = predictions_rank - real_rank
    rank_diffs_squared_sum = tf.reduce_sum(rank_diffs * rank_diffs)
    six = tf.constant(6)
    one = tf.constant(1.0)
    numerator = tf.cast(six * rank_diffs_squared_sum, dtype=tf.float32)
    divider = tf.cast(samples * samples * samples - samples, dtype=tf.float32)
    spearman_batch = one - numerator / divider

But it returns NaN...
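
A likely cause of the NaN, sketched in plain Python rather than TensorFlow: with samples = 1, top_k returns a single index per row, so every rank difference is 0 and the denominator n^3 - n is also 0, which gives 0 / 0. The function name spearman_from_rank_diffs is hypothetical, and returning math.nan is an assumption that mirrors TensorFlow's float division (plain Python would raise ZeroDivisionError instead).

```python
import math

def spearman_from_rank_diffs(ranks_a, ranks_b):
    """1 - 6 * sum(d^2) / (n^3 - n); only valid when there are no ties."""
    n = len(ranks_a)
    d_squared_sum = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    denominator = n ** 3 - n
    if denominator == 0:
        # n = 1 (or n = 0): this is the 0 / 0 case that shows up as NaN in TensorFlow.
        return math.nan
    return 1.0 - 6.0 * d_squared_sum / denominator

print(spearman_from_rank_diffs([0], [0]))              # nan: a single sample has no rank correlation
print(spearman_from_rank_diffs([0, 1, 2], [2, 1, 0]))  # -1.0: perfectly reversed ranks
```

So the formula needs n to be the number of ranked elements (at least 2), not 1.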

According to the Wikipedia definition:
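
The formula from the original post did not survive here. Presumably it is the standard definition of Spearman's coefficient as the Pearson correlation of the rank variables, which is what the attempt that follows tries to compute; the notation R(X), R(Y) for the ranks is an assumption:

```latex
r_s = \rho_{R(X),\,R(Y)} = \frac{\operatorname{cov}\bigl(R(X), R(Y)\bigr)}{\sigma_{R(X)}\,\sigma_{R(Y)}}
```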

I tried:

size = tf.size(y_pred)
indice_of_ranks_pred = tf.nn.top_k(y_pred, k=size)[1]
indice_of_ranks_label = tf.nn.top_k(y_true, k=size)[1]
rank_pred = tf.nn.top_k(-indice_of_ranks_pred, k=size)[1]
rank_label = tf.nn.top_k(-indice_of_ranks_label, k=size)[1]
rank_pred = tf.to_float(rank_pred)
rank_label = tf.to_float(rank_label)
spearman = tf.contrib.metrics.streaming_pearson_correlation(rank_pred, rank_label)

But at run time I get the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: input must have at least k columns. Had 1, needed 32

[[{{node metrics/spearman/TopKV2}} = TopKV2[T=DT_FLOAT, sorted=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lambda_1/add, metrics/pearson/pearson_r/variance_predictions/Size)]]

- Astariul

3 Answers

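I have been working on implementing the Spearman rank correlation coefficient in TensorFlow, following the definition on this site (https://rpubs.com/aaronsc32/spearman-rank-correlation). I ended up with the code below (sharing it in case anyone finds it useful).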

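import tensorflow as tf
import tensorflow_probability as tfp

@tf.function
def get_rank(y_pred):
  # argsort of argsort gives each element's 0-based rank; +1 makes the ranks start at 1
  rank = tf.argsort(tf.argsort(y_pred, axis=-1, direction="ASCENDING"), axis=-1) + 1
  # argsort returns int32; cast so the result matches dtype=tf.float32 in tf.map_fn below
  return tf.cast(rank, tf.float32)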

@tf.function
def sp_rank(x, y):
  cov = tfp.stats.covariance(x, y, sample_axis=0, event_axis=None)
  sd_x = tfp.stats.stddev(x, sample_axis=0, keepdims=False, name=None)
  sd_y = tfp.stats.stddev(y, sample_axis=0, keepdims=False, name=None)
  return 1-cov/(sd_x*sd_y) #1- because we want to minimize loss

@tf.function
def spearman_correlation(y_true, y_pred):
    #First we obtain the ranking of the predicted values
    y_pred_rank = tf.map_fn(lambda x: get_rank(x), y_pred, dtype=tf.float32)
    
    #Spearman rank correlation between each pair of samples:
    #Sample dim: (1, 8)
    #Batch of samples dim: (None, 8) None=batch_size=64
    #Output dim: (batch_size, ) = (64, )
    sp = tf.map_fn(lambda x: sp_rank(x[0],x[1]), (y_true, y_pred_rank), dtype=tf.float32)
    #Reduce to a single value
    loss = tf.reduce_mean(sp)
    return loss

- kevin

Very nice! Have you tried turning it into a class, so that it can be used to monitor rank correlation during training? - Théophile Pace

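top_k().indices returns the indices of the largest elements, i.e. a sort order. Spearman needs ranks. These are two different things.

For example, for the array [3, 1, 2]:

the sort indices, in ascending order as returned by tf.argsort, are [1, 2, 0] (top_k().indices lists the same kind of indices in descending order)
Spearman needs the ranks [2, 0, 1].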
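
The difference between sort indices and ranks can be checked without TensorFlow. The following is a minimal plain-Python sketch; the variable names are illustrative only. Inverting the sort permutation is the same idea as the tf.scatter_nd() call used in this answer, and as applying argsort twice.

```python
values = [3, 1, 2]

# Indices that would sort the values in ascending order (what an argsort returns).
order = sorted(range(len(values)), key=lambda i: values[i])

# Rank of each element: invert the permutation, i.e. ranks[order[pos]] = pos.
ranks = [0] * len(values)
for position, index in enumerate(order):
    ranks[index] = position

print(order)  # [1, 2, 0]
print(ranks)  # [2, 0, 1]
```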

You can get the ranks with the following code (using tf.scatter_nd()):

def my_spearman(y_pred, labels):
  predictions_rank = tf.argsort(tf.squeeze(y_pred))
  real_rank = tf.argsort(labels)
  r = tf.range(tf.shape(labels)[0])  # tf.range expects a scalar limit; labels is assumed 1-D
  real_rank = tf.scatter_nd(tf.expand_dims(real_rank, -1), r, tf.shape(real_rank))
  predictions_rank = tf.scatter_nd(tf.expand_dims(predictions_rank, -1), r, tf.shape(predictions_rank))
  rank_diffs = predictions_rank - real_rank
  rank_diffs_squared_sum = tf.reduce_sum(rank_diffs * rank_diffs)
  numerator = tf.cast(6 * rank_diffs_squared_sum, dtype=tf.float32)
  samples = tf.shape(rank_diffs)[0]
  divider = tf.cast(samples * samples * samples - samples, dtype=tf.float32)
  spearman = 1.0 - numerator / divider
  return spearman

Note that this algorithm does not work if the elements are not unique. In that case, compute the Pearson correlation on the ranks instead:

def correlationMetric(x, y):
  x = tf.cast(x, tf.float32)
  y = tf.cast(y, tf.float32)
  n = tf.cast(tf.shape(x)[0], x.dtype)
  xsum = tf.reduce_sum(x, axis=0)
  ysum = tf.reduce_sum(y, axis=0)
  xmean = xsum / n
  ymean = ysum / n
  xvar = tf.reduce_sum(tf.math.squared_difference(x, xmean), axis=0)
  yvar = tf.reduce_sum(tf.math.squared_difference(y, ymean), axis=0)
  cov = tf.reduce_sum((x - xmean) * (y - ymean), axis=0)
  corr = cov / tf.sqrt(xvar * yvar)
  return corr

def my_spearman(y_pred, labels):
  predictions_rank = tf.argsort(tf.squeeze(y_pred))
  real_rank = tf.argsort(labels)
  r = tf.range(tf.shape(labels)[0])  # tf.range expects a scalar limit; labels is assumed 1-D
  real_rank = tf.scatter_nd(tf.expand_dims(real_rank, -1), r, tf.shape(real_rank))
  predictions_rank = tf.scatter_nd(tf.expand_dims(predictions_rank, -1), r, tf.shape(predictions_rank))
  spearman = correlationMetric(real_rank, predictions_rank)
  return spearman
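
One caveat on ties: ranks built from argsort still give tied values distinct ranks in some order, whereas the usual convention (the one scipy.stats.spearmanr follows) gives tied values the average of the ranks they span. Below is a plain-Python reference sketch of that convention, which may be handy for sanity-checking a TensorFlow implementation on small inputs. The function names average_ranks, pearson and spearman_with_ties are hypothetical, not part of any library.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0  # mean of positions start..end, 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_with_ties(x, y):
    """Spearman correlation as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

print(average_ranks([1, 2, 2, 3]))  # [1.0, 2.5, 2.5, 4.0]
```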

- Andrey

- Konstantinos Ntagiantas · Accepted Answer

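You can use TensorFlow's tf.py_function together with scipy.stats.spearmanr, defining the inputs and outputs as follows:

import tensorflow as tf
from scipy.stats import spearmanr

def get_spearman_rankcor(y_true, y_pred):
    # tf.py_function runs spearmanr eagerly as ordinary Python code
    return tf.py_function(spearmanr,
                          [tf.cast(y_pred, tf.float32), tf.cast(y_true, tf.float32)],
                          Tout=tf.float32)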