MySQL中计算中位数的简单方法

266

如何在 MySQL 中计算中位数,最好是简单快速的方法?我已经使用 AVG(x) 找到了平均值,但我很难找到一种简单的方式来计算中位数。目前,我正在将所有行返回到 PHP 中,排序,然后选择中间行,但肯定有一些简单的方法可以在单个 MySQL 查询中完成。

示例数据:

id | val
--------
 1    4
 2    7
 3    2
 4    2
 5    9
 6    8
 7    3

按照val排序得到2 2 3 4 7 8 9,因此中位数应为4,而SELECT AVG(val)的结果为5


7
自MariaDB 10.3版本以来,新增了一个函数——中位数函数,请参考 https://mariadb.com/kb/en/library/median/。 - berturion
50个回答

6
您可以使用在这里找到的用户定义函数。

4
这看起来是最有用的,但我不想在我的生产服务器上安装可能会导致 MySQL 崩溃的不稳定的 alpha 软件 :( Translated: 这个选项似乎是最有用的,但我不想在我的生产服务器上安装不稳定的 alpha 软件,这可能会导致 MySQL 崩溃。 - davr
7
所以研究它们与所需功能相关的源代码,根据需要修复或修改它们,然后安装“您自己”的稳定且非测试版本--这与在Stack Overflow上类似地调整不那么成熟的代码建议相比有何不同呢?-) - Alex Martelli

5
建立在Velcro的回答基础上,对于那些需要根据另一个参数进行分组的中位数计算:
选择grp_field和t1.val FROM( SELECT grp_field,@rownum:= IF(@s = grp_field,@rownum + 1,0)AS row_number, @s:= IF(@s = grp_field,@s,grp_field)AS sec,d.val FROM data d,(SELECT @rownum:= 0,@s:= 0)r ORDER BY grp_field,d.val )作为t1 JOIN( SELECT grp_field,count(*)as total_rows FROM data d GROUP BY grp_field )作为t2 ON t1.grp_field = t2.grp_field WHERE t1.row_number = floor(total_rows / 2)+1;

3

如果值的数量为奇数,则会计算中间两个值的平均值。

SELECT AVG(val) FROM
  ( SELECT x.id, x.val from data x, data y
      GROUP BY x.id, x.val
      HAVING SUM(SIGN(1-SIGN(IF(y.val-x.val=0 AND x.id != y.id, SIGN(x.id-y.id), y.val-x.val)))) IN (ROUND((COUNT(*))/2), ROUND((COUNT(*)+1)/2))
  ) sq

3
我的代码,没有表格或额外变量也很高效:
SELECT
((SUBSTRING_INDEX(SUBSTRING_INDEX(group_concat(val order by val), ',', floor(1+((count(val)-1) / 2))), ',', -1))
+
(SUBSTRING_INDEX(SUBSTRING_INDEX(group_concat(val order by val), ',', ceiling(1+((count(val)-1) / 2))), ',', -1)))/2
as median
FROM table;

5
由于GROUP_CONCAT被限制在1023个字符以内,即使在此类函数中使用,这将在任何大量数据的情况下失败。 - Rob Van Dam
1
你可以调整group_concat的限制到相当大的字符数,但这种批评是有道理的。最终在某些数据集上,查询将失败。 - Chris Strickland

3

一次查询实现完美中位数存档:

SELECT 
COUNT(*) as total_rows, 
IF(count(*)%2 = 1, CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL), ROUND((CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) + CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL)) / 2)) as median, 
AVG(val) as average 
FROM 
data

2
SELECT 
    SUBSTRING_INDEX(
        SUBSTRING_INDEX(
            GROUP_CONCAT(field ORDER BY field),
            ',',
            ((
                ROUND(
                    LENGTH(GROUP_CONCAT(field)) - 
                    LENGTH(
                        REPLACE(
                            GROUP_CONCAT(field),
                            ',',
                            ''
                        )
                    )
                ) / 2) + 1
            )),
            ',',
            -1
        )
FROM
    table

今日免费次数已满, 请开通会员/明日再来

它无法正确返回偶数个值的中位数,例如,{98,102,102,98}的中位数是100,但您的代码给出了102。对于奇数个数字,它可以正常工作。 - Nomiluks

2

我正在使用以下表格来解决MySQL问题:

CREATE TABLE transactions (
  transaction_id int , user_id int , merchant_name varchar(255), transaction_date date , amount int
);

INSERT INTO transactions (transaction_id, user_id, merchant_name, transaction_date, amount)  
VALUES (1, 1 ,'abc', '2015-08-17', 100),(2, 2, 'ced', '2015-2-17', 100),(3, 1, 'def', '2015-2-16', 121),
(4, 1 ,'ced', '2015-3-17', 110),(5, 1, 'ced', '2015-3-17', 150),(6, 2 ,'abc', '2015-4-17', 130), 
(7, 3 ,'ced', '2015-12-17', 10),(8, 3 ,'abc', '2015-8-17', 100),(9, 2 ,'abc', '2015-12-17', 140),(10, 1,'abc', '2015-9-17', 100),
(11, 1 ,'abc', '2015-08-17', 121),(12, 2 ,'ced', '2015-12-23', 130),(13, 1 ,'def', '2015-12-23', 13),(3, 4, 'abc', '2015-2-16', 120),(3, 4, 'def', '2015-2-16', 121),(3, 4, 'ced', '2015-2-16', 121);

计算“amount”列的中位数:
WITH Numbered AS 
(
SELECT *, COUNT(*) OVER () AS TotatRecords,
    ROW_NUMBER() OVER (ORDER BY amount) AS RowNum
FROM transactions
)
SELECT Avg(amount)
FROM Numbered
WHERE RowNum IN ( FLOOR((TotatRecords+1)/2), FLOOR((TotatRecords+2)/2) )
;

TotalRecords = 16 and Median = 120.5000

这个查询适用于奇数和偶数记录的两种情况。

2

如果你知道准确的行数,可以使用以下查询:

SELECT <value> AS VAL FROM <table> ORDER BY VAL LIMIT 1 OFFSET <half>

今日免费次数已满, 请开通会员/明日再来

2
另一种基于Velcrow答案的方法,但使用单个中间表,并利用行编号的变量来获取计数,而不是执行额外的查询来计算它。还将计数从第一行开始,使第一行成为第0行,以便简单地使用Floor和Ceil选择中位数行。
SELECT Avg(tmp.val) as median_val
    FROM (SELECT inTab.val, @rows := @rows + 1 as rowNum
              FROM data as inTab,  (SELECT @rows := -1) as init
              -- Replace with better where clause or delete
              WHERE 2 > 1
              ORDER BY inTab.val) as tmp
    WHERE tmp.rowNum in (Floor(@rows / 2), Ceil(@rows / 2));

2

你可以选择使用存储过程来完成这个操作:

DROP PROCEDURE IF EXISTS median;
DELIMITER //
CREATE PROCEDURE median (table_name VARCHAR(255), column_name VARCHAR(255), where_clause VARCHAR(255))
BEGIN
  -- Set default parameters
  IF where_clause IS NULL OR where_clause = '' THEN
    SET where_clause = 1;
  END IF;

  -- Prepare statement
  SET @sql = CONCAT(
    "SELECT AVG(middle_values) AS 'median' FROM (
      SELECT t1.", column_name, " AS 'middle_values' FROM
        (
          SELECT @row:=@row+1 as `row`, x.", column_name, "
          FROM ", table_name," AS x, (SELECT @row:=0) AS r
          WHERE ", where_clause, " ORDER BY x.", column_name, "
        ) AS t1,
        (
          SELECT COUNT(*) as 'count'
          FROM ", table_name, " x
          WHERE ", where_clause, "
        ) AS t2
        -- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
        WHERE t1.row >= t2.count/2
          AND t1.row <= ((t2.count/2)+1)) AS t3
    ");

  -- Execute statement
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
END//
DELIMITER ;


-- Sample usage:
-- median(table_name, column_name, where_condition);
CALL median('products', 'price', NULL);

谢谢!用户应该知道,缺失值(NULL)被视为有效值。为了避免这个问题,请在条件中添加 'x IS NOT NULL'。 - giordano
1
@giordano 在代码的哪一行应该添加x IS NOT NULL - Przemyslaw Remin
1
@PrzemyslawRemin 抱歉,我的陈述不够清晰,我现在意识到SP已经考虑到了缺失值的情况。应该这样调用SP: CALL median("table","x","x IS NOT NULL") - giordano

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接