A simple way to calculate the median in MySQL

266

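What is the simplest (and ideally fast) way to calculate the median in MySQL? I have used AVG(x) to find the mean, but I am having a hard time finding a simple way to calculate the median. For now, I return all the rows to PHP, sort them, and pick the middle row, but surely there must be a simple way to do it in a single MySQL query.

Example data: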

id | val
--------
 1    4
 2    7
 3    2
 4    2
 5    9
 6    8
 7    3

Sorting on val gives 2 2 3 4 7 8 9, so the median should be 4, whereas SELECT AVG(val) returns 5.
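As a quick sanity check of the two expected values, here is a minimal sketch in Python. Python is used only for illustration and is not part of the question; the list simply copies the val column from the sample data.

```python
from statistics import mean, median

# The val column from the sample data, in id order.
vals = [4, 7, 2, 2, 9, 8, 3]

print(sorted(vals))  # the ordering used to locate the middle row
print(median(vals))  # middle value of the sorted list
print(mean(vals))    # what AVG(val) computes
```

Any single-query answer should reproduce the median value for this data, not the mean.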


7
MariaDB has had a MEDIAN function since version 10.3; see https://mariadb.com/kb/en/library/median/. - berturion
50 Answers

0
The following SQL code will help you calculate the median in MySQL using user-defined variables.

create table employees(salary int);

insert into employees values(8);
insert into employees values(23);
insert into employees values(45);
insert into employees values(123);
insert into employees values(93);
insert into employees values(2342);
insert into employees values(2238);

select * from employees;

Select salary from employees  order by salary;

set @rowid=0;
set @cnt=(select count(*) from employees);
set @middle_no=ceil(@cnt/2);
set @odd_even=null;

select AVG(salary) from 
(select salary,@rowid:=@rowid+1 as rid, (CASE WHEN(mod(@cnt,2)=0) THEN @odd_even:=1 ELSE @odd_even:=0 END) as odd_even_status  from employees  order by salary) as tbl where tbl.rid=@middle_no or tbl.rid=(@middle_no+@odd_even);

If you are looking for a detailed explanation, please refer to this blog.


0
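In some cases the median is calculated as follows:
When a list of numbers is sorted by value, the "median" is the "middle" value in the list. For a set with an even number of values, the median is the mean of the two middle values. I wrote some simple code for this: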
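$midValue = 0;
$median = NULL;

// Run the count query first. This assumes db_query() exposes column aliases
// as properties, the same way the median query below is read.
$countDAO = db_query("SELECT count(*) as count {$fromClause} {$whereClause}");
$countDAO->fetch();
$rowCount = (int) $countDAO->count;

$even = FALSE;
$offset = 1;                              // number of rows to read (LIMIT count)
$medianRow = (int) floor($rowCount / 2);  // zero-based index of the middle row
if ($rowCount % 2 == 0 && !empty($medianRow)) {
  // Even number of rows: read the two middle rows instead of one.
  $even = TRUE;
  $offset++;
  $medianRow--;
}

$medianValue = "SELECT column as median 
               {$fromClause} {$whereClause} 
               ORDER BY median 
               LIMIT {$medianRow},{$offset}";

$medianValDAO = db_query($medianValue);
while ($medianValDAO->fetch()) {
  if ($even) {
    // Accumulate the two middle values.
    $midValue = $midValue + $medianValDAO->median;
  }
  else {
    $median = $medianValDAO->median;
  }
}
if ($even) {
  $median = $midValue / 2;
}
return $median;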

The returned $median is the required result :-)
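The offset arithmetic in the code above can be checked in isolation. Below is a minimal Python sketch, assuming the rows are already sorted; the function names median_window and median_from_sorted are hypothetical and only illustrate which rows a `LIMIT offset, count` clause would return.

```python
def median_window(row_count):
    """Return (offset, count) for `LIMIT offset, count` over the sorted rows."""
    if row_count < 1:
        raise ValueError("median is undefined for an empty set")
    offset = (row_count - 1) // 2  # zero-based index of the lower middle row
    count = 2 - row_count % 2      # 1 row if the count is odd, 2 if it is even
    return offset, count


def median_from_sorted(sorted_vals):
    """Average the row(s) that the LIMIT window would select."""
    offset, count = median_window(len(sorted_vals))
    window = sorted_vals[offset:offset + count]
    return sum(window) / len(window)


print(median_window(7))                           # window for an odd count
print(median_window(4))                           # window for an even count
print(median_from_sorted([2, 2, 3, 4, 7, 8, 9]))  # sample data from the question
```

For seven rows the window starts at offset 3 and reads one row; for four rows it starts at offset 1 and reads two.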


0

These methods select from the same table twice. If the source data comes from an expensive query, this is a way to avoid running it twice:

select KEY_FIELD, AVG(VALUE_FIELD) MEDIAN_VALUE
from (
    select KEY_FIELD, VALUE_FIELD, RANKF
    , @rownumr := IF(@prevrowidr=KEY_FIELD,@rownumr+1,1) RANKR
    , @prevrowidr := KEY_FIELD
    FROM (
        SELECT KEY_FIELD, VALUE_FIELD, RANKF
        FROM (
            SELECT KEY_FIELD, VALUE_FIELD 
            , @rownumf := IF(@prevrowidf=KEY_FIELD,@rownumf+1,1) RANKF
            , @prevrowidf := KEY_FIELD     
            FROM (
                SELECT KEY_FIELD, VALUE_FIELD 
                FROM (
                    -- some expensive query
                )   B
                ORDER BY  KEY_FIELD, VALUE_FIELD
            ) C
            , (SELECT @rownumf := 1) t_rownum
            , (SELECT @prevrowidf := '*') t_previd
        ) D
        ORDER BY  KEY_FIELD, RANKF DESC
    ) E
    , (SELECT @rownumr := 1) t_rownum
    , (SELECT @prevrowidr := '*') t_previd
) F
WHERE RANKF-RANKR BETWEEN -1 and 1
GROUP BY KEY_FIELD

0

Median grouped by dimension:

SELECT your_dimension, avg(t1.val) as median_val FROM (
-- number the rows within each dimension, restarting at 1 when it changes
SELECT @rownum := IF(@dim = d.your_dimension, @rownum + 1, 1) AS `row_number`,
   @dim := d.your_dimension AS your_dimension,
   d.val
   FROM data d,  (SELECT @rownum:=0) r, (SELECT @dim := 'something_unreal') init
  WHERE 1
  -- put some where clause here
  ORDER BY d.your_dimension, d.val
) as t1
INNER JOIN  
(
  SELECT d.your_dimension,
    count(*) as total_rows
  FROM data d
  WHERE 1
  -- put same where clause here
  GROUP BY d.your_dimension
) as t2 USING(your_dimension)
WHERE 1
-- one middle row for an odd count, two for an even count
AND t1.row_number in ( floor((total_rows+1)/2), floor((total_rows+2)/2) )

GROUP BY your_dimension;

0

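The following query works for both an odd and an even number of rows. In the subquery, we look for the value(s) with the same number of rows before and after them. With an odd number of rows, the HAVING expression evaluates to 0 for the middle value, because equal numbers of rows before and after it cancel out the signs.

Similarly, with an even number of rows, the HAVING expression evaluates to 1 for two rows (the two center rows), because for each of them the rows before and after differ by exactly one.

In the outer query, we average either the single value (odd number of rows) or the two values (even number of rows).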

select avg(val) as median
from
(
    select d1.val
    from data d1 cross join data d2
    group by d1.val
    having abs(sum(sign(d1.val-d2.val))) in (0,1)
) sub

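Note: if your table contains duplicate values, the HAVING clause above should be changed to the condition below. With duplicates, the sum can take values other than the original possibilities 0 and 1. The following condition makes the test dynamic, so it also works when duplicates are present.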
having sum(case when d1.val=d2.val then 1 else 0 end)>=
abs(sum(sign(d1.val-d2.val)))
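To see why the duplicate-safe HAVING condition selects the middle value(s), the cross join can be mimicked outside the database. The Python sketch below is only an illustration under that assumption: median_by_sign_sum is a hypothetical name, and the two dictionaries stand in for the two SUM() expressions grouped by d1.val.

```python
from collections import defaultdict


def median_by_sign_sum(values):
    """Mimic: cross join, GROUP BY d1.val, duplicate-safe HAVING, then AVG."""
    equal = defaultdict(int)   # sum(case when d1.val = d2.val then 1 else 0 end)
    signed = defaultdict(int)  # sum(sign(d1.val - d2.val))
    for d1 in values:
        for d2 in values:
            equal[d1] += 1 if d1 == d2 else 0
            signed[d1] += (d1 > d2) - (d1 < d2)
    # HAVING: keep a value when its duplicates can absorb the imbalance
    # between the rows below it and the rows above it.
    kept = [v for v in equal if equal[v] >= abs(signed[v])]
    return sum(kept) / len(kept)  # raises ZeroDivisionError on empty input


print(median_by_sign_sum([4, 7, 2, 2, 9, 8, 3]))  # sample data from the question
print(median_by_sign_sum([1, 1, 2, 3]))           # even count with a duplicate
```

Like the SQL, this compares every row with every other row, so the work grows quadratically with the number of rows.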

0

Let's create a sample table named numbers.

This answer is specific to MySQL.

In PostgreSQL, you can simply use the percentile_cont function.

CREATE TABLE numbers(
  num INT,
  frequency INT
);

Insert values into the numbers table:

INSERT INTO numbers VALUES  
        (0,7),
        (1,1),
        (2,3),
        (3,1),
        (9,1),
        (1,1),
        (2,3),
        (3,1),
        (9,1);

-- select * from numbers 

WITH RECURSIVE num_frequency (num,frequency, i) AS 
(
SELECT  num,frequency,1
FROM   numbers
UNION ALL
SELECT num,frequency,i+1
FROM   num_frequency
WHERE  num_frequency.i < num_frequency.frequency
)

select
(max(case when numbers=lower_limit then num else null end)/2
+max(case when numbers=upper_limit then num else null end)/2) as median
from (
select *,total_number%2,
case 
when  total_number%2=0 then total_number/2
else  (total_number+1)/2 end as lower_limit,
case 
when  total_number%2=0 then total_number/2+1
else  (total_number+1)/2
end as upper_limit

from (
select *,max(numbers) over() as total_number from (
select num,row_number() over (order by num) 
as numbers from num_frequency
)b 
)b
)b

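Welcome to Stack Overflow. This question was asked more than 13 years ago and already has an accepted answer. When adding a new answer, please include some details explaining why. - MD Zand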

0
If you need the median per group, use PARTITION BY in ROW_NUMBER() OVER (...).
WITH Numbered AS 
(
  SELECT groupingColumn, 
  val,
  COUNT(*) OVER (partition by groupingColumn) AS Cnt,
  ROW_NUMBER() OVER (partition by groupingColumn ORDER BY val) AS RowNum
  FROM yourtable
)
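SELECT groupingColumn, AVG(val) AS median_val
FROM Numbered
-- FLOOR() is needed because "/" is decimal division in MySQL
WHERE RowNum IN (FLOOR((Cnt+1)/2), FLOOR((Cnt+2)/2))
GROUP BY groupingColumn
ORDER BY groupingColumn
;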
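The row-number selection in the query above relies on the integer halves of Cnt+1 and Cnt+2. A minimal Python sketch of that per-group logic is shown below; median_per_group and the sample rows are hypothetical and only illustrate which rows are picked in each partition.

```python
from collections import defaultdict


def median_per_group(rows):
    """rows: iterable of (grouping_key, val). Returns {grouping_key: median}."""
    groups = defaultdict(list)
    for key, val in rows:
        groups[key].append(val)

    result = {}
    for key, vals in groups.items():
        vals.sort()      # ORDER BY val inside the partition
        cnt = len(vals)  # COUNT(*) OVER (PARTITION BY ...)
        # 1-based row numbers of the middle row(s); both are equal when cnt is odd.
        middle_rows = {(cnt + 1) // 2, (cnt + 2) // 2}
        picked = [v for row_num, v in enumerate(vals, start=1)
                  if row_num in middle_rows]
        result[key] = sum(picked) / len(picked)
    return result


sample = [("a", 4), ("a", 7), ("a", 2),
          ("b", 1), ("b", 2), ("b", 3), ("b", 4)]
print(median_per_group(sample))
```

Group "a" has three rows, so only row 2 is picked; group "b" has four rows, so rows 2 and 3 are averaged.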

0
Another way to calculate the median in MySQL 5.7+, 8+ and MariaDB 10.2+ is to use JSON functions.
Here is a stored function that I tested in MySQL 8.0:
CREATE FUNCTION JSON_MEDIAN(input_json JSON)
RETURNS FLOAT NO SQL
BEGIN
    DECLARE median FLOAT;
    DECLARE middle INT;
    DECLARE arr_length INT;
    DECLARE peek_count INT;
    
    -- count non-empty items
    SELECT COUNT(*) INTO arr_length
    FROM JSON_TABLE(input_json, '$[*]' COLUMNS (item FLOAT PATH '$')) s1
    WHERE item IS NOT NULL;
    -- peek 1 item if length is odd or 2 items if length is even
    SET peek_count = 2 - arr_length % 2;
    SET middle = CEIL(arr_length / 2) - 1;
    
    SELECT AVG(item) INTO median 
    FROM (
        SELECT item
        FROM JSON_TABLE(input_json, '$[*]' COLUMNS (item FLOAT PATH '$')) s1
        WHERE item IS NOT NULL
        ORDER BY item
        LIMIT middle, peek_count
    ) s2;
    
    RETURN median;
END

Now you can call this function with a JSON array of numeric items, or build the input with the JSON_ARRAYAGG function like this:

SELECT JSON_MEDIAN(JSON_ARRAYAGG(`val`))
FROM `my_table`

This approach does not have the limitations of GROUP_CONCAT.


-1
set @r = 0;

-- AVG covers both cases: one middle row (odd count) or two (even count)
select round(avg(lat_N), 4) as Med
from 
    (select lat_N, @r := @r+1, @r as id from station order by lat_N) A
    cross join
    (select (count(1)+1)/2 as c from station) B
where id >= floor(c) and id <= ceil(c)
