如何从一组数字中计算均值、中位数、众数和范围

54

是否有任何函数(作为数学库的一部分)可以从一组数字中计算平均值, 中位数,众数和极差。

7个回答

76

是的,在Java Math中确实没有第三方库。两个被提到的库如下:

http://opsresearch.com/app/

http://www.iro.umontreal.ca/~simardr/ssj/indexe.html

但是,编写自己的方法来计算平均值、中位数、众数和范围并不难。

平均值

public static double mean(double[] m) {
    double sum = 0;
    for (int i = 0; i < m.length; i++) {
        sum += m[i];
    }
    return sum / m.length;
}

中位数

// the array double[] m MUST BE SORTED
public static double median(double[] m) {
    int middle = m.length/2;
    if (m.length%2 == 1) {
        return m[middle];
    } else {
        return (m[middle-1] + m[middle]) / 2.0;
    }
}

模式

public static int mode(int a[]) {
    int maxValue, maxCount;

    for (int i = 0; i < a.length; ++i) {
        int count = 0;
        for (int j = 0; j < a.length; ++j) {
            if (a[j] == a[i]) ++count;
        }
        if (count > maxCount) {
            maxCount = count;
            maxValue = a[i];
        }
    }

    return maxValue;
}

更新

正如Neelesh Salpe指出的那样,上述方法无法处理多模态集合。我们可以很容易地解决这个问题:

public static List<Integer> mode(final int[] numbers) {
    final List<Integer> modes = new ArrayList<Integer>();
    final Map<Integer, Integer> countMap = new HashMap<Integer, Integer>();

    int max = -1;

    for (final int n : numbers) {
        int count = 0;

        if (countMap.containsKey(n)) {
            count = countMap.get(n) + 1;
        } else {
            count = 1;
        }

        countMap.put(n, count);

        if (count > max) {
            max = count;
        }
    }

    for (final Map.Entry<Integer, Integer> tuple : countMap.entrySet()) {
        if (tuple.getValue() == max) {
            modes.add(tuple.getKey());
        }
    }

    return modes;
}

补充

如果你使用的是 Java 8 或更高版本,你也可以像下面这样确定模式:

public static List<Integer> getModes(final List<Integer> numbers) {
    final Map<Integer, Long> countFrequencies = numbers.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

    final long maxFrequency = countFrequencies.values().stream()
            .mapToLong(count -> count)
            .max().orElse(-1);

    return countFrequencies.entrySet().stream()
            .filter(tuple -> tuple.getValue() == maxFrequency)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
}

谢谢,但如果可能的话,我更愿意使用现成的东西。 - user339108
如果你有一个非常大的数组或需要动态计算值,这个类会出现问题。可以在没有数组的情况下编写均值和标准差;对于中位数和众数则不太确定。 - duffymo
1
正如我在Adeel的回答中所提到的,对整个数组进行排序以获取中位数是相当低效的。 - Chinasaur
@NeeleshSalpe - 感谢您指出这一点。我已经更新了我的答案。 - Nico Huysamen
如果一个数组只有一个条目,中位数函数会抛出ArrayIndexOutOfBoundsException。 - Sascha Vetter
显示剩余2条评论

13

参见 Adeel 的回答评论:Apache Commons Math 似乎使用了一个相当低效的中位数算法。 - Chinasaur

2
    public static Set<Double> getMode(double[] data) {
            if (data.length == 0) {
                return new TreeSet<>();
            }
            TreeMap<Double, Integer> map = new TreeMap<>(); //Map Keys are array values and Map Values are how many times each key appears in the array
            for (int index = 0; index != data.length; ++index) {
                double value = data[index];
                if (!map.containsKey(value)) {
                    map.put(value, 1); //first time, put one
                }
                else {
                    map.put(value, map.get(value) + 1); //seen it again increment count
                }
            }
            Set<Double> modes = new TreeSet<>(); //result set of modes, min to max sorted
            int maxCount = 1;
            Iterator<Integer> modeApperance = map.values().iterator();
            while (modeApperance.hasNext()) {
                maxCount = Math.max(maxCount, modeApperance.next()); //go through all the value counts
            }
            for (double key : map.keySet()) {
                if (map.get(key) == maxCount) { //if this key's value is max
                    modes.add(key); //get it
                }
            }
            return modes;
        }

        //std dev function for good measure
        public static double getStandardDeviation(double[] data) {
            final double mean = getMean(data);
            double sum = 0;
            for (int index = 0; index != data.length; ++index) {
                sum += Math.pow(Math.abs(mean - data[index]), 2);
            }
            return Math.sqrt(sum / data.length);
        }


        public static double getMean(double[] data) {
        if (data.length == 0) {
            return 0;
        }
        double sum = 0.0;
        for (int index = 0; index != data.length; ++index) {
            sum += data[index];
        }
        return sum / data.length;
    }

//by creating a copy array and sorting it, this function can take any data.
    public static double getMedian(double[] data) {
        double[] copy = Arrays.copyOf(data, data.length);
        Arrays.sort(copy);
        return (copy.length % 2 != 0) ? copy[copy.length / 2] : (copy[copy.length / 2] + copy[(copy.length / 2) - 1]) / 2;
    }

1
如果您只关心单峰分布,可以考虑类似这样的内容。
public static Optional<Integer> mode(Stream<Integer> stream) {
    Map<Integer, Long> frequencies = stream
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

    return frequencies.entrySet().stream()
        .max(Comparator.comparingLong(Map.Entry::getValue))
        .map(Map.Entry::getKey);
}

0
public class Mode {
    public static void main(String[] args) {
        int[] unsortedArr = new int[] { 3, 1, 5, 2, 4, 1, 3, 4, 3, 2, 1, 3, 4, 1 ,-1,-1,-1,-1,-1};
        Map<Integer, Integer> countMap = new HashMap<Integer, Integer>();

        for (int i = 0; i < unsortedArr.length; i++) {
            Integer value = countMap.get(unsortedArr[i]);

            if (value == null) {
                countMap.put(unsortedArr[i], 0);
            } else {
                int intval = value.intValue();
                intval++;
                countMap.put(unsortedArr[i], intval);
            }
        }

        System.out.println(countMap.toString());

        int max = getMaxFreq(countMap.values());
        List<Integer> modes = new ArrayList<Integer>();

        for (Entry<Integer, Integer> entry : countMap.entrySet()) {
            int value = entry.getValue();
            if (value == max)
                modes.add(entry.getKey());
        }
        System.out.println(modes);
    }

    public static int getMaxFreq(Collection<Integer> valueSet) {
        int max = 0;
        boolean setFirstTime = false;

        for (Iterator iterator = valueSet.iterator(); iterator.hasNext();) {
            Integer integer = (Integer) iterator.next();

            if (!setFirstTime) {
                max = integer;
                setFirstTime = true;
            }
            if (max < integer) {
                max = integer;
            }
        }
        return max;
    }
}

测试数据

对于{3, 1, 5, 2, 4, 1, 3, 4, 3, 2, 1, 3, 4, 1},模式为{1,3};
对于{3, 1, 5, 2, 4, 1, 3, 4, 3, 2, 1, 3, 4, 1, -1, -1, -1, -1, -1},模式为{-1}。


0

正如Nico Huysamen所指出的那样,在Java 1.8中找到多个模式可以采用以下替代方法。

import java.util.ArrayList;
import java.util.List;
import java.util.HashMap;
import java.util.Map;

public static void mode(List<Integer> numArr) {
    Map<Integer, Integer> freq = new HashMap<Integer, Integer>();;
    Map<Integer, List<Integer>> mode = new HashMap<Integer, List<Integer>>();

    int modeFreq = 1; //record the highest frequence
    for(int x=0; x<numArr.size(); x++) { //1st for loop to record mode
        Integer curr = numArr.get(x); //O(1)
        freq.merge(curr, 1, (a, b) -> a + b); //increment the frequency for existing element, O(1)
        int currFreq = freq.get(curr); //get frequency for current element, O(1)

        //lazy instantiate a list if no existing list, then
        //record mapping of frequency to element (frequency, element), overall O(1)
        mode.computeIfAbsent(currFreq, k -> new ArrayList<>()).add(curr);

        if(modeFreq < currFreq) modeFreq = currFreq; //update highest frequency
    }
    mode.get(modeFreq).forEach(x -> System.out.println("Mode = " + x)); //pretty print the result //another for loop to return result
}

编程愉快!


0

这是完整的、清洁的、优化过的JAVA 8代码

import java.io.*;
import java.util.*;

public class Solution {

public static void main(String[] args) {

    /*Take input from user*/
    Scanner sc = new Scanner(System.in);

    int n =0;
    n = sc.nextInt();
    
    int arr[] = new int[n];
    
    //////////////mean code starts here//////////////////
    int sum = 0;
    for(int i=0;i<n; i++)
    {
         arr[i] = sc.nextInt();
         sum += arr[i]; 
    }
    System.out.println((double)sum/n); 
    //////////////mean code ends here//////////////////


    //////////////median code starts here//////////////////
    Arrays.sort(arr);
    int val = arr.length/2;
    System.out.println((arr[val]+arr[val-1])/2.0); 
    //////////////median code ends here//////////////////


    //////////////mode code starts here//////////////////
    int maxValue=0;
    int maxCount=0;

    for(int i=0; i<n; ++i)
    {
        int count=0;

        for(int j=0; j<n; ++j)
        {
            if(arr[j] == arr[i])
            {
                ++count;
            }

            if(count > maxCount)
            {
                maxCount = count;
                maxValue = arr[i];
            }
        }
    } 
    System.out.println(maxValue);
   //////////////mode code ends here//////////////////

  }

}

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接