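I have been looking for a way to use Apache Commons Math 3.0 to generate bins for a given data set, by specifying a lower bound, an upper bound, and the number of bins required. I looked at Frequency http://commons.apache.org/math/apidocs/org/apache/commons/math3/stat/Frequency.html, but it does not give me what I want. I want a method that gives the frequency of values within an interval (for example: how many values fall between 0 and 5). Any suggestions or ideas?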
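Here is a simple way to build a histogram with Apache Commons Math 3. Note that EmpiricalDistribution spreads its bins evenly between the smallest and largest values in the data, so the bounds come from the data rather than from parameters you pass in: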
final int BIN_COUNT = 20;
double[] data = {1.2, 0.2, 0.333, 1.4, 1.5, 1.2, 1.3, 10.4, 1, 2.0};
long[] histogram = new long[BIN_COUNT];
org.apache.commons.math3.random.EmpiricalDistribution distribution = new org.apache.commons.math3.random.EmpiricalDistribution(BIN_COUNT);
distribution.load(data);
int k = 0;
for (org.apache.commons.math3.stat.descriptive.SummaryStatistics stats : distribution.getBinStats()) {
    histogram[k++] = stats.getN();
}
public static int[] calcHistogram(double[] data, double min, double max, int numBins) {
    final int[] result = new int[numBins];
    final double binSize = (max - min) / numBins;
    for (double d : data) {
        // floor, not a bare cast: a cast truncates toward zero and would put values just below min into bin 0
        int bin = (int) Math.floor((d - min) / binSize);
        if (bin < 0) { /* this data point is smaller than min */ }
        else if (bin >= numBins) { /* this data point is bigger than max */ }
        else {
            result[bin] += 1;
        }
    }
    return result;
}
Edit: here is an example.
double[] data = { 2, 4, 6, 7, 8, 9 };
int[] histogram = calcHistogram(data, 0, 10, 4);
// This is a histogram with 4 bins, 0-2.5, 2.5-5, 5-7.5, 7.5-10.
assert histogram[0] == 1; // one point (2) in range 0-2.5
assert histogram[1] == 1; // one point (4) in range 2.5-5.
// etc..
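result[i] tells you how many data points fall in the i-th bin. If you want the frequency (proportion), compute (double) result[i] / data.length; the cast matters, because both operands are ints and plain division would truncate.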
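To make the proportion step concrete, here is a minimal sketch in plain Java. The class and method names (RelativeFrequencySketch, toRelativeFrequencies) are made up for illustration and are not part of Commons Math; the counts are the ones the six-point example above produces with 4 bins. It assumes the sample size is greater than zero.

```java
class RelativeFrequencySketch {

    // Hypothetical helper: converts raw bin counts into proportions of the sample size.
    public static double[] toRelativeFrequencies(int[] counts, int sampleSize) {
        double[] freq = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            // Cast before dividing; int / int would truncate toward zero.
            freq[i] = (double) counts[i] / sampleSize;
        }
        return freq;
    }

    public static void main(String[] args) {
        int[] counts = {1, 1, 2, 2}; // bins 0-2.5, 2.5-5, 5-7.5, 7.5-10 for {2, 4, 6, 7, 8, 9}
        double[] freq = toRelativeFrequencies(counts, 6);
        System.out.println(java.util.Arrays.toString(freq));
    }
}
```

If every data point lands in some bin, the proportions sum to 1; points outside [min, max) are not counted, so the sum can be smaller.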
... - Max
I think your code has a bug; see the corrected code below:
public static int[] calcHistogram(double[] data, double min, double max, int numBins) {
    final int[] result = new int[numBins];
    final double binSize = (max - min) / numBins;
    for (double d : data) {
        // changed this from numBins; floor keeps values just below min out of bin 0
        int bin = (int) Math.floor((d - min) / binSize);
        if (bin < 0) { /* this data point is smaller than min */ }
        else if (bin >= numBins) { /* this data point is bigger than max */ }
        else {
            result[bin] += 1;
        }
    }
    return result;
}
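One thing to watch with the loop-based version above: a value exactly equal to max gives bin == numBins and is silently dropped. If you would rather count it in the last bin, something like the following sketch should work. The name calcHistogramInclusive is my own, and it assumes numBins > 0 and max > min.

```java
class InclusiveHistogramSketch {

    // Hypothetical variant of calcHistogram: values equal to max land in the last bin.
    public static int[] calcHistogramInclusive(double[] data, double min, double max, int numBins) {
        final int[] result = new int[numBins];
        final double binSize = (max - min) / numBins;
        for (double d : data) {
            if (!(d >= min && d <= max)) {
                continue; // outside [min, max]; this test also skips NaN
            }
            int bin = (int) ((d - min) / binSize);
            if (bin >= numBins) {
                bin = numBins - 1; // d == max, or rounding pushed it past the last edge
            }
            result[bin] += 1;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] data = {0, 2, 4, 5, 10};
        int[] histogram = calcHistogramInclusive(data, 0, 10, 4);
        System.out.println(java.util.Arrays.toString(histogram)); // prints [2, 1, 1, 1]
    }
}
```

This makes the last bin closed on both ends ([7.5, 10] in the example) while the others stay half-open, which is the usual convention in plotting libraries.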
import java.util.Arrays;
import java.util.stream.IntStream;

// Stream-based variant: for each bin, count the values in [binStart, binEnd).
public static Long[] calcHistogram(Double[] data, Double min, Double max, Integer numBins) {
    final var interval = (max - min) / numBins;
    return IntStream.range(0, numBins)
            .boxed()
            .map(n -> {
                var binStart = min + n * interval;
                var binEnd = min + (n + 1) * interval;
                return Arrays.stream(data).filter(d -> d >= binStart && d < binEnd).count();
            })
            .toArray(Long[]::new);
}
private fun displayHistogram(binCount: Int, data: DoubleArray) {
    val histogram = DoubleArray(binCount)
    val distribution = org.apache.commons.math3.random.EmpiricalDistribution(binCount)
    distribution.load(data)
    var k = 0
    for (stats in distribution.binStats) {
        histogram[k++] = stats.n.toDouble()
    }
    val binSize = (data.max()!!.toDouble() - data.min()!!.toDouble()) / binCount
    // Compute the x values once instead of on every loop iteration
    val xValues = generateHistogramXValues(data.min()!!.toDouble(), histogram.size, binSize)
    for (i in histogram.indices) {
        series2?.appendData(DataPoint(xValues[i], histogram[i]), false, histogram.count())
    }
}
This is the method that generates the x values:
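// Signature reconstructed from the call in displayHistogram; the original header is missing.
private fun generateHistogramXValues(min: Double, numberOfBIns: Int, binSize: Double): DoubleArray {
    val xValuesArray = DoubleArray(numberOfBIns)
    for (i in 0 until numberOfBIns) {
        if (i == 0) {
            xValuesArray[i] = min
        } else {
            val previous = xValuesArray[i - 1]
            xValuesArray[i] = previous + binSize
        }
    }
    return xValuesArray
}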
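For anyone not on Kotlin, the same x values can be sketched in plain Java. This version computes each left edge as min + i * binSize instead of adding binSize repeatedly, which avoids accumulating floating-point rounding over many bins. The class and method names are hypothetical.

```java
class BinEdgesSketch {

    // Hypothetical helper: left edge of each bin, computed directly from the bin index.
    public static double[] binLeftEdges(double min, int numberOfBins, double binSize) {
        double[] edges = new double[numberOfBins];
        for (int i = 0; i < numberOfBins; i++) {
            edges[i] = min + i * binSize;
        }
        return edges;
    }

    public static void main(String[] args) {
        double[] edges = binLeftEdges(0.0, 4, 2.5);
        System.out.println(java.util.Arrays.toString(edges)); // prints [0.0, 2.5, 5.0, 7.5]
    }
}
```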
I am doing this on Android with the GraphView plotting library, but you can use it with any library.
SortedMultiset. - Louis Wasserman