Wrong median in boxPlot #13

Open
simonkahl opened this issue Oct 8, 2019 · 3 comments

@simonkahl

Hello,
this is a great toolbox, and I like the styling and customization of the boxplots. However, I stumbled upon a bug in the calculation of the median. This script and the corresponding figure should illustrate the issue:

data = [1 1 1 2 4 6 7];

figure
subplot( 1, 2, 1 )
b = iosr.statistics.boxPlot( data' );
title( {'IoSR-Surrey'; ...
'Matlab Toolbox'; ...
['Median = ' num2str(b.statistics.median)]} )

subplot( 1, 2, 2 )
boxplot(data');
title( {'MatLab R2018b'; ...
'Statistics and Machine Learning Toolbox'; ...
['Median = ' num2str(median(data))]} )

[image: iosrBug]

Hopefully this bug can be easily fixed.

@chummersone
Contributor

The toolbox and MATLAB use different methods to calculate the median. See the ‘method’ property of iosr.statistics.boxPlot. The same methods are provided by the underlying function: iosr.statistics.quantile.
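
For reference, here is a minimal sketch of the two quantile definitions usually meant by R-5 and R-8 (Hyndman and Fan types 5 and 8), applied to the data from the original report. It is written in plain MATLAB and is not the toolbox's code; it assumes the toolbox's 'R-5' and 'R-8' follow these textbook definitions. The helper name `interpAt` is made up for this example.

```matlab
% Sketch only: textbook R-5 and R-8 sample quantiles, not the toolbox's code.
data = [1 1 1 2 4 6 7];
x = sort(data(:));          % order statistics x(1) <= ... <= x(n)
n = numel(x);
p = 0.5;                    % the median is the 0.5 quantile

% Fractional position h into the sorted data for each definition
hR5 = n*p + 0.5;            % R-5
hR8 = (n + 1/3)*p + 1/3;    % R-8

% Linear interpolation between neighbouring order statistics,
% with h clamped to the valid index range [1, n]
interpAt = @(h) interp1((1:n)', x, min(max(h, 1), n));

qR5 = interpAt(hR5);
qR8 = interpAt(hR8);
fprintf('R-5: %g, R-8: %g, median(): %g\n', qR5, qR8, median(data));
```

At p = 0.5 both positions reduce to (n + 1)/2, so under these definitions the two methods agree with each other and with median() for any sample size; they only differ at other quantiles.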

@DominikSchmidbauer

DominikSchmidbauer commented Jan 27, 2022

I've got the same problem!

This is my data:

[1.501618122977346;0.498381877022654;0.460992907801418;0.375886524822695;1.080378250591016;1.724586288416076;1;0.258227848101266;1.741772151898734]

The median is 1 (9 numbers, so it is the 5th number in the sorted list), but the plot shows the median at 0.749190938511327.

Switching the method (R-5 or R-8) does not change anything, as it only determines how the quantiles are calculated.

Apparently, the median calculated by boxPlot is the average of the 4th and 5th elements in the sorted list.

It happens only with these particular numbers. Other vectors, even with the same length, are plotted correctly.
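
The arithmetic above can be checked in plain MATLAB without the toolbox. This is only a sketch of the check; the variable names are made up for the example.

```matlab
% Data from this comment
y = [1.501618122977346; 0.498381877022654; 0.460992907801418; ...
     0.375886524822695; 1.080378250591016; 1.724586288416076; ...
     1; 0.258227848101266; 1.741772151898734];

ys = sort(y);                  % ascending order
n = numel(ys);                 % 9 values, so the median is element (n+1)/2 = 5

conventional = median(y);      % MATLAB's median: the 5th sorted value
avg45 = (ys(4) + ys(5)) / 2;   % average of the 4th and 5th sorted values

fprintf('median(y)       = %.15f\n', conventional);
fprintf('mean of 4th/5th = %.15f\n', avg45);
```

The second value matches the 0.749190938511327 shown in the plot, while median(y) returns 1.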

@prash-p

prash-p commented Oct 21, 2022

The issue is that the data is treated as a sample, so the median is estimated rather than computed directly. Most users just want the value that MATLAB's median() returns.

This line: https://github.com/IoSR-Surrey/MatlabToolbox/blob/master/%2Biosr/%2Bstatistics/statsPlot.m#L170 should be:
obj.statistics.median = median(obj.y);

Changing to R-5 still does not give the correct median value. For example, in this box plot of 3 points, the horizontal line does not pass through the middle point with either 'R-5' or 'R-8':
[image: box plot of 3 points]

But changing the line as above calculates the 'correct' median (what most users expect):

[image: box plot after the change]
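
One caveat with a plain median() call, shown as a small sketch: median works column by column on a matrix and returns NaN for any column that contains a NaN. Whether obj.y can contain NaN (for example as padding for groups of unequal size) is an assumption here, not something confirmed in this thread; if it can, the 'omitnan' flag may be wanted.

```matlab
% Two groups stored as columns; the second is padded with NaN
% (hypothetical data, only to show the behaviour of median)
y = [1 10;
     2 20;
     3 NaN];

mDefault = median(y);             % column-wise; NaN propagates: [2 NaN]
mOmit    = median(y, 'omitnan');  % NaN ignored per column:      [2 15]

disp(mDefault);
disp(mOmit);
```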
