Supplementary material to the paper

*(last updated on 2 Sep 2017)*

Marozzi M, Reiczigel J (2017) A progressive shift alternative to evaluate nonparametric tests for skewed data, Communications in Statistics - Simulation and Computation, in press

(DOI: 10.1080/03610918.2017.1371745)

In this research project we dealt with two-sample comparison of non-negative data with highly skewed distributions. Examples of such data include sales per company, income per household, treatment cost for a disease, parasite infection intensity, and so on. Our aim was to compare the power of location-scale tests with that of the conventionally used location tests in this context. The motivation originated from the analysis of parasite infection data.

Power analysis requires an alternative hypothesis, and we think that the simple shift alternative is not reasonable for this kind of data. For parasite infection data, near-zero values will clearly not be shifted even if infection intensity increases, and we think the same holds for sales, treatment costs, etc. Where two such distributions differ, the differences appear in the upper parts of the distributions, while the range of the low values is nearly the same.

Therefore we proposed a probit-based progressive shift alternative, which seems more realistic than the simple shift, and used it in simulation studies to compare the power of location-scale tests and location tests.

The proposed progressive shift leaves the lowest part of the distribution (e.g. the lowest 10%) nearly unchanged, applies the full shift to the highest part (e.g. the highest 30%), and progressively increases the shift between these limits. Its parameters are L and U, the quantile levels defining the lowest and highest parts (0≤L≤U≤1; L = 0.1 and U = 0.7 in the example above), and D, the full shift. If XL and XU are the L and U quantiles of the distribution, the shift function S(x) is defined as S(x) = x + Φ(x) * D, where Φ denotes the normal cdf with mean = (XL + XU)/2 and sd = (XU - XL)/6. With this choice XL and XU lie three standard deviations below and above the mean, so Φ(XL) ≈ 0.001 and Φ(XU) ≈ 0.999. It follows that for any x≤XL the shift is negligible, whereas for any x≥XU practically the full shift D is in operation. (The progressive shift approximates the ordinary shift if U is near 0.)
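
The definition of S(x) can be illustrated with a minimal Python sketch. This is not the authors' R implementation (that is available from the download list on this page); the function names `quantile` and `progressive_shift` are hypothetical. The sketch makes two assumptions of its own: XL and XU are estimated from the sample by linear interpolation rather than taken from a known distribution, and L < U is required so that the sd is positive.

```python
from statistics import NormalDist


def quantile(sorted_values, p):
    """Linearly interpolated p-quantile of an already sorted list (0 <= p <= 1)."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def progressive_shift(values, L, U, D):
    """Return S(x) = x + Phi(x) * D for every x in values.

    Phi is the normal cdf with mean (XL + XU)/2 and sd (XU - XL)/6,
    where XL and XU are the sample L and U quantiles (an assumption of
    this sketch; the text defines them as quantiles of the distribution).
    """
    if not 0 <= L < U <= 1:
        raise ValueError("this sketch needs 0 <= L < U <= 1")
    ordered = sorted(values)
    x_l, x_u = quantile(ordered, L), quantile(ordered, U)
    if x_u <= x_l:
        raise ValueError("L and U quantiles coincide, so the sd would be zero")
    phi = NormalDist(mu=(x_l + x_u) / 2, sigma=(x_u - x_l) / 6)
    return [x + phi.cdf(x) * D for x in values]


# Usage: lowest 10% nearly unchanged, highest 30% shifted by almost D = 5.
sample = [float(i) for i in range(101)]
shifted = progressive_shift(sample, L=0.1, U=0.7, D=5.0)
for i in (0, 10, 40, 70, 100):
    print(sample[i], round(shifted[i] - sample[i], 4))
```

In the usage example the applied shift grows from almost zero below XL, through about D/2 midway between XL and XU, to almost D above XU.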

(1) R functions for the progressive shift and for the simulation can be downloaded here.

(2) R functions for three location-scale tests (Cucconi's test, Lepage's test and Neuhäuser's test) are also available for downloading.

(3) Results of the power simulation with theoretical distributions and with distributions generated from empirical parasite samples are presented here.

(4) Parasite samples used in the paper can be downloaded here. (.RData file, created by "save")

Data are published in Rékási J, Rózsa L, Kiss BJ (1997) Patterns in the distribution of avian lice (Phthiraptera: Amblycera, Ischnocera), Journal of Avian Biology 28, 150-156.

(5) Forbes 2004 data used in the paper can be downloaded here. (.RData file, created by "save")

Data are published in Everitt BS, Hothorn T (2009) A Handbook of Statistical Analyses Using R (2nd Ed), Chapman and Hall/CRC, London, UK, and are also included in the R package HSAUR2 (Everitt BS, Hothorn T, 2014).

Two conference posters are also related to the project:

Poster (in Hungarian) presented at the 10th Hungarian Biometric Conference (16-17 May 2014, Budapest)

Poster (in English) presented at ISCB2014, the 35th Annual Conference of the International Society for Clinical Biostatistics (24-28 August 2014, Vienna, Austria)

Related papers by Marco Marozzi:

M. Marozzi (2009), Some Notes on the Location-Scale Cucconi Test, Journal of Nonparametric Statistics, 21, 5, 629-647.

M. Marozzi (2012), A Modified Cucconi Test for Location and Scale Change Alternatives, Colombian Journal of Statistics, 35, 3, 369-382.

M. Marozzi (2013), Nonparametric Simultaneous Tests for Location and Scale Testing: a Comparison of Several Methods, Communications in Statistics - Simulation and Computation, 42, 6, 1298-1317.

M. Marozzi (2014), The Multisample Cucconi Test, Statistical Methods and Applications, 23, 209-227.

For a PDF of these papers send an email to marco.marozzi@unical.it

An R function for the multisample Cucconi test can be downloaded by clicking here.