Notes to QPweb
These notes are a mixture of a user's guide, release notes,
version history, known issues and advice on how to overcome them.
Comments and bug reports are welcome at reiczigel dot jeno at univet dot hu.
(updated on 18 January 2017)
QPweb assumes that you enter your data locally on your computer, using a spreadsheet program (like Excel
in MS Office, Numbers on a Mac, or Calc in LibreOffice) or a text editor (like Notepad in Windows or TextEdit on a Mac),
and that when your data file is ready, you upload it to QPweb for analysis. This means that QPweb has no
counterpart of the data entry screen of QP.
What kind of data?
Data prepared for analysis by QPweb should be a so-called data matrix, in which
- each row represents a host, and
- each column corresponds to a characteristic, for example host body weight, host age, number of parasite species,
  number of parasites of a certain species, and so on.
These characteristics are also called variables, traits, characters, features or attributes; here we use the word variable.
It is common for the first row to contain the names of the variables rather than the data of a host.
As an example, look at the following simple data matrix:
QPweb accepts numerical data only. This is inherited from QP, and we don't plan to change it
at the moment. Thus the columns "Location" and "HostSex" above will not be accepted by QPweb as they are; you have to
use number codes
for location and sex (say, 1 for female and 2 for male, and 10014 and 10018 for the locations).
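If the original spreadsheet holds text categories, they can be recoded before upload. The following minimal Python sketch (not part of QPweb) does this with the standard csv module; the function `recode_column` and the location names "Northfield" and "Southlake" are hypothetical, while the number codes are the ones suggested above.

```python
import csv
import io

def recode_column(rows, column, codes):
    """Replace the text categories in `column` with the number codes in `codes`.

    `rows` is a list of dicts (one per host). An unknown category raises an
    error instead of being silently uploaded as text.
    """
    recoded = []
    for row in rows:
        new_row = dict(row)
        value = new_row[column].strip()
        if value not in codes:
            raise ValueError(f"No number code defined for {column}={value!r}")
        new_row[column] = str(codes[value])
        recoded.append(new_row)
    return recoded

# Hypothetical text-coded input; the location names are made up for illustration.
text_data = """Location;HostSex;Par1Adults
Northfield;male;0
Southlake;female;15
"""
rows = list(csv.DictReader(io.StringIO(text_data), delimiter=";"))
rows = recode_column(rows, "HostSex", {"female": 1, "male": 2})
rows = recode_column(rows, "Location", {"Northfield": 10014, "Southlake": 10018})

# Write the recoded data back as a semicolon-separated text file (here: a string).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["Location", "HostSex", "Par1Adults"],
                        delimiter=";", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```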
QP accepted only whole numbers. This was not a real restriction, as one could always choose
a measurement unit in which the data were whole numbers; nevertheless, QPweb accepts decimal numbers too.
In some analysis methods, however, decimal numbers may lead to errors. If you find a bug like that,
please tell us about it.
The maximum file size for upload and analysis is 200 kB.
If you need to work with larger files, please contact us by email.
Uploaded files can be displayed by clicking on the file name, or deleted by clicking on the "x" beside it.
Variable names (names of columns of the data matrix)
If your data file contains variable names in the first row, these names will be read in and used.
Be careful with special characters like comma or semicolon, and with accented
characters like ä or ô, in names
(also in file names), as these may cause errors.
For structuring names you can safely use two special characters, dot and underscore,
and you can also exploit the fact that names in QPweb
are case sensitive. Some naming examples: goose.summer, goose.autumn, lice_male, lice_female.
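A quick way to spot risky names before uploading is to test them against the characters recommended above. This Python sketch assumes that "safe" means ASCII letters, digits, dot and underscore only; that is a conservative reading of the advice, not a rule documented by QPweb, and `unsafe_names` is a hypothetical helper.

```python
import re

# Assumption: only ASCII letters, digits, '.' and '_' are considered safe.
SAFE_NAME = re.compile(r"[A-Za-z0-9._]+")

def unsafe_names(names):
    """Return the names containing any character outside the safe set."""
    return [name for name in names if not SAFE_NAME.fullmatch(name)]

# Case matters: lice_male and Lice_Male are two different, valid names.
print(unsafe_names(["goose.summer", "goose.autumn", "lice_male", "Lice_Male", "hôst;weight"]))
```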
If you don't have variable names, that is, if even the first row in your file contains data rather
than names, you should set "No" at "Variable names in the first row" in the "Import Data" screen.
Then QPweb assigns the default names "Var1", "Var2", "Var3", etc.
Entering data using a spreadsheet program (Excel, Numbers, Calc, etc.)
If you enter your data in a spreadsheet program, save it as a simple text file (usually with the
extension ".txt") or as a so-called "comma-separated values" file (usually with the extension
".csv"). In a simple text file the values are separated by blanks or tab characters, in a csv file by commas
or semicolons. (Semicolon is used in countries where the decimal symbol is comma.) Spreadsheet programs
usually offer several formats for saving your data; you should find out which one works best for reading your
data into QPweb. The "Import Data" screen of QPweb allows you to choose the appropriate separator character
(blank, tab, comma, semicolon, or any other).
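If you are unsure which separator a saved file uses, a simple heuristic is to look for a candidate character that occurs the same, nonzero number of times on every line. The sketch below only illustrates that idea and is not the logic of the "Import Data" screen; `guess_separator` is a hypothetical name, and the order in which candidates are tried (semicolon, comma, tab, blank) is an assumption.

```python
def guess_separator(text, candidates=(";", ",", "\t", " ")):
    """Guess the field separator of a delimited text file.

    Heuristic (an assumption, not QPweb's import logic): return the first
    candidate that occurs the same, nonzero number of times on every
    non-empty line, or None if no candidate qualifies.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    for sep in candidates:
        counts = {line.count(sep) for line in lines}
        if len(counts) == 1 and min(counts) > 0:
            return sep
    return None

sample = "Location; HostBodyWeight; HostSex\n10014; 12.5; 2\n10018; 9.3; 1\n"
print(repr(guess_separator(sample)))
```

Trying the semicolon before the blank matters here: the example lines also contain a consistent number of spaces after each semicolon.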
If you use decimal numbers, be sure to specify the right decimal symbol in the
"Import Data" screen of QPweb. If you are not sure which symbol your data file uses, open the file
in a text editor and check.
Entering data using a text editor (Notepad, TextEdit, gedit, etc.)
If you enter your data using a text editor, using comma or semicolon as the delimiter
between the numbers is safer than using space or tab.
Note that two delimiters with no number between them (for example , , or ; ;) are interpreted as
an empty field, that is, one with missing data. Keep this in mind if you get the error message
"Too many fields!". This most likely occurs when the delimiter is a space or tab, because
these characters are invisible, so it is not easy to notice when two of them are next to each other.
The above data (location and sex coded by numbers) entered in a text editor looks like this:
Location; HostBodyWeight; HostSex; Paras1Larvae; Par1Adults; Paras2Larvae; Par2Adults
10014; 12.5; 2; 0; 0; 7; 12
10014; 11.0; 2; 0; 0; 0; 2
10018; 9.3; 1; 10; 15; 16; 19
10014; 10.8; 1; 0; 0; 17; 10
10018; 16.0; 2; 25; 20; 5; 19
To import this file use the following settings on the "Import Data" screen:
- text file
- field separator: semicolon
- variables in the first row: yes
- decimal point character: period
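The four settings listed above map naturally onto the parameters of a small reader. The following Python sketch is illustrative only (the function `read_data` is hypothetical, and it does not handle missing values); it shows how the separator, the header setting and the decimal symbol each affect parsing, including the default names Var1, Var2, ... used when there is no header row.

```python
def read_data(text, sep=";", header=True, decimal="."):
    """Parse delimited text into (variable names, rows of floats).

    Sketch only: empty lines are skipped, and `sep` must differ from
    `decimal` (a comma cannot be both separator and decimal symbol).
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if header:
        names = [name.strip() for name in lines[0].split(sep)]
        lines = lines[1:]
    else:
        n_columns = len(lines[0].split(sep))
        names = [f"Var{i}" for i in range(1, n_columns + 1)]  # default names
    rows = []
    for line in lines:
        # Normalise the decimal symbol to a point before converting.
        fields = [field.strip().replace(decimal, ".") for field in line.split(sep)]
        rows.append([float(field) for field in fields])
    return names, rows

example = """Location; HostBodyWeight; HostSex; Paras1Larvae; Par1Adults; Paras2Larvae; Par2Adults
10014; 12.5; 2; 0; 0; 7; 12
10018; 9.3; 1; 10; 15; 16; 19
"""
names, rows = read_data(example, sep=";", header=True, decimal=".")
print(names[:3], rows[1])
# ['Location', 'HostBodyWeight', 'HostSex'] [10018.0, 9.3, 1.0, 10.0, 15.0, 16.0, 19.0]
```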
When importing your data file, you can assign a short name to the data set.
If you don't give a name, the file name without its extension (.txt, .csv, .doc, etc.),
truncated to 18 characters, is used as the data set name.
Giving a short name is useful when
- your data file has a long name,
- the file name contains special characters, or
- two files would have the same name after truncation.
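The default naming rule can be checked in advance for a batch of files. This sketch assumes that only the last extension is removed (as `pathlib.PurePath.stem` does) before truncating to 18 characters; the helpers `default_dataset_name` and `name_clashes` are hypothetical.

```python
from pathlib import PurePath

def default_dataset_name(filename, max_len=18):
    """File name without its (last) extension, truncated to `max_len` characters."""
    return PurePath(filename).stem[:max_len]

def name_clashes(filenames):
    """Map each default name shared by two or more files to those files."""
    seen = {}
    for filename in filenames:
        seen.setdefault(default_dataset_name(filename), []).append(filename)
    return {name: files for name, files in seen.items() if len(files) > 1}

files = ["FoxesSummer2010_site1.csv", "FoxesSummer2010_site2.csv", "ticks.txt"]
print(name_clashes(files))  # both site files become "FoxesSummer2010_si"
```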
Using data which you entered in QP 2.0 or 3.0
Previous versions of QP stored the entered data in files with the extension ".dat" in the same folder
as the program. Such ".dat" files can be imported by selecting "QP3.0 data file"
in the "Import Data" screen. This requires no further parameter settings, as QPweb knows the format
of QP 3.0 files and reads in the data correctly.
QP 3.0 data files consist of one single column representing the number of parasites per host.
If you read in a QP 3.0 data file, the variable name for this single column will always be "Data".
Missing data were no problem in QP, as you simply didn't enter them. But in QPweb a data file can have more than one column (that is, several traits for each host), so for a particular host some data may be present and others
missing. If a data item is missing, you can write NA in its place, or simply write two delimiters next to
each other with nothing between them. The following two lines result in the same uploaded data:
1, 5, NA, 10.5, 0.5
1, 5, , 10.5, 0.5
Although NA is not a number, QPweb reads and interprets it correctly, as NA is the standard code for missing values in the program R.
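The equivalence of the two lines above can be illustrated with a small parser. In this Python sketch (the name `parse_line` is hypothetical) an empty field or the token NA is read as a missing value, represented by None; treating exactly these two spellings as missing is an assumption.

```python
MISSING_TOKENS = {"", "NA"}  # assumption: only these two spellings mean "missing"

def parse_line(line, sep=","):
    """Split one data line; empty fields and NA become None (missing)."""
    values = []
    for field in line.split(sep):
        token = field.strip()
        values.append(None if token in MISSING_TOKENS else float(token))
    return values

a = parse_line("1, 5, NA, 10.5, 0.5")
b = parse_line("1, 5, , 10.5, 0.5")
print(a == b, a)  # True [1.0, 5.0, None, 10.5, 0.5]
```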
If you use space as the delimiter, two spaces next to each other are interpreted by QPweb as
a missing data item. Therefore using space as the delimiter is a bit dangerous; it is
better to use comma or semicolon.
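One way to find a doubled (invisible) delimiter before uploading is to compare the number of fields on each line with the header line. This is a sketch of the kind of check that can lead to a "Too many fields!" error, not QPweb's own code; `check_field_counts` is hypothetical, and it reports physical line numbers.

```python
def check_field_counts(text, sep):
    """Return (line number, fields found, fields expected) for mismatching lines.

    Adjacent separators produce an empty field, so a doubled space or tab
    shows up as one field too many. Empty lines are skipped in this sketch.
    """
    lines = text.splitlines()
    expected = len(lines[0].split(sep))
    problems = []
    for number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue
        n_fields = len(line.split(sep))
        if n_fields != expected:
            problems.append((number, n_fields, expected))
    return problems

data = "a b c\n1 2 3\n4  5 6\n"  # line 3 has two spaces between 4 and 5
print(check_field_counts(data, " "))  # [(3, 4, 3)]
```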
NA, NaN, Inf in the output
Missing values in the data may also lead to missing values in the results. For example, if
all values of a certain variable are missing, their mean or median will also be missing. This is indicated by NA in the output.
Division by zero may also result in a missing value. In statistics this may occur when a statistic
is divided by its standard error and the standard error happens to be zero. A typical example of
this is a t-test when the data are constant (that is, all data values are equal). In R, when a positive number
is divided by 0, the result is Inf (infinity); when a negative number is divided by 0, the result is
-Inf (minus infinity); and zero divided by zero (0/0) gives NaN ("not a number").
In R, if all values of a data series are missing, their minimum is Inf and their maximum
is -Inf.
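Plain Python raises an error on division by zero, so the R conventions described above have to be spelled out explicitly. The helpers below (`r_divide`, `r_min`, `r_max`, all hypothetical) are a sketch that mimics R's results; `r_min` and `r_max` correspond to R's `min(..., na.rm = TRUE)` and `max(..., na.rm = TRUE)`, which return Inf and -Inf (with a warning) when every value is missing. The sign of a zero divisor is ignored for simplicity.

```python
import math

def r_divide(a, b):
    """Divide like R does, instead of raising ZeroDivisionError."""
    if b != 0:
        return a / b
    if math.isnan(a) or a == 0:
        return math.nan                          # 0/0 -> NaN ("not a number")
    return math.inf if a > 0 else -math.inf      # +x/0 -> Inf, -x/0 -> -Inf

def r_min(values):
    """Minimum ignoring missing values (None); Inf if all are missing."""
    present = [v for v in values if v is not None]
    return min(present) if present else math.inf

def r_max(values):
    """Maximum ignoring missing values (None); -Inf if all are missing."""
    present = [v for v in values if v is not None]
    return max(present) if present else -math.inf

# A t statistic with zero standard error (constant data):
print(r_divide(1.5, 0.0), r_divide(-1.5, 0.0), r_divide(0.0, 0.0))  # inf -inf nan
```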
Results and graphs
Results accumulate in a text window and can be copied from it into a word processor or
another program in the usual select-copy-paste way.
In reporting the results, we follow the $ notation used in the R program: if you have read in a data set named "FoxesSummer2010", and a variable in this data set is named "TicksAdults", then this variable appears in the results as "FoxesSummer2010$TicksAdults".
If such a name is too long to be displayed in the output (more than 30 characters), only its beginning is displayed,
followed by three dots. To get unambiguous reports, use short data set names and variable
names. If your file has a long name, give it a short name for use in QPweb when importing it.
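The shortening rule can be sketched as follows. Exactly where QPweb cuts a long name is not documented here; this Python sketch assumes that the kept beginning plus the three dots is 30 characters long, and `display_name` is a hypothetical helper.

```python
def display_name(dataset, variable, width=30):
    """Build an R-style "dataset$variable" label, shortened if too long.

    Assumption: a label longer than `width` characters is cut so that the
    kept beginning plus "..." is exactly `width` characters.
    """
    label = f"{dataset}${variable}"
    if len(label) <= width:
        return label
    return label[: width - 3] + "..."

print(display_name("FoxesSummer2010", "TicksFemales"))          # kept whole (28 characters)
print(display_name("FoxesSummer2010", "TicksFemalesAndMales"))  # shortened with "..."
```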
Note that the name "FoxesSummer2010$TicksFemales" is still acceptable (28 characters).
In QPweb we intend to follow the convention that p-values from statistical tests should be reported
numerically with 4 decimals (rather than simply as p < 0.05), except when the p-value is smaller
than 0.0001. In such cases the conventional form of reporting is p < 0.0001. If QPweb still reports
p=0.000002, p=0 or similar, you should change it to p < 0.0001 in your publication.
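The reporting rule can be written as a small formatting function. The following Python sketch (the name `format_p` is hypothetical) applies it: 4 decimals from 0.0001 upwards, and "p < 0.0001" below that.

```python
def format_p(p):
    """Format a p-value with 4 decimals, or as "p < 0.0001" when smaller."""
    if not 0 <= p <= 1:
        raise ValueError(f"not a probability: {p}")
    if p < 0.0001:
        return "p < 0.0001"
    return f"p = {p:.4f}"

for p in (0.000002, 0, 0.03456, 0.5):
    print(format_p(p))
```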
Sometimes very large or very small numbers are written in the so-called scientific form, with the symbol
"e" for exponent. For example, 1.28e-12 means 1.28 × 10^(-12).
QPweb stores only the last 10 diagrams: when the 11th diagram is created, the first one
is deleted, and so on. Save the diagrams you want to keep before they are deleted.
Diagrams are PNG files and can be saved or copied from the graph window in the usual way.
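The "last 10 diagrams" behaviour corresponds to a fixed-size queue. The sketch below models it with Python's `collections.deque` and its `maxlen` argument; the class `DiagramHistory` is hypothetical and only illustrates why diagram 1 disappears when diagram 11 is created.

```python
from collections import deque

class DiagramHistory:
    """Keep only the most recent `capacity` diagrams, numbered consecutively."""

    def __init__(self, capacity=10):
        self._diagrams = deque(maxlen=capacity)  # oldest entry drops out automatically
        self._next_number = 1

    def add(self, png_bytes):
        number = self._next_number
        self._diagrams.append((number, png_bytes))
        self._next_number += 1
        return number

    def available(self):
        return [number for number, _ in self._diagrams]

history = DiagramHistory()
for _ in range(11):
    history.add(b"placeholder PNG content")
print(history.available())  # diagram 1 has been dropped
```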
Special notes to Mac users
We have almost no experience with Apple Macintosh computers; in particular, we don't know which
are the most popular programs for entering data on a Mac. We would appreciate it if Mac users
told us about their experience of what works and what does not. We would include their
tips in these notes.
If you enter data in TextEdit, save your data in simple text format (.txt). If TextEdit does
not offer this format, you can set it in the "Format" menu before saving.
If you enter data in Excel or Numbers, save your data as "comma separated values" (.csv),
as "Windows text" (.txt), or as "MS-DOS text" (.txt). In the case of "comma separated values",
please check whether the delimiter really is a comma (it depends on the local settings; e.g.
in Germany it is a semicolon, because the comma is used as the decimal separator).
If you enter data in Word or Pages, save your data as "text only" (.txt).
Version history
A new module for the estimation of parasite species richness is included. It aims to estimate
the number of parasite species infecting a host species, including those not observed in the actual sample.
The "Chao2" method is used for the estimation, as Walter and Morand (1998, Parasitology, 116, 395-405)
found it to perform best for parasite infection data.
This method estimates the number of unobserved parasite species from the number of rare species (those occurring in only 1 or 2 hosts in the sample). If there are no such rare species in the sample, the estimation fails.
The method performs well if rare species make up less than 50% of all parasite species in the data set.
A large sample of hosts is also needed to obtain a reliable estimate of species richness (a few hundred hosts are recommended).
The procedure uses an incidence (or abundance) matrix in which each row represents a host and each column represents a parasite species (as usual in QPweb). Make sure you select for analysis only those variables (columns) of the data set that represent infection by a parasite. Values greater than 0 correspond to "present" and zeroes mean "absent" (that is, abundance data are automatically converted to incidence data by the program).
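To make the role of the rare species concrete, the following Python sketch computes one widely used, bias-adjusted incidence form of the Chao2 estimator (see the Chao and Chiu reference below for this form and its variants). It converts abundances to incidences as described above and, mirroring the note above, refuses to estimate when no species is found in only 1 or 2 hosts. The function `chao2` is hypothetical, the exact variant implemented in QPweb may differ, and the sketch assumes there are no missing values.

```python
def chao2(matrix):
    """Chao2 species richness estimate from a host-by-species matrix.

    Rows are hosts, columns are parasite species; counts > 0 mean "present".
    With T hosts and Qk species found in exactly k hosts:
        S_obs + (T-1)/T * Q1^2 / (2*Q2)      if Q2 > 0
        S_obs + (T-1)/T * Q1*(Q1-1) / 2      if Q2 == 0
    """
    n_hosts = len(matrix)
    n_species = len(matrix[0])
    # Number of hosts in which each species occurs (abundance -> incidence).
    incidence = [sum(1 for row in matrix if row[j] > 0) for j in range(n_species)]
    s_obs = sum(1 for k in incidence if k > 0)   # species actually observed
    q1 = incidence.count(1)                      # found in exactly one host
    q2 = incidence.count(2)                      # found in exactly two hosts
    if q1 + q2 == 0:
        raise ValueError("no rare species (found in 1 or 2 hosts): estimation fails")
    factor = (n_hosts - 1) / n_hosts
    if q2 > 0:
        return s_obs + factor * q1 ** 2 / (2 * q2)
    return s_obs + factor * q1 * (q1 - 1) / 2

# 5 hosts, 5 species columns (the last species was never observed).
counts = [
    [3, 0, 1, 0, 0],
    [0, 2, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 4, 1, 0],
    [0, 0, 0, 0, 0],
]
print(round(chao2(counts), 3))  # 4 observed + 0.8 * 2**2 / 2 = 5.6
```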
For a detailed description of the method and for correct interpretation of the results see
Chao, A. and Chiu, C.-H. 2016. Species Richness: Estimation and Comparison. Wiley StatsRef: Statistics Reference Online. 1-26.
or the author's original version at
The three new "Group comparisons" modules have been improved. Previously, the modules aborted without any error message
when, due to missing values or a wrongly chosen grouping variable, there were no groups to compare
with each other. Now the program gives error messages in such cases.
The "Group comparisons" modules require that the data set contains a variable with at least 2 different
values, for example sex of host, age group of host, location of observation, season of year, etc.
Select this variable as the grouping variable. The groups are then compared with
respect to the prevalence of a parasite or the mean of another variable. At most 6 groups are allowed in these
procedures (if the selected grouping variable has more than 6 different values, an error message is issued).
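The two conditions on the grouping variable can be checked before running a comparison. This Python sketch (the function `check_grouping` is hypothetical) ignores missing values, represented as None, which is an assumption about how missing codes are treated.

```python
def check_grouping(values, max_groups=6):
    """Return the sorted group codes, or raise if the variable cannot define groups."""
    groups = sorted({v for v in values if v is not None})
    if len(groups) < 2:
        raise ValueError("grouping variable has fewer than 2 different values")
    if len(groups) > max_groups:
        raise ValueError(
            f"grouping variable has {len(groups)} different values (at most {max_groups} allowed)"
        )
    return groups

host_sex = [2, 2, 1, 1, 2, None]   # 1 = female, 2 = male, None = missing
print(check_grouping(host_sex))    # [1, 2]
```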
Be aware that there are two "Group comparisons" modules for comparing means. One is for comparing
mean intensities, that is, it uses only the nonzero values to compute the means; the other is for
mean abundances, where zeroes are also included. For host traits other than infection, you should
consider which one you need. (In most cases zeroes should also be included, but this is not
always so.)
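The difference between the two kinds of means is easy to see on the Par1Adults column of the example data above (0, 0, 15, 0, 20). In this Python sketch (the helper names are hypothetical, and missing values are None) the mean abundance divides the total by all 5 hosts, while the mean intensity divides it by the 2 infected hosts only.

```python
def mean_abundance(counts):
    """Mean number of parasites per host, zeros included (missing values skipped)."""
    present = [c for c in counts if c is not None]
    if not present:
        raise ValueError("all values are missing")
    return sum(present) / len(present)

def mean_intensity(counts):
    """Mean number of parasites per infected host: zeros are left out."""
    infected = [c for c in counts if c is not None and c > 0]
    if not infected:
        raise ValueError("no infected host in the sample")
    return sum(infected) / len(infected)

par1_adults = [0, 0, 15, 0, 20]
print(mean_abundance(par1_adults), mean_intensity(par1_adults))  # 7.0 17.5
```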
Aggregation indices: a 95% bootstrap BCa CI is computed for Poulin's discrepancy index.
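For reference, Poulin's discrepancy index compares the observed distribution of parasites among hosts with a perfectly uniform one. The sketch below uses the standard formula with the counts sorted in increasing order; it computes only the point estimate, not the bootstrap BCa interval, and `poulin_d` is a hypothetical name.

```python
def poulin_d(counts):
    """Poulin's discrepancy index D for parasite counts per host.

    D = 1 - 2 * sum_{i=1..N} (x_1 + ... + x_i) / (mean * N * (N + 1)),
    with x sorted in increasing order. D = 0 when all hosts carry the
    same number of parasites; larger values mean stronger aggregation.
    """
    x = sorted(counts)
    n = len(x)
    mean = sum(x) / n
    if mean == 0:
        raise ValueError("D is undefined when no host is infected")
    cumulative = 0
    total = 0
    for value in x:
        cumulative += value   # running sum x_1 + ... + x_i
        total += cumulative
    return 1 - 2 * total / (mean * n * (n + 1))

print(poulin_d([2, 2, 2]), poulin_d([0, 0, 6]))  # 0.0 0.5
```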
In several procedures only the conventional confidence levels (90, 95, 99%) can be used.
New procedure: two-sample comparison of Poulin's discrepancy index.
New procedure: two-sample comparison of intensity distributions using Neuhäuser's location-scale test.
This test is sensitive to any difference between the distributions (means, medians, variances, etc.). However, the test
does not tell which feature of the data is responsible for the detected difference; that should be explored by inspecting
the descriptive statistics and graphs.
For details of the test see Neuhäuser, M. (2000) An exact two-sample test based on the Baumgartner-Weiss-Schindler
statistic and a modification of Lepage’s test, Commun. Statist. Theory and Methods, 29, 67-78.
New procedures: group comparisons of prevalences and mean intensities.
Data entered in Excel usually contain columns that define some grouping, for example location, time, sex of the host,
etc., and you may want to compare prevalence or intensity between these subgroups. For illustration, let us have a look at the example table again.
You may want to compare prevalence of adults of the first parasite species between the two locations.
Or you may want to compare
the mean intensity of parasite 2 larvae between male and female hosts. Until now this was possible only
if you split the data by location or by host sex and read in each part of the data set separately.
Now groups defined by a variable in the data set can be compared directly.
If you select a group comparison procedure, you should specify which of the selected variables is the
grouping variable. (Similarly to the Scatterplot, where you have to specify which variable should go to
the y axis of the graph.) Note that the maximum number of groups to compare is 6.
Comparison of means is made by bootstrap t-test (for 2 groups) or by bootstrap ANOVA (for more than
2 groups). Comparison of prevalences is made by Fisher's exact test.
Don't forget that QPweb cannot read in text variables, so use number codes for the grouping variables.
Enhanced scatterplot: the user can specify which variables to put on the X
and Y axes.
New options: the user can now set the confidence level of confidence intervals and,
in the bootstrap procedures, the number of bootstrap replications.
Don't change the
conventional 95% confidence level unless you have good reasons to do that.
For example if you receive an error message that
a 95% bootstrap CI cannot be constructed even with 10000 replications,
change it to 90%.
Or if the 95% CI is disappointingly wide, you can try the 90% CI. If you're
lucky, it'll look better. Or if the 95% CI consists of a single value, this may
encourage you to try the 99% CI. Other values aren't accepted by the procedures.
A bug related to bootstrap confidence intervals has been fixed. Since the R function for the
BCa interval fails when the sample size is greater than the number of bootstrap replications,
in such cases we apply the percentile method instead of BCa. Affected procedures are
bootstrap confidence intervals for mean intensity, mean abundance, and mean crowding.
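The percentile method used as the fallback is the simpler of the two intervals: it takes the appropriate quantiles of the bootstrap means directly, whereas BCa also corrects for bias and skewness. The following Python sketch (the function `percentile_ci` is hypothetical) shows the idea for a mean, and accepts only the three conventional confidence levels, mirroring QPweb's restriction.

```python
import random
import statistics

ALLOWED_LEVELS = (0.90, 0.95, 0.99)  # the conventional levels accepted by QPweb

def percentile_ci(sample, level=0.95, replications=2000, seed=None):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"confidence level must be one of {ALLOWED_LEVELS}")
    if len(sample) < 2:
        raise ValueError("at least two values are needed")
    rng = random.Random(seed)
    # Mean of each resample drawn with replacement, sorted for quantile lookup.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=len(sample)))
        for _ in range(replications)
    )
    alpha = (1 - level) / 2
    lower = means[round(alpha * replications)]
    upper = means[round((1 - alpha) * replications) - 1]
    return lower, upper

intensities = [3, 5, 7, 2, 9, 4, 6]   # nonzero counts of infected hosts
print(percentile_ci(intensities, level=0.95, seed=1))
```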
This "Notes to QPweb" has become more detailed.
Attempts to upload files that were too large caused obscure errors without any warning; therefore we limited
the file size, and an error message is issued if the maximum allowed size (now 200 kB) is exceeded.
If you need to work with larger files, please contact us by email.
"Show graphs" is redesigned, so a new diagram doesn't overwrite the previous one. Diagrams are
numbered consecutively, and always the last 10 are available for display.
A few minor bugs are fixed.
Import data: Apple Macintosh line endings are handled correctly now. (Until now only data
saved in "Windows text" format were imported correctly from a Mac.)
Comparison of mean crowding: when the variability of infection is too low for the
bootstrap test (for example, if one of the samples contains no infected host, or
just one, or similar), an error message is issued.
Stochastic equality of intensity distributions: when the variability of infection is
too low for the bootstrap test (for example, if one of the samples contains no
infected host, or just one), an error message is issued.
Confidence interval for mean intensity: if there is only one infected host in the sample, an
error message is issued because CI cannot be calculated from one single intensity value.
Confidence interval for mean crowding: if there is only one infected host in the sample, an
error message is issued because CI cannot be calculated from one single crowding value.
Scatterplot with Spearman's rank correlation: the output in the results file now includes
the names of the variables.
Aggregation indices: if the negative binomial exponent 'k' cannot be calculated, a short
explanation is given about the possible reasons why.
Importing data was modified so that empty lines at the end of the data file
are removed. (Note that empty lines between non-empty data lines remain illegal
and generate an error.)
Fisher's exact test for comparing prevalences and Mood's median test for comparing
median intensities now provide simulated p-values (based on 20000 Monte Carlo replications)
when the samples are too large for computing an exact p-value.
A bug related to importing data files without column names and with blank delimiter is fixed.
Some tooltips are added.
Handling missing data is improved in the following procedures:
- Confidence interval for prevalence (Clopper-Pearson CI),
- Confidence interval for prevalence (Blaker's method, shorter CI),
- Confidence interval for prevalence (Sterne's method, shorter CI),
- Confidence interval for mean intensity (Bootstrap BCa),
- Confidence interval for mean abundance (Bootstrap BCa),
- Confidence interval for mean crowding (Bootstrap BCa),
- Comparison of prevalences (Chi-square test),
- Comparison of mean intensities (Bootstrap t-test),
- Comparison of mean abundances (Bootstrap t-test).
This "Notes to QPweb" is provided.
Import data: a bug related to the use of decimal comma is fixed. Data files with decimal comma
can be imported correctly now.
Aggregation indices: when the negative binomial exponent k cannot be estimated (e.g. because
the data are not aggregated at all or deviate strongly from the negative binomial), NA is reported.
(Previously either nothing or a huge number was reported.)
Comparison of mean abundances: this analysis module had been accidentally deleted, so choosing
this analysis produced no results. It has now been restored.
Known issues
In some cases the analysis aborts without any meaningful error message. Some examples:
- If all values of a variable are missing (for example due to an erroneous data upload)
- If all values of a variable are the same, that is, if the variance of the variable is zero
- If mean intensity is to be calculated and there is no infected host in the sample
We'll fix these by checking the conditions and issuing appropriate error messages.
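The conditions listed above can be tested before an analysis is started. This Python sketch (the helper `precheck` is hypothetical, with missing values represented as None) returns a list of problems instead of letting the calculation fail.

```python
def precheck(values, need_infected=False):
    """List the problems that would make an analysis of `values` abort.

    Set `need_infected=True` for analyses such as mean intensity that
    require at least one infected host (a value greater than 0).
    """
    present = [v for v in values if v is not None]
    if not present:
        return ["all values are missing"]
    problems = []
    if len(set(present)) == 1:
        problems.append("all values are the same (zero variance)")
    if need_infected and not any(v > 0 for v in present):
        problems.append("no infected host in the sample")
    return problems

print(precheck([0, 0, 0], need_infected=True))
```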