Well, looking through the FAQ on the AWStats site, there's an explanation of the differences you can find compared with other log analyzers.
I quote the article in full below:
code:
FAQ-COM250 : DIFFERENT RESULTS FROM OTHER ANALYZERS
PROBLEM:
I also use Webalizer, Analog (or another log analyzer) and it doesn't report the same
results as AWStats. Why?
SOLUTION:
If you compare AWStats results with those of another log analyzer, you will find some
differences, sometimes very large ones. In fact, all analyzers (even AWStats) over-report
because of the problem of proxy servers and robots. However, AWStats is one of the most
accurate, and its over-reporting is very low, whereas other analyzers, even the most
famous, have a VERY HIGH error rate (10% to 200% more than reality!).
These are the most important reasons why you can find significant differences:
Some dynamic pages generated by CGI programs are not counted as a "Page" (but only as a
"Hit") by some analyzers (e.g. Webalizer) if the CGI program's URL does not end with a
known extension (.cgi, ...), so they are not included correctly in their statistics.
AWStats uses the opposite policy, assuming a file is a page unless its type is in an
exclusion list (see the NotPageList parameter). The error rate with such a policy is lower.
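To make the two policies concrete, here is a minimal sketch; the extension lists are illustrative, not the actual Webalizer or AWStats defaults:
code:
from pathlib import PurePosixPath

PAGE_EXTENSIONS = {".html", ".htm", ".cgi", ".pl"}       # allow-list policy
NOT_PAGE_LIST = {".css", ".js", ".png", ".gif", ".jpg"}  # deny-list policy

def ext_of(url):
    # Extension of the URL path, with any query string stripped.
    return PurePosixPath(url.split("?", 1)[0]).suffix.lower()

def is_page_allowlist(url):
    # Webalizer-style: a hit is a page only if its extension is known.
    return ext_of(url) in PAGE_EXTENSIONS

def is_page_denylist(url):
    # AWStats-style: every hit is a page unless its type is excluded.
    return ext_of(url) not in NOT_PAGE_LIST

# A dynamic URL with no extension: the allow-list policy misses it.
print(is_page_allowlist("/report?id=3"))  # False -> counted only as a hit
print(is_page_denylist("/report?id=3"))   # True  -> counted as a page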
AWStats is able to detect robot visits. Most analyzers treat robot visits as human
visitors. This error makes them report more visits and visitors than there really are.
When AWStats reports "1 visitor", it means "1 human visitor" (even if it's not possible
to detect all robots, most of them are detected). Robot visits are reported separately
in the "Robots/Spiders visitors" chart.
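Detection of this kind typically works by matching the User-Agent header against a signature database; a rough sketch (the tiny signature list here is illustrative, AWStats ships a far larger robots database):
code:
import re

# Tiny illustrative signature list; AWStats' robots database is far larger.
ROBOT_SIGNATURES = re.compile(r"googlebot|bingbot|slurp|crawler|spider",
                              re.IGNORECASE)

def is_robot(user_agent):
    # Classify a hit as a robot if its User-Agent matches a known signature.
    return bool(ROBOT_SIGNATURES.search(user_agent))

print(is_robot("Mozilla/5.0 (compatible; Googlebot/2.1)"))          # True
print(is_robot("Mozilla/5.0 (Windows NT 10.0; rv:115.0) Firefox"))  # False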
Some log analyzers use "Hits" to count visitors. This is a very bad way of working:
some visitors surf through a lot of proxy servers (e.g. AOL users), which means that
several hosts (with several IP addresses) can be used to reach your site for only one
visitor (e.g. one proxy server downloads the page and 2 other servers download all the
images). Because of this, if unique-visitor stats are based on "hits", 3 users are
reported, which is wrong. So AWStats considers only HTML "Pages" to count unique
visitors. This decreases the error, though not completely, because it's still possible
that one proxy server downloads one HTML frame and another one downloads another frame,
but it makes the over-reporting of unique visitors less significant.
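The AOL example above can be reproduced in a few lines (a sketch, reusing the deny-list idea from before):
code:
# One visitor behind 3 proxy IPs: one fetches the HTML, two fetch images.
log = [("10.0.0.1", "/index.html"),
       ("10.0.0.2", "/logo.png"),
       ("10.0.0.3", "/photo.jpg")]

NOT_PAGE_LIST = {".png", ".jpg", ".gif", ".css", ".js"}

def is_page(url):
    return not any(url.endswith(ext) for ext in NOT_PAGE_LIST)

unique_by_hits = {ip for ip, url in log}
unique_by_pages = {ip for ip, url in log if is_page(url)}

print(len(unique_by_hits))   # 3 visitors reported -> over-reporting
print(len(unique_by_pages))  # 1 visitor -> closer to reality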
Another important reason for differences is that a log file is not always completely
sorted but only "nearly" sorted, because of the caching and log-writing engines used by
the server. Nearly all log analyzers (commercial or not) assume that the log file is
sorted exactly by hit date when calculating visits and entry and exit pages. But nothing
guarantees this, and some log files are only "nearly" sorted, above all log files on
highly loaded servers. AWStats has an advanced parsing algorithm that is able to count
visits and entry and exit pages correctly even if the log file is only "nearly" sorted.
Then, there are internal bugs in log analyzers that make reports wrong. For example,
a lot of users have reported that Webalizer "doubles" the number of visits or visitors
in some circumstances.
There are also other reasons, although these points explain only small differences:
To differentiate new visits by the same visitor, log analyzers use a visit timeout.
If this value differs, the results differ (visit count and entry and exit pages). Such
a timeout is a fixed value (for example 60 minutes), meaning that if a visitor makes a
hit 59 minutes after downloading the previous page, it's the same visit; if he makes it
61 minutes after, it's a new visit. Of course, there is no real difference between 59
and 61, but counting visits without a timeout is not possible. And because the important
thing is to have a timeout (not its exact value), the AWStats timeout is not an "exact"
value but "around" 60 minutes. This gives AWStats better processing speed, so you might
also see small differences in visit count between AWStats and another log analyzer even
if their timeouts are both set to the same value (because the AWStats timeout is only
approximately the value defined).
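The timeout rule itself is simple; here is a sketch of sessionizing one visitor's hits with an exact 60-minute cutoff (AWStats, as explained above, uses an approximation of this):
code:
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=60)

def count_visits(hit_times):
    # A gap longer than the timeout between consecutive hits of the
    # same visitor starts a new visit.
    visits, last = 0, None
    for t in sorted(hit_times):
        if last is None or t - last > VISIT_TIMEOUT:
            visits += 1
        last = t
    return visits

hits = [datetime(2024, 1, 1, 10, 0),
        datetime(2024, 1, 1, 10, 59),  # 59-minute gap -> same visit
        datetime(2024, 1, 1, 12, 1)]   # 62-minute gap -> new visit
print(count_visits(hits))  # 2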
There are also differences in log analyzers' databases and algorithms that make the
details of the results more or less accurate:
AWStats has a larger database of browsers, OSes, search engines and robots, so the
reports concerning these are more accurate.
AWStats has URL syntax rules to find the keywords or keyphrases used to reach your
site, but it also has an algorithm to detect keywords of unknown search engines with
unknown URL syntax rules.
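Roughly, that means parsing the referrer's query string: a known engine tells you exactly which parameter carries the keywords, and for unknown engines you can guess among common parameter names. A sketch (the rules below are illustrative, not AWStats' actual database):
code:
from urllib.parse import urlparse, parse_qs

# Illustrative per-engine rules: which query parameter holds the keywords.
KNOWN_ENGINES = {"www.google.com": "q", "search.yahoo.com": "p"}
# Fallback guesses for engines with an unknown URL syntax.
COMMON_PARAMS = ("q", "query", "p", "search", "kw")

def keywords_from_referrer(referrer):
    url = urlparse(referrer)
    qs = parse_qs(url.query)
    param = KNOWN_ENGINES.get(url.netloc)
    for name in (param,) if param else COMMON_PARAMS:
        if name in qs:
            return qs[name][0]
    return None

print(keywords_from_referrer("http://www.google.com/search?q=awstats+faq"))
# -> "awstats faq"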
By default, AWStats does not count twice the redirects made by rewrite rules, which
produce two hits in the log file but only one page actually "viewed".
Etc...
Then a test on a log file is proposed, indicating the correct results a log analyzer
should report.
I tried analyzing it with Weblog Expert and the numbers don't add up
...
So, in practice, are we all using software that actually reports wrong data? Beyond the
"philosophy" each tool applies to analyzing and interpreting log files, is there no tool
that can truly be called reliable?
As I already mentioned in my initial post, a 15% margin of error seems really excessive
to me...