VISITORS
Visitors, online documentation for 0.4
SYNOPSIS
visitors <logfile> [<logfile2> ...] [options]

Generates access statistics from the specified web log files. The log files don't need to follow a strict format, but the date MUST be enclosed between [ and ] characters, the client hostname MUST be the first entry of each log line, and referers and requests MUST be enclosed between " characters. Apache log files work out of the box.

Note that a logfile can be given as a - character to read from standard input.
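
To make the format requirements above concrete, here is a minimal Python sketch that extracts the same fields from a log line. It is only an illustration, not Visitors' own parser: the helper name parse_log_line and the sample line are made up for this example, and escaped quotes inside fields are not handled.

```python
import re


# Hypothetical helper, not part of Visitors: it only mirrors the three
# format rules stated in the synopsis.
def parse_log_line(line):
    """Return (host, date, quoted_fields), or None if the line does not fit."""
    parts = line.split(None, 1)
    if len(parts) < 2:
        return None
    host, rest = parts                           # hostname is the first entry
    date = re.search(r"\[([^\]]*)\]", rest)      # date sits between [ and ]
    if date is None:
        return None
    quoted = re.findall(r'"([^"]*)"', rest)      # request, referer, ... between " chars
    return host, date.group(1), quoted


sample = ('192.0.2.7 - - [10/Oct/2004:13:55:36 +0200] '
          '"GET /index.html HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08"')
host, date, quoted = parse_log_line(sample)
```

In the sample line the first quoted field is the request and the second is the referer, as in the usual Apache combined log layout.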

AVAILABLE OPTIONS

  -A --all                            
  -T --trails                         
  -G --google                         
  -K --google-keyphrases              
  -U --user-agents                    
  -W --weekday-hour-map               
  -R --referers-age                   
  -D --domains                        
  -O --operating-systems              
  -B --browsers                       
  -X --error404
  -P --pageviews
     --stream                         
     --update-every                   <argument>
     --reset-every                    <argument>
  -f --output-file                    <argument>
  -m --max-lines                      <argument>
  -r --max-referers                   <argument>
  -p --max-pages                      <argument>
  -i --max-images                     <argument>
  -x --max-error404                   <argument>
  -u --max-useragents                 <argument>
  -t --max-trails                     <argument>
  -g --max-googled                    <argument>
  -k --max-google-keyphrases          <argument>
  -a --max-referers-age               <argument>
  -d --max-domains                    <argument>
  -P --prefix                         <argument>
  -o --output                         <argument>
  -V --graphviz                       
  -v --version                        
     --tail                           
     --time-delta                     <argument>
  -h --help                           

-A --all

Activate all the optional reports. This option is equivalent to -GKUWRDOB. Note that --trails is not implicitly included in this option because it also requires --prefix. See the --trails option documentation for details.

-T --trails

Enable the Web Trails feature. The report shows the most frequent moves between pages of your site. This option requires the --prefix option to work.

-G --google

Activate a report about pages accessed by the Google web crawler. Pages are ordered by the last time the Google web crawler requested them, with the most recently accessed page shown first.

-K --google-keyphrases

Activate a report that shows common search keyphrases used to find your web site from Google.

-U --user-agents

Show information about common user agents.

-W --weekday-hour-map

Activate the generation of a combined weekday/hour two-dimensional map that shows the traffic in each of the 168 hours of a 7-day week. Brighter colors mean higher traffic. This is ideal for figuring out the best moment of the week for a maintenance shutdown, what the target audience of the site is, whether people are accessing it from work or from home, and so on. The map is generated as pure HTML inside the report.
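
The 168 slots of the map can be pictured as one counter per (weekday, hour) pair. The following Python sketch shows one way to build such a table. It is an assumption-laden illustration rather than Visitors' code: the function names are invented here, and starting the week on Monday is simply Python's convention.

```python
from datetime import datetime


def weekday_hour_slot(ts):
    """Map a timestamp to a slot in 0..167 (Monday 00h is slot 0; an assumption)."""
    return ts.weekday() * 24 + ts.hour


def build_weekday_hour_map(timestamps):
    """Count hits per slot; higher counts would be drawn with brighter colors."""
    slots = [0] * (7 * 24)
    for ts in timestamps:
        slots[weekday_hour_slot(ts)] += 1
    return slots


hits = [
    datetime(2004, 10, 11, 9, 15),   # a Monday morning
    datetime(2004, 10, 11, 9, 40),   # same weekday and hour
    datetime(2004, 10, 17, 23, 5),   # a Sunday, late evening
]
traffic = build_weekday_hour_map(hits)
```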

-R --referers-age

Show referers ordered by age. The 'age' of a referer is the date it first appeared. In the report, newer referers are on top. This report is useful to check for new external links.

-D --domains

Activate the generation of information about Top Level Domains popularity. This information may be useful to guess the amount of visits from different countries. Note that Visitors will not resolve numerical IP addresses if they are not already resolved in the log file. All the unresolved IP addresses will be shown in this report under the entry Unresolved IP.
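
As an illustration of the grouping described above, this Python sketch maps a client host to its top level domain and puts numerical addresses in a separate bucket. The function name is hypothetical and the logic is a guess at the general idea, not the code Visitors uses.

```python
import ipaddress


def top_level_domain(host):
    """Return the TLD of a resolved hostname, or 'Unresolved IP' for a numerical address."""
    try:
        ipaddress.ip_address(host)       # succeeds only for numerical addresses
        return "Unresolved IP"
    except ValueError:
        pass
    # Take whatever follows the last dot, ignoring case and a trailing dot.
    return host.rstrip(".").rsplit(".", 1)[-1].lower()
```

For example, top_level_domain("crawl-1.example.IT") gives "it", while top_level_domain("192.0.2.7") falls into the Unresolved IP entry.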

-O --operating-systems

Activate the report about Operating Systems popularity, sorted by number of accesses. All the common operating systems are listed in the report, while unknown operating systems will be summed in the unknown entry.

-B --browsers

Activate the report about Browsers popularity, sorted by number of accesses. All the common browsers are listed in the report, while unknown browsers will be summed in the unknown entry. Browsers are listed by family (for example Internet Explorer, Opera, and so on), and not by specific version.

-X --error404

Activate the generation of the missing documents (404 error) report. This report shows files that were requested but are missing, ordered by number of requests. It is useful for discovering files that are missing from the web site by mistake, but you will often also see bizarre requests performed by users, internet worms and security scans.

-P --pageviews

Activate the generation of a report that shows (an approximation of) the percentage of pages viewed per unique visit. The goal of this report is to understand the usage pattern of the site and the level of interest of the visitors. For example, on a site that provides a number of pages with interesting content, visitors performing a single page view per visit were probably searching for something else.

--stream

Enable the Stream Mode (see the Stream Mode Details section for more information). In short: when in stream mode Visitors will process all the specified log files (possibly none, which is valid in this mode) as usual, producing the report. Then stream mode is entered and Visitors starts to read a continuous stream of web logs from standard input, updating the statistics incrementally as new data becomes available.
A new report is produced periodically if new data arrived, according to the --update-every option (the default is to update the statistics every ten minutes). It's possible to ask Visitors to reset the statistics after some period of time using the --reset-every option. This makes it possible to have a snapshot of what has been going on in the last five minutes, hour, day or week.
Note that --stream requires --output-file, because Visitors needs to overwrite the report for every update, so it can't write to standard output as usual.
If you plan to use the stream mode, also check the --tail option.

--update-every <seconds>

By default, in Stream Mode statistics are updated every 10 minutes. This option specifies a different period in seconds.

--reset-every <seconds>

By default, in Stream Mode statistics are never reset, but continuously updated incrementally. This option resets the statistics after the given amount of time in seconds. This is useful to have a snapshot of the web site usage.
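
How the two periods interact can be sketched with a small simulation. The Python class below is purely hypothetical: it is not how Visitors is implemented, the class and method names are invented, and time is passed in explicitly as plain seconds. It only models the documented behaviour, which is that a report is rewritten every update period if new data arrived, and counters are cleared every reset period.

```python
class StreamStats:
    """Toy model of the --update-every / --reset-every timing (an assumption)."""

    def __init__(self, update_every=600, reset_every=None, start=0):
        self.update_every = update_every
        self.reset_every = reset_every      # None means never reset
        self.hits = 0
        self.dirty = False                  # True when unreported data exists
        self.last_update = start
        self.last_reset = start
        self.reports = []                   # (time, hits) per report written

    def tick(self, now):
        """Write a report and/or reset the counters if their period has elapsed."""
        if self.dirty and now - self.last_update >= self.update_every:
            self.reports.append((now, self.hits))
            self.last_update = now
            self.dirty = False
        if self.reset_every is not None and now - self.last_reset >= self.reset_every:
            self.hits = 0
            self.last_reset = now

    def add_line(self, now):
        """Account for one log line that arrived at time `now`."""
        self.tick(now)
        self.hits += 1
        self.dirty = True


stats = StreamStats(update_every=60, reset_every=120)
stats.add_line(10)
stats.add_line(30)
stats.tick(60)      # first report, covering two hits
stats.tick(120)     # no new data, so no report; counters are reset
stats.add_line(130)
stats.tick(180)     # second report, covering only the hit after the reset
```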

-m --max-lines, --max-* <number>

Set the maximum number of entries shown in reports like referers, keyphrases and so on. This option sets the maximum number of entries for all the reports at once. There are other options to set the maximum number of entries for each specific report:

  • -r --max-referers max number of entries in the referer report.
  • -p --max-pages max number of entries in the accessed pages report.
  • -i --max-images max number of entries in the accessed images report.
  • -x --max-error404 max number of entries in the missing documents report.
  • -u --max-useragents max number of entries in the user agents report.
  • -t --max-trails max number of entries in the web trails report.
  • -g --max-googled max number of entries in the crawled pages report.
  • -k --max-google-keyphrases max number of entries in the google keyphrases report.
  • -a --max-referers-age max number of entries in the referers by date report.
  • -d --max-domains max number of entries in the domains report.

-P --prefix <prefix>

Prefixes tell Visitors what a link should look like to be classified as internal to your site. This option is required for --trails, and also has the nice effect of keeping internal links out of the referers report. If you are analyzing statistics for http://www.your.site.com/, just use:

--prefix http://www.your.site.com

If your site is reachable under several hostnames you should specify all of them, as in the following example:

--prefix http://www.your.site.com --prefix http://your.site.com
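
The classification that prefixes drive can be pictured as a simple string test. This Python sketch assumes a plain starts-with comparison against each prefix. That matching rule and the function name are assumptions made for illustration, since the exact rule Visitors applies is not spelled out here.

```python
def is_internal(url, prefixes):
    """True if the URL begins with any of the configured site prefixes."""
    return any(url.startswith(prefix) for prefix in prefixes)


# The two prefixes from the example above.
prefixes = ["http://www.your.site.com", "http://your.site.com"]

internal = is_internal("http://www.your.site.com/docs/index.html", prefixes)
external = is_internal("http://www.google.com/search?q=visitors", prefixes)
```

Under this reading, a referer for which is_internal is true would feed the web trails and stay out of the referers report.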

-o --output html|text

Output module. You can use text or html. The default is text.

-V --graphviz

This option enables the Graphviz mode: Visitors will analyze the log file and create a graph describing the access patterns of your web site. The information used to create the graph is the same as in the web trails report (which you can enable with --trails), but in graph form it can be more readable for non-trivial sites. An example of how to use this feature:

% visitors access.log --prefix http://www.hping.org --graphviz > graph.dot
% dot graph.dot -Tgif > graph.gif

The dot command is included in Graphviz (apt-get install graphviz on Debian). The generated graph has edges of different colors, from blue to red, indicating a low to high level of popularity of a given movement from one page of the web site to another.
This option requires one or more --prefix options in order to work, just like the --trails option.
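
To give an idea of what a graph with popularity-colored edges looks like in the DOT language, here is a small Python sketch that turns page-to-page move counts into DOT text. The exact output Visitors produces is not documented here, so the graph name, the linear blue-to-red color scale and the function name are all assumptions.

```python
def trails_to_dot(trails):
    """Render {(from_page, to_page): count} as a DOT digraph with colored edges."""
    top = max(trails.values(), default=1)
    lines = ["digraph trails {"]
    for (src, dst), count in sorted(trails.items()):
        share = count / top                      # 0.0 (rare) .. 1.0 (most popular)
        red = round(255 * share)
        blue = 255 - red
        color = f"#{red:02x}00{blue:02x}"        # pure blue up to pure red
        lines.append(f'    "{src}" -> "{dst}" [color="{color}"];')
    lines.append("}")
    return "\n".join(lines)


moves = {
    ("/", "/doc.html"): 10,
    ("/doc.html", "/download.html"): 2,
}
dot_text = trails_to_dot(moves)
```

The resulting text can be saved to a file and rendered with the dot command in the same way as the output of --graphviz.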

--tail

When this option is specified Visitors will emulate the Unix command tail -f --max-unchanged-stats=1 -q. You specify the names of the log files to monitor for changes; once new data is appended to any of the specified files, Visitors writes the new data to standard output. This option is useful in conjunction with Stream Mode (--stream). Files can be log-rotated because Visitors in Tail Mode always tries to reopen the file to check for changes.
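
The reason reopening the file makes log rotation safe can be shown with a short Python sketch. It is a simplified illustration, not the Tail Mode implementation: the helper name is invented, rotation is detected only by the file becoming shorter than the last read position, and a real follower would call the helper in a polling loop.

```python
import os
import tempfile


def read_appended(path, offset):
    """Return (new_data, new_offset) for the bytes appended past `offset`."""
    try:
        size = os.path.getsize(path)
    except OSError:
        return b"", offset           # file briefly missing during rotation
    if size < offset:
        offset = 0                   # file was replaced or truncated: start over
    with open(path, "rb") as f:      # reopen by name on every poll
        f.seek(offset)
        data = f.read()
    return data, offset + len(data)


# Demo on a throwaway file in a temporary directory.
workdir = tempfile.mkdtemp()
log = os.path.join(workdir, "access.log")
with open(log, "wb") as f:
    f.write(b"first\n")
chunk1, pos = read_appended(log, 0)
with open(log, "ab") as f:
    f.write(b"second\n")
chunk2, pos = read_appended(log, pos)
with open(log, "wb") as f:           # simulate rotation with a fresh, shorter file
    f.write(b"new\n")
chunk3, pos = read_appended(log, pos)
```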

--time-delta <hours>

If your web server is in a different timezone than most of your visitors or yourself, you will notice a shift in the reports regarding time and days of the week. By default, Visitors generates output using the host's locale. You can use the --time-delta option to adjust the output. Positive values shift times to the right (toward the future) by the given number of hours, negative values shift them to the left (toward the past). In the future this option may support specifying the output timezone directly.
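
The effect of the shift is easiest to see on a hit close to midnight, where a few hours move it to a different day of the week. The Python sketch below only illustrates the arithmetic. The function name is made up and it says nothing about how Visitors stores timestamps internally.

```python
from datetime import datetime, timedelta


def apply_time_delta(ts, delta_hours):
    """Shift a timestamp by a signed number of hours (positive = toward the future)."""
    return ts + timedelta(hours=delta_hours)


hit = datetime(2004, 10, 11, 1, 30)        # Monday, 01:30 on the server
earlier = apply_time_delta(hit, -3)        # lands on Sunday, 22:30
later = apply_time_delta(hit, 3)           # stays on Monday, 04:30
```

With a delta of -3 the hit moves from the Monday column of a weekday report to the Sunday one, which is the kind of correction the option is meant for.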

STREAM MODE DETAILS

The usual way to run Visitors is to specify some options to control the report generation, and the names of the log files. For example, to generate a report from two Apache access log files you can write:

% visitors -A -o html access.log.1 access.log.2 > report.html

Visitors will analyze the log files and output the report. Sometimes it can be more interesting to have web statistics updated continuously, almost in real time, as new data becomes available. To provide this feature Visitors implements a mode called Stream Mode that reads a stream of logs from standard input. The following command line shows how to use it (but check the --stream option documentation for more information).

% tail -f /var/log/apache/access.log | \
visitors --stream -o html -A --update-every 60 --output-file /tmp/report.html

Visitors will incrementally update the statistics as new logs become available and will update the HTML report every 60 seconds. As you can see, in this mode you are required to specify the report file name using the --output-file option, because Visitors needs to overwrite the report to update it. Note that instead of the tail command in the above example it is possible to use Visitors in Tail Mode (an emulation of the tail program):

% visitors --tail /var/log/apache/access.log | \
visitors --stream -o html -A --update-every 60 --output-file /tmp/report.html

It's possible to generate real-time statistics about the last N seconds of web traffic, where N is configurable and can range from a few seconds to one week or more, using the --reset-every option. The following example generates statistics, updated every 30 seconds, about the last hour of traffic:

% visitors --tail /var/log/apache/access.log | \
visitors --stream -o html -A --update-every 30 --reset-every 3600 \
--output-file /tmp/report.html

EXAMPLES

For explicit usage examples check also the Examples section here.


Copyright (C) 2004 Salvatore Sanfilippo -- All Rights Reserved
