Module ReferrerCop
In: referrercop.rb

Constants

APP_NAME = 'ReferrerCop'
APP_VERSION = '1.1.0'
CONFIG_PATHS = [ '.', '~', '/etc', '/usr/local/etc', '/usr/local/share/referrercop', '/usr/share/referrercop', '/usr/etc', ]
    Array of paths that will be searched for the config file if it isn’t specified on the command line.
REGEXPS = {
  :apache_combined => /^\S+ - \S+ \[.+\] "[A-Z]+ \S+(?: \S+")? \d+ [\d-]+ "(.*)" ".*"$/i,
  :awstats_header => /^AWSTATS DATA FILE /,
  :awstats_map => /^BEGIN_MAP.*^END_MAP$/m,
  :awstats_pagerefs_extract => /^BEGIN_PAGEREFS.*?$.*?^(.*?)^END_PAGEREFS$/m,
  :awstats_pagerefs_replace => /^BEGIN_PAGEREFS.*?^END_PAGEREFS$/m,
  :awstats_url => /^(https?:\/\/\S+)/i,
  :text_url => /^(https?:\/\/\S+)/i,
  :address => /^(?:https?:\/\/)?(?:www\d*\.)?(\S+?)\/?$/i,
}
    Common regular expressions used throughout the application.
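
As an illustration of how the :address expression behaves, the sketch below applies it to a couple of URLs. The normalize helper is hypothetical and is not part of the documented API; only the regular expression is taken from REGEXPS.

```ruby
# Copied from REGEXPS[:address] above.
ADDRESS = /^(?:https?:\/\/)?(?:www\d*\.)?(\S+?)\/?$/i

# Hypothetical helper: strips the scheme, a leading "www"/"wwwN" label,
# and one trailing slash. Returns nil when the string does not match.
def normalize(url)
  url[ADDRESS, 1]
end

puts normalize('http://www.example.com/')     # => example.com
puts normalize('http://www2.example.com/a/')  # => example.com/a
puts normalize('example.com')                 # => example.com
```

The capture group is lazy, so the optional trailing slash is left out of the result, while any path before it is kept.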

Public Class methods

Determines the format of input and extracts URLs of the specified type.

type should be either :ham (legitimate URLs) or :spam (referrer spam).

Extracts URLs of the specified type (:ham or :spam) from an Apache combined log file.
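
A minimal sketch of this kind of extraction, assuming the referrer is taken from the first capture group of REGEXPS[:apache_combined]. The extract_referrers helper, the is_spam predicate, and the choice to skip "-" (no referrer sent) are all assumptions made for illustration; the real method's signature is not shown here.

```ruby
# Copied from REGEXPS[:apache_combined] above.
APACHE_COMBINED = /^\S+ - \S+ \[.+\] "[A-Z]+ \S+(?: \S+")? \d+ [\d-]+ "(.*)" ".*"$/i

# Hypothetical helper: returns the referrers of the requested type.
# is_spam is any callable that takes a URL and returns true or false.
def extract_referrers(log, type, is_spam)
  urls = log.each_line.map { |line| line.chomp[APACHE_COMBINED, 1] }.compact
  urls = urls.reject { |url| url == '-' } # assumption: "-" means no referrer
  type == :spam ? urls.select(&is_spam) : urls.reject(&is_spam)
end

SAMPLE_LOG = <<~LOG
  192.0.2.1 - - [10/Oct/2005:13:55:36 -0700] "GET / HTTP/1.1" 200 2326 "http://example.com/" "Mozilla/5.0"
  192.0.2.2 - - [10/Oct/2005:13:56:01 -0700] "GET / HTTP/1.1" 200 2326 "http://spam.example.net/pills" "Mozilla/5.0"
  192.0.2.3 - - [10/Oct/2005:13:57:12 -0700] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
LOG

# Toy predicate standing in for the real spam check.
toy_check = ->(url) { url.include?('spam.example.net') }

puts extract_referrers(SAMPLE_LOG, :spam, toy_check)
puts extract_referrers(SAMPLE_LOG, :ham, toy_check)
```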

Determines the format of input and filters it for referrer spam. The filtered data will be sent to output.

Parses and filters Apache combined log entries from input. The filtered log entries will be sent to output.

Parses and filters AWStats data from input. The filtered data will be sent to output.
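
The following sketch shows one way the :awstats_pagerefs_extract and :awstats_url expressions could be combined to pull referrer URLs out of the PAGEREFS section. The pagerefs_urls helper is hypothetical, and the sample data is a simplified stand-in for a real AWStats data file.

```ruby
# Copied from REGEXPS above.
AWSTATS_PAGEREFS_EXTRACT = /^BEGIN_PAGEREFS.*?$.*?^(.*?)^END_PAGEREFS$/m
AWSTATS_URL = /^(https?:\/\/\S+)/i

# Hypothetical helper: returns the URLs listed between BEGIN_PAGEREFS
# and END_PAGEREFS, or an empty array when the section is missing.
def pagerefs_urls(data)
  section = data[AWSTATS_PAGEREFS_EXTRACT, 1]
  return [] if section.nil?
  section.scan(AWSTATS_URL).flatten
end

# Simplified sample; real data files contain many more sections.
SAMPLE_DATA = <<~DATA
  AWSTATS DATA FILE 6.5
  BEGIN_PAGEREFS 2
  http://example.com/ 3 3
  http://spam.example.net/pills 1 1
  END_PAGEREFS
DATA

puts pagerefs_urls(SAMPLE_DATA)
```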

Parses and filters input as a list of URLs (one per line). The filtered URLs will be sent to output.
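
A rough sketch of line-by-line URL filtering, using REGEXPS[:text_url] to pick out the URL on each line. The filter_urls helper and its is_spam argument are hypothetical, and dropping lines that do not start with a URL is an assumption rather than documented behavior.

```ruby
require 'stringio'

# Copied from REGEXPS[:text_url] above.
TEXT_URL = /^(https?:\/\/\S+)/i

# Hypothetical helper: writes every non-spam URL from input to output.
# input and output are IO-like objects; is_spam is any callable.
def filter_urls(input, output, is_spam)
  input.each_line do |line|
    url = line[TEXT_URL, 1]
    next if url.nil? || is_spam.call(url) # assumption: non-URL lines are dropped
    output.puts(url)
  end
end

input  = StringIO.new("http://example.com/\nhttp://spam.example.net/pills\n")
output = StringIO.new
filter_urls(input, output, ->(url) { url.include?('spam.example.net') })
puts output.string
```

Because the helper only relies on each_line and puts, the same code would work with files or standard input and output.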

Examines input and returns its type. The following input types are supported:

:apache_combined
Apache combined log file.
:awstats
AWStats data file.
:text
Unrecognized format (assumed to be a list of URLs).
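
The detection described above can be sketched as follows. The guess_type helper is hypothetical, and inspecting only the first line of the input is an assumption; the documentation does not say how much of the input is examined.

```ruby
# Copied from REGEXPS above.
APACHE_LINE    = /^\S+ - \S+ \[.+\] "[A-Z]+ \S+(?: \S+")? \d+ [\d-]+ "(.*)" ".*"$/i
AWSTATS_HEADER = /^AWSTATS DATA FILE /

# Hypothetical helper: classifies text by its first line only.
def guess_type(text)
  first_line = text.each_line.first.to_s.chomp
  return :awstats if first_line =~ AWSTATS_HEADER
  return :apache_combined if first_line =~ APACHE_LINE
  :text # anything unrecognized is treated as a list of URLs
end

log_line = '192.0.2.1 - - [10/Oct/2005:13:55:36 -0700] "GET / HTTP/1.1" 200 2326 "http://example.com/" "Mozilla/5.0"'

p guess_type("AWSTATS DATA FILE 6.5\n")  # => :awstats
p guess_type(log_line)                   # => :apache_combined
p guess_type("http://example.com/\n")    # => :text
```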

Loads filename as a blacklist. If filename is nil and a blacklist exists at one of the paths specified in CONFIG_PATHS, that blacklist will be loaded.

Loads a whitelist or blacklist from the specified file. The type argument should be either :blacklist or :whitelist.

Loads filename as a whitelist. If filename is nil and a whitelist exists at one of the paths specified in CONFIG_PATHS, that whitelist will be loaded.
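
The fallback lookup described for the blacklist and whitelist loaders might look roughly like the sketch below. The find_list helper and the file name blacklist.txt are hypothetical; the actual file names searched for in CONFIG_PATHS are not given here.

```ruby
require 'tmpdir'

# Hypothetical helper: returns the first existing file named basename
# in the given directories (in order), or nil when none is found.
def find_list(basename, paths)
  paths.map { |dir| File.join(File.expand_path(dir), basename) }
       .find { |path| File.file?(path) }
end

Dir.mktmpdir do |dir|
  # 'blacklist.txt' is a made-up name used only for this demonstration.
  File.write(File.join(dir, 'blacklist.txt'), "spam.example.net\n")
  puts find_list('blacklist.txt', ['/nonexistent', dir])
end
```

File.expand_path is used so that an entry such as '~' resolves to the user's home directory before the existence check.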

Returns true if the passed URL is referrer spam, false otherwise.
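
One possible shape for such a check is sketched below. It assumes that both lists are arrays of Regexp patterns, that patterns are matched against the address produced by REGEXPS[:address], and that a whitelist match overrides the blacklist. None of these details are confirmed by this page, and the spam? helper is hypothetical.

```ruby
# Copied from REGEXPS[:address] above.
ADDRESS_RE = /^(?:https?:\/\/)?(?:www\d*\.)?(\S+?)\/?$/i

# Hypothetical helper: true when url matches the blacklist and is not
# rescued by the whitelist (assumption: the whitelist takes precedence).
def spam?(url, blacklist, whitelist)
  address = url[ADDRESS_RE, 1] || url
  return false if whitelist.any? { |pattern| pattern.match?(address) }
  blacklist.any? { |pattern| pattern.match?(address) }
end

blacklist = [/pills/i, /casino/i]
whitelist = [/^example\.com/i]

p spam?('http://www.spam.example.net/pills', blacklist, whitelist) # => true
p spam?('http://example.com/pills-review', blacklist, whitelist)   # => false
p spam?('http://example.org/', blacklist, whitelist)               # => false
```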
