PICS Profile Language Working Group - PicsRULZ

 
Editor:
Martin Presler-Marshall, IBM <mpresler@us.ibm.com>
Authors:
Christopher Evans, Microsoft <cevans@microsoft.com>
Alex Hopmann, Microsoft <alexhop@microsoft.com>
Martin Presler-Marshall, IBM <mpresler@us.ibm.com>
Paul Resnick, AT&T <presnick@research.att.com>

Status of this document

This document is a draft of the PicsRULZ filtering language. The language has since been officially renamed PICSRules, and version 1.1 of PICSRules became a W3C Recommendation in December 1997. This draft remains significant because at least one known implementation, IBM Web Traffic Express version 1.0, was based on it. Note that this document has no official status: version 1.0 of PicsRULZ was never accepted as a standard or recommendation by any organization.

A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/pub/WWW/TR/.

Abstract

This document defines a language for writing profiles, which are filtering rules that allow or block access to URLs based on PICS labels that describe those URLs.

Definitions

This specification uses the same words as RFC 1123 for defining the significance of each particular requirement. These words are:
MUST
This word or the adjective "required" means that the item is an absolute requirement of the specification.
SHOULD
This word or the adjective "recommended" means that there may exist valid reasons in particular circumstances to ignore this item, but the full implications should be understood and the case carefully weighed before choosing a different course.
MAY
This word or the adjective "optional" means that this item is truly optional. One vendor may choose to include the item because a particular marketplace requires it or because it enhances the product, for example; another vendor may omit the same item.
An implementation is not compliant if it fails to satisfy one or more of the MUST requirements for the protocols it implements. An implementation that satisfies all the MUST and all the SHOULD requirements for its protocols is said to be "unconditionally compliant"; one that satisfies all the MUST requirements but not all the SHOULD requirements for its protocols is said to be "conditionally compliant."

The PicsRULZ language: examples

Example 1: Forbid access to certain URLs

 1    (PicsRule-1.0
 2        (
 3        failURL ("http://www.grody.com" "http://www.gross.net") 
 4        Filter (Pass "Unless-Prohibited") 
 5        ) 
 6    )
The numbers on the left are line numbers for ease of reference; they aren't part of the actual rule.

This example forbids access to a specific set of URLs, without using any PICS labels. Any URL that begins with either http://www.grody.com or http://www.gross.net will be blocked; any other URLs are considered acceptable.

Example 2: Forbid access based on PICS labels

 1    (PicsRule-1.0
 2        (
 3        serviceinfo (
 4          "http://www.coolness.raleigh.ibm.com/ratings/V1.html"
 5          shortname "Cool"
 6          bureauURL "http://www.ics.raleigh.ibm.com/LabelBureau")
 7        Filter (
 8          Pass "Unless-Prohibited"
 9          Block "((Cool.Coolness <= 3) or (Cool.Graphics >= 3))")
10       )
11   )
This rule checks the rating given to documents according to the "Cool" rating service ("http://www.coolness.raleigh.ibm.com/ratings/V1.html"). Labels will be fetched from the label bureau "http://www.ics.raleigh.ibm.com/LabelBureau". Documents which are not sufficiently cool or have too many graphics will be blocked. Everything else, including unlabeled documents, will be allowed.

Example 3: Allow access based on PICS labels: block everything else

 1    (PicsRule-1.0
 2        (
 3        ServiceInfo (
 4          name "http://www.coolness.raleigh.ibm.com/ratings/V1.html"
 5          shortname "Cool"
 6          bureauURL "http://www.ics.raleigh.ibm.com/labelbureau")
 7        Filter (
 8          Pass "((Cool.Coolness > 3) and (Cool.Graphics < 3))")
 9        )
10   )
This rule also checks the rating given to documents according to the "Cool" rating service. Here, only documents which are sufficiently cool and do not have too many graphics will be allowed. Everything else, including unlabeled documents, will be blocked.

Example 4: A more complex example

 1   (PicsRule-1.0
 2       (
 3       failURL     ("http://www.badnews.com/" "http://www.worsenews.com")
 4       passURL     ("http://www.rated-g.org/")
 5       name        (rulename "Example 4"
 6                    description "Example 4 from PicsRULZ spec; simply shows how PicsRULZ rules are formed. This rule is not actually intended for use by real users.")
 7       source      (sourceURL "http://martinm.raleigh.ibm.com/Rules/Example1.html")
 8       ServiceInfo (name "http://coolness.raleigh.ibm.com/ratings/V1.html"
 9                    shortname "Cool"
10                    bureauURL "http://www.ics.raleigh.ibm.com/labelbureau") 
11       serviceinfo ("http://hotness.raleigh.ibm.com/ratings/V1.html" 
12                    shortname "Hot" 
13                    bureauURL "http://www.ics.raleigh.ibm.com/labelbureau") 
14       Filter      (Pass "(Hot.Hotness > 4) "
15                    Block "((Cool.Coolness < 3) or (Cool.graphics = 1))"
16                   )
17       ) 
18   )

Explanation of example

Line
Explanation
1
Defines this construct as a PICS rule, and gives the version number.
3
Provides a list of URLs which will be automatically blocked without examining any PICS labels. The quoted items are URL prefixes, just as in generic PICS labels.
4
Provides a list of URLs which are automatically passed by the user-agent; symmetric to failURL.
5
Provides a short, human-readable name for this rule. There is no requirement for uniqueness on this name; it's meant as a mnemonic for users when manipulating rules in some sort of a user interface.
6
Provides a longer, human-readable description of this rule. This is meant to be used for an explanation of the semantics of this rule.
7
Specifies "where the rule came from". This URL is intended to point to a human-readable Web page which will give more information about this rule, who created it, why it was created, possible updates, etc.
8-10
Defines the rating service "http://coolness.raleigh.ibm.com/ratings/V1.html", with short name "Cool" and identifies a label bureau from which to fetch its labels.
11-13
Defines the rating service "http://hotness.raleigh.ibm.com/ratings/V1.html", with short name "Hot", and defines a label bureau that serves labels for that service.
14-15
The filtering rule that examines labels. If lines 3 and 4 have not determined whether the URL is OK, these lines are evaluated. A URL must pass the permissions criteria and not fail the prohibitions criteria.
14
The permissions portion of the filtering rule. Only URLs that are labeled as sufficiently "hot" will be permitted. URLs unlabeled on the "hotness" scale will be rejected.
15
The prohibitions portion of the filtering rule. If the URL has passed the permissions criteria, the prohibitions are considered. If a URL is labeled as not cool enough or as having graphics = 1, it will be rejected. If it is not labeled using the "cool" rating service, the item is accepted.

Full syntax

Let us first consider the basic underpinnings of a PicsRULZ rule, then the general format of the rule, and finally the format of the filter clause.

Basic structure

PicsRULZ rules are based on a limited form of an S-expression, namely a parenthesized attribute-value pair. The parenthesized attribute-value pairs are allowed to be nested. They also contain a concept known as a "primary value". The primary value for any attribute-value pair is a value whose name is defined by the context; it can be thought of as the fundamental value associated with a given clause. An attribute-value pair MUST contain the primary key-value pair; it MAY contain additional key-value pairs. The general grammar for these limited S-expressions is:
attrvalpair:: attribute whitespace value | primaryvalue
attribute:: alphanumstr
value:: quotedstring | '(' attrvalpair (whitespace attrvalpair)* ')' 
primaryvalue:: quotedstring+ | '(' attrvalpair+ ')'
quotedstring:: ('"' notquotechars '"') | ("'" notquotechars "'")
alphanumchar:: alphanum
whitespace:: ' ' | '\t' | '\r' | '\n' 
alphanum:: '0' - '9' | 'A' - 'Z' | 'a' - 'z'
notquotechars :: any ASCII characters in the range 32-127 except ' and "
Note that all attribute names are case insensitive, while the case of values MUST be preserved. However, individual clauses and/or attributes MAY define their values to be case-insensitive.
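
As an illustration of how little machinery the limited S-expression form needs, the following is a minimal Python sketch of a tokenizer and a nesting parser. It is not part of the specification: the names TOKEN, tokenize, and parse are hypothetical, the sketch does not resolve primary values or attribute names, and it assumes comments have already been removed.

```python
import re

# Tokens: parentheses, double- or single-quoted strings, and bare words.
TOKEN = re.compile(r"""\(|\)|"[^"']*"|'[^"']*'|[^\s()"']+""")

def tokenize(text):
    """Split rule text into tokens; whitespace only separates tokens."""
    return TOKEN.findall(text)

def parse(tokens):
    """Turn a token list into nested Python lists, one list per '(...)' group."""
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            group = stack.pop()
            stack[-1].append(group)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return stack[0]

rule = '(PicsRule-1.0 (failURL ("http://www.grody.com") Filter (Pass "Unless-Prohibited")))'
tree = parse(tokenize(rule))
```

Quoted strings are kept with their quotes so that a later stage can tell values from attribute names; an implementation would compare attribute names case-insensitively and preserve the case of values.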

Comments

The PicsRULZ syntax, which will be presented below, has a facility for descriptive text which can be shown to a user, in addition to various statements which influence the behavior of user-agents. However, it is frequently useful to have "source-level" comments: comments which are intended for individuals writing and/or editing rules, but which are not intended for display to end users. This is analogous to placing comments in source code; in an effort to encourage rule authors to write clear rules, we provide a facility for placing comments into PicsRULZ rules.

The syntax of a comment is:

comment:: '{' comment-text* '}'
comment-text:: any octets except '}'
Note that a result of the above syntax is that comments may not be nested.

Comments may appear anywhere in PicsRULZ rules. A user-agent MAY remove the comments during lexical analysis of the rule; text within comments MUST NOT influence the interpretation of the rule in any manner. Note also that user-agents which generate or export PicsRULZ rules MAY choose to strip out comments before generating, exporting, or transmitting them.
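
A user-agent that removes comments during lexical analysis might do so along the lines of the Python sketch below. The function name strip_comments is hypothetical. The sketch also makes one assumption the text above does not settle: braces that occur inside a quoted string are treated as ordinary characters rather than as comment delimiters.

```python
def strip_comments(text):
    """Remove '{...}' comments, leaving quoted strings untouched (an assumption)."""
    out = []
    quote = None        # the quote character we are currently inside, if any
    in_comment = False
    for ch in text:
        if in_comment:
            if ch == "}":           # the first '}' ends the comment: no nesting
                in_comment = False
        elif quote:
            out.append(ch)
            if ch == quote:
                quote = None
        elif ch == "{":
            in_comment = True
        else:
            out.append(ch)
            if ch in "\"'":
                quote = ch
    if in_comment:
        raise ValueError("unterminated comment")
    return "".join(out)
```

Because comments do not nest, the input `x{a{b}c}` leaves the trailing `c}` in place, which a later parsing stage would then have to deal with.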

PicsRULZ Rules

The general format of a PicsRULZ rule, in modified BNF, is as follows:
rule :: '(' 'PicsRule-' verMajor '.' verMinor rule-body ')'
verMajor :: integer
verMinor :: integer
rule-body :: '(' rule-clauses ')'
rule-clauses :: rule-clause+
rule-clause :: filter-clause |
               fail-clause |
               pass-clause |
               name-clause |
               source-clause |
               service-clause |
               javelin-extension-clause |
               opt-extension-clause |
               req-extension-clause |
               extension-clause
filter-clause :: 'Filter' '(' attrvalpair+ ')'
fail-clause :: 'failURL' '(' attrvalpair+ ')'
url-list :: quotedURL+
pass-clause :: 'passURL' '(' attrvalpair+ ')'
name-clause :: 'name' '(' attrvalpair+ ')'
source-clause :: 'source' '(' attrvalpair+ ')'
service-clause :: 'serviceinfo' '(' attrvalpair+ ')'
javelin-extension-clause :: 'ibm-javelin-extensions' '(' attrvalpair+ ')'
opt-extension-clause :: 'optextension' '(' attrvalpair+ ')'
req-extension-clause :: 'reqextension' '(' attrvalpair+ ')'
extension-clause :: extension-clause-name '(' attrvalpair+ ')'

Semantics & details of individual clauses

Filter
This clause specifies the expression to be evaluated to determine if this rule will block access to a given URL. The syntax and semantics of the values associated with these attributes are discussed below.
There are 2 attributes, pass and block, defined for the Filter clause. The primary attribute is pass, and its default value is "Unless-Prohibited". The block attribute has no default value. The values for both of these attributes, if present, MUST be quotedstrings; in the syntax given below for expressions, it is assumed that the quotes have been removed from the strings. These values must be quotedstrings because their syntax does not fall into the attribute-value syntax that is used for the rest of PicsRULZ rules.
FailURL
Rules may contain a list of URL prefixes which are to be explicitly blocked, without even checking for labels on those documents. The FailURL clause gives a list of quoted URL prefixes to block; any URL under consideration which begins with any of the prefixes in the url-list associated with a FailURL clause will be blocked. This may seem to be outside the attribute-value syntax laid down for general clauses, but it is a simple extension of it: the only defined attribute for FailURL is URL, and its value is a URL prefix to be blocked. The URL attribute is allowed to occur multiple times within a FailURL clause.
PassURL
Rules may contain a list of URL prefixes which are to be explicitly allowed, without even checking for labels on those documents. The PassURL clause gives a list of quoted URL prefixes to allow; any URL under consideration which begins with any of the prefixes in the url-list associated with a PassURL clause will be allowed. This may seem to be outside the attribute-value syntax laid down for general clauses, but it is a simple extension of it: the only defined attribute for PassURL is URL, and its value is a URL prefix to be allowed. The URL attribute is allowed to occur multiple times within a PassURL clause.
name
This clause provides a short, human-readable name for the rule being presented. It is intended that these names could be shown on a user-agent's user interface, to show a human operator which rules are loaded, active/inactive, etc.
There are 2 attributes, rulename and description, defined for the name clause. Rulename is the primary attribute for a name clause, and its value is the human-readable name of this rule. The value for description is a more-detailed analogue of rulename; it provides a human-readable description of the rule being presented. The description is intended for display in a user-agent's user interface, to allow a human operator to get some understanding of who created the rule, its semantics, etc. The exact contents of the value associated with description are left up to the rule author.
Note that this mechanism does not provide a transparent method for supporting multiple national languages. This is intentionally not being addressed in this version of PicsRULZ. If you wish to produce PicsRULZ-1.0 rules in multiple languages, you will have to produce multiple copies of the rule - one for each target language. We expect that this will be addressed in a cleaner way in future versions of PicsRULZ.
source
This clause gives information about where the rule came from. There are 4 attributes defined for source: sourceURL, creationTool, author, and lastModified. The primary attribute is sourceURL.
The sourceURL attribute gives the "rule's URL". It provides a location where a human user of this rule can go to get more information about the rule and/or its creator. The value of this attribute should be a URL where a user can find a human-readable description of this rule.
The creationTool attribute gives the ability to identify the tool, if any, that was used to create this rule. This is analogous to the User-Agent string in HTTP. The value of the creationTool is a quoted string. The string should be in the format toolname/version, as in "Cool-PICS-Rule-Editor/1.04".
The author attribute gives the e-mail address of the individual or organization who produced this rule. The value associated with this attribute should be a quoted e-mail address.
The lastModified attribute gives the date and time that this rule was last modified. The value must be a quoted-ISO-date, as defined in the PICS-1.1 Label Syntax and Communication Protocols.
serviceinfo
This clause specifies information about a rating service. There are currently 5 attributes defined for serviceinfo: name, shortname, bureauURL, ratfile, and defaultValue. The primary attribute is name.
The name attribute is the servicename URL of a rating service. Its value specifies the name of the service which is being described.
The shortname attribute gives a shorter name to this rating service. The shortname will be used in writing filter clauses; its value is a string. For example, for the rating service "http://coolness.raleigh.ibm.com/ratings/V1.html", the shortname might be "Cool".
The bureauURL attribute specifies the URL of a label bureau that has ratings from this rating service. The value for this attribute is the URL of the label bureau; i.e., the URL to which label-bureau requests should be sent. This attribute SHOULD be expected by user-agents to occur multiple times.
The ratfile attribute presents the machine-readable rating system description (also known as "RAT file") that is used by this rating service. This may be specified in one of two ways: the value may be a quoted string which contains the entire machine-readable service description, or it may be of the syntax "[RATFile-URL]", where RATFile-URL is the URL of the rating system description; a user-agent SHOULD assume that dereferencing this URL will produce a document of type application/pics-service. There is no default value for the ratfile attribute. If the quoted string contains the machine-readable service description, then it SHOULD be URL-encoded to escape quotes; in other words, double-quote characters should be replaced with the string "%22".
The defaultValue attribute specifies a "default" value to be used for categories in this rating service. The value for the defaultValue attribute is called the "default value" for categories in this rating service. The default value MUST NOT be used for documents for which no label is available. The default value is only used when one or more labels are available for a document, but the label(s) do not contain a value for a given category. For example, if a rule calls for a "coolness" value for a document, and the document includes an imbedded label which only gives a value for the "graphics" category, then the defaultValue SHOULD be used instead. The defaultValue construct has been created to reflect the fact that there are existing rating services as of May 1997 which assume that a default value will be used for omitted categories.
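
The distinction between "no label at all" and "a label that omits the category" can be sketched in Python as follows. The function name category_value and the representation of a label as a dictionary from category names to values are assumptions made for illustration only; the sketch also simply returns the first value it finds.

```python
def category_value(labels, category, default_value=None):
    """Return the value to test for `category`, or None if nothing applies.

    `labels` is a list of dicts mapping category names to values, one dict per
    label available from the rating service (a hypothetical representation).
    """
    if not labels:
        return None            # no label at all: the default MUST NOT be used
    for label in labels:
        if category in label:
            return label[category]
    return default_value       # labeled, but this category was omitted

# A document labeled only for "graphics" falls back to the default for "coolness".
value = category_value([{"graphics": 2}], "coolness", default_value=0)
```
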
The IBM Javelin Proxy server version 1.0 defines one extension attribute for the serviceinfo clause: available-with-content. Allowed values are "yes" and "no". This is used as a hint to the server to optimize label retrieval. If the labels from a rating service are frequently available with the content (indicated by a "yes" value for this attribute), then a request to a label bureau (if any) will be delayed until after the content is available and checked for labels; if the content contained a label from this service, then no request will be sent to the label bureau. The default value for available-with-content is "no".
ibm-javelin-extensions
This clause gives a set of extensions to PicsRULZ 1.0 that are implemented by the IBM Javelin Proxy server version 1.0. There are 8 attributes defined for ibm-javelin-extensions: use-expired, group-file, applies-to, active-days, start-time, end-time, active, and filter-local.
The use-expired attribute tells the rule evaluation engine whether it can use expired labels to make a filtering decision. Allowed values are "yes" and "no". A value of "yes" indicates that the rule evaluation engine can use labels even if they have expired, while a value of "no" (the default) indicates that expired labels are to be discarded by the rule evaluation engine.
The group-file attribute specifies a group file, in standard ICS format, to use for resolving users into groups. See the ICS 4.2 documentation for a description of group files. The value to the group-file attribute is the fully-qualified path of the group file in the local filesystem. The group file must be stored locally.
The applies-to attribute indicates a set of user IDs, group IDs, IP addresses, and/or hostnames of requesters that this rule applies to. The default if this attribute is not specified is that the rule applies to any request that comes through the proxy (but see the active attribute below). The value for this attribute is specified in the same syntax as is used for authentication rules in ICS.
The active-days, start-time, and end-time directives allow a rule to be active for only certain time periods. An example might be a rule in a corporate proxy server that applies during normal working hours, or that only applies on weekends. The default values for these directives are that the rule is active every day of the week, at all times during the day.
The active-days directive gives a vector of days of the week for which the rule is active. The vector MUST be exactly 7 characters long. Position 0 applies to Sunday, position 1 to Monday, etc. A value of 0 in a given position indicates that the rule is not active on that day of the week, and a value of 1 indicates that it is active on that day of the week.
The start-time directive specifies the time of day that the rule becomes active. The time is specified in 24-hour format, using HH:MM:SS as the format.
The end-time directive specifies the time of day that the rule becomes inactive. The time is specified in 24-hour format, using HH:MM:SS as the format.
The active directive can be used to deactivate a rule. Allowed values are "yes" and "no". The default is "yes". If a value of "no" is specified, the rule is not active, and the server will not select this rule for any requests. A value of "yes" indicates that the rule is eligible to be used for incoming requests.
The filter-local directive indicates whether the proxy server should filter requests for local files (requests that will be satisfied from the local filesystem, but not from a proxy's cache). Allowed values are "yes" and "no"; the default is "yes". If the value is "yes", the rule will automatically pass all requests for local resources, while if the value is "no", the rule will be applied normally for requests for local resources.
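
The interaction of the active, active-days, start-time, and end-time directives can be sketched in Python as shown below. This is an illustration, not a description of the Javelin implementation: the function name rule_is_active is hypothetical, the defaults "00:00:00" and "23:59:59" stand in for "at all times during the day", the end time is treated as inclusive, and windows that cross midnight are not handled.

```python
from datetime import datetime, time

def rule_is_active(now, active="yes", active_days="1111111",
                   start_time="00:00:00", end_time="23:59:59"):
    """Decide whether a rule applies at the moment `now` (a datetime)."""
    if active != "yes":
        return False
    if len(active_days) != 7 or set(active_days) - {"0", "1"}:
        raise ValueError("active-days must be exactly 7 characters of 0 or 1")
    # Python numbers Monday as 0; the vector puts Sunday at position 0.
    position = (now.weekday() + 1) % 7
    if active_days[position] != "1":
        return False
    start = time.fromisoformat(start_time)
    end = time.fromisoformat(end_time)
    return start <= now.time() <= end

# A rule for normal working hours: Monday to Friday, 09:00 to 17:00.
working_hours = dict(active_days="0111110", start_time="09:00:00", end_time="17:00:00")
monday_morning = rule_is_active(datetime(1997, 5, 5, 10, 0), **working_hours)
```
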
opt-extension-clause
opt-extension-clause and req-extension-clause are the extension mechanisms in PicsRULZ; they are modeled after the extension mechanism in the PICS label format. More information on the extension mechanism is given below.

The optextension clause has only one defined attribute: extension-name. The value of the extension-name attribute specifies the name of an extension that will be used by this rule. The name of the extension is the quotedURL; this URL should point to a human-readable description of this extension. URLs are used for extension names to ensure uniqueness without requiring a central naming body. If a user-agent receives a rule which contains an optextension which it does not recognize, the user-agent should process the rule, ignoring any clauses it does not recognize. This means that any optional extensions MUST use the S-expression syntax given above, so as to not break existing parsers.
Note that declaring the use of an optional extension may appear to be redundant, as unrecognized attribute-value pairs are discarded by user-agents. The purpose of the optextension construct is for use as a documentation mechanism. User-agents MAY also display the names of optional extensions used by a rule, asking the user for confirmation, before making use of a rule.
req-extension-clause
This clause has only one defined attribute: extension-name. The value of the extension-name attribute specifies the name of an extension that will be used by this rule. The name of the extension is the quotedURL; this URL should point to a human-readable description of this extension. URLs are used for extension names to ensure uniqueness without requiring a central naming body.
If a user-agent receives a rule which contains a reqextension which it does not recognize, the user-agent should cease processing the rule and discard it.
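
The different treatment of unrecognized optional and required extensions might be sketched in Python like this. The function name check_extensions, the (kind, name) pair representation, and the example.com URLs are all hypothetical.

```python
def check_extensions(declared, known):
    """Classify a rule by the extensions it declares.

    `declared` is a list of (kind, name) pairs, kind being "optextension" or
    "reqextension"; `known` is the set of extension URLs this agent implements.
    Returns (usable, unknown_optional).
    """
    unknown_optional = []
    for kind, name in declared:
        if name in known:
            continue
        if kind.lower() == "reqextension":
            return False, []           # unknown required extension: discard the rule
        unknown_optional.append(name)  # unknown optional extension: ignore its clauses
    return True, unknown_optional

known = {"http://www.example.com/ext/signatures"}   # hypothetical extension URL
usable, ignored = check_extensions(
    [("optextension", "http://www.example.com/ext/other")], known)
```

The list of unknown optional extensions is returned rather than dropped so that a user-agent could show it to the user before making use of the rule.
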
verMajor
The major version number of PicsRULZ which this rule conforms to. It is expected that this version number will be '1' when this proposal is completed.
verMinor
The minor version number of PicsRULZ which this rule conforms to. It is expected that this version number will be '0' when this proposal is completed.

Restrictions

The filter, name, and source clauses MUST NOT appear more than once in a PicsRULZ rule. The FailURL, PassURL, optextension, reqextension, and serviceinfo clauses MAY appear more than once in a PicsRULZ rule. If FailURL appears multiple times in a rule, the URL lists MUST be combined into a single list. The same applies for multiple PassURL clauses.

PicsRULZ Filter Clauses

We define two attributes, Pass and Block, for a Filter clause. Intuitively, if the URL is to be allowed, the available labels must satisfy the permission condition and no available label can satisfy the prohibition condition. In this section we define the syntax and semantics of a Filter clause. The value for the Pass attribute is defined by the nonterminal PassExpression, and the value for the Block attribute is defined by the nonterminal BlockExpression:
PassExpression :: "Unless-Prohibited" | expression
BlockExpression :: expression
expression :: simple-expression | or-expression | and-expression
simple-expression :: '(' service '.' category op constant ')'
service :: any shortname defined in a serviceinfo clause within this rule
category :: any transmit-name for a category defined by the rating-system referred to by the matching service
op :: '>' | '<' | '=' | '!=' | '>=' | '=>' | '<=' | '=<' | 'all-equal' | 'none-equal' | 'includes'
constant :: [sign] alphanumchar ['.' alphanumchar]
or-expression :: '(' expression or expression [or expression]* ')'
or :: 'or' | '||'
and-expression :: '(' expression and expression [and expression]* ')'
and :: 'and' | '&&'
sign :: '-'
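
A small recursive-descent parser for these expressions, written as a hedged Python sketch, is shown below. The names TOKEN, CONNECTIVES, parse_expression, and parse_filter_value are hypothetical, the tuple-based tree is an arbitrary choice, and handling of malformed input is minimal. The sketch accepts two or more operands in a compound expression, as the examples earlier in this document do.

```python
import re

# Parentheses, symbolic operators and connectives, then bare words.
TOKEN = re.compile(r"\(|\)|\|\||&&|!=|>=|=>|<=|=<|[<>=]|[^\s()<>=!|&]+")
CONNECTIVES = {"or": "or", "||": "or", "and": "and", "&&": "and"}

def parse_expression(tokens, pos=0):
    """Parse one parenthesized expression; return (tree, next_position)."""
    if tokens[pos] != "(":
        raise ValueError("expected '('")
    pos += 1
    if tokens[pos] != "(":
        # simple-expression: service.category op constant ')'
        target, op, constant, close = tokens[pos:pos + 4]
        if close != ")":
            raise ValueError("expected ')'")
        service, category = target.split(".", 1)
        return ("simple", service, category, op, constant), pos + 4
    # or-expression / and-expression: operands joined by one kind of connective
    operands, kind = [], None
    while True:
        tree, pos = parse_expression(tokens, pos)
        operands.append(tree)
        if tokens[pos] == ")":
            break
        word = CONNECTIVES.get(tokens[pos].lower())
        if word is None or kind not in (None, word):
            raise ValueError("expected one kind of connective between operands")
        kind = word
        pos += 1
    if kind is None:
        raise ValueError("a compound expression needs at least two operands")
    return (kind, operands), pos + 1

def parse_filter_value(text):
    """Parse the unquoted value of a Pass or Block attribute into a tree."""
    tokens = TOKEN.findall(text)
    tree, pos = parse_expression(tokens)
    if pos != len(tokens):
        raise ValueError("unexpected text after the expression")
    return tree

tree = parse_filter_value("((Cool.Coolness <= 3) or (Cool.Graphics >= 3))")
```
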
When evaluating a clause, the user-agent may use zero, one, or more labels from a given rating service (for more details, see the control flow section). A simple-expression evaluates to true if any available label from the specified service satisfies the condition of the expression.

We must deal with the situation where a simple-expression calls for a value from a label, and either no label is available, or the available labels do not have values for the specified category. In those situations, the simple-expression evaluates to false. This leads to the expected semantics: if a simple-expression has no associated label available, that expression cannot contribute evidence toward either permitting or prohibiting the URL.

If, for example, there is only a pass expression, which is just a single simple-expression, no label for the pass expression means that the item is not permitted (effectively, block unlabeled). If, instead, there is only a block expression, which is just a simple-expression, no label means that the item is not prohibited (allow unlabeled). These intuitive semantics hold up even when the expressions are more complicated and even when both a pass and a block expression are provided. We have explicitly decided to omit the NOT operator because it would destroy these intuitive semantics and make the handling of unlabeled situations very difficult for people to think about.
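
The treatment of unlabeled documents described above can be made concrete with a short Python sketch. It covers only numeric, single-valued categories and filters made of a single simple-expression; the names NUMERIC_OPS, simple_expression, and passes are hypothetical, as is the representation of a label as a dictionary from category names to numbers.

```python
import operator

NUMERIC_OPS = {
    ">": operator.gt, "<": operator.lt, "=": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "=>": operator.ge, "<=": operator.le, "=<": operator.le,
}

def simple_expression(labels, category, op, constant):
    """True if any available label has a value for `category` satisfying the test."""
    compare = NUMERIC_OPS[op]
    return any(category in label and compare(label[category], constant)
               for label in labels)

def passes(labels, pass_test=None, block_test=None):
    """Combine pass and block tests, each a (category, op, constant) triple or None.

    pass_test=None stands for "Unless-Prohibited"; block_test=None means no
    block expression was given.
    """
    permitted = True if pass_test is None else simple_expression(labels, *pass_test)
    prohibited = False if block_test is None else simple_expression(labels, *block_test)
    return permitted and not prohibited

# With no labels, a pass-only filter blocks and a block-only filter allows.
blocked_when_unlabeled = not passes([], pass_test=("Coolness", ">", 3))
allowed_when_unlabeled = passes([], block_test=("Coolness", "<", 3))
```

Because a missing value always makes a simple-expression false, neither branch needs a special case for unlabeled documents.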

Simple-expressions, as defined above, can use any types of operators on any types of data. However, this is unimplementable and has debatable semantics. Here we clarify the situation:

  • All of the operators defined in the op clause are valid on numeric, single-valued categories. The semantics of each of the operators should be obvious by inspection (except, perhaps, 'all-equal', 'none-equal', and 'includes' - but those will be explained in a moment); the result of applying the operator will be a boolean value, true or false.
  • The operators 'all-equal', 'none-equal', and 'includes' are only meaningful for categories which have the multivalue true attribute set. User-agents MAY choose whatever interpretation they wish for categories which don't have that attribute.
  • For categories which have the multivalue true attribute set, the only allowed operators are '=', '!=', 'all-equal', 'none-equal', and 'includes'. These operators are defined as follows:
    '='
    The '=' operator is defined to be true if the constant in the expression matches any of the values for this category in the label that's being used to evaluate this clause, and false otherwise.
    '!='
    The '!=' operator is defined to be true if it matches none of the values for this category in the label that's being used to evaluate this clause, and false otherwise.
    'all-equal'
    The 'all-equal' operator is defined to be true if the constant in the expression matches all the values given for this category in the label that's being used to evaluate this clause, and false otherwise.
    'none-equal'
    The 'none-equal' operator is identical to the '!=' operator.
    'includes'
    The 'includes' operator is identical to the '=' operator.
  • The only operators defined on string-valued categories are '=' and '!='. The operator '=' returns true if two strings are equal by octet comparison, and false otherwise. '!=' returns the opposite: false if two strings are equal by octet comparison, and true otherwise.
  • A URL passes this filter if the pass-expression evaluates to true and the block-expression evaluates to false. If the pass-expression is "Unless-Prohibited", it evaluates true, allowing everything that is not prohibited. If no block-expression is provided, its value defaults to false, allowing everything that is explicitly permitted. Note that if no pass-expression is given, its default value was defined above as "Unless-Prohibited".
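
The operator definitions for multi-valued categories given in the list above can be summarized in a brief Python sketch. The function name multivalue_test is hypothetical, and the values of a category are assumed to be available as a Python list.

```python
def multivalue_test(values, op, constant):
    """Apply a multi-value operator to the list of values a label gives a category."""
    if not values:
        return False                       # no value: cannot contribute evidence
    if op in ("=", "includes"):
        return constant in values          # matches any of the values
    if op in ("!=", "none-equal"):
        return constant not in values      # matches none of the values
    if op == "all-equal":
        return all(v == constant for v in values)
    raise ValueError(f"operator {op!r} is not defined for multi-valued categories")

includes_two = multivalue_test([1, 2, 3], "includes", 2)
all_two = multivalue_test([1, 2, 3], "all-equal", 2)
```
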

    Order of operations

    The clauses in a PicsRULZ rule MUST be evaluated in the following order:
    1. Compare the URL against the URLs listed in any FailURL clauses in the rule.
    2. Compare the URL against the URLs listed in any PassURL clauses in the rule.
    3. Evaluate the expression given in the Filter clause.
    The outcome of this is that FailURL has highest precedence; if a URL matches both a FailURL and a PassURL, then the document will be blocked.
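
The evaluation order above can be sketched in a few lines of Python. The function name decide is hypothetical, and filter_passes is a stand-in callable for whatever evaluates the Filter clause; it is only invoked when neither URL list matches.

```python
def decide(url, fail_prefixes, pass_prefixes, filter_passes):
    """Return True to allow `url`, False to block it."""
    if any(url.startswith(prefix) for prefix in fail_prefixes):
        return False                  # step 1: FailURL has highest precedence
    if any(url.startswith(prefix) for prefix in pass_prefixes):
        return True                   # step 2: PassURL
    return filter_passes()            # step 3: the Filter clause

# URL lists taken from Example 4.
fail_list = ["http://www.badnews.com/", "http://www.worsenews.com"]
pass_list = ["http://www.rated-g.org/"]
allowed = decide("http://www.rated-g.org/index.html", fail_list, pass_list, lambda: False)
```
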

    Control Flow

    The rule syntax and semantics given above define what can be placed in a rule, and the meaning of those constructs. In order to process these rules, a user-agent SHOULD adopt an internal data-flow as described here; this will ease the implementation of expected extensions to PicsRULZ when they become formalized.
    The standard user-agent which processes PicsRULZ rules SHOULD have four significant components: the rule parser, the label source, label validators, and a rule evaluator. Their roles are:
    Rule parser
    Parses PicsRULZ rules, possibly loaded from saved configuration information or over a network. In user-agents which may store multiple rules, such as proxy servers, this component is also responsible for deciding which rule to use for each specific request; subsequent modules assume that only one rule is being applied.
    Label source
    This component is responsible for finding labels. It takes as input information from the rule being evaluated; it MAY use this information to contact label bureaus for labels. It MAY also find labels imbedded in HTML documents or transmitted in datastreams (HTTP, NNTP) which support label transmission. The output of this component is the set of labels which apply to the document in question. Note that as there are multiple potential label sources, the label source component may produce more than one label from a given service for a given document. However, the label source component is responsible for choosing the "most applicable" label (i.e., picking specific labels over generic ones, and picking the most specific generic label if multiple generic labels are available). This component will also have to understand "default" labels should that proposal be adopted.
    The label source will need to specify to the other components not only the label itself, but also how the label was obtained (imbedded in content, from a label bureau, etc).
    Label validators
    Label validators are responsible for determining which labels are acceptable. No validators are defined in the PicsRULZ language, but we expect extensions to be defined. For example, a label validator may be defined which rejects labels that lack an authorized digital signature. Another possible validator would examine whether a label's author has been vouched for by a trusted third party.
    Rule evaluator
    The rule evaluator takes as input the labels that pass any validators, and the Filter expression that the rule parser found in the rule. It evaluates the permission and prohibition expressions and produces a pass/fail decision. Each simple-expression is evaluated by checking if any valid label satisfies the expression. As described above, any simple-expression for which no valid label applies evaluates to "false".
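
    The division of labor among the label source, the label validators, and the rule evaluator can be illustrated with a minimal Python sketch. It is not part of the PicsRULZ language. The Label fields, the "signed" flag, the require_signature validator, and the set of comparison operators are all assumptions made for illustration, as is the reading of "most specific generic label" as the one with the longest matching URL prefix.

```python
import operator
from dataclasses import dataclass, field


@dataclass
class Label:
    service: str            # short name of the rating service, e.g. "Cool"
    for_url: str            # URL the label describes (a URL prefix if generic)
    generic: bool           # True: assumed to cover every URL starting with for_url
    source: str             # how the label was obtained: "embedded", "bureau", ...
    ratings: dict = field(default_factory=dict)
    signed: bool = False    # hypothetical flag used by the example validator


def most_applicable(labels, doc_url):
    """Label source: pick one label out of several from the same service.

    Specific labels win over generic ones; among generic labels the longest
    matching prefix wins (an assumption about what "most specific" means).
    """
    def applies(label):
        if label.generic:
            return doc_url.startswith(label.for_url)
        return label.for_url == doc_url

    candidates = [label for label in labels if applies(label)]
    if not candidates:
        return None
    return max(candidates, key=lambda l: (not l.generic, len(l.for_url)))


def require_signature(label):
    """Example validator (hypothetical): accept only labels marked as signed."""
    return label.signed


def validate(labels, validators):
    """Label validators: keep only the labels that every validator accepts."""
    return [label for label in labels if all(check(label) for check in validators)]


# Comparison operators assumed for this sketch.
OPERATORS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}


def eval_simple(service, category, op, constant, valid_labels):
    """Rule evaluator: a simple-expression is true if any valid label
    satisfies it, and false if no valid label applies."""
    compare = OPERATORS[op]
    return any(
        label.service == service
        and category in label.ratings
        and compare(label.ratings[category], constant)
        for label in valid_labels
    )


labels = [
    Label("Cool", "http://example.com/", True, "bureau", {"Coolness": 5}, signed=True),
    Label("Cool", "http://example.com/a.html", False, "embedded", {"Coolness": 2}, signed=True),
]
chosen = most_applicable(labels, "http://example.com/a.html")
valid = validate([chosen], [require_signature])
print(chosen.source)                                    # embedded
print(eval_simple("Cool", "Coolness", "<", 3, valid))   # True
print(eval_simple("Cool", "Graphics", "<", 3, valid))   # False: no label rates Graphics
```

    The sketch stops at simple-expressions. Combining them into a full Filter expression and a pass/fail decision follows the rules given earlier in this document.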

    Extension mechanism

    Any network protocol needs a mechanism for extension. Here we present the extension mechanism provided with PicsRULZ.

    Background

    PicsRULZ is structured as a nested set of attribute-value pairs. Unrecognized attribute keywords are ignored by user-agents, and the associated values can be discarded by a PicsRULZ parser, as all values will be in a known syntax. The basic mechanism for extending PicsRULZ is to define new clauses and/or attribute-value pairs, their context, and their meaning. All new attribute-value pairs will be associated with a named extension. Extension names are kept unique by building them on DNS names; the attribute-value pairs used by an extension are not required to have globally-unique names.

    Details

    To define a new extension:
    1. Determine if the extension is optional or required. Optional extensions may be ignored by user-agents which don't recognize the extension. In order for an extension to be "optional", the semantics of a rule which uses this extension must not be modified if the extension is ignored.
    2. Name the extension. Extensions must have a unique URL assigned to them. The URL should point to a human-readable document which explains the extension in detail. The URL must be in a domain controlled by the extension's creator, in order to ensure uniqueness of extension names.
    3. If an extension needs new clauses, define the extension-clause-name that will be used for each new clause defined by this extension. Extensions SHOULD define no more than one new clause.
    4. Determine the new attribute-value pairs that this extension will define, and which clauses those attribute-value pairs may appear in.
    5. Define the semantics of each new attribute-value pair defined by this extension. In particular, if this extension overrides existing parts of PicsRULZ, then this behavior must be spelled out exactly.

    Here's a simple example of a PicsRULZ rule that uses an optional extension:

     
     1    (PicsRule-1.0
     2        (
     3        servicename (
     4          "http://www.coolness.raleigh.ibm.com/ratings/V1.html"
     5          shortname "Cool"
     6          bureauURL "http://www.ics.raleigh.ibm.com/Coolness")
     7        Filter (Pass '((Cool.Coolness < 3) or (Cool.Graphics < 3))' )
     8        optextension ("http://www.ics.raleigh.ibm.com/ICS/ICS42-extensions.html")
     9        ibm-ics-extension (use-expired "YES" group-file "/etc/ics.grp") 
    10        ) 
    11    )
    This example makes use of an optional extension named "http://www.ics.raleigh.ibm.com/ICS/ICS42-extensions.html". That extension presumably defines the keyword 'ibm-ics-extension', which is given on line 9. User-agents which don't understand this extension can simply ignore the ibm-ics-extension clause and its attribute-value pairs.
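
    As a rough illustration of how a parser can skip a clause it does not recognize, the following Python sketch reads the rule above (with the listing's line numbers removed) into nested lists and keeps only the attribute-value pairs whose keywords it knows. The function names, the set of known keywords, and the exact-match keyword comparison are assumptions of this sketch, and it does not validate malformed input.

```python
import re

# One token: a parenthesis, a double- or single-quoted string, or a bare word.
TOKEN = re.compile(r"""\s*(\(|\)|"[^"]*"|'[^']*'|[^\s()"']+)""")


def tokenize(text):
    """Split rule text into a flat list of tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Consume one value from the token list: a nested list or an atom."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # discard the closing parenthesis
        return items
    if token[0] in "\"'":
        return token[1:-1]  # quoted string without its quotes
    return token


def known_pairs(clause, known):
    """Pair up attribute, value, attribute, value, ... and drop unknown attributes.

    An unknown attribute's value is skipped whole, however deeply it nests,
    because parse() has already turned it into a single list or atom.
    """
    pairs = zip(clause[0::2], clause[1::2])
    return [(attr, value) for attr, value in pairs if attr in known]


RULE = '''
(PicsRule-1.0
    (
    servicename (
      "http://www.coolness.raleigh.ibm.com/ratings/V1.html"
      shortname "Cool"
      bureauURL "http://www.ics.raleigh.ibm.com/Coolness")
    Filter (Pass '((Cool.Coolness < 3) or (Cool.Graphics < 3))' )
    optextension ("http://www.ics.raleigh.ibm.com/ICS/ICS42-extensions.html")
    ibm-ics-extension (use-expired "YES" group-file "/etc/ics.grp")
    )
)
'''

# Keywords this hypothetical user-agent understands.
KNOWN = {"servicename", "Filter", "optextension", "reqextension"}

rule = parse(tokenize(RULE))
kept = known_pairs(rule[1], KNOWN)
print([attr for attr, _ in kept])   # ['servicename', 'Filter', 'optextension']
```

    Because every value is either an atom or a balanced parenthesized list, the parser can step over the ibm-ics-extension clause without knowing anything about its contents.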

    Note that there is only one "level" to declaring extensions, but attribute-value pairs defined by extensions may appear anywhere within a PicsRULZ rule. That is, all extensions should declare themselves with an optextension or reqextension clause within a rule-clause, but the attributes defined by an extension may appear nested several layers down within a rule. This is shown in the following, more elaborate, example:

     1    (PicsRule-1.0
     2        (
     3        servicename (
     4          "http://www.coolness.raleigh.ibm.com/ratings/V1.html"
     5          shortname "Cool"
     6          bureauURL "http://www.ics.raleigh.ibm.com/Coolness")
     7        Filter (Block '((Cool.Coolness < 3) or (Cool.Graphics < 3))'
                      ibm-ics-time "06:00:00-20:00:00")
     8        optextension ("http://www.ics.raleigh.ibm.com/ICS/ICS43-extensions.html")
     9        )
    10     )
    In this example, the extension named "http://www.ics.raleigh.ibm.com/ICS/ICS43-extensions.html" is declared. It presumably defines a new attribute-value pair for use within a Filter clause. The attribute is named ibm-ics-time; here it takes a single quoted string as its value, but it could take an entire S-expression as its value.
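
    The difference between optextension and reqextension declarations can be sketched in Python as follows. The sketch works on a rule-clause that has already been parsed into nested lists of alternating attributes and values. This section does not say what a user-agent must do with a required extension it does not recognize; rejecting the rule is an assumption of this sketch, and check_extensions is a hypothetical name.

```python
def check_extensions(rule_clause, supported):
    """Return (usable, ignored) for one parsed rule-clause.

    usable  -- False if the rule declares a required extension that this
               user-agent does not support (assumed behavior: reject the rule)
    ignored -- URLs of optional extensions that are unsupported and will
               simply be ignored
    """
    ignored = []
    for attr, value in zip(rule_clause[0::2], rule_clause[1::2]):
        if attr not in ("optextension", "reqextension"):
            continue
        url = value[0]  # the extension's name is the first item of the clause
        if url in supported:
            continue
        if attr == "reqextension":
            return False, ignored
        ignored.append(url)
    return True, ignored


ICS43 = "http://www.ics.raleigh.ibm.com/ICS/ICS43-extensions.html"

# The rule-clause of the example above, written out as nested lists.
rule_clause = [
    "Filter", ["Block", "((Cool.Coolness < 3) or (Cool.Graphics < 3))",
               "ibm-ics-time", "06:00:00-20:00:00"],
    "optextension", [ICS43],
]

usable, ignored = check_extensions(rule_clause, supported=set())
print(usable, ignored == [ICS43])   # True True

# The same extension declared as required makes the rule unusable here.
required = ["reqextension", [ICS43]]
print(check_extensions(required, supported=set()))   # (False, [])
```

    A user-agent that ignores the optional extension would also skip the ibm-ics-time pair nested inside the Filter clause, since that attribute is only meaningful to implementations of the extension.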

    Please mail any comments on this draft to the members of the working group; this list is available on the Web.

    Last modified: 4/22/1998