
Providing a robots.txt file

Web robots are programs that make automatic requests to servers. For example, search engines use robots (which are sometimes known as Web crawlers) to retrieve pages for inclusion in their search database. You can provide a robots.txt file to identify URLs that robots are not allowed to visit.

On visiting a Web site, a robot should make a request for the document robots.txt, using the URL
http://www.example.com/robots.txt
where www.example.com is the host name for the site. If a host name can be accessed on more than one port number, robots should request the robots.txt file for each combination of host name and port number. The policies listed in the file can apply to all robots or can name specific robots, and Disallow statements name URLs that robots must not visit. Note that even when you provide a robots.txt file, any robots that do not comply with the robots exclusion standard might still access and index your Web pages.
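
For example, the following robots.txt file excludes a robot named BadBot from the whole site, and tells all other robots not to visit URLs beginning with /private/. The robot name BadBot and the path /private/ are illustrative examples only:

    User-agent: BadBot
    Disallow: /

    User-agent: *
    Disallow: /private/

Each record names one or more robots in User-agent statements, followed by the Disallow statements that apply to them; a Disallow value of / excludes the whole site.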
If a Web browser requests a robots.txt file and you do not provide one, CICS sends an error response to the browser as follows:
  • If you are using the CICS-supplied default analyzer DFHWBAAX, a 404 (Not Found) response is returned. No CICS message is issued in this situation.
  • If you are using the sample analyzer DFHWBADX, or a similar analyzer that can interpret only the URL format that was required before CICS TS Version 3, the analyzer is likely to misinterpret the path robots.txt as an incorrectly specified converter program name. In this case, message DFHWB0723 is issued, and a 400 (Bad Request) response is returned to the browser. To avoid this situation, you can either modify the analyzer program to recognize the robots.txt request and provide a more suitable error response, or provide a robots.txt file using a URIMAP definition (so that the sample analyzer program is bypassed for these requests).

To provide a robots.txt file for all or some of your host names:

  1. Create the text content for the robots.txt file. Information about creating a robots.txt file, and detailed examples, are available from several Web sites. Search on "robots.txt" or "robots exclusion standard" and select an appropriate site.
  2. Decide how to store and provide the robots.txt file. You can provide the file using only a URIMAP definition, or using an application program.
    1. You can store the robots.txt file on z/OS® UNIX System Services HFS, and provide the file as a static response using a URIMAP definition. Most Web servers store the robots.txt file in the root directory for the host name. For CICS, a URIMAP definition can provide a file stored anywhere on HFS, and the same file can be used for more than one host name.

      If you use a file stored on HFS, the CICS® region must have permission to access z/OS UNIX, and it must have permission to access both the HFS directory that contains the file and the file itself. Java™ Applications in CICS explains how to grant these permissions.

    2. You can make the robots.txt file into a CICS document, and provide it either as a static response using a URIMAP definition, or as a response from an application program. The CICS Application Programming Guide explains how to create a CICS document template. A document template is defined using a DOCTEMPLATE resource definition, and it can be held in a partitioned data set, a CICS program, a file, a temporary storage queue, a transient data queue, an exit program or a z/OS UNIX System Services HFS file.
    3. If you want to provide the contents of the robots.txt file using an application program, create a suitable Web-aware application program. Writing Web-aware application programs for CICS as an HTTP server tells you how to write an application program that uses the EXEC CICS WEB API commands. For example, you can use the EXEC CICS WEB SEND command with the FROM option to specify a buffer of data containing your robots.txt information. Alternatively, you can use the application program to deliver a CICS document from a template. Specify a media type of text/plain.

      You might want to use an application program to handle requests from robots so that you can track which robots are visiting your Web pages. The User-Agent header in a robot's request should give the name of the robot, and the From header should include contact information for the owner of the robot. Your application program could read and log these HTTP headers, as in the sketch that follows this procedure.

  3. Begin a URIMAP definition that matches requests made by Web robots for the robots.txt file. Starting a URIMAP resource definition for any requests for CICS as an HTTP server lists the steps to create a URIMAP resource definition matching a request. The following sample URIMAP definition attributes could be specified to match a request for a robots.txt file for any host name:
     Urimap       ==> robots         - URIMAP name
     Group        ==> MYGROUP        - Any suitable group name
     Description  ==> Robots.txt
     STatus       ==> Enabled
     USAge        ==> Server         - For CICS as HTTP server
    UNIVERSAL RESOURCE IDENTIFIER
     SCheme       ==> HTTP           - Will also match HTTPS requests
     HOST         ==> *              - * matches any host name.
                                        Specify a host name if you
                                        provide separate robots.txt
                                        files
     PAth         ==> /robots.txt    - Robots use this path to
                                        request robots.txt
    ASSOCIATED CICS RESOURCES
     TCpipservice ==>                - Blank matches any port. Specify
                                        a TCPIPSERVICE definition name
                                        if you provide different
                                        robots.txt files depending on
                                        the port
    Remember that the path components of URLs are case-sensitive. The path /robots.txt must be specified in lower case.
  4. If you are providing the robots.txt file as a static response, complete the URIMAP definition to specify the file location and the other information which CICS Web support needs to construct responses. Completing a URIMAP definition for a static response to an HTTP request for CICS as an HTTP server guides you through this process. For example, the following URIMAP definition attributes could be specified to provide a robots.txt file which was created using the EBCDIC code page 037 and stored in the /u/cts/CICSHome directory:
    STATIC DOCUMENT PROPERTIES
     Mediatype    ==> text/plain
     CHaracterset ==> iso-8859-1
     HOSTCodepage ==> 037
     HFsfile      ==> /u/cts/CICSHome/robots.txt
    The HFS name is case-sensitive.
  5. If you are providing the content of the robots.txt file using an application program, complete the URIMAP definition to specify that the program will handle requests. Completing a URIMAP definition for an application response to an HTTP request for CICS as an HTTP server guides you through this process. For example, the following URIMAP definition attributes could be used to make the Web-aware application program ROBOTS handle the request, with no analyzer or converter program involved:
    ASSOCIATED CICS RESOURCES
     Analyzer     ==> No             - Analyzer not used for request
     COnverter    ==>                - Blank means no converter program
     TRansaction  ==>                - Blank defaults to CWBA
     PRogram      ==> ROBOTS         - Web-aware application program
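
As an aid to steps 2 and 5, the following COBOL sketch shows one possible shape for the Web-aware ROBOTS program. It is illustrative only, not a CICS-supplied sample: it assumes host code page 037 (in which X'0D25' is CR+LF), assumes a 100-byte buffer is enough for the User-Agent header value, and omits error handling and logging for brevity.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. ROBOTS.
      * Illustrative sketch: read the User-Agent header, then send a
      * fixed robots.txt body with media type text/plain.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-ROBOTS-BODY.
      *    "User-agent: *" and "Disallow: /", each line ended by
      *    EBCDIC CR+LF (X'0D25' in host code page 037).
           05  FILLER            PIC X(13) VALUE 'User-agent: *'.
           05  FILLER            PIC XX    VALUE X'0D25'.
           05  FILLER            PIC X(11) VALUE 'Disallow: /'.
           05  FILLER            PIC XX    VALUE X'0D25'.
       01  WS-BODY-LEN           PIC S9(8) COMP.
       01  WS-HDR-NAME           PIC X(10) VALUE 'User-Agent'.
       01  WS-HDR-VALUE          PIC X(100) VALUE SPACES.
       01  WS-HDR-VALUE-LEN      PIC S9(8) COMP.
       01  WS-RESP               PIC S9(8) COMP.
       PROCEDURE DIVISION.
       MAINLINE.
      *    Read the User-Agent header so that the robot could be
      *    logged; if the header is absent, WS-HDR-VALUE stays blank.
           MOVE LENGTH OF WS-HDR-VALUE TO WS-HDR-VALUE-LEN
           EXEC CICS WEB READ HTTPHEADER(WS-HDR-NAME)
                NAMELENGTH(LENGTH OF WS-HDR-NAME)
                VALUE(WS-HDR-VALUE)
                VALUELENGTH(WS-HDR-VALUE-LEN)
                RESP(WS-RESP)
           END-EXEC
      *    Send the robots.txt content. CICS converts the body from
      *    the host code page to the character set for the client.
           MOVE LENGTH OF WS-ROBOTS-BODY TO WS-BODY-LEN
           EXEC CICS WEB SEND
                FROM(WS-ROBOTS-BODY)
                FROMLENGTH(WS-BODY-LEN)
                MEDIATYPE('text/plain')
           END-EXEC
           EXEC CICS RETURN END-EXEC.

A URIMAP definition like the one in step 5 routes requests for /robots.txt directly to this program, with no analyzer or converter program involved. Instead of building the body in working storage, the program could create a CICS document from a template and send it with the DOCTOKEN option of WEB SEND, as described in step 2.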