module RDF::Util::File

Wrapper for retrieving RDF resources from HTTP(S) and file: scheme locations.

By default, HTTP(S) resources are retrieved using Net::HTTP. However, if the [Rest Client](rubygems.org/gems/rest-client) gem is loaded, it is used instead. This allows sophisticated HTTP caching through [REST Client Components](rubygems.org/gems/rest-client-components), which enables `Rack::Cache` so that network access can be avoided.

To use other HTTP clients, consumers can subclass {RDF::Util::File::HttpAdapter} and set the {RDF::Util::File.http_adapter}.

Also supports the file: scheme for access to local files.

@since 0.2.4

Public Class Methods

http_adapter(use_net_http = false)

Get the current HTTP adapter. If no adapter has been explicitly set, use RestClientAdapter (if RestClient is loaded) or, failing that, NetHttpAdapter.

@param [Boolean] use_net_http use the NetHttpAdapter, even if other adapters have been configured

@return [HttpAdapter]

@since 1.2

# File lib/rdf/util/file.rb, line 257
def http_adapter(use_net_http = false)
  if use_net_http
    NetHttpAdapter
  else
    @http_adapter ||= begin
      # Prefer RestClient if it is loaded; otherwise fall back to Net::HTTP
      if defined?(RestClient)
        RestClientAdapter
      else
        NetHttpAdapter
      end
    end
  end
end
http_adapter=(http_adapter)

Set the HTTP adapter.

@see .http_adapter

@param [HttpAdapter] http_adapter

@return [HttpAdapter]

@since 1.2

# File lib/rdf/util/file.rb, line 245
def http_adapter= http_adapter
  @http_adapter = http_adapter
end
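
The interplay of the getter and setter can be illustrated without loading RDF.rb. The following is a standalone sketch, not the library's code: `AdapterRegistry`, `NetHttpStub`, `RestClientStub`, and `CustomStub` are hypothetical stand-ins, and the `defined?(RestClient)` check is replaced by an explicit constructor flag so the sketch runs without any gems.

```ruby
# Hypothetical stand-ins for the real adapter classes (assumptions, not RDF.rb code).
class NetHttpStub; end
class RestClientStub; end
class CustomStub; end

# Mirrors the memoized lookup in http_adapter, with the RestClient check
# replaced by a flag passed to the constructor.
class AdapterRegistry
  attr_writer :http_adapter

  def initialize(rest_client_loaded: false)
    @rest_client_loaded = rest_client_loaded
    @http_adapter = nil
  end

  def http_adapter(use_net_http = false)
    # Forcing Net::HTTP bypasses the stored adapter and does not change it.
    return NetHttpStub if use_net_http

    # Otherwise memoize the default the first time it is requested.
    @http_adapter ||= (@rest_client_loaded ? RestClientStub : NetHttpStub)
  end
end

registry = AdapterRegistry.new(rest_client_loaded: true)
default_adapter = registry.http_adapter        # default chosen from the flag
registry.http_adapter = CustomStub             # explicit override
custom_adapter  = registry.http_adapter
forced_adapter  = registry.http_adapter(true)  # one-off use of the Net::HTTP stand-in
still_custom    = registry.http_adapter        # override is still in place
```

In this sketch, passing `true` affects only that single call, which matches how `open_file` forwards its `:use_net_http` option on each invocation.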
open_file(filename_or_url, options = {}) { |remote_document| ... }

Open the file, returning or yielding {RemoteDocument}.

Adds Accept header based on available reader content types to allow for content negotiation based on available readers.

Input received in a non-Unicode encoding is transformed to UTF-8. With Ruby >= 2.2, all UTF input is normalized to [Unicode Normalization Form C (NFC)](unicode.org/reports/tr15/#Norm_Forms).
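
The two steps described here, transcoding and NFC normalization, can be reproduced with Ruby's core String methods. This is a minimal sketch of the transformation itself rather than the library's code; the sample strings, the choice of ISO-8859-1 as the source encoding, and the variable names are illustrative assumptions.

```ruby
# Bytes received in a non-Unicode encoding (ISO-8859-1 is chosen for illustration).
latin1 = "caf\xE9".b.force_encoding(Encoding::ISO_8859_1)

# Step 1: transcode to UTF-8.
utf8 = latin1.encode(Encoding::UTF_8)

# Step 2: normalize to NFC. The decomposed input spells the accented letter
# as "e" followed by U+0301 (combining acute accent); NFC composes the pair
# into the single code point U+00E9.
decomposed = "cafe\u0301"
composed   = decomposed.unicode_normalize(:nfc)
```

After normalization, `composed` and `utf8` compare equal even though the inputs arrived as different byte sequences, which is why readers downstream can compare literals reliably.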

HTTP resources may be retrieved via a proxy using the `proxy` option. If `RestClient` is loaded, requests use the proxy globally, as configured by something like the following:

`RestClient.proxy = "http://proxy.example.com/"`

When retrieving documents over HTTP(S), this method uses the mechanism described in [Providing and Discovering URI Documentation](www.w3.org/2001/tag/awwsw/issue57/latest/) to pass the appropriate `base_uri` to the block or as the return value.

Applications needing HTTP caching may consider [Rest Client](rubygems.org/gems/rest-client) and [REST Client Components](rubygems.org/gems/rest-client-components), which allow the use of `Rack::Cache` as a local file cache.

@example using a local HTTP cache

require 'restclient/components'
require 'rack/cache'
RestClient.enable Rack::Cache
RDF::Util::File.open_file("http://example.org/some/resource")
  # => Cached resource if current, otherwise returned resource

@param [String] filename_or_url to open

@param [Hash{Symbol => Object}] options

options are ignored in this implementation. Applications are encouraged
to override this implementation to provide more control over HTTP
headers and redirect following. If opening as a file,
options are passed to `Kernel.open`.

@option options [String] :proxy

HTTP Proxy to use for requests.

@option options [Array, String] :headers

HTTP Request headers, passed to Kernel.open.

@option options [Boolean] :verify_none (false)

Don't verify SSL certificates

@return [RemoteDocument, Object] A {RemoteDocument}. If a block is given, the result of evaluating the block is returned.

@yield [RemoteDocument] A {RemoteDocument} for local files

@yieldreturn [Object] returned from open_file

@raise [IOError] if not found

# File lib/rdf/util/file.rb, line 313
def self.open_file(filename_or_url, options = {}, &block)
  filename_or_url = $1 if filename_or_url.to_s.match(/^file:(.*)$/)
  remote_document = nil

  if filename_or_url.to_s =~ /^https?/
    base_uri = filename_or_url.to_s

    remote_document = self.http_adapter(!!options[:use_net_http]).open_url(base_uri, options)
  else
    # Fake content type based on found format
    format = RDF::Format.for(filename_or_url.to_s)
    content_type = format ? format.content_type.first : 'text/plain'
    # Open as a file, passing any options
    begin
      url_no_frag_or_query = RDF::URI(filename_or_url)
      url_no_frag_or_query.query = nil
      url_no_frag_or_query.fragment = nil
      options[:encoding] ||= Encoding::UTF_8
      Kernel.open(url_no_frag_or_query, "r", options) do |file|
        document_options = {
          base_uri:     filename_or_url.to_s,
          charset:      file.external_encoding.to_s,
          code:         200,
          content_type: content_type,
          last_modified:file.mtime,
          headers:      {content_type: content_type, last_modified: file.mtime.xmlschema}
        }

        remote_document = RemoteDocument.new(file.read, document_options)
      end
    rescue Errno::ENOENT => e
      raise IOError, e.message
    end
  end

  if block_given?
    yield remote_document
  else
    remote_document
  end
end
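
The routing decision at the top of `open_file` can be examined in isolation. The sketch below is a standalone approximation, not part of RDF.rb: `classify_location` is a hypothetical helper that reuses the two regular expressions from the listing and reports which branch a given argument would take, without opening anything.

```ruby
# Hypothetical helper mirroring the two pattern checks at the top of open_file.
# Returns [branch, location], where branch is :http or :file.
def classify_location(filename_or_url)
  location = filename_or_url.to_s

  # Strip a leading "file:" scheme, keeping whatever follows it.
  location = Regexp.last_match(1) if location =~ /^file:(.*)$/

  # Anything that starts with "http" or "https" goes to the HTTP adapter.
  branch = (location =~ /^https?/) ? :http : :file
  [branch, location]
end

remote   = classify_location("https://example.org/doc.ttl")
local    = classify_location("file:/tmp/doc.ttl")
relative = classify_location("etc/doap.ttl")
prefixed = classify_location("httpd.conf")
```

Because the second pattern does not require `://`, the last call in this sketch reports `:http` for a plain file name that merely begins with `http`. Callers with such file names may want to pass a `file:` URL or an absolute path.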