This is an internal data structure class for the LSI node. Save for raw_vector_with, it should be fairly straightforward to understand. You should never have to use it directly.
Creates the raw vector from word_hash, using word_list to map each word to its index in the vector space. Words in word_hash that do not appear in word_list are ignored.
# File lib/classifier-reborn/lsi/content_node.rb, line 35
def raw_vector_with( word_list )
  if $GSL
    vec = GSL::Vector.alloc(word_list.size)
  else
    vec = Array.new(word_list.size, 0)
  end

  @word_hash.each_key do |word|
    vec[word_list[word]] = @word_hash[word] if word_list[word]
  end

  # Perform the scaling transform and force floating point arithmetic
  total_words = vec.sum.to_f

  total_unique_words = 0

  if $GSL
    vec.each { |word| total_unique_words += 1 if word != 0 }
  else
    total_unique_words = vec.count{ |word| word != 0 }
  end

  # Perform first-order association transform if this vector has more
  # than one word in it.
  if total_words > 1.0 && total_unique_words > 1
    weighted_total = 0.0

    vec.each do |term|
      if ( term > 0 )
        weighted_total += (( term / total_words ) * Math.log( term / total_words ))
      end
    end

    vec = vec.collect { |val| Math.log( val + 1 ) / -weighted_total }
  end

  if $GSL
    @raw_norm   = vec.normalize
    @raw_vector = vec
  else
    @raw_norm   = Vector[*vec].normalize
    @raw_vector = Vector[*vec]
  end
end
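To make the transform easier to follow, here is a minimal standalone sketch of the non-GSL branch. It is not part of the library: the helper name raw_vector_sketch and the sample word_list and word_hash values are hypothetical, and the sketch returns a plain Array rather than storing a Vector and its normalized form in instance variables.

```ruby
# Hypothetical helper mirroring the non-GSL branch of raw_vector_with.
# word_hash maps word => count; word_list maps word => vector index.
def raw_vector_sketch(word_hash, word_list)
  vec = Array.new(word_list.size, 0)

  # Place each known word's count at its index; skip unknown words.
  word_hash.each do |word, count|
    index = word_list[word]
    vec[index] = count if index
  end

  total_words = vec.sum.to_f
  total_unique_words = vec.count { |count| count != 0 }

  # Apply the transform only when the vector holds more than one
  # distinct word.
  if total_words > 1.0 && total_unique_words > 1
    # Sum of p * log(p) over the non-zero terms; this is negative.
    weighted_total = 0.0
    vec.each do |term|
      next unless term > 0
      proportion = term / total_words
      weighted_total += proportion * Math.log(proportion)
    end

    # Log-scale each count and divide by the negated sum above.
    vec = vec.map { |val| Math.log(val + 1) / -weighted_total }
  end

  vec
end

# Assumed sample data: :fish is not in word_list, so it is dropped.
word_list = { dog: 0, cat: 1, bird: 2 }
word_hash = { dog: 2, cat: 1, fish: 4 }
raw = raw_vector_sketch(word_hash, word_list)
puts raw.inspect
```

In this sample, the entry for :bird stays at zero because log(0 + 1) is 0, and :dog ends up with a larger weight than :cat because its count is higher. A vector containing only one distinct word skips the transform and keeps its raw counts.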