Fuzzy string matching with Rails (Tire) and ElasticSearch

6
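I have a Rails application that now uses ElasticSearch and the Tire gem to search a model. How do I set the application up to do fuzzy string matching on some of the model's indexed fields? I already index the title, description and so on, and I want fuzzy matching on a few of those fields, but I'm not sure where to configure it. Please take a look at the code below and suggest anything you can. Thanks!
In the controller: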
    def search
      @resource = Resource.search(params[:q], :page => (params[:page] || 1),
                                  :per_page => 15, :load => true)
    end

In the model:
class Resource < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  belongs_to :user
  has_many :resource_views, :class_name => 'UserResourceView'

  has_reputation :votes, source: :user, aggregated_by: :sum

  attr_accessible :title, :description, :link, :tag_list, :user_id, :youtubeID
  acts_as_taggable

  mapping do 
      indexes :id,  :index => :not_analyzed
      indexes :title, :analyzer => 'snowball', :boost => 40
      indexes :tag_list, :analyzer => 'snowball', :boost => 8
      indexes :description, :analyzer => 'snowball', :boost => 2
      indexes :user_id, :analyzer => 'snowball'
  end
end
1 Answer

2

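Try creating custom analyzers to get additional stemming behavior and the like. Have a look at my example (it also uses Mongoid and attachments; ignore those parts if you don't need them):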

class Document
      include Mongoid::Document
      include Mongoid::Timestamps
      include Tire::Model::Search
      include Tire::Model::Callbacks

      field :filename, type: String
      field :md5, type: String
      field :tags, type: String
      field :size, type: String

      index({md5: 1}, {unique: true})
      validates_uniqueness_of :md5


      DEFAULT_PAGE_SIZE = 10

      settings :analysis => {
          :filter => {
              :ngram_filter => {
                  :type => "edgeNGram",
                  :min_gram => 2,
                  :max_gram => 12
              },
              :custom_word_delimiter => {
                  :type => "word_delimiter",
                  :preserve_original => "true",
                  :catenate_all => "true",
              }
          }, :analyzer => {
              :index_ngram_analyzer => {
                  :type => "custom",
                  :tokenizer => "standard",
                  :filter => ["lowercase", "ngram_filter", "asciifolding", "custom_word_delimiter"]
              },
              :search_ngram_analyzer => {
                  :type => "custom",
                  :tokenizer => "standard",
                  :filter => ["standard", "lowercase", "ngram_filter", "custom_word_delimiter"]
              },
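              # NOTE: "suggestions_shingle" is not defined under :filter in this
              # snippet; define it (or drop this analyzer) before creating the index.
              :suggestions => {
                  :tokenizer => "standard",
                  :filter => ["suggestions_shingle"]
              }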
          }
      } do
        mapping {
          indexes :id, index: :not_analyzed
          indexes :filename, :type => 'string', :store => 'yes', :boost => 100, :search_analyzer => :search_ngram_analyzer, :index_analyzer => :index_ngram_analyzer
          indexes :tags, :type => 'string', :store => 'yes', :search_analyzer => :search_ngram_analyzer, :index_analyzer => :index_ngram_analyzer
          indexes :attachment, :type => 'attachment',
                  :fields => {
                      :content_type => {:store => 'yes'},
                      :author => {:store => 'yes', :analyzer => 'keyword'},
                      :title => {:store => 'yes'},
                      :attachment => {:term_vector => 'with_positions_offsets', :boost => 90, :store => 'yes', :search_analyzer => :search_ngram_analyzer, :index_analyzer => :index_ngram_analyzer},
                      :date => {:store => 'yes'}
                  }
        }
      end


      def to_indexed_json
        self.to_json(:methods => [:attachment])
      end

      def attachment        
          path_to_file = "#{Rails.application.config.document_library}#{path}/#{filename}"
          Base64.encode64(open(path_to_file) { |file| file.read })
      end

      def self.search(query, options)
        tire.search do
          query { string "#{query}", :default_operator => :AND, :default_field => 'attachment', :fields => ['filename', 'attachment', 'tags'] }
          highlight :attachment
          page = (options[:page] || 1).to_i
          search_size = options[:per_page] || DEFAULT_PAGE_SIZE
          from((page - 1) * search_size)
          size search_size
          sort { by :_score, :desc }
          if (options[:facet])
            filter :terms, :tags => [options[:facet]]
            facet 'global-tags', :global => true do
              terms :tags
            end
            facet 'current-tags' do
              terms :tags
            end
          end
        end
      end
    end
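
To make the `ngram_filter` setting above more concrete, here is a minimal plain-Ruby sketch of what an edge n-gram filter with `min_gram` 2 and `max_gram` 12 roughly emits for each token. The helpers `edge_ngrams` and `index_terms` are hypothetical names used only for illustration; they are not part of Tire or Elasticsearch, and the real analyzer chain (ASCII folding, word delimiter, and so on) does more than this. Note that this gives prefix-style partial matching rather than edit-distance fuzziness.

```ruby
# Rough illustration of an edge n-gram token filter (assumption: it keeps
# only prefixes whose length is between min_gram and max_gram).
# edge_ngrams is a hypothetical helper, not a Tire or Elasticsearch API.
def edge_ngrams(token, min_gram = 2, max_gram = 12)
  longest = [token.length, max_gram].min
  # An empty range (token shorter than min_gram) yields no terms at all.
  (min_gram..longest).map { |n| token[0, n] }
end

# Very rough stand-in for "standard tokenizer + lowercase + edge n-gram":
# split on non-word characters, downcase, then expand every token.
def index_terms(text)
  text.downcase.scan(/\w+/).flat_map { |token| edge_ngrams(token) }
end

p edge_ngrams("report")
p index_terms("Annual Report")
```

Because prefixes such as "rep" and "repo" end up in the index, a partially typed query can match "report". In the settings above the search analyzer applies the same filter, so query terms are expanded into prefixes as well, which widens matching further.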

Hope this helps,

Helpful, but Elasticsearch ended up being too cumbersome, so we eventually moved to PostgreSQL. Thanks anyway! - nobody
Very helpful... with a little patience, your example works perfectly :) - Rinku
1
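What does the :store => 'yes' parameter do? - phillbaker
@phillbaker From the docs: "Set to true to actually store the field in the index, false to not store it. Defaults to false (note, the JSON document itself is stored, and it can be retrieved from it)." - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html I think Tire stores the values implicitly, though, so this may be unnecessary. - Dan Sandland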
