
[Experience Sharing] Solr: Integrating Solr with MongoDB

Posted on 2016-12-14 09:06:40
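  Install Mongo-Connector
  Either install the released package:
  pip install mongo-connector
  or, to install from source, uninstall it first and build from the repository:
  pip uninstall mongo-connector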

git clone https://github.com/10gen-labs/mongo-connector.git
cd mongo-connector
python setup.py install
  Then modify the imports in doc_managers/solr_doc_manager.py as follows (the original lines are commented out):

from mongo_connector import errors
#from mongo_connector.constants import (DEFAULT_COMMIT_INTERVAL,DEFAULT_MAX_BULK)
from constants import (DEFAULT_COMMIT_INTERVAL,DEFAULT_MAX_BULK)
from mongo_connector.util import retry_until_ok
#from mongo_connector.doc_managers import DocManagerBase, exception_wrapper
from doc_managers import DocManagerBase, exception_wrapper
#from mongo_connector.doc_managers.formatters import DocumentFlattener
from doc_managers.formatters import DocumentFlattener

   
  Test
  #mongo-connector


  MongoDB Replica set
  1. Create the following directory layout
  rs
├── db
│   ├── rs1
│   │   ├── journal
│   │   └── _tmp
│   └── rs2
│       └── journal
└── log
  2. Run two instances
  #cd rs
  #mongod --port 27001 --oplogSize 100 --dbpath db/rs1 --logpath log/rs1.log --replSet rs/127.0.0.1:27002 --journal 

#mongod --port 27002 --oplogSize 100 --dbpath db/rs2 --logpath log/rs2.log --replSet rs/127.0.0.1:27001 --journal
  3. Configure the replica set
  #mongo --port 27001
  >config={_id:'rs', members:[{_id:0, host:'localhost:27001'},{_id:1, host:'localhost:27002'}]}
  >rs.initiate(config)
  >rs.status()
  rs:PRIMARY>
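
  The same initiation could also be scripted from Python. The following is only a sketch: it assumes pymongo is installed and that the first mongod listens on localhost:27001 as above. The helper names build_rs_config and initiate_replica_set are made up for illustration.

```python
def build_rs_config(name, hosts):
    """Build the document that rs.initiate() expects: one member per host."""
    return {
        "_id": name,
        "members": [{"_id": i, "host": h} for i, h in enumerate(hosts)],
    }


def initiate_replica_set(config, host="localhost", port=27001):
    """Send replSetInitiate to a single member (assumes pymongo is installed)."""
    # Imported lazily so that build_rs_config works without pymongo.
    from pymongo import MongoClient

    # directConnection=True talks to this one member only, because the
    # replica set does not exist yet (needs a reasonably recent pymongo).
    client = MongoClient(host, port, directConnection=True)
    return client.admin.command("replSetInitiate", config)


config = build_rs_config("rs", ["localhost:27001", "localhost:27002"])
print(config)
# With both mongod instances running: initiate_replica_set(config)
```

  Building the config in a function keeps the member _id values in step with the host list when more members are added.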
  Dump data from MongoDB to Solr
  #python connector.py --unique-key=id --auto-commit-interval=0 -n test.test  -m localhost:27001 -t http://localhost:8983/solr/inokqreply -d solr_doc_manager.py
  or
  #mongo-connector   --auto-commit-interval=0 -n test.test  -m localhost:27001 -t http://localhost:8983/solr/inokqreply -d doc_managers/solr_doc_manager.py
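
  If the connector is started from a deployment script, the command line can be assembled in Python. This is a sketch that uses only the flags shown above; build_connector_command is a hypothetical helper, not part of mongo-connector.

```python
import shlex


def build_connector_command(namespace, mongo_host, solr_url, doc_manager,
                            unique_key=None, auto_commit_interval=0):
    """Return the mongo-connector argument list for a Solr target."""
    cmd = ["mongo-connector"]
    if unique_key:
        cmd.append("--unique-key=%s" % unique_key)
    cmd += [
        "--auto-commit-interval=%d" % auto_commit_interval,
        "-n", namespace,      # MongoDB namespace (db.collection) to replicate
        "-m", mongo_host,     # address of a replica set member
        "-t", solr_url,       # target Solr core URL
        "-d", doc_manager,    # doc manager file to load
    ]
    return cmd


cmd = build_connector_command(
    "test.test",
    "localhost:27001",
    "http://localhost:8983/solr/inokqreply",
    "doc_managers/solr_doc_manager.py",
)
print(" ".join(shlex.quote(part) for part in cmd))
# To actually start it: subprocess.check_call(cmd)
```

  Passing the arguments as a list avoids shell quoting problems with the Solr URL.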
  ------------------------------------------------------------------------------------------------------------------------------------
  error:
  [screenshot DSC0000.png: not available]
  Fix: modify pysolr.py under python2.7/site-packages

            for bit in values:
                if self._is_null_value(bit):
                    continue

                #attrs = {'name': key}
                attrs = {str('name'): key}

                if boost and key in boost:
                    #attrs['boost'] = force_unicode(boost[key])
                    attrs[str('boost')] = force_unicode(boost[key])

                field = ET.Element('field', **attrs)
                field.text = self._from_python(bit)

                doc_elem.append(field)

  See the related issue: https://github.com/toastdriven/pysolr/issues/72
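
  The point of the patch seems to be that the dictionary keys expanded by **attrs must be native strings. On Python 2, a module that enables unicode_literals (which pysolr appears to do) turns 'name' into a unicode object, and older interpreters reject that as a keyword name. The sketch below reproduces the patched pattern in isolation; build_field is a made-up name, and on Python 3 the str() calls are harmless no-ops.

```python
import xml.etree.ElementTree as ET


def build_field(key, value, boost=None):
    """Build a Solr <field> element with native-string attribute names."""
    # str('name') forces a native string key, which is what **attrs needs.
    attrs = {str("name"): key}
    if boost is not None:
        attrs[str("boost")] = str(boost)
    field = ET.Element("field", **attrs)
    field.text = value
    return field


field = build_field("title", "hello", boost=2.0)
print(ET.tostring(field))
```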
  error:
  [screenshot DSC0001.png: not available]
  Solution: delete the config.txt file in the directory from which the above command was launched.
  error:
  File "build/bdist.linux-x86_64/egg/pysolr.py", line 318, in _send_request
    error_message = self._extract_error(resp)
  File "build/bdist.linux-x86_64/egg/pysolr.py", line 397, in _extract_error
    reason, full_html = self._scrape_response(resp.headers, resp.content)
  File "build/bdist.linux-x86_64/egg/pysolr.py", line 418, in _scrape_response
    import lxml.html
ImportError: No module named lxml.html
2014-07-31 14:29:51,872 - ERROR - OplogThread: Failed during dump collection cannot recover! Collection(Database(MongoClient([u'localhost:27017', u'localhost:27018']), u'local'), u'oplog.rs')


  Solution:
  yum install python-lxml
  yum install libxml2-python
  yum install libxml2-devel libxslt-devel
  pip install lxml   or  pip install lxml==3.2.4

  pip install cssselect
  #ln -s /usr/local/python27/lib/libpython2.7.so /usr/local/lib/
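
  Before restarting mongo-connector it may help to confirm that the modules from the traceback can now be found. A small sketch using only the standard library; missing_modules is an illustrative helper name.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of module names that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # Raised for a dotted name (e.g. lxml.html) whose parent is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# The modules pysolr tried to use in the traceback above.
print(missing_modules(["lxml.html", "cssselect"]))
```

  An empty list means both modules are importable in the interpreter that runs the check, so run it with the same Python that runs mongo-connector.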
 
  error
  File "build/bdist.linux-x86_64/egg/pysolr.py", line 318, in _send_request
    error_message = self._extract_error(resp)
  File "build/bdist.linux-x86_64/egg/pysolr.py", line 397, in _extract_error
    reason, full_html = self._scrape_response(resp.headers, resp.content)
  File "build/bdist.linux-x86_64/egg/pysolr.py", line 429, in _scrape_response
    p_nodes = body_node.cssselect('p')
AttributeError: 'NoneType' object has no attribute 'cssselect'
2014-07-31 17:29:25,320 - ERROR - OplogThread: Failed during dump collection cannot recover! Collection(Database(MongoClient([u'localhost:27017', u'localhost:27018']), u'local'), u'oplog.rs')
  Solution
  https://github.com/toastdriven/pysolr/pull/92
  ==========================================================
pysolr

  https://pypi.python.org/pypi/pysolr/3.2.0
  ===============================================================================
  datetime
  In my case there is a 'created_at' field that stores a Unix timestamp as a long.
  When this data is imported into Solr by mongo-connector, Solr reports the error "Invalid Date String".
  Solution:
  1. uninstall mongo-connector
  #pip uninstall mongo-connector
  2. modify mongo_connector/doc_managers/formatters.py

    def transform_element(self, key, value):
        if isinstance(value, list):
            for li, lv in enumerate(value):
                for inner_k, inner_v in self.transform_element(
                        "%s.%s" % (key, li), lv):
                    yield inner_k, inner_v
        elif isinstance(value, dict):
            formatted = self.format_document(value)
            for doc_key in formatted:
                yield "%s.%s" % (key, doc_key), formatted[doc_key]
        else:
            # We assume that transform_value will return a 'flat' value,
            # not a list or dict
            # print("+++++++++++++++++++++ key=%s  value=%s" %(key,value))
            if key == "created_at":
                # Convert the Unix timestamp into a datetime for Solr
                yield key, self.transform_dateformat(value)
            else:
                yield key, self.transform_value(value)


    # New helper added to the same class, above transform_element.
    # Needs "import datetime" at the top of formatters.py.
    def transform_dateformat(self, value):
        return datetime.datetime.fromtimestamp(int(value), None)

  3. reinstall
  #python setup.py install
  Everything is OK.
  http://tool.chinaz.com/Tools/unixtime.aspx
  http://developwithstyle.com/articles/2010/07/09/handling-dates-in-mongodb/
  https://wiki.python.org/moin/TimeTransitionsImage
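
  The conversion itself can be tried outside mongo-connector. Note that the patch above builds a naive datetime in local time, while Solr date fields are expressed in UTC. The sketch below is a variant, not the patch itself: it converts in UTC and prints the ISO 8601 form with a trailing Z. unix_to_solr_date is an illustrative name.

```python
import datetime


def unix_to_solr_date(value):
    """Convert a Unix timestamp (int, long or numeric string) to a Solr date string."""
    dt = datetime.datetime.fromtimestamp(int(value), tz=datetime.timezone.utc)
    # Solr date fields use the form YYYY-MM-DDThh:mm:ssZ, always in UTC.
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


print(unix_to_solr_date(1400000000))  # 2014-05-13T16:53:20Z
```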
  ===================================================================================
  Another post describes a different MongoDB-to-Solr import tool based on the DataImportHandler (DIH):
  http://stackoverflow.com/questions/9345335/solr-data-import-handlers-for-mongodb
  #git clone https://github.com/james75/SolrMongoImporter
  -------------------------
  init script
  https://gist.github.com/lovett89/9260081
  http://www.snip2code.com/Snippet/33459/mongo-connector-init-script-%28tested-on-C
  https://github.com/10gen-labs/mongo-connector/issues/96


  • Modify the variables at the top of mongo-connector.start to your liking
  • Modify the wrapper variable at the top of the init script to point to the location of mongo-connector.start
  • Place the mongo-connector script in /etc/init.d and run chkconfig --add mongo-connector
  When I ran chkconfig --add mongo-connector, the chkconfig command was not found.
  Solution:

sudo apt-get install sysv-rc-conf

  =================================================================================
  mongodb commands
  http://blog.csdn.net/wangpeng047/article/details/7705588
  References
  http://www.cnblogs.com/sysuys/p/3403670.html
  http://blog.mongodb.org/post/29127828146/introducing-mongo-connector
  https://github.com/10gen-labs/mongo-connector/wiki/Usage-with-Solr

Original thread: https://www.yunweiku.com/thread-314031-1-1.html