457475451 posted on 2016-12-14 09:06:40

Solr: Solr Integrate with MongoDB

  Install Mongo-Connector
  pip install mongo-connector
  To install from source instead, first remove the pip package:
  pip uninstall mongo-connector

git clone https://github.com/10gen-labs/mongo-connector.git
cd mongo-connector
python setup.py install
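  Modify doc_managers/solr_doc_manager.py so that the imports resolve when it is run from the source tree (the original imports are kept as comments):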

from mongo_connector import errors
#from mongo_connector.constants import (DEFAULT_COMMIT_INTERVAL,DEFAULT_MAX_BULK)
from constants import (DEFAULT_COMMIT_INTERVAL,DEFAULT_MAX_BULK)
from mongo_connector.util import retry_until_ok
#from mongo_connector.doc_managers import DocManagerBase, exception_wrapper
from doc_managers import DocManagerBase, exception_wrapper
#from mongo_connector.doc_managers.formatters import DocumentFlattener
from doc_managers.formatters import DocumentFlattener

   
  Test
  #mongo-connector


  MongoDB Replica set
  1. Create the following directory layout
  rs
├── db
│   ├── rs1
│   │   ├── journal
│   │   └── _tmp
│   └── rs2
│       └── journal
└── log
  2. Run two instances
  #cd rs
  #mongod --port 27001 --oplogSize 100 --dbpath db/rs1 --logpath log/rs1.log --replSet rs/127.0.0.1:27002 --journal
  #mongod --port 27002 --oplogSize 100 --dbpath db/rs2 --logpath log/rs2.log --replSet rs/127.0.0.1:27001 --journal
  3. Configure the replica set
  #mongo --port 27001
  >config={_id:'rs', members:[{_id:0, host:'localhost:27001'},{_id:1, host:'localhost:27002'}]}
  >rs.initiate(config)
  >rs.status()
  rs:PRIMARY>
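
  The replica set steps above can also be scripted. The following is only a sketch under assumptions: the helper names (make_layout, build_replset_config, initiate) are made up for illustration, and initiate() assumes pymongo is installed and the mongod on port 27001 from step 2 is already running. Only db/rs1, db/rs2 and log are created here; the journal and _tmp directories in the tree above appear to be created by mongod itself.

```python
import os
import tempfile


def make_layout(base):
    """Create db/rs1, db/rs2 and log under base; return the created paths."""
    paths = [
        os.path.join(base, "db", "rs1"),
        os.path.join(base, "db", "rs2"),
        os.path.join(base, "log"),
    ]
    for path in paths:
        os.makedirs(path, exist_ok=True)  # safe to call again on an existing tree
    return paths


def build_replset_config(name, hosts):
    """Build the same document that step 3 passes to rs.initiate()."""
    members = [{"_id": i, "host": host} for i, host in enumerate(hosts)]
    return {"_id": name, "members": members}


def initiate(config, host="localhost", port=27001):
    """Assumption: pymongo is installed and the mongod from step 2 is up."""
    from pymongo import MongoClient  # lazy import: the rest runs without pymongo
    client = MongoClient(host, port)
    return client.admin.command("replSetInitiate", config)


if __name__ == "__main__":
    base = os.path.join(tempfile.mkdtemp(), "rs")  # throwaway location for a dry run
    print(make_layout(base))
    print(build_replset_config("rs", ["localhost:27001", "localhost:27002"]))
```

  The config builder is kept separate from initiate() so the document can be inspected before anything is sent to the server.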
  Dump data from MongoDB to Solr
  #python connector.py --unique-key=id --auto-commit-interval=0 -n test.test  -m localhost:27001 -t http://localhost:8983/solr/inokqreply -d solr_doc_manager.py
  or
  #mongo-connector   --auto-commit-interval=0 -n test.test  -m localhost:27001 -t http://localhost:8983/solr/inokqreply -d doc_managers/solr_doc_manager.py
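
  To check that documents actually arrived, one option is to ask the Solr core how many documents it holds. This is a rough sketch using only the standard library; the function names are hypothetical, the core URL is the one from the commands above, and count_docs() assumes that core is reachable and answers the standard /select handler with JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def solr_select_url(core_url, query="*:*", rows=0):
    """Build a /select URL for the given core; rows=0 asks for the count only."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return "%s/select?%s" % (core_url.rstrip("/"), params)


def count_docs(core_url, query="*:*"):
    """Return numFound for the query; needs a running Solr core."""
    with urlopen(solr_select_url(core_url, query)) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]["numFound"]


if __name__ == "__main__":
    # Printing the URL works without Solr; count_docs() does not.
    print(solr_select_url("http://localhost:8983/solr/inokqreply"))
```

  Comparing this count with db.test.count() in the mongo shell gives a quick sanity check of the dump.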
  ------------------------------------------------------------------------------------------------------------------------------------
  error:
  

 
  Modify pysolr.py under python2.7/site-packages:

            for bit in values:
                if self._is_null_value(bit):
                    continue

                #attrs = {'name': key}
                attrs = {str('name'): key}

                if boost and key in boost:
                    #attrs['boost'] = force_unicode(boost[key])
                    attrs[str('boost')] = force_unicode(boost[key])

                field = ET.Element('field', **attrs)
                field.text = self._from_python(bit)

                doc_elem.append(field)

  See the related issue: https://github.com/toastdriven/pysolr/issues/72
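
  The idea behind the patch can be tried outside pysolr. This is a standalone sketch, not pysolr's code: build_field is a hypothetical helper that mirrors the loop body above. The str(...) wrappers force native-string keyword names before they are expanded with **attrs, which appears to be what the patch is after; on Python 3 they are a no-op.

```python
import xml.etree.ElementTree as ET


def build_field(key, value, boost=None):
    """Build one <field> element the way the patched loop above does."""
    attrs = {str("name"): key}  # native-string key, safe to expand as a keyword
    if boost and key in boost:
        attrs[str("boost")] = str(boost[key])  # add to attrs, do not replace it
    field = ET.Element("field", **attrs)
    field.text = str(value)
    return field


if __name__ == "__main__":
    elem = build_field("title", "hello", boost={"title": 2.0})
    print(ET.tostring(elem, encoding="unicode"))
```

  Note that the boost branch adds a key to attrs; assigning the boost value to attrs itself would turn it into a string and break the ** expansion on the next line.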
  error:

  Solution: delete the config.txt file in the directory from which the above command was launched.
  error:
  File "build/bdist.linux-x86_64/egg/pysolr.py", line 318, in _send_request
    error_message = self._extract_error(resp)
  File "build/bdist.linux-x86_64/egg/pysolr.py", line 397, in _extract_error
    reason, full_html = self._scrape_response(resp.headers, resp.content)
  File "build/bdist.linux-x86_64/egg/pysolr.py", line 418, in _scrape_response
    import lxml.html
ImportError: No module named lxml.html
2014-07-31 14:29:51,872 - ERROR - OplogThread: Failed during dump collection cannot recover! Collection(Database(MongoClient(), u'local'), u'oplog.rs')


  Solution:
  yum install python-lxml
  yum install libxml2-python
  yum install libxml2-devel libxslt-devel   (on Debian/Ubuntu the packages are libxml2-dev and libxslt1-dev)
  pip install lxml   or  pip install lxml==3.2.4

  pip install cssselect
  #ln -s /usr/local/python27/lib/libpython2.7.so /usr/local/lib/
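
  After installing the packages, a quick way to confirm that the modules pysolr tries to import are really available is to import them directly. A minimal sketch (check_modules is a made-up helper name; the module list is taken from the tracebacks in this post):

```python
import importlib


def check_modules(names=("lxml.html", "cssselect")):
    """Map each module name to True if it imports cleanly, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_modules().items():
        print("%-12s %s" % (name, "ok" if ok else "MISSING"))
```

  Run it with the same Python interpreter that runs mongo-connector, since packages installed for another interpreter will not be seen.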
 
  error
  File "build/bdist.linux-x86_64/egg/pysolr.py", line 318, in _send_request
    error_message = self._extract_error(resp)
  File "build/bdist.linux-x86_64/egg/pysolr.py", line 397, in _extract_error
    reason, full_html = self._scrape_response(resp.headers, resp.content)
  File "build/bdist.linux-x86_64/egg/pysolr.py", line 429, in _scrape_response
    p_nodes = body_node.cssselect('p')
AttributeError: 'NoneType' object has no attribute 'cssselect'
2014-07-31 17:29:25,320 - ERROR - OplogThread: Failed during dump collection cannot recover! Collection(Database(MongoClient(), u'local'), u'oplog.rs')
  Solution
  https://github.com/toastdriven/pysolr/pull/92
  ==========================================================
pysolr
  https://pypi.python.org/pypi/pysolr/3.2.0
  ===============================================================================
  datetime
  In my case there is a 'created_at' field that stores a Unix timestamp as a long.
  When this data is imported into Solr by mongo-connector, the import fails with an "Invalid date string" error.
  Solution:
  1. uninstall mongo-connector
  #pip uninstall mongo-connector
  2. modify mongo_connector/doc_managers/formatters.py

    def transform_element(self, key, value):
        if isinstance(value, list):
            for li, lv in enumerate(value):
                for inner_k, inner_v in self.transform_element(
                        "%s.%s" % (key, li), lv):
                    yield inner_k, inner_v
        elif isinstance(value, dict):
            formatted = self.format_document(value)
            for doc_key in formatted:
                yield "%s.%s" % (key, doc_key), formatted[doc_key]
        else:
            # We assume that transform_value will return a 'flat' value,
            # not a list or dict
            if key == "created_at":
                # created_at holds a Unix timestamp; convert it to a datetime
                yield key, self.transform_dateformat(value)
            else:
                yield key, self.transform_value(value)


    # needs "import datetime" at the top of formatters.py if it is not already there
    def transform_dateformat(self, value):
        return datetime.datetime.fromtimestamp(int(value), None)

  3. reinstall
  #python setup.py install
  Everything is OK.
  http://tool.chinaz.com/Tools/unixtime.aspx
  http://developwithstyle.com/articles/2010/07/09/handling-dates-in-mongodb/
  https://wiki.python.org/moin/TimeTransitionsImage
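
  The flattening and the created_at conversion can be tried outside mongo-connector. The sketch below is a standalone illustration, not the formatters.py code: flatten and to_solr_date are hypothetical names, the timestamp is assumed to be in seconds (not milliseconds), and unlike the patch above it converts to UTC and returns a string in Solr's date format rather than a local-time datetime object.

```python
import datetime

SOLR_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_solr_date(value):
    """Convert a Unix timestamp in seconds to a UTC Solr date string."""
    moment = datetime.datetime.fromtimestamp(int(value), datetime.timezone.utc)
    return moment.strftime(SOLR_DATE_FORMAT)


def flatten(doc, date_keys=("created_at",), prefix=""):
    """Yield (dotted_key, value) pairs for a nested dict or list."""
    items = enumerate(doc) if isinstance(doc, list) else doc.items()
    for key, value in items:
        path = "%s.%s" % (prefix, key) if prefix else str(key)
        if isinstance(value, (dict, list)):
            # Recurse into containers, extending the dotted path
            for pair in flatten(value, date_keys, path):
                yield pair
        elif path in date_keys:
            yield path, to_solr_date(value)
        else:
            yield path, value


if __name__ == "__main__":
    doc = {"id": 1, "created_at": 86400, "tags": ["a", "b"], "user": {"name": "x"}}
    print(dict(flatten(doc)))
```

  If created_at is stored in milliseconds, the value would need to be divided by 1000 before conversion.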
  ===================================================================================
  A Stack Overflow post describes another MongoDB-to-Solr tool, based on the Solr Data Import Handler (DIH):
  http://stackoverflow.com/questions/9345335/solr-data-import-handlers-for-mongodb
  #git clone https://github.com/james75/SolrMongoImporter
  -------------------------
  init script
  https://gist.github.com/lovett89/9260081
  http://www.snip2code.com/Snippet/33459/mongo-connector-init-script-%28tested-on-C
  https://github.com/10gen-labs/mongo-connector/issues/96


  1. Modify the variables at the top of mongo-connector.start to your liking
  2. Modify the wrapper variable at the top of the init script to point to the location of mongo-connector.start
  3. Place the mongo-connector script in /etc/init.d and run chkconfig --add mongo-connector
  When I ran chkconfig --add mongo-connector, the chkconfig command was not found.
  Solution (on Debian/Ubuntu, install sysv-rc-conf as a replacement for chkconfig):

sudo apt-get install sysv-rc-conf

  =================================================================================
  mongodb commands
  http://blog.csdn.net/wangpeng047/article/details/7705588
  References
  http://www.cnblogs.com/sysuys/p/3403670.html
  http://blog.mongodb.org/post/29127828146/introducing-mongo-connector
  https://github.com/10gen-labs/mongo-connector/wiki/Usage-with-Solr