Solr: Integrating Solr with MongoDB
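Install Mongo-Connector
pip install mongo-connector
To install from source instead (so that the source files can be patched, as described below), uninstall the pip package first:
pip uninstall mongo-connector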
git clone https://github.com/10gen-labs/mongo-connector.git
cd mongo-connector
python setup.py install
Modify doc_managers/solr_doc_manager.py: replace the package-qualified imports (commented out below) with local ones
from mongo_connector import errors
#from mongo_connector.constants import (DEFAULT_COMMIT_INTERVAL,DEFAULT_MAX_BULK)
from constants import (DEFAULT_COMMIT_INTERVAL,DEFAULT_MAX_BULK)
from mongo_connector.util import retry_until_ok
#from mongo_connector.doc_managers import DocManagerBase, exception_wrapper
from doc_managers import DocManagerBase, exception_wrapper
#from mongo_connector.doc_managers.formatters import DocumentFlattener
from doc_managers.formatters import DocumentFlattener
Test
#mongo-connector
MongoDB Replica set
1. Create the following directory layout
rs
├── db
│   ├── rs1
│   │   ├── journal
│   │   └── _tmp
│   └── rs2
│       └── journal
└── log
2. Run two mongod instances
#cd rs
#mongod --port 27001 --oplogSize 100 --dbpath db/rs1 --logpath log/rs1.log --replSet rs/127.0.0.1:27002 --journal
#mongod --port 27002 --oplogSize 100 --dbpath db/rs2 --logpath log/rs2.log --replSet rs/127.0.0.1:27001 --journal
3. Configure the replica set
#mongo --port 27001
>config={_id:'rs', members:[{_id:0, host:'localhost:27001'},{_id:1, host:'localhost:27002'}]}
>rs.initiate(config)
>rs.status()
rs:PRIMARY>
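The directory layout and the replica set configuration above can also be prepared from a script. The following is a minimal Python 3 sketch, not part of the original setup: the helper names `make_layout` and `build_replset_config` are hypothetical, and the commented-out pymongo call assumes pymongo is installed and a mongod is already listening on port 27001.

```python
import os


def make_layout(base, members=("rs1", "rs2")):
    """Create base/db/<member> and base/log, like the tree in step 1.

    The journal and _tmp directories are left for mongod to create.
    """
    for member in members:
        os.makedirs(os.path.join(base, "db", member), exist_ok=True)
    os.makedirs(os.path.join(base, "log"), exist_ok=True)


def build_replset_config(name, hosts):
    """Build the same document that step 3 passes to rs.initiate()."""
    return {
        "_id": name,
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


config = build_replset_config("rs", ["localhost:27001", "localhost:27002"])
print(config)

# Assumption: with pymongo installed and mongod running, the config could be
# applied through the replSetInitiate admin command, roughly like this
# (recent pymongo versions may also need directConnection=True):
#   from pymongo import MongoClient
#   MongoClient("localhost", 27001).admin.command("replSetInitiate", config)
```

Calling `make_layout("rs")` before starting the two mongod processes gives them the `db/rs1`, `db/rs2` and `log` paths used in step 2.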
Dump data from MongoDB to Solr
#python connector.py --unique-key=id --auto-commit-interval=0 -n test.test -m localhost:27001 -t http://localhost:8983/solr/inokqreply -d solr_doc_manager.py
or
#mongo-connector --auto-commit-interval=0 -n test.test -m localhost:27001 -t http://localhost:8983/solr/inokqreply -d doc_managers/solr_doc_manager.py
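If the connector is started from another script, the command line above can be assembled as an argument list. This is only a sketch: `build_connector_command` and `run` are hypothetical helper names, and only the flags that already appear in the commands above are used.

```python
import subprocess


def build_connector_command(namespace, mongo_host, solr_url, doc_manager,
                            unique_key=None, auto_commit_interval=0):
    """Assemble the mongo-connector argument list shown above."""
    cmd = [
        "mongo-connector",
        "--auto-commit-interval=%d" % auto_commit_interval,
        "-n", namespace,      # namespace to replicate, e.g. test.test
        "-m", mongo_host,     # MongoDB host:port
        "-t", solr_url,       # target Solr core URL
        "-d", doc_manager,    # path to the doc manager
    ]
    if unique_key is not None:
        cmd.insert(1, "--unique-key=%s" % unique_key)
    return cmd


def run(cmd):
    """Start the connector; this call blocks until the process exits."""
    return subprocess.call(cmd)


cmd = build_connector_command(
    "test.test",
    "localhost:27001",
    "http://localhost:8983/solr/inokqreply",
    "doc_managers/solr_doc_manager.py",
)
print(" ".join(cmd))
```

Passing a list to `subprocess.call` avoids shell quoting problems with the Solr URL.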
------------------------------------------------------------------------------------------------------------------------------------
error:
Fix: modify pysolr.py under python2.7/site-packages (around line 716)
            for bit in values:
                if self._is_null_value(bit):
                    continue

                #attrs = {'name': key}
                attrs = {str('name'): key}

                if boost and key in boost:
                    #attrs['boost'] = force_unicode(boost[key])
                    attrs[str('boost')] = force_unicode(boost[key])

                field = ET.Element('field', **attrs)
                field.text = self._from_python(bit)

                doc_elem.append(field)
See the related issue: https://github.com/toastdriven/pysolr/issues/72
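The point of the patch, as far as I can tell, is that `ET.Element('field', **attrs)` expands the dictionary into keyword arguments, so the keys must be native strings and `attrs` must stay a dictionary. The standalone Python sketch below illustrates the same pattern outside pysolr; `build_field` is a hypothetical helper, not a pysolr function, and it uses `str()` where pysolr uses its own conversion helpers.

```python
import xml.etree.ElementTree as ET


def build_field(key, value, boost=None):
    """Build one Solr <field> element with native-str attribute names."""
    attrs = {str("name"): key}
    if boost and key in boost:
        # Add to the dict; do not replace the dict with the boost value.
        attrs[str("boost")] = str(boost[key])
    field = ET.Element("field", **attrs)
    field.text = str(value)
    return field


field = build_field("title", "hello", boost={"title": 2.0})
print(ET.tostring(field))
```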
error:
Solution: delete the config.txt file in the directory from which the above command was launched (mongo-connector keeps its oplog progress in that file by default).
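When the connector is restarted often during testing, removing the stale progress file can be scripted. A small sketch, assuming the file is named config.txt as above; `reset_progress_file` is a hypothetical helper name.

```python
import os


def reset_progress_file(directory, filename="config.txt"):
    """Delete the progress file if present; return True if one was removed."""
    path = os.path.join(directory, filename)
    if os.path.isfile(path):
        os.remove(path)
        return True
    return False


# Example (run from the directory where mongo-connector was launched):
#   reset_progress_file(os.getcwd())
```

Note that deleting the file makes the connector start again from a full collection dump.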
error:
File "build/bdist.linux-x86_64/egg/pysolr.py", line 318, in _send_request
error_message = self._extract_error(resp)
File "build/bdist.linux-x86_64/egg/pysolr.py", line 397, in _extract_error
reason, full_html = self._scrape_response(resp.headers, resp.content)
File "build/bdist.linux-x86_64/egg/pysolr.py", line 418, in _scrape_response
import lxml.html
ImportError: No module named lxml.html
2014-07-31 14:29:51,872 - ERROR - OplogThread: Failed during dump collection cannot recover! Collection(Database(MongoClient(), u'local'), u'oplog.rs')
Solution:
yum install python-lxml
yum install libxml2-python
yum install libxml2-devel libxslt-devel
pip install lxml
(or a pinned version: pip install lxml==3.2.4)
pip install cssselect
#ln -s /usr/local/python27/lib/libpython2.7.so /usr/local/lib/
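Before restarting the connector, it can help to check that the modules named in the traceback are now importable. A minimal sketch; `missing_modules` is a hypothetical helper name.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# An empty list means both packages were installed successfully.
print(missing_modules(["lxml.html", "cssselect"]))
```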
error:
File "build/bdist.linux-x86_64/egg/pysolr.py", line 318, in _send_request
error_message = self._extract_error(resp)
File "build/bdist.linux-x86_64/egg/pysolr.py", line 397, in _extract_error
reason, full_html = self._scrape_response(resp.headers, resp.content)
File "build/bdist.linux-x86_64/egg/pysolr.py", line 429, in _scrape_response
p_nodes = body_node.cssselect('p')
AttributeError: 'NoneType' object has no attribute 'cssselect'
2014-07-31 17:29:25,320 - ERROR - OplogThread: Failed during dump collection cannot recover! Collection(Database(MongoClient(), u'local'), u'oplog.rs')
Solution: apply the change from this pull request:
https://github.com/toastdriven/pysolr/pull/92
==========================================================
pysolr
https://pypi.python.org/pypi/pysolr/3.2.0
===============================================================================
datetime
In my case there is a 'created_at' field that stores a Unix timestamp as a long.
When this data is imported into Solr by mongo-connector, Solr reports the error "Invalid Date String".
Solution:
1. uninstall mongo-connector
#pip uninstall mongo-connector
2. modify mongo_connector/doc_managers/formatters.py
    def transform_element(self, key, value):
        if isinstance(value, list):
            for li, lv in enumerate(value):
                for inner_k, inner_v in self.transform_element(
                        "%s.%s" % (key, li), lv):
                    yield inner_k, inner_v
        elif isinstance(value, dict):
            formatted = self.format_document(value)
            for doc_key in formatted:
                yield "%s.%s" % (key, doc_key), formatted[doc_key]
        else:
            # We assume that transform_value will return a 'flat' value,
            # not a list or dict
            if key == "created_at":
                # Added: convert the Unix timestamp to a datetime
                yield key, self.transform_dateformat(value)
            else:
                yield key, self.transform_value(value)

    # Added helper; formatters.py must import datetime for this to work
    def transform_dateformat(self, value):
        return datetime.datetime.fromtimestamp(int(value), None)
3. reinstall
#python setup.py install
Everything is OK.
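To check what a given timestamp should look like on the Solr side, the conversion can be tried outside mongo-connector. The sketch below is Python 3 and is not the patch itself: `to_solr_date` is a hypothetical helper, it converts in UTC (the patch above uses local time, since it passes None as the timezone), and the `millis` switch is an assumption for collections that store milliseconds rather than seconds.

```python
import datetime


def to_solr_date(value, millis=False):
    """Format a Unix timestamp as a UTC string in Solr's date syntax."""
    seconds = int(value)
    if millis:
        seconds //= 1000
    moment = datetime.datetime.fromtimestamp(seconds, datetime.timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_solr_date(0))  # 1970-01-01T00:00:00Z
```

Solr date fields expect the form YYYY-MM-DDThh:mm:ssZ, which is why a bare long value is rejected.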
http://tool.chinaz.com/Tools/unixtime.aspx
http://developwithstyle.com/articles/2010/07/09/handling-dates-in-mongodb/
https://wiki.python.org/moin/TimeTransitionsImage
===================================================================================
There is a post that recommends another MongoDB-to-Solr tool, based on the Solr Data Import Handler (DIH):
http://stackoverflow.com/questions/9345335/solr-data-import-handlers-for-mongodb
#git clone https://github.com/james75/SolrMongoImporter
-------------------------
init script
https://gist.github.com/lovett89/9260081
http://www.snip2code.com/Snippet/33459/mongo-connector-init-script-%28tested-on-C
https://github.com/10gen-labs/mongo-connector/issues/96
1. Modify the variables at the top of mongo-connector.start to your liking
2. Modify the wrapper variable at the top of the init script to point to the location of mongo-connector.start
3. Place the mongo-connector script in /etc/init.d and run chkconfig --add mongo-connector
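When I ran chkconfig --add mongo-connector, the chkconfig command was not found (it is not available by default on apt-based systems).
Solution: install sysv-rc-conf, which serves the same purpose:
sudo apt-get install sysv-rc-conf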
=================================================================================
mongodb commands
http://blog.csdn.net/wangpeng047/article/details/7705588
References
http://www.cnblogs.com/sysuys/p/3403670.html
http://blog.mongodb.org/post/29127828146/introducing-mongo-connector
https://github.com/10gen-labs/mongo-connector/wiki/Usage-with-Solr