Preface:
I had just quit my job, and with time on my hands I started digging into non-relational databases. I happened to come across robbin's post "NOSQL数据库探讨" (a discussion of NoSQL databases) and was eager to learn more. After picking up some MongoDB basics, I went and looked at a MongoDB migration case mentioned in robbin's post, which says:
"Because Mongo supports complex data structures and ships with powerful query features, it is very popular, and many projects are considering MongoDB as a replacement for MySQL for web applications that are not especially complex. For example, 'why we migrated from MySQL to MongoDB' is a real case of migrating from MySQL to MongoDB: the data volume was simply too large, so they moved to Mongo, and query speeds improved very noticeably."
Reading it, I could feel both the author's joys and his worries. If anything in my translation is off or my understanding misses the mark, please correct me :)
1. Why did David migrate?
The original text:
Quote:
The problem we encountered was administrative. We wanted to scale using replication but found that MySQL had a hard time keeping up, especially with the initial sync. As such, backups became an issue, but we solved that. However, scaling MySQL onto multiple clustered servers as we plan to do in the future is difficult. You either do this through replication, but that is only really suited to read-heavy applications; or using MySQL cluster. The cluster looks very good but I have read about some problems with it and was unsure of its suitability for our needs.
2. Why MongoDB?
He lists the reasons as follows:
Quote:
Very easy to install.
PHP module available.
Very easy replication, including master-master support. In testing this caught up with our live DB very quickly and stayed in sync without difficulty.
Automated sharding being developed.
Good documentation.
I think the most important point is: "Very easy replication, including master-master support. In testing this caught up with our live DB very quickly and stayed in sync without difficulty."
In other words, replication is very easy, and it syncs quickly and stays consistent.
3. Problems after migrating to MongoDB
Schema-less:
Quote:
This means things are much more flexible for future structure changes but it also means that every row records the field names. We had relatively long, descriptive names in MySQL such as timeAdded or valueCached. For a small number of rows, this extra storage only amounts to a few bytes per row, but when you have 10 million rows, each with maybe 100 bytes of field names, then you quickly eat up disk space unnecessarily. 100 * 10,000,000 = ~900MB just for field names!
We cut down the names to 2-3 characters. This is a little more confusing in the code but the disk storage savings are worth it. And if you use sensible names then it isn’t that bad e.g. timeAdded -> tA. A reduction to about 15 bytes per row at 10,000,000 rows means ~140MB for field names – a massive saving.
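To make the trick concrete, here is my own illustration, not code from the article: a minimal pymongo sketch that translates descriptive field names to short stored keys at write time. The field map and the database/collection names are assumptions.

from pymongo import MongoClient

# Hypothetical mapping from descriptive names to the 2-3 character
# keys that are actually stored in each document.
FIELD_MAP = {
    "timeAdded": "tA",
    "valueCached": "vC",
}

def shorten(doc):
    # Rewrite keys before insert; unmapped keys pass through unchanged.
    return {FIELD_MAP.get(k, k): v for k, v in doc.items()}

client = MongoClient("localhost", 27017)
coll = client.metrics.samples  # hypothetical database/collection

# ~100 bytes of key names * 10,000,000 rows = 1,000,000,000 bytes,
# i.e. the article's ~900MB; ~15 bytes * 10,000,000 rows ≈ 150 MB.
coll.insert_one(shorten({"timeAdded": 1257340800, "valueCached": 42}))

Because MongoDB repeats the key names in every document, this kind of mapping layer is the only way to get MySQL-column-like storage efficiency while keeping readable names in the application code.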
The database-per-customer method doesn’t work
Quote:
MongoDB stores data in flat files using their own binary storage objects. This means that data storage is very compact and efficient, perfect for high data volumes. However, it allocates a set of files per database and pre-allocates those files on the filesystem for speed:
This was a problem because MongoDB was frequently pre-allocating in advance when the data would almost never need to “flow” into another file, or only a tiny amount of another file. This is particularly the case with free accounts where we clear out data after a month. Such pre-allocation caused large amounts of disk space to be used up.
We therefore changed our data structure so that we had a single DB, thus making the most efficient use of the available storage. There is no performance hit for doing this because the files are split out, unlike MySQL which uses a single file per table.
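A sketch of what that restructuring could look like (again my illustration; the article shows no code, so the customer field and database names are assumptions): every document carries a customer identifier and lives in one shared database, instead of each customer getting a database of pre-allocated files.

from pymongo import MongoClient

client = MongoClient("localhost", 27017)

# Before: one database, and therefore one set of pre-allocated files,
# per customer:
#   client["customer_1234"].metrics.insert_one({"tA": 1257340800, "vC": 42})

# After: a single shared database; every row is tagged with its customer.
db = client.serverdensity  # hypothetical single database name
db.metrics.insert_one({"cId": 1234, "tA": 1257340800, "vC": 42})

# Queries filter on the customer field instead of selecting a database.
for row in db.metrics.find({"cId": 1234}):
    print(row)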
Unexpected locking and blocking
Quote:
In MongoDB, removing rows locks and blocks the entire database. Adding indexes also does the same. When we imported our data, this was causing problems because large data sets were causing the locks to exist for some time until the indexing had completed. This is not a problem when you first create the “collection” (tables in MySQL) because there are only a few (or no) rows, but creating indexes later will cause problems.
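The article doesn't say how they coped with index builds, but one mitigation worth noting (my addition, not the author's): MongoDB can build an index in the background so the build doesn't block everything for its whole duration. A pymongo sketch, with a hypothetical collection:

from pymongo import ASCENDING, MongoClient

client = MongoClient("localhost", 27017)
coll = client.serverdensity.metrics  # hypothetical

# background=True asks the server not to block other operations for
# the duration of the build (on MongoDB 4.2+ the option is ignored,
# since all index builds behave this way there).
coll.create_index([("tA", ASCENDING)], background=True)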
Previously in MySQL we would delete rows by using a wide-ranging WHERE clause, for example to delete rows by date range or server ID. Now in MongoDB we have to loop through all the rows and delete them individually. This is slower, but it prevents the locking issue.
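My reading of that workaround as a pymongo sketch (the field name and cutoff are assumptions): fetch just the matching _ids, then remove them one at a time, so the lock is released between deletes.

from pymongo import MongoClient

client = MongoClient("localhost", 27017)
coll = client.serverdensity.metrics  # hypothetical

cutoff = 1254355200  # hypothetical "older than this date" boundary

# One wide-ranging delete would hold the database-wide lock until
# every matching row was gone:
#   coll.delete_many({"tA": {"$lt": cutoff}})

# Deleting row by row is slower overall, but other operations can
# interleave between the individual deletes.
for row in coll.find({"tA": {"$lt": cutoff}}, {"_id": 1}):
    coll.delete_one({"_id": row["_id"]})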
In MySQL if a database (more likely a few tables) becomes corrupt, you can repair them individually. In MongoDB, you have to repair at the database level. There is a command to do this but it reads all the data and re-writes it to a new set of files. This means all data is checked, and you will probably have some disk space freed up as files are compacted, but it also means the entire database is locked and blocked during the time it takes. With our database being around 60GB, this operation takes several hours.
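For reference, the repair command he mentions can be issued from pymongo like this (repairDatabase is a real server command, though MongoDB removed it in 4.2; the database name is hypothetical):

from pymongo import MongoClient

client = MongoClient("localhost", 27017)

# Reads every row in the database and rewrites it into a new set of
# files, checking and compacting as it goes; the whole database stays
# locked and blocked for the duration.
client.serverdensity.command("repairDatabase")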
Our reasons for moving to MongoDB were not performance; however, it has turned out that in many cases, query times are significantly faster than with MySQL. This is because MongoDB stores as much data in RAM as possible and so it becomes as fast as using something like memcached for the cached data. Even non-cached data is very fast.
Q: Hi, did you try other MySQL engines besides MyISAM before moving to MongoDB?
A: MyISAM was the most suitable for the type of usage we were experiencing – many reads and few writes. We used InnoDB (and still do) for the billing and customer systems where we need transactions.