Preface:
I had just quit my job, and with time on my hands I started digging into non-relational databases. I happened to come across robbin's post "NOSQL数据库探讨" (a discussion of NoSQL databases) and was eager to learn more. After picking up some MongoDB basics, I went and looked at a MongoDB migration case mentioned in robbin's post, which says:
"Because Mongo supports complex data structures and ships with powerful query features, it is very popular, and many projects are considering MongoDB as a replacement for MySQL for web applications that are not especially complex. For example, 'why we migrated from MySQL to MongoDB' is a real case of migrating from MySQL to MongoDB: the data volume was simply too large, so they moved to Mongo, and query speeds improved very noticeably."
Reading it, I could feel both the author's joys and his worries. If anything in my translation is off or my understanding misses the mark, please correct me :)
1. Why did David migrate?
The original text:
Quote:
The problem we encountered was administrative. We wanted to scale using replication but found that MySQL had a hard time keeping up, especially with the initial sync. As such, backups became an issue, but we solved that. However, scaling MySQL onto multiple clustered servers as we plan to do in the future is difficult. You either do this through replication, but that is only really suited to read-heavy applications; or using MySQL cluster. The cluster looks very good but I have read about some problems with it and was unsure of its suitability for our needs.
2. Why MongoDB?
He lists the reasons as follows:
Quote:
Very easy to install.
PHP module available.
Very easy replication, including master-master support. In testing this caught up with our live DB very quickly and stayed in sync without difficulty.
Automated sharding being developed.
Good documentation.
I think the most important point is: "Very easy replication, including master-master support. In testing this caught up with our live DB very quickly and stayed in sync without difficulty."
In other words, replication is very easy, and it syncs quickly and stays consistent.
3. Problems after migrating to MongoDB
Schema-less:
Quote:
This means things are much more flexible for future structure changes but it also means that every row records the field names. We had relatively long, descriptive names in MySQL such as timeAdded or valueCached. For a small number of rows, this extra storage only amounts to a few bytes per row, but when you have 10 million rows, each with maybe 100 bytes of field names, then you quickly eat up disk space unnecessarily. 100 * 10,000,000 = ~900MB just for field names!
We cut down the names to 2-3 characters. This is a little more confusing in the code but the disk storage savings are worth it. And if you use sensible names then it isn’t that bad e.g. timeAdded -> tA. A reduction to about 15 bytes per row at 10,000,000 rows means ~140MB for field names – a massive saving.
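To make the trick concrete, here is my own illustration, not code from the article: a minimal pymongo sketch that translates descriptive field names to short stored keys at write time. The field map and the database/collection names are assumptions.

from pymongo import MongoClient

# Hypothetical mapping from descriptive names to the 2-3 character
# keys that are actually stored in each document.
FIELD_MAP = {
    "timeAdded": "tA",
    "valueCached": "vC",
}

def shorten(doc):
    # Rewrite keys before insert; unmapped keys pass through unchanged.
    return {FIELD_MAP.get(k, k): v for k, v in doc.items()}

client = MongoClient("localhost", 27017)
coll = client.metrics.samples  # hypothetical database/collection

# ~100 bytes of key names * 10,000,000 rows = 1,000,000,000 bytes,
# i.e. the article's ~900MB; ~15 bytes * 10,000,000 rows ≈ 150 MB.
coll.insert_one(shorten({"timeAdded": 1257340800, "valueCached": 42}))

Because MongoDB repeats the key names in every document, this kind of mapping layer is the only way to get MySQL-column-like storage efficiency while keeping readable names in the application code.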
The database-per-customer method doesn’t work
Quote:
MongoDB stores data in flat files using their own binary storage objects. This means that data storage is very compact and efficient, perfect for high data volumes. However, it allocates a set of files per database and pre-allocates those files on the filesystem for speed:
This was a problem because MongoDB was frequently pre-allocating in advance when the data would almost never need to “flow” into another file, or only a tiny amount of another file. This is particularly the case with free accounts where we clear out data after a month. Such pre-allocation caused large amounts of disk space to be used up.
We therefore changed our data structure so that we had a single DB, thus making the most efficient use of the available storage. There is no performance hit for doing this because the files are split out, unlike MySQL which uses a single file per table.
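A sketch of what that restructuring could look like (again my illustration; the article shows no code, so the customer field and database names are assumptions): every document carries a customer identifier and lives in one shared database, instead of each customer getting a database of pre-allocated files.

from pymongo import MongoClient

client = MongoClient("localhost", 27017)

# Before: one database, and therefore one set of pre-allocated files,
# per customer:
#   client["customer_1234"].metrics.insert_one({"tA": 1257340800, "vC": 42})

# After: a single shared database; every row is tagged with its customer.
db = client.serverdensity  # hypothetical single database name
db.metrics.insert_one({"cId": 1234, "tA": 1257340800, "vC": 42})

# Queries filter on the customer field instead of selecting a database.
for row in db.metrics.find({"cId": 1234}):
    print(row)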
Unexpected locking and blocking
Quote:
In MongoDB, removing rows locks and blocks the entire database. Adding indexes also does the same. When we imported our data, this was causing problems because large data sets were causing the locks to exist for some time until the indexing had completed. This is not a problem when you first create the “collection” (tables in MySQL) because there are only a few (or no) rows, but creating indexes later will cause problems.
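The article doesn't say how they coped with index builds, but one mitigation worth noting (my addition, not the author's): MongoDB can build an index in the background so the build doesn't block everything for its whole duration. A pymongo sketch, with a hypothetical collection:

from pymongo import ASCENDING, MongoClient

client = MongoClient("localhost", 27017)
coll = client.serverdensity.metrics  # hypothetical

# background=True asks the server not to block other operations for
# the duration of the build (on MongoDB 4.2+ the option is ignored,
# since all index builds behave this way there).
coll.create_index([("tA", ASCENDING)], background=True)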
Previously in MySQL we would delete rows by using a wide-ranging WHERE clause, for example to delete rows by date range or server ID. Now in MongoDB we have to loop through all the rows and delete them individually. This is slower, but it prevents the locking issue.
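My reading of that workaround as a pymongo sketch (the field name and cutoff are assumptions): fetch just the matching _ids, then remove them one at a time, so the lock is released between deletes.

from pymongo import MongoClient

client = MongoClient("localhost", 27017)
coll = client.serverdensity.metrics  # hypothetical

cutoff = 1254355200  # hypothetical "older than this date" boundary

# One wide-ranging delete would hold the database-wide lock until
# every matching row was gone:
#   coll.delete_many({"tA": {"$lt": cutoff}})

# Deleting row by row is slower overall, but other operations can
# interleave between the individual deletes.
for row in coll.find({"tA": {"$lt": cutoff}}, {"_id": 1}):
    coll.delete_one({"_id": row["_id"]})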
In MySQL if a database (more likely a few tables) becomes corrupt, you can repair them individually. In MongoDB, you have to repair at the database level. There is a command to do this but it reads all the data and re-writes it to a new set of files. This means all data is checked, and you will probably have some disk space freed up as files are compacted, but it also means the entire database is locked and blocked during the time it takes. With our database being around 60GB, this operation takes several hours.
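For reference, the repair command he mentions can be issued from pymongo like this (repairDatabase is a real server command, though MongoDB removed it in 4.2; the database name is hypothetical):

from pymongo import MongoClient

client = MongoClient("localhost", 27017)

# Reads every row in the database and rewrites it into a new set of
# files, checking and compacting as it goes; the whole database stays
# locked and blocked for the duration.
client.serverdensity.command("repairDatabase")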
Our reasons for moving to MongoDB were not performance; however, it has turned out that in many cases, query times are significantly faster than with MySQL. This is because MongoDB stores as much data in RAM as possible and so it becomes as fast as using something like memcached for the cached data. Even non-cached data is very fast.
Q: Hi, did you try other MySQL engines besides MyISAM before moving to MongoDB?
A: MyISAM was the most suitable for the type of usage we were experiencing – many reads and few writes. We used InnoDB (and still do) for the billing and customer systems where we need transactions.