NoSQL理论基础及安装、基本操作

2321212dd 发表于 2016-11-28 09:40:07

30分钟开始

分布式系统理论：
CAP：
一致性
可用性
分区容错性

MongoDB:
安装
crud
索引
副本集
分片

NoSQL:非关系型、分布式、不提供ACID功能
技术特点：
1、简单数据模型
2、元数据和应用数据分离（分不同服务器存储）
3、弱一致性

优势：
1、避免不必要的复杂性
2、高吞吐量
3、高水平扩展能力和低端硬件集群
4、不适用对象-关系映射

劣势：
1、不支持ACID特性
2、功能简单
3、没有统一的数据查询模型

分类：

NoSQL:
键值存储
列式数据库
文档数据库
图式数据库

SQL：
mysql
pgsql

缓存数据库系统：
memcache

CAP理论：从CAP中挑出2个
BASE理论：
基本可用
软状态
最终一致性

C,A:SQL(保证一致性，可用性)
C,P:悲观加锁机制（一致性，分区容错性）

A,P:DNS

数据一致性模型：强一致性、弱一致性、最终一致性

数据一致性的实现技术：
Quorum(法定票数)系统NRW策略（关注）
   N:副本数
   R:完成读操作所需要读取的最少副本数
   W:完成写操作所需要写入的最少副本数

   要想保证强一致性：R+W>N
   最多只能保证最终一致性：R+W<=N

两段式提交：2PC（two phase commit protocol）（关注）
有两类节点：
   一类为协调者
   一类为事务参与者

   两段：
            1、请求阶段：事务协调者通知事务参与者提交事务

            2、提交阶段：事务参与者提交事务

时间戳策略

Paxos:根据协议进行协调

向量时钟

NoSQL的数据存储模型：
1、键值存储：k-w
   优点：查找迅速
   缺点：数据无结构、通常只被当做字符串或二进制数据
   应用场景：内容缓存
   实例：redis，dynamo
2、列式模型：
   数据模型：数据按列存储，将同一列数据存在一起
   优点：查找迅速，可扩展性强，易于实现分布式
   缺点：功能相对sql有限
   应用场景：分布式文件系统或分布式存储
   实例：Bigtable(google),cassandra,HBase,Hypertable
3、文档模型
      数据模型：与键值模型相似，value指向结构化数据
         优点：数据格式要求不严格，无需事先定义结构
         缺点：查询性能不高，缺乏统一查询语法
         应用场景：web应用
         实例：MongoDB,CouchDB

4、图式模型：
         数据模型：图结构模型
         优点：利用图结构相关算法提高性能，满足特殊场景应用需求
         缺点：实现分布式较困难，功能有定向性
         应用场景：社交网络、推荐系统、关系图谱
         实例：Neo4j

www.nosql-database.org

Mongodb:
collection：表
多个collection：database

MongoDB的安装：是一个易于扩展的、高性能、开源、文档模式的no-sql数据库
存储：海量数据、文档数据库、不需要创建表结构、c++研发的，开源
是什么？
基于文档数据库（json格式）
无表结构
性能：
   c++
   支持各种索引
   不支持事务
   内存映射（延迟写操作）
扩展性：
   复制
   auto-sharding
商业支持
支持基于文档的查询：表达式为json
支持使用map/reduce
   灵活的聚合操作
   在分片的基础上并行处理
GridFS：网格文件系统，存储单个大文件或海量小文件的分布式文件系统
地理位置、空间索引
被生产环境验证过

特性：
动态查询
查询性能剖析
基于复制完成故障自动转移

rabbitmq的性能太差，使用HBase

适应场景：
web网站
缓存
低价值、高存储量
高扩展性
实现对象、json存储的应用编程环境

不适合场景：
事务型
商业智能决策
使用sql接口的

MongoDB数据模型：面向集合的数据库
数据库：无需创建
   表：集合（行）：由文档组成，多个文档组成一个表，文档是json格式的（可以嵌套）,集合无需事先定义

c/s:
mongod服务器端
mongo

安装：

查看配置文件：

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
# cat /etc/mongod.conf
# mongo.conf

#where to log
logpath=/var/log/mongo/mongod.log

logappend=true

# fork and run in background
fork = true

#port = 27017

dbpath=/var/lib/mongo #运行mongod服务的用户也是mongod，所以保证这个目录的属主属组也为mongod，
为了数据使用，应该找一个合理的目录

# location of pidfile
pidfilepath = /var/run/mongodb/mongod.pid

# Disables write-ahead journaling
# nojournal = true

# Enables periodic logging of CPU utilization and I/O wait
#cpu = true

# Turn on/off security.Off is currently the default
#noauth = true
#auth = true

# Verbose logging output.
#verbose = true

# Inspect all client data for validity on receipt (useful for
# developing drivers)
#objcheck = true

# Enable db quota management
#quota = true

# Set oplogging level where n is
# 0=off (default)
# 1=W
# 2=R
# 3=both
# 7=W+some reads
#diaglog = 0

# Ignore query hints
#nohints = true

# Disable the HTTP interface (Defaults to localhost:27018).
#nohttpinterface = true

# Turns off server-side scripting.This will result in greatly limited
# functionality
#noscripting = true

# Turns off table scans.Any query that would do a table scan fails.
#notablescan = true

# Disable data file preallocation.
#noprealloc = true

# Specify .ns file size for new databases.
# nssize = <size>

# Accout token for Mongo monitoring server.
#mms-token = <token>

# Server name for Mongo monitoring server.
#mms-name = <server-name>

# Ping interval for Mongo monitoring server.
#mms-interval = <seconds>

# Replication Options

# in replicated mongo databases, specify here whether this is a slave or master
#slave = true
#source = master.example.com
# Slave only: specify a single database to replicate
#only = master.example.com
# or
#master = true
#source = slave.example.com

1
创建目录，改属主属组

修改配置文件

启动服务

system数据库保存其他数据库的元数据（和myslq中的mysql数据库一样）
查看端口：

27017：服务端口
28017：管理端口

NoSQL数据库一般是在内网中使用的，不进行认证，直接连

数据库帮助：

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
> db.help
function () {
print("DB methods:");
print("\tdb.addUser(userDocument)");
print("\tdb.adminCommand(nameOrDocument) - switches to 'admin' db, and runs command [ just calls db.runCommand(...) ]");
print("\tdb.auth(username, password)");
print("\tdb.cloneDatabase(fromhost)");
print("\tdb.commandHelp(name) returns the help for the command");
print("\tdb.copyDatabase(fromdb, todb, fromhost)");
print("\tdb.createCollection(name, { size : ..., capped : ..., max : ... } )");
print("\tdb.currentOp() displays currently executing operations in the db");
print("\tdb.dropDatabase()");
print("\tdb.eval(func, args) run code server-side");
print("\tdb.fsyncLock() flush data to disk and lock server for backups");
print("\tdb.fsyncUnlock() unlocks server following a db.fsyncLock()");
print("\tdb.getCollection(cname) same as db['cname'] or db.cname");
print("\tdb.getCollectionNames()");
print("\tdb.getLastError() - just returns the err msg string");
print("\tdb.getLastErrorObj() - return full status object");
print("\tdb.getMongo() get the server connection object");
print("\tdb.getMongo().setSlaveOk() allow queries on a replication slave server");
print("\tdb.getName()");
print("\tdb.getPrevError()");
print("\tdb.getProfilingLevel() - deprecated");
print("\tdb.getProfilingStatus() - returns if profiling is on and slow threshold");
print("\tdb.getReplicationInfo()");
print("\tdb.getSiblingDB(name) get the db at the same server as this one");
print("\tdb.hostInfo() get details about the server's host");
print("\tdb.isMaster() check replica primary status");
print("\tdb.killOp(opid) kills the current operation in the db");
print("\tdb.listCommands() lists all the db commands");
print("\tdb.loadServerScripts() loads all the scripts in db.system.js");
print("\tdb.logout()");
print("\tdb.printCollectionStats()");
print("\tdb.printReplicationInfo()");
print("\tdb.printShardingStatus()");
print("\tdb.printSlaveReplicationInfo()");
print("\tdb.removeUser(username)");
print("\tdb.repairDatabase()");
print("\tdb.resetError()");
print("\tdb.runCommand(cmdObj) run a database command.if cmdObj is a string, turns it into { cmdObj : 1 }");
print("\tdb.serverStatus()");
print("\tdb.setProfilingLevel(level,<slowms>) 0=off 1=slow 2=all");
print("\tdb.setVerboseShell(flag) display extra information in shell output");
print("\tdb.shutdownServer()");
print("\tdb.stats()");
print("\tdb.version() current version of the server");

return __magicNoPrint;
}

集合帮助：

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
> db.mycoll.help()
DBCollection help
db.mycoll.find().help() - show DBCursor help
db.mycoll.count()
db.mycoll.copyTo(newColl) - duplicates collection by copying all documents to newColl; no indexes are copied.
db.mycoll.convertToCapped(maxBytes) - calls {convertToCapped:'mycoll', size:maxBytes}} command
db.mycoll.dataSize()
db.mycoll.distinct( key ) - e.g. db.mycoll.distinct( 'x' )
db.mycoll.drop() drop the collection
db.mycoll.dropIndex(index) - e.g. db.mycoll.dropIndex( "indexName" ) or db.mycoll.dropIndex( { "indexKey" : 1 } )
db.mycoll.dropIndexes()
db.mycoll.ensureIndex(keypattern[,options]) - options is an object with these possible fields: name, unique, dropDups
db.mycoll.reIndex()
db.mycoll.find(,) - query is an optional query filter. fields is optional set of fields to return.
                                             e.g. db.mycoll.find( {x:77} , {name:1, x:1} )
db.mycoll.find(...).count()
db.mycoll.find(...).limit(n)
db.mycoll.find(...).skip(n)
db.mycoll.find(...).sort(...)
db.mycoll.findOne()
db.mycoll.findAndModify( { update : ... , remove : bool [, query: {}, sort: {}, 'new': false] } )
db.mycoll.getDB() get DB object associated with collection
db.mycoll.getIndexes()
db.mycoll.group( { key : ..., initial: ..., reduce : ...[, cond: ...] } )
db.mycoll.insert(obj)
db.mycoll.mapReduce( mapFunction , reduceFunction , <optional params> )
db.mycoll.remove(query)
db.mycoll.renameCollection( newName , <dropTarget> ) renames the collection.
db.mycoll.runCommand( name , <options> ) runs a db command with the given name where the first param is the collection name
db.mycoll.save(obj)
db.mycoll.stats()
db.mycoll.storageSize() - includes free space allocated to this collection
db.mycoll.totalIndexSize() - size in bytes of all the indexes
db.mycoll.totalSize() - storage allocated for all data and indexes
db.mycoll.update(query, object[, upsert_bool, multi_bool]) - instead of two flags, you can pass an object with fields: upsert, multi
db.mycoll.validate( <full> ) - SLOW
db.mycoll.getShardVersion() - only for use with sharding
db.mycoll.getShardDistribution() - prints statistics about data distribution in the cluster
db.mycoll.getSplitKeysForChunks( <maxChunkSize> ) - calculates split points over all chunks and returns splitter function

简单使用：

使用数据库：（无需创建），collection也无需创建

db.collection.insert：插入
show collections：查询集合
db.collections.find()：查询语句

db.collections.update():更新
db.collections.remove():移除
集合信息：

删除集合：

查看数据库文件：

基本操作：
show dbs:查看所有数据库
show collections:查看集合
show users:查看用户
show profile:
show logs:查看所有日志列表

show log :查看具体的日志

远程连接：
mongo --host ip

crud操作：
create,read,update,delete
虽然没有表结构，但还是应该对同类对象放到一个collection
查询：
db.users.find({age:{$gt:18}}).sort({age:1}) 查询age大于18的用户，以age为升序进行排序
插入：
db.users.insert(
{
   name:'suse',
   age:26,
   status:'A',
   group:['news','sports']

}

)

更新：
db.coll.update(
{age:{$gt:18}},
{$set:{status:'A'}},
{multi:true} 不指定时只修改第一个符合条件

)

删除：
db.coll.delete(
{status:'D'}

)

插入：

一批只显示20个，输入it继续
limit：

删除：

修改：
find高级用法：
db.collection.find(<添加>,<字段>)
db.collection.count()返回条数

比较运算：
$gt：大于{field:{$gt:value}}
$gte：大于等于{field:{$gte:value}}
$in：存在于{field:{$in:}}
$lt：小于{field:{$lt:value}}
$lte：小于等于{field:{$lte:value}}
$ne：不等于{field:{$ne:value}}
$nin：不存在于{field:{$nin:}}
大于

显示需要的字段：

逻辑运算：
$or:或运算，{$or:[{expression1},{expression2},...]}

$and:或运算，{$and:[{expression1},{expression2},...]}
$not:或运算，{field:{$not:{operator-expression}}}
$nor:反运算，即返回不符合所有指定条件的文档，{$nor:[{expression1},{expression2},...]}

与运算：

元素查询：
如果要分居文档中是否存在某字段等条件来挑选文档，则需要用到元素运算
$exists:根据指定字段的存在性挑选文档，语法：{field:{$exists:<boolean>}},指定<boolean>的值为'true'则返回存在指定字段的文档，'false'则返回不存在指定字段的文档
$mod:将指定字段的值进行取模运算，并返回其余数作为指定值的文档，语法{field:{$mod:}}
$type:返回指定字段的值类型为指定类型的文档，语法：{field:{$type:<bson type>}}
重新插入一条数据：

查询：

更新：

update专有操作符大致包含：field,array,bitwise
field:
$inc:增大指定字段的值，格式：
db.collection.update({field:value},{$nic:{field1:amount}}),其中{field:value}用于指定挑选标准，{$inc:{field1:amount}}用于指定要提升其值的字段及提升大小amount
$rename:更改字段名，格式为{$rename:{<old name1>:<new name1>,<old name2>:<new name2>,...}}
$set:修改字段的值为新指定的值，格式db.collection.update({field:value1},{$set:{field2:value2}})
$unset:删除指定的字段，格式db.collection.update({field:value1},{$unset:{field1:""}})

页: [1]

运维网's Archiver

NoSQL理论基础及安装、基本操作