[MongoDB] Understanding the Aggregation Framework

2013. 8. 3. 16:08 MongoDB/MapReduce


Let's look at MongoDB's Aggregation Framework. I would call it the advanced course of MongoDB: anyone who wants to become a MongoDB expert needs to know it.

1. Aggregation Framework

- The aggregation framework concept as presented by 10gen

- The purpose of aggregation in MongoDB is to aggregate data that is stored across shards.

MongoDB's New Aggregation framework from Chris Westin

2. Understanding the Concepts

- Available since MongoDB v2.2

- Shards store the big data, and the Aggregation Framework processes it

- The Aggregation Framework has two key concepts: Pipelines and Expressions

+ Pipelines: the same idea as a Unix pipe. A MongoDB pipeline turns documents into a stream, and the pipeline operators process that stream of documents (much like map-reduce).

Name	Description
`$project`	Reshapes a document stream. `$project` can rename, add, or remove fields as well as create computed values and sub-documents. (Note: in relational algebra, projection (π) removes unwanted components from a relation. Mathematically it is the same as mapping onto a different set of dimensions, and the result of a projection is itself a relation.)
`$match`	Filters the document stream, and only allows matching documents to pass into the next pipeline stage. `$match` uses standard MongoDB queries.
`$limit`	Restricts the number of documents in an aggregation pipeline.
`$skip`	Skips over a specified number of documents from the pipeline and returns the rest.
`$unwind`	Takes an array of documents and returns them as a stream of documents. (Comparable to the map step: it produces key:value pairs.)
`$group`	Groups documents together for the purpose of calculating aggregate values based on a collection of documents.
`$sort`	Takes all input documents and returns them in a stream of sorted documents.
`$geoNear`	Returns an ordered stream of documents based on proximity to a geospatial point.

ex) The comma (,) between the $project and $unwind stages acts as the pipe: data flows from one stage to the next as a stream (similar to a dimension in OLAP)

var p2 = db.runCommand(
{ aggregate : "article", pipeline : [
    { $project : {
        author : 1,
        tags : 1,
        pageViews : 1
    }},
    { $unwind : "$tags" }
]});
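
To make the data flow of the example above concrete, here is a minimal sketch in plain JavaScript that needs no MongoDB server. The helpers `project` and `unwind` are hypothetical stand-ins written for illustration, not MongoDB APIs, and the sample articles are made up. Unlike the real `$project`, this sketch does not keep `_id` automatically.

```javascript
// Hypothetical in-memory stand-ins for two pipeline stages (not MongoDB APIs).

// project: keep only the listed fields of each document.
function project(docs, fields) {
  return docs.map(function (doc) {
    var out = {};
    fields.forEach(function (f) {
      if (f in doc) out[f] = doc[f];
    });
    return out;
  });
}

// unwind: emit one copy of the document per element of the array field.
function unwind(docs, field) {
  var out = [];
  docs.forEach(function (doc) {
    (doc[field] || []).forEach(function (item) {
      var copy = Object.assign({}, doc);
      copy[field] = item;
      out.push(copy);
    });
  });
  return out;
}

// Made-up sample data for illustration.
var articles = [
  { author: "bob", tags: ["fun", "good"], pageViews: 5, body: "..." },
  { author: "joe", tags: ["nasty"], pageViews: 7, body: "..." }
];

// The output of project is the input of unwind, just like stages in a pipeline.
var unwound = unwind(project(articles, ["author", "tags", "pageViews"]), "tags");
console.log(unwound);
```

Each article comes out once per tag, with `tags` holding a single value instead of an array, which is the shape a following `$group` stage would consume.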

ex) Mapping between Aggregation and SQL: SQL does the work inside a single DBMS, whereas MongoDB does it on top of sharding

+ Expressions: produce output documents based on values calculated from the input documents.

Name	Description
`$addToSet`	Returns an array of all the unique values for the selected field among the documents in that group.
`$first`	Returns the first value in a group.
`$last`	Returns the last value in a group.
`$max`	Returns the highest value in a group.
`$min`	Returns the lowest value in a group.
`$avg`	Returns an average of all the values in a group.
`$push`	Returns an array of all values for the selected field among the documents in that group.
`$sum`	Returns the sum of all the values in a group.
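
As a rough illustration of what the expressions in this table compute, the sketch below groups a few made-up orders by `cust_id` in plain JavaScript and derives one value per accumulator. The helper `groupBy` and the sample data are assumptions for this example only. In MongoDB itself these operators are used inside a `$group` stage.

```javascript
// Hypothetical helper: bucket documents by the value of one field.
function groupBy(docs, keyField) {
  var groups = {};
  docs.forEach(function (doc) {
    var k = doc[keyField];
    (groups[k] = groups[k] || []).push(doc);
  });
  return groups;
}

// Made-up sample data for illustration.
var orders = [
  { cust_id: "abc123", status: "A", price: 50 },
  { cust_id: "abc123", status: "B", price: 30 },
  { cust_id: "xyz789", status: "A", price: 10 }
];

var groups = groupBy(orders, "cust_id");
var summary = Object.keys(groups).map(function (k) {
  var prices = groups[k].map(function (d) { return d.price; });
  var statuses = groups[k].map(function (d) { return d.status; });
  var total = prices.reduce(function (a, b) { return a + b; }, 0);
  return {
    _id: k,
    total: total,                          // like $sum
    avg: total / prices.length,            // like $avg
    max: Math.max.apply(null, prices),     // like $max
    min: Math.min.apply(null, prices),     // like $min
    first: prices[0],                      // like $first
    last: prices[prices.length - 1],       // like $last
    prices: prices,                        // like $push
    statuses: Array.from(new Set(statuses)) // like $addToSet
  };
});
console.log(summary);
```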

4. Hands-on Practice

// Start mongod with a dedicated data directory; the orders collection below is stored there

$ mongod --dbpath /home/mongodb/aggregation

// Save the same orders document twice

$ mongo

> db.orders.save({
    cust_id: "abc123",
    ord_date: ISODate("2012-11-02T17:04:11.102Z"),
    status: 'A',
    price: 50,
    items: [ { sku: "xxx", qty: 25, price: 1 },
             { sku: "yyy", qty: 25, price: 1 } ]
  });

//////////////////////////

// Counting all documents (the SQL equivalent has no WHERE clause)

> db.orders.aggregate( [
    { $group: { _id: null,
                count: { $sum: 1 } } }
  ] );

// Result

{ "result" : [ { "_id" : null, "count" : 2 } ], "ok" : 1 }

// SQL

SELECT COUNT(*) AS count
FROM orders

////////////////////////

// sub query

> db.orders.aggregate( [
    { $group: { _id: { cust_id: "$cust_id",
                       ord_date: "$ord_date" } } },
    { $group: { _id: null, count: { $sum: 1 } } }
  ] )

{ "result" : [ { "_id" : null, "count" : 1 } ], "ok" : 1 }

// sql

SELECT COUNT(*)
FROM (SELECT cust_id, ord_date
      FROM orders
      GROUP BY cust_id, ord_date) as DerivedTable
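
The two `$group` stages above can be read as "collapse to distinct (cust_id, ord_date) pairs, then count the pairs". The following is a minimal plain-JavaScript sketch of that reading, assuming the two identical orders saved earlier. The date is kept as a string for simplicity, and the variable names are illustrative only.

```javascript
// Same shape as the two identical orders documents saved above.
var savedOrders = [
  { cust_id: "abc123", ord_date: "2012-11-02T17:04:11.102Z" },
  { cust_id: "abc123", ord_date: "2012-11-02T17:04:11.102Z" }
];

// Stage 1: group by (cust_id, ord_date), leaving one entry per distinct pair.
var distinctPairs = {};
savedOrders.forEach(function (o) {
  distinctPairs[JSON.stringify([o.cust_id, o.ord_date])] = true;
});

// Stage 2: put everything into a single group (_id: null) and count its members.
var distinctCount = { _id: null, count: Object.keys(distinctPairs).length };
console.log(distinctCount);
```

Because both documents share the same customer and date, the first stage leaves a single pair, which is why the count is 1 rather than 2.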

- Examples of the simple aggregation functions: count, distinct, and group

///////////////////////////////////////////////////

// count

// This simply calls count() on the cursor returned by find()

> db.orders.find().count()

// This is the collection-level aggregation command

> db.orders.count()

// Because it is an aggregation command, it accepts a query condition

> db.orders.count({status:'A'})

///////////////////////////////////////////////////

// Sample data

db.dowonDB.save({a:1})
db.dowonDB.save({a:2})
db.dowonDB.save({a:3})

> db.dowonDB.count()

> db.dowonDB.count({a:1})

///////////////////////////////////////////////////

// distinct

> db.dowonDB.distinct('a')

[ 1, 2, 3 ]

Or

// runCommand: a shell helper that runs a database command with the privileges of the current account

> db.runCommand({'distinct':'dowonDB', 'key':'a'})

{
    "values" : [ 1, 2, 3 ],
    "stats" : {
        "n" : 4,
        "nscanned" : 4,
        "nscannedObjects" : 4,
        "timems" : 0,
        "cursor" : "BasicCursor"
    },
    "ok" : 1
}

///////////////////////////////////////////////////

// group

db.dowonDB.save({dept_id: 1, salary: 1})
db.dowonDB.save({dept_id: 1, salary: 2})
db.dowonDB.save({dept_id: 1, salary: 3})
db.dowonDB.save({dept_id: 2, salary: 10})
db.dowonDB.save({dept_id: 2, salary: 12})
db.dowonDB.save({dept_id: 2, salary: 16})
db.dowonDB.save({dept_id: 3, salary: 4})
db.dowonDB.save({dept_id: 3, salary: 1})

// The key plays the role of map: it yields the distinct values to group by

// reduce is code, i.e. a function that holds the business logic

> db.dowonDB.group(
    { key: {'dept_id': true},
      reduce: function(obj, prev) { prev.sum += obj.salary },
      initial: {sum: 0}
    });

// Result
// The first entry (dept_id null, sum NaN) comes from the earlier {a: ...} documents,
// which have neither a dept_id nor a salary field

[
    { "dept_id" : null, "sum" : NaN },
    { "dept_id" : 1, "sum" : 6 },
    { "dept_id" : 2, "sum" : 38 },
    { "dept_id" : 3, "sum" : 5 }
]

Or add a condition clause

> db.dowonDB.group( { key: {'dept_id': true}, reduce: function(obj, prev) { prev.sum += obj.salary }, initial: {sum: 0}, condition: {'dept_id': {$gt:2} } });

[ { "dept_id" : 3, "sum" : 5 } ]

> db.dowonDB.group( { key: {'dept_id': true}, reduce: function(obj, prev) { prev.sum += obj.salary }, initial: {sum: 0}, condition: {'dept_id': {$gte:2} } });

[ { "dept_id" : 2, "sum" : 38 }, { "dept_id" : 3, "sum" : 5 } ]

Or add a finalize function

> db.dowonDB.group( { key: {'dept_id': true}, reduce: function(obj, prev) { prev.sum += obj.salary; prev.cnt++ }, initial: {sum: 0, avg:0, cnt:0}, condition: {'dept_id': {$gte:2} }, finalize: function(out){ out.avg = out.sum/out.cnt;} });

[
    { "dept_id" : 2, "sum" : 38, "avg" : 12.666666666666666, "cnt" : 3 },
    { "dept_id" : 3, "sum" : 5, "avg" : 2.5, "cnt" : 2 }
]
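
To show how key, initial, reduce, condition, and finalize fit together, here is a minimal in-memory sketch in plain JavaScript. `groupSketch` is a hypothetical helper written for this post, not a MongoDB API, and it makes two simplifying assumptions: the key is a single field name, and the condition is a predicate function rather than a query document.

```javascript
// Hypothetical in-memory sketch of group({key, reduce, initial, condition, finalize}).
function groupSketch(docs, opts) {
  var buckets = {};
  var keep = opts.condition || function () { return true; };
  docs.filter(keep).forEach(function (doc) {
    var k = String(doc[opts.key]);
    if (!(k in buckets)) {
      // Every group starts from its own copy of `initial`.
      buckets[k] = Object.assign({}, opts.initial);
      buckets[k][opts.key] = doc[opts.key];
    }
    // reduce folds the current document into the group's running state.
    opts.reduce(doc, buckets[k]);
  });
  return Object.keys(buckets).map(function (k) {
    // finalize runs once per group, after all documents were reduced.
    if (opts.finalize) opts.finalize(buckets[k]);
    return buckets[k];
  });
}

// The same dept_id / salary documents as in the example above.
var employees = [
  { dept_id: 1, salary: 1 }, { dept_id: 1, salary: 2 }, { dept_id: 1, salary: 3 },
  { dept_id: 2, salary: 10 }, { dept_id: 2, salary: 12 }, { dept_id: 2, salary: 16 },
  { dept_id: 3, salary: 4 }, { dept_id: 3, salary: 1 }
];

var grouped = groupSketch(employees, {
  key: "dept_id",
  initial: { sum: 0, avg: 0, cnt: 0 },
  condition: function (d) { return d.dept_id >= 2; },
  reduce: function (obj, prev) { prev.sum += obj.salary; prev.cnt++; },
  finalize: function (out) { out.avg = out.sum / out.cnt; }
});
console.log(grouped);
```

The sketch yields the same sums, counts, and averages for departments 2 and 3 as the shell output above.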

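5. Going Deeper

- Watch the video lecture

- Flow of map and reduce in the group example above

- Instead of writing a subquery as in SQL, you code the logic in a reduce function

- Framework flow (PDF): the collection is mapped and then reduced to produce the result

- From the data collection, build a map for each key, write a reduce function that implements the business logic, and serve the result in real time

- Intermediate-1 = unwind = map = producing key:value pairs

  Intermediate-2 = group = reduce

- From now on, use aggregate with pipeline operations instead of group, or use mapReduce

- In the end, from v2.2 on you use mapReduce: on the business side, the results are mapped onto a BI solution and presented through the UX (real-time)

- Real-time data arriving from social networks piles up into big data, and MongoDB stores and processes it, which makes it a great match for web app technology!

- According to the manual, this work is done in memory (cached) rather than on disk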
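
The "Intermediate-1 = unwind = map, Intermediate-2 = group = reduce" idea from the list above can be sketched in plain JavaScript as follows. The sample articles are made up, and the pipeline shown in the comment is only an assumed equivalent for illustration, not something taken from the lecture.

```javascript
// Assumed equivalent pipeline, for orientation only:
//   { $unwind: "$tags" },
//   { $group: { _id: "$tags", views: { $sum: "$pageViews" } } }

// Made-up sample data for illustration.
var taggedArticles = [
  { author: "bob", tags: ["fun", "good"], pageViews: 5 },
  { author: "joe", tags: ["nasty", "fun"], pageViews: 7 }
];

// Intermediate-1 (unwind = map): one key:value pair per tag.
var pairs = [];
taggedArticles.forEach(function (doc) {
  doc.tags.forEach(function (tag) {
    pairs.push({ key: tag, value: doc.pageViews });
  });
});

// Intermediate-2 (group = reduce): sum the values that share a key.
var viewsByTag = {};
pairs.forEach(function (kv) {
  viewsByTag[kv.key] = (viewsByTag[kv.key] || 0) + kv.value;
});
console.log(viewsByTag);
```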

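/////////////////////////////////////

// Syntax

> db.dowonDB.mapReduce(map, reduce, out)

// Define map and reduce in the mongo shell
// Note: MongoDB calls reduce as reduce(key, values). The reduce below declares a single
// parameter, so `emits` receives the key (a string) instead of the emitted values,
// emits[i].count is undefined, and every count in the result comes out as NaN.
// A corrected version would be:
//   function(key, emits) { var total = 0; for (var i = 0; i < emits.length; i++) { total += emits[i].count; } return {count: total}; }

> var map = function() { for(var key in this) { emit(key,{count: 1}) } }

> var reduce = function(emits){ total=0; for(var i in emits) { total+=emits[i].count; } return {'count': total}; }

> var mr = db.runCommand({'mapreduce':'dowonDB', 'map':map, 'reduce':reduce, 'out':{'inline':1}});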

// Result

> mr

{
    "results" : [
        { "_id" : "_id", "value" : { "count" : NaN } },
        { "_id" : "a", "value" : { "count" : NaN } },
        { "_id" : "dept_id", "value" : { "count" : NaN } },
        { "_id" : "salary", "value" : { "count" : NaN } }
    ],
    "timeMillis" : 14,
    "counts" : {
        "input" : 12,
        "emit" : 32,
        "reduce" : 4,
        "output" : 4
    },
    "ok" : 1
}

Or

//////////////////////////////////////////////////////////

// Create a new collection, dowonDB2, whose documents have a tags field

> db.dowonDB2.save({_id: 1, tags:['dog', 'cat']})

> db.dowonDB2.save({_id: 2, tags:['cat']})

> db.dowonDB2.save({_id: 3, tags:['mouse', 'cat', 'dog']})

// Note: the next document is saved to dowonDB4, not dowonDB2, so the mapReduce run below sees only 3 input documents

> db.dowonDB4.save({_id: 4, tags:[]})

// Define map and reduce

> var map2 = function() { this.tags.forEach( function(z) { emit(z, {count: 1}); } ); }

> var reduce2 = function(key, values) { var total=0; for(var i=0; i < values.length; i++) { total += values[i].count; } return {count:total}; }

// Call mapReduce

> var mr2 = db.dowonDB2.mapReduce( map2, reduce2, {out:{inline:1}} );

> mr2

{
    "results" : [
        { "_id" : "cat", "value" : { "count" : 3 } },
        { "_id" : "dog", "value" : { "count" : 2 } },
        { "_id" : "mouse", "value" : { "count" : 1 } }
    ],
    "timeMillis" : 3,
    "counts" : {
        "input" : 3,
        "emit" : 6,
        "reduce" : 2,
        "output" : 3
    },
    "ok" : 1
}
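
To see where the numbers in "counts" come from, the sketch below replays map2 and reduce2 over the three dowonDB2 documents in plain JavaScript. `emit`, `stats`, and `runMapReduceSketch` are hypothetical stand-ins written for this post, not the server's implementation. The sketch assumes reduce is skipped for keys with a single emitted value and called once otherwise, whereas the real server may call reduce several times per key on large inputs.

```javascript
// map2 and reduce2 as defined in the example above.
var map2 = function () { this.tags.forEach(function (z) { emit(z, { count: 1 }); }); };
var reduce2 = function (key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) { total += values[i].count; }
  return { count: total };
};

// Hypothetical bookkeeping that mimics the "counts" section of the output.
var emitted = {}; // key -> list of emitted values
var stats = { input: 0, emit: 0, reduce: 0, output: 0 };

// Stand-in for the emit() that MongoDB provides inside map functions.
function emit(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
  stats.emit++;
}

function runMapReduceSketch(docs, map, reduce) {
  // Map phase: call map with `this` bound to each document.
  docs.forEach(function (doc) { stats.input++; map.call(doc); });
  // Reduce phase: one output per key; reduce only runs when a key has several values.
  return Object.keys(emitted).sort().map(function (key) {
    var values = emitted[key];
    var value = values[0];
    if (values.length > 1) { value = reduce(key, values); stats.reduce++; }
    stats.output++;
    return { _id: key, value: value };
  });
}

var tagDocs = [
  { _id: 1, tags: ["dog", "cat"] },
  { _id: 2, tags: ["cat"] },
  { _id: 3, tags: ["mouse", "cat", "dog"] }
];
var sketchResults = runMapReduceSketch(tagDocs, map2, reduce2);
console.log(sketchResults, stats);
```

"mouse" is emitted only once, so reduce runs for "cat" and "dog" only, which matches the reduce count of 2 in the shell output.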

- MongoDB API

db.runCommand(
  {
    mapReduce: <collection>,
    map: <function>,
    reduce: <function>,
    out: <output>,
    query: <document>,
    sort: <document>,
    limit: <number>,
    finalize: <function>,
    scope: <document>,
    jsMode: <boolean>,
    verbose: <boolean>
  }
)

- What is the ultimate purpose of MapReduce? Might it be to implement SPA-style web application services?

MongoDB in SPA as presented by 10gen (stack: Node.js + Express.js + Mongoose.js + MongoDB)


Attribution, NonCommercial, NoDerivatives


posted by Peter Note
