Posts tagged 'mapreduce'

4 posts tagged 'mapreduce'

2013.09.12 [Hadoop] Thoughts on Mongo-Hadoop
2013.09.11 [Hadoop] Running MapReduce Directly from a .jar File
2013.09.09 [Hadoop] Coding Hadoop with Maven in Eclipse
2013.08.03 [MongoDB] Understanding the Aggregation Framework

2013. 9. 12. 20:45 Big Data

[Hadoop] Thoughts on Mongo-Hadoop

What if MongoDB were used as the input/output store for Hadoop? MongoDB is a document store, and it scales out through sharding (comparable to partitioning in an RDB), so it should be up to the job. Hadoop then serves as the batch processing engine for the data held in that store.

Introducing the Mongo-Hadoop Connector

- Through Hadoop, the data inside MongoDB can be processed in parallel using all available cores

- A Java API exists that can either write the data out as BSON files in a Hadoop format or store it directly in MongoDB

- Pig and Hive can be used

- It can run on AWS Amazon Elastic MapReduce

Webinar: What's New with MongoDB Hadoop Integration from mongodb/10gen

Types of Batch Processing Model

- MongoDB itself provides the Aggregation Framework and a mapReduce command, so MapReduce-style processing can also be written in JavaScript

- Hourly batch processing can be handled this way as well

- When the data is truly big, use Hadoop for the batch processing, with MongoDB serving as both the "Raw Data Store" and the "Result Data Store"

- See the full MongoDB & Hadoop batch processing model

MongoDB & Hadoop: Flexible Hourly Batch Processing Model from Takahiro Inoue

References

- In the end, processed data has to be presented: Data Visualization Resources

- MongoDB, What Are You? A Story About NoSQL (Cho Dae-hyeop)

License: Attribution, NonCommercial, NoDerivatives


posted by Peter Note

2013. 9. 11. 19:20 Big Data

[Hadoop] Running MapReduce Directly from a .jar File

Let's look at how to package a Mapper and Reducer as a .jar and run it directly with the hadoop command

The MapReduce Program

- The Writable interface is used for values

- Mapper interface

Mapper<K1, V1, K2, V2>: keys must implement WritableComparable, and values must implement Writable.

- Reducer interface

A reducer receives the output produced by multiple mappers. The key/value pairs are sorted by key, and all values that share a key are grouped together.

- The earlier WordCount was coded by hand, but the same job can be built from the built-in mapper TokenCountMapper and reducer LongSumReducer
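The reducer contract described above (merge the output of several mappers, sort by key, group the values per key) can be sketched in plain JavaScript. This is only an in-memory illustration and not Hadoop code: the function name `shuffle` and the sample pairs are hypothetical.

```javascript
// In-memory sketch of the step that sits between map and reduce.
// Not Hadoop API: `shuffle` and the sample data are made up for illustration.
function shuffle(mapperOutputs) {
  const groups = new Map();
  // Merge the pairs emitted by every mapper and collect values per key.
  for (const pairs of mapperOutputs) {
    for (const [key, value] of pairs) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }
  // A reducer sees the keys in sorted order, each with all of its values.
  return [...groups.entries()].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}

const mapperA = [["b", 2], ["a", 1]];
const mapperB = [["a", 5], ["c", 3]];
const grouped = shuffle([mapperA, mapperB]);
console.log(JSON.stringify(grouped)); // [["a",[1,5]],["b",[2]],["c",[3]]]
```

The sorting requirement is the reason keys have to be comparable, while values only need to be serializable.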

Running the .jar directly with the hadoop command

- Add a plugin configuration to pom.xml that copies the built MapReduce jar to a specific location

<build>
  <plugins>
    <plugin>
      <artifactId>maven-antrun-plugin</artifactId>
      <executions>
        <execution>
          <configuration>
            <tasks>
              <copy file="target/${project.artifactId}-${project.version}.jar"
                    tofile="/Users/dowon/Documents/hadoop-jobs/${project.artifactId}-${project.version}.jar" />
            </tasks>
          </configuration>
          <phase>install</phase>
          <goals>
            <goal>run</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

- Make a copy of the existing WordCount named WordCount3 and change it to use TokenCountMapper and LongSumReducer

In other words, use the classes that Hadoop provides instead of coding them by hand

import java.io.IOException;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.io.LongWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapred.FileInputFormat;

import org.apache.hadoop.mapred.FileOutputFormat;

import org.apache.hadoop.mapred.JobClient;

import org.apache.hadoop.mapred.JobConf;

import org.apache.hadoop.mapred.lib.LongSumReducer;

import org.apache.hadoop.mapred.lib.TokenCountMapper;

public class WordCount3 {

public static void main(String[] args) throws IOException {

// 1. configuration Mapper & Reducer of Hadoop

JobConf conf = new JobConf(WordCount3.class);

conf.setJobName("wordcount3");

// 2. final output key type & value type

conf.setOutputKeyClass(Text.class);

conf.setOutputValueClass(LongWritable.class);

// 3. mapper, combiner, and reducer classes

conf.setMapperClass(TokenCountMapper.class);

conf.setCombinerClass(LongSumReducer.class);

conf.setReducerClass(LongSumReducer.class);

// 4. set the path of file for read files

// input path : args[0]

// output path : args[1]

FileInputFormat.setInputPaths(conf, new Path(args[0]));

FileOutputFormat.setOutputPath(conf, new Path(args[1]));

// 5. run job

JobClient.runJob(conf);

}

}
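WordCount3 registers LongSumReducer as both the combiner and the reducer. A rough plain JavaScript illustration of what a combiner buys: each map task pre-sums its own output, so fewer pairs travel to the reducer while the final totals stay the same. None of this is Hadoop code; `sumByKey` and the two sample map outputs are hypothetical.

```javascript
// Plain JavaScript illustration of a combiner; not Hadoop code.
// sumByKey stands in for the summing that LongSumReducer performs.
function sumByKey(pairs) {
  const totals = new Map();
  for (const [key, value] of pairs) {
    totals.set(key, (totals.get(key) || 0) + value);
  }
  return [...totals.entries()];
}

// Output of two hypothetical map tasks (one pair per token).
const mapTask1 = [["hello", 1], ["world", 1], ["bye", 1], ["world", 1]];
const mapTask2 = [["hi", 1], ["world", 1], ["hello", 1], ["dowon", 1]];

// Without a combiner every emitted pair is sent to the reducer.
const shuffledWithout = [...mapTask1, ...mapTask2];
// With a combiner each map task sums its own output first.
const shuffledWith = [...sumByKey(mapTask1), ...sumByKey(mapTask2)];

const finalWithout = new Map(sumByKey(shuffledWithout));
const finalWith = new Map(sumByKey(shuffledWith));

console.log(shuffledWithout.length, shuffledWith.length); // 8 7
console.log(finalWithout.get("world"), finalWith.get("world")); // 3 3
```

Reusing the reducer as the combiner is only safe because summing is associative and commutative.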

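- In Eclipse, select the project, choose "Maven build..." under "Run As", enter "clean install" as the goals, and click "Run"

- The build should finish successfully; check that the *.jar file was created in /Users/dowon/Documents/hadoop-jobs

- Before starting the Hadoop daemons for the first time, format the NameNode (formatting erases existing HDFS metadata, so do not repeat it on a cluster that already holds data)

// format the NameNode

$ hadoop namenode -format

// PATH settings in .bash_profile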

set -o vi

export JAVA_HOME=/Library/Java/Home

export H_HOME=~/Documents/hadoop-1.2.1

export PATH=.:$PATH:$JAVA_HOME/bin:$H_HOME/bin:/usr/bin

alias ll='ls -alrt'

alias cdh='cd $H_HOME'

// start the Hadoop daemons

// 50030 : JobTracker web UI port

// 50070 : NameNode web UI port

$ start-all.sh

- Upload the input files to HDFS (this has to be redone if the NameNode was formatted)

// assuming the local directory looks like this

$ pwd

/Users/dowon/Documents/input

$ ls -l

total 16

-rw-r--r-- 1 dowon staff 22 9 11 19:26 file01

-rw-r--r-- 1 dowon staff 21 9 11 19:26 file02

// create input in HDFS

$ hadoop fs -put . input

1) Open http://localhost:50070/ and click "Browse the filesystem" to see it

2) The files were created under /user/dowon/input

- Run the generated jar with the hadoop command

// the current path is as follows, and it contains the .jar that holds WordCount3.class

$ pwd

/Users/dowon/Documents/hadoop-jobs

$ ls -l

-rw-r--r-- 1 dowon staff 5391 9 11 19:45 MapReduce-1.0.0-SNAPSHOT.jar

// run the command

$ hadoop jar *.jar WordCount3 /user/dowon/input /user/dowon/output3

1) In the NameNode web UI, /user/dowon now contains "output3"

2) Inside output3 there is a file holding the results

3) To run the command again, the output3 directory has to be deleted first

$ hadoop fs -rmr /user/dowon/output3

We have seen how to run an exported .jar with the hadoop command instead of running it from Eclipse.

References

None


posted by Peter Note

2013. 9. 9. 21:52 Big Data

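[Hadoop] Coding Hadoop with Maven in Eclipse

When coding for Hadoop in Eclipse, use Maven to manage the external library dependencies.

The Role of Hadoop

- Order in which distributed files are processed

> the input arrives in HDFS

> the job runs: it reads the data and applies the logic

> the results are written to a file or a DB

- Hadoop earns its keep when terabytes of data already sit in HDFS and need to be processed

- Understanding HDFS and MapReduce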

Intro to HDFS and MapReduce from Ryan Tabora

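Creating the Maven Project

- Choose Maven Project and check "Create a simple project"

- Set the Maven Group ID and Artifact ID

- The resulting project layout

MapReduce code is written here, and unit tests can be written here as well

- Add the Hadoop library dependency to pom.xml (the dependencies block is the added part)

After the file is saved, the dependency libraries are downloaded automatically

The added files then appear under "Maven Dependencies" in Eclipse's "Project Explorer" on the left

// pom.xml contents

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>kr.mobiconsoft.hadoop</groupId>
  <artifactId>MapReduce</artifactId>
  <version>1.0.0-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <!-- a <version> element for the installed Hadoop release goes here -->
    </dependency>
  </dependencies>
</project>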

Implementing Word Count with MapReduce

- Create two files containing similar words

// file01

hello world bye world

// file02

hi world hello dowon

- Create the Mapper class

import java.io.IOException;

import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;

import org.apache.hadoop.io.LongWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapred.MapReduceBase;

import org.apache.hadoop.mapred.Mapper;

import org.apache.hadoop.mapred.OutputCollector;

import org.apache.hadoop.mapred.Reporter;

/**

* K1 : read key type

* V1 : read value type

* K2 : write key type

* V2 : write value type

*/

// public class WordCountMapper implements Mapper<K1, V1, K2, V2> {

public class WordCountMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

// map output is handed to the reducer automatically

public void map(LongWritable key, Text value,

OutputCollector<Text, IntWritable> output, Reporter reporter)

throws IOException {

// TODO Auto-generated method stub

String line = value.toString();

StringTokenizer tokenizer = new StringTokenizer(line);

while(tokenizer.hasMoreTokens()) {

Text outputKey = new Text(tokenizer.nextToken());

// emit an IntWritable, Hadoop's Writable wrapper for an int

// param1: outputKey, param2: outputValue

output.collect(outputKey, new IntWritable(1));

}

}

}

- Create the Reducer class

import java.io.IOException;

import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapred.MapReduceBase;

import org.apache.hadoop.mapred.OutputCollector;

import org.apache.hadoop.mapred.Reducer;

import org.apache.hadoop.mapred.Reporter;

/**

* K1 : same as the Mapper's K2

* V1 : same as the Mapper's V2

*/

public class WordCountReducer extends MapReduceBase

implements Reducer<Text, IntWritable, Text, IntWritable> {

/**

* The V1 values arrive as an Iterator because the same word can occur several times

*/

public void reduce(Text key, Iterator<IntWritable> values,

OutputCollector<Text, IntWritable> output, Reporter reporter)

throws IOException {

// TODO Auto-generated method stub

int sum = 0;

while(values.hasNext()) {

sum += values.next().get(); // unwrap the int value

}

output.collect(key, new IntWritable(sum));

}

}

- Create the job driver class with a main method (select the main option at the bottom of the dialog)

import java.io.IOException;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.io.IntWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapred.FileInputFormat;

import org.apache.hadoop.mapred.FileOutputFormat;

import org.apache.hadoop.mapred.JobClient;

import org.apache.hadoop.mapred.JobConf;

import org.apache.hadoop.mapred.TextInputFormat;

import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

public static void main(String[] args) throws IOException {

// 1. configuration Mapper & Reducer of Hadoop

JobConf conf = new JobConf(WordCount.class);

conf.setJobName("wordcount");

conf.setMapperClass(WordCountMapper.class);

conf.setReducerClass(WordCountReducer.class);

// 2. final output key type & value type

conf.setOutputKeyClass(Text.class);

conf.setOutputValueClass(IntWritable.class);

// 3. in/output format

conf.setInputFormat(TextInputFormat.class);

conf.setOutputFormat(TextOutputFormat.class);

// 4. set the path of file for read files

// input path : args[0]

// output path : args[1]

FileInputFormat.setInputPaths(conf, new Path(args[0]));

FileOutputFormat.setOutputPath(conf, new Path(args[1]));

// 5. run job

JobClient.runJob(conf);

}

}

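- Final state

- Configuring Eclipse

When running WordCount, which holds the main function, pass the input path and output path as program arguments

The output path directory must not exist yet (target/hadoop-result)

Click "Run" at the bottom right

- Result

- In the end, the job goes through the following process

- Roles of the Mapper and Reducer

Mapper : splits the source into many key:value pairs

Reducer : merges the values from many map outputs into a single result per key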
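The Mapper and Reducer roles can be traced by hand on the contents of file01 and file02. The following is a minimal plain JavaScript sketch, not Hadoop code: `map`, `reduce`, and `wordCount` are hypothetical stand-ins for WordCountMapper, WordCountReducer, and the framework's grouping step.

```javascript
// Plain JavaScript walk-through of the word count job; not Hadoop code.
const files = {
  file01: "hello world bye world",
  file02: "hi world hello dowon",
};

// Mapper: split one line into (word, 1) pairs, as WordCountMapper does.
function map(line) {
  return line.split(/\s+/).filter(Boolean).map((word) => [word, 1]);
}

// Reducer: sum every value collected for one key, as WordCountReducer does.
function reduce(key, values) {
  return [key, values.reduce((sum, v) => sum + v, 0)];
}

// Driver: run map over every file, group by key, then reduce each group.
function wordCount(inputs) {
  const groups = Object.create(null); // no inherited keys such as "constructor"
  for (const text of Object.values(inputs)) {
    for (const [word, one] of map(text)) {
      (groups[word] = groups[word] || []).push(one);
    }
  }
  return Object.keys(groups).sort().map((key) => reduce(key, groups[key]));
}

const counts = wordCount(files);
console.log(JSON.stringify(counts));
// [["bye",1],["dowon",1],["hello",2],["hi",1],["world",3]]
```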

Trying Unit Tests

- Add mrunit to pom.xml

<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <!-- a <version> element for the installed Hadoop release goes here -->
  </dependency>
  <dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>0.8.0-incubating</version>
  </dependency>
</dependencies>

- Create the Mapper test class

Run it through Run As... as a JUnit test and check for the green bar (success)

import org.apache.hadoop.io.IntWritable;

import org.apache.hadoop.io.LongWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mrunit.MapDriver;

import org.junit.Test;

/**

* Running the Mapper and Reducer inside tests makes it possible to verify them

* @author dowon

*/

public class WordCountMapperTest {

@Test

public void testMap() {

// 1. setup

Text value = new Text("Hello World Bye World");

MapDriver<LongWritable, Text, Text, IntWritable> mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();

mapDriver.withMapper(new WordCountMapper());

mapDriver.withInputValue(value);

// 2. verify and run

// expected outputs must be listed in the exact order the mapper emits them; leaving one out also fails the test

mapDriver.withOutput(new Text("Hello"), new IntWritable(1));

mapDriver.withOutput(new Text("World"), new IntWritable(1));

mapDriver.withOutput(new Text("Bye"), new IntWritable(1));

mapDriver.withOutput(new Text("World"), new IntWritable(1));

mapDriver.runTest();

}

}

- Create the Reducer test class

Run it through Run As... as a JUnit test and check for the green bar (success)

import java.util.Arrays;

import org.apache.hadoop.io.IntWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mrunit.ReduceDriver;

import org.junit.Test;

public class WordCountReducerTest {

@Test

public void testReducer() {

// 1. setup

ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>();

reduceDriver.withReducer(new WordCountReducer());

reduceDriver.withInputKey(new Text("World"));

reduceDriver.withInputValues(Arrays.asList(new IntWritable(1), new IntWritable(1)));

// 2. verify and run

reduceDriver.withOutput(new Text("World"), new IntWritable(2));

reduceDriver.runTest();

}

}

References

- Maven basic usage


posted by Peter Note

2013. 8. 3. 16:08 MongoDB/MapReduce

[MongoDB] Understanding the Aggregation Framework

Let's look at MongoDB's Aggregation Framework. I would call it the advanced course of MongoDB, and anyone aiming to become a MongoDB expert needs to know it

1. Aggregation Framework

- The aggregation framework concept as presented by 10gen

- The purpose of aggregation in MongoDB is to aggregate data that is spread across shards.

MongoDB's New Aggregation framework from Chris Westin

2. Understanding the Concepts

- Introduced in MongoDB v2.2

- Shards store the big data, and the Aggregation Framework processes it

- The Aggregation Framework has two key concepts: Pipelines and Expressions

+ Pipelines: the same idea as a Unix pipe. A MongoDB pipeline treats documents as a stream, and the pipeline operators process that stream of documents (much like map-reduce)

Name	Description
`$project`	Reshapes a document stream. `$project` can rename, add, or remove fields as well as create computed values and sub-documents. (Note: in relational algebra, projection (π) removes unwanted attributes from a relation; mathematically it is a mapping onto a different dimension. The result of a projection is again a relation.)
`$match`	Filters the document stream, and only allows matching documents to pass into the next pipeline stage. `$match` uses standard MongoDB queries.
`$limit`	Restricts the number of documents in an aggregation pipeline.
`$skip`	Skips over a specified number of documents from the pipeline and returns the rest.
`$unwind`	Takes an array of documents and returns them as a stream of documents. (Comparable to the map step: it produces key:value pairs.)
`$group`	Groups documents together for the purpose of calculating aggregate values based on a collection of documents.
`$sort`	Takes all input documents and returns them in a stream of sorted documents.
`$geoNear`	Returns an ordered stream of documents based on proximity to a geospatial point.

ex) The comma (,) between the $project and $unwind stages acts as the pipe: the output of one stage streams into the next (similar to dimensions in OLAP)

var p2 = db.runCommand(
{ aggregate : "article", pipeline : [
    { $project : {
	author : 1,
	tags : 1,
	pageViews : 1
    }},
    { $unwind : "$tags" }

]});
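To see what the two stages do to a document, the pipeline can be imitated in plain JavaScript. This is a rough sketch and not MongoDB API: `project`, `unwind`, `runPipeline`, and the sample article are hypothetical, and only the behavior needed for this example is modeled.

```javascript
// Plain JavaScript imitation of a $project -> $unwind pipeline.
// Nothing here is MongoDB API; the sample article is made up.
const articles = [
  { _id: 1, title: "t1", author: "dowon", tags: ["hadoop", "mongodb"], pageViews: 5 },
];

// $project: keep _id plus the listed fields.
const project = (fields) => (docs) =>
  docs.map((doc) => {
    const out = { _id: doc._id };
    for (const f of fields) if (f in doc) out[f] = doc[f];
    return out;
  });

// $unwind: emit one document per element of the array field.
const unwind = (field) => (docs) =>
  docs.flatMap((doc) => (doc[field] || []).map((item) => ({ ...doc, [field]: item })));

// The pipeline is function composition: each stage feeds the next one.
const runPipeline = (docs, stages) => stages.reduce((stream, stage) => stage(stream), docs);

const unwound = runPipeline(articles, [
  project(["author", "tags", "pageViews"]),
  unwind("tags"),
]);
console.log(JSON.stringify(unwound));
// [{"_id":1,"author":"dowon","tags":"hadoop","pageViews":5},
//  {"_id":1,"author":"dowon","tags":"mongodb","pageViews":5}]
```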

ex) Mapping between aggregation and SQL: SQL runs inside a single DBMS, whereas MongoDB runs the aggregation across shards

+ Expressions: produce output documents based on values computed from the input documents.

Name	Description
`$addToSet`	Returns an array of all the unique values for the selected field among the documents in that group.
`$first`	Returns the first value in a group.
`$last`	Returns the last value in a group.
`$max`	Returns the highest value in a group.
`$min`	Returns the lowest value in a group.
`$avg`	Returns an average of all the values in a group.
`$push`	Returns an array of all values for the selected field among the documents in that group.
`$sum`	Returns the sum of all the values in a group.
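The accumulators in the table can be pictured as plain array operations over the values collected for each group. The sketch below is plain JavaScript, not MongoDB API; the helper `group` and the sample documents are hypothetical.

```javascript
// Plain JavaScript imitation of $group with the accumulators listed above.
// Not MongoDB API; the sample documents are hypothetical.
const docs = [
  { dept_id: 1, salary: 1 },
  { dept_id: 1, salary: 3 },
  { dept_id: 2, salary: 10 },
  { dept_id: 2, salary: 10 },
];

function group(input, keyField, valueField) {
  const buckets = new Map();
  for (const doc of input) {
    const key = doc[keyField];
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(doc[valueField]);
  }
  return [...buckets].map(([key, values]) => {
    const sum = values.reduce((a, b) => a + b, 0);
    return {
      _id: key,
      sum,                              // $sum
      avg: sum / values.length,         // $avg
      min: Math.min(...values),         // $min
      max: Math.max(...values),         // $max
      first: values[0],                 // $first
      last: values[values.length - 1],  // $last
      push: values,                     // $push
      addToSet: [...new Set(values)],   // $addToSet
    };
  });
}

const groupedDocs = group(docs, "dept_id", "salary");
console.log(groupedDocs[0].sum, groupedDocs[0].avg); // 4 2
console.log(groupedDocs[1].push, groupedDocs[1].addToSet); // [ 10, 10 ] [ 10 ]
```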

4. Hands-on Practice

// start mongod with a dbpath for this exercise; the orders collection is stored there

$ mongod --dbpath /home/mongodb/aggregation

// save the same orders document twice

$ mongo

> db.orders.save({

cust_id: "abc123",

ord_date: ISODate("2012-11-02T17:04:11.102Z"),

status: 'A',

price: 50,

items: [ { sku: "xxx", qty: 25, price: 1 },

{ sku: "yyy", qty: 25, price: 1 } ]

});

//////////////////////////

// count all documents (COUNT(*))

> db.orders.aggregate( [

{ $group: { _id: null,

count: { $sum: 1 } } }

] );

// result

{ "result" : [ { "_id" : null, "count" : 2 } ], "ok" : 1 }

// sql

SELECT COUNT(*) AS count

FROM orders

////////////////////////

// sub query

> db.orders.aggregate( [

{ $group: { _id: { cust_id: "$cust_id",

ord_date: "$ord_date" } } },

{ $group: { _id: null, count: { $sum: 1 } } }

] )

{ "result" : [ { "_id" : null, "count" : 1 } ], "ok" : 1 }

// sql

SELECT COUNT(*)

FROM (SELECT cust_id, ord_date

FROM orders

GROUP BY cust_id, ord_date) as DerivedTable
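Why the two-stage $group returns a count of 1 even though two documents were saved can be checked with a small plain JavaScript imitation. This is only a sketch under the assumption that both saved documents are identical; it is not MongoDB API, and the variable names are hypothetical.

```javascript
// Plain JavaScript imitation of the two $group stages; not MongoDB API.
const order = { cust_id: "abc123", ord_date: "2012-11-02T17:04:11.102Z", status: "A", price: 50 };
const orders = [{ ...order }, { ...order }]; // the same document saved twice

// Stage 1: group by (cust_id, ord_date), which leaves one entry per distinct pair.
const distinctPairs = new Set(orders.map((o) => JSON.stringify([o.cust_id, o.ord_date])));

// Stage 2: put everything under _id null and count the entries.
const result = { _id: null, count: distinctPairs.size };
console.log(orders.length, result.count); // 2 1
```

The first stage plays the role of the derived table in the SQL version, and the second stage is the outer COUNT(*).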

- Examples of the simple aggregation functions count, distinct, and group

///////////////////////////////////////////////////

// count

// this is just a call to count() on the cursor returned by find()

> db.orders.find().count()

// this is the aggregation command

> db.orders.count()

// because it is an aggregation command, it accepts a query condition

> db.orders.count({status:'A'})

///////////////////////////////////////////////////

// save sample documents

db.dowonDB.save({a:1})

db.dowonDB.save({a:2})

db.dowonDB.save({a:3})

> db.dowonDB.count()

> db.dowonDB.count({a:1})

///////////////////////////////////////////////////

// distinct

> db.dowonDB.distinct('a')

[ 1, 2, 3 ]

or

// runCommand: a shell helper that runs a database command with the current account's privileges

> db.runCommand({'distinct':'dowonDB', 'key':'a'})

{
    "values" : [ 1, 2, 3 ],
    "stats" : {
        "n" : 4,
        "nscanned" : 4,
        "nscannedObjects" : 4,
        "timems" : 0,
        "cursor" : "BasicCursor"
    },
    "ok" : 1
}

///////////////////////////////////////////////////

// group

db.dowonDB.save({dept_id: 1, salary: 1})

db.dowonDB.save({dept_id: 1, salary: 2})

db.dowonDB.save({dept_id: 1, salary: 3})

db.dowonDB.save({dept_id: 2, salary: 10})

db.dowonDB.save({dept_id: 2, salary: 12})

db.dowonDB.save({dept_id: 2, salary: 16})

db.dowonDB.save({dept_id: 3, salary: 4})

db.dowonDB.save({dept_id: 3, salary: 1})

// key plays the role of map: documents are bucketed by each distinct key value

// reduce is code, that is, a function that carries the business logic

> db.dowonDB.group(

{ key: {'dept_id': true},

reduce: function(obj, prev) { prev.sum += obj.salary },

initial: {sum: 0}

});

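// result (the null/NaN entry comes from the earlier {a: ...} documents, which have neither dept_id nor salary)

[
    { "dept_id" : null, "sum" : NaN },
    { "dept_id" : 1, "sum" : 6 },
    { "dept_id" : 2, "sum" : 38 },
    { "dept_id" : 3, "sum" : 5 }
]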

or add a condition clause

> db.dowonDB.group( { key: {'dept_id': true}, reduce: function(obj, prev) { prev.sum += obj.salary }, initial: {sum: 0}, condition: {'dept_id': {$gt:2} } });

[ { "dept_id" : 3, "sum" : 5 } ]

> db.dowonDB.group( { key: {'dept_id': true}, reduce: function(obj, prev) { prev.sum += obj.salary }, initial: {sum: 0}, condition: {'dept_id': {$gte:2} } });

[ { "dept_id" : 2, "sum" : 38 }, { "dept_id" : 3, "sum" : 5 } ]

or

> db.dowonDB.group( { key: {'dept_id': true}, reduce: function(obj, prev) { prev.sum += obj.salary; prev.cnt++ }, initial: {sum: 0, avg:0, cnt:0}, condition: {'dept_id': {$gte:2} }, finalize: function(out){ out.avg = out.sum/out.cnt;} });

[
    { "dept_id" : 2, "sum" : 38, "avg" : 12.666666666666666, "cnt" : 3 },
    { "dept_id" : 3, "sum" : 5, "avg" : 2.5, "cnt" : 2 }
]
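How key, condition, initial, reduce, and finalize fit together can be traced with a small plain JavaScript imitation of group. This is a sketch, not the MongoDB implementation: `groupBy` is a hypothetical helper, the condition is written as a JavaScript predicate rather than a query document, and the input is the dept_id/salary sample data from above.

```javascript
// Plain JavaScript imitation of group({key, reduce, initial, condition, finalize}).
// Not the MongoDB implementation; `groupBy` is a hypothetical helper.
function groupBy(docs, { key, reduce, initial, condition = () => true, finalize }) {
  const buckets = new Map();
  for (const doc of docs.filter(condition)) {
    const k = doc[key];
    // Every new key starts from its own copy of `initial`.
    if (!buckets.has(k)) buckets.set(k, { [key]: k, ...initial });
    reduce(doc, buckets.get(k));
  }
  const out = [...buckets.values()];
  if (finalize) out.forEach(finalize);
  return out;
}

const employees = [
  { dept_id: 1, salary: 1 }, { dept_id: 1, salary: 2 }, { dept_id: 1, salary: 3 },
  { dept_id: 2, salary: 10 }, { dept_id: 2, salary: 12 }, { dept_id: 2, salary: 16 },
  { dept_id: 3, salary: 4 }, { dept_id: 3, salary: 1 },
];

const stats = groupBy(employees, {
  key: "dept_id",
  condition: (doc) => doc.dept_id >= 2,
  initial: { sum: 0, cnt: 0 },
  reduce: (obj, prev) => { prev.sum += obj.salary; prev.cnt++; },
  finalize: (out) => { out.avg = out.sum / out.cnt; },
});
console.log(JSON.stringify(stats));
// [{"dept_id":2,"sum":38,"cnt":3,"avg":12.666666666666666},
//  {"dept_id":3,"sum":5,"cnt":2,"avg":2.5}]
```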

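5. Going Deeper

- Watch the video lecture

- Flow of map and reduce in the group example above

- Unlike SQL, no subquery is needed; the logic is coded in a reduce function

- Framework Flow PDF: the collection is mapped and then reduced to produce the result

- From the data collection, build a map for each key, write a reduce function that matches the business logic, and serve the result in real time

- Intermediate-1 = unwind = map = producing key:value pairs

Intermediate-2 = group = reduce

- From now on, use aggregate with pipeline operations instead of group, or use mapReduce

- As of v2.2 the practical choice is mapReduce: on the business side, the results are mapped onto a BI solution and presented through the UX (real time)

- Real-time data arriving from social networks piles up into big data; MongoDB stores and processes it, which makes it a natural fit for web app technology

- According to the manual, this work is served from an in-memory cache rather than from disk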

/////////////////////////////////////

// form

> db.dowonDB.mapReduce(map, reduce, out)

// define map and reduce in the mongo shell

> var map = function() { for(var key in this) { emit(key,{count: 1}) } }

// note: this reduce declares a single parameter, so it receives the key instead of the emitted values, and every count below comes out as NaN; reduce2 further down uses the (key, values) signature

> var reduce = function(emits){ total=0; for(var i in emits) { total+=emits[i].count; } return {'count': total}; }

> var mr = db.runCommand({'mapreduce':'dowonDB', 'map':map, 'reduce':reduce, 'out':{'inline':1}});

// result

> mr

{
    "results" : [
        { "_id" : "_id", "value" : { "count" : NaN } },
        { "_id" : "a", "value" : { "count" : NaN } },
        { "_id" : "dept_id", "value" : { "count" : NaN } },
        { "_id" : "salary", "value" : { "count" : NaN } }
    ],
    "timeMillis" : 14,
    "counts" : {
        "input" : 12,
        "emit" : 32,
        "reduce" : 4,
        "output" : 4
    },
    "ok" : 1
}
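MongoDB calls the reduce function with two arguments, the key and the array of emitted values. A small plain JavaScript illustration of why the parameter list matters follows; it is not MongoDB code, and the two function names and the direct calls are hypothetical stand-ins for how a reducer gets invoked.

```javascript
// Plain JavaScript illustration of the reduce(key, values) calling convention.
// Not MongoDB code; the calls below only mimic how a reducer is invoked.
const emitted = [{ count: 1 }, { count: 1 }, { count: 1 }];

// Declares one parameter, so it receives the key (a string), not the values.
function reduceOneParam(emits) {
  let total = 0;
  // Iterates over the characters of the key; emits[i].count is undefined.
  for (const i in emits) total += emits[i].count;
  return { count: total };
}

// Declares both parameters, so `values` is the array of emitted objects.
function reduceTwoParams(key, values) {
  let total = 0;
  for (const v of values) total += v.count;
  return { count: total };
}

const wrong = reduceOneParam("salary", emitted);
const right = reduceTwoParams("salary", emitted);
console.log(wrong.count, right.count); // NaN 3
```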

or

//////////////////////////////////////////////////////////

// create a new dowonDB2 collection with a tags field

> db.dowonDB2.save({_id: 1, tags:['dog', 'cat']})

> db.dowonDB2.save({_id: 2, tags:['cat']})

> db.dowonDB2.save({_id: 3, tags:['mouse', 'cat', 'dog']})

> db.dowonDB4.save({_id: 4, tags:[]})

// define map and reduce

> var map2 = function() { this.tags.forEach( function(z) { emit(z, {count: 1}); } ); }

> var reduce2 = function(key, values) { var total=0; for(var i=0; i < values.length; i++) { total += values[i].count; } return {count:total}; }

// call mapReduce

> var mr2 = db.dowonDB2.mapReduce( map2, reduce2, {out:{inline:1}} );

> mr2

{
    "results" : [
        { "_id" : "cat", "value" : { "count" : 3 } },
        { "_id" : "dog", "value" : { "count" : 2 } },
        { "_id" : "mouse", "value" : { "count" : 1 } }
    ],
    "timeMillis" : 3,
    "counts" : {
        "input" : 3,
        "emit" : 6,
        "reduce" : 2,
        "output" : 3
    },
    "ok" : 1
}
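The tag count can be traced by hand with a minimal in-memory stand-in for mapReduce. This is a sketch in plain JavaScript, not a MongoDB function: `runMapReduce` is a hypothetical helper, and unlike the shell version, `emit` is passed to the map function as a parameter instead of being a global.

```javascript
// Minimal in-memory stand-in for mapReduce, to trace the tag count by hand.
// `runMapReduce` is a hypothetical helper, not a MongoDB function.
function runMapReduce(docs, map, reduce) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  // MongoDB binds each document to `this` inside map; emit is passed explicitly here.
  for (const doc of docs) map.call(doc, emit);
  return [...emitted.keys()].sort().map((key) => {
    const values = emitted.get(key);
    // A key with a single emitted value is returned without calling reduce.
    return { _id: key, value: values.length === 1 ? values[0] : reduce(key, values) };
  });
}

const tagged = [
  { _id: 1, tags: ["dog", "cat"] },
  { _id: 2, tags: ["cat"] },
  { _id: 3, tags: ["mouse", "cat", "dog"] },
];

function map2(emit) {
  this.tags.forEach(function (z) { emit(z, { count: 1 }); });
}

function reduce2(key, values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) total += values[i].count;
  return { count: total };
}

const tagCounts = runMapReduce(tagged, map2, reduce2);
console.log(JSON.stringify(tagCounts));
// [{"_id":"cat","value":{"count":3}},{"_id":"dog","value":{"count":2}},{"_id":"mouse","value":{"count":1}}]
```

Only "cat" and "dog" have more than one emitted value, which lines up with the "reduce" : 2 counter in the output above.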

- MongoDB API

db.runCommand(

{

mapReduce: <collection>,

map: <function>,

reduce: <function>,

out: <output>,

query: <document>,

sort: <document>,

limit: <number>,

finalize: <function>,

scope: <document>,

jsMode: <boolean>,

verbose: <boolean>

}

)

- What is the ultimate purpose of MapReduce? Perhaps it is to support SPA-style web application services

MongoDB in SPA as presented by 10gen (stack: Node.js + Express.js + Mongoose.js + MongoDB)

References

- MongoDB Aggregation Framework Concept

- MongoDB Aggregation Framework Examples

- List of collection functions

- Pipeline and Expression examples for the Aggregation Framework

- count, distinct, and group examples

- The concept of database projection


posted by Peter Note
