[Hadoop] Running MapReduce Directly from a .jar File

2013. 9. 11. 19:20 Big Data

Let's look at how to package a Mapper & Reducer as a .jar and run it directly with the hadoop command.

MapReduce program

- The Writable interface is used for values

- Mapper interface

It has the form Mapper<K1, V1, K2, V2>: key types must implement WritableComparable, and value types must implement Writable.

- Reducer interface

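On the reduce side, the output produced by the various mappers is collected, the key/value pairs are sorted by key, and all values that share the same key are grouped together before being handed to the reducer.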

- The earlier WordCount was coded by hand, but the same job can be built with the mapper (TokenCountMapper) and the reducer (LongSumReducer) that Hadoop provides
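
To make the data flow described above concrete, here is a minimal plain-Java sketch, with no Hadoop dependency, of what a token-counting mapper and a long-summing reducer compute together. The class name WordCountSketch and its methods are hypothetical; they only mimic the behaviour and are not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of map -> sort/group -> reduce.
final class WordCountSketch {

    // "Map" step: emit one (token, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Long>> map(String line) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(Map.entry(tokens.nextToken(), 1L));
        }
        return pairs;
    }

    // "Shuffle" step: sort by key and group all values of the same key.
    static TreeMap<String, List<Long>> group(List<Map.Entry<String, Long>> pairs) {
        TreeMap<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" step: sum the grouped values of each key.
    static TreeMap<String, Long> reduce(TreeMap<String, List<Long>> grouped) {
        TreeMap<String, Long> totals = new TreeMap<>();
        grouped.forEach((key, values) ->
                totals.put(key, values.stream().mapToLong(Long::longValue).sum()));
        return totals;
    }

    // Runs the three steps over several input lines.
    static TreeMap<String, Long> count(List<String> lines) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(group(pairs));
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("Hello World Bye World", "Hello Hadoop")));
    }
}
```

The TreeMap stands in for the sort-by-key step; in a real job the framework performs the sorting and grouping between the map and reduce phases.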

Running the .jar directly with the hadoop command

- Add a plugin configuration to pom.xml that builds the MapReduce jar file and copies it to a specific location

<build>
  <plugins>
    <plugin>
      <artifactId>maven-antrun-plugin</artifactId>
      <configuration>
        <tasks>
          <copy file="target/${project.artifactId}-${project.version}.jar"
                tofile="/Users/dowon/Documents/hadoop-jobs/${project.artifactId}-${project.version}.jar" />
        </tasks>
      </configuration>
      <executions>
        <execution>
          <phase>install</phase>
          <goals>
            <goal>run</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

- Make a copy of the existing WordCount named WordCount3 and change it to use TokenCountMapper and LongSumReducer

In other words, use the classes Hadoop provides instead of coding the mapper and reducer by hand

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.LongSumReducer;
import org.apache.hadoop.mapred.lib.TokenCountMapper;

public class WordCount3 {

    public static void main(String[] args) throws IOException {
        // 1. job configuration for the Hadoop Mapper & Reducer
        JobConf conf = new JobConf(WordCount3.class);
        conf.setJobName("wordcount3");

        // 2. final output key type & value type
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);

        // 3. mapper, combiner, and reducer classes provided by Hadoop
        conf.setMapperClass(TokenCountMapper.class);
        conf.setCombinerClass(LongSumReducer.class);
        conf.setReducerClass(LongSumReducer.class);

        // 4. input and output paths
        //    input path  : args[0]
        //    output path : args[1]
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // 5. run the job (runJob is static, so no JobClient instance is needed)
        JobClient.runJob(conf);
    }
}
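
WordCount3 reads args[0] and args[1] without checking them, so launching it with a missing path fails with an ArrayIndexOutOfBoundsException. One possible guard is sketched below; JobArgs is a hypothetical helper name and the usage text is only a suggestion, neither is part of Hadoop or of the original code.

```java
// Hypothetical helper: validates the two command-line paths WordCount3 expects.
final class JobArgs {
    final String inputPath;
    final String outputPath;

    private JobArgs(String inputPath, String outputPath) {
        this.inputPath = inputPath;
        this.outputPath = outputPath;
    }

    // Returns the parsed paths, or throws with a usage hint when the
    // argument count is not exactly two.
    static JobArgs parse(String[] args) {
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException(
                    "Usage: hadoop jar <jar file> WordCount3 <input path> <output path>");
        }
        return new JobArgs(args[0], args[1]);
    }

    public static void main(String[] args) {
        JobArgs parsed = parse(new String[] {"/user/dowon/input", "/user/dowon/output3"});
        System.out.println(parsed.inputPath + " -> " + parsed.outputPath);
    }
}
```

In WordCount3 the parsed values would then replace the direct args[0] and args[1] lookups when building the Path objects.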

- In Eclipse, select the project, choose "Maven build..." under "Run As", enter "clean install" as the goals, and click the "Run" button

- The build should report success: check that a *.jar file has been created in the /Users/dowon/Documents/hadoop-jobs directory

- Format the NameNode before starting the Hadoop daemons

// format the NameNode
$ hadoop namenode -format

// PATH settings in .bash_profile
set -o vi
export JAVA_HOME=/Library/Java/Home
export H_HOME=~/Documents/hadoop-1.2.1
export PATH=.:$PATH:$JAVA_HOME/bin:$H_HOME/bin:/usr/bin
alias ll='ls -alrt'
alias cdh='cd $H_HOME'

// start the Hadoop daemons
// 50030 : JobTracker web UI port
// 50070 : NameNode web UI port
$ start-all.sh

- Upload the input data to HDFS (this is needed if you have just formatted the NameNode)

// suppose the current location is as follows
$ pwd
/Users/dowon/Documents/input
$ ls
total 16
-rw-r--r-- 1 dowon staff 22 9 11 19:26 file01
-rw-r--r-- 1 dowon staff 21 9 11 19:26 file02

// create input in HDFS
$ hadoop fs -put . input

1) Open http://localhost:50070/ and click "Browse the filesystem" to see it

2) You can see that it was created at the path /user/dowon/input

- Now run the generated jar file with the hadoop command

// the current path is as follows, and it contains the .jar file that holds WordCount3.class
$ pwd
/Users/dowon/Documents/hadoop-jobs
$ ls
-rw-r--r-- 1 dowon staff 5391 9 11 19:45 MapReduce-1.0.0-SNAPSHOT.jar

// run the command
$ hadoop jar *.jar WordCount3 /user/dowon/input /user/dowon/output3

1) Open the NameNode web UI and browse to /user/dowon; you can see that "output3" has been created

2) Inside output3 there is a file that holds the results

3) To rerun the command, you must first delete the output3 directory

$ hadoop fs -rmr /user/dowon/output3
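
The result file mentioned in step 2 can be post-processed with ordinary Java once it has been copied out of HDFS. The sketch below assumes the job wrote plain text with one "word<TAB>count" pair per line, which is the default text output layout but is not shown in this post; ResultLines is a hypothetical class name and the sample lines are made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader for word-count result lines of the form "word<TAB>count".
final class ResultLines {

    // Parses each non-empty line into a (word, count) entry, keeping file order.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            int tab = line.indexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("Expected key<TAB>value: " + line);
            }
            String word = line.substring(0, tab);
            long count = Long.parseLong(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines are invented for illustration, not taken from a real run.
        System.out.println(parse(List.of("Hadoop\t1", "Hello\t2")));
    }
}
```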

We have looked at how to run an exported .jar file with the hadoop command instead of running the job from Eclipse.


posted by Peter Note
