About Hadoop Counters

까딱이(micropai) 2014. 11. 22. 04:36


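How to implement a basic counter was covered in an earlier post.

Hadoop itself already makes heavy use of counters.

When you submit a jar to Hadoop and run it, every value printed in the summary at the end of the run is implemented as a counter.

Running the jar prints a final summary like the ones below.

There are 43 counters in total, and they expose the main values the MapReduce program used while it ran.

Typically you can see how many records went into and came out of map, combine, and reduce, and you can also check the execution time and the number of bytes shuffled.

These values are what let you find tuning points.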
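To compare runs it can help to load the printed summary into a data structure first. The following is a minimal sketch in plain Python, assuming the summary has been saved as text with one `name=value` counter per line and the group names on lines of their own, as in the logs in this post. The function name `parse_counters` and the sample text are illustrative, not part of Hadoop.

```python
def parse_counters(text):
    """Parse a printed counter summary into {group: {counter_name: int}}."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "Counters:" in line:
            # Skip blank lines and the "INFO mapreduce.Job: Counters: 43" header.
            continue
        if "=" in line:
            # Counter line: split on the last "=" so names may contain ":" or "(ms)".
            name, value = line.rsplit("=", 1)
            groups.setdefault(current, {})[name.strip()] = int(value)
        else:
            # A line without "=" starts a new counter group.
            current = line
            groups.setdefault(current, {})
    return groups


# A few lines copied from the summaries in this post.
SAMPLE = """\
INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=37834707
HDFS: Number of bytes read=1397535171
Job Counters
Launched map tasks=11
Map-Reduce Framework
Map input records=2007594
Shuffled Maps =11
"""

counters = parse_counters(SAMPLE)
print(counters["Job Counters"]["Launched map tasks"])
```

Splitting on the last `=` keeps labels such as `FILE: Number of bytes read` intact, and stripping the name handles the stray space in `Shuffled Maps =11`.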

Example output of a basic run: no combiner implemented

INFO mapreduce.Job: Counters: 43

File System Counters

FILE: Number of bytes read=37834707

FILE: Number of bytes written=76663746

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=1397535171

HDFS: Number of bytes written=21

HDFS: Number of read operations=36

HDFS: Number of large read operations=0

HDFS: Number of write operations=2

Job Counters

Launched map tasks=11

Launched reduce tasks=1

Data-local map tasks=11

Total time spent by all maps in occupied slots (ms)=88360

Total time spent by all reduces in occupied slots (ms)=11532

Map-Reduce Framework

Map input records=2007594

Map output records=2007593

Map output bytes=33819515

Map output materialized bytes=37834767

Input split bytes=1474

Combine input records=0

Combine output records=0

Reduce input groups=4

Reduce shuffle bytes=37834767

Reduce input records=2007593

Reduce output records=1

Spilled Records=4015186

Shuffled Maps =11

Failed Shuffles=0

Merged Map outputs=11

GC time elapsed (ms)=1029

CPU time spent (ms)=26570

Physical memory (bytes) snapshot=10722308096

Virtual memory (bytes) snapshot=23048646656

Total committed heap usage (bytes)=12089032704

Shuffle Errors

BAD_ID=0

CONNECTION=0

IO_ERROR=0

WRONG_LENGTH=0

WRONG_MAP=0

WRONG_REDUCE=0

File Input Format Counters

Bytes Read=1397533697

File Output Format Counters

Bytes Written=21

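In the result above, no combiner is implemented, so Combine input records and Combine output records are both 0,

and the bytes and records shuffled to reduce are very high (Reduce shuffle bytes=37834767, Reduce input records=2007593).

As a result, the reduce side spent 11532 ms handling that data.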
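The relationships between these counters can be checked with a little arithmetic. The sketch below copies a few values from the run above into a dictionary and derives some figures from them. The helper name `check_no_combiner` and the keys of its result are made up for illustration.

```python
# Selected counters copied from the run without a combiner.
run = {
    "Map output records": 2007593,
    "Map output materialized bytes": 37834767,
    "Combine input records": 0,
    "Reduce shuffle bytes": 37834767,
    "Reduce input records": 2007593,
    "Spilled Records": 4015186,
}


def check_no_combiner(c):
    """Derive a few observations from one run's counters."""
    return {
        # Every record the mappers emitted arrives at reduce unchanged.
        "all_map_output_reaches_reduce":
            c["Reduce input records"] == c["Map output records"],
        # Everything the maps materialized is what reduce pulled over the network.
        "all_materialized_bytes_shuffled":
            c["Reduce shuffle bytes"] == c["Map output materialized bytes"],
        # How many times, on average, each map output record was spilled.
        "spills_per_record":
            c["Spilled Records"] / c["Map output records"],
        # Average shuffled bytes per reduce input record.
        "shuffle_bytes_per_record":
            c["Reduce shuffle bytes"] / c["Reduce input records"],
    }


for name, value in check_no_combiner(run).items():
    print(name, value)
```

For this run each record is spilled exactly twice on average, and roughly 19 bytes are shuffled per record. None of this work is removed before reduce, because nothing aggregates the map output first.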

The result below is from a run with a combiner implemented.

INFO mapreduce.Job: Counters: 43

File System Counters

FILE: Number of bytes read=908

FILE: Number of bytes written=998468

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=1397535171

HDFS: Number of bytes written=21

HDFS: Number of read operations=36

HDFS: Number of large read operations=0

HDFS: Number of write operations=2

Job Counters

Launched map tasks=11

Launched reduce tasks=1

Data-local map tasks=11

Total time spent by all maps in occupied slots (ms)=86586

Total time spent by all reduces in occupied slots (ms)=5220

Map-Reduce Framework

Map input records=2007594

Map output records=2007593

Map output bytes=33819515

Map output materialized bytes=968

Input split bytes=1474

Combine input records=2007593

Combine output records=44

Reduce input groups=4

Reduce shuffle bytes=968

Reduce input records=44

Reduce output records=1

Spilled Records=88

Shuffled Maps =11

Failed Shuffles=0

Merged Map outputs=11

GC time elapsed (ms)=1057

CPU time spent (ms)=24930

Physical memory (bytes) snapshot=10712588288

Virtual memory (bytes) snapshot=23029968896

Total committed heap usage (bytes)=12059148288

Shuffle Errors

BAD_ID=0

CONNECTION=0

IO_ERROR=0

WRONG_LENGTH=0

WRONG_MAP=0

WRONG_REDUCE=0

File Input Format Counters

Bytes Read=1397533697

File Output Format Counters

Bytes Written=21

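The difference from the first result is that the combiner, which runs on the map side on each node where a map task executes and is therefore distributed, minimizes the amount of data that has to be gathered at reduce. Combine output records drops to 44, and as a result Reduce input records, Reduce shuffle bytes, and related values fall sharply.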
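The figure 44 is consistent with 11 map tasks each emitting one record per key for 4 keys (11 × 4 = 44). That reading is an inference from Launched map tasks=11 and Reduce input groups=4, not something stated in the job output. The toy simulation below is plain Python, not the Hadoop API, and its input data is hypothetical. It only illustrates how local aggregation per map task shrinks what gets shuffled without changing the final result.

```python
from collections import Counter


def run_map_tasks(splits, use_combiner):
    """Return the (key, count) records that would be shuffled to reduce."""
    shuffled = []
    for split in splits:
        # The mapper emits (key, 1) for every input record in its split.
        map_output = [(key, 1) for key in split]
        if use_combiner:
            # The combiner sums counts per key locally, before the shuffle.
            local = Counter()
            for key, n in map_output:
                local[key] += n
            shuffled.extend(local.items())
        else:
            shuffled.extend(map_output)
    return shuffled


def reduce_counts(shuffled):
    """The reducer sums all counts per key."""
    totals = Counter()
    for key, n in shuffled:
        totals[key] += n
    return dict(totals)


# Hypothetical input: 11 splits, each holding 1000 records spread over 4 keys.
KEYS = ["a", "b", "c", "d"]
splits = [[KEYS[i % 4] for i in range(1000)] for _ in range(11)]

without_combiner = run_map_tasks(splits, use_combiner=False)
with_combiner = run_map_tasks(splits, use_combiner=True)

print(len(without_combiner), len(with_combiner))  # 11000 44
```

In real Hadoop the framework decides whether and how often to invoke the combiner, so the operation has to be one, like a sum, that gives the same answer however many times it is applied.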

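In these results the time spent by the maps is almost unchanged (88360 ms without the combiner, 86586 ms with it), while the time spent by the reduces falls to less than half (11532 ms to 5220 ms).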
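Putting the two runs side by side makes the effect easier to read. The sketch below copies a handful of counters from the two summaries in this post and prints the ratio of the combiner run to the no-combiner run. The variable and function names are illustrative.

```python
# Counters copied from the run without a combiner.
no_combiner = {
    "Total time spent by all maps in occupied slots (ms)": 88360,
    "Total time spent by all reduces in occupied slots (ms)": 11532,
    "Reduce shuffle bytes": 37834767,
    "Reduce input records": 2007593,
    "Spilled Records": 4015186,
    "CPU time spent (ms)": 26570,
}

# The same counters from the run with a combiner.
with_combiner_run = {
    "Total time spent by all maps in occupied slots (ms)": 86586,
    "Total time spent by all reduces in occupied slots (ms)": 5220,
    "Reduce shuffle bytes": 968,
    "Reduce input records": 44,
    "Spilled Records": 88,
    "CPU time spent (ms)": 24930,
}


def ratios(before, after):
    """Return after/before for every counter present in `before`."""
    return {name: after[name] / before[name] for name in before}


comparison = ratios(no_combiner, with_combiner_run)
for name, ratio in comparison.items():
    print(f"{name}: {ratio:.6f}")
```

The map-side slot time ratio comes out at about 0.98 and the reduce-side ratio at about 0.45. These are slot occupancy times from a single run of each variant, so small differences such as the one on the map side may be run-to-run noise.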

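As shown here, counters are highly useful: they are not only a way for the developer to share data across tasks, but also a way to inspect the baseline figures of how a MapReduce job behaved.

These built-in counter values can also be read and used while reduce is running.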

License: Attribution, NonCommercial, NoDerivatives
