
Let's put hadoop-env.sh into HDFS and run the wordcount example on it.
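For reference, the input file has to exist in HDFS before the job can read it. A minimal sketch of the upload, assuming the shell is sitting in $HADOOP_HOME (the target path matches the /user/Logan/conf listing shown later in this post):

# create the target directory in HDFS and upload the local config file
hadoop fs -mkdir conf
hadoop fs -put conf/hadoop-env.sh conf/hadoop-env.sh

# confirm the file landed
hadoop fs -ls conf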


LoganLeeui-MacBook-Pro:hadoop Logan$ hadoop jar hadoop-examples-1.2.1.jar wordcount conf/hadoop-env.sh output

18/02/10 03:25:11 INFO input.FileInputFormat: Total input paths to process : 1

18/02/10 03:25:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

18/02/10 03:25:11 WARN snappy.LoadSnappy: Snappy native library not loaded

18/02/10 03:25:11 INFO mapred.JobClient: Running job: job_201802100324_0001

18/02/10 03:25:12 INFO mapred.JobClient:  map 0% reduce 0%

18/02/10 03:25:15 INFO mapred.JobClient:  map 100% reduce 0%

18/02/10 03:32:44 INFO mapred.JobClient: Task Id : attempt_201802100324_0001_r_000000_0, Status : FAILED

Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

18/02/10 03:32:44 WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)

18/02/10 03:32:44 WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)

18/02/10 03:40:14 INFO mapred.JobClient: Task Id : attempt_201802100324_0001_r_000000_1, Status : FAILED

Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

18/02/10 03:40:14 WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)

18/02/10 03:40:14 WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)

18/02/10 03:47:45 INFO mapred.JobClient: Task Id : attempt_201802100324_0001_r_000000_2, Status : FAILED

Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

18/02/10 03:47:45 WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)

18/02/10 03:47:45 WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)

18/02/10 03:55:18 INFO mapred.JobClient: Job complete: job_201802100324_0001

18/02/10 03:55:18 INFO mapred.JobClient: Counters: 20

18/02/10 03:55:18 INFO mapred.JobClient:   Map-Reduce Framework

18/02/10 03:55:18 INFO mapred.JobClient:     Combine output records=178

18/02/10 03:55:18 INFO mapred.JobClient:     Spilled Records=178

18/02/10 03:55:18 INFO mapred.JobClient:     Map output materialized bytes=3151

18/02/10 03:55:18 INFO mapred.JobClient:     Map input records=62

18/02/10 03:55:18 INFO mapred.JobClient:     SPLIT_RAW_BYTES=116

18/02/10 03:55:18 INFO mapred.JobClient:     Map output records=306

18/02/10 03:55:18 INFO mapred.JobClient:     Map output bytes=3856

18/02/10 03:55:18 INFO mapred.JobClient:     Combine input records=306

18/02/10 03:55:18 INFO mapred.JobClient:     Total committed heap usage (bytes)=179306496

18/02/10 03:55:18 INFO mapred.JobClient:   File Input Format Counters 

18/02/10 03:55:18 INFO mapred.JobClient:     Bytes Read=2676

18/02/10 03:55:18 INFO mapred.JobClient:   FileSystemCounters

18/02/10 03:55:18 INFO mapred.JobClient:     HDFS_BYTES_READ=2792

18/02/10 03:55:18 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=60256

18/02/10 03:55:18 INFO mapred.JobClient:   Job Counters 

18/02/10 03:55:18 INFO mapred.JobClient:     Launched map tasks=1

18/02/10 03:55:18 INFO mapred.JobClient:     Launched reduce tasks=4

18/02/10 03:55:18 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=7072

18/02/10 03:55:18 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

18/02/10 03:55:18 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=3591

18/02/10 03:55:18 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

18/02/10 03:55:18 INFO mapred.JobClient:     Failed reduce tasks=1

18/02/10 03:55:18 INFO mapred.JobClient:     Data-local map tasks=1

As covered in the previous post, an error broke out at the very end while trying to run the wordcount example.

Task Id : attempt_201802100324_0001_r_000000_0, Status : FAILED

Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

This error is usually said to occur when a thread limit is exceeded, but a file as tiny as hadoop-env.sh is not going to exhaust any thread limit, so that explanation makes no sense here.
WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)

This warning looks like the real culprit. It is the commonly cited "map 100% reduce 0%" error: on its way from the mappers to the reducers, the processed data passes through a phase called shuffle, and it is during that phase that communication with the reducer fails.

So what is actually wrong? Trying ssh localhost shows no problem...
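ssh alone doesn't cover the failing path, though. In Hadoop 1.x the reducers fetch map output over HTTP from each TaskTracker (port 50060 by default, configured by mapred.task.tracker.http.address), so "Connection refused" usually means that port is unreachable or the hostname resolves to the wrong address. A quick diagnostic sketch, assuming the default port on this single-node setup:

# is the TaskTracker shuffle port listening?
nc -z localhost 50060 && echo "shuffle port reachable"

# does the machine's hostname map to a reachable address?
grep -E "localhost|$(hostname)" /etc/hosts

# are the NameNode, DataNode, JobTracker, TaskTracker, and SecondaryNameNode all up?
jps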

Did I touch a configuration file somewhere while writing these posts, then? I don't think so.

Something got tangled somewhere, but I can't tell what. So I wiped Hadoop.

I deleted the hadoop-data directory as well, and in the new setup I placed the hadoop-data folder inside the hadoop-1.2.1 folder.

Then I repeated every step from the installation post and ran wordcount again.
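In outline, the reset came down to the following. This is a sketch rather than the exact commands typed at the time; the old data path is illustrative, and dfs.name.dir / dfs.data.dir in hdfs-site.xml (or hadoop.tmp.dir in core-site.xml) must be pointed at the new location before formatting:

# stop all daemons and wipe the old HDFS data
stop-all.sh
rm -rf ~/hadoop-data   # old location; illustrative path

# after re-pointing the data directories at hadoop-1.2.1/hadoop-data:
hadoop namenode -format
start-all.sh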


LoganLeeui-MacBook-Pro:hadoop Logan$ hadoop jar hadoop-examples-1.2.1.jar wordcount conf/hadoop-env.sh output_

18/02/10 19:52:25 INFO input.FileInputFormat: Total input paths to process : 1

18/02/10 19:52:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

18/02/10 19:52:25 WARN snappy.LoadSnappy: Snappy native library not loaded

18/02/10 19:52:25 INFO mapred.JobClient: Running job: job_201802101951_0001

18/02/10 19:52:26 INFO mapred.JobClient:  map 0% reduce 0%

18/02/10 19:52:29 INFO mapred.JobClient:  map 100% reduce 0%

18/02/10 19:52:36 INFO mapred.JobClient:  map 100% reduce 33%

18/02/10 19:52:38 INFO mapred.JobClient:  map 100% reduce 100%

18/02/10 19:52:38 INFO mapred.JobClient: Job complete: job_201802101951_0001

18/02/10 19:52:38 INFO mapred.JobClient: Counters: 26

18/02/10 19:52:38 INFO mapred.JobClient:   Map-Reduce Framework

18/02/10 19:52:38 INFO mapred.JobClient:     Spilled Records=346

18/02/10 19:52:38 INFO mapred.JobClient:     Map output materialized bytes=2979

18/02/10 19:52:38 INFO mapred.JobClient:     Reduce input records=173

18/02/10 19:52:38 INFO mapred.JobClient:     Map input records=59

18/02/10 19:52:38 INFO mapred.JobClient:     SPLIT_RAW_BYTES=116

18/02/10 19:52:38 INFO mapred.JobClient:     Map output bytes=3700

18/02/10 19:52:38 INFO mapred.JobClient:     Reduce shuffle bytes=2979

18/02/10 19:52:38 INFO mapred.JobClient:     Reduce input groups=173

18/02/10 19:52:38 INFO mapred.JobClient:     Combine output records=173

18/02/10 19:52:38 INFO mapred.JobClient:     Reduce output records=173

18/02/10 19:52:38 INFO mapred.JobClient:     Map output records=302

18/02/10 19:52:38 INFO mapred.JobClient:     Combine input records=302

18/02/10 19:52:38 INFO mapred.JobClient:     Total committed heap usage (bytes)=308281344

18/02/10 19:52:38 INFO mapred.JobClient:   File Input Format Counters 

18/02/10 19:52:38 INFO mapred.JobClient:     Bytes Read=2532

18/02/10 19:52:38 INFO mapred.JobClient:   FileSystemCounters

18/02/10 19:52:38 INFO mapred.JobClient:     HDFS_BYTES_READ=2648

18/02/10 19:52:38 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=127735

18/02/10 19:52:38 INFO mapred.JobClient:     FILE_BYTES_READ=2979

18/02/10 19:52:38 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2283

18/02/10 19:52:38 INFO mapred.JobClient:   Job Counters 

18/02/10 19:52:38 INFO mapred.JobClient:     Launched map tasks=1

18/02/10 19:52:38 INFO mapred.JobClient:     Launched reduce tasks=1

18/02/10 19:52:38 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8086

18/02/10 19:52:38 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

18/02/10 19:52:38 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=3033

18/02/10 19:52:38 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

18/02/10 19:52:38 INFO mapred.JobClient:     Data-local map tasks=1

18/02/10 19:52:38 INFO mapred.JobClient:   File Output Format Counters 

18/02/10 19:52:38 INFO mapred.JobClient:     Bytes Written=2283

LoganLeeui-MacBook-Pro:hadoop Logan$ hadoop jar hadoop-examples-1.2.1.jar wordcount conf/hadoop-env.sh output_2

18/02/11 09:37:54 INFO input.FileInputFormat: Total input paths to process : 1

18/02/11 09:37:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

18/02/11 09:37:54 WARN snappy.LoadSnappy: Snappy native library not loaded

18/02/11 09:37:54 INFO mapred.JobClient: Running job: job_201802101951_0002

18/02/11 09:37:55 INFO mapred.JobClient:  map 0% reduce 0%

18/02/11 09:37:58 INFO mapred.JobClient:  map 100% reduce 0%

18/02/11 09:38:05 INFO mapred.JobClient:  map 100% reduce 33%

18/02/11 09:38:06 INFO mapred.JobClient:  map 100% reduce 100%

18/02/11 09:38:08 INFO mapred.JobClient: Job complete: job_201802101951_0002

18/02/11 09:38:08 INFO mapred.JobClient: Counters: 26

18/02/11 09:38:08 INFO mapred.JobClient:   Map-Reduce Framework

18/02/11 09:38:08 INFO mapred.JobClient:     Spilled Records=346

18/02/11 09:38:08 INFO mapred.JobClient:     Map output materialized bytes=2979

18/02/11 09:38:08 INFO mapred.JobClient:     Reduce input records=173

18/02/11 09:38:08 INFO mapred.JobClient:     Map input records=59

18/02/11 09:38:08 INFO mapred.JobClient:     SPLIT_RAW_BYTES=116

18/02/11 09:38:08 INFO mapred.JobClient:     Map output bytes=3700

18/02/11 09:38:08 INFO mapred.JobClient:     Reduce shuffle bytes=2979

18/02/11 09:38:08 INFO mapred.JobClient:     Reduce input groups=173

18/02/11 09:38:08 INFO mapred.JobClient:     Combine output records=173

18/02/11 09:38:08 INFO mapred.JobClient:     Reduce output records=173

18/02/11 09:38:08 INFO mapred.JobClient:     Map output records=302

18/02/11 09:38:08 INFO mapred.JobClient:     Combine input records=302

18/02/11 09:38:08 INFO mapred.JobClient:     Total committed heap usage (bytes)=308281344

18/02/11 09:38:08 INFO mapred.JobClient:   File Input Format Counters 

18/02/11 09:38:08 INFO mapred.JobClient:     Bytes Read=2532

18/02/11 09:38:08 INFO mapred.JobClient:   FileSystemCounters

18/02/11 09:38:08 INFO mapred.JobClient:     HDFS_BYTES_READ=2648

18/02/11 09:38:08 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=127733

18/02/11 09:38:08 INFO mapred.JobClient:     FILE_BYTES_READ=2979

18/02/11 09:38:08 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2283

18/02/11 09:38:08 INFO mapred.JobClient:   Job Counters 

18/02/11 09:38:08 INFO mapred.JobClient:     Launched map tasks=1

18/02/11 09:38:08 INFO mapred.JobClient:     Launched reduce tasks=1

18/02/11 09:38:08 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8102

18/02/11 09:38:08 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

18/02/11 09:38:08 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=3325

18/02/11 09:38:08 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

18/02/11 09:38:08 INFO mapred.JobClient:     Data-local map tasks=1

18/02/11 09:38:08 INFO mapred.JobClient:   File Output Format Counters 

18/02/11 09:38:08 INFO mapred.JobClient:     Bytes Written=2283

This time, it finally succeeded.

hadoop fs -lsr

drwxr-xr-x   - Logan supergroup          0 2018-02-10 19:52 /user/Logan/conf

-rw-r--r--   1 Logan supergroup       2532 2018-02-10 19:52 /user/Logan/conf/hadoop-env.sh

drwxr-xr-x   - Logan supergroup          0 2018-02-10 19:52 /user/Logan/output_

-rw-r--r--   1 Logan supergroup          0 2018-02-10 19:52 /user/Logan/output_/_SUCCESS

drwxr-xr-x   - Logan supergroup          0 2018-02-10 19:52 /user/Logan/output_/_logs

drwxr-xr-x   - Logan supergroup          0 2018-02-10 19:52 /user/Logan/output_/_logs/history

-rw-r--r--   1 Logan supergroup      11699 2018-02-10 19:52 /user/Logan/output_/_logs/history/job_201802101951_0001_1518259945223_Logan_word+count

-rw-r--r--   1 Logan supergroup      53374 2018-02-10 19:52 /user/Logan/output_/_logs/history/job_201802101951_0001_conf.xml

-rw-r--r--   1 Logan supergroup       2283 2018-02-10 19:52 /user/Logan/output_/part-r-00000


 hadoop fs -cat output_/part-r-00000

# 38

$HADOOP_BALANCER_OPTS" 1

$HADOOP_DATANODE_OPTS" 1

$HADOOP_HOME/conf/slaves 1

$HADOOP_HOME/logs 1

$HADOOP_JOBTRACKER_OPTS" 1

$HADOOP_NAMENODE_OPTS" 1

$HADOOP_SECONDARYNAMENODE_OPTS" 1

$USER 1

'man 1

(fs, 1

-o 1

/tmp 1

1000. 1

A 1

All 1

CLASSPATH 1

Command 1

ConnectTimeout=1 1

Default 1

Empty 2

Extra 3

File 1

HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote 1

HADOOP_CLASSPATH= 1

HADOOP_CLIENT_OPTS 1

HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote 1

HADOOP_HEAPSIZE=2000 1

HADOOP_HOME_WARN_SUPPRESS=1 1

HADOOP_IDENT_STRING=$USER 1

HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote 1

HADOOP_LOG_DIR=${HADOOP_HOME}/logs 1

HADOOP_MASTER=master:/home/$USER/src/hadoop 1

HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote 1

HADOOP_NICENESS=10 1

HADOOP_OPTS 1

HADOOP_OPTS=-server 1

HADOOP_PID_DIR=/var/hadoop/pids 1

HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote 1

HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves 1

HADOOP_SLAVE_SLEEP=0.1 1

HADOOP_SSH_OPTS="-o 1

HADOOP_TASKTRACKER_OPTS= 1

Hadoop-specific 1

JAVA_HOME 1

JAVA_HOME. 1

JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home 1

Java 2

MB. 1

NOTE: 1

Optional. 1

Otherwise 1

Required. 1

Seconds 1

See 1

SendEnv=HADOOP_CONF_DIR" 1

Set 1

Suppessing 1

The 6

This 1

Unset 2

Warning 1

When 1

Where 1

a 3

amount 1

appended 1

applies 1

are 4

arrive 1

attack. 1

be 4

best 1

between 1

by 9

can 4

clusters, 1

code 1

commands 1

commands. 1

configuration 1

correctly 1

daemon 1

daemons. 1

default. 8

defined 1

dfs, 1

directory 2

distcp 1

distributed 1

e.g., 1

elements. 1

environment 2

etc) 1

export 20

faster 1

file, 1

files 2

following 1

for 2

from. 1

fsck, 1

going 1

hadoop 2

hadoop. 1

heap 1

here. 1

host:path 1

hosts. 1

implementation 1

in 3

instance 1

is 5

it 2

java 1

large 1

log 1

master 1

maximum 1

multiple 1

naming 1

nice'. 1

nodes. 1

of 2

on 1

only 2

optional. 1

options 1

options. 2

others 1

otherwise 1

pid 1

potential 1

priority 1

processes. 1

remote 2

representing 1

required 1

rsync'd 1

rsyncs 1

run 1

running 1

runtime 1

scheduling 1

service 1

set 2

should 2

slave 3

sleep 1

so 1

specific 1

specified 1

ssh 1

stored. 2

string 1

symlink 1

than 1

that 3

the 4

them. 1

there 1

this 3

to 9

use, 1

use. 1

useful 1

users 1

variable 1

variables 1

when 1

where 2

where, 1

written 1


The word counts come out correctly.
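One caveat when re-running: wordcount refuses to start if the output directory already exists, which is why output, output_, and output_2 were used above. Old results can be cleared with the 1.x recursive delete:

hadoop fs -rmr output_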

If you have any questions, please leave them in the comments.
