Let's put hadoop-env.sh into HDFS and run the wordcount example on it.
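For reference, a minimal sketch of how the file gets uploaded first (assuming the local conf/hadoop-env.sh from the Hadoop 1.2.1 install directory; relative HDFS paths resolve under /user/Logan):

# create a conf directory in HDFS and copy the file in
hadoop fs -mkdir conf
hadoop fs -put conf/hadoop-env.sh conf/
# verify the upload
hadoop fs -ls conf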
LoganLeeui-MacBook-Pro:hadoop Logan$ hadoop jar hadoop-examples-1.2.1.jar wordcount conf/hadoop-env.sh output
18/02/10 03:25:11 INFO input.FileInputFormat: Total input paths to process : 1
18/02/10 03:25:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/02/10 03:25:11 WARN snappy.LoadSnappy: Snappy native library not loaded
18/02/10 03:25:11 INFO mapred.JobClient: Running job: job_201802100324_0001
18/02/10 03:25:12 INFO mapred.JobClient: map 0% reduce 0%
18/02/10 03:25:15 INFO mapred.JobClient: map 100% reduce 0%
18/02/10 03:32:44 INFO mapred.JobClient: Task Id : attempt_201802100324_0001_r_000000_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
18/02/10 03:32:44 WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)
18/02/10 03:32:44 WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)
18/02/10 03:40:14 INFO mapred.JobClient: Task Id : attempt_201802100324_0001_r_000000_1, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
18/02/10 03:40:14 WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)
18/02/10 03:40:14 WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)
18/02/10 03:47:45 INFO mapred.JobClient: Task Id : attempt_201802100324_0001_r_000000_2, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
18/02/10 03:47:45 WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)
18/02/10 03:47:45 WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)
18/02/10 03:55:18 INFO mapred.JobClient: Job complete: job_201802100324_0001
18/02/10 03:55:18 INFO mapred.JobClient: Counters: 20
18/02/10 03:55:18 INFO mapred.JobClient: Map-Reduce Framework
18/02/10 03:55:18 INFO mapred.JobClient: Combine output records=178
18/02/10 03:55:18 INFO mapred.JobClient: Spilled Records=178
18/02/10 03:55:18 INFO mapred.JobClient: Map output materialized bytes=3151
18/02/10 03:55:18 INFO mapred.JobClient: Map input records=62
18/02/10 03:55:18 INFO mapred.JobClient: SPLIT_RAW_BYTES=116
18/02/10 03:55:18 INFO mapred.JobClient: Map output records=306
18/02/10 03:55:18 INFO mapred.JobClient: Map output bytes=3856
18/02/10 03:55:18 INFO mapred.JobClient: Combine input records=306
18/02/10 03:55:18 INFO mapred.JobClient: Total committed heap usage (bytes)=179306496
18/02/10 03:55:18 INFO mapred.JobClient: File Input Format Counters
18/02/10 03:55:18 INFO mapred.JobClient: Bytes Read=2676
18/02/10 03:55:18 INFO mapred.JobClient: FileSystemCounters
18/02/10 03:55:18 INFO mapred.JobClient: HDFS_BYTES_READ=2792
18/02/10 03:55:18 INFO mapred.JobClient: FILE_BYTES_WRITTEN=60256
18/02/10 03:55:18 INFO mapred.JobClient: Job Counters
18/02/10 03:55:18 INFO mapred.JobClient: Launched map tasks=1
18/02/10 03:55:18 INFO mapred.JobClient: Launched reduce tasks=4
18/02/10 03:55:18 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=7072
18/02/10 03:55:18 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
18/02/10 03:55:18 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=3591
18/02/10 03:55:18 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
18/02/10 03:55:18 INFO mapred.JobClient: Failed reduce tasks=1
18/02/10 03:55:18 INFO mapred.JobClient: Data-local map tasks=1
As the previous post showed, the run ended in an error while the wordcount example was executing:
Task Id : attempt_201802100324_0001_r_000000_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
This error is commonly attributed to exceeding a thread limit, but a file as small as hadoop-env.sh is never going to blow past any thread limit, so that explanation doesn't hold.
WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)
This warning looks like the actual culprit. It's the well-known "map 100% reduce 0%" error: on the way from the mappers to the reducers, the processed data passes through a phase called shuffle, and that is where communication with the reducer breaks down.
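In Hadoop 1.x the reducers pull map output over HTTP from each TaskTracker (port 50060 by default), so a refused connection during shuffle usually means that port, or the hostname behind it, is unreachable. A few checks worth running, sketched under the assumption of a pseudo-distributed setup on localhost:

# does the machine's hostname resolve to something reachable?
hostname
grep "$(hostname)" /etc/hosts
# is the TaskTracker's shuffle HTTP port answering? (expect 200)
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50060/
# are all five daemons actually up? (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker)
jps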
So what was wrong in my case? ssh localhost worked without a problem... Did I touch a config file somewhere while blogging? I don't think so. Something had gotten tangled, but I couldn't pin down what, so I wiped Hadoop entirely.
I deleted hadoop-data as well, and in the new setup I placed the hadoop-data folder inside the hadoop-1.2.1 folder.
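Moving the data directory means the config has to point at the new location. A hypothetical core-site.xml fragment, assuming the install lives at /Users/Logan/hadoop-1.2.1 (the actual path on your machine may differ):

<property>
  <name>hadoop.tmp.dir</name>
  <!-- hypothetical path; hadoop-data moved inside the install directory -->
  <value>/Users/Logan/hadoop-1.2.1/hadoop-data</value>
</property>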
Then I repeated every step from the installation post and ran wordcount again.
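Condensed, those steps amount to roughly this (a sketch, not the full walkthrough; note that formatting the NameNode wipes HDFS, so the input file has to be uploaded again):

# start from clean metadata in the new hadoop-data location
hadoop namenode -format
# bring up all five daemons and confirm with jps
start-all.sh
jps
# re-upload the input before running the job
hadoop fs -put conf/hadoop-env.sh conf/hadoop-env.sh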
LoganLeeui-MacBook-Pro:hadoop Logan$ hadoop jar hadoop-examples-1.2.1.jar wordcount conf/hadoop-env.sh output_
18/02/10 19:52:25 INFO input.FileInputFormat: Total input paths to process : 1
18/02/10 19:52:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/02/10 19:52:25 WARN snappy.LoadSnappy: Snappy native library not loaded
18/02/10 19:52:25 INFO mapred.JobClient: Running job: job_201802101951_0001
18/02/10 19:52:26 INFO mapred.JobClient: map 0% reduce 0%
18/02/10 19:52:29 INFO mapred.JobClient: map 100% reduce 0%
18/02/10 19:52:36 INFO mapred.JobClient: map 100% reduce 33%
18/02/10 19:52:38 INFO mapred.JobClient: map 100% reduce 100%
18/02/10 19:52:38 INFO mapred.JobClient: Job complete: job_201802101951_0001
18/02/10 19:52:38 INFO mapred.JobClient: Counters: 26
18/02/10 19:52:38 INFO mapred.JobClient: Map-Reduce Framework
18/02/10 19:52:38 INFO mapred.JobClient: Spilled Records=346
18/02/10 19:52:38 INFO mapred.JobClient: Map output materialized bytes=2979
18/02/10 19:52:38 INFO mapred.JobClient: Reduce input records=173
18/02/10 19:52:38 INFO mapred.JobClient: Map input records=59
18/02/10 19:52:38 INFO mapred.JobClient: SPLIT_RAW_BYTES=116
18/02/10 19:52:38 INFO mapred.JobClient: Map output bytes=3700
18/02/10 19:52:38 INFO mapred.JobClient: Reduce shuffle bytes=2979
18/02/10 19:52:38 INFO mapred.JobClient: Reduce input groups=173
18/02/10 19:52:38 INFO mapred.JobClient: Combine output records=173
18/02/10 19:52:38 INFO mapred.JobClient: Reduce output records=173
18/02/10 19:52:38 INFO mapred.JobClient: Map output records=302
18/02/10 19:52:38 INFO mapred.JobClient: Combine input records=302
18/02/10 19:52:38 INFO mapred.JobClient: Total committed heap usage (bytes)=308281344
18/02/10 19:52:38 INFO mapred.JobClient: File Input Format Counters
18/02/10 19:52:38 INFO mapred.JobClient: Bytes Read=2532
18/02/10 19:52:38 INFO mapred.JobClient: FileSystemCounters
18/02/10 19:52:38 INFO mapred.JobClient: HDFS_BYTES_READ=2648
18/02/10 19:52:38 INFO mapred.JobClient: FILE_BYTES_WRITTEN=127735
18/02/10 19:52:38 INFO mapred.JobClient: FILE_BYTES_READ=2979
18/02/10 19:52:38 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2283
18/02/10 19:52:38 INFO mapred.JobClient: Job Counters
18/02/10 19:52:38 INFO mapred.JobClient: Launched map tasks=1
18/02/10 19:52:38 INFO mapred.JobClient: Launched reduce tasks=1
18/02/10 19:52:38 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8086
18/02/10 19:52:38 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
18/02/10 19:52:38 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=3033
18/02/10 19:52:38 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
18/02/10 19:52:38 INFO mapred.JobClient: Data-local map tasks=1
18/02/10 19:52:38 INFO mapred.JobClient: File Output Format Counters
18/02/10 19:52:38 INFO mapred.JobClient: Bytes Written=2283
LoganLeeui-MacBook-Pro:hadoop Logan$ hadoop jar hadoop-examples-1.2.1.jar wordcount conf/hadoop-env.sh output_2
18/02/11 09:37:54 INFO input.FileInputFormat: Total input paths to process : 1
18/02/11 09:37:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/02/11 09:37:54 WARN snappy.LoadSnappy: Snappy native library not loaded
18/02/11 09:37:54 INFO mapred.JobClient: Running job: job_201802101951_0002
18/02/11 09:37:55 INFO mapred.JobClient: map 0% reduce 0%
18/02/11 09:37:58 INFO mapred.JobClient: map 100% reduce 0%
18/02/11 09:38:05 INFO mapred.JobClient: map 100% reduce 33%
18/02/11 09:38:06 INFO mapred.JobClient: map 100% reduce 100%
18/02/11 09:38:08 INFO mapred.JobClient: Job complete: job_201802101951_0002
18/02/11 09:38:08 INFO mapred.JobClient: Counters: 26
18/02/11 09:38:08 INFO mapred.JobClient: Map-Reduce Framework
18/02/11 09:38:08 INFO mapred.JobClient: Spilled Records=346
18/02/11 09:38:08 INFO mapred.JobClient: Map output materialized bytes=2979
18/02/11 09:38:08 INFO mapred.JobClient: Reduce input records=173
18/02/11 09:38:08 INFO mapred.JobClient: Map input records=59
18/02/11 09:38:08 INFO mapred.JobClient: SPLIT_RAW_BYTES=116
18/02/11 09:38:08 INFO mapred.JobClient: Map output bytes=3700
18/02/11 09:38:08 INFO mapred.JobClient: Reduce shuffle bytes=2979
18/02/11 09:38:08 INFO mapred.JobClient: Reduce input groups=173
18/02/11 09:38:08 INFO mapred.JobClient: Combine output records=173
18/02/11 09:38:08 INFO mapred.JobClient: Reduce output records=173
18/02/11 09:38:08 INFO mapred.JobClient: Map output records=302
18/02/11 09:38:08 INFO mapred.JobClient: Combine input records=302
18/02/11 09:38:08 INFO mapred.JobClient: Total committed heap usage (bytes)=308281344
18/02/11 09:38:08 INFO mapred.JobClient: File Input Format Counters
18/02/11 09:38:08 INFO mapred.JobClient: Bytes Read=2532
18/02/11 09:38:08 INFO mapred.JobClient: FileSystemCounters
18/02/11 09:38:08 INFO mapred.JobClient: HDFS_BYTES_READ=2648
18/02/11 09:38:08 INFO mapred.JobClient: FILE_BYTES_WRITTEN=127733
18/02/11 09:38:08 INFO mapred.JobClient: FILE_BYTES_READ=2979
18/02/11 09:38:08 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2283
18/02/11 09:38:08 INFO mapred.JobClient: Job Counters
18/02/11 09:38:08 INFO mapred.JobClient: Launched map tasks=1
18/02/11 09:38:08 INFO mapred.JobClient: Launched reduce tasks=1
18/02/11 09:38:08 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8102
18/02/11 09:38:08 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
18/02/11 09:38:08 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=3325
18/02/11 09:38:08 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
18/02/11 09:38:08 INFO mapred.JobClient: Data-local map tasks=1
18/02/11 09:38:08 INFO mapred.JobClient: File Output Format Counters
18/02/11 09:38:08 INFO mapred.JobClient: Bytes Written=2283
This time the run finally succeeded.
hadoop fs -lsr
drwxr-xr-x - Logan supergroup 0 2018-02-10 19:52 /user/Logan/conf
-rw-r--r-- 1 Logan supergroup 2532 2018-02-10 19:52 /user/Logan/conf/hadoop-env.sh
drwxr-xr-x - Logan supergroup 0 2018-02-10 19:52 /user/Logan/output_
-rw-r--r-- 1 Logan supergroup 0 2018-02-10 19:52 /user/Logan/output_/_SUCCESS
drwxr-xr-x - Logan supergroup 0 2018-02-10 19:52 /user/Logan/output_/_logs
drwxr-xr-x - Logan supergroup 0 2018-02-10 19:52 /user/Logan/output_/_logs/history
-rw-r--r-- 1 Logan supergroup 11699 2018-02-10 19:52 /user/Logan/output_/_logs/history/job_201802101951_0001_1518259945223_Logan_word+count
-rw-r--r-- 1 Logan supergroup 53374 2018-02-10 19:52 /user/Logan/output_/_logs/history/job_201802101951_0001_conf.xml
-rw-r--r-- 1 Logan supergroup 2283 2018-02-10 19:52 /user/Logan/output_/part-r-00000
hadoop fs -cat output_/part-r-00000
# 38
$HADOOP_BALANCER_OPTS" 1
$HADOOP_DATANODE_OPTS" 1
$HADOOP_HOME/conf/slaves 1
$HADOOP_HOME/logs 1
$HADOOP_JOBTRACKER_OPTS" 1
$HADOOP_NAMENODE_OPTS" 1
$HADOOP_SECONDARYNAMENODE_OPTS" 1
$USER 1
'man 1
(fs, 1
-o 1
/tmp 1
1000. 1
A 1
All 1
CLASSPATH 1
Command 1
ConnectTimeout=1 1
Default 1
Empty 2
Extra 3
File 1
HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote 1
HADOOP_CLASSPATH= 1
HADOOP_CLIENT_OPTS 1
HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote 1
HADOOP_HEAPSIZE=2000 1
HADOOP_HOME_WARN_SUPPRESS=1 1
HADOOP_IDENT_STRING=$USER 1
HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote 1
HADOOP_LOG_DIR=${HADOOP_HOME}/logs 1
HADOOP_MASTER=master:/home/$USER/src/hadoop 1
HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote 1
HADOOP_NICENESS=10 1
HADOOP_OPTS 1
HADOOP_OPTS=-server 1
HADOOP_PID_DIR=/var/hadoop/pids 1
HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote 1
HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves 1
HADOOP_SLAVE_SLEEP=0.1 1
HADOOP_SSH_OPTS="-o 1
HADOOP_TASKTRACKER_OPTS= 1
Hadoop-specific 1
JAVA_HOME 1
JAVA_HOME. 1
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home 1
Java 2
MB. 1
NOTE: 1
Optional. 1
Otherwise 1
Required. 1
Seconds 1
See 1
SendEnv=HADOOP_CONF_DIR" 1
Set 1
Suppessing 1
The 6
This 1
Unset 2
Warning 1
When 1
Where 1
a 3
amount 1
appended 1
applies 1
are 4
arrive 1
attack. 1
be 4
best 1
between 1
by 9
can 4
clusters, 1
code 1
commands 1
commands. 1
configuration 1
correctly 1
daemon 1
daemons. 1
default. 8
defined 1
dfs, 1
directory 2
distcp 1
distributed 1
e.g., 1
elements. 1
environment 2
etc) 1
export 20
faster 1
file, 1
files 2
following 1
for 2
from. 1
fsck, 1
going 1
hadoop 2
hadoop. 1
heap 1
here. 1
host:path 1
hosts. 1
implementation 1
in 3
instance 1
is 5
it 2
java 1
large 1
log 1
master 1
maximum 1
multiple 1
naming 1
nice'. 1
nodes. 1
of 2
on 1
only 2
optional. 1
options 1
options. 2
others 1
otherwise 1
pid 1
potential 1
priority 1
processes. 1
remote 2
representing 1
required 1
rsync'd 1
rsyncs 1
run 1
running 1
runtime 1
scheduling 1
service 1
set 2
should 2
slave 3
sleep 1
so 1
specific 1
specified 1
ssh 1
stored. 2
string 1
symlink 1
than 1
that 3
the 4
them. 1
there 1
this 3
to 9
use, 1
use. 1
useful 1
users 1
variable 1
variables 1
when 1
where 2
where, 1
written 1
The word counts print out normally.
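One thing worth noting is why the output directory name changes on each run (output, output_, output_2): wordcount refuses to start if the output directory already exists. Instead of renaming, you can remove the old one first with the 1.x shell:

# FileOutputFormat fails fast if the target exists, so clear it first
hadoop fs -rmr output_
hadoop jar hadoop-examples-1.2.1.jar wordcount conf/hadoop-env.sh output_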
If you have any questions, leave them in the comments.