Search results
2 posts matching "WordCount"
- 2018.02.15 Building a Hadoop Application - Part 1
- 2018.02.11 Installing Hadoop on OSX (ver 2)
Over the last couple of posts we installed Hadoop and confirmed, by running the bundled example, that the installation works correctly.
2018/02/10 - [프로그래밍발전소 ♫/Hadoop발전소♫] - Installing Hadoop on OSX (ver 1)
2018/02/11 - [분류 전체보기] - Installing Hadoop on OSX (ver 2)
This time, before we write any Hadoop application, we need some data to crunch, so let's download a dataset first.
Go to stat-computing.org and click "Download the Data".
We want the flight data from 1987 through 2008, but downloading each file by hand would defeat the point of being a programmer.
Let's reuse the wget command from last time and wrap it in a small shell script:
#!/bin/bash
# Download the ASA Data Expo flight data for 1987-2008 from stat-computing.org
for ((i=1987; i<=2008; i++))
do
    wget http://stat-computing.org/dataexpo/2009/${i}.csv.bz2
done
As shown above, the for loop runs from 1987 to 2008 and substitutes each year into the wget URL, so running the script downloads the data for the whole range. Before running it, give the script execute permission (chmod 777 as I did, though chmod +x is enough).
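For example (download.sh is the filename used in the log below):
chmod +x download.sh
./download.sh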
==================================================================================================
LoganLeeui-MacBook-Pro:hadoop Logan$ ./download.sh
--2018-02-11 14:54:05-- http://stat-computing.org/dataexpo/2009/1987.csv.bz2
Resolving stat-computing.org... 54.231.168.223
Connecting to stat-computing.org|54.231.168.223|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12652442 (12M) [application/x-bzip2]
Saving to: '1987.csv.bz2'
1987.csv.bz2 100%[============================>] 12.07M 18.1MB/s in 0.7s
2018-02-11 14:54:07 (18.1 MB/s) - '1987.csv.bz2' saved [12652442/12652442]
--2018-02-11 14:54:07-- http://stat-computing.org/dataexpo/2009/1988.csv.bz2
Resolving stat-computing.org... 54.231.168.223
Connecting to stat-computing.org|54.231.168.223|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 49499025 (47M) [application/x-bzip2]
Saving to: '1988.csv.bz2'
1988.csv.bz2 100%[============================>] 47.21M 305KB/s in 69s
2018-02-11 14:55:16 (701 KB/s) - '1988.csv.bz2' saved [49499025/49499025]
--2018-02-11 14:55:16-- http://stat-computing.org/dataexpo/2009/1989.csv.bz2
Resolving stat-computing.org... 52.218.200.91
Connecting to stat-computing.org|52.218.200.91|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 49202298 (47M) [application/x-bzip2]
Saving to: '1989.csv.bz2'
1989.csv.bz2 100%[============================>] 46.92M 712KB/s in 32s
2018-02-11 14:55:49 (1.45 MB/s) - '1989.csv.bz2' saved [49202298/49202298]
--2018-02-11 14:55:49-- http://stat-computing.org/dataexpo/2009/1990.csv.bz2
Resolving stat-computing.org... 52.218.208.107
Connecting to stat-computing.org|52.218.208.107|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 52041322 (50M) [application/x-bzip2]
Saving to: '1990.csv.bz2'
1990.csv.bz2 100%[============================>] 49.63M 1.79MB/s in 21s
2018-02-11 14:56:11 (2.36 MB/s) - '1990.csv.bz2' saved [52041322/52041322]
--2018-02-11 14:56:11-- http://stat-computing.org/dataexpo/2009/1991.csv.bz2
Resolving stat-computing.org... 54.231.168.215
Connecting to stat-computing.org|54.231.168.215|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 49877448 (48M) [application/x-bzip2]
Saving to: '1991.csv.bz2'
1991.csv.bz2 100%[============================>] 47.57M 555KB/s in 71s
2018-02-11 14:57:23 (686 KB/s) - '1991.csv.bz2' saved [49877448/49877448]
--2018-02-11 14:57:23-- http://stat-computing.org/dataexpo/2009/1992.csv.bz2
Resolving stat-computing.org... 52.218.201.35
Connecting to stat-computing.org|52.218.201.35|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 50040946 (48M) [application/x-bzip2]
Saving to: '1992.csv.bz2'
1992.csv.bz2 100%[============================>] 47.72M 1.67MB/s in 23s
2018-02-11 14:57:46 (2.08 MB/s) - '1992.csv.bz2' saved [50040946/50040946]
--2018-02-11 14:57:46-- http://stat-computing.org/dataexpo/2009/1993.csv.bz2
Resolving stat-computing.org... 54.231.168.167
Connecting to stat-computing.org|54.231.168.167|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 50111774 (48M) [application/x-bzip2]
Saving to: '1993.csv.bz2'
1993.csv.bz2 100%[============================>] 47.79M 716KB/s in 68s
2018-02-11 14:58:55 (722 KB/s) - '1993.csv.bz2' saved [50111774/50111774]
--2018-02-11 14:58:55-- http://stat-computing.org/dataexpo/2009/1994.csv.bz2
Resolving stat-computing.org... 52.218.193.179
Connecting to stat-computing.org|52.218.193.179|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 51123887 (49M) [application/x-bzip2]
Saving to: '1994.csv.bz2'
1994.csv.bz2 100%[============================>] 48.75M 761KB/s in 27s
2018-02-11 14:59:22 (1.80 MB/s) - '1994.csv.bz2' saved [51123887/51123887]
--2018-02-11 14:59:23-- http://stat-computing.org/dataexpo/2009/1995.csv.bz2
Resolving stat-computing.org... 54.231.184.167
Connecting to stat-computing.org|54.231.184.167|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 74881752 (71M) [application/x-bzip2]
Saving to: '1995.csv.bz2'
1995.csv.bz2 100%[============================>] 71.41M 938KB/s in 87s
2018-02-11 15:00:51 (839 KB/s) - '1995.csv.bz2' saved [74881752/74881752]
--2018-02-11 15:00:51-- http://stat-computing.org/dataexpo/2009/1996.csv.bz2
Resolving stat-computing.org... 52.218.200.99
Connecting to stat-computing.org|52.218.200.99|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75887707 (72M) [application/x-bzip2]
Saving to: '1996.csv.bz2'
1996.csv.bz2 100%[============================>] 72.37M 2.15MB/s in 56s
2018-02-11 15:01:48 (1.28 MB/s) - '1996.csv.bz2' saved [75887707/75887707]
--2018-02-11 15:01:48-- http://stat-computing.org/dataexpo/2009/1997.csv.bz2
Resolving stat-computing.org... 52.218.192.235
Connecting to stat-computing.org|52.218.192.235|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 76705687 (73M) [application/x-bzip2]
Saving to: '1997.csv.bz2'
1997.csv.bz2 100%[============================>] 73.15M 976KB/s in 39s
2018-02-11 15:02:27 (1.87 MB/s) - '1997.csv.bz2' saved [76705687/76705687]
--2018-02-11 15:02:27-- http://stat-computing.org/dataexpo/2009/1998.csv.bz2
Resolving stat-computing.org... 52.218.128.15
Connecting to stat-computing.org|52.218.128.15|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 76683506 (73M) [application/x-bzip2]
Saving to: '1998.csv.bz2'
1998.csv.bz2 100%[============================>] 73.13M 249KB/s in 2m 8s
2018-02-11 15:04:36 (585 KB/s) - '1998.csv.bz2' saved [76683506/76683506]
--2018-02-11 15:04:36-- http://stat-computing.org/dataexpo/2009/1999.csv.bz2
Resolving stat-computing.org... 52.218.193.219
Connecting to stat-computing.org|52.218.193.219|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 79449438 (76M) [application/x-bzip2]
Saving to: '1999.csv.bz2'
1999.csv.bz2 100%[============================>] 75.77M 1.59MB/s in 53s
2018-02-11 15:05:30 (1.43 MB/s) - '1999.csv.bz2' saved [79449438/79449438]
--2018-02-11 15:05:30-- http://stat-computing.org/dataexpo/2009/2000.csv.bz2
Resolving stat-computing.org... 52.218.192.211
Connecting to stat-computing.org|52.218.192.211|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 82537924 (79M) [application/x-bzip2]
Saving to: '2000.csv.bz2'
2000.csv.bz2 100%[============================>] 78.71M 2.95MB/s in 62s
2018-02-11 15:06:32 (1.28 MB/s) - '2000.csv.bz2' saved [82537924/82537924]
--2018-02-11 15:06:32-- http://stat-computing.org/dataexpo/2009/2001.csv.bz2
Resolving stat-computing.org... 52.218.144.59
Connecting to stat-computing.org|52.218.144.59|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 83478700 (80M) [application/x-bzip2]
Saving to: '2001.csv.bz2'
2001.csv.bz2 100%[============================>] 79.61M 539KB/s in 5m 14s
2018-02-11 15:11:47 (259 KB/s) - '2001.csv.bz2' saved [83478700/83478700]
--2018-02-11 15:11:47-- http://stat-computing.org/dataexpo/2009/2002.csv.bz2
Resolving stat-computing.org... 54.231.168.163
Connecting to stat-computing.org|54.231.168.163|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75907218 (72M) [application/x-bzip2]
Saving to: '2002.csv.bz2'
2002.csv.bz2 100%[============================>] 72.39M 778KB/s in 2m 12s
2018-02-11 15:14:00 (560 KB/s) - '2002.csv.bz2' saved [75907218/75907218]
--2018-02-11 15:14:00-- http://stat-computing.org/dataexpo/2009/2003.csv.bz2
Resolving stat-computing.org... 52.218.192.59
Connecting to stat-computing.org|52.218.192.59|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 95326801 (91M) [application/x-bzip2]
Saving to: '2003.csv.bz2'
2003.csv.bz2 66%[==================> ] 60.82M 1.14MB/s eta 14s
// As you can see, the downloads proceed one year at a time. The files come compressed in bz2 format.
==================================================================================================
Now let's decompress them:
bzip2 -d *.bz2
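By default, -d replaces each .csv.bz2 with the decompressed .csv and removes the archive afterwards; if you would rather keep the compressed originals as well, add the -k flag:
bzip2 -dk *.bz2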
LoganLeeui-MacBook-Pro:original Logan$ ls
1987.csv 1990.csv 1993.csv 1996.csv 1999.csv 2002.csv 2005.csv 2008.csv
1988.csv 1991.csv 1994.csv 1997.csv 2000.csv 2003.csv 2006.csv
1989.csv 1992.csv 1995.csv 1998.csv 2001.csv 2004.csv 2007.csv
head -n 10 1987.csv
Year,Month,DayofMonth,DayOfWeek,DepTime,CRSDepTime,ArrTime,CRSArrTime,UniqueCarrier,FlightNum,TailNum,ActualElapsedTime,CRSElapsedTime,AirTime,ArrDelay,DepDelay,Origin,Dest,Distance,TaxiIn,TaxiOut,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay
1987,10,14,3,741,730,912,849,PS,1451,NA,91,79,NA,23,11,SAN,SFO,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,15,4,729,730,903,849,PS,1451,NA,94,79,NA,14,-1,SAN,SFO,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,17,6,741,730,918,849,PS,1451,NA,97,79,NA,29,11,SAN,SFO,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,18,7,729,730,847,849,PS,1451,NA,78,79,NA,-2,-1,SAN,SFO,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,19,1,749,730,922,849,PS,1451,NA,93,79,NA,33,19,SAN,SFO,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,21,3,728,730,848,849,PS,1451,NA,80,79,NA,-1,-2,SAN,SFO,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,22,4,728,730,852,849,PS,1451,NA,84,79,NA,3,-2,SAN,SFO,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,23,5,731,730,902,849,PS,1451,NA,91,79,NA,13,1,SAN,SFO,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,24,6,744,730,908,849,PS,1451,NA,84,79,NA,19,14,SAN,SFO,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
Looking at the head output, the first line of every file is a column header. A MapReduce job would treat that header row as just another record, so let's strip it from each year's file with sed (the '1d' expression deletes line 1):
#!/bin/bash
# Remove the header (first line) from each year's CSV and write the result to <year>_modi.csv
for ((i=1987; i<=2008; i++))
do
    sed -e '1d' ${i}.csv > ${i}_modi.csv
done
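Save it as sed.sh (it shows up in the directory listing below), make it executable, and run it:
chmod +x sed.sh
./sed.sh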
ls
1987.csv 1991.csv 1995.csv 1999.csv 2003.csv 2007.csv
1987_modi.csv 1991_modi.csv 1995_modi.csv 1999_modi.csv 2003_modi.csv 2007_modi.csv
1988.csv 1992.csv 1996.csv 2000.csv 2004.csv 2008.csv
1988_modi.csv 1992_modi.csv 1996_modi.csv 2000_modi.csv 2004_modi.csv 2008_modi.csv
1989.csv 1993.csv 1997.csv 2001.csv 2005.csv sed.sh
1989_modi.csv 1993_modi.csv 1997_modi.csv 2001_modi.csv 2005_modi.csv
1990.csv 1994.csv 1998.csv 2002.csv 2006.csv
1990_modi.csv 1994_modi.csv 1998_modi.csv 2002_modi.csv 2006_modi.csv
After removing the original .csv files, a second ls shows only the header-stripped versions:
ls
1987_modi.csv 1991_modi.csv 1995_modi.csv 1999_modi.csv 2003_modi.csv 2007_modi.csv
1988_modi.csv 1992_modi.csv 1996_modi.csv 2000_modi.csv 2004_modi.csv 2008_modi.csv
1989_modi.csv 1993_modi.csv 1997_modi.csv 2001_modi.csv 2005_modi.csv
1990_modi.csv 1994_modi.csv 1998_modi.csv 2002_modi.csv 2006_modi.csv
Now create an input directory on HDFS and upload the files:
hadoop fs -mkdir input
hadoop fs -put ~/dataexpo/original/*.csv input
LoganLeeui-MacBook-Pro:~ Logan$ hadoop fs -ls
Found 4 items
drwxr-xr-x - Logan supergroup 0 2018-02-10 19:52 /user/Logan/conf
drwxr-xr-x - Logan supergroup 0 2018-02-14 03:07 /user/Logan/input
drwxr-xr-x - Logan supergroup 0 2018-02-10 19:52 /user/Logan/output_
drwxr-xr-x - Logan supergroup 0 2018-02-11 09:38 /user/Logan/output_2
==================================================================================================
hadoop fs -lsr
drwxr-xr-x - Logan supergroup 0 2018-02-15 15:01 /user/Logan/input
-rw-r--r-- 1 Logan supergroup 127162642 2018-02-15 15:00 /user/Logan/input/1987_modi.csv
-rw-r--r-- 1 Logan supergroup 501039172 2018-02-15 15:00 /user/Logan/input/1988_modi.csv
-rw-r--r-- 1 Logan supergroup 486518521 2018-02-15 15:00 /user/Logan/input/1989_modi.csv
-rw-r--r-- 1 Logan supergroup 509194387 2018-02-15 15:00 /user/Logan/input/1990_modi.csv
-rw-r--r-- 1 Logan supergroup 491209793 2018-02-15 15:00 /user/Logan/input/1991_modi.csv
-rw-r--r-- 1 Logan supergroup 492313431 2018-02-15 15:00 /user/Logan/input/1992_modi.csv
-rw-r--r-- 1 Logan supergroup 490753352 2018-02-15 15:00 /user/Logan/input/1993_modi.csv
-rw-r--r-- 1 Logan supergroup 501558365 2018-02-15 15:00 /user/Logan/input/1994_modi.csv
-rw-r--r-- 1 Logan supergroup 530751268 2018-02-15 15:00 /user/Logan/input/1995_modi.csv
-rw-r--r-- 1 Logan supergroup 533922063 2018-02-15 15:00 /user/Logan/input/1996_modi.csv
-rw-r--r-- 1 Logan supergroup 540347561 2018-02-15 15:00 /user/Logan/input/1997_modi.csv
-rw-r--r-- 1 Logan supergroup 538432575 2018-02-15 15:00 /user/Logan/input/1998_modi.csv
-rw-r--r-- 1 Logan supergroup 552925722 2018-02-15 15:00 /user/Logan/input/1999_modi.csv
-rw-r--r-- 1 Logan supergroup 570151313 2018-02-15 15:00 /user/Logan/input/2000_modi.csv
-rw-r--r-- 1 Logan supergroup 600411162 2018-02-15 15:00 /user/Logan/input/2001_modi.csv
-rw-r--r-- 1 Logan supergroup 530506713 2018-02-15 15:01 /user/Logan/input/2002_modi.csv
-rw-r--r-- 1 Logan supergroup 626744942 2018-02-15 15:01 /user/Logan/input/2003_modi.csv
-rw-r--r-- 1 Logan supergroup 669878813 2018-02-15 15:01 /user/Logan/input/2004_modi.csv
-rw-r--r-- 1 Logan supergroup 671026965 2018-02-15 15:01 /user/Logan/input/2005_modi.csv
-rw-r--r-- 1 Logan supergroup 672067796 2018-02-15 15:01 /user/Logan/input/2006_modi.csv
-rw-r--r-- 1 Logan supergroup 702877893 2018-02-15 15:01 /user/Logan/input/2007_modi.csv
-rw-r--r-- 1 Logan supergroup 689413044 2018-02-15 15:01 /user/Logan/input/2008_modi.csv
Check with -lsr, and if everything looks right, we're done!
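As an optional extra check, you can print the aggregate size of the uploaded data (hadoop fs -dus is the Hadoop 1.x form; later releases use hadoop fs -du -s):
hadoop fs -dus input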
==================================================================================================
(What follows is the second matching post, "Installing Hadoop on OSX (ver 2)", 2018.02.11.)
Let's put hadoop-env.sh into HDFS and run the wordcount example.
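The upload step itself isn't shown in the log below; run from the Hadoop home directory, it would look roughly like this, matching the /user/Logan/conf/hadoop-env.sh entry in the -lsr listing further down:
hadoop fs -mkdir conf
hadoop fs -put conf/hadoop-env.sh conf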
LoganLeeui-MacBook-Pro:hadoop Logan$ hadoop jar hadoop-examples-1.2.1.jar wordcount conf/hadoop-env.sh output
18/02/10 03:25:11 INFO input.FileInputFormat: Total input paths to process : 1
18/02/10 03:25:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/02/10 03:25:11 WARN snappy.LoadSnappy: Snappy native library not loaded
18/02/10 03:25:11 INFO mapred.JobClient: Running job: job_201802100324_0001
18/02/10 03:25:12 INFO mapred.JobClient: map 0% reduce 0%
18/02/10 03:25:15 INFO mapred.JobClient: map 100% reduce 0%
18/02/10 03:32:44 INFO mapred.JobClient: Task Id : attempt_201802100324_0001_r_000000_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
18/02/10 03:32:44 WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)
18/02/10 03:32:44 WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)
18/02/10 03:40:14 INFO mapred.JobClient: Task Id : attempt_201802100324_0001_r_000000_1, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
18/02/10 03:40:14 WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)
18/02/10 03:40:14 WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)
18/02/10 03:47:45 INFO mapred.JobClient: Task Id : attempt_201802100324_0001_r_000000_2, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
18/02/10 03:47:45 WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)
18/02/10 03:47:45 WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)
18/02/10 03:55:18 INFO mapred.JobClient: Job complete: job_201802100324_0001
18/02/10 03:55:18 INFO mapred.JobClient: Counters: 20
18/02/10 03:55:18 INFO mapred.JobClient: Map-Reduce Framework
18/02/10 03:55:18 INFO mapred.JobClient: Combine output records=178
18/02/10 03:55:18 INFO mapred.JobClient: Spilled Records=178
18/02/10 03:55:18 INFO mapred.JobClient: Map output materialized bytes=3151
18/02/10 03:55:18 INFO mapred.JobClient: Map input records=62
18/02/10 03:55:18 INFO mapred.JobClient: SPLIT_RAW_BYTES=116
18/02/10 03:55:18 INFO mapred.JobClient: Map output records=306
18/02/10 03:55:18 INFO mapred.JobClient: Map output bytes=3856
18/02/10 03:55:18 INFO mapred.JobClient: Combine input records=306
18/02/10 03:55:18 INFO mapred.JobClient: Total committed heap usage (bytes)=179306496
18/02/10 03:55:18 INFO mapred.JobClient: File Input Format Counters
18/02/10 03:55:18 INFO mapred.JobClient: Bytes Read=2676
18/02/10 03:55:18 INFO mapred.JobClient: FileSystemCounters
18/02/10 03:55:18 INFO mapred.JobClient: HDFS_BYTES_READ=2792
18/02/10 03:55:18 INFO mapred.JobClient: FILE_BYTES_WRITTEN=60256
18/02/10 03:55:18 INFO mapred.JobClient: Job Counters
18/02/10 03:55:18 INFO mapred.JobClient: Launched map tasks=1
18/02/10 03:55:18 INFO mapred.JobClient: Launched reduce tasks=4
18/02/10 03:55:18 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=7072
18/02/10 03:55:18 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
18/02/10 03:55:18 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=3591
18/02/10 03:55:18 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
18/02/10 03:55:18 INFO mapred.JobClient: Failed reduce tasks=1
18/02/10 03:55:18 INFO mapred.JobClient: Data-local map tasks=1
As covered in the previous post, an error occurred right at the end while trying to run the wordcount example.
Task Id : attempt_201802100324_0001_r_000000_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
This error is usually said to occur when a thread limit is exceeded, but there is no way a file as small as hadoop-env.sh would exceed any thread limit, so that explanation does not apply here.
WARN mapred.JobClient: Error reading task outputConnection refused (Connection refused)
The real problem clearly looks like this one. It is the commonly mentioned "map 100% reduce 0%" failure: when the data processed by the mappers is handed over to the reducers, it goes through a phase called shuffle, and that is where the communication with the reducer fails.
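For reference, when reducers cannot fetch map output like this, the usual suspects are hostname resolution and the addresses in the configuration files. A quick checklist worth running before resorting to a reinstall (a sketch; paths assume HADOOP_HOME points at the hadoop-1.2.1 directory of a standard pseudo-distributed setup):
# Does the machine's own hostname resolve? Reducers fetch map output over HTTP
# from the host name the TaskTracker registered, so it must resolve locally.
hostname
ping -c 1 $(hostname)
cat /etc/hosts            # localhost and the machine's hostname should both point to 127.0.0.1

# Is passwordless ssh to localhost really working?
ssh localhost exit; echo $?

# Do the NameNode/JobTracker addresses point at localhost?
grep -A 1 fs.default.name $HADOOP_HOME/conf/core-site.xml
grep -A 1 mapred.job.tracker $HADOOP_HOME/conf/mapred-site.xml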
So what was actually wrong here? ssh localhost works without any problem... Did I touch a configuration file while writing these posts? I don't think so. Something got tangled somewhere, but I couldn't figure out what, so I simply deleted Hadoop.
I deleted the hadoop-data directory as well, and in the new setup I placed the hadoop-data folder inside the hadoop-1.2.1 folder.
Then I repeated every step from the installation post and ran wordcount again:
LoganLeeui-MacBook-Pro:hadoop Logan$ hadoop jar hadoop-examples-1.2.1.jar wordcount conf/hadoop-env.sh output_
18/02/10 19:52:25 INFO input.FileInputFormat: Total input paths to process : 1
18/02/10 19:52:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/02/10 19:52:25 WARN snappy.LoadSnappy: Snappy native library not loaded
18/02/10 19:52:25 INFO mapred.JobClient: Running job: job_201802101951_0001
18/02/10 19:52:26 INFO mapred.JobClient: map 0% reduce 0%
18/02/10 19:52:29 INFO mapred.JobClient: map 100% reduce 0%
18/02/10 19:52:36 INFO mapred.JobClient: map 100% reduce 33%
18/02/10 19:52:38 INFO mapred.JobClient: map 100% reduce 100%
18/02/10 19:52:38 INFO mapred.JobClient: Job complete: job_201802101951_0001
18/02/10 19:52:38 INFO mapred.JobClient: Counters: 26
18/02/10 19:52:38 INFO mapred.JobClient: Map-Reduce Framework
18/02/10 19:52:38 INFO mapred.JobClient: Spilled Records=346
18/02/10 19:52:38 INFO mapred.JobClient: Map output materialized bytes=2979
18/02/10 19:52:38 INFO mapred.JobClient: Reduce input records=173
18/02/10 19:52:38 INFO mapred.JobClient: Map input records=59
18/02/10 19:52:38 INFO mapred.JobClient: SPLIT_RAW_BYTES=116
18/02/10 19:52:38 INFO mapred.JobClient: Map output bytes=3700
18/02/10 19:52:38 INFO mapred.JobClient: Reduce shuffle bytes=2979
18/02/10 19:52:38 INFO mapred.JobClient: Reduce input groups=173
18/02/10 19:52:38 INFO mapred.JobClient: Combine output records=173
18/02/10 19:52:38 INFO mapred.JobClient: Reduce output records=173
18/02/10 19:52:38 INFO mapred.JobClient: Map output records=302
18/02/10 19:52:38 INFO mapred.JobClient: Combine input records=302
18/02/10 19:52:38 INFO mapred.JobClient: Total committed heap usage (bytes)=308281344
18/02/10 19:52:38 INFO mapred.JobClient: File Input Format Counters
18/02/10 19:52:38 INFO mapred.JobClient: Bytes Read=2532
18/02/10 19:52:38 INFO mapred.JobClient: FileSystemCounters
18/02/10 19:52:38 INFO mapred.JobClient: HDFS_BYTES_READ=2648
18/02/10 19:52:38 INFO mapred.JobClient: FILE_BYTES_WRITTEN=127735
18/02/10 19:52:38 INFO mapred.JobClient: FILE_BYTES_READ=2979
18/02/10 19:52:38 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2283
18/02/10 19:52:38 INFO mapred.JobClient: Job Counters
18/02/10 19:52:38 INFO mapred.JobClient: Launched map tasks=1
18/02/10 19:52:38 INFO mapred.JobClient: Launched reduce tasks=1
18/02/10 19:52:38 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8086
18/02/10 19:52:38 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
18/02/10 19:52:38 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=3033
18/02/10 19:52:38 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
18/02/10 19:52:38 INFO mapred.JobClient: Data-local map tasks=1
18/02/10 19:52:38 INFO mapred.JobClient: File Output Format Counters
18/02/10 19:52:38 INFO mapred.JobClient: Bytes Written=2283
The job was run once more the next day, this time writing to output_2, with the same result:
LoganLeeui-MacBook-Pro:hadoop Logan$ hadoop jar hadoop-examples-1.2.1.jar wordcount conf/hadoop-env.sh output_2
18/02/11 09:37:54 INFO input.FileInputFormat: Total input paths to process : 1
18/02/11 09:37:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/02/11 09:37:54 WARN snappy.LoadSnappy: Snappy native library not loaded
18/02/11 09:37:54 INFO mapred.JobClient: Running job: job_201802101951_0002
18/02/11 09:37:55 INFO mapred.JobClient: map 0% reduce 0%
18/02/11 09:37:58 INFO mapred.JobClient: map 100% reduce 0%
18/02/11 09:38:05 INFO mapred.JobClient: map 100% reduce 33%
18/02/11 09:38:06 INFO mapred.JobClient: map 100% reduce 100%
18/02/11 09:38:08 INFO mapred.JobClient: Job complete: job_201802101951_0002
18/02/11 09:38:08 INFO mapred.JobClient: Counters: 26
18/02/11 09:38:08 INFO mapred.JobClient: Map-Reduce Framework
18/02/11 09:38:08 INFO mapred.JobClient: Spilled Records=346
18/02/11 09:38:08 INFO mapred.JobClient: Map output materialized bytes=2979
18/02/11 09:38:08 INFO mapred.JobClient: Reduce input records=173
18/02/11 09:38:08 INFO mapred.JobClient: Map input records=59
18/02/11 09:38:08 INFO mapred.JobClient: SPLIT_RAW_BYTES=116
18/02/11 09:38:08 INFO mapred.JobClient: Map output bytes=3700
18/02/11 09:38:08 INFO mapred.JobClient: Reduce shuffle bytes=2979
18/02/11 09:38:08 INFO mapred.JobClient: Reduce input groups=173
18/02/11 09:38:08 INFO mapred.JobClient: Combine output records=173
18/02/11 09:38:08 INFO mapred.JobClient: Reduce output records=173
18/02/11 09:38:08 INFO mapred.JobClient: Map output records=302
18/02/11 09:38:08 INFO mapred.JobClient: Combine input records=302
18/02/11 09:38:08 INFO mapred.JobClient: Total committed heap usage (bytes)=308281344
18/02/11 09:38:08 INFO mapred.JobClient: File Input Format Counters
18/02/11 09:38:08 INFO mapred.JobClient: Bytes Read=2532
18/02/11 09:38:08 INFO mapred.JobClient: FileSystemCounters
18/02/11 09:38:08 INFO mapred.JobClient: HDFS_BYTES_READ=2648
18/02/11 09:38:08 INFO mapred.JobClient: FILE_BYTES_WRITTEN=127733
18/02/11 09:38:08 INFO mapred.JobClient: FILE_BYTES_READ=2979
18/02/11 09:38:08 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2283
18/02/11 09:38:08 INFO mapred.JobClient: Job Counters
18/02/11 09:38:08 INFO mapred.JobClient: Launched map tasks=1
18/02/11 09:38:08 INFO mapred.JobClient: Launched reduce tasks=1
18/02/11 09:38:08 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8102
18/02/11 09:38:08 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
18/02/11 09:38:08 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=3325
18/02/11 09:38:08 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
18/02/11 09:38:08 INFO mapred.JobClient: Data-local map tasks=1
18/02/11 09:38:08 INFO mapred.JobClient: File Output Format Counters
18/02/11 09:38:08 INFO mapred.JobClient: Bytes Written=2283
This time it finally succeeded.
Let's check the output in HDFS:
hadoop fs -lsr
drwxr-xr-x - Logan supergroup 0 2018-02-10 19:52 /user/Logan/conf
-rw-r--r-- 1 Logan supergroup 2532 2018-02-10 19:52 /user/Logan/conf/hadoop-env.sh
drwxr-xr-x - Logan supergroup 0 2018-02-10 19:52 /user/Logan/output_
-rw-r--r-- 1 Logan supergroup 0 2018-02-10 19:52 /user/Logan/output_/_SUCCESS
drwxr-xr-x - Logan supergroup 0 2018-02-10 19:52 /user/Logan/output_/_logs
drwxr-xr-x - Logan supergroup 0 2018-02-10 19:52 /user/Logan/output_/_logs/history
-rw-r--r-- 1 Logan supergroup 11699 2018-02-10 19:52 /user/Logan/output_/_logs/history/job_201802101951_0001_1518259945223_Logan_word+count
-rw-r--r-- 1 Logan supergroup 53374 2018-02-10 19:52 /user/Logan/output_/_logs/history/job_201802101951_0001_conf.xml
-rw-r--r-- 1 Logan supergroup 2283 2018-02-10 19:52 /user/Logan/output_/part-r-00000
hadoop fs -cat output_/part-r-00000
# 38
$HADOOP_BALANCER_OPTS" 1
$HADOOP_DATANODE_OPTS" 1
$HADOOP_HOME/conf/slaves 1
$HADOOP_HOME/logs 1
$HADOOP_JOBTRACKER_OPTS" 1
$HADOOP_NAMENODE_OPTS" 1
$HADOOP_SECONDARYNAMENODE_OPTS" 1
$USER 1
'man 1
(fs, 1
-o 1
/tmp 1
1000. 1
A 1
All 1
CLASSPATH 1
Command 1
ConnectTimeout=1 1
Default 1
Empty 2
Extra 3
File 1
HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote 1
HADOOP_CLASSPATH= 1
HADOOP_CLIENT_OPTS 1
HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote 1
HADOOP_HEAPSIZE=2000 1
HADOOP_HOME_WARN_SUPPRESS=1 1
HADOOP_IDENT_STRING=$USER 1
HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote 1
HADOOP_LOG_DIR=${HADOOP_HOME}/logs 1
HADOOP_MASTER=master:/home/$USER/src/hadoop 1
HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote 1
HADOOP_NICENESS=10 1
HADOOP_OPTS 1
HADOOP_OPTS=-server 1
HADOOP_PID_DIR=/var/hadoop/pids 1
HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote 1
HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves 1
HADOOP_SLAVE_SLEEP=0.1 1
HADOOP_SSH_OPTS="-o 1
HADOOP_TASKTRACKER_OPTS= 1
Hadoop-specific 1
JAVA_HOME 1
JAVA_HOME. 1
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home 1
Java 2
MB. 1
NOTE: 1
Optional. 1
Otherwise 1
Required. 1
Seconds 1
See 1
SendEnv=HADOOP_CONF_DIR" 1
Set 1
Suppessing 1
The 6
This 1
Unset 2
Warning 1
When 1
Where 1
a 3
amount 1
appended 1
applies 1
are 4
arrive 1
attack. 1
be 4
best 1
between 1
by 9
can 4
clusters, 1
code 1
commands 1
commands. 1
configuration 1
correctly 1
daemon 1
daemons. 1
default. 8
defined 1
dfs, 1
directory 2
distcp 1
distributed 1
e.g., 1
elements. 1
environment 2
etc) 1
export 20
faster 1
file, 1
files 2
following 1
for 2
from. 1
fsck, 1
going 1
hadoop 2
hadoop. 1
heap 1
here. 1
host:path 1
hosts. 1
implementation 1
in 3
instance 1
is 5
it 2
java 1
large 1
log 1
master 1
maximum 1
multiple 1
naming 1
nice'. 1
nodes. 1
of 2
on 1
only 2
optional. 1
options 1
options. 2
others 1
otherwise 1
pid 1
potential 1
priority 1
processes. 1
remote 2
representing 1
required 1
rsync'd 1
rsyncs 1
run 1
running 1
runtime 1
scheduling 1
service 1
set 2
should 2
slave 3
sleep 1
so 1
specific 1
specified 1
ssh 1
stored. 2
string 1
symlink 1
than 1
that 3
the 4
them. 1
there 1
this 3
to 9
use, 1
use. 1
useful 1
users 1
variable 1
variables 1
when 1
where 2
where, 1
written 1
The output is printed correctly.
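If you want a local copy of the result file for further inspection, hadoop fs -get will fetch it (the local filename here is just an example):
hadoop fs -get output_/part-r-00000 wordcount_result.txt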
If you have any questions, please leave them in the comments.