Working with Hadoop via the Command Line

Running a MapReduce Job

$ cd /srv/hadoop/share/hadoop/mapreduce/

$ hadoop jar hadoop-mapreduce-examples-2.5.2.jar wordcount shakespeare/input shakespeare/output
Error creating temp dir in hadoop.tmp.dir /var/app/hadoop/data due to Permission denied

$ sudo chmod -R 771 /var/app/hadoop/

$ ls -la /var/app/hadoop/
drwxrwx--x 4 hadoop hadoop 4096 Jan  7  2015 data

$ hadoop jar hadoop-mapreduce-examples-2.5.2.jar wordcount shakespeare/input shakespeare/output

15/07/25 09:41:36 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8050
org.apache.hadoop.security.AccessControlException: Permission denied: user=student, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwxrwx---

$ hadoop fs -ls /tmp

ls: Permission denied: user=student, access=READ_EXECUTE, inode="/tmp":hadoop:supergroup:drwxrwx---

$ sudo su - hadoop

Mode 771 was not enough, so open /tmp in HDFS to everyone:

$ hadoop fs -chmod -R 777 /tmp
$ exit

$ hadoop jar hadoop-mapreduce-examples-2.5.2.jar wordcount shakespeare/input shakespeare/output


15/07/25 09:49:44 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8050
15/07/25 09:49:44 INFO input.FileInputFormat: Total input paths to process : 1
15/07/25 09:49:45 INFO mapreduce.JobSubmitter: number of splits:1
15/07/25 09:49:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1437841736026_0001
15/07/25 09:49:47 INFO impl.YarnClientImpl: Submitted application application_1437841736026_0001
15/07/25 09:49:47 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1437841736026_0001/
15/07/25 09:49:47 INFO mapreduce.Job: Running job: job_1437841736026_0001
15/07/25 09:49:59 INFO mapreduce.Job: Job job_1437841736026_0001 running in uber mode : false
15/07/25 09:49:59 INFO mapreduce.Job:  map 0% reduce 0%
15/07/25 09:50:10 INFO mapreduce.Job:  map 100% reduce 0%
15/07/25 09:50:20 INFO mapreduce.Job:  map 100% reduce 100%
15/07/25 09:50:20 INFO mapreduce.Job: Job job_1437841736026_0001 completed successfully
15/07/25 09:50:20 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=483860
		FILE: Number of bytes written=1161747
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=4538656
		HDFS: Number of bytes written=356409
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=8538
		Total time spent by all reduces in occupied slots (ms)=7626
		Total time spent by all map tasks (ms)=8538
		Total time spent by all reduce tasks (ms)=7626
		Total vcore-seconds taken by all map tasks=8538
		Total vcore-seconds taken by all reduce tasks=7626
		Total megabyte-seconds taken by all map tasks=8742912
		Total megabyte-seconds taken by all reduce tasks=7809024
	Map-Reduce Framework
		Map input records=129107
		Map output records=980637
		Map output bytes=8406347
		Map output materialized bytes=483860
		Input split bytes=133
		Combine input records=980637
		Combine output records=33505
		Reduce input groups=33505
		Reduce shuffle bytes=483860
		Reduce input records=33505
		Reduce output records=33505
		Spilled Records=67010
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=144
		CPU time spent (ms)=4890
		Physical memory (bytes) snapshot=346832896
		Virtual memory (bytes) snapshot=1600458752
		Total committed heap usage (bytes)=222101504
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=4538523
	File Output Format Counters
		Bytes Written=356409
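
The counters above show the effect of the combiner: it received 980637 records from the mapper and emitted 33505, one per distinct word. As a rough sketch of how to read those two numbers, the helper below computes how many map output records were folded into each combined record. The function name `combine_ratio` is made up for this note; it is not part of Hadoop.

```shell
# combine_ratio IN OUT: print how many input records the combiner
# folded into each output record, rounded to one decimal place.
# (Hypothetical helper name, plain awk arithmetic, no Hadoop needed.)
combine_ratio() {
  awk -v in_records="$1" -v out_records="$2" \
    'BEGIN { printf "%.1f\n", in_records / out_records }'
}

# Values taken from "Combine input records" and "Combine output records".
combine_ratio 980637 33505
```

For this run it prints 29.3, meaning that on average about 29 occurrences of each word were collapsed before the shuffle.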

http://192.168.1.11:8088/

The submitted application now shows up in the ResourceManager web UI.

$ hadoop fs -ls shakespeare/output

Found 2 items
-rw-r--r--   1 student student          0 2015-07-25 09:50 shakespeare/output/_SUCCESS
-rw-r--r--   1 student student     356409 2015-07-25 09:50 shakespeare/output/part-r-00000

$ hadoop fs -cat shakespeare/output/part-r-00000

$ hadoop fs -copyToLocal shakespeare/output/part-r-00000 ~/shakespeare_output.txt

$ cd ~/
$ tail -n 20 shakespeare_output.txt
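
The output file is ordered by word, not by count, so `tail` only shows the words that sort last. A minimal sketch for listing the most frequent words instead, assuming every line has the form word, whitespace, count (as in part-r-00000). The function name `top_words` is hypothetical.

```shell
# top_words FILE [N]: print the N lines with the highest counts (default 20).
# Assumes each line is "word<TAB>count", as written by the wordcount example.
top_words() {
  # Sort on the second field, numerically, in descending order.
  sort -k2,2nr "$1" | head -n "${2:-20}"
}
```

With the file copied above, it would be called as `top_words ~/shakespeare_output.txt` or, for the top five only, `top_words ~/shakespeare_output.txt 5`.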

$ hadoop fs -getmerge shakespeare/output ~/shakespeare_output.txt
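
Running the same job again fails if the output directory already exists in HDFS, so shakespeare/output has to be removed first. A minimal sketch that wraps both steps, assuming the jar path used earlier in these notes. The function name `rerun_wordcount` is made up here.

```shell
# rerun_wordcount INPUT OUTPUT: delete the old output directory, then
# run the wordcount example again. (Hypothetical helper name; the jar
# path is the one from the beginning of these notes.)
rerun_wordcount() {
  # -f suppresses the error if OUTPUT does not exist yet.
  hadoop fs -rm -r -f "$2" &&
    hadoop jar /srv/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar \
      wordcount "$1" "$2"
}
```

It would be called as `rerun_wordcount shakespeare/input shakespeare/output`. Note that this deletes the previous results, so copy them to the local disk first if they are still needed.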