Premise:
After writing the project code in the local idea, package and upload the whole project to the cluster for testing
Note that the input and output paths should be written correctly
Upload the two files to the HDFS cluster
# Upload Files hadoop fs -put /opt/module/hadoop_file/input/friends.txt /opt/module/hadoop_file/input # Delete Files hadoop fs -rm -f /opt/module/hadoop_file/input/friends.txt # Delete Folder hadoop fs -rm -r /opt/module/hadoop_file/input
Start Mr program for jar package test
# Run MR Programmer hadoop jar friends.jar com.lxz.friends.OneShareFriendsDriver
Problems encountered:
1
Error Message:INFO mapreduce.Job: Task Id : attempt_1629344910248_0009_m_000000_0, Status : FAILED Error: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :interface javax.xml.soap.Text
The reason is that in your idea project, you rely on importing javax.xml.soap.text. You should import org.apache.hadoop.io.text
2
Error Message:INFO mapreduce.Job: Task Id : attempt_1607842602362_0032_m_000000_2, Status : FAIL
The reason is that there are spaces in your input file. Carefully check the writing format of the input file
Summary: MR is still a time-consuming and laborious program. The advantage is that as long as you write the program code and adjust the number of mapper and reducer resources, it is only a matter of time for the data to run out. Error reporting is not terrible. You must remember to check the log information in the logs folder under the current directory of Hadoop installation