[Problem] Spark running a Python-written MapReduce job fails on the Hadoop platform with java.net.ConnectException: connection timed out
Problem:
The job was submitted with spark-submit in yarn-client mode, and tasks on certain cluster nodes failed with connection-timeout errors. After ruling out various other causes, the problem was traced to the firewall configuration.
Cause:
My guess is that after the Python process starts it acts as a server, while Hadoop's resource scheduling has the Java process connect to it as a client,
so the Python server must accept client connections from localhost.
When a Linux host sends packets to itself, they actually travel through the virtual lo interface rather than the physical NIC (eth0/eth2, ...). The firewall must therefore allow packets arriving on the local lo interface. Adding the following rule lets the Python server accept packets from lo, which resolved the problem.
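The loopback behavior described above can be illustrated with a minimal Python echo server and client, both bound to 127.0.0.1 (a standalone sketch, not part of Spark's actual daemon protocol). All traffic between the two sockets travels over the lo interface; an iptables INPUT chain that drops lo packets would make the client's connect() hang exactly like the PySpark daemon connection in the log below.

```python
import socket
import threading

def echo_server(sock):
    # Accept one connection on the loopback interface and echo the data back.
    conn, _ = sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)

# Bind to 127.0.0.1: every packet to this socket goes through lo, not eth0.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

t = threading.Thread(target=echo_server, args=(server,))
t.start()

# Client side: this connect() is the step that timed out in the Spark log
# when the firewall dropped packets arriving on lo.
with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
    client.sendall(b"ping")
    reply = client.recv(1024)

t.join()
server.close()
print(reply)  # b'ping'
```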
iptables -A INPUT -i lo -j ACCEPT    (add an iptables rule accepting packets that arrive on the lo interface)
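After adding the iptables rule above, a quick way to confirm that loopback connections no longer hang is a short TCP probe (a diagnostic sketch; the port number you pass in is whatever your local service listens on). A socket.timeout here corresponds to the java.net.ConnectException (connection timed out) in the executor log.

```python
import socket

def loopback_reachable(port, timeout=3.0):
    """Return True if a TCP connection to 127.0.0.1:port succeeds.

    A timeout indicates packets on lo are being dropped (the symptom
    this article describes); a refused connection simply means nothing
    is listening on that port.
    """
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout):
            return True
    except OSError:  # covers socket.timeout and ConnectionRefusedError
        return False
```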
Partial error log from the task:
16/07/25 13:56:44 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev d62701d4d05dfa6115bbaf8d9dff002df142e62d]
16/07/25 13:56:44 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
16/07/25 13:56:44 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
16/07/25 13:56:44 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
16/07/25 13:56:44 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
16/07/25 13:56:44 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
16/07/25 13:57:47 WARN python.PythonWorkerFactory: Failed to open socket to Python daemon:
java.net.ConnectException: Connection timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at java.net.Socket.connect(Socket.java:528)
    at java.net.Socket.<init>(Socket.java:425)
    at java.net.Socket.<init>(Socket.java:241)
    at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:75)
    at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:90)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:135)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:342)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
16/07/25 13:57:47 WARN python.PythonWorkerFactory: Assuming that daemon unexpectedly quit, attempting to restart
16/07/25 13:58:51 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
References:
http://stackoverflow.com/questions/15659132/connection-refused-between-a-python-server-and-a-java-client
http://stackoverflow.com/questions/26297551/connecting-python-and-java-via-sockets/38605208#38605208
http://www.zybang.com/question/9ab66451988eb2768194817f25a0b7a9.html