Using the Python hdfs module to operate Hadoop HDFS


To operate HDFS through WebHDFS, first create a client:

from hdfs import InsecureClient
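# Connect to the NameNode's WebHDFS endpoint; host:port is the NameNode HTTP address (typically port 50070 on Hadoop 2.x).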
client = InsecureClient('http://host:port', user='ann')

Get the list of files in the remote /tmp directory (this step only talks to the NameNode):

# Listing all files inside a directory.
list_content = client.list('/tmp')
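
If you also need each entry's metadata (file type, size, and so on), the same call can return the FileStatus records as well. A minimal sketch, assuming the client created above:

# Listing names together with their FileStatus metadata.
for name, status in client.list('/tmp', status=True):
    print(name, status['type'], status['length'])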

To upload a file, call the upload method:

client.upload(remote_dir, local_dir, overwrite=True)

When this runs, the following exception appears:

[E 180201 14:21:10 client:599] Error while uploading. Attempting cleanup.
    Traceback (most recent call last):
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\connection.py", line 141, in _new_conn
        (self.host, self.port), self.timeout, **extra_kw)
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\util\connection.py", line 61, in create_connection
        for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
      File "C:\Program Files (x86)\Python36-32\lib\socket.py", line 743, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    socket.gaierror: [Errno 11001] getaddrinfo failed

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 594, in upload
        _upload(path_tuple)
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 524, in _upload
        self.write(_temp_path, wrap(reader, chunk_size, progress), **kwargs)
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 470, in write
        consumer(data)
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 464, in consumer
        data=(c.encode(encoding) for c in _data) if encoding else _data,
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 207, in _request
        **kwargs
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\sessions.py", line 488, in request
        resp = self.send(prep, **send_kwargs)
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\sessions.py", line 609, in send
        r = adapter.send(request, **kwargs)
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\adapters.py", line 441, in send
        low_conn.endheaders()
      File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 1234, in endheaders
        self._send_output(message_body, encode_chunked=encode_chunked)
      File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 1026, in _send_output
        self.send(msg)
      File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 964, in send
        self.connect()
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\connection.py", line 166, in connect
        conn = self._new_conn()
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\connection.py", line 150, in _new_conn
        self, "Failed to establish a new connection: %s" % e)
    requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x046C9BD0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed
[I 180201 14:21:10 client:848] Deleting '/tmp/data1/tensorflow' recursively.
[E 180201 14:21:10 web:1548] Uncaught exception POST /api/job/create (127.0.0.1)
    HTTPServerRequest(protocol='http', host='localhost:8081', method='POST', uri='/api/job/create', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Connection': 'close', 'Cookie': 'Pycharm-c9b2eeaf=d1c21794-2128-4ae7-9a97-2f9a04f8749c', 'Content-Length': '34', 'Referer': 'http://localhost:8082/', 'Content-Type': 'application/json;charset=utf-8', 'Accept-Encoding': 'gzip, deflate', 'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3', 'Accept': 'application/json, text/plain, */*', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0', 'Host': 'localhost:8081'})
    Traceback (most recent call last):
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\connection.py", line 141, in _new_conn
        (self.host, self.port), self.timeout, **extra_kw)
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\util\connection.py", line 61, in create_connection
        for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
      File "C:\Program Files (x86)\Python36-32\lib\socket.py", line 743, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    socket.gaierror: [Errno 11001] getaddrinfo failed

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\tornado\web.py", line 1469, in _execute
        result = yield result
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\tornado\gen.py", line 1015, in run
        value = future.result()
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\tornado\concurrent.py", line 237, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 3, in raise_exc_info
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\tornado\gen.py", line 1024, in run
        yielded = self.gen.send(value)
      File "app.py", line 79, in post
        hdfs_client.upload(remote_hdfs_model_dir,model_dir,overwrite=True)
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 605, in upload
        raise err
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 594, in upload
        _upload(path_tuple)
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 524, in _upload
        self.write(_temp_path, wrap(reader, chunk_size, progress), **kwargs)
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 470, in write
        consumer(data)
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 464, in consumer
        data=(c.encode(encoding) for c in _data) if encoding else _data,
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\hdfs\client.py", line 207, in _request
        **kwargs
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\sessions.py", line 488, in request
        resp = self.send(prep, **send_kwargs)
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\sessions.py", line 609, in send
        r = adapter.send(request, **kwargs)
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\adapters.py", line 441, in send
        low_conn.endheaders()
      File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 1234, in endheaders
        self._send_output(message_body, encode_chunked=encode_chunked)
      File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 1026, in _send_output
        self.send(msg)
      File "C:\Program Files (x86)\Python36-32\lib\http\client.py", line 964, in send
        self.connect()
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\connection.py", line 166, in connect
        conn = self._new_conn()
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\connection.py", line 150, in _new_conn
        self, "Failed to establish a new connection: %s" % e)
    requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x046C9BD0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed
[E 180201 14:21:10 web:1971] 500 POST /api/job/create (127.0.0.1) 128.01ms
[I 180201 14:23:35 autoreload:204] C:\Program Files (x86)\Python36-32\lib\site-packages\requests\packages\urllib3\util\connection.py modified; restarting server
The cluster addresses involved: 169.24.2.194:50070 (the NameNode WebHDFS endpoint) and bigdata6.chinasws.com:50075 (the DataNode HTTP endpoint).

After various tests, the cause is that uploading a file requires connecting directly to the DataNodes to write the data, so the HDFS client (the machine running the client.upload code) must have unobstructed network access to every DataNode. If your HDFS cluster addresses its DataNodes by hostname, you need to either register those hostnames on your DNS server or add them to the client's local hosts mapping: /etc/hosts on Linux, or C:\Windows\System32\drivers\etc\hosts on Windows. A quick check from the client machine is sketched below.
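
One way to confirm the diagnosis is to check whether the DataNode hostname resolves from the client at all, since socket.getaddrinfo is exactly the call that fails in the traceback above. This is only a sketch: the hostname and port are the ones from this cluster, and the hosts-file entry in the comment is an illustrative placeholder.

import socket

# DataNode addresses as the NameNode reports them back to the client.
datanodes = [('bigdata6.chinasws.com', 50075)]

for host, port in datanodes:
    try:
        # The same lookup that raised "[Errno 11001] getaddrinfo failed" above.
        socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
        print('{}:{} resolves; the upload should be able to reach this DataNode'.format(host, port))
    except socket.gaierror as err:
        # Not resolvable: add the hostname to DNS or to the local hosts file, e.g.
        #   <datanode-ip>  bigdata6.chinasws.com
        print('{} cannot be resolved from this client: {}'.format(host, err))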
