python urllib | DebugAH

If you only use Python 3.x, you can skip the Python 2 material for now; just remember that Python 3 has a urllib library.

Python 2.x has these libraries: urllib, urllib2, urllib3, httplib, httplib2, requests

Python 3.x has these libraries: urllib, urllib3, httplib2, requests

urllib3 and requests are available for both versions, but neither is part of the standard library. urllib3 provides a thread-safe connection pool and file POST support, and despite its name it has little to do with urllib and urllib2. requests bills itself as "HTTP for Humans" and is more concise and convenient to use.
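
To check which of these names can actually be imported on a given interpreter, something like the following can help. It is only a sketch for Python 3: `is_available` is a helper name made up for this example, and the result for the third-party packages depends on what happens to be installed.

```python
import importlib.util


def is_available(module_name):
    """Return True if the import system can locate the named module."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when the parent package of a dotted name is missing.
        return False


names = ["urllib", "urllib.request", "http.client", "urllib2", "httplib",
         "urllib3", "httplib2", "requests"]
availability = {name: is_available(name) for name in names}
for name, found in availability.items():
    print("%-15s %s" % (name, "available" if found else "not found"))
```

On Python 3 the standard-library entries should show up as available, while urllib2 and httplib should not; urllib3, httplib2 and requests appear only after they have been installed separately.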

For Python 2.x:

The main differences between urllib and urllib2 are as follows:

urllib2 can accept a Request object, which lets you set headers for a URL, change the user agent, set cookies, and so on; urllib can only accept a plain URL.

urllib provides some helper functions that urllib2 does not, such as urlencode.
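
The two points above can be sketched without sending anything over the network. The snippet below uses the Python 3 names, where urlencode lives in urllib.parse and the Request class lives in urllib.request; the URL and header values are placeholders chosen for this illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# urlencode turns a mapping into a query string.
params = urlencode({"spam": 1, "eggs": 2, "bacon": 0})

# A Request object carries the URL plus extra information such as headers.
req = Request("http://www.example.com/cgi-bin/query?" + params,
              headers={"User-Agent": "Mozilla/5.0"})
req.add_header("Referer", "http://www.python.org/")

print(params)
print(req.full_url)
print(req.get_header("User-agent"))  # header names are stored capitalized
```

Nothing is requested here; the Request would only be sent once it is passed to urlopen().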

Some examples from the official urllib documentation:

Use the GET method with parameters to retrieve a URL
>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
>>> print f.read()

Use the POST method
>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
>>> print f.read()

Use an explicit HTTP proxy (redirects are followed automatically)
>>> import urllib
>>> proxies = {'http': 'http://proxy.example.com:8080/'}
>>> opener = urllib.FancyURLopener(proxies)
>>> f = opener.open("http://www.python.org")
>>> f.read()

Use no proxy at all
>>> import urllib
>>> opener = urllib.FancyURLopener({})
>>> f = opener.open("http://www.python.org/")
>>> f.read()

Several examples from the official urllib2 documentation:

GET a URL
>>> import urllib2
>>> f = urllib2.urlopen('http://www.python.org/')
>>> print f.read()

Use basic HTTP authentication
import urllib2
auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
urllib2.urlopen('http://www.example.com/login.html')
By default, build_opener() provides many handlers, including a ProxyHandler, which by default uses the proxy settings found in the environment variables.
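
To see which handlers an opener ends up with, its handler list can be inspected. This is a rough sketch using the Python 3 names (urllib.request instead of urllib2). The proxy address is a placeholder, and `handlers` is an attribute of CPython's OpenerDirector rather than documented API, so treat it as a debugging aid only.

```python
import urllib.request

# Placeholder proxy address; it is only used to build the opener.
proxy_handler = urllib.request.ProxyHandler(
    {"http": "http://proxy.example.com:8080/"})

# Passing a ProxyHandler instance replaces the default one; the other
# default handlers (HTTP, redirects, errors, ...) are still added.
opener = urllib.request.build_opener(proxy_handler)

# `handlers` is an implementation detail of OpenerDirector in CPython.
handler_names = sorted(type(h).__name__ for h in opener.handlers)
print(handler_names)
```

No request is made here; the opener is only constructed and examined.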

An example of using a proxy
import urllib2
proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
opener.open('http://www.example.com/login.html')

Add HTTP request headers
import urllib2
req = urllib2.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
r = urllib2.urlopen(req)

Change the User-agent
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.open('http://www.example.com/')

httplib and httplib2

httplib is the implementation of the HTTP client protocol and is usually not used directly; urllib is built on top of httplib. httplib2 is a third-party library with more features than httplib.
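
To give a feel for this lower level, here is a small sketch using the Python 3 name of the module, http.client. So that it runs without external network access, it talks to a throwaway server on the loopback interface; `HelloHandler` and its response body are made up for this example.

```python
import http.client
import http.server
import threading


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short text body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


# Port 0 asks the operating system for any free port.
server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# With http.client you manage the connection and the request yourself.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
conn.request("GET", "/")
response = conn.getresponse()
status, reason, body = response.status, response.reason, response.read()
conn.close()

server.shutdown()
server.server_close()
print(status, reason, body)
```

Compared with urlopen(), the connection, request and response are all handled by hand, which is why this layer is rarely used directly.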

For Python 3.x:

Here urllib becomes a package that is divided into several modules:

urllib.request is used to open and read URLs
urllib.error contains the exceptions raised by urllib.request
urllib.parse is used to parse URLs
urllib.robotparser is used to parse robots.txt files
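
The modules other than urllib.request can be tried out without any network access. The following is a small sketch; the URLs and the robots.txt rules are invented for illustration.

```python
import urllib.error
import urllib.parse
import urllib.robotparser

# urllib.parse: split a URL into components and decode its query string.
parts = urllib.parse.urlparse(
    "http://www.example.com/cgi-bin/query?spam=1&eggs=2")
query = urllib.parse.parse_qs(parts.query)

# urllib.robotparser: feed robots.txt lines directly instead of fetching them.
rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /private/"])
allowed = rp.can_fetch("*", "http://www.example.com/index.html")
blocked = rp.can_fetch("*", "http://www.example.com/private/data.html")

# urllib.error: HTTPError is a subclass of URLError, so catching URLError
# also catches HTTP error responses.
http_error_is_url_error = issubclass(urllib.error.HTTPError,
                                     urllib.error.URLError)

print(parts.netloc, parts.path, query)
print(allowed, blocked, http_error_is_url_error)
```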

urllib.urlopen() from Python 2.x has been removed, and urllib2.urlopen() corresponds to urllib.request.urlopen() in Python 3.x.
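
Code that has to run on both versions often hides this rename behind a guarded import. This is one common pattern, shown as a sketch rather than the only way to do it:

```python
try:
    # Python 2.x location
    from urllib2 import urlopen, Request
except ImportError:
    # Python 3.x location
    from urllib.request import urlopen, Request

# Shows which module the names were actually taken from.
print(urlopen.__module__, Request.__module__)
```

The rest of the program can then call urlopen() and Request without caring which interpreter it is running on.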

A few official examples:

GET a URL
>>> import urllib.request
>>> with urllib.request.urlopen('http://www.python.org/') as f:
...     print(f.read(300))

Send a PUT request
import urllib.request
DATA = b'some data'
req = urllib.request.Request(url='http://localhost:8080', data=DATA, method='PUT')
with urllib.request.urlopen(req) as f:
    pass
print(f.status)
print(f.reason)

Basic HTTP authentication
import urllib.request
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
opener = urllib.request.build_opener(auth_handler)
urllib.request.install_opener(opener)
urllib.request.urlopen('http://www.example.com/login.html')

Use a proxy
import urllib.request
proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
opener.open('http://www.example.com/login.html')

Add an HTTP request header
import urllib.request
req = urllib.request.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
r = urllib.request.urlopen(req)

Change the User-agent
import urllib.request
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.open('http://www.example.com/')

Set URL parameters when using GET
>>> import urllib.request
>>> import urllib.parse
>>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> url = "http://www.musi-cal.com/cgi-bin/query?%s" % params
>>> with urllib.request.urlopen(url) as f:
...     print(f.read().decode('utf-8'))
...

Set parameters when using POST
>>> import urllib.request
>>> import urllib.parse
>>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> data = data.encode('ascii')
>>> with urllib.request.urlopen("http://requestb.in/xrbl82xr", data) as f:
...     print(f.read().decode('utf-8'))
...
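
The only visible difference between the GET and POST examples is whether data is passed. A quick way to see which method urllib.request will use is to build the Request objects and ask them, without sending anything. The URL below is a placeholder and is never contacted.

```python
import urllib.parse
import urllib.request

url = "http://www.example.com/cgi-bin/query"  # placeholder, never contacted
data = urllib.parse.urlencode(
    {"spam": 1, "eggs": 2, "bacon": 0}).encode("ascii")  # must be bytes

get_req = urllib.request.Request(url)                    # no data
post_req = urllib.request.Request(url, data=data)        # data given
put_req = urllib.request.Request(url, data=data, method="PUT")  # explicit

methods = [r.get_method() for r in (get_req, post_req, put_req)]
print(methods)  # ['GET', 'POST', 'PUT']
```

In other words, supplying data switches the default method from GET to POST, and the method argument overrides both.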

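Use an explicit proxy (FancyURLopener has been deprecated since Python 3.3, so newer code should prefer ProxyHandler with build_opener)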
>>> import urllib.request
>>> proxies = {'http': 'http://proxy.example.com:8080/'}
>>> opener = urllib.request.FancyURLopener(proxies)
>>> with opener.open("http://www.python.org") as f:
...     f.read().decode('utf-8')
...

Use no proxy at all, overriding the proxy settings from the environment variables
>>> import urllib.request
>>> opener = urllib.request.FancyURLopener({})
>>> with opener.open("http://www.python.org/") as f:
...     f.read().decode('utf-8')
...
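
Since FancyURLopener is deprecated, the same "no proxy" behaviour can be sketched with ProxyHandler and build_opener. So that the example runs without external network access, it fetches from a throwaway local server; `EchoPathHandler` and the /status path are invented for this illustration.

```python
import http.server
import threading
import urllib.request


class EchoPathHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local handler so the example needs no external network."""

    def do_GET(self):
        body = ("you asked for " + self.path).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


# Port 0 asks the operating system for any free port.
server = http.server.HTTPServer(("127.0.0.1", 0), EchoPathHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# An empty mapping means "use no proxy", which overrides any
# http_proxy / https_proxy environment variables.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
url = "http://127.0.0.1:%d/status" % server.server_port
with opener.open(url, timeout=5) as f:
    status = f.status
    text = f.read().decode("utf-8")

server.shutdown()
server.server_close()
print(status, text)
```

To route through a proxy instead, pass a mapping such as {'http': 'http://proxy.example.com:8080/'} to ProxyHandler, as in the earlier proxy example.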
