if you only use python3. X, you can not read it now, just remember to have a urllib library for example
Python 2. X has these library names: urllib, urllib2, urllib3, httplib, httplib2, requests
Python 3. X has these library names: urllib, urllib3, httplib2, requests
Both of them have urllib3 and requests, which are not standard libraries. Urllib3 provides thread safe connection pool and file post support, which has little to do with urllib and urllib2. Requests call themselves HTTP for humans, which is more concise and convenient to use
for python2. X:
The main differences between urllib and urllib2 are as follows:
Urllib2 can accept request object, set header information for URL, modify user agent, set cookie, etc. urllib2 can only accept a common URL
Urllib provides some primitive methods, but urllib2 doesn’t, such as URLEncode
Some examples of official documents of urllib
Using the GET method with parameters to retrieve the URL
>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
>>> print f.read()
Using the POST method
>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
>>> print f.read()
Use HTTP proxy, automatic tracking redirection
>>> import urllib
>>> proxies = {'http': 'http://proxy.example.com:8080/'}
>>> opener = urllib.FancyURLopener(proxies)
>>> f = opener.open("http://www.python.org")
>>> f.read()
Not using a proxy
>>> import urllib
>>> opener = urllib.FancyURLopener({})
>>> f = opener.open("http://www.python.org/")
>>> f.read()
Examples of several official documents of urllib2:
GET the next URL
>>> import urllib2
>>> f = urllib2.urlopen('http://www.python.org/')
>>> print f.read()
Use basic HTTP authentication
import urllib2
auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
uri='https://mahler:8092/site-updates.py',
user='klem',
passwd='kadidd!ehopper')
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
urllib2.urlopen('http://www.example.com/login.html')
build_opener() Many handlers are provided by default, including proxy handlers, which are set by default to those provided by environment variables.
An example of using a proxy
proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')
opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
opener.open('http://www.example.com/login.html')
Add HTTP request headers
import urllib2
req = urllib2.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
r = urllib2.urlopen(req)
更改User-agent
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.open('http://www.example.com/')
**Httplib and httplib2 * * httplib is the implementation of HTTP client protocol, which is usually not used directly. Urllib is based on httplib. Httplib2 is a third-party library, which has more features than httplib
for python3. X:
Here, urllib becomes a package, which is divided into several modules
urllib.request Used to open and read URLs,
urllib.error is used to handle exceptions caused by the previous request,
urllib.parse is used to parse URLs,
urllib.robotparser for parsing robots.txt files
Urllib. Urlopen() in python2. X is abandoned, and urllib2. Urlopen() is equivalent to urllib. Request. Urlopen() in python3. X
A few official examples:
GET一个URL
>>> import urllib.request
>>> with urllib.request.urlopen('http://www.python.org/') as f:
... print(f.read(300))
PUT a request
import urllib.request
DATA=b'some data'
req = urllib.request.Request(url='http://localhost:8080', data=DATA,method='PUT')
with urllib.request.urlopen(req) as f:
pass
print(f.status)
print(f.reason)
Basic HTTP authentication
import urllib.request
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
uri='https://mahler:8092/site-updates.py',
user='klem',
passwd='kadidd!ehopper')
opener = urllib.request.build_opener(auth_handler)
urllib.request.install_opener(opener)
urllib.request.urlopen('http://www.example.com/login.html')
use proxy
proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')
opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
opener.open('http://www.example.com/login.html')
add header
import urllib.request
req = urllib.request.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
r = urllib.request.urlopen(req)
change User-agent
import urllib.request
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.open('http://www.example.com/')
Setting the parameters of the URL when using GET
>>> import urllib.request
>>> import urllib.parse
>>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> url = "http://www.musi-cal.com/cgi-bin/query?%s" % params
>>> with urllib.request.urlopen(url) as f:
... print(f.read().decode('utf-8'))
...
Set parameters when using POST
>>> import urllib.request
>>> import urllib.parse
>>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> data = data.encode('ascii')
>>> with urllib.request.urlopen("http://requestb.in/xrbl82xr", data) as f:
... print(f.read().decode('utf-8'))
...
proxy
>>> import urllib.request
>>> proxies = {'http': 'http://proxy.example.com:8080/'}
>>> opener = urllib.request.FancyURLopener(proxies)
>>> with opener.open("http://www.python.org") as f:
... f.read().decode('utf-8')
...
Do not use a proxy, override the proxy of the environment variable
>>> import urllib.request
>>> opener = urllib.request.FancyURLopener({})
>>> with opener.open("http://www.python.org/") as f:
... f.read().decode('utf-8')
...