[Solved] HTTP Error 301: The HTTP server returned a redirect error that would lead to an infinite loop.

Original code that raised the error:

1) Fetching the page with urllib.request raises the error above, and the HTML cannot be crawled.

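import random
from urllib import request

# ua_list is a list of User-Agent strings defined elsewhere in the crawler
def get_html(self, url):
    print(url)
    req = request.Request(url=url, headers={'User-Agent': random.choice(ua_list)})
    # urlopen is the call that raises HTTP Error 301
    res = request.urlopen(req)
    # Read from the response object (res), not the Request object (req)
    html = res.read().decode("gbk", 'ignore')
    filename = '123456.html'
    with open(filename, 'w') as f:
        f.write(html)
    self.parse_html(html)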
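
Before switching libraries, it can help to look at what the server sent back with the failing redirect. The following is only a sketch and not part of the original crawler: `describe_redirect_error` and `fetch_or_describe` are hypothetical helper names, and the sketch assumes the last redirect response carries a `Location` header and possibly a `Set-Cookie` header.

```python
from urllib import error, request


def describe_redirect_error(err):
    """Summarize an HTTPError raised by urllib (hypothetical helper)."""
    headers = err.headers
    return {
        "code": err.code,                      # e.g. 301
        "reason": err.reason,                  # urllib's error message
        "location": headers.get("Location"),   # where the server redirects to
        "sets_cookie": headers.get("Set-Cookie") is not None,
    }


def fetch_or_describe(url):
    """Fetch url; on an HTTP error, print a summary and re-raise."""
    try:
        with request.urlopen(url, timeout=10) as res:
            return res.read()
    except error.HTTPError as err:
        print(describe_redirect_error(err))
        raise
```

If `location` points back at the URL that was requested and `sets_cookie` is true, the server is probably waiting for a cookie that the default urllib opener never sends back.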

Solution:

1) Replace urllib.request with the requests library, which has to be installed separately (pip install requests).

2) I don’t know the specific reason why the original code fails.

import random
import requests

# ua_list is a list of User-Agent strings defined elsewhere in the crawler
def get_html(self, url):
    print(url)
    res = requests.get(url=url, headers={'User-Agent': random.choice(ua_list)})
    res.encoding = 'utf-8'
    # Call the parsing function directly on the decoded text
    self.parse_html(res.text)
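
One common cause of this error is a server that sets a cookie and then redirects back to the same URL. This is an assumption, since the post does not identify the reason. The default urllib opener does not store cookies, so it requests the same URL repeatedly until its redirect handler gives up, whereas requests keeps cookies while it follows redirects. If that is what happens here, urllib may also work once a cookie handler is added. A minimal sketch under that assumption follows; `UA_LIST` and `build_cookie_opener` are hypothetical names, not taken from the original code.

```python
import http.cookiejar
import random
from urllib import request

# Hypothetical placeholder; the original post's ua_list is not shown.
UA_LIST = ["Mozilla/5.0 (X11; Linux x86_64)"]


def build_cookie_opener():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = request.build_opener(request.HTTPCookieProcessor(jar))
    return opener, jar


def get_html(url, opener, encoding="gbk"):
    """Fetch url through the cookie-aware opener and decode the body."""
    req = request.Request(url, headers={"User-Agent": random.choice(UA_LIST)})
    with opener.open(req, timeout=10) as res:
        return res.read().decode(encoding, "ignore")
```

Usage would be `opener, jar = build_cookie_opener()` followed by `html = get_html(url, opener)`. If the error persists, the redirect loop has a different cause and switching to requests as shown above remains the simpler fix.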
