Tag Archives: Scrapy

Detailed solution to zsh: command not found: scrapy (adding the Scrapy environment variable to zsh on macOS)

Background: originally, I planned to create a crawler project with Scrapy, but it showed zsh: command not found: scrapy. After reading many blogs, I solved the problem and decided to record it.

Main reference blogs:

    https://www.jianshu.com/p/51196153f804

    https://stackoverflow.com/questions/17607944/scrapy-not-installed-correctly-on-mac

Problem analysis:

When I reinstalled Scrapy, pip showed:

WARNING: The script scrapy is installed in 
'/Library/Frameworks/Python.framework/Versions/3.9/bin' which is not on PATH.
  Consider adding this directory to PATH 
  or, if you prefer to suppress this warning, use --no-warn-script-location.

Note that /Library/Frameworks/Python.framework/Versions/3.9/bin is not on PATH. We need to add this directory to the PATH environment variable ("Consider adding this directory to PATH").

Solution:

Step 1: add source ~/.bash_profile at the end of the .zshrc file

    Open Finder, press Command + Shift + G, enter ~/.zshrc to open the .zshrc file, and then write source ~/.bash_profile at the end of the file.

    Press Command + S to save.

    Open the terminal and enter source ~/.zshrc to apply the file.
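If you prefer the terminal over Finder, Step 1 can also be done with two commands (a sketch, assuming your zsh config lives at the default ~/.zshrc):

    # Append the source line to ~/.zshrc (same effect as editing it in Finder)
    echo 'source ~/.bash_profile' >> ~/.zshrc
    # Reload the config in the current shell
    source ~/.zshrc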

Step 2: add the environment variable to the .bash_profile file

    Open Finder, press Command + Shift + G, enter ~/.bash_profile, and open the .bash_profile file.

    Write on the last line:

    export PATH="/Library/Frameworks/Python.framework/Versions/3.9/bin:$PATH"

    Be careful ⚠️: the number after Versions is the Python version number and should be modified to match your own Python version. If your version is 2.7, change it to:

    export PATH="/Library/Frameworks/Python.framework/Versions/2.7/bin:$PATH"

    Press Command + S to save.

    Open the terminal and enter source ~/.bash_profile to apply the file.

    Finally, you can enter echo $PATH in the terminal to check whether the environment variable was added.

    You can see that /Library/Frameworks/Python.framework/Versions/2.7/bin has been added (along with many others).
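For reference, all of Step 2 can also be done from the terminal (a sketch, assuming the 3.9 path from the warning above; substitute your own version):

    # Append the PATH export to ~/.bash_profile
    echo 'export PATH="/Library/Frameworks/Python.framework/Versions/3.9/bin:$PATH"' >> ~/.bash_profile
    # Reload, then check that the directory now appears in PATH
    source ~/.bash_profile
    echo $PATH | tr ':' '\n' | grep Python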

    Finally, enter scrapy and you can finally use it!!!
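A quick way to confirm the command now resolves: which scrapy shows the path zsh found, and scrapy version prints the installed version.

    which scrapy
    scrapy version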

Spider Error: Scrapy Request Timeout [How to Solve]

Previously, I handled timeout exceptions in the downloader middleware, but it was always laborious.

Today, I checked the documentation and found that they can be handled in an errback callback:

import scrapy
from scrapy.spidermiddlewares.httperror import HttpError
from twisted.internet.error import DNSLookupError
from twisted.internet.error import TimeoutError, TCPTimedOutError


# when building the request, point errback at the handler below
yield scrapy.Request(url=full_url, errback=self.error_httpbin, dont_filter=True,
                     callback=self.parse_list, meta={"hd": header})


def error_httpbin(self, failure):
    # failure.request is the Request object; if you need to retry, just yield it
    # if failure.check(HttpError):
    #     # these exceptions come from the HttpError spider middleware
    #     # you can get the non-200 response
    #     response = failure.value.response
    #     self.logger.error('HttpError on %s', response.url)

    if failure.check(DNSLookupError):
        print("DNSLookupError------->")
        # this is the original request
        request = failure.request
        yield request
        # self.logger.error('DNSLookupError on %s', request.url)
    elif failure.check(TimeoutError, TCPTimedOutError):
        print("timeout------->")
        request = failure.request
        yield request
        # self.logger.error('TimeoutError on %s', request.url)
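One caveat worth noting: yielding the failed request back unconditionally can retry forever if the site keeps timing out. A minimal sketch of capping retries with a custom meta counter (the my_retry_count key and the limit of 3 are my own illustrative choices, not from the original post):

def error_httpbin(self, failure):
    if failure.check(DNSLookupError, TimeoutError, TCPTimedOutError):
        request = failure.request
        # my_retry_count is a custom meta key used only by this sketch
        retries = request.meta.get("my_retry_count", 0)
        if retries < 3:
            # re-issue a copy of the request with the counter bumped
            yield request.replace(
                dont_filter=True,
                meta={**request.meta, "my_retry_count": retries + 1},
            )
        else:
            self.logger.error("gave up on %s after %d retries", request.url, retries)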

Recording this here, since I had never handled timeout exceptions this way before.

Installing Scrapy on Windows: solving the installation error

The system is Windows 10, 64-bit.
Python is 3.5.2.
Today I ran pip install scrapy to install it, and it failed with:
Microsoft Visual C++ 14.0 is required

Microsoft Visual C++ 14.0 was actually already installed on the computer, but the installation still failed every time.

Later, the solution was to install from downloaded files:

1. Download the Scrapy installation file (a .whl wheel)

2. Install the wheel tool using the command pip install wheel

3. Switch CMD to the directory where the downloaded Scrapy file is located and install it with pip install followed by the downloaded .whl filename

4. You can run pip install scrapy again to check whether the installation succeeded (it should report the requirement as already satisfied); the steps are sketched together below.
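Putting the steps together as a CMD sketch (the folder and the wheel filename are illustrative; use the file you actually downloaded):

    :: step 2: install the wheel tool
    pip install wheel
    :: step 3: switch to the folder holding the downloaded wheel (path is illustrative)
    cd C:\Users\me\Downloads
    pip install Scrapy-1.1.1-py2.py3-none-any.whl
    :: step 4: this should now report the requirement as already satisfied
    pip install scrapy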

 

By the way, posting a tutorial article: an introduction to Scrapy

Solution to the Anaconda Scrapy installation error

Today, I hit a pit when installing Scrapy with Anaconda. Here is the solution for your reference:

Problem Description:

I installed Scrapy in Anaconda using the conda install scrapy command. After the installation completed, executing scrapy on the command line reported an error, as shown in the figure:

Under Windows it looks like this… DLL load failed

Solution:

When you run scrapy directly after installing, it prompts that the lxml module is not installed properly. Manual reinstallation is required.

1. Find the lxml file

Address: https://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml

2. Download the corresponding lxml file. For Windows 64-bit, I downloaded lxml-4.2.4-cp36-cp36m-win_amd64.whl

3. After downloading, open CMD, enter the file's directory, and execute:

    pip install lxml-4.2.4-cp36-cp36m-win_amd64.whl


OK, the prompt shows the installation succeeded.
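To double-check lxml from the command line, a quick sketch (LXML_VERSION is an attribute of lxml.etree that prints the bundled libxml2 version):

    python -c "from lxml import etree; print(etree.LXML_VERSION)"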

4. Verify the result and execute scrapy again:


OK, success.

5. Create a crawler project and try it out; a sketch follows below:


Done
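For reference, creating a project and a first spider looks like this (the project and spider names are illustrative):

    scrapy startproject myproject
    cd myproject
    scrapy genspider example example.com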


Attached:

Anaconda image source address of Tsinghua University:

   https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/

Download address of lxml and other installation packages:

   https://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml

 

Python pip install scrapy reports a Twisted error

Scrapy relies on the following packages:
lxml: an efficient XML and HTML parser
w3lib: a multi-purpose helper for handling URLs and web page encodings
twisted: an asynchronous networking framework
cryptography and pyOpenSSL: handling various network-level security needs
——————
1. First, run pip install scrapy
2. During the installation, all the dependent packages except Twisted should install; Twisted reports an error

Then download Twisted yourself. Note: it must match your Python version number and your system's bitness.
I use Python 3.7 and the system is 64-bit.
https://www.lfd.uci.edu/~gohlke/pythonlibs/

3. After downloading, install it with pip: pip install [file path]\Twisted-18.9.0-cp37-cp37m-win_amd64.whl
4. Finally, run pip install scrapy again and this time it installs successfully, as sketched below.
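The whole sequence as a CMD sketch (the download path is illustrative; substitute wherever you saved the wheel):

    :: first attempt installs every dependency except Twisted
    pip install scrapy
    :: install the Twisted wheel downloaded from the page above (path is illustrative)
    pip install C:\Users\me\Downloads\Twisted-18.9.0-cp37-cp37m-win_amd64.whl
    :: second attempt now succeeds
    pip install scrapy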

——————
Copyright notice: this article is an original article by CSDN blogger "Sagittarius32" and follows the CC 4.0 BY-SA copyright agreement. Please attach the original source link and this notice when reprinting.
Original link: https://blog.csdn.net/sagittarius32/article/details/85345142