Python + Scrapy: find element with text

response.xpath("//*[contains(text(), 'txt goes here')]").getall()

September 29, 2021 · 1 min · 4 words · Saqib Razzaq

How to update a request URL in Scrapy

Oftentimes you may want to manipulate every outgoing request. You don’t have to modify every point in your scraper where you make requests; you can do it in your middleware instead. Open middlewares.py in your project and paste the following code into the process_request method:
original_url = request.url
new_url = 'modified_url_here'
request = request.replace(url=new_url)
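As a sketch of the middleware approach above (the rewrite rule is a made-up example, and a minimal stand-in request class is included so the snippet runs without Scrapy installed; in a real project process_request receives scrapy.Request objects):

```python
class UrlRewriteMiddleware:
    """Downloader middleware that rewrites every outgoing request URL."""

    def process_request(self, request, spider):
        # Hypothetical rule: force HTTPS on every request.
        new_url = request.url.replace("http://", "https://", 1)
        if new_url != request.url:
            # replace() returns a new request object; returning it tells
            # Scrapy to schedule the rewritten request instead.
            return request.replace(url=new_url)
        return None  # unchanged requests continue through the chain


# Minimal stand-in for scrapy.Request so the sketch is self-contained.
class FakeRequest:
    def __init__(self, url):
        self.url = url

    def replace(self, url):
        return FakeRequest(url)


rewritten = UrlRewriteMiddleware().process_request(
    FakeRequest("http://example.com"), None
)
print(rewritten.url)  # https://example.com
```

Note that returning a new request from process_request sends it back through the middleware chain, so the rule must be idempotent (here, an already-https URL passes through untouched).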

September 27, 2021 · 1 min · 53 words · Saqib Razzaq

How to retry failed requests in Scrapy

You can add one or more statuses in settings.py. Scrapy will process requests normally, and when one of these statuses is encountered, it will retry that request. You can modify RETRY_HTTP_CODES and add any number of statuses there. You can also control how many times to retry with RETRY_TIMES.
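Concretely, the two settings mentioned above look like this in settings.py (the status list here is illustrative; adjust it to the codes your target site returns):

```python
# settings.py
RETRY_ENABLED = True  # retry middleware is on by default
RETRY_TIMES = 5  # retry each failed request up to 5 times
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429]
```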

July 23, 2021 · 1 min · 49 words · Saqib Razzaq

How to run a function when Scrapy spider closes

A Scrapy spider can close unexpectedly for many reasons. If you’d like to notify yourself or do anything whenever a spider closes (expectedly or unexpectedly), create a function named anything, e.g. crawlFinished(). Then call self.crawlFinished() at the bottom of the closed() method. Now your function will be executed each time the crawler exits.
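A sketch of that pattern (a plain class stands in for scrapy.Spider so it runs standalone; crawlFinished is the hypothetical hook name from the post):

```python
class MySpider:  # would subclass scrapy.Spider in a real project
    name = "example"

    def crawlFinished(self, reason):
        # Hypothetical hook: send a notification, flush buffers, etc.
        self.finish_reason = reason

    def closed(self, reason):
        # Scrapy calls closed(reason) automatically when the spider exits,
        # whether it finished normally or was interrupted.
        self.crawlFinished(reason)


spider = MySpider()
spider.closed("finished")
print(spider.finish_reason)  # finished
```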

May 1, 2021 · 1 min · 50 words · Saqib Razzaq

How to make scrapy spider interactive at any point

from scrapy.shell import inspect_response
inspect_response(response, self)
Read this for more details

March 20, 2021 · 1 min · 11 words · Saqib Razzaq

Find element that has particular text in Scrapy

response.xpath("//*[contains(text(), 'txt goes here')]").getall()

March 19, 2021 · 1 min · 4 words · Saqib Razzaq

How to use proxies with Scrapy

There are two main ways to use proxies in Scrapy: on a per-request basis, or with every outgoing request.
How to use a proxy with a single request:
proxy = 'proxy_here'
return Request(url=url, callback=self.parse, meta={"proxy": proxy})
How to use a proxy with every request: go to middlewares.py, update the process_request method, and paste the following code:
proxy = 'proxy_here'
request.meta['proxy'] = proxy
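The every-request variant can be sketched as a downloader middleware (the proxy URL is a placeholder, and a tiny stand-in request object makes the snippet runnable without Scrapy installed):

```python
class ProxyMiddleware:
    """Downloader middleware that routes every request through one proxy."""

    PROXY = "http://user:pass@proxy.example.com:8080"  # placeholder

    def process_request(self, request, spider):
        request.meta["proxy"] = self.PROXY
        return None  # let the (now proxied) request continue normally


# Stand-in for scrapy.Request, which carries a meta dict.
class StubRequest:
    def __init__(self, url):
        self.url = url
        self.meta = {}


req = StubRequest("https://example.com")
ProxyMiddleware().process_request(req, None)
print(req.meta["proxy"])  # http://user:pass@proxy.example.com:8080
```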

February 24, 2021 · 1 min · 67 words · Saqib Razzaq

Python + Scrapy: retry failed requests

You can add one or more statuses in settings.py. Scrapy will process requests normally, and when one of these statuses is encountered, it will retry that request. You can modify RETRY_HTTP_CODES and add any number of statuses there. You can also control how many times to retry with RETRY_TIMES.

January 21, 2021 · 1 min · 49 words · Saqib Razzaq

Markdown Syntax Guide

This article offers a sample of basic Markdown syntax that can be used in Hugo content files, and it also shows whether basic HTML elements are decorated with CSS in a Hugo theme. ...

March 11, 2019 · 3 min · 446 words · Hugo Authors