Parallel web scraping in PHP - cURL multi functions

If you’ve ever tried to fetch multiple resources over HTTP in PHP, you know the logic is trivial, but one challenge is ever-present: latency. Even when a web server has a perfectly good downstream link, waiting on just a few external URLs can increase a script’s execution time as much as tenfold. But there’s a simple solution: parallel cURL operations. In this tutorial, I’ll show you how to use the “multi” functions in PHP’s cURL library to get around this quickly and easily.

Caching alleviates the latency issue to some extent, but retrieving more than a few files is always going to be a problem, and, well, sometimes users just can’t wait. cURL’s parallel processing lets you fire off multiple requests at once and handle each response as it arrives, instead of working linearly: waiting for each request to complete (or worse, time out) before starting the next. With the requests running in parallel, the total wait is roughly that of the slowest request rather than the sum of all of them.
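Here is a minimal sketch of that pattern, assuming PHP 7 or later with the cURL extension enabled. The helper name `fetch_parallel()`, its 10-second default timeout, and the use of `CURLOPT_PRIVATE` to tag each handle with its array key are illustrative assumptions, not a definitive implementation; the `curl_multi_*` calls themselves are the standard ones from PHP’s cURL extension. The loop hands all requests to cURL at once, then collects each one with `curl_multi_info_read()` as soon as it finishes.

```php
<?php
// Minimal sketch: fetch several URLs in parallel with PHP's curl_multi_* functions.
// fetch_parallel() is a hypothetical helper name, not part of PHP's cURL API.

/**
 * @param array<string,string> $urls    map of caller-chosen key => URL
 * @param int                  $timeout per-request timeout in seconds (assumed default)
 * @return array<string,?string>        key => response body, or null if that request failed
 */
function fetch_parallel(array $urls, int $timeout = 10): array
{
    $mh = curl_multi_init();
    $results = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);      // one slow server can't stall the batch forever
        curl_setopt($ch, CURLOPT_PRIVATE, (string) $key); // remember which key this handle belongs to
        curl_multi_add_handle($mh, $ch);
        $results[$key] = null;                            // keeps results in input order
    }

    do {
        // Let cURL make progress on every transfer without blocking.
        $status = curl_multi_exec($mh, $active);

        // Collect each transfer that has finished so far, in completion order.
        while (($info = curl_multi_info_read($mh)) !== false) {
            if ($info['msg'] !== CURLMSG_DONE) {
                continue;
            }
            $ch = $info['handle'];
            $key = curl_getinfo($ch, CURLINFO_PRIVATE);
            if ($info['result'] === CURLE_OK) {
                $results[$key] = curl_multi_getcontent($ch);
            }
            // Detach the finished handle; it is freed once nothing references it.
            curl_multi_remove_handle($mh, $ch);
        }

        // Wait for socket activity (up to 1 s) instead of busy-looping.
        if ($active && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000); // some libcurl builds return -1 here; back off briefly
        }
    } while ($active && $status === CURLM_OK);

    curl_multi_close($mh);
    return $results;
}
```

Calling it might look like `$pages = fetch_parallel(['home' => 'https://example.com/', 'about' => 'https://example.com/about']);`, after which `$pages['home']` holds that page’s body, or `null` if the request failed or timed out. Because `curl_multi_select()` blocks until a socket is ready, the script sleeps rather than spinning the CPU while it waits for slow servers.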
