Universal Scraper

The concept 🧪

The universal scraper is one of the latest tools developed by our big data team. It lets you retrieve several fields from any site and export them either as JSON (which makes the data easy to serve through an API) or as CSV, where the column names are taken automatically from the names you gave to each field.
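To make the export step concrete, here is a minimal sketch of how scraped records could be written out in both formats. It is an illustration only, written in Python rather than the PHP the tool is built in; the function names `export_json` and `export_csv`, and the sample field names and records, are hypothetical and not taken from the tool itself.

```python
import csv
import io
import json


def export_json(records):
    """Serialize the scraped records as a JSON array of objects."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def export_csv(records, field_names):
    """Serialize the scraped records as CSV.

    The header row is built from the field names chosen by the user,
    and a field missing from a record is written as an empty cell.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=field_names, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({name: record.get(name, "") for name in field_names})
    return buffer.getvalue()


# Hypothetical sample data: two fields named by the user, two scraped rows.
field_names = ["title", "price"]
records = [
    {"title": "Blue mug", "price": "9.90"},
    {"title": "Red mug", "price": "11.50"},
]
```

Calling `export_csv(records, field_names)` produces a header line `title,price` followed by one line per record, while `export_json(records)` returns the same data as a JSON array.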

The tool also handles pagination, the retrieval of internal pages, and a configurable pause between each page it visits.
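The pagination and pause behavior can be sketched as a simple loop. This is a Python illustration under stated assumptions, not the tool's actual PHP code: `crawl_pages`, its parameters, and the `fetch_page` callback (assumed to return the items found on a page plus the URL of the next page, or `None` on the last one) are all hypothetical names. The fake site below stands in for real HTTP requests so the sketch runs offline.

```python
import time


def crawl_pages(start_url, fetch_page, delay_seconds=1.0, max_pages=100, sleep=time.sleep):
    """Yield items from a paginated listing, pausing between pages.

    fetch_page(url) is assumed to return (items, next_url), where
    next_url is None when there is no further page.
    """
    url = start_url
    visited = set()
    pages_seen = 0
    while url is not None and url not in visited and pages_seen < max_pages:
        if pages_seen > 0:
            # Pause before every page except the first one.
            sleep(delay_seconds)
        visited.add(url)  # Guards against pagination loops.
        items, next_url = fetch_page(url)
        yield from items
        pages_seen += 1
        url = next_url


# Hypothetical three-page site used in place of real network calls.
FAKE_SITE = {
    "page-1": (["a", "b"], "page-2"),
    "page-2": (["c"], "page-3"),
    "page-3": (["d"], None),
}


def fake_fetch(url):
    return FAKE_SITE[url]
```

For example, `list(crawl_pages("page-1", fake_fetch, delay_seconds=0))` walks the three fake pages in order and collects their items. Passing the sleep function as a parameter is a design choice made here only so the pause can be observed or disabled in tests.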

Note: the GUI (graphical user interface) has not been polished, since this tool is for our private use.

 

It is an internal tool with no public release, available only to our customers.

 

Our work 🔨

  • Web development (from scratch in PHP)
  • Latest big data technologies implemented

 

The challenges of this project 💪

The biggest challenge with such a tool is making it truly “universal”. Each site has its own architecture, so the tool must know how to interact with each of them, which makes the code as a whole more complex.
