How to run Splash using Docker Toolbox?


Introduction

Splash is a JavaScript rendering service with an HTTP API, which makes it useful for web scraping and data extraction from JavaScript-based websites. This tutorial shows how to run Splash using Docker Toolbox, a legacy Docker distribution that is no longer maintained but is still available for download.

Prerequisites

Before you get started, install Docker Toolbox on your machine. Docker Toolbox is a legacy Docker distribution for older Windows and macOS systems that cannot run Docker Desktop, for example because they lack native virtualization support. It runs the Docker engine inside an Oracle VirtualBox virtual machine. Docker Toolbox is not available for Linux, where Docker runs natively.

Steps to run Splash

  • Open the Docker Quickstart Terminal, which starts the Docker Toolbox virtual machine and points the shell at it.

  • Run the following command to pull the latest version of the Splash Docker image from Docker Hub:

$ docker pull scrapinghub/splash 

The image is downloaded once and stored on your machine, so later runs reuse it.

  • Run the following command to start a new container from the Splash Docker image:

$ docker run -p 8050:8050 scrapinghub/splash 

This command starts a new container from the Splash Docker image. The -p 8050:8050 option publishes port 8050 of the container, where Splash listens, on port 8050 of the Docker host, which lets you reach the Splash web interface from your web browser. The container runs in the foreground and prints its log to the terminal; press Ctrl+C to stop it.

  • Access the Splash web interface by visiting the following URL in your web browser:

http://localhost:8050

Docker Toolbox runs Docker inside a VirtualBox virtual machine, so the port is published on that virtual machine rather than on your own machine. This applies on both Windows and macOS. If localhost does not respond, use the IP address of the Docker virtual machine instead. You can find it by running the following command in the Docker Quickstart Terminal:

$ docker-machine ip 

For example, if the IP address of the Docker virtual machine is 192.168.99.100, you would visit the following URL in your web browser to access the Splash web interface:

http://192.168.99.100:8050
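The steps above can also be wrapped in a few shell functions. This is a minimal sketch, not an official script: the function names and the container name "splash" are assumptions made for illustration, and the virtual machine is assumed to be called "default" unless DOCKER_MACHINE_NAME says otherwise. It starts Splash in the background with -d instead of in the foreground, and it works out the URL of the web interface by asking docker-machine for the virtual machine's IP address.

```shell
#!/bin/sh
# Sketch only: function names and the container name "splash" are hypothetical.

# Start Splash in the background (-d) under a fixed name so it can be stopped later.
start_splash() {
    docker run -d --name splash -p 8050:8050 scrapinghub/splash
}

# Stop and remove the container started by start_splash.
stop_splash() {
    docker stop splash && docker rm splash
}

# Print the base URL of the Splash web interface.
# If docker-machine is installed (Docker Toolbox), use the IP address of the
# virtual machine; otherwise fall back to localhost.
splash_base_url() {
    machine="${DOCKER_MACHINE_NAME:-default}"   # assumed VM name
    host="localhost"
    if command -v docker-machine >/dev/null 2>&1; then
        host="$(docker-machine ip "$machine")" || return 1
    fi
    printf 'http://%s:8050\n' "$host"
}
```

After loading these functions into the Docker Quickstart Terminal, you could run start_splash, open the address printed by splash_base_url in your browser, and run stop_splash when you are done.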

Differences between Docker Desktop and Docker Toolbox

Here is a table summarizing the differences between Docker Desktop and Docker Toolbox:

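| Property                    | Docker Desktop                        | Docker Toolbox       |
|-----------------------------|---------------------------------------|----------------------|
| Maintenance status          | Current                               | No longer maintained |
| Operating systems supported | Windows, macOS, Linux                 | Windows, macOS       |
| Virtualization              | Native                                | Oracle VirtualBox    |
| Performance                 | Faster                                | Slower               |
| Additional features         | Kubernetes support, automatic updates | None                 |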

Capabilities of Splash

With the Splash container running and the web interface accessible, you can now use Splash to render JavaScript-based websites and extract data from them. Here are a few examples of what you can do with Splash:

  • Render websites − Splash allows you to render websites just like a web browser, which can be useful for cases where the website's content is generated dynamically with JavaScript. You can use Splash to retrieve the fully-rendered HTML of a website by making a request to the Splash server and specifying the URL of the website you want to render.

  • Run custom JavaScript − In addition to rendering websites, Splash also allows you to run custom JavaScript in the context of the pages it renders. This can be useful when you need to perform additional processing on the website's content, or when the data you need only becomes available after a script has run.

  • Extract data from websites − Splash makes it possible to extract data from rendered pages. In Lua scripts you can pick elements with CSS selectors (for example with splash:select), and you can run custom JavaScript to pull data out of the page's DOM, including XPath queries evaluated inside the page.

  • Headless browsing − Splash can be used as a headless browser, meaning that it can be controlled and accessed programmatically without the need for a GUI. This makes it easy to integrate Splash into automated workflows or custom scripts.

  • Load balancing − Splash does not balance load by itself, but you can run several Splash instances behind a load balancer such as HAProxy and distribute rendering requests across them. This can be useful when you need to scale up your rendering capacity or want to ensure high availability for your rendering service.

  • HTTP caching − Splash includes an HTTP cache that allows it to store and reuse previously fetched resources, which can improve rendering performance and reduce bandwidth usage.

  • Custom scripting − Splash has no middleware layer of its own, but it can run Lua scripts that control the rendering process, including callbacks such as splash:on_request and splash:on_response that inspect or modify requests and responses. This can be useful when you need to add custom functionality or modify the behaviour of Splash.
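The first two capabilities can be tried from the command line with curl. The following is a minimal sketch under stated assumptions: the function names render_html and page_title are hypothetical helpers, and the base URL passed in must be the address where your own Splash container is reachable. The first function calls Splash's render.html endpoint with its url and wait parameters; the second sends a short Lua script to the execute endpoint, which loads the page and evaluates a JavaScript expression in it.

```shell
#!/bin/sh
# Sketch only: render_html and page_title are hypothetical helper names.

# Fetch the JavaScript-rendered HTML of a page through the render.html endpoint.
#   $1 = Splash base URL, e.g. http://192.168.99.100:8050
#   $2 = URL of the page to render
#   $3 = seconds to wait after the page loads (optional, defaults to 0.5)
render_html() {
    curl --silent --show-error --get \
        --data-urlencode "url=$2" \
        --data-urlencode "wait=${3:-0.5}" \
        "$1/render.html"
}

# Run a small Lua script through the execute endpoint; the script loads the
# page and returns the result of a custom JavaScript expression (the title).
#   $1 = Splash base URL, $2 = URL of the page to load
page_title() {
    lua_script='function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  return splash:evaljs("document.title")
end'
    curl --silent --show-error --get \
        --data-urlencode "lua_source=$lua_script" \
        --data-urlencode "url=$2" \
        "$1/execute"
}
```

For example, render_html http://192.168.99.100:8050 https://example.com > page.html would save the rendered page to a file, assuming the virtual machine uses that IP address.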

Conclusion

By following these steps, you should now be able to run Splash using Docker Toolbox. From here you can explore the Splash web interface and try out its features, such as rendering websites and running custom JavaScript, whether you use Splash for web scraping, data extraction, or another purpose.

Updated on: 30-Jan-2023
