How to read the contents of a web page without using any external library in Java?


The URL class of the java.net package represents a Uniform Resource Locator, which is used to point to a resource (a file, a directory, or a reference) on the World Wide Web.
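To see what a URL object actually holds, the following minimal sketch parses a URL string and reports its parts through getProtocol(), getHost(), getPort() and getPath(). The class name UrlParts and its describe() helper are hypothetical, made up for illustration. No network connection is made, because the constructor only parses the string.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlParts {
   // Hypothetical helper: parses spec and returns "protocol|host|port|path".
   static String describe(String spec) {
      try {
         URL url = new URL(spec); // parses only; no connection is opened
         // getPort() returns -1 when the URL does not state a port explicitly
         return url.getProtocol() + "|" + url.getHost() + "|" + url.getPort() + "|" + url.getPath();
      } catch (MalformedURLException e) {
         // Thrown, for example, when the string has no protocol, like "www.something.com"
         throw new IllegalArgumentException("Not a valid URL: " + spec, e);
      }
   }

   public static void main(String[] args) {
      System.out.println(describe("http://www.something.com/index.html"));
      // prints http|www.something.com|-1|/index.html
      System.out.println(describe("https://example.com:8443/docs/"));
      // prints https|example.com|8443|/docs/
   }
}
```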

The openStream() method of this class opens a connection to the URL represented by the current object and returns an InputStream from which you can read the data at that URL.
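openStream() is not limited to http URLs; it works for any protocol the JVM has a handler for. The sketch below, which assumes Java 9 or later for InputStream.readAllBytes() and assumes UTF-8 content, reads a temporary local file through a file: URL so that it runs without network access. For a web page you would pass an http or https URL instead. The class OpenStreamDemo and its readAll() and roundTrip() helpers are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class OpenStreamDemo {
   // Opens a connection to url and reads everything it returns, assuming UTF-8 text.
   static String readAll(URL url) throws IOException {
      try (InputStream in = url.openStream()) { // stream is closed automatically
         return new String(in.readAllBytes(), StandardCharsets.UTF_8);
      }
   }

   // Hypothetical helper: writes html to a temporary file, reads it back
   // through a file: URL, then deletes the file.
   static String roundTrip(String html) {
      try {
         Path page = Files.createTempFile("page", ".html");
         try {
            Files.write(page, html.getBytes(StandardCharsets.UTF_8));
            return readAll(page.toUri().toURL());
         } finally {
            Files.delete(page);
         }
      } catch (IOException e) {
         throw new UncheckedIOException(e);
      }
   }

   public static void main(String[] args) {
      System.out.println(roundTrip("<h1>Hello</h1>")); // prints <h1>Hello</h1>
   }
}
```

According to the URL Javadoc, openStream() is shorthand for openConnection().getInputStream(); use openConnection() directly if you need to set timeouts or request headers before reading.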

Therefore, to read data from a web page using the URL class:

  • Instantiate the java.net.URL class by passing the URL of the desired web page as a parameter to its constructor.

  • Invoke the openStream() method and retrieve the InputStream object.

  • Instantiate the Scanner class by passing the retrieved InputStream object to its constructor, then read the page content through the Scanner and close it when you are done.

Example

import java.io.IOException;
import java.net.URL;
import java.util.Scanner;

public class ReadingWebPage {
   public static void main(String[] args) throws IOException {
      //Instantiating the URL class
      URL url = new URL("http://www.something.com/");
      //StringBuilder to hold the result (no synchronization is needed here)
      StringBuilder sb = new StringBuilder();
      //Retrieving the contents of the specified page;
      //try-with-resources closes the Scanner and the underlying stream
      try (Scanner sc = new Scanner(url.openStream())) {
         //Reading line by line keeps the spaces that next() would discard
         while (sc.hasNextLine()) {
            sb.append(sc.nextLine()).append(System.lineSeparator());
         }
      }
      //Retrieving the String from the StringBuilder object
      String result = sb.toString().trim();
      System.out.println(result);
      //Removing the HTML tags
      result = result.replaceAll("<[^>]*>", "");
      System.out.println("Contents of the web page: " + result);
   }
}

Output

<html><body><h1>It works!</h1></body></html>
Contents of the web page: It works!
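The tag-removal step of the example can be tried on its own, without fetching anything. The sketch below wraps the same regular expression, <[^>]*>, in a hypothetical stripTags() helper. It is a rough approach rather than a real HTML parser: character entities such as &amp; are left as-is, and the text inside <script> or <style> elements is kept.

```java
public class StripTags {
   // Hypothetical helper: removes anything that looks like an HTML tag,
   // using the same regular expression as the example.
   static String stripTags(String html) {
      return html.replaceAll("<[^>]*>", "");
   }

   public static void main(String[] args) {
      System.out.println(stripTags("<html><body><h1>It works!</h1></body></html>"));
      // prints It works!
      System.out.println(stripTags("<p>Fish &amp; chips</p>"));
      // prints Fish &amp; chips (entities are not decoded)
      System.out.println(stripTags("<script>var x = 1;</script>"));
      // prints var x = 1; (script text survives)
   }
}
```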
raja
Published on 11-Oct-2019 10:06:36