Parse HTML with PHP's DOMDocument

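PHP's DOMDocument class, part of the bundled DOM extension, parses HTML into a tree of nodes that you can search and modify. You can extract specific elements with XPath queries through the DOMXPath class, or with DOM traversal methods such as getElementsByTagName() and getAttribute().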
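The DOM traversal approach mentioned above can be sketched without XPath. This is a minimal sketch, assuming the same two-block markup as the example in this article (compacted onto two lines). It walks every <div> returned by getElementsByTagName() and checks the class attribute and the parent element by hand.

```php
<?php
// Same structure as the XPath example, compacted onto two lines.
$html = <<<HTML
<div class="main"><div class="text">This is text 1</div></div>
<div class="main"><div class="text">This is text 2</div></div>
HTML;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$texts = [];
// Visit every <div>; keep those with class "text" whose parent is a <div class="main">.
foreach ($dom->getElementsByTagName('div') as $div) {
    $parent = $div->parentNode;
    if ($div->getAttribute('class') === 'text'
        && $parent instanceof DOMElement
        && $parent->nodeName === 'div'
        && $parent->getAttribute('class') === 'main') {
        $texts[] = trim($div->textContent);
    }
}

print_r($texts);
```

Manual traversal needs no query language, but every structural condition has to be written out as code, which is why XPath is usually more concise for nested matches.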

Example

The following example shows how to extract text from nested <div> elements using XPath −

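<?php
$html = <<<HTML
<div class="main">
    <div class="text">
        This is text 1
    </div>
</div>
<div class="main">
    <div class="text">
        This is text 2
    </div>
</div>
HTML;

// Collect libxml parser warnings (e.g. for malformed markup) instead of printing them
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Select every div.text that is a direct child of a div.main
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');

// nodeValue includes the surrounding newlines and indentation, so trim it
foreach ($tags as $tag) {
    var_dump(trim($tag->nodeValue));
}
?>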

Output

This will produce the following output −

string(14) "This is text 1"
string(14) "This is text 2"

How It Works

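The code loads the HTML string into a DOMDocument object and creates a DOMXPath object for it. The XPath expression //div[@class="main"]/div[@class="text"] (an XPath expression, not a CSS selector) selects every <div> whose class attribute is exactly "text" and that is a direct child of a <div> whose class attribute is exactly "main". Because nodeValue returns the element's text together with the surrounding newlines and indentation, trim() is applied before each result is printed.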
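One consequence of the expression above is that @class="text" compares the entire attribute value, so it does not match an element such as <div class="text highlight">. The sketch below uses a hypothetical extra class named highlight to show the difference, along with a common XPath 1.0 idiom for matching a single class token using concat() and normalize-space().

```php
<?php
// The inner div carries two classes; "highlight" is a hypothetical extra class.
$html = '<div class="main"><div class="text highlight">This is text 3</div></div>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Exact comparison: no match, because the attribute value is "text highlight".
$exact = $xpath->query('//div[@class="main"]/div[@class="text"]');

// Token match: pad the normalized attribute with spaces and look for " text ".
$token = $xpath->query(
    '//div[@class="main"]/div[contains(concat(" ", normalize-space(@class), " "), " text ")]'
);

echo $exact->length, "\n";   // 0
echo trim($token->item(0)->nodeValue), "\n";
```

The surrounding spaces keep the token match from accepting classes that merely start with "text", such as class="textarea".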

Conclusion

DOMDocument with XPath provides a robust way to parse HTML and extract specific content. This method is particularly useful when working with structured HTML documents where you need to target elements by their attributes or hierarchical position.
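Targeting elements by hierarchical position can be sketched with a positional predicate and a relative query. This is a minimal sketch assuming the same markup as the example: (//div[@class="main"])[2] picks the second matching block in document order, and passing that node as the second argument of DOMXPath::query() limits the relative expression div[@class="text"] to that block's children.

```php
<?php
$html = <<<HTML
<div class="main"><div class="text">This is text 1</div></div>
<div class="main"><div class="text">This is text 2</div></div>
HTML;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Positional predicate: the second div.main in document order (XPath positions start at 1).
$second = $xpath->query('(//div[@class="main"])[2]')->item(0);

// Relative query evaluated with $second as the context node, so only its children are searched.
$inner = $xpath->query('div[@class="text"]', $second);

$result = trim($inner->item(0)->textContent);
echo $result, "\n";   // This is text 2
```

The parentheses matter: //div[@class="main"][2] would instead select each div.main that is the second such div among its siblings, which matches nothing here because each block sits at a different position under <body>.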

Updated on: 2026-03-15T08:41:51+05:30
