Parse HTML with PHP's HTML DOMDocument
PHP's DOMDocument class provides powerful tools for parsing and manipulating HTML content. You can extract specific elements using XPath queries or DOM traversal methods.
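As a rough sketch of the DOM traversal route, the snippet below walks every <div> with getElementsByTagName() and filters on the class attribute by hand. The markup and the $texts variable are made up for illustration; they are assumptions, not part of the article's example.

```php
<?php
// Hypothetical markup, used only to illustrate DOM traversal.
$html = '<div class="main"><div class="text">Hello</div></div><p>Other</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$texts = [];
// getElementsByTagName() returns every <div> in document order.
foreach ($dom->getElementsByTagName('div') as $div) {
    // Keep only the <div> elements whose class attribute is exactly "text".
    if ($div->getAttribute('class') === 'text') {
        $texts[] = trim($div->textContent);
    }
}

print_r($texts);
```

Traversal like this tends to be enough for simple filters; XPath is usually more compact once the condition involves both attributes and nesting.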
Example
The following example shows how to extract text from nested <div> elements using XPath:
<?php
$html = <<<HTML
<div class="main">
<div class="text">
This is text 1
</div>
</div>
<div class="main">
<div class="text">
This is text 2
</div>
</div>
HTML;
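// Load the HTML string; libxml wraps the fragment in <html><body> automatically.
$dom = new DOMDocument();
$dom->loadHTML($html);

// Select each <div class="text"> that is a direct child of a <div class="main">.
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');

foreach ($tags as $tag) {
    // nodeValue includes the surrounding newlines, so trim before dumping.
    var_dump(trim($tag->nodeValue));
}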
?>
Output
This will produce the following output:
string(14) "This is text 1"
string(14) "This is text 2"
How It Works
The code creates a DOMDocument object and loads the HTML string into it. A DOMXPath object then runs the XPath expression //div[@class="main"]/div[@class="text"], which selects every <div> whose class attribute is exactly "text" and whose parent is a <div> whose class attribute is exactly "main". The query returns a DOMNodeList, and the loop prints the trimmed text of each matching node.
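One caveat worth illustrating: @class="main" compares the whole attribute value, so an element with class="main featured" would not match. The sketch below contrasts that exact comparison with a common XPath 1.0 idiom that tests for a single class token. The markup, the extra "featured" class, and the variable names are hypothetical and chosen only for this illustration.

```php
<?php
// Hypothetical markup: the second block carries an extra class ("main featured").
$html = <<<HTML
<div class="main">
  <div class="text">This is text 1</div>
</div>
<div class="main featured">
  <div class="text">This is text 2</div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact comparison: only the parent with class="main" and nothing else matches.
$exact = $xpath->query('//div[@class="main"]/div[@class="text"]');

// Token comparison: pad the normalized class list with spaces and look for " main ".
$token = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " main ")]/div[@class="text"]'
);

$exactTexts = [];
foreach ($exact as $node) {
    $exactTexts[] = trim($node->nodeValue);
}

$tokenTexts = [];
foreach ($token as $node) {
    $tokenTexts[] = trim($node->nodeValue);
}

print_r($exactTexts); // only the first block's text
print_r($tokenTexts); // text from both blocks
```

The padding with spaces keeps the test from matching longer names that merely contain "main", such as "mainframe".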
Conclusion
DOMDocument with XPath provides a robust way to parse HTML and extract specific content. This method is particularly useful when working with structured HTML documents where you need to target elements by their attributes or hierarchical position.
