Beautiful Soup - get_text() Method
Method Description
The get_text() method returns only the human-readable text of an entire HTML document or of a given tag. All the child strings are concatenated using the given separator, which is an empty string by default.
Syntax
get_text(separator="", strip=False)
Parameters
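separator − The string placed between the child strings when they are concatenated. By default it is "" (an empty string).
strip − If True, leading and trailing whitespace is stripped from each string before concatenation, and strings that become empty are skipped. By default it is False.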
Return Type
The get_text() method returns a string.
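As noted in the description, get_text() can be called on a single tag as well as on the whole document. The following is a minimal sketch of that difference; the markup and the variable names (first_p, tag_text, doc_text) are made up for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, chosen only for this illustration
html = "<div><p>Hello, <b>world</b>!</p><p>Second paragraph.</p></div>"
soup = BeautifulSoup(html, "html.parser")

# Called on one tag: only the text inside that tag (nested tags included)
first_p = soup.find("p")
tag_text = first_p.get_text()

# Called on the soup object: the text of the whole document
doc_text = soup.get_text()

print(tag_text)                  # Hello, world!
print(doc_text)                  # Hello, world!Second paragraph.
print(type(tag_text).__name__)   # str
```

Because the default separator is an empty string, the two paragraphs run together in doc_text.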
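Example - Getting Text from HTML Content
In the example below, the get_text() method is called on the whole document, so it collects the text from all the HTML tags. The whitespace between tags is preserved in the result.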
from bs4 import BeautifulSoup

# Sample document to extract text from
html = '''
<html>
<body>
<p> The quick, brown fox jumps over a lazy dog.</p>
<p> DJs flock by when MTV ax quiz prog.</p>
<p> Junk MTV quiz graced by fox whelps.</p>
<p> Bawds jog, flick quartz, vex nymphs.</p>
</body>
</html>
'''

# Parse with Python's built-in HTML parser
soup = BeautifulSoup(html, "html.parser")

# Collect the text of every tag in the document
text = soup.get_text()
print(text)
Output
The quick, brown fox jumps over a lazy dog.
DJs flock by when MTV ax quiz prog.
Junk MTV quiz graced by fox whelps.
Bawds jog, flick quartz, vex nymphs.
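Example - Using Separator with the get_text() Method
In the following example, we set the separator argument of the get_text() method to '#'. The newline between each pair of tags is itself a child string, so a '#' appears on both sides of every paragraph.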
from bs4 import BeautifulSoup

html = '''
<p>The quick, brown fox jumps over a lazy dog.</p>
<p>DJs flock by when MTV ax quiz prog.</p>
<p>Junk MTV quiz graced by fox whelps.</p>
<p>Bawds jog, flick quartz, vex nymphs.</p>
'''

soup = BeautifulSoup(html, "html.parser")

# Join the child strings with '#' instead of the default empty string
text = soup.get_text(separator='#')
print(text)
Output
#The quick, brown fox jumps over a lazy dog.#
#DJs flock by when MTV ax quiz prog.#
#Junk MTV quiz graced by fox whelps.#
#Bawds jog, flick quartz, vex nymphs.#
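Example - Using strip with the get_text() Method
Let us check the effect of the strip parameter when it is set to True. By default it is False. With strip=True, each string is stripped of leading and trailing whitespace and the whitespace-only strings between tags are skipped, so the paragraphs are joined with no line breaks between them.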
from bs4 import BeautifulSoup

html = '''
<p>The quick, brown fox jumps over a lazy dog.</p>
<p>DJs flock by when MTV ax quiz prog.</p>
<p>Junk MTV quiz graced by fox whelps.</p>
<p>Bawds jog, flick quartz, vex nymphs.</p>
'''

soup = BeautifulSoup(html, "html.parser")

# Strip whitespace from each string before joining
text = soup.get_text(strip=True)
print(text)
Output
The quick, brown fox jumps over a lazy dog.DJs flock by when MTV ax quiz prog.Junk MTV quiz graced by fox whelps.Bawds jog, flick quartz, vex nymphs.
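The two parameters can also be used together, which is often more readable than either one alone. The sketch below reuses two of the sample paragraphs; the " | " separator is an arbitrary choice for illustration.

```python
from bs4 import BeautifulSoup

html = '''
<p>The quick, brown fox jumps over a lazy dog.</p>
<p>DJs flock by when MTV ax quiz prog.</p>
'''

soup = BeautifulSoup(html, "html.parser")

# strip=True drops the whitespace-only strings between tags, so the
# separator appears only between the two paragraphs
text = soup.get_text(separator=" | ", strip=True)
print(text)
# The quick, brown fox jumps over a lazy dog. | DJs flock by when MTV ax quiz prog.
```

Without strip=True, the newline strings between the tags would also be joined, and the separator would appear around every paragraph as in the '#' example.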