Cheerio: Extract Text From HTML With Separators

February 24, 2007

Answer:

This seems to do the trick:

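    // $ is assumed to be the function returned by cheerio.load(html).
    var t = $('html *').contents().map(function() {
        // Keep the text of text nodes; map every other node to an empty string.
        return (this.type === 'text') ? $(this).text() : '';
    }).get().join(' ');

    console.log(t);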

Result:

One Two

Just improved my solution a little bit:

    var t = $('html *').contents().map(function() {
        return (this.type === 'text') ? $(this).text() + ' ' : '';
    }).get().join('');
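One plausible reason the second version counts as an improvement is that the first one also joins the empty strings produced for non-text nodes, which leaves stray spaces in the result. The sketch below illustrates this with a plain array standing in for the node list; the array and its contents are assumptions made for illustration, not output taken from cheerio.

```javascript
// Hypothetical stand-in for the nodes returned by $('html *').contents():
// element nodes have type 'tag', text nodes have type 'text' and a data string.
const nodes = [
  { type: 'tag', name: 'p' },
  { type: 'text', data: 'One' },
  { type: 'tag', name: 'p' },
  { type: 'text', data: 'Two' },
];

// First version: non-text nodes become '', then everything is joined with ' '.
const joinedWithSpaces = nodes
  .map((node) => (node.type === 'text' ? node.data : ''))
  .join(' ');

// Improved version: each text gets its own trailing separator, joined with ''.
const joinedWithSuffix = nodes
  .map((node) => (node.type === 'text' ? node.data + ' ' : ''))
  .join('');

console.log(JSON.stringify(joinedWithSpaces)); // " One  Two"
console.log(JSON.stringify(joinedWithSuffix)); // "One Two "
```

The improved version still leaves one trailing separator, so calling .trim() on the result may be worthwhile.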

You can use the TextVersionJS package to generate a plain-text version of an HTML string. It works both in the browser and in Node.js.

    var createTextVersion = require("textversionjs");

    var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";

    var textVersion = createTextVersion(yourHtml);

Install it from npm. For use in the browser, you can bundle it with Browserify, for example.

You can use the following function to extract the text from an HTML string, with the pieces separated by a single space:

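    // Assumes cheerio with typings that declare CheerioStatic (older @types/cheerio).
    import * as cheerio from 'cheerio';

    function extractTextFromHtml(html: string): string {
      const cheerioStatic: CheerioStatic = cheerio.load(html || '');

      return cheerioStatic('html *').contents().toArray()
        // Keep only text nodes, trimmed; everything else becomes null.
        .map(element => element.type === 'text' ? cheerioStatic(element).text().trim() : null)
        // Drop nulls and whitespace-only text.
        .filter(text => text)
        .join(' ');
    }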
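The same trim, filter, and join idea can also be written as a recursive walk with a configurable separator. This is a minimal sketch, not part of any answer above: collectText and extractText are hypothetical helper names, and the tree is built by hand on the assumption that nodes carry type, data, and children fields like the ones inspected in the snippets above, so that it runs without any dependencies.

```javascript
// Collect the trimmed text of every text node under `node`, in document order.
function collectText(node, out = []) {
  if (node.type === 'text') {
    const text = node.data.trim();
    if (text) out.push(text); // skip whitespace-only text nodes
  }
  for (const child of node.children || []) {
    collectText(child, out);
  }
  return out;
}

// Join the collected pieces with the given separator (a single space by default).
function extractText(root, separator = ' ') {
  return collectText(root).join(separator);
}

// Hand-built stand-in for:
// <body><h1>Your HTML</h1> <ul><li>goes</li><li>here.</li></ul></body>
const tree = {
  type: 'tag', name: 'body', children: [
    { type: 'tag', name: 'h1', children: [{ type: 'text', data: 'Your HTML' }] },
    { type: 'text', data: '\n  ' },
    { type: 'tag', name: 'ul', children: [
      { type: 'tag', name: 'li', children: [{ type: 'text', data: 'goes' }] },
      { type: 'tag', name: 'li', children: [{ type: 'text', data: 'here.' }] },
    ] },
  ],
};

console.log(extractText(tree));        // Your HTML goes here.
console.log(extractText(tree, ' | ')); // Your HTML | goes | here.
```

Because the separator is only placed between non-empty pieces, this variant produces no leading, trailing, or doubled separators.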
