head	1.1;
access;
symbols
	RELEASE_8_3_0:1.1
	RELEASE_9_0_0:1.1
	RELEASE_7_4_0:1.1
	RELEASE_8_2_0:1.1
	RELEASE_6_EOL:1.1
	RELEASE_8_1_0:1.1
	RELEASE_7_3_0:1.1
	RELEASE_8_0_0:1.1
	RELEASE_7_2_0:1.1;
locks; strict;
comment	@# @;


1.1
date	2009.03.05.22.55.11;	author kuriyama;	state Exp;
branches;
next	;


desc
@@


1.1
log
@Add p5-HTML-ExtractContent 0.05, perl extension for HTML content
extractor with scoring heuristics.
@
text
@HTML::ExtractContent is a Perl module for extracting content from HTML
using scoring heuristics.

It guesses which block of HTML looks like content by scoring each block
on the number of punctuation marks and the length of its non-tag
text.

It also guesses whether the content ends in the current block or
continues into the next block.

WWW: http://search.cpan.org/dist/HTML-ExtractContent/
@
