Package org.apache.tika.parser.html
Class HtmlEncodingDetector
java.lang.Object
org.apache.tika.parser.html.HtmlEncodingDetector
- All Implemented Interfaces:
Serializable, org.apache.tika.detect.EncodingDetector
Character encoding detector that determines the character encoding of an
HTML document from the charset parameter, if present, of a
Content-Type http-equiv meta tag near the beginning of the document. Especially
useful for distinguishing among closely related encodings
(ISO-8859-*) for which other kinds of encoding detection are unreliable.
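The idea described above can be illustrated with a minimal, standard-library-only sketch. It is not Tika's implementation: the class name `MetaCharsetSketch`, the method `sniff`, and the regular expression are all hypothetical, chosen only to show how a charset parameter in a meta tag near the start of a document might be picked out of a byte prefix.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the Tika class documented on this page.
final class MetaCharsetSketch {

    // Assumed pattern: a charset=NAME parameter somewhere inside a <meta ...> tag.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9_\\-:.]+)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in a meta tag within the prefix, or null if none is usable. */
    static Charset sniff(byte[] prefix) {
        // HTML markup is ASCII in the ISO-8859-* family and in UTF-8, so decoding
        // the prefix one byte per character keeps the tag text intact.
        String head = new String(prefix, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (!m.find()) {
            return null;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] html = ("<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=ISO-8859-1\"></head>")
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(sniff(html));
    }
}
```

Because the sketch trusts the document's own declaration, it can tell ISO-8859-* variants apart where statistical detection cannot. It returns null when no declaration is found in the prefix.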
- Since:
- Apache Tika 1.2
Constructor Summary
Constructors
HtmlEncodingDetector()
-
Method Summary
Modifier and Type    Method    Description
Charset    detect(InputStream input, org.apache.tika.metadata.Metadata metadata)
int    getMarkLimit()
void    setMarkLimit(int markLimit)    How far into the stream to read for charset detection.
-
Constructor Details
-
HtmlEncodingDetector
public HtmlEncodingDetector()
-
-
Method Details
-
detect
public Charset detect(InputStream input, org.apache.tika.metadata.Metadata metadata) throws IOException
- Specified by:
detect in interface org.apache.tika.detect.EncodingDetector
- Throws:
IOException
-
getMarkLimit
public int getMarkLimit()
-
setMarkLimit
@Field
public void setMarkLimit(int markLimit)
How far into the stream to read for charset detection. Default is 8192.
- Parameters:
markLimit -
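To show why a read limit matters, here is a small standard-library sketch of peeking at a bounded prefix of a stream with `mark` and `reset`. The class `MarkLimitSketch` and its `peek` method are hypothetical and are not part of Tika. The sketch assumes only that a detector reads no further than the limit and rewinds afterwards, so a declaration located beyond the limit would go unseen.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the Tika class documented on this page.
final class MarkLimitSketch {

    /** Reads at most markLimit bytes, then rewinds the stream to where it started. */
    static String peek(InputStream in, int markLimit) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            in.mark(markLimit);
            byte[] buf = new byte[markLimit];
            int total = 0;
            // A single read may return fewer bytes than requested, so loop until
            // the limit is reached or the stream ends.
            while (total < markLimit) {
                int n = in.read(buf, total, markLimit - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
            in.reset(); // later consumers see the stream from its original position
            return new String(buf, 0, total, StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String doc = "<!-- padding padding --><meta charset=\"UTF-8\">";
        ByteArrayInputStream in =
                new ByteArrayInputStream(doc.getBytes(StandardCharsets.ISO_8859_1));
        // With a limit of 10 the meta tag lies outside the inspected prefix.
        System.out.println(peek(in, 10).contains("charset"));
        // With a limit of 8192 the whole short document is inspected.
        System.out.println(peek(in, 8192).contains("charset"));
    }
}
```

In this sketch a larger limit finds declarations that appear later in the document, at the cost of buffering more bytes before parsing begins.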
-