Export Document

The Document export tab lets you export a segmented version of the current document, with custom text added before and after its elements.

For example, you can export the document with a space added before or after each word, or generate an HTML copy of the document with markup tags added before or after different elements.

A document in Chinese Text Analyser contains the following elements:

  • Document - the entire document.
  • Paragraph - text separated by a newline character.
  • Word - individual words, as determined by Chinese Text Analyser’s segmenting engine.
  • Character - individual characters.

You can specify a Pre and a Post tag for each element: the Pre tag is added before the element, and the Post tag is added after it.

[Figure: the Export Document dialog]
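The way the Pre and Post tags nest around each other can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Chinese Text Analyser's actual code: it takes a document that has already been split into paragraphs and words (the segmenting engine is not reproduced here), it assumes characters are wrapped inside their word, and it assumes the original newline between paragraphs is kept in the output. The `Tags` class and `export_document` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tags:
    """Pre and Post text for one element type (hypothetical helper)."""
    pre: str = ""
    post: str = ""


def export_document(paragraphs, document, paragraph, word, character):
    """Wrap an already-segmented document in Pre/Post tags.

    `paragraphs` is a list of paragraphs; each paragraph is a list of words.
    """
    out = [document.pre]
    for i, words in enumerate(paragraphs):
        if i > 0:
            # Assumption: the newline that separated paragraphs is kept.
            out.append("\n")
        out.append(paragraph.pre)
        for w in words:
            out.append(word.pre)
            # Each character is wrapped individually, inside its word.
            for ch in w:
                out.append(character.pre + ch + character.post)
            out.append(word.post)
        out.append(paragraph.post)
    out.append(document.post)
    return "".join(out)


result = export_document(
    [["我", "喜欢", "中文"]],
    document=Tags(),
    paragraph=Tags("[", "]"),
    word=Tags(post="|"),
    character=Tags(),
)
print(result)  # [我|喜欢|中文|]
```

If every field is left empty, this sketch simply returns the words of each paragraph joined back together, with paragraphs separated by newlines.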

You can also add whitespace characters (such as newlines and tabs), as well as a literal backslash, to the Pre and Post fields using the following escape codes:

\n - a newline character
\r - a carriage return character
\t - a tab character
\\ - a backslash
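To see how these escape codes turn into real characters, here is a minimal Python sketch. It is an illustration only, not Chinese Text Analyser's own parser: the function name `decode_escapes` is hypothetical, and keeping any other backslash sequence (such as `\x`) unchanged is an assumption.

```python
def decode_escapes(field: str) -> str:
    """Translate \\n, \\r, \\t and \\\\ in a Pre/Post field into real characters."""
    replacements = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\"}
    out = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field) and field[i + 1] in replacements:
            # A known escape code: emit the character it stands for.
            out.append(replacements[field[i + 1]])
            i += 2
        else:
            # Assumption: any other backslash is kept literally.
            out.append(ch)
            i += 1
    return "".join(out)


print(repr(decode_escapes(r"\n\n")))      # '\n\n'
print(repr(decode_escapes(r"C:\\temp")))  # 'C:\\temp'
```

The sketch scans the field once from left to right rather than chaining several replace calls. That way `\\t` is read as a backslash followed by the letter t, not as a backslash followed by a tab.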

Example 1 - Spaces after each word

To generate a segmented copy of a document with a single space after each word, set Word Post to a single space (' '). Chinese Text Analyser will then add a space after every word when it exports the document.

Note

By default, the Word Post field already contains a single space (' '). This is not obvious from looking at the dialog box, so remember to delete the space if you do not want a space added after every word.

To do this, click in the field, select its contents, and press Delete.
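Because the Post text is added after every word, the last word in each paragraph is also followed by a space. The short Python sketch below illustrates this on a hypothetical, already-segmented paragraph; `add_word_post` is an illustrative name and not part of Chinese Text Analyser.

```python
def add_word_post(words, word_post=" "):
    """Append the Word Post text after every word of one paragraph (sketch)."""
    return "".join(w + word_post for w in words)


segmented = add_word_post(["我", "喜欢", "学习", "中文"])
print(repr(segmented))  # '我 喜欢 学习 中文 '
```

With Word Post cleared to an empty string, the same sketch returns the words joined back together with no spaces.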

Example 2 - Generating segmented HTML

To generate an HTML document with a span around each word and each character, use the following settings:

Document Pre: <html><head><title>Chinese Text Analyser is the best!</title><meta charset="UTF-8"><style>.char:hover { color: red; } .word:hover { font-size: 150% }</style></head><body>

Document Post: </body></html>

Paragraph Pre: <p>

Paragraph Post: </p>

Word Pre: <span class="word">

Word Post: </span>

Character Pre: <span class="char">

Character Post: </span>

With the above settings, when the exported file is opened in a web browser, the word under the mouse pointer is shown at 150% of the normal font size, and the character under the pointer is shown in red. Both effects come from the :hover rules in the <style> element of Document Pre.
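The markup these settings produce for a single paragraph can be sketched in Python. This is a minimal illustration, not Chinese Text Analyser's exporter: the input is a hypothetical, already-segmented paragraph, `paragraph_html` is an illustrative name, and it assumes each character's span is nested inside its word's span.

```python
# Tag values copied from the Example 2 settings.
PARA_PRE, PARA_POST = "<p>", "</p>"
WORD_PRE, WORD_POST = '<span class="word">', "</span>"
CHAR_PRE, CHAR_POST = '<span class="char">', "</span>"


def paragraph_html(words):
    """Wrap one pre-segmented paragraph using the Example 2 tags."""
    parts = [PARA_PRE]
    for word in words:
        # Assumption: character spans sit inside the enclosing word span.
        chars = "".join(CHAR_PRE + ch + CHAR_POST for ch in word)
        parts.append(WORD_PRE + chars + WORD_POST)
    parts.append(PARA_POST)
    return "".join(parts)


print(paragraph_html(["你好", "世界"]))
```

Because each character span lies inside a word span, hovering over a character matches both rules: the whole word is enlarged by `.word:hover`, while only that one character turns red through `.char:hover`.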

Example 3 - Adding newlines after each paragraph

To add two extra line breaks after each paragraph, set Paragraph Post to '\n\n'.

Each paragraph in the exported document will then be followed by two extra newline characters.