Introduction

Purpose

Chinese Text Analyser is aimed at intermediate and advanced learners of the Chinese language who want to develop the skill to read Chinese text without external aid.

The program is designed around two major goals:

  1. Helping you know if a given piece of text is suitable for your current Chinese level, and
  2. Helping you prioritise which words you need to learn in order to get the greatest increase in understanding and the greatest improvement to your language skills.

It does this by keeping track of your known vocabulary, and then comparing that to the set of words in a given piece of Chinese text. Chinese Text Analyser can then use that information to export lists of unknown vocabulary, sorted by frequency and various other metrics.
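The comparison described above can be illustrated with a minimal Python sketch. This is not Chinese Text Analyser's actual implementation: the function name `analyse` is hypothetical, the text is assumed to be already segmented into a list of words, and frequency within the text is the only sorting metric shown.

```python
from collections import Counter

def analyse(words, known):
    """Compare a pre-segmented text against a set of known words.

    Returns the fraction of running words that are known, plus the
    unknown words paired with their counts, most frequent first.
    """
    counts = Counter(words)
    total = sum(counts.values())
    known_total = sum(n for w, n in counts.items() if w in known)
    coverage = known_total / total if total else 0.0
    # most_common() yields words in descending order of frequency
    unknown = [(w, n) for w, n in counts.most_common() if w not in known]
    return coverage, unknown

# Hypothetical sample data: a tiny segmented text and a known-word list
words = ["我", "喜欢", "看", "书", "我", "喜欢", "学习", "中文", "学习", "学习"]
known = {"我", "看", "书", "喜欢"}

coverage, unknown = analyse(words, known)
print(f"Known: {coverage:.0%}")
for word, count in unknown:
    print(word, count)
```

Running the same function over several texts and comparing the coverage figures is one way to picture how a text can be judged suitable, or not, for your current level.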

Chinese Text Analyser can also be used to generate sentences from Chinese text that include this vocabulary, with or without cloze deletion.
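As a rough illustration of what cloze deletion means here, the following Python sketch pulls out the sentences that contain a given word and blanks that word out. It is an assumption-laden example rather than the program's own method: the function name `cloze_sentences` is hypothetical, sentences are split only on 。！？, and the word is matched as a plain substring instead of as a segmented word.

```python
import re

def cloze_sentences(text, word, blank="____"):
    """Return (cloze, original) pairs for each sentence containing word."""
    # Split after Chinese sentence-ending punctuation, keeping the punctuation
    sentences = [s for s in re.split(r"(?<=[。！？])", text) if s.strip()]
    return [(s.replace(word, blank), s) for s in sentences if word in s]

# Hypothetical sample text
text = "我喜欢学习中文。他在看书。你学习什么？"
pairs = cloze_sentences(text, "学习")
for cloze, original in pairs:
    print(cloze, "|", original)
```

Each pair could then be exported as the front and back of a flashcard, while using the original sentence alone corresponds to the "without cloze deletion" case.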

In this way, Chinese Text Analyser is designed to be a feeder program for your existing study methods. You should use it to find content suitable for your current level and extract relevant vocabulary from that content for study with other tools that are dedicated to the task, such as Pleco or Anki.

Chinese Text Analyser is also useful if you want to collect statistics about some text (or texts), or if you want to segment/add markup to Chinese content.

Priorities

Chinese Text Analyser focuses on words rather than characters.

This is because words are the core unit of the Chinese language, at least in terms of whether or not you’ll be able to understand a given piece of text.

For this reason, most of the statistics and word lists generated by Chinese Text Analyser use words as the main unit, although some basic character statistics are provided.

One of the aims of Chinese Text Analyser is to get you to place less emphasis on raw (and often meaningless) statistics, and focus instead on things such as whether or not you’ll be able to read a given piece of text.

Although Chinese Text Analyser can be used as a document reader, unlike other readers, it is designed to give you an accurate representation of your current ability and actively point out where gaps are, rather than just helping you get the gist of what you are reading or acting as a translation tool.

It takes an uncompromising view that you either know a word or you do not. There is no middle ground, and no concept of a partially known word.

This is because, when you are reading Chinese text, any word you don’t know with full confidence will interrupt the reading process. Chinese Text Analyser explicitly calls your attention to those words because they are the ones that need further study. If you are trying to decide whether to mark a word as ‘known’ or ‘unknown’, it’s usually best to err towards ‘unknown’, with the understanding that for Chinese Text Analyser ‘unknown’ means ‘I need to study this word some more’.

This ties in to the main aim of Chinese Text Analyser, which is to help you develop the skill to read Chinese text without external aid. By focusing your efforts on the places that are causing you the most trouble, you’ll be able to make the most progress.

By design, Chinese Text Analyser does not have automatic mouseover dictionary definitions. To look up the meaning or pronunciation of a word you have to actively ask the program to show you the definition. Chinese Text Analyser will then take that as implicit acknowledgement that you don’t know that word well enough yet and mark it as unknown, because if you knew the word with confidence you wouldn’t have needed to look it up (even just to check).

If you find that you are looking up too many words, and you don’t like that Chinese Text Analyser is marking them all as unknown, it is probably an indication that you are trying to read content not suited to your current level. You should look for easier content, which is something Chinese Text Analyser can help you find!

Chinese Text Analyser can seem like quite a strict tool in this regard, but keep in mind that it is purposefully designed to call your attention to any reliance you might have on a dictionary, because that reliance will hinder your ability to read text unaided. Chinese Text Analyser prioritises long-term learning objectives over short-term understanding.

Your goal in using Chinese Text Analyser should not be to accumulate a large number of known words. Rather, it should be to get an accurate picture of what texts are suitable for you to read, what words you know well, and what words require further study.

Having more ‘known’ words is not actually better if you don’t know those words with confidence, because in order to read fluently you need instant recognition and recall of words. If you’re even slightly hesitant about a word, you should probably mark it as unknown, because you need to study it further to make sure that you are not hesitant about it next time.

Performance

Chinese Text Analyser is designed for high performance, both in terms of speed and memory usage. This gives it an advantage over similar tools because it can be used to analyse longer texts, such as novels and TV/movie scripts, without taking up large amounts of your time and without locking up your computer.

Chinese Text Analyser can analyse and segment a typical Chinese novel in under a second, making it ideal for comparing several texts to see which one is the best to read based on your current vocabulary.