-
Data extraction
- Vital data
- Education
-
Positions
-
Clean data
- MCA
- Clustering
-
Relations
- Clean data
-
Summary
- Clean data
- Analysis
- Paper sections
- Mapping
-
Two Points of Entry
-
X-Boorman Text
-
Pinyin
-
Institution names
- e.g. (kao-teng shih-fan hsue-hsiao [Gaodeng shifan xuexiao])
-
Pinyin/Chinese
-
Place names
- [pinyin - Chinese]
- Anking [Anqing - 安慶]
-
Person names
- Main figures
- Biography entry
- [pinyin - Chinese]
- e.g. Ch'en Kung-po [Chen Gongbo - 陳公博]
- Biography text
- [pinyin]
- e.g. Ch'en Kung-po [Chen Gongbo]
- e.g. Ch'en [Chen]
- Secondary figures
- First mention in a biography
- [pinyin - Chinese]
- e.g. Kuo T'ing-liang [Guo Tingliang - 郭廷亮]
- Other mention in same document
- [pinyin]
- e.g. [Guo Tingliang] or [Guo]
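
The person-name convention above (full [pinyin - Chinese] gloss at the first mention, [pinyin] only at later mentions in the same document) could be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the `NAMES` lookup table and the `annotate` function are hypothetical, and the sketch handles full names only, not bare surnames such as Ch'en [Chen].

```python
# Hypothetical lookup table: Wade-Giles form -> (pinyin, Chinese characters).
# The single entry is taken from the example in the outline.
NAMES = {
    "Ch'en Kung-po": ("Chen Gongbo", "陳公博"),
}


def annotate(text, names=NAMES):
    """Gloss each known name: [pinyin - Chinese] at the first mention,
    [pinyin] at every later mention in the same document."""
    for wade_giles, (pinyin, hanzi) in names.items():
        parts = text.split(wade_giles)
        if len(parts) == 1:
            continue  # name does not occur in this document
        first = f"{wade_giles} [{pinyin} - {hanzi}]"
        later = f"{wade_giles} [{pinyin}]"
        # The first occurrence gets the full gloss; the rest get pinyin only.
        text = parts[0] + first + later.join(parts[1:])
    return text


sample = "Ch'en Kung-po appears here. Ch'en Kung-po appears again."
print(annotate(sample))
```

Plain string splitting keeps the sketch short; it would need refinement for names that are substrings of other names or for OCR variants of the same name.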
-
OCR corrections
- OCR error detection
- OCR correction
- OCR substitution
-
Index
-
Name index
- Pinyin substitution
- Chen Gongbo [Ch'en Kung-po]
-
Location index
- Pinyin - Chinese
- Anqing - 安慶
-
Institution index
- Based on original English name in text
-
Padagraph exploration
- By Name
- By Location
- By Institution
- By Position
- By Education [harder to define entries]
-
SolrDB
-
OCR Base Text
-
NLP Processing
- Indexed/Segmented Text
-
Blog posts
- X-Boorman (I): A Digital Revival
- X-Boorman (II): The Boorman Factory
- X-Boorman (III): Birth, Mobility, and Death
- X-Boorman (IV): Links and relations