Diffstat (limited to 'docs/fileformat')
-rw-r--r-- docs/fileformat | 9
1 files changed, 9 insertions, 0 deletions
diff --git a/docs/fileformat b/docs/fileformat
new file mode 100644
index 0000000..d0945b9
--- /dev/null
+++ b/docs/fileformat
@@ -0,0 +1,9 @@
+The file format of libpinyin
+
+Input file format
+1. Index Files
+ * The raw corpus is classified into index files at /index/<category>/<subsection>/<items>.index
+ * Every line consists of <item>#<item path name>
+2. Content Files
+ * The content file is stored in <item path name>, such as <number>.text.
+ * Note: please add a prefix to the <item path name>, so the content files are easier to organize.
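
The index line layout described above (<item>#<item path name>) can be read with a few lines of code. The following is only a minimal sketch, not part of libpinyin: the function name parse_index_lines and the sample entries are hypothetical, and it assumes that the first '#' on a line is the separator and that blank lines carry no entry.

```python
from typing import Iterable, Iterator, Tuple


def parse_index_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (item, item_path_name) pairs from the lines of an index file.

    Hypothetical helper: assumes the first '#' separates the two fields.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines carry no entry
        item, sep, path = line.partition("#")
        if not sep:
            raise ValueError(f"missing '#' separator: {line!r}")
        yield item, path


# Hypothetical sample lines; the path names carry a prefix, as the note suggests.
sample = ["news#news_0001.text\n", "\n", "sports#sports_0002.text\n"]
entries = list(parse_index_lines(sample))
```

Each resulting pair gives the item and the path of its content file, so a caller can open the path to read the corresponding text.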