author Peng Wu <alexepico@gmail.com> 2011-07-12 15:36:15 +0800
committer Peng Wu <alexepico@gmail.com> 2011-07-12 15:36:15 +0800
commit b33433f0ef359e705e41d799db760d3d54144142 (patch)
tree 08a3f340d020d46722504e7885cef54f474186c2 /docs
write file format
Diffstat (limited to 'docs')
-rw-r--r-- docs/fileformat | 9
1 files changed, 9 insertions, 0 deletions
diff --git a/docs/fileformat b/docs/fileformat
new file mode 100644
index 0000000..d0945b9
--- /dev/null
+++ b/docs/fileformat
@@ -0,0 +1,9 @@
+The file format of libpinyin
+
+Input file format
+1. Index Files
+ * Raw corpus files are classified into index files at /index/<category>/<subsection>/<items>.index.
+ * Each line of an index file has the form <item>#<item path name>.
+2. Content Files
+ * Each content file is stored at its <item path name>, for example <number>.text.
+ * Note: add a prefix to the <item path name> so that the content files are easier to organize.
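A minimal Python sketch of how a tool might read the input layout described above. It assumes that the /index/... path is relative to a corpus root directory, that index files are UTF-8, and that <item> never contains '#' (so each line is split at its first '#'). The names IndexEntry, parse_index_line and iter_index are hypothetical and are not part of libpinyin.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class IndexEntry(NamedTuple):
    category: str
    subsection: str
    item: str
    path: str  # <item path name>: where the content file is stored


def parse_index_line(line: str) -> tuple[str, str]:
    """Split one '<item>#<item path name>' line at its first '#'.

    Assumption: <item> itself never contains '#'.
    """
    item, sep, path = line.rstrip("\r\n").partition("#")
    if not sep or not item or not path:
        raise ValueError(f"malformed index line: {line!r}")
    return item, path


def iter_index(root: Path) -> Iterator[IndexEntry]:
    """Yield entries from <root>/index/<category>/<subsection>/*.index.

    Assumption: the index tree lives under a corpus root and is UTF-8.
    """
    for index_file in sorted(root.glob("index/*/*/*.index")):
        subsection = index_file.parent.name
        category = index_file.parent.parent.name
        with index_file.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():  # skip blank lines
                    item, path = parse_index_line(line)
                    yield IndexEntry(category, subsection, item, path)
```

For example, parse_index_line("42#news_42.text") returns ("42", "news_42.text"); the returned path can then be opened to read the content file for that item.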