author | Peng Wu <alexepico@gmail.com> | 2011-07-12 15:36:15 +0800 |
---|---|---|
committer | Peng Wu <alexepico@gmail.com> | 2011-07-12 15:36:15 +0800 |
commit | b33433f0ef359e705e41d799db760d3d54144142 (patch) | |
tree | 08a3f340d020d46722504e7885cef54f474186c2 /docs | |
write file format
Diffstat (limited to 'docs')
-rw-r--r-- | docs/fileformat | 9 |
1 files changed, 9 insertions, 0 deletions
diff --git a/docs/fileformat b/docs/fileformat
new file mode 100644
index 0000000..d0945b9
--- /dev/null
+++ b/docs/fileformat
@@ -0,0 +1,9 @@
+The file format of libpinyin
+
+Input file format
+1. Index Files
+ * raw corpus are classified into /index/<category>/<subsection>/<items>.index
+ * Every line consists of <item>#<item path name>
+2. Content Files
+ * The content file is stored in <item path name>, such as <number>.text.
+ * Note: please add a prefix to the <item path name>, so the content files are easier to organize.
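
The index line layout added in docs/fileformat (`<item>#<item path name>`) could be read along the lines of the sketch below. This is not code from the trainer repository. The function name `parse_index_lines` is hypothetical, and the handling of blank lines, of a `#` inside the path, and of malformed lines are assumptions that the commit does not specify.

```python
from typing import Iterable, Iterator, Tuple


def parse_index_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (item, item_path) pairs from lines shaped like <item>#<item path name>.

    Hypothetical helper: docs/fileformat only states the line layout.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Assumption: blank lines carry no entry and are skipped.
            continue
        # Assumption: the first '#' separates the item from its path,
        # so any later '#' stays part of the path.
        item, sep, path = line.partition("#")
        if not sep or not item or not path:
            # Assumption: a line without both fields is an error.
            raise ValueError(f"malformed index line: {line!r}")
        yield item, path
```

For example, `list(parse_index_lines(["news#text/1.text\n"]))` gives a single `("news", "text/1.text")` pair. The `text/` prefix here only illustrates the note about prefixing the item path name and is not a path taken from the repository.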