Diffstat (limited to 'docs/fileformat')
| -rw-r--r-- | docs/fileformat | 9 |
1 files changed, 9 insertions, 0 deletions
diff --git a/docs/fileformat b/docs/fileformat
new file mode 100644
index 0000000..d0945b9
--- /dev/null
+++ b/docs/fileformat
@@ -0,0 +1,9 @@
+The file format of libpinyin
+
+Input file format
+1. Index Files
+ * raw corpus are classified into /index/<category>/<subsection>/<items>.index
+ * Every line consists of <item>#<item path name>
+2. Content Files
+ * The content file is stored in <item path name>, such as <number>.text.
+ * Note: please add a prefix to the <item path name>, so the content files are easier to organize.
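
The index line layout described in the added file (<item>#<item path name>) could be read with a small parser along these lines. This is only a sketch and not part of libpinyin: the function names parse_index_line and read_index are hypothetical, UTF-8 encoding is assumed, and splitting at the first '#' is an assumption because the format description does not say how a '#' inside a field is handled.

```python
from pathlib import Path
from typing import Iterator, Tuple


def parse_index_line(line: str) -> Tuple[str, str]:
    """Split one index line of the form <item>#<item path name>."""
    text = line.rstrip("\r\n")
    # Assumption: the first '#' is the separator; the format description
    # does not say how a '#' inside either field would be handled.
    item, sep, path_name = text.partition("#")
    if not sep or not item or not path_name:
        raise ValueError(f"malformed index line: {line!r}")
    return item, path_name


def read_index(index_file: Path) -> Iterator[Tuple[str, str]]:
    """Yield (item, item path name) pairs from an index file, skipping blank lines."""
    # Assumption: index files are UTF-8 text; the source does not state an encoding.
    with index_file.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_index_line(line)
```

For example, parse_index_line("1#news_1.text") returns ("1", "news_1.text"), where "news_" stands in for the prefix that the note recommends adding to the item path name.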
