author Santhosh Thottingal <santhosh.thottingal@gmail.com> 2009-04-16 20:51:39 +0530
committer Santhosh Thottingal <santhosh.thottingal@gmail.com> 2009-04-16 20:51:39 +0530
commit b4c9aab679ee466431a64688226ed870380d5b29
tree 1c4755464f4de3ef50b164811309bf7d46cf57d2 /silpa
parent 712efc3d8159aa22d8c51f8266c1a6813a4f9dba
Ngram model algorithm notes
Diffstat (limited to 'silpa')
-rw-r--r-- silpa/modules/ngram/algorithm 23
1 files changed, 23 insertions, 0 deletions
diff --git a/silpa/modules/ngram/algorithm b/silpa/modules/ngram/algorithm
new file mode 100644
index 0000000..495b85a
--- /dev/null
+++ b/silpa/modules/ngram/algorithm
@@ -0,0 +1,23 @@
+We have a TREE data structure. Each node in the tree is an instance of NgramNode.
+Each NgramNode object contains the string value of the node and a Rank.
+Rank is a counter incremented for each occurrence of the corresponding string in the training corpus.
+
+NgramNode is the superclass of SyllableNgramNode and WordNgramNode.
+That means each node in the tree can be either a syllable or a word.
+We have only one tree for both words and syllables as of now.
+
+In the tree, the root node is an empty node with label *. That indicates that all its children, either syllables or words,
+are the start of a word or a sentence, respectively.
+
+Child of a node meaning:
+Y is a child of X means Y can follow immediately after the occurrence of X in the text, where X and Y are either syllables or words (only one time in a tree route)
+X can have any number of children.
+The probability that a node in the list of children occurs in a given context is controlled by Rank(node).
+Rank is simply an integer value incremented based on frequency of occurrence.
+The higher the rank, the higher the probability that the node can follow immediately after X.
+
+Persistence of the populated tree is achieved by pickling the entire tree structure.
+
+Tree operations:
+a) Adding a syllable-ngram, n=2
+
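Reviewer sketch (not part of the commit): the notes above stop before describing the tree operations, so the following Python is only one possible reading of them. The class names NgramNode, SyllableNgramNode and WordNgramNode come from the notes; the attribute names, the helpers add_ngram, follow_probability, save_tree and load_tree, and the choice to turn ranks into probabilities by dividing by the sum of sibling ranks are all assumptions made for illustration.

```python
import pickle


class NgramNode:
    """A tree node: a string value, a rank, and the nodes that may follow it."""

    def __init__(self, value):
        self.value = value
        self.rank = 0        # incremented each time this node is seen on a path
        self.children = {}   # maps a string value to the child node holding it


class SyllableNgramNode(NgramNode):
    """Node whose value is a syllable."""


class WordNgramNode(NgramNode):
    """Node whose value is a word."""


def add_ngram(root, units, node_cls=NgramNode):
    """Add one n-gram as a path below root, bumping the rank of every node on it.

    Assumption: a bigram (X, Y) is stored as root -> X -> Y.
    """
    node = root
    for unit in units:
        child = node.children.get(unit)
        if child is None:
            child = node_cls(unit)
            node.children[unit] = child
        child.rank += 1
        node = child
    return node


def follow_probability(node, value):
    """Estimate how likely `value` is to follow `node`.

    Assumption: the rank of the child divided by the total rank of all children.
    """
    total = sum(child.rank for child in node.children.values())
    if total == 0 or value not in node.children:
        return 0.0
    return node.children[value].rank / total


def save_tree(root, path):
    """Pickle the whole tree to a file (hypothetical helper)."""
    with open(path, "wb") as handle:
        pickle.dump(root, handle)


def load_tree(path):
    """Load a tree written by save_tree. Only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Build a tiny syllable tree with n=2; the root carries the label "*".
root = NgramNode("*")
add_ngram(root, ["ma", "la"], SyllableNgramNode)
add_ngram(root, ["ma", "la"], SyllableNgramNode)
add_ngram(root, ["ma", "ra"], SyllableNgramNode)
```

With this training data, follow_probability(root.children["ma"], "la") is twice follow_probability(root.children["ma"], "ra"), which matches the rule that a higher rank means a higher chance of following X. Keeping children in a dict keyed by value makes lookup during insertion cheap; a sorted list would suit ranked suggestions better.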