 
 
 
 
 
   
Equation 2 gives the general POS sequence formula, expressed in terms of a window of L tags, with M of these tags before the juncture and L-M tags after it. We can expect longer sequences to be potentially more discriminative, but also more prone to sparse data problems. Table 3 shows results from experiments that varied L and M. These were performed on the 23-tag POS tagset, using smoothing, with both a 1-gram and a 6-gram phrase break model. Under both phrase break models, the L=3, M=2 condition outperforms the others.
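To make the windowing concrete, the following is a minimal sketch of how the L-tag context around a juncture might be extracted, with M tags before the juncture and L-M after. The function name `pos_window`, the padding symbol `<pad>`, the example tags, and the convention that juncture j sits between tag j and tag j+1 are all assumptions made for illustration; they are not taken from the experiments described here.

```python
from typing import List, Tuple

PAD = "<pad>"  # hypothetical padding symbol for positions outside the sentence


def pos_window(tags: List[str], juncture: int, L: int = 3, M: int = 2) -> Tuple[str, ...]:
    """Return the L tags around a juncture: M before it and L - M after it.

    Assumed convention: juncture `juncture` lies between tags[juncture]
    and tags[juncture + 1] (0-based indexing).
    """
    if not 0 <= M <= L:
        raise ValueError("need 0 <= M <= L")
    start = juncture - M + 1  # index of the first tag in the window
    window = []
    for i in range(start, start + L):
        # Pad when the window runs past either end of the sentence.
        window.append(tags[i] if 0 <= i < len(tags) else PAD)
    return tuple(window)


# Illustrative tag sequence (made-up tags, not the 23-tag tagset).
tags = ["det", "noun", "verb", "det", "noun"]
windows = [pos_window(tags, j) for j in range(len(tags))]
```

With the default L=3, M=2 setting, each juncture is described by the two tags preceding it and the one tag following it. Larger values of L yield a larger set of distinct windows, which is one way to see why longer sequences are more exposed to sparse data.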
 
 
 
 
