Develop a Part-of-Speech Tagger and a Tagger-Maker: Algorithms, Implementations, Results, and APIs - Softcover

Han, Jiayun

 
ISBN-13: 9783659376221

Synopsis

This project aims to build an efficient, scalable, portable, and trainable part-of-speech tagger. Using 98% of Penn Treebank-3 as training data, a raw tagger is first built with Bayes’ theorem, a hidden Markov model, and the Viterbi algorithm. A reinforcement machine learning algorithm and contextual transformation rules are then applied to increase the tagger’s accuracy. On the test data, the final tagger reaches 96.51% accuracy and tags about 26,000 words per second on a computer with 2 GB of random access memory and two 3.00 GHz Pentium duo processors. The tagger’s portability and trainability are demonstrated by the tagger-maker, which successfully builds a new tagger from a corpus annotated with a tagset different from that of the Penn Treebank.
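To make the "raw tagger" stage more concrete, here is a minimal sketch of a first-order (bigram) hidden Markov model tagger decoded with the Viterbi algorithm. Bayes’ theorem lets the tagger pick the tag sequence maximizing P(words | tags) P(tags), approximated here as the product of emission probabilities P(w_i | t_i) and transition probabilities P(t_i | t_{i-1}). This is not the book’s implementation: the function names `train_hmm`, `log_prob`, and `viterbi`, the add-one smoothing, the crude unknown-word handling, and the three-sentence toy corpus are all assumptions made for illustration, and the reinforcement-learning and transformation-rule stages are not shown.

```python
import math
from collections import defaultdict

def train_hmm(tagged_sentences):
    """Count tag-bigram transitions and tag-word emissions (hypothetical helper)."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    tags = set()
    for sent in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            tags.add(tag)
            prev = tag
    return trans, emit, sorted(tags)

def log_prob(counts, key, size):
    """Add-one smoothed log probability of `key` given one row of counts (assumed smoothing)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + size))

def viterbi(words, trans, emit, tags):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    if not words:
        return []
    vocab = {w for t in emit for w in emit[t]}
    v_size = len(vocab) + 1  # +1 reserves probability mass for unseen words
    n_tags = len(tags)
    # best[i][t]: best log score of any tag path ending in tag t at position i
    best, back = [{}], [{}]
    first = words[0].lower()
    for t in tags:
        best[0][t] = log_prob(trans["<s>"], t, n_tags) + log_prob(emit[t], first, v_size)
        back[0][t] = None
    for i in range(1, len(words)):
        w = words[i].lower()
        best.append({})
        back.append({})
        for t in tags:
            # Choose the predecessor tag that maximizes score so far + transition.
            prev_tag, score = max(
                ((p, best[i - 1][p] + log_prob(trans[p], t, n_tags)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_prob(emit[t], w, v_size)
            back[i][t] = prev_tag
    # Follow back-pointers from the best final tag.
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Hypothetical toy corpus using Penn Treebank-style tags.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]
trans, emit, tags = train_hmm(corpus)
print(viterbi(["The", "cat", "barks"], trans, emit, tags))  # ['DT', 'NN', 'VBZ']
```

Working in log space avoids numerical underflow on long sentences, and the back-pointer table keeps decoding linear in sentence length and quadratic in the number of tags. An unseen word such as "bird" in "the bird sleeps" still receives `NN`, because the transition probabilities dominate when every tag assigns it the same smoothed emission probability.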


About the Author

Jiayun Han obtained his PhD in Linguistics and an MS in Artificial Intelligence from the University of Georgia, U.S.A. He worked for North Side Inc. as a natural language processing engineer and is currently employed by Manwin Canada as a software developer.
