From
Revaluation Books, Exeter, United Kingdom
Seller rating 5 out of 5 stars
AbeBooks Seller since 6 January 2003
68 pages. 8.66x5.91x0.16 inches. In Stock. Seller Inventory # __3659376221
This project aims to build an efficient, scalable, portable, and trainable part-of-speech tagger. Using 98% of Penn Treebank-3 as training data, it first builds a raw tagger based on Bayes’ theorem, a hidden Markov model, and the Viterbi algorithm. A reinforcement machine learning algorithm and contextual transformation rules are then applied to increase the tagger’s accuracy. The tagger’s final accuracy on the test data is 96.51%, and its speed is about 26,000 words per second on a computer with two gigabytes of random-access memory and two 3.00 GHz Pentium duo processors. The tagger’s portability and trainability are demonstrated by the tagger-maker’s success in building a new tagger from a corpus annotated with a tagset different from that of Penn Treebank.
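The description above names a hidden Markov model decoded with the Viterbi algorithm. As a rough illustration of that general technique only, and not the book's actual implementation, the following Python sketch decodes a tag sequence for a bigram HMM. The function name `viterbi`, the three-tag toy model, all probability values, and the `floor` constant used for unseen events are assumptions made up for this example.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    `floor` is a hypothetical stand-in for smoothing: any unseen
    start, transition, or emission probability is replaced by it.
    """
    if not words:
        return []

    def lp(p):
        # Work in log space to avoid underflow on long sentences.
        return math.log(max(p, floor))

    # best[i][t]: log-probability of the best path ending in tag t at word i
    best = [{t: lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0))
             for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy model: every number below is invented for illustration.
TAGS = ["DT", "NN", "VB"]
START_P = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
TRANS_P = {
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1},
}
EMIT_P = {
    "DT": {"the": 0.9},
    "NN": {"dog": 0.5, "barks": 0.1},
    "VB": {"dog": 0.05, "barks": 0.6},
}

print(viterbi(["the", "dog", "barks"], TAGS, START_P, TRANS_P, EMIT_P))
# prints ['DT', 'NN', 'VB']
```

In a trained tagger the three probability tables would be estimated from an annotated corpus such as Penn Treebank rather than written by hand.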
About the Author: Jiayun Han obtained his PhD in Linguistics and MS in Artificial Intelligence from the University of Georgia, U.S.A. He previously worked for North Side Inc. as a natural language processing engineer and is currently employed by Manwin Canada as a software developer.
Title: Develop A Part-Of-Speech Tagger And A ...
Publisher: Lap Lambert Academic Publishing
Publication Date: 2013
Binding: Paperback
Condition: Brand New