Term Extraction via Neural Sequence Labeling: A Comparative Evaluation of Strategies Using Recurrent Neural Networks
Maren Kucza, Jan Niehues, Thomas Zenkel, Alex Waibel and Sebastian Stüker
Abstract:
Traditionally, systems for term extraction use a two-stage approach: first identifying candidate terms, then scoring them in a second step to identify actual terms. Research in this field has therefore mainly focused on refining and improving the scoring of term candidates, which are commonly identified using linguistic and statistical features. Machine learning techniques, and especially neural networks, are currently only used in the second stage, that is, to score and classify candidates. In contrast, we have built a system that identifies terms by directly performing sequence labeling with a BILOU scheme on word sequences. To do so, we worked with different kinds of recurrent neural networks and word embeddings. In this paper we describe how one can build a state-of-the-art term extraction system with this single-stage technique, compare different network types and topologies, and examine the influence of the type of input embedding used for the task. We further investigated which network types and topologies are best suited when applying our term extraction system to domains other than that of the networks' training data.
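The paper does not publish code, but the BILOU scheme it labels word sequences with is standard: each token is tagged as Beginning, Inside, or Last token of a multi-word term, a Unit-length (single-token) term, or Outside any term. A minimal sketch of how term spans map to BILOU tags (the helper name and span convention are assumptions, not from the paper):

```python
def bilou_labels(tokens, term_spans):
    """Assign BILOU labels to a token sequence.

    term_spans: list of (start, end) token indices, end exclusive.
    Hypothetical helper illustrating the labeling scheme only.
    """
    labels = ["O"] * len(tokens)          # default: Outside any term
    for start, end in term_spans:
        if end - start == 1:
            labels[start] = "U"           # Unit: single-token term
        else:
            labels[start] = "B"           # Beginning of a multi-token term
            for i in range(start + 1, end - 1):
                labels[i] = "I"           # Inside tokens
            labels[end - 1] = "L"         # Last token of the term
    return labels


print(bilou_labels(["a", "recurrent", "neural", "network", "model"], [(1, 4)]))
```

With these labels as targets, term extraction reduces to per-token classification, which is what the recurrent networks in the paper are trained to do.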
Cite as: Kucza, M., Niehues, J., Zenkel, T., Waibel, A., Stüker, S. (2018) Term Extraction via Neural Sequence Labeling: A Comparative Evaluation of Strategies Using Recurrent Neural Networks. Proc. Interspeech 2018, 2072-2076, DOI: 10.21437/Interspeech.2018-2017.
BiBTeX Entry:
@inproceedings{Kucza2018,
author={Maren Kucza and Jan Niehues and Thomas Zenkel and Alex Waibel and Sebastian Stüker},
title={Term Extraction via Neural Sequence Labeling: A Comparative Evaluation of Strategies Using Recurrent Neural Networks},
year=2018,
booktitle={Proc. Interspeech 2018},
pages={2072--2076},
doi={10.21437/Interspeech.2018-2017},
url={http://dx.doi.org/10.21437/Interspeech.2018-2017} }