Abstract
Neural architecture search (NAS) has achieved initial results in natural language processing (NLP), but the search space of most NAS methods is built on the simplest recurrent cell and therefore does not account for the modeling of long sequences. Long-range information tends to fade gradually as the input sequence grows, which degrades model performance. In this paper, we present a dual-cell approach to search for a better-performing network architecture. We construct a search space better suited to language modeling tasks by adding an information storage cell inside the search cell, allowing the model to make better use of the long-range information in a sequence and thereby improving its performance. The language model found by our method outperforms the baseline method on the Penn Treebank and WikiText-2 datasets.
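To make the dual-cell idea concrete, the sketch below pairs a placeholder "searched" cell with an LSTM-style storage cell that carries long-range information alongside it. All names (DualCell, SearchedCell) and the specific wiring are illustrative assumptions, not the paper's published implementation; in the actual method the internal operations of the search cell would be chosen by NAS.

```python
# Illustrative sketch only: the class names and wiring below are assumptions,
# not the authors' released code. The searched cell is replaced by a fixed
# gated update purely so the example runs.
import torch
import torch.nn as nn


class SearchedCell(nn.Module):
    """Stand-in for the cell whose internal operations NAS would select."""

    def __init__(self, hidden_size):
        super().__init__()
        self.gate = nn.Linear(2 * hidden_size, hidden_size)
        self.cand = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, x, h):
        z = torch.cat([x, h], dim=-1)
        g = torch.sigmoid(self.gate(z))   # mixing gate
        c = torch.tanh(self.cand(z))      # candidate state
        return g * h + (1.0 - g) * c


class DualCell(nn.Module):
    """Search cell plus an information storage cell (LSTM-style memory)
    that preserves long-range information across time steps."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.embed = nn.Linear(input_size, hidden_size)
        self.search_cell = SearchedCell(hidden_size)
        self.storage_cell = nn.LSTMCell(hidden_size, hidden_size)

    def forward(self, x, state):
        h, (mh, mc) = state
        x = self.embed(x)
        h = self.search_cell(x, h)                # short-range, searched update
        mh, mc = self.storage_cell(h, (mh, mc))   # long-range memory update
        return h + mh, (h, (mh, mc))              # fuse both information paths


if __name__ == "__main__":
    cell = DualCell(input_size=64, hidden_size=128)
    h = torch.zeros(8, 128)
    mh, mc = torch.zeros(8, 128), torch.zeros(8, 128)
    out, state = cell(torch.randn(8, 64), (h, (mh, mc)))
    print(out.shape)  # torch.Size([8, 128])
```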
