Glyce: Glyph-vectors for Chinese Character Representations | Proceedings of the 33rd International Conference on Neural Information Processing Systems


Authors: Yuxian Meng, Wei Wu, Fei Wang, Xiaoya Li, Ping Nie, Fan Yin, Muyu Li, Qinghong Han, Xiaofei Sun, and Jiwei Li

Proceedings of the 33rd International Conference on Neural Information Processing Systems, December 2019

Article No.: 247, Pages 2746–2757

Published: 08 December 2019


    Abstract

It is intuitive that NLP tasks for logographic languages like Chinese should benefit from the glyph information in those languages. However, due to the lack of rich pictographic evidence in glyphs and the weak generalization ability of standard computer vision models on character data, an effective way to utilize glyph information has yet to be found.

In this paper, we address this gap by presenting Glyce, glyph-vectors for Chinese character representations. We make three major innovations: (1) we use historical Chinese scripts (e.g., bronzeware script, seal script, traditional Chinese) to enrich the pictographic evidence in characters; (2) we design a CNN structure (called tianzege-CNN) tailored to Chinese character image processing; and (3) we use image classification as an auxiliary task in a multi-task learning setup to increase the model's ability to generalize.
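The multi-task setup in innovation (3) can be sketched as a weighted combination of the downstream-task loss and the auxiliary image-classification loss, with the auxiliary weight decaying over training so that glyph classification mainly regularizes the early epochs. This is a minimal illustrative sketch: the function name, decay schedule, and hyper-parameter values below are assumptions for exposition, not the paper's exact settings.

```python
def combined_loss(task_loss, img_cls_loss, epoch, lam0=0.1, decay=0.8):
    """Blend the downstream-task loss with the auxiliary image-classification
    loss. The auxiliary weight `lam` decays exponentially with the epoch, so
    early training leans on glyph classification and later training on the
    downstream task. `lam0` and `decay` are hypothetical illustrative values.
    """
    lam = lam0 * (decay ** epoch)
    return (1 - lam) * task_loss + lam * img_cls_loss

# At epoch 0 the auxiliary task contributes its full initial weight lam0;
# combined_loss(1.0, 2.0, 0) is approximately 0.9 * 1.0 + 0.1 * 2.0 = 1.1.
```

As training proceeds, `lam` shrinks toward zero and the combined loss converges to the pure task loss, which matches the intuition that the auxiliary objective is a generalization aid rather than the end goal.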

We show that glyph-based models consistently outperform word/char ID-based models on a wide range of Chinese NLP tasks. We set new state-of-the-art results for a variety of Chinese NLP tasks, including tagging (NER, CWS, POS), sentence-pair classification, single-sentence classification, dependency parsing, and semantic role labeling. For example, the proposed model achieves an F1 score of 80.6 on the OntoNotes NER dataset, +1.5 over BERT; it achieves nearly perfect accuracy (99.8%) on the Fudan corpus for text classification.


Cited By

• Wang J., Pan G., Sun D., Zhang J., Shen H., Zhuang Y., et al. (2021). Chinese Character Inpainting with Contextual Semantic Constraints. In Proceedings of the 29th ACM International Conference on Multimedia, pages 1829–1837. https://dl.acm.org/doi/10.1145/3474085.3475333

• Wang X., Xiong Y., Niu H., Yue J., Zhu Y., Yu P., et al. (2021). Improving Chinese Character Representation with Formation Graph Attention Network. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 1999–2009. https://dl.acm.org/doi/10.1145/3459637.3482265

Index Terms

• Computing methodologies
  • Artificial intelligence
    • Computer vision
    • Natural language processing
  • Machine learning
    • Learning paradigms
      • Supervised learning
    • Machine learning approaches
      • Neural networks

Index terms have been assigned to the content through auto-classification.


Information

Published In

NIPS'19: Proceedings of the 33rd International Conference on Neural Information Processing Systems, December 2019, 15947 pages

Editors: Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, and Emily B. Fox

Copyright © 2019 Neural Information Processing Systems Foundation, Inc.

Publisher

Curran Associates Inc., Red Hook, NY, United States

Qualifiers

• Chapter
• Research
• Refereed limited


