
Open Access

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜

Authors:

Emily M. Bender (University of Washington, Seattle, WA, USA)

Timnit Gebru (Black in AI, Palo Alto, CA, USA)

Angelina McMillan-Major (University of Washington, Seattle, WA, USA)

Shmargaret Shmitchell (The Aether)

FAccT '21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, March 2021, Pages 610–623. https://doi.org/10.1145/3442188.3445922

Published: 01 March 2021


ABSTRACT

The past 3 years of work in NLP have been characterized by the development and deployment of ever larger language models, especially for English. BERT, its variants, GPT-2/3, and others, most recently Switch-C, have pushed the boundaries of the possible both through architectural innovations and through sheer size. Using these pretrained models and the methodology of fine-tuning them for specific tasks, researchers have extended the state of the art on a wide array of tasks as measured by leaderboards on specific benchmarks for English. In this paper, we take a step back and ask: How big is too big? What are the possible risks associated with this technology and what paths are available for mitigating those risks? We provide recommendations including weighing the environmental and financial costs first, investing resources into curating and carefully documenting datasets rather than ingesting everything on the web, carrying out pre-development exercises evaluating how the planned approach fits into research and development goals and supports stakeholder values, and encouraging research directions beyond ever larger language models.

References

Hussein M Adam, Robert D Bullard, and Elizabeth Bell. 2001. Faces of environmental racism: Confronting issues of global justice. Rowman & Littlefield.Google Scholar
Chris Alberti, Kenton Lee, and Michael Collins. 2019. A BERT Baseline for the Natural Questions. arXiv:1901.08634 [cs.CL]Google Scholar
Larry Alexander. 1992. What makes wrongful discrimination wrong? Biases, preferences, stereotypes, and proxies. University of Pennsylvania Law Review 141, 1 (1992), 149--219.Google ScholarCross Ref
American Psychological Association. 2019. Discrimination: What it is, and how to cope. https://www.apa.org/topics/discrimination (2019).Google Scholar
Dario Amodei and Daniel Hernandez. 2018. AI and Compute. https://openai. com/blog/ai-and-compute/Google Scholar
David Anthoff, Robert J Nicholls, and Richard SJ Tol. 2010. The economic impact of substantial sea-level rise. Mitigation and Adaptation Strategies for Global Change 15, 4 (2010), 321--335.Google ScholarCross Ref
Mikhail J Atallah, Victor Raskin, Christian F Hempelmann, Mercan Karahan, Radu Sion, Umut Topkara, and Katrina E Triezenberg. 2002. Natural Language Watermarking and Tamperproofing. In International Workshop on Information Hiding. Springer, 196--212.Google Scholar
Alexei Baevski and Abdelrahman Mohamed. 2020. Effectiveness of Self-Supervised Pre-Training for ASR. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 7694--7698.Google Scholar
Michael Barera. 2020. Mind the Gap: Addressing Structural Equity and Inclusion on Wikipedia. (2020). Accessible at http://hdl.handle.net/10106/29572.Google Scholar
Russel Barsh. 1990. Indigenous peoples, racism and the environment. Meanjin 49, 4 (1990), 723.Google Scholar
Christine Basta, Marta R Costa-jussà, and Noe Casas. 2019. Evaluating the Underlying Gender Bias in Contextualized Word Embeddings. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing. 33--39.Google ScholarCross Ref
Iz Beltagy, Kyle Lo, and Arman Cohan. 2019. SciBERT: A Pretrained Language Model for Scientific Text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 3615--3620. https://doi.org/10.18653/v1/D19-1371Google Scholar
Emily M. Bender and Batya Friedman. 2018. Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics 6 (2018), 587--604.Google ScholarCross Ref
Emily M. Bender and Alexander Koller. 2020. Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 5185--5198. https://doi.org/10.18653/v1/2020.acl-main.463Google Scholar
Ruha Benjamin. 2019. Race After Technology: Abolitionist Tools for the New Jim Code. Polity Press, Cambridge, UK.Google Scholar
Elettra Bietti and Roxana Vatanparast. 2020. Data Waste. Harvard International Law Journal 61 (2020).Google Scholar
Steven Bird. 2016. Social Mobile Technologies for Reconnecting Indigenous and Immigrant Communities.. In People.Policy.Place Seminar. Northern Institute, Charles Darwin University, Darwin, Australia. https://www.cdu.edu.au/sites/default/files/the-northern-institute/ppp-bird-20160128-4up.pdfGoogle Scholar
Abeba Birhane and Vinay Uday Prabhu. 2021. Large Image Datasets: A Pyrrhic Win for Computer Vision?. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 1537--1547.Google ScholarCross Ref
Su Lin Blodgett, Solon Barocas, Hal Daumé III, and Hanna Wallach. 2020. Language (Technology) is Power: A Critical Survey of "Bias" in NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 5454--5476. https://doi.org/10.18653/v1/2020.acl-main.485Google ScholarCross Ref
Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. 2007. Large Language Models in Machine Translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). Association for Computational Linguistics, Prague, Czech Republic, 858--867. https://www.aclweb.org/anthology/D07-1090Google Scholar
Ronan Le Bras, Swabha Swayamdipta, Chandra Bhagavatula, Rowan Zellers, Matthew E Peters, Ashish Sabharwal, and Yejin Choi. 2020. Adversarial Filters of Dataset Biases. In Proceedings of the 37th International Conference on Machine Learning.Google Scholar
Luke Breitfeller, Emily Ahn, David Jurgens, and Yulia Tsvetkov. 2019. Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 1664--1674. https://doi.org/10.18653/v1/D19-1176Google Scholar
Susan E Brennan and Herbert H Clark. 1996. Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory, and Cognition 22, 6 (1996), 1482.Google ScholarCross Ref
Robin Brewer and Anne Marie Piper. 2016. "Tell It Like It Really Is" A Case of Online Content Creation and Sharing Among Older Adult Bloggers. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. 5529--5542.Google ScholarDigital Library
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.). https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.htmlGoogle Scholar
Cristian Buciluă, Rich Caruana, and Alexandru Niculescu-Mizil. 2006. Model Compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Philadelphia, PA, USA) (KDD '06). Association for Computing Machinery, New York, NY, USA, 535--541. https://doi.org/10.1145/1150402.1150464Google ScholarDigital Library
Robert D Bullard. 1993. Confronting environmental racism: Voices from the grassroots. South End Press.Google Scholar
Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, and Colin Raffel. 2020. Extracting Training Data from Large Language Models. arXiv:2012.07805 [cs.CR]Google Scholar
Herbert H. Clark. 1996. Using Language. Cambridge University Press, Cambridge.Google Scholar
Herbert H. Clark and Adrian Bangerter. 2004. Changing ideas about reference. In Experimental Pragmatics. Springer, 25--49.Google Scholar
Herbert H. Clark and Meredyth A Krych. 2004. Speaking while monitoring addressees for understanding. Journal of Memory and Language 50, 1 (2004), 62--81.Google ScholarCross Ref
Herbert H. Clark, Robert Schreuder, and Samuel Buttrick. 1983. Common ground at the understanding of demonstrative reference. Journal of Verbal Learning and Verbal Behavior 22, 2 (1983), 245--258. https://doi.org/10.1016/S0022-5371(83)90189-5Google ScholarCross Ref
Herbert H. Clark and Deanna Wilkes-Gibbs. 1986. Referring as a collaborative process. Cognition 22, 1 (1986), 1--39. https://doi.org/10.1016/0010-0277(86) 90010-7Google ScholarCross Ref
Kimberlé Crenshaw. 1989. Demarginalizing the intersection of race and sex: A Black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics. The University of Chicago Legal Forum (1989), 139.Google Scholar
Benjamin Dangl. 2019. The Five Hundred Year Rebellion: Indigenous Movements and the Decolonization of History in Bolivia. AK Press.Google Scholar
Christian Davenport. 2009. Media bias, perspective, and state repression: The Black Panther Party. Cambridge University Press.Google Scholar
Ferdinand de Saussure. 1959. Course in General Linguistics. The Philosophical Society, New York. Translated by Wade Baskin.Google ScholarDigital Library
Terrance de Vries, Ishan Misra, Changhan Wang, and Laurens van der Maaten. 2019. Does object recognition work for everyone?. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 52--59.Google Scholar
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171--4186. https://doi.org/10.18653/v1/N19-1423Google Scholar
Maeve Duggan. 2017. Online Harassment 2017. Pew Research Center.Google Scholar
Jennifer Earl, Andrew Martin, John D. McCarthy, and Sarah A. Soule. 2004. The use of newspaper data in the study of collective action. Annual Review of Sociology 30 (2004), 65--80.Google ScholarCross Ref
Ethan Fast, Tina Vachovsky, and Michael Bernstein. 2016. Shirtless and Dangerous: Quantifying Linguistic Signals of Gender Bias in an Online Fiction Writing Community. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 10.Google Scholar
William Fedus, Barret Zoph, and Noam Shazeer. 2021. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. arXiv:2101.03961 [cs.LG]Google Scholar
Anjalie Field, Doron Kliger, Shuly Wintner, Jennifer Pan, Dan Jurafsky, and Yulia Tsvetkov. 2018. Framing and Agenda-setting in Russian News: a Computational Analysis of Intricate Political Strategies. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 3570--3580. https://doi.org/10.18653/v1/D18-1393Google ScholarCross Ref
Darja Fišer, Ruihong Huang, Vinodkumar Prabhakaran, Rob Voigt, Zeerak Waseem, and Jacqueline Wernimont (Eds.). 2018. Proceedings of the 2nd Workshop on Abusive Language Online (ALW2). Association for Computational Linguistics, Brussels, Belgium. https://www.aclweb.org/anthology/W18-5100Google Scholar
Susan T Fiske. 2017. Prejudices in cultural contexts: shared stereotypes (gender, age) versus variable stereotypes (race, ethnicity, religion). Perspectives on psychological science 12, 5 (2017), 791--799.Google Scholar
Antigoni Founta, Constantinos Djouvas, Despoina Chatzakou, Ilias Leontiadis, Jeremy Blackburn, Gianluca Stringhini, Athena Vakali, Michael Sirivianos, and Nicolas Kourtellis. 2018. Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 12.Google ScholarCross Ref
Batya Friedman and David Hendry. 2012. The Envisioning Cards: A Toolkit for Catalyzing Humanistic and Technical Imaginations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Austin, Texas, USA) (CHI '12). Association for Computing Machinery, New York, NY, USA, 1145--1148. https://doi.org/10.1145/2207676.2208562Google ScholarDigital Library
Batya Friedman and David G. Hendry. 2019. Value Sensitive Design: Shaping Technology with Moral Imagination. MIT Press.Google Scholar
Batya Friedman, Peter H. Kahn, Jr., and Alan Borning. 2006. Value sensitive design and information systems. In Human-Computer Interaction in Management Information Systems: Foundations, P Zhang and D Galletta (Eds.). M. E. Sharpe, Armonk NY, 348--372.Google Scholar
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, and Connor Leahy. 2020. The Pile: An 800GB Dataset of Diverse Text for Language Modeling. arXiv:2101.00027 [cs.CL]Google Scholar
Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2020. Datasheets for Datasets. arXiv:1803.09010 [cs.DB]Google Scholar
Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, and Noah A. Smith. 2020. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 3356--3369. https://doi.org/10.18653/v1/2020.findings-emnlp.301Google Scholar
Wei Guo and Aylin Caliskan. 2020. Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases. arXiv preprint arXiv:2006.03955 (2020).Google Scholar
Melissa Hart. 2004. Subjective decisionmaking and unconscious discrimination. Alabama Law Review 56 (2004), 741.Google Scholar
Deborah Hellman. 2008. When is Discrimination Wrong? Harvard University Press.Google Scholar
Peter Henderson, Jieru Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, and Joelle Pineau. 2020. Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning. Journal of Machine Learning Research 21, 248 (2020), 1--43. http://jmlr.org/papers/v21/20-312.htmlGoogle Scholar
Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).Google Scholar
Chao-Wei Huang and Yun-Nung Chen. 2019. Adapting Pretrained Transformer to Lattices for Spoken Language Understanding. In Proceedings of 2019 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2019). Sentosa, Singapore, 845--852.Google ScholarCross Ref
Hongzhao Huang and Fuchun Peng. 2019. An Empirical Study of Efficient ASR Rescoring with Transformers. arXiv:1910.11450 [cs.CL]Google Scholar
Ben Hutchinson, Vinodkumar Prabhakaran, Emily Denton, Kellie Webster, Yu Zhong, and Stephen Denuyl. 2020. Social Biases in NLP Models as Barriers for Persons with Disabilities. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 5491--5501. https://doi.org/10.18653/v1/2020.acl-main.487Google ScholarCross Ref
Eun Seo Jo and Timnit Gebru. 2020. Lessons from archives: strategies for collecting sociocultural data in machine learning. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 306--316.Google ScholarDigital Library
Leslie Kay Jones. 2020. #BlackLivesMatter: An Analysis of the Movement as Social Drama. Humanity & Society 44, 1 (2020), 92--110.Google ScholarCross Ref
Leslie Kay Jones. 2020. Twitter wants you to know that you're still SOL if you get a death threat --- unless you're President Donald Trump. (2020). https://medium.com/@agua.carbonica/twitter-wants-you-to-know-that-youre-still-sol-if-you-get-a-death-threat-unless-you-re-a5cce316b706.Google Scholar
Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Bali, and Monojit Choudhury. 2020. The State and Fate of Linguistic Diversity and Inclusion in the NLP World. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 6282--6293. https://doi.org/10.18653/v1/2020.acl-main.560Google ScholarCross Ref
Nurul Shamimi Kamaruddin, Amirrudin Kamsin, Lip Yee Por, and Hameedur Rahman. 2018. A Review of Text Watermarking: Theory, Methods, and Applications. IEEE Access 6 (2018), 8011--8028. https://doi.org/10.1109/ACCESS.2018. 2796585Google ScholarCross Ref
Brendan Kennedy, Drew Kogon, Kris Coombs, Joseph Hoover, Christina Park, Gwenyth Portillo-Wightman, Aida Mostafazadeh Davani, Mohammad Atari, and Morteza Dehghani. 2018. A typology and coding manual for the study of hate-based rhetoric. PsyArXiv. July 18 (2018).Google Scholar
Gary Klein. 2007. Performing a project premortem. Harvard business review 85, 9 (2007), 18--19.Google Scholar
Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W Black, and Yulia Tsvetkov. 2019. Measuring Bias in Contextualized Word Representations. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing. 166--172.Google ScholarCross Ref
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. arXiv preprint arXiv:1909.11942 (2019).Google Scholar
Amanda Lazar, Mark Diaz, Robin Brewer, Chelsea Kim, and Anne Marie Piper. 2017. Going gray, failure to hire, and the ick factor: Analyzing how older bloggers talk about ageism. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. 655--668.Google ScholarDigital Library
Christopher A Le Dantec, Erika Shehan Poole, and Susan P Wyche. 2009. Values as lived experience: evolving value sensitive design in support of value discovery. In Proceedings of the SIGCHI conference on human factors in computing systems. 1141--1150.Google ScholarDigital Library
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Zhifeng Chen. 2020. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. arXiv:2006.16668 [cs.CL]Google Scholar
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019).Google Scholar
Kadan Lottick, Silvia Susai, Sorelle A. Friedler, and Jonathan P. Wilson. 2019. Energy Usage Reports: Environmental awareness as part of algorithmic accountability. arXiv:1911.08354 [cs.LG]Google Scholar
Mette Edith Lundsfryd. 2017. Speaking Back to a World of Checkpoints: Oral History as a Decolonizing Tool in the Study of Palestinian Refugees from Syria in Lebanon. Middle East Journal of Refugee Studies 2, 1 (2017), 73--95.Google ScholarCross Ref
Marianna Martindale and Marine Carpuat. 2018. Fluency Over Adequacy: A Pilot Study in Measuring User Trust in Imperfect MT. In Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track). Association for Machine Translation in the Americas, Boston, MA, 13--25. https://www.aclweb.org/anthology/W18-1803Google Scholar
Sally McConnell-Ginet. 1984. The Origins of Sexist Language in Discourse. Annals of the New York Academy of Sciences 433, 1 (1984), 123--135.Google ScholarCross Ref
Sally McConnell-Ginet. 2020. Words Matter: Meaning and Power. Cambridge University Press.Google ScholarCross Ref
Kris McGuffie and Alex Newhouse. 2020. The Radicalization Risks of GPT-3 and Advanced Neural Language Models. Technical Report. Center on Terrorism, Extremism, and Counterterrorism, Middlebury Institute of International Studies at Monterrey. https://www.middlebury.edu/institute/sites/www.middlebury.edu.institute/files/2020-09/gpt3-article.pdf.Google Scholar
Douglas M McLeod. 2007. News coverage and social protest: How the media's protect paradigm exacerbates social conflict. Journal of Dispute Resolution (2007), 185.Google Scholar
Oren Melamud, Jacob Goldberger, and Ido Dagan. 2016. context2vec: Learning Generic Context Embedding with Bidirectional LSTM. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning. Association for Computational Linguistics, Berlin, Germany, 51--61. https://doi.org/10. 18653/v1/K16-1006Google ScholarCross Ref
Julia Mendelsohn, Yulia Tsvetkov, and Dan Jurafsky. 2020. A Framework for the Computational Linguistic Analysis of Dehumanization. Frontiers in Artificial Intelligence 3 (2020), 55. https://doi.org/10.3389/frai.2020.00055Google ScholarCross Ref
Kaitlynn Mendes, Jessica Ringrose, and Jessalynn Keller. 2018. # MeToo and the promise and pitfalls of challenging rape culture through digital feminist activism. European Journal of Women's Studies 25, 2 (2018), 236--246.Google ScholarCross Ref
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and Their Compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2 (Lake Tahoe, Nevada) (NIPS'13). Curran Associates Inc., Red Hook, NY, USA, 3111--3119.Google ScholarDigital Library
Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. Model cards for model reporting. In Proceedings of the conference on fairness, accountability, and transparency. 220--229.Google ScholarDigital Library
Robert C. Moore and William Lewis. 2010. Intelligent Selection of Language Model Training Data. In Proceedings of the ACL 2010 Conference Short Papers. Association for Computational Linguistics, Uppsala, Sweden, 220--224. https://www.aclweb.org/anthology/P10-2041Google Scholar
Kevin L. Nadal. 2018. Microaggressions and Traumatic Stress: Theory, Research, and Clinical Treatment. American Psychological Association. https://books.google.com/books?id=ogzhswEACAAJGoogle Scholar
Clifford Nass, Jonathan Steuer, and Ellen R Tauber. 1994. Computers are social actors. In Proceedings of the SIGCHI conference on Human factors in computing systems. 72--78.Google ScholarDigital Library
Lisa P. Nathan, Predrag V. Klasnja, and Batya Friedman. 2007. Value Scenarios: A Technique for Envisioning Systemic Effects of New Technologies. In CHI'07 Extended Abstracts on Human Factors in Computing Systems. ACM, 2585--2590.Google ScholarDigital Library
Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddeen Muhammad, Salomon Kabongo Kabenamualu, Salomey Osei, Freshia Sackey, Rubungo Andre Niyongabo, Ricky Macharm, Perez Ogayo, Orevaoghene Ahia, Musie Meressa Berhe, Mofetoluwa Adeyemi, Masabata Mokgesi-Selinga, Lawrence Okegbemi, Laura Martinus, Kolawole Tajudeen, Kevin Degila, Kelechi Ogueji, Kathleen Siminyu, Julia Kreutzer, Jason Webster, Jamiil Toure Ali, Jade Abbott, Iroro Orife, Ignatius Ezeani, Idris Abdulkadir Dangana, Herman Kamper, Hady Elsahar, Goodness Duru, Ghollah Kioko, Murhabazi Espoir, Elan van Biljon, Daniel Whitenack, Christopher Onyefuluchi, Chris Chinenye Emezue, Bonaventure F. P. Dossou, Blessing Sibanda, Blessing Bassey, Ayodele Olabiyi, Arshath Ramkilowan, Alp Öktem, Adewale Akinfaderin, and Abdallah Bashir. 2020. Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 2144--2160. https://doi.org/10.18653/v1/2020.findings-emnlp.195Google Scholar
Maggie Nelson. 2015. The Argonauts. Graywolf Press, Minneapolis.Google Scholar
Timothy Niven and Hung-Yu Kao. 2019. Probing Neural Network Comprehension of Natural Language Arguments. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, 4658--4664. https://doi.org/10.18653/v1/P19-1459Google ScholarCross Ref
Safiya Umoja Noble. 2018. Algorithms of Oppression: How Search Engines Reinforce Racism. NYU Press.Google Scholar
Debora Nozza, Federico Bianchi, and Dirk Hovy. 2020. What the [MASK]? Making Sense of Language-Specific BERT Models. arXiv:2003.02912 [cs.CL]Google Scholar
David Ortiz, Daniel Myers, Eugene Walls, and Maria-Elena Diaz. 2005. Where do we stand with newspaper data? Mobilization: An International Quarterly 10, 3 (2005), 397--419.Google Scholar
Charlotte Pennington, Derek Heim, Andrew Levy, and Derek Larkin. 2016. Twenty Years of Stereotype Threat Research: A Review of Psychological Mediators. PloS one 11 (01 2016), e0146487. https://doi.org/10.1371/journal.pone.0146487Google Scholar
Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, 1532--1543. https://doi.org/10.3115/v1/D14-1162Google ScholarCross Ref
Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep Contextualized Word Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, New Orleans, Louisiana, 2227--2237. https://doi.org/10.18653/v1/N18-1202Google ScholarCross Ref
Pew. 2018. Internet/Broadband Fact Sheet. (2 2018). https://www.pewinternet. org/fact-sheet/internet-broadband/Google Scholar
Aidan Pine and Mark Turin. 2017. Language Revitalization. Oxford Research Encyclopedia of Linguistics.Google Scholar
Francesca Polletta. 1998. Contending stories: Narrative in social movements. Qualitative sociology 21, 4 (1998), 419--446.Google Scholar
Vinodkumar Prabhakaran, Ben Hutchinson, and Margaret Mitchell. 2019. Perturbation Sensitivity Analysis to Detect Unintended Model Biases. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 5740--5745. https://doi.org/10.18653/v1/D19-1578Google Scholar
Laura Pulido. 2016. Flint, environmental racism, and racial capitalism. Capitalism Nature Socialism 27, 3 (2016), 1--16.Google ScholarCross Ref
Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, and Xuanjing Huang. 2020. Pre-trained Models for Natural Language Processing: A Survey. arXiv:2003.08271 [cs.CL]Google Scholar
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 8 (2019), 9.Google Scholar
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research 21, 140 (2020), 1--67. http://jmlr.org/papers/v21/20-074.htmlGoogle Scholar
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, Texas, 2383--2392. https://doi.org/10.18653/v1/D16-1264Google ScholarCross Ref
Sarah T. Roberts, Joel Tetreault, Vinodkumar Prabhakaran, and Zeerak Waseem (Eds.). 2019. Proceedings of the Third Workshop on Abusive Language Online. Association for Computational Linguistics, Florence, Italy. https://www.aclweb.org/anthology/W19-3500Google Scholar
Anna Rogers, Olga Kovaleva, and Anna Rumshisky. 2021. A Primer in BERTology: What We Know About How BERT Works. Transactions of the Association for Computational Linguistics 8 (2021), 842--866.Google ScholarCross Ref
Ronald Rosenfeld. 2000. Two decades of statistical language modeling: Where do we go from here? Proc. IEEE 88, 8 (2000), 1270--1278.Google ScholarCross Ref
Corby Rosset. 2020. Turing-NLG: A 17-billion-parameter language model by Microsoft. Microsoft Blog (2020).Google Scholar
Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019).Google Scholar
Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. 2020. Social Bias Frames: Reasoning about Social and Power Implications of Language. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 5477--5490. https://doi.org/10.18653/v1/2020.acl-main.486Google ScholarCross Ref
Roy Schwartz, Jesse Dodge, Noah A. Smith, and Oren Etzioni. 2020. Green AI. Commun. ACM 63, 12 (Nov. 2020), 54--63. https://doi.org/10.1145/3381831Google ScholarDigital Library
Sabine Sczesny, Janine Bosak, Daniel Neff, and Birgit Schyns. 2004. Gender stereotypes and the attribution of leadership traits: A cross-cultural comparison. Sex roles 51, 11-12 (2004), 631--645.Google Scholar
Claude Elwood Shannon. 1949. The Mathematical Theory of Communication. University of Illinois Press, Urbana.Google Scholar
Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, and Kurt Keutzer. 2019. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT. arXiv:1909.05840 [cs.CL]Google Scholar
Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng. 2019. The Woman Worked as a Babysitter: On Biases in Language Generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 3407--3412. https://doi.org/10.18653/v1/D19-1339Google Scholar
Katie Shilton, Jes A Koepfler, and Kenneth R Fleischmann. 2014. How to see values in social computing: methods for studying values dimensions. In Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing. 426--435.
Joonbo Shin, Yoonhyung Lee, and Kyomin Jung. 2019. Effective Sentence Scoring Method Using BERT for Speech Recognition. In Asian Conference on Machine Learning. 1081--1093.
Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. 2019. Megatron-LM: Training multi-billion parameter language models using GPU model parallelism. arXiv preprint arXiv:1909.08053 (2019).
Irene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-Voss, Jeff Wu, Alec Radford, Gretchen Krueger, Jong Wook Kim, Sarah Kreps, et al. 2019. Release strategies and the social impacts of language models. arXiv preprint arXiv:1908.09203 (2019).
Karen Spärck Jones. 2004. Language modelling's generative model: Is it rational? Technical Report. Computer Laboratory, University of Cambridge.
Robyn Speer. 2017. ConceptNet Numberbatch 17.04: better, less-stereotyped word vectors. (2017). Blog post, https://blog.conceptnet.io/2017/04/24/conceptnet-numberbatch-17-04-better-less-stereotyped-word-vectors/.
Steven J. Spencer, Christine Logel, and Paul G. Davies. 2016. Stereotype Threat. Annual Review of Psychology 67, 1 (2016), 415--437. https://doi.org/10.1146/annurev-psych-073115-103235 PMID: 26361054.
Katrina Srigley and Lorraine Sutherland. 2019. Decolonizing, Indigenizing, and Learning Biskaaybiiyang in the Field: Our Oral History Journey. The Oral History Review (2019).
Greg J. Stephens, Lauren J. Silbert, and Uri Hasson. 2010. Speaker-listener neural coupling underlies successful communication. Proceedings of the National Academy of Sciences 107, 32 (2010), 14425--14430. https://doi.org/10.1073/pnas.1008662107 https://www.pnas.org/content/107/32/14425.full.pdf
Emma Strubell, Ananya Ganesh, and Andrew McCallum. 2019. Energy and Policy Considerations for Deep Learning in NLP. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 3645--3650.
Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, and Hua Wu. 2019. ERNIE: Enhanced Representation through Knowledge Integration. arXiv:1904.09223 [cs.CL]
Yu Sun, Shuohuan Wang, Yu-Kun Li, Shikun Feng, Hao Tian, Hua Wu, and Haifeng Wang. 2020. ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020. AAAI Press, 8968--8975. https://aaai.org/ojs/index.php/AAAI/article/view/6428
Yi Chern Tan and L Elisa Celis. 2019. Assessing social and intersectional biases in contextualized word representations. In Advances in Neural Information Processing Systems. 13230--13241.
Ian Tenney, Dipanjan Das, and Ellie Pavlick. 2019. BERT Rediscovers the Classical NLP Pipeline. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, 4593--4601. https://doi.org/10.18653/v1/P19-1452
Trieu H. Trinh and Quoc V. Le. 2019. A Simple Method for Commonsense Reasoning. arXiv:1806.02847 [cs.AI]
Marlon Twyman, Brian C Keegan, and Aaron Shaw. 2017. Black Lives Matter in Wikipedia: Collective memory and collaboration around online social movements. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. 1400--1412.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998--6008.
Rob Voigt, David Jurgens, Vinodkumar Prabhakaran, Dan Jurafsky, and Yulia Tsvetkov. 2018. RtGender: A Corpus for Studying Differential Responses to Gender. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan. https://www.aclweb.org/anthology/L18-1445
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. 2018. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Association for Computational Linguistics, Brussels, Belgium, 353--355. https://doi.org/10.18653/v1/W18-5446
Zeerak Waseem, Thomas Davidson, Dana Warmsley, and Ingmar Weber. 2017. Understanding Abuse: A Typology of Abusive Language Detection Subtasks. In Proceedings of the First Workshop on Abusive Language Online. Association for Computational Linguistics, Vancouver, BC, Canada, 78--84. https://doi.org/10.18653/v1/W17-3012
Joseph Weizenbaum. 1976. Computer Power and Human Reason: From Judgment to Calculation. WH Freeman & Co.
Monnica T Williams. 2019. Psychology Cannot Afford to Ignore the Many Harms Caused by Microaggressions. Perspectives on Psychological Science 15 (2019), 38--43.
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, Online, 38--45. https://doi.org/10.18653/v1/2020.emnlp-demos.6
World Bank. 2018. Individuals Using the Internet. (2018). https://data.worldbank.org/indicator/IT.NET.USER.ZS?end=2017&locations=US&start=2015
Shijie Wu and Mark Dredze. 2020. Are All Languages Created Equal in Multilingual BERT? In Proceedings of the 5th Workshop on Representation Learning for NLP. Association for Computational Linguistics, Online, 120--130. https://doi.org/10.18653/v1/2020.repl4nlp-1.16
Dongling Xiao, Han Zhang, Yukun Li, Yu Sun, Hao Tian, Hua Wu, and Haifeng Wang. 2020. ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation. arXiv preprint arXiv:2001.11314 (2020).
Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, and Ming Zhou. 2020. BERT-of-Theseus: Compressing BERT by Progressive Module Replacing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, 7859--7869. https://doi.org/10.18653/v1/2020.emnlp-main.633
Peng Xu, Chien-Sheng Wu, Andrea Madotto, and Pascale Fung. 2019. Clickbait? Sensational Headline Generation with Auto-tuned Reinforcement Learning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 3065--3075. https://doi.org/10.18653/v1/D19-1303
Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel. 2020. mT5: A massively multilingual pre-trained text-to-text transformer. arXiv:2010.11934 [cs.CL]
Wei Yang, Yuqing Xie, Aileen Lin, Xingyu Li, Luchen Tan, Kun Xiong, Ming Li, and Jimmy Lin. 2019. End-to-End Open-Domain Question Answering with BERTserini. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). Association for Computational Linguistics, Minneapolis, Minnesota, 72--77. https://doi.org/10.18653/v1/N19-4013
Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. In Advances in neural information processing systems. 5753--5763.
Ze Yang, Can Xu, Wei Wu, and Zhoujun Li. 2019. Read, Attend and Comment: A Deep Architecture for Automatic News Comment Generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 5077--5089. https://doi.org/10.18653/v1/D19-1512
Meg Young, Lassana Magassa, and Batya Friedman. 2019. Toward Inclusive Tech Policy Design: A Method for Underrepresented Voices to Strengthen Tech Policy Documents. Ethics and Information Technology (2019), 1--15.
Ofir Zafrir, Guy Boudoukh, Peter Izsak, and Moshe Wasserblat. 2019. Q8BERT: Quantized 8Bit BERT. arXiv:1910.06188 [cs.CL]
Nico Zazworka, Rodrigo O. Spínola, Antonio Vetrò, Forrest Shull, and Carolyn Seaman. 2013. A Case Study on Effectively Identifying Technical Debt. In Proceedings of the 17th International Conference on Evaluation and Assessment in Software Engineering (Porto de Galinhas, Brazil) (EASE '13). Association for Computing Machinery, New York, NY, USA, 42--47. https://doi.org/10.1145/2460999.2461005
Rowan Zellers, Yonatan Bisk, Roy Schwartz, and Yejin Choi. 2018. SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 93--104. https://doi.org/10.18653/v1/D18-1009
Haoran Zhang, Amy X Lu, Mohamed Abdalla, Matthew McDermott, and Marzyeh Ghassemi. 2020. Hurtful words: quantifying biases in clinical contextual word embeddings. In Proceedings of the ACM Conference on Health, Inference, and Learning. 110--120.
Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, and Kai-Wei Chang. 2019. Gender Bias in Contextualized Word Embeddings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 629--634. https://doi.org/10.18653/v1/N19-1064
Li Zhou, Jianfeng Gao, Di Li, and Heung-Yeung Shum. 2020. The Design and Implementation of XiaoIce, an Empathetic Social Chatbot. Computational Linguistics 46, 1 (March 2020), 53--93. https://doi.org/10.1162/coli_a_00368

Index Terms

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published in

FAccT '21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency
March 2021
899 pages
ISBN:9781450383097
DOI:10.1145/3442188

Copyright © 2021 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 March 2021
Qualifiers
- Article
- Research
- Refereed limited