Nora Graichen

Computational linguist and PhD student in the COLT research group at the Universitat Pompeu Fabra in Barcelona, Catalunya, supervised by Gemma Boleda.

Experience

Research Assistant
Universidad Politécnica de Cataluña, Departament de Ciències de la Computació. October 2023 – October 2024
Skilled in automating business processes from natural language descriptions. Proficient in dataset creation, model training, testing, and evaluation. Experienced in state-of-the-art NLU research, developing datasets of natural language statements and formal semantic results; strong background in documentation and report writing.
NLP Engineer
Process Talks. March 2022 – October 2024
Proficient in data preprocessing, organization, model development, and evaluation. Skilled in refining dataset annotation criteria and maintaining version control for datasets and associated documentation. Experienced in software and multimedia analysis and design.
Student Assistant
SFB - A5 Information Density and Linguistic Encoding. December 2018 – May 2021
Data preprocessing and organization. Designing stimuli for and conducting eye-tracking experiments with adults and children. Running and monitoring experimental code to support research activities.

Education

MSc in Language Science and Technology, 2023
Universität des Saarlandes, Germany. September 2020 – March 2023
Erasmus Mundus. Theoretical and Applied Linguistics, 2022
Universitat Pompeu Fabra, Spain. September 2021 – March 2022
BSc Computational Linguistics, 2020
Universität des Saarlandes, Germany. September 2017 – August 2020

How LLMs describe/predict 😉 me:

"Unraveling linguistic mysteries through computational prowess, a PhD student in Computational Linguistics, learning at the forefront of language and technology." (Claude 3 Sonnet)
"PhD student at the Universitat Pompeu Fabra in Barcelona, Spain, bridging language and technology to uncover the secrets of human communication." (Meta-Llama-3-70B-Instruct)
"PhD Candidate in the vibrant field of Computational Linguistics at the UPF in Barcelona, captivated by the nexus of language and tech, crafting models that decode human speech." (Yi-Large)

📚 My Interests

Intricacies of Human Language
Natural Language Processing, particularly Machine Translation (MT), and ongoing exploration of low-resource MT
Natural Language Understanding for process descriptions
Spending time outdoors in nature 🍃

Publications

Enriching Wayúunaiki–Spanish Neural Machine Translation with Linguistic Information

Association for Computational Linguistics, Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP) ∙ July 2023

We present the first neural machine translation system for the low-resource language pair Wayúunaiki–Spanish and explore strategies to inject linguistic knowledge into the model to improve translation quality. We explore a wide range of methods and combine complementary approaches. Results indicate that incorporating linguistic information through linguistically motivated subword segmentation, factored models, and pretrained embeddings helps the system to generate improved translations, with the segmentation contributing most. In order to evaluate translation quality in a general domain and go beyond the available religious domain data, we gather and make publicly available a new test set and supplementary material. Although translation quality as measured with automatic metrics is low, we hope these resources will facilitate and support further research on Wayúunaiki.

Not a nuisance but a useful heuristic: Outlier dimensions favor frequent tokens in language models

Association for Computational Linguistics, Proceedings of the 8th BlackboxNLP Workshop ∙ March 2025

We study last-layer outlier dimensions, i.e. dimensions that display extreme activations for the majority of inputs. We show that outlier dimensions arise in many different modern language models, and trace their function back to the heuristic of constantly predicting frequent words. We further show how a model can block this heuristic when it is not contextually appropriate, by assigning a counterbalancing weight mass to the remaining dimensions, and we investigate which model parameters boost outlier dimensions and when they arise during training. We conclude that outlier dimensions are a specialized mechanism discovered by many distinct models to implement a useful token prediction heuristic.
