AI Fail: To Popularize and Scale Chatbots, We Need Better Data

The lack of culturally reliable, machine-ready datasets will prevent us from creating AI products and services that are relevant to more than the status quo.

Storytellers and software engineers at IVOW developing the cultural storyteller Sina

by Davar Ardalan and Kee Malesky

As a tech company designing the cultural storytelling bot Sina, we wanted to better understand the data landscape before training our AI. In early July, together with Topcoder, we launched the Women in History Data Ideation Challenge. The results confirm some of our assumptions and fears: we can’t build a robust and culturally rich chatbot without better datasets and machine-ready content.

The goal of our ideation challenge was to identify public data sources on women throughout history and to suggest how that data could be used to gain new insights for AI products and solutions focused on women. In our case, we wanted to understand how to source stories about women for our chatbot Sina. The primary deliverable was a well-thought-out overview of what is possible, what relevant data can be found and where, and how that data should be collected.

We are grateful to our sponsors and AI advisors who have been on this journey with us every step of the way. Our team has reviewed the seven winning submissions. The results greatly expanded our understanding of the possibilities and imperatives facing us as we proceed. Here’s what we know to be true:

Historic gender and cultural biases are perpetuated in the ecology of AI;

Current classification of gender, ethnicity, and race in Wikipedia is flawed and incomplete;

Improved data ecologies that account for gender, culture, and history are vital for creating better algorithms and popularizing future AI products.

We know that AI systems have been designed with inherent gender and racial biases. (You can track examples of bias in AI here.) This means that as more AI products and solutions are created, more historic cultural and gender biases will continue to be mapped into those AI systems.

Reacting to the data ideation challenge, Clinton Bonner, vice president of marketing at Topcoder, says that “Greater cultural representation and understanding built into machine learning will help create more effective AI, and that stands to benefit all. It starts with the action and partnership IVOW is driving. Topcoder is proud to be an important part of how IVOW is executing on their purpose and the impact we can have together.”

Our data ideation challenge with Topcoder just ended and we are sharing early findings

Summary Findings from Seven Winning Challenge Participants:

Not surprisingly, most of the Topcoder challenge participants used Wikipedia as a source for information on women in history and fiction. This in and of itself illustrates the limits of the content we can draw on. Wikipedia has some 6 million articles and is a viable base for data collection and for training data for future NLP and machine-learning-based data exploration. That said, most participants acknowledged the difficulties with Wikipedia: in particular, its biases, poor writing, and dry raw facts, as well as its lack of global cultural references.

On Gender Bias:

Word embeddings trained on Wikipedia disproportionately associate male terms with science terms and female terms with art terms, which could amplify stereotypes and replicate biases. This was also shown in the 2016 report, “Semantics derived automatically from language corpora contain human-like biases.”
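
As a rough illustration of how such associations can be measured, here is a minimal sketch that compares average cosine similarities between gendered word groups and science/art word groups in a pretrained embedding model. The model name and word lists are assumptions chosen for illustration, not the setup used in the challenge submissions or in the cited report.

```python
# Minimal sketch of measuring gendered associations in word embeddings,
# loosely in the spirit of the association tests cited above.
# The model name and word lists are illustrative assumptions.
import gensim.downloader as api
import numpy as np

model = api.load("glove-wiki-gigaword-100")  # embeddings trained largely on Wikipedia text

male = ["he", "man", "his", "male"]
female = ["she", "woman", "her", "female"]
science = ["physics", "chemistry", "engineering", "experiment"]
art = ["poetry", "dance", "sculpture", "novel"]

def mean_similarity(group_a, group_b):
    # Average cosine similarity over every cross-group word pair.
    return np.mean([model.similarity(a, b) for a in group_a for b in group_b])

print("male-science:  ", mean_similarity(male, science))
print("female-science:", mean_similarity(female, science))
print("male-art:      ", mean_similarity(male, art))
print("female-art:    ", mean_similarity(female, art))
```

A consistent gap between the male-science and female-science scores (and the reverse for art) is the kind of skew the report describes.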

On Classification of Culture & Ethnicity:

The results show that even with many terms joined under ontology and property in the query, the fields CulturalSource, TimePeriod, and Relations were still hard to fully populate. This suggests that machine learning will be needed to extract values for these fields from the TextualAbstract. Some applicants suggested creating categories or keywords based on location, language, occupation, and so on.
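
To make the querying step concrete, here is a minimal sketch that pulls structured facts about women from Wikidata’s public SPARQL endpoint with Python. The properties queried and the mapping onto fields such as CulturalSource and TimePeriod are illustrative assumptions, not the participants’ actual queries or schema.

```python
# Minimal sketch: query Wikidata's SPARQL endpoint for women with a recorded
# occupation, birthplace, and birth date, then map the results onto
# challenge-style fields. The query and the field mapping are assumptions.
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?person ?personLabel ?occupationLabel ?birthPlaceLabel ?birthDate WHERE {
  ?person wdt:P31 wd:Q5;         # instance of: human
          wdt:P21 wd:Q6581072;   # sex or gender: female
          wdt:P106 ?occupation;  # occupation
          wdt:P19 ?birthPlace;   # place of birth
          wdt:P569 ?birthDate.   # date of birth
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 50
"""

resp = requests.get(
    SPARQL_ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "women-in-history-demo/0.1"},
)
resp.raise_for_status()

for row in resp.json()["results"]["bindings"]:
    record = {
        "Name": row["personLabel"]["value"],
        "Occupation": row["occupationLabel"]["value"],
        "CulturalSource": row["birthPlaceLabel"]["value"],  # rough proxy
        "TimePeriod": row["birthDate"]["value"][:4],        # birth year
        # "Relations" and richer cultural context usually still have to be
        # extracted from the textual abstract, as noted above.
    }
    print(record)
```

Even with a query like this, the fields that describe relationships and cultural context generally have to be inferred from free text, which is where the machine-learning step comes in.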

The significance of proper tagging was noted in one submission, an important consideration given the recent news about MIT’s racist, misogynistic dataset (more below). Another suggestion was to use machine learning to enable AI to distinguish between “important” and “unimportant” sentences in a text, as a way to write a better profile.
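
One simple baseline for that “important” versus “unimportant” distinction is extractive scoring: rank each sentence in an article by how central its terms are, then keep the top-ranked sentences for the profile. The TF-IDF sketch below is an assumed baseline for illustration, not the method proposed in any submission.

```python
# Minimal sketch: rank sentences in a short biography by TF-IDF centrality as
# a crude proxy for "importance". An assumed baseline, not a submitted method.
from sklearn.feature_extraction.text import TfidfVectorizer

article = (
    "Wang Zhenyi was an astronomer of the Qing dynasty. "
    "She explained lunar eclipses using a lamp, a table, and a mirror. "
    "She also enjoyed traveling. "
    "Her writings covered astronomy, mathematics, and poetry."
)
sentences = [s.strip() for s in article.split(". ") if s.strip()]

tfidf = TfidfVectorizer().fit_transform(sentences)

# Score each sentence by its mean cosine similarity to the other sentences.
similarity = (tfidf @ tfidf.T).toarray()
scores = similarity.mean(axis=1)

for score, sentence in sorted(zip(scores, sentences), reverse=True):
    print(f"{score:.2f}  {sentence}")
```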

On Current Datasets:

A few of the applicants used other sources (Pantheon, Women in Hollywood, Fortune, etc.), which was encouraging, since we know that Wikipedia alone is not sufficient. Pantheon datasets contain the names of many historical female figures from various domains (politics, science, culture, sports, etc.), along with their cultural background (when and where they were born). For each person in the dataset, a short summary is collected from the corresponding Wikipedia article.

However, in the case of “women in history,” Pantheon and Wikipedia mainly include well-known figures, and famous people’s entries are subject to a much more intensive peer-review process, which can improve the accuracy of their information over time. But this also means that many cultures are not properly documented, because information about them on the web is insufficient. We want and need to collect data not just on famous women, but also on obscure figures from history, fiction, and mythology.

In addition, one of the sample women-in-history datasets gathered for this competition listed the occupation “pornographic actress” for 174 names in a sample of more than 13,000 famous women. This indicates how critical it is to be clear about what emphasis we want to bring to our machine-readable AI datasets about women in history and their contributions. For context, the same dataset included 7 archeologists, 10 anthropologists, 11 architects, 211 models, and 3,950 actors.
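
A quick audit of a dataset’s occupation distribution makes this kind of skew visible before the data is used for training. The sketch below assumes a CSV with an occupation column; the file name and column name are placeholders, not the actual challenge dataset.

```python
# Minimal sketch: surface occupation skew in a women-in-history dataset before
# using it as training data. The file name and column name are placeholders.
import pandas as pd

df = pd.read_csv("women_in_history_sample.csv")  # placeholder path

counts = df["occupation"].value_counts()
total = len(df)

print(f"{total} records, {counts.size} distinct occupations\n")
for occupation, count in counts.head(15).items():
    print(f"{occupation:<25} {count:>6}  ({count / total:.1%})")
```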

What’s next? How can we collaborate on this work?

For those of us marching into automation, we must understand that in many ways the current data ecology is incomplete, flawed, racist, and derogatory. The lack of diverse data and the prevalence of biased data hamper innovation and cause significant setbacks for future AI products and solutions.

Most recently, MIT announced that its 80 million “tiny images” dataset would be taken down immediately. According to the authors of the dataset, “Biases, offensive and prejudicial images, and derogatory terminology alienates an important part of our community — precisely those that we are making efforts to include. It also contributes to harmful biases in AI systems trained on such data.” The statement, issued on June 29, further acknowledged that the presence of such prejudicial data hurts efforts to foster “a culture of inclusivity in the computer vision community.”

Future automated decisions cannot be made based on current data ecosystems.

Putting Women on the AI Map: Phase 1 of our data ideation challenge with Topcoder confirms that we must create machine-ready datasets focused on gender and culture so that AI products and services like ours can be more inclusive. We are actively designing a data focus for Phase 2 and looking for sponsors. The sponsorship packets are here.

Whitepaper: We are working on a whitepaper on this ideation challenge and will share the results in more detail by the end of August.

Events: We will be speaking about early results on Wednesday, July 22, as part of our Cultural AI and Brands virtual event, and also on September 8 as part of the Women in AI Global Summit.

If you’re interested in supporting us with research in this area, please be in touch: Davar@ivow.ai.

IVOW AI is an early-stage startup focusing on cultural intelligence in AI. We address a much-needed market: the convergence of artificial intelligence to preserve culture with the need for marketers to better understand culture. We are part of WAIAccelerate, the Women in AI accelerator program; a KiwiTech portfolio company; and incubating at WeWork Labs in DC as we build our MVP.

Relevant Resources:

Women in History Data Ideation Challenge

Teaching Machines Through Our Stories — Women in History Ideation Challenge

Bias in AI: Examples Tracker From U.C. Berkeley Haas Center for Equity, Gender and Leadership

MIT Takes Down 80 Million Tiny Images data set due to racist and offensive content

Mitigating Bias in Artificial Intelligence U.C. Berkeley Haas Center for Equity, Gender and Leadership

Talking about Bias in AI with your Team With Case Study

Important Types of Bias to Tackle When Building AI Tools

Venture Beat: How Google Addresses ML Fairness and Eliminates Bias in Conversational AI Algorithms

Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification

Why Everything From Transit to iPhones Is Biased Toward Men

Diversity in AI is not your problem, it’s hers

StereoSet measures racism, sexism, and other forms of bias in AI language models

AI is Failing Women

Invisible Women: Exposing Data Bias in a World Designed for Men
