Data/Machine Learning trends to watch in 2021

Published in

Becoming Human: Artificial Intelligence Magazine

10 min read · Jan 27, 2021

Maybe in 2021 we’ll get to go out in public. Photo by Andrea Cau on Unsplash

Most of the articles and news stories that I’ve read in January started out with some statement about how 2020 was wild/unprecedented/unpredictable/other adjective. So I’m going to skip that and jump right into things.

At Flatland Data Solutions we’re optimistic that 2021 will bring some revolutionary changes and advancements into the Data and Machine Learning space. These are the trends I expect to be big in the coming year. Drop a comment and let me know what you think!

Ethics and Machine Learning Bias

One of the major things that 2020 brought to the front of the news was the discussion around bias and discrimination. There are two items in particular that I hope will continue to be prominent in the coming year, particularly with regard to machine learning.

The first topic has been discussed for many years, but in 2020 the BLM and defund the police movements highlighted the importance of this work. That topic is the bias that exists in most datasets and machine learning models. It’s common knowledge in the machine learning field that predictive models reflect and amplify human biases. This is an issue for every machine learning model, and everyone who works in ML needs to be aware of the impact that failing to address it can have. It also highlights the need for experts when it comes to the creation of machine learning algorithms that can have an impact on people’s lives (rather than the application of off-the-shelf machine learning solutions). There are few places where this is more concerning than in predictive policing. Cities are recognizing that predictive policing amplifies racial profiling, and some are banning it.

This extends to other sectors as well. For example, in medicine, biases can cause misdiagnoses in individuals from underrepresented populations. Additionally, in finance, algorithms can lead to discriminatory lending practices. In 2021 I hope that more scrutiny will be placed on predictive systems, and that more data scientists will be aware of these issues and will put in the extra work to identify and reduce bias in their models. I also believe that, in order to identify and reduce bias, there will be more research into and application of explainable models in the coming year.
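One simple way to start identifying bias is to compare an error metric across groups. The snippet below is a minimal sketch, not a complete fairness audit: it computes the false positive rate per group for a binary classifier. The function name and the toy labels, predictions, and groups are all made up for illustration.

```python
from collections import defaultdict


def false_positive_rate_by_group(y_true, y_pred, groups):
    """Return {group: false positive rate} for a binary classifier."""
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for truth, prediction, group in zip(y_true, y_pred, groups):
        if truth == 0:  # only true negatives can become false positives
            negatives[group] += 1
            if prediction == 1:
                false_positives[group] += 1
    return {g: false_positives[g] / negatives[g] for g in negatives}


# Hypothetical toy data: six people in group "A", six in group "B".
y_true = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0]
groups = ["A"] * 6 + ["B"] * 6

rates = false_positive_rate_by_group(y_true, y_pred, groups)
print(rates)  # group B is wrongly flagged far more often than group A
```

A large gap between groups doesn't prove that a model is unfair on its own, but it is a cheap signal that the model deserves a closer look.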

Another key event in 2020 was Google’s firing of Timnit Gebru. Gebru is a strong advocate for diversity, and co-authored a paper identifying the risks of large language models. This story is still playing out, but the event has already made waves. The newly formed Alphabet Workers Union has questioned Google’s commitment to ethics following the event. In 2021 expect more discussion around corporate ethics and more groups willing to stand up and speak out.

Privacy

As you’re probably aware, at the beginning of January the US Capitol was stormed. What makes the incident relevant to this discussion is the role social media platforms played in the event and how they responded to it. Many social media companies silenced Donald Trump during or after the incident. While this isn’t strictly a machine learning issue (although machine learning will certainly be applied), censorship is a data science topic and warrants discussion. There are two things here that are worth noting.

The first is that social media platforms are likely to be more willing to apply censorship. While in this instance social media companies squelching calls for insurrection was undoubtedly the right thing to do, seeing a private company silence the President of the United States forces us to consider just how much control private companies have over public discourse, and how much they ought to have. The approaches that these platforms use to root out extremist groups rely on machine learning algorithms, and will kick off many debates around bias, free speech, moderation algorithms, and tech’s role as a media outlet.

More people are realizing that “I don’t have anything to hide” isn’t a reason to sacrifice privacy. Photo by Matthew Henry on Unsplash

The second is that this will affect user behaviour. I expect more users to shift to apps that employ end-to-end encryption and have stronger privacy protections. At the same time, we’re seeing app platforms and hosting companies shut down apps that extremist groups use because of the privacy they offer. Expect a lot of discussion, and perhaps even legislation, around social media platforms with regard to privacy and censorship.

Datasets and Data Augmentation

As more people become aware of the biases that can exist in ML models, I expect to see more requests for unbiased datasets. However, a completely unbiased dataset does not exist, so I expect to see opportunities for data augmentation techniques that either combat biases or increase the effectiveness of less-biased datasets. In 2020 Facebook AI published their work on DeiT (Data-efficient Image Transformers). While this isn’t targeted at biases at all, I believe this kind of data-efficient training will be a hot topic.

We also saw MIT take down the Tiny Images dataset due to offensive content. This is likely the beginning of a shift towards more careful curation of datasets, and more conscientious dataset selection.

Data Quality

Recently, researchers began focusing more on smaller datasets with high-quality, rich information, rather than excessively large datasets. One paper even proposed “Less Than One”-Shot learning, in which a model learns to identify more classes than there are examples in the dataset. The approach is mind-blowing: imagine being able to train a machine learning model to recognize all 10 digits by showing it just 5 images. While this technique requires that a dataset be constructed for this exact purpose, and therefore isn’t practical for the average user, it shows that there might be an argument for reversing the machine learning community’s proclivity towards “quantity over quality”. This is something that will require care, however, because it is harder to create a representative, unbiased dataset from fewer samples.
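To make “more classes than examples” a bit less abstract, here is a toy sketch in the spirit of the soft-label nearest-neighbour idea. It is not the paper’s implementation: the two prototype positions and their soft labels are values I made up, and the `predict` helper is hypothetical. Each prototype carries a probability distribution over three classes instead of a single label, and a query point is classified by summing those distributions, weighted by inverse distance.

```python
def predict(x, prototypes, eps=1e-9):
    """Classify x by summing soft labels weighted by inverse distance."""
    n_classes = len(prototypes[0][1])
    scores = [0.0] * n_classes
    for position, soft_label in prototypes:
        # Closer prototypes contribute more; eps avoids division by zero.
        weight = 1.0 / (abs(x - position) + eps)
        for c, probability in enumerate(soft_label):
            scores[c] += weight * probability
    return max(range(n_classes), key=lambda c: scores[c])


# Two made-up prototypes on a line, each with a soft label over 3 classes.
prototypes = [
    (0.0, [0.6, 0.4, 0.0]),
    (1.0, [0.0, 0.4, 0.6]),
]

predictions = [predict(x, prototypes) for x in (0.1, 0.5, 0.9)]
print(predictions)  # [0, 1, 2]
```

Neither prototype is mostly class 1, yet points near the middle of the line are assigned to it, because both prototypes contribute their share of that class there. Two examples, three classes.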

NLP

I won’t be impressed until a robot can reply to my slack messages for me. Photo by Akhil Yerabati on Unsplash

Large strides were made in NLP in 2020 with the release of GPT-3. However, there was a lot of controversy around GPT-3, which ties back to the ethics section of this blog. GPT-3 has been exclusively licensed to Microsoft, which means that Microsoft has full, exclusive access to the code and model. Others can currently make use of the model through a limited API, but there’s no guarantee of how long this access will be allowed. This is an interesting decision considering that the model was created by OpenAI, an organization whose charter states that they exist to ensure that “[AGI (Artificial General Intelligence)] benefits all of humanity”. One could argue that the decision to not release this model goes against their charter; however, many agree with the decision. Opening this model up to the world could lead to abuse in many different scenarios. One such risk is data leakage, where a model reveals sensitive text that it memorized from its training data.

Because GPT-3 has not been released, there is room for competition. The GPT-3 model cost millions of dollars to train, but there are companies with the resources to compete, and with VC funding there’s always the opportunity for companies to crop up with the goal of creating a licensable GPT-3 competitor. NLP will be an interesting field to watch, and I’m very excited to see what breakthroughs this year will bring.

Cars/Drones/Logistics

This space is definitely worth mentioning, even though a lot of big companies dropped or cut back on their self-driving car projects in 2020. Most notably, Uber sold their self-driving unit to Aurora. This may indicate that growth in the field is slowing, but it could also be a symptom of a pandemic-induced tightening of purse strings. In 2020 we did see a lot of growth for startups that supply tech or AI for self-driving cars, such as Luminar, so things might not all be downhill for self-driving tech (pun intended).

Despite Uber selling off their self-driving unit, many companies began testing self-driving trucks. Amazon also continues to automate last-mile delivery, so perhaps in the next few years we’ll see automation creep into all aspects of delivery networks.

One interesting sector that might have some room for growth is the aftermarket self-driving add-on space. Comma.ai offers an aftermarket self-driving harness, but they don’t have a lot of competition. To be fair, this is a much harder problem than integrated self-driving tech, but there’s also a potentially larger market.

Federated Learning

As a solution to some of the ethics and privacy questions mentioned above, I believe research into federated learning will surge. Federated learning allows machine learning models to be built using users’ data while also protecting those users’ privacy, because the raw data stays on each user’s device and only model updates are shared. Currently, there are a lot of roadblocks to overcome, and I don’t know if we’ll ever see large models like those trained on ImageNet be trained in a decentralized fashion like this, but for simpler tasks federated learning might be the solution to some of the more privacy-sensitive problems.
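For the curious, here is a minimal sketch of the federated averaging idea, assuming a toy one-parameter model (y = w * x) and three made-up clients. The function names, learning rate, and data are my own illustrative choices, and real systems add things like secure aggregation and client sampling that are left out here. Each client trains on its own data, and the server only ever sees the resulting weights.

```python
def local_update(w, data, lr=0.05, epochs=5):
    """Run a few gradient-descent steps on one client's private data."""
    for _ in range(epochs):
        # Gradient of the mean squared error for the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(global_w, clients, rounds=20):
    """Average client weights each round, weighted by client dataset size."""
    for _ in range(rounds):
        updates = [(local_update(global_w, data), len(data)) for data in clients]
        total = sum(n for _, n in updates)
        global_w = sum(w * n for w, n in updates) / total
    return global_w


# Hypothetical private datasets; every point follows y = 2 * x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]

w = federated_average(0.0, clients)
print(round(w, 4))  # close to the true slope of 2.0
```

The raw (x, y) pairs never leave `local_update`, which is the whole point. That said, shared weights can still leak information about the data, which is one of the roadblocks mentioned above.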

Brain drain + Education changes

There are at least two things that the pandemic disrupted in 2020: education and tech hubs.

I missed out on my convocation last year, but maybe I’ll catch the next one. Photo by Mohammad Shahhosseini on Unsplash

Firstly, universities shifted their studies online. This has caused a lot of people to stop and consider whether the exorbitant tuition fees are worth it, or whether their money and time would be better spent taking a short online course or bootcamp. As someone who spent far too long in university, I will always advocate for the broad foundation that a university education lays, but many people are questioning this. Add to this the fact that there is very little international tuition coming in, and universities are going to have to tighten up their educational offerings in order to stay afloat. I don’t believe that universities will fail, but every student has encountered at least one tenured professor whose teaching style left a lot to be desired. It’s these things that will need to be improved in order for universities to remain attractive to potential students.

Secondly, let’s talk about brain drain. In the past there’s always been an influx of fresh graduates and smart people to tech hubs like San Francisco and New York. When the pandemic hit, we saw an efflux of many of the same people. When this pandemic is over, many bright grads might decide that instead of moving to San Francisco and paying exorbitant rent, they can stay in their home city and work remotely. Personally, I had previously said that I’d never work remotely, but here I am, writing this in my home office, where I plan to spend at least the next year. Many of my colleagues have opted to do the same thing when given the option to extend their remote employment status.

It will be interesting to see how many tech companies remain remote-friendly in the coming year, and how people will choose to locate themselves if they have the option.

Biology

One of the biggest breakthroughs in computational biology in the past year was AlphaFold’s success in predicting protein structures (the “protein folding problem”). This success will have huge implications for the pharmaceutical industry. We may not see any solid applications of this within the year, but it paves the way for engineered and novel protein creation, which could lead to designer drugs with reduced side-effects.

Art

Image generated from the prompt “A lamp in the shape of a pikachu” https://openai.com/blog/dall-e/

The last sector that I’d like to bring attention to is art. Arguably the most-studied aspect of machine learning is image recognition, but image creation has advanced by leaps and bounds in the past year. The same group that created GPT-3 also created DALL-E, an image generator that is, in all honesty, unreasonably good. Not only can you ask it to create the perfect stock photo, but you can also prompt DALL-E to create it in the style you like. Here at Flatland we also wanted to join in on the fun, so we used AI to create new Pokémon video game assets.

Conclusion

That wraps things up. I’m looking forward to next year to see how these predictions fared, and to look back at all the exciting things that I’m sure are going to happen this year. Are there any topics that I missed? Leave a comment!