MONTHLY EDITION
TDS editors and editorial associates reflect on the year’s most memorable articles
In a year full of uncertainty, pandemic-driven stress, and growing concerns around environmental and political trends, the energy and creativity of our authors were among the things that helped all of us at TDS to stay grounded. To show our gratitude, mark the upcoming end of the year, and celebrate our collective achievements, we’re sharing some of the posts that stood out the most to members of our extended community. We hope you find them as illuminating and insightful as we have.
Before we dive in, though, we’d also like to take a moment to thank all of you, our readers, for the time and passion you invest in TDS, and for your ongoing support, which makes our work possible to begin with. Here’s to you, and to a 2022 full of learning, discovery, and—hopefully—calmer times.
Julia Nikulski, Volunteer Editorial Associate
A few months ago, I used the GPT-2 model for the first time for a little side project. At first, I was really excited by the text it was able to generate, but then I realized that it used quite a bit of problematic language. So I started to look into the topic of toxic language generation and researched the ways these models are trained. This article by Jack Bandy provides an excellent overview of BookCorpus, a popular dataset used for pre-training some of the most popular language models, and of the issues lurking in its contents. The data we feed to these models matters beyond pure performance metrics.
After you realize that the language models you use have inherent biases that will translate into the language they generate, you have to wonder what you can do about it. Alberto Romero’s article provides an overview of the process that OpenAI developed to adapt these models to society — by reducing their bias. While this approach is not a final and perfect answer, it is an important step in the right direction.
- Dirty Secrets of BookCorpus, a Key Dataset in Machine Learning
- OpenAI PALMS — Adapting GPT-3 to Society
Elliot Gunn, Editor
One of the best things I stumbled on this year was a live data science coding competition called SLICED. It seemingly came out of nowhere to dominate data science Twitter every Tuesday evening during the summer. As Jin Hyun Cheong, PhD wrote back in June, the two co-hosts, Nick Wan and Meg Risdal, created a fun and welcoming space for data enthusiasts to code along with their own Kaggle submissions as they watched the contestants try to build the best model or visualization of the night. We were able to snag an interview with them to learn how the project originated and where they see SLICED heading next. I think I speak for all SLICED fans when I say: I can't wait for Season 2!
Three reasons to watch #SLICED: A real time data science competition
Sara A. Metwalli, Volunteer Editorial Associate
Now that 2021 is coming to an end, I took some time to reflect on the TDS articles I read this year that stood out to me — and going through my bookmarks list, two articles resonated with me more than the others.
As a person who loves visualization and completely understands how important it can be not just for data science but for any tech field, Terence Shin’s “The 10 Best Data Visualizations of 2021” captured my attention. Creating good visualizations is not an easy skill to develop. It requires a lot of practice, not just in building the visualization itself but also in storytelling techniques. When I was improving my storytelling and visualization skills, I went through countless great visualizations by data scientists and visualization experts; doing so inspired me to improve my style and showed me what it takes for a visualization to be effective. This article contains 10 visualizations that will surely inspire you to take your skills to the next level.
The second article that stuck with me is Sharan Kumar Ravindran’s “No Experience? Here is How To Get Your First Data Science Job.” One of the most challenging things about pursuing a career in data science is not learning the technical or soft skills; instead, it is landing your first job with no previous experience. Job hunting, in general, is a tedious process, and it’s even worse when you have no experience in the field. This article will give you some good tips on how to go about finding your first role in data science and help you stay motivated and not get discouraged by how frustrating the process can be.
- The 10 Best Data Visualizations of 2021
- No Experience? Here is How To Get Your First Data Science Job
Caitlin Kindig, Editor
I’m especially drawn to articles that address issues that are important to me or revolve around unique topics that we don’t see too often. When looking back on this past year of content at TDS, Nina Sweeney instantly comes to mind as not only a new author but someone who always connects her projects to the bigger ethical picture. Her September 5 piece “Meeting Women Where They Are” provides a close look at urban travel as a gendered experience, effectively combining the fascinating topic of public transportation with everyday female realities. Sweeney’s article was inspired by Invisible Women: Data Bias in a World Designed for Men by Caroline Criado Perez, who addresses public transportation several times throughout her work. Using NYC MTA subway turnstile data, Sweeney determined where women are most likely to be at certain hours — a pattern shaped by the unpaid household labour and errands they often perform — and then identified the best subway stations for a Women in Tech organization to increase exposure via email collection.
The weekly TDS Podcast often addresses topics such as AI ethics and policy, and we were lucky enough to have Margaret Mitchell, founder of Ethical AI and co-founder of ML Fairness at Google Research, on the podcast. She touched on ideas surrounding diverse perspectives within AI building processes, different forms of bias, and splitting test sets into subsets to show performance diversity. Like many of our guests, her insights regarding the importance of diversity in the development of all sorts of technology stayed consistent throughout the episode, and her commentary on navigating AI’s complex moral issues stayed with me for weeks after its publication.
Carlos Mougan, Volunteer Editorial Associate
Machine learning and data science have achieved astonishing results in the last few years. But as they become integrated into day-to-day life, concern has grown around the need for responsible AI. Over the course of 2021 we have seen fascinating blog posts on explainability, AI ethics, fairness, and much more. Here I share some of the posts that enlightened my year!
I enjoyed this practical post about measuring and understanding fairness by Divya Gopinath, who tackles one of the toughest challenges: how to define what “fair” means in the context of data science.
This TDS Podcast episode with Joaquin Quiñonero-Candela came a few months before he left his role as Distinguished Tech Lead for Responsible AI at Facebook; as I mentioned back in May, “that kind of reach comes with great responsibility — among other things, the responsibility to develop AI tools that are ethical, fair, and well characterized.”
Also, some self-promotion: this recent post of mine is an accessible account of a previously published paper, presented at a European Conference on Machine Learning workshop on bias and fairness, in the context of the European Central Bank.
- What does it mean to be fair? Measuring and understanding fairness
- Responsible AI at Facebook
- Explainable Artificial Intelligence (xAI). But, for who?
Ludovic Benistant, Editor
I loved this article, “Be Careful When Interpreting Predictive Models in Search of Causal Insights,” written by Scott Lundberg, Eleanor Dillon, Jacob LaRiviere, Jonathan Roth, and Vasilis Syrgkanis from Microsoft. Their article shows particularly well why we should be careful when estimating causal effects from machine learning models. They made a strong case for their thesis (with great graphs) and offered some key takeaways for our community. It’s a 16-minute read and totally worth it!
This year, we also received great articles from data scientists, data analysts, and machine learning engineers sharing their typical workday: What are they typically working on? What does their schedule look like? What are their daily challenges? Here are two reads that you might find interesting. The first one, “The Daily Life of a Health Data Scientist,” is by Lucy Rothwell, and the second, “Life as a data analyst in a research organization,” is by Emily A. Halford.
- Be Careful When Interpreting Predictive Models in Search of Causal Insights
- The Daily Life of a Health Data Scientist
- Life as a data analyst in a research organization
Ben Huberman, Editor in Chief
The posts that most often stay with me center issues I care about or highlight quirky, offbeat projects. And—shocking, I know—these were precisely the kinds of posts my memory served up when I thought about the past year and all the excellent work we’ve published on TDS.
“Data for change” and “data for good” are concepts that occasionally ring hollow, but not when writers approach their topics with care, skill, and concrete ideas. On the climate-change front, I highly recommend Jane Thompson’s study of the connection between flooding and real estate values and Ivana Kotorchevikj’s overview of AI’s own carbon footprint. They’re both timely and valuable contributions. On equity and diversity, Denisa Blackwood’s extensively researched look into the gender disparities within data science instantly became essential reading—and an important conversation-starter.
Regardless of topic, I have a soft spot for authors who dare to break the mold and offer readers something different. One that most certainly did that was Kie Ichikawa, who incorporated gorgeous visual storytelling into a post about the representation of natural cycles (spoiler alert: cherry blossoms are involved, and they’re so beautiful). Yuna Shin’s data-meets-art exploration was another fantastic project, bringing together a MOMA dataset, an interactive installation, and important insights on the absence of marginalized communities from major museum collections.
- Flooding: An Emerging Threat To the Modern Day Coastline
- There’s greater cost of deploying AI and ML models in production — the AI carbon footprint
- The Harsh Reality About Being a Woman in AI and Data Science
- Data Viz meets Death
- Using MoMA’s collection dataset to visualize whose stories are missing
The end of the year notwithstanding, we have even more reasons to celebrate—namely, all the wonderful new authors we welcomed to TDS over the past few weeks. Join us in waving excitedly at Antriksh Goel, Eason Liaw Yi Xian, Pengcheng Fu, Cleiton Rocha, Carlo Borella, Amir Hossini, Aleksander Molak, Alessandro Antini, James Fulton, Marianne Bellotti, Maharshi Roy, Fabio Magarelli, Do Kim, Andrei, Mark Jamison, Zeya, Ori Abramovsky, Emma-Sophia Nagels, Karen Bajador Valencia, Batran, Martijn van Attekum, Ali Faghihnejad, Louis Geisler, Nicholas Indorf, Payal Patel, Ayan Kundu, Varsha Lalwani, Matthew Turner, Grégoire Hornung, Dhruv Matani, Aaron Krumins, Parvathy Krishnan, Mustafa Hajij, Ignacio Oguiza, Devanshi Verma, Ron Ozminkowski, PhD, Felix Hofstätter, Ioan Catana, Zack Brodtman, Nicolai Vicol, Uday Kiran RAGE, Shiva Koreddi, Michael Kingston, terry leitch, lambert leong, Lipika Ramaswamy, Manuel Treffer, Nick Handel, Andrea D'Agostino, Tam D Tran-The, Isra Ahmad, Alexey Kravets, Fabio Chiusano, Gianmarco E. Saretto, Cyril Lemaire, Alexander Bricken, Aruna Pisharody, Shuyang Xiang, Deniz Tuzsus, Ang Li-Lian, and many others. Take a look at their profiles and check out their work!
December Edition: 2021 Highlights was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.