Over the past two years, Facebook AI Research (FAIR) has worked with 13 universities around the world to assemble the largest-ever dataset of first-person video, specifically to train deep-learning image-recognition models. AIs trained on the dataset will be better at controlling robots that interact with people and at interpreting images from smart glasses. “Machines will be able to help us in our daily lives only if they really understand the world through our eyes,” says Kristen Grauman of FAIR, who leads the project.
Such technology could support people who need assistance around the home, or guide people through tasks they are learning to complete. “The video in this dataset is much closer to how humans observe the world,” says Michael Ryoo, a computer vision researcher at Google Brain and Stony Brook University in New York, who is not involved in Ego4D.
But the potential for misuse is obvious and worrisome. The research is funded by Facebook, a social media giant recently accused in the Senate of putting profits ahead of people’s well-being, a charge corroborated by MIT Technology Review’s own investigations.
The business model of Facebook and other Big Tech companies is to extract as much data as possible about people’s online behavior and sell it to advertisers. The AI outlined in this project could extend that reach to people’s everyday offline behavior, revealing the objects around a person’s home, the activities they enjoyed, the people they spent time with, and even where their gaze lingered: an unprecedented amount of personal information.
“There’s privacy work to be done as you take this out of the world of exploratory research and into something that’s a product,” says Grauman. “That work could even be inspired by this project.”
Out of the kitchen
Ego4D is a step change in scale. The largest previous dataset of first-person video consists of 100 hours of footage of people in kitchens. The Ego4D dataset comprises 3,025 hours of video recorded by 855 people in 73 locations across nine countries: the US, UK, India, Japan, Italy, Singapore, Saudi Arabia, Colombia, and Rwanda.
The participants had a range of ages and backgrounds; some were recruited for their visually interesting occupations, such as bakers, machinists, carpenters, and landscape architects.
Older datasets typically consist of semi-scripted video clips only a few seconds long. For Ego4D, participants wore head-mounted cameras for up to 10 hours at a time, capturing first-person video of unscripted daily activities such as walking along a street, reading, washing, shopping, playing with pets, playing board games, and interacting with other people. Some of the footage also includes audio, data about where the participants’ gaze was focused, and multiple perspectives on the same scene. It is the first dataset of its kind, says Ryoo.
Facebook wants machines to see the world through our eyes