On August 26, we celebrated Women’s Equality Day. It marked the 100th anniversary of the adoption of the 19th Amendment, which granted women the right to vote in the United States. It’s an important milestone in America’s history, though it wasn’t until the Voting Rights Act passed in 1965 that many women of color could fully exercise their right to vote. This reminds us that over the past hundred years there has been substantial progress toward gender equality, but the work is far from complete. At Waymo, equal opportunities are not just part of our culture; they are also an essential foundation for building truly inclusive technology and improving mobility for everyone.
During the one-hour discussion I was honored to moderate, we heard from Michelle Avary, Head of Automotive Industry & Autonomous Mobility, World Economic Forum; Raquel Urtasun, Chief Scientist of Uber ATG and the Head of Uber ATG Toronto; and Tilly Chang, Executive Director, SF County Transportation Authority. These accomplished women shared how they began their careers in transportation and autonomous driving, the challenges they’ve faced, and lessons they’ve learned.
We constantly hear about the ‘pipeline problem’ in tech and engineering specifically. In 2019, only 26% of professional computing jobs in the U.S. workforce were held by women. In transportation, these numbers are even more disturbing: in 2019, women represented only 15% of the transportation workforce. During our discussion, Michelle shared that according to the World Economic Forum, it would take 257 years for the global gender pay gap to close and for women to have all the same economic opportunities as men. We want these industries to do better, and we’re starting by bringing awareness to the low representation of women in mobility by highlighting their contributions and illuminating an inclusive path forward to all.
Here are some of the key pieces of advice we left with from our panelists:
- Mentors and networking are gold: We heard from all of our panelists about the power of networking, mentoring, and having a strong support system around you to help you achieve your goals. Tilly emphasized the importance of informal networks and the benefits she’s received from seeking out and finding these groups herself. Raquel asked how we can create better mechanisms to ensure women have access to mentors, so they don’t always have to proactively seek them out themselves. To the women entering technical industries, Raquel emphasized: “You are not alone when you suffer from discrimination or biases. It’s important to know that all of us have arrived in successful positions and struggled with that journey. It’s important that you don’t give up and continue pursuing your dream.”
- Putting in the work increases pipeline diversity: One of the biggest barriers in tech is ensuring that recruiting efforts reach a diverse set of candidates. As Michelle highlighted, we need to be proactive, going above and beyond when it comes to filling the pipeline. And then, once diverse candidates are in the door, it’s about educating our teams to understand that people communicate in different ways, and creating space so those different voices are heard.
- More talent diversity leads to more inclusive mobility options: Tilly pointed out that mobility is a direct path to prosperity and shared her experiences in urban planning, a field that historically has largely been driven by men. She shared that there are “so many roots to it – not just for women. But for people of color, gender identities, physical abilities. Growing awareness through programs like this is critical…It is critical that we learn about these issues so that we can be a part of the solution.” Michelle reiterated this point, saying, “We need to ask the questions about how women, elderly, and low-income households travel. If we don’t understand these questions, we can’t solve for them. We also need leadership that represents the communities.”
At Waymo, we’re committed to fostering a diverse and inclusive culture and translating these values to the technology, products and experiences we design for all of our riders, partners and communities. We will continue this work and look forward to seeing you at one of our upcoming events. You can watch our discussion here and follow our hashtag #SelfDrivenWomen on Twitter and LinkedIn for more updates.
- August 31, 2020: The setting to control sharing with people who are not using a Google account will begin to appear in the Admin console. This setting may be on or off by default depending on your current settings. See “Getting started” below to learn more. This setting will not start taking effect for users until September 8, 2020 (for Rapid release domains) or September 28, 2020 (for Scheduled release domains).
- September 8, 2020: Users in Rapid Release domains will be able to use the new feature, if enabled by their admin.
- September 28, 2020: Users in Scheduled Release domains will be able to use the new feature, if enabled by their admin.
Why you’d use it
- Rich collaboration—including comments, edits, and more—with anyone you need to work with, regardless of whether they have a Google account.
- Audit logging for collaboration with visitors, so that all interactions are monitored and recorded.
- Ability to revoke access and remove collaborators as needed.
- Reduced need to download, email, or create separate files to work with external users who don’t have Google accounts.
- Sharing outside of your organization to users with non-Google accounts will be ON by default if you currently allow users in your organization to send sharing invitations to people outside your organization who are not using a Google account.
- Sharing outside of your organization to users with non-Google accounts will be OFF by default if you currently do not allow sharing outside your organization, or if you use domain whitelisting. Use our Help Center to learn more about sharing to non-Google accounts.
- Visitor sharing will be controlled by new settings at Admin console > Apps > G Suite > Drive and Docs > Sharing Settings. See image below. The new settings can be controlled at the domain or OU level.
- Rapid and Scheduled Release domains: Full rollout (1-3 days for feature visibility) starting on August 31, 2020
- Available to G Suite Business, G Suite Enterprise, G Suite for Education, G Suite Enterprise for Education, G Suite for Nonprofits, and G Suite Essentials customers
- Not available to G Suite Basic, Cloud Identity customers, or users with personal accounts
- This feature was listed as an upcoming G Suite release.
Quick launch summary
- You can use the new Ctrl+Alt+H (CMD+Option+H on Mac) shortcut to toggle braille support in Docs, Sheets, and Slides.
- When you use shortcuts to navigate, we now announce where your cursor moves to, including comments, headings, misspellings, and suggestions.
- We improved the reliability of navigating through lengthy documents and lists.
- Images, misspellings, and grammar errors are now verbalized directly by assistive technology.
- We’ve improved navigation and selection verbalizations when moving through tables and when selecting content, including announcing the entire cell’s contents.
- Admins: There is no admin control for this feature.
- End users: This feature will be ON by default for users with braille support turned on. Visit the Help Center to learn more about how to use a braille display with Docs editors, and make sure to update to the latest versions of your browser and screen reader to use all features.
- Available to all G Suite customers and users with personal Google Accounts
This is the final blog post for #11WeeksOfAndroid. Thank you for joining us over the past 11 weeks as we dove into key areas of Android development. In case you missed it, here’s a recap of everything we talked about during each week:
Week 1 – People and identity
Discover how to implement the conversation shortcut and bubbles with ‘conversation notifications’. Also, learn more about conversation additions and other System UI news, and discover the people and conversations developer documentation here. Finally, you can also listen to the Android Backstage podcast where the System UI team is interviewed on people and bubbles.
To tackle user and developer complexity that makes identity a challenge for developers, we’ve been working on One Tap and Block Store, part of our new Google Identity Services Library.
If you’re interested in learning more about identity, we published the video “Identity on Android: what’s new in sign-in,” in which Vishal explains the new libraries in the Google Identity System.
Two teams that worked very early with us are the Facebook Messenger team and the direct messaging team from Twitter. Read the story from Twitter here and find out how we worked with Facebook on the implementation here.
Week 2 – Machine learning
We kicked off the week by announcing the winners of the #AndroidDevChallenge! Check out all the winning apps and see how they used ML Kit and TensorFlow Lite, all focused on demonstrating how machine learning can come to life in a powerful way to help users get things done, like an app to help visually impaired people navigate crowded spaces and another to help students learn sign language.
We recently made ML Kit a standalone SDK and it no longer requires a Firebase account. Just one line in your build.gradle file and you can start bringing ML functionality into your app.
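As a sketch of that one-line setup, a standalone ML Kit API can be pulled in with a single Gradle dependency; the artifact and version below (the barcode-scanning API) are illustrative, so check the ML Kit documentation for the current coordinates:

```groovy
dependencies {
    // Standalone ML Kit on-device API: no Firebase project required.
    // Artifact name and version are examples; see the ML Kit docs for
    // the current list of APIs and versions.
    implementation 'com.google.mlkit:barcode-scanning:16.0.0'
}
```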
Find out about the importance of finding the unique intersection of user problems and ML strengths and how the People + AI Guidebook can help you make ML product decisions. Check out the interview with the Read Along team for more inspiration.
This week we also highlighted how adding a custom model to your Android app has never been easier.
Finally, try out our codelabs:
- ML Kit Codelab – Recognize, identify language and translate text with ML Kit and CameraX.
- Custom Model Codelab – Build an Android app to recognize flowers with TensorFlow Lite Model Maker and Android Studio ML model binding
Week 3 – Privacy and security
As shared in the “Privacy and Security” blog post, we’re giving users even more control and transparency over user data access.
In Android 11, we introduced various privacy improvements such as one time permissions that let users give an app access to the device microphone, camera, or location, just that one time. Learn more about building privacy-friendly apps with these new changes. You can also learn about various Android security updates in this video.
Other notable updates include:
- Permissions auto-reset: If users haven’t used an app that targets Android 11 for an extended period of time, the system will “auto-reset” all of the granted runtime permissions associated with the app and notify the user.
- Data access auditing APIs: In Android 11, developers will have access to new APIs that will give them more transparency into their app’s usage of private and protected data. Learn more about new tools in Android 11 to make your apps more private and stable.
- Scoped Storage: In Android 11, scoped storage will be mandatory for all apps that target API level 30. Learn more and check out the storage FAQ.
- Google Play system updates: Google Play system updates were introduced with Android 10 as part of Project Mainline, making it easier to bring core OS component updates to users.
- Jetpack Biometric library: The library has been updated to include new BiometricPrompt features in Android 11 in order to allow for backward compatibility.
Week 4 – Android 11 compatibility
We shipped the second Beta of Android 11 and added a new release milestone called Platform Stability to clearly signal to developers that all APIs and system behaviors are complete. Find out more about Beta 2 and platform stability, including what this milestone means for developers, and the Android 11 timeline. Note: since week #4, we shipped the third and final beta and are getting close to releasing Android 11 to AOSP and the ecosystem. Be sure to check that your apps are working!
To get your apps ready for Android 11, check out some of these helpful resources:
- Guide: Migrating your apps to Android 11
- Guide: Behavior changes that could affect your apps.
- Blog: New tools for testing app compatibility in Android 11
- Video: Testing app compatibility with Android Studio
- Video: Testing platform changes in Android 11
- Video: Platform stability and the Android release timeline
In our “Accelerating Android updates” blog post, we looked at how we’re continuing to get the latest OS to reach critical mass by expanding Android’s updatability architecture.
We also highlighted Excelliance Tech, who recently moved their LeBian SDK away from non-SDK interfaces, toward stable, official APIs so they can stay more compatible with the Android OS over time. Check out the Excelliance Tech story.
Week 5 – Languages
With the Android 11 beta, we further improved the developer experience for Kotlin on Android by officially recommending coroutines for asynchronous work. If you’re new to coroutines, check out:
- Android ❤️ Coroutines: How to manage async tasks in Kotlin.
- Coroutines learning pathway.
- New coroutines developer guide.
Also, check out our new Kotlin case studies page for the latest case studies and data, including the new Google Home case study, and our state of Kotlin on Android video. For beginners, we announced the launch of our new Android basics in Kotlin course.
If you’re a Java language developer, watch the “support for newer Java APIs” video to see how we’ve made newer OpenJDK libraries available across versions of Android. With Android 11, we also updated the Android runtime to make app startup even faster with I/O prefetching.
Week 6 – Android Jetpack
Interested in what’s new in Jetpack? Check out the #Android11 Beta launch with a quick fly-by introducing many of the updates to our libraries, with tips on how to get started.
- Dive deeper into major releases like Hilt, with cheat sheets to help you get started, and learn how we migrated our own samples to use Hilt for dependency injection. Less boilerplate = more fun.
- Discover more about Paging 3.0, a complete rewrite of the library using Kotlin coroutines and adding features like improved error handling, better transformations, and much more.
- Get to know CameraX Beta, and learn how it helps developers manage edge cases across different devices and OS versions, so that you don’t have to.
This year, we’ve made several major improvements with the release of Navigation 2.3, which allows you to navigate between different screens of your app with ease while also allowing you to follow Android UI principles.
In Android 11, we continued our work to give users even more control over sensitive permissions. Now there are type-safe contracts for common intents and more via new ActivityResult APIs. These changes simplify how you request permissions, and we’ll continue to work on making permissions easier in the future.
Week 7 – Android developer tools
We have brought together an overview of what is new in Android Developer tools.
Check out the latest updates in design tools, and go even deeper.
Also, find out about debugging your layouts, with updates to the layout inspector. Discover the latest developments for Jetpack Compose Design tools, and also how to use the new database inspector in Android Studio.
Discover the latest development tools we have in place for Jetpack Hilt in Android Studio.
Learn about the build system in Android developer tools:
- New APIs in the Android Gradle Plugin
- Understanding your build with the build analyzer
- Configuration Caching deep dive
- Shrinking your app with R8
To learn about the latest updates on virtual testing, read this blog on the Android Emulator. Lastly, to see the latest changes for performance tools, watch the performance profilers video on System Trace. Additionally, check out more about C++ memory profiling with Android Studio 4.1.
Week 8 – App distribution and monetization
We shared recent improvements we’ve made to app bundles, as well as our intention to require new apps and games to publish with this format in the second half of 2021. The new in-app review API means developers can now ask for ratings and reviews from within their apps!
Don’t forget about our policy around more transparent subscriptions to help increase user trust in Google Play Billing. We also expanded our feature set to help you better reach and retain buyers, and launched Play Billing Library 3, which will be required by mid-2021.
Google Play Pass launched in nine new markets last month. Developers using both Google Play Pass and direct billing on Google Play have earned an average of 2.5 times US revenue with Google Play Pass, without diminishing Google Play store earnings. Learn more and express interest in joining.
Week 9 – Android beyond phones
Check out some of the highlights from this week, including:
- Android TV: We highlighted what’s new on Android TV and shared 6 upcoming launches. Find new resources to help developers build their first Android TV app, or even go deep on new integrations like Cast Connect and frictionless subscriptions.
- Android for cars: Discover ways to reach more drivers on Android for cars and more about the launch of the first car powered by Android Automotive OS with Google apps and services built in — the Polestar 2. As more manufacturers ship cars with this embedded functionality, we’re making it even easier for developers to build media apps on Android Automotive OS with updated documentation and emulators.
- Large screens: We launched ChromeOS.dev — a dedicated resource for technical developers, designers, product managers, and business leaders. Find best practices for expanding your app beyond the phone and Android development on Chrome OS.
Quick launch summary
Dark theme for Google Voice
- Rapid and Scheduled Release domains: Gradual rollout through the App store (up to 15 days for feature visibility) starting on August 31, 2020
- Available to all G Suite customers with Google Voice licenses and Voice users with personal accounts.
It’s the end of August, and that means it’s time for another update in our Stadia Savepoint series, detailing the additions we’ve made to the Stadia platform this month.
In August, players experienced the best of professional golf in PGA Tour 2K21, battled demons on Mars in DOOM and explored a beautiful hand-drawn adventure in Spiritfarer, in addition to many other new games now available for purchase on the Stadia store. Stadia Pro subscribers gained instant access to 23 games this month, the most free games we’ve offered yet, plus access to the first Free Play Days event for Borderlands 3. Our partners revealed new games coming soon to Stadia, including Larian Studios, which announced that Baldur’s Gate 3 will be arriving in Early Access on Sept. 30.
We also introduced new deals for players featuring Stadia Pro and products:
- Get $10 off when you purchase the Claw and the Stadia Controller together on the Google store to play Stadia on your mobile device. Offer ends Aug. 31, 2020 at 11:59 p.m. PT.
- If you’re not yet a Pro member, get three free months of Stadia Pro as a perk with any eligible Chromebook released after June 2017.
Game volume slider on Chromecast and web
We’ve added a separate game volume slider to all audio settings on the Stadia platform. You can adjust the sliders for max volume, game audio and voice chat independently on Chromecast and in a Chrome browser.
Stadia Pro updates
- Claim six new games for free with Stadia Pro in September: Super Bomberman R Online, Gunsport, Hitman, Hello Neighbor, Metro: Last Light Redux and Embr Early Access.
- Twenty-three existing games are still available to add to your Stadia Pro collection: Destiny 2: The Collection, PLAYERUNKNOWN’S BATTLEGROUNDS, GRID, SteamWorld Quest: Hand of Gilgamech, SteamWorld Dig, SteamWorld Dig 2, SteamWorld Heist, The Turing Test, GYLT, Get Packed, Little Nightmares, Power Rangers: Battle for the Grid, SUPERHOT, Panzer Dragoon Remake, Crayta, Monster Boy and the Cursed Kingdom, West of Loathing, Orcs Must Die! 3, Strange Brigade, Kona, Metro 2033 Redux, Just Shapes & Beats and Rock of Ages 3: Make & Break.
- Act quickly: It’s your last chance to add Kona, GRID and Get Packed to your Stadia Pro collection before Sept. 1.
- There are still ongoing discounts for both Stadia Pro subscribers and all players – check out the web or mobile Stadia store for the latest.
Recent content launches on Stadia
Artificial intelligence (AI) and machine learning (ML) have been helping businesses become more efficient for years now, and their applications are growing seemingly by the day. Given this fast evolution, it’s understandable that many organizations aren’t quite sure how to best use AI and ML to help their business.
This was the case at mixi, a company that offers digital entertainment and lifestyle services in Japan, including the “mixi” social networking platform and the mobile game “Monster Strike.” Around 2018, the mixi team became acutely aware of AI’s role in shaping the future, and decided to take action to learn more about the technology and its potential to help their business. A big part of that was attending an immersive AI and ML training at Google Cloud’s Advanced Solutions Lab (ASL), which aims to help customers make better use of AI and ML technologies through virtual or onsite education.
For this post, we spoke with five mixi engineers—Rio Watanabe, Tanpopo Group CTO, XFLAG Development Division; Hidetaka Kojo, Tanpopo Group CTO, XFLAG Development Division; Harumitsu Shinda, Development Group Manager, Romi Division; Hiroki Kurasawa, Data Marketing Group, Digital Marketing, Marketing Division; and Ryutaro Osafune, Technical Art Group, Design Strategy, XFLAG Design Division—who joined this training at the Google campus in Shibuya, Tokyo, to find out more about the program, what they learned, and how they’re putting that knowledge into action.
Making AI the obvious choice for problem solving
Kojo and Watanabe work as part of the Tanpopo group handling technological development for the company’s XFLAG entertainment brand. The group’s name, which means “dandelion,” comes from the Japanese culinary tradition of placing dandelion decorations on sashimi. It was chosen as a symbol of AI’s potential to free engineers from simple tasks, just like placing dandelions on a sashimi production line.
“I don’t personally work in the AI department, but my work touches on several projects, so I still see the technical side,” Kojo explains. “At mixi, we want to make AI the obvious choice for problem solving, and have been using it as a live tool since around 2018.”
“For example, we used it to get the balance right when adding new cards to the card-fighting game ‘Fight League,’” Watanabe continues. “We created an AI player that behaves similarly to a real player to ensure that the cards’ abilities are appropriate before release (so that we don’t break the game’s balance).”
Outside XFLAG’s operations, the company’s technical art group also uses ML to help improve task efficiency and quality for artists working on game character art. Using generative adversarial networks (GANs), an ML technique for image generation, research and development is underway to create new characters for Monster Strike more efficiently.
“Essentially, the challenge is for it (GAN) to learn from thousands of existing character graphics (2D images) and create a brand-new character design ‘draft’ in the style of Monster Strike,” Osafune explains. “It should work as a creative AI to support the design work of our artists.”
Mixi’s use of AI is not limited to games, however. Company President Kenji Kasahara and CTO Tatsuma Murase have big ideas and are trying out AI in multiple projects at once.
In the marketing division, for example, ML predicts and adapts to user behavior.
“For example, in ‘Monster Strike,’ we can use AI to identify users who might be tempted to leave based on their in-game logs and subsequently target them with online ads. AI can help us analyze usage trends and encourage players to participate in our bespoke multiplayer system, too,” Kurasawa says. “We launched these initiatives two years ago and have already seen some great results. We use AI in other areas, too, including building a recommendation system that presents content that the individual user will like.”
Meanwhile, the Romi division, which is part of Vantage Studio and handles lifestyle operations, is developing a conversational robot that uses AI.
“A major focus of this project is the issue of understanding the user’s speech and deciding how to respond,” Shinda says. “The technology to naturally process Japanese speech and use it in conversation doesn’t really exist yet, so we are developing and training a model for this purpose using TensorFlow. The process of converting user speech to text was different in that there are already several excellent technologies available. So we could adopt the highest precision Speech-to-Text API.”
Gaining practical knowledge and technologies from experts at the forefront of AI
In the summer of 2019, after noticing a positive return on investment, mixi decided to take its AI adoption one step further, and signed up for Google Cloud’s Advanced Solutions Lab (ASL).
The ASL enables businesses to collaborate with Google Cloud and apply ML to solve high-impact business challenges. It lets technical teams learn from Google’s ML experts through an immersive experience. Delivered both virtually and at dedicated facilities around the globe, it helps technical teams develop competency in building end-to-end, productionized ML solutions to solve specific business challenges.
Kojo and other senior staff participated in the ASL program, giving them the opportunity to learn all about AI and ML over four weeks in November and December. Here are their five tips for getting the most out of the ASL experience:
1. Ask questions
“Honestly, I think you could absorb the knowledge and techniques by reading up, but of course there is something special about being taught by someone who actually does it for real,” Kojo says. “I asked lots of questions during the program, and many of the answers came down to intuition based on real experience, which I greatly appreciated.”
2. Look for “insider” tips
“In machine learning papers you mostly find one way of achieving a result, but they often lack details of the process. My motivation for taking part in ASL was to benefit from somebody who works on it and to learn insider tips related to the craft,” Shinda explains. “The lectures conveyed specific techniques, such as creating as large a learning model as the GPU will allow initially, to give room to increase the dropout rate later.”
3. Use the experience to pass knowledge on to colleagues
“Before taking part in ASL, I had studied what I needed and when I needed it, so it was great to learn about AI technologies in a more systematic way,” Watanabe says. “For example, the technique of visualizing a neural network using the ‘Neural Network Playground’ tool was new to me. That experience will prove useful as I pass on the knowledge I gained to my colleagues.”
4. Discover new (to you) technology
“My team had previously used a TensorFlow low-level API, but at the ASL I learned about the high-level Estimator API and how it’s already being used, which was all new to me,” Kurasawa explains. “It was also great to find out how to efficiently link Google Cloud services as part of a task in which we formed a complete machine learning sequence.”
5. Ask questions (again)
“We’re using GAN for our research and development, as I mentioned earlier, but because we had been mainly doing this through self-study, some basic elements were missing when we put our knowledge into practice,” Osafune says. “Knowing that the lecturer was a GAN expert, I was quite persistent in asking questions about it. [He laughs.] In the end, we basically got a full tech talk on GAN added to the curriculum. I appreciated having that kind of flexibility in the program.”
The team has already used some of the knowledge gained from ASL in some of their products and services, and they’re planning to pass on what they’ve learned to new graduates and existing engineers.
“I’d like to see an awareness of AI and ML not just in our engineers, but in roles such as planning, too,” Kojo concludes. “At the risk of repeating myself, we really want to make AI the obvious choice for solutions at mixi.”
Click here to learn more about the Advanced Solutions Lab, which now includes the ASL Virtual offering. For more about Google Cloud AI solutions, join us September 1 for Next OnAir. To get your team started on their AI training and certification journey, join a live Cloud Study Jam on September 2.
Posted by Vikram Sharma, Software Engineering Intern; Jianing Wei, Staff Software Engineer; Tyler Mullen, Senior Software Engineer
Today, we are excited to release the Instant Motion Tracking solution in MediaPipe. It is built upon the MediaPipe Box Tracking solution we released previously. With Instant Motion Tracking, you can easily place fun virtual 2D and 3D content on static or moving surfaces, allowing them to seamlessly interact with the real world. This technology also powered Motion Stills AR. Along with the library, we are releasing an open source Android application to showcase its capabilities. In this application, a user simply taps the camera viewfinder in order to place virtual 3D objects and GIF animations, augmenting the real-world environment.
Instant Motion Tracking in MediaPipe
Instant Motion Tracking
The Instant Motion Tracking solution provides the capability to seamlessly place virtual content on static or moving surfaces in the real world. To achieve that, we provide six degrees-of-freedom tracking with relative scale in the form of rotation and translation matrices. This tracking information is then used in the rendering system to overlay virtual content on camera streams to create immersive AR experiences.
The core concept behind Instant Motion Tracking is to decouple the camera’s translation and rotation estimation, treating them instead as independent optimization problems. This approach enables AR tracking across devices and platforms without initialization or calibration. We do this by first finding the 3D camera translation using only the visual signals from the camera. This involves estimating the target region’s apparent 2D translation and relative scale across frames. The process can be illustrated with a simple pinhole camera model, relating translation and scale of an object in the image plane to the final 3D translation.
By finding the change in relative size of our tracked region from view position V1 to V2, we can estimate the relative change in distance from the camera.
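The scale-to-depth relationship above can be sketched in a few lines. This is a minimal illustration of the pinhole-model idea, not the actual MediaPipe implementation:

```python
# Under a pinhole camera model, an object's apparent size in the image is
# inversely proportional to its distance from the camera. A change in the
# tracked region's apparent size therefore gives the relative depth change.

def relative_depth(size_v1: float, size_v2: float) -> float:
    """Return Z2 / Z1: the ratio of camera-to-object distances at views V1, V2."""
    return size_v1 / size_v2

# The region appears twice as large at V2, so the camera is half as far away.
print(relative_depth(100.0, 200.0))  # 0.5
```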
Next, we obtain the device’s 3D rotation from its built-in IMU (Inertial Measurement Unit) sensor. By combining this translation and rotation data, we can track a target region with six degrees of freedom at relative scale. This information allows for the placement of virtual content on any system with a camera and IMU functionality, and is calibration free. For more details on Instant Motion Tracking, please refer to our paper.
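The combination step can be sketched as assembling a rigid-body model matrix from the IMU rotation and the visually estimated translation. The function name and layout here are ours for illustration, not MediaPipe's:

```python
# Assemble a 4x4 row-major model matrix [R | t; 0 0 0 1] from a 3x3 rotation
# (from the IMU) and a 3-vector translation (from the visual estimate).

def pose_matrix(rotation, translation):
    pose = [row[:] + [t] for row, t in zip(rotation, translation)]
    pose.append([0.0, 0.0, 0.0, 1.0])
    return pose

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(pose_matrix(identity, [0.1, -0.2, 2.0])[0])  # [1.0, 0.0, 0.0, 0.1]
```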
A MediaPipe Pipeline for Instant Motion Tracking
A diagram of the Instant Motion Tracking pipeline is shown below. It consists of four major components: a Sticker Manager module, a Region Tracking module, a Matrices Manager module, and finally a Rendering System. Each component consists of MediaPipe calculators or subgraphs.
Diagram of Instant Motion Tracking Pipeline
The Sticker Manager accepts sticker data from the application and produces initial anchors (tracked region information) based on user taps, and user gesture controls for every sticker object. Initial anchors are then sent to our Region Tracking module to generate tracked anchors. The Matrices Manager combines this data with our device’s rotation matrix to produce six degrees-of-freedom poses as model matrices. After integrating any user-specified transforms like asset scaling, our final poses are forwarded to the Rendering System to render all virtual objects overlaid on the camera frame to produce the output AR frame.
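The dataflow between those four modules can be sketched with hypothetical stand-in functions; the real pipeline is built from MediaPipe calculators and subgraphs, so every name below is ours, illustrating only the order in which data moves:

```python
# Hypothetical stubs for the four pipeline modules, to illustrate dataflow
# only; the real implementation uses MediaPipe calculators/subgraphs.

def sticker_manager(tap_xy):
    # Produce an initial anchor (tracked-region info) from a user tap.
    return {"anchor": tap_xy}

def region_tracking(anchor):
    # Track the anchored region across frames (identity here, for brevity).
    return {"tracked_anchor": anchor["anchor"]}

def matrices_manager(tracked, device_rotation):
    # Combine the tracked translation with the IMU rotation into a 6-DoF pose.
    return {"pose": (device_rotation, tracked["tracked_anchor"])}

def rendering_system(frame, pose):
    # Overlay virtual content on the camera frame using the model matrix.
    return {"frame": frame, "overlay_pose": pose["pose"]}

def run_pipeline(frame, tap_xy, device_rotation):
    anchor = sticker_manager(tap_xy)
    tracked = region_tracking(anchor)
    pose = matrices_manager(tracked, device_rotation)
    return rendering_system(frame, pose)

out = run_pipeline("camera_frame", (120, 340), "imu_rotation")
print(out["overlay_pose"])  # ('imu_rotation', (120, 340))
```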
Using the Instant Motion Tracking Solution
The Instant Motion Tracking solution is easy to use thanks to the cross-platform MediaPipe framework. With camera frames, the device rotation matrix, and anchor positions (screen coordinates) as input, the MediaPipe graph produces AR renderings for each frame, providing engaging experiences. If you wish to integrate this Instant Motion Tracking library with your system or application, please visit our documentation to build your own AR experiences on any device with IMU functionality and a camera sensor.
Augmenting The World with 3D Stickers and GIFs
The Instant Motion Tracking solution brings both 3D stickers and GIF animations into Augmented Reality experiences. GIFs are rendered on flat 3D billboards placed in the world, blending animated content into the real environment for fun, immersive experiences. Try it for yourself!
Demonstration of GIF placement in 3D
MediaPipe Instant Motion Tracking is already helping PixelShift.AI, a startup applying cutting-edge vision technologies to facilitate video content creation, to track virtual characters seamlessly in the viewfinder for a realistic experience. Building upon Instant Motion Tracking’s high-quality pose estimation, PixelShift.AI enables VTubers to create mixed reality experiences with web technologies. The product is going to be released to the broader VTuber community later this year.
Instant Motion Tracking helps PixelShift.AI create mixed reality experiences
We look forward to publishing more blog posts related to new MediaPipe pipeline examples and features. Please follow the MediaPipe label on the Google Developers Blog and the Google Developers Twitter account (@googledevs).
We would like to thank Vikram Sharma, Jianing Wei, Tyler Mullen, Chuo-Ling Chang, Ming Guang Yong, Jiuqiang Tang, Siarhei Kazakou, Genzhi Ye, Camillo Lugaresi, Buck Bourdon, and Matthias Grundman for their contributions to this release.
For the last few years, we’ve collaborated with the image licensing industry to raise awareness of licensing requirements for content found through Google Images. In 2018, we began supporting IPTC Image Rights metadata; in February 2020 we announced a new metadata framework through Schema.org and IPTC for licensable images. Since then, we’ve seen widespread adoption of this new standard by websites, image platforms and agencies of all sizes. Today, we’re launching new features on Google Images which will highlight licensing information for images, and make it easier for users to understand how to use images responsibly.
What is it?
Images that include licensing information will be labeled with a “Licensable” badge on the results page. When a user opens the image viewer (the window that appears when they select an image), we will show a link to the license details and/or terms page provided by the content owner or licensor. If available, we’ll also show an additional link that directs users to a page from the content owner or licensor where the user can acquire the image.
Left: Images result page with the Licensable badge
Right: Images Viewer showing licensable image, with the new fields Get the image on and License details
We’re also making it easier to find images with licensing metadata. We’ve enhanced the usage rights drop-down menu in Google Images to support filtering for Creative Commons licenses, as well as those that have commercial or other licenses.
Updated Usage Rights filter
What are the benefits to image licensors?
- As noted earlier, if licensing metadata is provided from the image licensor, then the licensable badge, license details page and image acquisition page will be surfaced in the images viewer, making it easier for users to purchase or license the image from the licensor
- If an image resides on a page that isn’t set up to let a user acquire it (e.g. a portfolio, article, or gallery page), image licensors can link to a new URL from Google Images which takes the user directly to the page where they can purchase or license the image
- For image licensors, the metadata can also be applied by publishers who have purchased your images, enabling your licensing details to be visible with your images when they’re used by your customers. (This requires your customers to not remove or alter the IPTC metadata that you provide them.)
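For reference, the Schema.org route uses `ImageObject` markup with the `license` and `acquireLicensePage` fields; the snippet below is a minimal illustration with placeholder URLs (see the developer page for the licensable images features for the authoritative field list):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "ImageObject",
  "contentUrl": "https://example.com/photos/photo.jpg",
  "license": "https://example.com/image-license",
  "acquireLicensePage": "https://example.com/how-to-purchase"
}
</script>
```

The same information can alternatively be embedded directly in the image file via IPTC photo metadata, which survives when the image is re-used elsewhere.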
We believe this is a step towards helping people better understand the nature of the content they’re looking at on Google Images and how they can use it responsibly.
How do I participate?
To provide feedback on these features, please use the feedback tools available on the developer page for the licensable images features or the Google Webmaster Forum, and stay tuned for upcoming virtual office hours where we will review common questions.
What do image licensors say about these features?
“A collaboration between Google and CEPIC, which started some four years ago, has ensured that authors and rights holders are identified on Google Images. Now, the last link of the chain, determining which images are licensable, has been implemented thanks to our fruitful collaboration with Google. We are thrilled at the window of opportunities that are opening up for photography agencies and the wider image industry due to this collaboration. Thanks, Google.”
– Alfonso Gutierrez, President of CEPIC
“As a result of a multi-year collaboration between IPTC and Google, when an image containing embedded IPTC Photo Metadata is re-used on a popular website, Google Images will now direct an interested user back to the supplier of the image,” said Michael Steidl, Lead of the IPTC Photo Metadata Working Group. “This is a huge benefit for image suppliers and an incentive to add IPTC metadata to image files.”
– Michael Steidl, Lead of the IPTC Photo Metadata Working Group
“Google’s licensable image features are a great step forward in making it easier for users to quickly identify and license visual content. Google has worked closely with DMLA and its members during the features’ development, sharing tools and details while simultaneously gathering feedback and addressing our members’ questions or concerns. We look forward to continuing this collaboration as the features deploy globally.”
– Leslie Hughes, President of the Digital Media Licensing Association
“We live in a dynamic and changing media landscape where imagery is an integral component of online storytelling and communication for more and more people. This means that it is crucial that people understand the importance of licensing their images from proper sources for their own protection, and to ensure the investment required to create these images continues. We are hopeful Google’s approach will bring more visibility to the intrinsic value of licensed images and the rights required to use them.”
– Ken Mainardis, SVP, Content, Getty Images & iStock by Getty Images
“With Google’s licensable images features, users can now find high-quality images on Google Images and more easily navigate to purchase or license images in accordance with the image copyright. This is a significant milestone for the professional photography industry, in that it’s now easier for users to identify images that they can acquire safely and responsibly. EyeEm was founded on the idea that technology will revolutionise the way companies find and buy images. Hence, we were thrilled to participate in Google’s licensable images project from the very beginning, and are now more than excited to see these features being released.”
– Ramzi Rizk, Co-founder, EyeEm
“As the world’s largest network of professional providers and users of digital images, we at picturemaxx welcome Google’s licensable images features. For our customers as creators and rights managers, not only is the visibility in a search engine very important, but also the display of copyright and licensing information. To take advantage of this feature, picturemaxx will be making it possible for customers to provide their images for Google Images in the near future. The developments are already under way.”
– Marcin Czyzewski, CTO, picturemaxx
“Google has consulted and collaborated closely with Alamy and other key figures in the photo industry on this project. Licensable tags will reduce confusion for consumers and help inform the wider public of the value of high quality creative and editorial images.”
– James Hall, Product Director, Alamy
“Google Images’ new features help both image creators and image consumers by bringing visibility to how creators’ content can be licensed properly. We are pleased to have worked closely with Google on this feature, by advocating for protections that result in fair compensation for our global community of over 1 million contributors. In developing this feature, Google has clearly demonstrated its commitment to supporting the content creation ecosystem.”
– Paul Brennan, VP of Content Operations, Shutterstock
“Google Images’ new licensable images features will provide expanded options for creative teams to discover unique content. By establishing Google Images as a reliable way to identify licensable content, Google will drive discovery opportunities for all agencies and independent photographers, creating an efficient process to quickly find and acquire the most relevant, licensable content.”
– Andrew Fingerman, CEO of PhotoShelter
Posted by Francois Spies, Product Manager, Google Images
Building AI-powered apps can be painful. I know. I’ve endured a lot of that pain because the payout of using this technology is often worth the suffering. The juice is worth the squeeze, as they say.
Happily, over the past five years, developing with machine learning has gotten much easier thanks to user-friendly tooling. Nowadays I find myself spending very little time building and tuning machine learning models and much more time on traditional app development.
In this post, I’ll walk you through some of my favorite, painless Google Cloud AI tools and share my tips for building AI-powered apps fast. Let’s get to it.
Use Pre-trained Models
One of the slowest and most unpleasant parts of machine learning projects is collecting labeled training data: labeled examples that a machine learning algorithm can “learn” from.
But for lots of common use cases, you don’t need to do that. Instead of building your own model from scratch, you can take advantage of pre-trained models that have been built, tuned, and maintained by someone else. Google Cloud’s AI APIs are one example.
The Cloud AI APIs allow you to use machine learning for things like:
Reading text from documents
Parsing structured documents, like forms and invoices
Detecting faces, emotions, and objects in pictures
And a whole lot more
The machine learning models that power these APIs are similar to the ones used in many Google apps (like Photos). They’re trained on huge datasets and are often impressively accurate! For example, when I used the Video Intelligence API to analyze my family videos, it was able to detect labels as specific as “bridal shower,” “wedding,” “bat and ball games,” and even “baby smiling.”
The Cloud AI APIs run in, well… the cloud. But if you need a free and offline solution, TensorFlow.js and ML Kit provide a host of pre-trained models you can run directly in the browser or on a mobile device. There’s an even larger set of pre-trained TensorFlow models in TensorFlow Hub.
Easy Custom Models with AutoML
Though you can find a pre-trained model for lots of use cases, sometimes you need to build something really custom. Maybe you want to build a model that analyzes medical scans like X-rays to detect the presence of disease. Or maybe you want to sort widgets from doodads on an assembly line. Or predict which of your customers is most likely to make a purchase when you send them a catalog.
For that, you’ll need to build a custom model. AutoML is a Google Cloud AI tool that makes this process as painless as possible. It lets you train a custom model on your own data, and you don’t even have to write code to do it (unless you want to).
In the GIF below, you can see how I used AutoML Vision to train a model that detects busted components on a circuit board. The interface for labeling data is click-and-drag, and training a model is as simple as clicking the “Train New Model” button. When the model finishes training, you can evaluate its quality in the “Evaluate” tab and see where it’s made mistakes.
It works on images (AutoML Vision), video (AutoML Video), language (AutoML Natural Language and AutoML Translation), documents, and tabular data (AutoML Tables) like you might find in a database or spreadsheet.
Even though the AutoML interface is simple, the models it produces are often impressively high-quality. Under the hood, AutoML trains different models (like neural networks), comparing different architectures and parameters and choosing the most accurate combinations.
Using AutoML models in your app is easy. You can either allow Google to host the model for you in the Cloud and access it through a standard REST API or client library (Python, Go, Node, Java, etc), or export the model to TensorFlow so you can use it offline.
So that, more or less, makes model training easy. But where do you get a big training dataset from?
Never Label Your Own Data
I mean it.
When I start an ML project, I first check to see if a pre-trained model that does what I want already exists.
If it doesn’t, I ask myself the same question about datasets. Almost any kind of dataset you could ever imagine exists on Kaggle, a dataset-hosting and competition site. From tweets about COVID-19 to a list of Chipotle locations to a collection of fake news articles, you can often find at least some dataset on Kaggle that will let you train a proof-of-concept model for your problem. Google Dataset Search, which queries both Kaggle and other sources, is also a helpful tool for finding datasets.
Sometimes, of course, you must label your own data. But before you hire hundreds of interns, consider using Google’s Data Labeling Service. To use this tool, you describe how you’d like your data to be labeled and then Google sends it out to teams of human labelers. The resulting labeled dataset can be plugged directly into AutoML or other AI Platform models for training.
From Model to Useable App
Lots of times, building (or finding) a functioning machine learning model isn’t the tricky part of a project. It’s enabling the other folks on your team to use that model on their own data. We faced this problem frequently in Google Cloud AI, which is why we decided to add interactive demos to our API product pages so you can upload your own data to our APIs and try them out fast.
Leading a successful machine learning project often comes down to being able to build prototypes fast. For this, I have a handful of go-to tools and architectures:
The Google Cloud Storage + Cloud Functions Duo. Most ML projects are data in, data out. You upload some input data (an image, video, audio recording, text snippet, etc.), and a model runs predictions on it to produce the output data. A great way to prototype this type of project is with Cloud Storage and Cloud Functions. Cloud Storage is like a folder in the cloud: a spot for storing data in all formats. Cloud Functions are a tool for running blocks of code in the cloud without needing a dedicated server. You can configure the two to work together by having a file that is uploaded to Cloud Storage “trigger” a Cloud Function to run.
I used this setup recently when I built a document AI pipeline:
When a document is uploaded to a Cloud Storage bucket, it triggers a Cloud Function that analyzes the document by type and moves it to a new bucket. That triggers a second Cloud Function, which uses the Natural Language API to analyze the document text. Check out the full code here.
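A minimal sketch of the first trigger in that pipeline, using the background-function signature for Cloud Storage events (the routing logic is elided here, and the function name is illustrative; the full pipeline linked above does the actual document analysis):

```python
def on_document_upload(event, context):
    """Background Cloud Function triggered by google.storage.object.finalize.

    `event` carries the Cloud Storage object metadata (bucket, name, etc.);
    `context` carries event metadata such as the event ID and timestamp.
    """
    bucket = event["bucket"]
    name = event["name"]
    # In the real pipeline, this is where you'd classify the document by
    # type and copy it to a type-specific bucket, which in turn triggers
    # the Natural Language analysis function.
    print(f"Processing gs://{bucket}/{name}")
    return f"gs://{bucket}/{name}"
```

Deploying this with a `--trigger-resource` pointing at your input bucket is all it takes to wire the storage event to the code.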
Hopefully that’s convinced you that getting started with machine learning doesn’t have to be painful. Here are some helpful tutorials and demos to get you started with ML: