There are no new updates to share this week. Please see below for a recap of published announcements.
The announcements below were published on the Workspace Updates blog earlier this week. Please refer to the original blog posts for complete details.
Edit details inline with Google Tasks on the web
You can now add details to your personal tasks without opening the “Details” dialog box. | Learn more.
Admins will now be alerted when there is an issue with their Google Voice auto attendants or ring groups
Admins will now receive an alert in the Admin console’s Alert Center when an issue is detected with their auto attendant or ring group configuration, along with instructions for resolving the issue quickly. | Learn more.
Google Vault now supports new Google Sites
Google Vault now supports new Google Sites. You can use Google Vault to set retention policies for Google Sites, perform searches of Google Sites data, and export Google Sites content. | Learn more.
Email threads with recipients outside your organization will be labeled “External”
We’re adding a new “External” label to email threads that include recipients outside your organization. This adds to the existing external recipient warning banner and can be turned on and off by admins. | Learn more.
For a recap of announcements in the past six months, check out What’s new in Google Workspace (recent releases).
Here’s a round-up of the key stories we published the week of April 30, 2021.
Introducing Open Saves: Open-source cloud-native storage for games
Open Saves is a brand-new, purpose-built single interface for multiple storage back ends that’s powered by Google Cloud and developed in partnership with 2K. Now, development teams can store game data without having to make the technical decisions on which storage solution to use. Read more.
Turbocharge workloads with new multi-instance NVIDIA GPUs on GKE
With the launch of multi-instance GPUs in GKE, now you can partition a single NVIDIA A100 GPU into up to seven instances that each have their own high-bandwidth memory, cache and compute cores. Each instance can be allocated to one container, for a maximum of seven containers per one NVIDIA A100 GPU. Further, multi-instance GPUs provide hardware isolation between containers, and consistent and predictable QoS for all containers running on the GPU. Read more.
Sign here! Creating a policy contract with Configuration as Data
Configuration as Data is an emerging cloud infrastructure management paradigm that allows developers to declare the desired state of their applications and infrastructure, without specifying the precise actions or steps for how to achieve it. However, declaring a configuration is only half the battle: you also want policy that defines how a configuration is to be used. Here’s how to create one.
SRE at Google: Our complete list of CRE life lessons
We created Customer Reliability Engineering, an offshoot of Site Reliability Engineering (SRE), to give you more control over the critical applications you’re entrusting to us. Since then, here on the Google Cloud blog, we’ve published over two dozen blogs to help you take the best practices we’ve learned from SRE teams at Google and apply them in your own environments. Here’s a guide to all of them.
The evolution of Kubernetes networking with the GKE Gateway controller
This week we announced the Preview release of the GKE Gateway controller, Google Cloud’s implementation of the Gateway API. Over a year in the making, the GKE Gateway controller manages internal and external HTTP/S load balancing for a GKE cluster or a fleet of GKE clusters. The Gateway API provides multi-tenant sharing of load balancer infrastructure with centralized admin policy and control. Read more.
How to transfer your data to Google Cloud
Any number of factors can motivate your need to move data into Google Cloud, including data center migration, machine learning, content storage and delivery, and backup and archival requirements. When moving data between locations, it’s important to think about reliability, predictability, scalability, security, and manageability. Google Cloud provides four major transfer solutions that meet these requirements across a variety of use cases. This cheat sheet helps you choose.
6 database trends to watch
In a data-driven, global, always-on world, databases are the engines that let businesses innovate and transform. As databases get more sophisticated and more organizations look for managed database services to handle infrastructure needs, there are a few key trends we’re seeing. Here’s what to watch.
All the posts from the week
When Jennifer Daniel, Google’s creative director for emoji, first joined the Unicode Technical Committee, she wondered: What’s the deal with the handshake emoji? Why isn’t there skin tone support? “There was a desire to make it happen, and it was possible to make it happen, but the group appeared to be stuck on how to make it happen,” Jennifer says.
So in 2019, she submitted the paperwork for Unicode to consider the addition of the multi-skin-toned handshake. The proposal detailed how to create 25 possible combinations of different skin tones shaking hands. But encoding it all would be time-consuming; creating a new emoji can take up to two years, Jennifer explains. And while a regular, one-tone handshake emoji already existed, this particular addition would require making two new emoji hands (a right hand in all the various skin tone shades and a left in the various skin tone shades) in order to, as Jennifer explains, “make the ‘old’ handshake new again.”
Every Unicode character has to be encoded; it’s like a language, with a set of rules that are communicated from a keyboard to a computer so that what you see on your screen looks the way it’s supposed to. Underneath, it all comes down to binary — the ones and zeros behind the scenes that make up everything you see on the internet.
Every letter you are reading on this screen is assigned a code point. The letter A? It’s Unicode code point U+0041, Jennifer says. When you send a word with the letter “A” to someone else, this code is what ensures they will see it. “So when we want to send a 🤦, which maps to U+1f926, that code point must be understood on the other end regardless of what device the recipient is using,” she says.
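To make the mapping concrete, here is a quick sketch in Python; the code points and bytes shown are standard Unicode and UTF-8 facts:

```python
# Every character maps to a Unicode code point, and that code point is
# what gets encoded into bytes and sent between devices.
assert ord("A") == 0x0041    # the letter A is code point U+0041
assert chr(0x1F926) == "🤦"  # U+1F926 is the face-palm emoji

# On the wire the code point is serialized, for example as UTF-8 bytes:
print("🤦".encode("utf-8"))  # b'\xf0\x9f\xa4\xa6'
```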
This means when one emoji can come in different forms — like with gender or skin tone options — the coding gets more complex. “If emoji are letters, think of it this way: How many accent marks can you add to a letter? Adding more detail, like skin tone, gender or other customization options like color, to emoji gets more complicated.” Adding skin tone to the handshake emoji meant someone had to propose a solution that operated within the strict limitations of how characters are encoded.
That someone was Jennifer. “I build on the shoulders of giants,” she quickly explains. “The subcommittee is made up of volunteers, all of whom are generous with their expertise and time.” First, Jennifer looked at existing emoji to see if there were any that could be combined to generate all 25 skin tone combinations. “When it appeared that none would be suitable — for instance, 🤜 🤛 are great but also a very different greeting — we had to identify new additions. That’s when we landed on adding a leftwards hand and a rightwards hand.” Once these two designs and proposals were approved and code points assigned, the team could then propose a multi-skin-toned handshake that built on the newly created code for each hand.
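A sketch of how those pieces compose, using the Emoji 14.0 code points for the two hands; treat the exact ordering of the sequence as an assumption based on how other zero-width-joiner (ZWJ) emoji sequences are built:

```python
# The multi-skin-toned handshake combines a rightwards hand (U+1FAF1) and
# a leftwards hand (U+1FAF2), each carrying its own skin-tone modifier,
# joined by a zero-width joiner (ZWJ, U+200D).
RIGHTWARDS_HAND = "\U0001FAF1"
LEFTWARDS_HAND = "\U0001FAF2"
ZWJ = "\u200D"
SKIN_TONES = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]  # the five modifiers

handshakes = [
    RIGHTWARDS_HAND + right_tone + ZWJ + LEFTWARDS_HAND + left_tone
    for right_tone in SKIN_TONES
    for left_tone in SKIN_TONES
]
print(len(handshakes))  # 25 combinations, as in the proposal
```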
Aside from the actual coding, COVID-19 added new hurdles. Jennifer had proposed the emoji in November 2019 with the expectation it would land on devices in 2021, but because of COVID-19, all Unicode deployments were delayed six months.
Fortunately, the multi-skin toned handshake emoji should appear in the next release, Emoji 14.0, meaning you should see it appear in 2022. For Jennifer, it’s exciting to see it finally come to fruition. “These kinds of explorations are really important because the Unicode Consortium and Google really care about bringing inclusion into the Unicode Standard,” she says. “It’s easy to identify ‘quick solutions’ but I try to stop and ask what does equitable representation really look like, and when is it just performative?”
“Every time we add a new emoji, there’s a risk it could exclude people without our consciously knowing it,” Jennifer explains. “The best we can do is ensure emoji continue to be as broad, flexible and fluid as possible. Just like language. Just like you. 🦋”
Over the last year, COVID-19 presented unforeseen challenges for practically every type of business and organization—including schools, colleges, and universities. For educational institutions, the pandemic was an unapologetic agent of acceleration, shifting one billion learners from in-person to online learning within two months.
The rapid transition to online learning exposed many schools’ lack of readiness for the new online learning environment. It also widened the learning equity gap for students, with fewer than 40% of students from low-income families having access to the tools required for remote learning.
For those who do have online access, today’s students expect everything from engaging and collaborative digital learning experiences to skills-based training for their roles in the future workforce. Expectations are also high for 24×7 multi-channel tech support across all learning devices, applications, and platforms.
In these remarkable times, education technology companies have an important role to play in supporting academic institutions and students. Indeed, this is already happening, as the EdTech (Educational Technology) market is nearly tripling, with total global expenditures expected to reach $404 billion by 2025. However, the success of these EdTech companies depends on their performance in a number of areas, including:
Content and products: How quickly can they generate new content and respond to learner needs with new products in additional markets for broader adoption?
Personalization: How effectively can they leverage artificial intelligence (AI) to provide a personalized experience to all types of learners?
Trust and security: How trusted and secure are their services when educational organizations are suffering the highest number of data breaches since 2005?
Here are a few examples of how EdTech companies are successfully using AI and analytics to capture this opportunity and transform their businesses:
Build better products: iSchoolConnect is an online platform that lets students explore schools, courses, and countries where they might study, and makes higher education admissions accessible to students around the globe. The company leverages AI services to help educational institutions optimize their academic operations, accelerating admission processing by more than 90% while saving significant costs.
Launch in new markets faster: Classroom creativity tools provider Book Creator uses AI APIs to enhance accessibility and improve the learner experience. “The broad suite of intelligent APIs enables us to deliver richer experiences, faster and more easily, without having to be experts in machine learning, drawing recognition, map embeds, or other areas,” says VP of engineering Thom Leggett.
Scale businesses securely: Using DevOps and CDN (content delivery network) services, Chrome browser recording extension creator Screencastify was able to support an eightfold overnight growth in users amid the COVID-19 pandemic, while maintaining a consistent total cost of ownership. These technologies helped the company rapidly scale operations in response to the surge in consumer demand and assure student data privacy and security on a budget. “We know this is just the beginning, as more educators rely on technology to deliver richer, more interactive curricula to students,” says CEO James Francis.
Provide personalized learning and support: Smart analytics and AI can provide personalized support and recommendations for students, forecast demand, and predict shifts in learners’ preferences. Online learning platform Mindvalley uses cloud-based tools to understand and make decisions based on user activity and leverage machine learning to predict behavior.
Google Cloud is partnering with many of these leading EdTech companies, as well as industry-leading consortiums like Ed-Fi and Unizin, to standardize educational common data models and best practices for more agile and cost-effective integration of EdTech into existing environments.
The education landscape is changing rapidly, and EdTech has a major role to play as institutions adapt to the massive shift in learners’ preferences and expectations. We’re committed to empowering EdTech companies with the tools and services they need to help expand learning for everyone, anywhere.
We’re looking forward to what’s next in databases. You can’t predict the future, but you can be prepared for it—join us at our Data Cloud Summit to learn and connect, on May 26, 2021.
Posted by Bruno Panara, Google Registry Team
In my previous life as a startup entrepreneur, I found that life was more manageable when I was able to stay organized — a task that’s easier said than done. At Google Registry, we’ve been keeping an eye out for productivity and organization tools, and we’re sharing a few of our favorites with you today, just in time for spring cleaning.
.new shortcuts to save you time
Since launching .new shortcuts last year, we’ve seen a range of companies use .new domains to help their users get things done faster on their websites.
- If your digital workspace looks anything like mine, you’ll love these shortcuts: action.new creates a new Workona workspace to organize your Chrome tabs, and task.new helps keep track of your to-dos and projects in Asana.
- Bringing together notes and ideas can make it easier to get work done: coda.new creates a new Coda document to collect all your team’s thoughts, and jam.new starts a new collaborative Google Jamboard session.
- Spring cleaning wouldn’t be complete without a tidy cupboard: With sell.new you can create an eBay listing in minutes and free up some closet space. And if you own or manage a business, stay on top of your orders and keep services flowing by giving the shortcut — invoice.new — a try.
Visit whats.new to browse all the .new shortcuts, including our Spring Spotlights section.
Six startups helping you increase productivity
We recently sat down with six startups to learn how they’re helping their clients be more productive. From interviewing and hiring, to managing teamwork, calendars and meetings, check out these videos to learn how you can make the most of your time:
Arc.dev connects developers with companies hiring remotely, helping them find their next opportunity.
The founders of byteboard.dev, who came through Area 120, Google’s in-house incubator for experimental projects, thought that technical interviews were inefficient. So they redesigned them from the ground up to be more fair and relevant to real-world jobs.
To run more efficient meetings, try fellow.app. Streamlining agendas, note taking, action items and decision recording can help your team build great meeting habits.
Friday.app helps you organize your day so you can stay focused while sharing and collaborating with remote teammates.
Manage your time productively using inmotion.app, a browser extension that is a search bar, calendar, tab manager and distraction blocker, all in one.
No time to take your pet to the groomers? Find a groomer who will come to you and treat your pet to an in-home grooming session with pawsh.app.
Whether you’re a pet parent, a busy professional or just looking to sell your clutter online, we hope these tools help you organize and save time this season.
Last week we celebrated Earth Day — the second one that’s taken place during the pandemic. It’s becoming clear that these two challenges aren’t mutually exclusive. We know, for example, that climate change impacts the same determinants of health that worsen the effects of COVID-19. And, as reports have noted, we can’t afford to relax when it comes to the uneven progress we’re making toward a greener future.
At Google, we’re taking stock of where we’ve been and how we can continue building a more sustainable future. We’ve been deeply committed to sustainability ever since our founding two decades ago: we were the first major company to become carbon neutral and the first to match our electricity use with 100 percent renewable energy.
While we lead with our own actions, we can only fully realize the potential of a green and sustainable world through strong partnerships with businesses, governments, and nonprofits. At Google.org, we’re particularly excited about the potential for technology-based solutions from nonprofits and social innovators. Time and again we hear from social entrepreneurs who have game-changing ideas but need a little boost to bring them to life.
Through programs like our AI for Social Good Initiative and our most recent Google.org Impact Challenge on Climate, we are helping find, fund, and build these ideas. Already they’re having significant impact on critical issues from air quality to emissions analysis. In this month’s digest, you can read more about some of these ideas and the mark they’re making on the world.
In case you missed it
Earlier this month, Google shared our latest series of commitments to support vaccine equity efforts across the globe. As part of this, Google.org is supporting Gavi, the Vaccine Alliance, in their latest fundraising push with initial funding to help fully vaccinate 250,000 people in low- and middle-income countries, technical assistance to improve their vaccine delivery systems and accelerate global distribution, and Ad Grants to amplify fundraising efforts. We’ve since kicked off an internal giving campaign to increase our impact, bringing the total vaccinations funded to 880,000 to date, which includes matching funds from Gavi. And in the U.S., we’ve provided $2.5 million in overall grants to Partners in Health, Stop the Spread and Team Rubicon, which are working directly with 500 community-based organizations to boost vaccine confidence and increase access to vaccines in Black, Latino and rural communities.
Hear from one of our grantees: WattTime
Gavin McCormick is the Executive Director of WattTime, a nonprofit that offers technology solutions that make it easy for anyone to achieve emissions reductions. WattTime is an AI Impact Challenge grantee and received both funding and a cohort of Google.org Fellows to help support their work, particularly a project that helps individuals and corporations understand how to use energy when it’s most sustainable and allows regulators to understand the state of global emissions.
“Data insights powered by AI help drive innovative solutions — from streaming services’ content suggestions to navigation on maps. But they’re still not often applied to some of the biggest challenges of our time like the climate crisis. My organization harnesses AI to empower people and companies alike to choose cleaner energy and slash emissions. Like enabling smart devices such as thermostats and electric vehicles to use electricity when power is clean and avoid using electricity when it’s dirty. Now with support from Google.org, we’re working with members of Climate TRACE — a global coalition we co-founded in 2019 of nonprofits, tech companies and climate leaders — to apply satellite imagery and other remote sensing technology to estimate nearly all types of human-caused greenhouse gas emissions in close to real time. We can’t solve the climate crisis if we don’t have an up-to-date understanding of where the emissions are coming from.”
A few words with a Google.org Fellow: Alok Talekar
Alok Talekar is a software engineer at Google who participated in a Google.org Fellowship with WattTime.
“I am a software engineer at Google and work on AI for social good with a focus on the agricultural sector in India. The Climate TRACE Google.org Fellowship with WattTime gave me the opportunity to change my career trajectory and work on climate crisis solutions full time. The mission that Gavin McCormick and team are pursuing is ambitious, and technology can help make it a reality. Over the course of the Fellowship, the team was able to use machine learning to process satellite imagery data of power plants around the world and determine when a particular plant was operational based on the imagery provided. I then helped the team to model and validate the bounds of accuracy of this approach in order to predict the cumulative annual emissions of a given power plant. I was proud to be able to contribute to the project in its early days and to be part of the core team that helped build this massive coalition for monitoring global emissions.”
Applications are now open for Canada’s first accelerator dedicated entirely to cloud-native technology startups. We’re inviting up to twelve Canadian startups to participate in an intensive 10-week virtual bootcamp that prepares them for their next phase of growth.
Delivering web-based software is no longer as simple as ssh’ing into a LAMP server and vi’ing a PHP file. For good reasons, many of us have evolved practices and adopted technologies to address growing complexities and modern needs. Recently I put together a jigsaw puzzle of various technologies and practices so that I could deploy a globally distributed, scale-on-demand, edge-cached webapp while taking advantage of container-based portability, infrastructure automation, and Continuous Integration / Continuous Delivery.
The major pieces of the puzzle include: The Java Virtual Machine (JVM), Scala, sbt, Docker, GraalVM, Cloud Native Buildpacks, bash, git, GitHub Actions, Google Cloud Run, Google Cloud Build, Google Cloud CDN, Google Container Registry, and Google Domains.
That is a lot of pieces! Let’s first look at the use case that I was solving for.
JavaDocs globally distributed & scale-on-demand
Libraries in the JVM ecosystem (created with Java, Kotlin, Scala, etc.) are typically published to a repository called Maven Central. It currently has over 6 million artifacts (an artifact is a version of some library). When a library author publishes their library, they typically include an artifact that contains versioned documentation (i.e. JavaDoc). These artifacts are basically ZIP files containing some generated HTML. When you use a library, you typically reference its JavaDoc either in your IDE or on a webpage where it has been published.
As a fun experiment I created a website that pulls JavaDocs out of Maven Central and displays them on a webpage. If you are familiar with Java libraries and can think of one, check out my website:
As an example, check out the gRPC Kotlin stub JavaDocs:
That site should have loaded super fast for you, no matter where you are, because I’ve put together the puzzle of creating a scale-on-demand webapp that is globally distributed with great edge caching. Here’s what the runtime architecture looks like:
Best of all, the entire system is continuously delivered on merge to main via a few lines of Cloud Build configuration. As a spoiler, here is all the build config you need to have the same sort of globally distributed, scale-on-demand, edge-cached webapp:
To make things that easy and to make the app start super fast, I had to go on a bit of a journey putting together many different pieces. Let’s walk through everything.
Super-fast startup without sacrificing developer experience
The “JavaDoc Central” webapp is a proxy for Maven Central metadata and artifacts. It needs to query metadata from the repository like translating a version of “latest” to the actual latest version. When a user requests the JavaDoc for a given artifact it needs to pull that associated JavaDoc from Maven Central, extract it, and then serve the content.
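For instance, translating the pseudo-version “latest” comes down to reading the versioning metadata Maven Central publishes for each artifact. A minimal sketch of that step in Python; the XML below is an illustrative, trimmed-down example of a maven-metadata.xml, not real data:

```python
import xml.etree.ElementTree as ET

# Illustrative, trimmed-down maven-metadata.xml (not real data); the proxy
# would fetch the real file from Maven Central for the requested artifact.
METADATA = """<metadata>
  <groupId>io.grpc</groupId>
  <artifactId>grpc-kotlin-stub</artifactId>
  <versioning>
    <latest>1.0.0</latest>
    <versions>
      <version>0.2.1</version>
      <version>1.0.0</version>
    </versions>
  </versioning>
</metadata>"""

def resolve_latest(metadata_xml: str) -> str:
    # Translate the pseudo-version "latest" into a concrete version.
    return ET.fromstring(metadata_xml).findtext("versioning/latest")

print(resolve_latest(METADATA))  # 1.0.0
```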
Traditionally, webapp hosting relied on over-provisioning so that when a request arrives the server is ready to handle it. Scale-on-demand takes a more efficient approach where underlying resources are dynamically allocated as requests come in but are also automatically deallocated when the number of requests decreases. This is also called auto-scaling or serverless. The nice thing about scale-on-demand is that there aren’t wasted or underutilized servers. But a challenge with this approach is that applications need to start up super fast: when demand (the number of requests) exceeds the available supply (underlying servers), a new server needs to be started so the excess demand can be handled by that freshly started server. This is called a “cold start” and its impact depends on many variables: programming platform, size of application, necessity for cache hydration, connection pooling, etc.
Cold-starts happen any time the demand exceeds the supply, not just when scaling up from zero servers.
An easy way to deal with some cold-start issues is to use programming platforms that don’t have significant startup overhead. JVM-based applications typically take at least a few seconds to start up because the JVM has startup overhead, JAR loading takes time, classpath scanning for dependency injection can be slow, etc. For this reason, technologies like Node.js, Go, and Rust have been popular with scale-on-demand approaches.
Yet, I like working on the JVM for a variety of reasons including: great library & tooling ecosystem and modern high-level programming languages (Kotlin & Scala). I’m incredibly productive on the JVM and I don’t want to throw away that productivity just to better support scale-on-demand. For more details, read my blog: The Modern Java Platform – 2021 Edition
Luckily there is a way to have my cake and eat it too! GraalVM Native Image takes JVM-based applications and instead of running them on the JVM, it Ahead-Of-Time (AOT) compiles them into native applications. But that process takes time (minutes, not seconds) and I wouldn’t want to wait for that to happen as part of my development cycle. The good news is that I can run JVM-based applications on the JVM as well as native images. This is exactly what I do with the JavaDoc Central code. Here is what my development workflow looks like:
To create a native image with GraalVM I used a build tool plugin. Since I’m using Scala and the sbt build tool, I used the sbt-native-packager plugin but there are similar plugins for Maven and Gradle. This enables my Continuous Delivery system to run a command to create an AOT native executable from my JVM-based application:
GraalVM Native Image optionally allows native images to be statically linked so they don’t even need an operating system to run. The resulting container image for my entire statically linked JavaDoc webapp is only 15MB and starts up in well under a second. Perfect for on-demand scaling!
Multi-region deployment automation
When I first deployed the javadocs.dev site, I manually created a service on Cloud Run that runs my 15MB container image. But Cloud Run services are region-based, so latency to them differs depending on where the user is (it turns out the speed of light is fairly slow for round-the-globe TCP traffic). Cloud Run is available in all 24 Google Cloud regions, but I didn’t want to manually create all those services and the related networking infrastructure to handle routing. There is a great Cloud Run doc called “Serving traffic from multiple regions” that walks through all the steps to create a Google Cloud Load Balancer in front of n-number of Cloud Run services. Yet I wanted to automate all that, so I embarked on a journey that further complicated my puzzle but resulted in a nice tool that I use to automate global deployments, network configurations, and global load balancing.
There are a number of different ways to automate infrastructure setup, including Terraform support for Google Cloud. But I just wanted a container image that’d run some gcloud commands for me. Writing those commands is pretty straightforward, but I also wanted to containerize them so they’d be easily reusable in automated deployments.
Typically, to containerize stuff like this, a Dockerfile is used to define the steps needed to go from source to the thing that will be runnable in the container. But Dockerfiles are only reusable with copy & paste, resulting in security and maintenance costs that are not evident initially. So I decided to build a Cloud Native Buildpack for gcloud scripts that anyone could reuse to create containers for gcloud automations. Buildpacks provide a way to reuse the logic for how source gets turned into runnable stuff in a container.
After an hour of learning how to create a Buildpack, I had the gcloud-buildpack ready! There are only a couple of pieces, which you don’t really need to know about since Buildpacks abstracts away the process of turning source into a container image, but let’s go into them so you can understand what is under the covers.
Buildpack run image
Buildpacks add docker layers onto a “run image” so a Buildpack needs one of those. My gcloud-buildpack needs a run image that has the gcloud command in it. So I just created a new run image based on the gcloud base image and with two necessary labels (Docker metadata) for the Buildpacks:
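Since the post’s embedded snippet isn’t reproduced here, a minimal sketch of what such a run-image Dockerfile could look like; the base image tag and the label values are assumptions, and the real file lives in the gcloud-buildpack repo:

```dockerfile
# Sketch only: base image tag and label values are illustrative.
FROM gcr.io/google.com/cloudsdktool/cloud-sdk:slim

# Buildpack run images are identified via stack metadata labels.
LABEL io.buildpacks.stack.id="com.example.stacks.gcloud"
LABEL io.buildpacks.stack.mixins="[]"
```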
I also needed to setup automation so the run image would automatically be created and stored on a container registry, and any changes would update the container image. I decided to use GitHub Actions to run the build and the GitHub Container Registry to store the container image. Here is the Action’s YAML:
Voila! The run image is available and continuously deployed:
Buildpacks participate in the Cloud Native Buildpack lifecycle and must implement at least two phases: detect & build. Buildpacks can be combined together so you can run something like:
pack build --builder gcr.io/buildpacks/builder:v1 foo
And all of the Buildpacks in the Builder Image will be asked if they know how to build the specified thing. In the case of the Google Cloud Buildpacks, they know how to build Java, Go, Node.js, Python, and .NET applications. For my gcloud Buildpack I don’t have plans to add it to a Builder Image, so I decided to have my detection always return a positive result (meaning the buildpack will run no matter what). To do that, my detect script just exits without an error. Note: You can create Buildpacks with any technology since they run inside the Builder Image in docker; I just decided to write mine in Bash because reasons.
The next phase for my gcloud Buildpack is to “build” the source but since the Buildpack is just taking shell scripts and adding them to my run image, all that needs to happen is to copy the scripts to the right place and tell the Buildpack lifecycle that they are executables / launchable processes. Check out the build code.
Since Buildpacks can be used via container images, my gcloud Buildpack needs to be built and published to a container registry. Again I used GitHub Actions:
From the user’s perspective, to use the gcloud Buildpack all you have to do is:
- Create a project containing a .sh file
- Build your project with pack:
Now with a gcloud Buildpack in place I’m ready to create a container image that makes it easy to deploy a globally load-balanced service on Cloud Run!
Easy Cloud Run
I created a bash script that automates the documented steps to set up a multi-region Cloud Run app so that they can all be done as part of a CI/CD pipeline. If you’re interested, check out the source. Using the new gcloud-buildpack I was able to package the command into a container image via GitHub Actions:
Now anyone can use the ghcr.io/jamesward/easycloudrun container image, with six environment variables, to automate the global load balancer setup and multi-region deployment. When this runs for the javadoccentral repo it looks like this:
All of the networking and load balancer configuration is automatically created (if it doesn’t exist) and the Cloud Run services are deployed with the --ingress=internal-and-cloud-load-balancing option so that only the load balancer can talk to them. Even the HTTP to HTTPS redirect is created on the load balancer. Here is what the load balancer and network endpoint groups look like in the Google Cloud Console:
Setting up a serverless, globally distributed application that is backed by 24 Google Cloud regions all happens in about 1 minute as part of my CI/CD pipeline.
Cloud Build CI/CD
Let’s bring this all together into a pipeline that tests the javadocs.dev application, creates the GraalVM Native Image container, and does the multi-region deployment. I used Cloud Build since it has GitHub integration and uses service accounts to control the permissions of the build (making it easy to enable Cloud Run deployment, network config setup, etc). The Cloud Build definition (source on GitHub):
Step 1 runs the application’s tests. Step 2 builds the application using GraalVM Native Image. Step 3 pushes the container images to the Google Cloud Container Registry. And finally Step 4 does the load balancer / network setup and deploys the application to all 24 regions.
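As a rough sketch of those four steps (a hedged illustration, not the actual cloudbuild.yaml; builder images, tags, and the deploy step’s arguments are all assumptions):

```yaml
steps:
  # Step 1: run the application's tests
  - name: "gcr.io/cloud-builders/mvn"
    args: ["test"]
  # Step 2: build the GraalVM Native Image container
  - name: "gcr.io/cloud-builders/docker"
    args: ["build", "-t", "gcr.io/$PROJECT_ID/javadoccentral", "."]
  # Step 3: push the image to Google Cloud Container Registry
  - name: "gcr.io/cloud-builders/docker"
    args: ["push", "gcr.io/$PROJECT_ID/javadoccentral"]
  # Step 4: load balancer / network setup and multi-region deploy
  - name: "ghcr.io/jamesward/easycloudrun"
```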
In part I of this blog series we discussed best practices and patterns for efficiently deploying a machine learning model for inference with Google Cloud Dataflow. Among other techniques, it covered efficient batching of inputs and the use of shared.py to share a single model instance across threads.
In this post, we walk through the use of the RunInference API from tfx-bsl, a utility transform from TensorFlow Extended (TFX), which abstracts us away from manually implementing the patterns described in part I. You can use RunInference to simplify your pipelines and reduce technical debt when building production inference pipelines in batch or stream mode.
The following four patterns are covered:
- Using RunInference to make ML prediction calls.
- Post-processing RunInference results. Making predictions is often the first part of a multistep flow in a larger business process. Here we will process the results into a form that can be used downstream.
- Attaching a key. Along with the data passed to the model, there is often a need for an identifier, for example an IoT device ID or a customer identifier, that is used later in the process even if it’s not used by the model itself. We show how this can be accomplished.
- Inference with multiple models in the same pipeline. Often you may need to run multiple models within the same pipeline, be it in parallel or as a sequence of predict-process-predict calls. We walk through a simple example.
Creating a simple model
In order to illustrate these patterns, we’ll use a simple toy model that will let us concentrate on the data engineering needed for the input and output of the pipeline. This model will be trained to approximate multiplication by the number 5.
Please note the following code snippets can be run as cells within a notebook environment.
Step 1 – Set up libraries and imports
%pip install tfx_bsl==0.29.0 --quiet
Step 2 – Create the example data
In this step we create a small dataset that includes a range of values from 0 to 99 and labels that correspond to each value multiplied by 5.
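That step amounts to something like the following (a minimal sketch using plain Python lists; the post’s actual snippet may use NumPy):

```python
# Inputs 0..99, labels = input * 5 (the function the model will learn).
x = [float(i) for i in range(100)]
y = [v * 5.0 for v in x]

print(x[:3], y[:3])  # [0.0, 1.0, 2.0] [0.0, 5.0, 10.0]
```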
Step 3 – Create a simple model, compile, and fit it
Let’s teach the model about multiplication by 5.
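A single dense unit is enough to learn this linear function. The following is a hedged sketch; the post’s actual layer sizes, optimizer, and training settings may differ:

```python
import numpy as np
import tensorflow as tf

# Training data: y = 5x for x in 0..99.
x = np.arange(100, dtype=np.float32).reshape(-1, 1)
y = x * 5.0

# One dense unit learns a linear function y = wx + b.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.1), loss="mse")
model.fit(x, y, epochs=500, verbose=0)

# The prediction for 20 should land close to 100.
pred = float(model.predict(np.array([[20.0]], dtype=np.float32), verbose=0)[0][0])
print(pred)
```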
Next, check how well the model performs using some test data.
From the results below it looks like this simple model has learned its 5 times table close enough for our needs!
Step 4 – Convert the input to tf.Example
In the model we just built, we used a simple list to generate the data and pass it to the model. In this next step we make the model more robust by using tf.Example objects in the model training.
tf.Example is a serializable dictionary (or mapping) from names to tensors, which ensures the model can still function even when new features are added to the base examples. Using tf.Example also brings the benefit of having the data be portable across models in an efficient, serialized format.
To use tf.Example here, we first need to create a helper class, ExampleProcessor, that is used to serialize the data points. Using the ExampleProcessor class, the in-memory list can now be moved to disk.
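A hedged sketch of what such serialization looks like; the class shape and the feature names ("x", "y") are illustrative, not the post’s actual ExampleProcessor:

```python
import tensorflow as tf

class ExampleSketch:
    """Illustrative helper that serializes (value, label) pairs as tf.train.Example."""

    def create_example(self, value: float, label: float) -> bytes:
        features = tf.train.Features(feature={
            "x": tf.train.Feature(float_list=tf.train.FloatList(value=[value])),
            "y": tf.train.Feature(float_list=tf.train.FloatList(value=[label])),
        })
        return tf.train.Example(features=features).SerializeToString()

serialized = ExampleSketch().create_example(20.0, 100.0)

# Round-trip to show the data survives serialization.
parsed = tf.train.Example.FromString(serialized)
print(parsed.features.feature["x"].float_list.value[0])  # 20.0
```

The serialized bytes are what would be written to a TFRecord file on disk.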
With the feature spec in place, we can train the model as before.
Note that these steps would be done automatically for us if we had built the model using a TFX pipeline, rather than hand-crafting the model as we did here.
Step 5 – Save the model
Now that we have a model, we need to save it for use with the RunInference transform. RunInference accepts TensorFlow SavedModel files as part of its configuration. The SavedModel must be stored in a location that can be accessed by the RunInference transform. In a notebook this can be the local file system; however, to run the pipeline on Dataflow, the file needs to be accessible by all the workers, so here we use a Cloud Storage bucket.
Note that the gs:// scheme is directly supported by the tf.keras.models.save_model API.
During development it’s useful to be able to inspect the contents of the saved model file. For this, we use the saved_model_cli tool that comes with TensorFlow. You can run this command from a cell:
Abbreviated output from the saved model file is shown below. Note the signature def ‘serving_default’, which accepts a tensor of float type. We will change this to accept another type in the next section.
RunInference will pass a serialized tf.Example to the model rather than a tensor of floats as seen in the current signature. To accomplish this we have one more step to prepare the model: creating a specific signature.
Signatures are a powerful feature as they enable us to control how calling programs interact with the model. From the TensorFlow documentation:
“The optional signatures argument controls which methods in obj will be available to programs which consume SavedModels, for example, serving APIs. Python functions may be decorated with @tf.function(input_signature=…) and passed as signatures directly, or lazily with a call to get_concrete_function on the method decorated with @tf.function.”
In our case, the following code will create a signature that accepts a tf.string data type with a name of examples. This signature is then saved with the model, replacing the previously saved model.
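A sketch of that signature creation. The feature spec, save path, and the stand-in (untrained) model here are illustrative assumptions; only the tf.string input named "examples" matches the text:

```python
import tensorflow as tf

# Stand-in model; in the post this is the trained multiply-by-five model.
model = tf.keras.Sequential([tf.keras.Input(shape=(1,)), tf.keras.layers.Dense(1)])

# Serving function: accept serialized tf.Example protos (tf.string, named
# "examples"), parse out the float feature, and run the model on it.
@tf.function(input_signature=[
    tf.TensorSpec(shape=[None], dtype=tf.string, name="examples")])
def serve_examples(serialized):
    features = tf.io.parse_example(
        serialized, {"x": tf.io.FixedLenFeature([1], tf.float32)})
    return model(features["x"])

# Saving with this signature replaces the default float-tensor signature.
tf.saved_model.save(model, "/tmp/multiply_model",
                    signatures={"serving_default": serve_examples})
```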
If you run the
saved_model_cli command again, you will see that the input signature has changed to
Pattern 1: RunInference for predictions
Step 1 – Use RunInference within the pipeline
Now that the model is ready, the RunInference transform can be plugged into an Apache Beam pipeline. The pipeline below uses the TFXIO TFExampleRecord, which is converted to a transform via RawRecordBeamSource(). The saved model location and signature are passed to the RunInference API as a SavedModelSpec configuration object.
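In rough form, such a pipeline looks like the following. This is a non-runnable sketch: the module paths are from tfx-bsl’s public API as I understand it, and the bucket paths are placeholders, so it may differ from the post’s exact code:

```python
import apache_beam as beam
from tfx_bsl.public.beam import RunInference
from tfx_bsl.public.proto import model_spec_pb2
from tfx_bsl.public.tfxio import TFExampleRecord

# Point RunInference at the saved model and the signature created earlier.
saved_model_spec = model_spec_pb2.SavedModelSpec(
    model_path="gs://<bucket>/model",
    signature_name=["serving_default"])
inference_spec = model_spec_pb2.InferenceSpecType(
    saved_model_spec=saved_model_spec)

tfexample_record = TFExampleRecord(file_pattern="gs://<bucket>/data/*.tfrecord")

with beam.Pipeline() as p:
    _ = (p
         | tfexample_record.RawRecordBeamSource()
         | RunInference(inference_spec)
         | beam.Map(print))
```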
Note: You can perform two types of inference using RunInference:
- In-process inference from a SavedModel instance, used when the saved_model_spec field is set in the inference specification.
- Remote inference by using a service endpoint, used when the ai_platform_prediction_model_spec field is set in the inference specification.
Below is a snippet of the output. The values here are a little difficult to interpret as they are in their raw unprocessed format. In the next section the raw results are post-processed.
Pattern 2: Post-processing RunInference results
The RunInference API returns a PredictionLog object, which contains both the serialized input and the output from the call to the model. Having access to both enables you to create a simple tuple during post-processing for use downstream in the pipeline. Note also that RunInference transparently batches calls to the model for performance when the model supports it.
A post-processing beam.DoFn takes the output of RunInference and produces formatted text with the questions and answers as output. Of course, in a production system the output would more typically be a Tuple[input, output], or simply the output, depending on the use case.
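Such a post-processor might look like the following sketch. The PredictionLog field paths follow TensorFlow Serving’s prediction log proto, but the feature name "x" and the output tensor name "output_0" are assumptions:

```python
import apache_beam as beam
import tensorflow as tf

class ProcessResults(beam.DoFn):
    """Illustrative post-processor: pairs each input with its prediction."""

    def process(self, element):
        # The serialized tf.Example that was sent to the model...
        example_bytes = element.predict_log.request.inputs["examples"].string_val[0]
        example = tf.train.Example.FromString(example_bytes)
        input_value = example.features.feature["x"].float_list.value[0]
        # ...and the model's answer.
        output_value = element.predict_log.response.outputs["output_0"].float_val[0]
        yield f"input: {input_value}, prediction: {output_value}"
```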
Now the output contains both the original input and the model’s output values.
Pattern 3: Attaching a key
One useful pattern is the ability to pass information, often a unique identifier, with the input to the model and have access to this identifier from the output. For example, in an IoT use case you could associate a device ID with the input data being passed into the model. Often this type of key is not useful for the model itself and thus should not be passed into the model's first layer.
RunInference takes care of this for us by accepting a Tuple[key, value] and outputting a Tuple[key, PredictionLog].
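Stripped of Beam, the keyed contract looks like this (a plain-Python sketch of the shape, with a lambda standing in for the multiply-by-5 model):

```python
# RunInference-style keyed contract: (key, value) in, (key, prediction) out.
def run_inference_keyed(batch, predict):
    keys = [k for k, _ in batch]
    values = [v for _, v in batch]
    # Predictions come back in order, so zipping restores each key.
    return list(zip(keys, predict(values)))

out = run_inference_keyed(
    [("device-1", 2.0), ("device-2", 3.0)],
    lambda xs: [x * 5.0 for x in xs])  # stand-in for the model
print(out)  # [('device-1', 10.0), ('device-2', 15.0)]
```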
Step 1 – Create a source with attached key
Since we need a key with the data that we are sending in for prediction, in this step we create a table in BigQuery with two columns: one holds the key and the second holds the test value.
Step 2 – Modify post processor and pipeline
In this step we:
- Modify the pipeline to read from the new BigQuery source table
- Add a map transform, which converts a table row into a Tuple[bytes, Example]
- Modify the post inference processor to output results along with the key
Pattern 4: Inference with multiple models in the same pipeline
In part I of the series, the “join results from multiple models” pattern covered the various branching techniques in Apache Beam that make it possible to run data through multiple models.
Those techniques are applicable to the RunInference API, which can easily be used by multiple branches within a pipeline, with the same or different models. This is similar in function to cascade ensembling, although here the data flows through multiple models in a single Apache Beam DAG.
Inference with multiple models in parallel
In this example, the same data is run through two different models: the one that we’ve been using to multiply by 5 and a new model, which will learn to multiply by 10.
Now that we have two models, we apply them to our source data.
Inference with multiple models in sequence
In a sequential pattern, data is sent to one or more models in sequence, with the output from each model chaining to the next model.
Here are the steps:
- Read the data from BigQuery
- Map the data
- RunInference with multiply by 5 model
- Process the results
- RunInference with multiply by 10 model
- Process the results
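The data flow of the sequential pattern, reduced to plain Python with stand-in functions for the two models:

```python
# Two stand-in "models": multiply by 5, then multiply by 10.
def times_five(xs):
    return [x * 5.0 for x in xs]

def times_ten(xs):
    return [x * 10.0 for x in xs]

data = [1.0, 2.0, 3.0]
stage1 = times_five(data)   # first model's predictions
stage2 = times_ten(stage1)  # chained into the second model
print(stage2)  # [50.0, 100.0, 150.0]
```

In the real pipeline each stage is a RunInference transform, with a processing step between them to reshape one model’s output into the next model’s input.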
Running the pipeline on Dataflow
Until now the pipeline has been run locally, using the direct runner, which is implicitly used when running a pipeline with the default configuration. The same examples can be run using the production Dataflow runner by passing in configuration parameters, including --runner. Details and an example can be found here.
Here is an example of the multimodel pipeline graph running on the Dataflow service:
With the Dataflow runner you also get access to pipeline monitoring as well as metrics that have been output from the RunInference transform. The following table shows some of these metrics from a much larger list available from the library.
In this blog, part II of our series, we explored the use of the tfx-bsl RunInference transform in some common scenarios: standard inference, post-processing of results, and using the RunInference API at multiple points in a pipeline.
None of this would be possible without the hard work of many folks across the Dataflow, TFX, and TF teams. From the TFX and TF teams we would especially like to thank Konstantinos Katsiapis, Zohar Yahav, Vilobh Meshram, Jiayi Zhao, Zhitao Li, and Robert Crowe. From the Dataflow team I would like to thank Ahmet Altay for his support and input throughout.