Google News App

Comparing containerization methods: Buildpacks, Jib, and Dockerfile

As developers we work on source code, but production systems don’t run source, they need a runnable thing. Starting many years ago, most enterprises were using Java EE (aka J2EE) and the runnable “thing” we would deploy to production was a “.jar”, “.war”, or “.ear” file. Those files consisted of the compiled Java classes and would run inside of a “container” running on the JVM. As long as your class files were compatible with the JVM and container, the app would just work.

That all worked great until people started building non-JVM stuff: Ruby, Python, NodeJS, Go, etc. Now we needed another way to package up apps so they could be run on production systems. To do this we needed some kind of virtualization layer that would allow anything to be run. Heroku was one of the first to tackle this and they used a Linux virtualization system called “lxc” – short for Linux Containers. Running a “container” on lxc was only half of the puzzle because a “container” still needed to be created from source code, so Heroku invented what they called “Buildpacks” to create a standard way to convert source into a container.

A bit later a Heroku competitor named dotCloud was trying to tackle similar problems and went a different route which ultimately led to Docker, a standard way to create and run containers across platforms including Windows, Mac, Linux, Kubernetes, and Google Cloud Run. Ultimately the container specification behind Docker became a standard under the Open Container Initiative (OCI) and the virtualization layer switched from lxc to runc (also an OCI project).

The traditional way to build a Docker container image is built into the docker tool and uses a sequence of special instructions, usually in a file named Dockerfile, to compile the source code and assemble the “layers” of a container image.

Yeah, this is confusing because we have all sorts of different “containers” and ways to run stuff in those containers. And there are also many ways to create the things that run in containers. The bit of history is important because it helps us categorize all of this into three parts:

  • Container Builders – Turn source code into a Container Image
  • Container Images – Archive files containing a “runnable” application
  • Containers – Run Container Images

With Java EE those three categories map to technologies like:

  • Container Builders == Ant or Maven
  • Container Images == .jar, .war, or .ear
  • Containers == JBoss, WebSphere, WebLogic

With Docker / OCI those three categories map to technologies like:

  • Container Builders == Dockerfile, Buildpacks, or Jib
  • Container Images == .tar files usually not dealt with directly but through a “container registry”
  • Containers == Docker, Kubernetes, Cloud Run

Java Sample Application

Let’s explore the Container Builder options further on a little Java server application. If you want to follow along, clone my comparing-docker-methods project:

git clone https://github.com/jamesward/comparing-docker-methods.git

cd comparing-docker-methods

In that project you’ll see a basic Java web server in src/main/java/com/google/WebApp.java that just responds with “hello, world” on a GET request to /. Here is the source:
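
(The file itself isn’t reproduced here, so the listing below is a minimal sketch of what such a server can look like using the JDK’s built-in com.sun.net.httpserver API; the class and package names are taken from the path above, but the actual file in the repository may differ in detail.)

package com.google;

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class WebApp {
    public static void main(String[] args) throws Exception {
        // Listen on the PORT env var if set, otherwise default to 8080
        int port = Integer.parseInt(System.getenv().getOrDefault("PORT", "8080"));
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        // Respond to every request on / with "hello, world"
        server.createContext("/", exchange -> {
            byte[] body = "hello, world".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }
}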

This project uses Maven with a minimal pom.xml build config file for compiling and running the Java server:
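
(The pom.xml isn’t shown here either; a minimal sketch might look like the following, with the group/artifact IDs and the exec-maven-plugin configuration assumed rather than copied from the repository.)

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.google</groupId>
  <artifactId>comparing-docker-methods</artifactId>
  <version>1.0-SNAPSHOT</version>

  <properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
  </properties>

  <build>
    <plugins>
      <!-- lets ./mvnw compile exec:java start the server -->
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>1.6.0</version>
        <configuration>
          <mainClass>com.google.WebApp</mainClass>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>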

If you want to run this locally, make sure you have Java 8 installed and, from the project root directory, run:

./mvnw compile exec:java

You can test the server by visiting: http://localhost:8080

Container Builder: Buildpacks

We have an application that we can run locally so let’s get back to those Container Builders. Earlier you learned that Heroku invented Buildpacks to create standard, polyglot ways to go from source to a Container Image. When Docker / OCI Containers started gaining popularity Heroku and Pivotal worked together to make their Buildpacks work with Docker / OCI Containers. That work is now a sandbox Cloud Native Computing Foundation project: https://buildpacks.io/

To use Buildpacks you will need to install Docker and the pack tool. Now from the command line tell Buildpacks to take your source and turn it into a Container Image:

pack build --builder=gcr.io/buildpacks/builder:v1 comparing-docker-methods:buildpacks

Magic! You didn’t have to do anything and the Buildpacks knew how to turn that Java application into a Container Image. It even works on Go, NodeJS, Python, and .Net apps out-of-the-box. So what just happened? Buildpacks inspect your source and try to identify it as something they know how to build. In the case of our sample application, they noticed the pom.xml file and decided they know how to build Maven-based applications. The --builder flag told pack where to get the Buildpacks from. In this case, gcr.io/buildpacks/builder:v1 are the Container Image coordinates for Google Cloud’s Buildpacks. Alternatively you could use the Heroku or Paketo Buildpacks. The parameter comparing-docker-methods:buildpacks is the Container Image coordinates for where to store the output. In this case it stores the image in the local docker daemon. You can now run that Container Image locally with docker:

docker run -it -ePORT=8080 -p8080:8080 comparing-docker-methods:buildpacks

Of course you can also run that Container Image anywhere that runs Docker / OCI Containers like Kubernetes and Cloud Run.

Buildpacks are nice because in many cases they just work and you don’t have to do anything special to turn your source into something runnable. But the resulting Container Images created from Buildpacks can be a bit bulky. Let’s use a tool called dive to examine what is in the created container image:

dive comparing-docker-methods:buildpacks

Container Image

Here you can see the Container Image has 11 layers and a total image size of 319MB. With dive you can explore each layer and see what was changed. In this Container Image the first 6 layers are the base operating system. Layer 7 is the JVM and layer 8 is our compiled application. Layering enables great caching so if only layer 8 changes, then layers 1 through 7 do not need to be re-downloaded. One downside of Buildpacks is how (at least for now) all of the dependencies and compiled application code are stored in a single layer. It would be better to have separate layers for the dependencies and the compiled application.

To recap, Buildpacks are the easy option that “just works” right out-of-the-box. But the Container Images are a bit large and not optimally layered.

Container Builder: Jib

The open source Jib project is a Java library for creating Container Images with Maven and Gradle plugins. To use it on a Maven project (like the one from above), just add a build plugin to the pom.xml file:
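
(A sketch of what that plugin entry can look like inside the <build><plugins> section; the version shown is illustrative and is normally pinned to a current Jib release.)

<plugin>
  <groupId>com.google.cloud.tools</groupId>
  <artifactId>jib-maven-plugin</artifactId>
  <version>2.5.2</version>
</plugin>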

Now a Container Image can be created and stored in the local docker daemon by running:

./mvnw compile jib:dockerBuild -Dimage=comparing-docker-methods:jib

Using dive we will see that the Container Image for this application is now only 127MB thanks to slimmer operating system and JVM layers. Also, on a Spring Boot application we can see how Jib layers the dependencies, resources, and compiled application for better caching:

Spring Boot Application

In this example the 18MB layer contains the runtime dependencies and the final layer contains the compiled application. Unlike with Buildpacks the original source code is not included in the Container Image. Jib also has a great feature where you can use it without docker being installed, as long as you store the Container Image on an external Container Registry (like DockerHub or the Google Cloud Container Registry). Jib is a great option with Maven and Gradle builds for Container Images that use the JVM.

Container Builder: Dockerfile

The traditional way to create Container Images is built into the docker tool and uses a sequence of instructions defined in a file usually named Dockerfile. Here is a Dockerfile you can use with the sample Java application:
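
(The repository’s Dockerfile isn’t reproduced here; a multi-stage sketch consistent with the description below might look like this, with the exact base image tags and jar file name assumed.)

FROM adoptopenjdk:8-jdk-hotspot as build
WORKDIR /app
COPY . .
RUN ./mvnw package

FROM adoptopenjdk:8-jre-hotspot
COPY --from=build /app/target/comparing-docker-methods-1.0-SNAPSHOT.jar /app.jar
CMD ["java", "-jar", "/app.jar"]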

In this example, the first four instructions start with the AdoptOpenJDK 8 Container Image and build the source to a Jar file. The final Container Image is created from the AdoptOpenJDK 8 JRE Container Image and includes the created Jar file. You can run docker to create the Container Image using the Dockerfile instructions:

docker build -t comparing-docker-methods:dockerfile .

Using dive we can see a pretty slim Container Image at 209MB:

Container Image

With a Dockerfile we have full control over the layering and base images. For example, we could use the Distroless Java base image to trim down the Container Image even further. This method of creating Container Images provides a lot of flexibility but we do have to write and maintain the instructions.

With this flexibility we can do some cool stuff. For example, we can use GraalVM to create a “native image” of our application. This is an ahead-of-time compiled binary which can reduce startup time, reduce memory usage, and alleviate the need for a JVM in the Container Image. And we can go even further and create a statically linked native image which includes everything needed to run so that even an operating system is not needed in the Container Image. Here is the Dockerfile to do that:
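
(Again a sketch rather than the repository’s actual file: the GraalVM image tag, jar name, and native-image flags are assumptions, and the extra setup for fully static linking, such as installing a static C library toolchain, is elided.)

FROM oracle/graalvm-ce:20.1.0-java8 as build
WORKDIR /app
COPY . .
# install the native-image tool (plus any static-linking prerequisites, elided here)
RUN gu install native-image
RUN ./mvnw package
RUN native-image --static --no-fallback \
    -jar target/comparing-docker-methods-1.0-SNAPSHOT.jar webapp

FROM scratch
COPY --from=build /app/webapp /webapp
CMD ["/webapp"]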

You will see there is a bit of setup needed to support static native images. After that setup the Jar is compiled like before with Maven. Then the native-image tool creates the binary from the Jar. The FROM scratch instruction means the final container image will start with an empty one. The statically linked binary created by native-image is then copied into the empty container.

Like before you can use docker to build the Container Image:

docker build -t comparing-docker-methods:graalvm .

Using dive we can see the final Container Image is only 11MB!

Container Image

And it starts up super fast because we don’t need the JVM, OS, etc. Of course GraalVM is not always a great option as there are some challenges like dealing with reflection and debugging. You can read more about this in my blog, GraalVM Native Image Tips & Tricks.

This example does capture the flexibility of the Dockerfile method and the ability to do anything you need. It is a great escape hatch when you need one.

Which Method Should You Choose?

  • The easiest, polyglot method: Buildpacks
  • Great layering for JVM apps: Jib
  • The escape hatch for when those methods don’t fit: Dockerfile

Check out my comparing-docker-methods project to explore these methods as well as the mentioned Spring Boot + Jib example.

Read More

Estimating the Impact of Training Data with Reinforcement Learning

Posted by Jinsung Yoon and Sercan O. Arik, Research Scientists, Cloud AI Team, Google Research

Recent work suggests that not all data samples are equally useful for training, particularly for deep neural networks (DNNs). Indeed, if a dataset contains low-quality or incorrectly labeled data, one can often improve performance by removing a significant portion of training samples. Moreover, in cases where there is a mismatch between the train and test datasets (e.g., due to difference in train and test location or time), one can also achieve higher performance by carefully restricting samples in the training set to those most relevant for the test scenario. Because of the ubiquity of these scenarios, accurately quantifying the values of training samples has great potential for improving model performance on real-world datasets.

Top: Examples of low-quality samples (noisy/crowd-sourced); Bottom: Examples of a train and test mismatch.

In addition to improving model performance, assigning a quality value to individual data can also enable new use cases. It can be used to suggest better practices for data collection, e.g., what kinds of additional data would benefit the most, and can be used to construct large-scale training datasets more efficiently, e.g., by web searching using the labels as keywords and filtering out less valuable data.

In “Data Valuation Using Deep Reinforcement Learning”, accepted at ICML 2020, we address the challenge of quantifying the value of training data using a novel approach based on meta-learning. Our method integrates data valuation into the training procedure of a predictor model that learns to recognize samples that are more valuable for the given task, improving both predictor and data valuation performance. We have also launched four AI Hub Notebooks that exemplify the use cases of DVRL and are designed to be conveniently adapted to other tasks and datasets, such as domain adaptation, corrupted sample discovery and robust learning, transfer learning on image data, and data valuation.

Quantifying the Value of Data
Not all data are equal for a given ML model — some have greater relevance for the task at hand or are more rich in informative content than others. So how does one evaluate the value of a single datum? At the granularity of a full dataset, it is straightforward; one can simply train a model on the entire dataset and use its performance on a test set as its value. However, estimating the value of a single datum is far more difficult, especially for complex models that rely on large-scale datasets, because it is computationally infeasible to re-train and re-evaluate a model on all possible subsets.

To tackle this, researchers have explored permutation-based methods (e.g., influence functions), and game theory-based methods (e.g., data Shapley). However, even the best current methods are far from being computationally feasible for large datasets and complex models, and their data valuation performance is limited. Concurrently, meta learning-based adaptive weight assignment approaches have been developed to estimate the weight values using a meta-objective. But rather than prioritizing learning from high value data samples, their data value mapping is typically based on gradient descent learning or other heuristic approaches that alter the conventional predictor model training dynamics, which can result in performance changes that are unrelated to the value of individual data points.

Data Valuation Using Reinforcement Learning (DVRL)
To infer the data values, we propose a data value estimator (DVE) that estimates data values and selects the most valuable samples to train the predictor model. This selection operation is fundamentally non-differentiable and thus conventional gradient descent-based methods cannot be used. Instead, we propose to use reinforcement learning (RL) such that the supervision of the DVE is based on a reward that quantifies the predictor performance on a small (but clean) validation set. The reward guides the optimization of the policy towards the action of optimal data valuation, given the state and input samples. Here, we treat the predictor model learning and evaluation framework as the environment, a novel application scenario of RL-assisted machine learning.

Training with Data Value Estimation using Reinforcement Learning (DVRL). When training the data value estimator with an accuracy reward, the most valuable samples (denoted with green dots) are used more and more, whereas the least valuable samples (red dots) are used less frequently.

Results
We evaluate the data value estimation quality of DVRL on multiple types of datasets and use cases.

    • Model performance after removing high/low value samples
      Removing low value samples from the training dataset can improve the predictor model performance, especially in the cases where the training dataset contains corrupted samples. On the other hand, removing high value samples, especially if the dataset is small, decreases the performance significantly. Overall, the performance after removing high/low value samples is a strong indicator for the quality of data valuation.
      Accuracy with the removal of most and least valuable samples, where 20% of the labels are noisy by design. By removing such noisy labels as the least valuable samples, a high-quality data valuation method achieves better accuracy. We demonstrate that DVRL outperforms other methods significantly from this perspective.

      DVRL shows the fastest performance degradation after removing the most important samples and the slowest performance degradation after removing the least important samples in most cases, underlining the superiority of DVRL in identifying noisy labels compared to competing methods (Leave-One-Out and Data Shapley).

    • Robust learning with noisy labels
      We consider how reliably DVRL can learn with noisy data in an end-to-end way, without removing the low-value samples. Ideally, noisy samples should get low data values as DVRL converges and a high performance model would be returned.
      Robust learning with noisy labels. Test accuracy for ResNet-32 and WideResNet-28-10 on CIFAR-10 and CIFAR-100 datasets with 40% of uniform random noise on labels. DVRL outperforms other popular methods that are based on meta-learning.

      We show state-of-the-art results with DVRL in minimizing the impact of noisy labels. These also demonstrate that DVRL can scale to complex models and large-scale datasets.

    • Domain adaptation
      We consider the scenario where the training dataset comes from a substantially different distribution from the validation and testing datasets. Data valuation is expected to be beneficial for this task by selecting the samples from the training dataset that best match the distribution of the validation dataset. We focus on the three cases: (1) a training set based on image search results (low-quality web-scraped) applied to the task of predicting skin lesion classification using HAM 10000 data (high-quality medical); (2) an MNIST training set for a digit recognition task on USPS data (different visual domain); (3) e-mail spam data to detect spam applied to an SMS dataset (different task). DVRL yields significant improvements for domain adaptation, by jointly optimizing the data valuator and corresponding predictor model.


Conclusions
We propose a novel meta learning framework for data valuation which determines how likely each training sample will be used in training of the predictor model. Unlike previous works, our method integrates data valuation into the training procedure of the predictor model, allowing the predictor and DVE to improve each other’s performance. We model this data value estimation task using a DNN trained through RL with a reward obtained from a small validation set that represents the target task performance. In a computationally-efficient way, DVRL can provide high quality ranking of training data that is useful for domain adaptation, corrupted sample discovery and robust learning. We show that DVRL significantly outperforms alternative methods on diverse types of tasks and datasets.

Acknowledgements
We gratefully acknowledge the contributions of Tomas Pfister.

Read More

Cloud Acceleration Program: More reasons for SAP customers to migrate to Google Cloud

The arrival of COVID-19 caused massive disruption for companies around the globe and made digital transformation a more urgent priority. That’s why it’s so important that enterprises running their businesses on SAP have the agility, uptime and advanced analytics that Google Cloud can offer. But given the drain on financial and human resources that the pandemic has caused, many organizations are worried about the risks of migrating to the cloud and have considered hitting the brakes on their cloud migrations, just when they should be pressing the gas pedal. 

Last year we launched the Cloud Acceleration Program (CAP), which has significantly helped SAP customers speed their transitions to the cloud. This first-of-its-kind program empowers customers with solutions from both Google Cloud and our partners to simplify their cloud migrations. Google Cloud is also providing CAP participants with upfront financial incentives to defray infrastructure costs for SAP cloud migrations and help customers ensure that duplicate costs are not incurred during migration. Here’s what customers are saying about the program:

“We had plans to migrate to the cloud, but COVID brought into sharp focus the need to accelerate our SAP migration to the cloud. With help from Google Cloud and their Cloud Acceleration Program, we were able to get the skills and the funding to accelerate this effort dramatically. With our new strategic relationship with Google Cloud, we feel significantly better positioned for the future to take advantage of the elastic, scalable computing capabilities and vast amounts of innovation that [are] constantly being developed.”  

—Maneesh Gidwani, CIO of FIFCO, a global food, beverage, retail and hospitality organization

“The ability to leverage Managecore through the Cloud Acceleration Program dramatically reduced the risk and costs of our SAP migration to Google Cloud. The CAP program enabled Pegasystems to offset the upfront migration expense and significantly expedite our go-live process. With the help of Managecore we were able to focus on running our ERP business operations in the Cloud, rather than the technical elements of the project.”  

—David Vidoni, Vice President of IT, Pegasystems, a CRM and BPM software developer company 

Cloud Acceleration Program Partners step up for SAP customers

Google Cloud’s strong ecosystem of partners is stepping up to the plate more than ever to help customers de-risk their SAP migrations to the cloud. By completing their migrations faster and with minimal cost, customers are now shifting their conversations from concerns about infrastructure and deployment to higher-value topics such as optimizing costs and driving business value with analytics and machine learning tools.

“As one of the early partner participants in the Cloud Acceleration Program, we have been able to apply these significant resources to help multiple Enterprise customers in their SAP cloud migration engagements. CAP allows HCL to efficiently get tools and resources to the customer to ease their migration risk concerns & costs. Our customers are now engaging to drive strategic conversations on how they can leverage the SAP platform on the Google Cloud to drive new insights, improve business KPIs and create new business models with capabilities such as Google Cloud analytics and machine learning tools.”

—Sanjay Singh, Senior VP & Global Head, HCL Google Ecosystem Unit

“At NIMBL, we’ve seen both a great deal of interest in Google Cloud by our SAP customers as well as the significant results being realized by those who have deployed. A common concern for many other customers still on this journey – however – continues to be the overall disruption that a cloud migration may cause. Our migration expertise combined with the industry best tools and resources that Cloud Acceleration Program (CAP) offers, helps provide customers with a clear and confident path to the cloud. As a CAP partner, Google Cloud continues to set us up for success with the resources and support we need to deliver these critical customer deployments.”

—Sergio Cipolla, Managing Partner, NIMBL Techedge Group

Google Cloud is a great place to run SAP

As pressures to transform increase for SAP enterprises, customers are looking to modernize on a smarter cloud. Google Cloud continues to be a great place to run SAP. Like no other for these unprecedented times, our Cloud Acceleration Program gets customers one step closer by reducing the complexities of migration and technical and financial risk management. Contact us to learn more about Google Cloud for SAP.

Read More

Exploring AI for radiotherapy planning with Mayo Clinic

More than 18 million new cancer cases are diagnosed globally each year, and radiotherapy is one of the most common cancer treatments—used to treat over half of cancers in the United States. But planning for a course of radiotherapy treatment is often a time-consuming and manual process for clinicians. The most labor-intensive step in planning is a technique called “contouring” which involves segmenting both the areas of cancer and nearby healthy tissues that are susceptible to radiation damage during treatment. Clinicians have to painstakingly draw lines around sensitive organs on scans—a time-intensive process that can take up to seven hours for a single patient.

Technology has the potential to augment the work of doctors and other care providers, like the specialists who plan radiotherapy treatment. We’re collaborating with Mayo Clinic on research to develop an AI system that can support physicians, help reduce treatment planning time and improve the efficiency of radiotherapy. In this research partnership, Mayo Clinic and Google Health will work to develop an algorithm to assist clinicians in contouring healthy tissue and organs from tumors, and conduct research to better understand how this technology could be deployed effectively in clinical practice. 

Mayo Clinic is an international center of excellence for cancer treatment with world-renowned radiation oncologists. Google researchers have studied how AI can potentially be used to augment other areas of healthcare—from mammographies to the early deployment of an AI system that detects diabetic retinopathy using eye scans. 

In a previous collaboration with University College London Hospitals, Google researchers demonstrated how an AI system could analyze and segment medical scans of patients with head and neck cancer— similar to how expert clinicians would. Our research with Mayo Clinic will also focus on head and neck cancers, which are particularly challenging areas to contour, given the many delicate structures that sit close together. 

In this first phase of research with Mayo Clinic, we hope to develop and validate a model as well as study how an AI system could be deployed in practice. The technology will not be used in a clinical setting and algorithms will be developed using only de-identified data. 

While cancer rates continue to rise, the shortage of radiotherapy experts continues to grow as well. Waiting for a radiotherapy treatment plan can be an agonizing experience for cancer patients, and we hope this research will eventually support a faster planning process and potentially help patients to access treatment sooner.

Read More

Cloud Storage object lifecycle management gets new controls

Managing your cloud storage costs and reducing the risk of overspending is critical in today’s changing business environments. Today, we’re excited to announce the immediate availability of two new Object Lifecycle Management (OLM) rules designed to help protect your data and lower the total cost of ownership (TCO) within Google Cloud Storage. You can now transition objects between storage classes or delete them entirely based on when versioned objects became noncurrent (out-of-date), or based on a custom time stamp you set on your objects. The end result: more fine grained controls to reduce TCO and improve storage efficiencies. 

Delete objects based on archive time 

Many customers who leverage OLM protect their data against accidental deletion with Object Versioning. However, without the ability to automatically delete versioned objects based on their age, the storage capacity and monthly charges associated with old versions of objects can grow quickly. With the non-current time condition, you can filter based on archive time and use it to apply any/all lifecycle actions that are already supported, including delete and change storage class. In other words, you can now set a lifecycle condition to delete an object that is no longer useful to you, reducing your overall TCO. 

Here is a sample rule to delete all the noncurrent object versions that became versioned (noncurrent) more than 30 days ago:
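
(The rule isn’t reproduced here; expressed as the JSON lifecycle configuration you could pass to gsutil lifecycle set, it might look like this.)

{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"daysSinceNoncurrentTime": 30}
    }
  ]
}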

This rule downgrades, from Coldline to Archive, all the noncurrent object versions that became noncurrent before January 31, 1980:
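
(In the same JSON form, assuming the noncurrentTimeBefore and matchesStorageClass conditions, this might be:)

{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
      "condition": {
        "noncurrentTimeBefore": "1980-01-31",
        "matchesStorageClass": ["COLDLINE"]
      }
    }
  ]
}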


Set custom timestamps

The second new Cloud Storage feature is the ability to set a custom timestamp in an object’s metadata and use it as a lifecycle management condition in OLM. Before this launch, the only timestamp that could be used for OLM was the one given to an object when it was written to the Cloud Storage bucket. However, this object creation timestamp may not actually be the date that you care the most about. For example, you may have migrated data to Cloud Storage from another environment and want to preserve the original create dates from before the transfer. In order to set lifecycle rules based on dates that make more sense to you and your business case, you can now set a specific date and time and apply lifecycle rules to objects. All existing actions, including delete and change storage class, are supported.

If you’re running applications such as backup and disaster recovery applications, content serving, or a data lake, you can benefit from this feature by preserving the original creation date of an object when ingesting data into Cloud Storage. This feature delivers fine-grained OLM controls, resulting in cost savings and efficiency improvements, as a result of being able to set your own timestamps directly to the assets themselves. 

This sample rule deletes all objects in a bucket more than 2 years old since the specified custom timestamp:
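
(A JSON sketch of such a rule, using 730 days as an approximation of two years:)

{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"daysSinceCustomTime": 730}
    }
  ]
}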

This rule downgrades, from Coldline to Archive, all objects with a custom timestamp older than May 27, 2019:
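
(And a JSON sketch of this one, assuming the customTimeBefore condition:)

{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
      "condition": {
        "customTimeBefore": "2019-05-27",
        "matchesStorageClass": ["COLDLINE"]
      }
    }
  ]
}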


The ability to use age or custom dates with Cloud Storage object lifecycle management is now generally available. To get started or for more information, visit the Cloud Storage Lifecycle Documentation page or navigate to the Google Cloud Console.

Read More

Modernize your Java apps with Spring Boot and Spring Cloud GCP

It’s an exciting time to be a Java developer: there are new Java language features being released every 6 months, new JVM languages like Kotlin, and the shift from traditional monolithic applications to microservices architectures with modern frameworks like Spring Boot. And with Spring Cloud GCP, we’re making it easy for enterprises to modernize existing applications and build cloud-native applications on Google Cloud. 

First released two years ago, Spring Cloud GCP allows Spring Boot applications to easily utilize over a dozen Google Cloud services with idiomatic Spring Boot APIs. This means you don’t need to learn a Google Cloud-specific client library, but can still utilize and realize the benefits of the managed services:

  1. If you have an existing Spring Boot application, you can easily migrate to Google Cloud services with little to no code changes.

  2. If you’re writing a new Spring Boot application, you can leverage Google Cloud services with the framework APIs you already know.

Major League Baseball recently started their journey to the cloud with Google Cloud. In addition to modernizing their infrastructure with GKE and Anthos, they are also modernizing with a microservices architecture. Spring Boot is already the standard Java framework within the organization. Spring Cloud GCP allowed MLB to adopt Google Cloud quickly with existing Spring Boot knowledge.

“We use the Spring Cloud GCP to help manage our service account credentials and access to Google Cloud services.” – Joseph Davey, Principal Software Engineer at MLB

Similarly, bol.com, an online retailer, was able to develop their Spring Boot applications on GCP more easily with Spring Cloud GCP.

“[bol.com] heavily builds on top of Spring Boot, but we only have a limited capacity to build our own modules on top of Spring Boot to integrate our Spring Boot applications with GCP. Spring Cloud GCP has taken that burden from us and makes it a lot easier to provide the integration to Google Cloud Platform.” – Maurice Zeijen, Software Engineer at bol.com

Developer productivity, with little to no custom code

With Spring Cloud GCP, you can develop a new app, or migrate an existing app, to adopt a fully managed database, create event-driven applications, add distributed tracing and centralized logging, and retrieve secrets—all with little to no custom code or custom infrastructure to maintain. Let’s look at some of the integrations that Spring Cloud GCP brings to the table. 

Data

For a regular RDBMS, like PostgreSQL, MySQL, and MS SQL, you can use Cloud SQL and continue to use Hibernate with Spring Data, and connect to Cloud SQL simply by updating the JDBC configuration. But what about Google Cloud databases like Firestore, Datastore, and the globally-distributed RDBMS Cloud Spanner? Spring Cloud GCP implements all the data abstractions needed so you can continue to use Spring Data, and its data repositories, without having to rewrite your business logic. For example, you can start using Datastore, a fully-managed NoSQL database, just as you would any other database that Spring Data supports.

You can annotate a POJO class with Spring Cloud GCP annotations, similar to how you would annotate Hibernate/JPA classes:
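
(A sketch using the Datastore annotations from the Spring Cloud GCP 1.x line; the Book entity and its fields are made up for illustration.)

import org.springframework.cloud.gcp.data.datastore.core.mapping.Entity;
import org.springframework.data.annotation.Id;

@Entity(name = "books")
public class Book {
    @Id
    private Long id;
    private String title;
    private String author;
    // getters and setters omitted
}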

Then, rather than implementing your own data access objects, you can extend a Spring Data Repository interface to get full CRUD operations, as well as custom query methods.
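
(Continuing the sketch above, a repository for that hypothetical Book entity might look like this.)

import java.util.List;
import org.springframework.cloud.gcp.data.datastore.repository.DatastoreRepository;

public interface BookRepository extends DatastoreRepository<Book, Long> {
    // no implementation needed; the query is derived from the method name
    List<Book> findByAuthor(String author);
}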

Spring Data and Spring Cloud GCP automatically implement the CRUD operations and generate the query for you. Best of all, you can use built-in Spring Data features like auditing and capturing data change events.

You can find full samples for Spring Data for Datastore, Firestore, and Spanner on GitHub.

Messaging

For asynchronous message processing and event-driven architectures, rather than manually provision and maintain complicated distributed messaging systems, you can simply use Pub/Sub. By using higher-level abstractions like Spring Integration, or Spring Cloud Streams, you can switch from an on-prem messaging system to Pub/Sub with just a few configuration changes.

For example, by using Spring Integration, you can define a generic business interface that can publish a message, and then configure it to send a message to Pub/Sub:
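
(A sketch of that wiring, assuming the PubSubMessageHandler adapter from Spring Cloud GCP 1.x; the channel, topic, and interface names are illustrative.)

import org.springframework.cloud.gcp.pubsub.core.PubSubTemplate;
import org.springframework.cloud.gcp.pubsub.integration.outbound.PubSubMessageHandler;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.annotation.MessagingGateway;
import org.springframework.integration.annotation.ServiceActivator;
import org.springframework.messaging.MessageHandler;

// a generic business interface; Spring Integration supplies the implementation
@MessagingGateway(defaultRequestChannel = "ordersOutputChannel")
interface OrderPublisher {
    void publish(String order);
}

@Configuration
class PubSubPublishConfig {
    // route messages from that channel to the Pub/Sub topic "orders"
    @Bean
    @ServiceActivator(inputChannel = "ordersOutputChannel")
    public MessageHandler pubSubMessageHandler(PubSubTemplate pubSubTemplate) {
        return new PubSubMessageHandler(pubSubTemplate, "orders");
    }
}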

You can consume messages in the same way. The following is an example of using Spring Cloud Stream and the standard Java 8 streaming interface to receive messages from Pub/Sub by simply configuring the application:
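
(A sketch using the Spring Cloud Stream 3.x functional model with the Pub/Sub binder; the function and destination names are illustrative.)

import java.util.function.Consumer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class OrderConsumerConfig {
    // bound to a Pub/Sub subscription via configuration, e.g. in application.properties:
    //   spring.cloud.stream.bindings.receiveOrder-in-0.destination=orders
    @Bean
    public Consumer<String> receiveOrder() {
        return order -> System.out.println("received: " + order);
    }
}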

You can find full samples with Spring Integration and Spring Cloud Stream on GitHub.

Observability

If a user request is processed by multiple microservices and you would like to visualize that whole call stack across microservices, then you can add distributed tracing to your services. On Google Cloud, you can store all the traces in Cloud Trace, so you don’t need to manage your own tracing servers and storage.

Simply add the Spring Cloud GCP Trace starter to your dependencies, and all the necessary distributed tracing context (e.g., trace ID, span ID, etc) is captured, propagated, and reported to Cloud Trace.
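
(With Maven and the Spring Cloud GCP 1.x BOM, that dependency might look like the following; in the later 2.x line the coordinates move under the com.google.cloud group.)

<dependency>
  <groupId>org.springframework.cloud</groupId>
  <artifactId>spring-cloud-gcp-starter-trace</artifactId>
</dependency>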

This is it—no custom code required. All the instrumentation and trace capabilities use Spring Cloud Sleuth. Spring Cloud GCP supports all of Spring Cloud Sleuth’s features, so distributed tracing is automatically integrated with Spring MVC, WebFlux, RestTemplate, Spring Integration, and more.

Trace waterfall view

Cloud Trace generates a distributed trace graph. But notice the “Show Logs” checkbox. This Trace/Log correlation feature can associate log messages to each trace so you can see the logs associated with a request to isolate issues. You can use Spring Cloud GCP Logging starter and its predefined logging configuration to automatically produce the log entry with the trace correlation data.

You can find full samples with Logging and Trace  on GitHub.

Secrets

Your microservice may also need access to secrets, such as database passwords or other credentials. Traditionally, credentials may be stored in a secret store like HashiCorp Vault. While you can continue to use Vault on Google Cloud, Google Cloud also provides the Secret Manager service for this purpose. Simply add the Spring Cloud GCP Secret Manager starter so that you can start referring to the secret values using standard Spring properties:
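
(With Maven and the Spring Cloud GCP 1.x BOM, the starter dependency might look like this.)

<dependency>
  <groupId>org.springframework.cloud</groupId>
  <artifactId>spring-cloud-gcp-starter-secretmanager</artifactId>
</dependency>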

In the application.properties file, you can refer to the secret values using a special property syntax:
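
(For example, assuming a secret named my-db-password stored in Secret Manager and the sm:// prefix used by the 1.2.x releases:)

# application.properties: resolved from Secret Manager at startup
spring.datasource.password=${sm://my-db-password}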

You can find a full sample with Secret Manager on GitHub.

More in the works, in open source

Spring Cloud GCP closely follows the Spring Boot and Spring Cloud release trains. Currently, Spring Cloud GCP 1.2.5 works with Spring Boot 2.3 and Spring Cloud Hoxton release train. Spring Cloud GCP 2.0 is on its way and it will support Spring Boot 2.4 and the Spring Cloud Ilford release train.

In addition to core Spring Boot and Spring Cloud integrations, the team has been busy developing new components to meet developers’ needs:

Developer success is important to us. We’d love to hear your feedback, feature requests, and issues on GitHub, so we can understand your needs and prioritize our development work. 

Try it out!

Want to see everything in action? Check out the Developer Hands-on Keynote from Google Cloud Next ‘20: On Air, where Daniel Zou shows how to leverage Spring Boot and Spring Cloud GCP when modernizing your application with Anthos, Service Mesh, and more:

Can you have both innovation and stability in enterprise IT? The risk of stability or security regressions when upgrading or patching applications holds back innovation and new capabilities. Developers, operators, data engineers, and scientists: join the Hands-On Keynote for the tour de code of tools and automation frameworks that increase productivity while letting you run where you need to, using the tools you’re already familiar with. The end result is increased trust and reduced risk, unlocking significant innovation speed for your organization.

You can also easily try Spring Cloud GCP with many samples. Or, you can take the guided Spring Boot on GCP course on Qwiklab or Coursera. Last but not least, you can find out about detailed features and configurations in the reference documentation.

Read More

All treats, no tricks with product recommendation reference patterns

In all things technology, change is the only constant. This year alone has brought more uncertainty than ever before, and the IT shadows have felt full of perils. With the onset of the pandemic, the way consumers shop has shifted faster than anyone could have predicted. The move to online shopping vs. brick and mortar stores was already happening, but it has accelerated significantly this year. Shoppers have quickly transitioned to online purchasing, resulting in increased traffic and varying fulfillment needs. Shopper expectations have evolved as well, with 66% of online purchasers choosing a retailer based on convenience, while only 47% choose a retailer based on price/value, according to Catalyst and Kantar research.

So the pressure is on for retailers to become digital and make sure shoppers are happy. But there’s no reason to be spooked. Done right, you can serve your customers better with an understanding of their purchasing behavior and patterns using predictive analytics. Deep, data-driven insights are important to ensuring customer demand and preferences are accurately met.

To make it easier to treat (not trick) your customers to better recommendations, we recently introduced Smart Analytics reference patterns, which are technical reference guides with sample code for common analytics use cases with Google Cloud, including predicting customer lifetime value, propensity to purchase, product recommendation systems, and more. We heard from many customers that you needed an easy way to put your analytics tools into practice, and that these are some common use cases.

Understanding product recommendation systems

Product recommendation systems are an important tool for understanding customer behavior. They’re designed to generate and provide suggestions for items or content a specific user would like to purchase or engage with. A recommendation system creates an advanced set of complex connections between products and users, and compares and ranks these connections in order to recommend products or services as customers browse your website, for example. A well-developed recommendation system will help you improve your shoppers’ experience on a website and result in better customer acquisition and retention. These systems can significantly boost sales, revenues, click-through-rates, conversions, and other important metrics because personalizing a user’s preferences creates a positive effect, in turn translating to customer satisfaction, loyalty, and even brand affinity. Instead of building from scratch and reinventing the wheel every time, you can take advantage of these reference patterns to quickly start serving customers. 

It’s important to emphasize that recommender systems are not new, and you can build your own in-house or from any cloud provider. Google Cloud’s unique ability to handle massive amounts of structured and unstructured data, combined with our advanced capabilities in machine learning and artificial intelligence, provide a powerful set of products and solutions for retailers to leverage across their business.

Using reference patterns for real-world cases

In this reference pattern, you will learn step-by-step how to build a recommendation system by using BigQuery ML (a.k.a. BigQu-eerie ML 👻) to generate product or service recommendations from customer data in BigQuery. Then, learn how to make that data available to other production systems by exporting it to Google Analytics 360 or Cloud Storage, or programmatically reading it from the BigQuery table. The key advantage of using BigQuery ML is really how quickly and simply you can build a machine learning model with data already stored in BigQuery. In addition, the ease of productionizing the recommendation system ultimately saves you time and money. The same person can now analyze data and also train and deploy models in BigQuery using BigQuery ML. You no longer need a data engineer in between to export data out of BigQuery for ML purposes.
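
(As a flavor of what the pattern walks through, a matrix factorization model in BigQuery ML can be created with a single SQL statement along these lines; the dataset, table, and column names here are placeholders rather than the ones used in the pattern. The pattern then shows how to read recommendations from the model, for example with ML.RECOMMEND, and export them.)

CREATE OR REPLACE MODEL `mydataset.purchase_recommender`
OPTIONS (
  model_type = 'matrix_factorization',
  feedback_type = 'implicit',
  user_col = 'user_id',
  item_col = 'product_id',
  rating_col = 'purchase_count'
) AS
SELECT user_id, product_id, purchase_count
FROM `mydataset.purchase_history`;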

You can also see this step-by-step guide that explores the e-commerce recommendation system, as well as this Notebook environment that helps walk you through the entire process of building such a system in your organization. You will learn how to:

  • Process sample data into a format suitable for training a matrix factorization model.

  • Create, train, and deploy a matrix factorization model.

  • Get predictions from the deployed model about what products your customers are most likely to be interested in.

  • Export prediction data from BigQuery to Google Analytics 360 or Cloud Storage, or read it programmatically from the BigQuery table.

Learn how to use BigQuery ML to train and deploy a recommendation system.

Smart Analytics reference patterns are designed to reduce the time to value to implement analytics use cases and get you quickly to implementation. To get started, check out the existing reference patterns and select the one that best fits your needs.

Read More

Sundar Pichai’s testimony before the Senate Commerce Committee

Editor’s Note: Today the CEOs of Google, Facebook and Twitter are testifying before the U.S. Senate Commerce Committee. Read our CEO Sundar Pichai’s opening testimony below, describing how Section 230 makes it possible for Google to provide access to a wide range of information—including high-quality local journalism—while responsibly protecting people from harm and keeping their information private.

Chairman Wicker, Ranking Member Cantwell, and distinguished members of the Committee, thank you for the opportunity to appear before you today.

The internet has been a powerful force for good over the past three decades. It has radically improved access to information, whether it’s connecting Americans to jobs, getting critical updates to people in times of crisis, or helping a parent find answers to questions like “How can I get my baby to sleep through the night?”

At the same time, people everywhere can use their voices to share new perspectives, express themselves and reach broader audiences than ever before. Whether you’re a barber in Mississippi or a home renovator in Indiana, you can share a video and build a global fanbase—and a successful business—right from your living room.

In this way, the internet has been one of the world’s most important equalizers. Information can be shared—and knowledge can flow—from anyone, to anywhere. But the same low barriers to entry also make it possible for bad actors to cause harm.

As a company whose mission is to organize the world’s information and make it universally accessible and useful, Google is deeply conscious of both the opportunities and risks the internet creates. 

I’m proud that Google’s information services like Search, Gmail, Maps, and Photos provide thousands of dollars a year in value to the average American—for free. We feel a deep responsibility to keep the people who use our products safe and secure, and have long invested in innovative tools to prevent abuse of our services. 

When it comes to privacy we are committed to keeping your information safe, treating it responsibly, and putting you in control. We continue to make privacy improvements —like the changes I announced earlier this year to keep less data by default—and support the creation of comprehensive federal privacy laws.

We are equally committed to protecting the quality and integrity of information on our platforms, and supporting our democracy in a non-partisan way.

As just one timely example, our information panels on Google and YouTube inform users about where to vote and how to register. We’ve also taken many steps to raise up high-quality journalism, from sending 24 billion visits to news websites globally every month, to our recent $1 billion investment in partnerships with news publishers.

Since our founding, we have been deeply committed to the freedom of expression. We also feel a responsibility to protect people who use our products from harmful content and to be transparent about how we do that. That’s why we set and publicly disclose clear guidelines for our products and platforms, which we enforce impartially. 

We recognize that people come to our services with a broad spectrum of perspectives, and we are dedicated to building products that are helpful to users of all backgrounds and viewpoints.

Let me be clear: We approach our work without political bias, full stop. To do otherwise would be contrary to both our business interests and our mission, which compels us to make information accessible to every type of person, no matter where they live or what they believe.

Of course, our ability to provide access to a wide range of information is only possible because of existing legal frameworks, like Section 230. The United States adopted Section 230 early in the internet’s history, and it has been foundational to U.S. leadership in the tech sector. It protects the freedom to create and share content while supporting the ability of platforms and services of all sizes to responsibly address harmful content.   

We appreciate that this Committee has put great thought into how platforms should address content, and we look forward to having these conversations. 

As you think about how to shape policy in this important area, I would urge the Committee to be very thoughtful about any changes to Section 230 and to be very aware of the consequences those changes might have on businesses and customers.

At the end of the day, we all share the same goal: free access to information for everyone and responsible protections for people and their data. We support legal frameworks that achieve these goals, and I look forward to engaging with you today about these important issues, and answering your questions.

Read More

The Digital Services Act must not harm Europe’s economic recovery

In this extraordinary year, people and businesses are asking more, not less, from technology and technology companies. For many of us, and for many businesses, digital tools have been a lifeline during lockdown, helping us work, shop, find customers, connect with loved ones and get the latest public health information.

Helpful digital tools that serve millions of people don’t happen by accident—they need investment and rules that encourage that investment and innovation.  Twenty years ago, the European Union created a regulatory environment to do just that. Now it’s overhauling those rules, with a comprehensive reform called the Digital Services Act (DSA).  We fully support updating the rules, and think it’s more important than ever that this regulation delivers for European consumers and businesses. 

But a significant part of this reform will impact how digital tools can be built in the future, and by whom. That’s why, earlier this year, we shared our ideas with the European Commission, suggesting ways that existing legislation could be improved and warning of the risks if new rules are poorly designed.

Through the pandemic, people’s use of technology has jumped forward five years, with a 60 percent increase in internet usage. Searches for online shopping and how-to-buy online grew by 200 percent worldwide. Demand for the free digital skills courses that Google offers has increased by 300 percent. And many businesses—like restaurants, fashion designers, retailers and even hairdressers—have embraced digital to survive during painful lockdowns and restrictions. 

Now, just as in every economic downturn of the last 20 years, digital tools will be a vital catalyst for the economic recovery that must come after COVID-19. In rewriting the rules that govern the internet in Europe, the EU has an opportunity to rebuild the foundations so that everybody can thrive online and consumers can benefit from wide choice and lower prices. 

Yet reports suggest that some of the proposals being considered would do the opposite.  They would prevent global technology companies like Google from building innovative digital tools like the ones that people have used through lockdown—and that will help European businesses rebuild their operations. That would be a missed opportunity for Europe as it looks to the post-Covid future.

The DSA will not only affect a handful of global companies, but will also have broader impacts – including on the livelihoods of small business owners across Europe, who use digital services like ours to communicate with their customers, sell their products and services and fuel their growth. 

To take just one example, if you use Google Search to look for “Thai food nearby,” Google Maps shows you where the nearest restaurant is located and provides its contact details. And other links let you book a table directly (if local health restrictions allow) or see if you can pick up your meal to take away.

The DSA could prevent Google from developing such user-centric features. That would clearly have an impact not just on how people use our services, but also on the thousands of restaurants which welcomed millions of diners in Europe using this free feature this year. 

At Google, we put innovation and continuous improvement at the heart of everything we do.  While we support the ambition of the DSA to create clear rules for the next 20 years that support economic growth, we worry that the new rules may instead slow economic recovery. We will advocate strongly for policies that will help ensure innovation and digital tools are at the heart of Europe’s recovery and future success.  

Over the past few months, we’ve seen the power of technology as a tool to bring people together, keep them safe and help them get through difficult times. Now, more than ever, we need to focus not on how to limit innovation by a few companies, but on how the full range of digital tools available can contribute to Europe’s recovery and future economic success. The key to that success? Giving people more, not less. 

Read More

Investing in the next generation of NY tech talent

New York City is my home. I’m a proud graduate and parent of three children in the New York City public school system, and I chose to stay and build my career here. Twelve years ago, after a career on Wall Street, I joined Google and currently serve as Chief Information Officer (CIO) and co-site lead for Google’s growing New York campus. Like me, Google has been fortunate to call New York home and is committed to connecting students, teachers and job seekers to the local tech economy. 

Today, as part of Google’s commitment to the continued growth of our city’s current and future tech workforce, Google.org is announcing $3.5 million in grants to three local organizations: Pursuit, ExpandEd Schools and CS4All.

Supporting organizations like these is especially important as the COVID-19 pandemic has unearthed unsettling truths about equity and access to resources, especially in underserved communities of color. As we navigate the short and long term effects of the pandemic, we must come together to create equitable solutions that meet the needs of the moment and provide a strong foundation for the future. This starts by making sure every New Yorker has access to a quality education and the training and resources needed for in-demand jobs—these grantees are working to make this possible. 

Pursuit: Connecting New Yorkers to careers in tech 

Pursuit creates economic opportunity for adults from low-income communities by training them to code and build careers in technology. Their fellows come from groups that are historically underrepresented in tech and are made up of majority Black or Latino people, women, immigrants and those without Bachelor’s degrees. Upon completing the fellowship, they go on to work at top tech companies, increasing their salaries from $18,000 to $85,000 on average. With $2 million in funding from Google.org, Pursuit will build on its work to remove systemic barriers preventing low-income communities from accessing careers in technology and connect 10,000 New Yorkers with jobs in the tech industry. 

 

ExpandEd Schools: Supporting after-school educators  

ExpandED supports a strong after-school system that enables students to thrive and educators to grow. Google.org’s $1 million investment in the ExpandED Pathways Fellowship Computer Science (CS) track will empower aspiring teachers of color from underserved communities to fulfill their professional goals through a 10-month after-school teaching practicum. Ultimately, this will help increase the number of diverse CS educators in New York City and nationwide.

CS4All: Sustaining Computer Science education in public schools

Computer Science for All (CS4All) began in 2015 as an innovative public-private partnership with the NYC Department of Education to train 5,000 teachers and bring equitable CS education to all 1.1 million public school students in NYC by 2025. As the program hits its halfway point, Google.org is providing $500,000 to fund their CS Leads program facilitated by the Fund for Public Schools. This will help provide more than 200 teachers with a comprehensive leadership training program focused on equity in CS education, peer coaching and in-school leadership.

The creativity and entrepreneurial spirit of New Yorkers is one of the reasons Google calls this city home. And I’m proud that the work we do helps nurture that spirit. Whether it’s standing alongside 26 CEOs from the largest employers in New York to launch the New York Jobs CEO Council with the goal of hiring 100,000 traditionally underserved New Yorkers by 2030, committing to additional hiring efforts focused on Black+ talent in NY or developing alternative pathways into the workforce, we believe tech should be for everyone and we’re committed to making that a reality. 

Read More