Operational resilience continues to be a key focus for financial services firms. Regulators from around the world are refocusing supervisory approaches on operational resilience to support the soundness of financial firms and the stability of the financial ecosystem. Our new white paper discusses the continuing importance of operational resilience to the financial services sector, and the role that a well-executed migration to Google Cloud can play in strengthening it. Here are the key highlights:
Operational resilience in financial services
Financial services firms and regulators are increasingly focused on operational resilience, reflecting the growing dependency that the financial services industry has on complex systems, automation and technology, and third parties.
Operational resilience can be defined as the “ability to deliver operations, including critical operations and core business lines, through a disruption from any hazard”1. Given this definition, operational resilience needs to be thought of as a desired outcome, instead of a singular activity, and as such, the approach to achieving that outcome needs to address a multitude of operational risks including:
Cybersecurity: Continuously adjusting key controls, people, processes and technology to prevent, detect and react to external threats and malicious insiders.
Pandemics: Sustaining business operations in scenarios where people cannot, or will not, work in close proximity to colleagues and customers.
Environmental and Infrastructure: Designing and locating facilities to mitigate the effects of localised weather and infrastructure events, and to be resilient to physical attacks.
Geopolitical: Understanding and managing risks associated with geographic and political boundaries between intragroup and third-party dependencies.
Third-party Risk: Managing supply chain risk, particularly for critical outsourced functions, by addressing vendor lock-in, survivability and portability.
Technology Risk: Designing and operating technology services to provide the required levels of availability, capacity, performance, quality and functionality.
Operational resilience benefits from migrating to Google Cloud
There is a growing recognition among policymakers and industry leaders that, far from creating unnecessary new risk, a well-executed migration to public cloud technology over the coming years will provide capabilities to financial services firms that will enable them to strengthen operational resilience in ways that are not otherwise achievable.
Foundationally, Google Cloud’s infrastructure and operating model are of a scale and robustness that can give financial services customers a commercially viable way to increase their resilience.
Equally important are the Google Cloud products, and our support for hybrid and multi-cloud, that help financial services customers manage various operational risks in a differentiated manner:
Cybersecurity that is designed in from the ground up. From encryption by default, to our Titan security chip, to high-scale DDoS defences, to the power of Google Cloud data analytics and Security Command Center, our solutions help you secure your environment.
Solutions that decouple employees and customers from physical offices and premises. This includes zero-trust based remote access that removes the need for complex VPNs, rapidly deployed customer contact center AI virtual agents, and Google Workspace for best-in-class workforce collaboration.
Globally and regionally resilient infrastructure, data centers and support. We offer a global footprint of 24 regions and 73 zones allowing us to serve customers in over 200 countries, with a globally distributed support function so we can support customers even in adverse circumstances.
Strategic autonomy through appropriate controls. Our recognition that customers and policymakers, particularly in Europe, strive for even greater security and autonomy is embodied in our work on data sovereignty, operational sovereignty, and software sovereignty.
Portability, substitutability and survivability, using our open cloud. We understand that from a financial services firm’s perspective, achieving operational resilience may include solving for situations where their third parties are unable, for any reason, to provide the services contracted.
Reducing technical debt, whilst focusing on great financial products and services. We provide a portfolio of solutions so that financial services firms’ technology organisations can focus on delivering high-quality services and experiences to customers, and not on operating foundational technologies such as servers, networks and mainframes.
We are committed to ensuring that Google Cloud solutions for financial services are designed in a manner that best positions the sector in all aspects of operational resilience. Furthermore, we recognize that this is not simply about making Google Cloud resilient: the sector needs autonomy, sovereignty and survivability. You can learn more about Google Cloud’s point of view on operational resilience in financial services by downloading the white paper.
1. “Sound Practices to Strengthen Operational Resilience”, FRB, OCC, FDIC
Editor’s note: Today’s post comes from Wietse Venema, a software engineer and trainer at Binx.io and the author of the O’Reilly book on Google Cloud Run. In today’s post, Wietse shares how to understand the full container lifecycle, and the possible state transitions within it, so you can make the most of Cloud Run.
Serverless platform Cloud Run runs and autoscales your container-based application. You can make the most of this platform when you understand the full container lifecycle and the possible state transitions within it. Let’s review the states, from starting to stopped.
First, some context for those who have never heard of Cloud Run before (if you have, skip down to “Starting a Container”). The developer workflow on Cloud Run is a straightforward, three-step process:
- Write your application using your favorite programming language. Your application should start an HTTP server.
- Build and package your application into a container image.
- Deploy the container image to Cloud Run.
Once you deploy your container image, you’ll get a unique HTTPS endpoint back. Cloud Run then starts your container on demand to handle requests and ensures that all incoming requests are handled by dynamically adding and removing containers. Explore the hands-on quickstart to try it out for yourself.
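For the first step, your application only needs to start an HTTP server on the port Cloud Run provides. As a minimal sketch using Python’s standard library (any language and framework will work; the greeting text here is just a placeholder):

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Respond to every GET with a plain-text greeting.
        body = b"Hello from Cloud Run!\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve():
    # Cloud Run tells your container which port to listen on
    # via the PORT environment variable (8080 by default).
    port = int(os.environ.get("PORT", 8080))
    HTTPServer(("", port), Handler).serve_forever()
```

Calling `serve()` as the container’s entrypoint, packaging the script into a container image, and deploying that image completes the three steps.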
It’s important to understand the distinction between a container image and a container. A container image is a package with your application and everything it needs to run; it’s the archive you store and distribute. A container represents the running processes of your application.
You can build and package your application into a container image in multiple ways. Docker gives you low-level control and flexibility. Jib and Buildpacks offer a higher-level, hands-off experience. You don’t need to be a container expert to be productive with Cloud Run, but if you are, Cloud Run won’t be in your way. Choose the containerization method that works best for you and your project.
Starting a Container
When a container starts, the following happens:
- Cloud Run creates the container’s root filesystem by materializing the container image.
- Once the container filesystem is ready, Cloud Run runs the entrypoint program of the container (your application).
- While your application is starting, Cloud Run continuously probes port 8080 to check whether your application is ready. (You can change the port number if you need to.)
- Once your application starts accepting TCP connections, Cloud Run forwards incoming HTTP requests to your container.
Remember, Cloud Run can only deploy container images that are stored in a Docker repository on Artifact Registry. However, it doesn’t pull the entire image from there every time it starts a new container. That would be needlessly slow.
Instead, Cloud Run pulls your container image from Artifact Registry only once, when you deploy a new version (called a revision on Cloud Run). It then makes a copy of your container image and stores it internally.
The internal storage is fast, ensuring that your image size is not a bottleneck for container startup time. Large images load as quickly as small ones. That’s useful to know if you’re trying to improve cold start latency. A cold start happens when a request comes in and no containers are available to handle it. In this case, Cloud Run will hold the request while it starts a new container.
If you want to be sure a container is always available to handle requests, configure minimum instances, which will help reduce the number of cold starts.
Because Cloud Run copies the image, you won’t get into trouble if you accidentally delete a deployed container image from Artifact Registry. The copy ensures that your Cloud Run service will continue to work.
When a container is not handling any requests, it is considered idle. On a traditional server, you might not think twice about this. But on Cloud Run, this is an important state:
- An idle container is free. You’re only billed for the resources your container uses when it is starting, handling requests (with a 100ms granularity), or shutting down.
- An idle container’s CPU is throttled to nearly zero. This means your application will run at a really slow pace. That makes sense, considering this is CPU time you’re not paying for.
When your container’s CPU is throttled, however, you can’t reliably perform background tasks on your container. Take a look at Cloud Tasks if you want to reliably schedule work to be performed later.
When a container handles a request after being idle, Cloud Run will unthrottle the container’s CPU instantly. Your application — and your user — won’t notice any lag.
Cloud Run can keep idle containers around longer than you might expect, too, in order to handle traffic spikes and reduce cold starts. Don’t count on it, though. Idle containers can be shut down at any time.
If your container is idle, Cloud Run can decide to stop it. By default, a container just disappears when it is shut down.
However, you can build your application to handle a SIGTERM signal (a Linux kernel feature). The SIGTERM signal warns your application that shutdown is imminent, giving it 10 seconds to clean up before the container is removed: closing database connections, for example, or flushing buffers with data you still need to send somewhere else. You can learn how to handle SIGTERM on Cloud Run so that your shutdowns are graceful rather than abrupt.
So far, I’ve looked at Cloud Run’s happy state transitions. What happens if your application crashes and stops while it is handling requests?
When Things Go Wrong
Under normal circumstances, Cloud Run never stops a container that is handling requests. However, a container can stop suddenly in two cases: if your application exits (for instance due to an error in your application code) or if the container exceeds the memory limit.
If a container stops while it is handling requests, it takes down all its in-flight requests at that time: Those requests will fail with an error. While Cloud Run is starting a replacement container, new requests might have to wait. That’s something you’ll want to avoid.
You can avoid running out of memory by configuring memory limits. By default, a container gets 256MB of memory on Cloud Run, but you can increase the allocation to 4GB. Keep in mind, though, that if your application allocates too much memory, Cloud Run will also stop the container without a SIGTERM warning.
In this post, you learned about the entire lifecycle of a container on Cloud Run, from starting to serving and shutting down. Here are the highlights:
- Cloud Run stores a local copy of your container image to load it really fast when it starts a container.
- A container is considered idle when it is not serving requests. You’re not paying for idle containers, but their CPU is throttled to nearly zero. Idle containers can be shut down.
- With SIGTERM you can shut down gracefully, but it’s not guaranteed to happen. Watch your memory limits and make sure errors don’t crash your application.
2020 challenged some of the best-laid plans of enterprises. With nearly everything moving online, COVID-19 accelerated years of digital transformation. DevOps was at the heart of this transformation journey. After all, delivering software quickly, reliably, and safely to meet the changing needs of customers was crucial to adapting to this new normal.
It is unlikely that the pace of modernization will slow down in 2021. As IT and business leaders further drive digital adoption within their organizations via DevOps, the need to quantify the business benefit of a digital transformation remains top of mind. A reliable model is imperative to drive the right level of investment and measure the returns. This is precisely why we wrote How to Measure ROI of DevOps Transformation. The white paper is backed by scientific studies conducted by DevOps Research and Assessment (DORA) with 31,000 professionals worldwide over six years, providing clear guidance based on impartial industry data. We found the financial savings of a DevOps transformation vary from $10M to $259M a year.
Looking beyond cost to value
The most innovative companies undertake their technology transformations with a focus on the value they can deliver to their customers. Hence, in addition to measuring cost savings, we show how DevOps done right can be a value driver and innovation engine. Let’s look deeper into how we quantify the cost and value-generating power of DevOps.
Here, we focus on quantifying the cost savings and efficiencies realized by implementing DevOps—for example, how an investment in DevOps reduces costs by cutting the time it takes to resolve outages and avoiding downtime as much as possible.
However, focusing solely on reducing costs rarely yields systemic, long-term gains, which makes it all the more important to go beyond cost-driven strategies. The cost savings achieved in year one “no longer count” beyond year two, as the organization adjusts to a new baseline of costs and performance. Worse, focusing only on cost savings signals to technical staff that their jobs are potentially at risk from automation, rather than that they are being liberated from drudge work to better drive business growth. This hurts morale and productivity.
There are two value drivers in a DevOps transformation: (1) improved efficiency through the reduction of unnecessary rework, and (2) the potential revenue gained by reinvesting the time saved into new offerings and capabilities.
Adding these cost and value driven categories together, IT and business decision makers can get an estimate of the potential value their organizations can expect to gain from a DevOps transformation. This helps justify the investment needed to implement the required changes. To quantify the impact, we leverage industry benchmark data across low, medium, high, and elite DevOps teams, as described by DORA in its annual Accelerate: State of DevOps report.
Combining cost and value
As an example, let’s consider the impact of a DevOps transformation on a large organization with 8,500 technical staff and a medium IT performer. Using the data gained from the DevOps report, we can calculate both the cost and value driven categories along with total impact.
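The arithmetic behind one cost-driven input to such an estimate can be sketched as follows. All numbers here are hypothetical placeholders chosen for illustration, not figures from the white paper or the DORA benchmarks:

```python
def annual_downtime_cost(deploys_per_year, change_fail_rate,
                         hours_to_restore, cost_per_hour):
    # Expected failed deployments per year, times the cost of each outage.
    return deploys_per_year * change_fail_rate * hours_to_restore * cost_per_hour

# Hypothetical "before" (medium performer) vs. "after" (elite performer).
before = annual_downtime_cost(365, 0.15, 24, 10_000)
after = annual_downtime_cost(1_500, 0.05, 1, 10_000)
savings = before - after  # one input into the overall ROI model
```

Note that elite performers deploy far more often, so in this sketch the savings come from the sharply lower change failure rate and restore time, not from deploying less.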
While this example represents what a medium IT performer at a large organization might expect by investing in DevOps, companies of all sizes and performance profiles can leverage DevOps to drive performance. In the white paper, we calculate the impact of DevOps across organizations of different sizes—small, medium, and large—as well as across four distinct performance profiles—low, medium, high, elite.
There will be variation in these measurements based on your team’s current performance, compensation, change fail rate, benefits multiplier, and deployments per year, so we share our methodology in the white paper and invite you to customize the approach based on your specific needs and constraints.
Years of DORA research show that undertaking a technology transformation initiative can produce sizable returns for any organization. Our goal with the white paper is to provide IT and business decision makers an industry-backed, data-driven foundation for determining their investment in DevOps. Download the white paper here to calculate the impact of DevOps on your organization, while driving your digital transformation.
Security issues continue to disrupt the status quo for global enterprises. Recent incidents highlight the need to re-think our security plans and operations; attackers are getting smarter, attacks are more sophisticated, and assumptions about what is and isn’t locked down no longer hold. The challenge, however, is to enable disruptive innovation in security without disrupting security operations.
Today, we’re excited to announce the general availability of Google’s comprehensive zero trust product offering, BeyondCorp Enterprise, which extends and replaces BeyondCorp Remote Access. Google is no stranger to zero trust—we’ve been on this journey for over a decade with our own implementation of BeyondCorp, a technology suite we use internally to protect Google’s applications, data, and users. BeyondCorp Enterprise brings this modern, proven technology to organizations so they can get started on their own zero trust journey. Living and breathing zero trust for this long, we know that organizations need a solution that will not only improve their security posture, but also deliver a simple experience for users and administrators.
A modern, proven, and open approach to zero trust
Because our own zero trust journey at Google has been ongoing for a decade, we realize customers can’t merely flip a switch to make zero trust a reality in their own organizations, especially given varying resources and computing environments that might look different than ours. Nonetheless, these enterprises understand the zero trust journey is an imperative.
As a result, we’ve invested many years in bringing our customers a solution that is cost-effective and requires minimal disruption to existing deployments and business processes, using trust, reliability and scale as our primary design criteria.
The end result is BeyondCorp Enterprise, which delivers three key benefits to customers and partners:
1) A scalable, reliable zero trust platform in a secure, agentless architecture, including:
Non-disruptive, agentless support delivered through the Chrome Browser, which supports more than 2 billion users worldwide.
Google’s global network with 144 network edge locations, available in more than 200 countries and territories, so that users can work reliably from anywhere.
The entire surface area is protected by our scalable DDoS protection service, proven to withstand the largest DDoS attacks recorded in recent times (2.5 Tbps).
Built-in, verifiable platform security, which has been made more important with recent software supply chain attacks.
2) Continuous and real-time end-to-end protection
Embedded data and threat protection, newly added to Chrome, to prevent malicious or unintentional data loss and exfiltration and malware infections from the network to the browser.
Strong phishing-resistant authentication to ensure that users are who they say they are.
Continuous authorization for every interaction between a user and a BeyondCorp-protected resource.
End-to-end security from user to app and app to app (including microsegmentation) inspired by the BeyondProd architecture.
Automated public trust SSL certificate lifecycle management for internet-facing BeyondCorp endpoints powered by Google Trust Services.
3) A solution that’s open and extensible, to support a wide variety of complementary solutions
Built on an expanding ecosystem of technology partners in our BeyondCorp Alliance, which democratizes zero trust and allows customers to leverage existing investments.
Open at the endpoint to incorporate signals from partners such as CrowdStrike and Tanium, so customers can utilize this information when building access policies.
Extensible at the app to integrate into best-in-class services from partners such as Citrix and VMware.
In short, if cloud-native zero trust computing is the future—and we believe it is—then our solution is unmatched when it comes to providing scale, security and user experience. With BeyondCorp Enterprise, we are bringing our proven, scalable platform to customers, meeting their zero trust requirements wherever they are.
Customers are committed to zero trust
We’ve worked with customers around the world to battle-test our BeyondCorp Enterprise technology and to help them build a more secure foundation for a modern, zero-trust architecture within their organization. Vaughn Washington, VP of Engineering at Deliveroo, a global food delivery company headquartered in the UK, says, “We love that BeyondCorp Enterprise makes it so easy to bring the zero trust model to our distributed workforce. Having secure access to applications and associated data is critical for our business. With BeyondCorp Enterprise, we manage security at the app level, which removes the need for traditional VPNs and associated risks. With BeyondCorp Enterprise and Chrome Enterprise working together, we have additional visibility and controls to help us keep our data secure.”
“We want to improve the experience for our developers and continue to raise the bar on our security posture by adopting a zero trust architecture. Google’s experience with zero trust and the capabilities of BeyondCorp Enterprise made them an ideal partner for our journey,” said Tim Collyer, Director of Enterprise Information Security at Motorola Solutions, Inc.
Support from a robust ecosystem of partners
Our partners are key to our effort to further promote and democratize this technology. The BeyondCorp Alliance allows customers to leverage existing controls to make adoption easier while adding key functionality and intelligence that enables customers to make better access decisions. Check Point, Citrix, CrowdStrike, Jamf, Lookout, McAfee, Palo Alto Networks, Symantec (a division of Broadcom), Tanium and VMware are members of our Alliance who share our vision.
“As we enter a new era of security, enterprises want a seamless security model attuned to the realities of remote work, cloud applications, and mobile communications. Zero trust is that model, and critical to its efficacy is the ability to readily assess the health of endpoints. Who is accessing them? Do they contain vulnerabilities? Are they patched and compliant?” said Orion Hindawi, co-founder and CEO of Tanium. “With Google Cloud, we’re on a journey to offer today’s distributed businesses joint solutions that provide visibility and control into activities across any network to any application for both users and endpoints.”
Matthew Polly, VP WW Alliances, Channels, and Business Development at CrowdStrike said, “In today’s complex threat environment, zero trust security is fundamental for successful protection. BeyondCorp Enterprise customers will be able to seamlessly leverage the power of the CrowdStrike Falcon platform to deliver complete protection through verified access control to their business data and applications and secure their assets and users from the sophisticated tactics of cyber adversaries, including lateral movement.”
“The rapid move to the cloud and remote work are creating dynamic work environments that promise to drive new levels of productivity and innovation. But they have also opened the door to a host of new security concerns and sparked a significant increase in cyberattacks,” said Fermin Serna, Chief Information Security Officer, Citrix. “To defend against them, enterprises must take an intelligent approach to workspace security that protects employees without getting in the way of their experience following the zero trust model. And with Citrix Workspace and BeyondCorp Enterprise, they can do just this.”
Dan Quintas, Sr. Director of Product Management at VMware also added, “Google’s commitment to security is clear and in today’s environment, device access policies are a key piece of the zero trust framework. Using Workspace ONE integrations in BeyondCorp Enterprise, customers can leverage device compliance status information to protect corporate information and ensure their users stay productive and secure.”
We also continue to collaborate with Deloitte’s industry-leading cyber practice to deliver end-to-end architecture, design, and deployment services to assist our customers’ zero-trust journeys.
“Implementing and operationalizing a zero trust architecture is critically important for organizations today,” said Deborah Golden, Deloitte Risk & Financial Advisory Cyber & Strategic Risk leader and principal, Deloitte & Touche LLP. “Both Google Cloud and Deloitte are well positioned to deliver this secure transformative change for our clients and together provide a modern security approach that’s seamless to integrate into existing infrastructures.”
Take the next step
The adoption of zero trust is an imperative for security modernization, and BeyondCorp Enterprise can help organizations overcome the challenges that come with the embrace of such a disruptive innovation. To learn more about BeyondCorp Enterprise, register for our upcoming webinar on Feb 23 and be sure to check out our BeyondCorp product home page.
To learn more about the security features of Chrome Enterprise, including the new threat and data protection features available in BeyondCorp Enterprise, attend our upcoming webinar on January 28 by registering here.
Businesses increasingly gather data to better understand their customers, products, marketing, and more. But unlocking valuable and meaningful insights from that data requires powerful, reliable, and scalable solutions. We hear from our BigQuery and Looker customers that they’ve been able to modernize business intelligence (BI) and allow self-service discovery on the data the business collects. Insights are quickly made available not just to data scientists or data analysts, but to everyone in your organization, including key business decision-makers.
In this post, we hear from several Google Cloud customers who’ve used BigQuery and Looker and how they’re using their data insights to unlock new opportunities.
Data analysis, accelerated
Sunrun, the leader in residential solar power, offers clean, reliable, affordable solar energy and battery storage solutions. With the increasing demand for renewable energy, Sunrun needed a better way to manage their growing volumes of data across installation operations, installed systems, customer operations, and sales.
Their legacy data stack required IT and data team support for almost every internal data request. Sunrun’s legacy Oracle data warehouse wasn’t equipped to scale across growing analytics demands or easily unlock predictive insights, and this limitation led to data silos and conflicts.
After their evaluation process, Sunrun migrated to Google Cloud’s smart analytics platform—including BigQuery and Looker—to reduce extract, transform, and load (ETL) complexity, run fast queries with ease, and make data accessible and trusted throughout the organization. The results of the migration include:
Optimization of construction processes through insights into productivity and labor data, making planning more efficient and identifying areas of opportunity.
A 50% reduction in data warehouse design time, ETL, and data modeling.
A reduction of their entire data development cycle by more than 60% to enable accelerated decision-making with a modernized, simplified architecture.
An enablement of self-service analytics across their core business through a hub-and-spoke analytics model, ensuring all metrics are governed and trusted.
A unification of metric definitions throughout the company with LookML, Looker’s modeling layer.
Looker dashboards that facilitate regular executive huddles to set and execute data-driven strategies based on a single source of truth.
With Looker, Sunrun was able to bring the IT and business sides of the organization closer together, and improve their ability to recognize trends across their retail business, including the performance and impact of their relationships with major retail partners. Across Sunrun, data is analyzed with the customer’s experience and business goals in mind. Since Sunrun’s migration from their on-premises legacy data stack to a modern cloud environment, they’ve created infrastructure and business-wide efficiencies to help them meet the growing demand for solar power.
Business intelligence you can build upon
After relying upon Excel workbooks for data analysis, Emery Sapp & Sons, a heavy civil engineering company, chose BigQuery and Looker as key components of a new data stack that could scale with their business growth. This unified their wide variety of data sources and provided them with a holistic view of the business. Looker met their need to enable user-friendly self-service across the organization, so that all teams could access and act on accurate data through a business-user-friendly interface, all with minimal maintenance. The results include:
Pre-built, automated cost and payroll reports in Looker deliver data on schedule in a fraction of the time that Emery Sapp & Sons teams used to spend generating reports.
A weekly profitability and accounts receivable dashboard with real-time data allows them to better predict cash flows and provide guidance on which customers they need to be talking with.
Tracking of Zendesk support tickets in Looker easily shows what’s open, urgent, high priority, pending, and closed, allowing them to identify trends.
Instant access to total outstanding amounts and bills owing reports for the accounts receivable team. Branch managers can sort that information by customer and prioritize follow-up communications.
Now able to visualize the necessary information intuitively, Emery Sapp & Sons can quickly understand and act upon important data. Since modernizing their data stack, they’ve cut hours they once spent on manual activities and freed up time to concentrate on what the data means for their business. They can now focus on strategic initiatives that will fuel their growth and serve their customers.
Advancing care in an uncertain time
Commonwealth Care Alliance (CCA) is a community-based healthcare organization providing and coordinating care for high-need individuals who are often vulnerable or marginalized. At the first signs of COVID-19 last winter, CCA knew their members would need enhanced care and attention. Their staff and clinicians would need reliable data that was available quickly and integrated across many domains and sources.
Fortunately, they had already put in place an advanced analytics platform with BigQuery and Looker, which the CCA data science team has used to deliver valuable information and predictive insights to CCA’s clinicians, and to develop and deploy data ops and machine learning (ML) ops capabilities. All of Google Cloud was available under a single business associate agreement (BAA) to meet CCA’s HIPAA requirements, and BigQuery proved elastic and available as a service. These two features offered reliable platform performance and allowed the small data science team to stay focused and nimble while remaining compliant.
Using a query abstraction and a columnar-based data engine, CCA could adapt to clinicians’ changing needs and provide data and predictive insights via general dashboards and role-specific dashboards—internally referred to as action boards—which help clinicians decide how to react to the specific needs of each member. The solution includes:
Regular updates to BigQuery and Looker from CCA’s internal care management platform and electronic health records.
Quick creation and distribution of custom concepts—such as “high risk for COVID-19”—in Looker’s flexible modeling layer, LookML.
Tailored dashboards allow each clinician and care manager to access data relevant to their members, including recommended actions for coordinated care.
Looker’s user attributes and permissions integrate with data, such as service disruptions, to allow clinicians to understand and react to changing conditions.
Using BigQuery and Looker, CCA’s data science team provides secure, companywide access to trusted data without burdening internal resources. As the COVID-19 pandemic and its effects continue to evolve, CCA continually uses the latest available information to update and guide their member support and care strategies. Now, the data science team can move on to deeper feature engineering and causal inference to enrich the insights delivered to their clinicians and the care provided to their members.
Saving $10,000 a month and more
Label Insight helps retailers and brands stay on top of trends and market demand by analyzing the packaging and labeling of different products. Their customers use this information to inform decisions around repackaging existing products or creating new products that are in line with the latest dietary trends.
Before, with their on-premises legacy BI system, numerous data silos, and cumbersome processes, it became increasingly costly, complicated, and time-consuming to quickly extract helpful insights from the data. Though Label Insight had rich data sets, accessing them would often take one person an entire week of analysis. This process was not scalable, repeatable, or reliable.
Today, Label Insight’s new data platform includes BigQuery as their data warehouse and Looker for business intelligence. When evaluating data warehouse offerings, their executive team found that the more they used BigQuery, the greater the benefits and ROI for the company. BigQuery now offers them virtually infinite, cost-effective, scalable storage capacity and unrivalled performance.
With easy-to-set-up dashboards, reporting, and analytics, Looker democratizes data for users across the entire Label Insight organization. Looker also enables governance and control, helping them make use of the high-quality data in BigQuery, and freeing up their data team from constantly managing reporting requests. With Looker’s ability to integrate insights via embedded analytics into its existing applications like Slack, Label Insight can access consistent, accurate data in their favorite task management tools, enabling everyone to continue providing value to their customers.
An ROI of 200%, with a savings of 120 labor hours on reporting per week, which has opened up time and resources for their teams to pursue new initiatives.
A recurring savings amounting to $10,000/month.
A user engagement score on the platform of approximately 60% (and growing), with goals to keep increasing that number with the help of their Looker superusers.
Extract, transform and load (ETL) automation with Fivetran provides quick and easy access to data across their 17 different sources.
Modernizing Label Insight’s data technology stack has transformed their business in all the ways they were hoping for.
Home-run engagement for fans and clubs
The fan data engineering team at Major League Baseball (MLB) is responsible for managing more than 350 data pipelines to ingest data from third-party and internal sources and centralize it in an enterprise data warehouse (EDW). That EDW drives data-related initiatives across the internal product, marketing, finance, ticketing, shop, analytics, and data science departments, and from all 30 MLB Clubs. The team had previously used Teradata as their EDW.
MLB was experiencing issues such as query failures and latency and synchronization problems with their EDW. Providing direct user access was often challenging due to network connectivity restrictions and client software setup issues. With a migration from Teradata to BigQuery completed in 2019, MLB has realized numerous benefits from their modern, cloud-first data warehouse platform.
Side-by-side performance tests run with minimal cost and no commitment. By switching from on-demand to flat-rate pricing, MLB could fix costs, avoid surprise overages, and share unused capacity between departments.
Data democratization boosted by the secure, one-click sharing of datasets with any Workspace user or group.
Access to BigQuery’s web console to review and run SQL queries on data, and to use Connected Sheets to analyze large data sets with pivot tables in a familiar interface.
A 50% increase in query completion speed compared with the previous EDW.
Integrations with several services MLB uses, including Google Ads, Google Campaign Manager, and Firebase.
Integration of BigQuery with Looker, MLB’s new BI tool, which provides a clean and high-performing interface for business users to access and drill into data.
A reduction in operational overhead of the previous database administration.
Support coverage by Google for any major service issues, letting IT teams focus on more strategic work.
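As a sketch of the SQL access described above, the same queries the BigQuery web console runs can also be issued from the bq command-line tool. The dataset and table names here are hypothetical placeholders, not MLB's actual schema:

```shell
# Run a standard-SQL aggregation against the EDW from the bq CLI.
# `fan_dw.ticket_sales` is a hypothetical dataset and table.
bq query --nouse_legacy_sql \
  'SELECT club, COUNT(*) AS tickets_sold
   FROM `fan_dw.ticket_sales`
   GROUP BY club
   ORDER BY tickets_sold DESC
   LIMIT 30'
```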
MLB can now take a more comprehensive and frictionless approach to using data to serve their fans and the league. Two projects already facilitated by their move to BigQuery and Looker include:
OneView: This initiative compiles over 30 pertinent data sources into a single table, with one row per fan, to facilitate downstream personalization and segmentation initiatives like news article personalization.
Real-time form submission reporting: By using the Google-provided Dataflow template to stream data from Pub/Sub in real time to BigQuery, MLB creates Looker dashboards with real-time reporting on form submissions for initiatives such as their “Opening Day Pick ‘Em” contest. This allows their editorial team to create up-to-the-minute analyses of results.
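A pipeline like the one described above can be launched with the Google-provided Dataflow streaming template. The following is only a sketch: the job name, project, topic, and output table are placeholders.

```shell
# Launch the Google-provided Pub/Sub-to-BigQuery streaming template.
# PROJECT, the topic name, and the output table are placeholders.
gcloud dataflow jobs run form-submission-stream \
  --gcs-location gs://dataflow-templates/latest/PubSub_to_BigQuery \
  --region us-central1 \
  --parameters \
inputTopic=projects/PROJECT/topics/form-submissions,\
outputTableSpec=PROJECT:fan_dw.form_submissions
```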
With MLB’s new data stack up and running, they’re able to serve data stakeholders better than ever before, and can harness their new data-driven capabilities to create better online and in-person experiences for their fans.
Ready to modernize your business intelligence? Explore the combined data analytics solution of BigQuery and Looker.
At Google Cloud, we strive to bring Site Reliability Engineering (SRE) culture to our customers not only through training on organizational best practices, but also with the tools you need to run successful cloud services. Part and parcel of that is comprehensive observability tooling—logging, monitoring, tracing, profiling and debugging—which can help you troubleshoot production issues faster, increase release velocity and improve service reliability.
We often hear that implementing observability is hard, especially for complex distributed applications that are implemented in different programming languages, deployed across a variety of environments, and carry different operational costs, among many other factors. As a result, when migrating and modernizing workloads onto Google Cloud, observability is often an afterthought.
Nevertheless, being able to debug the system and gain insights into the system’s behavior is important for running reliable production systems. Customers want to learn how to instrument services for observability and implement SRE best practices using tools Google Cloud has to offer, but without risking production environments. With Cloud Operations Sandbox, you can learn in practice how to kickstart your observability journey and answer the question, “Will it work for my use-case?”
Cloud Operations Sandbox is an open-source tool that helps you learn SRE practices from Google and apply them on cloud services using Google Cloud’s operations suite (formerly Stackdriver). Cloud Operations Sandbox has everything you need to get started in one click:
Demo service – an application built using a microservices architecture on a modern, cloud-native stack (a modified fork of the Online Boutique microservices demo app)
One-click deployment – automated script that deploys and configures the service to Google Cloud, including:
Service Monitoring configuration
Tracing with OpenTelemetry
Cloud Profiling, Logging, Error Reporting, Debugging and more
Load generator – a component that produces synthetic traffic on the demo service
SRE recipes – pre-built tasks that manufacture intentional errors in the demo app so you can use Cloud Operations tools to find the root cause of problems like you would in production
An interactive walkthrough to get started with Cloud Operations
Launching the Cloud Operations Sandbox is as easy as can be. Simply:
Go to cloud-ops-sandbox.dev
Click on the “Open in Google Cloud Shell” button.
This creates a new Google Cloud project. Within that project, a Terraform script creates a Google Kubernetes Engine (GKE) cluster and deploys a sample application to it. The microservices that make up the demo app are pre-instrumented with logging, monitoring, tracing, debugging and profiling as appropriate for each microservice’s language runtime. As such, sending traffic to the demo app generates telemetry that can be useful for diagnosing the cloud service’s operation. In order to generate production-like traffic to the demo app, an automated script deploys a synthetic load generator in a different geo-location than the demo app.
It also adds and automatically configures uptime checks, service monitoring (SLOs and SLIs), log-based metrics, alerting policies and more.
At the end of the provisioning script, you’ll get a few URLs for the newly created project.
You can follow the user guide to learn about the entire Cloud Operations suite of tools, including tracking microservices interactions in Cloud Trace (thanks to the OpenTelemetry instrumentation of the demo app) and see how to apply the learnings to your scenario.
Finally, to remove the Sandbox once you’re finished using it, you can run
Following SRE principles is a proven method for running highly reliable applications in the cloud. We hope that the Cloud Operations Sandbox gives you the understanding and confidence you need to jumpstart your SRE practice.
William Gibson said it best: “The future is already here—it’s just not evenly distributed.”
The cloud has arrived, yet data security in the cloud is too often a novel problem for our customers, with few well-worn paths to follow. We often see customers struggling to adapt their data security posture to this new reality. There is an understanding that data security is critical, but a lack of well-understood principles to drive an effective data security program. Thus, we are excited to share a view of how to deploy a modern and effective data security program.
Today, we are releasing a new white paper “Designing and deploying a data security strategy with Google Cloud” that accomplishes exactly that. It was written jointly by Andrew Lance of Sidechain (Sidechain blog post about this paper) and Dr. Anton Chuvakin, with a fair amount of help from other Googlers, of course.
Before we share some of our favorite quotes from the paper, let me spend a few more minutes explaining the vision behind it.
Specifically, we wanted to explore both the question of starting a data security program in a cloud-native way, and that of adjusting your existing data security program when you start utilizing cloud computing.
Imagine you are a traditional company migrating to the cloud. You have some data security capabilities, and most likely an existing data security program as part of your overall security program. Perhaps you are deploying tools like DLP, encryption, data classification and possibly others. Suddenly, or perhaps not so suddenly, you’re migrating some of your data processing and some of your data to the cloud. What to do? Do my controls still work? Are my practices current? Am I looking at the right threats? How do I marry my cloud migration effort with my other data security efforts? Our paper seeks to address this scenario by giving you advice on the strategy, complete with Google Cloud examples.
On the other hand, perhaps you are a company that was born in the cloud. In this case, you may not have an existing data security effort. However, if you plan to process sensitive or regulated data in the cloud, you need to create one. What does a cloud-native data security program look like? Which of the lessons learned by others on-premises can I ignore? What are some of the cloud-native ways of securing data?
As a quick final comment, the paper does not address privacy requirements. That is a worthwhile and valuable goal, just not one we touch on in this paper.
Here are some of our favorite quotes from the paper:
“Simply applying a data security strategy designed for on-premise workloads isn’t adequate [for the cloud]. It lacks the ability to address cloud-specific requirements and doesn’t take advantage of the great amount of [cloud] security services and capabilities”
A solid cloud data security strategy should rely on three pillars: “Identity / Access Boundaries / Visibility” (the last item covers the spectrum of assessment, detection, investigation and other monitoring and observability needs)
Useful questions to ponder include ”How does my data security strategy need to change to accommodate a shift to the cloud? What new security challenges for data protection do I need to be aware of in the cloud? What does my cloud provider offer that could streamline or replace my on-premise controls?”
“You will invariably need to confront data security requirements in your journey to the cloud, and performing a “lift and shift” for your data security program won’t work to address the unique opportunities and challenges the cloud offers.”
“As your organization moves its infrastructure and operations to the cloud, shift your data protection strategies to cloud-native thinking.”
At Google Cloud, we strive to accelerate our customers’ digital transformations. As our customers leverage the cloud for business transformation, adapting data security programs to this new environment is essential.
Editor’s note: Arcules, a Canon Company, delivers the next generation of cloud-based video monitoring, access control, and video analytics—all in one unified, intuitive platform. Here, we look at how they turned to Google Cloud SQL’s fully managed services so they could focus more of their engineers’ time on improving their architecture.
As the leading provider of unified, intelligent security-as-a-service solutions, Arcules understands the power of cloud architecture. We help security leaders in retail, hospitality, financial and professional services use their IP cameras and access control devices from a single, unified platform in the cloud. Here, they can gather actionable insights from video analytics to help enable better decision-making. Since Arcules is built on an open platform model, organizations can use any of their existing cameras with our system; they aren’t locked into particular brands, ensuring a more scalable and flexible solution for growing businesses.
As a relatively young organization, we were born on Google Cloud, where the support of open-source tools like MySQL allowed us to bootstrap very quickly. We used MySQL heavily at the time of our launch, though we’ve since migrated most of our data over to PostgreSQL, which works better for us from the perspective of both security and data segregation.
Our data backbone
Google Cloud SQL, the fully managed relational database service, plays a significant role in our architecture. For Arcules, convenience was the biggest factor in choosing Cloud SQL. With Google Cloud’s managed services taking care of tasks like patch management, they’re out of sight, out of mind. If we were handling it all ourselves by deploying on Google Kubernetes Engine (GKE), for example, we’d have to manage the updates, migrations, and more. Instead of patching databases, our engineers can spend their time improving the performance of our code, building product features, or automating our infrastructure in other areas so we can maintain an immutable infrastructure. Because we have an immutable infrastructure involving a lot of automation, it’s important that we stay on top of keeping everything clean and reproducible.
Our setup includes containerized microservices on Google Kubernetes Engine (GKE), connecting to the data through Cloud SQL Proxy sidecars. Our services are all highly available, and we use multi-region databases. Nearly everything else is fully automated from a backup and deployment perspective, so all of the microservices handle the databases directly. All five of our teams work directly with Cloud SQL, with four of them building services, and one providing ancillary support.
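As a sketch of the connection path above, the Cloud SQL Auth proxy that each sidecar runs can also be started locally for development; the instance connection name below is a placeholder:

```shell
# Start the Cloud SQL Auth proxy, exposing a PostgreSQL instance
# on localhost:5432. PROJECT:REGION:INSTANCE is a placeholder
# for the instance connection name.
./cloud_sql_proxy -instances=PROJECT:REGION:INSTANCE=tcp:5432
```

Applications then connect to localhost as if the database were local, while the proxy handles encryption and IAM-based authorization.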
Our data analytics platform (covering many centuries of video data) was born on PostgreSQL, and we have two main types of analytics—one for measuring overall people traffic in a location and one for heat maps in a location. Because our technology is so geographically relevant, we use the PostGIS plugin for PostgreSQL in intersections, so we can re-regress over the data. In heat mapping, we generate a colorized map over a configurable time period—such as one hour or 30 days—using data that displays where security cameras have detected people. This allows a customer to see, for example, a summary of a building’s main traffic and congestion points during that time window. This is an aggregation query that we run on demand or periodically, whichever happens first. That can be in response to a query to the database, or it can also be calculated as a summary of aggregated data over a set period of time.
We also store data in Cloud SQL for user management, which tracks data starting from UI login. And we track data around the management of remote video and other devices, such as when a user plugs a video camera into our video management software, or adds access control. That is all orchestrated through Cloud SQL, so it’s essential to our work. We’re moving to have the databases fully instrumented in the deployment pipeline, and ultimately to embed site reliability engineering (SRE) practices with the teams as well.
Cloud SQL lets us do what we do best
Geographical restrictions and data sovereignty issues have forced us to reexamine our architecture and perhaps deploy some databases on GKE or Compute Engine, though one thing is clear: we’ll still be deploying any database we can on Cloud SQL. The time we save having Google manage our databases is time better spent on building new solutions. We ask ourselves: how can we make our infrastructure do more for us? With Cloud SQL handling our database management tasks, we’re free to do more of what we’re really good at.
At Google, we’re dedicated to building technology that helps people do more for the planet, and to fostering sustainability at scale. We continue to be the world’s largest corporate purchaser of renewable energy, and in September made a commitment to operate on 24/7 carbon-free energy in all our data centers and campuses worldwide by 2030.
As we’ve shared previously, our commitment to a sustainable future for the earth takes many forms. This includes empowering our partners and customers to establish a disaster recovery (DR) strategy with zero net operational carbon emissions, regardless of where their production workload is.
In this post, we’ll explore carbon considerations for your disaster recovery strategy, how you can take advantage of Google Cloud to reduce net carbon emissions, and three basic scenarios that can help optimize the design of your DR failover site.
Balancing your DR plan with carbon emissions considerations: It’s easier than you think
A DR strategy entails the policies, tools, and procedures that enable your organization to support business-critical functions following a major disaster, and recover from an unexpected regional failure. Sustainable DR, then, means running your failover site (a standby computer server or system) with the lowest possible carbon footprint.
From a sustainability perspective, we frequently hear that organizations have trouble balancing a robust DR approach with carbon emissions considerations. In order to be prepared for a crisis, they purchase extra power, cooling, and backup servers, and staff an entire facility—all of which sit idle during normal operations.
In contrast, Google Cloud customers can lower their carbon footprint by running their applications and workloads on a cloud provider that has procured enough renewable energy to offset the operational emissions of its usage. In terms of traditional DR planning, Google Cloud customers don’t have to worry about capacity (securing enough resources to scale as needed) or the facilities and energy expenditure associated with running equipment that may only be needed in the event of a disaster.
When it comes to implementing a DR strategy using Google Cloud, there are three basic scenarios. To help guide your DR strategy, here’s a look at what those scenarios are, plus resources and important questions to ask along the way.
1. Production on-premises, with Google Cloud as the DR site
If you operate your own data centers or use a non-hyperscale data center, like many operated by hosting providers, some of the energy efficiency advantages that can be achieved at scale might not be available to you. For example, an average data center uses almost as much non-computing or “overhead” energy (such as cooling and power conversion) as it does to power its servers.
Creating a failover site on-premises means not only are you running data centers that are not optimized for energy efficiency, but you are also operating idle servers in a backup location, consuming electricity with associated carbon emissions that are likely not offset. When designing your DR strategy, you can avoid increasing your carbon footprint by using Google Cloud as the target for your failover site.
You could create your DR site on Google Cloud by replicating your on-prem environment. Doing so means your DR failover site directly takes advantage of Google Cloud’s carbon-neutral data centers, offsetting the energy consumption and emissions of running a DR site on-prem. However, if you simply replicate your on-prem environment, you leave an opportunity on the table: Google Cloud will offset all of the emissions of a DR site running on our infrastructure, but to truly operate at the lowest possible carbon footprint, you should also optimize the way you configure your DR failover environment on Google Cloud.
To do that, there are three patterns—cold, warm, and hot—that can be implemented when your application runs on-prem and your DR solution is on Google Cloud. Get an in-depth look at those patterns here.
The graph below illustrates how the pattern chosen relates to your “personal” energy use. In this context, we define “personal” energy costs as energy wasted on idle resources.
Optimizing your personal energy use consists of more than offsetting where you run your DR site. It involves thinking about your DR strategy carefully beyond taking the simplest “let’s just replicate everything” approach. Some of the important questions you need to ask include:
Are there some parts of your application that can withstand a longer recovery time objective (RTO) than others?
Can you make use of Google Cloud storage as part of your DR configuration?
Can you get closer to a cold DR pattern, and thus optimize your personal energy consumption?
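As one example of leaning on Cloud Storage in a cold pattern, backups can be synchronized to a bucket instead of kept on warm standby hardware. This is only a sketch; the local path and bucket name are placeholders:

```shell
# Sync on-prem backups to a Cloud Storage bucket for a cold DR pattern.
# /var/backups and the bucket name are placeholders.
gsutil -m rsync -r /var/backups gs://example-dr-backups/nightly
```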
The elephant in the room, though, is “What if I absolutely need to have resources when I need them? How do I know the resources will be there when I need them? How will this work if I optimize the design of my DR failover site on Google Cloud such that I have minimal resources running until I need them?”
In this situation, you should look into the ability to reserve Compute Engine zonal resources. This ensures resources are available for your DR workloads when you need them. Using reservations for virtual machines also means you can take advantage of discounting options (which we discuss later in this post).
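Reserving zonal capacity ahead of a failover might look like the following sketch; the reservation name, zone, VM count, and machine type are all placeholders to adapt to your workload:

```shell
# Reserve zonal Compute Engine capacity for DR failover workloads.
# The reservation name, zone, count, and machine type are placeholders.
gcloud compute reservations create dr-failover \
  --zone=us-central1-a \
  --vm-count=8 \
  --machine-type=n2-standard-8
```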
In summary, using Google Cloud as the target for your failover site can help immediately lower your net carbon emissions, and it’s also important to optimize your DR configuration by asking the right questions and implementing the right pattern. Lastly, if your particular use case permits, consider migrating your on-prem workloads to Google Cloud altogether. This will enable your organization to really move the needle in terms of reducing its carbon footprint as much as possible.
2. Production on Google Cloud, with Google Cloud as the DR site
Running your applications and DR failover site on Google Cloud means zero net operational emissions for both your production application and your DR configuration.
From here, you want to focus on optimizing the design of your DR failover site on Google Cloud. The most optimal pattern depends on your use case.
For example, a full high availability (HA) configuration, or hot pattern, means you are using all your resources. There are no standby resources idling, and you are using what you need, when you need it, all the time. Alternatively, your RTO may not require a full HA configuration, but you can adopt a warm or cold pattern when you need to scale or spin up resources as needed in the event of a disaster or major event.
Adopting a warm or cold pattern means all or some of the resources needed for DR are not in use until you need them. This may lead to the exact same questions we mentioned in scenario #1: What if I absolutely need to have resources when I need them in case of a disaster or major event? How do I know the resources will be there when I need them? How will this work?
A simple solution is, as in the previous scenario, to reserve Compute Engine zonal resources for your workloads when you need them. And since you’re running your production on Google Cloud, you can work with your Google Cloud sales representative to forecast your usage and take advantage of committed use discounts, which let you purchase compute resources (vCPUs, memory, GPUs, and local SSDs) at a discounted price in return for committing to pay for those resources for one or three years. Committed use discounts are ideal for workloads with predictable resource needs.
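Purchasing a commitment for a predictable baseline might look like this sketch; the commitment name, region, resource amounts, and plan length are placeholders:

```shell
# Create a one-year committed use discount for a predictable baseline.
# The commitment name, region, and resource amounts are placeholders.
gcloud compute commitments create dr-baseline \
  --region=us-central1 \
  --resources=vcpu=16,memory=64GB \
  --plan=12-month
```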
Taking advantage of committed use discounts enables Google Cloud to use your forecasting to help ensure our data centers are optimized for what you need, when you need it—rather than Google Cloud over-provisioning and essentially running servers that are not optimally used. Sustainability is a balancing act between the power that is being consumed, what sort of power is in use, and the usage of the resources that are being powered by the data centers.
3. Production on another cloud, with Google Cloud as the DR site
As with running production on-prem, your overall carbon footprint is a combination of what you use outside of Google Cloud and what you’re running on Google Cloud (which is carbon neutral). If you’re running production on another cloud, you should investigate the sustainability characteristics of its infrastructure relative to your own sustainability goals. There are multiple ways to achieve carbon neutrality, and many providers are on different journeys towards their own sustainability goals. For the past three years, Google has focused on matching its electricity consumption with renewable energy, and in September 2020 we set a target to source carbon-free energy 24/7 for every data center. We believe these commitments will help our cloud customers meet their own sustainability targets.
Regardless of which scenario applies to your organization, using Google Cloud for DR is an easy way to lower your energy consumption. When Google Cloud says we partner with our customers, we really mean it. We meet our customers where they are, and we are grateful for our customers who work with us by forecasting their resource consumption so we know where to focus our data center expansion. Our data centers are designed to achieve net-zero emissions and are optimized for maximum utilization. The resulting benefits get passed to our customers, who in turn can lower their carbon footprint. When it comes to sustainability, we get more done when we work together.
Keep reading: Get more insights that can guide your journey toward 24×7 carbon-free energy. Download the free whitepaper, “Moving toward 24×7 Carbon-Free Energy at Google Data Centers: Progress and Insights.”
We are excited to announce a broad set of new traffic serving capabilities for Cloud Run: end-to-end HTTP/2 connections, WebSockets support, and gRPC bidirectional streaming, completing the types of RPCs that are offered by gRPC. With these capabilities, you can deploy new kinds of applications to Cloud Run that were not previously supported, while taking advantage of serverless infrastructure. These features are now available in public preview for all Cloud Run locations.
Support for streaming is an important part of building responsive, high-performance applications. The initial release of Cloud Run did not support streaming, as it buffered both the request from the client and the service’s response. In October, we announced server-side streaming support, which lets you stream data from your serverless container to your clients. This allowed us to lift the prior response limit of 32 MB and support server-side streaming for gRPC. However, this still did not allow you to run WebSockets and gRPC with either client-streaming or bidirectional streaming.
WebSockets and gRPC bidirectional streaming
With the new bidirectional streaming capabilities, Cloud Run can now run applications that use WebSockets (e.g., social feeds, collaborative editing, multiplayer games) as well as the full range of gRPC bi-directional streaming APIs. With these bidirectional streaming capabilities, both the server and the client keep exchanging data over the same request. WebSockets and bidirectional RPCs allow you to build more responsive applications and APIs. This means you can now build a chat app on top of Cloud Run using a protocol like WebSockets, or design streaming APIs using gRPC.
Here’s an example of a collaborative live “whiteboard” application running as a container on Cloud Run, serving two separate WebSocket sessions in different browser windows. Note the real-time updates to the canvases in both windows.
Using WebSockets on Cloud Run doesn’t require any extra configuration and works out of the box. To use client-side streaming or bidirectional streaming with gRPC, you need to enable HTTP/2 support, which we talk about in the next section.
It’s worth noting that WebSockets streams are still subject to the request timeouts configured on your Cloud Run service. If you plan to use WebSockets, make sure to set your request timeout accordingly.
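For example, the request timeout on an existing service can be raised with a command like the following sketch; the service name is a placeholder:

```shell
# Raise the request timeout so long-lived WebSocket streams
# aren't cut off early. SERVICE is a placeholder name.
gcloud run services update SERVICE --timeout=3600
```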
End-to-end HTTP/2 support
Even though many apps don’t support it, Cloud Run has supported HTTP/2 since its first release, including end-to-end HTTP/2 for gRPC. It does so by automatically upgrading clients to use the protocol, making your services faster and more efficient. However, until now, HTTP/2 requests were downgraded to HTTP/1 when they were sent to a container.
Starting today, you can use end-to-end HTTP/2 transport on Cloud Run. This is useful for applications that already support HTTP/2. For apps that don’t support HTTP/2, Cloud Run will simply continue to handle HTTP/2 traffic up until it arrives at your container.
For your service to serve traffic with end-to-end HTTP/2, your application needs to be able to handle requests with the HTTP/2 cleartext (also known as “h2c”) format. We have developed a sample h2c server application in Go for you to try out the “h2c” protocol. You can build and deploy this app to Cloud Run by cloning the linked repository and running:
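As a sketch of that deployment, assuming a service named h2c-sample and a placeholder image path, the command might look like:

```shell
# Deploy the sample h2c server with end-to-end HTTP/2 enabled.
# The service name, image path, and region are placeholders.
gcloud run deploy h2c-sample \
  --image gcr.io/PROJECT/h2c-server \
  --region us-central1 \
  --use-http2 \
  --allow-unauthenticated
```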
In the example command above, the “--use-http2” option indicates that the application supports the “h2c” protocol and ensures the service receives HTTP/2 requests without downgrading them.
Once you’ve deployed the service, use the following command to validate that the request is served using HTTP/2 and not being downgraded to HTTP/1:
curl -v --http2-prior-knowledge https://<SERVICE_URL>
You can also configure your service to use HTTP/2 in the Google Cloud Console.
With these new networking capabilities, you can now deploy and run a broader variety of web services and APIs to Cloud Run. To learn more about these new capabilities, now in preview, check out the WebSockets demo app or the sample h2c server app.