So you’ve created your buckets, and now you want to use the power of the cloud to serve your content. With a can-do attitude and the details of this post, you’ll learn how to get your data into Cloud Storage with a variety of upload methods. Let’s go!
When you upload an object to your Cloud Storage bucket, it will consist of the data you want to store, along with any associated metadata. When it comes to the actual uploading, you’ve got a few different options to choose from, which we’ll go over below. For more detail, check out the documentation. And for general, conceptual information on uploads and downloads, read this.
First, we’ll cover the Cloud Console. This provides an in-browser experience where you can easily click to create buckets and folders, and then choose files, or drag and drop them from your local machine, to upload.
For production environments, you may want an automated, command line solution.
For this, we provide the gsutil tool. gsutil is a Python application that lets you access Cloud Storage from the command line, providing you with the ability to do all sorts of things like creating buckets, moving objects, or even editing metadata.
To use it, run the gsutil program with a variety of command line options. For example, this command uploads a directory of files from your local machine to your Cloud Storage bucket using parallel upload.
And this command lists out specific objects that have a version-specific URL using a wildcard.
More cool stuff you can do with the gsutil tool can be found in this documentation.
At some point, you might need to interface with Cloud Storage directly from your code, rather than shelling out to a command-line tool. You can include the client libraries in your code and call a simple API to get data into a bucket or folder.
And before you even ask about language, with options in C++, C#, Go, Java, Node.js, PHP, Python, and Ruby—we’ve got you covered.
For example, check out this Python code to upload an object to a Cloud Storage bucket:
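As a minimal sketch using the google-cloud-storage client library (the bucket name, object name, and file path below are placeholders, not values from this post):

```python
def upload_file(bucket_name: str, object_name: str, local_path: str) -> None:
    """Upload a local file to a Cloud Storage bucket."""
    # Requires `pip install google-cloud-storage` and Application Default
    # Credentials; imported here to keep the sketch self-contained.
    from google.cloud import storage

    client = storage.Client()
    client.bucket(bucket_name).blob(object_name).upload_from_filename(local_path)

# Example usage (placeholder names):
# upload_file("my-bucket", "uploads/photo.jpg", "photo.jpg")
```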
Check out even more code samples here.
JSON and XML
And finally, if none of that does the trick, there’s always the JSON and XML APIs, which can let you kick off an HTTP POST request to upload data directly to a bucket or folder. It’s a bit more complex, but it’s there if you need it.
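For illustration, here’s a hedged sketch of a simple (media) upload through the JSON API. The bucket, object name, and bearer token are placeholders, and obtaining an OAuth access token is out of scope here:

```python
import urllib.parse

UPLOAD_ENDPOINT = "https://storage.googleapis.com/upload/storage/v1"

def media_upload_url(bucket: str, object_name: str) -> str:
    """Build the JSON API simple-upload URL for a bucket/object pair."""
    return (f"{UPLOAD_ENDPOINT}/b/{bucket}/o"
            f"?uploadType=media&name={urllib.parse.quote(object_name, safe='')}")

# POST the file bytes to this URL with an OAuth bearer token, e.g. with requests:
# requests.post(media_upload_url("my-bucket", "uploads/photo.jpg"),
#               headers={"Authorization": "Bearer ACCESS_TOKEN",
#                        "Content-Type": "image/jpeg"},
#               data=open("photo.jpg", "rb"))
```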
Cloud Storage Transfer Appliance
Now, for you folks with LOTS of data, it’s worth noting that it might not be feasible to upload all of that data directly from your on-prem systems to Google Cloud—for that you can use the Cloud Storage Transfer Appliance.
We ship you a fancy device, you connect it, add your data, and send it back to us. Plus you get this cool looking box on your desk for a while, which can be a great conversation starter, if you’re into that kind of thing. More details here.
More clouds, more problems? Not so!
Don’t worry if your data is in another cloud, we’ve got easy-to-use guides to help you get up and running with supporting a multicloud environment, and getting that data over to Cloud Storage.
Of course, now that the data is in Cloud Storage, you’ve got to figure out the best ways to serve it to your users worldwide. Stay tuned for best practices around getting that data out into the world in our next post.
Posted by Natasha Jaques, Google Research and Michael Dennis, UC Berkeley
The effectiveness of any machine learning method is critically dependent on its training data. In the case of reinforcement learning (RL), one can rely either on limited data collected by an agent interacting with the real world, or on a simulated training environment that can be used to collect as much data as needed. This latter method of training in simulation is increasingly popular, but it has a problem: the RL agent can learn what is built into the simulator, but tends to be bad at generalizing to tasks that are even slightly different from the ones simulated. And building a simulator that covers all the complexity of the real world is, of course, extremely challenging.
An approach to address this is to automatically create more diverse training environments by randomizing all the parameters of the simulator, a process called domain randomization (DR). However, DR can fail even in very simple environments. For example, in the animation below, the blue agent is trying to navigate to the green goal. The left panel shows an environment created with DR where the positions of the obstacles and goal have been randomized. Many of these DR environments were used to train the agent, which was then transferred to the simple Four Rooms environment in the middle panel. Notice that the agent can’t find the goal. This is because it has not learned to walk around walls. Even though the wall configuration from the Four Rooms example could have been generated randomly in the DR training phase, it’s unlikely. As a result, the agent has not spent enough time training on walls similar to the Four Rooms structure, and is unable to reach the goal.
Instead of just randomizing the environment parameters, one could train a second RL agent to learn how to set the environment parameters. This minimax adversary can be trained to minimize the performance of the first RL agent by finding and exploiting weaknesses in its policy – e.g. building wall configurations it has not encountered before. But again there is a problem. The right panel shows an environment built by a minimax adversary in which it is actually impossible for the agent to reach the goal. While the minimax adversary has succeeded in its task — it has minimized the performance of the original agent — it provides no opportunity for the agent to learn. Using a purely adversarial objective is not well suited to generating training environments, either.
In collaboration with UC Berkeley, we propose a new multi-agent approach for training the adversary in “Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design”, a publication recently presented at NeurIPS 2020. In this work we present an algorithm, Protagonist Antagonist Induced Regret Environment Design (PAIRED), that is based on minimax regret and prevents the adversary from creating impossible environments, while still enabling it to correct weaknesses in the agent’s policy. PAIRED incentivizes the adversary to tune the difficulty of the generated environments to be just outside the agent’s current abilities, leading to an automatic curriculum of increasingly challenging training tasks. We show that agents trained with PAIRED learn more complex behavior and generalize better to unknown test tasks. We have released open-source code for PAIRED on our GitHub repo.
To flexibly constrain the adversary, PAIRED introduces a third RL agent, which we call the antagonist agent, because it is allied with the adversarial agent, i.e., the one designing the environment. We rename our initial agent, the one navigating the environment, the protagonist. Once the adversary generates an environment, both the protagonist and antagonist play through that environment.
The adversary’s job is to maximize the antagonist’s reward while minimizing the protagonist’s reward. This means it must create environments that are feasible (because the antagonist can solve them and get a high score), but challenging to the protagonist (exploit weaknesses in its current policy). The gap between the two rewards is the regret — the adversary tries to maximize the regret, while the protagonist competes to minimize it.
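As a toy sketch, the regret for a single generated environment is just the difference between the two agents’ scores (the numbers below are illustrative, not results from the paper):

```python
def regret(antagonist_reward: float, protagonist_reward: float) -> float:
    """Regret = antagonist's score minus protagonist's score on the same environment."""
    return antagonist_reward - protagonist_reward

# A feasible-but-hard environment: the antagonist solves it, the protagonist doesn't,
# so the adversary is rewarded for generating it.
assert regret(1.0, 0.0) == 1.0
# An impossible environment yields zero regret, since neither agent can score,
# so the adversary gains nothing by generating unsolvable environments.
assert regret(0.0, 0.0) == 0.0
```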
The methods discussed above (domain randomization, minimax regret and PAIRED) can be analyzed using the same theoretical framework, unsupervised environment design (UED), which we describe in detail in the paper. UED draws a connection between environment design and decision theory, enabling us to show that domain randomization is equivalent to the Principle of Insufficient Reason, the minimax adversary follows the Maximin Principle, and PAIRED is optimizing minimax regret. Below, we show how each of these ideas works for environment design:
What’s interesting about minimax regret is that it incentivizes the adversary to generate a curriculum of initially easy, then increasingly challenging environments. In most RL environments, the reward function will give a higher score for completing the task more efficiently, or in fewer timesteps. When this is true, we can show that regret incentivizes the adversary to create the easiest possible environment the protagonist can’t solve yet. To see this, let’s assume the antagonist is perfect, and always gets the highest score that it possibly can. Meanwhile, the protagonist is terrible, and gets a score of zero on everything. In that case, the regret just depends on the difficulty of the environment. Since easier environments can be completed in fewer timesteps, they allow the antagonist to get a higher score. Therefore, the regret of failing at an easy environment is greater than the regret of failing on a hard environment:
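A small worked example makes this concrete. Suppose, purely for illustration, that the reward for solving a maze is (max_steps - steps_taken) / max_steps, so faster solutions score higher:

```python
MAX_STEPS = 100  # illustrative episode length, not a value from the paper

def reward(steps_taken: int) -> float:
    """Higher reward for completing the task in fewer timesteps."""
    return (MAX_STEPS - steps_taken) / MAX_STEPS

protagonist_reward = 0.0  # the protagonist fails both environments

# The antagonist solves the easy maze in 10 steps, the hard maze in 80 steps.
easy_regret = reward(10) - protagonist_reward   # 0.9
hard_regret = reward(80) - protagonist_reward   # 0.2

# Failing an easy environment incurs more regret than failing a hard one,
# so the adversary is pushed toward the easiest unsolved environments first.
assert easy_regret > hard_regret
```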
So, by maximizing regret, the adversary searches for easy environments that the protagonist fails to solve. Once the protagonist learns to solve each environment, the adversary must move on to finding a slightly harder environment that the protagonist can’t solve. Thus, the adversary generates a curriculum of increasingly difficult tasks.
We can see the curriculum emerging in the learning curves below, which plot the shortest path length of a maze the agents have successfully solved. Unlike minimax or domain randomization, the PAIRED adversary creates a curriculum of increasingly longer, but possible, mazes, enabling PAIRED agents to learn more complex behavior.
But can these different training schemes help an agent generalize better to unknown test tasks? Below, we see the zero-shot transfer performance of each algorithm on a series of challenging test tasks. As the complexity of the transfer environment increases, the performance gap between PAIRED and the baselines widens. For extremely difficult tasks like Labyrinth and Maze, PAIRED is the only method that can occasionally solve the task. These results provide promising evidence that PAIRED can be used to improve generalization for deep RL.
Admittedly, these simple gridworlds do not reflect the complexities of the real world tasks that many RL methods are attempting to solve. We address this in “Adversarial Environment Generation for Learning to Navigate the Web”, which examines the performance of PAIRED when applied to more complex problems, such as teaching RL agents to navigate web pages. We propose an improved version of PAIRED, and show how it can be used to train an adversary to generate a curriculum of increasingly challenging websites:
Above, you can see websites built by the adversary in the early, middle, and late training stages, which progress from using very few elements per page to many simultaneous elements, making the tasks progressively harder. We test whether agents trained on this curriculum can generalize to standardized web navigation tasks; they achieve a 75% success rate, a 4x improvement over the strongest curriculum learning baseline:
Deep RL is very good at fitting a simulated training environment, but how can we build simulations that cover the complexity of the real world? One solution is to automate this process. We propose Unsupervised Environment Design (UED) as a framework that describes different methods for automatically creating a distribution of training environments, and show that UED subsumes prior work like domain randomization and minimax adversarial training. We think PAIRED is a good approach for UED, because regret maximization leads to a curriculum of increasingly challenging tasks, and prepares agents to transfer successfully to unknown test tasks.
We would like to recognize the co-authors of “Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design”: Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, and Sergey Levine, as well as the co-authors of “Adversarial Environment Generation for Learning to Navigate the Web”: Izzeddin Gur, Natasha Jaques, Yingjie Miao, Jongwook Choi, Kevin Malta, Manoj Tiwari, Honglak Lee, Aleksandra Faust. In addition, we thank Michael Chang, Marvin Zhang, Dale Schuurmans, Aleksandra Faust, Chase Kew, Jie Tan, Dennis Lee, Kelvin Xu, Abhishek Gupta, Adam Gleave, Rohin Shah, Daniel Filan, Lawrence Chan, Sam Toyer, Tyler Westenbroek, Igor Mordatch, Shane Gu, DJ Strouse, and Max Kleiman-Weiner for discussions that contributed to this work.
Posted by Don Turner – Android Developer Relations Engineer
This article looks at what’s changed recently in the Android ecosystem for audio developers, examines the audio latency of popular Android devices, and discusses Android’s suitability for real-time audio apps.
Over the past four years we have taken a number of actions that have improved audio latency.
These actions, coupled with a renewed focus from device manufacturers on audio latency, have led to significant improvements in the device ecosystem. The average latency of the most popular Android phones has dropped to under 40ms, which is well within the range required for real-time applications.
device popularity source: appbrain.com
Digging into the data we can see that in 2017 there was a significant difference between the highest and lowest values (222ms).
Compare that to the data for 2021: the range has narrowed by a factor of 8, to just 28ms, providing a far more consistent audio experience. This is even more impressive when you consider that there are now multiple OEMs on the most-popular list, compared to only a single manufacturer in 2017. In addition, many of the devices on the list are not high-end flagship models.
Up to now I’ve been referring to round-trip audio latency. Round-trip latency involves three components in the audio chain: audio input, audio processing and audio output.
Many real-time audio apps generate audio from screen tap events rather than relying on input audio. These kinds of apps are sensitive to “tap-to-tone” latency – the time taken from tapping on the screen to hearing a sound. The latency introduced by tapping the touch screen is anywhere from 10-35ms, with 20ms being fairly typical on modern Android devices.
To estimate tap-to-tone latency given round-trip latency, you can subtract the audio input latency (typically 5ms), and add the touch latency (typically 20ms). In other words, add 15ms to the round-trip latency. Given the numbers above, this means the average tap-to-tone latency of the most popular android phones is also well under that required for most real-time audio applications.
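That back-of-the-envelope estimate can be written down directly (the 5ms and 20ms figures are the typical values quoted above):

```python
AUDIO_INPUT_LATENCY_MS = 5   # typical audio input latency
TOUCH_LATENCY_MS = 20        # typical touch screen latency on modern devices

def estimate_tap_to_tone(round_trip_ms: float) -> float:
    """Tap-to-tone latency ~= round-trip latency - input latency + touch latency."""
    return round_trip_ms - AUDIO_INPUT_LATENCY_MS + TOUCH_LATENCY_MS

# A 40ms round-trip latency implies roughly 55ms tap-to-tone latency.
assert estimate_tap_to_tone(40) == 55
```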
Looking to the future
Despite the significant reductions in audio latency across the Android ecosystem, our work is nowhere near complete. 20ms round-trip latency is required for Android professional audio apps, and 10ms remains the long-term goal. At this time, some less popular devices still have high audio latency. However, if you have been holding back on developing an Android app because of audio latency, it might be time to reconsider.
Data sources and tools
various internal data sources
When migrating from on-premises to the cloud, many Google Cloud customers want scalable solutions to detect and alert on higher-layer network anomalies, keeping the same level of network visibility they have on-prem. The answer may be to combine Packet Mirroring with an Intrusion Detection System (IDS) such as the open-source Suricata, or some other preferred threat detection system. This type of solution can provide the visibility you need in the cloud to detect malicious activity, alert, and perhaps even implement security measures to help prevent subsequent intrusions.
However, design strategies for Packet Mirroring plus IDS can be confusing, considering the number of available VPC design options. For instance, there’s Google’s global VPC, Shared VPCs and VPC Peerings. In this blog, we’ll show you how to use Packet Mirroring and virtual IDS instances in a variety of VPC designs, so you can inspect network traffic while keeping the ability to use the supported VPC options that Google Cloud provides.
Packet Mirroring basics
But first, let’s talk some more about Packet Mirroring, one of the key tools for security and network analysis in a Google Cloud networking environment. Packet Mirroring is functionally similar to a network tap or a span session in traditional networking: Packet Mirroring captures network traffic (ingress and egress) from select “mirrored sources,” copies the traffic, and forwards the copy to “collectors.” Packet Mirroring captures the full payload of each packet, not just the headers. Also, because Packet Mirroring is not based on any sampling period, you can use it for in-depth packet-level troubleshooting, security solutions, and application-layer network analysis.
Packet Mirroring relies on a “Packet Mirroring policy” with five attributes:
Mirrored traffic (filter)
Here’s a sample Packet Mirroring policy:
When creating a Packet Mirroring policy, consider these key points:
Mirrored sources and collectors must be in the same region, but can be in different zones—or even different VPCs or projects.
Collectors must be placed behind an Internal Load Balancer (ILB).
Mirrored traffic consumes additional bandwidth on the mirrored sources. Size your instances accordingly.
The collectors see network traffic at Layer 3 and above the same way that the mirrored VMs see the traffic. This includes any NATing and/or SSL decryption that may occur at a higher layer within Google Cloud.
There are two user roles that are especially relevant for creating and managing Packet Mirroring:
“compute.packetMirroringUser”– This role allows users rights to create, update, and delete Packet Mirroring policies. This role is required in the project where the Packet Mirroring Policy will live.
“compute.packetMirroringAdmin”– This role allows users to mirror the desired targets to collect their traffic.
Using Packet Mirroring to power IDS
An IDS needs to see traffic to be able to inspect it. You can use Packet Mirroring to feed traffic to a group of IDSs; this approach has some significant benefits over other methods of steering traffic to an IDS instance. For example, some cloud-based IDS solutions require special software (i.e., an agent) to run on each source VM, and that agent duplicates and forwards traffic to the IDS. With Packet Mirroring, you don’t need to deploy any agents on VMs and traffic is mirrored to IDS in a cloud-native way. And while an agent-based solution is fully distributed and prevents network bottlenecks, it requires that the guest operating system support the software. Furthermore, with an agent-based solution, CPU utilization and network traffic on the VM will most certainly increase because the guest VM and its resources are tasked with duplicating traffic. High CPU utilization related to network throughput is a leading contributor to poor VM performance.
Another common approach is to place a virtual appliance “in-line” between the network source and destination. The benefit of this design is that the security appliance can act as an Intrusion Prevention System (IPS) and actually block or deny malicious traffic between networks. However, an in-line solution, where traffic is routed through security appliances, doesn’t capture east-west traffic between VMs in the same VPC. Because subnet routes are preferred in a VPC, in-line solutions, which are fed traffic via static routes, can’t alert on intra-VPC traffic. Thus, a large portion of network traffic is left unanalyzed; a traditional in-line IDS/IPS solution only inspects traffic at a VPC or network boundary.
Packet Mirroring solves both these problems. It doesn’t require any additional software on the VMs, it’s fully distributed across each mirrored VM, and traffic duplication happens transparently at the SDN layer. The Collector IDS is placed out-of-path behind a load balancer and receives both north-south traffic and east-west traffic.
Using Packet Mirroring in various VPC configurations
Packet Mirroring works across a number of VPC designs, including:
Single VPC with a single region
Single VPC with multiple regions
Shared VPC
Peered VPCs
Here are a few recommendations that apply to each of these scenarios:
Use a unique subnet for the mirrored instances and collectors. This means if the mirrored sources and the collectors are in the same VPC, create multiple subnets in each region. Place the resources that need to be mirrored in one subnet and place the collectors in the other. There is no default recommended size for the collector subnet, but make sure to allocate enough space for all the collectors that might be in that region plus a little more. Remember, you can always add additional subnets to a region in Google Cloud.
Don’t assign public IPs to virtual IDS instances. Rather, use CloudNAT to provide egress internet access. Not assigning a public IP to your instances helps prevent them from being exposed externally to traffic from the internet.
If possible, use redundant collectors (IDS instances) behind the ILB for high availability.
Now, let’s take a look at these designs one by one.
Single VPC with a single region
This is the simplest of all the supported designs. In this design, all mirrored sources exist in one region in a standard VPC. This is most suitable for small test environments or VPCs where network management is not dedicated to a networking team. Note that the mirrored sources, Packet Mirroring policy, collector ILB, and IDS instances are all contained within the same region and the same VPC. Lastly, CloudNAT is configured to allow the IDS instances internet access. Everything is contained in a single region, single VPC, and single project.
Single VPC with multiple regions
Because mirrored instances and collectors must be in the same region, it stands to reason that a VPC that contains subnets in multiple regions needs multiple collectors, multiple ILBs and multiple Packet Mirroring policies. To account for multiple regions, simply stamp out a similar deployment to the one above multiple times. We still recommend using CloudNAT.
The following example shows a single VPC that spans two different regions, however, a similar architecture can be used for a VPC with any number of regions.
Shared VPC
Packet Mirroring also supports Shared VPC. In this example, the collectors (IDSs), ILB and the Packet Mirroring policy all exist inside the host project. The collectors use their own non-shared subnet. The mirrored sources (WebServers), however, exist inside their service project using a shared subnet from the Shared VPC. This allows the deployment of an IDS solution to be left up to the organization’s cloud network operations group, freeing application developers to focus on application development. CloudNAT is configured to allow the IDS instances Internet access.
Peered VPCs
Packet Mirroring also supports configurations where collectors and mirrored sources are in different VPCs that are peered together, such as in a hub-and-spoke design. The same requirements for mirroring traffic between VPCs are applicable. For example, the collector and mirrored sources must be in the same region. In the below example, the mirrored sources (WebServers) and the Packet Mirroring policy exist in VPC_DM_20 in the DM_20 project. On the other side, the ILB and collectors (IDSs) exist in the peered VPC named VPC_SECURITY in the DM_IDS project. This allows the users in the source VPC to selectively choose what traffic is forwarded to the collector across the VPC peering. CloudNAT is configured to allow the IDS instances internet access. Keep in mind the Packet Mirroring role requirements between the different projects; proper IAM permissions must be configured.
Don’t sacrifice network visibility
Using Packet Mirroring to power a cloud IDS solution, whether it’s open-source or proprietary, is a great option that many Google Cloud customers use. The key is where to place your collectors, ILBs and the Packet Mirroring policy itself—especially when you use a more advanced VPC design. Once multiple VPCs and GCP projects get introduced into the deployment, the implementation only becomes more complex. Hopefully, this blog has shown you how to use Packet Mirroring with an IDS in some of the more common VPC designs. For a hands-on tutorial, check out QwikLabs’ Google Cloud Packet Mirroring with OpenSource IDS, which walks you through creating a VPC, building an IDS instance, installing Suricata and deploying Packet Mirroring.
Today we’re excited to announce the release of an open source connector to read streams of messages from Pub/Sub Lite into Apache Spark. Pub/Sub Lite is a scalable, managed messaging service for Spark users on GCP who are looking for an exceptionally low-cost ingestion solution. The connector allows you to use Pub/Sub Lite as a replayable source for Structured Streaming’s processing engine with exactly-once guarantees[1] and ~100ms processing latencies. The connector works in all Apache Spark 2.4.X distributions, including Dataproc, Databricks, and manual Spark installations.
What is Pub/Sub Lite?
Pub/Sub Lite is a recently released, horizontally scalable messaging service that lets you send and receive messages asynchronously between independent applications. Publisher applications publish messages to a Pub/Sub Lite topic, and subscriber applications (like Apache Spark) read the messages from the topic.
Pub/Sub Lite is a zonal service. While you can connect to Pub/Sub Lite from anywhere on the internet, running publisher and subscriber applications in the same zone as the topic they connect to will help minimize networking egress cost and latency.
A Lite topic consists of a pre-configured number of partitions. Each partition is an append-only, timestamped log of messages. Each message is an object with several fields, including the message body, a user-configurable event_timestamp, and an automatically set publish_timestamp based on when Pub/Sub Lite stores the incoming message. A topic has a throughput and storage capacity that the user configures. To configure the topic capacity, you will have to consider a handful of properties, such as the number of partitions, the storage and throughput capacity of each partition, and the message retention period.
The Pub/Sub Lite pricing model is based on provisioned topic throughput and storage capacity. Plan to provision enough capacity to accommodate peaks in traffic; then, as your traffic changes, you can adjust the throughput and storage capacity of your topics. Pub/Sub Lite’s Monitoring metrics let you easily detect conditions when you need to increase your capacity. Start by creating alerting policies that will notify you when your backlog is growing unexpectedly:
subscription/backlog_quota_bytes should stay comfortably below topic/storage_quota_byte_limit. If a subscription exceeds the storage capacity, the Pub/Sub Lite service removes the oldest messages from the partition, regardless of the message retention period for those messages. You should also set up alerts on topic/subscribe_quota_utilization to make sure publish and subscribe throughputs are comfortably below the limit.
Pub/Sub Lite scales vertically by allowing you to increase the throughput capacity of each partition in increments of 1MiB/s. You can increase the number of partitions in a topic as well, but this will not preserve the order of messages. The connector v0.1.0 will require you to restart with a new subscription on repartitioning, but we plan to remove this limitation soon—please keep an eye on the release notes. When starting with Pub/Sub Lite, it’s best practice to slightly overprovision the number of partitions so that the per-partition publishing and subscribing throughput capacities can be set to the lower bounds of 4 MiB/s and 8 MiB/s, respectively. As the application traffic increases, you can update the Lite topic to increase both the publishing and subscribing capacities up to 16 MiB/s and 32 MiB/s per partition, respectively. You can adjust publish and subscribe throughput capacity of a partition independently.
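As a quick sketch of that provisioning arithmetic (the peak-traffic figure is hypothetical):

```python
import math

# Per-partition capacity bounds from the post, in MiB/s.
PUBLISH_MIN, PUBLISH_MAX = 4, 16
SUBSCRIBE_MIN, SUBSCRIBE_MAX = 8, 32

def partitions_needed(peak_publish_mib_s: float) -> int:
    """Partitions required so per-partition publish capacity can start
    at the 4 MiB/s lower bound (the slightly overprovisioned best practice)."""
    return math.ceil(peak_publish_mib_s / PUBLISH_MIN)

# e.g. 30 MiB/s of peak publish traffic -> 8 partitions at 4 MiB/s each,
# leaving headroom to later raise per-partition capacity up to 16 MiB/s
# without repartitioning.
assert partitions_needed(30) == 8
```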
Architecture for Pub/Sub Lite + Structured Streaming
Pub/Sub Lite is only a part of a stream processing system. While Pub/Sub Lite solves the problem of message ingestion and delivery, you’ll still need a message processing component.
Apache Spark is a popular processing framework that’s commonly used as a batch processing system. Streaming processing was introduced in Spark 2.0 using a micro-batch engine. The Spark micro-batch engine processes data streams as small batch jobs that periodically read new data from the streaming source, then run a query or computation on it. The time period for each micro-batch can be configured via triggers to run at fixed intervals. The number of tasks in each Spark job will be equal to the number of partitions in the subscribed Pub/Sub Lite topic. Each Spark task will read the new data from one Pub/Sub Lite partition, and together create a streaming DataFrame or Dataset.
Each Structured Streaming pipeline must have its own independent subscription. Note that all subscriptions attached to one topic share that topic’s subscribing throughput capacity.
The connector also supports Spark’s experimental continuous processing mode. In this mode, the connector is designed to map each topic partition to a long-running Spark task. Once the job is submitted, the Spark driver will instruct the executors to create long-running tasks, each with a streaming connection to a different partition within the topic. Note that this mode is not yet considered production-ready; it only supports limited queries and provides only at-least-once guarantees.
Using Pub/Sub Lite with Spark Structured Streaming
Processing streams of data in Pub/Sub Lite with Spark is as simple as the Python script below. For a detailed guide to run a full Java end-to-end word count sample in Dataproc, please refer to the GitHub Readme.
First, instantiate a Spark Session object and read in a Dataframe from the Pub/Sub Lite subscription:
The following snippet processes the stream in two-second-long batches and prints the resulting messages to the terminal:
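A minimal PySpark sketch of both steps might look like the following. The project, zone, and subscription names are placeholders; the format name "pubsublite" and option key follow the connector’s README, and the connector JAR must be on the Spark classpath:

```python
# Hypothetical full resource path for the Lite subscription.
SUBSCRIPTION = ("projects/my-project/locations/"
                "us-central1-a/subscriptions/my-spark-sub")

def run_console_pipeline(subscription: str = SUBSCRIPTION) -> None:
    """Read from a Lite subscription and print messages in two-second micro-batches."""
    from pyspark.sql import SparkSession  # requires pyspark + the connector JAR

    spark = SparkSession.builder.appName("pubsublite-console").getOrCreate()

    # Read in a streaming DataFrame from the Pub/Sub Lite subscription.
    df = (spark.readStream
          .format("pubsublite")
          .option("pubsublite.subscription", subscription)
          .load())

    # Process the stream in two-second-long batches, printing to the terminal.
    (df.writeStream
       .format("console")
       .outputMode("append")
       .trigger(processingTime="2 seconds")
       .start()
       .awaitTermination())
```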
In practice, you’ll perform transformations on this data. To do this, you will need to consider the schema of the DataFrame:
A common transformation from BinaryType to StringType is as follows:
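One hedged way to express that cast, assuming the message payload lives in a BinaryType column named data:

```python
def decode_data(df):
    """Cast the BinaryType `data` column to a UTF-8 StringType column."""
    from pyspark.sql.functions import col  # requires pyspark at call time
    return df.withColumn("data", col("data").cast("string"))
```

Calling decode_data(df) on the streaming DataFrame yields human-readable message bodies for downstream queries.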
Benchmarks for throughput performance
To get a sense of the throughput performance of the connector, as well as Pub/Sub Lite itself, we turned up an example pipeline in a Dataproc YARN cluster. In the example, the pipeline consumed backlogs from Pub/Sub Lite with no further processing. The Dataproc YARN cluster consisted of one master node and two worker nodes; all nodes were n1-standard-4 machines (4 vCPUs, 15GB memory), and all messages were 1 KiB. The total Spark process throughput was calculated using processedRowsPerSecond per batch, and the Spark process throughput per partition was calculated as the total Spark process throughput divided by the number of partitions.
Note that for 25 partitions, the workers were overloaded, and since the processing wall time per batch was determined by the slowest partition, the processedRowsPerSecond dropped dramatically. We can see that this drop is correlated with CPU saturation by looking at CPU utilization:
For basic read operation as a baseline, it’s recommended to have 12 partitions (8 MiB/s subscribe throughput each) in a cluster with 8 CPUs. This suggests an approximate rule of thumb: a single n1-standard-series vCPU can handle 12 MiB/s of read throughput. Any significant processing of messages will decrease this capacity.
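The rule of thumb falls out of simple arithmetic on the benchmark configuration:

```python
# Reproduce the rule-of-thumb arithmetic from the benchmark (illustrative only).
PARTITIONS = 12
SUBSCRIBE_MIB_S_PER_PARTITION = 8
CLUSTER_VCPUS = 8  # two n1-standard-4 workers

total_read_mib_s = PARTITIONS * SUBSCRIBE_MIB_S_PER_PARTITION  # 96 MiB/s total
per_vcpu_mib_s = total_read_mib_s / CLUSTER_VCPUS              # 12 MiB/s per vCPU
assert per_vcpu_mib_s == 12
```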
The benchmark above did not consider memory allocation. In practice, long trigger time or spiky traffic could lead to large micro batches, requiring more memory. Also, complex queries such as aggregation and extended watermarks would require more memory.
We hope you’ll find Pub/Sub Lite to be a useful service for your streaming applications. Please give the connector and Pub/Sub Lite a try following the full set of directions here. We would be grateful for feedback and bug reports submitted as GitHub Issues. We also welcome code contributions to this open source project.
1. The Pub/Sub Lite connector, used as a source, is compatible with exactly-once processing; it needs an idempotent sink to preserve the exactly-once guarantee end to end.
March is Women’s History Month – a time for us to come together and celebrate women-led startups and the amazing work they are doing in the tech industry. Our first of four features highlights the founders of PacketFabric—Jezzibell Gilmore and Anna Claiborne—and how they utilized Google Cloud to build their telecommunications startup.
Our rapidly growing global startup, PacketFabric, was built with the vision to redefine the networking industry and change the way that businesses connect to the world. In a heavily male-dominated field, we joined together to break into the technology industry, and we hope to inspire, hire, and empower women along the way.
As former co-workers, and long-time customers in telecom, we constantly felt the pain and frustration of the industry's archaic ways. Legacy telecom service providers had failed to evolve network infrastructure. That's where the idea of PacketFabric was born: a globally interconnected Network-as-a-Service (NaaS) platform that helps digital businesses connect. Since launching out of stealth in 2017, PacketFabric now powers top enterprises around the globe, and we're just getting started.
From the start, we were built as a fully remote company, and our employees live by our motto, "Automate everything, all the time," as we innovate together and disrupt the entrenched internet infrastructure industry.
PacketFabric is a private network that provides secure, reliable on-demand services between customers and their cloud service providers, whether private or public. We help our customers automate their network connectivity in order to move their data from one place to another. With this connectivity, customers can build, provision, and change their network infrastructure quickly and painlessly, while saving time and money. Using our network, customers cut down the sales and provisioning time from 60 days to just 60 seconds. Google Cloud tools play a crucial role in driving this improved performance. Additionally, since we are a Google Cloud Direct Interconnect Partner, our customers can connect to Google Cloud for additional services such as data storage from any location.
Some of our customers include leading companies in pharmaceuticals, financial services, technology, media and entertainment, and professional sports leagues, among many other enterprise verticals. With the help of Google Cloud, we are able to provide these customers with industry-leading telecom solutions.
PacketFabric + Google Cloud: A combination for success
As an early adopter of Bigtable, we have leveraged Google Cloud's easy-to-use tools since the start of our company. As we scaled our business, we migrated to Cloud SQL for PostgreSQL to capture our massive amounts of telemetry data and bring added value to our customers. PostgreSQL allows us to store customer metrics and create customized solutions, drawing on insights from data across multiple customer clouds. Our customers now receive detailed, real-time insight into how their network services are performing, giving them more control over their network.
Along with Cloud SQL, we get the same serverless capabilities with tools such as Container Registry and Cloud Storage. We use Container Registry to store, manage, and secure our container images, and Cloud Storage to easily transfer data between clouds and keep that data readily accessible.
Just like Cloud SQL, Container Registry and Cloud Storage can automatically scale up or down depending on demand, so we pay only for what we use. Most importantly, given Container Registry's extensible architecture, we can also easily connect it to our existing CI/CD process. Lastly, with the built-in GKE integration, we get access to unique industry-first capabilities such as release channels, multi-cluster support, and 4-way auto scaling, along with node auto-repair to help improve availability.
Google’s Identity-Aware Proxy is another critical feature in keeping our customer data secure. We can control exactly who has access to what data, as well as access to our Cloud-based applications and VMs running on Google Cloud. This ensures that our multitude of GKE clusters are safe and secure.
Helping Improve Diversity of Women in Tech
From our time in the industry and researching the lack of women in tech, we have found that the problem starts early. Young girls are discouraged from pursuing tech as early as 8-12 years old. As a result, women earn only 18% of Computer Science degrees in the United States.
Increasing representation of women in the tech industry has been top of mind for many of us over the past few years, especially at PacketFabric. It is very important for young girls and women to see themselves represented in tech to empower them to pursue more technical roles and study for STEM degrees. As women co-founders, we hope to serve as supportive voices and positive role models for women of all ages looking to join the tech industry. We hope that seeing more women co-founders like us encourages all women to see the immense power and capacity we have to disrupt the tech industry.
If you want to learn more about how Google Cloud can help your startup, visit our startup page here where you can apply for our Startup Program, and sign up for our monthly startup newsletter to get a peek at our community activities, digital events, special offers, and more.
TL;DR – More than just alerts, budgets can also send notifications to Pub/Sub. Once they’re in Pub/Sub, you can hook up all kinds of services to react to them. You can use the information about the budget along with some code to do just about anything.
So, we’ve talked about how to set up a budget and how to add more emails to a budget alert. That’s great, but it’s also been limited so far to just getting alerts based on those thresholds. What if you wanted to do something more, like integrate another service or actually take action on a budget alert?
Good news: you can use programmatic budget notifications to do exactly that!
Bad news: "programmatic budget notifications" is really hard to say 5 times fast.
Let’s look at how to set them up (it’s more than one checkbox this time) and start to look at what we can do with them!
Pub/Sub saves the day
Before you update any budgets, you should first create a Pub/Sub topic. If you’re not familiar with Pub/Sub, check out this page to learn more. In short, it’s a tool that helps you handle messages between publishers and subscribers (hence the name). We’re gonna keep things super simple and just use one topic that can have any number of publishers (things that send it messages) and any number of subscribers (things that can receive messages).
In this case, the event publisher will be your budget, and we'll come back to add the subscribers later. For now, you can find Pub/Sub using the left-nav. Remember from my last post that you'll need a project to put Pub/Sub in, but you can always use the one you used previously for the workspace!
Let’s keep things simple, so use that Create Topic button at the top to create a new topic. You can name it something like “budget-notification-topic” if you want to be appropriately verbose. Leave the encryption key option as-is (unless you want this blog post to be even longer) and create the topic. You should see a screen that gives you the full name of the topic and then you’re good to go!
Now head back to your budgets and either create a new one or edit an existing one. The checkbox we’re looking for is right under the one we used in the last post and looks like this:
Check that box and then choose the topic you just made (you may need to select your project first). Then hit save and you’re good to go!
What’s in a notification anyway?
You’ve set up a publisher (your budget) that will send events to your topic, but what does that actually mean? For starters, the budget is going to send notifications multiple times a day to your topic, and they’ll look something like this:
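Here's an illustrative sketch of such a payload, parsed as JSON in Python. The field names follow the notification format discussed below, but the values are purely made up:

```python
import json

# Illustrative (not real) budget notification payload, with a subset of fields.
notification = json.loads("""
{
  "budgetDisplayName": "My Budget",
  "costAmount": 450.0,
  "costIntervalStart": "2021-03-01T08:00:00Z",
  "budgetAmount": 500.0,
  "budgetAmountType": "SPECIFIED_AMOUNT",
  "alertThresholdExceeded": 0.9,
  "currencyCode": "USD"
}
""")

# How far through the budget are we? (90% in this made-up example.)
print(notification["costAmount"] / notification["budgetAmount"])
```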
This is just a sample of the message with a subset of properties
Here’s the full notification format if you want to see more, but we’re mainly going to focus on a few key properties.
- costAmount is the current cost against that budget, for whatever filters you chose (such as just Compute Engine products, or just your dev projects)
- budgetAmount is the amount you've configured for the budget, which may be a specified amount or based on last month's cost (LAST_MONTH_COST) depending on how you set the budget up
- costIntervalStart is the start of the current time period where costs are being measured, which will be the start of the month
- alertThresholdExceeded is the last threshold that has been passed based on the ones you've set up. If you want a refresher on thresholds, check out the first post
- budgetDisplayName is the name of the budget, but you can actually get the unique ID of the budget through some extra metadata (that we'll come back to later)
So with these basic properties, we get a lot of information about the budget! On top of that, we’ll get this notification multiple times a day (last time I checked I got it over 40 times scattered throughout a day) so we’ll always get pretty up-to-date information.
Note: Even though the notifications come in pretty consistently, cost data can still take some time to be reported from the resource level. The budget information will be up to date with the best information it has, but plan accordingly.
Another important note is that this notification doesn’t interfere with your threshold alerts. You can keep all of those the same and you’ll still get your alerts in the same way, plus these notifications will be sent to your Pub/Sub topic.
Well that’s fine and dandy, but now we need to actually do something with the notification. So, let’s use the lightweight Cloud Functions to be a subscriber of our topic.
Cloud Functions saves the day
Use the left-nav to find Cloud Functions and head there.
Just like Pub/Sub, you’ll need to have a project (and you’ll need to make sure you have billing enabled). You can use the same project for your workspace, Pub/Sub, and Functions related to budgets to help keep things organized.
Once again, let’s keep things simple and focus on creating a lightweight function that just receives a message. Here’s a guide on creating a Python function if you want to dive deeper. Create a new function and name it “budget-notification-logger” and choose whatever region you’d like. The key part is to choose the Pub/Sub trigger and then select the topic you created earlier, then hit save.
On the second step, we'll keep the function code super simple just so we know we received a notification. I'll show you the code in Python 3.7, but it should be easy to do in your language of choice. So, choose the Python 3.7 runtime and leave the entry point as-is.
Note: You may see a notification to enable the Cloud Build API, which is required to deploy certain functions. Follow the path to enable it and then go back to the function when it’s ready.
The sample code should be perfect for what we need, which is just some code that receives a message and then prints it out. Go ahead and deploy the function as-is!
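For reference, the Python sample for a Pub/Sub-triggered function looks roughly like this. Note that the message data arrives base64-encoded, so the function decodes it before printing:

```python
import base64


def hello_pubsub(event, context):
    """Log the budget notification delivered via Pub/Sub.

    Args:
        event (dict): the Pub/Sub message; `data` holds the
            base64-encoded notification payload.
        context: event metadata (unused here).
    """
    message = base64.b64decode(event["data"]).decode("utf-8")
    print(message)
```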
Pub/Sub + Cloud Functions actually save the day
The function is ready to go, but now we need to actually make sure it's working. If you click on the three dots (or context menu, if you want to call it that) on the right side, you can click "View logs" to see the logs for the function, including our print statement.
The log viewer should show that you’ve created the function. You can sit here and wait for a budget notification to come in, but it could take a while. In order to make sure everything is working, we can send a test message in Pub/Sub. In a new tab/window, head back to the Pub/Sub page and click on your specific topic. At the top of the screen, click on that Publish Message button.
Once again, we’ll keep things simple and just send the sample notification from before to your topic, which you should be able to copy and paste as-is. In this case, we’re publishing a test message to make sure everything is working, but ultimately your budget should start sending regular notifications as well.
Once you click Publish, head back to the tab/window that was showing your function's logs. You may need to wait a few seconds before the log interface picks it up; you can click the button at the bottom to load newer logs. After a bit, you should see something that looks like this:
Success! We can see that our message was sent from Pub/Sub to the function and we simply printed it to the logs. If you check back on the logs page later, you should also see messages from your actual budget with real data come through.
With the power of code, there’s a lot more we can do based on our budget. In the next post, we’ll walk through a more useful action by sending our budget to Slack. Meanwhile, here’s the documentation if you want to read more about programmatic budget notifications!
Quick launch summary
Responding to invitations via email will now require you to be signed in
- Admins: There is no admin control for this feature.
- End users: There is no end user setting for this feature. Visit the Help Center to learn more about responding to event invitations.
- Rapid Release and Scheduled Release domains: Extended rollout (potentially longer than 15 days for feature visibility) starting on March 4, 2021
- Available to Google Workspace Essentials, Business Starter, Business Standard, Business Plus, Enterprise Essentials, Enterprise Standard, Enterprise Plus, Education Fundamentals, and Education Plus, as well as G Suite Basic, Business, and Nonprofits customers
Posted by Joe Hicks, Yun Peng, Olek Wojnar
Google and Debian work together to make COVID-19 researchers’ lives easier
- Bazel is now available as an easy-to-install package distributed on Debian and Ubuntu.
- TensorFlow packaging for Debian is progressing.
Olek Wojnar, a Debian Developer, reached out to the Bazel team about packaging and distributing Bazel on Debian (and other Linux distributions, such as Ubuntu) in service of delivering TensorFlow machine learning functionality for COVID-19 researchers:
“I’m working with the Debian Med team right now to get some much-needed software packaged and available for users in the medical community to help with the COVID-19 pandemic. At least one of the packages we desperately need requires Bazel to build. Clearly this is an unusual and very critical situation. I don’t think it’s an exaggeration to say that lives may literally depend on us getting better tools to the medical professionals out there, and quickly. The entire international community would be extraordinarily grateful if @google and the @bazelbuild team could prioritize helping with this!”
The Bazel team jumped in to help Olek and the COVID-19 research community. Yun Peng, a software engineer at Google, and Olek Wojnar led the team of Bazel and Debian volunteers to move the project forward. The joint effort between Debian and Google has produced some great results, including packaging the Bazel bootstrap variant in six months' time (Debian 11, released in late 2021; Ubuntu 21.04, released 22 April 2021). Bazel is now available as an easy-to-install package distributed on Debian and Ubuntu. The extended Google team continues to work with Debian toward the next step of packaging and distributing TensorFlow on Debian and other Linux distributions.
In addition to Yun and Olek, other contributors to this project include Michael R. Crusoe of Debian, Joe Hicks, John Field, Philipp Wollermann, and Tobias Werth of Google.