A guest post by Chris Knorowski, SensiML CTO
TinyML reduces the complexity of adding AI to the edge, enabling new applications where streaming data back to the cloud is prohibitive. Some examples of applications that are making use of TinyML right now are:
- Visual and audio wake words that trigger an action when a person is detected in an image or a keyword is spoken.
- Predictive maintenance on industrial machines using sensors to continuously monitor for anomalous behavior.
- Gesture and activity detection for medical, consumer, and agricultural devices, such as gait analysis, fall detection or animal health monitoring.
One common factor for all these applications is the low cost and power usage of the hardware they run on. Sure, we can detect audio and visual wake words or analyze sensor data for predictive maintenance on a desktop computer. But, for a lot of these applications to be viable, the hardware needs to be inexpensive and power efficient (so it can run on batteries for an extended time).
Fortunately, the hardware is now getting to the point where running real-time analytics is possible. It is crazy to think about, but the Arm Cortex-M4 processor can do more FFTs per second than the Pentium 4 processor while using orders of magnitude less power. Similar gains in power/performance have been made in sensors and wireless communication. TinyML allows us to take advantage of these advances in hardware to create all sorts of novel applications that simply were not possible before.
At SensiML our goal is to empower developers to rapidly add AI to their own edge devices, allowing their applications to autonomously transform raw sensor data into meaningful insight. We have taken years of lessons learned in creating products that rely on edge optimized machine learning and distilled that knowledge into a single framework, the SensiML Analytics Toolkit, which provides an end-to-end development platform spanning data collection, labeling, algorithm development, firmware generation, and testing.
So what does it take to build a TinyML application?
Building a TinyML application touches on skill sets ranging from hardware engineering, embedded programming, software engineering, machine learning, data science and domain expertise about the application you are building. The steps required to build the application can be broken into four parts:
- Collecting and annotating data
- Applying signal preprocessing
- Training a classification algorithm
- Creating firmware optimized for the resource budget of an edge device
This tutorial will walk you through all the steps, and by the end of it you will have created an edge optimized TinyML application for the Arduino Nano 33 BLE Sense that is capable of recognizing different boxing punches in real-time using the Gyroscope and Accelerometer sensor data from the onboard IMU sensor.
What you need to get started
We will use the SensiML Analytics Toolkit to handle collecting and annotating sensor data, creating a sensor preprocessing pipeline, and generating the firmware. We will use TensorFlow to train our machine learning model and TensorFlow Lite Micro for inferencing. Before you start, we recommend signing up for SensiML Community Edition to get access to the SensiML Analytics Toolkit.
- We will use the SensiML Open Gateway, an open-source Python application, to stream data from edge devices.
- We will use the SensiML Data Capture Lab (Windows 10) to record and label the sensor data.
- We will use Google Colab to train our model with TensorFlow for deployment on TensorFlow Lite for Microcontrollers.
- We will use the SensiML Analytics Studio for offline validation and code generation of the firmware.
- We will use Visual Studio Code with the PlatformIO extension to flash the firmware.
- Arduino Nano 33 BLE Sense
- Adafruit Li-Ion Backpack Add-On (optional)
- Lithium-Ion Polymer Battery (3.7 V, 100 mAh)
- Zebra Byte Case
- Glove and Double Sided Tape
The Arduino Nano 33 BLE Sense has an Arm Cortex-M4 microcontroller running at 64 MHz with 1 MB of flash memory and 256 KB of RAM. If you are used to working with cloud or mobile hardware, this may seem tiny, but many applications can run in such a resource-constrained environment.
The Nano 33 BLE Sense also has a variety of onboard sensors which can be used in your TinyML applications. For this tutorial, we are using the motion sensor which is a 9-axis IMU (accelerometer, gyroscope, magnetometer).
To run the board untethered, we used the Adafruit Li-Ion Battery Pack. If you do not have the battery pack, you can still walk through this tutorial using a suitably long micro USB cable to power the board, though collecting gesture data is not quite as fun when you are wired. See the images below for how to hook up the battery to the Nano 33 BLE Sense.
Building Your Data Set
For every machine learning project, the quality of the final product depends on the quality of your data set. Time-series data, unlike image and audio, are typically unique to each application. Because of this, you often need to collect and annotate your datasets. The next part of this tutorial will walk you through how to connect to the Nano 33 BLE Sense to stream data wirelessly over BLE as well as label the data so it can be used to train a TensorFlow model.
For this project we are going to collect data for 5 different gestures as well as some data for negative cases which we will label as Unknown. The 5 boxing gestures we are going to collect data for are Jab, Overhand, Cross, Hook, and Uppercut.
We will also collect data on both the right and left glove, giving us a total of 10 different classes. To simplify things, we will build two separate models: one for the right glove and one for the left. This tutorial will focus on the left glove.
Streaming sensor data from the Nano 33 over BLE
The first challenge of a TinyML project is often to figure out how to get data off of the sensor. Depending on your needs you may choose Wi-Fi, BLE, Serial, or LoRaWAN. Alternatively, you may find storing data to an internal SD card and transferring the files after is the best way to collect data. For this tutorial, we will take advantage of the onboard BLE radio to stream sensor data from the Nano 33 BLE Sense.
We are going to use the SensiML Open Gateway running on our computer to retrieve the sensor data. To download and launch the gateway open a terminal and run the following commands:
git clone https://github.com/sensiml/open-gateway
cd open-gateway
pip3 install -r requirements.txt
python3 app.py
The gateway should now be running on your machine.
Next, we need to connect the gateway server to the Nano 33 BLE Sense. Make sure you have flashed the Data Collection Firmware to your Nano 33. This firmware implements the Simple Streaming Interface specification, which creates two topics used for streaming data. The /config topic returns a JSON document describing the sensor data, and the /stream topic streams raw sensor data as a byte array of Int16 values.
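As a rough illustration of what a consumer of the /stream topic has to do, the sketch below unpacks a byte array of Int16 values into per-sample rows. The little-endian byte order, the six-channel layout (three accelerometer axes and three gyroscope axes), and the function name are assumptions made for this example; on a real device the column layout should be taken from the /config JSON.

```python
import struct

def unpack_int16_frames(payload: bytes, num_channels: int = 6):
    """Split a raw Int16 byte stream into rows of num_channels values.

    Assumes little-endian samples interleaved by channel. Trailing bytes
    that do not fill a whole row are ignored.
    """
    row_bytes = 2 * num_channels
    usable = len(payload) - (len(payload) % row_bytes)
    fmt = "<" + "h" * num_channels
    return [list(struct.unpack_from(fmt, payload, offset))
            for offset in range(0, usable, row_bytes)]

# Two fake 6-channel samples, packed the same way, for demonstration only.
demo = struct.pack("<12h", 1, 2, 3, 4, 5, 6, -1, -2, -3, -4, -5, -6)
rows = unpack_int16_frames(demo)
```

Each row then holds one time step across all channels, which is the shape most preprocessing code expects.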
To configure the gateway to connect to your sensor:
- Go to the gateway address in your browser (defaults to localhost:5555)
- Click on the Home Tab
- Set Device Mode: Data Capture
- Set Connection Type: BLE
- Click the Scan button, and select the device named Nano 33 DCL
- Click the Connect to Device button
The gateway will pull the configuration from your device, and be ready to start forwarding sensor data. You can verify it is working by going to the Test Stream tab and clicking the Start Stream button.
Setting up the Data Capture Lab Project
Now that we can stream data, the next step is to record and label the boxing gestures. To do that we will use the SensiML Data Capture Lab. If you haven’t already done so, download and install the Data Capture Lab to record sensor data.
We have created a template project to get you started. The project is prepopulated with the gesture labels and metadata information, along with some pre-recorded example gestures files. To add this project to your account:
- Download and unzip the Boxing Glove Gestures Demo Project
- Open the Data Capture Lab
- Click Upload Project
- Click Browse which will open the file explorer window
- Navigate to the Boxing Glove Gestures Demo folder you just unzipped and select the Boxing Glove Gestures Demo.dclproj file
- Click Upload
Connecting to the Gateway
After uploading the project, you can start capturing sensor data. For this tutorial we will be streaming data to the Data Capture Lab from the gateway over TCP/IP. To connect to the Nano 33 BLE Sense from the Data Capture Lab through the gateway:
- Open the Project Boxing Glove Gestures Demo
- Click Switch Modes -> Capture Mode
- Select Connection Method: Wi-Fi
- Click the Find Devices button
- Enter the IP Address of your gateway machine, and the port the server is running on (typically 127.0.0.1:5555)
- Click Add Device
- Select the newly added device
- Click the Connect button
You should see sensor data streaming across the screen. If you are having trouble with this step, see the full documentation here for troubleshooting.
Capturing Boxing Gesture Sensor Data
The Data Capture Lab can also play videos that have been recorded alongside your sensor data. If you want to capture videos and sync them up with sensor data see the documentation here. This can be extremely helpful during the annotation phase to help interpret what is happening at a given point in the time-series sensor waveforms.
Now that data is streaming into the Data Capture Lab, we can begin capturing our gesture data set.
- Select “Jab” from the Label dropdown in the Capture Properties screen. (this will be the name of the file)
- Select the Metadata which captures the context (subject, glove, experience, etc.)
- Then click the Begin Recording button to start recording the sensor data
- Perform several “Jab” gestures
- Click the Stop Recording button when you are finished
After you hit stop recording, the captured data will be saved locally and synced with the cloud project. You can view the file by going to the Project Explorer and double-clicking on the newly created file.
The following video walks through capturing sensor data.
Annotating Sensor Data
To classify sensor data in real-time, you need to decide how much and which portion of the sensor stream to feed to the classifier. On edge devices, it gets even more difficult as you are limited to a small buffer of data due to the limited RAM. Identifying the right segmentation algorithm for an application can save on battery life by limiting the number of classifications performed as well as improving the accuracy by identifying the start and end of a gesture.
Segmentation algorithms work by taking the input from the sensor and buffering the data until they determine a new segment has been found. At that point, they pass the data buffer down to the rest of the pipeline. The simplest segmentation algorithm is a sliding window, which continually feeds a set chunk of data to the classifier. However, there are many drawbacks to the sliding window for discrete gesture recognition, such as performing classifications when there are no events. This wastes battery and runs the risk of having events split across multiple windows, which can lower accuracy.
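To make the trade-off concrete, here is a minimal sketch of a sliding window next to a simple peak-threshold filter. The threshold filter is only a toy stand-in for event detection, not the segmentation algorithm used by the Data Capture Lab, and the function names and sample values are invented for illustration.

```python
def sliding_windows(samples, window_size, step):
    """Yield fixed-size windows over samples, advancing by step each time."""
    for start in range(0, len(samples) - window_size + 1, step):
        yield samples[start:start + window_size]

def windows_with_activity(samples, window_size, step, threshold):
    """Keep only windows whose peak absolute value reaches threshold."""
    return [w for w in sliding_windows(samples, window_size, step)
            if max(abs(v) for v in w) >= threshold]

# One short burst of motion (9, 12, 8) surrounded by near-silence.
signal = [0, 0, 1, 0, 0, 9, 12, 8, 0, 0, 0, 1]
all_windows = list(sliding_windows(signal, window_size=4, step=2))
active = windows_with_activity(signal, window_size=4, step=2, threshold=5)
```

In this toy signal the plain sliding window would classify every window, including the quiet ones, and the single burst still lands in several overlapping windows, which is the splitting problem described above.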
Segmenting in the Data Capture Lab
We identify events in the Data Capture Lab by creating Segments around the events in your sensor data. Segments are displayed with a pair of blue and red lines when you open a file and define where an event is located.
The Data Capture Lab has two methods for labeling your events: Manual and Auto. In manual mode you can manually drag and drop a segment onto the graph to identify an event in your sensor data. Auto mode uses a segmentation algorithm to automatically detect events based on customizable parameters. For this tutorial, we are going to use a segmentation algorithm in Auto mode. The segmentation algorithms we use for determining events will also be compiled as part of the firmware so that the on-device model will be fed the same segments of data it was trained against.…
A big part of ensuring the availability of your applications is establishing and monitoring service-level metrics—something that our Site Reliability Engineering (SRE) team does every day here at Google Cloud. The end goal of our SRE principles is to improve services and in turn the user experience.
The concept of SRE starts with the idea that metrics should be closely tied to business objectives. In addition to business-level SLAs, we also use SLOs and SLIs in SRE planning and practice.
Defining the terms of site reliability engineering
These tools aren’t just useful abstractions. Without them, you won’t know if your system is reliable, available, or even useful. If the tools don’t tie back to your business objectives, then you’ll be missing data on whether your choices are helping or hurting your business.
As a refresher, here’s a look at SLOs, SLAs, and SLIs, as discussed by our Customer Reliability Engineering team in their blog post, SLOs, SLIs, SLAs, oh my – CRE life lessons.
1. Service-Level Objective (SLO)
SRE begins with the idea that availability is a prerequisite for success. An unavailable system can’t perform its function and will fail by default. Availability, in SRE terms, defines whether a system is able to fulfill its intended function at a point in time. In addition to its use as a reporting tool, the historical availability measurement can also describe the probability that your system will perform as expected in the future.
When we set out to define the terms of SRE, we wanted to set a precise numerical target for system availability. We term this target the availability Service-Level Objective (SLO) of our system. Any future discussion about whether the system is running reliably and if any design or architectural changes to it are needed must be framed in terms of our system continuing to meet this SLO.
Keep in mind that the more reliable the service, the more it costs to operate. Define the lowest level of reliability that is acceptable for users of each service, then state that as your SLO. Every service should have an availability SLO—without it, your team and your stakeholders can’t make principled judgments about whether your service needs to be made more reliable (increasing cost and slowing development) or less reliable (allowing greater velocity of development). Excessive availability has become the expectation, which can lead to problems. Don’t make your system overly reliable if the user experience doesn’t necessitate it, and especially if you don’t intend to commit to always reaching that level. You can learn more about this by participating in The Art of SLOs training.
Within Google Cloud, we implement periodic downtime in some services to prevent a service from being overly available. You could also try experimenting with occasional planned-downtime exercises with front-end servers, as we did with one of our internal systems. We found that these exercises can uncover services that are using those servers inappropriately. With that information, you can then move workloads to a more suitable place and keep servers at the right availability level.
2. Service-Level Agreement (SLA)
At Google Cloud, we distinguish between an SLO and a Service-Level Agreement (SLA). An SLA normally involves a promise to a service user that the service availability SLO should meet a certain level over a certain period. Failing to do so then results in some kind of penalty. This might be a partial refund of the service subscription fee paid by customers for that period, or additional subscription time added for free. Going out of SLO will hurt the service team, so they will push hard to stay within SLO. If you’re charging your customers money, you’ll probably need an SLA.
Because of this, and because of the principle that availability shouldn’t be much better than the SLO, the availability SLO in the SLA is normally a looser objective than the internal availability SLO. This might be expressed in availability numbers: for instance, an availability SLO of 99.9% over one month, with an internal availability SLO of 99.95%. Alternatively, the SLA might only specify a subset of the metrics that make up the internal SLO.
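To see what the gap between those two numbers means in practice, the sketch below converts an availability target into allowed downtime. It assumes a 30-day month and treats availability purely as uptime over wall-clock time; the function name is ours.

```python
def allowed_downtime_minutes(slo_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted by an availability target over a period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - slo_percent / 100)

sla_budget = allowed_downtime_minutes(99.9)        # roughly 43.2 minutes
internal_budget = allowed_downtime_minutes(99.95)  # roughly 21.6 minutes
```

Under these assumptions the internal objective leaves about half the downtime that the SLA does, which is the safety margin the looser SLA target provides.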
If you have an SLO in your SLA that is different from your internal SLO (as it almost always is), it’s important for your monitoring to explicitly measure SLO compliance. You want to be able to view your system’s availability over the SLA calendar period, and quickly see if it appears to be in danger of going out of SLO.
You’ll also need a precise measurement of compliance, usually from logs analysis. Since we have an extra set of obligations (described in the SLA) to paying customers, we need to measure queries received from them separately from other queries. This is another benefit of establishing an SLA—it’s an unambiguous way to prioritize traffic.
When you define your SLA’s availability SLO, be careful about which queries you count as legitimate. For example, if a customer goes over quota because they released a buggy version of their mobile client, you may consider excluding all “out of quota” response codes from your SLA accounting.
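A minimal sketch of that kind of accounting is shown below. It assumes the log has been reduced to HTTP status codes, that out-of-quota responses are reported as 429, and that anything below 500 counts as a success. All three are illustrative choices rather than a description of any particular service.

```python
def sla_availability(status_codes, excluded=frozenset({429})):
    """Fraction of counted requests that succeeded.

    Responses whose code is in `excluded` (here: out-of-quota) are dropped
    from both the numerator and the denominator.
    """
    counted = [code for code in status_codes if code not in excluded]
    if not counted:
        return None  # nothing to measure in this period
    good = sum(1 for code in counted if code < 500)
    return good / len(counted)

codes = [200, 200, 429, 429, 500, 200]
availability = sla_availability(codes)
```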
3. Service-Level Indicator (SLI)
Our Service-Level Indicator (SLI) is a direct measurement of a service’s behavior, defined as the frequency of successful probes of our system. When we evaluate whether our system has been running within SLO for the past week, we look at the SLI to get the service availability percentage. If it goes below the specified SLO, we have a problem and may need to make the system more available in some way, such as by running a second instance of the service in a different city and load-balancing between the two. If you want to know how reliable your service is, you must be able to measure the rates of successful and unsuccessful queries as your SLIs.
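The check described above can be sketched in a few lines. Here the SLI is the fraction of successful probes, represented as a list of booleans, and the 99.9% target is only an example value.

```python
def availability_sli(probe_results):
    """Fraction of probes that succeeded (probe_results is a list of booleans)."""
    return sum(probe_results) / len(probe_results)

def within_slo(probe_results, slo=0.999):
    """True when the measured SLI meets or exceeds the SLO target."""
    return availability_sli(probe_results) >= slo

good_week = [True] * 9995 + [False] * 5
bad_week = [True] * 9980 + [False] * 20
```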
If you’re building a system from scratch, make sure that SLIs and SLOs are part of your system requirements. If you already have a production system but don’t have them clearly defined, then that’s your highest priority work.
Data centers may be in the midst of a flash revolution, but managing hard disk drives (HDDs) is still paramount. According to IDC, stored data will increase 17.8% by 2024 with HDD as the main storage technology.
At Google Cloud, we know first-hand how critical it is to manage HDDs in operations and preemptively identify potential failures. We are responsible for running some of the largest data centers in the world, and any misses in identifying these failures at the right time can potentially cause serious outages across our many products and services. In the past, when a disk was flagged for a problem, the main option was to repair the problem on site using software. But this procedure was expensive and time-consuming. It required draining the data from the drive, isolating the drive, running diagnostics, and then re-introducing it to traffic.
That’s why we teamed up with Seagate, our HDD original equipment manufacturer (OEM) partner for Google’s data centers, to find a way to predict frequent HDD problems. Together, we developed a machine learning (ML) system, built on top of Google Cloud, to forecast the probability of a recurring failing disk—a disk that fails or has experienced three or more problems in 30 days.
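As a rough illustration of how such a label could be derived from repair logs, the sketch below marks a disk as recurring-failing if it has failed outright or if any three problems fall within a 30-day span. Reading "in 30 days" as a rolling window, and the function and parameter names, are our assumptions for the example.

```python
from datetime import date, timedelta

def is_recurring_failing(problem_dates, failed, window_days=30, min_problems=3):
    """Label a disk from its problem history.

    True if the disk failed, or if at least min_problems problems occurred
    within any window of window_days days.
    """
    if failed:
        return True
    ordered = sorted(problem_dates)
    window = timedelta(days=window_days)
    for i in range(len(ordered) - min_problems + 1):
        if ordered[i + min_problems - 1] - ordered[i] <= window:
            return True
    return False

clustered = [date(2021, 1, 1), date(2021, 1, 10), date(2021, 1, 25)]
spread_out = [date(2021, 1, 1), date(2021, 2, 15), date(2021, 4, 1)]
```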
Let’s take a peek.
Managing disks by the millions is hard work
There are millions of disks deployed in operation that generate terabytes (TBs) of raw telemetry data. This includes billions of rows of hourly SMART (Self-Monitoring, Analysis and Reporting Technology) data and host metadata, such as repair logs, Online Vendor Diagnostics (OVD) or Field Accessible Reliability Metrics (FARM) logs, and manufacturing data about each disk drive.
That’s hundreds of parameters and factors that must be tracked and monitored across every single HDD. When you consider the number of drives in an enterprise data center today, it’s practically impossible to monitor all these devices based on human power alone.
To help solve this issue, we created a machine learning system to predict HDD health in our data centers.
Reducing risk and costs with a predictive maintenance system
Our Google Cloud AI Services team (Professional Services), along with Accenture, helped Seagate build a proof of concept based on the two most common drive types.
The ML system was built on the following Google Cloud products and services:
Terraform helped us configure our infrastructure and manage resources on Google Cloud.
Google internal technologies enabled us to migrate data files to Google Cloud.
BigQuery and Dataflow allowed us to build highly scalable data pipelines to ingest, load, transform, and store TBs of data, including raw HDD health data, features (used for training and prediction), labels, prediction results, and metadata.
We built, trained, and deployed our time-series forecasting ML model using:
AI Platform Notebooks for experimentation
AutoML Tables for ML model experimentation and development
Custom Transformer-based TensorFlow model trained on Cloud AI Platform.
UI views in Data Studio and BigQuery made it easy to share results with executives, managers, and analysts.
In the past, when we flagged a disk problem, the main fix was to repair the disk on site using software. But this procedure was expensive and time-consuming. It required draining the data from the drive, isolating the drive, running diagnostics, and then re-introducing it to traffic.
“End-to-end automated MLOps using Google Cloud products from data ingestion to model training, validation and deployment added significant value to the project,” according to Vamsi Paladugu, Director of Data and Analytics at Seagate.
Vamsi also added, “Automated implementation of infrastructure as code using Terraform and DevOps processes, aligning with Seagate security policies and flawless execution of the design and setup of the infrastructure is commendable.”
Now, when an HDD is flagged for repair, the model takes any data about that disk before repair (i.e. SMART data and OVD logs) and uses it to predict the probability of recurring failures.
Data is critical—build a strong data pipeline
Making device data useful through infrastructure and advanced analytics tools is a critical component of any predictive maintenance strategy.
Every disk has to continuously measure hundreds of different performance and health characteristics that can be used to monitor and predict its future health. To be successful, we needed to build a data pipeline that was both scalable and reliable for both batch and streaming data processes for a variety of different data sources, including:
SMART system indicators from storage devices to detect and anticipate imminent hardware failures.
Host data, such as notifications about failures, collected from a host system made up of multiple drives.
HDD logs (OVD and FARM data) and disk repair logs.
Manufacturing data for each drive, such as model type and batch number.
Important note: We do not share user data at any time during this process.
With so much raw data, we needed to extract the right features to ensure the accuracy and performance of our ML models. AutoML Tables made this process easy with automatic feature engineering. All we had to do was use our data pipeline to convert the raw data into AutoML input format.
BigQuery made it easy to execute simple transformations, such as pivoting rows to columns, joining normalized tables, and defining labels, for petabytes of data in just a few seconds. From there, the data was imported directly into AutoML Tables for training and serving our ML models.
Choosing the right approach — two models put to the test
The AutoML model extracted different aggregates of time-series features, such as the minimum, maximum, and average read error rates. These were then concatenated with features that were not time-series, such as drive model type. We used a time-based split to create our training, validation, and testing subsets. AutoML Tables makes it easy to import the data, generate statistics, train different models, tune hyperparameter configurations, and deliver model performance metrics. It also offers an API to easily perform batch and online predictions.
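The two ideas in this step, aggregating a time series into summary features and splitting by time rather than at random, can be sketched in plain Python. The field names, the 80/10/10 proportions, and the helper functions below are illustrative assumptions; AutoML Tables performs its own feature engineering internally.

```python
from statistics import mean

def aggregate_series(name, values):
    """Summarize one time series as min, max, and average features."""
    return {f"{name}_min": min(values),
            f"{name}_max": max(values),
            f"{name}_avg": mean(values)}

def build_example(row):
    """Concatenate time-series aggregates with a non-time-series feature."""
    features = aggregate_series("read_error_rate", row["read_error_rate"])
    features["drive_model"] = row["drive_model"]
    return features

def time_based_split(rows, train_pct=80, val_pct=10):
    """Oldest rows go to training, then validation, newest to the holdout."""
    ordered = sorted(rows, key=lambda r: r["timestamp"])
    train_end = len(ordered) * train_pct // 100
    val_end = len(ordered) * (train_pct + val_pct) // 100
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]

# Ten synthetic rows, deliberately supplied out of time order.
rows = [{"timestamp": t, "read_error_rate": [t, t + 1, t + 2], "drive_model": "A"}
        for t in reversed(range(10))]
train, val, holdout = time_based_split(rows)
example = build_example(train[0])
```

Splitting on time keeps every training row older than every evaluation row, so the model is never scored on data that precedes what it was trained on.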
For comparison, we created a custom Transformer-based model from scratch using TensorFlow. The Transformer model didn’t require feature engineering or creating feature aggregates. Instead, raw time series data was fed directly into the model and positional encoding was used to track the relative order. Features that were not time-series were fed into a deep neural network (DNN). Outputs from both the Transformer and the DNN were then concatenated and a sigmoid layer was used to predict the label.
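The post does not say which positional encoding was used. For readers unfamiliar with the idea, the sketch below shows the sinusoidal variant from the original Transformer paper, written in plain Python as an assumed example rather than a description of the project's model.

```python
import math

def positional_encoding(num_steps, d_model):
    """Sinusoidal positional encoding table of shape [num_steps][d_model].

    Even feature indices use sine and odd indices use cosine, with
    wavelengths that grow geometrically across the feature dimension.
    """
    table = []
    for pos in range(num_steps):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

encoding = positional_encoding(num_steps=4, d_model=8)
```

Each row would be added to the embedding of the corresponding time step so that the model can tell earlier readings from later ones.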
So, which model worked better?
The AutoML model generated better results, outperforming both the custom Transformer model and the statistical model system. After we deployed the model, we stored our forecasts in our database and compared the predictions with actual drive repair logs after 30 days. Our AutoML model achieved a precision of 98% with a recall of 35%, compared to a precision of 70-80% and a recall of 20-25% from the custom ML model. We were also able to explain the model by identifying the top reasons behind the recurring failures and enabling ground teams to take proactive actions to reduce failures in operations before they happened.
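For reference, precision is the share of flagged disks that really were recurring failures, and recall is the share of real recurring failures that were flagged. The sketch below computes both from paired predictions and outcomes; the sample values are made up and are not the project's data.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for paired boolean predictions and outcomes."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # flagged and really failed
    fp = sum(1 for p, a in pairs if p and not a)      # flagged but healthy
    fn = sum(1 for p, a in pairs if not p and a)      # missed failures
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

predicted = [1, 1, 0, 0, 0, 1]
actual = [1, 1, 1, 1, 0, 0]
precision, recall = precision_recall(predicted, actual)
```

A high-precision, lower-recall model like the one described flags fewer disks, but the disks it does flag are very likely to be real problems.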
Our top takeaway: MLOps is the key to successful production
The final ingredient to ensure you can deploy robust, repeatable machine learning pipelines is MLOps. Google Cloud offers multiple options to help you implement MLOps, using automation to support an end-to-end lifecycle that can add significant value to your projects.
For this project, we used Terraform to define and provision our infrastructure and GitLab for source control versioning and CI/CD pipeline implementation.
Our repository contains two branches, development and production, each of which corresponds to an environment in Google Cloud. Here is our high-level system design of the model pipeline for training and serving:
We used Cloud Composer, our fully managed workflow orchestration service, to orchestrate all the data, training, and serving pipelines we mentioned above. After an ML engineer has evaluated the performance of the trained model, they can trigger an activation pipeline that promotes the model to production by simply appending an entry to a metadata table.
“Google’s MLOps environment allowed us to create a seamless soup-to-nuts experience, from data ingestion all the way to easy to monitor executive dashboards,” said Elias Glavinas, Seagate’s Director of Quality Data Analytics, Tools & Automation.
Elias also noted, “AutoML Tables, specifically, proved to be a substantial time and resource saver on the data science side, offering auto feature engineering and hyperparameter tuning, with model prediction results that matched or exceeded our data scientists’ manual efforts. Add to that the capability for easy and automated model retraining and deployment, and this turned out to be a very successful project.”
What’s coming next
The business case for using an ML-based system to predict HDD failure is only getting stronger. When engineers have a larger window to identify failing disks, not only can they reduce costs but they can also prevent problems before they impact end users. We already have plans to expand the system to support all Seagate drives—and we can’t wait to see how this will benefit our OEMs and our customers!
We’d like to give thanks to Anuradha Bajpai, Kingsley Madikaegbu, and Prathap Parvathareddy for implementing the GCP infrastructure and building critical data ingestion segments. We’d like to give special thanks to Chris Donaghue, Karl Smayling, Kaushal Upadhyaya, Michael McElarney, Priya Bajaj, Radha Ramachandran, Rahul Parashar, Sheldon Logan, Timothy Ma and Tony Oliveri for their support and guidance throughout the project. We are grateful to the Seagate team (Ed Yasutake, Alan Tsang, John Sosa-Trustham, Kathryn Plath and Michael Renella) and our partner team from Accenture (Aaron Little, Divya Monisha, Karol Stuart, Olufemi Adebiyi, Patrizio Guagliardo, Sneha Soni, Suresh Vadali, Venkatesh Rao and Vivian Li) who partnered with us in delivering this successful project.
- Line dash styles
- Line thickness
- Admins: There is no admin control for this feature.
- End users: Visit the Help Center to learn more about adding and editing a chart in Google Sheets.
- Rapid and Scheduled Release domains: Extended rollout (potentially longer than 15 days for feature visibility) starting on May 7, 2021
- Available to all Google Workspace customers, as well as G Suite Basic and Business customers
Posted by Zarana Parekh, Software Engineer and Jason Baldridge, Staff Research Scientist, Google Research
The past decade has seen remarkable progress on automatic image captioning, a task in which a computer algorithm creates written descriptions for images. Much of the progress has come through the use of modern deep learning methods developed for both computer vision and natural language processing, combined with large scale datasets that pair images with descriptions created by people. In addition to supporting important practical applications, such as providing descriptions of images for visually impaired people, these datasets also enable investigations into important and exciting research questions about grounding language in visual inputs. For example, learning deep representations for a word like “car” means using both linguistic and visual contexts.
Image captioning datasets that contain pairs of textual descriptions and their corresponding images, such as MS-COCO and Flickr30k, have been widely used to learn aligned image and text representations and to build captioning models. Unfortunately, these datasets have limited cross-modal associations: images are not paired with other images, captions are only paired with other captions of the same image (also called co-captions), there are image-caption pairs that match but are not labeled as a match, and there are no labels that indicate when an image-caption pair does not match. This undermines research into how inter-modality learning (connecting captions to images, for example) impacts intra-modality tasks (connecting captions to captions or images to images). This is important to address, especially because a fair amount of work on learning from images paired with text is motivated by arguments about how visual elements should inform and improve representations of language.
To address this evaluation gap, we introduce “Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO”, which was recently presented at EACL 2021. The Crisscrossed Captions (CxC) dataset extends the development and test splits of MS-COCO with semantic similarity ratings for image-text, text-text and image-image pairs. The rating criteria are based on Semantic Textual Similarity, an existing and widely-adopted measure of semantic relatedness between pairs of short texts, which we extend to include judgments about images as well. In all, CxC contains human-derived semantic similarity ratings for 267,095 pairs (derived from 1,335,475 independent judgments), a massive extension in scale and detail to the 50k original binary pairings in MS-COCO’s development and test splits. We have released CxC’s ratings, along with code to merge CxC with existing MS-COCO data. Anyone familiar with MS-COCO can thus easily enhance their experiments with CxC.
Crisscrossed Captions extends the MS-COCO evaluation sets by adding human-derived semantic similarity ratings for existing image-caption pairs and co-captions (solid lines), and it increases rating density by adding human ratings for new image-caption, caption-caption and image-image pairs (dashed lines).
Creating the CxC Dataset
If a picture is worth a thousand words, it is likely because there are so many details and relationships between objects that are generally depicted in pictures. We can describe the texture of the fur on a dog, name the logo on the frisbee it is chasing, mention the expression on the face of the person who has just thrown the frisbee, or note the vibrant red on a large leaf in a tree above the person’s head, and so on.
The CxC dataset extends the MS-COCO evaluation splits with graded similarity associations within and across modalities. MS-COCO has five captions for each image, split into 410k training, 25k development, and 25k test captions (for 82k, 5k, 5k images, respectively). An ideal extension would rate every pair in the dataset (caption-caption, image-image, and image-caption), but this is infeasible as it would require obtaining human ratings for billions of pairs.
Given that randomly selected pairs of images and captions are likely to be dissimilar, we came up with a way to select items for human rating that would include at least some new pairs with high expected similarity. To reduce the dependence of the chosen pairs on the models used to find them, we introduce an indirect sampling scheme (depicted below) where we encode images and captions using different encoding methods and compute the similarity between pairs of same modality items, resulting in similarity matrices. Images are encoded using Graph-RISE embeddings, while captions are encoded using two methods — Universal Sentence Encoder (USE) and average bag-of-words (BoW) based on GloVe embeddings. Since each MS-COCO example has five co-captions, we average the co-caption encodings to create a single representation per example, ensuring all caption pairs can be mapped to image pairs (more below on how we select intermodality pairs).
The next step of the indirect sampling scheme is to use the computed similarities of images for a biased sampling of caption pairs for human rating (and vice versa). For example, we select two captions with high computed similarities from the text similarity matrix, then take each of their images, resulting in a new pair of images that are different in appearance but similar in what they depict based on their descriptions. For example, the captions “A dog looking bashfully to the side” and “A black dog lifts its head to the side to enjoy a breeze” would have a reasonably high model similarity, so the corresponding images of the two dogs in the figure below could be selected for image similarity rating. This step can also start with two images with high computed similarities to yield a new pair of captions. We now have indirectly sampled new intramodal pairs — at least some of which are highly similar — for which we obtain human ratings.
|Top: Pairs of images are picked based on their computed caption similarity. Bottom: Pairs of captions are picked based on the computed similarity of the images they describe.|
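The indirect sampling idea can be illustrated with a small sketch. This is not the released CxC code: the embeddings are made-up two-dimensional vectors standing in for Graph-RISE and USE/BoW encodings, and `top_similar_pairs` is a hypothetical helper. It only shows how similarity computed in one modality selects pairs to be rated in the other.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_similar_pairs(embeddings, k):
    """Return the k example-index pairs (i, j), i < j, with the highest similarity."""
    scored = [(cosine(embeddings[i], embeddings[j]), i, j)
              for i, j in combinations(range(len(embeddings)), 2)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(i, j) for _, i, j in scored[:k]]

# Toy per-example encodings (hypothetical values). Each example is one image
# plus the average of its co-caption encodings.
caption_emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
image_emb = [[0.2, 0.8], [0.7, 0.3], [0.9, 0.1], [0.5, 0.5]]

# Caption similarity chooses which image pairs go to human raters,
# and image similarity chooses which caption pairs go to human raters.
image_pairs_to_rate = top_similar_pairs(caption_emb, k=1)
caption_pairs_to_rate = top_similar_pairs(image_emb, k=1)
```

Because the pairs sent for image rating are chosen by a text model (and vice versa), the model that will later be evaluated on a modality never directly picks its own test pairs.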
Finally, we use these new intramodal pairs and their human ratings to select new intermodal pairs for human rating, using existing image-caption pairs to link between modalities. For example, if the caption pair from examples i and j was rated by humans as highly similar, we pick the image from example i and the caption from example j to obtain a new intermodal pair for human rating. As before, we sample from the intramodal pairs with the highest rated similarity because this includes at least some new pairs with high similarity. We also add human ratings for all existing intermodal pairs and a large sample of co-captions.
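As a rough illustration of this last step, the sketch below takes human-rated caption pairs and proposes new image-caption pairs from the highest-rated ones. The function name, the tuple layout and the ratings are hypothetical and chosen only for the example.

```python
def intermodal_candidates(rated_pairs, k):
    """Pick the k highest-rated intramodal pairs and cross the modalities.

    rated_pairs: list of (i, j, human_rating) for a caption pair from
    examples i and j. Each result (i, j) means "image of example i,
    caption of example j".
    """
    ranked = sorted(rated_pairs, key=lambda p: p[2], reverse=True)
    return [(i, j) for i, j, _ in ranked[:k]]

# Hypothetical ratings on the 0-5 scale.
rated_caption_pairs = [(0, 1, 4.6), (2, 3, 1.2), (1, 4, 3.9)]
new_image_caption_pairs = intermodal_candidates(rated_caption_pairs, k=2)
```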
The following table shows examples of semantic image similarity (SIS) and semantic image-text similarity (SITS) pairs corresponding to each rating, with 5 being the most similar and 0 being completely dissimilar.
|Examples for each human-derived similarity score (left: 5 to 0, 5 being very similar and 0 being completely dissimilar) of image pairs based on SIS (middle) and SITS (right) tasks. Note that these examples are for illustrative purposes and are not themselves in the CxC dataset.|
MS-COCO supports three retrieval tasks:
- Given an image, find its matching captions out of all other captions in the evaluation set.
- Given a caption, find its corresponding image out of all other images in the evaluation set.
- Given a caption, find its other co-captions out of all other captions in the evaluation set.
MS-COCO’s pairs are incomplete because captions created for one image at times apply equally well to another, yet these associations are not captured in the dataset. CxC enhances these existing retrieval tasks with new positive pairs, and it also supports a new image-image retrieval task. With its graded similarity judgements, CxC also makes it possible to measure correlations between model and human rankings. Retrieval metrics in general focus only on positive pairs, while CxC’s correlation scores additionally account for the relative ordering of similarity and include low-scoring items (non-matches). Supporting these evaluations on a common set of images and captions makes them more valuable for understanding inter-modal learning compared to disjoint sets of caption-image, caption-caption, and image-image associations.
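One way to compare a model's similarity scores with graded human ratings is a rank correlation. The sketch below implements plain Spearman correlation with average ranks for ties, which matter here because ratings on a 0 to 5 scale repeat often. It is an illustration only and may differ from the exact correlation procedure used in the paper; the scores are made up.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = mean_rank
        pos = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical human ratings and model similarity scores for four pairs.
human = [0, 1, 3, 5]
model = [0.10, 0.40, 0.35, 0.90]
rho = spearman(human, model)
```

Unlike a retrieval metric, this score drops when the model ranks a low-rated pair above a high-rated one, so non-matches contribute to the evaluation.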
We ran a series of experiments to show the utility of CxC’s ratings. For this, we constructed three dual encoder (DE) models using BERT-base as the text encoder and EfficientNet-B4 as the image encoder:
- A text-text (DE_T2T) model that uses a shared text encoder for both sides.
- An image-text model (DE_I2T) that uses the aforementioned text and image encoders, and includes a layer above the text encoder to match the image encoder output.
- A multitask model (DE_I2T+T2T) trained on a weighted combination of text-text and image-text tasks.
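The multitask setup can be sketched as a weighted sum of two per-task losses. The post does not spell out the loss function, so the in-batch softmax loss below is an assumption (a common choice for dual encoders), and the function names and weights are hypothetical.

```python
import math

def in_batch_softmax_loss(sim):
    """Mean cross-entropy over a batch similarity matrix.

    sim[i][j] is the similarity between left item i and right item j;
    matching pairs sit on the diagonal and every other item in the
    batch acts as a negative.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        log_z = math.log(sum(math.exp(s) for s in sim[i]))
        total += log_z - sim[i][i]
    return total / n

def multitask_loss(sim_i2t, sim_t2t, w_i2t=1.0, w_t2t=1.0):
    """Weighted combination of the image-text and text-text losses."""
    return (w_i2t * in_batch_softmax_loss(sim_i2t)
            + w_t2t * in_batch_softmax_loss(sim_t2t))
```

With this shape, setting `w_t2t` to zero recovers an image-text-only objective, which makes it easy to compare the single-task and multitask settings.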
|CxC retrieval results — a comparison of our text-text (T2T), image-text (I2T) and multitask (I2T+T2T) dual encoder models on all the four retrieval tasks.|
From the results on the retrieval tasks, we can see that DE_I2T+T2T (yellow bar) performs better than DE_I2T (red bar) on the image-text and text-image retrieval tasks. Thus, adding the intramodal (text-text) training task helped improve the intermodal (image-text, text-image) performance. As for the other two intramodal tasks (text-text and image-image), DE_I2T+T2T shows strong, balanced performance on both of them.
|CxC correlation results for the same models shown above.|
For the correlation tasks, DE_I2T performs the best on SIS and DE_I2T+T2T is the best overall. The correlation scores also show that DE_I2T performs well only on images: it has the highest SIS but has much worse STS. Adding the text-text loss to DE_I2T training (DE_I2T+T2T) produces more balanced overall performance.
The CxC dataset provides a much more complete set of relationships between and among images and captions than the raw MS-COCO image-caption pairs. The new ratings have been released and further details are in our paper. We hope to encourage the research community to push the state of the art on the tasks introduced by CxC with better models for jointly learning inter- and intra-modal representations.
The core team includes Daniel Cer, Yinfei Yang and Austin Waters. We thank Julia Hockenmaier for her inputs on CxC’s formulation, the Google Data Compute Team, especially Ashwin Kakarla and Mohd Majeed for their tooling and annotation support, Yuan Zhang, Eugene Ie for their comments on the initial versions of the paper and Daphne Luong for executive support for the data collection.
Google’s BeyondCorp Enterprise recently launched, offering organizations a zero trust solution that enables secure access to applications and cloud resources with integrated threat and data protection. These threat and data protection capabilities are delivered directly through Chrome, so organizations can easily take advantage of our web-based protections.
Due to BeyondCorp Enterprise’s agentless approach utilizing the Chrome browser, these capabilities are extremely easy to adopt and deploy. The solution is delivered as a non-disruptive overlay to your existing architecture, with no need to install additional software, clients, or agents. Threat and data protection features in BeyondCorp Enterprise help prevent web-based threats such as malware, phishing and social engineering. Additionally, because BeyondCorp Enterprise leverages the browser, it works across operating systems, so you can use features such as file scanning, Data Loss Prevention (DLP) rules, and security alerts regardless of whether you operate on Windows, Mac, Linux or Chrome OS.
The administration of those capabilities is directly integrated into Chrome Browser Cloud Management, a no-cost cloud-based solution that provides enhanced visibility, reporting and management of Chrome Browser. Below we’ve covered threat and data capabilities your organization can use with Chrome and BeyondCorp Enterprise and how they work:
Protect Chrome users with BeyondCorp Enterprise threat protection
With BeyondCorp Enterprise enabled through Chrome Browser Cloud Management you can protect against threats such as malware and phishing for your Chrome users as they download and upload files.
Imagine one of your users is downloading a file found on the web to reference for an upcoming presentation. Or maybe they are uploading a file to a sharing site that they have never used before. In each of these scenarios, BeyondCorp Enterprise provides three layers of protection:
First, BeyondCorp Enterprise uses real-time URL checking against Google Safe Browsing to determine whether the site is malicious or a phishing site.
If the site is deemed to be unsafe, you can configure the upload/download to be blocked or to log the activity.
If the site is deemed safe, the verification continues by examining the file’s metadata.
The file’s binary strings, hashes, certificates and file signature are analyzed for the presence of malware by Google Cloud.
If the file is verified to be safe by Google Cloud based on the metadata, the user can proceed. If the file fails the verification, additional actions can take place where the file can be blocked or sent securely to advanced sandboxes in Google Cloud to execute the file and determine its authenticity. During this process, the file can be delayed until checks are completed or released right away with the verification occurring in the background. These actions are determined by the administrator and can be configured accordingly.
After all these checks, if the file is still found to be safe, the file can be successfully downloaded or uploaded by the user. If not, the download/upload is blocked to protect your user and your internal site.
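The layered flow above can be summarized as a small decision function. This is a conceptual sketch, not a BeyondCorp Enterprise API: the function, its parameters and the policy flags are all hypothetical, and it assumes that a log-only policy for unsafe sites lets the transfer continue to the next check.

```python
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"

def check_transfer(site_is_safe, metadata_is_clean, sandbox_is_clean,
                   block_unsafe_sites=True, sandbox_on_failure=True):
    """Return (verdict, log) for one upload or download.

    The three boolean inputs stand in for the results of the checks; the
    two keyword flags stand in for administrator policy.
    """
    log = []
    # Layer 1: real-time URL check.
    if not site_is_safe:
        log.append("unsafe site")
        if block_unsafe_sites:
            return Verdict.BLOCK, log
    # Layer 2: file metadata (binary strings, hashes, certificates, signature).
    if metadata_is_clean:
        return Verdict.ALLOW, log
    log.append("metadata check failed")
    if not sandbox_on_failure:
        return Verdict.BLOCK, log
    # Layer 3: execute the file in a sandbox.
    if sandbox_is_clean:
        return Verdict.ALLOW, log
    log.append("sandbox check failed")
    return Verdict.BLOCK, log
```

For example, `check_transfer(True, False, True)` allows the file after the sandbox clears it, while still recording that the metadata check failed.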
Protect your data with BeyondCorp Enterprise data protection in Chrome
This capability prevents sensitive data from being uploaded, downloaded or pasted from a user’s clipboard into a web form. Here is a workflow demonstrating this:
Using BeyondCorp Threat and Data Protection, you can integrate Data Loss Prevention (DLP) features to use with Chrome to implement sensitive data detection for files that are uploaded and downloaded, and for content that is pasted or dragged and dropped.
Data protection features work by creating rules that trigger actions to happen. These actions include blocking data from being uploaded/downloaded/pasted and/or logging activity details.
This capability provides 90+ different preconfigured content detectors to trigger actions based on certain types of data, but you can also define your own custom detectors. If you are a Google Workspace customer, you may be familiar with this data protection engine as it is used for data protection in Gmail and Google Drive.
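To make the rule model concrete, here is a minimal sketch of detectors that trigger actions. It does not reflect the product's rule format: the `Rule` class, the two regular expressions (a US Social Security number-like pattern and a made-up project codename) and the action names are assumptions for illustration.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    pattern: re.Pattern
    action: str  # "block" or "log"

# Hypothetical detectors: one standing in for a preconfigured detector,
# one for a custom detector.
RULES = [
    Rule("ssn_like", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "block"),
    Rule("project_codename", re.compile(r"\bPROJECT-[A-Z]{4}\b"), "log"),
]

def evaluate(content, rules=RULES):
    """Return (blocked, names of triggered rules) for pasted or uploaded text."""
    triggered = [r for r in rules if r.pattern.search(content)]
    blocked = any(r.action == "block" for r in triggered)
    return blocked, [r.name for r in triggered]
```

Here `evaluate("My number is 123-45-6789")` returns `(True, ["ssn_like"])`, while text that only mentions a codename is logged but not blocked.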
Give it a try
BeyondCorp Enterprise was built to provide an easy to use experience for both end users and administrators. All of the settings in Chrome Browser Cloud Management are configurable to provide the user experience that you desire, and for your analysts, log reports can be easily accessed and viewed within the Security Dashboard in the Google Admin Console.
With BeyondCorp Enterprise and these Chrome features, you can improve your security posture and provide a seamless experience for your workforce.
Looking to learn more about BeyondCorp Enterprise and Chrome? Tune into Google Cloud Security Talks on May 12, 2021, or watch on-demand.
And for step-by-step instructions on how to set up BeyondCorp Enterprise in Chrome Browser Cloud Management, check out this demo video. For additional information on the BeyondCorp Enterprise threat and data protection features available in Chrome, view this video.
Where should your application store data? Of course, the choice depends on the use case. This post covers the different storage options available within Google Cloud across three storage types: object storage, block storage, and file storage. It also covers the use cases that are best suited for each storage option.
Object storage – Cloud Storage
Cloud Storage is an object store for binary and object data, blobs, and unstructured data. You would typically use it for any app, any type of data that you need to store, for any duration. You can add data to it or retrieve data from it as often as you need. The objects stored have an ID, metadata, attributes, and the actual data. The metadata could include all sorts of things about security classification of the file, the applications that can access it, and similar information.
Object store use cases include applications that need data to be highly available and highly durable, such as streaming videos, serving images and documents, and websites. It is also used for storing large amounts of data for use cases such as genomics and data analytics. You can also use it for storing backups and archives for compliance with regulatory requirements. Or, use it to replace old physical tape records and move them over to cloud storage. It is also widely used for disaster recovery because it takes practically no time to switch to a backup bucket to recover from a disaster.
There are 4 storage classes that are based on budget, availability and access frequency.
1. Standard buckets for high-performance, frequent access and highest availability:
– Regional / dual-regional locations for data accessed frequently / high throughput needs
– Multi-region for serving content globally
2. Nearline for data accessed less than once a month
3. Coldline for data accessed roughly less than once a quarter
4. Archive for data that you want to put away for years
It costs a bit more to use standard storage because it allows for automatic redundancy and frequent access options. Nearline, coldline and archive storage offer 99% availability and cost significantly less.
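The access-frequency guidance above can be turned into a simple rule of thumb. The function below is a hypothetical helper, not part of any Google Cloud library, and the day thresholds are assumptions read off the list ("once a month" as 30 days, "once a quarter" as 90 days, and a year or more for Archive). Cost and availability needs would also factor into a real decision.

```python
def suggest_storage_class(days_between_accesses):
    """Suggest a Cloud Storage class from the typical gap between reads."""
    if days_between_accesses < 30:
        return "STANDARD"   # accessed frequently
    if days_between_accesses < 90:
        return "NEARLINE"   # accessed less than once a month
    if days_between_accesses < 365:
        return "COLDLINE"   # accessed less than once a quarter
    return "ARCHIVE"        # put away for years
```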
Block storage – Persistent Disk and Local SSD
Persistent Disk and Local SSD are block storage options. They are integrated with Compute Engine virtual machines and Kubernetes Engine. With block storage, files are split into evenly sized blocks of data, each with its own address but with no additional information (metadata) to provide more context for what that block of data is. Block storage can be directly accessed by the operating system as a mounted drive volume.
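The idea of evenly sized, addressed blocks with no metadata can be shown in a few lines. This is a toy model rather than how Persistent Disk is implemented: the helper names are hypothetical and the 4-byte block size is chosen only to keep the example small.

```python
def split_into_blocks(data: bytes, block_size: int) -> dict[int, bytes]:
    """Split data into fixed-size blocks keyed by block address.

    The last block is zero-padded so every block has the same size.
    """
    blocks = {}
    for address, offset in enumerate(range(0, len(data), block_size)):
        chunk = data[offset:offset + block_size]
        blocks[address] = chunk.ljust(block_size, b"\x00")
    return blocks

def read_back(blocks: dict[int, bytes], length: int) -> bytes:
    """Reassemble the original bytes; the caller must know the length."""
    return b"".join(blocks[a] for a in sorted(blocks))[:length]

payload = b"hello world"
blocks = split_into_blocks(payload, block_size=4)
```

Note that the blocks themselves carry no file name or length, so `read_back` needs the original length from elsewhere. That bookkeeping is the job of the file system the operating system places on top of the volume.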
Persistent Disk is a block store for VMs that offers a range of latency and performance options. I have covered persistent disk in detail in this article. The use cases of Persistent Disk include disks for VMs and shared read-only data across multiple VMs. It is also used for rapid, durable backups of running VMs. Because of the high-performance options available, Persistent Disk is also a good storage option for databases.
Local SSD is also block storage but it is ephemeral in nature, and therefore typically used for stateless workloads that require the lowest available latencies. The use cases include flash optimized databases, host caching layers for analytics, or scratch disks for any application, as well as scale out analytics and media rendering.
File storage – Filestore
Now, Filestore! As fully managed Network Attached Storage (NAS), Filestore provides a cloud-based shared file system for unstructured data. It offers really low latency and provides concurrent access to tens of thousands of clients with scalable and predictable performance up to hundreds of thousands of IOPS, tens of GB/s of throughput, and hundreds of TBs. You can scale capacity up and down on-demand. Typical use cases of Filestore include high performance computing (HPC), media processing, electronics design automation (EDA), application migrations, web content management, life science data analytics, and more!
That was a quick overview of different storage options in Google Cloud. For a more in-depth look into each of these storage options check out this cloud storage options page or this video.
The pandemic continues to deeply affect our lives around the globe. In some places, new cases are surging and returning to work is the last thing on people’s minds. In other areas, conditions are improving and companies are starting to think about transitioning their workforce back to the office.
Exactly when and how to do this remains complex and varies by country, industry, and company. What’s certain is that hybrid work will become an essential part of the business world moving forward. And finding solutions that bridge the gap between “in-person” and “somewhere else” is crucial.
At Google, we’ve been focused on what the hybrid transformation means for us. Modern solutions are a key enabler as we prepare for hybrid work.
Chrome OS: Supporting Google’s return to office strategy
At Google, we’ll move to a hybrid work week where most Googlers spend approximately three days in the office and two days wherever they work best. It’s no surprise that as the modern, secure, cloud-first platform, Chrome OS is playing a key role in our transition to a hybrid work model. Because Chrome OS devices can easily be shared, more flexible working models and spaces are now possible. And with user profiles stored in the cloud and collaboration solutions like Google Workspace, employees can log in to any Chrome OS device, access what they need and pick up where they left off.
Here are just a few ways we are using Chrome OS and its tools to support the return to the office:
In select office locations, Googlers can reserve desks through an internal booking tool set up with a high-performance Chromebox, keyboard, mouse, and monitor. Employees can log in to the Chromebox which syncs their cloud profile, and start working with the same environment they have on all their Chrome OS devices.
We’re announcing new docking stations that are designed for Chrome OS devices and allow employees to bring in their Chromebook from home, connect to the dock with one USB-C cable, and use a monitor, keyboard, and mouse for a full desktop experience.
Every Chrome OS device enables a zero-trust security working model with BeyondCorp Enterprise providing our workforce with simple and secure access to applications while providing additional security controls for IT.
We’ve deployed the new Chrome OS Readiness Tool to our extended workforce to identify employees that are able to switch to Chrome OS. This allows us to expand the latest security, deployment, and manageability benefits to more of the workforce.
Additional ways Chrome OS is helping organizations with return to office
We aren’t the only ones supporting our return to office and hybrid work strategy with Chrome OS. We’ve heard more ways our customers are using Chrome OS to make the transition as smooth as possible. These include:
Streamlining deployment and management of Chrome OS devices using zero-touch enrollment which allows devices to automatically enroll into a corporate domain without IT configuration.
Grab & Go Chromebooks being used for frontline and hybrid information workers, allowing employees to grab a Chromebook from a cart and get to work right away.
Parallels Desktop for Chrome OS being deployed to allow employees to access Windows or legacy apps locally on their Chrome OS devices.
Existing Windows and Mac devices being modernized and repurposed to run a Chrome OS experience using CloudReady. (Google is currently offering a CloudReady promotion. Learn more here.)
While the Chrome OS team has been working towards making remote working as seamless as possible for IT, we’ve also made advancements in supporting traditional technology that’s required in the office.
In October, we announced Chrome Enterprise Recommended: a collection of identity, printing, productivity, communications, and virtualization solutions that are verified to run great on Chrome OS.
With the increased usage of video conferencing on Chrome OS devices, we’ve made improvements to Google Meet and Zoom performance including camera and video improvements to reduce any unnecessary processing and features that intelligently adapt to your device, your network, and what you are working on.
For improved access to Windows and legacy apps, VMware Horizon introduced multi-monitor support and USB redirection and Citrix Workspace released a tech preview with webcam enhancements and Microsoft Teams optimizations.
We integrated with the Okta Workflows platform, so IT administrators can include Chrome OS in their access logic and deploy quickly without code. Recently, we’ve added the ability to require users to reauthenticate on their Chrome OS device once Okta has detected that the user has changed their password. Try out this new capability by signing up for our Trusted Tester program.
We’ve increased our support for direct IP printing with additional printer models since the beginning of 2020. In addition, we have made improvements to management features, including support for multiple print servers and launched policy APIs to provide a better IT admin experience.
Join us for a digital event with Modern Computing Alliance and its newest member HP
Last year we announced the launch of the Modern Computing Alliance, a collaboration of industry leaders including Box, Chrome Enterprise, Citrix, Dell, Google Workspace, Imprivata, Intel, Okta, RingCentral, Slack, VMware, and Zoom that aims to create the pioneering solutions that businesses need. We are proud to introduce our newest member, HP, who will bring their hardware expertise to the Modern Computing Alliance and work closely with the alliance to ensure true silicon-to-cloud innovation.
As an alliance, we’ve been deeply engaged in the hybrid work shift. We invite you to join us for a digital event where we discuss questions about returning to work, such as how product design can encourage participation and collaboration regardless of where employees are, which security issues to keep in mind, and what factors businesses should consider to support a safe working environment.
Home. Heading back. Hybrid.
Hear from the experts on hybrid work and return to office.
Date: May 20th, 2021
If you are ready to try Chrome OS today, it’s easy to get started. You can contact us to get connected to a partner, sign up for a free 30-day trial of Chrome Enterprise Upgrade to start managing Chrome OS devices, or deploy the Chrome OS Readiness Tool to identify employees that are ready to switch to Chrome OS.