FermiNet: Quantum Physics and Chemistry from First Principles

Weve developed a new neural network architecture, the Fermionic Neural Network or FermiNet, which is well-suited to modeling the quantum state of large collections of electrons, the fundamental building blocks of chemical bonds.Read More

How CEVA uses TensorFlow Lite for Always-On Speech Recognition on the Edge

A guest article by Ido Gus of CEVA

CEVA is a leading licensor of wireless connectivity and smart sensing technologies. Our products help OEMs design power-efficient, intelligent and connected devices for a range of end markets, including mobile, consumer, automotive, robotics, industrial and IoT.

In this article, we’ll describe how we used TensorFlow Lite for Microcontrollers (TFLM) to deploy a speech recognition engine and frontend, called WhisPro, on a bare-metal development board based on our CEVA-BX DSP core. WhisPro detects always-on wake words and speech commands efficiently, on-device.

Figure 1 CEVA Multi-microphone DSP Development Board

About WhisPro

WhisPro is a speech recognition engine and frontend targeted to run on low power, resource constrained edge devices. It is designed to handle the entire data flow from processing audio samples to detection.

WhisPro supports two use cases for edge devices:

  • Always-on wake word detection engine. In this use case, WhisPro’s role is to wake a device in sleep mode when a predefined phrase is detected.
  • Speech commands. In this use case, WhisPro’s role is to enable a voice-based interface. Users can control the device using their voice. Typical commands can be: volume up, volume down, play, stop, etc.

WhisPro enables voice interface on any SoC that has a CEVA BX DSP core integrated into it, lowering entry barriers to OEMs and ODM interested in joining the voice interface revolution.

Our Motivation

Originally, WhisPro was implemented using an in-house neural network library called CEVA NN Lib. Although that implementation achieved excellent performance, the development process was quite involved. We realized that, if we ported the TFLM runtime library and optimized it for our target hardware, the entire model porting process would become transparent and more reliable (far fewer lines of code would need to be written, modified, and maintained).

Building TFLM for CEVA-BX DSP Family

The first thing we had to do is to figure out how to port TFLM to our own platform. We found that following this porting to a new platform guide to be quite useful.
Following the guide, we:

  • Verified DebugLog() implementation is supported by our platform.
  • Created a TFLM runtime library project in CEVA’s Eclipse-based IDE:
    • Created a new CEVA-BX project in CEVA’s IDE
    • Added all the required source files to the project
  • Built the TFLM runtime library for the CEVA-BX core.
    This required the usual fiddling with compiler flags, including paths (not all required files are under the “micro” directory), linker script, and so on.

Model Porting Process

Our starting point is a Keras implementation of our model. Let’s look at the steps we took to deploy our model on our bare-metal target hardware:

Converted theTensorFlow model to TensorFlow Lite using the TF built-in converter:

$ python3 -m tensorflow_docs.tools.nbfmt [options] notebook.ipynb

converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.experimental_new_converter = True
tflite_model = converter.convert()
open("converted_to_tflite_model.tflite", "wb").write(tflite_model)

Used quantization:

$ python3 -m tensorflow_docs.tools.nbfmt [options] notebook.ipynb

converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_data_gen

Converted the TensorFlow Lite model to TFLM using xxd:

$ python3 -m tensorflow_docs.tools.nbfmt [options] notebook.ipynb

$> xxd –I model.tflite > model.cc

Here we found that some of the model layers (for example, GRU) were not properly supported (at the time) by TFLM. It is very reasonable to assume that, as TFLM continues to mature and Google and the TFLM community invest more in it, issues like this will become rarer.
In our case, though, we opted to re-implement the GRU layers in terms of Fully Connected layers, which was surprisingly easy.


The next step was to integrate the TFLM runtime library and the converted model into our existing embedded C frontend, which handles audio preprocessing and feature extraction.

Even though our frontend was not written with TFLM in mind, it was modular enough to allow easy integration by implementation of a single simple wrapper function, as follows:

  1. Linked the TFLM runtime library into our embedded C application (WhisPro frontend)
  2. Implemented a wrapper-over-setup function for mapping the model into a usable data structure, allocating the interpreter and tensors
  3. Implemented a wrapper-over-execute function for mapping data passed from the WhisPro frontend into tflite tensors used by the actual execute function
  4. Replaced the call to the original model execute function with a call to the TFLM implementation

Process Visualization

The process we described is performed by two components:

  • The microcontroller supplier, in this case, CEVA – is responsible for optimizing TFLM for its hardware architecture.
  • The microcontroller user, in this case, CEVA WhisPro developer – is responsible for deploying a neural network based model, using an optimized TFLM runtime library, on the target microcontroller.

What’s Next

This work has proven the importance of the TFLM platform to us, and the significant value supporting TFLM can add to our customers and partners by enabling easy neural network model deployment on edge devices. We are committed to further support TFLM on the CEVA-BX DSP family by:

  • Active contribution to the TFLM project, with the goal of improving layer coverage and overall platform maturity.
  • Investing in TFLM operator optimization for execution on CEVA-BX cores, aiming for full coverage.

Final Thoughts

While the porting process had some bumps along the way, at the end it was a great success, and took about 4-5 days’ worth of work. Implementing a model in C from scratch, and handcrafting model conversion scripts from Python to C, could take 2-3 weeks (and lots of debugging).

CEVA Technology Virtual Seminar

To learn more, you are welcome to watch CEVA’s virtual seminar – Wireless Audio session, covering TFLM, amongst other topics.

Read More

Project Euphonia’s new step: 1,000 hours of speech recordings

Muratcan Cicek, a PhD candidate at UC Santa Cruz, worked as a summer intern on Google’s Project Euphonia, which aims to improve computers’ abilities to understand impaired speech. This work was especially relevant and important for Muratcan, who was born with cerebral palsy and has a severe speech impairment.

Before his internship, Muratcan recorded 2,000 phrases for Project Euphonia. These phrases, expressions like “Turn the lights on” and “Turn up thermostat to 74 degrees,” were used to build a personalized speech recognition model that could better recognize the unique sound of his voice and transcribe his speech. The prototype allowed Muratcan to share the transcription in a video call so others could better understand him. He used the prototype to converse with co-workers, give status updates during team meetings and connect with people in ways that were previously impossible. Muratcan says, “Euphonia transformed my communication skills in a way that I can leverage in my career as an engineer without feeling insecure about my condition.”

Muratcan, a Google intern

Muratcan, a summer research intern on the Euphonia team, uses the Euphonia prototype app

1,000 hours of speech samples

The phrases that Muratcan recorded were key to training custom machine learning models that could help him be more easily understood. To help other people that have impaired speech caused by ALS, Parkinson’s disease or Down Syndrome, we need to gather samples of their speech patterns. So we’ve worked with partners like CDSS, ALS TDI, ALSA, LSVT Global, Team Gleason and CureDuchenne to encourage people with speech impairments to record their voices and contribute to this research.

Since 2018, nearly 1,000 participants have recorded over 1,000 hours of speech samples. For many, it’s been a source of pride and purpose to shape the future of speech recognition, not only for themselves but also for others who struggle to be understood.

I contribute to this research so that I can help not only myself, but also a larger group of people with communication challenges that are often left out. Project Euphonia participant

While the technology is still under development, the speech samples we’ve collected helped us create personalized speech recognition models for individuals with speech impairments, like Muratcan. For more technical details about how these models work, see the Euphonia and Parrotron blog posts. We’re evaluating these personalized models with a group of early testers. The next phase of our research aims to improve speech recognition systems for many more people, but it requires many more speech samples from a broad range of speakers.

How you can contribute

To continue our research, we hope to collect speech samples from an additional 5,000 participants. If you have difficulty being understood by others and want to contribute to meaningful research to improve speech recognition technologies, learn more and consider signing up to record phrases. We look forward to hearing from more participants and experts— and together, helping everyone be understood.

Read More

Measuring Gendered Correlations in Pre-trained NLP Models

Posted by Kellie Webster, Software Engineer, Google Research

Natural language processing (NLP) has seen significant progress over the past several years, with pre-trained models like BERT, ALBERT, ELECTRA, and XLNet achieving remarkable accuracy across a variety of tasks. In pre-training, representations are learned from a large text corpus, e.g., Wikipedia, by repeatedly masking out words and trying to predict them (this is called masked language modeling). The resulting representations encode rich information about language and correlations between concepts, such as surgeons and scalpels. There is then a second training stage, fine-tuning, in which the model uses task-specific training data to learn how to use the general pre-trained representations to do a concrete task, like classification. Given the broad adoption of these representations in many NLP tasks, it is crucial to understand the information encoded in them and how any learned correlations affect performance downstream, to ensure the application of these models aligns with our AI Principles.

In “Measuring and Reducing Gendered Correlations in Pre-trained Models” we perform a case study on BERT and its low-memory counterpart ALBERT, looking at correlations related to gender, and formulate a series of best practices for using pre-trained language models. We present experimental results over public model checkpoints and an academic task dataset to illustrate how the best practices apply, providing a foundation for exploring settings beyond the scope of this case study. We will soon release a series of checkpoints, Zari1, which reduce gendered correlations while maintaining state-of-the-art accuracy on standard NLP task metrics.

Measuring Correlations
To understand how correlations in pre-trained representations can affect downstream task performance, we apply a diverse set of evaluation metrics for studying the representation of gender. Here, we’ll discuss results from one of these tests, based on coreference resolution, which is the capability that allows models to understand the correct antecedent to a given pronoun in a sentence. For example, in the sentence that follows, the model should recognize his refers to the nurse, and not to the patient.

The standard academic formulation of the task is the OntoNotes test (Hovy et al., 2006), and we measure how accurate a model is at coreference resolution in a general setting using an F1 score over this data (as in Tenney et al. 2019). Since OntoNotes represents only one data distribution, we also consider the WinoGender benchmark that provides additional, balanced data designed to identify when model associations between gender and profession incorrectly influence coreference resolution. High values of the WinoGender metric (close to one) indicate a model is basing decisions on normative associations between gender and profession (e.g., associating nurse with the female gender and not male). When model decisions have no consistent association between gender and profession, the score is zero, which suggests that decisions are based on some other information, such as sentence structure or semantics.

BERT and ALBERT metrics on OntoNotes (accuracy) and WinoGender (gendered correlations). Low values on the WinoGender metric indicate that a model does not preferentially use gendered correlations in reasoning.

In this study, we see that neither the (Large) BERT or ALBERT public model achieves zero score on the WinoGender examples, despite achieving impressive accuracy on OntoNotes (close to 100%). At least some of this is due to models preferentially using gendered correlations in reasoning. This isn’t completely surprising: there are a range of cues available to understand text and it is possible for a general model to pick up on any or all of these. However, there is reason for caution, as it is undesirable for a model to make predictions primarily based on gendered correlations learned as priors rather than the evidence available in the input.

Best Practices
Given that it is possible for unintended correlations in pre-trained model representations to affect downstream task reasoning, we now ask: what can one do to mitigate any risk this poses when developing new NLP models?

  • It is important to measure for unintended correlations: Model quality may be assessed using accuracy metrics, but these only measure one dimension of performance, especially if the test data is drawn from the same distribution as the training data. For example, the BERT and ALBERT checkpoints have accuracy within 1% of each other, but differ by 26% (relative) in the degree to which they use gendered correlations for coreference resolution. This difference might be important for some tasks; selecting a model with low WinoGender score could be desirable in an application featuring texts about people in professions that may not conform to historical social norms, e.g., male nurses.
  • Be careful even when making seemingly innocuous configuration changes: Neural network model training is controlled by many hyperparameters that are usually selected to maximize some training objective. While configuration choices often seem innocuous, we find they can cause significant changes for gendered correlations, both for better and for worse. For example, dropout regularization is used to reduce overfitting by large models. When we increase the dropout rate used for pre-training BERT and ALBERT, we see a significant reduction in gendered correlations even after fine-tuning. This is promising since a simple configuration change allows us to train models with reduced risk of harm, but it also shows that we should be mindful and evaluate carefully when making any change in model configuration.
    Impact of increasing dropout regularization in BERT and ALBERT.
  • There are opportunities for general mitigations: A further corollary from the perhaps unexpected impact of dropout on gendered correlations is that it opens the possibility to use general-purpose methods for reducing unintended correlations: by increasing dropout in our study, we improve how the models reason about WinoGender examples without manually specifying anything about the task or changing the fine-tuning stage at all. Unfortunately, OntoNotes accuracy does start to decline as the dropout rate increases (which we can see in the BERT results), but we are excited about the potential to mitigate this in pre-training, where changes can lead to model improvements without the need for task-specific updates. We explore counterfactual data augmentation as another mitigation strategy with different tradeoffs in our paper.

What’s Next
We believe these best practices provide a starting point for developing robust NLP systems that perform well across the broadest possible range of linguistic settings and applications. Of course these techniques on their own are not sufficient to capture and remove all potential issues. Any model deployed in a real-world setting should undergo rigorous testing that considers the many ways it will be used, and implement safeguards to ensure alignment with ethical norms, such as Google’s AI Principles. We look forward to developments in evaluation frameworks and data that are more expansive and inclusive to cover the many uses of language models and the breadth of people they aim to serve.

This is joint work with Xuezhi Wang, Ian Tenney, Ellie Pavlick, Alex Beutel, Jilin Chen, Emily Pitler, and Slav Petrov. We benefited greatly throughout the project from discussions with Fernando Pereira, Ed Chi, Dipanjan Das, Vera Axelrod, Jacob Eisenstein, Tulsee Doshi, and James Wexler.

1 Zari is an Afghan Muppet designed to show that ‘a little girl could do as much as everybody else’.

Read More

How NetEase Yanxuan uses TensorFlow for customer service chat bots

Posted by Liu Huiyun, a senior algorithm engineer at NetEase

With the development of natural language processing (NLP) technology, Intelligent customer service has become an important use case in the e-commerce field. In recent years, this use case has received more and more attention. This is because, in the purchasing process, users need to be transferred to a customer services system for consultation and support if they encounter any problems or have questions. If the customer service system is able to provide accurate and effective responses, this will directly improve the user experience and have a positive impact on purchase conversion. For example:

  • In pre-sales scenarios, users may ask for more detailed information about the products or promotional activities that they are interested in.
  • In post-sales scenarios, users often have questions about returning and exchanging products, shipping fees, and logistics issues.

During actual business operations, NetEase Yanxuan, a large eCommerce platform in China, produces and accumulates large volumes of information, such as product attributes, activity operations, aftersales policies. In the meantime, the corresponding business logic is complicated. Intelligent customer service is an intelligent dialog system that leverages this information to automatically answer user questions or help human customer service representatives do so.

However, the e-commerce field involves many detailed and complicated business aspects, and users may ask their questions in many different ways and in a colloquial manner. These features require Intelligent customer service systems to possess strong semantic understanding. To this end, we have combined general customer scenarios with Yanxuan’s businesses and designed a deep learning based system. Check Yanxuan Intelligent customer service Framework full picture

in Figure 1 and Figure 2.

  • As a user inputs a question, the input text and its contextual information are first sent to the intent recognition (IR) module.
  • The intent recognition module analyzes the user’s multi-layered intents and then distributes them to different sub-modules.
  • The sub-modules are responsible for more targeted business Q&A, and different sub-modules apply different technical solutions.

As you can see, deep learning algorithms are applied to different modules in the framework. Because of the advanced NLP algorithms, we can extract more general and multi-granular semantic information from the user’s utterance.

Figure 3 shows the Xiaoxuan bot answering questions in a real dialog scenario. Next, I will introduce the different sub-modules that apply deep learning technology.

Xiaoxuan bot answering questions
Figure 3. Online Conversation Example

Intent Recognition Module — Multilayer Classification Model

As the user inputs text, we use a multilayer classification intent recognition model built with TensorFlow to analyze the input text, its context, and the historical behavior of the user. We divide first-level intents into four main categories: pre-sales product questions, aftersales questions, casual chatting, and the rest. When users ask common policy-related a ftersales questions, the input is summarized into more detailed sub-level intents. Click here (Figure 4) to check the structure of the intent recognition process.

In essence, intent recognition can be viewed as a classification problem. When building a classification system, we use the Attention+BiLSTM (ABL) model structure as a preliminary baseline. Except for the raw input text, we further design more features fed to the deep model, such as n-grams and positional encoding in the Transformer model. Ultimately, more manually crafted features improves the model accuracy by three percentage points. In addition, we also use a fine-tuned BERT model to train a classification model with less labeled data, and it performs as good as an ABL model. Pretrained models have better generalization, and can learn more semantic information based on fewer labeled data. However, this approach requires more computing resources.

FAQ Module — Text Matching Model

Answering FAQs is a key function of Intelligent customer service systems. This module is composed of two components, recall and re-rank.

  • The recall stage adopts discrete searches at the word granularity as well as semantic searches based on dense sentence vectors.
  • The re-rank stage uses a text matching model built with TensorFlow to re-rank the recalled candidature Q&A pairs.
  • Then, filter by the final mixed strategies, the module returns the final answer.

In the automatic Q&A field, text matching algorithms are commonly applied to sentence similarity task and natural language inference task. From the most basic Siamese-LSTM networks, the structure of matching modules has evolved through InferNet, Decomposable Attention, ESIM, and finally to BERT models. Generally speaking, matching algorithms can be categorized into two kinds, one is representation-based and the other is interaction-based. Representation methods are focused on the encoding of single sentences, regardless of the interactive semantics between sentences which is used in interaction methods.

At the service layer, we adopt a variety of question matching solutions:

  1. Perform association matching between input question Q and answer A.
  2. Perform similarity matching between input question Q₁ and standard question Q₂.
  3. Perform similar question matching between input question Q and standard question Qs.

These three methods perform question relevance recall and Q&A association matching in different ways. In the match and rank stages, we can use flexible weighted discrimination.

We built a Siamese-LSTM model to use as our baseline model and then implemented the following model iteration solutions:

  • We converted the LSTM units into the encoders of the Transformer model and replaced the cosine distance characterization module with the sentence-pair vector feature: to connect to the MLP layer.
  • We integrated an ESIM model with ELMo features.
  • We fine tuned the BERT model.

Tests showed that these optimizations improved these models. For example, the encoders of the Transformer model showed better accuracy in tasks (1) and (3), increasing performance by nearly 5 percentage points.

In addition, we found that, without any additional feature construction or techniques, BERT could provide stable and outstanding matching performance. This is because, in the pretraining stage, BERT aims to predict whether a contextual relationship exists between two sentences, so it can learn the relationships between sentences. In addition, the self-attention mechanism is adept at capturing deep semantics and can obtain fine-grained matching results for a word in sentence A and any word in sentence B. This is crucial for text matching tasks.

KBQA Module — NER Module

In the product knowledge-base Q&A (KBQA) and shopping guide modules, we built a named-entity recognition (NER) model for the e-commerce field based on TensorFlow. The model can recognize product names, product attribute names, product attribute values, and other key product information in the questions asked by users, as shown in Figure 5. Then, entity names are sent to downstream modules, where Q&A knowledge graph techniques are used to generate a final answer.

Figure 5. E-commerce NER Example

Generally, NER algorithm models use a bidirectional LSTM with a Conditional Random Field (CRF) layer. The former captures the before and after features, understands the context, and fully extracts contextual information. The latter focuses on the probabilistic transfer constructed from the local and global features of the current dialogue text, effectively mining the semantic information of the text. Yanxuan uses a BiLSTM-CRF model as a word-granularity baseline model, which serves the Intelligent customer service system. In later experiments, we tested feature extraction and fine-tuned BERT models.

In bert-based model optimization, we tried to use bert to extract sentence vector features and incorporate them into bilstm and crf, as well as two methods of bert-based fine-tuning: the last layer of embedding prediction, and the embedding method of weighted hidden layers. On the test set, the feature fusion performed best, with F1 as high as 0.92, followed by the multi-hidden layer fusion method (0.90), and finally the single high-layer method (0.88). In terms of the time efficiency of online inference, feature fusion takes about 100ms, and fine-tuning the model takes about 10ms.

The performance results using Yanxuan’s dataset are shown in Table 1. These results tell us the following:

  • Feature extraction provides better performance than fine tuning. In addition to using BiLSTM for semantic and structure information extraction, by introducing BERT features into a feature extraction model, we obtain a wider range of semantic and structural representations. The performance boost obtained by adding additional parameters, as in feature extraction, is significantly higher than that of normal fine tuning.
  • Multilayer feature fusion provides better performance than high-level features. This is because, for sequence tagging tasks, we need to consider both the semantic representation and the fusion of other granular representations of the sentence, such as syntactic structure information.
  • In terms of response time, feature extraction, which adds additional parameters, is well-suited to offline systems, but cannot meet the needs of online systems. Fine-tuned models, however, can meet the timeliness requirements of online systems.

Casual Chat Module — Generative Model

A standalone customer service bot must be able to answer difficult questions from users. At the same time, it must also have the ability to chat casually so as to demonstrate both its humanity and intelligence.

To give our bot this capability, we built a casual chat module capable of handling routine chatting. This module includes two key models: retrieval-based QA and generative QA.

  • The retrieval-based QA model first recalls answers from a prepared corpus and then uses a text matching model to re-rank the answers.
  • The generative QA model uses the Transformer generative model trained using TensorFlow’s tensor2tensor to generate responses in an end-to-end (E2E) manner.

However, a purely E2E approach to response generation is difficult to control. Therefore, we decided to fuse the two models in our online system to ensure more reliable responses.

Model Deployment

Figure 6 shows an online service flow based on the BERT model. Thanks to the open-source TensorFlow versions of language models such as BERT, only a small number of labeled samples need to be used to build various text models that feature high accuracy. Then, we can use GPUs to accelerate computation in order to meet the QPS requirements of online services. Finally, we can quickly deploy and launch the model based on TensorFlow Serving (TFS). Therefore, it is the support provided by TensorFlow that allows us to deploy and iterate online services in a stable and efficient manner.

Figure 6. BERT-based Online Service Flow


As deep learning technology continues to develop, new models will make new breakthroughs in the NLP field. By continuing to apply academic advances in the industry, we can achieve outstanding business results. However, this would not be possible without the work of TensorFlow. In Yanxuan’s business scenarios, TensorFlow provides flexible and refined APIs that enables engineers to deal with agile development and test new models, greatly facilitating algorithm model iteration.

Read More

Neural Structured Learning in TFX

Posted by Arjun Gopalan, Software Engineer, Google Research

Edited by Robert Crowe, TensorFlow Developer Advocate, Google Research


Neural Structured Learning (NSL) is a framework in TensorFlow that can be used to train neural networks with structured signals. It handles structured input in two ways: (i) as an explicit graph, or (ii) as an implicit graph where neighbors are dynamically generated during model training. NSL with an explicit graph is typically used for Neural Graph Learning while NSL with an implicit graph is typically used for Adversarial Learning. Both of these techniques are implemented as a form of regularization in the NSL framework. As a result, they only affect the training workflow and so, the model serving workflow remains unchanged. In the rest of this post, we will mostly focus on how graph regularization can be implemented using the NSL framework in TFX.

The high-level workflow for building a graph-regularized model using NSL entails the following steps:

  1. Build a graph, if one is not available.
  2. Use the graph and the input example features to augment the training data.
  3. Use the augmented training data to apply graph regularization to a given model.

These steps don’t immediately map onto existing TFX pipeline components. However, TFX supports custom components which allow users to implement custom processing within their TFX pipelines. See this blog post for an introduction to custom components in TFX. So, to create a graph-regularized model in TFX incorporating the above steps, we will make use of additional custom TFX components.

To illustrate an example TFX pipeline with NSL, let’s consider the task of sentiment classification on the IMDB dataset. A colab-based tutorial demonstrating the use of NSL for this task with native TensorFlow is available here, which we will use as the basis for our TFX pipeline example.

Graph Regularization With Custom TFX Components

To build a graph-regularized NSL model in TFX for this task, we will define three custom components using the custom Python functions approach. Here is a TFX pipeline schematic for our example using these custom components. For brevity, we have skipped components that typically come after the Trainer component like the Evaluator, Pusher, etc.

example chart

Figure 1: Example TFX pipeline for text classification using graph regularization

In this figure, only the custom components (in pink) and the Graph-regularized Trainer component have NSL-related logic. It’s worth noting that the custom components shown here are only illustrative and it may be possible to build a functionally equivalent pipeline in other ways. We now describe each of the custom components in further detail and show code snippets for them.


This custom component assigns a unique ID to each training example that is used to associate each training example with its corresponding neighbors from the graph.

def IdentifyExamples(
orig_examples: InputArtifact[Examples],
identified_examples: OutputArtifact[Examples],
id_feature_name: Parameter[str],
component_name: Parameter[str]
) -> None:

# Compute the input and output URIs.

# For each input split, update the TF.Examples to include a unique ID.
with beam.Pipeline() as pipeline:
| 'ReadExamples' >> beam.io.ReadFromTFRecord(
os.path.join(input_dir, '*'),
| 'AddUniqueId' >> beam.Map(make_example_with_unique_id, id_feature_name)
| 'WriteIdentifiedExamples' >> beam.io.WriteToTFRecord(
file_path_prefix=os.path.join(output_dir, 'data_tfrecord'),

identified_examples.split_names = orig_examples.split_names

The make_example_with_unique_id() function updates a given example to include an additional feature containing a unique ID.


As mentioned above, in the IMDB dataset, no explicit graph is given as an input. So, we will build one before we can demonstrate graph regularization. For this example, we will use a pre-trained text embedding model to convert raw text in the movie reviews to embeddings, and then use the resulting embeddings to build a graph.

The SynthesizeGraph custom component handles graph building for our example and notice that it defines a new Artifact called SynthesizedGraph, which will be the output of this custom component.

"""Custom Artifact type"""
class SynthesizedGraph(tfx.types.artifact.Artifact):
"""Output artifact of the SynthesizeGraph component"""
TYPE_NAME = 'SynthesizedGraphPath'
'span': standard_artifacts.SPAN_PROPERTY,
'split_names': standard_artifacts.SPLIT_NAMES_PROPERTY,

def SynthesizeGraph(
identified_examples: InputArtifact[Examples],
synthesized_graph: OutputArtifact[SynthesizedGraph],
similarity_threshold: Parameter[float],
component_name: Parameter[str]
) -> None:

# Compute the input and output URIs

# We build a graph only based on the 'train' split which includes both
# labeled and unlabeled examples.
create_embeddings(train_input_examples_uri, output_graph_uri)
build_graph(output_graph_uri, similarity_threshold)
synthesized_graph.split_names = artifact_utils.encode_split_names(

The create_embeddings() function involves converting the text in movie reviews to corresponding embeddings using some pre-trained model on TensorFlow Hub. The build_graph() function involves invoking the build_graph() API in NSL.


The purpose of this custom component is to combine the example features (text in the movie reviews) with the graph built from embeddings to produce an augmented training dataset. The resulting training examples will include features from their corresponding neighbors as well.

def GraphAugmentation(
identified_examples: InputArtifact[Examples],
synthesized_graph: InputArtifact[SynthesizedGraph],
augmented_examples: OutputArtifact[Examples],
num_neighbors: Parameter[int],
component_name: Parameter[str]
) -> None:

# Compute the input and output URIs

# Separate out the labeled and unlabeled examples from the 'train' split.
train_path, unsup_path = split_train_and_unsup(train_input_uri)

# Augment training data with neighbor features.
train_path, unsup_path, graph_path, output_path, add_undirected_edges=True,

# Copy the 'test' examples from input to output without modification.

augmented_examples.split_names = identified_examples.split_names

The split_train_and_unsup() function involves splitting the input Examples into labeled and unlabeled examples and the pack_nbrs() NSL API creates the augmented training dataset.

Graph-regularized Trainer

Now that all of our custom components are implemented, the remaining NSL-specific addition to the TFX pipeline is in the Trainer component. Below is a simplified view of the graph-regularized Trainer component.


estimator = tf.estimator.Estimator(
model_fn=feed_forward_model_fn, config=run_config, params=HPARAMS)

# Create a graph regularization config.
graph_reg_config = nsl.configs.make_graph_reg_config(

# Invoke the Graph Regularization Estimator wrapper to incorporate
# graph-based regularization for training.
graph_nsl_estimator = nsl.estimator.add_graph_regularization(


As you can see, once a base model has been created (in this case a feed-forward neural network), it’s straightforward to convert it to a graph-regularized model by invoking the NSL wrapper API.

And that’s it! We now have all of the missing pieces that are required to build a graph-regularized NSL model in TFX. A colab-based tutorial that demonstrates this example end-to-end in TFX is available here. Feel free to try it and customize it as you want!

Adversarial Learning

As mentioned in the introduction above, another aspect of Neural Structured Learning is adversarial learning where instead of using explicit neighbors from a graph for regularization, implicit neighbors are created dynamically and adversarially to confuse the model. So, regularizing using adversarial examples is an effective way to improve a model’s robustness. Adversarial learning using NSL can be easily integrated into a TFX pipeline. It does not require any custom components and only the trainer component needs to be updated to invoke the adversarial regularization wrapper API in NSL.


We have demonstrated how to build a graph-regularized model with NSL in TFX using custom components. It’s certainly possible to build graphs in other ways as well as structure the overall pipeline differently. We hope that this example provides a basis for your own NSL workflows.

Additional Links

For more information on NSL, check out the following resources:


We’d like to thank the Neural Structured Learning and TFX teams at Google as well as Aurélien Geron for their support and contributions.

Read More

Announcing the 2020 Google PhD Fellows

Posted by Susie Kim, Program Manager, University Relations

Google created the PhD Fellowship Program in 2009 to recognize and support outstanding graduate students who seek to influence the future of technology by pursuing exceptional research in computer science and related fields. Now in its twelfth year, these Fellowships have helped support approximately 500 graduate students globally in North America and Europe, Africa, Australia, East Asia, and India.

It is our ongoing goal to continue to support the academic community as a whole, and these Fellows as they make their mark on the world. We congratulate all of this year’s awardees!

Algorithms, Optimizations and Markets
Jan van den Brand, KTH Royal Institute of Technology
Mahsa Derakhshan, University of Maryland, College Park
Sidhanth Mohanty, University of California, Berkeley

Computational Neuroscience
Connor Brennan, University of Pennsylvania

Human Computer Interaction
Abdelkareem Bedri, Carnegie Mellon University
Brendan David-John, University of Florida
Hiromu Yakura, University of Tsukuba
Manaswi Saha, University of Washington
Muratcan Cicek, University of California, Santa Cruz
Prashan Madumal, University of Melbourne

Machine Learning
Alon Brutzkus, Tel Aviv University
Chin-Wei Huang, Universite de Montreal
Eli Sherman, Johns Hopkins University
Esther Rolf, University of California, Berkeley
Imke Mayer, Fondation Sciences Mathématique de Paris
Jean Michel Sarr, Cheikh Anta Diop University
Lei Bai, University of New South Wales
Nontawat Charoenphakdee, The University of Tokyo
Preetum Nakkiran, Harvard University
Sravanti Addepalli, Indian Institute of Science
Taesik Gong, Korea Advanced Institute of Science and Technology
Vihari Piratla, Indian Institute of Technology – Bombay
Vishakha Patil, Indian Institute of Science
Wilson Tsakane Mongwe, University of Johannesburg
Xinshi Chen, Georgia Institute of Technology
Yadan Luo, University of Queensland

Machine Perception, Speech Technology and Computer Vision
Benjamin van Niekerk, University of Stellenbosch
Eric Heiden, University of Southern California
Gyeongsik Moon, Seoul National University
Hou-Ning Hu, National Tsing Hua University
Nan Wu, New York University
Shaoshuai Shi, The Chinese University of Hong Kong
Yaman Kumar, Indraprastha Institute of Information Technology – Delhi
Yifan Liu, University of Adelaide
Yu Wu, University of Technology Sydney
Zhengqi Li, Cornell University

Mobile Computing
Xiaofan Zhang, University of Illinois at Urbana-Champaign

Natural Language Processing
Anjalie Field, Carnegie Mellon University
Mingda Chen, Toyota Technological Institute at Chicago
Shang-Yu Su, National Taiwan University
Yanai Elazar, Bar-Ilan

Privacy and Security
Julien Gamba, Universidad Carlos III de Madrid
Shuwen Deng, Yale University
Yunusa Simpa Abdulsalm, Mohammed VI Polytechnic University

Programming Technology and Software Engineering
Adriana Sejfia, University of Southern California
John Cyphert, University of Wisconsin-Madison

Quantum Computing
Amira Abbas, University of KwaZulu-Natal
Mozafari Ghoraba Fereshte, EPFL

Structured Data and Database Management
Yanqing Peng, University of Utah

Systems and Networking
Huynh Nguyen Van, University of Technology Sydney
Michael Sammler, Saarland University, MPI-SWS
Sihang Liu, University of Virginia
Yun-Zhan Cai, National Cheng Kung University

Read More

Make your everyday smarter with Jacquard

Technology is most helpful when it’s frictionless. That is why we believe that computing should power experiences through the everyday things around you—an idea we call “ambient computing.” That’s why we developed the Jacquard platform to deliver ambient computing in a familiar, natural way: By building it into things you wear, love and use every day. 

The heart of Jacquard is the Jacquard Tag, a tiny computer built to make everyday items more helpful. We first used this on the sleeve of a jacket so that it could recognize the gestures of the person wearing it, and we built that same technology into the Cit-E backpack with Saint Laurent. Then, we collaborated with Adidas and EA on our GMR shoe insert, enabling its wearers to combine real-life play with the EA SPORTS FIFA mobile game. 

Whether it’s touch or movement-based, the tag can interpret different inputs customized for the garments and gear we’ve collaborated with brands to create. And now we’re sharing that two new backpacks, developed with Samsonite, will integrate Jacquard technology. A fine addition to our collection, the Konnect-I Backpack comes in two styles: Slim ($199) and Standard ($219).

  • Jacquard Samsonite
  • Jacquard Samsonite
  • Jacquard Samsonite
  • Jacquard Samsonite

While they might look like regular backpacks, the left strap unlocks tons of capabilities. Using your Jacquard app, you can customize what gestures control which actions—for instance, you can program Jacquard to deliver call and text notifications, trigger a selfie, control your music or prompt Google Assistant to share the latest news. For an added level of interaction, the LED light on your left strap will light up according to the alerts you’ve set.

This is only the beginning for the Jacquard platform, and thanks to updates, you can expect your Jacquard Tag gear to get better over time. Just like Google wants to make the world’s information universally accessible and useful, we at Jacquard want to help people access information through everyday items and natural movements.

Read More

How TensorFlow docs uses Jupyter notebooks

Posted by Billy Lamberta, TensorFlow Team

Jupyter notebooks are an important part of our TensorFlow documentation infrastructure. With the JupyterCon 2020 conference underway, the TensorFlow docs team would like to share some tools we use to manage a large collection of Jupyter notebooks as a first-class documentation format published on tensorflow.org.

As the TensorFlow ecosystem has grown, the TensorFlow documentation has grown into a substantial software project in its own right. We publish ~270 notebook guides and tutorials on tensorflow.org—all tested and available in GitHub. We also publish an additional ~400 translated notebooks for many languages—all tested like their English counterpart. The tooling we’ve developed to work with Jupyter notebooks helps us manage all this content.

Graph showing Notebooks published

When we published our first notebook on tensorflow.org over two years ago for the 2018 TensorFlow Developer Summit, the community response was fantastic. Users love that they can immediately jump from webpage documentation to an interactive computing experience in Google Colab. This setup allows you to run—and experiment with—our guides and tutorials right in the browser, without installing any software on your machine. This tensorflow.org integration with Colab made it much easier to get started and changed how we could teach TensorFlow using Jupyter notebooks. Other machine learning projects soon followed. Notebooks can be loaded directly from GitHub into Google Colab with just the URL:


For compute-intensive tasks, Colab provides TPUs and GPUs at no cost. The TensorFlow documentation, such as this quickstart tutorial, has buttons that link to both its notebook source in GitHub and to load in Colab.

Better collaboration

Software documentation is a team effort, and notebooks are an expressive, education-focused format that allows engineers and writers to build up an interactive demonstration. Jupyter notebooks are JSON-formatted files that contain text cells and code cells, typically executed in sequential order from top-to-bottom. They are an excellent way to communicate programming ideas, and, with some discipline, a way to share reproducible results.

On the TensorFlow team, notebooks allow engineers, technical writers, and open source contributors to collaborate on the same document without the tension that exists between a separate code example and its published explanation. We write TensorFlow notebooks so that the documentation is the code—self-contained, easily shared, and tested.

Notebook translations with GitLocalize

Documentation needs to reach everyone around the world—something the TensorFlow team values. The TensorFlow community translation project has grown to 10 languages over the past two years. Translation sprints are a great way to engage with the community on open source documentation projects.

To make TensorFlow documentation accessible to even more developers, we worked with Alconost to add Jupyter notebook support to their GitLocalize translation tool. GitLocalize makes it easy to create translated notebooks and sync documentation updates from the source files. Open source contributors can submit pull requests and provide reviews using the TensorFlow GitLocalize project: gitlocalize.com/tensorflow/docs-l10n.

Jupyter notebook support in GitLocalize not only benefits TensorFlow, but is now available for all open source translation projects that use notebooks with GitHub.

TensorFlow docs notebook tools

Incorporating Jupyter notebooks into our docs infrastructure allows us to run and test all the published guides and tutorials to ensure everything on the site works for a new TensorFlow release—using stable or nightly packages.

Benefits aside, there are challenges with managing Jupyter notebooks as source code. To make pull requests and reviews easier for contributors and project maintainers, we created the TensorFlow docs notebook tools to automate common fixes and communicate issues to contributors with continuous integration (CI) tests. You can install the tensorflow-docs pip package directly from the tensorflow/docs GitHub repository:

$ python3 -m pip install -U git+https://github.com/tensorflow/docs


While the Jupyter notebook format is straightforward, notebook authoring environments are often inconsistent with JSON formatting or embed their own metadata in the file. These unnecessary changes can cause diff churn in pull requests that make content reviews difficult. The solution is to use an auto-formatter that outputs consistent notebook JSON.

nbfmt is a notebook formatter with a preference for the TensorFlow docs notebook style. It formats the JSON and strips unneeded metadata except for some Colab-specific fields used for our integration. To run:

$ python3 -m tensorflow_docs.tools.nbfmt [options] notebook.ipynb

For TensorFlow docs projects, notebooks saved without output cells are executed and tested; notebooks saved with output cells are published as-is. We prefer to remove outputs to test our notebooks, but nbfmt can be used with either format.

The --test flag is available for continuous integration tests. Instead of updating the notebook, it returns an error if the notebook is not formatted. We use this in a CI test for one of our GitHub Actions workflows. And with some further bot integration, formatting patches can be automatically applied to the contributor’s pull request.


The easiest way to scale reviews is to let the machine do it. Every project has recurring issues that pop up in reviews, and style questions are often best settled with a style guide (TensorFlow likes the Google developer docs style guide). For a large project, the more patterns you can catch and fix automatically, the more time you’ll have available for other goals.

nblint is a notebook linting tool that checks documentation style rules. We use it to catch common style and structural issues in TensorFlow notebooks:

>$ python3 -m tensorflow_docs.tools.nblint [options] notebook.ipynb

Lints are assertions that test specific sections of the notebook. These lints are collected into style modules. nblint tests the google and tensorflow styles by default, and other style modules can be loaded at the command-line. Some styles require arguments that are also passed at the command-line, for example, setting a different repo when linting the TensorFlow translation notebooks:

$ python3 -m tensorflow_docs.tools.nblint 

Lint tests can have an associated fix that makes it easy to update notebooks to pass style checks automatically. Use the --fix argument to apply lint fixes that overwrite the notebook, for example:

$ python3 -m tensorflow_docs.tools.nblint --fix 
--arg=repo:tensorflow/docs notebook.ipynb

Learn more

TensorFlow is a big fan of Project Jupyter and Jupyter notebooks. Along with Google Colab, notebooks changed how we teach TensorFlow and scale a large open source documentation project with tested guides, tutorials, and translations. We hope that sharing some of the tools will help other open source projects that want to use notebooks as documentation.

Read a TensorFlow tutorial and then run the notebook in Google Colab. To contribute to the TensorFlow documentation project, submit a pull request or a translation review to our GitLocalize project.

Special thanks to Mark Daoust, Wolff Dobson, Yash Katariya, the TensorFlow docs team, and all TensorFlow docs authors, reviewers, contributors, and supporters.

Read More