Announcing the First Group of Google Open Source Peer Bonus winners in 2021!


The Google Open Source Peer Bonus program is designed to reward external open source contributors nominated by Googlers for their exceptional contributions to open source. We are very excited to announce our first group of winners in 2021!

Our current winners have contributed to a wide range of projects including Apache Beam, Kubernetes, Tekton and many others. We reward open source enthusiasts not only for their code contributions, but also community work, documentation, mentorships and other types of engagement.

We have award recipients from 25 countries all over the world: Austria, Canada, China, Cyprus, Denmark, Finland, France, Germany, India, Isle of Man, Italy, Japan, Korea, Netherlands, Norway, Russia, Singapore, Spain, Sweden, Switzerland, Taiwan, Uganda, Ukraine, United Kingdom, and the United States.

Open source encourages innovation through collaboration; our modern world, and the technology we rely on, wouldn’t be the same without you—the contributors, who are in many cases volunteers. We would like to thank you for your hard work and congratulate you on receiving this award!

Below is the list of current winners who gave us permission to thank them publicly:

Winner – Project
Kashyap Jois – Android FHIR SDK
David Allison – AnkiDroid
Chad Dombrova – Apache Beam
Jeff Klukas – Apache Beam
Steve Niemitz – Apache Beam
Yoshiki Obata – Apache Beam
Jaskirat Singh – CHAOSS (Community Health Analytics Open Source Software)
Eric Amorde – CocoaPods
Subrata Banik – coreboot
Ned Batchelder – Coverage.py & related CPython internals
Matthew Bryant – CursedChrome
Simon Legner – devdocs.io
Dmitry Gutov – Emacs/company-mode
Brian Jost – Firebase
Joe Hinkle – Firebase iOS SDK
Lorenzo Fiamigo – Firebase iOS SDK
Mike Gerasymenko – Firebase iOS SDK
Morten Bek Ditlevsen – Firebase iOS SDK
Angel Pons – Flashrom
Ole André Vadla Ravnås – Frida
Junegunn Choi – fzf
Alex Saveau – Gradle Play Publisher
Nate Graham – KDE
Amit Sagtani – KDE Community
Niklas Hansson – Kubeflow Pipelines
William Teo – Kubeflow Pipelines
Antonio Ojea – Kubernetes
Dan Mangum – Kubernetes
Jian Zeng – Kubernetes
Darrell Commander – libjpeg-turbo
James (purpleidea) – mgmt
Kareem Ergawy – MLIR
Lily Ballard – Nix / Fish
Eelco Dolstra – Nix, NixOS, Nixpkgs
Samuel Dionne-Riel – NixOS
Dmitry Demensky – Open source TypeScript definitions for Google Maps Platform
Kay Williams – OpenSSF
Hassan Kibirige – plotnine
Henry Schreiner – pybind11
Paul Moore – Python ‘pip’ project
Tzu-ping Chung – Python ‘pip’ project
Alex Grönholm – Python ‘wheel’ project
Ramon Santamaria – raylib
Alexander Weiss – restic
Michael Eischer – restic
Ben Lesh – rxjs
Takeshi Nakatani – s3fs
Daniel Wee Soong Lim – SymbiFlow
Unai Martinez-Corral – SymbiFlow, Surelog, Verible, more
Andrea Frittoli – Tekton
Priti Desai – Tekton
Vincent Demeester – Tekton
Chengyu Zhang – testsmt & testsmt/yinyang
Dominik Winterer – testsmt & testsmt/yinyang
Tom Rini – U-Boot

Thank you for your contributions to open source!

By Maria Tabak — Google Open Source Programs Office



Analyzing genomic data in families with deep learning

The Genomics team at Google Health is excited to share our latest expansion to DeepVariant – DeepTrio.

First released in 2017, DeepVariant is an open source tool that enables researchers and clinicians to analyze an individual’s genome sequencing data and identify genetic variants, such as those that may cause disease. Our continued work on DeepVariant has been recognized for its top-of-class accuracy. With DeepTrio, we have expanded DeepVariant to be able to consider the genetic variants in the sequence data of a mother-father-child trio.

Humans are diploid organisms, carrying two copies of the human genome. Every individual inherits one copy of the genome from their mother, and the other from their father. Parental inheritance informs analysis of traits and diseases that follow Mendelian inheritance. DeepTrio learns to use the properties of Mendelian inheritance directly from sequencing data in order to more accurately identify genetic variants in cases where samples from both parents and a child can be co-analyzed.

Modifying DeepVariant to analyze trio samples

DeepVariant learns to classify positions in a genome as reference or variant using representations of data similar to the “genome browser” which experts use in analysis. “Improving the Accuracy of Genomic Analysis with DeepVariant 1.0” provides a good overview.

DeepVariant receives data as a window of the genome centered on a candidate variant, which it is asked to classify as either reference (no variant), heterozygous (one copy of a variant), or homozygous (both copies are variant). DeepVariant sees the sequence evidence as channels representing features of the data (see “Looking through DeepVariant’s eyes” for a deeper explanation).

We modified DeepVariant to represent the sequence data from a trio in a single image, with a fixed height for each sample and the child in the middle. Using gold standard samples from NIST Genome in a Bottle for truth labels, we train one model to call variants in the child and another to call variants in the top parent. To call both parents, we flip the position of the parent samples.

Figure 1. (top) An image of 4 of the channels that DeepTrio uses in classification (these, and 4 other channels, are shown in a stack). (bottom) Conceptual schematic of how trio files are used to create examples, which are then called by DeepTrio.

Measuring DeepTrio’s improved accuracy

We show that DeepTrio is more accurate than DeepVariant for both parent and child variant detection, with an especially pronounced advantage at lower coverages. This enables researchers to either analyze samples at higher accuracy, or to maintain comparable accuracy at a substantially reduced expense.

To assess the accuracy of DeepTrio, we compare its accuracy to DeepVariant using extensively characterized gold standards made available by NIST Genome in a Bottle. In order to have an evaluation dataset which is never seen in training, we exclude chromosome 20 from training and perform evaluations on chromosome 20.

We train DeepVariant and DeepTrio on sequencing data from two different instruments, Illumina and Pacific Biosciences (PacBio); for more information on the differences between these technologies, please see our previous blog post. These sequencers both randomly sample the genome in an error-prone manner. To accurately analyze a genome, the same region needs to be sampled repeatedly. The depth of sampling at a position is called coverage. Sequencing to greater coverage is more expensive in an approximately linear manner, which often forces trade-offs between cost, accuracy, and the number of samples sequenced. As a result, parents in trio studies are often sequenced at lower depth.

In the charts below, we plot the accuracy of DeepTrio and DeepVariant across a range of coverages.


Figure 2. F1-score for DeepTrio (solid line) and DeepVariant (dashed line) on a child sample (top) and a parent sample (bottom), sequenced with an Illumina (blue) and PacBio (black) instrument. F1 is measured for all types of small variants on chromosome 20, across samples with a range of sequencing coverage (x-axis).

DeepTrio’s performance on de novo variants

Each individual has roughly 5 million variants relative to the human reference genome. The overwhelming majority of these are inherited from their parents. A small number, around 100, are new (referred to as de novo), due to copying errors during DNA replication. We demonstrate that DeepTrio substantially reduces false positives for de novo variants. For Illumina data, this comes with a smaller decrease in recovery of true positives, while for PacBio data, this trade-off does not occur.

To assess accuracy, we analyzed sites where both parents are called as non-variant, but the child is called as a heterozygous variant. We observe that DeepTrio is more reluctant to call a variant de novo, similar to how a human analyst would require a higher level of evidence for sites violating Mendelian inheritance. This results in a much lower false positive rate for these de novo variants, but a slightly lower recall rate for DeepTrio on Illumina data. Usually when this occurs, the child is still called as a variant, but the parents are given a “no-call” (the classifier is not confident enough to make a call).

Figure 3. Accuracy on de novo calls (child heterozygous variant, parents reference call) for recall of true de novo events (top) and false positive de novo events (bottom) for DeepTrio (solid line) and DeepVariant (dashed line) on Illumina (blue) and PacBio (black). Accuracy is measured on chromosome 20, across samples with a range of sequencing coverage (x-axis).
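
To make the assessment rule concrete, here is a minimal TypeScript sketch of the site selection described above. It is purely illustrative: the genotype encoding and function names are ours, not part of DeepTrio’s tooling.

type Genotype = 0 | 1 | 2 | null; // alternate-allele count; null = no-call

interface TrioCall {
  child: Genotype;
  mother: Genotype;
  father: Genotype;
}

// A site is a candidate de novo variant when both parents are confidently
// called as non-variant (reference) but the child is called heterozygous.
function isCandidateDeNovo(site: TrioCall): boolean {
  return site.mother === 0 && site.father === 0 && site.child === 1;
}

// Sites where a parent is a no-call (null) do not qualify; as noted above,
// DeepTrio often prefers a parental no-call over asserting a Mendelian violation.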

Contributing to rare disease research

By releasing DeepTrio as open source software, we hope to improve analysis of genomic data by allowing scientists to more accurately analyze samples. We hope this will strengthen research and clinical pipelines, leading to better resolution of rare disease cases and improved development of therapeutics.

In addition to the release of DeepTrio’s code as open source, we have also released the sequencing data that we generated in order to train these models. That data is described in our pre-print “An Extensive Sequence Dataset of Gold-Standard Samples for Benchmarking and Development”. By releasing both this production model, and the data required to train models of similar complexity, we hope to contribute to methods development by the genomics community.

By Andrew Carroll, Product Lead Genomics and Howard Yang, Program Manager Genomics — Google Health




Lyra – enabling voice calls for the next billion users


The past year has shown just how vital online communication is to our lives. Never before has it been more important to clearly understand one another online, regardless of where you are and whatever network conditions are available. That’s why in February we introduced Lyra: a revolutionary new audio codec that uses machine learning to produce high-quality voice calls.

As part of our efforts to make the best codecs universally available, we are open sourcing Lyra, allowing other developers to power their communications apps and take Lyra in powerful new directions. This release provides the tools needed for developers to encode and decode audio with Lyra, optimized for the 64-bit ARM Android platform, with development on Linux. We hope to expand this codebase and develop improvements and support for additional platforms in tandem with the community.

The Lyra Architecture

Lyra’s architecture is separated into two pieces, the encoder and decoder. When someone talks into their phone the encoder captures distinctive attributes from their speech. These speech attributes, also called features, are extracted in chunks of 40ms, then compressed and sent over the network. It is the decoder’s job to convert the features back into an audio waveform that can be played out over the listener’s phone speaker. The features are decoded back into a waveform via a generative model. Generative models are a particular type of machine learning model well suited to recreate a full audio waveform from a limited number of features. The Lyra architecture is very similar to traditional audio codecs, which have formed the backbone of internet communication for decades. Whereas these traditional codecs are based on digital signal processing (DSP) techniques, the key advantage for Lyra comes from the ability of the generative model to reconstruct a high-quality voice signal.

Lyra Architecture Chart
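
In pseudocode, the pipeline looks roughly like the following TypeScript sketch. It is illustrative only: the real Lyra API is C++, and every function name below (extractFeatures, reconstructWaveform, and so on) is a hypothetical stand-in for the stages described above.

type AudioChunk = Int16Array;    // 40 ms of raw waveform samples
type FeaturePacket = Uint8Array; // compressed speech features on the wire

// Sender: every 40 ms, extract distinctive speech attributes and compress them.
function encodeAndSend(chunk: AudioChunk, send: (p: FeaturePacket) => void): void {
  const features = extractFeatures(chunk);
  send(compress(features)); // quantized down to a very low bitrate
}

// Receiver: a generative model reconstructs a playable waveform from features.
function receiveAndPlay(packet: FeaturePacket, play: (c: AudioChunk) => void): void {
  const features = decompress(packet);
  play(reconstructWaveform(features));
}

// Hypothetical stand-ins, declared only so the sketch type-checks.
declare function extractFeatures(chunk: AudioChunk): Float32Array;
declare function compress(features: Float32Array): FeaturePacket;
declare function decompress(packet: FeaturePacket): Float32Array;
declare function reconstructWaveform(features: Float32Array): AudioChunk;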

The Impact

While mobile connectivity has steadily increased over the past decade, the explosive growth of on-device compute power has outstripped access to reliable high-speed wireless infrastructure. For regions where this contrast exists—in particular developing countries where the next billion internet users are coming online—the promise that technology will enable people to be more connected has remained elusive. Even in areas with highly reliable connections, the emergence of work-from-anywhere and telecommuting has further strained mobile data limits. While Lyra compresses raw audio down to 3kbps with quality that compares favorably to other codecs, such as Opus, it is not aiming to be a complete alternative, but it can save meaningful bandwidth in these kinds of scenarios.

These trends provided motivation for Lyra and are the reason our open source library focuses on its potential for real-time voice communication. There are also other applications Lyra may be uniquely well suited for, from archiving large amounts of speech and saving battery by leveraging the computationally cheap Lyra encoder, to alleviating network congestion in emergency situations where many people are trying to make calls at once. We are excited to see the creativity the open source community is known for applied to Lyra in order to come up with even more unique and impactful applications.

The Open Source Release

The Lyra code is written in C++ for speed, efficiency, and interoperability, using the Bazel build framework with Abseil and the GoogleTest framework for thorough unit testing. The core API provides an interface for encoding and decoding at the file and packet levels. The complete signal processing toolchain is also provided, which includes various filters and transforms. Our example app integrates with the Android NDK to show how to integrate the native Lyra code into a Java-based Android app. We also provide the weights and vector quantizers that are necessary to run Lyra.

We are releasing Lyra as a beta version today because we want to enable developers and gather feedback as soon as possible. As a result, we expect the API and bitstream to change as they are developed. All of the code for running Lyra is open sourced under the Apache license, except for a math kernel, for which a shared library is provided until we can implement a fully open solution across more platforms. We look forward to seeing what people do with Lyra now that it is open sourced. Check out the code and demo on GitHub, let us know what you think, and how you plan to use it!

By Andrew Storus and Michael Chinen – Chrome

Acknowledgements

The following people helped make the open source release possible:
Yero Yeh, Alejandro Luebs, Jamieson Brettle, Tom Denton, Felicia Lim, Bastiaan Kleijn, Jan Skoglund, Yaowu Xu, Jim Bankoski (Chrome), Chenjie Gu, Zach Gleicher, Tom Walters, Norman Casagrande, Luis Cobo, Erich Elsen (DeepMind).



Don’t Copy That Surface

This post is part of a new series of deeper dives into the careful trade-offs and complex engineering that go into making Chrome fast and reliable. This debugging adventure by Chrome developer and blogger Bruce Dawson reduced CPU usage by about 3% when using a webcam – a real help for those of us relying on video calls.


Video conferencing took on elevated importance in 2020. I’m not on the Google Meet team but I do work on Chrome, so I fired up my favorite profiler during one of my daily meetings to see if I could find anything useful.
There is a lot going on during video conferencing, spread across multiple processes. With my usual dozens of tabs open there were 37 Chrome processes, with six of them actively participating in the video conference. In addition there were over 200 other processes running (87 copies of svchost.exe, for instance), with four of those involved in video conferencing. You may well wonder why it takes 10 processes to connect two people, so here is a list of the processes and their roles:
  • audiodg.exe – Windows Audio Device Graph Isolation, audio output
  • dwm.exe – Windows Desktop Window Manager, showing video
  • svchost.exe – Windows Camera Frame Server (webcam capture)
  • System – Windows system process, does miscellaneous tasks on behalf of processes
  • chrome.exe – browser process, the master control program
  • chrome.exe – renderer process, the Meet tab
  • chrome.exe – GPU process, in charge of rendering pages
  • chrome.exe – NetworkService utility process, talking to the network
  • chrome.exe – VideoCaptureService, talking to the Windows Camera Frame Server
  • chrome.exe – AudioService, controls audio input and output

These tasks are spread across different processes for security and stability. If one of them crashes it can be restarted without taking everything down. If one of them is compromised due to a security bug then it is isolated from the rest of the system and the damage may be contained.

This all makes good sense, but having this many processes involved can make performance profiling challenging. Looking through all of them for areas of potential improvement takes time, and it is made more difficult by the fact that I know little about the Meet architecture.

Analyzing a profile

Video conferencing is CPU intensive – you have to record, compress, transmit, receive, decompress, and display both audio and video. The data below shows CPU samples recorded by Microsoft’s Event Tracing for Windows (ETW). This sampling profiler works by interrupting every running thread about 1,000 times a second and recording a call stack. I used Windows Performance Analyzer (WPA) to display the results. In the screenshot below I am looking at a 10-second period and over 16,000 samples (representing about 16 seconds of CPU time) were recorded across the 10 processes involved in video conferencing:

That’s a lot of samples to look through, but the call stacks are collated so that you can drill down on the busiest stacks. I didn’t find anything in the first Chrome process, but in the second one I did:

It doesn’t look like much, but I recognized immediately that the 124 samples in KiPageFault were worth investigating. Most of the CPU-intensive work in this trace was important and unavoidable, but I had a hunch that these samples represented avoidable work – something that I could fix. And even though they represented just 0.75% of the samples, I suspected that they indicated a somewhat greater cost.

I recognized their importance immediately because this is something that I have seen before. KiPageFault means that the processor touched some memory that had been allocated, but was not currently in the process. This could mean that the pages had been removed from the process to save memory, but in an active process on a machine with lots of memory, that didn’t make sense. What was more likely was that this represented recently allocated memory.

When a program allocates a small amount of memory, the local memory manager (sometimes called the “heap”) will usually have some available that it can give to the program. But if it doesn’t have an appropriate block of memory then it will ask the operating system (OS) for some. If a program allocates a large amount of memory (greater than a MB or so) then the heap will definitely ask for more memory. This is, in itself, a relatively cheap operation. The heap asks the OS for some memory, the OS says “sure”, the OS makes note of the fact that it promised this memory, and that’s it. The OS does not, at that time, actually give the program any memory. This is the way of the world on Windows, Linux, and Android; it is good, but it can be confusing and surprising. If the process never touches the memory then the memory is never added to the process, but if the process does touch the memory then individual pages of zeroed memory are brought into the process. These are called demand-zero page faults because zeroed pages are “faulted” into the process on demand.

In other words, allocating a large block of memory is quite cheap, but doesn’t actually set up the promised memory. Then, when the program tries to use the memory and the CPU discovers that there is no memory at that address it triggers an exception, which wakes up the OS. The OS checks its records and realizes that it did in fact promise to put memory at that address so it then puts some there and restarts the program. This happens so quickly that if you’re not paying attention you will miss it, but it shows up when profiling as samples hitting in KiPageFault.

This bizarre dance happens again for every 4-KiB block in the allocation – 4 KiB is the size of the pages that the CPU and the OS work on.

The cost is small. Across this 10-second period only 124 samples – representing about 124 ms or 0.124 seconds – hit inside of KiPageFault. The total cost of the enclosing CopyImage_SSE4_1 function was about 240 ms, so the page faults accounted for more than half of this function, but barely a quarter of the cost of the OnSample function on line 15.

The total cost of these page faults is modest, but it hints at several other costs:

  • If this memory is being allocated repeatedly (presumably every frame) then it must also be freed every frame. On line 26 we can see that the Release function which frees the memory uses another 64 samples.
  • When the pages are freed, the operating system has to zero them (for security reasons) so that they are ready to be reused. This is done in the Windows System process – an almost entirely hidden cost. Sure enough, when I looked in the System process I saw 138 samples in the MiZeroPageThread. I found that 87% of the KiPageFault samples in the entire system were in the CopyImage_SSE4_1 call, so presumably 87% of the 138 samples in the MiZeroPageThread were due to this pattern.

I analyzed these hidden costs of memory allocation in a 2014 blog post. The basic memory architecture of Windows hasn’t changed since then so the hidden costs remain about the same.

In addition to CPU samples my ETW trace contained call stacks for every call to VirtualAlloc. This WPA screenshot shows a 10-second period where the OnSample function does 298 allocations that are each 1.320 MB, roughly 30 per second:

At this point we can estimate that the cost of these repeated allocations is 124 (faulting in) plus 64 (freeing) plus 124 (87% of the zeroing samples) for a total of 312 samples. This gets us up to 1.9% of the total CPU cost of video conferencing. Fixing this is not going to change the world, but this is a change worth doing.

But wait, there’s more!

We are locking this buffer so that we can look at the contents, but it turns out we don’t actually want the lock call to copy the buffer at all. We just want the lock call to describe the buffer to us so that we can look at it in place. Therefore the entire cost of the MFCopyImage call is waste! That’s another 116 samples. In addition, in the CMF2DMediaBuffer::Unlock call on line 26 there is another call to CMF2DMediaBuffer::ContiguousCopyFrom. That’s because the Unlock call assumes that we might have modified the copy of the buffer, so it copies it back. So the 101 samples there are all waste as well!

If we can examine this buffer without the alloc/copy/copy/free dance then we can save 312 samples plus 116 samples (the rest of the copying cost) plus 101 samples (the copying-it-back cost) for a total saving of 3.2%. This is getting better all the time.

Note that sampled data is only statistically valid, and the actual percentages vary significantly depending on the computer and the exact workload. But, the point remains – it is a non-dramatic but worthwhile change to investigate.

Despite spending years in the video-game business my knowledge of these graphics-buffer locking and unlocking APIs is weak. I ended up relying on the wisdom of my Twitter followers to come to the conclusion that the copying was entirely avoidable, and to get a rough pattern for how it could be fixed. After filing an overly verbose bug I delegated the task of actually fixing it. The fix landed in M85 and was deemed important enough that it was then backported to M84.

You’d have to be paying very close attention to see the difference – spread across a Chrome process and the system process – but I hope that this helped some computers run a bit cooler and last longer on their batteries. And, while this inefficiency was found by profiling Google Meet, the improvement actually benefits any product that uses the webcam inside Chrome (and other Chromium-based browsers).

Verification

After the fix landed I compared two 10-second ETW traces from Chrome Canary before and after the change, each taken with no other programs running except a single Chrome tab running the Google Meet pre-meeting page. In both cases I looked at a 10-second period of time in the profiler. This showed:

CPU time in OnSample:

Before: 458 ms (432 ms of which were in Lock/Unlock/KiPageFault)
After: 27 ms

Allocations:

Before: 30 allocations per second of 1.32 MB (one per frame, running at 30 fps – a higher framerate would mean more allocations), totaling 396 MB over 10 seconds
After: 0 allocations

CPU time in the System process’s MiZeroPageThread:

Before: 36 ms
After: 0 ms

These measurements showed – in three different ways – that the performance problem was fixed. The memory copying in OnSample was gone, the repeated allocations were gone, and the system process was doing less work. Mission accomplished, bug closed.


Introducing TestParameterInjector: A JUnit4 parameterized test runner

 When writing unit tests, you may want to run the same or a very similar test for different inputs or input/output pairs. In Java, as in most programming languages, the best way to do this is by using a parameterized test framework.

JUnit4 has a number of such frameworks available, such as junit.runners.Parameterized and JUnitParams. A couple of years ago, a few Google engineers found the existing frameworks lacking in functionality and simplicity, and decided to create their own alternative. After a lot of tweaks, fixes, and feature additions based on feedback from clients all over Google, we arrived at what TestParameterInjector is today.

As can be seen in the graph below, TestParameterInjector is now the most used framework for new tests in the Google codebase:

Graph of the different parameterized test frameworks in Google

How does TestParameterInjector work?

The TestParameterInjector exposes two annotations: @TestParameter and @TestParameters. The following code snippet shows how the former works:


@RunWith(TestParameterInjector.class)
public class MyTest {

  @TestParameter boolean isDryRun;

  @Test public void test1(@TestParameter boolean enableFlag) {
    // This method is run 4 times for all combinations of isDryRun and enableFlag
  }

  @Test public void test2(@TestParameter MyEnum myEnum) {
    // This method is run 6 times for all combinations of isDryRun and myEnum
  }

  enum MyEnum { VALUE_A, VALUE_B, VALUE_C }
}

Annotated fields (such as isDryRun) will cause each test method to run for all possible values while annotated method parameters (such as enableFlag) will only impact that test method. Note that the generated test names will typically be helpful but concise, for example: MyTest#test2[isDryRun=true, VALUE_A].
The other annotation, @TestParameters, can be seen at work in this snippet:

@RunWith(TestParameterInjector.class)
public class MyTest {

  @Test
  @TestParameters({
    "{age: 17, expectIsAdult: false}",
    "{age: 22, expectIsAdult: true}",
  })
  public void personIsAdult(int age, boolean expectIsAdult) {
    // This method is run 2 times
  }
}

In contrast to the first example, which tests all combinations, a @TestParameters-annotated method runs once for each test case specified.

How does TestParameterInjector compare to other frameworks?

To our knowledge, the table below summarizes the features of the different frameworks in use at Google. Each framework treats sets of parameters as either correlated¹ or orthogonal²; the comparison also covered documentation, field injection, parameter injection, and refactor friendliness, and as the snippets above show, TestParameterInjector supports both field and parameter injection.

Framework – Documentation – Parameter sets
TestParameterInjector – GitHub – both are supported
junit.runners.Parameterized – GitHub – correlated
JUnitParams – GitHub – correlated
Burst – GitHub – orthogonal
DataProvider – GitHub – correlated
Jukito – GitHub – orthogonal
Theories – junit.org – orthogonal

Learn more

Our GitHub README at https://github.com/google/TestParameterInjector gives an overview of the possibilities of this framework. Let us know on GitHub if you have any questions, comments, or feature requests!

By Jens Nyman, TestParameterInjector team


¹ Parameters are considered dependent: you specify explicit combinations to be run.

² Parameters are considered independent: the framework will run all possible combinations of parameters.



Student applications for Google Summer of Code 2021 are now open!

Student applications for Google Summer of Code (GSoC) 2021 are now open!

Google Summer of Code introduces students from around the world to open source communities. The program exposes students to real-world software development scenarios, helps them develop their technical skills, and introduces them to our enthusiastic and generous community of GSoC mentors. Since 2005, GSoC has brought over 16,000 student developers from 111 countries into 715 open source communities!


Now in our 17th consecutive year, the GSoC program has made some exciting changes for 2021. Students will now focus on a 175-hour project over a 10-week coding period (entirely online) and receive stipends based on the successful completion of their project milestones. We are also opening up the program to students 18 years of age and older, who are enrolled in post-secondary academic programs (including university, masters, PhD programs, licensed coding schools, community colleges, etc.) or have graduated from such a program between December 1, 2020 and May 17, 2021.

Ready to apply? The first step is to browse the list of 2021 GSoC organizations and look for project ideas that appeal to you. Next, reach out to the organization to introduce yourself and determine if your skills and interests are a good fit. Since spots are limited, we recommend writing a strong proposal and submitting a draft early so you can communicate with the organization and get their feedback to increase your odds of being selected. We recommend reading through the student guide and advice for students for important tips on preparing your proposal. Students may register and submit project proposals on the GSoC site from now until Tuesday, April 13th at 19:00 UTC.

You can find more information on our website, which includes a full timeline of important dates, GSoC videos, FAQs, and Program Rules.

Good luck to all of the student applicants!

By Romina Vicente, Project Coordinator for Google Open Source Programs Office



A safer default for navigation: HTTPS

Starting in version 90, Chrome’s address bar will use https:// by default, improving privacy and even loading speed for users visiting websites that support HTTPS. Chrome users who navigate to websites by manually typing a URL often don’t include “http://” or “https://”. For example, users often type “example.com” instead of “https://example.com” in the address bar. In this case, if it was a user’s first visit to a website, Chrome would previously choose http:// as the default protocol¹. This was a practical default in the past, when much of the web did not support HTTPS.

Chrome will now default to HTTPS for most typed navigations that don’t specify a protocol². HTTPS is the more secure and most widely used scheme in Chrome on all major platforms. In addition to being a clear security and privacy improvement, this change improves the initial loading speed of sites that support HTTPS, since Chrome will connect directly to the HTTPS endpoint without needing to be redirected from http:// to https://. For sites that don’t yet support HTTPS, Chrome will fall back to HTTP when the HTTPS attempt fails. This change is rolling out initially on Chrome Desktop and Chrome for Android in version 90, with a release for Chrome on iOS following soon after.

HTTPS protects users by encrypting traffic sent over the network, so that sensitive information users enter on websites cannot be intercepted or modified by attackers or eavesdroppers. Chrome is invested in ensuring that HTTPS is the default protocol for the web, and this change is one more step towards ensuring Chrome always uses secure connections by default.

¹ One notable exception to this is any site on the HSTS preload list, for which Chrome will always default to HTTPS.
² IP addresses, single-label domains, and reserved hostnames such as test/ or localhost/ will continue defaulting to HTTP.



Season of Docs announces the successful 2020 long-running projects

And, that’s a wrap! Season of Docs has announced the 2020 program results for long-running projects. You can view a list of successfully completed technical writing projects on the website along with their final project reports.


15 technical writers successfully completed their long-running technical writing projects. During the program, technical writers spent a few months working closely with an open source community. They brought their technical writing expertise to improve the project’s documentation while the open source projects provided mentors to introduce the technical writers to open source tools, workflows, and the project’s technology.

Congratulations to the technical writers and organization mentors on these successful projects!

What’s next?

Program participants should expect an email in the next few weeks about how to get their Season of Docs 2020 t-shirt!

If you were excited about participating, please do write social media posts. See the promotion and press page for images and other promotional materials you can include, and be sure to use the tag #SeasonOfDocs when promoting your project on social media. To include the tech writing and open source communities, add #WriteTheDocs, #techcomm, #TechnicalWriting, and #OpenSource to your posts.

If you’re interested in participating in a future Season of Docs, we’re currently accepting organization applications for the 2021 program. Be sure to sign up for the announcements email list to stay informed!

By Kassandra Dhillon and Erin McKean, Google Open Source Programs Office



Mitigating Side-Channel Attacks

The web platform relies on the origin as a fundamental security boundary, and browsers do a pretty good job at preventing explicit leakage of data from one origin to another. Attacks like Spectre, however, show that we still have work to do to mitigate implicit data leakage. The side-channels exploited through these attacks prove that attackers can read any data that enters a process hosting that attacker’s code. These attacks are quite practical today, and pose a real risk to users.

Our goal must be to ensure that sensitive data doesn’t unexpectedly enter an attacker’s process. Browsers shoulder a large chunk of this responsibility: Chromium’s Site Isolation can separate the sites you visit into distinct OS-level processes, cross-origin read blocking prevents attackers from loading a subset of otherwise-vulnerable cross-origin resources, and APIs that substantially increase attackers’ bandwidth (like SharedArrayBuffer) are being locked to cross-origin isolated contexts. This last mechanism, however, points in the direction of work that browsers can’t do on their own.

Web developers know their applications intimately, and can make informed decisions about each page’s and resource’s risk of exposure. To defend users’ data against exfiltration, web developers must step in, evaluate the resources they host, and instruct browsers to isolate those resources accordingly. At a high level, this defense consists of the following steps (a header sketch follows the list):

  1. Deciding when to respond to requests by examining incoming headers, paying special attention to the Origin header on the one hand, and various Sec-Fetch- prefixed headers on the other.
  2. Restricting attackers’ ability to load resources as a subresource by setting a cross-origin resource policy of same-origin (opening up to same-site or cross-origin only when necessary).
  3. Restricting attackers’ ability to frame resources as a document by opting into framing protections via X-Frame-Options: SAMEORIGIN or CSP’s more granular frame-ancestors directive: for example, frame-ancestors ‘self’ https://trusted.embedder.
  4. Restricting attackers’ ability to reference your application’s windows by setting a cross-origin opener policy. In the best case, you can default to a restrictive same-origin value, opening up to same-origin-allow-popups or unsafe-none only if necessary.
  5. Preventing MIME-type confusion attacks and increasing the robustness of passive defenses like cross-origin read blocking by setting correct Content-Type headers, and globally asserting X-Content-Type-Options: nosniff.
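
As a concrete illustration, here is a minimal TypeScript sketch of those five steps as an Express-style middleware. The header values are assumptions to be tuned per resource, not drop-in recommendations.

import express from "express";

const app = express();

app.use((req, res, next) => {
  // 1. Examine request metadata; here we reject cross-site subresource
  //    requests outright (adjust to what each resource actually expects).
  const site = req.get("Sec-Fetch-Site");
  const dest = req.get("Sec-Fetch-Dest");
  if (site === "cross-site" && dest !== "document") {
    res.status(403).end();
    return;
  }
  // 2. Keep resources from being loaded into cross-origin processes.
  res.setHeader("Cross-Origin-Resource-Policy", "same-origin");
  // 3. Restrict framing (or use CSP's frame-ancestors for finer control).
  res.setHeader("X-Frame-Options", "SAMEORIGIN");
  // 4. Restrict cross-origin window references.
  res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
  // 5. Prevent MIME-type confusion.
  res.setHeader("X-Content-Type-Options", "nosniff");
  next();
});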

Together, these various defenses help all browsers offer some degree of process-level protection for users’ data, whether or not Site Isolation is available.

To find out more about employing these defenses, check out Post-Spectre Web Development. It includes practical examples that explain in more detail how the security primitives discussed above apply to resources you might have on your sites.

These are useful steps you can take today to protect your origin against implicit data leaks. Looking forward, we hope to help the web shift to safer defaults which protect users against these attacks without requiring developer action.



Chrome 90 Beta: AV1 Encoder for WebRTC, New Origin Trials, and More

Unless otherwise noted, changes described below apply to the newest Chrome beta channel release for Android, Chrome OS, Linux, macOS, and Windows. Learn more about the features listed here through the provided links or from the list on ChromeStatus.com. Chrome 90 is beta as of March 11, 2021.

AV1 Encoder

Chrome desktop is shipping an AV1 encoder specifically optimized for video conferencing with WebRTC integration. The benefits of AV1 include:

  • Better compression efficiency than other types of video encoding, reducing bandwidth consumption and improving visual quality
  • Enabling video for users on very low bandwidth networks (offering video at 30kbps and lower)
  • Significant screen sharing efficiency improvements over VP9 and other codecs.

This is an important addition to WebRTC, especially since WebRTC itself recently became an official W3C and IETF standard.
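
If you want to experiment, a sketch like the following should steer a connection toward AV1 where the encoder is available; treat the "video/AV1" MIME type and codec availability as assumptions that depend on your Chrome version.

async function preferAv1(pc: RTCPeerConnection): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const transceiver = pc.addTransceiver(stream.getVideoTracks()[0]);

  const caps = RTCRtpSender.getCapabilities("video");
  if (!caps) return;

  // List AV1 codecs first so they are preferred during SDP negotiation.
  const av1 = caps.codecs.filter((c) => c.mimeType.toLowerCase() === "video/av1");
  const rest = caps.codecs.filter((c) => c.mimeType.toLowerCase() !== "video/av1");
  if (av1.length > 0) {
    transceiver.setCodecPreferences([...av1, ...rest]);
  }
}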

Origin Trials

This version of Chrome introduces the origin trials described below. Origin trials allow you to try new features and give feedback on usability, practicality, and effectiveness to the web standards community. To register for any of the origin trials currently supported in Chrome, including the ones described below, visit the Chrome Origin Trials dashboard. To learn more about origin trials in Chrome, visit the Origin Trials Guide for Web Developers. Microsoft Edge runs its own origin trials separate from Chrome. To learn more, see the Microsoft Edge Origin Trials Developer Console.

New Origin Trials

getCurrentBrowsingContextMedia()

The mediaDevices.getCurrentBrowsingContextMedia() method allows capturing a MediaStream with the current tab’s video (and potentially audio), similar to getDisplayMedia(). Unlike getDisplayMedia(), calling this new method presents the user with a simple accept/reject dialog box. If the user accepts, the current tab is captured. In all other ways, getCurrentBrowsingContextMedia() is identical to getDisplayMedia(). This origin trial is expected to run through Chrome 92.
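
A usage sketch in TypeScript: since this is an origin-trial method, it is absent from the standard DOM typings, so the cast below is an assumption about the trial surface.

// Capture the current tab after the simple accept/reject prompt.
async function captureThisTab(): Promise<MediaStream> {
  const devices = navigator.mediaDevices as any; // origin-trial API, untyped
  return await devices.getCurrentBrowsingContextMedia({ video: true });
}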

MediaStreamTrack Insertable Streams (a.k.a. Breakout Box)

An API for manipulating raw media carried by MediaStreamTracks, such as the output of a camera, microphone, screen capture, or the decoder part of a codec, and the input to the encoder part of a codec. It uses WebCodecs interfaces to represent raw media frames and exposes them using streams, similar to the way the WebRTC Insertable Streams API exposes encoded data from RTCPeerConnections. This is intended to support use cases such as:

  • Funny Hats: Refers to manipulation of media either before encoding or after decoding to provide effects such as background removal, funny hats, and voice effects.
  • Machine Learning: Refers to applications such as real-time object identification/annotation.

This origin trial is expected to run through Chrome 92.
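
For a flavor of the API, the sketch below pipes a camera track through a pass-through transform. MediaStreamTrackProcessor and MediaStreamTrackGenerator are the origin-trial interfaces; they are not in the standard typings, so they are declared loosely here.

declare const MediaStreamTrackProcessor: any; // origin-trial interface
declare const MediaStreamTrackGenerator: any; // origin-trial interface

async function processCameraTrack(): Promise<MediaStream> {
  const camera = await navigator.mediaDevices.getUserMedia({ video: true });
  const track = camera.getVideoTracks()[0];

  const processor = new MediaStreamTrackProcessor({ track }); // exposes .readable
  const generator = new MediaStreamTrackGenerator({ kind: "video" }); // exposes .writable

  // Raw VideoFrames flow through a standard TransformStream, where an app
  // could apply per-frame effects (background removal, annotation, ...).
  const transform = new TransformStream({
    transform(frame, controller) {
      controller.enqueue(frame); // pass-through; a real app would modify it
    },
  });

  processor.readable.pipeThrough(transform).pipeTo(generator.writable);
  return new MediaStream([generator]); // the generator behaves as a video track
}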

Subresource Loading with Web Bundles

Subresource loading with Web Bundles provides a new approach to loading a large number of resources efficiently, using a format that allows multiple resources to be bundled together, e.g. Web Bundles. The output of JavaScript bundlers (e.g. webpack) doesn’t interact well with browsers. They are good tools, but:

  • Their output can’t interact with the HTTP cache at any finer grain than the bundle itself (not solved with this origin trial). This can make them incompatible with new requirements like dynamic bundling (e.g. a small edit with tree shaking could invalidate everything).
  • They force compilation and execution to wait until all bytes have arrived. Ideally loading multiple subresources should be able to utilize full streaming and parallelization, but that’s not possible if all resources are bundled as one JavaScript file. (This origin trial allows compilation to proceed in parallel. For JavaScript modules, execution still needs to wait for the entire tree due to the current deterministic execution model).
  • They can require non-JavaScript resources like stylesheets and images to be encoded as JavaScript strings, which forces them to be parsed twice and can increase their size. This origin trial allows those resources to be loaded in their original form.

This origin trial is expected to run through Chrome 92.
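
For reference, a page opts in with a <link rel="webbundle"> element. The TypeScript sketch below builds one; the bundle URL and resource list are hypothetical, and the exact attribute syntax is our reading of the origin-trial explainer.

// Declare that a.js and b.js should come from the bundle rather than
// being fetched individually (all URLs are hypothetical).
const link = document.createElement("link");
link.rel = "webbundle";
link.href = "https://example.com/dir/subresources.wbn";
link.setAttribute(
  "resources",
  "https://example.com/dir/a.js https://example.com/dir/b.js"
);
document.head.appendChild(link);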

WebAssembly Exception Handling

WebAssembly now provides exception handling support. Exception handling allows code to break control flow when an exception is thrown. The exception can be any that is known by the WebAssembly module, or it may be an unknown exception that was thrown by a called imported function. This origin trial is expected to run through Chrome 94.

Completed Origin Trials

The following features, previously in a Chrome origin trial, are now enabled by default.

WebXR AR Lighting Estimation

Lighting estimation allows sites to query for estimates of the environmental lighting conditions within WebXR sessions. This exposes both spherical harmonics representing the ambient lighting, as well as a cubemap texture representing “reflections”. Adding Lighting Estimation can make your models feel more natural and like they “fit” better with the user’s environment.
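
In code, usage looked roughly like the TypeScript sketch below. The lighting-estimation interfaces are not in the standard typings, so the types are loose and the exact shapes are assumptions based on the WebXR Lighting Estimation explainer.

async function applyLighting(session: any, frame: any): Promise<void> {
  // Ask the session for a light probe (in practice, once), then query it per frame.
  const probe = await session.requestLightProbe();
  const estimate = frame.getLightEstimate(probe);
  if (estimate) {
    // Spherical harmonics approximate the ambient light; the primary light
    // direction and intensity can drive a directional light in a renderer.
    render(
      estimate.sphericalHarmonicsCoefficients,
      estimate.primaryLightDirection,
      estimate.primaryLightIntensity
    );
  }
}

declare function render(sh: Float32Array, direction: unknown, intensity: unknown): void;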

Other Features in this Release

CSS

aspect-ratio Interpolation

The aspect-ratio property allows for automatically computing the other dimension if only one of width or height is specified on any element. This property was originally launched as non-interpolable (meaning that it would snap to the target value) when animated. This feature provides smooth interpolation from one aspect ratio to another.
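
For example, with the Web Animations API you can now tween between two ratios instead of snapping (the element and timing values below are arbitrary):

const box = document.querySelector<HTMLElement>(".box");
if (box) {
  // Chrome 90 smoothly interpolates the intermediate aspect ratios.
  box.animate(
    [{ aspectRatio: "1 / 1" }, { aspectRatio: "16 / 9" }],
    { duration: 500, fill: "forwards" }
  );
}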

Custom State Pseudo Classes

Custom elements now expose their states via the state CSS pseudo class. Built-in elements have states that can change over time depending on user interaction and other factors, which are exposed to web authors through pseudo classes. For example, some form controls have the “invalid” state, which is exposed through the :invalid pseudo class. Since custom elements also have states it makes sense to expose their states in a manner similar to built-in elements.

Implement ‘auto’ value for appearance and -webkit-appearance

The default values of the CSS properties appearance and -webkit-appearance for the following form controls have changed to 'auto'.

  • <input type=color> and <select>
  • Android only: <input type=date>, <input type=datetime-local>, <input type=month>, <input type=time>, and <input type=week>

Note that the default rendering of these controls is not changed.

overflow: clip Property

The clip value for overflow results in a box’s content being clipped to the box’s overflow clip edge. In addition, no scrolling interface is provided, and the content cannot be scrolled by the user or programmatically. Additionally the box is not considered a scroll container, and does not start a new formatting context. As a result, this value has better performance than overflow: hidden.

overflow-clip-margin Property

The overflow-clip-margin property enables specifying how far outside the bounds an element is allowed to paint before being clipped. It also allows the developer to expand the clip border. This is particularly useful for cases where there is ink overflow that should be visible.
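
A small sketch of the two properties together (the selector is hypothetical):

const badge = document.querySelector<HTMLElement>(".badge");
if (badge) {
  // Clip without creating a scroll container (cheaper than overflow: hidden).
  badge.style.overflow = "clip";
  // Let ink overflow (e.g. a glow or shadow) paint up to 16px past the box.
  // setProperty avoids depending on newer CSSStyleDeclaration typings.
  badge.style.setProperty("overflow-clip-margin", "16px");
}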

Permissions-Policy Header

The Permissions-Policy HTTP header replaces the existing Feature-Policy header for controlling delegation of permissions and powerful features. The header allows sites to more tightly restrict which origins can be granted access to features.

The Feature Policy API, introduced in Chrome 74, was recently renamed to “Permissions Policy”, and the HTTP header has been renamed along with it. At the same time, the community has settled on a new syntax, based on structured field values for HTTP.
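
For instance, a server might send something like the following Express-style sketch; the policy itself, allowing geolocation for the page’s own origin and one trusted embedder while disabling the camera everywhere, is a made-up example.

import express from "express";

const app = express();
app.use((_req, res, next) => {
  // Structured-field syntax: feature=(list of allowed origins).
  res.setHeader(
    "Permissions-Policy",
    'geolocation=(self "https://trusted.example.com"), camera=()'
  );
  next();
});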

Protect application/x-protobuffer via Cross-Origin-Read-Blocking

Protect application/x-protobuffer from speculative execution attacks by adding it to the list of never sniffed MIME types used by Cross-Origin-Read-Blocking. application/x-protobuf is already protected as a never sniffed mime type. application/x-protobuffer is another commonly used MIME type that is defined as an "ALT_CONTENT_TYPE" by the protobuf library.

Seeking Past the End of a File in the File System Access API

When data is passed to FileSystemWritableFileStream.write() that would extend past the end of the file, the file is now extended by writing 0x00 (NUL). This enables creating sparse files and greatly simplifies saving content to a file when the data to be written is received out of order. Without this functionality, applications that receive file contents out of order (for example, BitTorrent downloads) would have to manually resize the file either ahead of time or as needed during writing.
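
For example, a downloader that receives a late chunk first can simply write it at its final offset (a sketch; the handle source, offsets, and names are illustrative):

async function writeChunkAtOffset(
  handle: FileSystemFileHandle,
  chunk: ArrayBuffer,
  position: number
): Promise<void> {
  const writable = await handle.createWritable();
  // Writing past the current end of file now zero-extends the file, so
  // out-of-order chunks need no manual resizing beforehand.
  await writable.write({ type: "write", position, data: chunk });
  await writable.close();
}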

StaticRange Constructor

Currently, Range is the only constructible range type available to web authors. However, Range objects are “live” and maintaining them can be expensive. For every tree change, all affected Range objects need to be updated. The new StaticRange objects are not live and represent a lightweight range type that is not subject to the same maintenance cost as Range. Making StaticRange constructible allows web authors to use them for ranges that do not need to be updated on every DOM tree change.
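
Construction takes the same four boundary values a live Range exposes (a short sketch):

const paragraph = document.querySelector("p");
const text = paragraph?.firstChild;
if (text) {
  // Snapshot the first five characters; unlike a Range, this object is
  // not updated when the DOM tree later changes.
  const range = new StaticRange({
    startContainer: text,
    startOffset: 0,
    endContainer: text,
    endOffset: 5,
  });
  console.log(range.collapsed); // false
}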

Support Specifying Width and Height on <source> Elements for <picture>

The <source> element now supports width and height attributes when used inside a <picture> element. This allows Chrome to compute an aspect ratio for <picture> elements. This matches similar behavior for <img>, <canvas>, and <video> elements.

WebAudio: OscillatorOptions.periodicWave is Not Nullable

It is no longer possible to set periodicWave to null when creating a new OscillatorNode object. This value is set on the options object passed to the OscillatorNode() constructor. The WebAudio spec doesn’t allow setting this value to null. Chrome now matches both the spec and Firefox.
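
In practice: omit periodicWave entirely for a built-in waveform, or pass a real PeriodicWave for a custom one (a short sketch):

const ctx = new AudioContext();

// Built-in waveform: simply leave periodicWave out.
const sine = new OscillatorNode(ctx, { type: "sine", frequency: 440 });

// Custom waveform: supply an actual PeriodicWave.
const wave = ctx.createPeriodicWave(
  new Float32Array([0, 0]), // real (cosine) terms
  new Float32Array([0, 1])  // imaginary (sine) terms: a pure fundamental
);
const custom = new OscillatorNode(ctx, { periodicWave: wave });

// new OscillatorNode(ctx, { periodicWave: null }) now throws, matching the
// spec and Firefox.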

JavaScript

This version of Chrome incorporates version 9.0 of the V8 JavaScript engine. It specifically includes the changes listed below. You can find a complete list of recent features in the V8 release notes.

Relative Indexing Method for Array, String, and TypedArrays

Array, String, and TypedArray now support the at() method, which supports relative indexing with negative numbers. For example, the code below returns the last item in the given array.

let arr = [1,2,3,4];
arr.at(-1);

Deprecations and Removals

This version of Chrome introduces the deprecations and removals listed below. Visit ChromeStatus.com for lists of current deprecations and previous removals.

Remove Content Security Policy Directive ‘plugin-types’

The ‘plugin-types’ directive allowed developers to restrict which types of plugins could be loaded via <embed> or <object> HTML elements, which let developers block Flash in their pages. Since Flash support has been discontinued, there is no longer any need for this policy directive.

Remove WebRTC RTP Data Channels

Chrome has removed support for the non-standard RTP data channels in WebRTC. Users should use the standard SCTP-based data channels instead.

Return Empty for navigator.plugins and navigator.mimeTypes

Chrome now returns empty values for navigator.plugins and navigator.mimeTypes. With the removal of Flash, there is no longer any need to return anything for these properties.
