In recent years, the patent industry has begun to use machine-learning (ML) algorithms to add efficiency and insight to business practices. 

Any company, patent office, or academic institution that works with patents (generating them through innovation, processing applications for them, or developing sophisticated ways to analyze them) can benefit from doing patent analytics and machine learning in Google Cloud. 

Today, we’re excited to release a white paper that outlines a methodology for training a BERT (Bidirectional Encoder Representations from Transformers) model on over 100 million patent publications from the U.S. and other countries using open-source tooling. The paper describes how to use the trained model for a number of use cases, including performing prior art searches more effectively to determine the novelty of a patent application, automatically generating classification codes to assist with patent categorization, and autocomplete. The white paper is accompanied by a colab notebook as well as the trained model, hosted in GitHub. 
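The white paper's own prior art pipeline is not reproduced here. As a minimal sketch, assuming each publication has already been converted into a fixed-length embedding by a trained encoder such as the patent BERT model, candidate prior art can be ranked by cosine similarity to the embedding of the application under review. The function names, publication IDs, and vector values below are hypothetical toy data chosen for illustration only.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_prior_art(
    query: List[float], corpus: Dict[str, List[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_k (publication ID, score) pairs most similar to the query embedding."""
    scored = [(pub_id, cosine_similarity(query, vec)) for pub_id, vec in corpus.items()]
    # Highest similarity first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical 4-dimensional "embeddings"; real BERT embeddings have hundreds of dimensions.
corpus = {
    "PUB-A": [0.9, 0.1, 0.0, 0.2],
    "PUB-B": [0.1, 0.8, 0.3, 0.0],
    "PUB-C": [0.8, 0.2, 0.1, 0.3],
}
query = [0.85, 0.15, 0.05, 0.25]  # hypothetical embedding of the application under review

for pub_id, score in rank_prior_art(query, corpus, top_k=2):
    print(f"{pub_id}: {score:.3f}")
```

At the scale of 100 million publications, a brute-force loop like this would typically be replaced by an approximate nearest-neighbor index; the ranking idea stays the same.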

Google’s release of the BERT model (paper, blog post, and open-source code) in 2018 was an important breakthrough that leveraged transformers to outperform other leading state-of-the-art models across major NLP benchmarks, including GLUE, MultiNLI, and SQuAD. Shortly after its release, the BERT framework and many additional transformer-based extensions gained widespread industry adoption across domains like search, chatbots, and translation.

We believe that the patents domain is ripe for the application of algorithms like BERT because of both the technical characteristics of patents and their business value. Technically, the patent corpus is large (millions of new patents are issued every year worldwide), complex (patent applications generally average ~10,000 words and are often meticulously wordsmithed by inventors, lawyers, and patent examiners), unique (patents are written in a highly specialized ‘legalese’ that can be unintelligible to a lay reader), and highly context dependent (many terms are used to mean completely different things in different patents). 

Patents also represent tremendous business value to many organizations: corporations spend tens of billions of dollars a year developing patentable technology and transacting the rights to use the resulting technology, and patent offices around the world spend additional billions of dollars a year reviewing patent applications.

We hope that our new white paper and its associated code and model will help the broader patent community in its application of ML, including:

  • Corporate patent departments looking to improve their internal models and tooling with more advanced ML techniques.

  • Patent offices interested in leveraging state-of-the-art ML approaches to assist with patent examination and prior art searching.

  • ML and NLP researchers and academics who might not have considered using the patents corpus to test and develop novel NLP algorithms.

  • Patent researchers and academics who might not have considered applying BERT or other transformer-based approaches to their study of patents and innovation.

To learn more, you can download the full white paper, colab notebook, and trained model. Additionally, see "Google Patents Public Datasets: Connecting Public, Paid, and Private Patent Data," "Expanding your patent set with ML and BigQuery," and "Measuring patent claim breadth using Google Patents Public Datasets" for more tutorials to help you get started with patent analytics in Google Cloud.
