
AI and the rise of the Transformers

Updated: Aug 11, 2021


No, I'm not referring to the polymorphic robots, so beloved as wannabe Christmas gifts (well, the toy versions anyway).


In my previous blog post on Natural Language Processing (NLP), I wrote about Transformers. Google introduced the world to the Transformer in 2017, and it has since transformed the way AI processes text.


In summary, the Transformer is the biggest step forward to date in AI's ability to 'understand' language and context. It is incorporated in models such as OpenAI's GPT-3, acknowledged as one of the most powerful, if not the most powerful, GPT models currently available.


GPT-3 uses a Deep Learning model that is trained on more than 410 billion tokens (a token is, for example, a character, a word, or part of a word). Training the model is estimated to have cost approximately $4.6 million in computing resources.
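To make 'token' concrete, here is a minimal sketch using the freely downloadable GPT-2 tokenizer from the Hugging Face transformers library (my own illustration, not OpenAI's code; GPT-3 uses the same byte-pair-encoding approach):

from transformers import AutoTokenizer

# GPT-2's byte-pair-encoding tokenizer; the 'Ġ' marks a leading space.
tok = AutoTokenizer.from_pretrained("gpt2")
print(tok.tokenize("The cat sat on the mat."))
# ['The', 'Ġcat', 'Ġsat', 'Ġon', 'Ġthe', 'Ġmat', '.']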



Given some text, GPT-3 assesses what the text is aiming to convey and the context in which it is stated. It uses this information to generate the next letter, word, sentence or paragraph. The result is then fed back into the model to generate the next letter, word, and so on. The quality of the generated text can be so high that it is difficult to distinguish from that of a flesh-and-blood author. So, a simple example with a generic GPT might be:


Input to a GPT model: The cat sat


Likely to be generated by the GPT model (e.g.): on the mat.


And because GPT understands context, it is highly unlikely to output: an exam.
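To illustrate the feed-back loop described above, here is a minimal sketch using the Hugging Face transformers library, with the small, freely downloadable GPT-2 model standing in for GPT-3 (my own illustration; GPT-3 itself is only available through OpenAI's paid API):

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The cat sat", return_tensors="pt").input_ids
for _ in range(5):
    with torch.no_grad():
        logits = model(ids).logits
    next_id = logits[0, -1].argmax()                    # greedy: pick the most likely next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)   # feed it back in as new input
print(tokenizer.decode(ids[0]))

Greedy selection (always taking the single most likely token) is the simplest decoding strategy; real systems usually sample from the probability distribution to get more varied text.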


The GPT-generated text is not just limited to prose, as we shall see.


The main downside of a GPT such as GPT-3 is that it is not free to use. Hardly surprising, given the resources that have gone into its creation.


Enter GPT-Neo and GPT-J....




A GPT series of Transformers has been created by the organisation EleutherAI (EAI), whose objective is to provide Open Source GPT Transformers. And they've made an extremely impressive start.


Already available is their GPT-Neo-2.7B model, which sports 2.7 billion tuneable parameters. That may not sound like much compared with GPT-3's 175 billion, but bear in mind that OpenAI's preceding model, GPT-2, had 1.5 billion parameters.


If you follow the link to EleutherAI's web page on Hugging Face (a link is available at the bottom of this article), you will see a box for entering your own input text. I entered The cat sat on the mat.


When I clicked on Compute, the model returned a continuation: words that fit the context of the preceding text.
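If you would rather run the model yourself than use the web page, a sketch along these lines should work (assuming the transformers library is installed and your machine has the roughly 10 GB of memory the weights require):

from transformers import pipeline

# Downloads the GPT-Neo-2.7B weights from Hugging Face on first run.
generator = pipeline("text-generation", model="EleutherAI/gpt-neo-2.7B")
print(generator("The cat sat on the mat", max_length=30)[0]["generated_text"])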



Moving up a gear to GPT-J-6B...


In June 2021, EleutherAI also made their 6-billion-parameter model available for Open Source use. It has been trained on 825 GiB of data (including Wikipedia), representing over 400 billion tokens.


I have initialised and run the model against three different types of text example:

  1. A very short story. I provided a starting sentence and the model generated related text;

  2. A program. I described the (simple) program I wanted and the programming language to use (Python), and it returned a piece of related program code;

  3. A list: 5 ways to combat Climate Change.


These are demonstrated below. The only part I changed in each case is the text labelled context, which is then provided to the GPT-J-6B model as input text.
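For the curious, initialising the model looks roughly like this (a sketch, assuming a version of the transformers library with GPT-J support and a machine with enough memory for the ~24 GB of full-precision weights):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

def generate(context, max_new_tokens=100):
    # Encode the context, sample a continuation, and decode it back to text.
    ids = tokenizer(context, return_tensors="pt").input_ids
    output = model.generate(ids, do_sample=True, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0])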



1. A very short story...



...and a few seconds later it generated this (my initial text is in bold):




2. A program. Calculating the circumference and area of a circle of radius 'Radius1'.


...it generated the following output:

The generated code works and does precisely what I requested (assuming 'pi' has previously been defined)!
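For reference, a correct piece of Python along the lines of what I asked for looks like this (my own hand-written version, not the model's verbatim output, with 'pi' taken from the math module rather than assumed to be predefined):

import math

Radius1 = 5.0  # example radius

circumference = 2 * math.pi * Radius1
area = math.pi * Radius1 ** 2

print("Circumference:", circumference)
print("Area:", area)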



3. 5 ways to combat Climate Change



...generated:



In summary....


GPT NLP models continue to make great strides, and they are becoming increasingly effective at 'understanding' the context of text. This, coupled with their potentially huge knowledge bases, gives them the makings of very powerful and disruptive tools.


Although GPT-3 is the most powerful model out there, the impressive GPT-J-6B is more accessible and available to all.


Try the GPT-Neo-2.7B model out for yourself. Enter some text and see what it generates. Click this link to go to GPT-Neo-2.7B
