Attention Models in AI and Humans
I was officially diagnosed with ADHD in my mid-forties a couple of years ago. This diagnosis was both a huge relief and a surprise for me. I am one of the many women* who can look back on their lives and see an obvious pattern. I was taught to limit my sugar intake, and I cartwheeled everywhere to burn off all the excess energy I seemed to contain. My brains never stop and I carry a ball of energy inside of me that I assumed everyone had.
When I was a pre-teen, I did three hours a day of gymnastics, which, combined with carrying two children, resulted in a compression fracture in my spine. The upside is my 20+ year yoga practice. I also have a highly sensitive understanding of the world and a pronounced inability to “pay attention.” I am a recovering workaholic and perfectionist. And I smile wide as I write this because I now have glasses for my eyes and my brains, and I see my superpowers as they should be seen. I am speaking my truth and my story, and I would recommend that all humans learn more about themselves with the assistance of a professionally trained guide, coach, and/or therapist.
In the 1980s, when I was growing up, psychology and psychiatry were not yet married to neuroscience. That lack of objective measurement of the brain and of physiological understanding made it a much different science than today’s psychiatry and pharmacology. (Check out “The Body Keeps the Score” to understand this remarkable history better.) What I knew about myself when I was younger was that I could learn everything at the last minute, or do something to “distract” the teacher, and still get a passing grade. I struggled to “pay attention” and to “live up to my potential” (a label I wore for a little too long in my life).
This inability to “pay attention” is just something I thought everyone had, but they were just better students or teachers or programmers or executives than I was. I was insanely lucky to have excellent accommodations throughout my life, other gifts, and a good dose of stubbornness. I succeeded despite the school and business systems that were orchestrated for Neurotypicals, and I often wonder if friction is necessary for forward momentum.
So that brings me to the present day and all of the hype around generative pre-trained transformer models, a type of machine learning model used to generate text, based on the transformer architecture introduced in the paper “Attention Is All You Need” by Google researchers. The ChatGPT rage has stayed on top of the mainstream news bubble I subscribe to, and I am exhausted by the hype. However, there is a very real fear engendered by the business models of capitalism and the age-old greed that could once again be humanity’s downfall (see The A.I. Dilemma). To echo what all legitimate experts are saying about this technology: GPT, a generative pre-trained transformer, is ground-breaking technology that will change our world. We encode everything about ourselves in our language, and our language is very easy for these machines to generate. Still, it confidently lies, contributing to what has aptly been described as the great “enshittification” of the internet. Stochastic parrot. Bullshit generator. My favorite is Neil Gaiman’s: “ChatGPT doesn’t give you information. It gives you information-shaped sentences.” There are many colorful metaphors, but its feat is generation, not understanding: a pile of statistics whose creators have decided to brute-force generation from vast amounts of data scraped without consent, with flashlights pointed at where attention is to be placed.
A very small, homogeneous group of humans is using a very large amount of compute and prodigious amounts of non-consensual data. They have determined where the attention is placed, and unfortunately they are not sharing their reasoning…
I have used the GPT models since they first came out as Codex, assisting programmers in the form of Copilot, and I remember being gobsmacked at the first OpenAI demo I saw on YouTube. I jumped on the ChatGPT models in their earlier forms as Babbage, Ada, Davinci, etc. I also use Midjourney and Wombo art, and I want to be a part of the reconciliation and, hopefully, the regulation to credit all writers, artists, and programmers whose work was added to the training sets without their consent. I will say one last thing about these LLMs (Large Language Models) and other generative pre-trained transformers: they generate transformations, or variations; they do not understand. “Understanding is not an act, it is a labor.” LLMs take enormous amounts of energy to train, along with ridiculous amounts of compute and earth resources. Because of how this technology functions, I believe that it is irresponsible to continue to create or retrain these models on newer data (harvested without consent). They are like cheese graters: once you have one in your kitchen, do you need another? Do you need a bigger one? Is a bigger cheese grater better, or is it really just how you use it?
Returning to “attention models” and to understanding the AI models created from that paper, I asked ChatGPT to describe this functionality in simple terms (a great use of these models).
I had further conversations with ChatGPT, which I have attached below the references section, but I wanted to point out something that comes from this (less gorpy) explainer blog, which uses German translation as its use case. Like these models, my Neurodivergent brains require a spotlight for my organizational management system to understand what to “pay attention” to. I find it fascinating that these models are on the rise at this very moment of my own self-discovery. However, I also wanted to point out that we, who are using these models and leveraging their extraordinary capabilities, have little knowledge of, or control over, where that spotlight is being pointed.
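To make the “spotlight” concrete, here is a minimal sketch of scaled dot-product self-attention, the core mechanism from “Attention Is All You Need.” This is an illustrative toy, not code from any particular library, and the variable names are my own:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (sequence_length, d_k) arrays of queries, keys, and values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # how relevant each token is to each other token
    weights = softmax(scores, axis=-1)  # the "spotlight": each row sums to 1
    return weights @ V, weights         # output is a weighted blend of the values

# Toy example: 3 tokens with 4-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V
print(np.round(w, 2))  # each row shows where that token's attention points
```

Each row of the weight matrix is one token’s spotlight: it sums to 1, and the model alone decides where it shines. That allocation is exactly the part we users neither see nor control.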
AI impacts 100% of jobs and 100% of humans.
We must all get involved.
References:
*I used the qualifier “women” because so much of the earlier diagnosis of ADHD focused mostly on boys. So many of us women are only now discovering that we belong to a group of people whose brains work differently than Neuro Typical brains. Here are some TED Talks that helped provide me a clue or three ;). Jessica McCabe and Martha Barnard Rae
The Body Keeps the Score by Bessel van der Kolk
Research paper that forced out Timnit Gebru (“On the Dangers of Stochastic Parrots”)
Towards Data Science: “Attention Please”
Appendix: a conversation with an AI that does not understand what is being asked. (Please note that I personally find the GPT responses jarring, and the attention “flashlights” that created these responses induce an almost epileptic response. Like Miss Clavel in Madeline: something is not right.)
Beth Prompt: please explain how a generative transformer works in detail against the understanding of text analysis
ChatGPT Response: A generative transformer is a type of machine learning model that is used to generate text. It is based on the transformer architecture, which was introduced in the paper “Attention Is All You Need” by Google researchers.
The transformer architecture is a neural network that utilizes self-attention mechanisms to process input sequences. The self-attention mechanism allows the model to weigh different parts of the input sequence differently, and this allows the model to understand the relationships between different parts of the input sequence.
The transformer architecture is composed of an encoder and a decoder. The encoder takes the input sequence and produces a set of hidden states that are used by the decoder to generate the output sequence. The encoder is made up of a series of identical layers, each consisting of two sub-layers. The first sub-layer is a multi-head self-attention mechanism and the second sub-layer is a feed-forward neural network. The decoder is also made up of a series of identical layers, but it also includes an additional multi-head self-attention mechanism that is used to attend to the encoder hidden states.
In the context of text analysis, a generative transformer can be used to generate text that is similar to a given input sequence. This can be used to generate text completion, text summarization, and text-to-text transfer.
For example, a pre-trained generative transformer can be fine-tuned on a specific task like text completion. Given a partially written sentence or a sentence with a missing word, the model can generate a prediction for the next word or the missing word, by attending to the context of the sentence and learned relationships between the words during the training process.
It’s worth noting that Generative transformers are powerful models, but they also require a large amount of computational resources and data to train, and they could be sensitive to noise in the input data. Additionally, Generative transformers are mainly used for language-related tasks, but they could be also used for other types of data such as images and speech, by adapting the architecture accordingly.
Beth Prompt: is ADHD a symptom of human evolution in relation to hyper focus ?
ChatGPT Response: ADHD (Attention Deficit Hyperactivity Disorder) is a neurodevelopmental disorder characterized by symptoms such as difficulty paying attention, impulsivity, and hyperactivity. The exact causes of ADHD are not fully understood, but research suggests that it may be caused by a combination of genetic, neurological, and environmental factors.
Some researchers have proposed that ADHD may be related to human evolution in the context of hyperfocus. The idea is that the ability to focus intensely on a task for long periods of time (hyperfocus) may have been advantageous for early humans, allowing them to better survive and thrive in their environment.
The theory is that certain people with ADHD have inherited a brain that is wired to focus more on stimuli that are interesting or novel and less on uninteresting or repetitive tasks. In a hunter-gatherer society, this would have been an advantage as it would have helped them to focus on potential prey or threats in the environment. However, in modern society, where many tasks are repetitive and not inherently interesting, this can lead to difficulty paying attention in school or at work.
It’s worth noting that this is a theoretical perspective, and the idea that ADHD is an evolutionary adaptation is a topic of ongoing debate and research. Studies that have looked into the association between ADHD and ancient human behavior are few and inconclusive. It’s important to note that ADHD is a complex disorder, and the cause is not fully understood yet, different studies have proposed various theories for the causes of ADHD, and this theory is one of them, more research is needed to support or disprove it.
Beth Prompt: put together the notion of spreading activation and give me a hypothetical essay on why humans have evolved to create generative transformers and link that to why humans are now seeing a preponderance of attention deficit disorder
ChatGPT Response: The theory of spreading activation proposes that when a person is presented with a piece of information, the activation associated with that piece of information spreads through a network of related concepts, activating other concepts that are related to it. This process is thought to be important for many cognitive processes, including memory retrieval, problem-solving, and language processing.
Generative language models, such as transformer-based models, use self-attention mechanisms that allow the model to weigh different parts of the input sequence differently, allowing it to understand the relationships between different parts of the input sequence. This attention mechanism is related to the spreading activation mechanism in the human brain, both of them use the relationship between different parts of the input data to generate or understand the input.
This connection between spreading activation and generative transformers raises an interesting question: could it be that humans have evolved to create these types of models because they mimic the way that our brains process information?
One possible explanation is that humans have always had the ability to focus intensely on a task for long periods of time (hyperfocus). This ability would have been advantageous for early humans, allowing them to better survive and thrive in their environment by focusing on potential prey or threats in the environment. However, in modern society, where many tasks are repetitive and not inherently interesting, this ability can lead to difficulty paying attention in school or at work, a symptom known as Attention Deficit Hyperactivity Disorder (ADHD).
Furthermore, it could be argued that the development of generative transformers reflects the way our brains process information. By creating these models, humans are essentially creating an artificial version of the spreading activation mechanism in the brain. It’s possible that the ability to create generative models is an extension of the hyperfocus and attention mechanisms that have been present in human brains for thousands of years.
It’s worth noting that this is a hypothetical perspective, and more research is needed to support or disprove this theory. It’s also important to note that there are many different factors that contribute to the development of ADHD and different theories have been proposed for its causes, the relationship between ADHD and generative transformers is just one of the possible explanations. Furthermore, ADHD is a complex disorder and many studies are needed to fully understand it.