Model of learning and covert attention
The model is a neural network in which learning and selective attention take place simultaneously. Attention not only focuses momentarily on relevant targets but also guides learning. Attention emerges from local competition in all parts of the network; the network implements the biased-competition model. The network is hierarchical, with alternating feature expansion and dimension reduction layers. The upper levels develop increasingly global and invariant representations.
The network can be fed many kinds of information (e.g. visual, auditory or linguistic, or all of these). It then learns to represent independent objects or dynamical events in the data, and learns to selectively attend to a limited set of targets at a time.
A schema of the network is illustrated in Figure 1. The whole network consists of smaller networks (units), which could roughly correspond to the hypercolumns in the cortex. The neurons in one such unit receive the same set of inputs, which they learn to represent. A unit can have, for example, 10-300 neurons. These neurons compete with each other softly: a few neurons with the most excitation stay active after the competition. Contextual connections bias this competition.
Figure 1. Architecture of the model.
The units can receive many kinds of contextual input (bias):
- Lateral bias from the neighbouring units leads to Gestalt-like continuous representations across different units.
- Temporal context leads to temporally regular representations.
- Context from the layers above allows the unit to choose those features that are relevant for the high-level intentions of the cortex.
- Long-range connections can mediate context that allows coherent information to be represented across modalities. For instance, the motor cortex can bias the visual cortex to represent objects that relate to the current actions of the animal.
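The biased competition inside one unit can be illustrated with a minimal sketch. This is not the model's actual update rule: the softmax, the hard cap `k` on the number of active neurons, and the `gain` parameter are assumptions chosen only to show how a contextual bias can decide between equally excited neurons.

```python
import math

def biased_competition(bottom_up, bias, k=2, gain=1.0):
    """Activations of one unit after biased soft competition (illustrative).

    bottom_up: excitation each neuron gets from the unit's shared inputs.
    bias: summed contextual input (lateral, temporal, top-down, ...) per neuron.
    k: number of neurons allowed to stay active (assumed hard cap).
    gain: sharpness of the softmax (assumed parameter).
    """
    excitation = [b + c for b, c in zip(bottom_up, bias)]
    # Soft competition: softmax over the total excitation.
    peak = max(excitation)
    exps = [math.exp(gain * (e - peak)) for e in excitation]
    total = sum(exps)
    soft = [x / total for x in exps]
    # Only the k most excited neurons stay active; the rest are silenced.
    order = sorted(range(len(soft)), key=lambda i: soft[i], reverse=True)
    winners = set(order[:k])
    return [soft[i] if i in winners else 0.0 for i in range(len(soft))]

# Neurons 0 and 1 receive equal bottom-up excitation.
bottom_up = [1.0, 1.0, 0.2, 0.1]
# A small contextual bias towards neuron 1 decides the competition.
with_bias = biased_competition(bottom_up, [0.0, 0.5, 0.0, 0.0], k=1)
```

In this toy setting the bias does not create activity on its own; it only tips the balance between candidates that the bottom-up input already supports.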
The network has dynamics on three different time-scales:
- Instantaneously, selective attention focuses on different targets.
- Associations between different features are learnt fast. Because attention is coherent, the features that are represented simultaneously usually relate to each other, even if they are in different sensory modalities. Therefore, associations between them can be learnt easily. These associations are then used as one form of context for the competing neural populations.
- Slow tuning of features. The bottom-up weights are adapted in Hebbian style. Attention and lateral associations guide this learning to form relevant invariances and features that represent independent components/objects in the world.
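The slow time-scale can be sketched as a Hebbian update gated by the neuron's activity after competition. This is a hedged illustration, not the rule used in the thesis: the learning rate, the weight normalisation, and the function name `hebbian_step` are assumptions.

```python
import math

def hebbian_step(weights, x, y, rate=0.01):
    """One slow Hebbian update for a single neuron (illustrative).

    weights: the neuron's bottom-up weights.
    x: the input vector to the unit.
    y: the neuron's activity AFTER competition, so attention gates learning.
    rate: learning rate (assumed value).
    """
    updated = [w + rate * y * xi for w, xi in zip(weights, x)]
    # Normalisation keeps the weights bounded (an assumed choice).
    norm = math.sqrt(sum(w * w for w in updated))
    return [w / norm for w in updated] if norm > 0 else updated

w = [0.6, 0.8]
x = [1.0, 0.0]
# A neuron that won the competition moves its weights towards the input.
w_attended = hebbian_step(w, x, y=1.0, rate=0.1)
# A neuron that lost (activity 0) does not learn from this input.
w_ignored = hebbian_step(w, x, y=0.0, rate=0.1)
```

Because the update is proportional to the post-competition activity, only the features that attention selected are tuned towards the current input.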
A more detailed description of the model can be found in my master's thesis.
The experiments I have done with this model are with abstract vectors. I have only used static data. Therefore things such as novelty detection or learning of dynamics or temporal pattern recognition haven't yet been tested. I am now working with real visual data.
Coherent selection of population codes
In this experiment, the selection and information integration capabilities of the network were tested. The result was that the network is able to select between different population codes. While neurons compete only on a local level, global competition emerges between large population codes. In addition, when a new target wins the competition in some part of the network, it can cause changes in distant parts of the network; the whole network can start to represent that specific target.
The bottom-up weights were not adapted, only the contextual (association) weights. The network had two distinct "modalities" with one layer, and then one "association area", that received inputs from both of the former layers. The two lower layers could not communicate directly, but only through the association area (first bottom-up connection to the association area, then contextual connections from there to the other modality).
The network was trained on a set of random vectors. The first half of each vector was fed to the first modality and the second half to the second modality.
After the network had learnt these vectors, it was fed a superposition of two different vectors it had learnt. Let us call these vectors "dog" and "cat". The first modality was fed the sum of dog and cat, but in the second modality the sum was weighted. For example, the second modality could receive the input 0.2 * dog + 0.8 * cat. The mixture ratio of cat and dog in the second modality was varied.
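The construction of the test input can be sketched as follows. The vector length (8), the random seed, and the function name `make_inputs` are assumptions for illustration; the text only specifies that the first modality gets the plain sum and the second a weighted sum.

```python
import random

def make_inputs(dog, cat, ratio):
    """Build the inputs for the two modalities (illustrative).

    dog, cat: two learnt vectors of equal, even length.
    ratio: weight of cat in the second modality; dog gets 1 - ratio.
    """
    half = len(dog) // 2
    # First modality: unweighted sum of the first halves.
    first = [d + c for d, c in zip(dog[:half], cat[:half])]
    # Second modality: weighted sum of the second halves.
    second = [(1 - ratio) * d + ratio * c
              for d, c in zip(dog[half:], cat[half:])]
    return first, second

rng = random.Random(0)          # fixed seed, assumed for repeatability
dog = [rng.random() for _ in range(8)]
cat = [rng.random() for _ in range(8)]

# Example from the text: 0.2 * dog + 0.8 * cat in the second modality.
first, second = make_inputs(dog, cat, ratio=0.8)
```

Sweeping `ratio` from 0 to 1 corresponds to moving along the x-axis of the mixture-ratio plot.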
Figure 2 shows the results. The second modality represents both cat and dog as population vectors. There is soft selection, though: around mixture ratio 0.5, the representation strengths vary more steeply than the inputs. When the association area receives this information, it performs even more selection. Finally, this information travels to the first modality through contextual connections, where it causes selection between equally strong inputs.
Figure 2. The top panel depicts the representation strengths in the association area. The bottom panels depict the representation (solid lines) and input (dashed lines) strengths of the two modalities. The y-axis shows the strength and the x-axis the mixture ratio of the two input vectors in the second modality.
In this experiment, the bottom-up weights were adaptive, too. The network had six layers, with alternating feature expansion (simple cell) and dimension reduction (complex cell) layers. The data consisted of 204-dimensional vectors, which were generated randomly from 5 different object clusters. Each object had hundreds of thousands of different instantiations. The network learnt to represent the object clusters invariantly with respect to the actual appearance of the object, although it was not given the class information as a teaching signal.
A more detailed description can be found in my thesis.
In this experiment, the same network as in the invariance experiment was used. This time, the association weight synapses were adaptive, such that they lose strength temporarily while they are active. This causes a population code to lose the competition after it has been represented for a while, and attention then jumps to a new target.
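The switching mechanism can be illustrated with a toy simulation of two competing population codes. It is only a sketch of the idea that active association weights fatigue: the multiplicative decay, the recovery rule, the starting weights and all parameter values are assumptions, and the real network is far larger.

```python
def simulate_switching(steps=40, decay=0.2, recovery=0.05):
    """Two population codes compete; the winner's context weight fatigues.

    decay: fraction of strength an active association weight loses per step.
    recovery: rate at which an inactive weight returns towards 1.0.
    Both values are assumed, not taken from the experiment.
    """
    inputs = [1.0, 1.0]   # ambiguous input: equal bottom-up support
    assoc = [1.0, 0.9]    # association weights (assumed starting values)
    winners = []
    for _ in range(steps):
        support = [i * a for i, a in zip(inputs, assoc)]
        win = 0 if support[0] >= support[1] else 1
        winners.append(win)
        for j in range(2):
            if j == win:
                # Active synapses temporarily lose strength.
                assoc[j] *= (1.0 - decay)
            else:
                # Inactive synapses recover towards full strength.
                assoc[j] += recovery * (1.0 - assoc[j])
    return winners

winners = simulate_switching()
```

With ambiguous input, neither code can hold the competition indefinitely in this sketch, so the winner changes over time. How long a code stays represented before the switch depends on the assumed decay and recovery rates.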
Figure 3. Strength of representation of different objects as a function of time. The left figure is with ambiguous input. The right figure is with only one object as input (the "leg" object).