Many machine learning algorithms are trained on a large dataset, and if you later wish to add a few more data points, the algorithm must be rerun on the entire set from scratch. Even so-called biologically inspired ones such as neural networks (which are incrementally updatable, though sometimes new data will cause you to unlearn old data…) generally require hundreds of iterations over the same set of data to converge gradually. Other models that can be incrementally updated are order-invariant: introducing data to the model in a different order will produce the same output. Often (e.g., in Bayesian/’rational’ models) this is seen as desirable (theoretically, we don’t want the order in which evidence is presented in a court case to influence the jury’s decision), but it departs widely from human behavior. These deviations from order-invariance are often decried as fallacies (though see the heuristics view), but order effects in associative learning studies (e.g., blocking) can be understood to stem from shifting attention driven by familiarity, surprisal, and other factors, all of which make sense given the constraints of in situ cognition. I have investigated and modeled order effects in associative learning (e.g., highlighting) as well as in word learning. The advantage of a fast learning rate is that it allows us to quickly test and exploit correlations that we notice in our environment. I am currently working on modeling the starting-small effect in grammar learning, and I’m generally interested in asking whether and when it can be advantageous (optimally, over time, or computationally) to learn via incremental updating instead of batch updating.
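The contrast between order-invariant and order-sensitive incremental learning can be made concrete with a small sketch (the threshold and rate values here are illustrative, not from any particular study): conjugate Bayesian updating of a Beta prior on a coin’s bias gives the same posterior under any permutation of the data, while an exponentially weighted estimator with a fast learning rate weights recent observations more heavily and so depends on presentation order.

```python
from math import isclose

def beta_binomial_posterior(data, alpha=1.0, beta=1.0):
    """Incrementally update a Beta(alpha, beta) prior with 0/1 observations.
    Conjugate updating just counts successes and failures, so any
    permutation of the data yields the same posterior parameters."""
    for x in data:
        alpha += x
        beta += 1 - x
    return alpha, beta

def fast_incremental_estimate(data, rate=0.5, estimate=0.5):
    """An exponentially weighted running estimate with a fast learning rate.
    Recent observations dominate, so the result is order-sensitive."""
    for x in data:
        estimate += rate * (x - estimate)
    return estimate

data = [1, 1, 1, 0, 0, 0]
rev = list(reversed(data))

# Same posterior either way:
print(beta_binomial_posterior(data) == beta_binomial_posterior(rev))   # True
# The fast learner ends up near whatever came last:
print(isclose(fast_incremental_estimate(data), fast_incremental_estimate(rev)))  # False
```

The fast learner’s order-sensitivity is exactly what buys quick exploitation of newly noticed correlations, at the cost of the order-invariance that normative models prize.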
From my study of cognitive psychology, I find the following learning principles to be not only evident in humans but also desirable in any mind that must act now, even as it learns:
- New stimuli are compared to existing representations, then either directly integrated into existing structures (if unsurprising, or a good fit) or, if dissimilar to everything stored, used to seed a new cluster
- Repeated encounters with similar enough information should strengthen the representation, whereas…
- One-off occurrences should be stored in episodic memory, but only make the transfer to semantic memory if additional similar stimuli come along to justify a new entry/’cluster’
- There must be a way to periodically (developmentally and perhaps during sleep) reorganize mental representations: splitting/joining concepts; adding/removing/combining feature dimensions
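The principles above can be sketched as a minimal incremental clusterer. Everything here is an illustrative assumption (the cosine-similarity measure, the similarity threshold, the count needed to promote an ‘episodic’ one-off to a ‘semantic’ cluster, and the merge-based reorganization pass), not a claim about any specific model:

```python
import math

SIM_THRESHOLD = 0.8   # assumed: how similar a stimulus must be to join a cluster
PROMOTE_COUNT = 2     # assumed: encounters needed before "semantic" status

def similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class IncrementalClusterer:
    def __init__(self):
        self.clusters = []  # each: {"prototype": [...], "count": n}

    def observe(self, stimulus):
        """Compare a new stimulus to existing prototypes; fold it into the
        best-matching cluster (strengthening it), or start a new one-off
        cluster if it is dissimilar to everything stored."""
        best = max(self.clusters,
                   key=lambda c: similarity(c["prototype"], stimulus),
                   default=None)
        if best and similarity(best["prototype"], stimulus) >= SIM_THRESHOLD:
            n = best["count"]
            best["prototype"] = [(p * n + x) / (n + 1)
                                 for p, x in zip(best["prototype"], stimulus)]
            best["count"] += 1
        else:
            self.clusters.append({"prototype": list(stimulus), "count": 1})

    def semantic(self):
        """Clusters confirmed by repeated, similar-enough encounters;
        count-1 clusters remain 'episodic'."""
        return [c for c in self.clusters if c["count"] >= PROMOTE_COUNT]

    def reorganize(self):
        """Offline ('sleep') pass: merge clusters whose prototypes have
        drifted close together, joining concepts that belong together."""
        merged = []
        for c in self.clusters:
            target = next((m for m in merged
                           if similarity(m["prototype"], c["prototype"]) >= SIM_THRESHOLD),
                          None)
            if target:
                n, m = target["count"], c["count"]
                target["prototype"] = [(a * n + b * m) / (n + m)
                                       for a, b in zip(target["prototype"], c["prototype"])]
                target["count"] += m
            else:
                merged.append(c)
        self.clusters = merged

clusterer = IncrementalClusterer()
for stimulus in ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]):
    clusterer.observe(stimulus)
# The two similar stimuli merge into one strengthened cluster; the third
# stays an episodic one-off:
print(len(clusterer.clusters), len(clusterer.semantic()))  # 2 1
```

Splitting concepts and adding or removing feature dimensions would need a richer representation than a single prototype vector; this sketch only covers the compare-integrate-or-spawn loop, repetition-based strengthening, and merge-style reorganization.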