
This work was sponsored by enioka.
The task one is:
Given a finite set of objects
pool, and an objectquery. Look, which ever comes first, within approximatelyttime, or at mostnobjects subset ofpoolthat are nearqueryaccording to the metricm, call that subsetcandidates.
Most popular approach nowadays, and the one I could start to get under control is to rely on Hierarchical Navigable Small Worlds. HNSW is reminiscent of Ben Goertzel glocals from opencog fame.
HNSW datastructure use a deterministic heuristic with the distance function to construct clusters, still, it use random to spread pression on memory, and compute. It looks like a skip-list with multiple path ways. A regular natural sort will not speed queries, because of that glocals are not sorted, instead nodes are related to points in a higher level layer. The higher the level the less points they are. The highest level has only one point. Each point has a maximum x neighbor in a lower level. Horizontal navigation must be done through the parent point.
While I put that whole on a flashcard, you can delice with the following brute force algorithm that will generate a pool of points nearest to a point query:
(import (chezscheme))
(import (letloop r999))
(import (scheme generator))
;; brute force algorithm
(define total (string->number (or (getenv "HYPER_TOTAL") "65536")))
(define exponent (string->number (or (getenv "HYPER_EXPONENT") "8")))
(define pk
(lambda args
(write args)
(newline)
(car (reverse args))))
(define point-distance
(lambda (a b) (abs (- a b))))
(define point-random (lambda () (random (expt 2 exponent))))
(define pool (map (lambda _ (point-random)) (iota total)))
(define query (point-random))
(define ~.two
(lambda (x)
(inexact (/ (round (* x 100)) 100))))
(define candidates
(let* ((objects (map (lambda (x) (cons (point-distance query x) x)) pool))
(objects* (sort (lambda (u v) (< (car u) (car v))) objects))
(distance-max (apply max (map car objects*))))
(map (lambda (x) (cons (~.two (/ (- distance-max (car x)) distance-max))
(cdr x)))
objects*)))
(pk 'query (string-upcase (number->string query 36)) query)
(newline)
(for-each pk (map (lambda (x) (list (car x) (number->string (cdr x) 36) (cdr x))) candidates))