v hyper.dev · nn - 1 - exact brute force rank

This work was sponsored by enioka.

The task one is:

Given a finite set of objects pool, and an object query. Look, which ever comes first, within approximately t time, or at most n objects subset of pool that are near query according to the metric m, call that subset candidates.

Most popular approach nowadays, and the one I could start to get under control is to rely on Hierarchical Navigable Small Worlds. HNSW is reminiscent of Ben Goertzel glocals from opencog fame.

HNSW datastructure use a deterministic heuristic with the distance function to construct clusters, still, it use random to spread pression on memory, and compute. It looks like a skip-list with multiple path ways. A regular natural sort will not speed queries, because of that glocals are not sorted, instead nodes are related to points in a higher level layer. The higher the level the less points they are. The highest level has only one point. Each point has a maximum x neighbor in a lower level. Horizontal navigation must be done through the parent point.

While I put that whole on a flashcard, you can delice with the following brute force algorithm that will generate a pool of points nearest to a point query:

(import (chezscheme))
(import (letloop r999))
(import (scheme generator))

;; brute force algorithm

(define total (string->number (or (getenv "HYPER_TOTAL") "65536")))
(define exponent (string->number (or (getenv "HYPER_EXPONENT") "8")))

(define pk
  (lambda args
    (write args)
    (newline)
    (car (reverse args))))

(define point-distance
  (lambda (a b) (abs (- a b))))

(define point-random (lambda () (random (expt 2 exponent))))

(define pool (map (lambda _ (point-random)) (iota total)))
(define query (point-random))

(define ~.two
  (lambda (x)
    (inexact (/ (round (* x 100)) 100))))

(define candidates
  (let* ((objects (map (lambda (x) (cons (point-distance query x) x)) pool))
         (objects* (sort (lambda (u v) (< (car u) (car v))) objects))
         (distance-max (apply max (map car objects*))))
    (map (lambda (x) (cons (~.two (/ (- distance-max (car x)) distance-max))
                           (cdr x)))
         objects*)))

(pk 'query (string-upcase (number->string query 36)) query)
(newline)
(for-each pk (map (lambda (x) (list (car x) (number->string (cdr x) 36) (cdr x))) candidates))
Source: https://mezbreeze.itch.io/portraits-volume-one