Open Bug 1782586 Opened 2 years ago Updated 1 year ago

Improve the DBSCAN clustering algorithm for text clustering

Categories

(Core :: DOM: Core & HTML, task, P2)

People

(Reporter: gregtatum, Unassigned)

References

(Blocks 1 open bug)

Details

In Bug 1782578 I am implementing a text recognition modal. It uses the DBSCAN algorithm to cluster the text results into groups. Each text result is currently represented by a single point located at the start of its text (which side that is depends on whether the text is LTR or RTL).
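
For illustration only, here is a minimal Python sketch of the single-point approach described above. It is not the code from Bug 1782578. The names `start_point` and `dbscan`, the `(left, top, right, bottom)` box layout, and the `eps` / `min_pts` values are all assumptions made for the example.

```python
from math import dist

NOISE = -1

def start_point(box, direction="ltr"):
    """Hypothetical helper: reduce a box (left, top, right, bottom)
    to one anchor point at the edge where the text starts."""
    left, top, right, bottom = box
    x = left if direction == "ltr" else right
    return (x, top)

def dbscan(points, eps, min_pts):
    """Plain DBSCAN over 2D points. Returns one label per point,
    with NOISE for points that belong to no cluster."""
    labels = [None] * len(points)
    cluster = -1

    def neighbors(i):
        # Region query: every point within eps of point i (including i).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels

# Two paragraphs of left-aligned lines, 20px apart within a paragraph.
boxes = [
    (10, 0, 300, 15), (10, 20, 280, 35), (10, 40, 150, 55),
    (10, 200, 290, 215), (10, 220, 120, 235),
]
points = [start_point(b) for b in boxes]
labels = dbscan(points, eps=25, min_pts=2)
```

With these made-up boxes the first three lines land in one cluster and the last two in another. A line whose start is shifted horizontally (centered or indented text) would move its anchor point away from its neighbors, which is the weakness this bug is about.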

I believe this could be improved by changing the algorithm to do distance lookups based on the text quads. Essentially, the lookup would search above and below each result across the full width of its text, rather than from a single point.
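
One possible shape for such a lookup, sketched in Python under some loud assumptions: the quads are simplified to axis-aligned boxes `(left, top, right, bottom)`, and the distance between two results is the gap between their boxes. The names `box_gap` and `box_neighbors` are hypothetical, and rotated quads would need more work than this.

```python
from math import hypot

def box_gap(a, b):
    """Distance between two axis-aligned boxes (left, top, right, bottom).

    If the boxes overlap horizontally, only the vertical gap counts, so a
    line directly above or below is near regardless of where its text starts.
    Overlapping boxes have distance 0.
    """
    dx = max(0, max(a[0], b[0]) - min(a[2], b[2]))  # horizontal gap, 0 if overlapping
    dy = max(0, max(a[1], b[1]) - min(a[3], b[3]))  # vertical gap, 0 if overlapping
    return hypot(dx, dy)

def box_neighbors(boxes, i, eps):
    """Region query for DBSCAN using box gaps instead of point distances."""
    return [j for j, b in enumerate(boxes) if box_gap(boxes[i], b) <= eps]

# A centered heading sitting 5px above a full-width line of text.
heading = (100, 0, 200, 15)
line = (10, 20, 300, 35)

gap = box_gap(heading, line)                # vertical gap only
point_distance = hypot(100 - 10, 0 - 20)    # distance between LTR start points
```

In this made-up case the box gap is 5, while the start points are roughly 92 apart, so a small `eps` would group the two results with the box-based lookup but not with the single-point one. Swapping the region query is the only change DBSCAN itself would need.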

I'm breaking this out to a separate bug as the current solution works, and this improvement may prove a bit more complicated to implement.

Assignee: gtatum → nobody