I’ve been making some more headway into designing the site’s framework for interconnecting documents. My notes are still mostly a combination of sketches and musings with no particular direction. While I still don’t know how to fully implement this feature, I’m beginning to gain some insight into the fundamentals of the problem and how I might be able to solve it. I know the framework has to do three things as robustly as possible: map out which documents are related to each other, add semantic meaning to each connection, and rank results so the most relevant documents show up first.
That framework is going to be the foundation of the site. Several key things will be built on and shaped by it:
- app’s business logic
- main and miscellaneous APIs
- AWS images and further deployment on the servers
Dear God, I’m kind of making a search engine, aren’t I? What have I gotten myself into?!
Today’s breakthrough comes in the form of that particular insight into this problem. I really am designing a search engine to find other documents that are related to one particular document, finding out why they’re related, and finding a method for ranking results. So maybe all my sketching isn’t a total waste. :)
So that’s step one. Now to figure out how to create a link between two documents. I imagine a separate table with columns for the docType and docID of each of the two documents, plus an integer column to represent the connection from A to B. That’s the simplest way I could put things. However, this brings to mind the problem of commutativity. While each relationship isn’t strictly commutative, there’s always going to be a parallel, inverse relationship between the two documents. Consider two statute documents:
A amends B.
B is amended by A.
A makes reference to B.
B is cited by A.
These binary pair relationships go hand-in-hand. Will I have to model the parallel relationship separately? Or do I go about it programmatically and just switch variables? The latter seems the easier route and reduces the duplication of data. On that note, I’ll need a robust way to ensure that there isn’t any accidental duplication.
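Here’s a minimal sketch of the “switch variables” route: store each link once, from A to B, and work out the wording from whichever side is asking. Everything named here (`Link`, `LABELS`, `describe`, and the relation codes) is made up for illustration, and it assumes docType and docID are plain integers.

```python
from dataclasses import dataclass

# Hypothetical relation codes; the integer column would hold one of these.
AMENDS = 1
REFERENCES = 2

# For each code: (label read from A to B, label read from B to A).
LABELS = {
    AMENDS: ("amends", "is amended by"),
    REFERENCES: ("makes reference to", "is cited by"),
}


@dataclass(frozen=True)
class Link:
    """One stored row: a single direction, from document A to document B."""
    doc_type_a: int
    doc_id_a: int
    doc_type_b: int
    doc_id_b: int
    relation: int


def describe(link, doc_type, doc_id):
    """Return (label, other_document) as seen from the given document.

    Returns None if the document is on neither side of the link.
    """
    forward, inverse = LABELS[link.relation]
    if (doc_type, doc_id) == (link.doc_type_a, link.doc_id_a):
        return forward, (link.doc_type_b, link.doc_id_b)
    if (doc_type, doc_id) == (link.doc_type_b, link.doc_id_b):
        return inverse, (link.doc_type_a, link.doc_id_a)
    return None


link = Link(1, 10, 1, 20, AMENDS)
print(describe(link, 1, 10))
print(describe(link, 1, 20))
```

Only one row exists for the pair, so the parallel relationship never has to be stored or kept in sync; it is just the second label in the lookup.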
I think that there must be a way to create a unique hash out of the component parts of the documents being mapped. Something like:
hash = docTypeA × docIDA × docTypeB × docIDB
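That should be enough to generate a number for each pair that can be added to the table. The column must hold only unique values, so inserting a duplicate will fail. That includes the same pair entered in reverse order: thanks to the commutative property of multiplication, swapping the values for A and B still nets the same hash. Thank you, basic arithmetic! :) One caveat I’ll need to sort out: a product is order-independent but not guaranteed unique, since different inputs can multiply to the same number (2 × 6 and 3 × 4 both give 12), so two unrelated pairs could collide.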
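To test the idea before committing to it, here’s a rough sketch that puts the product from the formula next to one possible alternative: a key built by sorting the two (docType, docID) pairs into a fixed order. The alternative is my own assumption, not a settled design, and the names `product_hash`, `pair_key`, `add_link`, and the `doc_links` table are hypothetical. It uses an in-memory SQLite database just to show the unique column rejecting a reversed duplicate.

```python
import sqlite3


def product_hash(type_a, id_a, type_b, id_b):
    """The hash from the formula: order-independent, but products can coincide."""
    return type_a * id_a * type_b * id_b


def pair_key(type_a, id_a, type_b, id_b):
    """Order-independent key: sort the two documents, then join them as text."""
    first, second = sorted([(type_a, id_a), (type_b, id_b)])
    return f"{first[0]}:{first[1]}|{second[0]}:{second[1]}"


conn = sqlite3.connect(":memory:")
# Like the hash in the formula, the key ignores the relation type,
# so this table allows one row per pair of documents.
conn.execute(
    "CREATE TABLE doc_links ("
    " pair_key TEXT NOT NULL UNIQUE,"
    " relation INTEGER NOT NULL)"
)


def add_link(type_a, id_a, type_b, id_b, relation):
    """Insert a link; return False if the pair already exists in either order."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO doc_links (pair_key, relation) VALUES (?, ?)",
                (pair_key(type_a, id_a, type_b, id_b), relation),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Two different pairs, same product: 1*2*1*6 and 1*3*1*4.
print(product_hash(1, 2, 1, 6) == product_hash(1, 3, 1, 4))
# The sorted key keeps them apart but still ignores A/B order.
print(pair_key(1, 2, 1, 6) != pair_key(1, 3, 1, 4))
print(add_link(1, 10, 2, 20, 1), add_link(2, 20, 1, 10, 1))
```

The sorted key keeps the property I actually care about (swapping A and B gives the same value) without relying on multiplication to keep different pairs apart.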