Uniquely identifying a stack

Igor de Oliveira Couto igor at superstudent.net
Sun Jul 1 21:50:35 EDT 2012


Hello, Peter,

On 02/07/2012, at 11:02 AM, Peter Haworth wrote:

> Problem is I need to maintain uniqueness across two versions of the same
> stack file.  For example if both versions have a stack named "myStack" but
> then its name gets changed to "YourStack" in one of the versions, it's no
> longer identifiable as the same stack as "myStack".
> 
> I have some ideas as to how to deal with this but wanted to check if anyone
> had come up with a generic solution to this problem.

I believe this is an issue that all version control software (VCS) has to deal with, and the possible solutions and approaches are quite well documented in various open-source communities. The main question is: if I have file "A" and change its name to "B", should the software consider it an entirely new file, or should it somehow be able to identify it as the old one with a new name? The problem is not limited to file names; it applies to any file metadata. For instance, what happens if I change the file's access permissions, or if my system changes the file's 'modified' date?

It seems to me that, because of this, newer VCS designs tend to store a file's *data* (its contents) separately from its *metadata* (name, dates and permissions). So the VCS may internally keep a table named 'file_info', where each record holds the metadata for a certain file, and a second table, 'file_content', where each record holds the actual file dump/data. Each file_info record points to exactly one file_content record, but the advantage of this design is that a single file_content record can be shared by several file_info records.
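A minimal sketch of that two-table layout, using Python's built-in sqlite3 module. The column names (name, modified, permissions, content_id) and the helper functions are hypothetical, chosen for illustration only; they are not taken from Git or any other particular VCS.

```python
import sqlite3

# Hypothetical schema: metadata and contents live in separate tables,
# and every file_info row points at exactly one file_content row.
SCHEMA = """
CREATE TABLE file_content (
    id   INTEGER PRIMARY KEY,
    data BLOB NOT NULL
);
CREATE TABLE file_info (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    modified    TEXT,
    permissions TEXT,
    content_id  INTEGER NOT NULL REFERENCES file_content(id)
);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with both tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clause
    conn.executescript(SCHEMA)
    return conn


def store_content(conn: sqlite3.Connection, data: bytes) -> int:
    """Insert a file_content row and return its id."""
    cur = conn.execute("INSERT INTO file_content (data) VALUES (?)", (data,))
    return cur.lastrowid


def store_info(conn, name, modified, permissions, content_id) -> int:
    """Insert a file_info (metadata) row that points at existing contents."""
    cur = conn.execute(
        "INSERT INTO file_info (name, modified, permissions, content_id) "
        "VALUES (?, ?, ?, ?)",
        (name, modified, permissions, content_id),
    )
    return cur.lastrowid


def names_for_content(conn: sqlite3.Connection, content_id: int) -> list:
    """All file names whose metadata points at the given contents."""
    rows = conn.execute(
        "SELECT name FROM file_info WHERE content_id = ? ORDER BY id",
        (content_id,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_store()
    cid = store_content(conn, b"stack file bytes")
    store_info(conn, "A", "2012-07-01", "rw-r--r--", cid)
    store_info(conn, "B", "2012-07-02", "rw-r--r--", cid)
    print(names_for_content(conn, cid))  # ['A', 'B']
```

Two metadata rows share one contents row, which is exactly the many-to-one relationship described above.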

For instance, in our example: if I create file "A" and store it in the system, the system stores the metadata as a 'file_info' record and the contents as a 'file_content' record. If tomorrow I rename the file to "B", the system will recognise that the contents are unchanged, so it will create a new 'file_info' record pointing to the same 'file_content' record, rather than storing a duplicate of the contents.
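To make the rename scenario concrete, here is a toy in-memory store in Python. It is a hypothetical sketch (the `Store` and `FileInfo` names are mine, not from any VCS) that recognises identical contents in the most naive way: by comparing the new bytes against every stored content, one at a time.

```python
from dataclasses import dataclass, field


@dataclass
class FileInfo:
    """Metadata record: a name plus the id of the contents it points at."""
    name: str
    content_id: int


@dataclass
class Store:
    """Toy store that spots identical contents by comparing bytes directly."""
    contents: list = field(default_factory=list)  # list index = content id
    infos: list = field(default_factory=list)

    def _find_or_add_content(self, data: bytes) -> int:
        # Naive linear scan: compare the new bytes with every stored content.
        for content_id, existing in enumerate(self.contents):
            if existing == data:
                return content_id
        self.contents.append(data)
        return len(self.contents) - 1

    def store(self, name: str, data: bytes) -> FileInfo:
        """Record a file: always a new FileInfo, contents only if unseen."""
        info = FileInfo(name, self._find_or_add_content(data))
        self.infos.append(info)
        return info


if __name__ == "__main__":
    store = Store()
    a = store.store("A", b"same bytes")  # today: file "A"
    b = store.store("B", b"same bytes")  # tomorrow: renamed to "B"
    print(len(store.contents), len(store.infos), a.content_id == b.content_id)
    # prints: 1 2 True
```

The rename produces a second metadata record but no second copy of the contents. The cost is that every store compares against everything already held, which grows with the size of the repository.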

I believe that VCSs like Git use fast hash functions, such as MD5 or SHA-1 (Git itself uses SHA-1), to store and compare contents: the 'ID' of each file_content record is actually the hash of its contents. This makes comparing existing contents with new ones very fast. You don't have to compare the new file byte-by-byte against everything already stored; you simply compute the file's hash and check whether your file_content table already has a record with that ID.
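The hash-as-ID idea can be sketched in Python as follows. The `git_blob_id` function follows the way Git names blob objects (SHA-1 over a `blob <size>` header, a NUL byte, and then the contents), so its output matches `git hash-object` for the same bytes; the `ContentStore` class and its `put` method are hypothetical illustrations, not Git's internals.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Hex SHA-1 of the contents, computed the way Git names blobs:
    a header b"blob <size>" and a NUL byte are prepended to the data."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class ContentStore:
    """Hypothetical file_content table keyed by the hash of its contents."""

    def __init__(self) -> None:
        self.file_content = {}  # hash (hex str) -> contents (bytes)

    def put(self, data: bytes):
        """Return (content_id, is_new). Finding an existing copy is a
        dictionary lookup on the hash, not a comparison against every file."""
        content_id = git_blob_id(data)
        if content_id in self.file_content:
            return content_id, False
        self.file_content[content_id] = data
        return content_id, True


if __name__ == "__main__":
    print(git_blob_id(b"test content\n"))
    # d670460b4b4aece5915caf5c68d12f560a9fe3e4 (same as `git hash-object`)
    store = ContentStore()
    print(store.put(b"stack file bytes")[1])  # True: first time seen
    print(store.put(b"stack file bytes")[1])  # False: same hash, reused
```

The file still has to be read once to hash it, but the lookup afterwards costs the same however many contents are stored. The design assumes that equal hashes mean equal contents; for SHA-1 that holds in practice for ordinary files, although deliberately constructed collisions have been demonstrated.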

I hope this explanation helps a little.

Kindest regards,

--
Igor Couto
Sydney, Australia

More information about the use-livecode mailing list