Replies: 4 comments 4 replies
-
I transferred this to a discussion as it's more appropriate here (I only just enabled discussions, not your fault). I don't think our needs diverge here; I too was looking to make a tool I can use to continuously pull from Keep into a local directory.

The "sync/mirror" problem is relevant, but only the pulling half of it, since this tool doesn't push notes at all. I believe an acceptable solution is possible without a local database (or other persistent state storage), though if push comes to shove, I'm not against using a local SQLite database. To answer some of your suggestions:
I've been thinking/experimenting with a fourth option: a truncated hash digest encoded in base36. Base36 uses 0-9 plus the 26 alphabet characters to encode, and is case-insensitive.
The numbers in parentheses (e.g. "base36:(10)") indicate how many characters that encoding produces. "blake-6" means a 6-byte digest output; "blake-7" means a 7-byte digest output. The 7-byte (56-bit) output seemingly always comes out to 11 characters (from my observations on an expanded set I was testing), while the 6-byte (48-bit) output varies in size a little. What are your thoughts on that?
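For concreteness, here's a minimal sketch of that fourth option: a BLAKE2b digest truncated to a configurable byte length, encoded in base36. The helper names (`base36`, `short_id`) are mine, not from the tool:

```python
import hashlib

def base36(n: int) -> str:
    """Encode a non-negative integer in base36 (0-9 + a-z, case-insensitive)."""
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(digits[r])
    return "".join(reversed(out))

def short_id(note_id: str, size: int = 7) -> str:
    """Truncated BLAKE2b digest of the Keep note ID, base36-encoded.

    size=7 (56 bits) almost always yields 11 characters; size=6 (48 bits)
    varies between 9 and 10 characters for most inputs.
    """
    digest = hashlib.blake2b(note_id.encode(), digest_size=size).digest()
    return base36(int.from_bytes(digest, "big"))

print(short_id("15e1ff865e7.b01d13e55b1750fc"))  # deterministic across runs
```

This matches the observation above: 36^10 ≈ 3.7e15 while 2^56 ≈ 7.2e16, so roughly 95% of 7-byte digests need 11 base36 characters, whereas 48-bit digests straddle the 36^9 boundary and vary more.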
-
### Proposal (tl;dr)

Make the ID format meaningful and configurable (based on the timestamp), but not unique; scan the frontmatter to index the notes and detect the true ID. The general flow for an export/clone:
### Truncated Hash

I'm comfortable with the unlikely chance of a (truncated) hash collision; I've gone through the same sort of exercise as you did above in other contexts, so that approach seems fine from that perspective. But it isn't the approach I prefer.

### ID Options

For the file format, I prefer something as meaningful or useful as possible, though, so:
### Usefulness of the ID

A deterministic ID is useful for:
### ID from Frontmatter

As far as matching a file to its corresponding note, we could also just rely on the frontmatter. We have the unique Google Keep ID in the frontmatter of each file. Scanning the files should be so quick that I don't see any reason not to do it every time an export process runs; reading all of the notes and building an index shouldn't take much time.

### Benchmark

Here's a rough benchmark of 149 notes. Frontmatter scanned using:

```python
import pathlib

import frontmatter  # python-frontmatter package

def index_note_files(directory):
    index = {}
    for file in pathlib.Path(directory).glob("*.md"):
        with open(file, "r") as f:
            fm = frontmatter.load(f)
            index[fm.metadata["id"]] = file
    return index
```
This is for 149 notes, so not a lot, and I can't imagine people will have more than a couple thousand notes in Google Keep, but I'm curious to be proven wrong! I took image download out of the main step since that's the least predictable portion.

If we needed to optimize (and I'd be surprised if we ever do), you could use the filename ID as a first pass, identify potential matches (i.e. two note files with the same ID), and look at the frontmatter for just those notes. I'd just go for scanning the whole lot, though.

### Local DB (e.g. SQLite)

I'm opposed to a SQLite DB for this until it's needed. If we did need one, I'd consider a more automatic/transparent caching layer, like karlicoss/cachew. (I love SQLite and use it for production workloads, when needed.) Design considerations:
### Uploading to Keep

You mentioned not wanting to support upload back to Keep, and I'm mostly in agreement with that. Someone's going to want it more strongly than I currently do, though.

### Side annoyance

Part of this whole thing is really a consumer tooling problem. E.g. Obsidian.md doesn't read the files for titles when linking/opening notes, so you want the filename to contain enough info to make that process easier and look better. I place most of the blame and responsibility on Obsidian and similarly limited tools, but that doesn't make the problem disappear.
-
I've implemented the majority of my proposal. It's based on the assumption that you'll start with a fresh download; it doesn't try to reconcile old-format files or media downloads, so remove everything local and re-download.
TODO:
ISSUES:
I ran
-
Thanks for the work! I've merged your changes in #21.
-
### Problem
The current approach of iterating over the notes and assigning an incrementing ID has the problem of generating different core IDs every time you run the export.
This manifests even over two subsequent runs without any changes on the Keep side: the order in which the API returns notes is effectively random.
So running twice you could end up with the following files:
### Potential Solutions
There is a unique ID for each note, but it's really verbose (e.g. `15e1ff865e7.b01d13e55b1750fc`). I did try that approach locally and didn't like the resulting filenames.

Alternate ideas for uniquely prefixing each note:
- Use the Unix timestamp (`note.timestamps.created.timestamp()`) directly? How much resolution does this actually have in the Keep API JSON?
- `YYYYMMDDHHMMSS` based on the create date of the note. Of course, it's possible for two notes (especially coming from different people) to be created in the same second, so this might require further logic to eliminate duplicates.
- Keep the incrementing ID, but sort on [`note.timestamps.created.timestamp()`, `note.id`] prior to iterating. Assumes notes will never be created in the past. This has the side benefit of making older notes have the lower numbers.

Using `note.id` is the only "safe" choice if you really want to uniquely identify a note, since you could gain new notes in the future if someone shares a note with you, and that note could be older than other notes you already have.

### Further Challenges / Considerations
Any approach above that simply fixes the prefix won't account for notes changing titles on the Keep side. To fully account for that, you would have to find the existing note file (if any) by ID, delete it, and write a new file with the same ID but the new title.
### Context
This may be where my needs and yours diverge too much. I'm looking for a script I can run continuously to pull the latest Google Keep notes locally and basically keep a local mirror of my Google Keep notes. Google Keep is great for quick shared notes, both in a desktop browser and on mobile, and I'm considering using it as part of my broader notes flow: something I can reference from my notes, but not necessarily edit/take over from the local computer side. So it's possible I need to fork and write `keep-mirror` instead.

For this project, I think just implementing one of the three options above is the best approach: a simple solution that provides more consistency, without falling into the "sync/mirror" rabbit hole.
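For what it's worth, the `YYYYMMDDHHMMSS` option with same-second collision handling could look like this (a sketch; the `-2`, `-3` suffixing scheme is my own assumption, not something the thread settled on):

```python
from datetime import datetime, timezone

def timestamp_prefix(created: datetime, taken: set) -> str:
    """YYYYMMDDHHMMSS prefix; append -2, -3, ... when two notes
    share the same creation second."""
    base = created.strftime("%Y%m%d%H%M%S")
    prefix, n = base, 1
    while prefix in taken:
        n += 1
        prefix = f"{base}-{n}"
    taken.add(prefix)
    return prefix

taken = set()
t = datetime(2024, 1, 31, 12, 0, 0, tzinfo=timezone.utc)
print(timestamp_prefix(t, taken))  # 20240131120000
print(timestamp_prefix(t, taken))  # 20240131120000-2
```

Unlike `note.id`, this prefix is only stable as long as the set of same-second notes doesn't change, which is exactly the caveat raised above.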