François-René Rideau (fare) wrote,
@ 2005-09-28 13:55:00
Entry tags: code, collaboration, en, hacker, j820, linux, monotone, smop, snail, versioning

Monotone

You may remember that I am looking for a solution to the SNAIL problem, which will have to be based on epistemic monotonic logic. Well, monotone offers half of that solution: monotonic logic. It is also the half that is more useful to me, because the instantaneous latency of my SNAIL networks is actually low. When my computers are connected to each other, the communication latency between them is a few tenths of a second at most.

monotone is relatively immature, and I had problems with it. I also don't expect it ever to provide the epistemic monotonic logic that I'm longing for. Nevertheless, its basic design seems like the Right Thing to me, and the implementation seems sound. The resulting package is already very useful and very easy to get going with, and the transition from CVS is remarkably painless. I expect it to stabilize into something reliable and usable over time. Even in its current state, it is vastly superior to centralized systems such as cvs or its modern replacement subversion. It also has built-in support for importing from a CVS server and for synchronizing with ongoing CVS development. Incidentally, what finally prompted me to switch was my central CVS server being offline for a week. It is also more to my liking than the popular decentralized system darcs. monotone keeps databases of everything that is (locally) known about all the code you care about, whereas darcs splits that knowledge across as many active per-project, per-branch copies. A monotone checkout is thus very lightweight and very easy to tweak manually, yet it is robustly managed through cryptographic identification of file contents. File renaming is well supported, and commits are of course atomic.
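To see why cryptographic identification of file contents makes renames cheap, here is a small shell sketch. monotone names file versions by SHA-1 hashes of their contents, and this sketch works the same way. The content_id helper is hypothetical and not part of monotone. Copying a file under a new name leaves its identity unchanged, while any edit produces a new identity:

```shell
# Hypothetical helper: print the SHA-1 of a file's contents (its name plays no part).
# Assumes GNU coreutils `sha1sum`; on BSD/macOS, `shasum -a 1` prints the same format.
content_id() {
  sha1sum < "$1" | cut -d' ' -f1
}

tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/old-name.txt"
cp "$tmp/old-name.txt" "$tmp/new-name.txt"   # a "rename": same bytes, new path
printf 'hello!\n' > "$tmp/edited.txt"         # an edit: one extra byte

id_old=$(content_id "$tmp/old-name.txt")
id_new=$(content_id "$tmp/new-name.txt")
id_edit=$(content_id "$tmp/edited.txt")
rm -rf "$tmp"

echo "$id_old"
echo "$id_new"
echo "$id_edit"
```

Because identity follows content, a rename only has to record "same content, new path". No file data needs to be stored again.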

Still, I had a few annoyances with the current version of monotone, and I'd like to share my experience with you, so you may avoid a few pitfalls and get more productive.

Firstly, monotone was mightily confused by the entry for "." that it had itself created in .mt-attrs when I ran "monotone add .". It was easy to fix this first problem by editing .mt-attrs by hand, but it took me quite some time to understand what was going on. Happily, monotone itself could tell that it had hit a bug, so I could pick the right debugging tool, namely strace. Now I run monotone add $(monotone list unknown) instead, to recursively add the files below the top (or current) directory.
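There is one caveat with monotone add $(monotone list unknown). The shell splits the output of a command substitution on whitespace, so a file name containing a space becomes two bogus arguments. zsh does this too: it word-splits command substitutions, even though it does not split plain parameter expansions. The POSIX-sh sketch below shows the naive split next to a line-by-line loop that keeps each path whole. It uses hypothetical stand-ins for the two monotone commands so that it runs anywhere, and it assumes no path contains a newline:

```shell
# Hypothetical stand-ins so the sketch runs without monotone:
# list_unknown plays `monotone list unknown`, add_one plays `monotone add "$1"`.
list_unknown() { printf '%s\n' 'notes.txt' 'my draft.txt'; }
added=''
add_one() { added="${added}[$1]"; }

# Naive form: the unquoted substitution is split on whitespace,
# so 'my draft.txt' turns into two words.
naive_count=0
for name in $(list_unknown); do naive_count=$((naive_count + 1)); done

# Safer form: read one path per line and pass each one as a single argument.
# (A here-document rather than a pipe keeps the loop in the current shell,
# so changes to $added survive.)
add_unknown() {
  while IFS= read -r name; do
    [ -z "$name" ] || add_one "$name"
  done <<EOF
$(list_unknown)
EOF
}
add_unknown

echo "naive loop saw $naive_count words"
echo "added: $added"
```

With monotone installed, add_one would call monotone add "$1" and list_unknown would call monotone list unknown.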

Another annoyance is that it insists on using its own network protocol for database synchronization. It is not always possible or affordable to reserve a port for such a service, however cryptographically secure it allegedly is. Doing so raises issues of trust and wastes the time of the administrators who manage the firewall and the reliable allocation of ports to servers. I solved this problem with a combination of ssh port redirection and zsh hackery.

# cat zsh.functions.monotone
mo_ad () { monotone add $(monotone list unknown) }
mo_rm () { \rm $@ ; monotone rm $@ }
mo_mv () { \mv $@ ; monotone mv $@ }
mo_di () { monotone diff $@ }
mo_cm () { monotone commit $@ }
mo_diff () { ( cd $1 && shift && mo_di $@ ) | less }
mo_commit () {( cd $1 && shift && mo_cm $@ )}
mo_up () {( cd $1 && shift && monotone update $@ )}

mo_run_server () {
  coproc ( exec monotone --quiet >& /dev/null \
                --db=$DB serve localhost:$PORT "$BRANCHES"
  ) ; SERVPID=$!
  echo "Launched a local monotone server with PID=$SERVPID"
  echo "for database $DB on port $PORT (locally)."
  mo_kill_server () {
    echo "Killing monotone server with PID=$SERVPID"
    kill $SERVPID
  }
  if [ -n "$MONOTONE_PASSWORD" ] ; then
    print -rp -- "$MONOTONE_PASSWORD"
  fi
}
mo_run_client () {
  echo "Connecting to host $HOST to run a monotone client"
  echo "for database $RDB on port $RPORT (remotely)."
  stty sane
  REMOTE_MONOTONE=(monotone --db=$RDB sync localhost:$RPORT $BRANCHES)
  SSH_ARGS=(
    -R ${RPORT}:localhost:$PORT
    $HOST
    ${(qqq)REMOTE_MONOTONE} # stty sane ;
  )
  if [ -n "${REMOTE_MONOTONE_PASSWORD:=$MONOTONE_PASSWORD}" ] ; then
    print -r -- "$REMOTE_MONOTONE_PASSWORD" |
      if command_p cotty 2> /dev/null ; then
        cotty -1 ssh -t $SSH_ARGS ### Allow for nice output on TTY.
      else ssh $SSH_ARGS ; fi ### Ugly TTY output.
  else
    ssh -t $SSH_ARGS
  fi
}
mo_sync () {(
  DB=$1 HOST=$2 PORT=${3:-35253}
  RDB=${4:-$DB} RPORT=${5:-$PORT}
  BRANCHES=${6:-'*'}
  mo_run_server
  trap mo_kill_server EXIT
  # sleep 1 # Give the server some time to breathe before the client starts
  mo_run_client
  mo_kill_server
  trap "echo done." EXIT
)}
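Before trusting mo_sync with real databases, you may want to check what it will actually run. The dry run below mirrors its argument handling in plain POSIX sh: DB HOST [PORT [RDB [RPORT [BRANCHES]]]], with the same defaults (port 35253, remote database and port equal to the local ones, all branches). mo_sync_dryrun is a hypothetical helper, not one of the zsh functions above, and it only prints the two commands. The key piece is ssh's -R option. It asks the remote sshd to listen on RPORT and to forward each connection back through the ssh session to localhost:PORT on the local side. That is why the remote client can sync with localhost:RPORT while only the ssh port needs to be reachable:

```shell
# Hypothetical dry run of mo_sync: same arguments and defaults as above,
# but it prints the commands instead of starting a server and a client.
mo_sync_dryrun() {
  db=$1
  host=$2
  port=${3:-35253}
  rdb=${4:-$db}
  rport=${5:-$port}
  branches=${6:-'*'}
  # Local end: a server listening on the loopback interface only.
  printf "monotone --db=%s serve localhost:%s '%s'\n" "$db" "$port" "$branches"
  # Remote end: -R forwards the remote host's localhost:$rport back through
  # the ssh connection to localhost:$port here, so the remote client's
  # "sync localhost:$rport" ends up talking to the local server.
  printf "ssh -R %s:localhost:%s %s monotone --db=%s sync localhost:%s '%s'\n" \
    "$rport" "$port" "$host" "$rdb" "$rport" "$branches"
}

mo_sync_dryrun fare.db remotehost
mo_sync_dryrun local.db remotehost 4000 remote.db 5000 'net.example.*'
```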

Note, however, that said hackery assumes that you configured get_passphrase (see below) on the local end of the mo_sync command. Indeed, monotone's key management is lacking, which forces poor compromises between security and usability. Not only does it lack an ssh-agent-like system to manage keys (or an interface to ssh-agent itself; why reinvent incompatible wheels?), it also insists on keeping the secret key in the database itself rather than in a separately manageable secret file. Happily, you can customize monotone with a dynamic language. Unhappily, this is yet another unremarkable crippled language that I had to learn, namely lua. The lua hackers on irc.freenode.net were very helpful. Kudos! Here is my ~/etc/monotone/monotonerc. I'm sure you can do better with respect to key management, but I expect diminishing returns from further effort.

-- -*- lua -*-
-- In your ~/.monotone/monotonerc, insert the following:
--   dofile(os.getenv("HOME").."/etc/monotone/monotonerc")
--
-- You may also insert a definition for your passphrase:
-- function get_passphrase(keypair_id)
--  return "secret passphrase"
-- end

function get_netsync_read_permitted (branch, identity)
  if (identity == "fare@tunes.org") then return true end
  return false
end

function get_netsync_write_permitted (identity)
  if (identity == "fare@tunes.org") then return true end
  return false
end

function fare_mkSet(s)
  local rv = {}
  for w in string.gfind(s, "%S+") do rv[w] = true end
  return rv
end

fare_bad_extensions = fare_mkSet[[
	o a la lo so  pyc pyo  lib fas fasl x86f
	depend deps
	out tmp swp cache
	ps eps pdf
	aux dvi log bbl blg toc brf haux lot lof idx 4ct 4tc idv lg xref
	gz bz2 deb tar zip
	bak orig rej
]] -- gif png jpg txt

fare_bad_names = fare_mkSet[[
	out.tex init_mzscheme
	core
	foo bar baz quux toto tata tutu titi
	FOO BAR BAZ QUUX TOTO TATA TUTU TITI
	.cvsignore
]]

function ignore_file(name)
   -- Don't ignore a few precious cached things
   -- Get it from the WWW -- if (name == "fare/texstuff/fare.eps") then return false end
   local sname = "/"..name
   local _, _, basename = string.find(sname, "/([^/]*)$")
   local _, _, extension = string.find(basename, "%.(%w+)$")
   if (fare_bad_extensions[extension]) then return true end
   if (fare_bad_names[basename]) then return true end
   if (string.find(basename, "%~$")) then return true end
   if (string.find(basename, "^%.%#")) then return true end
   if (string.find(sname, "/CVS/")) then return true end
   if (string.find(sname, "/%.svn/")) then return true end -- %. matches a literal dot
   if (string.find(sname, "/_darcs/")) then return true end
   return false
end
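Before committing, it can help to preview which paths the ignore_file hook would skip. The following is a rough POSIX-sh approximation of the same rules, not monotone code. is_ignored and the two word lists are illustrative. Lua pattern edge cases are not reproduced exactly; for instance, %w+ only accepts alphanumeric extensions.

```shell
# Rough shell approximation of the ignore_file hook above (illustrative only).
BAD_EXTS='o a la lo so pyc pyo lib fas fasl x86f depend deps out tmp swp cache
ps eps pdf aux dvi log bbl blg toc brf haux lot lof idx 4ct 4tc idv lg xref
gz bz2 deb tar zip bak orig rej'
BAD_NAMES='out.tex init_mzscheme core foo bar baz quux toto tata tutu titi
FOO BAR BAZ QUUX TOTO TATA TUTU TITI .cvsignore'

# Exit status 0 if the path would be ignored, 1 otherwise.
is_ignored() {
  _ig_base=${1##*/}
  # Version-control metadata directories anywhere in the path.
  case "/$1" in
    */CVS/*|*/.svn/*|*/_darcs/*) return 0 ;;
  esac
  # Editor backups (foo~) and Emacs lock files (.#foo).
  case "$_ig_base" in
    *~|.#*) return 0 ;;
  esac
  # Exact basename matches.
  for _ig_n in $BAD_NAMES; do
    [ "$_ig_base" = "$_ig_n" ] && return 0
  done
  # Extension = text after the last dot, if the basename has a dot at all.
  case "$_ig_base" in
    *.*) _ig_ext=${_ig_base##*.} ;;
    *) return 1 ;;
  esac
  for _ig_e in $BAD_EXTS; do
    [ "$_ig_ext" = "$_ig_e" ] && return 0
  done
  return 1
}

for p in src/main.c src/main.o doc/paper.tex 'doc/paper.tex~' lib/CVS/Entries foo; do
  if is_ignored "$p"; then echo "ignore $p"; else echo "keep   $p"; fi
done
```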

Finally, since monotone is very young, I had to use debian unstable to find recent, decent versions of it that speak the same current netsync protocol, version 5, on both my i386 and arm machines. The latest release is 0.22, and compatibility requires at least 0.20. (Yes, monotone runs just fine on my good old jornada 820!) There again, debian hackers on irc suggested that I RTFM (man apt_preferences), and so I now have the following:

# cat /etc/apt/apt.conf
APT::Default-Release "testing";
# cat /etc/apt/preferences
Package: common-lisp-controller clisp sbcl cl-* monotone user-mode-linux
Pin: release a=unstable
Pin-Priority: 1000
# apt-get update ; apt-get install monotone

Next, the CGI interface for browsing monotone databases is a python script that only works with a recent mod_python in apache 2.0, so I couldn't install it on fare.tunes.org yet. But I will eventually, thanks to user-mode-linux. For security reasons, it is recommended to give the CGI its own separate copy of the database.

Which brings us to disk usage. monotone lacks any kind of relevance logic to sort facts worth keeping from facts that can be thrown away. As a result, it practically requires a copy of every version of all the managed software on every machine. This can be unwieldy for very large, multi-gigabyte projects. Also, data seems to be compressed internally but then stored as text-encoded binary in the underlying database. That is a weird inefficiency, probably explained by laziness in a time of big, cheap disks. If you're a hacker, you may call that room for improvement.
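The cost of that text encoding can be put in numbers. Assume the encoding is base64; the post doesn't say which encoding monotone uses. Then every 3 bytes of compressed data become 4 characters, so the stored data takes about a third more space. b64_size is a hypothetical helper computing 4 * ceil(N / 3), and the system's base64 tool confirms it:

```shell
# Size of a base64 encoding of N input bytes: 4 * ceil(N / 3), ignoring line breaks.
b64_size() { echo $(( 4 * (($1 + 2) / 3) )); }

n=3000
# Measure with the real base64 tool (GNU or BSD), stripping its line wrapping.
measured=$(( $(head -c "$n" /dev/zero | base64 | tr -d '\n' | wc -c) ))
predicted=$(b64_size "$n")
echo "raw=$n base64=$measured predicted=$predicted"
echo "overhead: $(( (measured - n) * 100 / n ))%"
```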

Further room for improvement, besides everything discussed above, could include the following:
- a fact-superseding convention, in addition to the relevance logic suggested above, so that sensitive data you don't want there can be "erased" without having to dump, hack and rebuild the whole database and lose all previous cryptographic certificates;
- epistemic logic, for efficient synchronization over real, disconnected SNAIL networks;
- a simple protocol for a centralized authority to provide human-readable release numbers, to replace the missing $Id$ feature, which is incompatible with general decentralized use;
- etc.

Come on, guys, it's a SMOP!

I hope these tips will help you install monotone and be productive with it. Maybe they will even convince you to give this fine piece of software a try (and perhaps to hack on it).

Update: the new and improved ssh synchronization function above can now handle monotone passwords stored in shell variables. Security issues may apply if a malicious user with access to the machine can read these variables by somehow inspecting the memory of your processes. But then again, if someone can do that, crypto isn't the weak link in your security anyway.




wow
dlakelan
2005-09-29 02:00 am UTC
Hi Fare,

The world is a small place. I know you through various mailing lists and web sites that we both have belonged to in the past and I'm a reader of your journal because I think you're an interesting character with good ideas but you've probably never heard of me.

In any case, thanks for this brain dump about monotone. It sounds interesting. I'd like to ask a few questions.

1) would it be reasonable to use monotone for things like image editing? I imagine that even relatively small changes in a JPEG would produce completely different files, so you might have to essentially store every copy, and it would be hopeless to merge images.

2) Can you reasonably expect to use monotone for your entire home directory, or at least for large projects not involving computer code? Things like openoffice documents, pdfs, cad files, whatever?

3) Does monotone do atomic commits of all changes or does it do file by file commits?

4) How does it handle renaming things?

I've looked at the FAQ and some articles comparing version control systems, but it was a little overwhelming, especially to try to figure out these non source code specific questions.



Re: wow
fare
2005-09-29 02:20 am UTC
(1) monotone would probably not be essentially worse for image editing than any other versioning system. Unless your software lets you keep your modifications in a vector (logical) format, you'll have to deal with lots of ugly raster (bitmap) files if you do heavy editing.
(2) I'm using monotone for all the documents I'm authoring. I only use readable text, so it's rather lightweight, but the monotone docs speak of importing a gigabyte's worth of CVS data, so it must be doable. Note, however, that AFAICT monotone computes a hash of all your (relevant) files any time you commit, update or diff. That might make it unwieldy with too many large blobs at once. Many small files with lots of versions is where it shines; many big files with few versions, probably not. If you want a versioned datastore for big blobs, you should look for a different system, sorry. I'd like to turn BKNR into such a system someday. That said, the cut-off point depends on the speed of your CPU and hard disk, on the size of your data, and on your patience.
(3) monotone does the right thing with atomic commits, using an embedded SQL database.
(4) monotone handles renaming beautifully, and I've used that to reorganize my messy ~/etc/


Re: wow
fare
2005-09-29 02:25 am UTC
Oops. Actually, if it's clever, which I suppose it is, considering the speed of commits on my antique PDA, it will only compute checksums for touched files. So the slightly suboptimal space will be the only problem with managing blobs, not the time to compute hashes. Silly me. Long live user-settable filesystem modified time tags!


Re: wow
(Anonymous)
2005-09-29 03:16 am UTC
Actually, monotone does hash everything by default, because this is safer than doing anything else[0]. As you've discovered, this is actually not a speed issue these days, except in extreme situations.

You can trade off a tiny bit of safety for extra speed by telling it to trust filesystem mod times:
http://www.venge.net/monotone/docs/Inodeprints.html

-- Nathaniel

[0] Assuming you believe in hashes at all, but if you don't, then you have worse problems with monotone.
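The inodeprint idea can be illustrated with a small shell sketch. Each file's hash is cached together with a cheap "print" of its metadata, and the file is rehashed only when the print changes. The print used here (inode, size, and mtime in whole seconds) and the helper names are assumptions for illustration; the page linked above is the authority on what monotone actually records. The sketch also shows the small safety cost. An edit that kept the same size within the same second would go unnoticed.

```shell
# Assumes GNU coreutils (`stat -c`, `sha1sum`). The "print" here is just
# inode:size:mtime; monotone's real inodeprints may record other fields.
inodeprint() { stat -c '%i:%s:%Y' "$1"; }
sha1_of() { sha1sum < "$1" | cut -d' ' -f1; }

f=$(mktemp)
printf 'v1\n' > "$f"
cached_print=$(inodeprint "$f")
cached_hash=$(sha1_of "$f")
rehashes=0

# Sets $hash for file $1, rehashing only when its inodeprint has changed.
# (Called directly, not via $(...), so the cache updates survive.)
file_hash() {
  now=$(inodeprint "$1")
  if [ "$now" != "$cached_print" ]; then
    cached_hash=$(sha1_of "$1")
    cached_print=$now
    rehashes=$((rehashes + 1))
  fi
  hash=$cached_hash
}

file_hash "$f"; h1=$hash          # print unchanged: cached hash reused
printf 'version 2\n' > "$f"       # new size, so the print changes
file_hash "$f"; h2=$hash          # print changed: file is rehashed
rm -f "$f"
echo "rehashes=$rehashes"
```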


Re: wow
(Anonymous)
2005-09-29 02:33 am UTC
1) Right. Although if you had a collection of images (or anything else) that you wanted to version *together*, it would be good.

2) This has been done. Of course, merging tends to only work on text.

3) Atomic. It versions file trees (which contain files), rather than individual files directly.

4) Fairly well, although it doesn't yet handle rename conflicts when merging.

Tim


stty problem
fare
2005-09-29 03:28 am UTC
The display problem I was experiencing was actually the server and the client both writing to the terminal at the same time. I had written coproc monotone ... serve ... >&0, thinking that it would redirect both stdout and stderr like >& usually does. But with a number after it, >& is a different, file-descriptor-duplicating syntax that only redirects stdout. I would have had to write </dev/null >&0 2>&0. But actually, I only need a 2> /dev/null. So monotone is doing the right thing, and the bug was wholly mine. Oops.
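The redirection mix-up is easy to reproduce. In zsh and bash, >& followed by a file name sends both stdout and stderr there. >&N with a number only duplicates stdout onto file descriptor N, so >&0 pointed the server's stdout at whatever stdin was, here the terminal. In the POSIX-sh sketch below, a hypothetical noisy function stands in for the server:

```shell
# Hypothetical stand-in for the server: one line to stdout, one to stderr.
noisy() { echo "to stdout"; echo "to stderr" >&2; }

only_out=$(noisy 2>/dev/null)     # stderr discarded, stdout captured
both=$(noisy 2>&1)                # stderr joined to the captured stdout
none=$(noisy >/dev/null 2>&1)     # both discarded: POSIX spelling of `>& /dev/null`

printf '[%s]\n' "$only_out" "$both" "$none"
```

Order matters: noisy 2>&1 >/dev/null would still print stderr. stderr is duplicated from stdout before stdout is redirected, so it keeps pointing at the original stdout.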
If debugging is the process of removing bugs, then programming must be the process of putting them in. -- Dijkstra


