After I pondered how to preserve context when describing document formats with semantic metadata, Paul Brown pointed me to the Charteris Integration Toolkit. Their marketing literature breaks the costs of data integration into eight points, which I would simplify into two categories: point-to-point scaling problems, including the n(n-1) mappings needed to connect n systems directly, and semantic context. The former is handily solved by a canonical data model, but they point out that many field-to-field mapping tools fall short when it comes to context.
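To put rough numbers on the scaling problem, here is a minimal sketch of the arithmetic. The function names are my own, and the 2n figure for the canonical model is an assumption: it supposes each system needs exactly one mapping into the shared model and one mapping out of it.

```python
def point_to_point_mappings(n: int) -> int:
    """Directed mappings when each of n systems translates directly to every other system."""
    return n * (n - 1)


def canonical_mappings(n: int) -> int:
    """Mappings when each system translates only to and from one shared model.

    Assumption: one mapping in and one mapping out per system.
    """
    return 2 * n


if __name__ == "__main__":
    # Compare the two approaches for a few system counts.
    for n in (3, 5, 10, 20):
        print(n, point_to_point_mappings(n), canonical_mappings(n))
```

Under these assumptions the two approaches break even at three systems, and the canonical model pulls ahead from there. None of this says anything about the second category, semantic context, which is the harder part.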
Robert Worden’s answer to persisting semantic context is the Meaning Definition Language (MDL). Unfortunately, Charteris seems to have decided that it is in their best interest to keep it wrapped up with their product, so the only reference I can find (without asking the Internet Archive) is an introductory paper. Still, MDL looks like a great start, and it is more mature than my “XPath field definition” and “add metadata to XSD” approaches.
Update: Alternative navigation on the Charteris site leads to public content about MDL. Good stuff.