Yaml Aint Markup Language

From yaml.org: YAML(tm) (rhymes with "camel") is a straightforward machine parsable data serialization format designed for human readability and interaction with scripting languages such as Perl and Python.

YAML has a lot of the same appeal for me as Wiki does. Comparing YAML to XML (Extensible Markup Language) is roughly like comparing the Wiki Wiki Markup Language (see Text Formatting Rules) to HTML.

Contributors: Steve Howell


Simple Example:

--- language: YAML acronymn stands for: Yaml Aint Markup Language acronym doesn't stand for: Yet Another Markup Language features: - documents are very readable by humans. - interacts well with scripting languages. - uses host languages' native data structures. - has a consistent information model. - enables stream-based processing. - is expressive and extensible. - is easy to implement.

Which, in the perl implementation, turns into this (as stringified by Data::Dumper):

$VAR1 = { 'acronym doesn\'t stand for' => 'Yet Another Markup Language', 'language' => 'YAML', 'acronymn stands for' => 'Yaml Aint Markup Language', 'features' => [ 'documents are very readable by humans.', 'interacts well with scripting languages.', 'uses host languages\' native data structures.', 'has a consistent information model.', 'enables stream-based processing.', 'is expressive and extensible.', 'is easy to implement.' ] };

And in Python:

{"acronym doesn't stand for":
'Yet Another Markup Language',

'acronymn stands for':
'Yaml Aint Markup Language',

'features':
['documents are very readable by humans',

'interacts well with scripting languages', "uses host languages' native data structures", 'has a consistent information model', 'enables stream-based processing', 'is expressive and extensible', 'is easy to implement'], 'language': 'YAML'}

More Full-Featured Example:

--- invoice: 34843 date : '2001/01/23' bill-to: &id001 first name: Chris last name : Dumars address: lines: | 458 Walkman Dr. Suite #292 city : Royal Oak state : MI postal : 48046 ship-to: *id001 products: - sku : BL394D quantity : 4 description : Basketball price : 45.00 - sku : BL4438H quantity : 1 description : Super Hoop price : 2392.00 tax : 251.42 total: 4443.52 note: > Late afternoon is best. Backup contact is Nancy Billsmer @ 338-4338.

Which, in the perl implementation, turns into this (as stringified by Data::Dumper):

$VAR1 = { 'tax' => '251.42', 'invoice' => 34843, 'total' => '4443.52', 'bill-to' => { 'address' => { 'state' => 'MI', 'postal' => 48046, 'city' => 'Royal Oak', 'lines' => '458 Walkman Dr. Suite #292 ' }, 'first name' => 'Chris', 'last name' => 'Dumars' }, 'date' => '2001/01/23', 'ship-to' => $VAR1->{'bill-to'}, 'note' => 'Late afternoon is best. Backup contact is Nancy Billsmer @ 338-4338.', 'products' => [ { 'description' => 'Basketball', 'price' => '45', 'quantity' => 4, 'sku' => 'BL394D' }, { 'description' => 'Super Hoop', 'price' => '2392', 'quantity' => 1, 'sku' => 'BL4438H' } ] };

And in Python:

{'bill-to':
{'address': {'city': 'Royal Oak',

'lines':
'458 Walkman Dr.\nSuite #292\n',

'postal':
48046,

'state':
'MI'},

'family':
'Dumars',

'given':
'Chris'},

'comments':
'Late afternoon is best. Backup contact is Nancy Billsmer @ 338-4338.\n',

'date':
yaml.timestamp('2001-01-23T00:00:00.00Z'),

'invoice':
34843,

'product':
[{'description': 'Basketball',

'price':
450.0,

'quantity':
4,

'sku':
'BL394D'},

{'description':
'Super Hoop',

'price':
2392.0,

'quantity':
1,

'sku':
'BL4438H'}],

'ship-to':
{'address': {'city': 'Royal Oak',

'lines':
'458 Walkman Dr.\nSuite #292\n',

'postal':
48046,

'state':
'MI'},

'family':
'Dumars',

'given':
'Chris'},

'tax':
251.41999999999999,

'total':
4443.5200000000004}


Q: For YAML to function as an alternative to Extensible Markup Language, it needs conversions to/from Extensible Markup Language. Does that exist for Python Language? Also in an earlier IBM article it was mentioned YAML spec did not have good namespace representation (see www-106.ibm.com ). Would this and other "unfinished" area prove to be Show Stoppers for using YAML in real-life enterprises where web services are processed (supplied and consumed)?

A: YAML isn't necessarily meant as a drop-in replacement for each and every instance of XML. It's primarily meant as an alternative for those instances where Xml Sucks and readability is considered important - you can avoid all the angle brackets without Rolling Your Own Data Format. XML bindings are still in draft: yaml.org

YAML 1.1 does have shortcuts for specifying namespace-like collections of tags. Unlike XML, this is done without complicating the Info Set; it's just Syntactic Sugar.

Q: For Windows Operating Systems users, are there freeware tools to create YAML structures, and to import/export these to XML? How about tools to process YAML in manners similar to XSLT, DOM and SAX? And what additional requirements (e.g. Python xxx) do these tools have?

A: For creating YAML structures: Text Editors! Or marshaling from a YAML-enabled language. For parsing you have to rely on the capacities of your favorite language. Support is built into Ruby Language 1.8. There are also libraries (check Google Search since they're evolving) for Python Language, Php Language, Java Script, Perl Language and Java Language. In C there's libyaml, though it seems to be targeted more at Scripting Languages than Cee Language itself.


This looks like what I have thought XML may as well evolve to but how do you specify schemas?

XML can't evolve to become simpler without breaking Backward Compatibility (and YAML is considerably simpler than XML). How you specify schemas is a work in progress. Note that none of the other Alternatives To Xml have schemas yet, either.

For instance "postal" field above is it left to functions in the host language to test if only numeric characters length 5 are valid?

At the moment, yes, but that's not difficult.

Or is there some way to specify metadata in YAML itself? (It does not seem clear from yaml.org either).

from my perspective, it seems that YAML is less of a weak XML than a powerful alternative to INI and Java .properties files - that is, a human-readible and easily editable approach to serialising configuration data. Compare v. XML which is readable in name only. XML is verbose to aid in the structure of super-large files and to allow for some validation to occur within the parser. YAML isn't really interested in doing that - it's just for people who want to store settings simply and easily from within scripting languages.

Now, when you get to things like inter-language serialisation, XML-enabled databases, custom document types, RSS, etc. Yaml seems pathetically underpowered. However, when you consider the more common usage of XML, which is just providing a persistent layer for the application's "options..." menu and for some simple user-defined objects, then XML's massive complexity seems like painful overkill. YamL is for when you want your serialisation to be actually easy, instead of powerful, and you want it to be human-readable, instead of just being text-mode.

[Marcos Eliziario] Most of time, an XML file is the serialized form of some object coming from an application, and thus, it has its metadata already defined in the classes that make up these objects(at least on statically typed languages). But even when it's not the case, as it is when you want to share a piece of data accross diferent systems written in different language, It's easy to see that as soon as the needs arise, ways of describing metadata will be designed for YAML, and with time one or more standars will solidify from use. Once again, I think that YAML simpler sintax will lead also to more readable metadata definition languages.

[Another person] Martin your viewpoint appear to be supported by a Kuro Shin story at www.kuro5hin.org . Having said that, there are strong views expressed in comments responding to that post, not all for YAML over XML


Is the use of Syntactically Significant Whitespace mandatory (or highly desirable) in some YAML usage scenarios? Why should this not be generated/compressed as required using a simple utility?

Oh great, Tab Munging for data also.

Syntactically Significant Whitespace is not mandatory. YAML, at least in its current versions, is largely JSON compatible (JSON values are YAML values). This provides access to sequences between [] and maps between {} and strings inside "". However, the use of Syntactically Significant Whitespace is desirable for text editing. As far as: "Why should this not be generated/compressed as required?", it is (typically) generated/compressed as required, or at least access to that sort of feature is provided in any YAML serialization library compatible with the model described on the YAML webpage.

But the problem is that you cannot control what other people give you, but only what you give them. Thus, if a vendor gives you white-space-sensitive markup because you are a one-to-many recipient, then you risk Tab Munging anytime an editor touches the code. I would prefer built-in white-space insensitivity.

If a vendor gives you white-space-sensitive markup, you're free to convert it however you wish while editing the code. As far as 'Tab Munging' in particular goes, since you are so hung up on that: YAML simply doesn't allow tabs for indentation. Thus, there is no risk of Tab Munging.

But suppose you go in with an editor and hand edit a few items, and put in spaces inadvertently (which is easy to do). Sure, the parser may detect the problem, but not fix it.

I'm not certain how this problem you describe is specific to whitespace. I can mess up almost any file by inadvertently inserting characters, and it isn't as though spaces at the beginning of a line are invisible. The one exception is the use of empty lines. I'm not certain how often empty lines are used by others, but I do know that they require special treatment in most other data transport languages as well.

"Real" characters are usually easy to spot appearing/changing on your screen. White-space chars are not. (I suppose this is degenerating into a standard white-space fight as found in existing topics such as Syntactically Significant Whitespace Considered Harmful.)

I already anticipated and responded to that complaint... as you'd know if you bothered reading my response. White-space characters are easy to see at the beginning of any line except an empty line. If you really need them to be visible, of course, you can use a text editor that marks whitespace.

That is wrong. If a tab is displayed as 4 spaces in a given editor and you type 4 actual spaces in its place, they are indistinguishable. And if they are close, such as 3 or 5, one may not notice for an indented portion or where say parenthesis are compared to capital letters. I've been through that dance already. Sure, an editor with the right settings may make it easier to spot, but that is a work-around to what is a poor design decision (in my opinion). You can't just grab plane-jane Wordpad or the like, you have to have your Special Editor around and set properly. And if you've ever done consulting work, the client site does not always let you install your own software without special permission.

It isn't unusual that working with 'plane-jane' tools requires you take a little extra care. But unless you happen to be a moronic monkey who blindly pounds away at the space and tab keys (even knowing that YAML doesn't use tabs for indentation), I really don't see how this is a problem of sufficient significance to warrant special attention. As to whether this is a "poor design decision", you aren't examining both sides of the equation. A "real" engineer would look at the Cons AND the Pros before declaring it a "poor design decision".

When one is switching between many different tools, systems, languages, and files in the course of a high-pressure work-day; the chance of accidents is reasonably high such that I'd just rather avoid tab-reliance to begin with. Let's not risk Personal Choice Elevated To Moral Imperative here. Given a choice, I'd rather not have it available. If your preference is different, so be it. Let's not degenerate into calling each other "monkeys".

RE: "the chance of accidents is reasonably high" - Maybe so. But the idea that Syntactically Significant Whitespace related accidents is statistically greater than accidents related to the alternatives (such as forgetting to terminate a block or a quote) is unsupported. You don't have sufficient reason to believe that avoiding Syntactically Significant Whitespace will actually help you reduce accidents. And, therefore, your arguments in that vein are fallacious.

Nor is there any evidence they are problem-free. Without sufficient objective evidence either way, I will go with my own experience, and my experience is that the tab-happy approach sucks snake shit.

YAML is not "tab-happy": it doesn't use tabs for indentation. And I doubt your 'experience' on this subject is sufficient for a reasonable or fair comparison.

RE: "I'd just rather avoid tab-reliance to begin with" - YAML doesn't rely on tabs.

RE: "Let's not risk Personal Choice Elevated To Moral Imperative here. Given a choice, I'd rather not have it available" - Technically, making the alternatives to one's personal choice become 'unavailable' is a fine example of Personal Choice Elevated To Moral Imperative. Allowing both options, as is done by YAML, is the solution you should favor if your engineering aim is to avoid Personal Choice Elevated To Moral Imperative.

I don't favor or disfavor Syntactically Significant Whitespace. There are reasons to not support both (such as Language Idiom Clutter). But this idea you have that you can just 'avoid accidents' by avoiding Syntactically Significant Whitespace is utter nonsense; you'll just trade one class of accidents for another. And your "high-pressure work day" won't be any easier on these other accidents than they are on whitespace related ones.

I'd rather have WYSIWYG accidents over ghostly accidents. Anyhow, this debate is going nowhere.


And is Syck, fast YAML serialization, works equally well in Php Language as it does for the Python Language?



See original on c2.com