After writing my last post I thought I might follow up with a bit of cognitive speculation. Since the first comment was exactly about the issue I was thinking about writing on, I might as well follow up quickly.
Jeff Snell replied:
You parse semantic markup in rich text all the time. When formatting changes, you apply a reason. RFCs don’t capitalize MUST and SHOULD because the author is thinking in upper-case versus lower-case. They’re putting a strong emphasis on those words. As a reader, you take special notice of those words being formatted that way and immediately recognize that they contain a special importance. So I think that readers do parse writing into semantic markup inside their brains.
Emphasis not added. Wait, bold isn’t emphasis, it’s strong! So sorry, STRONG not added.
I think the reasoning here is flawed, in that it supposes that reflection on how we think is an accurate way of describing how we think.
A few years ago I got interested in cognition for a while, particularly in some of the new theories of consciousness. One of the parts that really stuck with me was the difference between how we think about thinking and how thinking really works (as revealed by timing experiments). That is, our conscious thought (the thinking-about-thinking) happens after the actual thought; we make up reasons for our actions when we’re challenged, but if we aren’t challenged to explain our actions there’s no consciousness at all (of course, you can challenge yourself to explain your reasoning, but you usually won’t). And then we revise history so that our reasoning precedes our decision, but that’s not always very accurate. This gets around the infinite-loop problem, where either there’s always another level of meta-consciousness reasoning about the lower level of consciousness, or there’s a potentially infinite sequence of whys that have to be answered for every decision. And of course sometimes we really do make rational decisions and there are several levels of why answered before we commit. But this is not the most common case, and there’s always a limit to how much reflection we can do. There are always decisions made without conscious consideration, if only to free ourselves to focus on the important decisions.
And so as both a reader and a writer, I think in terms of italic and bold. As a reader and a writer there is of course translation from one form to another. There’s some idea inside of me that I want to get out in my writing, there’s some idea outside of me that I want to understand as a reader. But just because I can describe some intermediate form of semantic meaning, it doesn’t mean that that meaning is actually there. Instead I invent things like "strong" and "emphasis" when I’m asked to decide why I chose a particular text style. But the real decision is intuitive — I map directly from my ideas to words on the page, or vice versa for reading.
Obviously this is not true for all markup. But my intuition as both a reader and a writer about bold and italic is strong enough that I feel confident there’s no intermediary representation. This is not unlike the fact that I don’t consider the phonetics of most words (though admittedly I did when trying to spell "phonetics"); common words are opaque tokens that I read in their entirety without consideration of their component letters. And a good reader reads written words without consideration of their vocal equivalents (though as a writer I read my own writing out loud… is that typical? I’m guessing it is). A good reader can of course vocalize if asked, but that doesn’t mean the vocalization is an accurate representation of their original reading experience.
Though it’s kind of an aside, I think the use of MUST and SHOULD in RFCs fits with this theory. By using all caps the authors emphasize the word over the prose; they make the reader see the words as tokens distinct from "must" and "should", with special meanings that are related to, but also much stricter than, their usual English meanings. The caps are a way of disturbing our natural way of determining meaning, because RFC authors need a more exact language.
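To make the token idea concrete, here is a minimal Python sketch of treating the capitalized words as distinct tokens. The keyword list follows RFC 2119; the function name `find_keywords` and the sample sentence are made up for this illustration. The match is deliberately case-sensitive, so an ordinary "must" in the prose is ignored.

```python
import re

# Requirement keywords as listed in RFC 2119. Longer phrases come first so
# that "MUST NOT" is not matched as a bare "MUST".
KEYWORDS = (
    "MUST NOT", "SHALL NOT", "SHOULD NOT",
    "MUST", "SHALL", "SHOULD",
    "REQUIRED", "RECOMMENDED", "MAY", "OPTIONAL",
)

# No re.IGNORECASE flag: only the all-caps forms count as keywords.
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")


def find_keywords(text):
    """Return the all-caps requirement keywords found in text, in order."""
    return KEYWORD_RE.findall(text)


# Hypothetical sample sentence, not taken from any real RFC.
sample = "Clients MUST retry, and you must be patient. Servers SHOULD NOT block."
found = find_keywords(sample)
```

With that sample, only the capitalized forms are picked up; the lowercase "must" stays part of the ordinary prose.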
Part of the problem is that descriptions like “strong” don’t really map to what we’re thinking when we write. If descriptions like “emphasize” and “shout” were used, then I think the argument would end there. No one “strongs” a word. They emphasize it. I think that “bold” is also wrong, but it’s become a synonym for something meaning “emphasize that I’m using this word in a very exact way” amongst computer-oriented people. Of course, extrapolating from “computer-oriented” to “average joe” probably fails a lot.
Certain conventions also come into play, such as “underline”, which can be used both to emphasize and to follow a convention (such as when underlining the title of a publication). Should there be a separate semantic markup symbol to distinguish these two uses? Maybe, but in practice we think “underline” out of habit and conciseness. The context defines its meaning and this is commonly recognized enough to be acceptable.
The “semantic web” people seem to disregard the fact that there simply aren’t enough semantically distinguishable tokens in HTML and CSS to describe the vagaries of language (nor should there be, lest we end up with hundreds of tokens with subtle distinctions that would probably be lost or misinterpreted by 90% of readers anyway really. it. isn’t. needed.). Semantics is heavily dependent on context and interpretation. Tokens like “bold”, “italic” and “underline” describe the visual appearance of the written word and that’s really what’s needed. The visual appearance is a substitute for other cues normally gathered from a speaker’s tone, not a meaning in and of itself. Context provides the meaning. We should simply accept that the written word will never carry the subtleties of spoken word and agree that a visual description (i.e. bold, italic) is as adequate as any other subset and actually provides a much more flexible system precisely because it doesn’t try to convey meaning outside of a particular context.
Heh, and reading my own comment after markdown/CSS has been applied demonstrates how a stylesheet can subtly change the meaning of a document, something that would be more easily avoided by using purely visual descriptions (I used asterisks to denote emphasis in several places which markdown converted to strong which then got changed to italic with a slightly smallish font by the stylesheet. Subtle, but it now reads slightly differently to me than it did when I wrote it – maybe it’s adopted Ian’s tone of voice because it’s his stylesheet?). Meaning destroyed by semantics – sounds familiar.
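For anyone who hasn't watched that pipeline happen, here is a rough Python sketch of the first step. It assumes the common convention where single asterisks become `<em>` and double asterisks become `<strong>`; the renderer on this site may well behave differently, and `asterisks_to_html` is a made-up name, not a real Markdown API. What the reader finally sees still depends on whatever the stylesheet does with those tags.

```python
import re


def asterisks_to_html(text):
    """Toy converter for asterisk emphasis; not a full Markdown implementation."""
    # Handle double asterisks first so "**x**" is not read as two single pairs.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    # Whatever single-asterisk pairs remain are treated as plain emphasis.
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


weak = asterisks_to_html("it *isn't* needed")
strong = asterisks_to_html("**really**")
```

The writer typed asterisks; the output names a tag; the stylesheet picks the look. Each step is a chance for the tone to drift.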
Yeah, I think my point isn’t just that there aren’t enough semantic tokens to describe stuff, or that people wouldn’t know how or be bothered to distinguish the semantic meaning of what they are saying; I’m saying that the semantic meaning isn’t there — we mean what we write, or what we say, and in most cases the writer hasn’t even manufactured deeper meaning than that. You could challenge the writer to say more, but it’s not that they left stuff out — the extra meaning literally won’t be there until it’s asked for.
And given a finite amount of resources, I don’t know if we’ll have any more total meaning with full semantic markup than we would have with partial semantic markup and more information (possible because of the smaller burden placed on the writer).
> Wait, bold isn’t emphasis, it’s strong!
The EM and STRONG elements are both emphasis.
EM: Indicates emphasis. STRONG: Indicates stronger emphasis.
http://www.w3.org/TR/html4/struct/text.html#h-9.2.1
I understand what you mean about inventing concepts and explanations after-the-fact (as with explaining our motivations), but I don’t think it applies here.
When I write something using italics, I intuitively mean the same thing as when I say something with emphasis. Like this. Which is spoken differently than “like this.”
And read italics mean the same thing to me.
Now when I say something with emphasis, I don’t think at all about italics or bold or emphasis … or even pitch or volume or stress. I just say it.
Because the emphasis in speech and the italics in text mean something, it’s clear they carry semantic content.
Because they mean the same thing, it’s clear to me that that meaning is not about italics.
I think emphasis is a good word for what that meaning is about.
> And so as both a reader and a writer, I think in terms of italic and bold.
It’s hard for me to believe this, as I also read and write a lot, and I’ve never thought this way at all. Not even close. When I read or write something, I don’t think about bold or italics. I think about the tone of voice and which words are stressed – as if I were speaking or listening to somebody speak.
Of course, nobody would advocate that I use <toneofvoice> and <stressed>, would they? No, because my intention, even if I don’t specifically think of it that way, is to emphasise particular parts of what I am writing. I don’t think in the abstract way because when somebody communicates with me, it’s always fixed in a specific medium. And even if you do think in terms of bold and italics, the same applies to you. The bold and italics are merely a surface representation, not the underlying meaning, despite the fact that up until the past few decades of human history, the two have been inseparable.
PS: I used to participate on Usenet quite a bit, where markup wasn’t available. People used *asterisks* as a substitute, which is the precursor to Markdown. Do you think people participating in Usenet think in terms of asterisks or in terms of emphasis?
Jim:
> When I read or write something, I don’t think about bold or italics. I think about the tone of voice and which words are stressed – as if I were speaking or listening to somebody speak.
Exactly.