
The New Metrics of Scholarly Authority

By MICHAEL JENSEN

When the system of scholarly communications was dependent on the physical movement of information goods, we did business in an era of information scarcity. As we become dependent on the digital movement of information goods, we find ourselves entering an era of information abundance. In the process, we are witnessing a radical shift in how we establish authority, significance, and even scholarly validity. That has major implications for, in particular, the humanities and social sciences.

Scholarly authority is being influenced by many of the features that Tim O'Reilly and others have collectively dubbed Web 2.0. I'll call the corresponding shift in scholarship Authority 2.0, in order to explore more fully the changes that seem likely in the near future. While those trends are enabled by digital technology, I'm not concerned with technology per se: I learned years ago that technology doesn't drive change as much as our cultural response to technology does.

In Web 1.0, roughly 1992 to 2002, authoritative, quality information was still cherished: Content was king. Presumed scarce, it was intrinsically valuable. In general, the business models for online publishing were variants on the standard "print wholesaler" model, duplicating the realities of a physical world as we garbed new business and publishing models in 20th-century clothes.

Web 2.0, roughly 2002 through today, takes more for granted: It presumes the majority of users will have broadband, with unlimited, always-on access, and few barriers to participation. Indeed, it encourages participation, what O'Reilly calls "harnessing collective intelligence." Its fundamental presumption is one of endless information abundance.

That abundance greatly changes both the habits and the business imperatives of the online environment. The lessons derived from successful Web 2.0 enterprises like Google, Flickr, and YouTube include a general user impatience with any impediments, a fracturing of markets into micromarkets, and many other changes in entertainment-, information-, and education-gathering habits across multiple demographics. Information itself is so cheap as to be free. Abundance leads to immediate context and fact checking, which changes the "authority market" substantially. The ability to participate in most online experiences, via comments, votes, or ratings, is now presumed, and when it's not available, it's missed.

Thus we see Google leading us into microadvertising; we see the rise of volunteerism in the "information commons" providing free resources to everyone; we see Wikipedia and its brethren rise up and slap the face of Britannica. We also see increasing overlaps of information resources — machine-to-machine communications (like RSS feeds), mash-ups that intermingle information from different sites to create new meanings (like Google Maps and Craigslist apartment listings), and much more.

Web 2.0 is all about responding to abundance, which is a shift of profound significance.

Imagine you're a member of a prehistoric hunter-gatherer tribe on the Serengeti. It's a dry, flat ecosystem with small pockets of richness distributed here and there. Food is available, but it requires active pursuit — the running down of game, and long periodic hikes to where the various roots and vegetables grow. The shaman knows the medicinal plants, and where they grow. That is part of how shamanic authority is retained: specialized knowledge of available resources, and the skill to pursue those resources and use them. Hunting and gathering are expensive in terms of the energy they take, and require both skill and knowledge. The members of the tribe who are admired, and have authority, are those who are best at gathering, returning, and providing for the benefit of the tribe. That is an authority model based on scarcity.

Contrast that with the world now: For most of us, acquiring food is hardly the issue. We use food as fuel, mostly finding whatever is least objectionable to have for lunch, and coming home and making a quick dinner. Some of us take the time to creatively combine flavors, textures, and colors to make food more than just raw materials. Those are the cooks, and if a cook suggests a spice to me, or a way to cook a chicken, I take his or her word as gospel. Among cooks, the best are chefs, the most admired authorities on food around. Chefs simply couldn't exist in a world of universal scarcity.

I think we're speeding — yes, speeding — toward a time when scholarship, and how we make it available, will be affected by information abundance just as powerfully as food preparation has been.

But right now we're still living with the habits of information scarcity because that's what we have had for hundreds of years. Scholarly communication before the Internet required the intermediation of publishers. The costliness of publishing became an invisible constraint that drove nearly all of our decisions. It became the scholar's job to be a selector and interpreter of difficult-to-find primary and secondary sources; it was the scholarly publisher's job to identify the best scholars with the best perspective and the best access to scarce resources.

Before risking half a year's salary on a book, the publisher would go to great lengths to validate the scholarship (with peer review), and to confirm the likely market for the publication. We evolved immensely complex, self-referential mechanisms to make the most of scarce, expensive resources. Consequently, scholarly authority was conferred upon those works that were well published by a respected publisher. It also could be inferred by a scholar's institutional affiliation (Yale or Harvard Universities vs. Acme State University). My father got his Ph.D. from Yale and had that implicit authority the rest of his professional life. Authority was also conferred by the hurdles jumped by the scholar, as seen in degrees and tenure status. And scholarly authority could accrue over time, by the number of references made to a scholar's work by other authors, thinkers, and writers — as well as by the other authors, thinkers, and writers that a scholar referenced. Fundamentally, scholarly authority was about exclusivity in a world of scarce resources.

Online scholarly publishing in Web 1.0 mimicked those fundamental conceptions. The presumption was that information scarcity still ruled. Most content was closed to nonsubscribers; exceedingly high subscription costs for specialty journals were retained; libraries continued to be the primary market; and the "authoritative" version was untouched by comments from the uninitiated. Authority was measured in the same way it was in the scarcity world of paper: by number of citations to or quotations from a book or article, the quality of journals in which an article was published, the institutional affiliation of the author, etc.

In contrast, in nonscholarly online arenas, new trends and approaches to authority have taken root in Web 2.0. One of the oldest of the authority makers in the Web 2.0 world is, of course, Google. The magic of Google's PageRank system is fairly straightforward. "In essence, Google interprets a link from Page A to Page B as a vote, by Page A, for Page B. But, Google looks at more than the sheer volume of votes, or links a page receives; for example, it also analyzes the page that casts the vote. Votes cast by pages that are themselves 'important' weigh more heavily and help to make other pages 'important,'" Google tells us.

"Of course, important pages mean nothing to you if they don't match your query. So, Google combines PageRank with sophisticated text-matching techniques to find pages that are both important and relevant to your search."

That is authority conferred mostly by applause and popularity. It has its limits, but it also both confers and confirms authority because people tend to point to authoritative sources to bolster their own work. At present, it continues to be a great way of finding "answers" — facts and specific information — from authoritative sources, but it has yet to do a very good job at providing a nuanced perspective on a source or, say, scholarly communication.

Other Web 2.0 authority models are represented in a wide variety of sites. For example, there are the group-participation news and trend-spotting sites like Slashdot ("News for nerds, stuff that matters"), which remixes, comments on, and links to other information Webwide; and digg, which enables up-or-down votes on whether a news story deserves attention. There is del.icio.us, a collection of favorite sites where descriptive tags denoting content raise the authority of a listed site. At Daily Kos, a semi-collective of left-leaning bloggers opining, commenting, chronicling, linking to, and responding to news, readers can recommend (that is, vote on) not only postings, but comments as well. There are plenty of other large community sites, all finding strategies for dealing with the problem of how to prioritize postings, refresh and display them appropriately, and provide the services free; that is, without a budget for staff or much editorial intervention.

The challenge for all those sites pertains to abundance: To scale up to hundreds of thousands or millions of users, they need to devise means for harvesting collective intelligence productively. For digg, it's a simple binary yes-no vote from participants, and credit for being the first one to spot a story others find interesting; for del.icio.us, credit comes both from votes and tags indicating the degree of Web participation in an article. For Slashdot, every posting or comment has a "score" made up of bonuses and penalties associated with various user preferences and other measures of readers' engagement. Registered users — that is, members who have intentionally decided to join the community — are able to rate postings (and comments) as they read: normal, offtopic, flamebait, troll, redundant, insightful, interesting, informative, funny, overrated, underrated.
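
As a rough illustration of that kind of scoring, here is a toy sketch of moderation labels nudging a comment's score up or down within a clamped range. The labels come from the list above; the numeric weights, the starting score, and the clamp range are assumptions for illustration, not Slashdot's actual internals.

```python
# A toy sketch of Slashdot-style comment scoring: each moderation label
# nudges a base score up or down, and the result is clamped to a range.
# The weights and the clamp bounds here are illustrative assumptions.

MODERATION_WEIGHTS = {
    "insightful": +1, "interesting": +1, "informative": +1, "funny": +1,
    "underrated": +1,
    "normal": 0,
    "offtopic": -1, "flamebait": -1, "troll": -1, "redundant": -1,
    "overrated": -1,
}

def comment_score(base, moderations, lo=-1, hi=5):
    score = base
    for label in moderations:
        score += MODERATION_WEIGHTS.get(label, 0)
    return max(lo, min(hi, score))   # clamp the accumulated score

print(comment_score(1, ["insightful", "funny", "redundant"]))  # -> 2
```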

That engaged participation helps filter out crap and is a version of the "hive mind" form of conferred authority, where increasing populations simply add more richness to a resource. But raw population growth on the Web is driving those social sites toward sub-subcultures, online interest tribes around quilting, homemade wind power, tattoos, or Shakespearean sonnets. We are likely to see communities built around specific topic areas, and then they, like Usenet newsgroups in the 90s (rec.frisbee begat rec.frisbee.golf), will get so big that cross-participant ranking to filter out noncommunity users will be required for the sites to remain valuable to participants.

Flickr, YouTube, and other media-collection sites tend to use a variant of "voted on by tag," as well as using the number of viewers as a metric of interestingness and value. The more votes-by-tag a picture has, the more likely it is to be found, and to be tagged some more; the thumbnail that gets lots of clicks through to the full version is likely to accrue still more attention from new viewers. While such sites provide a way to sort information, they can also pass over valuable pictures (or, in Google's case, documents) that aren't famous. YouTube uses a five-star rating system, as do other similar sites, which can mitigate the celebrity effect somewhat.
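
One way to picture such a blended metric is to fold views, tags, and star ratings into a single "interestingness" number. The real formulas (Flickr's, for instance) are proprietary, so the log-scaling, weights, and rating-confidence shrinkage below are entirely my own assumptions, sketched only to show the shape of the computation.

```python
import math

# A hypothetical blend of the signals mentioned above: views, tags, and a
# five-star rating. All weights and scalings are assumptions; real sites'
# formulas are proprietary.

def interestingness(views, tags, avg_stars, num_ratings):
    # Log-scale raw popularity so early famous items don't dominate forever.
    popularity = math.log1p(views) + 2.0 * math.log1p(tags)
    # Shrink sparse star ratings so three 5-star votes don't outrank
    # a thousand 4-star votes.
    confidence = num_ratings / (num_ratings + 10)
    return popularity + 2.0 * confidence * avg_stars

print(interestingness(views=15000, tags=40, avg_stars=4.5, num_ratings=120))
```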

MySpace, Friendster, Facebook, and other social-networking sites are, so far, mostly about self-expression, and the key metrics include: How many friends do you have? Who pays attention to you? Who comments on your comments? Are you selective or not? Such systems have not been framed to confer authority, but as they devise means to deal with predators, scum, and weirdos wanting to be a "friend," they are likely to expand into "trust," or "value," or "vouching for my friend" metrics — something close to authority — in the coming years.

Wikipedia is another group-participation engine, but one focused on group construction of authority and validity. Anyone can modify any article, and all changes are tracked; the rules are few (stay factual and unbiased, cite your sources), and recently some more "authoritative" editors have been given authority to override whining ax grinders. But over all, it is still an astonishing experiment in group participation. Interestingly, in Wikipedia, most users seem to believe that the more edited an entry is (that is, the more touched and changed by many different people), the more authority it has. That kind of democratization of authority is nearly unique to group-edited wikis, since the authority metric is not passive observation but active participation in improvement.

Another venerable Web authority maker takes a different tack, relying on the judgment of a few smart editors while drawing lots of recommendations from users. In many respects Boing Boing is an old-school edited resource. It doesn't incorporate feedback or comments, but rather is a publication constructed by five editor-writers. It has become a hub of what's interesting and unusual on the Web. Get noticed by these guys, and you get a lot of traffic. They are conferrers of validity and constructors of cool for a great deal of the technophile community. As the online environment matures, most social spaces in many disciplines will have their own "boingboings."

The examples I've discussed are by no means fully representative of all authority mechanisms currently in place in online arenas — I haven't mentioned eBay's buyer-seller ratings, or Technorati's rating of the authority of blogs. But all are different models for computed analysis of user-generated authority, many of which are based on algorithmic analysis of participatory engagement. The emphasis in such models is often not on finding scarce value, but on weeding abundance.

So what will be the next step? What is Authority 3.0, or Web 3.0? We will soon be awash in hundreds of billions of pages of content. How can we make sense of them? Most technophile thinkers out there believe that Web 3.0 will be driven by artificial intelligences: automated, computer-assisted systems that can make reasonable decisions on their own, to preselect, precluster, and prepare material based on established metrics, while also attending very closely to the user's individual actions, desires, and historic interests, and adapting to them.

The models of algorithmic filtration are just now beginning to show up. I've been involved with a few Web projects that may hint at some characteristics that are semi-Web 3.0. Two are the National Academies Press book-specific Search Builder and its Reference Finder — tools that take algorithmically extracted key phrases (pulled from a specific chapter or book) and put them to new purposes.

The Search Builder enables a researcher to select terms from a chapter and make term-pairs for sending a search request to Google, MSN, Yahoo, or back to the National Academies Press, resulting in very precise answers. That sort of reuse of language, casting language nets rather than single-term hooks, holds promise for automated intelligences. The Reference Finder is a working prototype (of a type we will no doubt see more of later) where a researcher can simply paste in the text of a rough draft, push a button, and get likely related NAP books returned, based on the algorithmically extracted and weighted key terms from the rough draft. Such tools are simultaneously about precision and "fuzzy matching."
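
The underlying mechanics can be sketched briefly: extract statistically distinctive terms from a text, then recombine them into multi-term queries (the "language nets"). The sketch below uses a generic TF-IDF-style weight and is my own rough reconstruction of the described behavior, not the National Academies Press code.

```python
import math
import re
from collections import Counter
from itertools import combinations

# A rough reconstruction of the described behavior, not the NAP tools'
# actual code: score terms by a TF-IDF-like weight, then pair the top
# terms into quoted two-term queries ("language nets").

WORD = re.compile(r"[a-z]{4,}")

def key_terms(text, corpus, top=5):
    """Terms frequent in `text` but rare across `corpus` score highest."""
    tf = Counter(WORD.findall(text.lower()))
    df = Counter(w for doc in corpus for w in set(WORD.findall(doc.lower())))
    n = len(corpus)
    scored = {w: c * math.log((n + 1) / (1 + df[w])) for w, c in tf.items()}
    return [w for w, _ in sorted(scored.items(), key=lambda kv: -kv[1])[:top]]

def term_pair_queries(terms):
    """Pair key terms into precise search queries rather than single hooks."""
    return [f'"{a}" "{b}"' for a, b in combinations(terms, 2)]

chapter = "Smallpox imagery recurs in Shakespearean drama, where plague and pox ..."
corpus = ["General history of medicine ...", "Elizabethan theater studies ..."]
print(term_pair_queries(key_terms(chapter, corpus, top=3)))
```

A Reference Finder-style match would run the same extraction on a pasted draft and rank a catalog of books by overlap of their weighted terms.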

One of the most exciting applications I've seen recently comes from Microsoft. Its Photosynth product automatically links up thousands of photographs of an area, enabling an almost three-dimensional exploratorium. While still a prototype, it shows how "connecting the dots" can become a meaning- and coherence-making mechanism in its own right. Millions of correlations are intersected, knitting together images of different size, scope, and perspective. Imagine how that same model could be applied to concepts, ideas, and themes derived from the extracted language of textual material, and one begins to see how such a navigational tool could help make sense of immense content resources.

In the Web 3.0 world, we will also start seeing heavily computed reputation-and-authority metrics, based on many of the kinds of elements now used, as well as on elements that can be computed only in an information-rich, user-engaged environment. Given the inevitable advances in technology, remarkable things are likely to happen. In a world of unlimited computer processing, Authority 3.0 will probably include the following (the list is long, which is itself a sign of how sophisticated our new authority makers will have to be; a sketch of how such signals might be combined appears after the list):

  • Prestige of the publisher (if any).
  • Prestige of peer prereviewers (if any).
  • Prestige of commenters and other participants.
  • Percentage of a document quoted in other documents.
  • Raw links to the document.
  • Valued links, in which the values of the linker and all his or her other links are also considered.
  • Obvious attention: discussions in blogspace, comments in posts, reclarification, and continued discussion.
  • Nature of the language in comments: positive, negative, interconnective, expanded, clarified, reinterpreted.
  • Quality of the context: What else is on the site that holds the document, and what's its authority status?
  • Percentage of phrases that are valued by a disciplinary community.
  • Quality of author's institutional affiliation(s).
  • Significance of author's other work.
  • Amount of author's participation in other valued projects, as commenter, editor, etc.
  • Reference network: the significance rating of all the texts the author has touched, viewed, read.
  • Length of time a document has existed.
  • Inclusion of a document in lists of "best of," in syllabi, indexes, and other human-selected distillations.
  • Types of tags assigned to it, the terms used, the authority of the taggers, the authority of the tagging system.
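
To make that computability concrete, here is a deliberately simplified sketch of how such signals might be folded into a single score. Every signal name and weight below is hypothetical; the point is only that once each element is measured and normalized, combining them is trivial for a machine, even though no human could tally it all by hand.

```python
# A deliberately simplified composite of Authority 3.0-style signals.
# All signal names and weights are hypothetical illustrations of the
# list above, not a real or recommended formula.

WEIGHTS = {
    "publisher_prestige": 0.15,   # prestige of the publisher, if any
    "reviewer_prestige": 0.10,    # prestige of peer prereviewers
    "valued_inlinks": 0.25,       # links weighted by the linkers' own value
    "quoted_fraction": 0.15,      # share of the document quoted elsewhere
    "discussion_volume": 0.10,    # blog discussion, comments, reclarification
    "author_track_record": 0.15,  # significance of the author's other work
    "curated_inclusions": 0.10,   # "best of" lists, syllabi, indexes
}

def authority_score(signals):
    """signals: signal name -> value already normalized to [0, 1]."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)

paper = {
    "publisher_prestige": 0.8, "reviewer_prestige": 0.6,
    "valued_inlinks": 0.4, "quoted_fraction": 0.2,
    "discussion_volume": 0.7, "author_track_record": 0.5,
    "curated_inclusions": 0.3,
}
print(round(authority_score(paper), 3))  # -> 0.485
```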

None of those measures could be computed reasonably by human beings. They differ from current models mostly by their feasible computability in a digital environment where all elements can be weighted and measured, and where digital interconnections provide computable context.

What are the implications for the future of scholarly communications and scholarly authority? First, consider the preconditions for scholarly success in Authority 3.0. They include the digital availability of a text for indexing (but not necessarily individual access — see Google for examples of journals that are indexed, but not otherwise available); the digital availability of the full text for referencing, quoting, linking, tagging; and the existence of metadata of some kind that identifies the document, categorizes it, contextualizes it, summarizes it, and perhaps provides key phrases from it, while also allowing others to enrich it with their own comments, tags, and contextualizing elements.
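
To illustrate the metadata precondition, here is one hypothetical record containing the elements described: identification, categorization, summary, key phrases, and room for reader enrichment. The field names, identifier, and URL are all invented for illustration and follow no particular metadata standard.

```python
# A hypothetical document record meeting the preconditions described above.
# Every field name and value is invented for illustration; this follows
# no particular metadata standard.

document_record = {
    "identifier": "doi:10.0000/example.2007.001",    # made-up identifier
    "title": "The Trope of Smallpox in Shakespearean Drama",
    "categories": ["literature", "history of medicine"],
    "summary": "A one-paragraph abstract, readable by crawlers and humans.",
    "key_phrases": ["smallpox", "Shakespearean drama", "disease imagery"],
    "full_text_url": "https://example.org/articles/smallpox-shakespeare",
    # Enrichment contributed by readers, kept apart from the canonical record:
    "community": {"tags": [], "comments": [], "inbound_links": []},
}
```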

In the very near future, if we're talking about a universe of hundreds of billions of documents, there will routinely be thousands, if not tens of thousands, if not hundreds of thousands, of documents that are very similar to any new document published on the Web. If you are writing a scholarly article about the trope of smallpox in Shakespearean drama, how do you ensure you'll be read? By competing in computability.

Encourage your friends and colleagues to link to your online document. Encourage online back-and-forth with interested readers. Encourage free access to much or all of your scholarly work. Record and digitally archive all your scholarly activities. Recognize others' works via links, quotes, and other online tips of the hat. Take advantage of institutional repositories, as well as open-access publishers. The list could go on.

For universities, the challenge will be ensuring that scholars who are making more and more of their material available online will be fairly judged in hiring and promotion decisions. It will mean being open to the widening context in which scholarship is published, and it will mean that faculty members will have to take the time to learn about — and give credit for — the new authority metrics, instead of relying on scholarly publishers to establish the importance of material for them.

The thornier question is what Web 3.0 bodes for those scholarly publishers. It's entirely possible that, in the not-so-distant future, academic publishing as we know it will disappear. It's also possible that, to survive, publishers will discover new business models we haven't thought of yet. But it's past time that scholarly publishers started talking seriously about new models, whatever they turn out to be — instead of putting their heads in the sand and fighting copyright-infringement battles of yesteryear.

I also don't know whether many, or most, scholarly publishers will be able to adapt to the challenge. But I think that those who completely lock their material behind subscription walls risk marginalizing themselves over the long term. They simply won't be counted in the new authority measures. They need to cooperate with some of the new search sites and online repositories, and share their data with outside computing systems. They need to play a role in deciding not just what material will be made available online, but also how the public will be allowed to interact with the material. That requires a whole new mind-set.

I hope it's clear that I'm not saying we're just around the corner from a revolutionary Web in which universities, scholarship, scholarly publishing, and even expertise are merely a function of swarm intelligences. That's a long way off. Many of the values of scholarship are not well served yet by the Web: contemplation, abstract synthesis, construction of argument. Traditional models of authority will probably hold sway in the scholarly arena for 10 to 15 years, while we work out the ways in which scholarly engagement and significance can be measured in new kinds of participatory spaces.

But make no mistake: The new metrics of authority will be on the rise. And 10 to 15 years isn't so very long in a scholarly career. Perhaps most important, if scholarly output is locked away behind fire walls, or on hard drives, or in print only, it risks becoming invisible to the automated Web crawlers, indexers, and authority-interpreters that are being developed. Scholarly invisibility is rarely the path to scholarly authority.

Michael Jensen is director of strategic Web communications for the National Academies. A version of this essay was originally presented as a speech given at Hong Kong University Press's 50th-anniversary celebration last fall.

http://chronicle.com
Section: The Chronicle Review
Volume 53, Issue 41, Page B6
