
Claire L. Evans
Marfa, US
Immortal since Jan 12, 2008
Projects: Space Canon, Claire L Evans Dot Com, Crystals, Vittles, Vitals
    SpaceCollective: where forward thinking terrestrials share ideas and information about the state of the species, their planet and the universe, living the lives of science fiction.

    In case you didn't know, reality is science fiction.

    If you doubt me, read the news. Read, for example, this recent article in the New York Times about Carnegie Mellon's "Read the Web" program, in which a computer system called NELL (Never Ending Language Learner) is systematically reading the internet and analyzing sentences for semantic categories and facts, essentially teaching itself idiomatic English as well as educating itself in human affairs. Paging Vernor Vinge, right?

    NELL reads the Web 24 hours a day, seven days a week, learning language like a human would — cumulatively, over a long period of time. It parses text on the Internet for ontological categories, like "plants," "music" and "sports teams," then uses contextual clues to sort out what things belong in which categories, like "Nirvana is a grunge band" (see below) and "Peyton Manning plays for the Indianapolis Colts."
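    NELL's coupled pattern learner is far more sophisticated, but the core move described above, matching contextual templates against sentences to propose category memberships, can be sketched in a few lines. The patterns, category names, and sentences below are illustrative inventions, not NELL's actual data:

```python
import re

# Toy contextual patterns of the kind NELL bootstraps: each maps a
# textual template to the ontological category it suggests.
PATTERNS = {
    r"(\w[\w ]*?) is a grunge band": "musicArtist",
    r"(\w[\w ]*?) plays for the ([\w ]+)": "athlete",
    r"cities such as ([\w ]+)": "city",
}

def extract_candidates(sentence):
    """Return (entity, category) guesses for one sentence."""
    candidates = []
    for pattern, category in PATTERNS.items():
        match = re.search(pattern, sentence)
        if match:
            candidates.append((match.group(1), category))
    return candidates

for s in ["Nirvana is a grunge band from Seattle.",
          "Peyton Manning plays for the Indianapolis Colts."]:
    print(extract_candidates(s))
# prints [('Nirvana', 'musicArtist')]
# then   [('Peyton Manning', 'athlete')]
```

    The real system couples many thousands of learned patterns, enforces mutual-exclusion constraints between categories, and promotes only its highest-confidence candidates to beliefs, which is where the accuracy figure below comes from.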

    In its self-taught exploration of Internet English, NELL is 87 percent correct. And the more it learns, the more accurate it will become. According to a paper called "Toward an Architecture for Never-Ending Language Learning," NELL has two tasks: to read, and to learn from that reading — to "learn to read better each day than the day before...go[ing] back to yesterday's text sources and extract[ing] more information more accurately."

    Like the premise of a dystopian sci-fi story, Read the Web is wonderful-terrifying. Wonderful, because we've designed a computer to teach itself, because it's a case study in life-long learning, and because the results will certainly be useful. Terrifying because it's difficult to look at a massive computer coming up with accurate pronouncements like "bliss is an emotion" without feeling a shudder of horrible gravitas. That said, I am shuttering my fearmongering sci-fi mind and embracing NELL's mission, just one in a fascinating new field of research aimed at helping computers understand human language, using the Web as a key linguistic resource. The idea of a "Semantic Web," an Internet as comprehensible to computers as it is to humans, has been in the computer science and AI discourse for years, with good old Sir Tim Berners-Lee carrying the torch. In a 2001 article for Scientific American, Berners-Lee wrote that "this structure will open up the knowledge and workings of humankind to meaningful analysis by software agents, providing a new class of tools by which we can live, work and learn together."

    Upon discovering this project, I had tons of questions about NELL: could it read other languages? Who gets the data in the end? Does it have parental controls on? So I did what I always do in such cases, which is immediately write to the people in charge in the hopes of gleaning some information from them. Accordingly, here is a brief interview with the very gracious Professor Tom Mitchell, chair of the Machine Learning Department of the School of Computer Science at Carnegie Mellon University, and Burr Settles, a Carnegie Mellon postdoctoral fellow working on the project.


    Universe: At the moment, NELL is learning language and semantic categories in English, which would mean that its learning is limited to the output of the English-speaking world. Are there any plans to expand the program to different languages?

    Professor Tom Mitchell: Interestingly, NELL's learning methods apply to other Western languages just as well as they do to English (as long as the language uses the same character set as English). We started with English because, well, we speak English. And also because that is the most-used language on the web, and we wanted NELL to have access to lots of text.

    Burr Settles: In principle, the technology driving NELL is language-independent, so there is reason to believe that, given a corpus of Spanish or Chinese, it could learn equally well. In fact, I suspect there are some languages it would perform even better with; for example, syntax and orthography are generally more consistent in Spanish than in English, so the Spanish NELL might learn much more quickly and accurately.

    Universe: Could an advanced NELL-like computer teach itself another language?

    Burr Settles: Quite possibly. For example, imagine that NELL learns a lot about The French Revolution from English-language documents, and also knows (because we say so, or maybe because it read so!) that Wikipedia pages have corresponding translations in other languages. If NELL assumes the facts available on the English- and French-language Wikipedia pages for The French Revolution are roughly equivalent, then it could use its knowledge to start to infer patterns, rules, word morphologies, etc. in French, and then start reading other French-language documents.

    This isn't unlike the way humans can easily pick up certain words (concrete nouns, prepositions) when traveling in foreign-language countries. I know, because I just got back from two weeks in Spain, which is why I'm absent from that fabulous New York Times photo!

    Universe: When will NELL stop running?

    Professor Tom Mitchell: We have absolutely no intention of stopping it from running. NELL stands for "Never Ending Language Learner." We mean it, though of course we need to make research progress if we want to give it the ability to continue learning in useful ways.

    Universe: Is NELL reading the web indiscriminately, or have you set it loose on particular corners of the Internet that are more conducive to language-learning (say, Wikipedia)?

    Professor Tom Mitchell: NELL primarily uses a collection of 500,000,000 web pages that represent the most broadly popular, highly referenced pages on the web. But it also uses Google's search engine to search for additional pages when it is looking for targeted information (e.g., for pages that will teach it more about sports teams). So it's not in some corner of the web, but all over it.

    Burr Settles: Currently, NELL reads indiscriminately. Of course, it tends to learn about proteins and cell lines mostly from biomedical documents, celebrities from news sites and gossip forums, and so on. In future versions of NELL, we hope it can decide its own learning agenda, e.g., "I've not read much about musical acts from the 1940s... maybe I'll focus on those kinds of documents today!" Or, alternatively, we could say we need it to focus on a particular document. Previous successes in "machine reading" research have in fact relied on a narrow scope of knowledge (e.g., only articles about sports, or terrorism, or biomedical research) in order to learn anything. The fact that NELL learns to read reasonably well across all of these domains is actually a big step forward.

    It has been interesting to hear the public's response to NELL. There are many jokes about what will happen when it comes across 4chan or LOLcats, for example. But the reality is, those texts are already available to NELL, and it is largely ignoring them because they are so ill-formed and inconsistent.

    Universe: Say NELL learns the English language well enough to be a Shakespearean scholar. What happens to the data then — do Google and Yahoo and DARPA get access to it?

    Professor Tom Mitchell: Yes, and so will everybody. Already we have put NELL's growing knowledge base up on the web. You can browse it, and also download the whole thing if you like. Furthermore, I am committed to sticking to this policy of making NELL's extracted knowledge base available for free to anybody who wants to use it for any commercial or non-commercial purpose, for the life of this research project.

    Universe: Lastly, the name NELL is a joke about the Jodie Foster movie, right?

    Professor Tom Mitchell: Well, no. I didn't really know about that movie...but I just took a look at NELL's knowledge base, and it appears to know about it. Take a look. There, the light grey items are low confidence hypotheses that NELL is considering but not yet committing to. The dark black items are higher confidence beliefs. So it is considering that NELL might be a movie, a disease, and/or a writer, but it's pretty confident that Jodie Foster starred in the movie...
    Let's talk about the God Particle.

    It strikes me that people refer to the Higgs boson as the "God particle" in the same way some call the iPhone the "Jesus phone": with an almost pointed disregard for what such a prefix actually means. Considering the intensity of the culture wars, the popularity of the moniker is baffling. Is this about contextualizing the abstraction (and grandeur) of particle physics in a way "regular" people can understand? Does this represent a humanist concession to the religious? If so, can religious culture really be swayed by such a transparent ploy — y'know, it gives things mass, just like on Sundays?

    I know the use of "God particle" is largely a media problem, born of the Leon Lederman book of the same name, and that most scientists find that it maddeningly overstates the particle's qualities and importance. Lederman himself came out of a long tradition of scientists using "God" as colorful shorthand for the mysterious workings of Nature, rather than in any literal sense. Albert Einstein, who famously overused the word, was not so much religious as a Spinozan humanist, explaining that "we followers of Spinoza see our God in the wonderful order and lawfulness of all that exists." This usage was not uncommon, but in a post-Intelligent Design scientific discourse, the habit has waned. And, while we scramble to find new, immediately relatable metaphors for "that grandiose, awe-inspiring quality of the Universe which eludes us," God will do in a pinch.

    Yet punctuating the language about an elusive subatomic particle with the G-word seems like just the kind of thing that would infuriate anti-science religious nuts, or at least strike them as beside the point. I can't help but think of Yuri Gagarin, in 1961, returning from the first manned space mission and saying, "I looked and looked but I didn't see God." Did the certainly unsurprising revelation that the Creator wasn't lounging around in space like the man in the moon shatter global theology? Of course not — "I looked and didn't see God" is irrelevant if you believe (like the Catholic Church) that God exists in a realm outside of physics, or even the physical world. The discovery of the Higgs boson should reveal what the universe is physically made of, at its deepest level, but it shouldn't make a difference to those who see the making itself as an act of God. Which raises the question: do we say "God" particle because its existence would debunk religion, or because it would be an ultimate example of the manifold complexity of God's creation (ostensibly)? More importantly, of these two radically different readings, which is the more common?

    When the New York Times uses the phrase in headlines without discussion, which version of the phrase does its readership infer? It's impossible to know, and this rattles me. Language has a hypnotic, iterative power: with every use, a word becomes more ingrained in its new context, increasingly impossible to view objectively. "God particle" has become a colloquialism for "Higgs boson," and it does neither physics nor the idea of God any service. Rather, it sells them both short: by implying that the questions we deal with in physics are so easily reducible, and that the Higgs might have any effect on how the religious see the world.

    "God particle" is a convenient phrase. It haphazardly gets at the importance of the whole enterprise — and it definitely grabs people's attention. Still, its meaning has become unclear, and no real information can be gleaned from it.

    At best, it hints at weightiness; at worst, it simplifies the Higgs to the point of obfuscation.