Mariana Soffer (F, 39)
Buenos Aires, AR
Immortal since Feb 16, 2010
I am an artificial intelligence researcher. I studied for a Master's in Information Science in California, where I specialized in genetic research. Currently I am doing research on NLP (natural language processing), particularly in the opinion mining area. I am also interested in neuroscience, Buddhism, literature, music, and anthropology, among other things.
    From Mariana Soffer's personal cargo

    Introduction to Natural Language Processing
    Project: The Total Library

    Information overload
    It is great to have access to huge amounts of information, but since we are not reading any faster than before, we cannot take full advantage of it. A discipline that helps human beings deal with all that data is therefore fundamental.

    Natural language processing is the process of building computational models for understanding natural language. It studies the problems of automated generation and understanding of natural human languages. NLP includes natural-language-generation systems that convert information from computer databases into normal human language and natural-language-understanding systems that convert samples of human language into more formal representations that are easier for computer programs to manipulate.

    NLP also studies the information contained in human-generated texts, along with their language structure. It is a multidisciplinary field that draws on artificial intelligence techniques, multivariate statistics, linguistics and any other domain that can be used to process, generate or interpret language with computers.

    NLP Facts
    -The Turing test is a proposal for a test of a machine's ability to demonstrate intelligence. Described by Alan Turing in the 1950 paper "Computing Machinery and Intelligence," it proceeds as follows: a human judge engages in a natural language conversation with one human and one machine, each of which tries to appear human. All participants are placed in isolated locations. If the judge cannot reliably tell the machine from the human, the machine is said to have passed the test. In order to test the machine's intelligence rather than its ability to render words into audio, the conversation is limited to a text-only channel such as a computer keyboard and screen. This test is often cited as the first widely known thought experiment related to NLP.
    -Text is the largest repository of human knowledge and it is growing quickly: there are emails, news articles, web pages, chat archives, scientific articles, insurance claims, customer complaint letters, transcripts of phone calls, technical documents, government documents, patent portfolios, court decisions, contracts, and so on.

    -Nowadays we have access to huge amounts of information, much more than in past decades. One of the problems with this is that we are not reading any faster than before, so we cannot take full advantage of the new situation. NLP tries to optimize the human usage of information.

    -Dealing with natural language is a difficult task. We need to understand multiple disciplines, including multivariate statistics, learning algorithms, clustering, hidden Markov models and part-of-speech tagging. We also need knowledge about language, grammar, ontology and folksonomy.

    -A huge amount of data has to be processed in a limited amount of time, so special algorithms are needed. We generally apply algorithms that have a low computational cost, or we pre-process the data to reduce the amount of computation required. Techniques for reducing the size of the text include removing stop words, removing words that appear too often, and removing words that appear only a few times.

    -The applications of NLP include answering queries, identifying spam, recognizing the main theme of a document, grouping similar texts, obtaining the main keywords of a document, detecting syntactic errors and identifying the secondary themes of a document.
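
    The size-reduction step mentioned in the facts above (removing stop words, words that appear too often, and words that appear very rarely) can be sketched roughly as follows. This is a minimal illustration in Python, not a reference implementation: the stop-word list, the function names and the thresholds min_docs and max_doc_ratio are all assumptions made for the example, and "too often" is measured here by the share of documents in which a word appears.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def reduce_vocabulary(documents, min_docs=2, max_doc_ratio=0.8):
    """Drop stop words, very rare words and very common words.

    min_docs and max_doc_ratio are hypothetical thresholds chosen for
    illustration: a word is kept only if it appears in at least min_docs
    documents and in at most max_doc_ratio of all documents.
    """
    tokenized = [tokenize(doc) for doc in documents]
    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    max_docs = max_doc_ratio * len(documents)
    keep = {
        word
        for word, count in doc_freq.items()
        if word not in STOP_WORDS and min_docs <= count <= max_docs
    }
    # Preserve the original word order inside each document.
    return [[word for word in tokens if word in keep] for tokens in tokenized]


docs = [
    "The spam filter flagged the email as spam",
    "The email about the insurance claim arrived",
    "The court decision cited the insurance contract",
    "The patent filter missed the claim",
]
reduced = reduce_vocabulary(docs)
for tokens in reduced:
    print(tokens)
```

    With these four sample sentences only "filter", "email", "insurance" and "claim" survive. Every other word is either a stop word, appears in a single document, or appears in all of them, so the text that later algorithms have to process is much smaller.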

    Sun, Feb 21, 2010

    Comments:


    gamma     Wed, Feb 24, 2010
    As I was reading your post backwards, I suddenly came to the introduction...

    -Text is the largest repository of human knowledge and is growing quickly, there are emails, news articles, web pages, chat archives, scientific articles, insurance claims, customer complaints letters, transcripts of phone calls, technical documents, government documents, patent portfolios, court decisions, contracts, and so on.

    Although your post is entirely wonderful and logical, this part requires you to be psychic. All those texts are entirely incomplete and contextual. We have to imagine what is going on in essence to explain it to anybody really, including to a computer. That is why understanding the language requires virtualization = psychic abilities.

    The second question is, what part of the natural language is natural? I had a feeling that by following the lesson from chaos theory, automatons should develop "naturally" or chaotically, but it seems that they are not very "absorbing" or adaptive.

    Mariana Soffer     Wed, Feb 24, 2010
    Great comment, gamma, excellent thinking. One of the great problems we have with analyzing the meaning of a text is that we need context, because context is everything. Even though you might have two very similar texts, they probably do not mean the same thing if one was written by an Indian man in the eighteen hundreds and the other by a teenager today.
    Besides, I think there is a lot of protocol and many implicit rules in these texts that might say even more than what is explicitly written. They also vary a lot depending on the culture they emerged from and the profile of the person who wrote them.
    But if you do NLP in a limited context, like the current American scientific community, you can do a lot with these techniques, because the texts share a very similar context.
    Regarding the question, it is a very difficult thing to answer; indeed, you could say that human language is an artifice. Regarding the patterns, I do believe that they exist everywhere, for example in how the different tongues merge and diverge, but they are also influenced by an incredibly large number of different factors, such as earthquakes, the economy and wars.
     