November 12, 2005

Sound & Vision: Site for the Blind, and Towards Computer Understanding of Language

Sometimes I'm hesitant to post things in my blog because I don't have time to do research on every idea I think up, so a bunch of them are going to be things of which other people have already thought, and posting half-complete versions of ideas that others have fleshed out is at best an exercise in arrogance and at worst harmful (in its inaccuracy) to the field I'm exploring.

However, I recently decided, "Screw that, it's my blog, and I can ramble about whatever topics I like." I'll just preface some of my ideas with a disclaimer: I'm not an expert in this field, I'm just playing with ideas in my head, and I hope people who know more will expand and expound on what I'm explaining.

--

On 'seeing' for the blind: my previous post alluded to a device in which a blind person hooked a camera up to her forehead and a bunch of piezoelectric rods to her chest (or feet or tongue, maybe), connected the two, and could "see." This has already been done, as I thought, but I have been unable to determine if my twists have been implemented (I've actually done a bit of research here, and written one of the leading scientists, who is on the road and can't answer), so I'll explain my whole idea and place it in the public domain.

The full idea is that you run an edge-detection algorithm on the image from the camera (in real time), so that instead of trying to present the full data you present only the edges, which are what the human eye is good at detecting anyway (cf. "Mach banding"). I believe that, using this system, it would be very easy to detect, say, a post two feet in front of your eyes: if you moved your head back and forth five or six inches, the edges of the post would whizz across your chest in a very noticeable fashion. And even though the chest may not be innervated enough to detect subtle differences in a static field of poking rods representing complete image data, it can certainly feel something brushing across it if you use edge detection.
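
The edge-detection step itself is old hat in image processing. Here's a minimal sketch in Python, assuming each camera frame arrives as a 2D grid of grayscale values (0-255); real hardware would use an optimized library, but the core idea is just a Sobel-style gradient, where edges are wherever brightness changes sharply between neighboring pixels:

```python
# A minimal sketch of the edge-detection step. The frame format and the
# threshold value are illustrative assumptions, not part of any real device.

def edge_map(frame, threshold=128):
    """Return a binary grid: 1 where an edge is detected, 0 elsewhere."""
    h, w = len(frame), len(frame[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Sobel kernels approximate the horizontal/vertical gradient.
            gx = (frame[y-1][x+1] + 2*frame[y][x+1] + frame[y+1][x+1]
                  - frame[y-1][x-1] - 2*frame[y][x-1] - frame[y+1][x-1])
            gy = (frame[y+1][x-1] + 2*frame[y+1][x] + frame[y+1][x+1]
                  - frame[y-1][x-1] - 2*frame[y-1][x] - frame[y-1][x+1])
            if abs(gx) + abs(gy) > threshold:
                edges[y][x] = 1  # this rod pokes; everything else stays flat
    return edges
```

Only the 1s would drive the piezo rods, so the two vertical edges of a post sweep across the array as the wearer's head moves.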

Thus, the emphasis would be on detecting motion, not depth or color or hue or brightness or anything like that. Many animals have relatively poor eyesight but are extremely good at detecting motion (cf. owls), and they do OK. Hell, owls can fly.

There are a couple of cool twists you could add to this system. One would be using an infrared camera and adding a bright infrared LED offset from the camera by a couple of feet. This would provide 'hatchet lighting' in low-light or high-ambient-light conditions, casting very dramatic shadows on close objects, which would sharpen the edge detection and make motion even easier to detect for objects within about 15 feet. (You can also imagine military uses of this system: attach a back-facing camera to a special-forces dude, put a piezo panel on his back, and he's got eyes in the back of his head.)

The other twist would be to add real-time optical character recognition and barcode reading to the system, so that every n frames you grab the scene and see if there are any recognizable words or codes, and if so, read them aloud to the wearer while simultaneously vibrating the area in which you found the word or code. In this way, once a blind person found, say, a post at a street corner, she could look up at it and have the street sign read to her. Or, if she were pawing through her cabinets, she could look at boxes and cans and have the items read to her.
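
The every-n-frames part is just a sampling loop. Here's a sketch of it; the OCR/barcode engine itself is abstracted behind a `recognize` callback (a hypothetical function I'm naming for illustration), since the interesting part here is the scheduling, not the recognition:

```python
# Sketch of the "every n frames, try to read text" loop. The `recognize`
# callback is assumed to return a list of (text, region) hits for a frame;
# in the real device each hit would be spoken aloud while the matching
# region of the rod array vibrates.

def scan_stream(frames, n, recognize):
    """Run the recognizer on every n-th frame; return (frame_index, text, region) hits."""
    announcements = []
    for i, frame in enumerate(frames):
        if i % n != 0:
            continue  # skip frames between scans to stay real-time
        for text, region in recognize(frame):
            announcements.append((i, text, region))
            # Real device: speak `text` and buzz the rods covering `region`.
    return announcements
```

Because recognition only runs on a fraction of the frames, the edge-detection path stays real-time while the (slower) OCR runs in the gaps.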

I know real-time barcode reading is possible (because, uh, I invented it); I suspect real-time OCR is possible as well, given that we've had basic OCR forever.

--

I've been thinking about how to work towards teaching computers to "understand" languages, with the goal of creating a simple universal translator. (Simple!)

What I'd like to see is a system where we can translate sentences from any language into a neutral format, and then translate that neutral format into words in the target language, but NOT try to put them together into grammatically correct structures. This is why I call it a simple translator.

I don't think this is a particularly hard problem, really; I think it's mainly tedious, and I've been trying to think of how to divide up the work; possibly create a wiki or some other kind of publicly supported site so we can build up a universal knowledge base once and for all, much like Wikipedia.

My thinking is: we need to assign a unique code (call it a number) to every concept in the entirety of our existence. (Simple!) For example, look at the sentence, "Mary had a little lamb." "Had" as in "owned" would be, say, concept number 23. But "had" can also mean "ate," which would be concept number 2,031. "Had", as in knew carnally, would be concept number 65,312.

So, the English word "had" would have a strong association with 23, and then weaker associations with 2,031 and 65,312. We might throw out the weaker associations if they aren't reinforced by the associations found in the rest of the piece; there appears to be no further mention of Mary eating her lamb or loving it in a way that would make John Cornyn excited, so we can probably discount those associations.
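
In code, this is just a weighted lookup table plus a reinforcement pass. A toy sketch, where the concept numbers and weights are invented purely for illustration (a real database would be built collaboratively, wiki-style):

```python
# Toy sketch of word -> concept associations with context reinforcement.
# All concept numbers and weights here are made up for the example.

LEXICON = {
    "had":  {23: 0.5, 2031: 0.3, 65312: 0.2},  # owned / ate / knew carnally
    "lamb": {410: 0.9, 2032: 0.1},             # young sheep / meat dish
}

def disambiguate(word, context_concepts):
    """Pick a concept for `word`, boosting senses reinforced by the context."""
    scores = dict(LEXICON[word])
    for concept in scores:
        if concept in context_concepts:
            scores[concept] *= 2  # reinforcement from the rest of the piece
    return max(scores, key=scores.get)
```

With no reinforcement, "had" resolves to its strongest sense (owned, 23); if the surrounding text had already surfaced the "ate" concept, the boost would flip it.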

We end up with a list of concepts for each word (or possibly phrase, in the case of idioms) in the source text. We can take it further by recognizing parts of speech and building our little concept list into a tree, sixth-grade sentence-diagram style. (E.g., prepositions and clauses start their own little trees, and adjectives hang off of the nouns they modify.)
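
The tree itself needn't be fancy. A minimal sketch, where every number is an invented placeholder: each node carries a concept code plus its part of speech, and modifiers hang off the word they modify:

```python
# Minimal sketch of the concept tree, sentence-diagram style.
# Concept numbers are illustrative placeholders, not from a real database.

class Node:
    def __init__(self, concept, pos, children=None):
        self.concept = concept      # e.g. 410 for the "young sheep" sense
        self.pos = pos              # "noun", "verb", "adj", ...
        self.children = children or []

# "Mary had a little lamb" -> the verb at the root, subject and object
# beneath it, and the adjective attached to the noun it modifies.
sentence = Node(23, "verb", [             # "had" (owned)
    Node(7, "noun"),                      # "Mary"
    Node(410, "noun", [Node(88, "adj")])  # "lamb" with "little" attached
])
```

Translating then means relabeling each node's concept with a target-language word and printing the tree as-is, grammar be damned.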

Now, what I'm curious about is just translating these raw trees into another language and then outputting them as trees, not as sentences. That is, I don't want to create a grammatically correct English sentence from a Japanese one; I want to see how the Japanese sentence was structured. I am sure I could figure out its original meaning, and it would also be a fascinating learning experience. Also, it would introduce fewer translation errors into the equation to simply require people to learn the structure of other languages (which is pretty trivial, really, n'est-ce pas?).

The things we could do with a system like this are amazing. Imagine playing World of Warcraft and being able to talk to people from all over the world, in their language, automatically. Imagine hooking this up to iChat so that you could iChat with anyone in real-time.

But once you had this system, we could start working on teaching the computer to actually understand our speech, in limited amounts. We could teach the computer facts in its language (that is, the numbered concept language) instead of trying to teach them in some arbitrary human language. For instance, we could teach it that Granny Smith apples are red when ripe, and it would really know this. We'd have to be smart about associating concepts with each other; eg, the concept for "red" would have a link to its class, which would be the concept for "color". But once we did this, we could ask for the "color" of any object in the fact-base, and it'd work. In any language.

This is a huge undertaking, and I know that parts of this have been done before, but never on the right scale. The idea I'm playing with is how we can use the net to have people add concepts to the initial language database (leaving aside the knowledge base, which I see as a second step, after we have concepts in), and making something useful from that before we tackle associating the concepts with each other.

Ceci n'est-ce pas un pipe!


27 Comments:

Anonymous Anonymous said...

Yes, you could teach it that Granny Smith apples are red when ripe. But this reminds me of the Steve Martin routine about teaching your kids how to say everything wrong... (Granny Smiths are in fact green.)

November 12, 2005 4:14 PM

 
Anonymous Anonymous said...

Reminds me of Neal Stephenson's Philosophickal Language from the Baroque Cycle novels... Daniel Waterhouse would be proud of you Wil.

November 12, 2005 4:54 PM

 
Blogger Carl Johnson said...

One difficulty is that some languages have concepts that others don't. So, for example, Japanese doesn't have plurals or articles. But we don't have all their degrees of politeness. But they don't have our concept that calling people fat to their face or asking about how much they make is impolite…

Or even just words are tricky. In English, we have the concept of a door, but in Japanese, they have "doa"s and "toh"s. A "doa" is a Western-style door. A "toh" is a sliding door. But it can also be a sliding window! So, if you have "Close the door" in English, it's hard to know from the grammatical structure whether to make it a "doa" or a "toh." Plus, if you have close the "toh," you have to guess if they mean sliding door or sliding window. Anyhow, while I think it would be cool, I'm guessing that it's more work than people can be asked to do.

November 12, 2005 5:49 PM

 
Blogger thomas Aylott said...

I call dibs on the
…assign a unique code (call it a number) to every concept in the entirety of our existence.

I'll get right on that.

---

Vision system:
I was thinking along similar lines when I was a kid. But, like all crazy dreams back then, it never moved out of the daydreaming phase.

Now, we have the resources and wikis and whatnot to actually make this project a go in the real world.

November 12, 2005 9:14 PM

 
Anonymous Anonymous said...

Holy intro AI class first order logic batman!

November 12, 2005 11:24 PM

 
Blogger Wil Shipley said...

I took AI in college but they didn't really talk about such a concept. I guess AI has grown up since I was a kid. I still don't see any public databases with word -> concept matching.

Carl:

"doa" would have both "door" and "western" attached to it.

November 12, 2005 11:51 PM

 
Blogger Matt Schinckel said...

Reminds me of Neal Stephenson's Philosophickal Language from the Baroque Cycle novels...

I knew I heard of this concept somewhere!

November 13, 2005 1:50 AM

 
Anonymous Anonymous said...

I think it's "Ceci n'est pas une pipe", without the "-ce".

November 13, 2005 4:56 AM

 
Anonymous beercake said...

There was a concept out there once, where a group developed an "automated" translation system by first transforming the language into an intermediate one (in fact this intermediate one was even a real language, which had a lot in common with all the other languages it could translate).

But in all cases, automated translation leads to the topic of AI. And looking at newer developments there, it seems intelligence always needs some kind of embodiment. That's why most of the research now goes in the direction of "Embodied Artificial Intelligence." In short, they claim: without a body that interacts with the environment, there is no intelligence. As understanding language is a very intelligent act, "programmatic" translation will only work for very small domains.

If you are interested, there is a good book out there about Embodied AI:

http://www.amazon.com/exec/obidos/tg/detail/-/026266125X
The first few chapters will give a very good introduction into all the problems we face (e.g. symbol grounding, etc)

If you don't want to read it :-) there was once a online lecture on this: tokyolectures.org

November 13, 2005 5:21 AM

 
Anonymous resistor said...

There's a nice summary of the current major attempts at machine-translation here: http://www.lojban.org/files/why-lojban/mactrans.txt

While its focus is on Lojban (an artificial computer-parsable language), it has a good survey of the techniques.

Basically, the Analysis-Transform-Synthesize model currently produces the best results, but is limited in that we will likely never totally understand the transformations between languages. An interlingua (like you describe) might be theoretically more powerful, but requires the ability for the computer to perform first-order logic, which means AI.

November 13, 2005 3:31 PM

 
Blogger andyc said...

You should check out WordNet, which they refer to as a lexical database, but it really comes down to being a structured word database with superclass and subclass concepts (referred to as hypernyms and hyponyms). Another nice aspect of WordNet is that it is somewhat multilingual. Despite its flaws, it's a pretty useful and underused resource - shouldn't Spotlight realise that when I search for an "elevator" I may also want results with a "lift"? But that's a bit off topic anyway.

I've done some work with a group attempting to do English parsing, and what I do know is that building parse trees is fairly difficult. They've taken a statistical approach to building these trees, as you have suggested. The difficulty they have is writing a grammar that is both simple and sums up the English language, and then finding training data for their system.

Keep posting your ideas! It gets the creative juices flowing in everyone.

November 13, 2005 3:53 PM

 
Blogger Abhi Beckert said...

There's a nuts ex-teacher (either math or English, not sure) who hangs around our market; the guy claims to have studied some kind of amazing math formula for translating languages...

It's a long shot, but I'll ask him for some references next time I see him. ;)

November 13, 2005 4:52 PM

 
Anonymous Anonymous said...

What a great topic!

Wil, I think the idea of a handheld camera that translates signs into speech is a wonderful one. I'm imagining a device where the user points a small web camera, like an iSight, at a sign or menu and the software (perhaps running on an iPod) speaks the text through an earpiece. The code for converting scanned images into text exists. The code for converting text to speech exists (natively on the Mac). So it seems reasonable to put this together in a neat little package. I don't think this type of device exists, and it would be very useful since many signs and such are not in braille.

It would make a trip to the supermarket possible for a blind person.

November 13, 2005 10:37 PM

 
Anonymous Mark Whybird said...

Assigning a number to each concept... isn't that exactly what a thesaurus attempts to do, at least to some extent? One of the public-domain thesauri would at least be a good starting point, surely.

November 13, 2005 11:05 PM

 
Anonymous Anonymous said...

Japanese does have the concept of plurals. The word ending "tachi" pluralizes any noun referring to people. As for articles, don't get me started.

However, while languages have different linguistic concepts, what the language describes should be common.

In English, you might say to turn off the lights, while in Japanese you say to shut the lights, which are two different things, in a literal sense (not to mention English has the definite article while Japanese has all those relational particles) but the concept of making the light go away so we return to the natural state (of most likely dark) is concept #432 in either case.

As for doa and toh, the concept of opening a portal is the same. The details could probably be described by sub-concepts, if necessary (remember, this is meant to be a rough translation). On the other hand, if we need to distinguish between a door and a window, we would fill that in from context just as in Japanese.

Admittedly, you might run into certain problems in the abstract. For example, in Japanese, "kangaeru" and "oboeru" represent different ways of using the brain in a way not represented in English. But, on the other, other hand, English can express the concept, which is the same as the difference between "looking" and "seeing" with the eyes.

So, I don't know. There's certainly something to this.

November 14, 2005 1:17 AM

 
Anonymous Uli Kusterer said...

It's unlikely nobody has thought of these ideas already. As others mentioned, your translation idea was one of the first techniques used by linguists. AFAIK it doesn't really work that easily (the number of concepts is too big, and languages tend to contain too little information to let a computer decide without an actual AI brain to correlate what is said to the world around it). But the basic principle is still being used. I guess you essentially just discovered the linguist's equivalent to the "EVA principle". It's fundamental, but it's also not at all useful by itself.

The "vision" idea sounds easy at first, but current research usually uses ultrasound, which apparently works a little better. There are actually blind people who know how to use sound (a "smack" sound with their lips) to detect stairways, signposts, etc. Kind of like bats.

Also, blind people often have very different needs than us seeing people. E.g. they usually don't mind that screen readers don't sound human, and set them to speak much faster than any seeing person would (thus approaching similar speeds as we would reading). Finally, I'm not sure the skin on the back or the belly is sensitive enough to work for this. Belly is more sensitive, but AFAIK the most sensitive spots are the hands and eyelids.

As to what device to use as a "display", I always thought of something like inflatable bubble-wrap. Each bubble is a pixel that is inflated, and depending on how much you inflate it, it's black, white, or some shade of grey. I'd suppose some VR researchers already tried that out to provide tactile feedback.

November 14, 2005 1:33 AM

 
Anonymous Uli Kusterer said...

A short Google later, I found an article on haptics research that shows they're actually working on this. Though it doesn't say much about what devices they use for this feedback.

November 14, 2005 1:43 AM

 
Blogger Eric said...

brilliant. technologies that are designed to assist those with disabilities are the exact technologies that manufacturers currently want to jump into. right now, i have the exclusive privilege to use video phones. this is really great tech, it's just too bad it's too expensive and closed to the rest of the world. i have no doubt that by the end of this decade, we will be seeing the most extraordinary advancements in life. not that i know something but that i understand it. ;)

good thinking man.

November 14, 2005 2:32 AM

 
Blogger Topher said...

Scientists are hard at work creating an artificial eye that jacks into the optic nerve or an alternative portion of the brain, a la Geordi LaForge's VISOR. That's where the tech is going, I'd bet.

November 14, 2005 11:55 AM

 
Anonymous kevin contzen said...

The translation idea presumes a certain amount about concepts: that they are discrete, enumerable entities. Some (Fodor) would agree with you, others wouldn't. Certainly words are discrete and enumerable, but the whole problem with translation you're trying to solve is that concepts aren't words. As I understand it, your project is to build a dictionary of concepts, and a fancy lookup table to move from language to dictionary to other language.

One problem I see: I don't see how this table of concepts is going to render a concept as anything more than a list of the various phrases used to express it in different languages. Under "036," for example, you'll find "black," "noir," "schwarz," etc. But really, what does the "036" add? I guess my question is, how is this different from taking all the French-English, German-French, etc., dictionaries and putting them all into one database? It'll certainly be tedious, but I don't know if it will advance the art or science of translation.

November 14, 2005 2:53 PM

 
Anonymous David Ayre said...

One poster already listed WordNet, and there have been many projects using this, some very related to your idea, see this bibliographic list: WordNet Bibliography.

There are quite a few projects which are trying to reproduce the WordNet ideas in other languages and cross-reference them. The OpenCyc ontology has some links to WordNet concept ids, from which you can get the more "common sense" knowledge (like that apples change color when they ripen; not sure if that exact fact is in there, but it has this type of stuff).

There are quite a few open-source parsers out there too: Stanford's Parser, Dan Bikel's Multilingual Parser, and this "MontiLingua" Python/Java suite is pretty good looking.

Probably one of the most difficult problems facing translation is disambiguating different senses of words (e.g., "He left the board behind" could mean board as in plank or board as in committee, etc.). There has been a lot of research in this lately, and competitions to evaluate research advancements (SENSEVAL); the papers are quite fascinating and inventive.. good reads for sitting on the shitter. A good overview of research (although 7 years old) in the WSD area can be found here... very interesting: Word Sense Disambiguation: The State of the Art.

blah blah blah

November 14, 2005 8:18 PM

 
Blogger David said...

wil -- you might be interested in some of the "common sense" stuff that Hugo Liu has been working on at the MIT media lab. http://web.media.mit.edu/~hugo/

anyway, it sounds like a really cool project.
-- hytmal

November 16, 2005 9:15 AM

 
Anonymous Anonymous said...

I'm not so sure this type of translation would work in practice without knowledge of context. Even people speaking the same language have misunderstandings when the context is unknown.

That's why humans are so good at language; we remember tons of stuff and are able to link the items logically together. Your translator would have to do this, I would think.

Just purely translating a sentence wouldn't work, IMHO. Take, for example, if in English someone says, "His face got red," and you translate it literally into "かれの顔は赤いをゲットした" (Kare No Kao Ha Akai Wo Getto Sita), then if you translate it straight back into English it would come out as "His face got red," but the Japanese person would have a really hard time getting any sense out of that!

How did his face "get" a red... and a red what? It's a really tough problem...

In this context "get" means "became," but then, in what situation does the translator leave "get" as "get" or translate it to mean "become"? In other situations "get" means "understand," like "Do you get it?" Try translating that!

November 17, 2005 8:10 PM

 
Blogger Carl Johnson said...

I have a student who apparently learned that "get" means "understand" during her trip to America. Needless to say, she misused it in her essay, because she was being casual where she needed to be formal.

Also, Wil, you said, "'doa' would have both 'door' and 'western' attached to it," meaning that you're going to be compounding these things. But shouldn't they be atomic (indivisible)? Anyhow, my instinct would be to go the other way, and say that a door is a "doa" plus a "toh." Who decides which way to slice things? Basically, anything that's one word in English is two in Japanese, and vice versa. In English, we have liberty and freedom, but the Japanese limp on by with just one word for both. We can "have a dog" and "have a nap," but the Japanese are going to have to split those ideas up. And on and on.

Also, another main problem with Japanese (which is definitely the poster child for "impossible to translate" languages, but hear me out) is that word frequencies are way, way off. In Japanese, "different" has the connotation of "and therefore, wrong" and is so frequently used that there's slang for it. People spontaneously shouting, "That's different!" in English would seem strange, but it's not out of place in Japanese. In winter, we might mix it up with "I'm chilly," but the Japanese prefer the straight-ahead "Coooold." The Japanese make not infrequent use of "mazui," which basically means "the opposite of delicious," but for some reason there's no simple catch-all word for that in English, so your translation will sound odd when children keep saying "unappetizing" over and over.

With all those criticisms behind me, if you really do have a plan to tackle the problem, I admire your balls. I'm sure you know going into this that other people have tried their hands at the problem, but figuring out why it won't work is easy. The tricky part is making it work anyway.

One final bit of pedantry: かれの顔は赤いをゲットした is bad grammar, in addition to being incomprehensible. If your parser thinks that "red" is something you can get, it will use the noun form of red and say 赤をゲット instead of 赤いをゲット. On the other hand, if your parse is smart enough to realize that "red" is an adjective, it will probably also figure out that "to get" in this case means "to become."

Then of course, the real question becomes, if it all hinges on which meaning of "get" to use, why not ask the user which one they meant? For books, of course, we can't, but Wil isn't talking about translating books, is he?

November 18, 2005 9:08 AM

 
Blogger Abhi Beckert said...

I got the link:

http://dwmlc.com/

November 20, 2005 1:56 AM

 
Blogger Ben Markwardt said...

Your idea for translation is exactly what is being taught in my compiler class right now.

We're writing a simple compiler to convert a Lisp-derived language into Forth. To do this we scan in the data, parse it into a tree, and build a symbol table (dictionary) that contains both the entire lexicon and semantics of the language. Once you've got all this, you can traverse the tree however you'd like and reconstruct it in a new language.

The nice thing about programming languages is that they aren't ambiguous, so finding out what a symbol means is really straightforward. So real languages are a much larger challenge. Also, not all languages are as context-free as English. I think Hebrew is an example where it would be really hard.

February 08, 2006 9:55 AM

 
Anonymous Anonymous said...

an interesting article I found today regarding technology to restore sight

http://www.cnn.com/2006/TECH/04/24/tongue.sight.ap/index.html?section=cnn_topstories

April 24, 2006 10:34 PM

 
