A Natural Language Search Engine by Powerset

The purpose of this search engine was to find targeted answers to questions, which is the opposite of keyword-based search. It enabled the user to search the web by asking a question. For example, a user can search the web with “What is a Powerset?” and the Powerset algorithm searches the web by interpreting the question and attempts to return the web pages which answer it.

“On the other hand, keyword-based search is not capable of understanding the context of a question; instead it searches the web word by word. For example, if a user types ‘what is a Powerset?’, the keyword-based search attempts to find keywords with ‘what’, ‘is’, ‘a’ and ‘Powerset’.” (Hsu, 1999)
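The word-by-word behaviour described above can be sketched in a few lines of Python. This is only an illustration of keyword matching in general, not Powerset's or any real search engine's implementation; the function name `keyword_match` and the three sample documents are made up for the example.

```python
import re

def keyword_match(query, documents):
    """Rank documents by how many distinct query words they contain."""
    terms = set(re.findall(r"\w+", query.lower()))
    scored = []
    for doc in documents:
        words = set(re.findall(r"\w+", doc.lower()))
        # Score = number of query words that also appear in the document.
        scored.append((len(terms & words), doc))
    # Highest overlap first; Python's sort is stable, so ties keep input order.
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [doc for score, doc in ranked if score > 0]

# Hypothetical mini-corpus, used only for illustration.
docs = [
    "Powerset is a natural language search engine.",
    "What a day it is!",
    "A power set is the set of all subsets.",
]

results = keyword_match("What is a Powerset?", docs)
for doc in results:
    print(doc)
```

In this toy corpus the second document shares just as many words with the query as the first one, although it says nothing about Powerset. That is the weakness of matching on “what”, “is” and “a” without understanding the question.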

“Powerset was first implemented on Wikipedia, which brought a new, rich semantic dimension via natural language query processing and improved the search and reading experience” (Powerset, 2011).

Some of the Powerset features (which are found on Wikipedia) include:

Facts can be expanded to show more of the extracted verbs and their associated words and concepts.

Powerset helps to generate a summary of the key facts.

Powerset also tags things and actions found on the page by its linguistic analysis engine.

It can provide direct answers to queries from Wikipedia.

Some cases which show that Powerset is successful:

Picture taken from (Powerset brings the Semantic Web to Wikipedia, 2008)

http://i.i.com.com/cnwk.1d/i/bto/20080511/powerset10_540x258.jpg

The image above shows how Powerset retrieves the data in an elegant and easy-to-understand manner, which allows the user to make better use of the web.

The image below is a screenshot in which Powerset fails to achieve its goal: the results were not presented in as orderly a manner as in the image above, and some irrelevant data came up.

Picture taken from (Powerset brings the Semantic Web to Wikipedia, 2008)

http://gigaom.files.wordpress.com/2008/06/file-3.gif

Bing, which is Microsoft's search engine, is powered by Powerset technology.

Powerset was useful for querying the web, and helped a lot of users find what they wanted on the web by asking a simple question.

Google Language Tools

“Google language tools translate a section of text, document or web page into another language without the intervention of human translators” (Inside Google Translate, 2011). This service was introduced by Google in 2007 under the name Google Translate. With these language tools Google has made the Internet easier to navigate for millions of people worldwide.

Google's translation works by looking for patterns in hundreds of millions of documents which were input and translated by human translators; with these patterns it determines the best translation for you. Google Translate is capable of making intelligent guesses by searching for patterns in large amounts of text, a process known as “statistical machine translation”.

“Some of the limitations which Google Translate carries is that it is only limited to a number of paragraphs” (Google Translate, 2011). Although it can help the reader to understand the general content of a foreign-language text, the translation does not always have exactly the same meaning.

While doing some tests on Google Translate I found that it does not always guess the right meaning for each word, but in the majority of cases it was successful.

One of my inputs was in Maltese, “jien inhobb imur nigri”, which should have been translated as “I like to go running” but was translated as “I like to go I’am running”. The essence of the sentence is there, but the grammar is not right, which is an area where Google Translate is somewhat lacking.

Another example below shows how the grammar is not in the right form after translating; it should have been “I went to the city by walk”.

A successful test which I carried out is shown below, where the grammar was correct.

The Google language tool is very powerful and in time could be used to translate anything; one of its advantages is that the translations are produced in real time.


Wolfram|Alpha is a computational knowledge engine, where the output is generated by doing computations on its own internal knowledge base, which is updated continually in real time. Wolfram|Alpha differs from other search engines because it does not search the web and return links; its aim is to bring expert-level knowledge to everyone.

The scope of Wolfram|Alpha is to let you ask it about any kind of systematic factual knowledge. Wolfram|Alpha is built on Mathematica, a complete functional-programming package which encompasses computer algebra, symbolic and numerical computation, visualization and statistics capabilities.

Some of Wolfram|Alpha's successful queries are:

Here the output results were simple and easy to understand. Although it had some positive results, Wolfram|Alpha also returned some negative results for some queries, as shown in the screenshot below.

In the example below no results were returned, but when the same query was entered into a search engine, satisfactory results were returned, as shown below.

This shows that Wolfram|Alpha has some disadvantages compared with web search engines, and might hold some data which is not fully up to date.

Question 2: Machine Learning

In this question you are asked to explore the WEKA machine learning toolkit (a copy of which is included in the CD that accompanies your study guide, but which can also be downloaded from the Internet).

Download the UCI data set from the WEKA web site, and select three numeric classification problems from the data set and three classifiers from those available in WEKA. Run each classifier on each problem, and obtain data describing the classifier's performance on each problem. You may wish to vary the classifier's parameters to improve its performance.

Make use of all available resources (including the Internet, and the WEKA documentation) as necessary to help you understand the various options and parameters in WEKA. Submit a written document describing the data sets, naming the classifiers used, and describing their performance on each data set.

One of the chosen datasets which I downloaded from UCI is “breast-cancer.arff”; it consists of 10 attributes and 286 instances.

The first attribute is age: it represents nine age groups which run from 10 to 99, each spanning an interval of nine years; these age groups represent the number of people who were diagnosed with breast cancer within that group.

The second attribute is menopause: it represents those people who were examined and diagnosed with breast cancer during their menopause.

The third attribute is tumor-size: this attribute is represented by grouping the people who were examined by age groups; there are 12 age groups with a four-year interval, and on the other side it has the number of people who were examined for this test.

The fourth attribute is inv-nodes: this attribute represents the number of nodes that were diagnosed during the examinations.

The fifth attribute is node-caps: this attribute represents the number of nodes that were diagnosed during the examinations.

The sixth attribute is deg-malig: this attribute captures how severe the breast cancer is in those people who were diagnosed with it.

The seventh attribute is breast: this captures in which breast the cancer was located, right or left.

The eighth attribute is breast-quad: this captures in which region of the breast the cancer was found.

The ninth attribute is irradiat: this captures the data on how many people were diagnosed with breast cancer against those who were not diagnosed with cancer.

The last attribute is Class: this captures whether there were any recurrences, that is, people being diagnosed again with breast cancer, against those people who were never diagnosed with breast cancer again.

Below you find a visual interface of Weka on the above attributes:

The three classifiers used on this dataset were:

weka.classifiers.trees.J48 -C 0.25 -M 2

As shown below, this classifier achieved an accuracy of 75.87%. This percentage is low for this type of classification because there are 69 incorrectly classified instances, which amount to 24.1% and lower the performance.
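The percentages reported for each classifier follow directly from the instance counts. As a quick check of the figures above (69 incorrect instances out of 286), here is a minimal sketch; the helper name `accuracy_summary` is hypothetical and is not part of WEKA.

```python
def accuracy_summary(correct, total):
    """Return counts and percentages for a classification result."""
    incorrect = total - correct
    return {
        "correct": correct,
        "incorrect": incorrect,
        "accuracy_pct": round(100.0 * correct / total, 4),
        "error_pct": round(100.0 * incorrect / total, 4),
    }

# J48 on breast-cancer.arff: 286 instances, 69 of them misclassified.
summary = accuracy_summary(correct=286 - 69, total=286)
print(summary)
```

The same function can be applied to the other results in this section by substituting the reported numbers of correct instances and the dataset size.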


This classifier is less efficient than the first one; this is due to the increase in incorrectly classified instances, as shown below.

The performance of this classifier is 72.028%, with a total of 206 correctly classified instances out of 286.

weka.classifiers.functions.SMO -C 1.0 -L 0.0010 -P 1.0E-12 -N 0 -V -1 -W 1 -K “ weka.classifiers

The performance of this classifier is 76.2238%, which is the best of the three classifiers. This rise in efficiency is due to the fact that it had more correctly classified instances, but the time taken to execute the classifier was a little more than for the others.

The second dataset which I downloaded is called “hepatitis.arff”; it represents how many patients died or lived after being diagnosed with hepatitis. This data set has six attributes and 155 instances. The attributes are:

Age: the patients are grouped by age, and the data shows how many lived and how many died after they were diagnosed with hepatitis.

Sex: the patients are grouped by their gender; it shows how many females and males died after they were diagnosed with hepatitis.

Steroid: this groups the patients by steroid use; it shows how many patients survived or died while taking or not taking steroids.

Fatigue: this data is grouped by the fatigue of a patient; it shows how fatigued a patient was while receiving treatment and whether he survived or not.

Anorexia: this is grouped by anorexia; it shows how many patients who were diagnosed with hepatitis also suffered from anorexia, and how important it was for their survival or death.

Class: this shows all the patients who died and survived.

Below you see a visualization of all the attributes which I described above:

The first classifier which was applied to this dataset was

weka.classifiers.functions.SMO -C 1.0 -L 0.0010 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0"

The performance of the above classifier is 79.3548%, which corresponds to 123 correctly classified instances out of 155; this means that about 21% of the instances were incorrectly classified. The run time of this classifier was 9.33 seconds.

The 2nd classifier which I ran on this dataset was:

weka.classifiers.trees.J48 -C 0.25 -M

The following performance results were obtained:

The accuracy results were the same as for the first classifier, but the run time of this classifier was 0.11 seconds, a significant improvement which shows that this classifier is much more efficient than the other.

The 3rd classifier which I ran on this dataset is:

weka.classifiers.trees.RandomForest -I 10 -K 0 -S 1

This classifier is the most accurate of the three: its accuracy is 94.8387%, which consists of 147 correctly classified instances and 8 incorrectly classified instances, the latter amounting to 5.1613%. The build time for this classification was 0.24 seconds, which makes it one of the most accurate and fastest classifiers.

The third dataset which I used is “iris.arff”, which defines the characteristics of a flower; it consists of 5 attributes and 150 instances. The attributes are:

sepallength: contains the sepal length of the flower.

sepalwidth: contains the sepal width of the flower.

petallength: contains the length of the flower petal.

petalwidth: contains the petal width of the flowers.

class: contains the flowers which were collected to be analyzed.

Below you find the visual graph of the above attributes.

The first classifier which I ran on this data set is:

weka.classifiers.trees.RandomForest -I 10 -K 0 -S 1

This classifier, as shown above, resulted in 100% accuracy, with all 150 instances correctly classified. The run time was 0.04 seconds, which makes it one of the most efficient classifiers identified so far.

The 2nd classifier which I ran on this dataset is:

weka.classifiers.trees.SimpleCart -S 1 -M 2.0 -N 5 -C 1.0

The classifier above ran in 0.017 seconds, which is quite good. It correctly classified 147 instances out of 150, which gives it an accuracy of 98%, and the incorrectly classified instances amounted to only 2%.

The 3rd classifier which I ran on the dataset was:


The performance of the above classifier is one of the worst which I identified so far: it correctly classified only 50 instances out of 150, which makes it only 33.33% accurate, while the incorrectly classified instances amounted to 66.67%. The build time of this classifier was 7.22 seconds, which is a lot when compared to the other classifiers.

Question 3: Philosophy of AI

In 1997, a computer chess player called Deep Blue defeated Garry Kasparov, then world chess champion and the highest-rated player of all time, in a six-game match. During an earlier match (in 1996), Kasparov claimed that Deep Blue “understood the game” and that he could “smell a new kind of intelligence across the table.” Write an essay discussing Deep Blue's performance in the 1996 and 1997 matches against Kasparov, and whether or not Deep Blue proves the Weak and/or Strong AI claim. Your essay should describe the algorithms and technology used to create Deep Blue, and address the issue of whether or not knowledge of these techniques might change your answer to the question of whether it satisfies the Weak and/or Strong AI claim.

You may make use of any and all available sources (including Russell and Norvig, your local library, and the Internet) to find information about Deep Blue and Kasparov, provided they are citable and properly referenced. You may not reference Wikipedia; press releases, reviewed journal and conference articles, and newspaper and magazine articles are all fine. Marks will be awarded for properly researched and referenced work.

Your essay should not exceed 2500 words.

Question 3

Deep Blue was developed by IBM, and was specifically designed as a chess program. Deep Blue's debut against a human was in 1996 versus Kasparov, where the two were even for four games, but in the last two games Kasparov pinpointed Deep Blue's weaknesses, exploited them and eventually won the two remaining games. In 1997 Deep Blue faced Kasparov again; this time Deep Blue won two games against one, with the three other games being draws.

While IBM were designing Deep Blue they faced some challenges: they needed to produce a system able to beat the human world chess champion under regulation time control, and the games had to be played at no faster than three minutes per move. These proved to be real challenges for IBM.

These issues were solved in 1996, but Deep Blue still had “gaps in their grasp of chess knowledge, which a human opponent would simply exploit” (Hsu, 1999). In fact that was one of the reasons why Deep Blue lost its first match against Kasparov.

In 1997 Deep Blue's hardware consisted of a parallel computer with “30 IBM RS/6000 processors doing alpha-beta search; the most important part of Deep Blue was the 480 custom VLSI chess processors, which performed move generation and move ordering for the last few levels of the tree and also evaluated the leaf nodes” [i]. Deep Blue was capable of searching up to 30 billion positions per move and of reaching depth 14. One of the keys to its success seems to have been its ability to generate singular extensions beyond its depth limit, reaching up to 40 plies. The evaluation function had more than 8000 features, many of which describe highly specific patterns of information. Deep Blue also had an opening book of about 4000 positions and a database of 700,000 grandmaster games from which consensus recommendations could be extracted. Moreover, Deep Blue had a large endgame database which consisted of solved positions, containing all positions with five pieces and many with six pieces as well. The database had the effect of extending the search depth, which allowed Deep Blue to play precisely even when it was many moves away from checkmate.

The algorithm of Deep Blue consisted of an alpha-beta search. Alpha-beta search is in many ways the two-player analog of depth-first branch and bound, which is dominated by A* in the single-agent case. A* itself is used in path finding and graph traversal. One of the advantages of alpha-beta search is that it can eliminate branches of the search tree, as shown in Figure 1.

Figure 1: AB pruning.svg


By eliminating branches that cannot affect the final decision, the algorithm becomes more efficient without changing the result.

The search tree shown in Figure 1 “orders the move with the best value first, so the first move searched for a given position is also the best move or at least a refutation move for that position” (Hsu, 1999). In the figure, the leftmost branch of the search tree also represents the principal variation: “the hypothetical line where both sides play the best move. Therefore, we must consider all continuations from them; no refutation to a best move exists. On the other hand, all the sibling moves to a principal variation move are inferior and have at least one refutation we must examine.” (Hsu, 1999)

A refutation line exhibits the repeated pattern of a tree level with one refutation followed by a tree level of all responses. This is the characteristic growth pattern of an alpha-beta search tree. The algorithm lets a chess program search to approximately twice the depth when the move ordering is close to best-first.
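The “twice the depth” claim can be checked with the standard best-case leaf count for alpha-beta on a uniform tree with branching factor b and depth d, which is b^ceil(d/2) + b^floor(d/2) - 1, compared with b^d for plain minimax. The sketch below assumes a uniform tree, perfect move ordering, and a branching factor of 35, a commonly quoted average for chess; none of these numbers come from the Deep Blue sources cited here.

```python
def minimax_leaves(b, d):
    """Leaves examined by plain minimax on a uniform tree."""
    return b ** d

def alphabeta_best_case_leaves(b, d):
    """Leaves examined by alpha-beta with perfect (best-first) move ordering."""
    return b ** ((d + 1) // 2) + b ** (d // 2) - 1

b = 35  # assumed average branching factor for chess
for depth in (4, 8):
    print(depth, minimax_leaves(b, depth), alphabeta_best_case_leaves(b, depth))
```

With these assumptions, alpha-beta at depth 8 examines about twice as many leaves as minimax at depth 4, so for a comparable amount of work the search reaches roughly double the depth.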

For a program like Deep Blue, which searches about 40 billion positions per move, “alpha-beta search increases its speed by 40 million times” (Hsu, 1999), but the speed depends mainly on the quality of the move ordering.

The statement made by Kasparov is a very strong statement about AI. Although it is very interesting, it is in my opinion very difficult to support. Although AI is getting stronger, it is very difficult to implement common sense in computers; common sense would allow computers to understand unusual behaviour. For example, in the case of Deep Blue, IBM would also need to include examples which are not covered in the chess books in order to make it foolproof and give it the ability to think like a human being; otherwise, in time humans can find an exploit which the AI does not cater for, although once such cases are entered into the system it can become pretty powerful. On the other hand, computers can be faster and more reliable than humans because they can plan much more quickly and efficiently.

While doing the research I came across a quotation from Kasparov, in which he said:

“The machine refused to move to a position that had decisive short-term advantage – showing a very human sense of danger” (Kasparov, 1997)

The part where he says “showing a very human sense of danger” shows that the AI is aware of its current environment, and some of the techniques used for Deep Blue can also be applied to day-to-day problems. Some of the problems which AI can tackle are spam fighting, logistics planning, robotics and machine translation.

AI helps spam fighting through learning algorithms. This type of algorithm is very useful because spammers always change the tactics they use to spam people, so it is important that the algorithm is a learning algorithm.

Logistics planning nowadays is useful and powerful, and that is where AI comes into play. In 1991 the U.S. forces deployed a Dynamic Analysis and Replanning Tool called DART. This tool allowed them to do automated logistics planning and scheduling for transportation, with the same tactics that Deep Blue used. When the U.S. forces used DART, the logistics were planned in hours, while with older methods it usually took them weeks to achieve the same results.

Robotics is also a valid example of using AI in a convenient manner; examples include robotic vacuum cleaners for home use.

At first I was sceptical about how I was going to back Kasparov's claim, but when I started researching I noticed that Kasparov was right and his claim is very strong. In my opinion Kasparov was on the brink of competing with a new technological era which has made our lives easier and much more efficient.


Bing Search Engine. (2011, March 29). Retrieved April 1, 2011, from Wikipedia: http://en.wikipedia.org/wiki/Bing_(search_engine)

Google Translate. (2011, March 28). Retrieved March 29, 2011, from Wikipedia: http://en.wikipedia.org/wiki/Google_Language_Tools#Features_and_limitations

Hsu, F.-h. (1999). IBM's Deep Blue Chess. 81.

Inside Google Translate. (2011). Retrieved March 28, 2011, from Google Translate: http://translate.google.com/about/intl/en_ALL/

Powerset. (2011, April 1). Retrieved April 1, 2011, from Wikipedia: http://en.wikipedia.org/wiki/Powerset_(company)

Powerset brings the Semantic Web to Wikipedia. (2008, May 11). Retrieved March 28, 2011, from CNET: http://news.cnet.com/8301-13953_3-9938959-80.html

Powerset vs. Cognition: A Semantic Search Shoot-out. (2008, June 7). Retrieved March 28, 2011, from GigaOM: http://gigaom.com/2008/06/07/powerset-vs-cognition-a-semantic-search-shoot-out/

Russell, S. J., & Norvig, P. (2010). Artificial Intelligence (3rd ed., International Edition). New Jersey: Pearson.

Wolfram Alpha. (2011, March 22). Retrieved March 28, 2011, from Wikipedia: http://en.wikipedia.org/wiki/Wolfram_Alpha