(d) Comment on the usefulness of Naïve Bayes by focusing on problems associated with predicting labels without the assumption of conditional independence. You may choose to use the problem provided to you in this question to augment your statements.


Just need part (d) done.

Question 7
Text Data Classification. In this question, we will dry-run the Naïve Bayes classification algorithm (parts (b)-(d)) to predict the sentiment of the text data provided. You are provided the following reviews and labels:
|          | Sentiment | Text                          |
| Training | Positive  | very enjoyable                |
| Training | Positive  | really surprising             |
| Training | Positive  | really fun and enjoyable      |
| Training | Negative  | not surprising                |
| Training | Negative  | very boring and predictable   |
| Test     | ?         | pretty predictable and no fun |

Table 1: Review data
(a) First, let us use the trained weights provided in Table 2 to predict the sentiment of the test data. We label a positive review as '1' and a negative review as '-1', and take the decision boundary to be 0. You are provided a description of the features and their respective trained weights in the table below. Use this information to predict the sentiment of the test review. (Make appropriate assumptions about which words are positive and which are negative.)
| Feature | Description             | Weight Value |
| θ₀      | Bias                    | 0.6          |
| θ₁      | Count of Positive Words | 1.2          |
| θ₂      | Count of Negative Words | -3.5         |
| θ₃      | log(word count)         | 0.1          |

Table 2: Features and Trained Weights
(b) We now use Naïve Bayes to predict the sentiment of the review. Use 'Laplace add-1 smoothing' on Table 1 to compute the likelihoods for all words in the training data. Populate the table given below:

| Word      | P(Word | +) | P(Word | -) |
| very      |             |             |
| enjoyable |             |             |
| ...       |             |             |

Table 3: Likelihoods of Training Data
(c) Using the table in part (b), predict the sentiment of the test data.
(d) Comment on the usefulness of Naïve Bayes by focusing on problems associated with predicting labels without the assumption of conditional independence. You may choose to use the problem provided to you in this question to augment your statements.
Solution:

(a) For the test review "pretty predictable and no fun" we have the feature vector

x = [1, 2, 1, ln(5)]ᵀ

(the bias feature, a count of 2 positive words — 'pretty' and 'fun' — a count of 1 negative word — 'predictable' — and the log of the 5-word review length).

Our prediction is given by h(θ, x) = θᵀx, where θ = [0.6, 1.2, -3.5, 0.1]ᵀ, which we use to find our label, and therefore the sentiment, using the following function:

ŷ = 1 if h(θ, x) > 0
ŷ = -1 if h(θ, x) < 0

Note that when h(θ, x) lies on the decision boundary, i.e. h(θ, x) = 0, we can randomly choose either label.

We can compute our prediction as:

h(θ, x) = θᵀx
        = 0.6·1 + 1.2·2 + (-3.5)·1 + 0.1·ln(5)
        ≈ -0.34 < 0

Therefore, we can predict our test review as 'negative'.
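For reference, here is a minimal Python sketch of this computation. The lexicons marking 'pretty' and 'fun' as positive and 'predictable' as negative are our assumption (the question only says to make appropriate assumptions), chosen to match the feature vector x = [1, 2, 1, ln(5)]ᵀ above.

```python
import math

# Trained weights from Table 2: [bias, count of positive words,
#                                count of negative words, log(word count)]
theta = [0.6, 1.2, -3.5, 0.1]

# Assumed sentiment lexicons (hypothetical choice, consistent with the
# feature counts derived above).
POSITIVE = {"pretty", "fun"}
NEGATIVE = {"predictable"}

def features(review):
    """Build the feature vector [1, #positive, #negative, ln(word count)]."""
    tokens = review.split()
    return [
        1.0,                                 # bias feature
        sum(t in POSITIVE for t in tokens),  # positive-word count
        sum(t in NEGATIVE for t in tokens),  # negative-word count
        math.log(len(tokens)),               # log of review length
    ]

def predict(review):
    """Label +1 if theta^T x > 0, else -1 (the tie at exactly 0 may be broken either way)."""
    score = sum(w * x for w, x in zip(theta, features(review)))
    return (1 if score > 0 else -1), score

label, score = predict("pretty predictable and no fun")
print(label, round(score, 2))   # -1 -0.34, i.e. 'negative'
```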
(b) With Laplace add-1 smoothing, each likelihood is P(w | c) = (count(w, c) + 1) / (N_c + |V|), where N_c is the number of tokens in class c (8 for positive, 6 for negative) and |V| = 9 is the vocabulary size, giving denominators of 17 and 15. The populated table is given below:

| Word        | P(Word | +) | P(Word | -) |
| very        | 2/17        | 2/15        |
| enjoyable   | 3/17        | 1/15        |
| really      | 3/17        | 1/15        |
| surprising  | 2/17        | 2/15        |
| fun         | 2/17        | 1/15        |
| and         | 2/17        | 2/15        |
| not         | 1/17        | 2/15        |
| boring      | 1/17        | 2/15        |
| predictable | 1/17        | 2/15        |

Table 4: Likelihoods of Training Data
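A minimal Python sketch of the same computation, assuming whitespace tokenization; `Fraction` keeps the likelihoods exact so they can be checked against Table 4 directly.

```python
from collections import Counter
from fractions import Fraction

# Training reviews from Table 1.
TRAIN = [
    ("+", "very enjoyable"),
    ("+", "really surprising"),
    ("+", "really fun and enjoyable"),
    ("-", "not surprising"),
    ("-", "very boring and predictable"),
]

# Per-class token counts: 8 tokens for '+', 6 for '-'.
counts = {"+": Counter(), "-": Counter()}
for label, text in TRAIN:
    counts[label].update(text.split())

vocab = set(counts["+"]) | set(counts["-"])   # 9 distinct words

def likelihood(word, label):
    """Laplace add-1 smoothed P(word | label)."""
    total_tokens = sum(counts[label].values())
    return Fraction(counts[label][word] + 1, total_tokens + len(vocab))

for w in sorted(vocab):
    print(f"{w:12s} P(w|+) = {likelihood(w, '+')}   P(w|-) = {likelihood(w, '-')}")
# e.g. enjoyable: P(w|+) = 3/17, P(w|-) = 1/15
```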
(c) We can calculate the a priori probabilities from Table 1 as:

P(+) = 3/5
P(-) = 2/5

Using the Naïve Bayes assumption, the posteriors are, up to the common normalizing constant P(Data):

P(+|Data) ∝ P(pretty|+) P(predictable|+) P(and|+) P(no|+) P(fun|+) P(+)
P(-|Data) ∝ P(pretty|-) P(predictable|-) P(and|-) P(no|-) P(fun|-) P(-)

Since 'pretty' and 'no' never appear in the training data, we drop them (the standard Naïve Bayes practice for out-of-vocabulary words) and rewrite as follows:

P(+|Data) ∝ P(predictable|+) P(and|+) P(fun|+) P(+)
P(-|Data) ∝ P(predictable|-) P(and|-) P(fun|-) P(-)

Finally, we can calculate the probabilities as:

P(+|Data) ∝ (1/17) · (2/17) · (2/17) · (3/5) ≈ 0.000488
P(-|Data) ∝ (2/15) · (2/15) · (1/15) · (2/5) ≈ 0.000474

Since P(+|Data) > P(-|Data), we predict the sentiment as 'positive'. Note that this disagrees with the prediction obtained from the trained weights in part (a).
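Reusing `vocab` and `likelihood` from the previous sketch, part (c) can be reproduced as below; unseen words ('pretty', 'no') are skipped, matching the derivation above.

```python
def posterior_score(review, label, prior):
    """Unnormalized P(label | review): the class prior times the smoothed
    likelihood of every test word that was seen during training."""
    score = prior
    for w in review.split():
        if w in vocab:               # drop words absent from the training vocabulary
            score *= likelihood(w, label)
    return score

test = "pretty predictable and no fun"
pos = posterior_score(test, "+", Fraction(3, 5))  # (1/17)(2/17)(2/17)(3/5) ~ 0.000488
neg = posterior_score(test, "-", Fraction(2, 5))  # (2/15)(2/15)(1/15)(2/5) ~ 0.000474
print("positive" if pos > neg else "negative")    # positive
```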