I am trying to run KNN on the dataset provided, but I keep getting the error "ValueError: Input contains NaN, infinity or a value too large for dtype('float64')." and I don't know why. Please help.

[Image: spreadsheet screenshot of the dataset. Column headers visible in the transcription (some truncated, order as transcribed): CustomerID, Zip Code, Spending Sc…, AppleUser, AnnualInco…, Spouse, Children, Gender, Miles from W…, Has Winter…, Age, PACKAGE, NUMBER_VI…, ALL-INCLUSIVE. The cell values are not reliably recoverable from the transcription.]

# -*- coding: utf-8 -*-
"""
Created on Sat Apr 24 14:58:26 2021
@author: David
"""
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import pandas as pd

def plot_decision_regions(X, y, classifier, test_idx=None, resolution=0.02):
    # setup marker generator and color map
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('gray', 'indigo', 'purple', 'yellow', 'gray')
    cmap = ListedColormap(colors[:len(np.unique(y))])
    # plot the decision surface
    x1_min, x1_max = X[:, 0].min() - .25, X[:, 0].max() + .25
    x2_min, x2_max = X[:, 1].min() - .25, X[:, 1].max() + .25
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())
    # plot all samples
    X_test, y_test = X[test_idx, :], y[test_idx]
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1],
                    alpha=0.8, c=cmap(idx),
                    marker=markers[idx], label=cl)
    # highlight test samples
    if test_idx:
        X_test, y_test = X[test_idx, :], y[test_idx]
        plt.scatter(X_test[:, 0], X_test[:, 1], c='',
                    alpha=1.0, linewidth=1, marker='o',
                    s=55, label='test set')

# Importing the dataset
dataset = pd.read_csv(r'/Users/jaylenmealing/Downloads/VisitJamaica_today.csv', sep=",")
X = dataset.iloc[:, [4, 12]].values
y = dataset.iloc[:, 6].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.fit_transform(X_test)
X_combined_std = np.vstack((X_train_std, X_test_std))
y_combined = np.hstack((y_train, y_test))
knn = KNeighborsClassifier(n_neighbors=5, p=2, metric='minkowski')
knn.fit(X_train_std, y_train)
plot_decision_regions(X_combined_std, y_combined,
                      classifier=knn, test_idx=range(600, 725))
plt.title('K-NN (Training set)')
plt.xlabel('')
plt.ylabel('')
plt.show()

# Training the K-NN model on the Training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)
print(np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), 1))

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy = accuracy_score(y_test, y_pred)
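
The error in the question is typically raised when an array passed to a scikit-learn estimator contains missing values, and pandas reads blank CSV cells as NaN. The following is a minimal sketch of how one might locate and handle such values before fitting KNN. It does not use the actual CSV: the small `demo` frame and the column names `feature_a`, `feature_b` and `label` are hypothetical stand-ins, and median imputation is only one possible choice (dropping the affected rows is another).

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the CSV; blank cells would be read by pandas as NaN.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "feature_a": rng.normal(50, 10, 40),
    "feature_b": rng.normal(20, 5, 40),
    "label": np.tile([0.0, 1.0], 20),
})
demo.loc[[3, 17], "feature_a"] = np.nan  # simulate blank feature cells
demo.loc[[8], "label"] = np.nan          # simulate a blank label cell

# 1. Count the missing values in each column to see where the NaNs are.
missing_per_column = demo.isna().sum()
print(missing_per_column)

# 2. Rows without a label cannot be used for supervised training, so drop them.
demo = demo.dropna(subset=["label"])

X = demo[["feature_a", "feature_b"]].to_numpy()
y = demo["label"].to_numpy().astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# 3. Fill the remaining gaps with medians learned from the training split only.
imputer = SimpleImputer(strategy="median")
X_train = imputer.fit_transform(X_train)
X_test = imputer.transform(X_test)

# 4. Fit the scaler on the training split and reuse it for the test split.
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5, p=2, metric="minkowski")
knn.fit(X_train_std, y_train)
y_pred = knn.predict(X_test_std)
```

Applied to the real data, the same `isna().sum()` check on `dataset` would show which of the selected columns contain blanks. The sketch also calls `transform` rather than `fit_transform` on the test split, so the test data is scaled with the statistics learned from the training data.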