Data Cleaning, Exploration of Unstructured Dataset
# coding: utf-8
# In[1]:
# Importing Libraries
import pandas as pd
# In[2]:
# Load the dataset into a DataFrame
NY_FireDept_ds = pd.read_csv('C:\\Users\\pravinw\\Documents\\DATA SCIENTIST\\FDNY\\FDNY.csv')
# In[3]:
# View the content (evaluate the DataFrame itself; describe without
# parentheses only returns the bound method)
NY_FireDept_ds
# In[4]:
# View the first 5 records
NY_FireDept_ds.head()
# In[5]:
# Reload the dataset, skipping the duplicate header row
NY_FireDept_ds = pd.read_csv('C:\\Users\\pravinw\\Documents\\Fractal\\DATA SCIENTIST\\FDNY\\FDNY.csv', skiprows=1)
# In[6]:
# Verify the Dataset
NY_FireDept_ds.head()
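# A minimal sketch of what skiprows=1 does when a file starts with a
# duplicate header row. The in-memory CSV text and the facility names below
# are hypothetical stand-ins, not rows from FDNY.csv.

```python
import io

import pandas as pd

# Hypothetical sample: the first line is a stray duplicate of the header row.
raw = (
    "FacilityName,Borough\n"
    "FacilityName,Borough\n"
    "Engine 4,Manhattan\n"
    "Engine 201,Brooklyn\n"
)

# Without skiprows, the second header line is read as a data record.
with_dup = pd.read_csv(io.StringIO(raw))

# With skiprows=1, the first line is dropped and the next one becomes the header.
without_dup = pd.read_csv(io.StringIO(raw), skiprows=1)

print(len(with_dup), len(without_dup))
```

# Comparing the two row counts is a quick way to confirm that exactly one
# spurious record was removed.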
# In[7]:
# Data Statistics
NY_FireDept_ds.describe()
# In[8]:
# View the attributes
NY_FireDept_ds.columns
# In[9]:
# View the index of the dataset
NY_FireDept_ds.index
# In[10]:
# Count of non-null records for each attribute
NY_FireDept_ds.count()
# In[11]:
# View the data type of each attribute
NY_FireDept_ds.dtypes
# In[12]:
# Group the records by borough
groupby_borough = NY_FireDept_ds.groupby('Borough')
# In[13]:
# Number of records in each borough
groupby_borough.size()
# In[14]:
# Select Fire Dept information for Manhattan
Manhattan_borough = groupby_borough.get_group('Manhattan')
# In[15]:
# View the Manhattan Fire Information
Manhattan_borough
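# A minimal sketch of the grouping steps above on a small in-memory frame,
# assuming only that the data has a 'Borough' column as in the script. The
# 'FacilityName' column and its values are hypothetical. It also shows a
# boolean mask as one alternative way to select a single borough.

```python
import pandas as pd

# Hypothetical sample records; only the 'Borough' column name comes from the script.
sample = pd.DataFrame(
    {
        "FacilityName": ["Engine 4", "Engine 10", "Engine 201"],
        "Borough": ["Manhattan", "Manhattan", "Brooklyn"],
    }
)

# Group once, then reuse the grouped object for counts and selection.
by_borough = sample.groupby("Borough")
counts = by_borough.size()                      # records per borough
manhattan = by_borough.get_group("Manhattan")   # rows for one borough

# Equivalent selection with a boolean mask, without building groups first.
manhattan_mask = sample[sample["Borough"] == "Manhattan"]

print(counts)
print(manhattan)
```

# The mask is usually enough when only one borough is needed; the grouped
# object pays off when several per-borough summaries are computed.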