1 00:00:00,005 --> 00:00:04,000 - [Instructor] Reading a CSV file as an array of arrays 2 00:00:04,000 --> 00:00:05,004 isn't the only option. 3 00:00:05,004 --> 00:00:08,007 You can also read the data into a dictionary structure, 4 00:00:08,007 --> 00:00:12,004 and you can do this by using the DictReader class, 5 00:00:12,004 --> 00:00:14,007 which is also in the csv module, 6 00:00:14,007 --> 00:00:16,000 so I will encourage you 7 00:00:16,000 --> 00:00:18,007 to refer to the documentation link for that. 8 00:00:18,007 --> 00:00:23,006 So let's open up our sample file, read_csv_dict.py, 9 00:00:23,006 --> 00:00:26,009 and what we're going to do is modify our original example 10 00:00:26,009 --> 00:00:30,007 to use the dictionary reader instead of the default setting. 11 00:00:30,007 --> 00:00:34,006 So my starting point code defines an empty dictionary, 12 00:00:34,006 --> 00:00:38,007 rather than an empty array that's going to hold the data. 13 00:00:38,007 --> 00:00:42,007 So once again, we're going to open up the file in read mode. 14 00:00:42,007 --> 00:00:44,006 So this should look familiar by now, 15 00:00:44,006 --> 00:00:51,007 so I'll open up the file name in read mode as CSV file, 16 00:00:51,007 --> 00:00:52,005 but this time, 17 00:00:52,005 --> 00:00:56,000 we're going to be using the DictReader class. 18 00:00:56,000 --> 00:01:04,008 So I'll type csv.DictReader and we'll pass in the CSV file. 19 00:01:04,008 --> 00:01:08,003 Alright, and then we're going to loop over each row 20 00:01:08,003 --> 00:01:09,007 and read one at a time, 21 00:01:09,007 --> 00:01:13,002 just as we did for the array example. 22 00:01:13,002 --> 00:01:19,007 So for row in reader. 23 00:01:19,007 --> 00:01:20,006 Now in this case, 24 00:01:20,006 --> 00:01:23,001 we're going to add the row to our dictionary, 25 00:01:23,001 --> 00:01:27,005 which requires a key for the row to go into. 
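The steps described so far can be sketched roughly as follows. This is a minimal illustration, not the course's actual read_csv_dict.py: the sample data, the column names (item_name, category, consumer_price), and the helper read_rows are all assumptions made up for the example, and an in-memory string stands in for the real inventory file.

```python
import csv
import io

# Hypothetical stand-in for the inventory file; the real header names
# and rows are not shown in the transcript and may differ.
SAMPLE_CSV = (
    "item_name,category,consumer_price\n"
    "Apple,Fruits,0.75\n"
    "Avocado,Fruits,1.50\n"
    "Bagel,Bakery,1.25\n"
)


def read_rows(csvfile):
    """Return each data row as a dictionary keyed by the header names."""
    reader = csv.DictReader(csvfile)
    return [row for row in reader]


# With a real file you would typically write:
#     with open(filename, "r", newline="") as csvfile: ...
rows = read_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row)
```

Each row comes back as a mapping from column name to string value, so no positional indexing is needed.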
26 00:01:27,005 --> 00:01:31,003 So I'm going to use the first value in the row as the key, 27 00:01:31,003 --> 00:01:33,002 but how do I get that value? 28 00:01:33,002 --> 00:01:37,005 Because remember, the row at this point is not an array, 29 00:01:37,005 --> 00:01:41,003 it's a dictionary of key-value pairs. 30 00:01:41,003 --> 00:01:45,000 So the DictReader class has a property called fieldnames, 31 00:01:45,000 --> 00:01:48,007 which it uses to map the values in each row, 32 00:01:48,007 --> 00:01:50,008 and this property is a list 33 00:01:50,008 --> 00:01:55,001 that contains the values in the CSV header row. 34 00:01:55,001 --> 00:02:02,002 So fieldnames[0] will be given the value item name 35 00:02:02,002 --> 00:02:03,009 if we look at the inventory data, right? 36 00:02:03,009 --> 00:02:08,002 So this is going to be 0, 1, 2, and so on, 37 00:02:08,002 --> 00:02:10,002 and we'll use that value 38 00:02:10,002 --> 00:02:13,003 as the key into the row data to get that item. 39 00:02:13,003 --> 00:02:15,003 It's a lot easier to see this in action, 40 00:02:15,003 --> 00:02:17,000 so let's just run an experiment. 41 00:02:17,000 --> 00:02:19,004 So before we do our loop, 42 00:02:19,004 --> 00:02:22,003 let's just print the first row in the data. 43 00:02:22,003 --> 00:02:26,008 So I'm going to write row equals next on the reader, 44 00:02:26,008 --> 00:02:30,007 and then print the row, 45 00:02:30,007 --> 00:02:35,005 and I'm going to also print the fieldnames property. 46 00:02:35,005 --> 00:02:38,009 So I'll call reader.fieldnames, right? 47 00:02:38,009 --> 00:02:41,008 So before we actually do our loop, 48 00:02:41,008 --> 00:02:46,000 let's just go ahead and run what we have. 49 00:02:46,000 --> 00:02:50,002 So let's run this in the terminal. 
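The experiment just described might look something like the sketch below. The sample data and column names are hypothetical placeholders for the inventory file, which is not shown in the transcript.

```python
import csv
import io

# Hypothetical sample data; the real inventory columns may differ.
SAMPLE_CSV = (
    "item_name,category,consumer_price\n"
    "Apple,Fruits,0.75\n"
    "Avocado,Fruits,1.50\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))

# DictReader consumes the header line itself, so next() returns the
# first DATA row as a dictionary, not the header.
first_row = next(reader)
field_names = reader.fieldnames  # list of the header values, in column order

print(first_row)
print(field_names)  # ['item_name', 'category', 'consumer_price']

# The first header value is the key that selects the first column of a row.
print(first_row[field_names[0]])  # Apple
```

Note that every value in the row is a string, including the price; convert with float() if you need a number.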
50 00:02:50,002 --> 00:02:52,005 Right, okay, so when we run this, 51 00:02:52,005 --> 00:02:55,006 you can see that the first row is a dictionary 52 00:02:55,006 --> 00:02:58,004 consisting of key-value pairs, 53 00:02:58,004 --> 00:03:01,005 and the fieldnames property contains the values 54 00:03:01,005 --> 00:03:05,000 that are used as the keys in each row. 55 00:03:05,000 --> 00:03:07,003 So if I want to use the item name 56 00:03:07,003 --> 00:03:09,003 as the overall dictionary key, 57 00:03:09,003 --> 00:03:11,002 I just need to get the name of that key 58 00:03:11,002 --> 00:03:13,003 from the fieldnames property. 59 00:03:13,003 --> 00:03:19,004 So now let's go back and write our loop. 60 00:03:19,004 --> 00:03:24,004 So for each row in the reader, in the data, 61 00:03:24,004 --> 00:03:29,001 I'm going to index into the row 62 00:03:29,001 --> 00:03:31,009 using reader.fieldnames, 63 00:03:31,009 --> 00:03:34,003 and I'm going to use the product name. 64 00:03:34,003 --> 00:03:41,009 That's going to be at index zero, and I set that equal to the row of data. 65 00:03:41,009 --> 00:03:43,007 Alright, so now this sets that row of data 66 00:03:43,007 --> 00:03:45,009 into my dictionary. 67 00:03:45,009 --> 00:03:51,001 I no longer need the print statements, 68 00:03:51,001 --> 00:03:53,001 and I'm going to leave this next in place 69 00:03:53,001 --> 00:03:55,002 'cause I want to skip over the headers. 70 00:03:55,002 --> 00:03:57,000 I don't want the headers to be part of the data. 71 00:03:57,000 --> 00:04:01,005 So I want to skip that row and just do the actual data rows. 72 00:04:01,005 --> 00:04:04,009 I've already got my return data in place, 73 00:04:04,009 --> 00:04:06,003 and as we did earlier, 74 00:04:06,003 --> 00:04:08,005 let's go ahead and print out some results. 
75 00:04:08,005 --> 00:04:11,004 I'm going to use the Python pprint (pretty-print) module, 76 00:04:11,004 --> 00:04:14,003 rather than the regular print statement 77 00:04:14,003 --> 00:04:15,007 for this example, 78 00:04:15,007 --> 00:04:20,000 so I'm going to write pprint.pprint, 79 00:04:20,000 --> 00:04:24,002 and I'm going to print out the inventory data. 80 00:04:24,002 --> 00:04:26,008 And then let's also do the same thing, 81 00:04:26,008 --> 00:04:32,002 I'm going to print out the inventory data for the key of Apple, 82 00:04:32,002 --> 00:04:33,006 and then let's also print out 83 00:04:33,006 --> 00:04:39,007 the consumer price for the apple. 84 00:04:39,007 --> 00:04:41,007 Alright. 85 00:04:41,007 --> 00:04:46,005 Okay, let's save that and then let's run the code. 86 00:04:46,005 --> 00:04:49,000 I'm going to run this in the terminal, 87 00:04:49,000 --> 00:04:51,002 and let's make the terminal bigger. 88 00:04:51,002 --> 00:04:53,009 Alright, so you can see that when I run the code, 89 00:04:53,009 --> 00:04:57,003 the data is now in a dictionary, right? 90 00:04:57,003 --> 00:05:02,005 And the key for each one of the values is the item name. 91 00:05:02,005 --> 00:05:04,006 So starting all the way up here, 92 00:05:04,006 --> 00:05:08,000 we've got avocado, bagel, baguette, banana, so on. 93 00:05:08,000 --> 00:05:10,002 It looks like we're missing apple. 94 00:05:10,002 --> 00:05:11,000 You know what? 95 00:05:11,000 --> 00:05:13,005 Maybe it was a bad idea to leave that next statement in. 96 00:05:13,005 --> 00:05:16,008 Let's go ahead and just take that out, 97 00:05:16,008 --> 00:05:21,003 and let's run this again. 98 00:05:21,003 --> 00:05:25,001 Alright, let's scroll all the way up. 99 00:05:25,001 --> 00:05:25,009 Okay, there we go. 
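The missing Apple row comes from the leftover next call: DictReader already consumes the header line on its own, so an extra next() discards the first data row instead. A hedged sketch of both versions follows; the function load_by_first_field, its skip_first flag, and the sample data are illustrative assumptions, not the course's code.

```python
import csv
import io
import pprint

# Hypothetical sample data; the real inventory columns may differ.
SAMPLE_CSV = (
    "item_name,category,consumer_price\n"
    "Apple,Fruits,0.75\n"
    "Avocado,Fruits,1.50\n"
    "Bagel,Bakery,1.25\n"
)


def load_by_first_field(csvfile, skip_first=False):
    """Build a dict of rows keyed by the value in the first column."""
    reader = csv.DictReader(csvfile)
    if skip_first:
        # Reproduces the mistake: this drops the first DATA row,
        # because the header was already consumed by DictReader.
        next(reader)
    data = {}
    for row in reader:
        data[row[reader.fieldnames[0]]] = row
    return data


with_skip = load_by_first_field(io.StringIO(SAMPLE_CSV), skip_first=True)
inventory = load_by_first_field(io.StringIO(SAMPLE_CSV))

pprint.pprint(inventory)
print("Apple" in with_skip)                  # False: the row was skipped
print(inventory["Apple"]["consumer_price"])  # 0.75 (as a string)
```

If two rows share the same first-column value, the later row overwrites the earlier one, so this keying only suits columns with unique values.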
100 00:05:25,009 --> 00:05:27,007 Apple's there now, avocado, 101 00:05:27,007 --> 00:05:29,000 and you can see that each key 102 00:05:29,000 --> 00:05:32,004 has this data object associated with it, 103 00:05:32,004 --> 00:05:37,000 and let's scroll all the way back down to the bottom. 104 00:05:37,000 --> 00:05:38,006 Right, so here at the bottom 105 00:05:38,006 --> 00:05:42,003 where we accessed the specific item name 106 00:05:42,003 --> 00:05:43,004 using the Apple key, 107 00:05:43,004 --> 00:05:46,006 we can see that it has that particular row. 108 00:05:46,006 --> 00:05:48,009 So it's got the category fruits, and so on, 109 00:05:48,009 --> 00:05:51,002 and then the last line where we get the consumer price, 110 00:05:51,002 --> 00:05:53,007 you can see that that's 0.75. 111 00:05:53,007 --> 00:05:56,008 Now if this seems a bit complicated, okay, 112 00:05:56,008 --> 00:05:58,003 you don't need to do it that way, 113 00:05:58,003 --> 00:06:02,005 because remember, you can use other values for the keys. 114 00:06:02,005 --> 00:06:04,002 So let's modify the code 115 00:06:04,002 --> 00:06:08,007 to just use an integer for each key. 116 00:06:08,007 --> 00:06:12,004 Alright, so rather than using the item name, 117 00:06:12,004 --> 00:06:14,006 I'm going to make a variable called key. 118 00:06:14,006 --> 00:06:17,000 I'm going to set that to zero, alright? 119 00:06:17,000 --> 00:06:20,000 And then for each row in the reader, 120 00:06:20,000 --> 00:06:23,009 what I'm going to do is, instead of doing this, 121 00:06:23,009 --> 00:06:27,008 I'm just going to have the key, that index be the key, 122 00:06:27,008 --> 00:06:33,001 and then I'm going to say key += 1, 123 00:06:33,001 --> 00:06:35,009 and then obviously we need to change our print statements. 124 00:06:35,009 --> 00:06:39,000 So this is no longer getting the apple. 125 00:06:39,000 --> 00:06:44,006 This is getting, let's use zero as the index, alright? 
126 00:06:44,006 --> 00:06:50,004 So let's run again. 127 00:06:50,004 --> 00:06:53,008 Alright, and sure enough, now you can see 128 00:06:53,008 --> 00:06:56,006 that instead of the keys being the item names, 129 00:06:56,006 --> 00:06:58,002 I have these integer values, 130 00:06:58,002 --> 00:07:01,005 which are now serving as the keys, right? 131 00:07:01,005 --> 00:07:03,003 So we have the same result, 132 00:07:03,003 --> 00:07:06,006 only now the item name is not the key anymore, 133 00:07:06,006 --> 00:07:08,007 but you can see that the output is essentially the same. 134 00:07:08,007 --> 00:07:10,002 So it's really up to you 135 00:07:10,002 --> 00:07:14,000 how you want to represent the keys in your data dictionary.
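The integer-key variant could be sketched like this. As before, the sample data, column names, and the function name load_by_index are assumptions for illustration only.

```python
import csv
import io
import pprint

# Hypothetical sample data; the real inventory columns may differ.
SAMPLE_CSV = (
    "item_name,category,consumer_price\n"
    "Apple,Fruits,0.75\n"
    "Avocado,Fruits,1.50\n"
    "Bagel,Bakery,1.25\n"
)


def load_by_index(csvfile):
    """Build a dict of rows keyed by a running integer counter."""
    reader = csv.DictReader(csvfile)
    data = {}
    key = 0
    for row in reader:
        data[key] = row  # the counter, not the item name, is the key
        key += 1
    return data


indexed_inventory = load_by_index(io.StringIO(SAMPLE_CSV))

pprint.pprint(indexed_inventory)
print(indexed_inventory[0])                    # the first data row
print(indexed_inventory[0]["consumer_price"])  # 0.75 (as a string)
```

The item name is still available inside each row; it simply no longer serves as the lookup key. The built-in enumerate(reader) would produce the same counter without the manual increment.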