1 00:00:00,005 --> 00:00:01,004 - [Instructor] In this example, 2 00:00:01,004 --> 00:00:02,008 we're going to look at some different ways 3 00:00:02,008 --> 00:00:05,009 to access the content within a worksheet. 4 00:00:05,009 --> 00:00:11,006 So let's open the example file workbook_content. 5 00:00:11,006 --> 00:00:13,009 And you can see I already have some code 6 00:00:13,009 --> 00:00:16,007 that loads the FinancialSample Excel file 7 00:00:16,007 --> 00:00:18,000 that we've been using. 8 00:00:18,000 --> 00:00:21,002 And it gets the first worksheet. 9 00:00:21,002 --> 00:00:23,003 We've seen some simple examples of how 10 00:00:23,003 --> 00:00:25,007 to access worksheet content already, 11 00:00:25,007 --> 00:00:28,005 and I'll expand on those methods here. 12 00:00:28,005 --> 00:00:31,003 So first, let's take a look at how to get access 13 00:00:31,003 --> 00:00:33,006 to a range of cells. 14 00:00:33,006 --> 00:00:35,005 You can get an entire column 15 00:00:35,005 --> 00:00:39,000 or row of cells by using either a letter 16 00:00:39,000 --> 00:00:41,008 or numeric index respectively. 17 00:00:41,008 --> 00:00:44,004 So to get all the cells in column C 18 00:00:44,004 --> 00:00:46,007 or all the cells in row 10, 19 00:00:46,007 --> 00:00:50,004 I would write some code like this. 20 00:00:50,004 --> 00:00:51,008 I can col sheet 21 00:00:51,008 --> 00:00:57,001 and just pass in the letter C for the entire column, 22 00:00:57,001 --> 00:01:02,002 or I can get an entire row by using a numerical index. 23 00:01:02,002 --> 00:01:03,003 And then once we've done that, 24 00:01:03,003 --> 00:01:04,008 let's just print out some information about 25 00:01:04,008 --> 00:01:06,008 that column and row. 26 00:01:06,008 --> 00:01:13,004 So we'll print out the length of column, 27 00:01:13,004 --> 00:01:15,005 cells in the column, 28 00:01:15,005 --> 00:01:21,005 and then let's do the same thing for row, 29 00:01:21,005 --> 00:01:26,004 and that will be cells in the row. 30 00:01:26,004 --> 00:01:28,008 All right, so let's go ahead and try that out. 31 00:01:28,008 --> 00:01:34,009 Let's save this and let's run it. 32 00:01:34,009 --> 00:01:37,001 And sure enough, there are 700 cells in the column 33 00:01:37,001 --> 00:01:39,009 and 16 cells in the row. 34 00:01:39,009 --> 00:01:43,006 I can also get a range of cells using the range syntax. 35 00:01:43,006 --> 00:01:46,004 So let's take a look at that. 36 00:01:46,004 --> 00:01:50,004 So for example, I can declare a variable named range 37 00:01:50,004 --> 00:01:53,005 and I will ask the sheet to give me a range of cells. 38 00:01:53,005 --> 00:01:55,009 So I'll get the range from A2 39 00:01:55,009 --> 00:01:57,007 and I'm going to use a colon to specify 40 00:01:57,007 --> 00:02:00,008 the second part of the range, which will be B7. 41 00:02:00,008 --> 00:02:02,005 And then once again, let's go ahead 42 00:02:02,005 --> 00:02:04,004 and print out some similar information. 43 00:02:04,004 --> 00:02:11,004 So I'll print out the length of that range. 44 00:02:11,004 --> 00:02:12,007 All right, and then let's go ahead 45 00:02:12,007 --> 00:02:17,006 and print out the range itself. 46 00:02:17,006 --> 00:02:19,008 All right, so let's save that. 47 00:02:19,008 --> 00:02:21,008 Let's comment out our previous example. 48 00:02:21,008 --> 00:02:24,003 I'm going to type control slash 49 00:02:24,003 --> 00:02:27,004 to comment all those lines at once. 50 00:02:27,004 --> 00:02:31,002 And let's run this. 51 00:02:31,002 --> 00:02:33,001 All right, now you can see that when I do that, 52 00:02:33,001 --> 00:02:34,009 I get the number of cells in the range. 53 00:02:34,009 --> 00:02:36,009 So there's six cells in the range 54 00:02:36,009 --> 00:02:40,005 along with the actual range of cells themselves. 55 00:02:40,005 --> 00:02:45,002 So each of these cell objects contains the data, formatting, 56 00:02:45,002 --> 00:02:49,000 and other related information for that particular cell. 57 00:02:49,000 --> 00:02:50,009 But there are going to be times when you don't know 58 00:02:50,009 --> 00:02:52,006 how many cells or rows 59 00:02:52,006 --> 00:02:55,002 or columns you're going to be dealing with. 60 00:02:55,002 --> 00:02:58,007 And what you need is a way to iterate over those items 61 00:02:58,007 --> 00:03:00,004 that are in the worksheet. 62 00:03:00,004 --> 00:03:02,006 To do this, you can use the iter_rows 63 00:03:02,006 --> 00:03:04,007 and iter_cols functions. 64 00:03:04,007 --> 00:03:08,005 So let's go back to the code. 65 00:03:08,005 --> 00:03:10,007 All right, so let's take one example. 66 00:03:10,007 --> 00:03:13,008 I can iterate over a range of columns like this. 67 00:03:13,008 --> 00:03:16,009 I can write for column in 68 00:03:16,009 --> 00:03:19,001 and then I'm going to call on the sheet object. 69 00:03:19,001 --> 00:03:23,002 I'm going to use the iter_cols function 70 00:03:23,002 --> 00:03:25,008 and that takes some optional arguments. 71 00:03:25,008 --> 00:03:28,006 So I can specify the starting and ending row 72 00:03:28,006 --> 00:03:30,009 and column for the iteration. 73 00:03:30,009 --> 00:03:35,007 So I can say start at the min_row and we'll set that to two. 74 00:03:35,007 --> 00:03:40,000 And then I'll set max_row equal to three 75 00:03:40,000 --> 00:03:43,007 and we'll set min_col equal to two 76 00:03:43,007 --> 00:03:49,008 and max_col equal to five. 77 00:03:49,008 --> 00:03:54,006 And then for each cell that is in that column, 78 00:03:54,006 --> 00:03:59,003 I will print the cell's value. 79 00:03:59,003 --> 00:04:02,000 All right, so here in this example I'm starting at row two 80 00:04:02,000 --> 00:04:03,007 and ending at row three, 81 00:04:03,007 --> 00:04:05,002 and I'm looking at all the cells 82 00:04:05,002 --> 00:04:07,005 from column two to column five. 83 00:04:07,005 --> 00:04:10,004 So let's comment out the previous example, 84 00:04:10,004 --> 00:04:12,003 and let's save, 85 00:04:12,003 --> 00:04:16,003 and let's go ahead and run this. 86 00:04:16,003 --> 00:04:20,006 And now we can see the actual values in the output. 87 00:04:20,006 --> 00:04:23,008 Canada, Germany, product names, right, some values. 88 00:04:23,008 --> 00:04:25,008 If we look at the actual sheet, 89 00:04:25,008 --> 00:04:30,004 you'll see that that's, you have Canada, Germany, right? 90 00:04:30,004 --> 00:04:32,005 Product names, right, values and so on. 91 00:04:32,005 --> 00:04:36,006 So it seems to be working fine. Right. 92 00:04:36,006 --> 00:04:38,005 So I could do the same thing 93 00:04:38,005 --> 00:04:43,003 using the iterate rows function. 94 00:04:43,003 --> 00:04:45,009 And in this case, let's try a useful example. 95 00:04:45,009 --> 00:04:50,006 Suppose I wanted to count the number of each product type 96 00:04:50,006 --> 00:04:53,003 that appears within the product column 97 00:04:53,003 --> 00:04:55,000 of the example worksheet. 98 00:04:55,000 --> 00:04:58,006 So to do this, I'm going to use a defaultdict collection 99 00:04:58,006 --> 00:05:02,006 to keep track of each count. 100 00:05:02,006 --> 00:05:05,002 So I'll create a defaultdict 101 00:05:05,002 --> 00:05:07,009 and I'll have that be based on integers. 102 00:05:07,009 --> 00:05:11,004 And then I'm going to iterate over the entire set of rows 103 00:05:11,004 --> 00:05:13,002 for that one column. 104 00:05:13,002 --> 00:05:17,003 So in this case, I'm going to iterate for each row in 105 00:05:17,003 --> 00:05:20,006 sheet dot, and I'm going to call iter_rows this time, 106 00:05:20,006 --> 00:05:22,006 instead of iter_cols. 107 00:05:22,006 --> 00:05:28,003 I'll set the min_row equal to two to skip the header, 108 00:05:28,003 --> 00:05:32,008 and I'll set the min_col 109 00:05:32,008 --> 00:05:34,001 equal to three, 110 00:05:34,001 --> 00:05:36,005 'cause that's the product column. 111 00:05:36,005 --> 00:05:39,007 I will leave max_col also equal to three. 112 00:05:39,007 --> 00:05:42,009 So we only operate on that one column. 113 00:05:42,009 --> 00:05:50,004 And then for the cell in that row, 114 00:05:50,004 --> 00:05:55,002 my counter, indexed by the cell value, 115 00:05:55,002 --> 00:05:57,001 'cause that's going to be each product name. 116 00:05:57,001 --> 00:06:00,001 I'll just increment that count by one. 117 00:06:00,001 --> 00:06:04,004 And then when I'm done, I'll print out the counter. 118 00:06:04,004 --> 00:06:07,007 All right, so let's go ahead and comment that out. 119 00:06:07,007 --> 00:06:10,009 And if we take a quick look at the data, right? 120 00:06:10,009 --> 00:06:12,008 Column three, that's this column right here. 121 00:06:12,008 --> 00:06:15,006 So we have all these product names in here, right? 122 00:06:15,006 --> 00:06:18,008 So we're going to use each one of these words as a key 123 00:06:18,008 --> 00:06:21,000 into that default dictionary. 124 00:06:21,000 --> 00:06:22,009 And I should end up with a count 125 00:06:22,009 --> 00:06:24,009 of all the different products, right? 126 00:06:24,009 --> 00:06:31,004 So let's go ahead and save, and let's run this. 127 00:06:31,004 --> 00:06:36,007 Yep, and sure enough, we've got 93 Carrateras, 93 Montanas, 128 00:06:36,007 --> 00:06:41,005 couple of hundred Paseos, 109 Velos, and so on. 129 00:06:41,005 --> 00:06:43,008 All right, so that's pretty cool. Okay, let's keep going. 130 00:06:43,008 --> 00:06:47,004 Let's try one more example. 131 00:06:47,004 --> 00:06:51,006 It's also possible to work with other types of cell content. 132 00:06:51,006 --> 00:06:53,007 So for example, I can add a comment to a cell 133 00:06:53,007 --> 00:06:55,008 by using the comment class. 134 00:06:55,008 --> 00:06:59,000 Let's comment out this previous example, 135 00:06:59,000 --> 00:07:04,007 and I'll get a reference to the first cell in the sheet. 136 00:07:04,007 --> 00:07:09,001 And then I'll set the comment property equal to 137 00:07:09,001 --> 00:07:11,003 a new instance of a comment object, 138 00:07:11,003 --> 00:07:13,003 and that takes two strings. 139 00:07:13,003 --> 00:07:16,002 I'm going to write "This is a comment" 140 00:07:16,002 --> 00:07:18,003 and then the author is also a string. 141 00:07:18,003 --> 00:07:21,004 And I'll just put my name in there. 142 00:07:21,004 --> 00:07:23,003 And then of course I have to save the workbook 143 00:07:23,003 --> 00:07:24,009 for that comment to show up. 144 00:07:24,009 --> 00:07:27,005 So I'll call wb.save, 145 00:07:27,005 --> 00:07:28,007 and let's just give it a different name 146 00:07:28,007 --> 00:07:31,003 so we don't mess up our example file. 147 00:07:31,003 --> 00:07:35,008 So we'll save this as Content.xlsx. 148 00:07:35,008 --> 00:07:42,009 All right, let's go ahead and save, and let's run this. 149 00:07:42,009 --> 00:07:45,005 All right, sure enough, Content shows up, 150 00:07:45,005 --> 00:07:51,002 and of course we have to go and open Excel to see this 151 00:07:51,002 --> 00:07:54,003 because I don't think that extension shows this information, 152 00:07:54,003 --> 00:07:56,004 but you can see that sure enough up there in cell ones, 153 00:07:56,004 --> 00:07:57,009 that little red chip, 154 00:07:57,009 --> 00:08:00,008 when I hover over it, oh, there's my comment. 155 00:08:00,008 --> 00:08:03,000 All right, so these utility methods 156 00:08:03,000 --> 00:08:05,002 in the openpyxl library 157 00:08:05,002 --> 00:08:07,006 make it really easy to process the contents 158 00:08:07,006 --> 00:08:10,000 of workbooks and worksheets.