1 00:00:00,000 --> 00:00:05,002 (upbeat music) 2 00:00:05,002 --> 00:00:06,006 - [Instructor] All right, let's review my code 3 00:00:06,006 --> 00:00:08,000 for this challenge. 4 00:00:08,000 --> 00:00:11,000 So our challenge was to take the data in the first worksheet 5 00:00:11,000 --> 00:00:13,002 of the financial sample file 6 00:00:13,002 --> 00:00:16,006 and split it into multiple worksheets, 7 00:00:16,006 --> 00:00:19,005 and each worksheet represents the sales data 8 00:00:19,005 --> 00:00:23,007 for each of the countries in the entire dataset. 9 00:00:23,007 --> 00:00:27,003 Let's take a look at my solution. 10 00:00:27,003 --> 00:00:29,007 All right, so I chose to solve this challenge 11 00:00:29,007 --> 00:00:33,007 by creating a function named split_workbook, 12 00:00:33,007 --> 00:00:36,009 which takes the workbook to work on, 13 00:00:36,009 --> 00:00:40,009 the name of the source sheet for the data, 14 00:00:40,009 --> 00:00:44,003 and the column to split the data on. 15 00:00:44,003 --> 00:00:49,009 So my code loads the workbook, if we look down here. 16 00:00:49,009 --> 00:00:51,006 So we load the workbook 17 00:00:51,006 --> 00:00:54,004 and then we call this function 18 00:00:54,004 --> 00:00:56,008 with the name of the worksheet, right? 19 00:00:56,008 --> 00:00:59,005 Which is in this case SalesData, 20 00:00:59,005 --> 00:01:02,007 and the letter of the column to split on. 21 00:01:02,007 --> 00:01:04,000 So that way I can choose 22 00:01:04,000 --> 00:01:08,002 to group the data on a different column if I want to. 23 00:01:08,002 --> 00:01:12,002 Let's take a look at how the function works. 24 00:01:12,002 --> 00:01:15,004 So the function starts by getting the worksheet 25 00:01:15,004 --> 00:01:17,002 that contains the source data, 26 00:01:17,002 --> 00:01:20,005 which in our example is the sales data sheet. 27 00:01:20,005 --> 00:01:25,001 Then I create a new empty set called new_sheets, 28 00:01:25,001 --> 00:01:26,007 and this is going to hold the names 29 00:01:26,007 --> 00:01:29,008 of each of the new sheets that will be created. 30 00:01:29,008 --> 00:01:31,004 I also have a variable 31 00:01:31,004 --> 00:01:33,009 that holds the name of the current worksheet. 32 00:01:33,009 --> 00:01:35,007 Now, the reason I'm using a set 33 00:01:35,007 --> 00:01:39,006 is because sets can only contain unique values, 34 00:01:39,006 --> 00:01:41,002 which enables me to detect 35 00:01:41,002 --> 00:01:44,004 if I've already started processing a particular country, 36 00:01:44,004 --> 00:01:46,002 and we'll see why that's important in a moment. 37 00:01:46,002 --> 00:01:48,008 The current_worksheet variable 38 00:01:48,008 --> 00:01:53,006 keeps track of the sheet that I am currently adding data to. 39 00:01:53,006 --> 00:01:57,001 So I use a for loop 40 00:01:57,001 --> 00:02:01,003 to process all of the rows in the sales data sheet. 41 00:02:01,003 --> 00:02:05,003 And you can see that I'm using the iter_rows function, 42 00:02:05,003 --> 00:02:08,002 and we covered how to do that earlier in the chapter. 43 00:02:08,002 --> 00:02:11,004 And I'm starting iter_rows off at row 2 44 00:02:11,004 --> 00:02:15,005 in order to skip over the header values. 45 00:02:15,005 --> 00:02:18,008 Then I get the index of the column, 46 00:02:18,008 --> 00:02:22,008 and I actually am using a bit of a utility function 47 00:02:22,008 --> 00:02:27,001 that's contained in the openpyxl utilities module. 48 00:02:27,001 --> 00:02:28,003 And if you did a little bit of digging 49 00:02:28,003 --> 00:02:29,008 inside the documentation, 50 00:02:29,008 --> 00:02:31,009 because you might have run into this problem, right? 51 00:02:31,009 --> 00:02:33,006 You probably found this function. 52 00:02:33,006 --> 00:02:36,009 There are other ways, of course, to map the column letter 53 00:02:36,009 --> 00:02:39,009 or the column index to the column letter 54 00:02:39,009 --> 00:02:40,008 that we want to use, 55 00:02:40,008 --> 00:02:43,003 but it doesn't really matter how you did that. 56 00:02:43,003 --> 00:02:44,006 Okay, so I get the value 57 00:02:44,006 --> 00:02:47,003 for the column from the current row, 58 00:02:47,003 --> 00:02:48,008 which is going to be the country column 59 00:02:48,008 --> 00:02:51,003 because in this case we're doing row B. 60 00:02:51,003 --> 00:02:54,008 And then if I haven't already added the name of the country 61 00:02:54,008 --> 00:02:56,005 to the new sheet set, 62 00:02:56,005 --> 00:02:59,001 that means I have to create a worksheet 63 00:02:59,001 --> 00:03:01,001 to represent that country. 64 00:03:01,001 --> 00:03:04,003 So I add the name to the set, 65 00:03:04,003 --> 00:03:06,009 and then I make the new worksheet 66 00:03:06,009 --> 00:03:11,003 and set the value of current_worksheet to the new one. 67 00:03:11,003 --> 00:03:16,001 Otherwise, if that sheet name is already in the set, 68 00:03:16,001 --> 00:03:19,001 then that country worksheet already exists, 69 00:03:19,001 --> 00:03:20,004 I've already created it, 70 00:03:20,004 --> 00:03:23,002 I've already gone through this code once before 71 00:03:23,002 --> 00:03:27,006 and I just need to get that sheet from the workbook. 72 00:03:27,006 --> 00:03:31,003 So now that I have the proper worksheet reference, 73 00:03:31,003 --> 00:03:34,009 I can copy the row of data to the right place. 74 00:03:34,009 --> 00:03:39,003 So what I'm doing is I copy the value 75 00:03:39,003 --> 00:03:44,003 of each cell in the row into this newrow variable, 76 00:03:44,003 --> 00:03:48,000 and then I use the append function 77 00:03:48,000 --> 00:03:51,002 to append that row into the new worksheet. 78 00:03:51,002 --> 00:03:54,005 And again, we covered this earlier in the chapter. 79 00:03:54,005 --> 00:03:55,005 So this is going to happen 80 00:03:55,005 --> 00:03:58,005 for all the rows in the sales data sheet. 81 00:03:58,005 --> 00:04:00,001 So for each one of the rows, 82 00:04:00,001 --> 00:04:02,001 I'm going to look at the value, 83 00:04:02,001 --> 00:04:04,004 I'm going to see if the sheets already been created. 84 00:04:04,004 --> 00:04:07,000 If it has, then I get that worksheet reference, 85 00:04:07,000 --> 00:04:09,004 otherwise I create a new one, 86 00:04:09,004 --> 00:04:12,008 and I just keep adding or appending rows 87 00:04:12,008 --> 00:04:15,003 onto the end of each sheet. 88 00:04:15,003 --> 00:04:18,000 So when that loop completes, 89 00:04:18,000 --> 00:04:22,000 I'm going to return the new sheets set 90 00:04:22,000 --> 00:04:23,009 that has been created 91 00:04:23,009 --> 00:04:26,003 as the return value of this function, 92 00:04:26,003 --> 00:04:29,006 and we'll see why I do that in just a moment. 93 00:04:29,006 --> 00:04:33,002 All right, so let's scroll down. 94 00:04:33,002 --> 00:04:36,007 So at this point, after we called split_workbook, 95 00:04:36,007 --> 00:04:39,001 my added_sheets result 96 00:04:39,001 --> 00:04:42,000 contains the set of each of the added sheets. 97 00:04:42,000 --> 00:04:43,003 And remember, we're not done yet, 98 00:04:43,003 --> 00:04:45,007 we have to actually go and add the filters 99 00:04:45,007 --> 00:04:47,007 to each one of those sheets. 100 00:04:47,007 --> 00:04:50,003 So I loop over that set, 101 00:04:50,003 --> 00:04:52,001 right, get each of the worksheets, 102 00:04:52,001 --> 00:04:53,006 and then add the auto_filter 103 00:04:53,006 --> 00:04:57,007 and set the dimensions, the size of that entire worksheet. 104 00:04:57,007 --> 00:05:00,004 And then finally, I need to save the new workbook 105 00:05:00,004 --> 00:05:03,009 as the word new plus the original name. 106 00:05:03,009 --> 00:05:06,009 Now, because of the way I designed this function, 107 00:05:06,009 --> 00:05:09,009 right, I can split it on any of the columns. 108 00:05:09,009 --> 00:05:12,000 So I'm going to run my code again, 109 00:05:12,000 --> 00:05:14,004 but this time I'm going to change the last argument 110 00:05:14,004 --> 00:05:20,003 from column B to column A, 111 00:05:20,003 --> 00:05:24,009 because remember, whoops, there we go. 112 00:05:24,009 --> 00:05:26,004 Because remember, 113 00:05:26,004 --> 00:05:29,001 column A contains data on the market segment. 114 00:05:29,001 --> 00:05:31,004 If I click on FinancialSample, 115 00:05:31,004 --> 00:05:34,002 you can see that column A has different segments. 116 00:05:34,002 --> 00:05:35,006 We've got government, midmarket, 117 00:05:35,006 --> 00:05:38,002 right, channel partners, and so on. 118 00:05:38,002 --> 00:05:41,001 So now, instead of splitting on country names, 119 00:05:41,001 --> 00:05:43,005 I'm going to split on segment names. 120 00:05:43,005 --> 00:05:46,006 Let's save. 121 00:05:46,006 --> 00:05:50,008 I'll run this again. 122 00:05:50,008 --> 00:05:57,005 All right, and let's go back to Excel. 123 00:05:57,005 --> 00:06:00,002 And now you can see that instead of country names, 124 00:06:00,002 --> 00:06:05,009 I have data split by the different market segments. 125 00:06:05,009 --> 00:06:07,007 All right, so that's my code. 126 00:06:07,007 --> 00:06:10,000 How does your solution compare to mine?