1 00:00:00,005 --> 00:00:02,004 - [Instructor] Let's start by using OpenPyXL 2 00:00:02,004 --> 00:00:05,008 to load and explore some data in a workbook. 3 00:00:05,008 --> 00:00:08,008 I'm going to be using this sample Excel file right here 4 00:00:08,008 --> 00:00:12,002 named Financial Sample.xlsx. 5 00:00:12,002 --> 00:00:13,009 Now, if you don't have Excel, 6 00:00:13,009 --> 00:00:16,001 you can install an extension for VS Code 7 00:00:16,001 --> 00:00:19,009 called Excel Viewer, which I already have installed. 8 00:00:19,009 --> 00:00:22,003 So I'm going to go over to my extensions panel 9 00:00:22,003 --> 00:00:23,006 and scroll down, 10 00:00:23,006 --> 00:00:26,001 and you can see it's called Excel Viewer right here. 11 00:00:26,001 --> 00:00:27,003 It's by Grape City. 12 00:00:27,003 --> 00:00:28,009 You can see it has millions of downloads 13 00:00:28,009 --> 00:00:31,004 and some very high ratings on the store. 14 00:00:31,004 --> 00:00:33,000 You can find this extension in the marketplace 15 00:00:33,000 --> 00:00:34,008 by doing a search for it, 16 00:00:34,008 --> 00:00:38,007 like we did earlier for the Python extension, and it's free. 17 00:00:38,007 --> 00:00:40,002 So go ahead and install it, 18 00:00:40,002 --> 00:00:41,000 and you can see 19 00:00:41,000 --> 00:00:45,001 that when I click on an Excel file here in code, 20 00:00:45,001 --> 00:00:47,006 I actually get this nice visualization 21 00:00:47,006 --> 00:00:49,006 of the contents of the workbook 22 00:00:49,006 --> 00:00:52,006 without having to use the full Excel program. 23 00:00:52,006 --> 00:00:53,008 And like I said, it's free. 24 00:00:53,008 --> 00:00:54,008 It works really well. 25 00:00:54,008 --> 00:00:58,005 So I suggest installing it if you can't use Excel. 26 00:00:58,005 --> 00:01:00,003 Now, I do have Excel on my computer, 27 00:01:00,003 --> 00:01:02,001 so I'm going to open this file up in Excel 28 00:01:02,001 --> 00:01:05,007 and we can take a look at its structure and content. 29 00:01:05,007 --> 00:01:08,007 So you can see that this example file 30 00:01:08,007 --> 00:01:11,000 contains a sales data worksheet 31 00:01:11,000 --> 00:01:13,005 as well as an empty worksheet, 32 00:01:13,005 --> 00:01:18,000 and the sales data worksheet contains several hundred rows 33 00:01:18,000 --> 00:01:21,009 of sales information for a fictional company. 34 00:01:21,009 --> 00:01:22,007 If we scroll down, 35 00:01:22,007 --> 00:01:25,001 you can see there's quite a lot of data in this worksheet, 36 00:01:25,001 --> 00:01:27,009 about 700 rows' worth, 37 00:01:27,009 --> 00:01:32,007 and there are 16 columns that represent information 38 00:01:32,007 --> 00:01:36,003 such as market segment, country, sales amount, 39 00:01:36,003 --> 00:01:39,003 cost of goods, profit, so on. 40 00:01:39,003 --> 00:01:40,005 And like I said, 41 00:01:40,005 --> 00:01:43,005 I currently have a second blank worksheet in here 42 00:01:43,005 --> 00:01:46,009 just for example purposes that we'll see later on. 43 00:01:46,009 --> 00:01:50,003 For now, let's write some code to open this workbook 44 00:01:50,003 --> 00:01:53,000 and explore its content, right? 45 00:01:53,000 --> 00:01:56,004 So I'm going to go ahead and close this. 46 00:01:56,004 --> 00:01:58,009 Alright, so back here in code, 47 00:01:58,009 --> 00:02:00,009 let's open up our example file. 48 00:02:00,009 --> 00:02:05,000 So we're going to open up explore_workbook.py, 49 00:02:05,000 --> 00:02:08,008 and you can see that I've imported the OpenPyXL library, 50 00:02:08,008 --> 00:02:12,005 and I have a variable for the name of the file. 51 00:02:12,005 --> 00:02:13,006 To load the workbook, 52 00:02:13,006 --> 00:02:16,002 I'm going to use the load workbook function 53 00:02:16,002 --> 00:02:19,002 and that's going to return a workbook object. 54 00:02:19,002 --> 00:02:22,001 So I'll create a variable for that, 55 00:02:22,001 --> 00:02:24,003 and I'll call open OpenPyXL, 56 00:02:24,003 --> 00:02:27,000 and I'll use the load workbook function 57 00:02:27,000 --> 00:02:31,002 and I'll pass the file name. 58 00:02:31,002 --> 00:02:33,008 Alright, so that will load the workbook, 59 00:02:33,008 --> 00:02:36,006 so let's explore some of the data that's in the workbook. 60 00:02:36,006 --> 00:02:39,006 First, let's print out the number and names 61 00:02:39,006 --> 00:02:43,001 of the worksheets that the workbook contains. 62 00:02:43,001 --> 00:02:51,000 So we'll print out the number of worksheets, 63 00:02:51,000 --> 00:02:57,001 and that's going to be the length of the workbook, 64 00:02:57,001 --> 00:03:01,007 and it has a property called sheet names, 65 00:03:01,007 --> 00:03:06,009 and then let's print out each worksheet name. 66 00:03:06,009 --> 00:03:14,009 So I'll write for worksheet_name in 67 00:03:14,009 --> 00:03:20,005 and I'll use that sheet name's variable again. 68 00:03:20,005 --> 00:03:25,006 I'll assign a worksheet variable to the workbook, 69 00:03:25,006 --> 00:03:29,003 and then I can use brackets to index each worksheet name 70 00:03:29,003 --> 00:03:32,002 to get a reference to that particular worksheet. 71 00:03:32,002 --> 00:03:37,005 So I'll get the worksheet that has that name, 72 00:03:37,005 --> 00:03:41,009 and then let's print out some information about it. 73 00:03:41,009 --> 00:03:46,006 So we'll say that the worksheet is, 74 00:03:46,006 --> 00:03:50,007 and we'll print out the worksheet name. 75 00:03:50,007 --> 00:03:52,002 Alright, that's a pretty good start so far, 76 00:03:52,002 --> 00:03:57,002 so let's go ahead and save and let's run this. 77 00:03:57,002 --> 00:04:02,005 So I'll just run this in the terminal. 78 00:04:02,005 --> 00:04:04,000 Alright, so far so good. 79 00:04:04,000 --> 00:04:06,007 We can see that there are two worksheets 80 00:04:06,007 --> 00:04:09,008 and one is called SalesData and one is called Sheet2, 81 00:04:09,008 --> 00:04:12,004 so those are the correct names, so, so far so good. 82 00:04:12,004 --> 00:04:13,002 Let's keep going. 83 00:04:13,002 --> 00:04:16,001 We can get the dimensions of a worksheet 84 00:04:16,001 --> 00:04:19,006 by using the dimensions property. 85 00:04:19,006 --> 00:04:25,005 So let's create a variable named dimensions, 86 00:04:25,005 --> 00:04:29,009 and that's going to be equal to worksheet, 87 00:04:29,009 --> 00:04:33,001 and there's a dimensions property that we can examine. 88 00:04:33,001 --> 00:04:37,004 Let's go ahead and print that out. 89 00:04:37,004 --> 00:04:40,000 Let's give ourselves some spacing here, 90 00:04:40,000 --> 00:04:45,000 and I'll write dimensions, 91 00:04:45,000 --> 00:04:49,004 and that's going to be the dimensions variable, 92 00:04:49,004 --> 00:04:53,009 and we can inspect properties such as the min and max rows 93 00:04:53,009 --> 00:04:57,008 and the columns that have data in them. 94 00:04:57,008 --> 00:05:01,007 So let's print out that information as well. 95 00:05:01,007 --> 00:05:05,005 So I'll print out the minimum row 96 00:05:05,005 --> 00:05:11,008 and that's going to be the worksheet's min_row property, 97 00:05:11,008 --> 00:05:18,005 and I'll do the same thing for max_row, 98 00:05:18,005 --> 00:05:21,000 and then let's do the same thing for columns. 99 00:05:21,000 --> 00:05:23,007 So I'll make a couple of copies there, 100 00:05:23,007 --> 00:05:33,008 so this this'll be min column and max column, 101 00:05:33,008 --> 00:05:44,000 and I'll just change that to min_column and max_column. 102 00:05:44,000 --> 00:05:44,008 Alright. 103 00:05:44,008 --> 00:05:48,007 We can also get data from the worksheet, if it has any. 104 00:05:48,007 --> 00:05:52,009 So let's add some code to test if the worksheet is empty, 105 00:05:52,009 --> 00:05:55,005 and if it's not, then we'll get some data. 106 00:05:55,005 --> 00:05:58,004 So first, let's check to see 107 00:05:58,004 --> 00:06:07,002 if the worksheet's max row is equal to one, 108 00:06:07,002 --> 00:06:12,009 and the worksheet's max column is equal to one. 109 00:06:12,009 --> 00:06:15,003 That will tell us that the worksheet is empty. 110 00:06:15,003 --> 00:06:22,009 So we'll print out, "Worksheet is empty," 111 00:06:22,009 --> 00:06:25,006 if that's the case. 112 00:06:25,006 --> 00:06:30,009 Otherwise. 113 00:06:30,009 --> 00:06:34,006 Right, now, I can use either letter and number syntax 114 00:06:34,006 --> 00:06:37,000 like you normally see in Excel, 115 00:06:37,000 --> 00:06:40,004 or I can use row and column index numbers 116 00:06:40,004 --> 00:06:42,006 via the cell function. 117 00:06:42,006 --> 00:06:46,003 So let's create a cell variable, 118 00:06:46,003 --> 00:06:49,002 and I'll use the worksheet bracket notation, 119 00:06:49,002 --> 00:06:54,004 and I'll just get cell A1, and I'll print out its content. 120 00:06:54,004 --> 00:07:02,001 So the top-left cell value is, 121 00:07:02,001 --> 00:07:04,003 and I'm going to take that cell reference 122 00:07:04,003 --> 00:07:09,004 and get the value out of it. 123 00:07:09,004 --> 00:07:10,007 I can also do the same thing 124 00:07:10,007 --> 00:07:16,003 using my worksheet cell function, 125 00:07:16,003 --> 00:07:17,006 and in this case, what I'm going to do 126 00:07:17,006 --> 00:07:21,002 is get the absolute last cell in the sheet 127 00:07:21,002 --> 00:07:23,003 at the max row and the max column. 128 00:07:23,003 --> 00:07:30,008 So I'll pass in row is equal to worksheet.max_row, 129 00:07:30,008 --> 00:07:37,006 column is equal to worksheet.max_column, 130 00:07:37,006 --> 00:07:40,005 and then we'll print that. 131 00:07:40,005 --> 00:07:49,005 So the bottom-right cell value, 132 00:07:49,005 --> 00:07:55,002 and once again, we'll write out the cell value. 133 00:07:55,002 --> 00:07:58,001 Alright, so we've added a bunch of new code 134 00:07:58,001 --> 00:07:59,007 to get the dimensions. 135 00:07:59,007 --> 00:08:03,001 We're going to get the min and max rows and columns. 136 00:08:03,001 --> 00:08:06,002 We're going to check to see if the worksheet is empty. 137 00:08:06,002 --> 00:08:07,009 If not, we'll get some data out of it, 138 00:08:07,009 --> 00:08:12,000 so let's go ahead and run what we have. 139 00:08:12,000 --> 00:08:17,009 So I'll right click and choose run in the terminal, 140 00:08:17,009 --> 00:08:21,008 and let's make this a little bit larger. 141 00:08:21,008 --> 00:08:24,005 Okay, so we've got two worksheets 142 00:08:24,005 --> 00:08:25,009 and the sales data worksheet, 143 00:08:25,009 --> 00:08:28,003 we can see that the dimensions are A1 144 00:08:28,003 --> 00:08:31,000 all the way down to P701. 145 00:08:31,000 --> 00:08:35,003 So we've got 701 rows, we've got 16 columns. 146 00:08:35,003 --> 00:08:38,003 The top-left cell value is segment. 147 00:08:38,003 --> 00:08:39,001 Is that right? 148 00:08:39,001 --> 00:08:40,006 Let's take a look. 149 00:08:40,006 --> 00:08:42,006 Yep, sure enough, segment, 150 00:08:42,006 --> 00:08:47,001 and then the bottom right cell value is 2014. 151 00:08:47,001 --> 00:08:51,001 Alright, so the P row, we scroll all the way down. 152 00:08:51,001 --> 00:08:54,002 Yep, sure enough, 2014 is correct, 153 00:08:54,002 --> 00:08:58,005 and Worksheet 2, dimensions are A1 to A1, 154 00:08:58,005 --> 00:09:02,001 so we can see that that sheet indeed is empty. 155 00:09:02,001 --> 00:09:03,009 Alright, so that's a quick introduction 156 00:09:03,009 --> 00:09:09,000 to reading and exploring a workbook using OpenPyXL.