1 00:00:00,005 --> 00:00:02,005 - [Instructor] Of course, we don't want to just read data 2 00:00:02,005 --> 00:00:05,002 with pandas, we want to be able to write data as well. 3 00:00:05,002 --> 00:00:06,007 So for this example, 4 00:00:06,007 --> 00:00:12,001 we're going to open up our pandas_write.py file. 5 00:00:12,001 --> 00:00:15,004 And you can see that I've defined an array of objects 6 00:00:15,004 --> 00:00:18,009 that I would like to write out to a CSV file. 7 00:00:18,009 --> 00:00:22,004 And you can see that the data has the same format 8 00:00:22,004 --> 00:00:25,009 as our inventory.CSV sample file. 9 00:00:25,009 --> 00:00:27,003 It's just a lot smaller 10 00:00:27,003 --> 00:00:28,007 and it's in the format 11 00:00:28,007 --> 00:00:32,008 of dictionaries rather than an array of lists. 12 00:00:32,008 --> 00:00:36,008 So what I need to do first is create a data frame 13 00:00:36,008 --> 00:00:38,005 to represent the data. 14 00:00:38,005 --> 00:00:42,000 So to do that I'm going to create a data frame variable, 15 00:00:42,000 --> 00:00:44,007 and on pandas, I'm simply going to use the data frame 16 00:00:44,007 --> 00:00:50,002 constructor and pass in my array of objects. 17 00:00:50,002 --> 00:00:54,003 Then I need to save the data as CSV format, 18 00:00:54,003 --> 00:00:56,009 and I can do that by using the appropriately named 19 00:00:56,009 --> 00:01:00,001 to CSV file. 20 00:01:00,001 --> 00:01:06,006 So I'll call DF and I'll call to CSV 21 00:01:06,006 --> 00:01:12,009 and I'm going to name it output.CSV. 22 00:01:12,009 --> 00:01:18,003 All right, so let's go ahead and run that. 23 00:01:18,003 --> 00:01:22,004 And when I do, you can see that output.CSV gets created 24 00:01:22,004 --> 00:01:25,001 and we can go ahead and look at that in our text editor 25 00:01:25,001 --> 00:01:28,008 and you can see that pandas added index values 26 00:01:28,008 --> 00:01:33,009 to the beginning of each row except for the header row. 27 00:01:33,009 --> 00:01:36,002 Now in some cases that might be what you want, 28 00:01:36,002 --> 00:01:39,003 but if you don't want the indexes to be added for you, 29 00:01:39,003 --> 00:01:42,005 then you can specify that when you call the function. 30 00:01:42,005 --> 00:01:46,004 So let's go ahead and update our code 31 00:01:46,004 --> 00:01:50,007 and I'm going to specify index equals false. 32 00:01:50,007 --> 00:01:55,005 All right, so let's save, let's run that again. 33 00:01:55,005 --> 00:01:58,000 All right, and then let's click on our file. 34 00:01:58,000 --> 00:02:02,009 And now you can see that the indexes are no longer present. 35 00:02:02,009 --> 00:02:04,007 You've probably realized by now 36 00:02:04,007 --> 00:02:08,001 that we can convert a CSV file directly to an Excel file 37 00:02:08,001 --> 00:02:09,006 with just two lines of code. 38 00:02:09,006 --> 00:02:14,007 So let's go ahead and try that. 39 00:02:14,007 --> 00:02:17,008 So in this case I'm going to write DF equals 40 00:02:17,008 --> 00:02:26,001 and I'm going to call read CSV on inventory.CSV 41 00:02:26,001 --> 00:02:28,003 and then I'm going to call DF. 42 00:02:28,003 --> 00:02:30,006 And in this case I'm going to call the corresponding 43 00:02:30,006 --> 00:02:34,003 to Excel function. 44 00:02:34,003 --> 00:02:39,002 I'm going to name it inventory.XLSX. 45 00:02:39,002 --> 00:02:44,007 I'm going to specify the sheet name to be inventory. 46 00:02:44,007 --> 00:02:46,003 And once again, I'm going to specify 47 00:02:46,003 --> 00:02:50,005 that I don't want the index values. 48 00:02:50,005 --> 00:02:55,003 All right, and let's comment out the previous example. 49 00:02:55,003 --> 00:03:00,005 Let's save and let's run this. 50 00:03:00,005 --> 00:03:03,007 And sure enough we can see that inventory.XLSX 51 00:03:03,007 --> 00:03:05,008 has been created. 52 00:03:05,008 --> 00:03:10,009 Let's go ahead and look at that in Excel 53 00:03:10,009 --> 00:03:13,007 and sure enough, yep, data looks all right to me. 54 00:03:13,007 --> 00:03:18,003 There's 50 rows, there's the headers, everything looks good. 55 00:03:18,003 --> 00:03:20,009 It's important to note that the right functions will 56 00:03:20,009 --> 00:03:25,000 overwrite any existing file that already has that name. 57 00:03:25,000 --> 00:03:29,001 So if you want to add new data to an existing Excel file, 58 00:03:29,001 --> 00:03:32,001 you need to do it by using the Excel writer class 59 00:03:32,001 --> 00:03:34,000 that Pandas provides. 60 00:03:34,000 --> 00:03:41,006 So let's go back to our code 61 00:03:41,006 --> 00:03:44,005 and let's add a new worksheet to the one 62 00:03:44,005 --> 00:03:46,002 that we just created. 63 00:03:46,002 --> 00:03:49,002 And this worksheet will contain the same data, 64 00:03:49,002 --> 00:03:51,007 but with the indexes included. 65 00:03:51,007 --> 00:03:56,007 So first we have to open the file in append mode. 66 00:03:56,007 --> 00:04:03,002 So I'm going to write with pd.ExcelWriter, 67 00:04:03,002 --> 00:04:10,002 and I'm going to open up the inventory.XLSX file 68 00:04:10,002 --> 00:04:13,001 and we have to open it in append mode. 69 00:04:13,001 --> 00:04:16,000 So mode is going to be A, all right, 70 00:04:16,000 --> 00:04:18,002 and I need to do one other thing. 71 00:04:18,002 --> 00:04:22,005 I need to specify that the underlying engine to use is going 72 00:04:22,005 --> 00:04:24,006 to be OpenPyXL. 73 00:04:24,006 --> 00:04:26,002 So pandas is going to detect 74 00:04:26,002 --> 00:04:28,002 which libraries you have installed 75 00:04:28,002 --> 00:04:30,003 for working with Excel files. 76 00:04:30,003 --> 00:04:33,007 And it's not going to choose OpenPyXL by default, 77 00:04:33,007 --> 00:04:37,001 but other libraries like XLSX Writer for example, 78 00:04:37,001 --> 00:04:40,008 don't support appending data to existing workbooks. 79 00:04:40,008 --> 00:04:43,008 So we need to explicitly tell pandas 80 00:04:43,008 --> 00:04:46,003 to use the OpenPyXL library. 81 00:04:46,003 --> 00:04:50,002 So I'm going to do that by specifying engine equals, 82 00:04:50,002 --> 00:04:55,001 I'm going to write OpenPyXL. 83 00:04:55,001 --> 00:04:58,002 All right, so now I've got my completed statement here 84 00:04:58,002 --> 00:05:05,003 and I'm going to open that as the XLW variable. 85 00:05:05,003 --> 00:05:08,005 And now I just need to write the new data. 86 00:05:08,005 --> 00:05:15,002 So the data frame, I'm going to call to Excel 87 00:05:15,002 --> 00:05:17,006 and I'm going to pass in my Excel writer object 88 00:05:17,006 --> 00:05:21,008 that I just created right here. 89 00:05:21,008 --> 00:05:24,004 And I'm going to specify the sheet name 90 00:05:24,004 --> 00:05:31,001 is going to be inventory indexes, 91 00:05:31,001 --> 00:05:35,001 and I want the index label 92 00:05:35,001 --> 00:05:40,002 to be index. 93 00:05:40,002 --> 00:05:44,007 All right, so let's comment out the previous example 94 00:05:44,007 --> 00:05:46,008 because we've already created the file, 95 00:05:46,008 --> 00:05:47,009 so we don't need to do that again. 96 00:05:47,009 --> 00:05:50,006 So you can see that the file exists right here. 97 00:05:50,006 --> 00:05:53,009 All right, and oh, actually I need to 98 00:05:53,009 --> 00:05:57,006 have the data frame though. 99 00:05:57,006 --> 00:06:03,009 All right, so let's go ahead and run this one more time. 100 00:06:03,009 --> 00:06:05,004 All right, there we go. 101 00:06:05,004 --> 00:06:07,007 Okay, it looks like it's been updated. 102 00:06:07,007 --> 00:06:12,006 So let's open up the result in Excel. 103 00:06:12,006 --> 00:06:15,003 And sure enough, now you can see we have two sheets. 104 00:06:15,003 --> 00:06:18,007 We have the original sheet, which is untouched, 105 00:06:18,007 --> 00:06:21,008 and now we have the new inventory indexes sheet. 106 00:06:21,008 --> 00:06:23,004 And you can see that in this case, 107 00:06:23,004 --> 00:06:28,000 pandas has added the numeric indexes for me. 108 00:06:28,000 --> 00:06:31,001 So using pandas, it's very easy to both read 109 00:06:31,001 --> 00:06:36,000 and write data in multiple formats like CSV and Excel.