Welcome to our lab, where we'll introduce how to work with time series data in pandas. Some of the learning outcomes will be to understand time series applications in NumPy and in pandas. We'll discuss how to summarize a DataFrame using a DatetimeIndex, which is very powerful when working with time series data. Then we'll close out by generating some simple time series plots using both pandas and Matplotlib. Now, we have a lot of information here, much of which you should already know. What I do want to highlight is that the dataset we'll be working with is this superstore sales data, which includes four years of daily sales broken down by customer and category. Some of the data types we'll be working with, specific to time series, will be datetime64, which is just NumPy's datetime format, where each value is a timestamp. Then timedelta64 is the interval format, or the amount of time between two dates or two timestamps. For pandas, we should be familiar with each of these data types. What will come into play is the index. We're going to work specifically with the DatetimeIndex, and with that, leverage timedeltas, PeriodIndexes, and MultiIndexes; you can click on each of the links if you want further explanation beyond what we'll go through in this course. Again, we're going to see how the DatetimeIndex will be very powerful in being able to execute different time series models, or to generate a dataset that feeds cleanly into the time series models we'll use later on. We're going to import all of our necessary libraries here, which we'll discuss as we use them throughout; most of them you should be familiar with. We do have some time-specific ones: datetime, timedelta, and relativedelta. Again, we'll see those once we work through the actual notebook.
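As a rough sketch, the imports for a notebook like this might look as follows; the exact import list is an assumption based on the libraries named above.

```python
# Core libraries for the lab (exact list is an assumption,
# based on the tools mentioned in the walkthrough).
import numpy as np
import pandas as pd

# Time-specific helpers mentioned above.
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta

# Quick check: a timestamp plus an interval gives a new timestamp.
d = datetime(2011, 1, 4) + timedelta(days=1)
print(d.date())  # 2011-01-05

# relativedelta handles calendar-aware arithmetic, e.g. "one month later".
m = datetime(2011, 1, 31) + relativedelta(months=1)
print(m.date())  # 2011-02-28
```

`import matplotlib.pyplot as plt` would also typically appear here for the plotting section at the end of the lab.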
Now, first things first, we're going to read in our dataset, so we call read_excel and pull it in. We see what our dataset looks like, and we see there are a bunch of columns. Just to make it easier to manipulate for time series, we're going to simplify this dataset: we're going to keep just the order date, so that we have a date, the category, and the actual amount of sales. We're going to group by order date and category, and our outcome variable is just going to be the sales data. We call groupby on our grouping variables, and then what do we want to aggregate? We want the sum of the sales data, our outcome variable. We call reset_index here so that, if we look at base.head(), it has its own index: 0, 1, 2, 3, 4. If we didn't do that, the order date and category would have been the indices. Just so you know, with groupby you could have also passed as_index=False, removed this last part, and gotten the same result. Then we see that our columns are going to be the order date, the category, and the sales. Again, our index is just these values, zero through 2,863. Now, we want to look at the different data types in our DataFrame. We see that our order date, as we'd hope, is a datetime64 type, meaning the values in that series will all be datetime objects. The category is going to be a string, which shows up as object, and the sales will be float64; that's just the revenue. Then, just to make very clear how pandas DataFrames work here: if we select a single column, we get back a pandas Series, and the class of that Series is different from the data type of the values inside it.
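The grouping step described above can be sketched on a tiny stand-in DataFrame; the real lab loads an Excel file with pd.read_excel, but the file name and exact column labels here are assumptions.

```python
import pandas as pd

# A tiny stand-in for the superstore data (column names are assumed).
raw = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["2011-01-04", "2011-01-04", "2011-01-05", "2011-01-05"]
    ),
    "Category": ["Furniture", "Furniture", "Technology", "Furniture"],
    "Sales": [100.0, 50.0, 200.0, 25.0],
})

# Sum sales per (order date, category); reset_index turns the group
# keys back into ordinary columns with a fresh 0..n-1 integer index.
base = raw.groupby(["Order Date", "Category"])["Sales"].sum().reset_index()

# Equivalent shortcut: as_index=False keeps the keys as columns directly.
base2 = raw.groupby(["Order Date", "Category"], as_index=False)["Sales"].sum()

print(base)
print(base.dtypes)  # Order Date: datetime64[ns], Category: object, Sales: float64
```

Both calls produce the same three-row result, with the two Furniture sales on 2011-01-04 summed into one row.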
Each one of these individual columns is a Series, and each Series has a certain data type: either datetime64, object, or float64. Now, to start off, let's work with NumPy arrays. It's not always necessary to work with pandas, and sometimes we can work with NumPy arrays instead, but we're just going to go over this briefly and then switch back to pandas for the rest of the lab. We're going to convert each of these columns into a NumPy array by calling np.array on that column. Then we can look at the different data types, and now, rather than working with a Series as we had before, we're working with a NumPy array, and each array keeps the same data type throughout. If for some reason you were handed NumPy arrays, you can always convert them into a DataFrame by passing in a dictionary with the column names and the arrays, as long as those arrays are the same length. We can see, again, that the data types stayed the same. Now, the NumPy date array is going to be datetime64 with nanosecond units. Looking at what that gives us now that we're in NumPy, we see it gets very specific about the time; it automatically converted to a very precise timestamp. If we just want daily data, because none of that extra precision adds any information for what we're looking at, we can change the data type to datetime64 with day units: instead of the ns we have here, we just say capital D, and that changes it to days. When we look at it, we now only have the daily data. When we converted to daily data, we didn't lose any information. However, suppose we were to aggregate to monthly, changing the data type to month units.
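The nanosecond-to-day conversion described above can be sketched as follows; the sample dates are illustrative, not taken from the superstore file.

```python
import numpy as np

# Timestamps at nanosecond precision, as pandas hands them to NumPy.
dates = np.array(["2011-01-04", "2011-01-05"], dtype="datetime64[ns]")
print(dates)        # full timestamps, including time-of-day components
print(dates.dtype)  # datetime64[ns]

# Down-cast to day precision; for daily data, no information is lost.
daily = dates.astype("datetime64[D]")
print(daily)        # ['2011-01-04' '2011-01-05']
```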
Our data cannot be as specific as it was before, and rather than seeing 01-04 and 01-05 as we had up here, we just have 2011-01 and then 2011-01 again. If we look at the unique values and count them, we see we only have 48 unique values, compared to what we had originally: if we copy this and look just at the order date, we had 1,238 unique values. So you want to be careful if you're going to aggregate in that sense. Now, I'm going to stop the video here, and in the next video we'll start to dive into working with pandas and the pandas DatetimeIndex. All right, I'll see you there.
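The information loss from casting to month units can be sketched like this, again on illustrative dates rather than the actual superstore values.

```python
import numpy as np

# Two distinct days in the same month.
daily = np.array(["2011-01-04", "2011-01-05"], dtype="datetime64[D]")

# Casting to month precision collapses both to 2011-01: the day-level
# detail is gone, which is why you should be careful aggregating this way.
monthly = daily.astype("datetime64[M]")
print(monthly)                  # ['2011-01' '2011-01']
print(len(np.unique(daily)))    # 2 distinct days
print(len(np.unique(monthly)))  # 1 distinct month
```

This mirrors the drop from 1,238 unique daily values to 48 unique monthly values in the lab's four-year dataset (48 months in four years).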