What is the process for arranging data into a meaningful order to make it easier to understand, analyze, and visualize? (1 point) Reframing, sorting, filtering, or prioritizing?

More on sorting and filtering

This is the fifth course in the Google Data Analytics Certificate. These courses will equip you with the skills needed to apply to introductory-level data analyst jobs. In this course, you’ll explore the “analyze” phase of the data analysis process. You’ll take what you’ve learned to this point and apply it to your analysis to make sense of the data you’ve collected. You’ll learn how to organize and format your data using spreadsheets and SQL to help you look at and think about your data in different ways. You’ll also find out how to perform complex calculations on your data to complete business objectives. You’ll learn how to use formulas, functions, and SQL queries as you conduct your analysis.

Current Google data analysts will continue to instruct and provide you with hands-on ways to accomplish common data analyst tasks with the best tools and resources. Learners who complete this certificate program will be equipped to apply for introductory-level jobs as data analysts. No previous experience is necessary.

By the end of this course, you will:
- Learn how to organize data for analysis.
- Discover the processes for formatting and adjusting data.
- Gain an understanding of how to aggregate data in spreadsheets and by using SQL.
- Use formulas and functions in spreadsheets for data calculations.
- Learn how to complete calculations using SQL queries.

Spreadsheet, Data Analysis, SQL, Data Calculations, Data Aggregation

From the lesson

Organizing data to begin analysis

Organizing data makes the data easier to use in your analysis. In this part of the course, you’ll learn the importance of organizing your data through sorting and filtering. You’ll explore these processes in both spreadsheets and SQL as you continue to prepare your data for analysis.

Hey, great to see you again. Earlier we talked about why you should organize your data, no matter what part of the lifecycle it's in. Just like any collection, it's easier to manage and care for a group of things when there's structure around them. Now we should keep in mind that organization isn't just about making things look orderly. It's also about making it easier to search and locate the data you need in a quick and easy way. As a data analyst, you'll find yourself rearranging and sifting through databases pretty often. Two of the most common ways of doing this are with sorting and filtering.

We've briefly discussed sorting and filtering before, and it's important you know exactly what each one does. Sorting is when you arrange data into a meaningful order to make it easier to understand, analyze, and visualize. Sorting ranks your data based on a specific metric that you can choose. You can sort data in spreadsheets and databases that use SQL. We'll get to all the cool functions you can use in both a little later on. A common way to sort items when you're shopping on a website is from lowest to highest price, but you can also sort by alphabetical order, like books in a library. Or you can sort from newest to oldest, like the order of text messages in a phone. Or nearest to furthest away, like when you're searching for restaurants online.

Another way to organize information is with a filter. Filtering is showing only the data that meets specific criteria while hiding the rest. Typically you can use filters when you want to narrow down the amount of data you want to sift through. Say you're searching for green sneakers online. To save time, you filter for green shoes only. Using a filter slims down larger data sets to smaller subsets that are relevant to what you need.

Sorting and filtering are two actions you probably perform a lot online. Whether you're sorting movie showtimes from earliest to latest, or filtering your search results to just images, you're probably already familiar with how helpful they can be for making sense of data. Now let's take that knowledge and apply it.

When it comes to sifting through large, disorganized piles of data, filters are your friend. You might remember from a previous video that you can use filters in spreadsheet programs, like Excel and Sheets, to only display data from rows that match the range or condition you've set. You can also filter data in SQL using the WHERE clause. The WHERE clause works similarly to filtering in a spreadsheet because it returns rows based on a condition you name.

Let's learn how you can use a WHERE clause in a database. We'll use BigQuery to access the database and run our query. If you're joining us, open up your tool of choice for using SQL and reference the earlier resource on how to access the dataset. Otherwise, watch as the WHERE clause does its thing. Here's the database. You might recognize it from past videos. Basically, it's a long list of movies. Each row includes an entry for the columns named Movie_Title, Release_Date, Genre, Director, Cast_Members, Budget, and Total_Revenue. It also includes a link to the film's Wikipedia page. If you scroll down the list, the list goes on for a long time. Of course, we won't need to go through everything to find the data we want. That's the beauty of a filter!

In this case, we'll use the WHERE clause to filter the database and narrow down the list to movies in the comedy genre. To start, we'll use the SELECT command followed by an asterisk. In SQL, an asterisk selects all of the data. On a new line, we'll type FROM and the name of the database: movie_data.movies. To filter the movies by comedy, we're going to type WHERE, then list the condition, which is Genre. Genre is a column in the dataset, and we only want to select rows where the cell in the Genre column exactly matches "Comedy." Next we'll type the equals sign and write the specific genre we're filtering for, which is "Comedy." Since the data in the Genre column is a string format, we have to use single or double quotation marks when writing it. And keep in mind that capitalization matters here, so we have to make sure that the letter casing matches the value stored in the column exactly. And now we can click Run to check out the results. What we're left with is a shorter list of comedy movies. Pretty cool, right?

Here's something else you should know. You can apply multiple filters to a database. You can even sort and filter data at the same time for even more precise results. As a data analyst, knowing how to sort and filter data will make you a superstar. That's all for now. Coming up, we'll get down to the nitty-gritty of sorting functions in spreadsheets. See you there!
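The WHERE filter described in the video can be tried locally with a minimal sketch like the one below. It is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for BigQuery, and the table contents (titles such as "Film A") are made up for the example. The table is called movies here rather than movie_data.movies, and only two of the columns are included.

```python
import sqlite3

# Hypothetical stand-in for the movies table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (Movie_Title TEXT, Genre TEXT)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [
        ("Film A", "Comedy"),
        ("Film B", "Drama"),
        ("Film C", "Comedy"),
        ("Film D", "Action"),
    ],
)


def titles_where_genre(genre):
    """Return titles of rows whose Genre exactly matches `genre`.

    Equivalent to typing: SELECT * FROM movies WHERE Genre = 'Comedy'
    (the value is passed as a parameter instead of being typed in quotes).
    """
    rows = conn.execute(
        "SELECT * FROM movies WHERE Genre = ?", (genre,)
    ).fetchall()
    return [title for title, _genre in rows]


comedies = titles_where_genre("Comedy")    # matches the stored casing
wrong_case = titles_where_genre("comedy")  # different casing, so no rows match

print(comedies)
print(wrong_case)
```

In this sketch the lowercase "comedy" returns no rows, because SQLite's default string comparison is case-sensitive. Other database systems can be configured differently, so it is worth checking how your tool compares strings.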

Data sorting is any process that involves arranging the data into some meaningful order to make it easier to understand, analyze or visualize. When working with research data, sorting is a common method used for visualizing data in a form that makes it easier to comprehend the story the data is telling.  Sorting can be done with raw data (across all records) or at an aggregated level (in a table, chart, or some other aggregated or summarized output).

Data is typically sorted based on actual values, counts or percentages, in either ascending or descending order, but can also be sorted based on the variable value labels. Value labels are metadata found in some programs which allow the researcher to store labels for each value option of a categorical question. Most software applications also allow sorting by multiple variables. This type of sorting is executed in a predetermined variable priority. For example, a data set containing region and country fields can first be sorted by region as the primary sort and then by country. The country sort will be applied within each sorted region.
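As a rough illustration of a multi-variable sort, the Python sketch below sorts a small list of records by region first and then by country within each region. The records and the field names region and country are chosen for the example, not taken from a specific data set.

```python
# Example records; the field names are assumptions made for this sketch.
records = [
    {"region": "Europe", "country": "Spain"},
    {"region": "Asia", "country": "Japan"},
    {"region": "Europe", "country": "France"},
    {"region": "Asia", "country": "India"},
]

# The key is a tuple: region is the primary sort, country the secondary sort.
by_region_then_country = sorted(
    records, key=lambda r: (r["region"], r["country"])
)

ordered = [(r["region"], r["country"]) for r in by_region_then_country]
print(ordered)
# [('Asia', 'India'), ('Asia', 'Japan'), ('Europe', 'France'), ('Europe', 'Spain')]
```

Tuples compare element by element, so the country only decides the order when two records share the same region.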

A Simple Example of Data Sorting

To illustrate a basic sorting operation, consider the table below which has two columns, Country and Population. The Country column is a text field (or label), whereas the Population column contains numeric data. The table on the left shows the original data which is not sorted in any particular order.  The table on the right has been sorted by Population in descending order. In other words, the country with the highest population is sorted to the first row, followed by the country with the second-highest population, and so forth.

This allows the reader to easily understand the order of the countries, without needing to compare all of the numbers in the table.
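A descending sort on a numeric column like Population can be sketched in a few lines of Python. The country names and population figures below are placeholders invented for the example, not real data.

```python
# Placeholder rows: (country, population). The values are hypothetical.
rows = [
    ("Country A", 5_200_000),
    ("Country B", 83_000_000),
    ("Country C", 1_400_000),
    ("Country D", 38_000_000),
]

# Sort on the numeric Population column, largest first.
by_population_desc = sorted(rows, key=lambda row: row[1], reverse=True)

for country, population in by_population_desc:
    print(f"{country:<10} {population:>12,}")
```

After the sort, the first row holds the largest population and each following row is smaller than the one before it.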

Standard Applications for Data Sorting

There are a handful of standard sorting applications when working with any kind of data. One such application is data cleaning, where sorting the data makes it easier to spot abnormalities in a data pattern. For example, monthly sales data can be sorted by month to look for variances in sales volume.

Another common use of sorting is for ranking or prioritizing records. In this situation, data is sorted by some rank, calculated score or other prioritizing value (for example, highest volume accounts or heavy usage customers).

Properly sorting visualizations (tables, charts, etc.) is also extremely important to allow for proper data interpretation. For example, in market research, it is common to sort the results of a single response question by column percentage, i.e. most answered to least answered in descending order as illustrated in the following brand preference question.

However, it wouldn’t make much sense to sort scale questions in the same manner. In these cases, it is better to sort based on the question scale as this makes the data interpretation task much easier.

Incorrect sorting can often lead to misinterpretation. It is advisable to always ensure the most logical sorts are applied to all visualizations.
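The difference between sorting by percentage and sorting by the question scale can be sketched as follows. The scale labels and percentages are hypothetical values made up for this illustration.

```python
# The natural order of a five-point agreement scale (assumed for this sketch).
scale_order = [
    "Strongly disagree",
    "Disagree",
    "Neutral",
    "Agree",
    "Strongly agree",
]

# Hypothetical survey results: label -> percentage of respondents.
results = {
    "Agree": 35,
    "Neutral": 25,
    "Strongly agree": 15,
    "Disagree": 12,
    "Strongly disagree": 13,
}

# Sorting by percentage suits a brand-preference style question...
by_percentage = sorted(results.items(), key=lambda item: item[1], reverse=True)

# ...but a scale question reads better in scale order.
by_scale = sorted(results.items(), key=lambda item: scale_order.index(item[0]))

print([label for label, _ in by_percentage])
print([label for label, _ in by_scale])
```

Sorted by percentage, the scale points are shuffled (Agree, Neutral, Strongly agree, Strongly disagree, Disagree), which hides the shape of the distribution. Sorted by scale, the labels run from Strongly disagree to Strongly agree.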

Technical Issues with Data Sorting

Whilst applying sorting functions is a simple concept to grasp, there are a few technical issues to be aware of. One such issue is the arbitrary sorting of non-unique data. As an example, suppose again that you have a data set with region and country fields and multiple records per region. If a sort by region is applied, what would be the default secondary sort? In other words, how will the data within each region be sorted?

This depends on the application. Excel, for example, will retain the original sort as the default sort order after the primary sort is executed. SQL databases do not have a default sort order. Without an ORDER BY clause, the order of the returned rows depends on the database management system (DBMS) being used, its indexes, and other factors. Other applications may apply additional default sorting based on the order of the columns.
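Python's built-in sorted function behaves like the Excel case described here: it is a stable sort, so records with the same primary key keep their original relative order. The sketch below, using example region and country records, shows that behavior and how to make the secondary sort explicit instead of relying on it.

```python
# Example records in their original order: (region, country).
records = [
    ("Europe", "Spain"),
    ("Asia", "Japan"),
    ("Europe", "France"),
    ("Asia", "India"),
]

# Sort by region only. Python's sort is stable, so within each region
# the countries stay in the order they had in the original list.
by_region = sorted(records, key=lambda r: r[0])

# Make the secondary sort explicit so the result no longer depends
# on the original order of the data.
by_region_and_country = sorted(records, key=lambda r: (r[0], r[1]))

print(by_region)
# [('Asia', 'Japan'), ('Asia', 'India'), ('Europe', 'Spain'), ('Europe', 'France')]
print(by_region_and_country)
# [('Asia', 'India'), ('Asia', 'Japan'), ('Europe', 'France'), ('Europe', 'Spain')]
```

Stating every sort key explicitly is the safer habit, because it gives the same result in any tool regardless of how that tool handles ties.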

Another potential issue is sorting numeric data when stored in a text field. In this case numbers will be sorted in alphanumeric order, character by character, rather than by numeric value. For example, consider the following set of numeric values: (12, 4, 1, 31, 18, 101). When sorted numerically in ascending order, they would be returned as: (1, 4, 12, 18, 31, 101). However, if these values are stored in a text field and sorted in ascending order, the following sort would be returned: (1, 101, 12, 18, 31, 4). This is also a problem when storing date values in text fields.
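The text-versus-numeric problem is easy to reproduce. The sketch below sorts the same six values first as text and then as numbers, and does the same for a few dates. The dates and their month/day/year format are assumptions made for the example.

```python
from datetime import datetime

# The values from the example above, stored as text.
values_as_text = ["12", "4", "1", "31", "18", "101"]

text_sorted = sorted(values_as_text)              # character-by-character order
numeric_sorted = sorted(values_as_text, key=int)  # convert to numbers for comparison

print(text_sorted)     # ['1', '101', '12', '18', '31', '4']
print(numeric_sorted)  # ['1', '4', '12', '18', '31', '101']

# Dates stored as text have the same problem (format assumed: MM/DD/YYYY).
dates_as_text = ["03/01/2021", "12/15/2020", "01/20/2022"]

dates_text_sorted = sorted(dates_as_text)
dates_chronological = sorted(
    dates_as_text, key=lambda d: datetime.strptime(d, "%m/%d/%Y")
)

print(dates_text_sorted)    # ['01/20/2022', '03/01/2021', '12/15/2020']
print(dates_chronological)  # ['12/15/2020', '03/01/2021', '01/20/2022']
```

In both cases the fix is the same: convert the text to its real type (number or date) before sorting, or store it in that type in the first place.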

Data Sorting Software

Most analysis and statistical software packages provide a wide range of sorting functions at virtually every phase of data processing.

Application   Available Sorting Methods
Q             Apply custom sorting to table outputs or raw data, or use QScript to automate sorting functions.
R             Apply sorting functions to various objects with different data structures (vectors, data frames, matrices, etc.).
Displayr      Sort table outputs and apply custom sorting to R functions.
SPSS          Sort table outputs or use syntax to apply sorting to objects.
SQL           Use the ORDER BY clause to sort a recordset when executing SQL statements.
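As a small illustration of the SQL row in the table, the sketch below runs an ORDER BY query through Python's built-in sqlite3 module. The places table and its rows are hypothetical and exist only for this example. ORDER BY syntax is broadly similar across SQL databases, but details can vary by system.

```python
import sqlite3

# Hypothetical table created in memory for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (region TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?)",
    [
        ("Europe", "Spain"),
        ("Asia", "Japan"),
        ("Europe", "France"),
        ("Asia", "India"),
    ],
)

# Primary sort: region ascending. Secondary sort: country descending.
sorted_rows = conn.execute(
    "SELECT region, country FROM places ORDER BY region ASC, country DESC"
).fetchall()

print(sorted_rows)
# [('Asia', 'Japan'), ('Asia', 'India'), ('Europe', 'Spain'), ('Europe', 'France')]
```

Each column in the ORDER BY list can have its own direction, and columns listed earlier take priority over those listed later.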