Data Analysis

Working with data is “the sexiest job of the 21st century.”

It was the Harvard Business Review that made this claim about data scientists back in 2012. Now, nearly a decade later, those with their fingers on the pulse of the industry continue to list the role of data analyst as one of the fastest-growing professions in the world.

Take one look at the technological disruption shaping modern society and it’s easy to see why data analysts are (and will continue to be) in such high demand. The Internet of Things brings more and more devices online each and every day, and advances in cloud computing make it ever cheaper and faster to store and process the data those devices generate. As a result, demand is skyrocketing for analysts who can make sense of this data and offer actionable insights to organizations around the globe.

In this guide, you’ll dive deeper into the analytics process to see what tools, methods, and techniques you’ll need to succeed on the job.

What is data analysis?

Data analysis is the process of extracting meaningful insights from data. The data may be raw and unstructured (like the text of restaurant reviews on Yelp), requiring significant cleanup and processing before analysis; or, the data may be structured and tidy (like a spreadsheet of monthly loan disbursements), ready to be analyzed.

The goal of data analysis is to discover patterns in the dataset that can explain a historical trend, predict a future outcome, test a hypothesis, or suggest an optimal course of action. Often, these insights will be presented to key stakeholders in the form of infographics, interactive dashboards, presentations, or other visual formats.

Why learn data analysis skills?

Purdue University estimates that by the year 2022, more bytes of data will be produced in a single year than there are grains of sand on all the beaches in the world, a staggeringly large number. Industries around the globe are in dire need of skilled analysts to collect, dissect, and interpret all of this data:

Healthcare

These days, more people than ever are keeping a closer eye on their health. Many patients are turning to mobile apps and wearable devices to monitor their heart rate, sleep patterns, blood pressure, and other health metrics. What’s more, medical providers maintain extensive electronic health records, live surgical feeds, and other forms of digital patient data. Data analysts in this sector have a wide range of opportunities available to them, from building dashboards that monitor an individual patient’s health to predicting the spread of a viral outbreak.

Finance

Data analysts in the financial sector will often work with investment portfolios to predict when to buy, sell, or trade a particular stock or digital currency. They may also work closely with financial institutions to detect fraud, crack down on scammers, and streamline the loan approval process for bank customers.

Connected devices

Also known as the Internet of Things, the ecosystem of “smart” devices continues to grow as more and more are brought online each day. Data analysts help to improve the predictive capabilities of smartphones, smart cars, smart homes, smartwatches, voice assistants, and many other consumer electronic devices.

Business intelligence

The role of the data analyst in business intelligence is twofold. They assist businesses with making marketing decisions by offering insights into a specific target market, predicting customer churn, and reporting on the performance of marketing campaigns. They also illuminate areas of opportunity for business expansion and predict bottlenecks and obstacles to help businesses better navigate choppy industry waters.

Commerce

Whether in store or online, data analysts help streamline the shopping process for retail customers. From predicting which products a customer is likely to purchase, to determining which target segment an advertisement is best suited for, the data analyst ensures that the customer is presented with just the right product at just the right time.

This short survey doesn’t even begin to cover all of the different industries in which the demand for data analysts continues to grow. The opportunities for skilled professionals in industries from education to agriculture and beyond are limitless.

What’s the difference between data analysis and data science?

The jury is still out on the topic of data analysis vs. data science, and the job duties you’ll be tasked with in either role will be highly dependent on the organization you’re working for. Companies frequently use both terms interchangeably to describe the data analysis process. However, there are a few generally accepted differences between the two, which can help you decide which career path to follow.

Difference in focus

One of the major differences between data analysis and data science is the general area of focus. Data analysts are often focused on gleaning actionable insights from existing datasets. In the process, they produce answers to targeted questions that help move an organization in a certain direction.

On the other hand, data scientists are more likely to be focused on generating hypotheses and creating new and better algorithms to predict future trends. Data scientists ask questions that may not have concrete answers, and spend a considerable amount of time in experimentation and exploration.

Difference in scope

The second major difference between a data analyst and a data scientist is the scope of the work involved. Projects in either field can be performed by one person or by a whole team. However, in general, data analytics projects are more likely to be manageable by a single analyst, whereas data science endeavors often require a team of scientists working together.

For example, consider an agriculture company that wants to create an inspection interface for its produce. The interface should automatically detect and remove any defective products based on visual indicators of disease or decay.

This project would likely require a team of data scientists to collect the data, build the inspection algorithm, train and test a machine learning model, and optimize its performance. Once the inspection interface is in place and working automatically, a single data analyst might be responsible for creating monthly reports on which farms sent the most decayed produce, or how much revenue was lost to diseased goods.

Do take note that this is a contrived example. There are data science projects that are doable with a team of one, and many companies have several analysts working together on a single project. Still, data science projects often result in the invention of new algorithms, advancements in artificial intelligence, and other endeavors of massive scope that would benefit from a team of scientists working together.

Data analytics projects, on the other hand, are often within the smaller scope of answering specific business questions, suggesting the ideal next course of action, and making connections between existing datasets – tasks that are often well within the capabilities of a single professional analyst.

Difference in tools

The final difference between data analysis and data science concerns the tools that one is likely to use in each field. Because of the wide range of advanced technical and mathematical knowledge required, data scientists tend to use tools with a steeper learning curve than those a data analyst would be likely to use.

The wide range of business intelligence tools available allows the data analyst to work with little to no programming knowledge. However, a data scientist would be unlikely to contribute anything new to their field without a solid grasp of Python, R, or MATLAB. Furthermore, data scientists need a strong foundation in advanced mathematics (calculus, linear algebra, and statistics, to name a few) in order to develop new machine learning algorithms and improve on existing ones.

While a data analyst would also do well to understand statistical concepts to better inform their analyses, it’s not really necessary to be able to derive the K-means clustering algorithm or prove its mathematical properties. Simply understanding how the algorithm works from a bird’s-eye view should be enough for any data analyst to draw conclusions from a given dataset.

Remember, these are just a few of the generally accepted differences between data analysis and data science. Still, there are many organizations that don’t consider these to be different fields. As a data analyst, you may find yourself diving into the realm of data science every now and then: trying out new machine learning algorithms, building prediction engines, and exploring potential questions that businesses might want to consider in the future, in addition to the traditional responsibilities of creating reports, coding dashboards, and designing visualizations.

The most important thing is that you thoroughly read each job description you apply for and ask relevant questions to be sure that the role meets your expectations.

What are the different data analysis methods and techniques?

Now that you understand what data analysis is, how would you go about completing an end-to-end analytics project? What are the different methods and techniques you might employ? In this next section, you’ll dive deeper into the data analytics process to see how it works step by step.

Data Analysis Methods

Before you embark on any data analytics project, it’s crucial for you to have a clear understanding of what kind of analysis you’ll be performing. The techniques you’ll use to complete your analysis are often framed within a specific methodology. Here, you’ll take a look at four of these data analysis methods, each of which is oriented around a particular focal question.

What happened?

The most basic data analysis method is called descriptive analytics, and it aims to describe the patterns and trends found in existing datasets in order to answer the question, “What happened?”

For example, a marketing firm may hire a data analyst to build monthly reporting dashboards for various social media platforms. The dashboard would summarize each platform’s performance for the previous month, reporting on audience engagement, click-through rates, visitor demographics, and other key metrics.
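To make the idea concrete, here is a minimal pandas sketch of the kind of monthly summary such a dashboard might sit on. The post-level data, column names, and platforms are hypothetical; a real report would pull these figures from each platform’s own export.

```python
import pandas as pd

# Hypothetical post-level metrics; in practice these would come from
# each social media platform's export or API.
posts = pd.DataFrame({
    "platform": ["Facebook", "Facebook", "Instagram", "Instagram", "LinkedIn"],
    "date": pd.to_datetime(
        ["2021-03-02", "2021-03-15", "2021-03-05", "2021-03-20", "2021-03-11"]
    ),
    "impressions": [1200, 800, 1500, 900, 400],
    "clicks": [36, 20, 60, 27, 8],
    "engagements": [90, 55, 210, 120, 14],
})

# Roll the posts up to one row per platform per month.
monthly = (
    posts.assign(month=posts["date"].dt.to_period("M"))
    .groupby(["platform", "month"], as_index=False)[["impressions", "clicks", "engagements"]]
    .sum()
)

# Click-through rate computed on the monthly totals.
monthly["ctr"] = monthly["clicks"] / monthly["impressions"]
print(monthly)
```

Each row of the result answers “What happened?” for one platform in one month, which is exactly the kind of table a reporting dashboard would chart.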

Descriptive analytics forms the basis of all other analytics methods, as knowing what happened is a crucial first step to understanding why a certain pattern has occurred, and what to do about it in the future.

Why did it happen?

Data analysts add value to businesses by identifying not only what happened, but also why a certain pattern or trend occurred. Knowing what happened is a great way to monitor activity and document processes, but knowing why things happen allows a business to pivot strategy and take the appropriate next steps in response.

Diagnostic analytics is a problem-solving method that takes the results from descriptive analytics and tries to answer the question, “Why did this happen?”

A data analyst may use diagnostic analysis to discover the root cause behind a spike in rental prices within a certain zip code. By considering multiple data sources and searching for correlations between variables, the analyst might discover that houses with the largest increase in rent are the closest to a newly opened private school with highly sought-after faculty. This information could help a property management company to ensure that their listed rental prices are in line with market demand.

What could happen?

While descriptive and diagnostic analysis are oriented towards the past, predictive analytics is all about the future. This data analysis method asks the question, “What might happen next?”

An investment firm would likely use predictive analytics to determine how much a client could expect to profit from any given portfolio. A data analyst working in this setting will consider historical stock market data, information about a company, market sentiment, and other similar data points to build a model that will predict the return on any given investment.

Here is where the line between data analytics and data science begins to blur. A data analyst using predictive analytics may very well find themselves training machine learning models, engineering new features, and evaluating performance on unseen datasets. These are all activities that a data scientist would perform as well, though the data analyst will still have a focus on trying to find actionable answers to questions that are limited in scope.

What should be done?

Nowadays, an even more advanced method of data analysis is emerging out of the field of predictive analytics. This method aims to take the question of future events one step further, asking what should be done in response to potential future events.

Prescriptive analytics aims to both ask and answer the question, “What should we do about what could happen next?” Instead of simply predicting the future, this data analysis method also attempts to offer data-driven decisions to accompany those predictions. A data analyst using this method would not only provide insight into what events might transpire, but they would also give actionable next steps based on the likelihood of different possible outcomes.

One area in which prescriptive analytics shines is healthcare. A data analyst working for a hospital may use prescriptive analytics to determine what steps hospital staff should take to reduce the spread of a viral outbreak. After considering historical patient data, the analyst is able to identify factors that contribute to viral spread and suggest several courses of action to mitigate it.
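One common way to turn predicted likelihoods into a recommendation is to compare the expected outcome of each possible action. The sketch below assumes the probabilities come from an earlier predictive model; the intervention names, probabilities, and infection counts are all hypothetical. A real analysis would also weigh the cost and feasibility of each option.

```python
# Hypothetical interventions. For each, a predictive model has estimated the
# probability of three outbreak scenarios; each scenario also carries an
# estimated number of new infections. Values are (probability, infections).
interventions = {
    "status quo": {"contained": (0.50, 5), "moderate": (0.35, 40), "severe": (0.15, 150)},
    "extra screening": {"contained": (0.70, 5), "moderate": (0.25, 40), "severe": (0.05, 150)},
    "isolation ward": {"contained": (0.80, 5), "moderate": (0.15, 40), "severe": (0.05, 150)},
}

def expected_infections(outcomes):
    """Probability-weighted number of new infections for one intervention."""
    return sum(p * infections for p, infections in outcomes.values())

# Recommend actions in order of fewest expected infections.
ranked = sorted(interventions, key=lambda name: expected_infections(interventions[name]))
for name in ranked:
    print(f"{name}: {expected_infections(interventions[name]):.1f} expected new infections")
```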

Data Analysis Techniques

Descriptive, diagnostic, predictive, and prescriptive analytics are methods that assist the analyst in framing their core question and identifying the desired outcome of their analysis. The process of reaching that desired outcome takes the analyst through several different stages, each with its own techniques.

Ask ten data analysts how many steps there are in the analytics process, and you’ll likely get ten different answers. Businesses and analytics professionals break the stages down differently depending on their desired use cases. Some breakdowns are more granular, while others lump all the steps together into one iterative process. For the purposes of this article, you’ll consider five general stages that a data analyst will go through to complete a project, as well as the various techniques you might employ within each stage.

Information Gathering

The first stage of the data analytics process is often the most crucial one. Known as the information gathering stage, this is where you’ll define project goals, select the questions that need to be answered, locate the needed data sources, and decide what the final deliverable should be. Here is also where you choose which analytics method you’ll be working with (descriptive, diagnostic, predictive, or prescriptive).

The techniques you’ll use to complete this stage are ones you may have used before when fleshing out an idea for a project: for instance, mind mapping, storyboarding, or freewriting. For larger analytics projects where you’re working on a team, you may bounce ideas off of colleagues or role-play to determine what sort of outcome you’re hoping for. Business intelligence analysts may perform a SWOT analysis or create a flowchart, and data scientists might define a hypothesis to test.

Regardless of the techniques used in this stage, the end result should be a problem statement with a clear question that can be answered with the chosen analytics method.

Data Collection

Once you’ve defined your core analytics question and located the necessary data sources, you’ll need to pull that data into one central location for easy access during analysis. The techniques used during the data collection stage depend on whether or not the desired dataset already exists. If it does, the data analyst may write a script to scrape data from a web page, query an existing database, or ingest data directly through an API.

If the dataset does not yet exist, then the data will need to be generated first. The analyst may work alone or in a team to create and administer surveys, questionnaires, interviews, experiments, and other data collection activities. The gathered data will need to be stored in a structured format in order to facilitate analysis. Common data storage formats include databases, spreadsheets, and delimited text files (like CSV files).

A data analyst may spend considerable time creating a data model, a form of documentation that describes the structure of a dataset. A data model explains the various features and attributes associated with each record. Data modeling is an extremely useful technique that simplifies the process of interpreting results: having a clear view of the structure of your data shortens the time it takes to find and explain correlations between variables.
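As a small illustration, a data model can be written down in code as well as in prose. This sketch uses a Python dataclass to document the fields of a hypothetical survey response, then stores the records in a CSV file whose header matches that model. The class name, fields, and file name are assumptions for the example.

```python
import csv
from dataclasses import asdict, dataclass, fields

@dataclass
class SurveyResponse:
    """Hypothetical data model: one record per completed survey."""
    respondent_id: int  # unique key for the respondent
    age: int            # age in years
    region: str         # e.g. "North" or "South"
    satisfaction: int   # 1 (very unhappy) to 5 (very happy)

responses = [
    SurveyResponse(1, 34, "North", 4),
    SurveyResponse(2, 51, "South", 2),
]

# Store the records as a delimited text file; the header row comes
# straight from the data model, so the two cannot drift apart.
with open("survey_responses.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(SurveyResponse)])
    writer.writeheader()
    writer.writerows(asdict(r) for r in responses)
```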

Lastly, it would behoove any well-trained data analyst to be skilled in various database management techniques. These include writing SQL queries, working with data lakes, interacting with cloud storage software, and more.

Without data, the analyst has nothing to analyze and can provide no insights to their organization. Gathering the right data and storing it in an easy-to-use format is paramount to the success of the rest of the analytics process.

Data Wrangling

Not all data is good data. In the real world, data is often messy and unfit for immediate analysis. Ask almost any data analyst and they’ll tell you that the majority of their time is spent sanitizing user input, filling in missing data, removing irrelevant records, and engineering new features.

These techniques are all part of a process known as data wrangling. Also called data munging (or simply data cleaning), this step of the analytics process takes an existing dataset and prepares it for analysis. Some of the techniques an analyst will perform during this step are as follows:

  • removing duplicate entries
  • merging and splitting tables
  • deleting unnecessary columns from a spreadsheet
  • imputing missing values
  • fixing spelling and grammar errors
  • standardizing headers
  • checking for correlated variables
  • converting between data types
  • transforming features and creating new ones
  • turning categorical variables into numeric ones (for example, with one-hot encoding)

Not all of these techniques will be performed during every analysis, and there are many more to choose from that aren’t listed here. The purpose of this step is not to tick the boxes on every item in this list, but to ensure that your dataset is as useful as possible when it comes to answering the questions that were defined in the information gathering stage.
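Several of the techniques listed above can be chained together in pandas. The sketch below runs them on a small, hypothetical raw export; the column names are invented, and median imputation is just one of many reasonable choices for filling gaps.

```python
import pandas as pd

# Hypothetical raw export: messy headers, a duplicate row, a missing value,
# numbers stored as text, and a column the analysis doesn't need.
raw = pd.DataFrame({
    " Customer ID ": [101, 102, 102, 103],
    "Plan Type": ["basic", "premium", "premium", "basic"],
    "Monthly Spend": ["20.5", "45.0", "45.0", None],
    "Internal Notes": ["", "vip", "vip", ""],
})

df = raw.copy()
# Standardize headers: trim whitespace, lowercase, snake_case.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
df = df.drop_duplicates()                                # remove duplicate entries
df = df.drop(columns=["internal_notes"])                 # delete an unnecessary column
df["monthly_spend"] = pd.to_numeric(df["monthly_spend"])  # convert text to numbers
# Impute the missing value with the median of the observed values.
df["monthly_spend"] = df["monthly_spend"].fillna(df["monthly_spend"].median())
# One-hot encode the categorical plan type into indicator columns.
df = pd.get_dummies(df, columns=["plan_type"])
print(df)
```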

Analysis

Only after you’ve defined the analytics goal and collected and prepared the necessary data do you finally perform what’s known as data analysis. Using a combination of statistical, programmatic, and even qualitative techniques, the analyst searches for patterns and trends within and across datasets, distilling the results down into measurable and actionable insights.

There are numerous techniques that an analyst may use during this stage. Regression analysis is one of the most common; it models the relationship between variables so that changes in one or more of them can be used to predict the value of another. An oft-used example is predicting the price of a newly constructed home based on the sale prices of houses with similar features in the same area.
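The home-price example can be sketched as an ordinary least squares regression with NumPy. The sales figures below are hypothetical, and a real model would use many more sales and features; the point is only to show how fitted coefficients turn a new home’s features into a price estimate.

```python
import numpy as np

# Hypothetical recent sales in one neighborhood.
sqft = np.array([1400, 1600, 1700, 1875, 2100, 2350], dtype=float)
bedrooms = np.array([3, 3, 3, 4, 4, 5], dtype=float)
price = np.array([245_000, 279_000, 292_000, 321_000, 355_000, 392_000], dtype=float)

# Design matrix: an intercept column plus one column per feature.
X = np.column_stack([np.ones(len(sqft)), sqft, bedrooms])

# Ordinary least squares: coefficients that minimize the squared error.
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
intercept, per_sqft, per_bedroom = coef

# Estimate the price of a newly built 2,000 sq ft, 4-bedroom home.
new_home = np.array([1.0, 2000.0, 4.0])
estimate = float(new_home @ coef)
print(f"Estimated price: ${estimate:,.0f}")
print(f"Each extra square foot adds about ${per_sqft:,.0f}, holding bedrooms fixed")
```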

Classification and cluster analysis both assign data points to groups. Classification learns from examples that already carry a label and assigns new data points to one of those known categories, while cluster analysis searches unlabeled data for groups of similar points. Time series analysis looks at cycles and shifts in seasonal patterns over time, in order to both understand historical trends and predict future ones. Sentiment analysis takes textual data and gleans from it the emotional context, with which an analyst can determine how a particular person feels about any given topic.
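To show what cluster analysis does under the hood, here is a bare-bones K-means implementation in NumPy (a clustering method, not classification). The customer data is hypothetical, and in practice an analyst would more likely call a library implementation such as scikit-learn’s `KMeans` than write their own.

```python
import numpy as np

def k_means(points, k, n_iter=100, seed=0):
    """Plain K-means: alternate between assigning points and moving centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points,
        # keeping the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: (store visits per year, average basket size in $).
customers = np.array([[2, 15], [3, 18], [2, 20], [25, 60], [28, 55], [30, 65]], dtype=float)
labels, centroids = k_means(customers, k=2)
print(labels)
```

Nothing tells the algorithm which customers are occasional shoppers and which are regulars; it discovers the two groups purely from how close the points are to each other.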

Machine learning takes the analytics process a step further by introducing aspects of automation, algorithmic design, model selection, and optimization. A data analyst using machine learning techniques is walking the fine line between their role and that of a data scientist. At this point, the data analyst may need to know how to write code and to use software libraries that will speed up the implementation of their machine learning model.

Like with data wrangling, the full scope of what’s out there for the analyst to use in this stage of the process is much too grand to cover here. Not only is there a myriad of choices available to you now, but data scientists are hard at work every single day developing new techniques for data analysts to employ in their work. It is up to the analyst to be prudent in keeping up to date with advancements in analytical techniques, as well as ensuring they maintain a strong foundation in the statistical fundamentals.

Data Visualization

You’ve defined your core question, gathered the necessary data, cleaned your dataset, and performed the desired analysis. Now, it’s time to share your results!

While an analyst may summarize their findings in an oral or written report, key stakeholders are likely to prefer results in visual format. This is especially true for the numbers-heavy output generated by statistical analysis. Data visualization takes the insights gleaned from your analysis and condenses them into easy-to-understand, visually appealing graphics that can be consumed in a fraction of the time. Common choices include charts and graphs of all kinds, from line charts and bar charts to pie charts and scatter plots.

Like many other parts of the analytics process, data visualization techniques are innumerable, and the possibilities for how to best present your work to the world are endless.

Data analysts may construct these visualizations using software packages that allow them to drag and drop visual elements onto a canvas for final rendering. They may also use programming languages to code visualizations from the ground up. Multiple visualizations can be combined into one interface in the form of an infographic, an interactive dashboard, an Excel worksheet, a slide show presentation, and so on.
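Coding a visualization from the ground up can be as short as the following Matplotlib sketch, which renders a line chart to an image file ready to drop into a report or slide. The follower counts and file name are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render straight to a file; no display needed
import matplotlib.pyplot as plt

# Hypothetical monthly follower counts for a social media report.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
followers = [1200, 1350, 1500, 1480, 1720, 1900]

fig, ax = plt.subplots(figsize=(6, 3.5))
ax.plot(months, followers, marker="o")  # one point per month, joined by a line
ax.set_title("Followers by month")
ax.set_xlabel("Month")
ax.set_ylabel("Followers")
ax.grid(True, alpha=0.3)
fig.tight_layout()
fig.savefig("followers_by_month.png", dpi=150)
plt.close(fig)
```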

In the end, the goal is to present the outcome of your analysis in such a way that key stakeholders can easily understand the results by simply looking at the visual output. Adding elements of interactivity can enhance the experience and allow the viewer to explore the data and come to the same conclusions on their own.

What are popular data analysis tools?

In the previous section, you covered the different stages of the analytics process as well as the various techniques an analyst might use to complete each stage. Now, you’ll take a quick look at some of the major tools a data analyst would use to carry out their analysis. As always, there are many more tools at your disposal than are listed here. This quick survey is meant to give you a lay of the land. Don’t hesitate to dive deeper on your own and discover what tools await you!

Python

Hands-down, Python is the most popular tool out there when it comes to data analytics. Any internet search for “how to become a data analyst” will bring up dozens of results telling you to learn Python. This general-purpose, high-level programming language is beginner-friendly, meaning you’ll be writing code that runs in no time.

Programming languages allow the programmer to create almost anything they can think of. As a result, it can be quite intimidating to know where to start. Thankfully, most programming languages have an ecosystem of tools that you can use to jumpstart a specific kind of project. For an analytics project in Python, you’ll want to jump into what’s known as the Python Scientific Stack. This is a well-known suite of software tools written in Python that will get you up and running with analyzing data as quickly as possible: Pandas and NumPy for wrangling data, Statsmodels for statistical analysis, SciPy for scientific computing, Scikit-learn for machine learning, and Matplotlib, Plotly, Dash, Seaborn, Bokeh, and a host of other libraries for data visualization.

Python also boasts the Jupyter Notebook, a tool that allows you to conduct analysis interactively in a notebook, as well as Anaconda, a Python distribution whose conda package and environment manager helps prepare your local machine for data analysis.

A company that uses Python for data analysis might have you build an automated monthly report for their X (Twitter) account. You’d write a Python script to interact with the Twitter API and pull in data on followers and their engagement, demographics, and sentiment. Using the Pandas library, you’d consolidate all of that data into a dataframe and perform various cleaning transformations to prepare it for analysis. Matplotlib offers a host of visualization capabilities with which you’d build line charts, pie charts, word clouds, and more. Finally, you’d export your visualizations and write a script to send the report out automatically each month.

Because Python is a programming language, the data analyst is limited only by their imagination as to how to go about the analytics process, making it a popular tool of choice for many businesses around the globe.

SQL

Structured Query Language (SQL) is used to manage relational databases: storing, retrieving, updating, and removing the data within them. SQL databases are made up of tables, each of which can store millions upon millions of data points.

A SQL table is kind of like a spreadsheet: it has a column for each attribute and a row for each record, and new records are appended one by one. Businesses store these tables together in a database, where they can be linked to one another and accessed by any of the other analytical tools the business employs.

For instance, an e-commerce store might have a database that stores all information related to their business. An employee table will store records about who works there, and a customer table will store information about the people who buy the company’s products. The customer table will be directly connected to an orders table, which lists the various orders each customer has made.

This small e-commerce company only has one database with three tables, and only two of those tables are connected (customers and orders). However, database systems can scale up to contain as many rows and tables as the company needs. As a result, SQL database management systems are a popular tool of choice for data analysts at enterprise organizations.
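The three-table example can be reproduced with Python’s built-in `sqlite3` module. This is a minimal sketch with hypothetical rows and column names: it creates the tables in memory, then uses a JOIN on the connection between customers and orders to total each customer’s spending.

```python
import sqlite3

# In-memory database mirroring the example: a standalone employees table,
# plus customers and orders linked by customer_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, role TEXT);
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);
INSERT INTO employees VALUES (1, 'Dana', 'Analyst');
INSERT INTO customers VALUES (1, 'Ana', 'ana@example.com'), (2, 'Ben', 'ben@example.com');
INSERT INTO orders VALUES (1, 1, 25.0), (2, 1, 40.0), (3, 2, 15.5);
""")

# Join each customer to their orders and total what they have spent.
query = """
SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.total) AS total_spent
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id
ORDER BY total_spent DESC;
"""
rows = conn.execute(query).fetchall()
for name, n_orders, total_spent in rows:
    print(name, n_orders, total_spent)
conn.close()
```

The employees table never appears in the query because nothing links it to the other two, which matches the structure described above.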

Microsoft Excel

It may seem old school, but Microsoft Excel is still a popular tool of choice for many businesses. Its biggest selling point is that a data analyst using Excel does not need to know how to code to perform their analysis, and key stakeholders are likely also able to interpret the results found in the spreadsheet fairly easily.

What’s more, Microsoft has specifically introduced new features to Excel that enhance the software’s data analytics capabilities. Power Query allows an analyst to clean, transform, and preprocess a dataset right in Excel itself. No need to learn how to code in Python or to manually slog through all the rows. Simply define a query and set it to run as many times as you need.

Power Query allows you to quickly and efficiently clean up millions of rows without ever leaving your spreadsheet. Once your dataset is cleaned up, you can then move on to using Power Pivot to analyze your dataset. Power Pivot is yet another Microsoft tool that enhances the data analysis capacity of Excel. Use it to perform calculations, create pivot tables, and establish relationships within your dataset.
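For readers coming from Python, the same kind of pivot table can be expressed with pandas. This is only an analogy to what Power Pivot does, not a description of how Power Pivot works internally, and the sales data is hypothetical.

```python
import pandas as pd

# Hypothetical sales records, the kind of table you might load into Excel.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "product": ["Widget", "Gadget", "Widget", "Widget", "Gadget"],
    "revenue": [100.0, 250.0, 80.0, 120.0, 300.0],
})

# Regions as rows, products as columns, summed revenue in each cell.
pivot = pd.pivot_table(
    sales,
    index="region",
    columns="product",
    values="revenue",
    aggfunc="sum",
    fill_value=0,
)
print(pivot)
```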

A data analyst who’s well-versed in Microsoft Excel, Power Query and Power Pivot will be able to generate insightful analytics results as quickly as an analyst using their favorite programming language. Don’t underestimate the power of Excel; if programming isn’t your thing, then it’s a viable tool for data analysis.

Power BI

Microsoft offers another analytics tool, one tailored specifically to the business intelligence space. Power BI is a platform that allows analysts to create and share interactive, web-based visualizations. You can build complex dashboards without any programming experience, and these can be consumed on any type of device – from mobile phones to tablets to desktops and beyond.

There are several reasons why companies like to use Power BI for data analysis, though one of the biggest is how Power BI handles data. An Excel worksheet tops out at 1,048,576 rows, whereas Power BI is built to work with much larger datasets out of the box. It also integrates nicely with Microsoft’s Azure Machine Learning platform.

If your goal is to work for an enterprise corporation or other large-scale organization, with terabytes of data to process and the need for automated machine learning models, then Power BI is definitely one piece of software you’ll want to have in your toolkit.

Tableau

For those interested in the visualization aspect of the analytics process, Tableau is a great place to start. This data visualization platform allows any analytics professional to hit the ground running with generating charts, graphs, dashboards, and other visual reports. Like Power BI, Tableau boasts a drag-and-drop interactive interface for users with few to no programming skills. The visualizations you create in Tableau can be saved and easily shared online.

Companies both small and large use Tableau to visually represent the results of data analysis. Since it requires no advanced programming skills to get started, it’s easy for the data analyst to produce results quickly and share them with key stakeholders. While it can take years to become proficient in Python, a tool like Tableau can be learned in a few weeks, if not a few days. As such, it remains one of the top choices for analytics professionals.

What is career growth like in data analytics?

In its Future of Jobs Report 2020, the World Economic Forum ranks “data analyst and scientist” as one of the top emerging professions between now and 2025 (second only to AI and machine learning specialists). According to O*NET, the career outlook for data analysts is bright, and Indeed reported in 2021 that established professionals can expect an average pay of over $70,000 annually.

The expected income for a data analyst differs across countries and between organizations, and it also differs according to job title. For example, “business intelligence analyst” boasts an average salary in the United States of over $94,000 in 2021, whereas the title “analyst” (as opposed to “data analyst”) is slightly lower, at $69,000. Moreover, the job title “data scientist” commands the highest average salary of all, at over $120,000.

A typical career move for the data analyst is to learn a programming language (often Python, sometimes R), level up their skills in artificial intelligence and machine learning, and make the jump to data scientist, machine learning engineer, or other programming-heavy role. Such moves often require the acquisition of advanced degrees; some data science positions will only accept interns who are currently pursuing a Ph.D., for instance.

Other career moves emphasize a specialization into one aspect of the analytics process. For example, an analyst who particularly enjoys the data collection stage of the analytics process may make a career move to becoming a data engineer, in which they’d be responsible for the production and maintenance of ETL pipelines and database systems.

Regardless of the job title used to refer to them, the opportunities for data analysts to grow their careers are expected to increase significantly in the coming years. The World Economic Forum reports that the number one skill organizations are searching for is analytical thinking and innovation, and 95% of the companies it surveyed said that they’re working to increase the rate of adoption of big data analytics within their organizations.

Interestingly enough, the key job tasks that an analyst will be expected to perform are set to shift in the coming years. As information gathering and data processing tasks are automated and offloaded onto machines, data analysts will be left to focus on those tasks that still require a touch of human intuition and ingenuity. The human share of reasoning, decision-making, communicating, and interacting is expected to increase 64% by 2024. In other words, the data analyst will likely focus less on data collection and cleaning, and may even turn over key analysis capabilities to machines as well. However, it will still be up to the data analyst to glean those actionable insights and present them as clearly and concisely as possible.

Learn Data Analytics

Data analytics is a broad field, with an endless number of applications in nearly every industry imaginable. If you’re interested in starting down the data analytics career path, simply choosing where to begin can feel overwhelming. If you’re already working in data analytics, then keeping up to date and growing can be equally daunting.

A good data analytics online course will walk you through the analytics process step by step to give you a solid foundation in the basics. From there, you’ll want to immerse yourself in the major tools you’ll need to perform an analysis. If programming is your thing, then get started with Python for data analysis. If you prefer graphical user interfaces, then learn to use Tableau or Power BI. And make sure you can write and understand basic SQL queries, as working with SQL will become a key part of storing and retrieving the data you’ll use for analysis.

After you’ve covered the basics, it’s time to move on to end-to-end projects. The best way to start is with something you love. Analytics rewards domain expertise, so choose a topic you know well and start asking questions that require data-driven answers. Search for the right dataset (or create one yourself), dive deep to look for meaningful insights, visualize the results, and share them with the world.

As you work through the analytics process in your own projects, you’ll be well on your way to developing the skills you’ll need to succeed as an analytics professional.

Ready to learn more about Data Analysis? Browse our online courses today.
