An important part of data management and analytics is making sure your code runs efficiently on the data structures you choose. Different operations perform better on different data structures, so it's important to apply the algorithms that will run fastest on yours.
Jay Wengrow covers these basics in his new book, A Common-Sense Guide to Data Structures and Algorithms, published by The Pragmatic Bookshelf.
What are data structures?
Data structures are ways of collecting, storing and organizing data so that it can be used later. The type of data structure determines how the information in it can be accessed and processed. Understanding how different data structures handle data is key to writing efficient algorithms.
Knowing data structures
An important part of understanding data structures is simply knowing some of the most common data structures.
"It's impossible to know all of them … but be aware of the most common ones, and how to analyze and use a data structure in the context of what you're trying to do," Wengrow said.
Being familiar with data structures can also help you when choosing algorithms. Knowing you have a structure that will run your code efficiently is the first step, and then you choose an algorithm in a similar way.
"It's basically about choosing the right combination of data structure and algorithm to try to get your code to be fast or take up less memory," Wengrow said.
But he said beginners shouldn't be too focused on choosing exactly the right combination.
Types of data structures
Here are some common data structures:
- An array is one of the most basic structures and contains data points as a list in a linear sequence.
- A stack is similar to an array; however, data can only be inserted, deleted or read from the end of the stack.
- A queue is similar to a stack, but it goes by a first-in-first-out process instead of the last-in-first-out setup in a stack.
- A graph concentrates on relationships and how data points are connected to each other.
- A tree is a non-linear structure, starting with a root and branching off into parent and children elements in a hierarchical way.
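As a rough illustration of the list above, here is a minimal Python sketch of each structure. The representations are one possible choice rather than anything prescribed by Wengrow's book, and the variable names and sample values are made up for the example.

```python
from collections import deque

# Array: a linear sequence of values, read by position.
array = ["apple", "banana", "cherry"]
second_item = array[1]  # "banana"

# Stack: insert, delete and read only at the end (last in, first out).
stack = []
stack.append("first")
stack.append("second")
top_of_stack = stack.pop()  # "second"

# Queue: insert at the back, remove from the front (first in, first out).
queue = deque()
queue.append("first")
queue.append("second")
front_of_queue = queue.popleft()  # "first"

# Graph: each data point maps to the points it is connected to.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice"],
    "carol": ["alice"],
}
alice_connections = graph["alice"]  # ["bob", "carol"]

# Tree: a root whose children may have children of their own.
tree = {
    "value": "root",
    "children": [
        {"value": "child_a", "children": []},
        {"value": "child_b", "children": [{"value": "grandchild", "children": []}]},
    ],
}
first_child = tree["children"][0]["value"]  # "child_a"
```

The stack and the queue receive the same two items in the same order, yet they hand back different ones. That is the last-in-first-out versus first-in-first-out difference in practice.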
Speed vs. memory
One of the keys to understanding data structures is knowing how to pair your code with a data structure in the most efficient way. This generally means looking at how fast your code runs -- that is, how many steps it takes to complete a function.
"There's two primary factors that you generally want to look at: How fast will my code run if I use this data structure? And you may also be interested in finding out: How much memory will my program consume if I use this data structure?" Wengrow said.
According to Wengrow, those two factors can sometimes involve a trade-off. Your fastest code may require a lot of memory, or you may need to conserve memory and have to run less efficient code on your data structure instead.
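One way to picture this trade-off is a check for duplicate values, sketched below in Python. The two function names are hypothetical, and "steps" here simply counts comparisons or loop passes, which is only an approximation of real running time. The first version uses no extra storage but compares every pair of values. The second remembers every value it has seen in a set, which costs memory but saves steps.

```python
def has_duplicate_low_memory(items):
    """Compare every pair of values: no extra storage, but many steps."""
    steps = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            steps += 1  # one comparison
            if items[i] == items[j]:
                return True, steps
    return False, steps


def has_duplicate_fast(items):
    """Remember each value seen so far: fewer steps, more memory."""
    steps = 0
    seen = set()  # extra memory that grows with the input
    for item in items:
        steps += 1  # one pass per value
        if item in seen:
            return True, steps
        seen.add(item)
    return False, steps


values = list(range(100))  # 100 distinct values, the worst case for both
slow_result = has_duplicate_low_memory(values)
fast_result = has_duplicate_fast(values)
```

With 100 distinct values, the pairwise version counts 4,950 comparisons, while the set version counts 100 passes but has to hold all 100 values in its extra set.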
"If you have a program on a website that's servicing so many different users at the same time, if your code is slow … if every time a user clicks on a button it takes the server a full second to process that request, but you have 100 million people using your site at the same time, the net code may be too slow," Wengrow said.
The trade-off between memory and speed is becoming less of an issue over time as memory continues to fall in price.
"In all this context, speed is generally the more important factor, and memory is so cheap," Wengrow said.
Big O Notation
Another part of data structure and algorithm basics is working with Big O Notation, a formal way to express the efficiency of an algorithm on a data structure. Big O Notation focuses on answering the question "if there are N data elements, how many steps will the algorithm take?" Wengrow wrote in A Common-Sense Guide to Data Structures and Algorithms.
"Big O Notation is derived from mathematical concepts," he explained. "Often people are intimidated by that."
Once you get down to answering that question, though, the notation becomes easier to understand.
"It's really just a commonsense thing," Wengrow said.
Beginners may not deal with Big O Notation much when they start, but it is worth learning because it gives programmers a common language for describing how fast algorithms run.
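To make the "N data elements, how many steps" question concrete, the Python sketch below counts the steps two well-known search algorithms take on a sorted array as N grows. Linear search and binary search are standard textbook examples chosen here for illustration, and the function names and the step-counting approach are assumptions of this sketch. In Big O terms, linear search is O(N) and binary search is O(log N).

```python
def linear_search_steps(sorted_items, target):
    """Check each element in turn; return how many were checked."""
    steps = 0
    for item in sorted_items:
        steps += 1
        if item == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Halve the search range each time; return how many checks were made."""
    steps = 0
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return steps


# Search for the last element, the worst case for linear search.
results = {}
for n in (10, 100, 1000):
    items = list(range(n))
    results[n] = (
        linear_search_steps(items, n - 1),
        binary_search_steps(items, n - 1),
    )
```

For N = 1,000, linear search takes 1,000 steps to reach the last element, while binary search needs only about 10. The gap widens as N grows, and that growth is what the notation is meant to capture.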
Take a look
Wengrow's book was written with beginners in mind. It is a great start to understanding data structures and the most common types you'll interact with.
"I know a couple professors are now using it as a textbook for intro to data structures courses in college," he said.
In addition to college students, it's also good for people who are self-taught or going through bootcamps, and for anyone who needs a refresher. It helps to have a basic understanding of coding first, but it is -- as the title says -- a commonsense introduction.
The first chapter of the book introduces data structures and how to test how efficiently your code runs on different structures.