This is the first post of a series I am planning to write, called “All you need to know about …”. The main idea is to present the key ideas of (mostly) technical topics to a broader audience. I will try to capture not only the technical details, but also the ideas and intentions behind the scenes.
About this article:
Target audience:
- People who do not know much about programming but want to start with it.
- Newbie programmers who can write some code, but for whom the process behind it is still a bit of a mystery.
Key topics:
- How your program interacts with the computer and with you.
- How your program is composed and how it runs.
- What managed and unmanaged code is.
- Where to start.
What is programming?
In general, it is the process of creating instructions for a computer so that it does the stuff you want it to do … Sounds simple, right? It is actually not as hard as it looks in all those “hacker” movies and TV series. But you must know some basics to do things right. Let’s start with basic #1, which (in my opinion) all programmers should know …
What is in (and out of) your computer?
In this part, we are going to look at things like memory, the CPU, the GPU and peripherals … do not worry. It is quite simple in the end.
The first thing worth mentioning is memory: the place for your data and programs. Data can live in two places. The first is your hard drive, which serves as long-term storage. All your programs and data can live in peace there until they are needed. That is when the second place comes in: RAM (Random Access Memory). RAM is the volatile part of your computer (once the power is unplugged, its contents are lost), where your programs and data are loaded to do their jobs. RAM is much faster to access than the hard drive, but it is also much smaller, so you want to load only the bare minimum there. A big advantage is that the original program on the hard drive stays the same (unless something deliberately changes it), so if something breaks while the program is running, restarting the program or the PC should solve most problems. Of course, if the program depends on data stored on the hard drive and you break that data, you might have a problem. So … this is the secret behind the famous quote “Have you tried turning it off and on again?”. Another advantage is that on a system with user permissions it is hard to modify the original programs, which makes running them safer.
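To make the hard drive vs. RAM idea a bit more concrete, here is a minimal Python sketch. The file name settings.txt and its content are made up for illustration, and a temporary folder stands in for the hard drive. It shows that breaking the copy loaded into memory does not touch the original on disk, which is why a "restart" (loading it again) helps.

```python
import tempfile
from pathlib import Path

# A temporary folder plays the role of the hard drive (hypothetical example data).
with tempfile.TemporaryDirectory() as folder:
    saved_file = Path(folder) / "settings.txt"
    saved_file.write_text("volume=5")              # long-term storage on "disk"

    in_memory = saved_file.read_text()             # a copy is loaded into RAM
    in_memory = in_memory.replace("5", "broken")   # something goes wrong at runtime

    reloaded = saved_file.read_text()              # "restart": load the original again

print(in_memory)  # volume=broken
print(reloaded)   # volume=5
```

Only the in-memory copy was damaged; the file itself was never written to after it was created.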
So … we have our data and program available, but who runs the show? Let’s ignore all the routes used to transfer the data (they are called buses, BTW) and talk about the CPU (Central Processing Unit), the brain of the computer. It is responsible for executing all the instructions you give it. It says what to move from one place to another, and what to compute and how. It has a basic set of instructions and space for the arguments (inputs) of those instructions. A basic example: you want to add 2 numbers … the instruction (operation) will be add and the numbers will be the arguments. But it can also orchestrate moving data from the hard drive to RAM or getting the pixels of your image to your screen.
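The "instruction plus arguments" idea can be sketched as a toy model in Python. This is not how a real CPU is built; the instruction names and the execute function are invented for this example only.

```python
import operator

# Hypothetical, tiny "instruction set": each name maps to one basic operation.
INSTRUCTIONS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}

def execute(instruction, first, second):
    """Look up the instruction by name and apply it to its two arguments."""
    return INSTRUCTIONS[instruction](first, second)

result = execute("add", 2, 3)
print(result)  # 5
```

A real CPU does the same kind of lookup in hardware, only with numeric instruction codes instead of names.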
The CPU is extremely fast, but it can work on only a limited amount of data at a time, which can be an uncomfortable bottleneck for your applications. So … how do you process a large amount of data? The GPU (Graphics Processing Unit) is the answer. It is able to process a large amount of data at the same time. Then you might ask … why aren’t we using GPUs instead of CPUs? I like the car analogy a lot (I am not the original author of this idea, but it is the best one I have heard) … The CPU is like a super-sports car that can get one or two people somewhere extremely fast. But when you want to move your flat, it is better to get a truck: loading it takes some time, but then you move everything at once, which is much faster in the end than driving back and forth 20 times with the super-fast car.
And this is the basic idea of the things that are inside your computer. Outside, you have all the peripherals, which you control by moving data to them. That data can be wrapped in some messaging system etc., but that is not so important for now. Just imagine a peripheral as a place where you move some data, and the peripheral takes care of processing it. For example, you can send one frame of a video to the space reserved for your monitor and the monitor will take care of displaying it to you.
I have been talking about data all the time … but what is data, and how is your program represented in the computer?
Data and program representations
You have surely heard about the 1s and 0s in your computer. But what do the 1s and 0s look like for real? The main idea of these numbers is to give voltage states in your computer a simple representation that can be turned into human-readable data. It is much easier to separate voltages (0V vs 5V) with some threshold(s). For example: let’s say all voltage values under 1.5V are 0 and all values over 3.5V are 1. Everything in between is an error. (Again, this is just an example; different representations are used in practice, for example TTL.)
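The threshold idea can be written down as a small Python sketch. It uses the example values from the text (1.5V and 3.5V), which are illustrative and not the thresholds of any particular real chip; the function name voltage_to_bit is made up.

```python
LOW_THRESHOLD = 1.5   # example value from the text, in volts
HIGH_THRESHOLD = 3.5  # example value from the text, in volts

def voltage_to_bit(voltage):
    """Turn a measured voltage into 0 or 1; anything in between is an error."""
    if voltage < LOW_THRESHOLD:
        return 0
    if voltage > HIGH_THRESHOLD:
        return 1
    raise ValueError(f"{voltage}V is between the thresholds, so it is neither 0 nor 1")

print(voltage_to_bit(0.2))  # 0
print(voltage_to_bit(4.8))  # 1
```

The gap between the two thresholds is what makes the signal robust: small noise on the wire does not flip a 0 into a 1.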
OK … so we have 1s and 0s. But how do we get our numbers and letters? Simple … let’s take multiple 1s and 0s in a row (a predefined count, usually 1, 8, 16, 32 or 64) and have “tables” of what they mean. For example, the letter “a” in ASCII is 0110 0001, which is the number 97 written in binary. For logical states, we usually use the values true (1) and false (0). In the case of numbers, integers (-1, 1000, 333) and floating-point numbers (0.5, 6.66) are usually treated as different representations.
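You can check the “a” = 97 = 0110 0001 example yourself. Here is a short Python sketch using only built-in functions; the variable names are just for illustration.

```python
letter = "a"

code = ord(letter)              # look the letter up in the character "table"
bits = format(code, "08b")      # write the number as 8 binary digits
back = chr(int(bits, 2))        # and convert the bits back to the letter

print(code)  # 97
print(bits)  # 01100001
print(back)  # a

# Integers and floating-point numbers are different kinds of data:
print(type(333).__name__)   # int
print(type(0.5).__name__)   # float
```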
We have our data and hardware (HW, the touchable things inside and outside your computer), so how do we tell our computer what it should do? … Yes! Now it is time to talk about programming for real!
So, what is programming for real?
We have already talked about the CPU, which has a basic set of instructions it can perform. These instructions are written in machine code: very simple code that can do only those basic operations. In ancient times, programmers worked directly in such simple code, but writing anything in it was extremely slow, so they were forced to write programs that create machine code from more human-friendly code. This is the point where we start talking about modern programming languages such as C, C++, etc.
As I have already written, these languages use a human-readable format. Let’s look at the following code example, which prints the numbers 0 to 9 and prints a message for each number divisible by 5 (0 and 5).
for (size_t increment = 0; increment < 10; ++increment) {
    cout << increment << endl;
    if (increment % 5 == 0) {
        cout << "This number is divisible by 5" << endl;
    }
}
// ---------------------------------------------------------
// Notes - in case you are interested in this C++ code:
// * The snippet assumes #include <iostream> and
//   using namespace std; and that it sits inside main().
// * Anything after // is a comment; comments are removed
//   during code pre-processing.
// * size_t says how to interpret the 0s and 1s as a number
//   (size_t is an unsigned integer type used for sizes and counts)
// * for (some variable with initial value; condition that
//   keeps the loop running; operation done with the variable
//   after each loop - for example add one)
// * cout << stands in C++ for printing something to the console
// * % is modulo (the remainder after division)
// * == is equals
But of course, the computer cannot run this code directly, so it must be converted (if you find this paragraph too complicated, do not panic; it is not so important at the beginning and it relates mostly to C++). First, the source code is preprocessed. Preprocessing converts the code into a form the compiler can work with: comments are removed and the included files are merged together into one whole piece of program text. The compiler then translates this text into assembly, which is assembled into object code (machine code that is not yet a complete program). Finally, the object code is combined (linked) with code from libraries, such as the standard library, to create an executable file. Yes! We are there. The executable is the final program you want to run (.exe on Windows; on Unix-based systems usually no extension, or a.out by default).
This seems quite complicated, right? But in real life, you usually care only about which files to take as input and how to connect them to create the executable.
This pipeline is typical for C++, which is a typical compiled language: a language that is compiled into machine code ahead of time and then runs directly on the hardware. There are also interpreted languages, such as Python, whose scripts are executed step by step by an interpreter and can use some kind of intermediate representation to interact with the hardware. Both approaches have their advantages and disadvantages.
For example, C++ code is usually compiled specifically for each operating system, so a program compiled for Windows won’t run on Linux and vice versa. In return, the program needs minimal overhead when it runs, so it is much faster.
A Python program, on the other hand, is first compiled into bytecode, which is then run by a virtual machine (a program that executes the bytecode on top of your standard operating system). The same code is therefore not tied to one operating system, but it usually runs slower.
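If you are curious what that bytecode looks like, the standard Python interpreter (CPython) lets you peek at it through the built-in dis module. Here is a minimal sketch; the add function is just an example, and the exact instruction names differ between Python versions, so treat the printed list as illustrative.

```python
import dis

def add(first, second):
    return first + second

# The compiled form of the function is stored as raw bytes ...
bytecode = add.__code__.co_code

# ... and dis can translate those bytes into readable instruction names.
instruction_names = [instruction.opname for instruction in dis.get_instructions(add)]

print(type(bytecode).__name__)  # bytes
print(instruction_names)        # depends on your Python version
```

These instructions are for Python's virtual machine, not for the CPU; the virtual machine is the program that turns them into real work on the hardware.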
There is also one big difference between C++ and Python (or C# or Java): managed memory. In C or C++, you are responsible for memory allocation (the space in RAM that your program will use), at your own risk. This often leads to errors called “memory leaks”, which usually mean that the programmer allocated some space and forgot to free it once they stopped using it. The allocated memory then keeps adding up until it no longer fits into RAM and everything collapses, or other programs are not able to allocate more space. Managed languages such as Python or C# have a thing called a garbage collector, which helps programmers manage the allocated space. This is very good for programmers, because they do not have to care about such implementation details, but it can also be tricky when you need to be very efficient with memory.
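Here is a small Python sketch of managed memory in action. The Image class is a made-up stand-in for "some allocated memory", and a weak reference is used only to observe the object without keeping it alive. The behaviour shown is that of the standard CPython interpreter.

```python
import gc
import weakref

class Image:
    """Hypothetical object standing in for some allocated memory."""
    def __init__(self, name):
        self.name = name

picture = Image("holiday.png")
watcher = weakref.ref(picture)        # observes the object without owning it
alive_before = watcher() is not None

del picture      # drop the last reference; there is no manual "free" to forget
gc.collect()     # ask the collector to run now (CPython usually frees it even earlier)
alive_after = watcher() is not None

print(alive_before)  # True
print(alive_after)   # False
```

In C or C++ the equivalent of the del line would be your job, and forgetting it is exactly how a memory leak starts.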
Now, you know about programs generally, but what are they made of?
What are programs made of?
In general … variables, functions, conditions, loops and (in some languages) objects.
Variables represent data (usually in RAM). They can hold numbers, texts, objects, tables, etc. They usually have a name and a value. For example:
carsCount = 5;
Functions represent some logical piece of code. They might have inputs and outputs. For example:
carsCount = countCarsOfBrand(allCars, brand);
Conditions help you choose which part of the program runs. Example:
if (carsCount >= 10) {
    sellOldestCar();
}
else {
    checkForNewCars();
}
Loops are designed for repeating. Example:
for (auto& line : lines) {
    cout << line << endl;
}
// For each line print it to console / shell
Objects usually represent some thing or concept. Example:
class Car {
private:
    int year;
    string brand;
    string model;
    string owner;
    ...
public:
    ...
    ...
    ...
    string composeOutputLine() {
        return "Record: " + brand + "-" + model + to_string(year);
    }
};
// This is the class Car with its properties and the method
// composeOutputLine, which creates a text string with
// standard car info. - Note: a method is a function
// that is part of a class.
All of these are the basic building blocks of your programs … you can put them together to create more complex functions, objects etc., which will be larger building blocks for you in more complex programs …
Now that you have some basic knowledge of programming, you might want to start on your own. So … where to start?
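To see all the building blocks working together, here is a minimal Python sketch that loosely mirrors the C++ snippets in this section. The names (Car, count_cars_of_brand) follow the examples in the text, but the sample cars and the threshold of 2 are made up for illustration.

```python
class Car:
    """Object: groups the data that belongs to one car."""
    def __init__(self, brand, model, year):
        self.brand = brand
        self.model = model
        self.year = year

    def compose_output_line(self):
        """Method: a function that is part of the class."""
        return f"Record: {self.brand}-{self.model} {self.year}"

def count_cars_of_brand(all_cars, brand):
    """Function: takes inputs and returns an output."""
    count = 0
    for car in all_cars:             # loop: repeat for every car
        if car.brand == brand:       # condition: count only matching cars
            count += 1
    return count

# Variables: names with values (the sample data is hypothetical).
all_cars = [
    Car("Skoda", "Octavia", 2015),
    Car("Skoda", "Fabia", 2019),
    Car("Ford", "Focus", 2012),
]
cars_count = count_cars_of_brand(all_cars, "Skoda")

if cars_count >= 2:
    print(f"We have {cars_count} cars of this brand")

for car in all_cars:
    print(car.compose_output_line())
```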
Which language to start with?
First things first … It depends on what you want to do. All good programmers (at least those I have picked for my team) end up knowing the syntax of several languages, because each of them is good for something. If you are looking for extra-fast general programs or larger systems, I would go for C++. For web applications, learn JavaScript. Do you want to work with databases? Java or C# might be your friends. Game development? C++ or C# will be your friends. Are you interested in research? Try Python. Matlab can be your friend if you work at a university, but Python offers nearly the same these days. If you do not know yet, do not worry … I would recommend Python.
I recommend Python to everyone who wants to start, for several reasons (some of them might not be clear to beginners now, but once you start with the language, you will learn about them in a day or two):
- It has managed memory: you do not need to worry that your app will outgrow your RAM (well, if you do evil things, it might, but in most cases it will not).
- It is very nice to read.
- It will teach you how to format your code, because the language depends on indentation.
- There are sooooooooooo many open-source packages available through the pip package manager, so you can easily start creating a simple website or game, or do machine-learning stuff, with minimum effort:
`pip install some-cool-package-doing-all-the-work-for-me`
- It is an “Object Oriented Language” (the quotes are there because it is OOP only in a hackish way) -> it is easy to turn your ideas into abstractions. For example, you can have an object Book that has all the cool properties (size, weight, name, author, …) and methods (getPage(page_number)).
- It is easy to install on every system … (the best integration I have seen so far is on Fedora Linux, but it is not so hard to run it on Windows either).
- The community around Python is extremely large, so you will easily google your way through the beginnings.
- There are many projects to start with available online. For example, directly on the official web page.
- You can use Python in CodinGame to code your AI bots. This is, BTW, the best way to learn coding if you like logic games.
- There are several virtual environment tools around Python (virtualenv, venv, conda) that help you avoid breaking your Python environment (all the installed packages etc.).
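The Book idea from the list above can be sketched in a few lines of Python. This is only an illustration: the attributes, the sample data and the choice to number pages from 1 are assumptions, and the method is written as get_page to follow the usual Python naming style.

```python
class Book:
    """Hypothetical Book object with a few properties and one method."""
    def __init__(self, name, author, pages):
        self.name = name
        self.author = author
        self.pages = pages            # list with the text of each page

    def get_page(self, page_number):
        """Return the text of one page; pages are numbered from 1."""
        return self.pages[page_number - 1]

# Made-up sample data, only to show how the object is used.
book = Book("My First Book", "Some Author", ["Once upon a time", "The end"])
print(book.name)         # My First Book
print(book.get_page(2))  # The end
```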
Good languages to add to Python are C or C++, because it is easy to bind them together. You can even use Cython to write C-like code in a Pythonic way. Once you have learned Python and C++, it will not be hard to move on to C#, then to Java, etc. A big advantage of C code is that you can easily wrap it for higher-level languages such as C# or Java, so you can write the whole program in C or C++ and then just give your customer or colleague some simple code in their language that works on top of your C or C++ code. Also, when you learn the old-school C or C++ versions, you will get very good knowledge of the hardware behind them, because the most magical (and, for beginners, terrible) thing about C or C++ is that if something does not work, it is surely your fault.
As an example, see the following code comparing C++ and Python:
// C++ - You need to write a lot of the code on your own
vector<string> split(const string& str, const string& delimiter)
{
    vector<string> substrings;
    size_t prev = 0, pos = 0;
    do
    {
        pos = str.find(delimiter, prev);
        if (pos == string::npos) {
            pos = str.length();
        }
        string substr = str.substr(prev, pos - prev);
        if (!substr.empty()) {
            substrings.push_back(substr);
        }
        prev = pos + delimiter.length();
    }
    while (pos < str.length() && prev < str.length());
    return substrings;
}
for (auto& line : lines) {
    auto substrings = split(line, ",");
    cout << substrings[0] << endl;
}
#######################################################
# Python
for line in lines:
    print(line.split(",")[0])
OK … now you know which languages you can start with, but where can you write your code? Well … you can use Notepad, vim or any other text editor, but the best way is to use an IDE.
What is an IDE?
An IDE (Integrated Development Environment) is usually a text editor with a whole bunch of stuff around it. It will format your code, so it is easier for you to read. It will show you potential errors in your code. It can help you “debug” your code (run your code and see what it really does).
Some IDEs are language specific, some have extremely extensive functionality for corporations (such as diagram export), and some are lightweight or easy to customize.
For pure Python, you might want to check out PyCharm, an IDE from JetBrains, a company with Czech roots. It is probably very good for beginners, because it lets you do a lot of stuff by clicking instead of writing custom configs. On the other hand, it has more performance issues than other IDEs, and if you want to code in languages other than Python or on remote servers, it is probably not what you will end up with. But why not …
I personally prefer Visual Studio Code (do not confuse it with Visual Studio, which is a separate and much more extensive IDE), a free IDE built on an open-source base with an extremely large set of plugins for a variety of languages. It supports remote development in a very clever way (you can write your code on some cloud machine and won’t even notice you are not on your own device). You can develop in Docker containers with one click (do not worry, I will write All you need to know about … Docker soon). You can also prepare a lot of configuration text files, so you have your favorite settings available with one click (but yeah, it will take you some time to figure out how to write them).
So … now you know that there is something called an IDE that you can use to make your programming work easier.
What can you look forward to in the next articles?
I hope you liked this article, and I would like to give you something to look forward to in the coming weeks …
- How to version (not only backup, but version for real) your code (git).
- Docker — working and developing in containers (isolated environments)
- What are Neural Networks