Tuesday, 30 April 2013

Write your first MapReduce program in 20 minutes


The slow revolution
Some revolutions are marked by a single, spectacular event: the storming of the Bastille during the French Revolution, or the destruction of the towers of the World Trade Center on September 11, 2001, which so changed the US’s relationship with the rest of the world. But often the most important revolutions aren’t announced with the blare of trumpets. They occur softly, too slow for a single news cycle, but fast enough that if you aren’t alert, the revolution is over before you’re aware it’s happening.

Such a revolution is happening right now in computing. Microprocessor clock speeds have stagnated since about 2000. Major chipmakers such as Intel and AMD continue to wring out improvements in speed by improving on-chip caching and other clever techniques, but they are gradually hitting the point of diminishing returns. Instead, as transistors continue to shrink in size, the chipmakers are packing multiple processing units onto a single chip. Most computers shipped today use multi-core microprocessors, i.e., chips with 2 (or 4, or 8, or more) separate processing units on the main microprocessor.

The result is a revolution in software development. We’re gradually moving from the old world in which multiple processor computing was a special case, used only for boutique applications, to a world in which it is widespread. As this movement happens, software development, so long tailored to single-processor models, is seeing a major shift in some of its basic paradigms, to make the use of multiple processors natural and simple for programmers.

This movement to multiple processors began decades ago. Projects such as the Connection Machine demonstrated the potential of massively parallel computing in the 1980s. In the 1990s, scientists became large-scale users of parallel computing, using parallel computing to simulate things like nuclear explosions and the dynamics of the Universe. Those scientific applications were a bit like the early scientific computing of the late 1940s and 1950s: specialized, boutique applications, built with heroic effort using relatively primitive tools. As computing with multiple processors becomes widespread, though, we’re seeing a flowering of general-purpose software development tools tailored to multiple processor environments.

One of the organizations driving this shift is Google. Google is one of the largest users of multiple processor computing in the world, with its entire computing cluster containing hundreds of thousands of commodity machines, located in data centers around the world, and linked using commodity networking components. This approach to multiple processor computing is known as distributed computing; the characteristic feature of distributed computing is that the processors in the cluster don’t necessarily share any memory or disk space, and so information sharing must be mediated by the relatively slow network.

In this post, I’ll describe a framework for distributed computing called MapReduce. MapReduce was introduced in a paper written in 2004 by Jeffrey Dean and Sanjay Ghemawat from Google. What’s beautiful about MapReduce is that it makes parallelization almost entirely invisible to the programmer who is using MapReduce to develop applications. If, for example, you allocate a large number of machines in the cluster to a given MapReduce job, the job runs in a highly parallelized way. If, on the other hand, you allocate only a small number of machines, it will run in a much more serial way, and it’s even possible to run jobs on just a single machine.

What exactly is MapReduce? From the programmer’s point of view, it’s just a library that’s imported at the start of your program, like any other library. It provides a single library call that you can make, passing in a description of some input data and two ordinary serial functions (the “mapper” and “reducer”) that you, the programmer, specify in your favorite programming language. MapReduce then takes over, ensuring that the input data is distributed through the cluster, and computing those two functions across the entire cluster of machines, in a way we’ll make precise shortly. All the details – parallelization, distribution of data, tolerance of machine failures – are hidden away from the programmer, inside the library.

What we’re going to do in this post is learn how to use the MapReduce library. To do this, we don’t need a big sophisticated version of the MapReduce library. Instead, we can get away with a toy implementation (just a few lines of Python!) that runs on a single machine. By using this single-machine toy library we can learn how to develop for MapReduce. The programs we develop will run essentially unchanged when, in later posts, we improve the MapReduce library so that it can run on a cluster of machines.

Our first MapReduce program
Okay, so how do we use MapReduce? I’ll describe it with a simple example, which is a program to count the number of occurrences of different words in a set of files. The example is simple, but you’ll find it rather strange if you’re not already familiar with MapReduce: the program we’ll describe is certainly not the way most programmers would solve the word-counting problem! What it is, however, is an excellent illustration of the basic ideas of MapReduce. Furthermore, what we’ll eventually see is that by using this approach we can easily scale our wordcount program up to run on millions or even billions of documents, spread out over a large cluster of computers, and that’s not something a conventional approach could do easily.

The input to a MapReduce job is just a set of (input_key,input_value) pairs, which we’ll implement as a Python dictionary. In the wordcount example, the input keys will be the filenames of the files we’re interested in counting words in, and the corresponding input values will be the contents of those files:

filenames = ["text\\a.txt","text\\b.txt","text\\c.txt"]
i = {}
for filename in filenames:
  f = open(filename)
  i[filename] = f.read()
  f.close()
After this code is run the Python dictionary i will contain the input to our MapReduce job, namely, i has three keys containing the filenames, and three corresponding values containing the contents of those files. Note that I’ve used Windows’ filenaming conventions above; if you’re running a Mac or Linux you may need to tinker with the filenames. Also, to run the code you will of course need the text files text\a.txt, text\b.txt, and text\c.txt. You can create some simple example texts by cutting and pasting from the following:

text\a.txt:

The quick brown fox jumped over the lazy grey dogs.

text\b.txt:

That's one small step for a man, one giant leap for mankind.

text\c.txt:

Mary had a little lamb,
Its fleece was white as snow;
And everywhere that Mary went,
The lamb was sure to go.
The MapReduce job will process this input dictionary in two phases: the map phase, which produces output which (after a little intermediate processing) is then processed by the reduce phase. In the map phase what happens is that for each (input_key,input_value) pair in the input dictionary i, a function mapper(input_key,input_value) is computed, whose output is a list of intermediate keys and values. This function mapper is supplied by the programmer – we’ll show how it works for wordcount below. The output of the map phase is just the list formed by concatenating the list of intermediate keys and values for all of the different input keys and values.

I said above that the function mapper is supplied by the programmer. In the wordcount example, what mapper does is takes the input key and input value – a filename, and a string containing the contents of the file – and then moves through the words in the file. For each word it encounters, it returns the intermediate key and value (word,1), indicating that it found one occurrence of word. So, for example, for the input key text\a.txt, a call to mapper("text\\a.txt",i["text\\a.txt"]) returns:

[('the', 1), ('quick', 1), ('brown', 1), ('fox', 1), ('jumped', 1),
 ('over', 1), ('the', 1), ('lazy', 1), ('grey', 1), ('dogs', 1)]
Notice that everything has been lowercased, so we don’t count words with different cases as distinct. Furthermore, the same key gets repeated multiple times, because words like the appear more than once in the text. This, incidentally, is the reason we use a Python list for the output, and not a Python dictionary, for in a dictionary the same key can only be used once.

Here’s the Python code for the mapper function, together with a helper function used to remove punctuation:

def mapper(input_key,input_value):
  return [(word,1) for word in
          remove_punctuation(input_value.lower()).split()]

def remove_punctuation(s):
  return s.translate(string.maketrans("",""), string.punctuation)
mapper works by lowercasing the input file, removing the punctuation, splitting the resulting string around whitespace, and finally emitting the pair (word,1) for each resulting word. Note, incidentally, that I’m ignoring apostrophes, to keep the code simple, but you can easily extend the code to deal with apostrophes and other special cases.
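Incidentally, the code above uses Python 2’s string.maketrans, which no longer exists in Python 3. If you’re following along in Python 3, a sketch of the equivalent functions (using str.maketrans instead) looks like this:

```python
import string

def remove_punctuation(s):
  # In Python 3, str.maketrans("", "", chars) builds a translation
  # table that deletes every character in chars.
  return s.translate(str.maketrans("", "", string.punctuation))

def mapper(input_key, input_value):
  # Lowercase the file contents, strip punctuation, split on
  # whitespace, and emit (word, 1) for each word.
  return [(word, 1) for word in
          remove_punctuation(input_value.lower()).split()]

print(mapper("text/a.txt",
             "The quick brown fox jumped over the lazy grey dogs."))
```

The behaviour is the same as the Python 2 version above.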

With this specification of mapper, the output of the map phase for wordcount is simply the result of concatenating the lists mapper returns for text\a.txt, text\b.txt, and text\c.txt:

[('the', 1), ('quick', 1), ('brown', 1), ('fox', 1),
 ('jumped', 1), ('over', 1), ('the', 1), ('lazy', 1), ('grey', 1),
 ('dogs', 1), ('mary', 1), ('had', 1), ('a', 1), ('little', 1),
 ('lamb', 1), ('its', 1), ('fleece', 1), ('was', 1), ('white', 1),
 ('as', 1), ('snow', 1), ('and', 1), ('everywhere', 1),
 ('that', 1), ('mary', 1), ('went', 1), ('the', 1), ('lamb', 1),
 ('was', 1), ('sure', 1), ('to', 1), ('go', 1), ('thats', 1),
 ('one', 1), ('small', 1), ('step', 1), ('for', 1), ('a', 1),
 ('man', 1), ('one', 1), ('giant', 1), ('leap', 1), ('for', 1),
 ('mankind', 1)]
The map phase of MapReduce is logically trivial, but when the input dictionary has, say, 10 billion keys, and those keys point to files held on thousands of different machines, implementing the map phase is actually quite non-trivial. What the MapReduce library handles is details like knowing which files are stored on what machines, making sure that machine failures don’t affect the computation, making efficient use of the network, and storing the output in a usable form. We won’t worry about these issues for now, but we will come back to them in future posts.

What the MapReduce library now does in preparation for the reduce phase is to group together all the intermediate values which have the same key. In our example the result of doing this is the following intermediate dictionary:

{'and': [1], 'fox': [1], 'over': [1], 'one': [1, 1], 'as': [1],
 'go': [1], 'its': [1], 'lamb': [1, 1], 'giant': [1],
 'for': [1, 1], 'jumped': [1], 'had': [1], 'snow': [1],
 'to': [1], 'leap': [1], 'white': [1], 'was': [1, 1],
 'mary': [1, 1], 'brown': [1], 'lazy': [1], 'sure': [1],
 'that': [1], 'little': [1], 'small': [1], 'step': [1],
 'everywhere': [1], 'mankind': [1], 'went': [1], 'man': [1],
 'a': [1, 1], 'fleece': [1], 'grey': [1], 'dogs': [1],
 'quick': [1], 'the': [1, 1, 1], 'thats': [1]}
We see, for example, that the word ‘and’, which appears only once in the three files, has as its associated value a list containing just a single 1, [1]. By contrast, the word ‘one’, which appears twice, has [1,1] as its value.
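If you want to see this grouping step in isolation, here’s a small sketch of one way it could be done, using a collections.defaultdict (the sample data is just a slice of the map output above):

```python
from collections import defaultdict

# a small slice of the map phase's output
intermediate = [('the', 1), ('one', 1), ('quick', 1),
                ('the', 1), ('one', 1), ('the', 1)]

# group all the values sharing an intermediate key into one list
groups = defaultdict(list)
for intermediate_key, value in intermediate:
  groups[intermediate_key].append(value)

print(dict(groups))  # {'the': [1, 1, 1], 'one': [1, 1], 'quick': [1]}
```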

The reduce phase now commences. A programmer-defined function reducer(intermediate_key,intermediate_value_list) is applied to each entry in the intermediate dictionary. For wordcount, reducer simply sums up the list of intermediate values, and returns both the intermediate_key and the sum as the output. This is done by the following code:

def reducer(intermediate_key,intermediate_value_list):
  return (intermediate_key,sum(intermediate_value_list))
The output from the reduce phase, and from the total MapReduce computation, is thus:

[('and', 1), ('fox', 1), ('over', 1), ('one', 2), ('as', 1),
 ('go', 1), ('its', 1), ('lamb', 2), ('giant', 1), ('for', 2),
 ('jumped', 1), ('had', 1), ('snow', 1), ('to', 1), ('leap', 1),
 ('white', 1), ('was', 2), ('mary', 2), ('brown', 1),
 ('lazy', 1), ('sure', 1), ('that', 1), ('little', 1),
 ('small', 1), ('step', 1), ('everywhere', 1), ('mankind', 1),
 ('went', 1), ('man', 1), ('a', 2), ('fleece', 1), ('grey', 1),
 ('dogs', 1), ('quick', 1), ('the', 3), ('thats', 1)]
You can easily check that this is just a list of the words in the three files we started with, and the associated wordcounts, as desired.

We’ve now looked at the code defining the input dictionary i and the mapper and reducer functions. Collecting it all up, and adding a call to the MapReduce library, here’s the complete wordcount.py program:

#word_count.py

import string
import map_reduce

def mapper(input_key,input_value):
  return [(word,1) for word in
          remove_punctuation(input_value.lower()).split()]

def remove_punctuation(s):
  return s.translate(string.maketrans("",""), string.punctuation)

def reducer(intermediate_key,intermediate_value_list):
  return (intermediate_key,sum(intermediate_value_list))

filenames = ["text\\a.txt","text\\b.txt","text\\c.txt"]
i = {}
for filename in filenames:
  f = open(filename)
  i[filename] = f.read()
  f.close()

print map_reduce.map_reduce(i,mapper,reducer)
The map_reduce module imported by this program implements MapReduce in pretty much the simplest possible way, using some useful functions from the itertools library:

# map_reduce.py
"""Defines a single function, map_reduce, which takes an input
dictionary i and applies the user-defined function mapper to each
(input_key,input_value) pair, producing a list of intermediate
keys and intermediate values.  Repeated intermediate keys then
have their values grouped into a list, and the user-defined
function reducer is applied to the intermediate key and list of
intermediate values.  The results are returned as a list."""

import itertools

def map_reduce(i,mapper,reducer):
  intermediate = []
  for (key,value) in i.items():
    intermediate.extend(mapper(key,value))
  groups = {}
  for key, group in itertools.groupby(sorted(intermediate),
                                      lambda x: x[0]):
    groups[key] = [y for x, y in group]
  return [reducer(intermediate_key,groups[intermediate_key])
          for intermediate_key in groups]
(Credit to a nice blog post from Dave Spencer for the use of itertools.groupby to simplify the reduce phase.)

Obviously, on a single machine an implementation of the MapReduce library is pretty trivial! In later posts we’ll extend this library so that it can distribute the execution of the mapper and reducer functions across multiple machines on a network. The payoff is that with enough improvement to the library we can with essentially no change use our wordcount.py program to count the words not just in 3 files, but rather the words in billions of files, spread over thousands of computers in a cluster. What the MapReduce library does, then, is provide an approach to developing in a distributed environment where many simple tasks (like wordcount) remain simple for the programmer. Important (but boring) tasks like parallelization, getting the right data into the right places, dealing with the failure of computers and networking components, and even coping with racks of computers being taken offline for maintenance, are all taken care of under the hood of the library.

In the posts that follow, we’re thus going to do two things. First, we’re going to learn how to develop MapReduce applications. That means taking familiar tasks – things like computing PageRank – and figuring out how they can be done within the MapReduce framework. We’ll do that in the next post in this series. In later posts, we’ll also take a look at Hadoop, an open source platform that can be used to develop MapReduce applications.

Second, we’ll go under the hood of MapReduce, and look at how it works. We’ll scale up our toy implementation so that it can be used over small clusters of computers. This is not only fun in its own right, it will also make us better MapReduce programmers, in the same way as understanding the innards of an operating system (for example) can make you a better application programmer.

To finish off this post, though, we’ll do just two things. First, we’ll sum up what MapReduce does, stripping out the wordcount-specific material. It’s not any kind of formal specification, just a brief informal summary, together with a few remarks. We’ll refine this summary a little in some future posts, but this is the basic MapReduce model. Second, we’ll give an overview of how MapReduce takes advantage of a distributed environment to parallelize jobs.

MapReduce in general
Summing up our earlier description of MapReduce, and with the details about wordcount removed, the input to a MapReduce job is a set of (input_key,input_value) pairs. Each pair is used as input to a function mapper(input_key,input_value) which produces as output a list of intermediate keys and intermediate values:

[(intermediate_key,intermediate_value),
 (intermediate_key',intermediate_value'),
 ...]
The output from all the different input pairs is then sorted, so that values associated with the same intermediate_key are grouped together in a list of intermediate values. The reducer(intermediate_key,intermediate_value_list) function is then applied to each intermediate key and list of intermediate values, to produce the output from the MapReduce job.

A natural question is whether the order of values in intermediate_value_list matters. I must admit I’m not sure of the answer to this question – if it’s discussed in the original paper, then I missed it. In most of the examples I’m familiar with, the order doesn’t matter, because the reducer works by applying a commutative, associative operation across all intermediate values in the list. As we’ll see in a minute, because the mapper computations are potentially done in parallel, on machines which may be of varying speed, it’d be hard to guarantee the ordering, and this suggests that the ordering doesn’t matter. It’d be nice to know for sure – if anyone reading this does know the answer, I’d appreciate hearing it, and will update the post!

One of the most striking things about MapReduce is how restrictive it is. A priori it’s by no means clear that MapReduce should be all that useful in practical applications. It turns out, though, that many interesting computations can be expressed either directly in MapReduce, or as a sequence of a few MapReduce computations. We’ve seen wordcount implemented in this post, and we’ll see how to compute PageRank using MapReduce in the next post. Many other important computations can also be implemented using MapReduce, including doing things like finding shortest paths in a graph, grepping a large document collection, or many data mining algorithms. For many such problems, though, the standard approach doesn’t obviously translate into MapReduce. Instead, you need to think through the problem again from scratch, and find a way of doing it using MapReduce.

Exercises
How would you implement grep in MapReduce?
Problems
Take a well-known algorithms book, e.g., Cormen-Leiserson-Rivest-Stein, or a list of well-known algorithms. Which algorithms lend themselves to being expressed in MapReduce? Which do not? This isn’t so much a problem as it is a suggestion for a research program.
The early years of serial computing saw the introduction of powerful general-purpose ideas like those that went into the Lisp and Smalltalk programming languages. Arguably, those ideas are still the most powerful in today’s modern programming languages. MapReduce isn’t a programming language as we conventionally think of them, of course, but it’s similar in that it introduces a way of thinking about and approaching programming problems. I don’t think MapReduce has quite the same power as the ideas that went into Lisp and Smalltalk; it’s more like the Fortran of distributed computing. Can we find ideas in distributed computing as powerful as those behind Lisp or Smalltalk? What’s the hundred-year framework going to look like for distributed computing?
How MapReduce takes advantage of the distributed setting
You can probably already see how MapReduce takes advantage of a large cluster of computers, but let’s spell out some of the details. There are two key points. First, the mapper functions can be run in parallel, on different processors, because they don’t share any data. Provided the right data is in the local memory of the right processor – a task MapReduce manages for you – the different computations can be done in parallel. The more machines are in the cluster, the more mapper computations can be simultaneously executed. Second, the reducer functions can also be run in parallel, for the same reason, and, again, the more machines are available, the more computations can be done in parallel.

The difficulty in this story arises in the grouping step that takes place between the map phase and the reduce phase. For the reducer functions to work in parallel, we need to ensure that all the intermediate values corresponding to the same key get sent to the same machine. Obviously, this requires communication between machines, over the relatively slow network connections.

This looks tough to arrange, and you might think (as I did at first) that a lot of communication would be required to get the right data to the right machine. Fortunately, a simple and beautiful idea is used to make sure that the right data ends up in the right location, without there being too much communication overhead.

Imagine you’ve got 1000 machines that you’re going to use to run reducers on. As the mappers compute the intermediate keys and value lists, they compute hash(intermediate_key) mod 1000 for some hash function. This number is used to identify the machine in the cluster that the corresponding reducer will be run on, and the resulting intermediate key and value list is then sent to that machine. Because every machine running mappers uses the same hash function, this ensures that value lists corresponding to the same intermediate key all end up at the same machine. Furthermore, by using a hash we ensure that the intermediate keys end up pretty evenly spread over machines in the cluster. Now, I should warn you that in practice this hashing method isn’t literally quite what’s done (we’ll describe that in later lectures), but that’s the main idea.
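In code, the partitioning rule is tiny. Here’s a sketch; I’ve used a stable hash from hashlib rather than Python’s built-in hash(), because the built-in is salted differently in each process and so wouldn’t agree across machines (the function name partition is my own, not from the paper):

```python
import hashlib

def partition(intermediate_key, num_reducers):
  # A stable hash means every mapper machine computes the same
  # reducer index for a given key, with no coordination required.
  digest = hashlib.md5(intermediate_key.encode("utf-8")).digest()
  return int.from_bytes(digest, "big") % num_reducers

# every ('the', 1) pair is routed to the same reducer machine,
# no matter which mapper emitted it
machine = partition("the", 1000)
print(0 <= machine < 1000)  # True
```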

Needless to say, there’s a lot more to the story of how MapReduce works in practice, especially the way it handles data distribution and fault-tolerance. In future posts we’ll explore many more of the details. Nonetheless, hopefully this post gives you a basic understanding of how MapReduce works. In the next post we’ll apply MapReduce to the computation of PageRank.

Sunday, 28 April 2013

Roles and Responsibilities of a Business Analyst

A Business Analyst is responsible for identifying the business needs of their clients and stakeholders (the directors, vendors, employees, and customers) to help determine solutions to business problems. They typically have a high degree of industry experience and are a key facilitator within an organization, acting as a bridge between the client, stakeholders and the solution team.

Business Analysts study the overall business and information needs of an organization in order to develop appropriate solution strategies. As the key liaison between business and information technology departments, the business analyst is responsible for gathering and documenting business requirements and translating them into functional system design specifications that can be successfully executed by IT development teams.

The Business Analyst discloses, analyzes, validates and documents business, organizational and operational requirements. Solutions are not predetermined by the Business Analyst, but are driven solely by the requirements of the business. Solutions often include a systems development component, but may also consist of process improvement or organizational change. The Business Analyst can have a significant impact on development costs and help the business minimize project delays.

Responsibilities:
• Define and document business and user requirements after thorough fact-finding with customers and analysis of business workflows and system requirements.
• Produce, explain and clarify specifications for customers and project team members.
• Work closely with developers in translating business requirements into high level design specifications.
• Provide realistic estimates and updates of the work necessary to complete all relevant deliverables, and complete all tasks within targeted timeframes.
• Review and provide feedback on test cases and documentation.
• Provide process improvement suggestions.

What Qualities a Business Analyst should possess.


A good Business Analyst is creative and a people person, someone who wants a more hands-on approach to business and problem solving. The good Business Analyst will look for opportunities to grow and learn, and will listen attentively to what others are saying. The good Business Analyst is like a walking encyclopedia of the company he or she works within, and will know people from every department.


What makes a good business analyst is the ability to listen to what is being said and hear what is not. The good business analyst can read into the meaning of stakeholders' words, and can understand the needs being expressed even when the stakeholders do not know what those needs are themselves. The good Business Analyst will be able to determine whether the requests from stakeholders or end users are viable. In some cases they are not, and it is up to the business analyst to explain what can be done versus what is wanted.

A good Business Analyst is a visionary, a creative thinker, and innovative. He or she is fun to work with and carries a positive attitude.

To summarize, a good Business Analyst must possess the following qualities.

1) A Business Analyst may hold some degree in technology.

2) Another quality a Business Analyst should have is the ability to be comfortable in the board room as well as in front of the drawing board.

3) Another valuable attribute for the Business Analyst is being able to supply options. He or she will know what is available and from whom.

4) Being open-minded is a good quality for the Business Analyst.

5) A Business Analyst should be able to analyze the attributes of another individual. He or she can show that person where their expertise can help a project.

6) A Business Analyst should possess a quality that allows him or her to look into the future to see where business and technology are going.

There are times when services from outside sources may be utilized by the business. The business analyst is trained to understand the importance of, or lack of need for, these sources, and can determine the most cost-effective way to use them. The business analyst may find that directing the designated tasks to in-house departments is more beneficial to the company. This is part of researching the project proposal: the Business Analyst must determine the most cost-effective way to reach the goal and still succeed with a bottom-line net profit.

Knowledge, Skills and Abilities of the Business Analyst
 

• Business knowledge: The Business Analyst should have some background knowledge of the subject to make the requirements gathering efficient, although it is not always a must and depends highly on the complexity of the project.

• IT knowledge: The Business Analyst should understand what the company information systems can and cannot do. A skilled business analyst does not need deep technical knowledge, but should have some general knowledge of networks, operating systems, hardware capabilities, database concepts, the System Development Life Cycle, and project methodology.

• Interpersonal and communication skills (both written and verbal): The Business Analyst should be a great communicator and diligent team member, because he or she has to liaise with various business units to gather requirements and resolve different business issues.

• Data collecting skills: The Business Analyst should know what data the company currently has, what needs to be carried over into the new systems, and what can be achieved with a new system by projecting the figures of a previously successful project onto the business.

• Analytical and problem solving skills: The Business Analyst needs the ability to assemble, analyze, and evaluate data, and to make appropriate and well-reasoned recommendations and decisions to support the business stakeholders and the project team. The Business Analyst should also be able to analyze the feasibility of requirements in terms of effort, inputs, time, and costs, and to identify and resolve issues.

• Ability to understand and document business processes: The Business Analyst should be able to recognize, analyze, and map processes, model and improve business processes, and anticipate the future state.

In short, a good Business Analyst will be one of the best assets a company or organization can invest in. Finding a Business Analyst with all of these qualities is like finding a pot of gold at the end of a rainbow.

Saturday, 27 April 2013

10 things to do after installing Linux Mint 13

Linux Mint 13 was just released, and I have been using the MATE edition on my laptop for nearly a week. From my experience, this is a very nice and stable release. However, depending on personal needs and preferences, users may still want to tweak and customize a few things to make it more usable. Here are 10 things that I did after installing Linux Mint 13 MATE edition.

Update System

As always on a freshly installed system, the first thing you need to do is update the packages. To do so, click on the shield icon in the notification area and enter your password, and the update manager will appear. Click on the Install Updates button to update the system. It will take a while depending on your network speed.

You can also open the terminal and run the following command for system updating: 
 sudo apt-get update && sudo apt-get upgrade  

Install the proprietary driver for your graphics card

If your computer has a dedicated graphics card, you should consider installing the proprietary driver for it. While the open source driver gets the job done, it still cannot hold a candle to the proprietary one. To check for available proprietary drivers, open the Mint menu, go to Control Center, then Additional Drivers, and select the driver to install. If you are using an ATI card, I recommend picking the second driver in the Additional Drivers window, since the first one caused my system to crash.

After installing the driver, you need to reboot the system for the new driver to take effect.

Change the search engines of Firefox


In Linux Mint 13, Firefox uses two search engines: Yahoo search if you use the search box, and DuckDuckGo if you search from the address bar. I always prefer Google over these search engines, so one of the first things I did after installing Linux Mint 13 was to change the Firefox search engines back to Google.

To change the search engine in the search box of Firefox to Google, you just need to open Firefox then go to the following address:
 http://www.linuxmint.com/searchengine_google.php  

After that, click on the drop-down list in the search box and select "Add Google".


To change the search engine of the address bar from DuckDuckGo to Google, open Firefox and type about:config in the address bar. You will see a warning; just click the "I'll be careful" button to go to the config page. There, scroll down until you find the line that begins with "keyword.URL":


Right click on this line and select the Modify option. A box to edit the search engine will appear; in it, just replace the DuckDuckGo URL with Google's:
 http://www.google.com/search?q=  


Click OK and everything is done. Now both the address bar and the search box use Google as the default search engine.
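For what it's worth, the same keyword.URL change can also be scripted by appending a user_pref line to a user.js file in your Firefox profile (Firefox reads user.js at startup and applies it over about:config). A minimal sketch; it writes to a throwaway temp directory here so it is safe to run as-is, whereas a real profile lives under ~/.mozilla/firefox/:

```shell
#!/bin/sh
# Sketch: set the address-bar search engine via user.js instead of
# clicking through about:config. PROFILE is a demo temp directory,
# standing in for your real profile folder under ~/.mozilla/firefox/.
PROFILE=$(mktemp -d)

printf 'user_pref("keyword.URL", "http://www.google.com/search?q=");\n' \
    >> "$PROFILE/user.js"

cat "$PROFILE/user.js"
```

Restart Firefox after editing the real user.js for the preference to take effect.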

Get more themes and icons

Linux Mint 13 comes with very few themes and icon sets, so I think everybody would love to get more themes and icons to decorate the desktop. The best place to look for nice themes and icons, in my opinion, is Gnome-look. Just go there and search for the themes and icons you like. Also remember to read the installation instructions for each theme and icon set.

Here is my Linux Mint 13 desktop with the Delorean Noir theme and ubo icons:



Get a conky setup

Conky is an amazing Linux application; a neat Conky setup can make your desktop look stunning. I may be accused of being flamboyant, but a Conky setup was the first thing I got for my Linux Mint 13 desktop.

Take a look at the awesome Reloj Conky config:



If you are interested in Conky like me, you can check my articles about beautiful Conky configs.

Customize the Mint Menu

You can use the Mint Menu to search for and launch applications, or to quickly access preferred folders. To customize the Mint Menu, right click on it and choose the Preferences option; the Menu preferences window will open. To add shortcuts to your favorite folders, go to the Places tab and click the New button to add new folders to the Mint Menu:


To customize the Favorites list on the Mint Menu, right click on each application to remove it from or add it to the Favorites list:


You may find that after you remove some applications from the Favorites list and add new ones, the icons aren't arranged in the order you want, and you cannot drag them around on the menu. To change the order of the icons in the Favorites list, you need to edit the file ~/.linuxmint/mintMenu/applications.lst. Here is what this file looks like for the Mint Menu in the screenshot below. (Note: after you edit applications.lst, you need to right click on the Mint Menu and select Reload plugins to restart the Mint Menu.)
 location:/usr/share/applications/firefox.desktop  
 location:/usr/share/applications/thunderbird.desktop  
 location:/usr/share/applications/pidgin.desktop  
 location:/usr/share/applications/xchat.desktop  
 separator  
 location:/usr/share/applications/mate-appearance-properties.desktop  
 location:/usr/share/applications/libreoffice-writer.desktop  
 location:/usr/share/applications/mate-terminal.desktop  
 location:/usr/share/applications/pluma.desktop  
 separator  
 location:/usr/share/applications/ccsm.desktop  
 location:/usr/share/applications/gimp.desktop  
 separator  
 location:/usr/share/applications/banshee.desktop  
 location:/usr/share/applications/vlc.desktop  

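Since the file format is just one "location:" line per .desktop file plus "separator" lines, you can also generate it with a small script instead of editing it by hand. A sketch; the application names here are examples, and the output goes to a temp file so it is safe to run — point LST at the real ~/.linuxmint/mintMenu/applications.lst (and then use Reload plugins) to apply it:

```shell
#!/bin/sh
# Sketch: build an applications.lst from a list of .desktop names.
# LST is a temp file standing in for ~/.linuxmint/mintMenu/applications.lst.
LST=$(mktemp)

{
    # First group of favorites (example applications)
    for app in firefox pidgin mate-terminal; do
        echo "location:/usr/share/applications/$app.desktop"
    done
    echo "separator"
    # Second group after the separator
    echo "location:/usr/share/applications/vlc.desktop"
} > "$LST"

cat "$LST"
```

Reordering your favorites is then just a matter of reordering the names in the loop.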

Set auto mount for all the partitions

Besides the root partition, there are several other partitions on the hard drive of my laptop. These partitions are where I keep my important stuff, and I hate it when I try to access a file from an application only to realize that I forgot to mount them. That's why I always prefer auto mounting all the partitions on my Linux box.

There are actually many ways to auto mount partitions in Linux; one of them is to use a tool called "pysdm". First, you need to install it:
 sudo apt-get install pysdm  

Next, open the terminal and run pysdm as root:
 sudo pysdm  

The configuration window of pysdm will appear, in which you just need to select the partition to configure:


After that, click the Assistant button on the right and you will get a menu to customize the behavior of the partition. The option to automatically mount the partition at boot is the second one.


Just do that for all the partitions you want mounted at boot, and next time you won't need to mount them manually anymore.
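Under the hood, pysdm is essentially a front end for editing /etc/fstab, so if you prefer doing it by hand you can add an entry there yourself. A sketch of such an entry (the UUID and mount point below are made-up placeholders; run sudo blkid to find your partition's real UUID, and make sure the mount point directory exists):

```
# /etc/fstab entry to auto mount a data partition at boot
# (illustrative values -- substitute your own UUID and mount point)
UUID=1234abcd-0000-0000-0000-000000000000  /media/data  ext4  defaults  0  2
```

The last two fields control dump backups and fsck order; "0 2" is a common choice for a non-root data partition.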

Hide the partition icons on the desktop

By default, the desktop of Linux Mint 13 always displays the icons of mounted partitions and USB drives, and I think that makes the desktop ugly. I always want my desktop to be nice and clean. To hide these icons, hit Alt+F2, type mateconf-editor and hit Enter:


The MATE configuration window will appear. In it, go to apps > caja > desktop and uncheck the volumes_visible option, as in the image below:


And you won't see those partition and USB icons on your desktop anymore.
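If you prefer the terminal, the same option can (as far as I know) be flipped from the command line with mateconftool-2, MATE 1.2's counterpart to gconftool-2; the key path mirrors the apps > caja > desktop path above. A sketch, assuming that tool is present on your install:

```
mateconftool-2 --type bool --set /apps/caja/desktop/volumes_visible false
```

Set it back to true to bring the icons back.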

Install Compiz Config Settings Manager

Compiz is an awesome tool; you can do a lot of cool things with it, so I recommend that everybody install it. You can tweak a lot of things with CCSM, and of course, it is always fun to show the desktop cube to your friends :D

To install Compiz, you can search for it in the Software Manager or run the following command:
 sudo apt-get install compizconfig-settings-manager  

Install Wine

I myself at times need to run some Windows applications, so Wine is always a must-have tool for me. To install Wine, you can search for it in the Software Manager or use this command:
 sudo apt-get install wine  

-------------------------------------------------------------------------------------------

These are the 10 things I did after installing Linux Mint 13 - MATE edition. If you are using Linux Mint 13 too and have any other ideas, please share them in a comment.

Changing Folder Icons in Linux Mint


With the MATE desktop, customizing folder icons is a great way to quickly find a special folder or file. You may have noticed that the folder icons are predetermined by the theme you set, so you may wish to change a folder's icon so that it stands out from the system-wide icons. There are tons of sites to get icons from, and you can use them to customize the folders in your user area and find your files a little bit faster.

Changing Folder Icons


1. Open the File Browser

Go to Menu > System Tools > File Browser.

2. Navigate to the folder

Find a folder or make one for testing.

3. Right click the folder icon.

Left click Properties to open the settings window.

4. Left click the icon image.

Left click on the folder image at the top to choose a new icon.