Archives for category: Network Theory

You may be under the impression that when you search for something on Google, the results you see are the same as anyone else’s. This isn’t the case, and hasn’t been for a long time.

In 2009 Google went full steam ahead with personalized search. The idea was to comb through your internet history, your Gmail and all the rest of your Google products for signals that would enable Google to tailor search results to exactly what you are looking for.

As well as looking through your history, Google has always wanted to look at your social network to make your search results more relevant. The only problem is that it doesn’t own any social network data – a social network like Facebook is a ‘walled garden’ that Google can only peer into from the outside.

The arrival of Google+ gives Google free rein over your social data and will herald the age of a new buzzword – social search. Social search is the process whereby your social network (or social graph) affects the results of a Google search. By looking at the content that has been created or shared by people in my social graph, the results I get from a Google search will be more personalized than ever before.

I’ve already seen this in action. After searching Google for ‘SOPA’ (the Stop Online Piracy Act) I found myself reading a website that I had never heard of. I traced how I ended up on this particular page, and it turns out that someone in my Google+ Circles was a writer for this website and had +1’ed the article.

This is great, right? Google search results will become more relevant, based upon people like me and less likely to be manipulated by dirty SEO tactics. Some people have even gone so far as to call this a ‘Socratic Revolution’ – suggesting that the era of personalized search is akin to the philosopher Socrates placing man at the center of the intellectual universe.

There is, however, a dark side to personalized search that has been recognized in a book called  ‘The Filter Bubble’ by Eli Pariser. The problem, he argues, is that this personalized ecosystem of knowledge acts as a mirror that reinforces what we believe without allowing the possibility of our views being challenged. Each new layer of personalization strengthens the walls of our own bubble – satisfying us with the information we want to see instead of offering new ideas. Or as he puts it, we are being given ‘too much candy, and not enough carrots.’

Whilst the Filter Bubble emphasizes our uniqueness, it acts as a centrifugal force – it pulls us apart from one another. With enough personalization, the front page of Google News will be different for everyone, removing the kind of shared experience we used to have with a newspaper. The Filter Bubble is also invisible – we don’t know the maths behind how these algorithms define us. And with the increasing ubiquity of Google, it is difficult not to be a part of it.

So the arrival of Google+ social search marks a new era of ‘invisible autopropaganda’ that will continue ‘indoctrinating us with our own ideas’. What it will also mark is the start of a new form of marketing and campaigning – especially in the run-up to the 2012 US election. If I tap ‘Healthcare’ into Google I will be presented with the healthcare articles that my network has shared. Both the Democrats and the Republicans will have to fight to ensure that they have the right people inside voters’ Google+ Circles.

Whilst we may still be at the dawn of social search – the correct techniques in this area could eventually make or break a campaign. Could 2012 be the year that Obama leverages Google+ to win the election?


What do the Pyramids of Egypt, the space race and Wikipedia all have in common? Apart from being great achievements for humanity, they were all accomplished through small contributions by a massive number of people. But what else can be achieved through massive-scale collaboration?

Luis von Ahn has already begun to answer this question. He is the guy responsible for reCAPTCHA, the service that stops spam on web forms by forcing the user to enter a distorted sequence of characters. reCAPTCHA is different from the original CAPTCHA because it presents two words instead of random characters.

Whilst you have probably filled in a reCAPTCHA, what you might not have known is that by filling in the two words you are helping to digitize the world’s collection of print books – something that computers struggle to do automatically.

How? The trick is that one word in the reCAPTCHA is the security word that the computer knows, and the other is a digital image from a print book. If you get the security word right, you are probably going to get the other word right as well. And if ten other people say that the word is what you have said it is – then we have one accurately digitized word.
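
The agreement logic can be sketched in a few lines of Python. This is only a toy illustration of the idea described above – the threshold of ten, the function names and the data flow are my own assumptions, not reCAPTCHA’s actual implementation:

```python
from collections import Counter

def check_submission(security_answer, security_word, unknown_answer, votes,
                     required_agreement=10):
    """Toy version of the scheme: the known word gates the vote on the
    unknown word; enough matching votes digitize it."""
    if security_answer.lower() != security_word.lower():
        return None  # failed the human test; discard the unknown-word guess
    votes[unknown_answer.lower()] += 1
    word, count = votes.most_common(1)[0]
    if count >= required_agreement:
        return word  # consensus reached: treat this transcription as correct
    return None

# Ten users pass the security word and type the same thing for the scan:
votes = Counter()
result = None
for _ in range(10):
    result = check_submission("house", "house", "morning", votes)
print(result)  # "morning" – accepted once ten answers agree
```

The same consensus trick reappears in Duolingo below: agreement among many independent humans substitutes for a single expert.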

Because reCAPTCHA is so popular on websites, it is managing to digitize 100 million words a day – the equivalent of 2.5 million books a year.

Fascinating stuff! But what is more exciting is his next project – translating the web. Computer translation is not going to be perfect for at least another 10 years, and hiring professionals to translate the rest of the English Wikipedia into Spanish (only 20% of it currently exists in Spanish) would apparently cost $50 million – and that’s at almost slave-labour wages.

Luis von Ahn has tackled this by connecting this problem with the 1.2 billion people that are learning another language. His website Duolingo offers people the opportunity to learn a language for free (language lessons are notoriously expensive) in exchange for their time in translating the web.

So, if ten people learning a language all translate a sentence the same way, then it is very likely to be correct. And the people taking part are learning by doing! Although it is still only in its testing phase, it is apparently a powerful language teacher and a really accurate way to translate content. If the site gets a million active users, it will be able to translate Wikipedia into Spanish in 80 hours.

To hear more, check out the TED talk.

In my last post I described some of the main metrics used in social network analysis graphs. In this post I am going to look at some of the important considerations regarding the look and design of a network diagram.

A social network can be very vast, and a network diagram can quickly become very cluttered and unreadable. Netviz Nirvana has been developed to combat this. It is a set of principles that can guide you in your graphing projects. Your diagram should come as close as possible to matching these requirements:

  • Node Visibility – Each node should stand apart and clear from all others – no node should occlude another node.
  • Edge Visibility – You should be able to count the number of edges coming off every node.
  • Edge Crossing – The fewer crossings, the better. The more often an edge crosses another, the more visually complex the image becomes, and the harder it is to follow paths.
  • Edge Tunnels – These are when a node lies on an edge that is not its own. The problem could lie with either the position of the node or the position of the edge.
  • Text Readability – All text should be large and clear enough to read.
  • Text Distinction – All text should be appropriately truncated (use a key if necessary).
  • Clusters and outliers should be clearly visible and distinct.
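
Some of these principles can even be checked programmatically. As a rough illustration (the geometry test and names here are my own, not part of Netviz Nirvana), this Python sketch counts edge crossings given a set of node positions from a layout:

```python
def ccw(a, b, c):
    # Positive if the turn a -> b -> c is counter-clockwise
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p1, p2, p3, p4):
    # Proper-intersection test for segments p1p2 and p3p4
    d1, d2 = ccw(p3, p4, p1), ccw(p3, p4, p2)
    d3, d4 = ccw(p1, p2, p3), ccw(p1, p2, p4)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def count_edge_crossings(pos, edges):
    """Count pairwise crossings between edges that share no endpoint."""
    crossings = 0
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            a, b = edges[i]
            c, d = edges[j]
            if {a, b} & {c, d}:
                continue  # edges meeting at a node don't count as a crossing
            if segments_cross(pos[a], pos[b], pos[c], pos[d]):
                crossings += 1
    return crossings

# The two diagonals of a square cross exactly once:
pos = {1: (0, 0), 2: (1, 0), 3: (1, 1), 4: (0, 1)}
edges = [(1, 3), (2, 4)]
print(count_edge_crossings(pos, edges))  # 1
```

Running a count like this over different layouts of the same network is one way to pick the layout that best satisfies the edge-crossing principle.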

These are all good points to keep in mind when producing a graph. I would add that you should make sure all the colours used are distinct from each other, and that they don’t clash (you want your diagram to look good, don’t you?)

For a video that goes a bit further into Netviz Nirvana – click here

The last few years have seen the adoption of social networking increase rapidly. From Facebook to Twitter, LinkedIn to Flickr – there is a social network for just about anything.

As the revolution of social networking continues unabated, there comes a growing need to explore patterns within these networks – a process called social network analysis (SNA).

Previously, the world of social network analysis could only be accessed with a bit of computing knowledge. However, an open-source programme called NodeXL has changed that by bringing some of the important metrics used to understand a network, and the ability to create impressive network graphs, into Excel.

NodeXL makes understanding a social network graph easy for anyone who can navigate a spreadsheet. Excel is often where the world of computer programmers and the rest of us can meet up and speak the same language. NodeXL also makes it easy to import data from existing social networks such as Twitter, Flickr and YouTube.

The people who can begin to make use of network graphs range from marketers to activists – and I imagine they are now a staple of any well-equipped social media political campaign. Using a social network graph you can (among other things):

  • Spot the trusted influencers in a network
  • Find the important people that act as bridges between groups
  • Uncover isolated people and groups
  • Find the people who seem good at connecting a group
  • Plot who is at the centre and who is at the periphery of a network
  • Work out where the weakest points of a network are
  • Assess who is best placed to replace a network admin
There are two basic components of a social graph:
  • Node: In a social network a node will usually represent a single person – but it can also represent an event, a hashtag, etc.
  • Edge: A connection or interaction between two nodes – such as a friendship on Facebook, a follow on Twitter, attendance at an event, or use of a Twitter hashtag.
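
In code, these two components are all you need to represent a graph. Here is a minimal pure-Python sketch of a directed ‘follow’ network (the names and data are made up for illustration):

```python
# Nodes are people (or events, hashtags); edges are connections between them.
# A directed graph stored as an adjacency dict: node -> list of nodes it follows.
follows = {
    "alice": ["bob", "carol"],  # Alice follows Bob and Carol
    "bob":   ["alice"],         # Bob follows Alice back (a mutual connection)
    "carol": [],                # Carol follows nobody
}

nodes = set(follows)
edges = [(src, dst) for src, targets in follows.items() for dst in targets]

print(len(nodes))  # 3 nodes
print(len(edges))  # 3 directed edges

# An undirected-style (mutual) connection exists when both directions appear:
mutual = [(a, b) for a, b in edges if a in follows.get(b, [])]
print(mutual)  # [('alice', 'bob'), ('bob', 'alice')]
```

Tools like NodeXL essentially build this same node list and edge list for you (as spreadsheet rows) when you import data from Twitter or Flickr.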

One major question that social network analysis asks is how connected nodes (or people) are. But what determines how connected any person is? What metrics can be used to work out how influential or powerful any individual player is?

These are some of the major metrics used in Nodexl – and they offer a good way to start thinking about your own networks:

  • Centrality – A key term which refers to how ‘in the middle’ a node is in a network.
  • Degree centrality – a count of the number of nodes a node is connected to. This could be the number of people that follow you on Twitter, or the number of people that have viewed a YouTube video. It is important to remember that a high degree score isn’t necessarily the most important factor in measuring a node’s importance.
  • In Degree and Out Degree – A connection between two nodes can be undirected (we are mutual friends on Facebook) or directed (you follow someone on Twitter that doesn’t follow you back). The In-Degree refers to the number of inbound connections, and Out-Degree refers to the number of outbound connections.
  • Geodesic distances – A geodesic distance is the shortest possible path between two nodes (popularly known as the degree of separation). In social network analysis, a node’s shortest and longest geodesic distances are recorded (the longest geodesic distance between a node and any other node is sometimes referred to as its eccentricity, and can be used to work out the diameter of a network). The average geodesic distance of an entire network is worked out to assess how close community members are to each other.
  • Closeness centrality – This metric determines how well connected a node is in the overall network. It takes into account a node’s geodesic distance from all other nodes. Using this metric you can find people who are only weakly connected to the network as a whole.
  • Betweenness centrality – A score of how often a node sits on the shortest path between two other nodes. This can be thought of as a bridge score – how important a node is at bridging other connections. People with a high betweenness centrality are often known as key players. A node might have a degree centrality of only 2, but if those two connections bridge two large, otherwise unconnected groups, that node will have a high betweenness centrality.
  • Eigenvector centrality – This looks at how well connected the people you are connected to are. It scores how much of a network a node can reach in comparison to the same amount of effort enacted by every other node in the network.
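
To see these metrics in action, here is a sketch using the open-source networkx Python library (NodeXL itself lives in Excel; networkx is just a convenient stand-in for illustration, and this assumes the library is installed). The example network and names are invented:

```python
import networkx as nx

# A small undirected network: a 'bridge' edge connects two tight clusters
G = nx.Graph()
G.add_edges_from([
    ("ann", "bob"), ("ann", "cat"), ("bob", "cat"),  # cluster 1
    ("dan", "eve"), ("dan", "fay"), ("eve", "fay"),  # cluster 2
    ("cat", "dan"),                                  # the bridge
])

degree      = nx.degree_centrality(G)       # normalized connection count
closeness   = nx.closeness_centrality(G)    # based on distances to everyone
betweenness = nx.betweenness_centrality(G)  # how often on shortest paths
eigenvector = nx.eigenvector_centrality(G)  # connected to well-connected nodes

# 'cat' and 'dan' have a degree of only 3, barely above everyone else's 2,
# but because they bridge the two clusters their betweenness dwarfs the rest:
print(max(betweenness, key=betweenness.get))  # 'cat' (tied with 'dan')

# The diameter is the largest geodesic distance in the network:
print(nx.diameter(G))  # 3, e.g. ann -> cat -> dan -> eve
```

This is exactly the ‘key player’ effect described under betweenness centrality: a modest degree score, but a critical bridging position.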

I am going to be exploring social network analysis over the next few weeks and blogging what I find here – if you want to follow along make sure you follow me on twitter or subscribe for updates.

An understanding of SEO (Search Engine Optimisation) is important to anyone that creates content for the web. SEO is the process of making a website show up as high as possible in search engine results, which increases the chances of it being visited by the searcher.

The world of SEO is enormous, and there are many different methods available. Getting the right combination of SEO factors, or signals, is the key to SEO success. The exact recipe for how a company like Google performs search is a company secret, but is said to involve 10,000 different ranking signals!

So, whilst you could never hope to learn them all, here is an overview of some of the main methods that you can understand and start considering for your site.

Inbound Links

The number of inbound links (or backlinks) to your website is a key signal that search engines use to assess its authority. It is a major feature of Google’s PageRank algorithm, as well as of other search engines such as Technorati.

When a website links to yours it acts as a ‘vote’ of authority and improves the SEO of your site. The more inbound links you have to your website, the more authority your site is given.
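
The ‘vote’ idea is the heart of PageRank, and a toy version fits in a few lines of Python. This is a deliberately simplified power-iteration sketch, not Google’s actual algorithm (dangling pages are handled crudely, and all the site names are invented):

```python
def pagerank(links, damping=0.85, iterations=50):
    """Minimal PageRank: a link is a vote, and votes from highly-ranked
    pages are worth more. `links` maps each page to the pages it links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share  # each outbound link passes on a share
        rank = new  # note: rank of pages with no outlinks simply leaks away
    return rank

# Two sites each get exactly one inbound link, but siteA's comes from a
# hub that everyone links to, so siteA ends up with the higher rank:
links = {
    "p1": ["hub"], "p2": ["hub"], "p3": ["hub"],
    "hub": ["siteA"],
    "p4": ["siteB"],
}
ranks = pagerank(links)
print(ranks["siteA"] > ranks["siteB"])  # True
```

This is why, as noted below, one link from the BBC outweighs many links from small blogs: the voter’s own rank is part of the vote.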

Some inbound links are worth more than others – so a link from the Guardian or the BBC is worth much more than a link from a small WordPress blog.

Also, if the site that links to yours has similar content then the link is considered more relevant than from a site that features irrelevant content.

A further consideration is the anchor text of the link. Anchor text refers to the visible, clickable words that make up the link on the page. If the inbound link’s anchor text contains relevant keywords, it improves the SEO rating.

Blog Comments

When you leave a comment on a news site or blog, you are normally given the opportunity to include a link to your site. Whilst this link is largely ignored by the major search engines, it can still draw traffic to a website if the comment is interesting enough to warrant a reader’s further interest.


Keywords

In order to show up in search engines, you need to know what words and phrases people are searching for. If you are running a beauty website, do people use the term ‘face cosmetics’ or ‘make-up’ more often? Ensuring that your site features these highly searched-for terms is key to driving traffic to it.

Keyword Density is the percentage of your page that is made up of a given keyword. So, if you have 100 words and 10 of them are the keyword, then you have a 10% keyword density for that keyword. Whilst it may be tempting to continually repeat the keywords throughout the site, it is important to strike a balance between repetition and good-sounding writing.

Also – too high a keyword density will make Google think you are spam. A good rule of thumb is 3-6% keyword density. If you are struggling to find places to include the keywords, use them in place of pronouns.
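
As a quick illustration, keyword density for a single keyword can be computed like this (a rough sketch with an invented example page – real SEO tools tokenize text far more carefully):

```python
import re

def keyword_density(text, keyword):
    """Percentage of the words in `text` that match `keyword`
    (single-word keywords only, case-insensitive)."""
    words = re.findall(r"[a-z'-]+", text.lower())
    if not words:
        return 0.0
    return 100 * words.count(keyword.lower()) / len(words)

page = ("Make-up tips for beginners. Our make-up guide covers "
        "everything you need to know about make-up.")
density = keyword_density(page, "make-up")
print(round(density, 1))  # 20.0 – 3 of the 15 words are the keyword
```

At 20% this toy page is well above the 3-6% rule of thumb – exactly the kind of density that risks being read as spam.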

Where these words are placed is also important. The main areas to place keywords are:

  • Domain Name – Google ranks pages with a keyword in the URL highly. Ensure that any blog posts have the dominant keywords included in the URL.
  • Headers – Don’t go for puns or clever titles when naming a page, but make them as explanatory and keyword filled as possible. There are several different types of headers and sub headers, and a search engine will look at all of them for clues to the sites content.
  • File names – If you are uploading media, give the file an appropriate keyword-rich name.
  • Meta description – This bit appears underneath your site in search results, so make sure it includes the keywords. If you are blogging using WordPress, you can add a plugin that lets you write a meta description for each post.
Research is an important part of finding the right keywords, and here is a pretty comprehensive list of tools to use.


Quality Content

Having good quality content is the most important element of SEO. Search engines can see how much time users spend on your page, and whether people are clicking links on your page or bouncing straight back out again.

You need to ensure that you are offering something that other people are not. Content should be fresh, topical and relevant – this is the kind of content that is enjoyed and shared by people.

Keeping content fresh is also important for another reason. If a particular search term becomes unusually popular for a short amount of time, Google will work out why – and if you have content associated with the reason for this spike you will improve your SEO. This is called Query Deserved Freshness – read more here.

To find out more about SEO – check out the authoritative Search Engine Land

The EOL (Encyclopedia of Life) is aiming to be a single online resource cataloguing all life on this planet. Collaborating globally with many other collections, the site is working to provide a webpage for every single one of the 1.9 million species on the planet.

Each page will contain photos, sound clips, videos, maps and articles written by experts and verified by the scientific community. There is also a prominent ‘threat status’ section, letting the viewer know how endangered the species is.

Every species page also has a dedicated community section which links to all discussions relating to that particular species. Anyone can sign up and begin a discussion, and all content is licensed under Creative Commons. The site currently has 48,000 members, who have already contributed towards the 634,000 images on the website.

The ultimate goal is to ‘make high-quality, well-organized information available on an unprecedented level.’

Nation of Neighbours provides a simple set of tools that enables citizens of a community to communicate with one another. It is a development of the existing Neighbourhood Watch scheme and allows citizens to share information on local crime, report suspicious activity and voice concerns about the community.

Any registered member can submit a report about their local area. A point based ranking system determines whether a member can file a report straight onto the website, or if their report has to go into a queue. Items in the queue are moderated by active members who have accumulated enough ‘stars’.

Any person can register their local community (US only at present). Members can receive alerts whenever there is a new report published that matches their alert criteria via email, text or RSS.  There is the option to publish local news and events, share photos and discuss community issues.

The hope of the project is that it will increase social participation and strengthen the sense of neighbourhood whilst helping local authorities keep in contact with the community and reduce crime. Plans for the future include an API that will enable Nation of Neighbours to be incorporated into existing community websites.