These are ‘bots’ that search engines use to scan the world wide web, looking for new websites and pages to add to their database. For spiders to do this successfully, it’s really important that websites are easily crawlable. Sites with broken links, poor URL structures and so on can cause spiders to miss pages, and those pages might never end up on the search engines.
The spiders will also check the contents of each page to see if it’s suitable to appear on the search engine, and what kind of keywords it should show for.
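At its simplest, the crawling step is just fetching a page and collecting the links to follow next. Here is a minimal sketch of that idea using Python’s standard library; the page HTML below is a made-up example, not a real site.

```python
# Minimal sketch of the "spider" step: given a page's HTML, collect the
# links a crawler would queue up and visit next.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record the href of every <a> tag we encounter.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Made-up example page; a real crawler would have fetched this over HTTP.
page = """
<html><body>
  <a href="/jackets">Jackets</a>
  <a href="/about">About us</a>
  <a>broken link with no href</a>
</body></html>
"""

extractor = LinkExtractor()
extractor.feed(page)
print(extractor.links)  # the URLs the spider would crawl next
```

Notice the third link has no `href` at all, so the spider simply never finds whatever page it was meant to point to; that is the kind of crawlability problem the paragraph above is describing.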
To use a search engine, you enter a keyword into the search bar and it provides you with a list of results. How a search engine determines what to show you is a complicated process. The first step is relevance: the search engine has to crawl your website, analysing each page to see if it’s suitable to show.
For example, say you have a piece of content on your site about a certain jacket. If that is made clear throughout the page, the search engine can crawl it, see what the page is about, and decide firstly whether it’s going to index the page and secondly how high it will appear in the search results for the keyword ‘jackets’ and any related keywords.
Rules & Guidelines
Every search engine out there has certain guidelines that it needs every website to follow. As you know, there are certain things on the internet that are illegal or otherwise should not be made easily accessible by a search engine. There are also people who use spamming techniques to try and trick the search engine into showing their website above others when it isn’t really justified. It’s really worthwhile reading the guidelines of any search engine you want to rank a website on, so you get a better understanding of what they are looking for from you.
How do the search engines work? It’s a much-asked question. No one knows exactly how the algorithms work, but you see people googling “how do the search engines work?” all the time. Broadly, you can think of it as a three-step process. There are lots of different search engines out there, Google being the main one.
Basically, if you look at this, it comes down to the crawl, the index and the algorithm. What I want to do is talk about a few of those things and how they work. Obviously, your website has to be built properly, first and foremost. You have to have a website that Google can crawl and index, where the bot can go in and out, check out what you’re doing, and crawl and index your pages. So you’ve got bots, or Googlebot if you’re asking specifically how Google works. Different search engines, and even search engine optimization tools such as SEMrush or Ahrefs, all have bots which crawl websites, checking out the pages, the content, the images and everything else that’s on there.
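Before crawling a site, well-behaved bots like Googlebot, or the SEMrush and Ahrefs crawlers, check the site’s robots.txt file to see what they are allowed to fetch. Here is a small sketch using Python’s built-in robots.txt parser; the rules and URLs are made-up examples.

```python
# Sketch of how a crawler checks robots.txt before fetching a page.
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: block internal search pages, allow everything else.
rules = """
User-agent: *
Disallow: /search
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("Googlebot", "https://example.com/jackets"))
print(parser.can_fetch("Googlebot", "https://example.com/search?q=shoes"))
```

The first check comes back allowed and the second blocked, which is exactly the decision a bot makes for every URL it considers crawling.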
That is essentially what Google does: the bot goes around looking at websites. The first part is web crawling, and whether people say bot, spider or crawler, it’s the same thing. What we have is a tool called Google Search Console. It’s wise to install Search Console first and foremost, because it does a lot of different things. If you want to see how Google is crawling your website, what it’s indexing, and whether there are any problems, Search Console can give you all that information.
So as I said, you want Google to crawl your site, and then it should index the pages within your website. Now, allowing Google to crawl your website is one thing; getting the content indexed is another. Quite a common problem is people copying and pasting content. If you copy content from another website, or a supplier’s list or whatever it may be, and put it on your own website, the chances of that page being indexed are slim, depending on how much of it you copied.
So that’s where you can go into Search Console and see how many pages are indexed. You can also see how many pages are excluded. There are certain things within a website that you would want to exclude from Google’s index, such as search pages, photos and various other things. For eCommerce websites in particular, you want to make sure those pages are not indexed, for example product variations that share the same description across varying sizes.
So for example, with these black shoes, you may want the main black shoes page to be crawled and indexed. But if the content is going to be the same across the other variants of the product, you’re going to get penalized for duplicate content, so you might not want all the product variations indexed. If the black shoes come in a size six, seven, eight, nine and ten, you don’t necessarily want each size indexed; having one of those pages indexed is more than enough. The customer can still select the size they want on the page, but in terms of Google you would want to filter out some of those product variations.
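One common way to handle those near-duplicate variant pages is a rel="canonical" tag pointing back at the main product page, telling the search engine which version to index. As a sketch, here is a small Python check that pulls the canonical URL out of a page’s HTML; the shop URLs are made-up examples.

```python
# Sketch: find the rel="canonical" tag on a product-variant page, which
# tells search engines which URL is the "real" one to index.
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # <link rel="canonical" href="..."> lives in the page <head>.
        if tag == "link" and ("rel", "canonical") in attrs:
            self.canonical = dict(attrs).get("href")

# Made-up size-7 variant page pointing back at the main product page.
variant_page = """
<html><head>
  <link rel="canonical" href="https://example.com/black-shoes">
</head><body>Black shoes, size 7</body></html>
"""

finder = CanonicalFinder()
finder.feed(variant_page)
print(finder.canonical)  # the page search engines should treat as primary
```

With every size variant carrying the same canonical URL, the duplicate descriptions stop competing against each other in the index.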
But as I said, Search Console will allow you to see what’s indexed and whether there are any problems. In terms of duplicate content, there’s a tool out there called Copyscape.com, and you can easily run your website through it by entering your domain name. Copyscape will then show you other pages on the web that have similar content. So you can see here that someone has clearly taken content from Amazon’s website for this 100 Days of Food site. That’s what not to do if you want to rank well.
The thing is, Google is clever. It can filter out duplicate content and, in layman’s terms, it basically throws it in the bin. So that is how search engines work: they crawl, then index, and then the algorithm and whatever else they’re looking for comes into play. If you get crawled and indexed, you’ve got a chance of ranking, whether that’s on page one or page ten. But obviously they’re going to take other things into consideration. It is very unlikely that you’ll rank well on content alone unless you’re working in a very, very niche market or some non-competitive local area.
So that’s how search engines basically work, in layman’s terms. Obviously the mechanics and everything else going on behind Google’s servers are a lot more complex. If you want to understand how all of that works, there are seasoned people you can follow, such as Don Anderson and various other technical SEOs who are completely obsessive about how these search engines work, do a lot of research, and are always trying to get to the bottom of it. They’re super smart people who really know the technical stuff. But that is how the search engines work in layman’s terms.
So the crawling, the indexing, and then getting your website ranked is the key part. Google Search Console is something you definitely want to install from the get-go. It gives you an overview of your website’s performance, plus a whole heap of other stuff, such as whether any manual action has been taken against you, mobile usability, sitemaps and various other things. So you can see my sitemap here, when Google last read it, all that kind of stuff.
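A sitemap, like the one Search Console reads, is just an XML file listing the URLs you want crawled. As a sketch, here is a minimal one built with Python’s standard library; the URLs are made-up examples.

```python
# Sketch: build a minimal XML sitemap listing the pages you want crawled.
import xml.etree.ElementTree as ET

# Made-up example pages to include in the sitemap.
urls = ["https://example.com/", "https://example.com/jackets"]

# The standard sitemap namespace used by sitemaps.org.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
urlset = ET.Element("urlset", xmlns=NS)
for url in urls:
    entry = ET.SubElement(urlset, "url")
    ET.SubElement(entry, "loc").text = url

sitemap = ET.tostring(urlset, encoding="unicode")
print(sitemap)
```

You would save that output as sitemap.xml at the root of your site and submit it in Search Console, which is how Google knows when it was last read.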
There’s a whole heap of other stuff in there to look at, which should help you get much more performance out of your website. It will also flag up certain errors, and you always want to make sure your website is error-free, so do have a look at that. As I say, that is pretty much how the search engines work in layman’s terms.