Search engines
Since the Web became widely adopted, search engines have helped users navigate the vast amount of information online. Today we look at the basics of how they work.
Robot powered
- Search engines use "crawlers," "spiders" or "robots" -- software that records source code for webpages found by navigating links.
- The spiders' findings are compiled into a master database, called an "index."
- The results are then evaluated by search engine software, which determines the relevancy and placement.
- The results are what appear in online searches.
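The crawl-then-index steps above can be sketched in a few lines. This is a minimal illustration, not how any real search engine is built: the `PAGES` dictionary is a hypothetical stand-in for fetching pages over the network, and the `.example` addresses are made up.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": address -> HTML source (stands in for real fetches).
PAGES = {
    "a.example": '<title>Home</title><a href="b.example">blinds</a> <a href="c.example">cruise</a>',
    "b.example": '<title>Blinds</title><p>window blinds</p><a href="a.example">home</a>',
    "c.example": '<title>Cruise</title><p>cruise ships</p>',
}


class LinkAndTextParser(HTMLParser):
    """Collects outgoing links and lowercase words from one page's source."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def crawl(start):
    """Breadth-first crawl from `start`; returns the index as word -> set of addresses."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        source = PAGES.get(url)
        if source is None:
            continue  # link points outside our toy web
        parser = LinkAndTextParser()
        parser.feed(source)
        parser.close()
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        for link in parser.links:
            if link not in seen:  # follow each link only once
                seen.add(link)
                queue.append(link)
    return index


index = crawl("a.example")
print(sorted(index["blinds"]))
```

Looking up a word in `index` returns every page the spider found it on; ranking those pages is a separate step.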
Evaluation by algorithm
- An algorithm is a formula for solving a problem.
- To evaluate billions of webpages, an algorithm is created to maintain a set of rules for judgment.
- Each search engine's algorithm operates in its own way, but most read the title, headers, body text, links and meta elements.
- Location sometimes matters: the closer to the top of the page content, the more likely a search engine will consider content relevant.
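As a rough illustration of rule-based evaluation, the sketch below scores a page for one search term by weighting where the term appears and how early it appears. The field weights and the position bonus are assumptions invented for this example; real search engines do not publish theirs.

```python
# Hypothetical weights: real search engines do not publish theirs.
FIELD_WEIGHTS = {"title": 3.0, "headers": 2.0, "body": 1.0}


def score(page, term):
    """Score one page for one term using weighted fields and word position."""
    term = term.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = page.get(field, "").lower().split()
        for position, word in enumerate(words):
            if word == term:
                # Earlier words count slightly more: the bonus shrinks with position.
                total += weight * (1.0 + 1.0 / (position + 1))
    return total


pages = {
    "early": {"title": "Blinds for sale", "headers": "Shop", "body": "blinds and shades"},
    "late": {"title": "Home decor", "headers": "Shop", "body": "we also sell shades and blinds"},
}
ranked = sorted(pages, key=lambda name: score(pages[name], "blinds"), reverse=True)
print(ranked)
```

The page that uses the term in its title and at the start of its body ranks above the page that mentions it only at the end of the body.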
PageRank
- To determine a site's importance, Larry Page and Sergey Brin's algorithm measured the number of links to a site and the importance of those links.
- Existing model: scientific citations. The more references to an article, the more important -- the same logic could be used to evaluate site importance.
- PageRank, Google's algorithm, is tightly guarded and frequently updated, but relies on basic criteria.
PageRank criteria
- First, it looks at how many other sites link to the page (and what terms are used).
- Second, it looks at the linking site/page and determines how many sites/pages link to it.
- The linking sites are measured in terms of importance, and links from more important sites are given more weight.
- Thus, a link from a heavily trafficked and linked-to site will have more importance than a link from an unknown, rarely linked-to website -- this will reflect on the linked page's PageRank.
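The criteria above can be illustrated with a simplified version of the PageRank calculation as it was originally published. This is only a sketch: Google's production algorithm is not public, the damping value of 0.85 comes from the original published description, and the link graph and page names below are made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: page -> list of pages it links to. Returns page -> score."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: "popular" has three inbound links, "obscure" has none.
links = {
    "x": ["popular"], "y": ["popular"], "z": ["popular"],
    "popular": ["a"],
    "obscure": ["b"],
    "a": [], "b": [],
}
ranks = pagerank(links)
print(ranks["a"] > ranks["b"])
```

Pages `a` and `b` each receive exactly one link, but `a` ends up with the higher score because its single link comes from the more heavily linked-to page.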
Google's dominance
- Due to PageRank and its advances in Web advertising, Google dominates the search market and has for a decade.
- Like any massive organization, Google has problems, including claims that it violated antitrust law, similar to those made against Microsoft during the 1990s.
- Two issues regarding search results are relevant today: trademark lawsuits and click fraud.
Trademark lawsuits
- Traditionally, when one introduces a trademark, the individual controls how it is used and may sue those who use it without permission.
- In the search industry, ownership of terms is more flexible.
- American Blinds and Wallpaper Factory -- does it own "american blinds"? (settled)
- Earlier, Playboy v. Netscape (settled) -- "playboy" and "playmate"
- "Cruise ships" = Oceana environmental advertisement (2004)
Google bombing & spamdexing
- Google bombing and spamdexing use search criteria to ensure a concept or term is highly ranked on Google.
- Google's algorithm considers linked text phrases when ranking search results.
- If enough pages link one term to one page, that page will appear higher in the rankings than others.
- When Google's index is manipulated by unscrupulous SEOs, it is referred to as Google bombing or spamdexing.
- Both concepts use "link farms" -- communities of individuals linking pages to each other, using certain terms across countless pages.
- Examples: Election 2004
- "Complete failure": Top result was whitehouse.gov
- "Waffles": To John Kerry's site
- "Santorum": A conservative politician (currently running for the GOP nomination).
- Other Google bombs have generated controversy.
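A toy model shows why coordinated linking works. The sketch below ranks target pages purely by how many distinct pages link to them with a given phrase. It is an assumption-laden simplification (real ranking weighs many more signals), and the `LINKS` data and site names are hypothetical.

```python
from collections import Counter

# Hypothetical link data: (linking page, link text, target page).
LINKS = [
    ("blog1.example", "best waffles", "diner.example"),
    ("blog2.example", "best waffles", "prank-target.example"),
    ("blog3.example", "best waffles", "prank-target.example"),
    ("blog4.example", "best waffles", "prank-target.example"),
    ("blog4.example", "recipes", "diner.example"),
]


def rank_by_link_text(links, phrase):
    """Order targets by how many distinct pages link to them using `phrase`."""
    seen = set()
    counts = Counter()
    for source, text, target in links:
        key = (source, target)
        if text.lower() == phrase.lower() and key not in seen:
            seen.add(key)  # count each linking page once per target
            counts[target] += 1
    return [target for target, _ in counts.most_common()]


print(rank_by_link_text(LINKS, "best waffles"))
```

Three coordinated pages are enough to push the unrelated target above the page that is actually about the phrase, which is the effect a link farm aims for at much larger scale.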
Click fraud
- The practice of generating advertisement clickthroughs on Google's ads that appear in site content (known as AdSense).
- Because the site owner benefits from click fraud, it is possible to create sites that contain only AdSense content and repeatedly click on the links.
- Sometimes accomplished with low-wage workers or, more commonly, with a robot that scans the sites and mechanically clicks on the links.
SEO strategies
- Choose the <title></title> text carefully -- this is given disproportionate weight since it is used in Google's search results.
- Keywords that reflect your site's content should be in other prominent places -- URL, headers, meta tags.
- Link between pages within your site using keywords as the link text; ask others to do the same when linking from their sites.
- Use ALT attributes on images for descriptions.
- Link to your content on social media platforms.
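Two of these checks are easy to automate. The sketch below uses Python's standard `html.parser` module to test whether a keyword appears in the `<title>` and to list images without ALT text. The `audit` function, the sample page and the file names are hypothetical, and the checker treats an empty ALT attribute as missing, which is a simplification.

```python
from html.parser import HTMLParser


class SeoAudit(HTMLParser):
    """Records the <title> text and any <img> tags lacking ALT text."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.images_missing_alt = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "img" and not attributes.get("alt"):
            # Simplification: an empty alt="" is also reported as missing.
            self.images_missing_alt.append(attributes.get("src", "(no src)"))

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def audit(html, keyword):
    """Return a small report for one page and one target keyword."""
    parser = SeoAudit()
    parser.feed(html)
    parser.close()
    return {
        "title_has_keyword": keyword.lower() in parser.title.lower(),
        "images_missing_alt": parser.images_missing_alt,
    }


sample = (
    "<html><head><title>Window blinds and shades</title></head>"
    '<body><img src="a.png" alt="Wooden blinds"><img src="b.png"></body></html>'
)
report = audit(sample, "blinds")
print(report)
```

Here the title passes the keyword check and `b.png` is reported as needing a description.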