Search engines have become more advanced over the years in the methods they use to index web pages, becoming more sophisticated, more trusted, and more technical. The first generation of search engines crawled the Internet indexing the words on each page; this could easily be hijacked by filling pages with the same words, thus achieving a high ranking when people searched for that term. This was a technique used by porn sites to drive traffic to them.
Google, AltaVista and Yahoo started innovating in the field, building their indexes not on word counts but on metrics such as the links pointing to a web page and how trusted the source of each link could be. This led to more reliable results, because it took into account the reliability and trust of content sources on the Internet. However, when indexing content the system has only a limited understanding of what that content is about: it cannot determine whether a person, an event, or a location is being discussed.
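The difference between the two generations can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine works: the pages, their text, the link graph and both function names are made up for the example.

```python
from collections import Counter

# Hypothetical toy corpus: page name -> page text.
pages = {
    "honest": "phone reviews and phone prices compared",
    "stuffed": "phone phone phone phone phone phone",
    "shop": "buy a phone from our shop",
}

# Hypothetical link graph: page name -> pages it links to.
links = {
    "honest": ["shop"],
    "shop": ["honest"],
    "stuffed": ["honest"],
}


def rank_by_word_count(term):
    """First-generation style: score a page by how often the term appears."""
    scores = {name: Counter(text.split())[term] for name, text in pages.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


def rank_by_inbound_links(term):
    """Link-based style: among pages containing the term, count inbound links."""
    inbound = Counter(target for targets in links.values() for target in targets)
    matching = [name for name, text in pages.items() if term in text.split()]
    return sorted(matching, key=lambda name: inbound[name], reverse=True)


print(rank_by_word_count("phone"))     # the keyword-stuffed page comes first
print(rank_by_inbound_links("phone"))  # the page others link to comes first
```

Stuffing a page with a word only helps under the first scheme; under the second, a page needs other pages to point at it.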
Answering these questions means extracting semantic information from web pages, allowing computers to better determine the meaning of the content. This builds on the definition of Web 3.0:
Web 3.0, a phrase coined by John Markoff of the New York Times in 2006, refers to a supposed third generation of Internet-based services that collectively comprise what might be called ‘the intelligent Web’ — such as those using semantic web, microformats, natural language search, data-mining, machine learning, recommendation agents, and artificial intelligence technologies — which emphasize machine-facilitated understanding of information in order to provide a more productive and intuitive user experience.
The Semantic Web is defined by the W3C as follows:
“The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.” Tim Berners-Lee described it as: “If HTML and the Web made all the online documents look like one huge book, RDF, schema, and inference languages will make all the data in the world look like one huge database.”
There is a movement to create ontologies for the Internet; this is being achieved through the creation of OWL (the Web Ontology Language). Ontology languages are ways of representing knowledge and data within specific domains. They are built on a standardised RDF/XML serialisation and on formal semantics, all of which have been standardised by the W3C. This way of representing information and knowledge has been embraced by the medical industry.
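To give a feel for what an ontology adds, here is a minimal sketch in plain Python rather than real OWL or RDF/XML. It assumes a made-up class hierarchy stored as subject-predicate-object triples and shows the kind of inference a reasoner performs: if something is a Smartphone, it is also everything a Smartphone is a subclass of. The class names, the individual and both functions are hypothetical.

```python
# Hypothetical ontology fragment expressed as (subject, predicate, object) triples.
triples = {
    ("Smartphone", "subClassOf", "MobilePhone"),
    ("MobilePhone", "subClassOf", "Phone"),
    ("Phone", "subClassOf", "Device"),
    ("myHandset", "type", "Smartphone"),
}


def superclasses(cls):
    """Return every class reachable from cls by following subClassOf links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def inferred_types(individual):
    """Return the stated types of an individual plus all inherited ones."""
    direct = {o for s, p, o in triples if s == individual and p == "type"}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls)
    return inferred


print(sorted(inferred_types("myHandset")))
```

Nothing in the data says directly that myHandset is a Device; that fact only exists because the hierarchy is machine-readable.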
SPARQL (SPARQL Protocol and RDF Query Language) is a query language for the semantic web. It can retrieve and manipulate data stored in RDF format, including in large databases. It is one of the main languages used to query ontologies, and it has been implemented in many programming languages, along with tools that allow it to be translated into other query languages such as SQL and XQuery.
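The core idea behind a SPARQL query is matching patterns that contain variables against a set of triples. The sketch below imitates that idea in plain Python; it is not a SPARQL engine, and the data, the `match` and `select` functions and the `?variable` string convention are all assumptions made for illustration. The query it runs corresponds roughly to `SELECT ?seller ?item WHERE { ?seller :sells ?item . ?item a :Phone . }`.

```python
# Hypothetical data set of (subject, predicate, object) triples.
triples = [
    ("alice", "sells", "phone1"),
    ("bob", "sells", "laptop1"),
    ("phone1", "type", "Phone"),
    ("laptop1", "type", "Laptop"),
]


def match(pattern, triple, bindings):
    """Try to fit one pattern to one triple; return new bindings or None."""
    new = dict(bindings)
    for part, value in zip(pattern, triple):
        if part.startswith("?"):
            # A variable: bind it, or check it agrees with an earlier binding.
            if new.setdefault(part, value) != value:
                return None
        elif part != value:
            # A constant that does not match this triple.
            return None
    return new


def select(patterns):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                new = match(pattern, triple, bindings)
                if new is not None:
                    next_solutions.append(new)
        solutions = next_solutions
    return solutions


print(select([("?seller", "sells", "?item"), ("?item", "type", "Phone")]))
```

The shared variable `?item` is what joins the two patterns, much as a join condition links two tables in SQL.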
Several large institutions within the UK have implemented endpoints in their systems that allow their data to be queried with SPARQL. These include the UK government at http://data.gov.uk/sparql and the University of Southampton at http://sparql.data.southampton.ac.uk/.
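Under the SPARQL protocol, a query can be sent to an endpoint as an HTTP GET request with the query text in a `query` parameter. The sketch below only builds such a URL and does not send it, since the endpoints named above may have changed or may no longer respond. The example query and the `build_request_url` helper are made up for illustration.

```python
from urllib.parse import urlencode

# Endpoint taken from the text above; it may no longer be available.
ENDPOINT = "http://sparql.data.southampton.ac.uk/"

# Illustrative query: fetch any five triples from the store.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request_url(endpoint, query):
    """Encode the query text as the `query` parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

Fetching that URL with any HTTP client, typically with an Accept header asking for a results format such as application/sparql-results+json, would return the matching rows if the endpoint is still running.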
As said above, the semantic web is about referencing knowledge and information between objects and domains. This is where RDF comes in: it was developed by academics for artificial intelligence, where everything has to be cross-referenced to gain meaning. However, because it was made by academics and was not designed for readability, it is harder to understand than XML or JSON.
An example of how this kind of system can be used is the “I want to sell / I want to buy” example. If a person is looking for a phone, they would use an application which would search through the Internet, as a crawler does for a search engine, looking for RDF files containing information about people wanting to sell a phone. This would be done in real time. If you want to sell a phone, you would again use an application, which would create the RDF files. These would also have to include pointers to information about the manufacturer of the phone, potential dealers of the phone, and the components within the phone, all tying it into the semantic web.
There are several issues and questions, including: how do we find the files, how do we search, and what if people call things by different names? This is where ontologies come in: they allow an application to look up relevant information about a topic and so identify similar or relevant knowledge on the Internet.
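The "different names" problem in the buy/sell example can be sketched as follows. This is a toy stand-in for an ontology lookup, written in plain Python rather than RDF: the vocabulary mapping, the sellers, the `wantsToSell` predicate and both functions are hypothetical.

```python
# Hypothetical vocabulary mapping: names people use -> one shared concept.
SAME_AS = {
    "phone": "Phone",
    "mobile": "Phone",
    "cell phone": "Phone",
    "handset": "Phone",
    "laptop": "Laptop",
    "notebook": "Laptop",
}

# Hypothetical statements published by sellers, as (subject, predicate, object).
offers = [
    ("alice", "wantsToSell", "handset"),
    ("bob", "wantsToSell", "notebook"),
    ("carol", "wantsToSell", "cell phone"),
]


def concept(name):
    """Map a free-text name to its shared concept, if the vocabulary knows it."""
    return SAME_AS.get(name.lower(), name)


def find_sellers(wanted):
    """Return sellers whose offer refers to the same concept as the request."""
    target = concept(wanted)
    return [s for s, p, o in offers if p == "wantsToSell" and concept(o) == target]


print(find_sellers("mobile"))
```

A buyer searching for a "mobile" finds sellers who wrote "handset" or "cell phone", because both sides resolve to the same concept before they are compared.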
I personally feel that the implementation of a true semantic Internet is a long time off. Yes, there will be stores that hold information in RDF format, and yes, there will be endpoints for specialist SPARQL datasets such as government and open data. However, I feel that the current form of the Internet is too well established to move on quickly. If we do move to the semantic Internet, we are going to need a number of large players within the Internet to start adopting and developing for it: companies like Google, Microsoft, and Facebook, and potentially trust companies like VeriSign.
I do feel there will be a push to start using these technologies in social networks as data protection and privacy become an issue. The technologies of the semantic Internet would potentially allow people to store their own data while still allowing it to be used and discovered, turning today's ‘big data’ into self-managed big data and overcoming the common privacy issues currently seen with big data.
If it is implemented, end users may not realise, as it would most likely be implemented at the data layer, potentially offering a few more features to the end user but mainly offering benefits for data manipulation, search and storage. This will most likely mean that if and when it gains ground, Web 2.0 will still exist. Web 2.0 will become the easily accessible form of development for newbies on the Internet, whereas Web 3.0 will be used for big business systems that can invest in the technology, which will ultimately redefine the meaning of Web 3.0 post hoc. Thus, for the moment, Web 3.0 is firmly bound to specialised implementations until the technology is accessible enough for mass deployment.