ABSTRACT:
Organizations across the world are facing a new challenge: to efficiently manage the growing volume of static and/or dynamic information accumulated within the enterprise. If managed successfully, large volumes of data need not be a burden. Indeed, significant benefits can be drawn from a new generation of content management solutions that leverage semantics (the study of meaning) to improve the way information is used.
By using such technology, organizations working with large volumes of data will soon realize dramatic cost reductions, revenue improvements and opportunities for gaining competitive advantage. In this regard, the author contends that Semantic Content Management offers enormous business benefits, both in terms of cost reductions and increased revenue/competitive advantage.
In Part 1, the author first defines the meaning of ‘content’ and ‘management’. He then considers the business value of content and the need to add meaning to content objects. Technological aspects of semantic content management are then reviewed, together with the advantages associated with the ‘next generation’ solution. Next, the value of having open standards is discussed, followed by an overview of the business benefits generated by semantic content management. Finally, the use of semantics to increase business value is explored.
Introduction
With an explosion in the amount of information made available to us as individuals, our world is often characterized by increasing complexity. Most of the time this wealth of information is considered key to the welfare of both individuals and enterprises. However, handling massive information streams is not a trivial task; on the contrary, it requires a sophisticated IT environment that employs the right tools and well-chosen standards, giving organizations the freedom and ability to face the content management challenges of tomorrow. The challenge is magnified further by the fact that content is delivered in many forms and via many channels, including print, Internet, intranet, extranet, email, SMS, WAP, 3G, PDAs and Digital TV, to name but a few.
There are already several well-reported examples of businesses successfully leveraging their information resources. In February 2000, CIO Magazine reported that Pfizer Inc., by reworking existing information relating to pharmaceutical development, was able to reduce the time to market for new drugs and generate additional revenues of $142 million over a four-year period. There is also a growing number of companies making money directly from the sale of their content. These include the Wall Street Journal’s WSJ.com, which has generated over $40 million, and the Finnish financial news portal Kauppalehti Online, which generates almost two thirds of its revenue from the sale of information-based content.
In the gold rush to service the needs of organizations, software vendors have begun offering different solutions, based on different standards. This has led to further confusion as to the exact definition of content management:
· What do we mean by content? Content can be as diverse as film, audio, SMS, email and news streams
· What is ‘management’? Is it storage, editing, web site structure and/or workflow authorization procedures? Or is it something else entirely?
Given the challenges described above, it is clear that organizations able to successfully manage content possess a valuable asset.
The Value of Content From A Business Perspective
Naturally, efficient content management is of particular importance to industries where information is sold as a commodity (media, telco, web etc.). In such industries the following factors must be considered:
· Many sources of content in many formats
· Many delivery channels
· A need to avoid information bottlenecks
· Easy content reuse (e.g. in new products or services)
· Volume and speed, often in real time
Financial companies, and large to midsize industrial enterprises (such as pharmaceuticals), face very similar problems, even when they do not directly sell information as a commodity. Efficient, highly targeted communication is essential for successful Customer Relationship Management and Supply Chain Management systems.
Customer service organizations also face complex content delivery scenarios, as do companies selling primarily to ‘invisible’ customers; for example real estate agents or supermarkets. Finally, the public sector and some service companies depend on good content management to be effective in their business.
In any of these environments there are three common stages to content management:
1. Content sourcing: Includes tasks such as authoring, collecting and editing information from external and internal sources and systems. Of extreme importance here is the issue of classification: information objects that are not accurately described are difficult to manage and publish. In other words, the less effort spent on classification during content input, the higher the overall costs, or the lower the success of the following two stages.
2. Administration: Where possible, this function should be automated. Without automation, bottlenecks may occur, bringing the entire cycle to a halt. This function is not trivial; the sheer volume of information passing through a content management system is simply too much for humans to deal with in a reasonable timeframe. Other important issues are the ease with which users’ information needs and access rights can be assigned.
3. Publication: Static publication of content takes many forms: Web pages on inter-/intra-/extranets, print, TV etc. Although effective, this form of publication is not targeted at the customer or even a customer type. Dynamic publication may use web technologies or new ‘push-media’ such as email, SMS, WAP and 3G. Effective reuse of (reformatted versions of) content is essential, but is highly dependent on the accuracy of the object’s classification.
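The point about classification at the sourcing stage can be illustrated with a small Python sketch. It assumes a hypothetical minimum set of metadata properties; the names `REQUIRED_PROPERTIES`, `ContentObject` and `source` are illustrative and not part of any product. Objects that arrive without an adequate description are set aside for classification instead of being passed on to administration and publication, where a poor description is more costly to fix.

```python
from dataclasses import dataclass, field

# Hypothetical minimum description an object needs before it may
# leave the sourcing stage; the property names are illustrative.
REQUIRED_PROPERTIES = {"title", "subject", "language", "source"}


@dataclass
class ContentObject:
    uri: str
    body: str
    metadata: dict = field(default_factory=dict)


def missing_properties(obj: ContentObject) -> set:
    """Return the required properties that are absent or empty."""
    return {p for p in REQUIRED_PROPERTIES if not obj.metadata.get(p)}


def source(objects):
    """Split incoming objects into (accepted, needs_classification)."""
    accepted, pending = [], []
    for obj in objects:
        # Poorly described objects are held back rather than published.
        (pending if missing_properties(obj) else accepted).append(obj)
    return accepted, pending


story = ContentObject(
    "urn:example:story:1",
    "Central bank holds rates.",
    {"title": "Rates unchanged", "subject": "finance",
     "language": "en", "source": "newswire"},
)
untagged = ContentObject("urn:example:story:2", "Untitled draft.", {"title": "Draft"})

accepted, pending = source([story, untagged])
print([o.uri for o in accepted])             # ['urn:example:story:1']
print(sorted(missing_properties(untagged)))  # ['language', 'source', 'subject']
```

In this sketch the check is deliberately strict: an object is either fully described or it waits, so the later, automated stages never have to guess what an object is about.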
One issue has an overriding influence on the cost/benefit of a content management system: the accurate classification of content and its ease of reuse. Any means of streamlining and improving this central process will have a knock-on effect on the subsequent two stages, delivering reduced labor costs and improved content reuse across distribution channels (‘Create once, publish anywhere’).
Surprisingly, only a small percentage of a product’s functionality is ever used. Many content management products originate from a web site administration or document management background. Such products offer only basic keyword searches (like traditional Internet search engines). Although these searches rely on sophisticated techniques such as word frequency counts and lexical analysis, keyword matching alone cannot go beyond basic information searches.
Content objects by themselves are just that, objects. To make an object more useful, one needs to introduce meaning.
A normal web page or document-based search engine typically returns far too many results to be of use; often a large percentage of the results are not relevant to the original request. Users must be able to interrogate content based on its exact meaning. Take, for example, a basic search for Woody Allen. The results do not tell us whether they relate to Woody Allen as a director, an actor or both. Such methods are one-dimensional: the software is simply searching against a list of keywords, unknown and unrelated to one another.
It is precisely this ability to exclude or limit searches by meaning that is lacking from today’s web and content management solutions.
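The difference between keyword matching and searching by meaning can be sketched in a few lines of Python. The film identifiers and the property names `directedBy` and `actor` are assumptions made for illustration; a real system would use a shared, agreed vocabulary. The keyword search returns every document that mentions Woody Allen, while the statement-based query can ask for a specific role, or for both roles at once.

```python
# Sample statements: (subject, property, value). Identifiers and
# property names are illustrative, not a standard vocabulary.
STATEMENTS = [
    ("film:annie_hall", "directedBy", "person:woody_allen"),
    ("film:annie_hall", "actor", "person:woody_allen"),
    ("film:interiors", "directedBy", "person:woody_allen"),
    ("film:the_front", "actor", "person:woody_allen"),
]

# The same films as free text, as a keyword engine would see them.
DOCUMENTS = {
    "film:annie_hall": "Annie Hall, directed by and starring Woody Allen.",
    "film:interiors": "Interiors, a drama written and directed by Woody Allen.",
    "film:the_front": "The Front, starring Woody Allen.",
}


def keyword_search(keyword: str) -> list:
    """One-dimensional search: any document containing the keyword."""
    return sorted(uri for uri, text in DOCUMENTS.items()
                  if keyword.lower() in text.lower())


def semantic_search(prop: str, value: str) -> list:
    """Subjects that have the given property with the given value."""
    return sorted(s for s, p, o in STATEMENTS if p == prop and o == value)


def films_with_both(person: str) -> list:
    """Films the person both directed and acted in."""
    directed = set(semantic_search("directedBy", person))
    acted = set(semantic_search("actor", person))
    return sorted(directed & acted)
```

Here `keyword_search("Woody Allen")` matches all three films, `semantic_search("actor", "person:woody_allen")` returns only `film:annie_hall` and `film:the_front`, and `films_with_both("person:woody_allen")` narrows the result to `film:annie_hall`.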
Only by introducing meaning can content be optimized. This can be achieved through semantics, a highly complex discipline defined by Webster’s Encyclopedic Unabridged Dictionary of the English Language (1994 edition) as ‘the study of meaning’.
Although semantics is a term derived from the world of linguistics, it can be compared to the IT term ‘metadata’, meaning data that describes data. Metadata can, for example, be used to describe the meaning of content stored within a data warehouse or an online catalog. It can therefore serve as the mechanism for building large, structured descriptions of content within enterprise systems.
As the volume of information continues to explode, the need for efficient and targeted automation of data capture / delivery is becoming more and more apparent. Without such systems in place, users will ultimately spend more time searching for information than actually consuming it. There is only one way of enabling computers to deal with information in a meaningful way and that is to describe it in a precise, machine-readable format. This can be achieved using metadata.
Using metadata also elegantly solves traditional problems of scalability. In many instances a user only needs to know that a piece of information exists. In such cases it is not necessary to consume bandwidth by sending the user the entire content object. Instead, the associated piece of metadata would be sufficient.
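The scalability point can be sketched as follows, assuming a hypothetical `Metadata` record and a JSON notification format (neither is prescribed by the text). Only the compact description travels to the user; the large payload stays where it is stored.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Metadata:
    # Hypothetical compact description of a content object.
    uri: str
    title: str
    media_type: str
    size_bytes: int


@dataclass
class ContentObject:
    metadata: Metadata
    payload: bytes  # the (possibly very large) content itself


def notification(obj: ContentObject) -> str:
    """Announce that an object exists by sending only its metadata."""
    return json.dumps(asdict(obj.metadata))


webcast = ContentObject(
    Metadata("urn:example:video:42", "Quarterly results webcast",
             "video/mpeg", 5_000_000),
    bytes(5_000_000),  # stand-in for a 5 MB video file
)

message = notification(webcast)
print(len(message), "bytes instead of", len(webcast.payload))
```

The user who wants the video can follow the `uri` in the notification; everyone else has consumed a message of roughly a hundred bytes rather than five megabytes.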
As we have learnt, semantic content management is about managing content objects based on their properties. The objects can be of any type, and the meaning of their properties can be recorded within metadata descriptions. These metadata descriptions are like library index cards meant for machine readers, not human readers. The metadata expresses the semantics appropriate to the business environment: for example, customer codes for tenders, personal identities for digital images and artist names for audio files.
The technologies for managing this are based on XML and RDF:
· XML (Extensible Mark-up Language): An open, non-proprietary standard for defining, validating and storing structured data objects by expressing these objects as tagged text. XML is a subset of an earlier mark-up language, SGML.
· RDF (Resource Description Framework): A declarative language that provides a standard way of using XML to represent metadata in the form of statements about the properties and relationships of items. Such items, known as resources, can be almost any type of object. On top of this sit RDF Schemas, which describe metadata vocabulary sets. A schema defines the meaning, characteristics and relationships of a set of properties, and this may include constraints on potential values and the inheritance of properties from other schemas. Within a schema, the meanings of terms are spelled out in detail, enabling independent communities to share vocabularies.
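As an illustration of these two layers, the following Python sketch uses the standard library's `xml.etree.ElementTree` to write simple RDF statements about a digital photograph in RDF/XML, taking the property names from the Dublin Core element namespace. The resource URI and property values are invented examples, and the hypothetical `describe` function handles only literal-valued properties of a single resource.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core elements
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def describe(resource_uri: str, properties: dict) -> str:
    """Serialize literal-valued Dublin Core statements about one resource as RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": resource_uri})
    for name, value in properties.items():
        # Each child element is one statement: resource --dc:name--> value
        ET.SubElement(desc, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(describe("http://example.com/photos/123", {
    "title": "Harbour at dawn",
    "creator": "J. Smith",
    "format": "image/jpeg",
}))
```

The output is an `rdf:Description` whose `rdf:about` attribute names the photograph and whose `dc:` child elements carry its title, creator and format. The photograph itself is never touched: only its description is expressed in XML.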
Only systems designed to support these standards are capable of taking full advantage of semantics. In a semantic content management system, the input is described by metadata, while Active Query Agents continuously traverse the semantic network and try to match the information needs of users / customers with the information patterns available in the system. Information is pushed to users / customers based on their profiles and their preferred delivery devices (Web, email, mobile handsets etc.).
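The matching step can be sketched in simplified form. This is not how any particular Active Query Agent works; it assumes a hypothetical `Profile` in which each user lists required property values and a single preferred channel, and it simply checks each new object's metadata against every profile.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    # Hypothetical user profile: required metadata values plus the
    # user's preferred delivery device.
    user: str
    interests: dict
    channel: str


def matches(metadata: dict, profile: Profile) -> bool:
    """True if the object has every property value the user asked for."""
    return all(metadata.get(prop) == value
               for prop, value in profile.interests.items())


def push(metadata: dict, profiles: list) -> list:
    """Return (user, channel) pairs that should receive this object."""
    return [(p.user, p.channel) for p in profiles if matches(metadata, p)]


profiles = [
    Profile("anna", {"market": "Helsinki"}, "sms"),
    Profile("ben", {"subject": "bonds"}, "email"),
    Profile("carl", {"subject": "equities", "type": "news"}, "web"),
]
item = {"subject": "equities", "market": "Helsinki", "type": "news"}
print(push(item, profiles))  # [('anna', 'sms'), ('carl', 'web')]
```

Because matching works on metadata alone, the same loop serves text, images or audio. Note that a profile with no interests matches everything, which a real system would probably want to forbid.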
There are several major advantages to this next generation of content management solution:
· Automation: Only metadata enables computers to deal with information in a meaningful manner, actively enabling a new, higher level of automation.
· Flexible Reuse: Enabling easy development of new services. Content can be easily reused in many different publishing contexts and using new media as it becomes available.
· Quality: Semantic queries return only meaningful results (no ‘spamming’ or ‘information overload’). By its nature, a collection of structured metadata only gets better as it grows, in contrast to the confusion created by a large collection of objects having little or no meaningful semantic structure (such as the Web today).
· Ease of use / implementation: Complex queries are much easier to express, leading to easier development of more advanced applications.
· Interoperability: Content can be exchanged between different parties because its meaning is expressed using technology based on open standards. Peer-to-Peer content networks can be established.
· Location and Storage Independence: Since the meaning of each content object is described using metadata, there is no need to move content objects into specialized storage with specialized search facilities. This has a dramatic effect on the cost structure of content-providing systems.
· Format / Type Independence: For the same reason as above, there is no limit to the kind of content that can be managed: any content object can be described by metadata, including non-textual objects such as pictures.
· Best of Breed Approach: A semantic content management solution may eventually be composed of products from several software vendors, as long as the products are using the same, open standards. The days of proprietary architectures and vendor lock-in are numbered.
· Scalability: Since all the system needs to manage the content is the metadata, and since metadata is compact, a semantic content management system can handle very large numbers of content objects without scalability problems.
· Open Standards: Basing strategic solutions on open standards is the best investment protection possible. At the same time, open standards make it a lot easier to acquire qualified staff, additional software tools, training classes etc. These all result in significant cost reductions.
The Value Of Open Standards
“The great fortunes of the information age lie in the hands of companies that have successfully established proprietary architectures that are used by a large installed base of locked-in customers. And many of the biggest headaches of the information age are visited upon companies that are locked into information systems that are inferior, orphaned, or monopolistically supplied” (Shapiro & Varian, 1999).
Allowing information to flow freely and interact both on the input and output side is critical to the success of any environment that hosts a content management system. In order to establish such a highly efficient, communicative content management system we must look to the use of open standards.
On its grandest scale, the use of open standards is key to the success of the World Wide Web. The World Wide Web Consortium (W3C) is dedicated to promoting the evolution and interoperability of the Web by developing common protocols, so that the next generation of web technologies can communicate in a meaningful way. This initiative is called ‘The Semantic Web’ (Scientific American, 2001). W3C is a non-profit consortium founded in 1994 by Tim Berners-Lee; its current host institutions are the Massachusetts Institute of Technology, the Institut National de Recherche en Informatique et en Automatique (INRIA) and Keio University in Japan.
The two main standards to consider within the semantic web are XML and RDF, both of which were described previously:
· XML (Extensible Markup Language): A W3C standard for syntax, which is very useful for assigning metadata descriptions to objects
· RDF (Resource Description Framework): A W3C standard for defining metadata and its structure. Where XML is the ‘syntax’, RDF is the ‘grammar’
XML is in widespread use today. RDF is rapidly gaining ground and is used in many different contexts and products. In the publishing industry, for instance, two industry-standard metadata sets (Dublin Core and PRISM) are defined and described in RDF. Other examples include RSS (RDF Site Summary), a format for syndicating summaries of Web site content, and CC/PP, a forthcoming standard that, amongst other uses, will describe the capabilities of next-generation telephone devices.
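To make the ‘syntax’ versus ‘grammar’ distinction concrete, the following Python sketch reads a small RDF/XML document with Dublin Core properties back into (subject, predicate, object) statements. It assumes the simple layout shown in `SAMPLE` (one `rdf:Description` per resource, literal values only); the hypothetical `triples` function is not a general RDF/XML parser, and a dedicated RDF library would be needed for that.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.com/articles/17">
    <dc:title>Interest rates unchanged</dc:title>
    <dc:creator>Newsdesk</dc:creator>
    <dc:language>en</dc:language>
  </rdf:Description>
</rdf:RDF>"""


def triples(rdf_xml: str) -> list:
    """Extract (subject, predicate URI, literal) statements.

    Covers only the simple layout above: rdf:Description elements with
    rdf:about and literal-valued children. Not a full RDF/XML parser.
    """
    statements = []
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # "{namespace}local" -> "namespace" + "local" (the predicate URI)
            namespace, local = prop.tag[1:].split("}", 1)
            statements.append((subject, namespace + local, (prop.text or "").strip()))
    return statements


for statement in triples(SAMPLE):
    print(statement)
```

Each child element becomes one statement whose predicate is the namespace URI joined to the element name, for example `http://purl.org/dc/elements/1.1/title`. The XML markup is only the carrier; the three statements are what RDF actually says about the article.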
RDF is a solid foundation for adding metadata to content objects in the authoring phase. For example, the Internet browsers Mozilla and Netscape 6 use RDF. Another very good example is Adobe Acrobat 5, which uses RDF as its metadata language. This is part of Adobe’s XMP (Extensible Metadata Platform) framework, in turn the backbone of Adobe’s approach to network publishing. In Adobe’s own words: “XMP provides Adobe applications and partners with a common metadata framework that standardizes the creation, processing and interchange of document metadata across publishing workflows. XMP will be incorporated into all Adobe products eventually”.
An Overview of the Business Benefits
Some benefits arise from cost reduction, others from increased revenue. Significant competitive advantage may also be gained. A summary of possible business benefits is shown in the table below:
| Feature | Reduced costs | Increased revenue | Competitive advantage |
|---|---|---|---|
| Automation | Less labor intensive | Can handle more users / customers | Faster to the market |
| Flexible reuse | Little or no manual intervention | New products and services as the market evolves | Deliver new products and services sooner |
| Quality | Less time spent on administration; increased reliability | Satisfied users / customers | Better prices and reputation |
| Ease of use and implementation | Short implementation time and low costs | Deliver quality products and services with good prices | React quicker to market changes |
| Interoperability | Little or no manual intervention in incoming or outgoing content streams | Content syndication and repackaging of external sources for more comprehensive solutions | A lot easier to interoperate with partners’ and customers’ systems |
| Location and storage independence | Reduce machine, software and network bandwidth costs | Lower the bar for providing new solutions that are closer to real needs | React quickly to changes without having to wait for database conversions |
| Format and type independence | Reduce implementation costs | Take advantage of new formats and specific customer needs | Can support the formats that customers need, quickly |
| Open standards | Investment protection, easier to get specialist staff, software tools and training for standards-based products | Flexibility in choice of client interfaces, support any customer using standard products | Beat competitors, who are locked to proprietary architectures |
| Best of breed approach | Reduce software costs | Tailor your solution to market needs | Quickly integrate new infrastructure products |
| Scalability | Reduce IT investment | Grow with the business | Achieve large market shares, quickly |
Using Semantics To Increase Business Value
The author believes that semantic content management is particularly well suited to environments with one or more of the following characteristics:
· Business value of content is high
· A need for complex searches
· Rich content objects with many properties
· Dynamic content
· A need for real-time publication
· A need for multi-channel delivery (and/or capture)
· Many content provider sources
· High volume
· External content feed and/or delivery
Perhaps the most important aspect from a business perspective is the ability to ‘create once, publish anywhere’. For example, a provider of financial information can deliver chargeable subscriber services that allow professional clients to receive real-time information, which they can further distribute to several intranet and Internet sites, as well as to clients’ email and mobile devices. The same content could be delivered directly to end-users, and also indirectly, in partnership with large companies, for use on an intranet. Delivery of such content can:
· Reduce production costs
· Reduce deployment time of new service(s)
· Increase loyalty of end users
· Increase usage of a cellular network, and therefore revenue
· Offer high-quality services that attract new professional customers
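‘Create once, publish anywhere’ can be sketched as a single stored content item rendered for several channels. The item's fields (`headline`, `summary`, `body`), the sample values and the three channel formats are assumptions for illustration only; the 160-character limit is that of a single SMS using the GSM 7-bit alphabet.

```python
from html import escape
from textwrap import shorten

SMS_LIMIT = 160  # characters in one SMS using the GSM 7-bit alphabet


def render(item: dict, channel: str) -> str:
    """Render one stored content item for a given delivery channel."""
    if channel == "sms":
        text = f"{item['headline']}: {item['summary']}"
        # shorten() collapses whitespace and cuts at a word boundary.
        return shorten(text, width=SMS_LIMIT, placeholder=" ...")
    if channel == "email":
        return f"Subject: {item['headline']}\n\n{item['body']}"
    if channel == "web":
        return f"<h1>{escape(item['headline'])}</h1>\n<p>{escape(item['body'])}</p>"
    raise ValueError(f"unknown channel: {channel}")


# Invented sample item, for illustration only.
item = {
    "headline": "ACME Q3 results",
    "summary": "Revenue up 12% on strong export demand. " * 5,
    "body": "Full report: revenue <up> 12%.",
}
for channel in ("sms", "email", "web"):
    print(render(item, channel), end="\n---\n")
```

The same stored item produces a truncated SMS, a plain-text email and an HTML fragment; adding a new channel means adding a renderer, not re-authoring the content.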
Semantic content management offers enormous business benefits, both in terms of cost reductions and increased revenue/competitive advantage.
Only by acting accordingly and taking an interest in semantic technology will organizations be in a position to reap the financial rewards and benefit, indirectly, from reduced labor costs, reduced IT investments, shorter time to market for new products/services, increased quality of products and services and competitive pricing.
References
Scientific American, 2001, http://www.scientificamerican.com/2001/0501issue/0501berners-lee.html
Shapiro, C. and Varian, H.R., Information Rules, Harvard Business School Press, Cambridge Mass., 1999
Michaël Auffret is employed at Profium (http://www.profium.com/) and can be reached at michael.auffret@profium.com