CanCore: Metadata for Learning Object Repositories

Norm Friesen, University of Alberta (Canada)

Abstract

"Metadata," "repositories," and "learning objects" are terms used to describe a new vision for the widespread sharing and reuse of digital educational resources in classroom and distance education settings. The purpose of this paper is to provide an overview of this vision, focusing specifically on the central role of metadata in facilitating the discovery, reuse and management of these learning objects in public education contexts. This overview will begin by introducing the concepts of "learning objects" and the "metadata" that can be used to describe them. The paper will then focus on the Canadian Core Learning Resource Metadata Protocol (CanCore) as a ready-made metadata solution with the potential to provide single-click access to distributed educational resources. Finally, it will describe how CanCore can achieve this type of accessibility in the context of a variety of repository architectures and distribution models.

Keywords

Metadata, Learning Object, Repository, Educational Object, Web Gateway, CanCore, IMS


Introduction

The difficulties involved in the effective discovery, evaluation, use and reuse of Web resources in teaching and learning now form a familiar litany. Issues like non-searchable multimedia materials, content and quality controls, pedagogical effectiveness, and resource reusability have plagued the development and implementation of learning materials on the Web from its inception. The Canadian Core Learning Resource Metadata Protocol (CanCore), together with special repository systems and more formalized understandings of learning resources or "objects," represents a solution to these problems. This solution involves the reconceptualization of digital learning resources in general as smaller, modular learning objects, systematically described through standardized, searchable metadata, and collected and made seamlessly accessible to users through a system of linked repositories. This paper will provide an overview of this vision, focusing specifically on the central role of metadata and the utility of the CanCore Protocol.

What are Learning Objects?

As the combination of two important terms, the phrase "learning object" designates something that is at once an informational or interactive object and that also has an evident educational application. Derived from the world of object-oriented programming, the term "object" connotes a resource that is modular, reusable, and capable of being integrated with other objects. Although the literature differs in its exact understanding of the modularity and "combinability" (or interoperability) of these objects, it reflects general agreement on the importance of, and the rationale for, their reusability. Roschelle and Kaput (1996), for example, argue that conventional "stand-alone applications are incompatible with typical production, distribution, and usage patterns for educational software...." As a result, educators and developers, as a Cisco Systems white paper argues, "need to move from large, inflexible 'courses' to reusable, granular objects that can be accessed dynamically through a database", objects that could in principle be shared and reused by any number of users (Cisco, 1999) [HREF 3].

The word "learning" appearing in the term "learning object" implies that such an object cannot simply be a free-floating, decontextualized element of information or interactivity. Instead, to be properly educational, a learning object must have at least one specifiable educational purpose or context. This context can take the form of a lesson plan, a quantifiable outcome, or even a very specific conceptual and technical framework that places the object in a specified sequence or algorithmic "syntax." The more specific conceptual or technical frameworks used to define an object's pedagogical context seem to be of greatest interest to those involved in the design of commercial and military training programs. This approach is well represented in the works of David Merrill [HREF 4], David Wiley [HREF 5], and in systems and specifications being developed by industry players like Cisco Systems, the IMS Global Learning Consortium [HREF 6] and the SCORM initiative [HREF 7]. However, in the public education sector --the area on which this paper will focus-- understandings of the educational purpose and context associated with any learning object tend to be more general and flexible. Besides taking the form of lesson plans or suggestions for educational use of the object, these more general educational "contextualizations" can also be represented by student assignments or questions, or by end-user or peer reviews of the object. In these circumstances, further contextualization is often provided by specifying the learner level, and the curricular or disciplinary attributes that can be associated with the object. As a result, a particular object could conceivably have a number of such "contextualizations", lesson plans or reviews associated with it --each serving the purposes of a different grade level, curriculum, or instructional design.

When understood in these general and flexible terms, a learning object could conceivably take the form of any one of a number of digital media. These media can include Java applets, Flash animations, and audio and video clips; but they can also take the form of more exclusively "informational" materials like Web pages, Web sites, PDF documents, or PowerPoint presentations. Any of these resources could conceivably be used by teachers to augment classroom or online lessons, by students for remedial or independent study, by instructors or designers to construct online courses, or by administrators for purposes of curriculum coordination.

Finally, in discussing learning objects, it is important to mention the question of "granularity". This term refers to the size or the number of sub-components that combine to make a given learning object, and to the relative size of the object itself. For an object to have a specifiable educational purpose, its granularity must not be so "fine" that its educational purpose is unclear or endlessly variable. (For example, a photograph of historical significance could be used in learning or teaching the history of photography, the history of whatever subject matter it depicts, or the aesthetic achievement that the photograph itself might represent. But unless it is specified as being appropriate for at least one of these purposes, it would likely remain unassociated, unclassified and thus unused for all of them.) At the same time, the object should ideally not be so large (e.g., an entire course or program) that it has little potential to be adapted for different instructional contexts and purposes.

What is Metadata?

For learning objects to be effectively shared and reused, a user with a specific need (in a particular subject area, at a given learning level, and with a given learning style) must be matched with the learning object that would most effectively and efficiently meet that need. In addition, this object would in all likelihood have to be identified from within a large and varied collection of related resources distributed across many locations.

Metadata promises to accomplish this task by providing a controlled and systematic way of describing each object. In this sense, a collection of metadata elements, or record, describes and links to a learning object in much the same way as a library catalogue card categorizes and indicates the location of a book in a library. And because learning objects are themselves digital (i.e., composed of data rather than print), the record describing them is data about data, which is why it is known as "meta"-data. Like a library card, a metadata record covers a number of aspects of the resource that it describes. In both types of records, these aspects or parameters can include the resource's content, creators, and format, as well as its location, access rights and any administrative information that might be associated with it (see Figure 1, below).

Figure 1: A library card as an example of metadata.

It is also important to understand learning object metadata in the context of more recent developments connected with the World Wide Web and with information sharing standards in general. The Internet and the World Wide Web are exemplary instances of what information standards, when universally and consistently adopted, can accomplish. Anyone who remembers the days of exchanging (often mutually incompatible) files on floppy disks can attest to this. But even though the Internet has provided us with standards such as HTTP, FTP and HTML that greatly facilitate data sharing, it provides no reliable way to search this shared data, especially according to aspects that would be of direct relevance to educators (such as age level or media type). Anyone who has recently searched the Web using a general-purpose search engine, and sifted through the thousands of results of questionable value that such a search typically retrieves, will recognize this problem. The problem is further exacerbated by the expanding use of unannotated, bandwidth-intensive Web-based multimedia. Seen in this context, metadata is an attempt to build on existing Web standards in order to enhance the type of data sharing that these standards already so effectively facilitate. For it is only through structured, consistent and systematically descriptive metadata, and the powerful searching on multiple informational elements that it makes possible, that the efficient discovery, sharing and reuse of high-quality multimedia learning resources can become a reality.
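To make the contrast concrete, the following minimal Python sketch compares plain keyword matching with searching on several descriptive elements at once. The record fields (`title`, `age_range`, `media_type`, `location`) and the example.org URLs are hypothetical illustrations, not elements defined by any metadata standard.

```python
from typing import Any

# Hypothetical metadata records; field names are illustrative only.
RECORDS: list[dict[str, Any]] = [
    {"title": "Photosynthesis animation", "age_range": (12, 15),
     "media_type": "Flash animation", "location": "http://example.org/a"},
    {"title": "Photosynthesis lecture notes", "age_range": (18, 22),
     "media_type": "PDF document", "location": "http://example.org/b"},
    {"title": "Cell structure applet", "age_range": (12, 15),
     "media_type": "Java applet", "location": "http://example.org/c"},
]


def keyword_search(query: str) -> list[dict[str, Any]]:
    """Unstructured search: match the query text anywhere in the title."""
    q = query.lower()
    return [r for r in RECORDS if q in r["title"].lower()]


def fielded_search(query: str, age: int, media_type: str) -> list[dict[str, Any]]:
    """Structured search: narrow keyword hits by learner age and media type."""
    return [
        r for r in keyword_search(query)
        if r["age_range"][0] <= age <= r["age_range"][1]
        and r["media_type"] == media_type
    ]


if __name__ == "__main__":
    print(len(keyword_search("photosynthesis")))  # prints 2
    for r in fielded_search("photosynthesis", age=13, media_type="Flash animation"):
        print(r["title"], r["location"])
```

The keyword search returns both photosynthesis resources, including one pitched at university students; only the fielded search, which depends on consistently populated elements, can exclude it.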

Two metadata standards take notably different approaches to defining and structuring learning resource descriptions: the Dublin Core [HREF 8] and the IMS Learning Resource Meta-data Information Model (or simply, "IMS") [HREF 9]. The Dublin Core takes a "minimalist" approach to metadata definition, and identifies 15 core attributes or "elements" for the description of learning objects and digital resources in general. These include aspects common to a wide variety of digital resources, such as title, creator, subject, description, publisher, contributor, date, and language. However, this "core" does not include any elements identified specifically for the description of pedagogical aspects of learning objects. (Dublin Core has since provisionally proposed six educational "extension" elements. However, these are only at the proposal stage, and a number of them simply endorse the use of elements already identified in the IMS standard.)
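The minimalism of Dublin Core is easiest to see by listing its element set. The sketch below encodes the 15 elements of the Dublin Core Metadata Element Set (version 1.1) and flags any field outside it. The `learning_level` field in the usage example is a hypothetical pedagogical attribute, included to show that such an attribute has no home in the core set.

```python
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DUBLIN_CORE_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def non_dublin_core_fields(record: dict[str, object]) -> list[str]:
    """Return, sorted, the field names in `record` that are not Dublin Core elements.

    All Dublin Core elements are optional and repeatable, so this check only
    flags unknown names; it never requires a particular element to be present.
    """
    return sorted(name for name in record if name not in DUBLIN_CORE_ELEMENTS)


if __name__ == "__main__":
    record = {
        "title": "Introduction to Fractions",
        "creator": ["A. Teacher"],       # repeatable element, so a list is allowed
        "language": "en",
        "learning_level": "grade 4",     # hypothetical pedagogical field
    }
    print(non_dublin_core_fields(record))  # prints ['learning_level']
```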

The IMS "Learning Resource Meta-data Information Model," on the other hand, takes a "structuralist" approach to metadata, and provides elements describing an exhaustive set of characteristics that a digital learning resource might manifest. The model consists of about 80 elements, structured hierarchically in four levels to form nine main groups and 16 sub-groups. Due in large part to the number of major industry players supporting it, the IMS standard has been widely recognized as the leading metadata solution for describing learning objects, and is being used in international repository efforts like MERLOT [HREF 10] and ARIADNE [HREF 11], as well as in the U.S. Department of Defense SCORM initiative. However, it is also widely recognized that the IMS specification has created difficulties for implementers, and these difficulties are usually attributed precisely to its descriptive specificity and sophistication. As the IMS itself admits,

Many vendors [have] expressed little or no interest in developing products that [are] required to support a set of meta-data with over 80 elements. …Most have existing products that they hope could support a minimum baseline of elements that the learning resource community would agree to be essential. They also want to be able to make marketing statements such as "IEEE/IMS meta-data conforming document." (IMS, 2000) [HREF 12].

Unfortunately, as of the latest (1.2) version of its metadata information model, IMS has not satisfactorily addressed these concerns (IMS, 2001a) [HREF 13]. Conformance (beyond the level of mere formatting validation) is still not clearly defined. Exacerbating the difficulties associated with implementing this specification is the fact that IMS provides only very brief and sometimes confusing descriptions of the purpose and character of its numerous metadata elements. For example, the element labeled "1.3 Catalog Entry" is described in IMS documentation only as the "designation given to the resource", and "5.4 Semantic Density" is characterized confusingly as a "subjective measure of the learning object's usefulness as compared to its size or duration" (IMS, 2001b) [HREF 14]. Deciding whether to use such elements, and deciphering what their intended purpose might be, is no small task. As a result, implementing the IMS metadata element set is necessarily a complex, resource-intensive undertaking, requiring elements to be chosen, interpreted, used, and then possibly reinterpreted by each group or individual collecting or developing resources. Varying implementations of this element set, moreover, threaten to create problems for the effective searching and exchange of metadata records between projects and jurisdictions.

Consequently, the IMS metadata specification --despite its dominance-- does not represent a "ready-made" metadata solution. Vendors, repositories, and developers cannot yet claim that their objects or collections are "in conformance with IEEE/IMS meta-data" with a clear sense of what that claim might actually mean.

CanCore Metadata

The Canadian Core Learning Resource Metadata Protocol (CanCore) [HREF 15] represents a streamlined and thoroughly explicated version of the IMS element set --a metadata specification, in other words, that is ready for implementation and developer conformance. CanCore has established a core of 36 IMS elements, in effect presenting a third way between the extremes of minimalist and structuralist approaches to metadata represented by Dublin Core and IMS. The CanCore element set is explicitly based on the elements and the hierarchical structure of the IMS specification, but it greatly reduces its complexity and ambiguity. CanCore consists of 8 main categories, 15 "placeholder" elements that designate sub-categories, and 36 "active" elements for which data can be actively supplied in the process of creating a metadata record.

The CanCore protocol includes eight of the nine main categories in the IMS standard (omitting only Annotation): General, Lifecycle, Metametadata, Technical, Educational, Rights, Relation and Classification. Each category and the elements it contains can be briefly described as follows. The first category, General, describes "context-independent features of the learning object"; in CanCore this category includes seven active elements, among them title, language, coverage, and an element for a full-text description of the resource's content. The second category, Lifecycle, uses four active elements to describe the circumstances of the object's development, including the names of its developers (and other contributors), the date of its creation, and publication and version information. Elements in the Metametadata category describe the metadata record itself, identifying those who developed or validated the record, the natural language of the record, and the date it was created or validated. The Technical and Educational categories use five active elements each to designate (among other things) the object's technical format, size, location and requirements, as well as its educational type, context, and age range. (CanCore also provides a simplified vocabulary for the educational context for which an object is suitable.) The Rights and Relation categories employ three active elements each to describe the terms and conditions for use of the learning object and its relation to other resources. Classification, the last category, consists of four active elements that can be adapted to almost any classification purpose or vocabulary, regardless of the type or aspect of the object that the vocabulary describes. As one suggested application of this "catch-all" category, CanCore provides a classification and vocabulary for granularity (or pedagogic type) that designates the object as a "program", "course", "unit", "lesson" or "component".
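The element counts above can be cross-checked with simple arithmetic. The text gives active-element counts for every category except Metametadata; assuming those counts are exhaustive, Metametadata accounts for the remaining 36 - 31 = 5 active elements. That inferred figure is an assumption, not a number stated by the protocol. The sketch also encodes the suggested granularity vocabulary as a controlled list against which values can be validated.

```python
# Active-element counts per CanCore category, as given in the text.
STATED_COUNTS = {
    "General": 7, "Lifecycle": 4, "Technical": 5, "Educational": 5,
    "Rights": 3, "Relation": 3, "Classification": 4,
}
TOTAL_ACTIVE = 36

# Assumption: Metametadata holds whatever remains of the 36 active elements.
COUNTS = dict(STATED_COUNTS, Metametadata=TOTAL_ACTIVE - sum(STATED_COUNTS.values()))

# CanCore's suggested granularity (pedagogic type) vocabulary, in the order given.
GRANULARITY = ("program", "course", "unit", "lesson", "component")


def is_valid_granularity(value: str) -> bool:
    """Accept a Classification value only if it is in the controlled vocabulary."""
    return value.strip().lower() in GRANULARITY


if __name__ == "__main__":
    print(COUNTS["Metametadata"])                  # prints 5
    print(len(COUNTS), sum(COUNTS.values()))       # prints 8 36
    print(is_valid_granularity("Lesson"), is_valid_granularity("module"))  # prints True False
```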
(See [HREF 16] for more information about the metadata elements included in the CanCore Protocol.)

The simplifications and interpretations provided in CanCore already save users the task of selecting and coordinating the use of metadata elements to achieve a basic level of interoperability. In this way, CanCore has already realized considerable economies of scale for its users. It has already worked to prevent redundant or inconsistent implementation efforts, and to ensure that educational metadata and resources can be shared easily among its users and with IMS implementations internationally. As of the writing of this paper, the CanCore Protocol is serving these ends in the context of a number of national projects that are currently developing learning object repository services. These include the CANARIE-supported "Portal for Online Objects for Learning" (POOL) [HREF 17] and "Broadband Enabled Lifelong Learning Environment" (BELLE) [HREF 18] projects, as well as the Alberta Learning Portal. Funding and support for the development of the CanCore Protocol have been provided through these projects, as well as by the Netera Alliance, TeleCampus.edu, the CAREO Project, and the Electronic Text Centre at the University of New Brunswick.

To ensure that further coordination and economies of scale can be realized, CanCore is developing a comprehensive guidelines document. This document is based on two existing implementation documents or application profiles (see: [HREF 19]), and will provide interpretations for the precise meaning and use of each element included in CanCore. In addition, CanCore is planning to provide training and further vocabulary recommendations, as well as other support and coordination services. Together, these products and services promise to realize further economies of scale, and to make possible an even higher level of interoperability between vendors, developers and repository efforts. In this way, CanCore has added and will add considerable value to the IMS standard, allowing developers and vendors to make clear and confident marketing claims about conformance to CanCore metadata specifications.

Repository Architecture & Metadata

To understand how CanCore can deliver the seamless, single-click access it promises, it is important to envision how CanCore-compliant records will be integrated into a number of repository architectures and distribution models. These architectures and models fall into two paired classes: a centralized architecture with a commercial or for-profit distribution model on the one hand, and a distributed architecture with a public or open distribution model on the other.

Typically, a centralized repository architecture combines both learning resources and the metadata describing them in the same location, often on the same server, but at least subject to the same central administration. This centralization provides the potential for exercising significant control over the distribution and availability of the learning objects housed in such a repository. As a result, a centralized architecture would likely be most suitable for vendors of commercial content, such as textbook publishers who wish to control access to their educational resources or assets. It is important to note, however, that this control and management would take place in a layer that is separate from the metadata. Such control and management "layers" or systems are available from third-party providers. The CanCore Protocol, like the IMS specification, does however provide data elements that indicate the presence and nature of such rights controls, and that describe the commercial status and even the price of the object under consideration.
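One way to picture this separation is a repository object whose metadata is always readable, while retrieving the asset itself passes through a separate access check. This is a minimal sketch; the class, its method names, and the simple licensing rule are hypothetical stand-ins for the third-party rights management systems mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class CentralizedRepository:
    """Assets and metadata under one administration (hypothetical sketch)."""
    metadata: dict[str, dict[str, str]] = field(default_factory=dict)
    assets: dict[str, bytes] = field(default_factory=dict)
    licences: dict[str, set[str]] = field(default_factory=dict)  # object id -> licensed users

    def add(self, object_id: str, record: dict[str, str], asset: bytes) -> None:
        self.metadata[object_id] = record
        self.assets[object_id] = asset
        self.licences.setdefault(object_id, set())

    def grant(self, object_id: str, user: str) -> None:
        """Stand-in for a separate rights-management layer."""
        self.licences[object_id].add(user)

    def get_metadata(self, object_id: str) -> dict[str, str]:
        # The description is never restricted, so other repositories can harvest it.
        return self.metadata[object_id]

    def get_asset(self, object_id: str, user: str) -> bytes:
        # Only the asset itself is subject to the access check.
        if user not in self.licences.get(object_id, set()):
            raise PermissionError(f"{user} has no licence for {object_id}")
        return self.assets[object_id]
```

Anyone may call `get_metadata`, but `get_asset` raises `PermissionError` until `grant` has been called for that user. Keeping the check out of the metadata path is what lets the records circulate freely while the assets stay controlled.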

For CanCore to make seamless search and retrieval a reality in the context of this centralized architecture, it is essential that the metadata records describing the educational assets are themselves not subject to any access or rights restrictions. These metadata records --unlike the assets themselves-- must be made openly available in an interchangeable format to other repositories or collections of similar metadata records. Such a format for interchange is provided by XML (eXtensible Markup Language). CanCore will be providing IMS-compliant specifications for creating and verifying such interchangeable XML records. These interchangeable descriptions can then be seen to serve as digital "calling cards" for the commercial assets they describe, exposing the products to wider markets and the users of any number of learning object repositories.
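As a rough illustration of XML interchange, the sketch below serializes a two-level record (category, then element) with Python's standard `xml.etree.ElementTree` module and parses it back, so that a receiving repository recovers the same description. The tag names are illustrative lowercase versions of category names discussed earlier; they are assumptions and are not validated against the IMS XML binding or any CanCore schema.

```python
import xml.etree.ElementTree as ET

Record = dict[str, dict[str, str]]  # category -> element -> value


def record_to_xml(record: Record) -> str:
    """Serialize a {category: {element: value}} record as an XML string."""
    root = ET.Element("record")
    for category, elements in record.items():
        category_el = ET.SubElement(root, category)
        for name, value in elements.items():
            ET.SubElement(category_el, name).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text: str) -> Record:
    """Parse XML produced by record_to_xml back into the same nested dict."""
    root = ET.fromstring(xml_text)
    return {cat.tag: {el.tag: el.text or "" for el in cat} for cat in root}


if __name__ == "__main__":
    record = {
        "general": {"title": "Photosynthesis animation", "language": "en"},
        "technical": {"format": "application/x-shockwave-flash",
                      "location": "http://example.org/a"},
    }
    xml_text = record_to_xml(record)
    print(xml_text)
    assert xml_to_record(xml_text) == record  # the round trip preserves the record
```

A real exchange format would also need a shared schema and namespace so that independently built repositories agree on tag names; that agreement is precisely what a common metadata specification supplies.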

A distributed architecture provides an alternate method of organizing learning objects that seems especially well-suited to the interests of public sector repository projects, and to larger collaborative efforts. Just as commercial content providers have been deploying distribution control mechanisms for their products, public sector educators --taking their cue from the academic research and open source communities-- have been implementing practices for openly sharing and developing high-quality educational resources. For example, the MERLOT project has adapted the practices of peer review of faculty research as the basis for encouraging the development, evaluation and reuse of learning objects for postsecondary teaching and learning. At the same time, in its highly publicized OpenCourseWare initiative, the Massachusetts Institute of Technology has taken the open source software development model as the basis for making many of its own course materials freely available on the Web for public use and collaborative development --all with the ultimate expectation that the initiative "will raise the tide of educational innovation within MIT and elsewhere" (MIT, 2001) [HREF 20].

Germane to these types of open, collaborative approaches would be a repository architecture that is itself distributed and cross-institutional. The efforts invested in the development, implementation and review of the objects would likely be scattered across many institutions. In the same way, the objects themselves would be distributed across the Web, in locations provided most likely by their developers or supporters, or by their sponsoring institutions. The metadata describing these resources, however, would remain centralized for fast and effective searching, sharing and control. This would allow the resources to be updated and otherwise maintained by their owners while the metadata describing them is shared and searched in one place. This type of mechanism (perhaps more accurately described as a Web gateway than a repository; see the DESIRE Information Gateways Handbook [HREF 21]) is being developed by the BELLE and POOL projects (using the CanCore scheme itself), and has been implemented in MERLOT (using a CanCore-friendly IMS interpretation).
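The gateway arrangement can be sketched as a central index that stores only descriptions and pointers. In this hypothetical example each resource stays on its owner's server, and when the owner moves it, only the `location` in the central entry changes. The class and field names are assumptions for illustration, and keywords are assumed to be stored in lowercase.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Entry:
    identifier: str
    title: str
    keywords: frozenset[str]  # assumed to be stored in lowercase
    location: str             # URL on the owner's server; the gateway never stores the resource


class Gateway:
    """Central metadata index over resources distributed across the Web."""

    def __init__(self) -> None:
        self._index: dict[str, Entry] = {}

    def register(self, entry: Entry) -> None:
        self._index[entry.identifier] = entry

    def relocate(self, identifier: str, new_location: str) -> None:
        # Owners maintain the resource; the gateway only updates its pointer.
        self._index[identifier] = replace(self._index[identifier], location=new_location)

    def search(self, keyword: str) -> list[str]:
        """Return the sorted locations of all resources tagged with `keyword`."""
        kw = keyword.lower()
        return sorted(e.location for e in self._index.values() if kw in e.keywords)
```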

As in the case of a centralized architecture, in a distributed repository model, metadata records describing the resources are in every instance freely available for interchange. Whereas the records created in a centralized, commercial model would typically be generated by the agency owning the assets, the records in an "open source" distributed repository would likely be created by those individuals who develop or contribute objects. Because these individuals would not likely be trained in classification or indexing, support documents and quality control procedures would have to be provided for the development of this metadata. Both the processes of quality control and the creation of records by non-specialists would be facilitated by the relatively small number and simplicity of elements available in CanCore, and by the explication and interpretation CanCore provides. Again, these metadata records would serve as a type of "calling card" for the resource, providing the original developer with favourable exposure to a community of peers, and allowing any resource to be used, reused and potentially improved.
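Some of this quality control can be automated. The sketch below checks a contributor's record for empty required fields and for granularity values outside the CanCore vocabulary described earlier. The list of required fields is a hypothetical repository policy, not a CanCore requirement.

```python
# Hypothetical policy: fields a repository might require from contributors.
REQUIRED_FIELDS = ("title", "description", "language", "location")
# Granularity vocabulary suggested by CanCore.
GRANULARITY = ("program", "course", "unit", "lesson", "component")


def quality_check(record: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the record passes."""
    problems = [
        f"missing {name}" for name in REQUIRED_FIELDS
        if not record.get(name, "").strip()
    ]
    granularity = record.get("granularity")
    if granularity is not None and granularity.strip().lower() not in GRANULARITY:
        problems.append(f"unknown granularity {granularity!r}")
    return problems
```

Checks like these catch only mechanical errors; judging whether a description is accurate or an age range is sensible still requires human review.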

The free interchange of these records among commercial and public repositories opens the way for a third type of repository approach --one that is derivative of the distributed model outlined above. This type of repository can be characterized broadly as "a repository of repositories". With metadata records both freely available and exchangeable, it is possible to envision a collection of metadata that would actually be independent of any collection of assets or resources. In this case, these records would refer to resources available in any number of other centralized or distributed collections, and would provide the user with a single place for searching and accessing these resources, regardless of their point of origin. When a user searches such a central metadata store, she would be able to get single-click access to resources that are otherwise scattered across the Web, in proprietary and public databases that may have a variety of access protocols and paths. When combined with customizable news, community-building features and other services, the aggregation of material presented in such a context would truly earn it the title of "portal": a starting point for educational users of the Web.
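A portal of this kind can be sketched as a store that harvests freely exchangeable records from several repositories and searches them in one place. The identifiers, the URLs, and the rule that the first record seen for a given identifier wins are all assumptions made for this example; a real portal would need an explicit policy for conflicting records.

```python
Record = dict[str, str]


def aggregate(*sources: list[Record]) -> dict[str, Record]:
    """Merge records from several repositories, keyed by identifier.

    When two sources describe the same identifier, the first one seen is kept.
    """
    store: dict[str, Record] = {}
    for source in sources:
        for record in source:
            store.setdefault(record["identifier"], record)
    return store


def portal_search(store: dict[str, Record], term: str) -> list[str]:
    """Single point of search: return sorted locations whose titles contain `term`."""
    term = term.lower()
    return sorted(r["location"] for r in store.values() if term in r["title"].lower())


if __name__ == "__main__":
    publisher = [{"identifier": "pub-001", "title": "Algebra Unit 3",
                  "location": "http://publisher.example/algebra3"}]
    public = [{"identifier": "pool-17", "title": "Algebra tiles applet",
               "location": "http://pool.example/tiles"},
              {"identifier": "pub-001", "title": "Algebra Unit 3 (copy)",
               "location": "http://mirror.example/algebra3"}]
    store = aggregate(publisher, public)
    print(portal_search(store, "algebra"))
```

The user sees one result list, while each link still leads to the commercial or public collection where the resource actually resides.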

Figure 2: Learning Object Portal Architecture

A diagram (Figure 2) schematizing these repository types and their possible interrelation is provided above. An effective way of understanding the operation and efficacy of these types of repositories is, again, provided by practices that are commonplace in the library world --namely, in the form of a "union" catalogue, and of similar information resource sharing networks. A union or consortium library catalogue represents an aggregation of interchangeable records describing resources that are available across an entire library system or consortium. Such a catalogue would be the functional equivalent of a "portal" or "repository of repositories" described above. Moreover, because the creation of these standard and interchangeable records in the library world is recognized to be a very labor-intensive process, libraries rarely create their own cataloguing records from scratch; instead, they have developed a network where these records are shared. Whenever a library receives a resource for which there is no record, the library will create a new one and add it to the shared record pool, making it available to others only for the price of participating in the network. Similarly, the effective sharing of metadata records among repositories could realize considerable efficiencies, save effort, ensure accuracy, and greatly increase access for users and exposure for vendors or developers.
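The shared-pool practice described above (often called copy cataloguing) reduces to a simple rule: reuse an existing record if the pool already has one, and otherwise create one and contribute it back. The following is a minimal sketch; the function name and the dictionary used as a pool are assumptions for illustration.

```python
from typing import Callable

Record = dict[str, str]


def catalogue(resource_id: str, shared_pool: dict[str, Record],
              create_record: Callable[[str], Record]) -> Record:
    """Return a record for resource_id, creating and sharing it only if none exists."""
    if resource_id in shared_pool:
        return shared_pool[resource_id]      # reuse another member's work
    record = create_record(resource_id)      # the costly, labor-intensive step
    shared_pool[resource_id] = record        # contribute it back to the pool
    return record
```

If two repositories catalogue the same resource, `create_record` runs only once; the second repository simply receives the shared record.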

Conclusion

Interoperable metadata, consistently and systematically implemented, is one of the lynchpins in achieving the vision of easy access to shared and reusable learning objects. The CanCore Protocol promises to provide this crucial functional element for those using it in their implementations. As has hopefully become apparent in this paper, the interest of CanCore is not to compete with or supplant other specification efforts. Instead, its goal is to add value strategically to the widely accepted but difficult metadata model put forward by the IMS Consortium. CanCore adds value to this model by simplifying and refining it, by developing vocabularies suitable to a number of educational sectors, and also by providing interpretation and support in the form of guidelines and other services. In this way, CanCore is attempting to position itself not as an "authoritative" metadata solution, but rather as the solution that is the easiest to implement, and potentially the most strategically attractive to developers, administrators and educators alike.

References

IMS Global Learning Consortium. (2000). IMS Learning Resource Meta-data Best Practices and Implementation Guide. [HREF 23].

IMS Global Learning Consortium. (2001a). IMS Learning Resource Meta-data Best Practice and Implementation Guide. [HREF 22].

IMS Global Learning Consortium. (2001b). IMS Learning Resource Meta-data Information Model. [HREF 24].

MIT News Office. (2001). MIT to make nearly all course materials available free on the World Wide Web. [HREF 25].

Roschelle, J., & Kaput, J. (1996). Educational Software Architecture and Systemic Impact: The Promise of Component Software. Journal of Educational Computing Research, 14(3), 217-228.

Wieseler, W., Katzman, J., Larsen, J., & Caton, J. (1999). RIO: A Standards-based Approach for Reusable Information Objects. [HREF 26].

Hypertext References

HREF 1 http://www.ualberta.ca/~nfriesen/norm

HREF 2 http://www.careo.org/

HREF 3 http://www.cisco.com/warp/public/10/wwtraining/elearning/learn/whitepaper_docs/rlo_strategy_v3-1.pdf

HREF 4 http://www.id2.usu.edu/Papers/Contents.html

HREF 5 http://www.reusability.org/

HREF 6 http://www.imsproject.org/

HREF 7 http://www.adlnet.org/Scorm/scorm_index.cfm

HREF 8 http://dublincore.org/

HREF 9 http://www.imsproject.org/metadata/

HREF 10 http://www.merlot.org/

HREF 11 http://ariadne.unil.ch/

HREF 12 http://www.imsproject.com/metadata/mdbestv1p1.html

HREF 13 http://www.imsproject.org/metadata/ims_md_bestv1p2.html

HREF 14 http://www.imsproject.org/metadata/ims_md_infov1p2.html

HREF 15 http://www.cancore.org/

HREF 16 http://www.cancore.org/schema.html

HREF 17 http://www.newmic.com/pool/

HREF 18 http://www.netera.ca/belle/

HREF 19 http://www.cancore.ca/documents

HREF 20 http://web.mit.edu/newsoffice/nr/2001/ocw.html

HREF 21 http://www.desire.org/handbook/

HREF 22 http://www.imsproject.org/metadata/ims_md_bestv1p2.html

HREF 23 http://www.imsproject.com/metadata/mdbestv1p1.html

HREF 24 http://www.imsproject.org/metadata/ims_md_infov1p2.html

HREF 25 http://web.mit.edu/newsoffice/nr/2001/ocw.html

HREF 26 http://www.cisco.com/warp/public/10/wwtraining/elearning/learn/whitepaper_docs/rlo_strategy_v3-1.pdf

© Copyright 2001. The author, Norm Friesen, assigns to the University of New Brunswick and other educational and non-profit institutions a non-exclusive license to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The author also grants a non-exclusive license to the University of New Brunswick to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers, and for the document to be published on mirrors on the World Wide Web. Any other usage is prohibited without the express permission of the author.