Wednesday 15 May 2013

CS9223 ADVANCED SYSTEM SOFTWARE NOTES




1.      What is meant by disabling interrupt mechanism?

"Disabling interrupts, another mechanism that achieves mutual exclusion, is a mechanism where a process disables interrupts before entering the critical section and enables the interrupt immediately after exiting the critical section. Mutual exclusions is achieved because a process is not interrupted during the execution of its critical section and thus excludes all other processes from entering their critical section. The problems with this method are that it is applicable to only uniprocessor systems and important input-output events may be mishandled."

2.      What are the differences between livelock and deadlock?
A livelock is similar to a deadlock, except that the states of the processes involved in the livelock constantly change with regard to one another, none progressing. Livelock is a special case of resource starvation; the general definition only states that a specific process is not progressing.
A real-world example of livelock occurs when two people meet in a narrow corridor, and each tries to be polite by moving aside to let the other pass, but they end up swaying from side to side without making any progress because they both repeatedly move the same way at the same time.
Livelock is a risk with some algorithms that detect and recover from deadlock. If more than one process takes action, the deadlock detection algorithm can be repeatedly triggered. This can be avoided by ensuring that only one process (chosen randomly or by priority) takes action.
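The corridor scenario can be sketched as a deterministic toy simulation (an illustrative sketch, not from the notes; the class and method names are made up):

```java
// Deterministic sketch of the corridor livelock: both walkers always
// step to the same side at the same time, so neither ever passes.
public class Corridor {
    static int simulate(int maxRounds) {
        int a = 0, b = 0;             // lane occupied by each walker (0 = left, 1 = right)
        for (int round = 1; round <= maxRounds; round++) {
            if (a != b) return round; // different lanes: they pass each other
            a = 1 - a;                // both politely step aside...
            b = 1 - b;                // ...to the same side, simultaneously
        }
        return -1;                    // livelock: states kept changing, no progress
    }

    public static void main(String[] args) {
        System.out.println(Corridor.simulate(1000));  // prints -1: livelock
    }
}
```

Both processes keep changing state (unlike a deadlock, where they are frozen), yet neither makes progress; breaking the symmetry, e.g. by randomizing which walker steps aside, would resolve it.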

Deadlock is a situation in which two processes, each holding a lock on one piece of data, attempt to acquire a lock on the other's piece. Each process would wait indefinitely for the other to release its lock unless one of the processes is terminated. SQL Server detects deadlocks and terminates one user's process.

A livelock is one where a request for an exclusive lock is repeatedly denied because a series of overlapping shared locks keeps interfering. SQL Server detects this situation after four denials and refuses further shared locks. A livelock also occurs when read transactions monopolize a table or page, forcing a write transaction to wait indefinitely.

3.      What are the two major differences between deadlock and starvation?

Deadlock:
Two processes are said to be in a deadlock situation if process A is holding resources required by process B while process B is holding resources required by process A.

Starvation:

This mostly happens in time-sharing systems: a process that needs only a small time slot waits for a large process to finish and release its resources, but the large process holds the resources for a long time (potentially forever), so the small process keeps waiting. Such a situation is starvation for the small process.


The previous four chapters of this book gave a broad overview of Java's architecture. They showed how the Java virtual machine fits into the overall architecture relative to other components such as the language and API. The remainder of this book will focus more narrowly on the Java virtual machine. This chapter gives an overview of the Java virtual machine's internal architecture.
The Java virtual machine is called "virtual" because it is an abstract computer defined by a specification. To run a Java program, you need a concrete implementation of the abstract specification. This chapter describes primarily the abstract specification of the Java virtual machine. To illustrate the abstract definition of certain features, however, this chapter also discusses various ways in which those features could be implemented.

What is a Java Virtual Machine?

To understand the Java virtual machine you must first be aware that you may be talking about any of three different things when you say "Java virtual machine." You may be speaking of:
  • the abstract specification,
  • a concrete implementation, or
  • a runtime instance.
The abstract specification is a concept, described in detail in the book: The Java Virtual Machine Specification, by Tim Lindholm and Frank Yellin. Concrete implementations, which exist on many platforms and come from many vendors, are either all software or a combination of hardware and software. A runtime instance hosts a single running Java application.
Each Java application runs inside a runtime instance of some concrete implementation of the abstract specification of the Java virtual machine. In this book, the term "Java virtual machine" is used in all three of these senses. Where the intended sense is not clear from the context, one of the terms "specification," "implementation," or "instance" is added to the term "Java virtual machine".

The Lifetime of a Java Virtual Machine

A runtime instance of the Java virtual machine has a clear mission in life: to run one Java application. When a Java application starts, a runtime instance is born. When the application completes, the instance dies. If you start three Java applications at the same time, on the same computer, using the same concrete implementation, you'll get three Java virtual machine instances. Each Java application runs inside its own Java virtual machine.
A Java virtual machine instance starts running its solitary application by invoking the main() method of some initial class. The main() method must be public, static, return void, and accept one parameter: a String array. Any class with such a main() method can be used as the starting point for a Java application.
For example, consider an application that prints out its command line arguments:
// On CD-ROM in file jvm/ex1/Echo.java
class Echo {
 
    public static void main(String[] args) {
        int len = args.length;
        for (int i = 0; i < len; ++i) {
            System.out.print(args[i] + " ");
        }
        System.out.println();
    }
}
You must in some implementation-dependent way give a Java virtual machine the name of the initial class that has the main() method that will start the entire application. One real-world example of a Java virtual machine implementation is the java program from Sun's Java 2 SDK. If you wanted to run the Echo application using Sun's java on Windows 98, for example, you would type in a command such as:
java Echo Greetings, Planet.
The first word in the command, "java," indicates that the Java virtual machine from Sun's Java 2 SDK should be run by the operating system. The second word, "Echo," is the name of the initial class. Echo must have a public static method named main() that returns void and takes a String array as its only parameter. The subsequent words, "Greetings, Planet.," are the command line arguments for the application. These are passed to the main() method in the String array in the order in which they appear on the command line. So, for the previous example, the contents of the String array passed to main in Echo are: args[0] is "Greetings," and args[1] is "Planet."
The main() method of an application's initial class serves as the starting point for that application's initial thread. The initial thread can in turn fire off other threads.
Inside the Java virtual machine, threads come in two flavors: daemon and non-daemon. A daemon thread is ordinarily a thread used by the virtual machine itself, such as a thread that performs garbage collection. The application, however, can mark any threads it creates as daemon threads. The initial thread of an application--the one that begins at main()--is a non-daemon thread.
A Java application continues to execute (the virtual machine instance continues to live) as long as any non-daemon threads are still running. When all non-daemon threads of a Java application terminate, the virtual machine instance will exit. If permitted by the security manager, the application can also cause its own demise by invoking the exit() method of class Runtime or System.
In the previous Echo application, the main() method doesn't invoke any other threads. After it prints out the command line arguments, main() returns. This terminates the application's only non-daemon thread, which causes the virtual machine instance to exit.
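The daemon rule can be demonstrated with a short sketch (the class name and the housekeeping task are hypothetical; setDaemon() and isDaemon() are the real java.lang.Thread methods):

```java
// Sketch: a daemon thread does not keep the JVM alive. When main() (the
// initial non-daemon thread) returns, the instance exits even though the
// daemon's infinite loop never finishes.
public class DaemonDemo {
    public static void main(String[] args) {
        Thread housekeeping = new Thread(() -> {
            while (true) {
                try { Thread.sleep(100); } catch (InterruptedException e) { return; }
            }
        });
        housekeeping.setDaemon(true);   // must be set before start()
        housekeeping.start();
        System.out.println(housekeeping.isDaemon());  // prints true
        // main() returns here; with only daemon threads left, the VM exits.
    }
}
```

Without the setDaemon(true) call, the program would hang forever, because a running non-daemon thread keeps the virtual machine instance alive.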

Java Virtual Machine (JVM) architecture
If you are starting to learn Java, you will want to learn the full details of how the JVM really functions. In this post, I'm going to explain the JVM (Java Virtual Machine) architecture.

JVM has various subcomponents internally; each of them is described below.
1. Class loader sub system: JVM's class loader sub system performs 3 tasks
      a. It loads .class file into memory.
      b. It verifies byte code instructions.
      c. It allots memory required for the program.

2. Run time data area: This is the memory resource used by JVM and it is divided into 5 parts
      a. Method area: Method area stores class code and Method code.
      b. Heap: Objects are created on heap.
      c. Java stacks: Java stacks are the places where the Java methods are executed. A Java stack contains frames. On each frame, a separate method is executed.
      d. Program counter registers: The program counter registers store memory address of the instruction to be executed by the micro processor.
      e. Native method stacks: The native method stacks are places where native methods (for example, C language programs) are executed. Native method is a function, which is written in another language other than Java.

3. Native method interface: Native method interface is a program that connects native methods libraries (C header files) with JVM for executing native methods.
4. Native method library: holds the native libraries information.
5. Execution engine: The execution engine contains an interpreter and a JIT compiler, which convert byte code into machine code. The JVM uses profiling to decide which parts of the code to interpret and which parts to compile with the JIT compiler. A "hot spot" is a frequently executed block of code that the JIT compiler compiles to native code.
The above information is just a very basic guide to what the JVM consists of. Please refer to the book "Inside the Java Virtual Machine" to learn about the JVM in more detail.
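The "one frame per executing method" idea behind the Java stacks (item 2c above) can be observed from inside a program. This is a minimal sketch using the standard Thread.currentThread().getStackTrace() API; the class and method names are made up:

```java
// Sketch: each method invocation gets its own frame on the Java stack.
// Thread.getStackTrace() exposes one StackTraceElement per frame.
public class Frames {
    static int inner() {
        // Frames (top first): inner, then its callers, down to the VM entry frames
        return Thread.currentThread().getStackTrace().length;
    }
    static int outer() { return inner(); }

    public static void main(String[] args) {
        int withOuter = outer();   // inner's frame count, called via outer
        int direct = inner();      // inner's frame count, called directly
        System.out.println(withOuter - direct);  // prints 1: one extra frame for outer
    }
}
```

Each extra level of method call adds exactly one frame, which is also why unbounded recursion eventually raises StackOverflowError.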
 


Memory Management
• Program data is stored in memory.
– Memory is a finite resource: programs may need to reuse some of it.
• Most programming languages provide two means of structuring data stored in memory:
Stack: memory space (stack frames) for storing data local to a function body.
– The programming language provides facilities for automatically managing stack-allocated data. (i.e. compiler emits code for allocating/freeing stack frames)
– (Aside: Unsafe languages like C/C++ don’t enforce the stack invariant, which leads to bugs that can be exploited for code injection attacks…)
Heap: memory space for storing data that is created by a function but needed in a caller. (Its lifetime is unknown at compile time.)
– Freeing/reusing this memory can be up to the programmer (C/C++)
– (Aside: Freeing memory twice or never freeing it also leads to many bugs in C/C++ programs…)
Garbage collection automates memory management for Java/ML/C#/etc.

UNIX MEMORY LAYOUT
EXPLICIT MEMORY MANAGEMENT
• On unix, libc provides a library that allows programmers to manage the heap:
void * malloc(size_t n)
– Allocates n bytes of storage on the heap and returns its address.
void free(void *addr)
– Releases the memory previously allocated by malloc at address addr.
• These are user-level library functions. Internally, malloc uses brk (or sbrk) system calls to have the kernel allocate space to the process.

Simple Implementation : Free Lists
• Arrange the blocks of unused memory in a free list.
– Each block has a pointer to the next free block.
– Each block keeps track of its size. (Stored before & after the data parts.)
– Each block has a status flag = allocated or unallocated. (Kept as a bit in the first size word, assuming sizes are a multiple of 2 so the last bit is unused.)


• Malloc: walk down free list, find a block big enough
– First fit? Best fit?
• Free: insert the freed block into the free list.
– Perhaps keep list sorted so that adjacent blocks can be merged.
• Problems:
– Fragmentation ruins the heap
– Malloc can be slow
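The malloc/free walk described above can be sketched as a toy simulation (not a real allocator: block addresses are plain integers in a simulated heap, and merging of adjacent free blocks is omitted; all names are made up):

```java
import java.util.LinkedList;

// Illustrative first-fit free-list allocator simulated in Java (the notes
// describe the C heap; addresses and sizes here are just integers).
public class FreeListAlloc {
    static class Block {
        int addr, size;
        Block(int addr, int size) { this.addr = addr; this.size = size; }
    }
    private final LinkedList<Block> free = new LinkedList<>();

    FreeListAlloc(int heapSize) { free.add(new Block(0, heapSize)); }

    // malloc: walk the free list, take the first block big enough (first fit),
    // keeping the remainder on the list. Returns the block's address, or -1.
    int malloc(int n) {
        for (Block b : free) {
            if (b.size >= n) {
                int addr = b.addr;
                b.addr += n;
                b.size -= n;
                if (b.size == 0) free.remove(b);
                return addr;
            }
        }
        return -1;  // out of memory
    }

    // free: put the block back; a real allocator would also merge neighbors.
    void free(int addr, int n) { free.add(new Block(addr, n)); }

    public static void main(String[] args) {
        FreeListAlloc heap = new FreeListAlloc(100);
        int a = heap.malloc(30);   // 0
        int b = heap.malloc(30);   // 30
        heap.free(a, 30);          // leaves a 30-byte hole at the front
        int c = heap.malloc(40);   // 60: first fit must skip the 30-byte hole
        System.out.println(a + " " + b + " " + c);  // prints 0 30 60
    }
}
```

The skipped 30-byte hole in the usage above is exactly the fragmentation problem: free memory exists, but in pieces too small to satisfy larger requests.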

Exponential Scaling / Buddy System
• Keep an array of freelists: FreeList[i]
– FreeList[i] points to a list of blocks of size 2^i
• Malloc: round requested size up to nearest power of 2
– When FreeList[i] is empty, divide a block from FreeList[i+1] into two halves, put both chunks into FreeList[i]
– Alternatively, merge together two adjacent nodes from FreeList[i-1]
• Free: puts freed block back into appropriate free list
• Malloc & free take O(1) time
• This approach trades external fragmentation (within the heap as a whole) for internal fragmentation (within each block).
– Wasted space: ~30%
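A minimal sketch of the buddy scheme, assuming a heap that starts as a single block of size 2^maxOrder; free() and buddy merging are omitted to keep it short, and the class name is made up:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative buddy allocator: freeList.get(i) holds addresses of
// free blocks of size 2^i.
public class Buddy {
    private final List<List<Integer>> freeList = new ArrayList<>();

    Buddy(int maxOrder) {                       // heap = one block of size 2^maxOrder
        for (int i = 0; i <= maxOrder; i++) freeList.add(new ArrayList<>());
        freeList.get(maxOrder).add(0);
    }

    // Round the request up to the nearest power of two, then split larger
    // blocks downward until FreeList[i] is non-empty.
    int malloc(int n) {
        int i = 0;
        while ((1 << i) < n) i++;               // order of the rounded-up size
        int j = i;
        while (j < freeList.size() && freeList.get(j).isEmpty()) j++;
        if (j == freeList.size()) return -1;    // out of memory
        while (j > i) {                         // split one block of 2^j ...
            int addr = freeList.get(j).remove(freeList.get(j).size() - 1);
            j--;
            freeList.get(j).add(addr);          // ... into two buddies of 2^(j-1)
            freeList.get(j).add(addr + (1 << j));
        }
        return freeList.get(i).remove(freeList.get(i).size() - 1);
    }

    public static void main(String[] args) {
        Buddy heap = new Buddy(6);              // 64-byte heap
        int a = heap.malloc(5);                 // request rounded up to 8 bytes
        int b = heap.malloc(5);                 // comes from the buddy of the first
        System.out.println(a + " " + b);        // two buddy addresses differing by 8
    }
}
```

Rounding 5 up to 8 wastes 3 bytes inside each block: that is the internal fragmentation this scheme trades for, in exchange for O(1) operations and easy merging (a block's buddy address is just the block address XOR its size).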

Why Garbage Collection?
• Manual memory management is cumbersome & error prone:
– Freeing the same pointer twice is ill defined (seg fault or other bugs)
– Calling free on some pointer not created by malloc (e.g. to an element of an array) is also ill defined
– malloc and free aren’t modular: to properly free all allocated memory, the programmer has to know which code “owns” each object. The owner code must ensure free is called exactly once.
– Not calling free leads to space leaks: memory never reclaimed
• Many examples of space leaks in long-running programs
• Garbage collection:
– Have the language runtime system determine when an allocated chunk of memory will no longer be used and free it automatically.
– But… garbage collector is usually the most complex part of a language’s runtime system.
– Garbage collection does impose costs (performance, predictability)

Memory Use & Reachability
• When is a chunk of memory no longer needed?
– In general, this problem is undecidable.
• We can approximate this information by freeing memory that can’t be reached from any root references.
– A root pointer is one that might be accessible directly from the program (i.e., it is not stored in the heap).
– Root pointers include pointer values stored in registers, in global variables, or on the stack.
• If a memory cell is part of a record (or other data structure) that can be reached by traversing pointers from the root, it is live.
• It is safe to reclaim all memory cells not reachable from a root (such cells are garbage).

Reachability & Pointers
• Starting from stack, registers, & globals (roots), determine which objects in the heap are reachable following pointers.
• Reclaim any object that isn't reachable.
• Requires being able to distinguish pointer values from other values (e.g., ints).
• Type safe languages:
– OCaml, SML/NJ use the low bit: 1 means it’s a scalar, 0 means it’s a pointer. (Hence 31-bit ints in OCaml)
– Java puts the tag bits in the object meta-data (uses more space).
– Type safety implies that casts can’t introduce new pointers
– Also, pointers are abstract (references), so objects can be moved without changing the meaning of the program
• Unsafe languages:
– Pointers aren’t abstract, they can’t be moved.
– The Boehm-Demers-Weiser conservative collector for C uses heuristics to decide which values to treat as pointers (e.g., a value that doesn’t point into an allocated object is not a pointer; pointers are multiples of 4; etc.)
– May not find as much garbage due to conservativity.
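The OCaml-style low-bit tagging can be sketched in a few lines (Java ints stand in for machine words here; the class and method names are made up):

```java
// Sketch of OCaml-style low-bit tagging: a word whose low bit is 1 encodes
// a 31-bit scalar; a word whose low bit is 0 is a (4-byte-aligned) pointer.
public class TagBits {
    static int tagScalar(int n)    { return (n << 1) | 1; }  // drops the top bit: 31-bit ints
    static int untagScalar(int w)  { return w >> 1; }
    static boolean isScalar(int w) { return (w & 1) == 1; }

    public static void main(String[] args) {
        int w = tagScalar(21);
        System.out.println(isScalar(w) + " " + untagScalar(w));  // prints true 21
        // Any 4-byte-aligned address has low bit 0, so the collector can
        // tell it apart from a tagged scalar and follow it as a pointer.
        System.out.println(isScalar(0x1000));                    // prints false
    }
}
```

This is why the collector can scan a heap word and decide, without any type information at runtime, whether to follow it.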

Example Object Graph
• Pointers in the stack, registers, and globals are roots

Mark and Sweep Garbage Collection
• Classic algorithm with two phases:
• Phase 1: Mark
– Start from the roots
– Do depth-first traversal, marking every object reached.
• Phase 2: Sweep
– Walk over all allocated objects and check for marks.
– Unmarked objects are reclaimed.
– Marked objects have their marks cleared.
– Optional: compact all live objects in heap by moving them adjacent to one another. (needs extra work & indirection to “patch up” pointers)
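The two phases above can be sketched over an explicit object graph (a toy model: the "heap" is a Java list, reclamation is just removal from it, and all names are made up):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal mark-and-sweep sketch. Roots play the role of pointers held in
// the stack, registers, and globals.
public class MarkSweep {
    static class Node {
        List<Node> children = new ArrayList<>();
        boolean marked;
    }

    // Phase 1: depth-first traversal from a root, marking every object reached.
    static void mark(Node n) {
        if (n == null || n.marked) return;      // marks also terminate cycles
        n.marked = true;
        for (Node c : n.children) mark(c);
    }

    // Phase 2: reclaim unmarked objects, then clear marks on the survivors.
    static void sweep(List<Node> heap) {
        heap.removeIf(n -> !n.marked);
        for (Node n : heap) n.marked = false;
    }

    public static void main(String[] args) {
        Node a = new Node(), b = new Node(), c = new Node();
        a.children.add(b);
        b.children.add(a);                      // cycle: still reachable via a
        List<Node> heap = new ArrayList<>(List.of(a, b, c));
        mark(a);                                // a is the only root; c is garbage
        sweep(heap);
        System.out.println(heap.size());        // prints 2
    }
}
```

Note that the a–b cycle survives because it is reachable from a root; reference counting, by contrast, could not reclaim such a cycle even once it became garbage.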

Results of Marking Graph

Implementing the Mark Phase
• Depth-first search has a natural recursive algorithm.
• Question: what happens when traversing a long linked list?
• Where do we store the information needed to perform the traversal?
– (In general, garbage collectors are tricky to implement because if they allocate memory who manages that?!)

Deutsch-Schorr-Waite (DSW) Algorithm
• No need for a stack, it is possible to use the graph being traversed itself to store the data necessary…
• Idea: during depth-first-search, each pointer is followed only once. The algorithm can reverse the pointers on the way
down and restore them on the way back up.
– Mark a bit on each object traversed on the way down.
• Two pointers:
– curr: points to the current node
– prev points to the previous node
• On the way down, flip pointers as you traverse them:
– tmp := curr
curr := curr.next
tmp.next := prev
prev := tmp
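For the single-successor case (a linked list, possibly with a cycle), the pointer-reversal idea can be sketched as follows; the class and method names are made up:

```java
// Deutsch-Schorr-Waite sketch, single-successor case: traverse with no
// stack by reversing next-pointers on the way down and restoring them
// on the way back up.
public class DSW {
    static class Node {
        Node next;
        boolean marked;
    }

    static void mark(Node start) {
        Node prev = null, curr = start;
        while (curr != null && !curr.marked) {  // way down: mark and flip pointers
            curr.marked = true;
            Node tmp = curr;
            curr = curr.next;
            tmp.next = prev;                    // reversed: now points backward
            prev = tmp;
        }
        while (prev != null) {                  // way back up: restore pointers
            Node tmp = prev;
            prev = prev.next;
            tmp.next = curr;                    // un-reverse
            curr = tmp;
        }
    }

    public static void main(String[] args) {
        Node a = new Node(), b = new Node(), c = new Node();
        a.next = b; b.next = c;                 // a -> b -> c
        mark(a);
        System.out.println(a.marked && b.marked && c.marked);  // prints true
        System.out.println(a.next == b && b.next == c);        // prints true: restored
    }
}
```

The mark bits both record liveness and stop the downward walk at already-visited nodes, which is what makes the sketch terminate on cyclic structures too.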

Example of DSW (traversing down)

Costs & Implications
• Need to generalize to account for objects that have multiple outgoing pointers.
• Depth-first traversal terminates when there are no children pointers or all children are already marked.
– Accounts for cycles in the object graph.
• The Deutsch-Schorr-Waite algorithm breaks objects during the traversal.
– All computation must be halted during the mark phase. (Bad for concurrent programs!)
• Mark & Sweep algorithm reads all memory in use by the program (even if it’s garbage!)
– Running time is proportional to the total amount of allocated memory (both live and garbage).
– Can pause the programs for long times during garbage collection.


Here's a look at the five top virtual server security concerns of the moment.
1. Managing oversight and responsibility
The overarching issue with virtual servers is responsibility, MacDonald says. Unlike physical servers, which are the direct responsibility of the data-center or IT managers in whose physical domain they sit, responsibility for virtual servers is often left up in the air. Should the business-unit that requested it be able to configure and secure it? Should it be the IT manager closest to the physical host? A centralized master sysadmin tasked with management and security for all the virtualized assets in an enterprise?
"People don't appreciate that when you add virtual servers there's another layer there of technology in addition to the application and the operating system and the hardware, and you have to secure it, MacDonald says.
2. Patching and maintenance
The most tangible risk that can come out of a lack of responsibility is the failure to keep up with the constant, labor-intensive process of patching, maintaining and securing each virtual server in a company. Unlike the physical servers on which they sit, which are launched and configured by hands-on IT managers who also install the latest patches, virtual machines tend to be launched from server images that may have been created, configured and patched weeks or months before.
Most companies maintain a small number of general-purpose "golden" images from which to launch or relaunch new VMs for many purposes, but also keep dozens or hundreds of server images stored on DVD or disk after being laboriously configured to support specific applications or business requirements, MacDonald says.
"You can take a snapshot of a virtual machine and write it off to disk so you don't have to recreate it the next time, or for disaster recovery. Just fire off one of these virtual machines sitting in offline libraries. But for the most part they're not being kept up to date with A/V signatures and patches, " MacDonald says. "Someone should check when they do launch one, but often they don't, and there isn't usually a way to check."
Both Microsoft and VMware supply patch-management schedules with their base infrastructure products. Both require disk images stored in libraries to be launched periodically so they can be patched.
That's a tedious process for companies with libraries of hundreds of VM images, however, and does nothing to address the patch status of VMs that are running but might not have been patched or had new antivirus signatures installed for weeks or months. Of course, VMware, HP, and many startup companies are trying to help IT automate much of this work right now with management products.
3. Visibility and compliance
Virtual servers are designed to be, if not invisible, then at least very low profile, at least within the data center. All the storage or bandwidth or floor space or electricity they need comes from the physical server on which they sit. To data-center managers not specifically tasked with monitoring all the minute interactions of the VMs inside each host, a set of virtual servers becomes an invisible network within which there are few controls.
"Virtual switch implementations let the VMs talk to each other, and across the network," MacDonald says. "But unless you put virtualized security controls—virtual sniffers, virtual firewalls, all the same controls you'd use on a physical server, inside that network, you don't see what's going on."
"There are a lot of compliance and use issues," McDonald says."Just because you don't have a sniffer to see those packets moving between the virtual servers doesn't mean they're not there," MacDonald says. "You could have a HIPPA-controlled workload talking to a non-HIPPA workload, or PCI and non-PCI workloads talking to each other. That puts you in a bad position. You would know if you looked at the packets on that network, but those packets are not coming out of the box for you to look at, so unless you take extra steps, you wouldn't know."
Microsoft, VMware and Citrix are all building some level of visibility and control over those interactions into their base products, but the level of function is nowhere near the point that customers will be secure, MacDonald says.
Silicon Valley startup Altor is finding some fans for its virtual firewalls, as is Reflex Systems, which migrated from physical to virtual firewalls to keep up with growth in that market, MacDonald says.
"Cisco's not there yet, Juniper's not there; we haven't reached the tipping point where the traditional networking vendors feel they have to be able to reach into virtual machines," MacDonald says.
In many cases, customers either don't know or don't care about certain risks. A poll of 109 attendees at the RSA Conference 2009 in Las Vegas last month, conducted and published by virtual-security software provider Secure Passage, indicated that 72 percent of respondents have not deployed virtual firewalls of any kind. The most frequent reasons cited: the limited visibility respondents had into virtual networks, the difficulty of managing virtual security and lack of understanding regarding what constitutes a virtual firewall.
VMSafe, the APIs that VMware built into the VSphere version of its virtual infrastructure product, makes it possible for third-party security vendors to apply their applications to VMware VMs. The company also announced at the RSA conference that it had built RSA's data loss prevention software into vSphere to enhance its security.
"They're making progress," MacDonald says of VMware and Microsoft. "They're not where we need them to be yet."
Simon Crosby, chief technology officer of Citrix Systems, said during a security debate at the RSA conference that security should be built into the applications, not the hypervisor or virtual-infrastructure management products.
He said paying attention to the security configuration guidelines that Citrix and other hypervisor vendors publish can fix most of the security issues and that industry groups such as the Cloud Security Alliance can extend that guidance to include process-management and policy issues.
4. VM sprawl
Another consequence of the lack of oversight of virtual machines is sprawl—the uncontrolled proliferation of virtual machines launched, and often forgotten, by IT managers, developers or business-unit managers who want extra servers for some specific purpose, and lose track of them later.
VM sprawl wastes resources, creates unmonitored servers that could have access to sensitive data, and sets the company as a whole and IT in particular up for a painful cleanup when a problem crops up later, Steffen says.
"We try to treat the VMs in exactly the same way we do physical machines—with system scans, antivirus, and everything else. That includes going through a procurement process for VMs just as if they were physical machines," Steffen says.
Forcing business unit managers to fill out requisitions and explain why they want an additional VM, for what, and for how long slows the process down, which could be considered inefficient, but also gives everyone involved time to think about how necessary each new VM is.
"We don't do that if they need to replace a server they're already running," Steffen says. "But with VMs you have the potential for VMs to get completely out of hand and have so many out there you can't do anything about how secure they are."
The Secure Passage poll of RSA attendees showed 42 percent were concerned about sprawl, specifically the lack of controls available to keep business unit managers from spawning off new servers at will, rather than coordinating with IT to make sure they are managed and secure.
5. Managing Virtual Appliances
One of the very best things about virtual infrastructures is the ability to buy or test a product from a third-party vendor and have it up and running in minutes, rather than having to clear space on a test server, install the software, get it to talk to the operating system and the network and then, hours later, see whether it does what it's supposed to, MacDonald says.
Unfortunately, virtual appliances are also virtual pigs in a poke. "There's an operating system and application in every package, every one with its own configuration and patch status and you have no idea what's in there or who's going to maintain it or what the long-term risk is going to be," MacDonald says. "It has a full application and OS all configured and ready to run. In five minutes you can try out that new anti-spam server. But what OS is in the package and is it patched, and if not, who is going to give you the patch? "


Garbage collection (computer science)
In computer science, garbage collection (GC) is a form of automatic memory management. The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the program. Garbage collection was invented by John McCarthy around 1959 to solve problems in Lisp.[1][2]

Garbage collection is often portrayed as the opposite of manual memory management, which requires the programmer to specify which objects to deallocate and return to the memory system. However, many systems use a combination of approaches, including other techniques such as stack allocation and region inference.

Garbage collection does not traditionally manage limited resources other than memory that typical programs use, such as network sockets, database handles, user interaction windows, and file and device descriptors. Methods used to manage such resources, particularly destructors, may suffice as well to manage memory, leaving no need for GC. Some GC systems allow such other resources to be associated with a region of memory that, when collected, causes the other resource to be reclaimed; this is called finalization. Finalization may introduce complications limiting its usability, such as intolerable latency between disuse and reclaim of especially limited resources, or a lack of control over which thread performs the work of reclaiming.
Principles
The basic principles of garbage collection are:
1. Find data objects in a program that cannot be accessed in the future
2. Reclaim the resources used by those objects

Many computer languages require garbage collection, either as part of the language specification (e.g., Java, C#, and most scripting languages) or effectively for practical implementation (e.g., formal languages like lambda calculus); these are said to be garbage collected languages. Other languages were designed for use with manual memory management, but have garbage collected implementations available (e.g., C, C++). Some languages, like Ada, Modula-3, and C++/CLI, allow both garbage collection and manual memory management to co-exist in the same application by using separate heaps for collected and manually managed objects; others, like D, are garbage collected but allow the user to manually delete objects and also entirely disable garbage collection when speed is required. While integrating garbage collection into the language's compiler and runtime system enables a much wider choice of methods, post hoc GC systems exist, including some that do not require recompilation. (Post-hoc GC is sometimes distinguished as litter collection.) The garbage collector will almost always be closely integrated with the memory allocator.
Benefits
Garbage collection frees the programmer from manually dealing with memory deallocation. As a result, certain categories of bugs are eliminated or substantially reduced:
• Dangling pointer bugs, which occur when a piece of memory is freed while there are still pointers to it, and one of those pointers is dereferenced. By then the memory may have been re-assigned to another use, with unpredictable results.
• Double free bugs, which occur when the program tries to free a region of memory that has already been freed, and perhaps already been allocated again.
• Certain kinds of memory leaks, in which a program fails to free memory occupied by objects that have become unreachable, which leads to memory exhaustion if such behavior is repeated indefinitely. (Garbage collection typically does not deal with the unbounded accumulation of data which is reachable, but which will actually not be used by the program.)
Some of the bugs addressed by garbage collection can have security implications.
Disadvantages
Typically, garbage collection has certain disadvantages:
• Garbage collection consumes computing resources in deciding which memory to free, reconstructing facts that may have been known to the programmer. The penalty for the convenience of not annotating object lifetime manually in the source code is overhead, often leading to decreased or uneven performance. Interaction with memory hierarchy effects can make this overhead intolerable in circumstances that are hard to predict or to detect in routine testing.
• The moment when the garbage is actually collected can be unpredictable, resulting in stalls scattered throughout a session. Unpredictable stalls can be unacceptable in real-time environments, in transaction processing, or in interactive programs. Incremental, concurrent, and real-time garbage collectors address these problems, with varying trade-offs.
• Non-deterministic GC is incompatible with RAII-based management of non-GCed resources. As a result, the need for explicit manual resource management (release/close) for non-GCed resources becomes transitive to composition. That is: in a non-deterministic GC system, if a resource or a resource-like object requires manual resource management (release/close), and this object is used as part of another object, then the composed object will also become a resource-like object that itself requires manual resource management (release/close).
Tracing garbage collectors
Tracing garbage collectors are the most common type of garbage collector. They first determine which objects are reachable (or potentially reachable), and then discard all remaining objects.
Reachability of an object
Informally, an object is reachable if it is referenced by at least one variable in the program, either directly or through references from other reachable objects. More precisely, objects can be reachable in only two ways:
1. A distinguished set of objects are assumed to be reachable: these are known as the roots. Typically, these include all the objects referenced from anywhere in the call stack (that is, all local variables and parameters in the functions currently being invoked), and any global variables.
2. Anything referenced from a reachable object is itself reachable; more formally, reachability is a transitive closure.

The reachability definition of "garbage" is not optimal, insofar as the last time a program uses an object could be long before that object falls out of the environment scope. A distinction is sometimes drawn between syntactic garbage, those objects the program cannot possibly reach, and semantic garbage, those objects the program will in fact never again use. For example:
Object x = new Foo();
Object y = new Bar();
x = new Quux();
/* at this point, we know that the Foo object
 * originally assigned to x will never be
 * accessed: it is syntactic garbage
 */
if (x.check_something()) {
    x.do_something(y);
}
System.exit(0);
/* in the above block, y *could* be semantic garbage,
 * but we won't know until x.check_something() returns
 * some value: if it returns at all
 */
The problem of precisely identifying semantic garbage can easily be shown to be undecidable: a program that
allocates an object X, runs an arbitrary input program P, and uses X if and only if P finishes would require a semantic
garbage collector to solve the halting problem. Although conservative heuristic methods for semantic garbage
detection remain an active research area, essentially all practical garbage collectors focus on syntactic garbage.
Another complication with this approach is that, in languages with both reference types and unboxed value types, the
garbage collector needs to somehow be able to distinguish which variables on the stack or fields in an object are
regular values and which are references: in memory, an integer and a reference might look alike. The garbage
collector then needs to know whether to treat the element as a reference and follow it, or whether it is a primitive
value. One common solution is the use of tagged pointers.
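One such tagging scheme can be sketched in Java. The TaggedWord class and its one-bit encoding are hypothetical, invented only to illustrate how a collector can tell small integers and heap references apart by inspecting a single bit:

```java
// A minimal sketch of pointer tagging: every slot is a 64-bit word, and
// the low bit distinguishes small integers (1) from heap references (0).
// The scheme is illustrative, not taken from any particular VM.
public class TaggedWord {
    // Encode a small integer: shift left and set the tag bit.
    public static long tagInt(long value) {
        return (value << 1) | 1L;
    }

    // Encode a heap reference as an (even) object-table index.
    public static long tagRef(long objectIndex) {
        return objectIndex << 1;
    }

    public static boolean isInt(long word) {
        return (word & 1L) != 0;
    }

    public static long untagInt(long word) {
        return word >> 1;    // arithmetic shift restores the sign
    }

    public static long untagRef(long word) {
        return word >>> 1;
    }

    public static void main(String[] args) {
        long a = tagInt(-7);
        long b = tagRef(42);
        // The collector checks the tag bit: skip a, follow b.
        System.out.println(isInt(a) + " " + untagInt(a));  // true -7
        System.out.println(isInt(b) + " " + untagRef(b));  // false 42
    }
}
```

The collector scans each word, skips anything with the integer tag, and follows the rest as references.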
Strong and Weak references
The garbage collector can reclaim only objects that have no references. However, there can exist additional
references that, in a sense, do not matter, which are called weak references. In discussions about weak references,
ordinary references are sometimes called strong references. An object is eligible for garbage collection if there are no
strong (i.e. ordinary) references to it, even though there still might be some weak references to it.
A weak reference is not just any pointer to the object that a garbage collector does not care about. The term is
usually reserved for a properly managed category of special reference objects which are safe to use even when the
object disappears because they lapse to a safe value. An unsafe reference that is not known to the garbage collector
will simply remain dangling by continuing to refer to the address where the object previously resided. This is not a
weak reference.
In some implementations, notably in Microsoft .NET,[3] weak references are divided into two further
subcategories: long weak references (which track resurrection) and short weak references.
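Java's java.lang.ref.WeakReference is a properly managed weak reference of this kind: it resolves while the referent is strongly reachable, and lapses to the safe value null once the referent has been collected. A short sketch (when collection actually happens is up to the VM):

```java
import java.lang.ref.WeakReference;

public class WeakRefDemo {
    // While a strong reference is held, the weak reference must resolve.
    public static boolean resolvesWhileStrong() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        return weak.get() == strong;
    }

    public static void main(String[] args) {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.out.println(weak.get() != null);  // true: strongly reachable

        strong = null;   // drop the only strong reference
        System.gc();     // a hint only; collection timing is up to the VM

        // Once the referent is collected, get() lapses to null rather
        // than dangling; whether that has happened yet is up to the GC.
        System.out.println(weak.get());
    }
}
```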
Weak Collections
Objects that maintain collections of other objects can also be devised with weak tracking features. For
instance, weak hash tables are useful. Like a regular hash table, a weak hash table maintains an association between
pairs of objects, where each pair is understood to be a key and value. However, the hash table does not actually
maintain a strong reference on these objects. A special behavior takes place when either the key or value or both
become garbage: the hash table entry is spontaneously deleted. There exist further refinements such as hash tables
which have only weak keys (value references are ordinary, strong references) or only weak values (key references
are strong).
Weak hash tables are important for maintaining associations between objects, such that the objects engaged in the
association can still become garbage if nothing in the program refers to them any longer (other than the associating
hash table).
The use of a regular hash table for such a purpose could lead to a "logical memory leak": the accumulation of
reachable data which the program does not need and will not use.
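Java's java.util.WeakHashMap is a real example of such a table, of the weak-key, strong-value variety: once nothing outside the map strongly references a key, its entry becomes eligible for removal. A short sketch (as above, exactly when the entry disappears depends on the GC):

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakMapDemo {
    // An entry survives as long as something else holds the key strongly.
    public static boolean entryLivesWhileKeyHeld() {
        Map<Object, String> cache = new WeakHashMap<>();
        Object key = new Object();
        cache.put(key, "metadata");
        return "metadata".equals(cache.get(key));
    }

    public static void main(String[] args) {
        Map<Object, String> cache = new WeakHashMap<>();
        Object key = new Object();
        cache.put(key, "metadata");
        System.out.println(cache.get(key));  // metadata

        key = null;   // nothing but the map refers to the key now
        System.gc();  // hint; once the key is collected, the map
                      // spontaneously expunges the entry
        System.out.println(cache.size());  // 0 or 1, depending on the GC
    }
}
```

This is exactly the association-without-retention behaviour described above: the map never keeps its keys alive on its own.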
Basic algorithm
Tracing collectors are so called because they trace through the working set of memory. These garbage collectors
perform collection in cycles. A cycle is started when the collector decides (or is notified) that it needs to reclaim
memory, which happens most often when the system is low on memory. The original method involves a naïve
mark-and-sweep in which the entire memory set is touched several times.
Naïve mark-and-sweep
In the naive mark-and-sweep method, each object in memory has a flag (typically a single bit) reserved for garbage
collection use only. This flag is always cleared, except during the collection cycle. The first stage of collection does
a tree traversal of the entire 'root set', marking each object that is pointed to as being 'in-use'. All objects that those
objects point to, and so on, are marked as well, so that every object that is ultimately pointed to from the root set is
marked. Finally, all memory is scanned from start to finish, examining all free or used blocks; those with the in-use
flag still cleared are not reachable by any program or data, and their memory is freed. (For objects which are marked
in-use, the in-use flag is cleared again, preparing for the next cycle.)
This method has several disadvantages, the most notable being that the entire system must be suspended during
collection; no mutation of the working set can be allowed. This will cause programs to 'freeze' periodically (and
generally unpredictably), making real-time and time-critical applications impossible. In addition, the entire working
memory must be examined, much of it twice, potentially causing problems in paged memory systems.
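The naive mark-and-sweep cycle described above can be sketched over a toy heap. The Obj class and the list standing in for the heap are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class MarkSweep {
    static class Obj {
        boolean inUse;                       // the per-object mark flag
        List<Obj> refs = new ArrayList<>();  // outgoing references
    }

    // Mark phase: traverse everything reachable from a root.
    static void mark(Obj o) {
        if (o.inUse) return;   // already visited
        o.inUse = true;
        for (Obj child : o.refs) mark(child);
    }

    // Sweep phase: scan the whole heap; free unmarked objects and clear
    // the flag on survivors, ready for the next cycle.
    static int sweep(List<Obj> heap) {
        int freed = 0;
        for (int i = heap.size() - 1; i >= 0; i--) {
            if (heap.get(i).inUse) heap.get(i).inUse = false;
            else { heap.remove(i); freed++; }
        }
        return freed;
    }

    // One full collection cycle; returns the number of objects freed.
    public static int collect(List<Obj> heap, List<Obj> roots) {
        for (Obj root : roots) mark(root);
        return sweep(heap);
    }

    public static void main(String[] args) {
        List<Obj> heap = new ArrayList<>();
        Obj a = new Obj(); Obj b = new Obj(); Obj c = new Obj();
        heap.add(a); heap.add(b); heap.add(c);
        a.refs.add(b);                   // a -> b; c is unreachable garbage
        List<Obj> roots = new ArrayList<>();
        roots.add(a);
        System.out.println(collect(heap, roots));  // 1
        System.out.println(heap.size());           // 2
    }
}
```

Note how the sweep touches every heap object, marked or not, which is precisely the "entire working memory must be examined" cost noted above.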
Tri-colour marking
Because of these pitfalls, most modern tracing garbage collectors implement some variant of the tri-colour marking
abstraction, but simple collectors (such as the mark-and-sweep collector) often do not make this abstraction explicit.
Tri-colour marking works as follows:
1. Create initial white, grey, and black sets; these sets will be used to maintain progress during the cycle.
• Initially the white set or condemned set is the set of objects that are candidates for having their memory
recycled.
• The black set is the set of objects that cheaply can be proven to have no references to objects in the white set;
in many implementations the black set starts off empty.
• The grey set is the set of objects that are known to be reachable but whose referenced objects have not yet
been scanned. Grey objects are known to be reachable from the root, so they cannot be garbage collected, and
they will eventually end up in the black set; the grey state simply means that the object's own references still
need to be checked.
• The grey set is initialised to objects which are referenced directly at root level; typically all other objects are
initially placed in the white set.
• Objects can move from white to grey to black, never in the other direction.
2. Pick an object from the grey set. Grey all the white objects it references directly, then blacken the object itself
(move it to the black set). This confirms that the object cannot be garbage collected, and also that any objects it
references cannot be garbage collected.
3. Repeat the previous step until the grey set is empty.
4. When there are no more objects in the grey set, then all the objects remaining in the white set have been
demonstrated not to be reachable, and the storage occupied by them can be reclaimed.
The three sets partition memory; every object in the system, including the root set, is in precisely one set.
The tri-colour marking algorithm preserves an important invariant:
No black object points directly to a white object.
This ensures that the white objects can be safely destroyed once the grey set is empty. (Some variations on the
algorithm do not preserve the tricolour invariant but they use a modified form for which all the important properties
hold.)
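The cycle above can be sketched as follows. The class and the set representations are illustrative only, and roots are assumed to be objects in the heap:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TriColor {
    static class Obj {
        List<Obj> refs = new ArrayList<>();
    }

    // Returns the objects proven unreachable: whatever is still white
    // when the grey set empties.
    public static Set<Obj> condemned(Set<Obj> heap, Set<Obj> roots) {
        Set<Obj> white = new HashSet<>(heap);  // candidates for reclamation
        Set<Obj> black = new HashSet<>();      // proven reachable and scanned
        Deque<Obj> grey = new ArrayDeque<>();  // reachable, not yet scanned
        for (Obj root : roots) {
            if (white.remove(root)) grey.add(root);  // white -> grey
        }
        while (!grey.isEmpty()) {
            Obj o = grey.remove();
            for (Obj child : o.refs) {
                if (white.remove(child)) grey.add(child);  // white -> grey
            }
            black.add(o);  // invariant: no black object points to white
        }
        return white;  // never reached, so safe to reclaim
    }

    public static void main(String[] args) {
        Obj a = new Obj(); Obj b = new Obj(); Obj c = new Obj();
        a.refs.add(b);
        b.refs.add(a);  // a cycle, but reachable from the root, so kept
        Set<Obj> heap = new HashSet<>();
        heap.add(a); heap.add(b); heap.add(c);
        Set<Obj> roots = new HashSet<>();
        roots.add(a);
        System.out.println(condemned(heap, roots).size());  // 1: only c
    }
}
```

Objects only ever move white to grey to black, and the loop terminates exactly when the grey set is empty, matching the steps above.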
The tri-colour method has an important advantage: it can be performed 'on-the-fly', without halting the system for
significant time periods. This is accomplished by marking objects as they are allocated and during mutation,
maintaining the various sets. By monitoring the size of the sets, the system can perform garbage collection
periodically, rather than as-needed. Also, the need to touch the entire working set each cycle is avoided.
Implementation strategies
In order to implement the basic tri-colour algorithm, several important design decisions must be made, which can
significantly affect the performance characteristics of the garbage collector.
Moving vs. non-moving
Once the unreachable set has been determined, the garbage collector may simply release the unreachable objects and
leave everything else as it is, or it may copy some or all of the reachable objects into a new area of memory, updating
all references to those objects as needed. These are called "non-moving" and "moving" (or, alternatively,
"non-compacting" and "compacting") garbage collectors, respectively.
At first, a moving GC strategy may seem inefficient and costly compared to the non-moving approach, since much
more work would appear to be required on each cycle. In fact, however, the moving GC strategy leads to several
performance advantages, both during the garbage collection cycle itself and during actual program execution:
• No additional work is required to reclaim the space freed by dead objects; the entire region of memory from
which reachable objects were moved can be considered free space. In contrast, a non-moving GC must visit each
unreachable object and somehow record that the memory it alone occupied is available.
• Similarly, new objects can be allocated very quickly. Since large contiguous regions of memory are usually made
available by the moving GC strategy, new objects can be allocated by simply incrementing a 'free memory'
pointer. A non-moving strategy may, after some time, lead to a heavily fragmented heap, requiring expensive
consultation of "free lists" of small available blocks of memory in order to allocate new objects.
• If an appropriate traversal order is used (such as cdr-first for list conses), objects that refer to each other
frequently can be moved very close to each other in memory, increasing the likelihood that they will be located in
the same cache line or virtual memory page. This can significantly speed up access to these objects through these
references.
One disadvantage of a moving garbage collector is that it only allows access through references that are managed by
the garbage collected environment, and does not allow pointer arithmetic. This is because any native pointers to
objects will be invalidated when the garbage collector moves the object (they become dangling pointers). For
interoperability with native code, the garbage collector must copy the object contents to a location outside of the
garbage collected region of memory. An alternative approach is to pin the object in memory, preventing the garbage
collector from moving it and allowing the memory to be directly shared with native pointers (and possibly allowing
pointer arithmetic).[4]
Copying vs. mark-and-sweep vs. mark-and-don't-sweep
To further refine the distinction, tracing collectors can also be divided by considering how the three sets of objects
(white, grey, and black) are maintained during a collection cycle.
The most straightforward approach is the semi-space collector, which dates to 1969. In this moving GC scheme,
memory is partitioned into a "from space" and "to space". Initially, objects are allocated into "to space" until it
becomes full and a collection is triggered. At the start of a collection, the "to space" becomes the "from space", and
vice versa. The objects reachable from the root set are copied from the "from space" to the "to space". These objects
are scanned in turn, and all objects that they point to are copied into "to space", until all reachable objects have been
copied into "to space". Once the program continues execution, new objects are once again allocated in the "to space"
until it is once again full and the process is repeated. This approach has the advantage of conceptual simplicity (the
three object color sets are implicitly constructed during the copying process), but the disadvantage that a (possibly)
very large contiguous region of free memory is necessarily required on every collection cycle. This technique is also
known as stop-and-copy. Cheney's algorithm is an improvement on the semi-space collector.
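A Cheney-style copy can be sketched by treating a Java list as the "to space" and advancing a scan index over it. The Obj class is invented for illustration, and the forwarding map stands in for the forwarding pointers a real collector would store in the evacuated from-space objects:

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class SemiSpace {
    static class Obj {
        List<Obj> refs = new ArrayList<>();
    }

    // Cheney-style copy: evacuate the roots into to-space, then advance a
    // scan pointer over to-space, evacuating each object's children and
    // recording forwarding entries so shared objects are copied only once.
    public static List<Obj> collect(List<Obj> roots) {
        List<Obj> toSpace = new ArrayList<>();
        Map<Obj, Obj> forwarding = new IdentityHashMap<>();
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, evacuate(roots.get(i), toSpace, forwarding));
        }
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj copy = toSpace.get(scan);
            for (int i = 0; i < copy.refs.size(); i++) {
                copy.refs.set(i, evacuate(copy.refs.get(i), toSpace, forwarding));
            }
        }
        return toSpace;  // from-space can now be reused wholesale
    }

    private static Obj evacuate(Obj o, List<Obj> toSpace, Map<Obj, Obj> fwd) {
        Obj copy = fwd.get(o);
        if (copy == null) {
            copy = new Obj();
            copy.refs.addAll(o.refs);  // still old pointers; fixed at scan
            fwd.put(o, copy);
            toSpace.add(copy);
        }
        return copy;
    }

    public static void main(String[] args) {
        Obj a = new Obj(); Obj b = new Obj();
        a.refs.add(b); a.refs.add(b);   // shared child copied only once
        Obj garbage = new Obj();        // unreferenced: never copied
        List<Obj> roots = new ArrayList<>();
        roots.add(a);
        System.out.println(collect(roots).size());  // 2: a and b
    }
}
```

Unreachable objects are never visited at all; they simply stay behind in from-space, which is the "no work to reclaim dead objects" advantage of moving collectors described earlier.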
A mark and sweep garbage collector maintains a bit (or two) with each object to record whether it is white or black;
the grey set is either maintained as a separate list (such as the process stack) or using another bit. As the reference
tree is traversed during a collection cycle (the "mark" phase), these bits are manipulated by the collector to reflect the
current state. A final "sweep" of the memory areas then frees white objects. The mark and sweep strategy has the
advantage that, once the unreachable set is determined, either a moving or non-moving collection strategy can be
pursued; this choice of strategy can even be made at runtime, as available memory permits. It has the disadvantage of
"bloating" objects by a small amount.
A mark and don't sweep garbage collector, like the mark-and-sweep, maintains a bit with each object to record
whether it is white or black; the grey set is either maintained as a separate list (such as the process stack) or using
another bit. There are two key differences here. First, black and white mean different things than they do in the mark
and sweep collector. In a "mark and don't sweep" system, all reachable objects are always black. An object is marked
black at the time it is allocated, and it will stay black even if it becomes unreachable. A white object is unused
memory and may be allocated. Second, the interpretation of the black/white bit can change. Initially, the black/white
bit may have the sense of (0=white, 1=black). If an allocation operation ever fails to find any available (white)
memory, that means all objects are marked used (black). The sense of the black/white bit is then inverted (for
example, 0=black, 1=white). Everything becomes white. This momentarily breaks the invariant that reachable
objects are black, but a full marking phase follows immediately, to mark them black again. Once this is done, all
unreachable memory is white. No "sweep" phase is necessary.
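The sense-inversion trick can be sketched minimally. The names here are invented, and a real collector would traverse the whole reachable graph during the marking phase rather than marking objects one at a time:

```java
// A minimal, illustrative sketch of the mark-and-don't-sweep bit trick.
public class MarkDontSweep {
    // The global interpretation of each object's colour bit: an object
    // whose bit equals blackSense is black (in use).
    static boolean blackSense = true;

    static class Obj {
        boolean bit;
    }

    static boolean isBlack(Obj o) { return o.bit == blackSense; }

    // Allocation marks the object black immediately.
    static void allocate(Obj o) { o.bit = blackSense; }

    // When no white memory is left, invert the sense: every object
    // becomes white at once, without sweeping the heap ...
    static void flipSense() { blackSense = !blackSense; }

    // ... and the marking phase that follows re-blackens reachable objects.
    static void mark(Obj o) { o.bit = blackSense; }

    public static void main(String[] args) {
        Obj live = new Obj();
        Obj dead = new Obj();
        allocate(live);
        allocate(dead);
        flipSense();   // both objects instantly become white, in O(1)
        mark(live);    // only reachable objects get re-marked
        System.out.println(isBlack(live) + " " + isBlack(dead));  // true false
    }
}
```

After the marking phase, everything still white is simply free memory available for allocation; no sweep pass is needed.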
Generational GC (ephemeral GC)
It has been empirically observed that in many programs, the most recently created objects are also those most likely
to become unreachable quickly (known as infant mortality or the generational hypothesis). A generational GC (also
known as ephemeral GC) divides objects into generations and, on most cycles, will place only the objects of a subset
of generations into the initial white (condemned) set. Furthermore, the runtime system maintains knowledge of when
references cross generations by observing the creation and overwriting of references. When the garbage collector
runs, it may be able to use this knowledge to prove that some objects in the initial white set are unreachable without
having to traverse the entire reference tree. If the generational hypothesis holds, this results in much faster collection
cycles while still reclaiming most unreachable objects.
In order to implement this concept, many generational garbage collectors use separate memory regions for different
ages of objects. When a region becomes full, those few objects that are referenced from older memory regions are
promoted (copied) up to the next highest region, and the entire region can then be overwritten with fresh objects.
This technique permits very fast incremental garbage collection, since the garbage collection of only one region at a
time is all that is typically required.
Generational garbage collection is a heuristic approach, and some unreachable objects may not be reclaimed on each
cycle. It may therefore occasionally be necessary to perform a full mark and sweep or copying garbage collection to
reclaim all available space. In fact, runtime systems for modern programming languages (such as Java and the .NET
Framework) usually use some hybrid of the various strategies that have been described thus far; for example, most
collection cycles might look only at a few generations, while occasionally a mark-and-sweep is performed, and even
more rarely a full copying is performed to combat fragmentation. The terms "minor cycle" and "major cycle" are
sometimes used to describe these different levels of collector aggression.
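A minor collection over a nursery region can be sketched as follows. The names are hypothetical, and reachability is handed in as a precomputed live set; a real generational collector computes it by tracing, consulting remembered sets for references from older generations into the nursery:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Generational {
    // A minor collection scans only the nursery: survivors (objects in
    // the live set) are promoted to the old generation, the rest are
    // dropped, and the nursery region is reset wholesale.
    public static void minorCollect(List<Object> nursery,
                                    List<Object> oldGen,
                                    Set<Object> live) {
        for (Object o : nursery) {
            if (live.contains(o)) oldGen.add(o);  // promote the survivor
        }
        nursery.clear();  // region reused for fresh allocations
    }

    public static void main(String[] args) {
        List<Object> nursery = new ArrayList<>();
        List<Object> oldGen = new ArrayList<>();
        Object survivor = new Object();
        nursery.add(survivor);
        nursery.add(new Object());  // infant mortality: dies young
        nursery.add(new Object());  // likewise
        Set<Object> live = new HashSet<>();
        live.add(survivor);
        minorCollect(nursery, oldGen, live);
        System.out.println(oldGen.size() + " " + nursery.size());  // 1 0
    }
}
```

Only the nursery is touched; objects already in the old generation are not examined at all during a minor cycle, which is where the speed advantage comes from.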
Stop-the-world vs. incremental vs. concurrent
Simple stop-the-world garbage collectors completely halt execution of the program to run a collection cycle, thus
guaranteeing that new objects are not allocated and objects do not suddenly become unreachable while the collector
is running.
This has the obvious disadvantage that the program can perform no useful work while a collection cycle is running
(sometimes called the "embarrassing pause"). Stop-the-world garbage collection is therefore mainly suitable for
non-interactive programs. Its advantage is that it is both simpler to implement and faster than incremental garbage
collection.
Incremental and concurrent garbage collectors are designed to reduce this disruption by interleaving their work with
activity from the main program. Incremental garbage collectors perform the garbage collection cycle in discrete
phases, with program execution permitted between each phase (and sometimes during some phases). Concurrent
garbage collectors do not stop program execution at all, except perhaps briefly when the program's execution stack is
scanned. However, the sum of the incremental phases takes longer to complete than one batch garbage collection
pass, so these garbage collectors may yield lower total throughput.
Careful design is necessary with these techniques to ensure that the main program does not interfere with the garbage
collector and vice versa; for example, when the program needs to allocate a new object, the runtime system may
either need to suspend it until the collection cycle is complete, or somehow notify the garbage collector that there
exists a new, reachable object.
Precise vs. conservative and internal pointers
Some collectors can correctly identify all pointers (references) in an object; these are called precise (also exact or
accurate) collectors, the opposite being a conservative or partly conservative collector. Conservative collectors
assume that any bit pattern in memory could be a pointer if, interpreted as a pointer, it would point into an allocated
object. Conservative collectors may produce false positives, where unused memory is not released because of
improper pointer identification. This is not always a problem in practice unless the program handles a lot of data that
could easily be misidentified as a pointer. False positives are generally less problematic on 64-bit systems than on
32-bit systems because the range of valid memory addresses tends to be a tiny fraction of the range of 64-bit values.
Thus, an arbitrary 64-bit pattern is unlikely to mimic a valid pointer. Whether a precise collector is practical usually
depends on the type safety properties of the programming language in question. An example for which a
conservative garbage collector would be needed is the C language, which allows typed (non-void) pointers to be type
cast into untyped (void) pointers, and vice versa.
A related issue concerns internal pointers, or pointers to fields within an object. If the semantics of a language allow
internal pointers, then there may be many different addresses that can refer to parts of the same object, which
complicates determining whether an object is garbage or not. An example for this is the C++ language, in which
multiple inheritance can cause pointers to base objects to have different addresses. Even in languages like Java,
internal pointers can exist during the computation of, say, an array element address. In a tightly optimized program,
the corresponding pointer to the object itself may have been overwritten in its register, so such internal pointers need
to be scanned.
Performance implications
Tracing garbage collectors require some implicit runtime overhead that may be beyond the control of the
programmer, and can sometimes lead to performance problems. For example, commonly used stop-the-world
garbage collectors, which pause program execution at arbitrary times, may make garbage collection inappropriate for
some embedded systems, high-performance server software, and applications with real-time needs.
Manual heap allocation
• search for best/first-fit block of sufficient size
• free list maintenance
Garbage collection
• locate reachable objects
• copy reachable objects for moving collectors
• read/write barriers for incremental collectors
• search for best/first-fit block and free list maintenance for non-moving collectors
It is difficult to compare the two cases directly, as their behavior depends on the situation. For example, in the best
case for a garbage collecting system, allocation just increments a pointer, but in the best case for manual heap
allocation, the allocator maintains freelists of specific sizes and allocation only requires following a pointer.
However, this size segregation usually causes a large degree of external fragmentation, which can have an adverse
impact on cache behaviour. Memory allocation in a garbage collected language may be implemented using heap
allocation behind the scenes (rather than simply incrementing a pointer), so the performance advantages listed above
don't necessarily apply in this case. In some situations, most notably embedded systems, it is possible to avoid both
garbage collection and heap management overhead by preallocating pools of memory and using a custom,
lightweight scheme for allocation/deallocation.[5]
The overhead of write barriers is more likely to be noticeable in an imperative-style program which frequently writes
pointers into existing data structures than in a functional-style program which constructs data only once and never
changes them.
Some advances in garbage collection can be understood as reactions to performance issues. Early collectors were
stop-the-world collectors, but the pauses this approach introduces proved disruptive in interactive applications.
Incremental collection avoided this disruption, but at the cost of decreased efficiency due to the need for barriers.
Generational collection techniques are used with both stop-the-world and incremental collectors to increase
performance; the trade-off is that some garbage is not detected as such for longer than normal.
Determinism
• Tracing garbage collection is not deterministic in the timing of object finalization. An object which becomes
eligible for garbage collection will usually be cleaned up eventually, but there is no guarantee when (or even if)
that will happen. This is an issue for program correctness when objects are tied to non-memory resources, whose
release is an externally visible program behavior, such as closing a network connection, releasing a device or
closing a file. One garbage collection technique which provides determinism in this regard is reference counting.
• Garbage collection can have a nondeterministic impact on execution time, by potentially introducing pauses into
the execution of a program which are not correlated with the algorithm being processed. Under tracing garbage
collection, the request to allocate a new object can sometimes return quickly and at other times trigger a lengthy
garbage collection cycle. Under reference counting, whereas allocation of objects is usually fast, decrementing a
reference is nondeterministic, since a reference may reach zero, triggering recursion to decrement the reference
counts of other objects which that object holds.
Real-time garbage collection
While garbage collection is generally nondeterministic, it is possible to use it in hard real-time systems. A real-time
garbage collector should guarantee that even in the worst case it will dedicate a certain number of computational
resources to mutator threads. Constraints imposed on a real-time garbage collector are usually either work-based or
time-based. A time-based constraint would look like: within each time window of duration T, mutator threads should
be allowed to run at least for Tm time. For work-based analysis, MMU (minimal mutator utilization)[6] is usually
used as a real time constraint for the garbage collection algorithm.
One of the first implementations of real-time garbage collection for the JVM was work on the Metronome
algorithm.[7] There are other commercial implementations.[8]
Reference counting
Reference counting is a form of garbage collection whereby each object has a count of the number of references to it.
Garbage is identified by having a reference count of zero. An object's reference count is incremented when a
reference to it is created, and decremented when a reference is destroyed. The object's memory is reclaimed when the
count reaches zero.
Compared to tracing garbage collection, reference counting guarantees that objects are destroyed as soon as they
become unreachable (assuming that there are no reference cycles), and usually only accesses memory which is either
in CPU caches, in objects to be freed, or directly pointed to by those, and thus tends not to have significant negative
side effects on CPU cache and virtual memory operation.
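The basic mechanism can be sketched as a small Java class. The retain/release names are invented (mirroring the increment and decrement described above), and for brevity releasing an object here does not cascade to the objects it references:

```java
// An illustrative reference-counted object: memory is "reclaimed" the
// instant the count reaches zero, with no separate collection cycle.
public class RefCounted {
    private int count = 0;
    private boolean freed = false;

    // Creating a reference to the object increments its count ...
    public void retain() {
        count++;
    }

    // ... and destroying a reference decrements it; at zero, the
    // object is reclaimed immediately and deterministically.
    public void release() {
        if (--count == 0) freed = true;
    }

    public boolean isFreed() { return freed; }

    public static void main(String[] args) {
        RefCounted obj = new RefCounted();
        obj.retain();                       // first reference created
        obj.retain();                       // a second reference
        obj.release();                      // one reference destroyed
        System.out.println(obj.isFreed());  // false: count is still 1
        obj.release();                      // the last reference destroyed
        System.out.println(obj.isFreed());  // true: reclaimed at once
    }
}
```

In a full implementation, release would also decrement the counts of every object the freed object references, which is exactly the potentially unbounded recursive freeing discussed below.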
There are some disadvantages to reference counting:
• If two or more objects refer to each other, they can create a cycle whereby neither will be collected as their mutual
references never let their reference counts become zero. Some garbage collection systems using reference
counting (like the one in CPython) use specific cycle-detecting algorithms to deal with this issue.[9] Another
strategy is to use weak references for the "backpointers" which create cycles. Under reference counting, a weak
reference is similar to a weak reference under a tracing garbage collector. It is a special reference object whose
existence does not increment the reference count of the referent object. Furthermore, a weak reference is safe in
that when the referent object becomes garbage, any weak reference to it lapses, rather than being permitted to
remain dangling, meaning that it turns into a predictable value, such as a null reference.
• In naive implementations, each assignment of a reference and each reference falling out of scope often require
modifications of one or more reference counters. However, in the common case, when a reference is copied from
an outer scope variable into an inner scope variable, such that the lifetime of the inner variable is bounded by the
lifetime of the outer one, the reference incrementing can be eliminated. The outer variable "owns" the reference.
In the programming language C++, this technique is readily implemented and demonstrated with the use of
const references. Reference counting in C++ is usually implemented using "smart pointers" whose constructors,
destructors and assignment operators manage the references. A smart pointer can be passed by reference to a
function, which avoids the need to copy-construct a new reference (which would increase the reference count on
entry into the function and decrease it on exit). Instead the function receives a reference to the smart pointer
which is produced inexpensively.
• When used in a multithreaded environment, these modifications (increment and decrement) may need to be
atomic operations such as compare-and-swap, at least for any objects which are shared, or potentially shared
among multiple threads. Atomic operations are expensive on a multiprocessor, and even more expensive if they
have to be emulated with software algorithms. It is possible to avoid this issue by adding per-thread or per-CPU
reference counts and only accessing the global reference count when the local reference counts become or are no
longer zero (or, alternatively, using a binary tree of reference counts, or even giving up deterministic destruction
in exchange for not having a global reference count at all), but this adds significant memory overhead and thus
tends to be only useful in special cases (it's used, for example, in the reference counting of Linux kernel modules).
• Naive implementations of reference counting do not in general provide real-time behavior, because any pointer
assignment can potentially cause a number of objects bounded only by total allocated memory size to be
recursively freed while the thread is unable to perform other work. It is possible to avoid this issue by delegating
the freeing of objects whose reference count dropped to zero to other threads, at the cost of extra overhead.
Escape analysis
Escape analysis can be used to convert heap allocations to stack allocations, thus reducing the amount of work
needed to be done by the garbage collector.
Compile-time
Compile-time garbage collection is a form of static analysis allowing memory to be reused and reclaimed based on
invariants known during compilation. This form of garbage collection has been studied in the Mercury programming
language.[10]
Availability
Generally speaking, higher-level programming languages are more likely to have garbage collection as a standard
feature. In languages that do not have built in garbage collection, it can often be added through a library, as with the
Boehm garbage collector for C and C++. This approach is not without drawbacks, such as changing object creation
and destruction mechanisms.
Most functional programming languages, such as ML, Haskell, and APL, have garbage collection built in. Lisp,
which introduced functional programming, is especially notable for introducing this mechanism.
Other dynamic languages, such as Ruby (but not Perl 5, or PHP, which use reference counting), also tend to use GC.
Object-oriented programming languages such as Smalltalk, Java and ECMAScript usually provide integrated
garbage collection. Notable exceptions are C++ and Delphi, which have destructors. Objective-C has not traditionally
had it, but ObjC 2.0 as implemented by Apple for Mac OS X uses a runtime collector developed in-house, while the
GNUstep project uses a Boehm collector.
Historically, languages intended for beginners, such as BASIC and Logo, have often used garbage collection for
heap-allocated variable-length data types, such as strings and lists, so as not to burden programmers with manual
memory management. On early microcomputers, with their limited memory and slow processors, BASIC garbage
collection could often cause apparently random, inexplicable pauses in the midst of program operation. Some BASIC
interpreters such as Applesoft BASIC on the Apple II family, had terribly inefficient garbage collectors for strings
which repeatedly scanned the string descriptors for the string having the highest address in order to compact it
toward high memory. This one-string-at-a-time processing loop resulted in O(N*N) time performance in the number
of strings, which would introduce a pause more than one minute long into the execution of string-intensive programs.
A replacement garbage collector for Applesoft BASIC published in Call-A.P.P.L.E. (January 1981, pages 40–45,
Randy Wigginton) identified a group of strings in every pass over the heap, reducing a two-minute pause to less
than a second, depending on the size of the group. Other approaches were published, but none ever made it into a new
revision of the BASIC interpreter.
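The quadratic behaviour described above can be modelled with a toy simulation (purely illustrative; the real collector was 6502 machine code working on string descriptors). Counting descriptor scans shows why moving one string per pass is so much worse than moving a group per pass:

```python
# Toy model of Applesoft-style string compaction. Each "string" is
# represented only by its address; every pass scans all remaining
# descriptors to decide what to move next.

def naive_passes(addresses):
    """One string moved per full scan: O(N^2) descriptor scans."""
    remaining = list(addresses)
    scans = 0
    while remaining:
        scans += len(remaining)           # scan every descriptor...
        remaining.remove(max(remaining))  # ...to move just one string
    return scans

def grouped_passes(addresses, group=8):
    """Move a whole group of highest-address strings per scan."""
    remaining = list(addresses)
    scans = 0
    while remaining:
        scans += len(remaining)                  # one full scan
        for addr in sorted(remaining)[-group:]:  # move the top group
            remaining.remove(addr)
    return scans

print(naive_passes(list(range(100))))    # 5050 descriptor scans
print(grouped_passes(list(range(100))))  # 676 descriptor scans
```

With 100 strings the grouped collector does roughly an eighth of the scanning work, and the ratio grows with the number of strings, which is why a grouped replacement could cut multi-minute pauses to under a second.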
Garbage collection (computer science) 11
Limited environments
Garbage collection is rarely used on embedded or real-time systems because of the perceived need for very tight
control over the use of limited resources. However, garbage collectors compatible with such limited environments
have been developed.[11] The Microsoft .NET Micro Framework and Java Platform, Micro Edition are embedded
software platforms that, like their larger cousins, include garbage collection.
References
[1] "Recursive functions of symbolic expressions and their computation by machine" (http://portal.acm.org/citation.cfm?id=367177.367199). Portal.acm.org. Retrieved 29 March 2009.
[2] "Recursive functions of symbolic expressions and their computation by machine, Part I" (http://www-formal.stanford.edu/jmc/recursive.html). Retrieved 29 May 2009.
[3] http://msdn.microsoft.com/en-us/library/ms404247.aspx
[4] "Copying and Pinning" (http://msdn2.microsoft.com/en-us/library/23acw07k.aspx). Msdn2.microsoft.com. Retrieved 9 July 2010.
[5] "Memory allocation in embedded systems" (http://www.eros-os.org/pipermail/cap-talk/2007-January/006795.html). Eros-os.org. Retrieved 29 March 2009.
[6] Perry Cheng; Guy E. Blelloch (22 June 2001). "A parallel, real-time garbage collector". ACM SIGPLAN Notices 36 (5): 125–136. doi:10.1145/381694.378823.
[7] "The Metronome: A Simpler Approach to Garbage Collection in Real-Time Systems" (http://www.research.ibm.com/people/d/dfb/papers/Bacon03Metronome.pdf).
[8] "Real-time Java, Part 4: Real-time garbage collection" (http://www.ibm.com/developerworks/java/library/j-rtj4/index.html).
[9] "Reference Counts" (http://www.python.org/doc/2.5.2/ext/refcounts.html). Extending and Embedding the Python Interpreter. 21 February 2008. Retrieved 13 November 2008. "While Python uses the traditional reference counting implementation, it also offers a cycle detector that works to detect reference cycles."
[10] "Compile-time garbage collection for the declarative language Mercury" (http://www.mercury.csse.unimelb.edu.au/information/papers.html#mazur-thesis).
[11] Wei Fu and Carl Hauser, "A Real-Time Garbage Collection Framework for Embedded Systems", ACM SCOPES '05, 2005 (http://portal.acm.org/ft_gateway.cfm?id=1140392&type=pdf&coll=GUIDE&dl=GUIDE&CFID=15151515&CFTOKEN=6184618). Portal.acm.org. Retrieved 9 July 2010.
Further reading
• Jones, Richard; Hosking, Antony; Moss, Eliot (19 August 2011). The Garbage Collection Handbook: The Art of Automatic Memory Management. CRC Applied Algorithms and Data Structures Series. Chapman and Hall/CRC. ISBN 1-4200-8279-5.
• Jones, Richard; Lins, Rafael D. (1996). Garbage Collection: Algorithms for Automatic Dynamic Memory Management. Wiley. ISBN 0-471-94148-4.
• Wilson, Paul R. (1992). "Uniprocessor Garbage Collection Techniques" (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.2438). IWMM '92 Proceedings of the International Workshop on Memory Management (Springer-Verlag).
External links
• The Garbage Collection Handbook: The Art of Automatic Memory Management (http://gchandbook.org)
• Richard Jones' Garbage Collection Page (http://www.cs.kent.ac.uk/people/staff/rej/gc.html) at the University of Kent
• The Memory Management Reference (http://www.memorymanagement.org/)
• The Very Basics of Garbage Collection (http://basen.oru.se/kurser/koi/2008-2009-p1/texter/gc/index.html), by Thomas Padron-McCarthy
• Publications by the OOPS group (http://www.cs.utexas.edu/users/oops/papers.html) at the University of Texas at Austin
• A garbage collector for C and C++ (http://www.hpl.hp.com/personal/Hans_Boehm/gc/) by Hans Boehm
• Expensive Explicit Deallocation: An Example (http://www.hpl.hp.com/personal/Hans_Boehm/gc/example.html) by Hans Boehm
• On-the-fly garbage collection: an exercise in cooperation (http://www.cs.utexas.edu/users/EWD/ewd05xx/EWD595.PDF) by Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten and E. F. M. Steffens
• A Garbage Collection Course Curriculum at MSDN Academic Alliance (http://www.academicresourcecenter.net/curriculum/pfv.aspx?ID=6182)
• A Glance at Garbage Collection in Object-Oriented Languages (http://www.osnews.com/story.php?news_id=6864)
• Notes on the CLR Garbage Collector (http://www.vineetgupta.com/2007/01/notes-on-the-clr-garbage-collector)
• Understanding GC pauses in HotSpot JVM (http://aragozin.blogspot.com/2011/06/understanding-gc-pauses-in-jvm-hotspots.html)
• Java SE 6 HotSpot™ Virtual Machine Garbage Collection Tuning (http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html)
Implementations
• A Real-Time Garbage Collector Based on the Lifetimes of Objects (http://web.media.mit.edu/~lieber/Lieberary/GC/Realtime/Realtime.html) by H. Lieberman and C. Hewitt, MIT Artificial Intelligence Laboratory
• TinyGC - an independent implementation of the BoehmGC API (http://tinygc.sourceforge.net/)
• OpenRTL - contains a mark-and-sweep collector for a standard dynamic allocation heap, with precise scanning support (http://code.google.com/p/openrtl)
• Conservative Garbage Collection Implementation for C Language (http://www.codeproject.com/KB/cpp/conservative_gc.aspx) by Yasin Hınıslıoğlu
• MeixnerGC - an incremental mark-and-sweep garbage collector for C++ using smart pointers (http://sourceforge.net/projects/meixnergc/)
Article Sources and Contributors
Garbage collection (computer science) Source: http://en.wikipedia.org/w/index.php?oldid=496091366
License
Creative Commons Attribution-Share Alike 3.0 Unported
//creativecommons.org/licenses/by-sa/3.0/

VM SECURITY ISSUES

Here's a look at the five top virtual server security concerns of the moment.
1. Managing oversight and responsibility
The overarching issue with virtual servers is responsibility, MacDonald says. Unlike physical servers, which are the direct responsibility of the data-center or IT managers in whose physical domain they sit, responsibility for virtual servers is often left up in the air. Should the business unit that requested a server be able to configure and secure it? Should it be the IT manager closest to the physical host? Or a centralized master sysadmin tasked with management and security for all the virtualized assets in an enterprise?
"People don't appreciate that when you add virtual servers there's another layer there of technology in addition to the application and the operating system and the hardware, and you have to secure it, MacDonald says.
2. Patching and maintenance
The most tangible risk that can come out of a lack of responsibility is the failure to keep up with the constant, labor-intensive process of patching, maintaining and securing each virtual server in a company. Unlike the physical servers on which they sit, which are launched and configured by hands-on IT managers who also install the latest patches, virtual machines tend to be launched from server images that may have been created, configured and patched weeks or months before.
Most companies maintain a small number of general-purpose "golden" images from which to launch or relaunch new VMs for many purposes, but also keep dozens or hundreds of server images stored on DVD or disk after being laboriously configured to support specific applications or business requirements, MacDonald says.
"You can take a snapshot of a virtual machine and write it off to disk so you don't have to recreate it the next time, or for disaster recovery. Just fire off one of these virtual machines sitting in offline libraries. But for the most part they're not being kept up to date with A/V signatures and patches, " MacDonald says. "Someone should check when they do launch one, but often they don't, and there isn't usually a way to check."
Both Microsoft and VMware supply patch-management schedules with their base infrastructure products. Both require disk images stored in libraries to be launched periodically so they can be patched.
That's a tedious process for companies with libraries of hundreds of VM images, however, and does nothing to address the patch status of VMs that are running but might not have been patched or had new antivirus signatures installed for weeks or months. Of course, VMware, HP, and many startup companies are trying to help IT automate much of this work right now with management products.
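The check MacDonald says is usually missing can be sketched as a simple audit over a library of stored images. All image names and dates below are hypothetical; in practice the metadata would come from the virtualization platform's image inventory, not a hard-coded table:

```python
from datetime import date, timedelta

# Hypothetical library of stored VM images: name -> date last patched.
IMAGE_LIBRARY = {
    "golden-web": date(2009, 5, 20),
    "golden-db": date(2009, 2, 10),
    "app-payroll-snapshot": date(2008, 11, 3),
}

def stale_images(library, today, max_age_days=30):
    """Return image names whose last patch date is older than the cutoff,
    i.e. images that need patching before they are relaunched."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, patched in library.items()
                  if patched < cutoff)

print(stale_images(IMAGE_LIBRARY, today=date(2009, 6, 1)))
# ['app-payroll-snapshot', 'golden-db']
```

Even a crude report like this turns "someone should check" into something that can run automatically every time an image is pulled out of the library.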
3. Visibility and compliance
Virtual servers are designed to be, if not invisible, then at least very low profile, at least within the data center. All the storage or bandwidth or floor space or electricity they need comes from the physical server on which they sit. To data-center managers not specifically tasked with monitoring all the minute interactions of the VMs inside each host, a set of virtual servers becomes an invisible network within which there are few controls.
"Virtual switch implementations let the VMs talk to each other, and across the network," MacDonald says. "But unless you put virtualized security controls—virtual sniffers, virtual firewalls, all the same controls you'd use on a physical server, inside that network, you don't see what's going on."
"There are a lot of compliance and use issues," McDonald says."Just because you don't have a sniffer to see those packets moving between the virtual servers doesn't mean they're not there," MacDonald says. "You could have a HIPPA-controlled workload talking to a non-HIPPA workload, or PCI and non-PCI workloads talking to each other. That puts you in a bad position. You would know if you looked at the packets on that network, but those packets are not coming out of the box for you to look at, so unless you take extra steps, you wouldn't know."
Microsoft, VMware and Citrix are all building some level of visibility and control over those interactions into their base products, but the level of function is nowhere near the point that customers will be secure, MacDonald says.
Silicon Valley startup Altor is finding some fans for its virtual firewalls, as is Reflex Systems, which migrated from physical to virtual firewalls to keep up with growth in that market, MacDonald says.
"Cisco's not there yet, Juniper's not there; we haven't reached the tipping point where the traditional networking vendors feel they have to be able to reach into virtual machines," MacDonald says.
In many cases, customers either don't know or don't care about certain risks. A poll of 109 attendees at the RSA Conference 2009 in Las Vegas last month, conducted and published by virtual-security software provider Secure Passage, indicated that 72 percent of respondents have not deployed virtual firewalls of any kind. The most frequent reasons cited: the limited visibility respondents had into virtual networks, the difficulty of managing virtual security and lack of understanding regarding what constitutes a virtual firewall.
VMsafe, the set of APIs that VMware built into the vSphere version of its virtual infrastructure product, makes it possible for third-party security vendors to apply their applications to VMware VMs. The company also announced at the RSA conference that it had built RSA's data loss prevention software into vSphere to enhance its security.
"They're making progress," MacDonald says of VMware and Microsoft. "They're not where we need them to be yet."
Simon Crosby, chief technology officer of Citrix Systems, said during a security debate at the RSA conference that security should be built into the applications, not the hypervisor or virtual-infrastructure management products.
He said paying attention to the security configuration guidelines that Citrix and other hypervisor vendors publish can fix most of the security issues and that industry groups such as the Cloud Security Alliance can extend that guidance to include process-management and policy issues.
4. VM sprawl
Another consequence of the lack of oversight of virtual machines is sprawl—the uncontrolled proliferation of virtual machines launched, and often forgotten, by IT managers, developers or business-unit managers who want extra servers for some specific purpose, and lose track of them later.
VM sprawl wastes resources, creates unmonitored servers that could have access to sensitive data, and sets the company as a whole and IT in particular up for a painful cleanup when a problem crops up later, Steffen says.
"We try to treat the VMs in exactly the same way we do physical machines—with system scans, antivirus, and everything else. That includes going through a procurement process for VMs just as if they were physical machines," Steffen says.
Forcing business unit managers to fill out requisitions and explain why they want an additional VM, for what, and for how long slows the process down, which could be considered inefficient, but also gives everyone involved time to think about how necessary each new VM is.
"We don't do that if they need to replace a server they're already running," Steffen says. "But with VMs you have the potential for VMs to get completely out of hand and have so many out there you can't do anything about how secure they are."
The Secure Passage poll of RSA attendees showed 42 percent were concerned about sprawl, specifically the lack of controls available to keep business unit managers from spawning off new servers at will, rather than coordinating with IT to make sure they are managed and secure.
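The requisition-and-expiry discipline Steffen describes can be sketched as a minimal inventory check. The names, owners, purposes, and dates below are hypothetical; the point is that every VM carries the same paper trail a physical server would:

```python
from datetime import date

# Hypothetical VM inventory with requisition data: who asked for the
# VM, why, and when the requisition expires.
INVENTORY = [
    {"name": "vm-test-01", "owner": "qa", "purpose": "load test",
     "expires": date(2009, 4, 30)},
    {"name": "vm-erp", "owner": "finance", "purpose": "ERP upgrade",
     "expires": date(2009, 12, 31)},
    {"name": "vm-demo", "owner": "sales", "purpose": "customer demo",
     "expires": date(2009, 5, 15)},
]

def expired_vms(inventory, today):
    """VMs past their requisition expiry: candidates for review
    and reclamation before they become forgotten sprawl."""
    return [vm["name"] for vm in inventory if vm["expires"] < today]

print(expired_vms(INVENTORY, today=date(2009, 6, 1)))
# ['vm-test-01', 'vm-demo']
```

Running a report like this on a schedule gives IT a concrete list to chase, instead of discovering unmanaged VMs only when something goes wrong.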
5. Managing Virtual Appliances
One of the very best things about virtual infrastructures is the ability to buy or test a product from a third-party vendor and have it up and running in minutes, rather than having to clear space on a test server, install the software, get it to talk to the operating system and the network and then, hours later, see whether it does what it's supposed to, MacDonald says.
Unfortunately, virtual appliances are also virtual pigs in a poke. "There's an operating system and an application in every package, each with its own configuration and patch status, and you have no idea what's in there or who's going to maintain it or what the long-term risk is going to be," MacDonald says. "It has a full application and OS all configured and ready to run. In five minutes you can try out that new anti-spam server. But what OS is in the package, and is it patched, and if not, who is going to give you the patch?"


 
