Thursday, December 13, 2007

Memory Management in Linux

Linux is a Unix-like computer operating system. Linux is one of the most prominent examples of free software and open source development; typically all underlying source code can be freely modified, used, and redistributed by anyone.

The Linux kernel was first released to the public on 17 September 1991, for the Intel x86 PC architecture. The kernel was augmented with system utilities and libraries from the GNU project to create a usable operating system, which led to an alternative term, GNU/Linux. Linux is packaged for different uses in Linux distributions, which contain the sometimes modified kernel along with a variety of other software packages tailored to different requirements.

Predominantly known for its use in servers, Linux is supported by corporations such as Dell, Hewlett-Packard, IBM, Novell, Oracle Corporation, Red Hat, and Sun Microsystems. It is used as an operating system for a wide variety of computer hardware, including desktop computers, supercomputers, game consoles such as the PlayStation 2 and 3, several arcade games, and embedded devices such as mobile phones and routers.

Linux Memory Management

The Linux memory manager implements demand paging with a copy-on-write strategy relying on the 386's paging support. A process acquires its page tables from its parent (during a fork()) with the entries marked as read-only or swapped. Then, if the process tries to write to that memory space, and the page is a copy-on-write page, it is copied, and the page is marked read-write. An exec() results in the reading in of a page or so from the executable. The process then faults in any other pages it needs.

Each process has a page directory of 1,024 entries; each entry can point to a page table of 1,024 entries, and each page-table entry maps a 4 KB page, giving access to up to 4 GB of memory. A process' page directory is initialized during a fork by copy_page_tables(). The idle process has its page directory initialized during the initialization sequence.

Each user process has a local descriptor table that contains a code segment and data-stack segment. These user segments extend from 0 to 3 GB (0xc0000000). In user space, linear addresses and logical addresses are identical.

On the 80386, linear addresses run from 0 GB to 4 GB. A linear address points to a particular memory location within this space. A linear address is not a physical address; it is a virtual address. A logical address consists of a selector and an offset: the selector points to a segment, and the offset tells how far into that segment the address is located.

The kernel code and data segments are privileged segments defined in the global descriptor table and extend from 3 GB to 4 GB. The swapper page directory (swapper_pg_dir) is set up so that logical addresses and physical addresses are identical in kernel space.

The space above 3 GB appears in a process' page directory as pointers to kernel page tables. This space is invisible to the process in user mode, but the mapping becomes relevant when privileged mode is entered, for example, to handle a system call. Supervisor mode is entered within the context of the current process, so address translation occurs with respect to the process' page directory but using kernel segments. This is identical to the mapping produced by using swapper_pg_dir and kernel segments, as both page directories use the same page tables in this space. Only task[0] (the idle task, sometimes called the swapper task for historical reasons, even though it has nothing to do with swapping in the Linux implementation) uses swapper_pg_dir directly.

  • The user process' segment_base = 0x00; its page_dir is private to the process.
  • When the user process makes a system call: segment_base = 0xc0000000, page_dir = the same user page_dir.
  • swapper_pg_dir contains a mapping for all physical pages from 0xc0000000 to 0xc0000000 + end_mem, so the first 768 entries in swapper_pg_dir are 0s, followed by 4 or more entries that point to kernel page tables.
  • The user page directories have the same entries as swapper_pg_dir above entry 768. The first 768 entries map the user space.
The upshot is that whenever the linear address is above 0xc0000000 everything uses the same kernel page tables.

The user stack sits at the top of the user data segment and grows down. The kernel stack is not a pretty data structure or segment that I can point to with a "yon lies the kernel stack." A kernel_stack_frame (a page) is associated with each newly created process and is used whenever the kernel operates within the context of that process. Bad things would happen if the kernel stack were to grow below its current stack frame.

User pages can be stolen or swapped. A user page is one that is mapped below 3 GB in a user page table. This region does not contain page directories or page tables. Only dirty pages are swapped.

Minor alterations are needed in some places (tests for process memory limits come to mind) to provide support for programmer-defined segments.

Linux Virtual memory

Introduction

The Linux philosophy regarding memory usage is that “unused memory is wasted memory”. So what does that mean when you look at the free list when using the top utility or vmstat? It means that top is showing you wasted memory, or rather, memory that is not currently needed. Looking at the free column alone to determine current memory usage is misleading, in that it gives you an incomplete view of the whole memory picture.

In this paper I will try to provide a more complete view of Linux memory management and highlight a few tools that will help reach this goal. I will also outline a method for quickly determining a Linux system's memory use, which should prove handy when you need to eliminate possible contributors to bad performance or chase down memory related errors. The tools and methods I will use in the examples have been chosen due to their availability and ease of use, and can be applied to any Linux system if you want to re-create the scenarios on your own equipment.

First off, we should define some terms that appear throughout this document. Particular focus will be paid to the page cache throughout this paper, because the page cache is where it seems all our memory winds up.

Definitions

Page - a discrete unit of memory that is manipulated by the Linux kernel. In systems that utilize Intel 32 bit processors, a page is 4096 bytes, or 4 kilobytes (KB).

Anonymous memory - when a process requests memory from the kernel via the malloc() library call (which in turn obtains memory through system calls such as brk() or mmap()), the process is assigned memory that has no file backing on disk. This is why it is called "anonymous". When this memory is allocated, a reservation is taken against physical swap space on disk. This way, when the kernel needs to free up memory (due to pressure from processes that need more memory or when new processes start), this area will be used to write out the changed pages. The kernel will then add these reclaimed pages to the free list. When a process tries to access pages that have since been paged to the swap area, those pages need to be read back from disk and written into memory.

Buffer cache - The buffer cache is the area of memory set aside to buffer blocks read from or written to disk. This disk activity is known as disk I/O, or disk input and output. Buffer cache also contains filesystem metadata, such as directory structure data and filesystem journaling information.

Page cache - The area of memory set aside for filesystem and process pages that have been read in from disk, or pages that have no file backing. If the kernel needs to allocate memory to a process, and it finds the pages here, there will be no disk I/O operation. The page cache contains anonymous memory pages, processes' executable pages and pages of regular files open for reading and writing. The Linux kernel tries to keep this as large as possible to maintain fast file operations.

Paging - The act of moving pages of memory to and from disk. Paging in refers to loading a process's executable image and associated data into memory at startup. It also refers to loading pages into memory that were previously written to swap. Paging out occurs when pages are written to disk in order to free memory. This paging out can be either to the page's file backing on the filesystem or to disk-based swap.

Free list - The pool from which memory allocations are satisfied. The Linux kernel tries to keep the free list at a certain size so that allocations need not always be satisfied from cache. The kernel uses a method of aging where the least recently used pages eventually filter down through different states and are then candidates for being placed here.

Cache hit rate - the rate at which a system can find a page in cache. A miss indicates a read from disk for the requested page, which is slow and needs to be avoided.



Wednesday, November 28, 2007

Research Topics 2




Virtual memory in UNIX

Virtual memory is an internal "trick" that relies on the fact that not every executing task is always referencing its RAM memory region. Since not all RAM regions are constantly in use, UNIX has developed a paging algorithm that moves RAM memory pages to the swap disk when it appears that they will not be needed in the immediate future.

RAM demand paging in UNIX:

As memory regions are created, UNIX will not refuse a new task whose RAM requests exceed the amount of RAM. Rather, UNIX will page out the least recently referenced RAM memory pages to the swap disk to make room for the incoming request. When the physical limit of the RAM is exceeded, UNIX can wipe out RAM regions because they have already been written to the swap disk. Once a RAM region has been moved to swap, any subsequent reference by the originating program requires UNIX to page the region back in to make the memory accessible. UNIX page-in operations involve disk I/O and are a source of slow performance. Hence, avoiding UNIX page-in operations is an important concern for the Oracle DBA.

Memory management

Memory management is the act of managing computer memory. In its simpler forms, this involves providing ways to allocate portions of memory to programs at their request, and freeing it for reuse when no longer needed.

Virtual memory systems separate the memory addresses used by a process from actual physical addresses, allowing separation of processes and increasing the effectively available amount of RAM using disk swapping. The quality of the virtual memory manager can have a big impact on overall system performance.

Garbage collection is the automated reclamation of computer memory resources that a program no longer uses. It is generally implemented at the programming-language level and stands in opposition to manual memory management, the explicit allocation and deallocation of computer memory resources.

Relocation

In systems with virtual memory, programs in memory must be able to reside in different parts of the memory at different times. This is because when the program is swapped back into memory after being swapped out for a while, it cannot always be placed in the same location. Memory management in the operating system should therefore be able to relocate programs in memory and handle memory references in the code of the program so that they always point to the right location in memory.

Protection

Processes should not be able to reference the memory for another process without permission. This is called memory protection, and prevents malicious or malfunctioning code in one program from interfering with the operation of other running programs.

Sharing

Even though the memory of different processes is protected from each other, different processes should still be able to share information and therefore access the same part of memory.

Logical organization

Programs are often organized in modules. Some of these modules could be shared between different programs, some are read only and some contain data that can be modified. The memory management is responsible for handling this logical organization that is different from the physical linear address space. One way to arrange this organization is segmentation.

Physical organization

Memory is usually divided into fast primary storage and slow secondary storage. Memory management in the operating system handles moving information between these two levels of memory.

DOS memory managers

In addition to standard memory management, the 640 KB barrier of MS-DOS and compatible systems led to the development of programs known as memory managers when PC main memories started to be routinely larger than 640 KB in the late 1980s (see conventional memory). These move portions of the operating system outside their normal locations in order to increase the amount of conventional or quasi-conventional memory available to other applications. Examples are EMM386, which was part of the standard installation in DOS's later versions, and QEMM. These allowed use of the memory above the 640 KB barrier, normally reserved for ROMs and device memory, as upper and high memory.


Thursday, November 22, 2007

Research Topics

1. Review article about Operating System


An operating system (OS) is the software that manages the sharing of the resources of a computer and provides programmers with an interface used to access those resources. An operating system processes system data and user input, and responds by allocating and managing tasks and internal system resources as a service to users and programs of the system. At the foundation of all system software, an operating system performs basic tasks such as controlling and allocating memory, prioritizing system requests, controlling input and output devices, facilitating networking and managing file systems. Most operating systems come with an application that provides a user interface for managing the operating system, such as a command line interpreter or graphical user interface. The operating system forms a platform for other system software and for application software.
The operating system is the most important program that runs on a computer. Every general-purpose computer must have an operating system to run other programs. Operating systems perform basic tasks, such as recognizing input from the keyboard, sending output to the display screen, keeping track of files and directories on the disk, and controlling peripheral devices such as disk drives and printers.
For large systems, the operating system has even greater responsibilities and powers. It is like a traffic cop -- it makes sure that different programs and users running at the same time do not interfere with each other. The operating system is also responsible for security, ensuring that unauthorized users do not access the system.


Operating systems can be classified as follows:

> multi-user : Allows two or more users to run programs at the same time. Some operating systems permit hundreds or even thousands of concurrent users.

> multiprocessing : Supports running a program on more than one CPU.

> multitasking : Allows more than one program to run concurrently.

> multithreading : Allows different parts of a single program to run concurrently.

> real time : Responds to input instantly. General-purpose operating systems, such as DOS and UNIX, are not real-time.

Operating systems provide a software platform on top of which other programs, called application programs, can run. The application programs must be written to run on top of a particular operating system. Your choice of operating system, therefore, determines to a great extent the applications you can run. For PCs, the most popular operating systems are DOS, OS/2, and Windows, but others are available, such as Linux.
As a user, you normally interact with the operating system through a set of commands. For example, the DOS operating system contains commands such as COPY and RENAME for copying files and changing the names of files, respectively. The commands are accepted and executed by a part of the operating system called the command processor or command line interpreter. Graphical user interfaces allow you to enter commands by pointing and clicking at objects that appear on the screen.

2. Two reasons why a regional bank might decide to buy six server computers instead of one supercomputer:

a. so that their files have backups in case of errors or failures that their computers may encounter.

b. so that they can continue their transactions while other computer units are undergoing maintenance or repair.