Large size of the database is often hundreds of GB, and sometimes even use the TB calculation. The single-table data often reach millions of records, and records the number will grow over time. This not only affects the efficiency of the database, database maintenance also increases the difficulty. In addition to the amount of data table, the different access patterns on the table may also affect the performance and availability. These problems can be reasonable on a large table area are greatly improved. When the tables and indexes become very large, partition the data can be divided into smaller, more manageable parts to improve the efficiency of the system. If there are multiple CPU or multiple disk subsystems, you can get better performance in parallel operations. So, to partition the large table to deal with huge amount of data a very efficient way. In this paper, a concrete example of how to create and modify the partition table, and how to view the partition table.
1, SQL Server 2005
SQL Server 2005 is Microsoft's launch of SQL Server 2000 after an interval of five years after the launch of a database platform, its relational database engine for data and structured data to provide a more secure and reliable storage capabilities, enables users to build and manage for business, high availability and high performance data applications. In addition, SQL Server 2005 combines the analysis, reporting, integration, and notifications. This allows companies to build and deploy cost-effective BI solutions to help the team through the scorecard, Dashboard, Web Services and mobile data applications to all areas of business. Both developers, database administrators, information workers, or policy makers, SQL Server 2005 can provide the innovative solutions and to gain more from the benefits of the data.
It brings new features, such as T-SQL enhancements, data partition, service agents and with. Net Framework's integration in the manageability, availability, scalability, and security enhancements, and other aspects .
2, table partition realization method
Table partition divided into horizontal partition and vertical partition. Horizontal partition table into multiple tables. Each table contains the same number of columns, but fewer rows. For example, a multi-billion-row table containing the level of partition into 12 tables, each small table, said one month's data within a given year. Any particular month of data need only reference to the corresponding month of the query table. The vertical partition sucked more than the original table into the table only contains fewer columns. Zoning district level is the most common way to present this to the level of district specific realization.
Horizontal partition commonly used method is based on the use of the data period and the level of partition. For example, this example sends a message log contains the most recent year of data, but only periodically visit this quarter data. In this case, consider the data into four regions, each contains only one quarter of data.
2.1, create a file group
The establishment of the partition table first create a file group, and create multiple file groups is mainly to get good I / O balance. Under normal circumstances, the file number of the best group of the same number with the area, and these groups are usually located in a different file on the disk. Each file group can form by one or more files, each partition must be mapped to a file group. A file group can be used by multiple partitions. In order to better land management data (for example, the backup in order to obtain more precise control), the partition table should be Sheji, so Zhiyou related data or logical grouping of data is the same file group. Use ALTER DATABASE, add the logical file group name:
ALTER DATABASE [DeanDB] ADD FILEGROUP [FG1]
DeanDB for the database name, FG1 file group name. Create a file group, then use ALTER DATABASE to add files to the file group:
ALTER DATABASE [DeanDB] ADD FILE (NAME = N'FG1 ', FILENAME = N'C: DeanDataFG1.ndf', SIZE = 3072KB, FILEGROWTH = 1024KB) TO FILEGROUP [FG1]
The establishment of a similar group of four papers and documents and to store data for each file on different disk drive.
2.2, create a partition function
Create partition table partition function must be determined mechanism, the standard partition table is determined by partition function. Create a data partition function has RANGE "LEFT | / RIGHT" two options. On behalf of each of which side of the boundary value in the local. For example there are four partitions, then the definition of the three border point value, and specify each value is the first partition on the border (LEFT) or a second partition of the lower boundary (RIGHT). Code:
CREATE PARTITION FUNCTION [SendSMSPF] (datetime) AS RANGE RIGHT FOR VALUES ('20070401 ', '20070701', '20071001 ')
2.3, create a partition scheme
Create partition function, it must be associated with the partitioning scheme to partition a document pointing to a specific group. Is to define the actual storage of data media and the correspondence between data blocks. Multiple data table can share the same data partition function, generally do not share the same data partitioning scheme. By different partitioning scheme, use the same partition function, the different data tables have the same geographical conditions, but stored in different media. Create a partition program code is as follows:
CREATE PARTITION SCHEME [SendSMSPS] AS PARTITION [SendSMSPF] TO ([FG1], [FG2], [FG3], [FG4])
2.4, create a partition table
The establishment of a good partition function and partition scheme, you can create a partition table. Partition table partition keys and partition by defining the associated program. Insert the record, SQL SERVER key based on the different partition, partition function is defined by the data into the appropriate partition. Thus the partition function, partition scheme and partition table by combining the three. The code to create the partition table is as follows:
CREATE TABLE SendSMSLog ([ID] [int] IDENTITY (1,1) NOT NULL, [IDNum] [nvarchar] (50) NULL, [SendContent] [text] NULL [SendDate] [datetime] NOT NULL,) ON SendSMSPS ( SendDate)
2.5, view the partition table information
System running for some time or the previous data into the partition table, we need to see data specific storage conditions, that is, the number of records for each district to access those records access in that area and so on. We can $ partition.SendSMSPF to see, the code is as follows:
SELECT $ partition.SendSMSPF (o.SendDate) AS [Partition Number], min (o.SendDate) AS [Min SendDate], max (o.SendDate) AS [Max SendDate], count (*) AS [Rows In Partition] FROM dbo.SendSMSLog AS oGROUP BY $ partition.SendSMSPF (o.SendDate) ORDER BY [Partition Number]
In the implementation of the above script in Query Analyzer, the results shown in Figure 1:
Figure 1: The partition table information
2.6 Maintenance Division
Partition add partition maintenance of the main design, reduce, merge and convert partitions. By ALTER PARTITION FUNCTION option SPLIT, MERGE, and ALTER TABLE SWITCH option to achieve. SPLIT will increase more than one zone, but MEGRE will combine or reduce the partition, SWITCH is the conversion between the logical partitions in the group.
3, performance comparison
Our data on 26.5 million, about 4G memory space occupied by comparing the performance of single-table test environment IBM365, CPU Xeon 2.7G * 2, memory, 16G, HDD 136G * 2, the system platform for the Windows 2003 SP1 + SQL Server 2005 SP1 . The results in Table 1:
Table 1: Performance Comparison of partition and not partition table (unit: ms)
Test item partition is not partition
11654661466
21333
32014061546
41714061000
Description:
1, according to the time one day record the time spent retrieving
2, a single record into the time spent
3, according to the time one day record the time taken to delete
4, the number of statistical time monthly record
As can be seen from Table 1, to operate on the partition table partition table than not faster, because the operation of the partition table using CPU and I / O, parallel operation, the amount of data to retrieve data are also smaller, location shorter time-consuming data.
4 Conclusion
Of the massive data processing has been a headache. Fen Li's technology is the first consideration of all designers, whether it is a process for separating function is to separate the data Fang Wen, if to be a reasonable plan, can Shifenyouxiao of Jiejue Da data table of the operating efficiency of the problem of low and high maintenance costs. SQL Server 2005 features the new partition table, you can partition the data reasonable, when the user access to some data, SQL Server engine can optimize the physical storage of the data to find the best implementation of the program, which is not needle in a haystack .