SAN HeadQuarters Installation and User Guide

If you are replicating data, configure SAN HeadQuarters to connect to both the source site and the target site to obtain the most meaningful data. The Single Sign-On feature enables you to store a group administration account and password. When you run SAN HeadQuarters from the same domain user account on the same computer, you can then start the Group Manager GUI as a standalone application without entering login credentials.

SAN HeadQuarters helps you manage your storage effectively by sending email notification of performance alerts, so you can respond quickly to a problem before it becomes catastrophic. SAN HeadQuarters compares data archived from earlier time intervals with present data to determine trends.

The SAN HeadQuarters 95th percentile feature helps you analyze data by factoring out anomalous high values.
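As an illustration of how a 95th percentile factors out anomalies, the following minimal sketch uses the nearest-rank method on hypothetical latency samples. The function name, the sample values, and the choice of the nearest-rank method are assumptions made for this example; the guide does not state how SAN HeadQuarters computes the value.

```python
def percentile_nearest_rank(samples, percent=95):
    """Return the nearest-rank percentile of a list of numeric samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least `percent` % of the
    # samples at or below it. The expression below is an integer ceiling.
    rank = max(1, -(-percent * len(ordered) // 100))
    return ordered[rank - 1]


# Hypothetical latency samples in milliseconds: steady activity plus one spike.
latencies_ms = [10] * 19 + [500]
peak = max(latencies_ms)
p95 = percentile_nearest_rank(latencies_ms)
```

In this sample set the single 500 ms spike determines the peak, while the 95th percentile stays at 10 ms.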

This way, the data you see reflects normal day-to-day activity. The Live View feature polls data at intervals as short as 3 seconds, presenting a temporary view of your environment.

Unlike the Group Manager Performance Monitor, where data is available only in the current session, Live View data is stored in separate logs that you can access to compare different time intervals. Similarly, you can select information for all pools in the group or restrict data to an individual pool.

Syslog Event Logging

When an event occurs in a PS Series group (for example, you create a volume or remove a power supply), the group generates an event message.

Event messages help you monitor normal operations and also identify problems before they disrupt service. You can change the default event log size only when you first add the group to the monitoring list.

1. Click Group Configuration, then click Notifications. The Group Notification window appears.
2. In the Event Logs panel, click Send events to syslog servers.
3. Click Add.
4. Select the priority of the events you want to log to the syslog server.

See Syslog Events.

You can change the syslog server configuration and enable only specific network interfaces for use as listening UDP sockets. The following requirements apply:

You must change the syslog server configuration using a domain user account on the computer that is running the SAN HeadQuarters Server.

The domain user account must have write privileges to the log file directory.

Before changing the syslog configuration, back up or make a copy of the SyslogConfig.

To change the syslog server configuration and enable only specific network interfaces for use as listening UDP sockets:

1. Edit the SyslogConfig. For example, the default SyslogConfig.

Empty string: Disables the use of all network interfaces for a specific network protocol.

If there is a syntax error in the SyslogConfig.

If needed, you can disable the syslog server. The user account must have write privileges to the log file directory.

Note: If there is a syntax error in the SyslogConfig.

If the group network address changes, SAN HeadQuarters can no longer monitor the group through the original address. If a group network address changes and you want to monitor the group under the new address:

1. Create an archive of the latest group data.
2. Do one of the following:
   a. Stop monitoring the group but retain the group log files. The data associated with the obsolete network address still appears in the GUI under the group name, but the monitoring status is disconnected. If you try to resume monitoring the group using the obsolete network address, the operation fails.
   b. Remove the group from the monitor list. The log files are deleted and the GUI no longer shows group data. See Removing a Group from the Monitor List.
3. Add the group to the list of monitored groups, specifying the new network address. See Adding a Group to the Monitoring List.

In addition, for each monitored group, the computer running the SAN HeadQuarters Server must have network access to all the configured network interfaces on all the group members, the group IP address, and the management address, if applicable. Use the ping command to determine if you can access all IP addresses.
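If you need to check many addresses, the ping test can be scripted. The following is a minimal sketch, not part of SAN HeadQuarters: the function names are hypothetical, and it assumes the operating system's ping command is on the path and uses the standard count flag (-n on Windows, -c elsewhere).

```python
import platform
import subprocess


def ping_command(address, count=1):
    """Build the ping command line for the current operating system."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", count_flag, str(count), address]


def is_reachable(address):
    """Return True if a single ping to the address succeeds."""
    result = subprocess.run(
        ping_command(address),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable(addresses):
    """Return the addresses that did not answer a ping."""
    return [address for address in addresses if not is_reachable(address)]
```

Calling unreachable() with the group IP address, the management address, and every member interface address returns the subset that needs investigation. A host that blocks ICMP appears unreachable even when it is up, so treat the result as a first check only.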

You specify the log file size when configuring a group for monitoring. To be able to store a year's worth of data, each log file in a set contains data from polling periods of increasing lengths. For example, one log file will contain data that the SAN HeadQuarters Server polled every two minutes, another log file will contain data that was polled every four minutes, and so on.

By compressing data over time, the SAN HeadQuarters Server can efficiently store a large amount of data and also reduce data volatility by moderating extreme values. Note: Using a log file size that is larger than the default size of 5 MB enables you to store more.

If you use a log file size that is smaller than the default size, data will be less precise but response time might improve. Table 4 shows how compression over time affects different types of performance data. For example, some data will be understated because intermittent idle time in typical workloads will decrease averages.
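The effect of compression on averages can be sketched in a few lines. This example assumes that each step to a longer polling period averages adjacent pairs of data points. That is an assumption made for illustration, since the guide does not describe the exact method, and the function name and sample values are hypothetical.

```python
def halve_resolution(samples):
    """Average adjacent pairs so each data point covers twice the time."""
    return [
        (samples[i] + samples[i + 1]) / 2
        for i in range(0, len(samples) - 1, 2)
    ]


# Hypothetical IOPS values from 2-minute polls: one busy period among idle ones.
two_minute = [0, 0, 400, 0]
four_minute = halve_resolution(two_minute)
eight_minute = halve_resolution(four_minute)
```

The 400 IOPS burst appears as 200 in the 4-minute data and as 100 in the 8-minute data, which is how idle time causes older data to understate activity while also moderating extreme values.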

See Performance and Capacity Terms on page 67 for a description of the terms used in Table 4. For related information about increasing log file size, see Increasing the Log File Size.

By default, the polling period is two minutes. Unsuccessful SNMP polls can result from network problems or the group workload. If the Monitor Service determines that a group is not responding to SNMP requests, it doubles the time between consecutive polls.

For example, if the poll period is 2 minutes, it increases the interval to 4 minutes, then 8 minutes, and so on. Once the Monitor Service obtains a response from the group, it halves the poll period at each interval (for example, 8 minutes to 4 minutes to 2 minutes). See Polling Status.

You can determine the polling period used to obtain a data point by placing the cursor over a point in time on a SAN HeadQuarters graph.
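The doubling and halving of the poll period can be sketched as follows. This is an illustrative model only: the function names are hypothetical, and the sketch assumes the period never drops below the default and has no upper limit, which the guide does not specify.

```python
def next_poll_period(current, default, poll_succeeded):
    """Return the next poll period in minutes after one polling attempt."""
    if poll_succeeded:
        # Halve the period after a response, but never go below the default.
        return max(default, current // 2)
    # Double the period when the group does not respond.
    return current * 2


def simulate(default, outcomes):
    """Apply a sequence of poll outcomes and record the resulting periods."""
    period = default
    history = []
    for succeeded in outcomes:
        period = next_poll_period(period, default, succeeded)
        history.append(period)
    return history


# Two failed polls followed by three successful ones, starting at 2 minutes.
periods = simulate(2, [False, False, True, True, True])
```

With these outcomes the period moves from 2 minutes to 4 and 8 minutes, then back through 4 minutes to 2 minutes, where it stays.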

SAN HeadQuarters might temporarily lose connectivity with a group and not obtain data. When connectivity to the group is restored and polling succeeds, the average value will still be calculated from the last two successful polling operations.
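One way to picture an average taken across a connectivity gap is a rate computed from two readings of a cumulative counter. The sketch below assumes such counters, which the guide does not confirm, and the function name and values are hypothetical.

```python
def average_rate(prev_count, prev_time_s, curr_count, curr_time_s):
    """Average per-second rate between two successful polls of a counter."""
    elapsed = curr_time_s - prev_time_s
    if elapsed <= 0:
        raise ValueError("the current poll must be later than the previous poll")
    return (curr_count - prev_count) / elapsed


# Polls at 0 s and 480 s succeeded; the polls in between failed.
rate = average_rate(1_000, 0, 49_000, 480)
```

Here the result is 100 operations per second averaged over the whole 8-minute gap, so any peaks or idle periods inside the gap are not visible.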

If you install a newer version of SAN HeadQuarters after running an older version (using the same log file directory), no information will appear for the new type of data prior to the time of the update. The type of performance data shown for specific groups also depends on the version of the PS Series firmware that is installed on the group.

Information Provided by the GUI

Before you can analyze performance data, you should understand the type of data that the GUI displays:

Performance and capacity data: To analyze the group data in the GUI, you must understand the terminology. See Performance and Capacity Terms.

If polls are not successful, you should investigate the cause.

Alerts: Alerts notify you when hardware or performance issues occur in a group, so you can take action to prevent problems. See Reported Alerts.

Events and Audit Logs: If you configure a group to use the syslog server that is part of the SAN HeadQuarters Server, you are notified when group events occur, so you can take action to prevent problems.

See Syslog Events on page 79 and Audit Messages.

The GUI displays data from the most recent eight-hour time period. Use the Zoom links above the timeline to quickly set the value of the time range selector and also to control the range of dates seen in the timeline. For example, click Show Latest to show data up to the most recent time.

See Navigating the GUI.

Performance and Capacity Terms

The data collected by SAN HeadQuarters can help you obtain a better understanding of your storage environment, prevent future problems, and identify and diagnose existing problems before they affect operations.

Capacity and Replication Terms

The SAN HeadQuarters GUI provides the following capacity and replication terms, and space statistics:

Delegated space: Space on the secondary group reserved for storing all replica sets from the primary group. If free delegated space is low, you might want to increase it to ensure space is available for additional replicas. Space delegated to a primary group must be increased by the secondary group administrator. When viewed from the primary group, delegated space on the secondary group is called replica space. Note: Failback replica set space is tracked as delegated space.

Local replication reserve: Space on the primary group reserved for storing volume changes during replication and the failback snapshot.

Overall group capacity: Available space. Capacity depends on a number of variables. For example, group capacity depends on the number of members, the number and size of the disks installed in the members, and each member's RAID policy.

Replication partner: A list of all groups currently configured as an outbound replication partner.

Replica reserve: Portion of delegated space reserved for storing the replica set for a volume. Once replica reserve has been consumed, the oldest replicas are deleted to free space for new replicas. To retain more replicas, increase the replica reserve percentage.

Snapshot reserve: Space reserved for storing snapshots. Once snapshot reserve has been consumed, the oldest snapshots are deleted to free space for new snapshots. To retain more snapshots, increase the snapshot reserve percentage.

Thin provisioning statistics: Number of volumes that are thin provisioned, the amount of unreserved unallocated space for thin provisioned volumes, and the percentage of group space required to fulfill the maximum in-use space requirements for thin provisioned volumes.

Volume reserve: Space allocated to a volume. For thin-provisioned volumes, the volume reserve is based on usage patterns. As more data is written to the volume, more space is allocated to the volume, up to the user-defined limit.

Volume type: Type of volume: template, thin-provisioned, thin clone, or standard (a fully provisioned volume).

Space utilization terms are described as follows:

In use: Space that is currently storing data.

Free: Space that is not storing data or reserved for any purpose.

Reserved: Space that is reserved for some purpose (may include reserved space that is storing data and space that is not storing data).

Unused Space: Sum of free space and space that is reserved but not in use.

The data represents the average for the polling period. This is the average amount of data that is transferred each second. All storage systems have a maximum throughput capacity. If the threshold is reached, it indicates a sequential workload.

Latency (also called delay) is the best gauge for measuring the storage load and is the principal method for determining if a group has reached its full capabilities.

While some volatility is lost, idle time does not affect the average latency. Therefore, older latency data is still a good indicator of performance. See Identifying Performance Problems on page 1 for information about interpreting latency values. The read and write percentages are not an indicator of performance. However, this information is important when sizing and configuring groups for specific workloads. The load value is an estimate. Use it only as a general indicator.

The SSD Space, displayed only on groups with at least one member using tiered storage, indicates the amount of disk space available from solid state drives. This information can help you understand the relationship between IOPS and latency.

For example, a high number of IOPS usually means a longer latency time.

Ethernet port last modified date: Date when the network interface configuration was changed or the member restarted.

Link speed: Negotiated link speed for all the active network interfaces. Link speed is reported at the half-duplex data transmission rate. Double the rate to obtain the full-duplex rate.

Management network: Whether a dedicated management network is enabled in the group.

The theoretical maximum bandwidth is based on the negotiated link speed for all active network interfaces on the group members. The network is rarely a bottleneck in a SAN.
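As a rough illustration of why the network is rarely the bottleneck, the sketch below adds up the negotiated link speeds and compares them with observed traffic. Summing the interface speeds is an assumption about how the theoretical maximum is derived, and the function names, the four 1000 Mbps interfaces, and the 200 Mbps traffic figure are hypothetical.

```python
def theoretical_max_mbps(link_speeds_mbps, full_duplex=False):
    """Sum the negotiated link speeds of all active interfaces."""
    total = sum(link_speeds_mbps)
    # Link speed is reported at the half-duplex rate; double it for full duplex.
    return total * 2 if full_duplex else total


def utilization_percent(traffic_mbps, max_mbps):
    """Express observed traffic as a percentage of the theoretical maximum."""
    if max_mbps <= 0:
        raise ValueError("the maximum bandwidth must be positive")
    return 100.0 * traffic_mbps / max_mbps


# Hypothetical group with four active 1000 Mbps interfaces.
speeds = [1000, 1000, 1000, 1000]
half_duplex_max = theoretical_max_mbps(speeds)
full_duplex_max = theoretical_max_mbps(speeds, full_duplex=True)
used = utilization_percent(200, half_duplex_max)
```

In this example the group has a theoretical maximum of 4000 Mbps at the reported half-duplex rate, and 200 Mbps of traffic uses only 5 percent of it.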

Sent and Received traffic: Average per-second rate of network traffic sent and received. TCP retransmit rates are tracked on each member, but not on each network interface.

By default, the polling period (the time between consecutive polling operations) is two minutes. You can capture data in shorter polling intervals, as brief as three seconds, by using the Live View feature. Table 5 describes the polling status.

Table 5: Polling Status

SNMP poll was successful.
Group performance or a network problem prevented the group from responding to SNMP requests in a timely manner.
Failed: SAN HeadQuarters cannot contact the group. In this case, an alert describing the problem will be generated.
A member rebooted.

Reported Alerts

Alerts enable you to be quickly informed of problems so you can diagnose and correct them. SAN HeadQuarters displays two types of alerts:

Performance-related alerts detected by SAN HeadQuarters (for example, low free pool space or high latency). Some alerts have an increasing priority as the condition increases in severity.

Hardware alarms detected by the group (for example, high temperature or a failed control module). Hardware alarms depend on the PS Series firmware version and also the member hardware. See the Group Administration Guide for a list of hardware alarms.

In some cases, a SAN HeadQuarters alert and a hardware alarm may be generated for the same group event.

Optionally, you can configure e-mail notification for alerts.

Alert Priorities

Alerts reported by SAN HeadQuarters have the following priorities:

Informational: Normal, operational events in the group that do not require any administrator action.

Caution: Low-level conditions that, if not addressed, might lead to performance issues or undesired results.

Warning: Warning alerts correspond to Warning events and Warning alarms in the group.

Critical: Serious problem that is currently affecting group operation. Critical alerts correspond to Error events and Critical alarms in the group.

Displaying Alerts

Alerts appear in the bottom panel of the All Groups window (only active alerts) or in the Alerts panel at the bottom of the GUI windows. To display alerts:

1. Open the Alerts panel and click the Alerts tab.

For each alert, the panel shows:

Alert priority. See Alert Priorities.
Date and time of the SNMP poll that detected the alert.
How long the alert has been active.
Status of the alert (whether the alert is Active or Cleared).
Alert description.

The Alerts panel displays in the lower pane of the window.

Figure: Alerts Panel

Exporting Alerts

You can export alerts to an. To export alerts:

1. Click the Export Alerts icon in the Alerts panel.
2. Enter a file name.
3. Click Save.

Copying Alerts to the Clipboard

You can copy all the alerts for a group or selected alerts to the clipboard.

A member's active control module failed, resulting in failover to the secondary control module.
SNMP requests for control module information timed out, due to the group workload. Therefore, the number of control modules reported might not be accurate.
A member control module has failed or is not installed.
Some SNMP requests for member disk drive information timed out, due to the group workload. The number of reported drives might not be accurate.
Due to detected drive problems, a member is copying data to a spare.
A member is missing a disk drive, based on the model number, the number of disk slots, and the standard configurations (14, 16, 24, or 48 disk drives).
A disk has a status other than online or spare and requires administrator attention.
A disk drive failed.
A group member is running a supported firmware version that is not recommended in this case.
A group's firmware is incompatible with the disk's firmware.
SNMP request was not complete, making the poll unusable.
A pool's free space is less than the recommended value.

Member added: A member has been added to the group.
Member controller reboot: A member's control module rebooted.
Member disk added: A disk has been added to a member.
Member disk protocol mismatch
Member disk removed (Caution): A disk has been removed from a member.
Member disk firmware out of date (Critical): A disk has been detected with out-of-date firmware.
Member firmware upgrade (Caution): A member has been upgraded with new firmware.
Member firmware upgrade reboot pending (Informational): The firmware on a control module has been upgraded.
Member health status (Warning, critical): A health condition exists, likely related to a hardware failure.
Member mixed firmware (Warning): A member has different PS Series firmware versions running on its control modules.
Member network port fail (Warning): A network port on a member failed.
Member offline (Critical): A member is offline.
Member status (Caution): A member's status changed, such as from online to offline.
Critical of date Contact your Dell EqualLogic support provider.
Port at reduced speed (Caution): A network interface (other than one dedicated to a management network) is connected to a network device with a speed of less than 1 GB.
Replica reserve resize failure (Warning): A replication operation failed because the volume's replica reserve cannot increase.
Replication authentication failure (Warning): The replication failed because the mutual authentication passwords on the group do not match the passwords on a partner.
A replication operation has failed.
(Warning) The replication failed because the secondary group does not have downgrades disallowed.
(Warning) The replication failed because the partner could not be reached.
(Warning) The in-use snapshot reserve exceeds the warning level set in the Group Manager.
(Caution, warning, critical) A member's ratio of TCP retransmits to sent packets is too high, indicating a network problem.
(Warning, critical) A thin provisioned volume's in-use space exceeds the warning limit set in the Group Manager.
There is a problem with allocating volume space according to the desired RAID preference. The volume's preferred RAID level is oversubscribed.
Volume replication partner needs upgrade (Warning): Replication failed because the partner is not running the correct firmware and must be upgraded.
Volume replication partner paused (Caution): Replication of all volumes to a partner was paused.
Volume replication paused (Caution): Replication of a volume was paused from the primary group.
Volume replication remote paused (Caution): Replication of a volume was paused from the secondary group.

A volume's replication reserve space is insufficient:

When a volume is actively borrowing free space for replication operations.
If the remote replication reserve space, as detected by the remote site, is invalid or low.

Event Priorities

Table 7 lists event priorities in order of lowest (least severe) to highest (most severe) priority.

Table 7: Event Priorities

Informational message: Indicates an operational or transitional event that requires no action.
Potential problem: Can become an event with Error priority if administrator intervention does not occur.
Serious failure: Identify and correct the problem as soon as possible.
Catastrophic failure: Identify and correct the problem immediately.

By default, the Show All button is selected. To show only event logs, select the Show Event Logs only button. If there are no events to display, instructions are provided to verify that the group is properly configured to send events and audit logs to the syslog server on the SAN HeadQuarters server.

Each event message includes the following information:

Event priority (see Event Priorities).
Date and time that the syslog server received the event from the group.
Member on which the event occurred.
Description of the event.

Click a column heading to sort according to the column data.

SAN HeadQuarters displays events that occurred within the selected time period. To display the latest events, select Show latest in the GUI window.

Figure: Events Panel

Searching Events

You can search the event log for events that include a specific word, words, or text string.

You can also use the Filter Editor for advanced search capabilities. To display events that include a specific word, words, or text string, in the Events panel (Figure 15) or the Events window (Figure 16):

1. Enter the text in the search field and click Search. Click Clear to return to the original event display.
2. For advanced search capabilities, click Filter Editor. The Filter Editor dialog box appears.

The Filter Editor enables you to set up a complex search:

Click the first field (defaults to Message) to select what you want to search (message text, priority, member, or time detected).
Click the second field (defaults to Begins with) to select the search parameters. For example, you can specify that you want to match text or exclude text.
You can select text in the Message column and copy it to the search field.
Click And to add additional search criteria.

Exporting Events

You can export the event log to an.

Copying Events to the Clipboard

You can copy the event log for one group or all groups to the clipboard.

Audit Messages

Audit messages are syslog events about administrator actions. They provide a historical reference to actions such as logging in, logging out, creating a volume, setting up replication, and so on.

To show only audit logs, select the Show Audit Logs only button. If there are no audit logs to display, instructions are provided to verify that the group is properly configured to send events and audit logs to the syslog server on the SAN HeadQuarters server.

Each audit message includes the following information:

Account to which the audit message pertains.
Date and time that the syslog server received the audit message from the group. Click the column heading arrow to sort ascending or descending by date.
A description of the event that occurred at the time the audit message was received. Click the small icon in the upper right corner of the Message column header to view the message details.

Figure: Audit Log Panel

Searching Audits

You can search the audit log for audit messages containing a specific word, words, or text string. To display audits that include a specific word, words, or text string, in the Audit log panel (Figure 17) or the Audit window (Figure 18):

Click Clear to return to the original audit display. You can select text in the Message column and copy it to the search field, if desired.

Exporting Audit Logs

You can export audit logs to an. To export audit logs: 1.

Copying Audit Logs to the Clipboard

You can copy all the audit logs for a group or selected alerts to the clipboard.

By analyzing the data collected by SAN HeadQuarters, you can quickly detect hardware failures, evaluate group performance, and identify areas of concern. You can also determine if the group can handle an increase in workload.

Before analyzing data, you should understand:

The areas in your environment that can be sources of performance problems and which areas SAN HeadQuarters monitors. See Potential Sources of Performance Problems.
How your applications utilize group storage resources. See Understanding Application Storage Utilization.

GUI graphs display data from the most recent eight-hour time period and GUI tables display data from the most recent poll. See Displaying Data from Different Times.

Potential Sources of Performance Problems

Identifying the source of a performance problem in your environment can be difficult.

For example, if response time is too long, the problem might be caused by a hardware failure, insufficient server resources, or an improperly configured application.

Performance problems can result from:

Hardware: Poor performance can be the result of a hardware failure in the group (for example, a disk failure), the network, or the server.

Network configuration: Although network bandwidth is rarely fully utilized, the network can be a source of performance problems. For example, some parts of the network might not be Gigabit Ethernet, switches might not be properly configured, or interswitch links might not have sufficient bandwidth.

Servers and applications: Servers that do not have sufficient resources (CPU, memory, bus) can experience performance problems. Also, applications might not be properly configured.

SAN HeadQuarters is a good tool for determining if a performance problem is the result of a hardware failure in a group.

SAN HeadQuarters also provides information that can indicate a performance problem in the storage environment (for example, if the workload exceeds the capability of the group). To fully diagnose non-group problems, you must use additional tools. Different applications and workloads result in different performance profiles. Statistics that might indicate a performance problem in one environment might indicate an efficient use of storage resources in another.

To characterize how your applications utilize storage resources, you should understand: Application capacity requirements. Do the applications require very low latency? Is the workload consistent or does it vary over time? Contact your PS Series support provider or your application support provider for more information about characterizing your application storage utilization.

Make sure the environment is under a normal workload. Some performance issues are temporary and result from an unusual increase in workload.

Monitor the GUI for hardware problems: Failed hardware is a common source of performance problems. See Identifying Hardware Problems. After you fix a hardware problem, allow time for SAN HeadQuarters to collect new data before analyzing the data. Performance data collected while a hardware failure exists can be regarded as abnormal.

Monitor the GUI for common indicators of performance problems: If you are sure there are no hardware failures, check for statistics that might indicate a performance problem. See Identifying Performance Problems on page 1. Be aware that the performance data is subjective and depends on the performance characteristics of your applications.

Continue to monitor the group regularly: If you have configured e-mail notification, the computer running the SAN HeadQuarters Server will generate a message when an alert related to a hardware failure or a performance problem occurs.

Identifying Hardware Problems

Hardware failures are a common cause of performance problems and must be corrected immediately. Always check for the following hardware-related issues in the SAN HeadQuarters GUI:

Hardware alerts: Check the Alerts panel for hardware problems that might affect performance, such as a failed disk or a network connection that is not Gigabit Ethernet.

In some cases, performance might return to normal once the operation completes. Low free space will also negatively affect the performance of thin provisioned volumes. Correct hardware problems immediately.

This information is only an estimate. However, it can help you more fully understand group performance and also help you plan for storage expansion. Note: Because the estimated information in the Experimental Analysis window is based on a.

The Experimental Analysis window also provides run-time group performance data, so you can compare the estimates to actual data. Always consider latency when examining estimated performance data.

Displaying the Experimental Analysis Window

Select a group in the Servers and Groups tree in the left panel.

The Experimental Analysis for Group window displays.

Figure: Experimental Analysis Data for a Group

So that you can compare the estimated data with run-time data, the graph also shows the actual number of IOPS (reads and writes) performed by the group. Estimated max IOPS cannot be calculated.

SAN HeadQuarters calculates the estimated maximum group IOPS when there are no disk drive failures in the group (orange line in the graph) and also when at least one RAID set is in a degraded state (brown line in the graph). This information is useful for understanding the performance impact of a disk failure. The degraded estimate is based on a drive failure in a RAID set that would result in the greatest performance impact.

The degraded estimate does not include the performance impact that might occur during RAID reconstruction (for example, when the array is reconstructing data from parity on a spare drive). If the run-time group data does not indicate a performance problem (that is, latencies are low, applications complete on time, and user response time is adequate), then you can assume that the group can handle an increase in workload without a performance degradation.

You may want to consider decreasing the load on the group or adding additional hardware or arrays. If the run-time group data indicates a performance problem (for example, high latencies or high queue depth, applications do not complete on time, or response time is slow), you probably have reached the limit of the group. You should immediately consider decreasing the load on the group or adding additional hardware or arrays.

The data might indicate one of the following:

You reached the limit of the group: You should immediately consider decreasing the load on the group or adding additional hardware or arrays.
Network problems exist: Correct the network problems immediately.
Member hardware problems exist: Replace any failed hardware and ensure that you configure all the network interfaces on all group members.

As these examples show, estimated data must be used in conjunction with run-time group data to obtain an accurate and comprehensive understanding of group performance.

For run-time group data examples, see Examples of Interpreting Performance Data.

Examples of Interpreting Performance Data

No single piece of performance data reported by SAN HeadQuarters can provide a complete characterization of group performance. You must consider a broad range of performance data, in addition to environmental data, user and application response times, and the group workload.

The examples in the following sections might help you better understand the relationship between different types of performance data. Table 8 describes the relevant data.

Typical range. Example 1 shows a group that is performing well and is within its capabilities. The latencies are all below 20 ms, which is desirable. The workload appears to be reasonably static and thus predictable. The average queue depth should occasionally be shown as 0 or 1. Table 9 describes the relevant data.

Example 2 shows a group that is mainly idle. The very low latency and low IOPS values indicate that this group can handle a larger workload. However, since the current group workload is so low, it is difficult to determine how large a workload increase the group can handle. Increase the workload gradually and evaluate the group performance after each increase.

Table 10 describes the relevant data. Example 3 shows some contradictory information. The latencies are low (less than 20 ms). Alternatively, the group might be benefiting from a high level of control module cache hits.

Because the latencies are low, the group currently appears to be performing well. However, an increase in the workload may result in performance degradation. Table 11 describes the relevant data. This group has a heavy load, consisting of highly random, small reads and writes. Yet, only a fraction of the network is being utilized.