1.1alpha9:

- finish hard-linked templates
- escalations and repeated notifications
- add encryption between server and agents
- monitoring of log files
- more supported parameters under FreeBSD, HP-UX and Solaris (cpu times, network stats)
- disk stats under Linux 2.6.x

ZABBIX 1.1:

- add HTML block to the screen cell (to see external data, images, etc)
- execution of commands on monitored hosts in case of pre-defined events
- audit log
- advanced SNMP trapper
- handle unavailability of MySQL gracefully
- check SUSE 9.2 startup scripts. Parameter -u is misplaced.
- apply patches for DNS checks

TKOM:

- alarms forwarding to external systems (using SNMP traps, for example)
- groups of graphs
- auto-discovery script
- alarm/trigger sounds
- saved custom-time period in graphs and screens
- SNMP trap processing
- SNMP trap sending
- wider service testing possibilities

TELECENTRS:

- hard-linked templates (graphs for new hosts to be added automatically)
- clickable map in map configuration
- rus/lat characters for graphs
END OF TELECENTRS

4. SNMP traps. Make it possible to define actions for SNMP trap-related events.
7. SNMP traps: how?

Further wishes:

9. Give each host "5 User defined fields (text or OID)", so that technical
   information about the host (serial number, location, etc.) can be kept in
   the same place.
10. threads.

OTHER:

- check bug with trapper receiving value with ":"
- configuration management. Add a page with hardware details for each host.
- multi-institution support
- add acknowledgements
- improve removal of item related data from history
- '-' doesn't work in trigger expressions (a-b=0)
- When editing trigger actions, a subject such as
  "low disk space on {HOSTNAME}'s /usr" makes the form submission dump a
  MySQL error because of the unescaped single quote. I believe all user
  input should be escaped before it is submitted to MySQL.

ESCALATION:

Escalation (escalation level, rules for changing the level):

* A set of rules for changing the priority of a trigger/... over time.
* A rule set can be defined at the following levels: system, host, user, trigger.
* Precedence of rule sets, in descending order (for example): user, trigger, host, system.
* A rule set is defined for each 'severity' separately,
  ** and a trigger follows the rule set that matches its original 'severity'. !
* Definition: the escalation level is the number of levels of the
  administrative hierarchy, above the original one, to which the alert is sent.
Possible rules:

* alert repeat, with a time delay (PA): the alert is sent again using the
  current 'severity' and escalation level;
* severity change: set the number of PAs after which Severity++;
  ** naturally, it will not go above Disaster :)
* escalation level change: set the number of PAs after which escalation level++;

> I think I found a bug: when the Host status is updated under Host, the
> "Host unreachable" trigger disappears because it is in "unknown" status,
> but alarms do not show that it has changed its status.

- negative values in graphs
- check all screens if they are user friendly and present information in the best possible way
- possibility (a button) to recheck unsupported parameters
- add advanced task scheduler/executor to ZABBIX
- agent incorrectly returns amount of free/available memory on HP-UX (must be 24GB)
- get rid of net[listen_*] in templates
- use assert() for critical functions
- check if simple checks (FTP, SMTP!) work correctly when host name (not IP) is used
  I found that if I use the fully qualified name of my host, all simple
  checks are working, but they failed if I simply use the hostname.
- add icons to a map by clicking mouse
- optionally support map generation in JPEG. PNG images are huge!
- make w32 agent not to write warnings in case of timeout (HB)
- security-related issues
  There are still quite a few strcpy calls made against non-const user,
  database, or remote supplied data, in both the src/* and include/*
  directories.
  printf calls w/o static format specifiers in src and include:
    grep -R printf * | grep -v "\%\|Makefile\|Binary"
  Extensive use of atoi rather than something w/ bounds checking like strtol.
  Most of the signal handlers need to delay or block signals to avoid
  signal race conditions, as they call non-reentrant functions.
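
The atoi remark above could be addressed along these lines. This is only a
sketch: parse_int_bounded() is a hypothetical helper name, not a function from
the ZABBIX sources, and the accepted range is whatever the caller passes in.
It uses strtol with explicit checks for empty input, trailing characters,
overflow and range.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (assumption, not existing ZABBIX code).
 * Parses a base-10 integer from s and stores it in *out.
 * Returns 0 on success, -1 on NULL/empty input, trailing characters,
 * overflow of long, or a value outside [min, max].
 * Like strtol itself, it accepts leading whitespace. */
int parse_int_bounded(const char *s, long min, long max, long *out)
{
    char *end = NULL;
    long value;

    if (s == NULL || *s == '\0')
        return -1;

    errno = 0;
    value = strtol(s, &end, 10);

    if (end == s)           /* no digits were consumed */
        return -1;
    if (*end != '\0')       /* trailing characters, e.g. "12abc" */
        return -1;
    if (errno == ERANGE)    /* value does not fit in a long */
        return -1;
    if (value < min || value > max)
        return -1;

    *out = value;
    return 0;
}
```

A caller reading, say, a port number would use
parse_int_bounded(text, 1, 65535, &port) and treat -1 as a configuration
error, where atoi would silently return 0 or an undefined result.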
- in user administration, add many resources from a list (not by one ID)
- availability report for day/week/month/etc
- check if items delay works
- apply Igor patches for chart.php (calculation of MIN, MAX)
- review PHP code for simplifying (urls, iif() )
- fix screenedit.php permissions

How about additions to time & date:
* such as month, day, year, hour, minute, ...
* weekday is especially useful, so as not to raise triggers on weekends :)
And also add assorted extras to 'str', such as
* substr, length, finding the position of a character/substring (FIND/SEARCH
  in Excel), counting occurrences of a character (e.g. how many commas there
  are)...

- add requirement for fping to the manual
- separate login.php and index.php
- remove icmpping and others from the QUEUE
- summarising report about triggers/actions/items. Show items without triggers,
  triggers without actions, etc.

5) After adding a trigger, the next page does not show the triggers of this
   host. Is it possible to go back to this host's triggers page after adding
   a trigger?

- zabbix thinks that host is unreachable if it receives string for numeric item
- think about implementation of synchronised items like cpu[usr], cpu[sys], cpu[idle]
- check setproctitle() Lukas sent some time ago for Linux
- implement complex SLA calculation (downtimes, working/non-working hours), etc.
  See forum Open Discussion for more info.
- add selection of MIN/MAX/AVG for graphs
- rewrite validation and evaluation of expressions (i.e. make >-1 work)
- support for 'Clone' (graphs and screens)
- add support of interface statistics under Solaris and FreeBSD
- add screens and graphs to host templates
- calculate MIN value for graphs (do not use '0')
- infrastructure for reports
- stacked graphs
- disk size is not correct after evaluating macros in the message
- fix evaluation of complex trigger expressions (diskpercentage)
- check if alert and service alert is added after host becomes reachable again
- navigation bar for all graphs and screens

LATER:

- Oracle support
- zabbix to write log to the item (for example: log when item becomes unsupported)
- support for UDP-based checks (DNS, etc)
- preserve selected host group between different screens (trigger maintenance, host maintenance)
- add zabbix[unixtime] and lastupdate() and tick() to get rid of nodata()
- check return code for alert scripts
- zabbix_agentd to support processor load, swap[*] and memory[*] under AIX
- make possible passing of parameters to user script for UserParameters like param[*]
- make session expiration time configurable in config.php
- mess with user groups when running under root
- support for PREFIX/zabbix/etc (configuration files)
- trapper item. Source->Alarms do not grow when last status is unknown for
  latest trigger (item type = TRAPPER)
- SNMP traps. Condition to switch expression off. New function which would
  return status of the trigger could help.
- when zabbix cannot evaluate expression it does not change status to UNKNOWN
  but adds ALARM (UNKNOWN)
- preset time for graphs
- finish src/zabbix_snmptrapper
- links to Zabbix frontend in alert emails
- string items for graphs
- ability to add a library of user-written C functions with dlopen/dlsym to
  the agent with a predefined API that can return values to suckerd
  (for both suckerd and agentd)
- zabbix_trapperd does not start if PostgreSQL and DBConnectOnEach=1
- monitoring of log files
- [10] Refresh stop ability. At least for graphs and triggers being selected.
- [10] IT Services. Show downtimes for weeks.
- [10] IT Services. User view, show algorithm.
- [10] BUG. When adding new service, service name is wrong.
- [10] IT Services. SLA for a period of time only (09:00-18:00)
- monitoring of Windows event log

LS:

- update trigger value immediately after add or update
- graphs. Add an item with a time offset
- check permissions for zabbix_agentd when started under root
- add item to group (not to all)
- update host from template

change(N) - difference between the last value and the value received N time
            units earlier;
abschange(N) = abs(change(N));
trend(N) - linear approximation of the next value over the time interval N;
trend(N,M) - the same, but instead of the next value, the value M time units
             ahead;
!!!!! by the way, it would be nice to allow function arguments to be not only
a time interval but also a number of samples, e.g. the average over the last
5 readings !!!!!
* for example, the argument 5 means five seconds, and #5 means five
  measurements !!!
and plenty of math functions could be added, for super-advanced folks:
* SQRT, LOG, LOG10, trigonometric, ...

- if 'Update', then default action (button 'Enter') must be Update
- do not check SNMP port existence if Item is not SNMP
- add item to host group
- add descriptions to items
- link host (group) to template

graphs:

- min value (0 or automatic)
- customisation of font size
- link items to show them in one graph (network in/out)
- snmp oid symbolic representation
- periodically check unsupported items (NxInterval). Can be configurable.
- Latest values->Host->[Show not monitored]

HP:

- use profiles instead of passing parameters using POST
- add threshold,min,max line for user-defined graphs
- One more wishlist item for "IT Services": add to
  /~zabbix/report3.php?serviceid=1&year=2003
  a mode that shows all downtimes in the given month (or year/week).
  (Something like: time - subservice/triggers - priority - ...)
- [3] Windows event log checking.
- [4] Latest values. Link to page which shows list of all triggers related to the item.
- [4] users.php. Link to page which shows all actions defined for this user.
- [5] Configuration of items/triggers. Add Select.
- [7] SLA algorithm (A - 0%, B - 100% -> availability of AB = 50%)
- [7] IT Service % must start at 01.01.20xx, not first day of the week
- [7] ALARMS. Ability to select events by specifying date from/to.
- [9] History of who and when changed trigger comment. As starting point: who
  and when did latest changes.
- [9] ... the history to be used to see who did changes and who wrote specific
  parts of comments to know who is in charge for the instructions
- [9] ... find outdated instructions to initiate its renewal (review)

LS:

- daily weekly monthly graphs
- detailed description for items

TOP PRIORITY:

- fix update of 'status'. When added it never gets updated, if server is already unreachable.
- pay attention to Solaris agent
- add more checks for forms (check all possible wrong values)
- support of fetch_html[*]
- restructure sources
- support for automake
- zabbix_agentd to support processor load, swap[*] and memory[*] under AIX

BUGS:

- add protection from IT Service looping

DIFFERENT TASKS:

- SNMP trapping
- distributed monitoring
- backup/restore scripts for server. Backup all (DB, configs, etc).
- personalisation (refresh rate, graph size, default graph period, etc)
- SNMP-walk in WEB interface
- add UserParameter without restart of an agent

TODO:

- Agent
  1. An API so that I can build an active agent into the centralized
     monitoring point of my application.
  2. max/min/average values during the poll pause along with last value to
     catch CPU spikes otherwise invisible (have had these problems with
     BMC Patrol).
  3. Ability to add a library of user-written C functions with dlopen/dlsym
     to the agent with a predefined API that can return values to suckerd.

Not ready yet:

1. Monitoring of Windows (2000) services via SNMP
   This is an extension I wrote based on your SNMP checker that scans through
   the Windows SNMP mib to check if a specified process is running. I thought
   it may be able to be used in a similar situation to the new SIMPLE check.
2. Basic schedule to monitor certain items during specified period.
   I made changes to the PHP frontend and zabbix_sucker.c to enable the user
   to specify a time period (like between 9am-5pm) when the item should be
   checked.
   (From: "Dave McCrudden")

- BUG. zabbix_sucker should accept parameter regardless of "\n" at the end of line
- do not send notifications at certain periods of time. Do not monitor
  services at certain periods of time.
- decrease number of TCP connections between server and agent
- add triggers for all hosts at once
- check new Mariusz patches
- add support for downtimes, user availability, etc

HB:

- BUG. history graph goes left
- personalisation (refresh rate, graph size, default graph period, etc)
- LDAP authorisation (should work with MSWindows LDAP)
- Reports:
  - more customizable (graphs time range: 1 month, etc)

WIN32 Agent:

* API for sub-agents
* Support for check_port[] parameter
* Support for check_service[] parameter
* Installation program
* Possibility to store configuration in registry
* Configuration tool (may be through MMC?)

OTHER:

- support for netload under other platforms
- make kstat structure local (instead of static)
- collect network and other statistics using kstat()
- apply Mariusz's patches
- support for complex reports
- support for check_service[samba]
- ability to execute command on monitored server
- support for OS X (network loads, disk loads)
- check for function parameters in evaluate_simple_expression
- update trigger status to UNKNOWN if cannot evaluate function

LATER:

- setup demo site
- different sound files for different severity
- think about service node types (some nodes will automatically register problem)
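
SKETCH (change/abschange/trend from the LS wishlist):

The proposed change(N), abschange(N) and trend(N,M) functions are only
described in words above. The sketch below shows one way they might be
computed. Everything in it is an assumption made for illustration: the
struct sample type, the function names, and the convention that the caller
has already selected the window (the last N seconds, or the last #N readings)
and passes it oldest first. trend is done as an ordinary least-squares line
fit, which is one reading of "linear approximation".

```c
#include <stddef.h>

/* Hypothetical sample type: timestamp in seconds plus the item value. */
struct sample {
    double t;
    double v;
};

/* change: newest value minus the oldest value in the window (n >= 1). */
double item_change(const struct sample *s, size_t n)
{
    return s[n - 1].v - s[0].v;
}

/* abschange = abs(change), written out to avoid pulling in libm. */
double item_abschange(const struct sample *s, size_t n)
{
    double d = item_change(s, n);
    return d < 0 ? -d : d;
}

/* trend: fit v = a + b*dt by least squares over the window and evaluate
 * the line `ahead` seconds after the newest sample.
 * Returns 0 on success, -1 if fewer than two samples are given or all
 * timestamps are identical. Timestamps are taken relative to s[0].t so
 * that large epoch values do not lose precision when squared. */
int item_trend(const struct sample *s, size_t n, double ahead, double *out)
{
    double st = 0.0, sv = 0.0, stt = 0.0, stv = 0.0;
    double denom, slope, intercept;
    size_t i;

    if (n < 2)
        return -1;

    for (i = 0; i < n; i++) {
        double dt = s[i].t - s[0].t;
        st += dt;
        sv += s[i].v;
        stt += dt * dt;
        stv += dt * s[i].v;
    }

    denom = (double)n * stt - st * st;
    if (denom == 0.0)
        return -1;

    slope = ((double)n * stv - st * sv) / denom;
    intercept = (sv - slope * st) / (double)n;

    *out = intercept + slope * ((s[n - 1].t - s[0].t) + ahead);
    return 0;
}
```

With this shape, the "5 versus #5" idea only affects how the window is
selected before the call; the functions themselves do not care whether the
samples were picked by time or by count.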