Difference between revisions of "Helpful MySQL Queries"

From GWAVA Technologies Training
Jump to: navigation, search
(Number of users with data:)
 
(7 intermediate revisions by 2 users not shown)
Line 1: Line 1:
Put queries here that you feel are helpful.  Be sure to indicate what the query is doing <grin>.
 
 
'''''Note:''' As a general rule '''DO NOT''' use a query on a customers system unless you are absolutely sure that you're not going to get a huge return.  If a query has many results to return or has a lot of data to sift through it could bring the server to it's knees.  Remember: A customer system is not a test system.  [http://stackoverflow.com/questions/3787651/how-can-i-stop-a-running-mysql-query Here's an article on how to kill a process in MySQL if you manage to mess it up]  Sometimes it is helpful to use a COUNT statement on your select statement first to see if you're going to have a lot of data to sift through.  Also, the LIMIT (MySQL) or TOP (MSSQL) statement can be helpful.  However, the SQL server will still have to find all the values that fit your query, it won't go to the limit or top value and then stop.''
 
 
==Queries==
 
===Select the minimum and maximum value in a table===
 
::select min(f_delivered), max(f_delivered) from t_message;
 
 
===Select list of users who have had mail archived===
 
::select f_first, f_last, f_mailbox from t_abook where f_uid in (select f_uuid from t_uuid_mapping) order by abook_id \g
 
 
===Select list of users who have never had mail archived===
 
::select f_first, f_last, f_mailbox from t_abook where f_uid not in (select f_uuid from t_uuid_mapping) order by abook_id \g
 
 
===Find Hashes of specific date range from when the attachment was archived===
 
::SELECT hash FROM t_document WHERE (date BETWEEN '''*'''1433200981 AND '''*'''1439076181) ORDER BY date DESC;
 
 
If you know the exact timestamp
 
 
::SELECT hash FROM t_document WHERE date = xxxxxxxxxx;
 
 
===Re-create the admin user in a Retain system where the user tables are blank===
 
Use the SQL file found [ftp://ftp.gwava.com/outgoing/TrainingWiki/SQL_Database/If_user_tables_blank_Retain3-4.sql here] to fix this issue.
 
'''''WARNING:''''' ''Only do this if the tables are blank, otherwise you could break something.''
 
'''''Note:''' The dates are timestamps in Unix time (or POSIX/Epoch, if you prefer)''
 
 
 
Daniel has created a number of helpful queries:
 
Daniel has created a number of helpful queries:
  
Line 36: Line 10:
 
User Information:
 
User Information:
  
To get User (displaying email) with Last date they were cached, Most recently archived, and the Item Store flag:
+
===To get User (displaying email) with Last date they were cached, Most recently archived, and the Item Store flag:===
<code>
+
 
 +
<pre>
 
SELECT abook_id, f_email, from_unixtime(ts_cache) AS 'Last Cached From Mail Server', from_unixtime(ts_store) AS 'Most recently Accessed', from_unixtime(ts_item) AS 'Item Store Flag'
 
SELECT abook_id, f_email, from_unixtime(ts_cache) AS 'Last Cached From Mail Server', from_unixtime(ts_store) AS 'Most recently Accessed', from_unixtime(ts_item) AS 'Item Store Flag'
 
FROM t_abook
 
FROM t_abook
Line 49: Line 24:
 
     FROM t_abook
 
     FROM t_abook
 
     INNER JOIN t_uuid_mapping ON t_abook.f_uid = t_uuid_mapping.f_uuid;
 
     INNER JOIN t_uuid_mapping ON t_abook.f_uid = t_uuid_mapping.f_uuid;
</code>
+
</pre>
 
+
Get A list of All users with a count of how many messages each person has:
+
  
 +
===Get A list of All users with a count of how many messages each person has:===
 +
<pre>
 
SELECT abook_id, f_email AS 'Email Address', COUNT(DISTINCT(tm.message_id)) AS 'Total Message Count'
 
SELECT abook_id, f_email AS 'Email Address', COUNT(DISTINCT(tm.message_id)) AS 'Total Message Count'
 
FROM t_message tm
 
FROM t_message tm
Line 60: Line 35:
 
t_abook ta ON ta.f_uid = tum.f_uuid
 
t_abook ta ON ta.f_uid = tum.f_uuid
 
GROUP BY abook_id;
 
GROUP BY abook_id;
 +
</pre>
  
 
*Note: The above query took 20ish minutes to run on a system with 8,000 users and 290 million messages.  Number of users that had mail archived was actually nearly 20,000
 
*Note: The above query took 20ish minutes to run on a system with 8,000 users and 290 million messages.  Number of users that had mail archived was actually nearly 20,000
 
Another thing of note is my use of abook_id in the SELECT and also the WHERE clause.  I’ve found that not everyone has an email address.  Sometimes Retain will archive something that doesn’t have an email attached to it.  Hence “WHERE abook_id =” and “GROUP BY abook_id”
 
Another thing of note is my use of abook_id in the SELECT and also the WHERE clause.  I’ve found that not everyone has an email address.  Sometimes Retain will archive something that doesn’t have an email attached to it.  Hence “WHERE abook_id =” and “GROUP BY abook_id”
  
Total number of messages a particular user has:
+
===Total number of messages a particular user has:===
 +
 
 +
<pre>
 
SELECT COUNT(*)
 
SELECT COUNT(*)
 
FROM t_message
 
FROM t_message
Line 70: Line 48:
 
INNER JOIN t_abook ON t_abook.f_uid = t_uuid_mapping.f_uuid
 
INNER JOIN t_abook ON t_abook.f_uid = t_uuid_mapping.f_uuid
 
WHERE f_email = ‘user@place.net’;
 
WHERE f_email = ‘user@place.net’;
 +
</pre>
  
 
IMPORTANT: It has been found that in Retain 4.0.3 that there are sometimes situations where there is data in the t_message table, but not in the t_message_attachments table.  If you have nothing in the t_message_attachments table Retain will not be able to retrieve anything from disc.
 
IMPORTANT: It has been found that in Retain 4.0.3 that there are sometimes situations where there is data in the t_message table, but not in the t_message_attachments table.  If you have nothing in the t_message_attachments table Retain will not be able to retrieve anything from disc.
  
A more accurate query for the number of messages Retain actually has would probably be:
+
====A more accurate query for the number of messages Retain actually has would probably be:====
 +
 
 +
<pre>
 
SELECT COUNT(*)
 
SELECT COUNT(*)
 
FROM t_message
 
FROM t_message
Line 79: Line 60:
 
INNER JOIN t_abook ON t_abook.f_uid = t_uuid_mapping.f_uuid
 
INNER JOIN t_abook ON t_abook.f_uid = t_uuid_mapping.f_uuid
 
WHERE abook_id = {number};
 
WHERE abook_id = {number};
 +
</pre>
  
Number of users with data:
 
  
 +
===Number of users with data:===
 +
 +
<pre>
 
Select count(*) AS 'Number of users with data'
 
Select count(*) AS 'Number of users with data'
 
From
 
From
Line 95: Line 79:
 
AS t
 
AS t
 
;
 
;
 +
</pre>
 +
 +
To limit it to users less than 3 months or older than 3 months add
 +
<pre>
 +
HAVING MAX(tm.f_delivered) < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -3 MONTH))
 +
</pre>
 +
underneath the GROUP BY clause.  Use > for newer than and < for older than.
 +
 +
'''''*Warning: This makes the query much more intense.  This means that adding the older than, or newer than part of the query will run for a significant amount of time.  A large system could see this query take several hours.'''''
 +
 +
===Number of Users in Exchange/O365 with data (Works up until 4.1.0.1, 4.2 will change things):===
  
Number of Users in Exchange/O365 with data (Works up until 4.1.0.1, 4.2 will change things):
+
<pre>
 
Select count(*) AS 'Number of Exchange/O365 Users with archive data'
 
Select count(*) AS 'Number of Exchange/O365 Users with archive data'
 
From
 
From
Line 111: Line 106:
 
AS t
 
AS t
 
;
 
;
 +
</pre>
  
  
Get total (possible) size of message store used by a user. Size is in Bytes (This size may not be entirely accurate, this is a number that has been passed by the Email server to Retain):
+
===Get total (possible) size of message store used by a user.===
 +
Size is in Bytes (This size may not be entirely accurate, this is a number that has been passed by the Email server to Retain):
  
 +
<pre>
 
SELECT SUM(f_size) AS 'Approximate Storage Use'
 
SELECT SUM(f_size) AS 'Approximate Storage Use'
 
FROM t_message_attachments tma
 
FROM t_message_attachments tma
Line 121: Line 119:
 
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
 
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
 
WHERE f_email = 'admin@knobias.com';
 
WHERE f_email = 'admin@knobias.com';
 +
</pre>
  
  
 
*NOTE: The previous 2 queries could possibly be inaccurate if 2 users have the same Email address.  The most accurate would probably be to use abook_id
 
*NOTE: The previous 2 queries could possibly be inaccurate if 2 users have the same Email address.  The most accurate would probably be to use abook_id
 
In the WHERE clause you can use these fields that should be unique:
 
In the WHERE clause you can use these fields that should be unique:
 +
<pre>
 
abook_id =
 
abook_id =
 
f_uid =
 
f_uid =
 
f_mailbox =
 
f_mailbox =
 
althash =
 
althash =
 
+
</pre>
  
 
Most Unique Example:
 
Most Unique Example:
 +
<pre>
 
SELECT abook_id AS 'Retain Address ID', f_email AS 'Email Address', SUM(f_size) AS 'Approximate Storage Use'
 
SELECT abook_id AS 'Retain Address ID', f_email AS 'Email Address', SUM(f_size) AS 'Approximate Storage Use'
 
FROM t_message_attachments tma
 
FROM t_message_attachments tma
Line 138: Line 139:
 
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
 
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
 
WHERE abook_id = {abook_id number};
 
WHERE abook_id = {abook_id number};
 +
</pre>
  
User ID (abook_id), mailbox name, email address and most recently stored message date:
+
===User ID (abook_id), mailbox name, email address and most recently stored message date:===
 
(Thanks Daron)
 
(Thanks Daron)
  
 +
<pre>
 
SELECT abook_id AS 'User ID', ta.f_mailbox Mailbox, ta.f_email AS 'Eamil address', from_unixtime(max(tm.f_stored)) AS 'Most recently archived message'
 
SELECT abook_id AS 'User ID', ta.f_mailbox Mailbox, ta.f_email AS 'Eamil address', from_unixtime(max(tm.f_stored)) AS 'Most recently archived message'
 
   from t_message tm
 
   from t_message tm
 
   inner join t_abook ta on tm.uuid_mapping_id = ta.f_uid  
 
   inner join t_abook ta on tm.uuid_mapping_id = ta.f_uid  
 
   GROUP BY abook_id;
 
   GROUP BY abook_id;
 +
</pre>
  
  
Message counts and message store size:
+
===Message counts and message store size:===
 
Get A list of All users with a count of how many messages each person has along with the possible size that their archive would take on disk.  This last number will not add up to the total disk use that Retain is using.  Use it as a ballpark figure.
 
Get A list of All users with a count of how many messages each person has along with the possible size that their archive would take on disk.  This last number will not add up to the total disk use that Retain is using.  Use it as a ballpark figure.
  
 
WARNING: The following query is VERY intense.  On a system with 3,500 users with mail in Retain, and 41 million total messages, it took 1 hour 17 minutes to return.  Use with caution.
 
WARNING: The following query is VERY intense.  On a system with 3,500 users with mail in Retain, and 41 million total messages, it took 1 hour 17 minutes to return.  Use with caution.
  
 +
<pre>
 
SELECT  
 
SELECT  
 
f_email AS 'Email Address',  
 
f_email AS 'Email Address',  
Line 162: Line 167:
 
GROUP BY abook_id
 
GROUP BY abook_id
 
ORDER BY tm.message_id;
 
ORDER BY tm.message_id;
 
+
</pre>
  
 
By default it’ll organize the results by the email address. If you’d like to organize it by size of the mailbox, smallest first, change the ORDER BY clause to read as follows:
 
By default it’ll organize the results by the email address. If you’d like to organize it by size of the mailbox, smallest first, change the ORDER BY clause to read as follows:
 +
 +
<pre>
 
ORDER BY SUM(f_size);
 
ORDER BY SUM(f_size);
 +
</pre>
  
 
If you’d like it in reverse order tack on a “DESC” like so:
 
If you’d like it in reverse order tack on a “DESC” like so:
 +
<pre>
 
ORDER BY SUM(f_size) DESC;
 
ORDER BY SUM(f_size) DESC;
 +
</pre>
  
 
If you want to organize it by number of messages, fewest first, tack on:
 
If you want to organize it by number of messages, fewest first, tack on:
 +
<pre>
 
COUNT(DISTINCT(tm.message_id));
 
COUNT(DISTINCT(tm.message_id));
 
+
</pre>
 
If you want to run a less intensive version: find specific users that the customer is interested in.  Get the address book id and nail it down to that.  On the system mentioned above the user with the most messages took only 35 seconds to complete.
 
If you want to run a less intensive version: find specific users that the customer is interested in.  Get the address book id and nail it down to that.  On the system mentioned above the user with the most messages took only 35 seconds to complete.
  
 
Example:
 
Example:
 
+
<pre>
 
SELECT f_email AS 'Email Address', FORMAT(COUNT(DISTINCT(tm.message_id)),0) AS 'Number of Messages', FORMAT(SUM(f_size),0) AS 'Approximate Storage Use (in Bytes)'
 
SELECT f_email AS 'Email Address', FORMAT(COUNT(DISTINCT(tm.message_id)),0) AS 'Number of Messages', FORMAT(SUM(f_size),0) AS 'Approximate Storage Use (in Bytes)'
 
FROM t_message_attachments tma
 
FROM t_message_attachments tma
Line 183: Line 194:
 
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
 
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
 
WHERE abook_id = 1585;
 
WHERE abook_id = 1585;
 +
</pre>
  
 
Other Variations
 
Other Variations
 
+
<pre>
 
SELECT  
 
SELECT  
 
CONCAT(ta.f_first, ' ', ta.f_last, '(',f_mailbox, ')') AS 'Mailbox Name',
 
CONCAT(ta.f_first, ' ', ta.f_last, '(',f_mailbox, ')') AS 'Mailbox Name',
Line 198: Line 210:
 
GROUP BY abook_id
 
GROUP BY abook_id
 
ORDER BY tm.message_id;
 
ORDER BY tm.message_id;
 +
</pre>
  
 
Example Return
 
Example Return
  
  
Find User name, and most recently archived message, Below is a way to see everyone that’s older than 3 months.  Just switch the > sign to a < sign to do 3 months and newer.
+
===Find User name, and most recently archived message===
 +
Below is a way to see everyone that’s older than 3 months.  Just switch the > sign to a < sign to do 3 months and newer.
  
 
If you don’t care about if it’s older or younger than 3 months (for licensing) then just take out the “HAVING” line
 
If you don’t care about if it’s older or younger than 3 months (for licensing) then just take out the “HAVING” line
  
 +
<pre>
 
SELECT
 
SELECT
 
   CONCAT(ta.f_first, ' ', ta.f_last, '(',f_mailbox, ')') AS 'Mailbox Name',
 
   CONCAT(ta.f_first, ' ', ta.f_last, '(',f_mailbox, ')') AS 'Mailbox Name',
Line 218: Line 233:
 
   ORDER BY MAX(tm.f_delivered)
 
   ORDER BY MAX(tm.f_delivered)
 
;
 
;
 +
</pre>
  
 
Example Return
 
Example Return
  
  
Get email, number of messages that user has, Post office or server subsection they’re in, and the date of the most recent message that was archived
+
===Get email, number of messages that user has, Post office or server subsection they’re in, and the date of the most recent message that was archived===
  
 +
<pre>
 
SELECT f_email AS 'E-mail', COUNT(*) AS 'Number of Messages in Retain', f_name AS 'Post Office', MAX(FROM_UNIXTIME(f_stored)) AS 'Most Recent Message Archived'
 
SELECT f_email AS 'E-mail', COUNT(*) AS 'Number of Messages in Retain', f_name AS 'Post Office', MAX(FROM_UNIXTIME(f_stored)) AS 'Most Recent Message Archived'
 
FROM t_message tm
 
FROM t_message tm
Line 230: Line 247:
 
INNER JOIN t_postoffices tp ON tp.postoffice_id = ta.postoffice_id
 
INNER JOIN t_postoffices tp ON tp.postoffice_id = ta.postoffice_id
 
GROUP BY abook_id;
 
GROUP BY abook_id;
 +
</pre>
  
 
WARNING: This query is VERY intense, do not run this without the understanding that this will take a significant amount of time and resources on SQL.  Small test system with 96 users and 4,553,000 items in t_message took 1 minute 7 seconds to run.  By way of comparison: A straight count of t_message took only 4 seconds.
 
WARNING: This query is VERY intense, do not run this without the understanding that this will take a significant amount of time and resources on SQL.  Small test system with 96 users and 4,553,000 items in t_message took 1 minute 7 seconds to run.  By way of comparison: A straight count of t_message took only 4 seconds.
  
Get all message data for one user for messages delivered to the mailbox before X date:
+
===Get all message data for one user for messages delivered to the mailbox before X date:===
  
 +
<pre>
 
SELECT *
 
SELECT *
 
   FROM t_message tm
 
   FROM t_message tm
Line 242: Line 261:
 
   AND tm.f_delivered < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -20 YEAR))
 
   AND tm.f_delivered < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -20 YEAR))
 
;
 
;
 +
</pre>
  
 +
===Getting Message data===
  
Getting Message data
+
<pre>
 
+
 
SELECT f_name AS 'Attachment Name', hash AS "Hash ID"  
 
SELECT f_name AS 'Attachment Name', hash AS "Hash ID"  
 
FROM t_document td
 
FROM t_document td
 
INNER JOIN t_message_attachments tma ON td.document_id = tma.document_id  
 
INNER JOIN t_message_attachments tma ON td.document_id = tma.document_id  
 
WHERE message_id = 614645;
 
WHERE message_id = 614645;
 +
</pre>
  
 
The above is an expanded version of the query in this KB: http://support2.gwava.com/kb/?View=entry&EntryID=2420
 
The above is an expanded version of the query in this KB: http://support2.gwava.com/kb/?View=entry&EntryID=2420
Line 257: Line 278:
 
Going in reverse order, if you have the hash ID then you can find the message ID(s) that it connects to, I also threw in the date that it was initially stored on disc (Human readable according to the date of the server you’re running this on.):
 
Going in reverse order, if you have the hash ID then you can find the message ID(s) that it connects to, I also threw in the date that it was initially stored on disc (Human readable according to the date of the server you’re running this on.):
  
 +
<pre>
 
SELECT message_id, FROM_UNIXTIME(date)
 
SELECT message_id, FROM_UNIXTIME(date)
 
FROM t_message_attachments tma
 
FROM t_message_attachments tma
 
INNER JOIN t_document td ON td.document_id = tma.document_id
 
INNER JOIN t_document td ON td.document_id = tma.document_id
 
WHERE hash = '{Your_hash}';
 
WHERE hash = '{Your_hash}';
 +
</pre>
  
 +
Example:
  
Example:
+
<pre>
 
SELECT message_id AS 'Message ID', FROM_UNIXTIME(date) AS 'Date Stored'
 
SELECT message_id AS 'Message ID', FROM_UNIXTIME(date) AS 'Date Stored'
 
FROM t_message_attachments tma
 
FROM t_message_attachments tma
 
INNER JOIN t_document td ON td.document_id = tma.document_id
 
INNER JOIN t_document td ON td.document_id = tma.document_id
 
WHERE hash = '3249153B5525F19A46D28D29A4BE381CD2E8E22B8510C970328CF54464FAFACC';
 
WHERE hash = '3249153B5525F19A46D28D29A4BE381CD2E8E22B8510C970328CF54464FAFACC';
 +
</pre>
  
 +
===To find if a particular hash was indexed and the message id:===
  
To find if a particular hash was indexed and the message id:
+
<pre>
 
+
 
SELECT  
 
SELECT  
 
   tm.message_id, f_indexed
 
   tm.message_id, f_indexed
Line 286: Line 311:
 
WHERE
 
WHERE
 
   hash = '{hash_number}';
 
   hash = '{hash_number}';
 +
</pre>
  
Check to see if you have any accidentally deleted documents (might take a while to run)
+
===Check to see if you have any accidentally deleted documents (might take a while to run)===
 
This can be a result of a problem with the database and the count will come up how many deleted documents you have.  You’ll have to go through a clean up process to fix this.:
 
This can be a result of a problem with the database and the count will come up how many deleted documents you have.  You’ll have to go through a clean up process to fix this.:
  
 +
<pre>
 
SELECT COUNT(*)
 
SELECT COUNT(*)
 
   FROM t_message_attachments tma
 
   FROM t_message_attachments tma
Line 296: Line 323:
 
   td.document_id IS NULL
 
   td.document_id IS NULL
 
;
 
;
 +
</pre>
  
 
+
===XML MERGE FILE CREATION:===
XML MERGE FILE CREATION:
+
 
Possible way to get the GUID’s out of a database so that you can merge them through the UI:
 
Possible way to get the GUID’s out of a database so that you can merge them through the UI:
  
 +
<pre>
 
SELECT
 
SELECT
 
   CONCAT('<MergeTask><sourceMailbox>',
 
   CONCAT('<MergeTask><sourceMailbox>',
Line 316: Line 344:
 
   and gw.f_email = ex.f_email
 
   and gw.f_email = ex.f_email
 
   ;
 
   ;
 +
</pre>
  
 
Note: you’ll probably need to figure out which domain is the old and which is the new so that you can appropriately assign the combination.
 
Note: you’ll probably need to figure out which domain is the old and which is the new so that you can appropriately assign the combination.
  
 
Copy and paste your results into an .xml file then tack on
 
Copy and paste your results into an .xml file then tack on
 +
 +
<pre>
 
<MergeTasks>
 
<MergeTasks>
 
     <tasks>
 
     <tasks>
 +
</pre>
  
 
To the top of the file and
 
To the top of the file and
  
 +
<pre>
 
     </tasks>
 
     </tasks>
 
</MergeTasks>
 
</MergeTasks>
 +
</pre>
  
 
To the bottom of the file  
 
To the bottom of the file  
  
 
+
===Instead of looking up the unix timestamps for colums like "f_delievered" etc you can use UNIX_TIMESTAMP('20100101')  I.E. UNIX_TIMESTAMP('YYYYMMDD')===
Instead of looking up the unix timestamps for colums like "f_delievered" etc you can use UNIX_TIMESTAMP('20100101')  I.E. UNIX_TIMESTAMP('YYYYMMDD')
+
  
 
So this query works in MySQL:
 
So this query works in MySQL:
 +
 +
<pre>
 
SELECT COUNT(*) FROM t_message  
 
SELECT COUNT(*) FROM t_message  
 
WHERE f_delivered <  
 
WHERE f_delivered <  
 
UNIX_TIMESTAMP('2009-10-24')
 
UNIX_TIMESTAMP('2009-10-24')
 
;
 
;
 +
</pre>
  
 
(You can include or exclude hyphens)
 
(You can include or exclude hyphens)
  
If you want a specific hour/min/sec you can use:
+
====If you want a specific hour/min/sec you can use:====
  
 +
<pre>
 
SELECT COUNT(*) FROM t_message  
 
SELECT COUNT(*) FROM t_message  
 
WHERE f_delivered <  
 
WHERE f_delivered <  
 
UNIX_TIMESTAMP('2010-10-24 01:02:03')
 
UNIX_TIMESTAMP('2010-10-24 01:02:03')
 
;
 
;
 +
</pre>
  
  
As Requested in RET-7262
+
===As Requested in RET-7262===
  
 +
<pre>
 
SELECT  
 
SELECT  
 
     abook_id AS 'Retain ID',
 
     abook_id AS 'Retain ID',
Line 366: Line 405:
 
     ORDER BY COUNT(*)
 
     ORDER BY COUNT(*)
 
;
 
;
 +
</pre>
  
  
  
 +
===MSSQL specific stuff:===
  
MSSQL specific stuff:
+
===The DATEADD feature is similar to the MySQL DATE_ADD feature.===
 
+
The DATEADD feature is similar to the MySQL DATE_ADD feature.
+
  
 +
<pre>
 
SELECT abook_id, MIN(f_first) AS 'First Name', MIN(f_last) AS 'Last Name', MIN(f_mailbox) AS 'Mailbox ID', MIN(f_email) AS 'Email Address', COUNT(*) AS 'Number of Messages'
 
SELECT abook_id, MIN(f_first) AS 'First Name', MIN(f_last) AS 'Last Name', MIN(f_mailbox) AS 'Mailbox ID', MIN(f_email) AS 'Email Address', COUNT(*) AS 'Number of Messages'
 
FROM t_message tm
 
FROM t_message tm
Line 379: Line 419:
 
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
 
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
 
GROUP BY abook_id
 
GROUP BY abook_id
HAVING dateadd(S, MAX(f_stored), '1970-01-01') > '2017-02-01'
+
HAVING DATEADD(S, MAX(f_stored), '1970-01-01') > '2017-02-01'
 
ORDER BY COUNT(*)
 
ORDER BY COUNT(*)
 
GO
 
GO
 +
</pre>
  
 +
===Newer than 3 months:===
  
Newer than 3 months:
+
<pre>
 
+
 
SELECT ta.abook_id, MIN(ta.f_first) AS 'First Name', MIN(ta.f_last) AS 'Last Name', MIN(ta.f_mailbox) AS 'Mailbox ID', MIN(ta.f_email) AS 'Email Address', COUNT(*) AS 'Number of Messages'
 
SELECT ta.abook_id, MIN(ta.f_first) AS 'First Name', MIN(ta.f_last) AS 'Last Name', MIN(ta.f_mailbox) AS 'Mailbox ID', MIN(ta.f_email) AS 'Email Address', COUNT(*) AS 'Number of Messages'
 
FROM t_message tm
 
FROM t_message tm
Line 393: Line 434:
 
HAVING DATEADD(S, MIN(ts_cache), '1970-01-01') > DATEADD(m, -3, GETDATE())
 
HAVING DATEADD(S, MIN(ts_cache), '1970-01-01') > DATEADD(m, -3, GETDATE())
 
ORDER BY COUNT(*);
 
ORDER BY COUNT(*);
 +
</pre>
  
Number of users with data:
+
===Number of users with data:===
  
Total Users with data:
+
====Total Users with data:====
 +
<pre>
 
Select count(*)  
 
Select count(*)  
 
     AS 'Number of users with data'
 
     AS 'Number of users with data'
Line 411: Line 454:
 
AS t
 
AS t
 
GO
 
GO
 +
</pre>
 +
 +
To limit it to users less than 3 months or older than 3 months add
 +
<pre>
 +
HAVING DATEADD(S, MAX(f_stored), '1970-01-01') > '2017-02-01'
 +
</pre>
 +
underneath the GROUP BY clause.  Use > for newer than and < for older than.
  
Number of Users in Exchange/O365 with data (Works up until 4.1.0.1, 4.2 will change things):
+
===Number of Users in Exchange/O365 with data (Works up until 4.1.0.1, 4.2 will change things):===
  
 +
<pre>
 
Select count(*) AS 'Number of Exchange/O365 Users with archive data'
 
Select count(*) AS 'Number of Exchange/O365 Users with archive data'
 
From
 
From
Line 428: Line 479:
 
AS t
 
AS t
 
GO
 
GO
 +
</pre>

Latest revision as of 18:14, 7 April 2017

Daniel has created a number of helpful queries:

All Queries are for Retain 3.x and newer. MySQL syntax. In most instances MSSQL will work fine with the same query.

Warning: Some queries can take a heavy performance toll on the SQL server, I’ve tried to create them to be as efficient as possible but some things just take lots of processing/disk I/O time. Don’t expect these to always come back right away. ALSO, do not run a query unless you understand what you’re doing. If a query goes cartesian on you it will bring the SQL server to it’s knees and will probably ruin anything else that’s trying to access that server.

Additional note: Because google thinks they’re awesome copy and paste straight from this doc may not work. Copy the query to a notepad, or notepad++ and change the ‘text’ to actual ‘ (AKA “tick” in some circles) marks. Quotes also work.

User Information:

Contents

[edit] To get User (displaying email) with Last date they were cached, Most recently archived, and the Item Store flag:

SELECT abook_id, f_email, from_unixtime(ts_cache) AS 'Last Cached From Mail Server', from_unixtime(ts_store) AS 'Most recently Accessed', from_unixtime(ts_item) AS 'Item Store Flag'
FROM t_abook
GROUP BY abook_id;

SELECT f_first AS 'First Name', f_last AS 'Last Name', f_mailbox AS 'Mailbox ID', f_email AS 'Email Address'
    FROM t_abook
    INNER JOIN t_uuid_mapping ON t_abook.f_uid = t_uuid_mapping.f_uuid;
  
SELECT COUNT(*)
    FROM t_abook
    INNER JOIN t_uuid_mapping ON t_abook.f_uid = t_uuid_mapping.f_uuid;

[edit] Get A list of All users with a count of how many messages each person has:

SELECT abook_id, f_email AS 'Email Address', COUNT(DISTINCT(tm.message_id)) AS 'Total Message Count'
FROM t_message tm
INNER JOIN 
t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id
INNER JOIN 
t_abook ta ON ta.f_uid = tum.f_uuid
GROUP BY abook_id;
  • Note: The above query took 20ish minutes to run on a system with 8,000 users and 290 million messages. Number of users that had mail archived was actually nearly 20,000

Another thing of note is my use of abook_id in the SELECT and also the WHERE clause. I’ve found that not everyone has an email address. Sometimes Retain will archive something that doesn’t have an email attached to it. Hence “WHERE abook_id =” and “GROUP BY abook_id”

[edit] Total number of messages a particular user has:

SELECT COUNT(*)
FROM t_message
INNER JOIN t_uuid_mapping ON t_message.uuid_mapping_id = t_uuid_mapping.uid_mapping_id
INNER JOIN t_abook ON t_abook.f_uid = t_uuid_mapping.f_uuid
WHERE f_email = ‘user@place.net’;

IMPORTANT: It has been found that in Retain 4.0.3 that there are sometimes situations where there is data in the t_message table, but not in the t_message_attachments table. If you have nothing in the t_message_attachments table Retain will not be able to retrieve anything from disc.

[edit] A more accurate query for the number of messages Retain actually has would probably be:

SELECT COUNT(*)
FROM t_message
INNER JOIN t_uuid_mapping ON t_message.uuid_mapping_id = t_uuid_mapping.uid_mapping_id
INNER JOIN t_abook ON t_abook.f_uid = t_uuid_mapping.f_uuid
WHERE abook_id = {number};


[edit] Number of users with data:

Select count(*) AS 'Number of users with data'
From
(
SELECT
    COUNT(*)
  FROM t_message tm
    INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id
    INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
    INNER JOIN t_domains td ON ta.domain_id = td.domain_id
  GROUP BY abook_id
)
AS t
;

To limit it to users less than 3 months or older than 3 months add

HAVING MAX(tm.f_delivered) < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -3 MONTH))

underneath the GROUP BY clause. Use > for newer than and < for older than.

*Warning: This makes the query much more intense. This means that adding the older than, or newer than part of the query will run for a significant amount of time. A large system could see this query take several hours.

[edit] Number of Users in Exchange/O365 with data (Works up until 4.1.0.1, 4.2 will change things):

Select count(*) AS 'Number of Exchange/O365 Users with archive data'
From
(
SELECT
    COUNT(*)
  FROM t_message tm
    INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id
    INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
    INNER JOIN t_domains td ON ta.domain_id = td.domain_id
  WHERE td.f_type = 1
  GROUP BY abook_id
)
AS t
;


[edit] Get total (possible) size of message store used by a user.

Size is in Bytes (This size may not be entirely accurate, this is a number that has been passed by the Email server to Retain):

SELECT SUM(f_size) AS 'Approximate Storage Use'
FROM t_message_attachments tma
INNER JOIN t_message tm ON tma.message_id = tm.message_id
INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
WHERE f_email = 'admin@knobias.com';


  • NOTE: The previous 2 queries could possibly be inaccurate if 2 users have the same Email address. The most accurate would probably be to use abook_id

In the WHERE clause you can use these fields that should be unique:

abook_id =
f_uid =
f_mailbox =
althash =

Most Unique Example:

SELECT abook_id AS 'Retain Address ID', f_email AS 'Email Address', SUM(f_size) AS 'Approximate Storage Use'
FROM t_message_attachments tma
INNER JOIN t_message tm ON tma.message_id = tm.message_id
INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
WHERE abook_id = {abook_id number};

[edit] User ID (abook_id), mailbox name, email address and most recently stored message date:

(Thanks Daron)

SELECT abook_id AS 'User ID', ta.f_mailbox Mailbox, ta.f_email AS 'Eamil address', from_unixtime(max(tm.f_stored)) AS 'Most recently archived message'
  from t_message tm
  inner join t_abook ta on tm.uuid_mapping_id = ta.f_uid 
  GROUP BY abook_id;


[edit] Message counts and message store size:

Get A list of All users with a count of how many messages each person has along with the possible size that their archive would take on disk. This last number will not add up to the total disk use that Retain is using. Use it as a ballpark figure.

WARNING: The following query is VERY intense. On a system with 3,500 users with mail in Retain, and 41 million total messages, it took 1 hour 17 minutes to return. Use with caution.

SELECT 
f_email AS 'Email Address', 
FORMAT(COUNT(DISTINCT(tm.message_id)),0) AS 'Number of Messages', FORMAT(SUM(f_size),0) AS 'Approximate Storage Use (in Bytes)'
FROM t_message_attachments tma
INNER JOIN t_message tm ON tma.message_id = tm.message_id
INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
GROUP BY abook_id
ORDER BY tm.message_id;

By default it’ll organize the results by the email address. If you’d like to organize it by size of the mailbox, smallest first, change the ORDER BY clause to read as follows:

ORDER BY SUM(f_size);

If you’d like it in reverse order tack on a “DESC” like so:

ORDER BY SUM(f_size) DESC;

If you want to organize it by number of messages, fewest first, tack on:

COUNT(DISTINCT(tm.message_id));

If you want to run a less intensive version: find specific users that the customer is interested in. Get the address book id and nail it down to that. On the system mentioned above the user with the most messages took only 35 seconds to complete.

Example:

SELECT f_email AS 'Email Address', FORMAT(COUNT(DISTINCT(tm.message_id)),0) AS 'Number of Messages', FORMAT(SUM(f_size),0) AS 'Approximate Storage Use (in Bytes)'
FROM t_message_attachments tma
INNER JOIN t_message tm ON tma.message_id = tm.message_id
INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
WHERE abook_id = 1585;

Other Variations

SELECT 
CONCAT(ta.f_first, ' ', ta.f_last, '(',f_mailbox, ')') AS 'Mailbox Name',
f_email AS 'Email Address', 
FORMAT(COUNT(DISTINCT(tm.message_id)),0) AS 'Number of Messages',
FORMAT(SUM(f_size),0) AS 'Approximate Storage Use (in Bytes)',
FROM_UNIXTIME(MIN(tm.f_delivered)) AS 'Oldest Message',
FROM_UNIXTIME(MAX(tm.f_delivered)) AS 'Newest Message'
FROM t_message tm
INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
GROUP BY abook_id
ORDER BY tm.message_id;

Example Return


[edit] Find User name, and most recently archived message

Below is a way to see everyone that’s older than 3 months. Just switch the > sign to a < sign to do 3 months and newer.

If you don’t care about if it’s older or younger than 3 months (for licensing) then just take out the “HAVING” line

SELECT
  CONCAT(ta.f_first, ' ', ta.f_last, '(',f_mailbox, ')') AS 'Mailbox Name',
  f_email AS 'Email Address',
  FROM_UNIXTIME(MAX(tm.f_delivered)) AS 'Newest Message'
  FROM t_message_attachments tma
  INNER JOIN t_message tm ON tma.message_id = tm.message_id
  INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id
  INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
  GROUP BY abook_id
  HAVING MAX(tm.f_delivered) < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -3 MONTH))
  ORDER BY MAX(tm.f_delivered)
;

Example Return


[edit] Get email, number of messages that user has, Post office or server subsection they’re in, and the date of the most recent message that was archived

SELECT f_email AS 'E-mail', COUNT(*) AS 'Number of Messages in Retain', f_name AS 'Post Office', MAX(FROM_UNIXTIME(f_stored)) AS 'Most Recent Message Archived'
FROM t_message tm
INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
INNER JOIN t_postoffices tp ON tp.postoffice_id = ta.postoffice_id
GROUP BY abook_id;

WARNING: This query is VERY intense, do not run this without the understanding that this will take a significant amount of time and resources on SQL. Small test system with 96 users and 4,553,000 items in t_message took 1 minute 7 seconds to run. By way of comparison: A straight count of t_message took only 4 seconds.

[edit] Get all message data for one user for messages delivered to the mailbox before X date:

SELECT *
  FROM t_message tm
  INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id
  INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
  WHERE abook_id = 1
  AND tm.f_delivered < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -20 YEAR))
;

[edit] Getting Message data

SELECT f_name AS 'Attachment Name', hash AS "Hash ID" 
FROM t_document td
INNER JOIN t_message_attachments tma ON td.document_id = tma.document_id 
WHERE message_id = 614645;

The above is an expanded version of the query in this KB: http://support2.gwava.com/kb/?View=entry&EntryID=2420

Besides the HASH ID it will also give you the name of the attachment.

Going in reverse order, if you have the hash ID then you can find the message ID(s) that it connects to, I also threw in the date that it was initially stored on disc (Human readable according to the date of the server you’re running this on.):

SELECT message_id, FROM_UNIXTIME(date)
FROM t_message_attachments tma
INNER JOIN t_document td ON td.document_id = tma.document_id
WHERE hash = '{Your_hash}';

Example:

SELECT message_id AS 'Message ID', FROM_UNIXTIME(date) AS 'Date Stored'
FROM t_message_attachments tma
INNER JOIN t_document td ON td.document_id = tma.document_id
WHERE hash = '3249153B5525F19A46D28D29A4BE381CD2E8E22B8510C970328CF54464FAFACC';

[edit] To find if a particular hash was indexed and the message id:

SELECT 
  tm.message_id, f_indexed
FROM
  t_message tm
inner join 
  t_message_attachments tma
ON
  tm.message_id = tma.message_id
INNER JOIN
  t_document td
ON
  tma.document_id = td.document_id
WHERE
  hash = '{hash_number}';

[edit] Check to see if you have any accidentally deleted documents (might take a while to run)

This can be a result of a problem with the database and the count will come up how many deleted documents you have. You’ll have to go through a clean up process to fix this.:

SELECT COUNT(*)
  FROM t_message_attachments tma
  LEFT JOIN t_document td ON td.document_id = tma.document_id
  WHERE
  td.document_id IS NULL
;

[edit] XML MERGE FILE CREATION:

Possible way to get the GUID’s out of a database so that you can merge them through the UI:

SELECT
  CONCAT('<MergeTask><sourceMailbox>',
  gw.f_uid,
  '</sourceMailbox><destinationMailbox>', 
  ex.f_uid,
  '</destinationMailbox><removeOrphanedEntity>true</removeOrphanedEntity></MergeTask>'
  )
FROM
  t_abook gw,
  t_abook ex
WHERE 
  gw.domain_id = 2
  and ex.domain_id = 3
  and gw.f_email = ex.f_email
  ;

Note: you’ll probably need to figure out which domain is the old and which is the new so that you can appropriately assign the combination.

Copy and paste your results into an .xml file then tack on

<MergeTasks>
    <tasks>

To the top of the file and

    </tasks>
</MergeTasks>

To the bottom of the file

[edit] Instead of looking up the unix timestamps for colums like "f_delievered" etc you can use UNIX_TIMESTAMP('20100101') I.E. UNIX_TIMESTAMP('YYYYMMDD')

So this query works in MySQL:

SELECT COUNT(*) FROM t_message 
WHERE f_delivered < 
UNIX_TIMESTAMP('2009-10-24')
;

(You can include or exclude hyphens)

[edit] If you want a specific hour/min/sec you can use:

SELECT COUNT(*) FROM t_message 
WHERE f_delivered < 
UNIX_TIMESTAMP('2010-10-24 01:02:03')
;


[edit] As Requested in RET-7262

SELECT 
    abook_id AS 'Retain ID',
    td.f_name AS 'Mail Server or Domain',
    CONCAT(f_first, ', ', f_last) AS 'First and Last Name',
    f_mailbox AS 'Mailbox ID',
    f_email AS 'Email Address',
    COUNT(*) AS 'Number of Messages'
    FROM t_message tm
        INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id
        INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
        INNER JOIN t_domains td ON ta.domain_id = td.domain_id
    WHERE f_stored < UNIX_TIMESTAMP(DATE_ADD(CURDATE(), INTERVAL -3 MONTH))
    GROUP BY abook_id
    ORDER BY COUNT(*)
;


[edit] MSSQL specific stuff:

[edit] The DATEADD feature is similar to the MySQL DATE_ADD feature.

SELECT abook_id, MIN(f_first) AS 'First Name', MIN(f_last) AS 'Last Name', MIN(f_mailbox) AS 'Mailbox ID', MIN(f_email) AS 'Email Address', COUNT(*) AS 'Number of Messages'
FROM t_message tm
INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
GROUP BY abook_id
HAVING DATEADD(S, MAX(f_stored), '1970-01-01') > '2017-02-01'
ORDER BY COUNT(*)
GO

[edit] Newer than 3 months:

SELECT ta.abook_id, MIN(ta.f_first) AS 'First Name', MIN(ta.f_last) AS 'Last Name', MIN(ta.f_mailbox) AS 'Mailbox ID', MIN(ta.f_email) AS 'Email Address', COUNT(*) AS 'Number of Messages'
FROM t_message tm
INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id
INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
GROUP BY abook_id
HAVING DATEADD(S, MIN(ts_cache), '1970-01-01') > DATEADD(m, -3, GETDATE())
ORDER BY COUNT(*);

[edit] Number of users with data:

[edit] Total Users with data:

Select count(*) 
    AS 'Number of users with data'
From
(
SELECT
    COUNT(*) Col1
  FROM t_message tm
    INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id
    INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
    INNER JOIN t_domains td ON ta.domain_id = td.domain_id
  GROUP BY abook_id
)
AS t
GO

To limit it to users less than 3 months or older than 3 months add

HAVING DATEADD(S, MAX(f_stored), '1970-01-01') > '2017-02-01'

underneath the GROUP BY clause. Use > for newer than and < for older than.

[edit] Number of Users in Exchange/O365 with data (Works up until 4.1.0.1, 4.2 will change things):

Select count(*) AS 'Number of Exchange/O365 Users with archive data'
From
(
SELECT
    COUNT(*) Col1
  FROM t_message tm
    INNER JOIN t_uuid_mapping tum ON tm.uuid_mapping_id = tum.uid_mapping_id
    INNER JOIN t_abook ta ON ta.f_uid = tum.f_uuid
    INNER JOIN t_domains td ON ta.domain_id = td.domain_id
  WHERE td.f_type = 1
  GROUP BY abook_id
)
AS t
GO
Personal tools
Namespaces

Variants
Actions
Home
Exchange
GroupWise
JAVA
Linux
MTK
Retain
GW Monitoring and Reporting (Redline)
GW Disaster Recovery (Reload)
GW Forensics (Reveal)
GWAVA
Secure Messaging Gateway
GW Mailbox Management (Vertigo)
Windows
Other
User Experience
Toolbox
Languages
Toolbox