Excel - calculating durations of time data spread across multiple rows
I have a spreadsheet with a dataset of a number of transactions, each of which is composed of substeps, each of which has the time that it occurred. There can be a variable number and order of steps.
I'd like to find the duration of each transaction. If I can do this in Excel then great, as it's already in that format. If there isn't a straight-forward way to do this in Excel, I'll load it into a database and do the analysis with SQL. If there is an Excel way round this it'll save a few hours setup though :)
A simplified example of my data is as follows:
TransID, Substep, Time 1, step A, 15:00:00 1, step B, 15:01:00 1, step C, 15:02:00 2, step B, 15:03:00 2, step C, 15:04:00 2, step E, 15:05:00 2, step F, 15:06:00 3, step C, 15:07:00 3, step D, 15:08:00 etc.
I'd like to produce a result set as follows:
TransID, Duration 1, 00:02:00 2, 00:03:00 3, 00:01:00 etc.
My initial try was with an extra column with a formula subtracting end time from start time, but without a repeating number of steps, or the same start and end steps I'm having difficulty seeing how this formula would work.
I've also tried creating a pivot table based on this data with ID as the rows and Time as the data. I can change the field settings on the time data to return grouped values such as count or max, but am struggling to see how this can be setup to show max(time) - min(time) for each ID, hence why I'm thinking about heading to SQL. If anyone can point out anything obvious I'm missing though, I'd be very grateful.
As suggested by Hobbo, I've now used a pivot table with TransID as the rows and twice added Time as the data. After setting the field settings on the Time to Max on the first and Min on the second, a formula can be added just outside the pivot table to calculate the differences. One thing I'd been overlooking here is that the same value can be added to the data section more than once!
A follow-on problem was that the formula I add is of the form =GETPIVOTDATA("Max of Time",$A$4,"ID",1)-GETPIVOTDATA("Min of Time",$A$4,"ID",1), whici doesn't then increment when copying and pasting. Solutions to this are to either use the pivot table toolbar to turn off GETPIVOTDATA formulae, or rather than clicking on the pivot table when selecting cells in the formula, type the cell references instead (e.g. =H4-G4)
You were on the right lines with pivot tables. Drag in TransID as a row field then drag in two copies of Time as data fields in the pivot table; right click on each and specify Min as the summarization function for one and Max for the other. To the right of the pivot table add a formula to calculate the difference.
"Looks good, the only problem I have is that the formula I add is of the the form =GETPIVOTDATA("Max of Time, $A$4, "ID", 1) - GETPIVOTDATA("Max of Time, $A$4, "ID", 1). When I copy that to the cells below, the 1 doesn't update to 2, 3 etc so they all show the same time. – Kris Coverdale "
Use this button on the pivot table toolbar to switch GETPIVOTDATA formulae off.
In your formula "GETPIVOTDATA("Max of Time, $A$4, "ID", 1) - GETPIVOTDATA("Max of Time, $A$4, "ID", 1)' the cell references are addressed between the symbol "$'. For example $A$4. When the cell references having $ symbol and you copy the formula to other cell then reference cells are not updated automatically. Hence you get the same type.
Perhaps you modify the formula as follows and then copy the formula to other cells. The formula should be like:
"GETPIVOTDATA("Max of Time, A4, "ID", 1) - GETPIVOTDATA("Max of Time, A4, "ID", 1)".
Maybe something as simple as a query like this.
SELECT TransID, DateDiff(mi, Min(Time),Max(Time)) AS Duration FROM MyTable GROUP BY TrandID
A B C 1 1, step A, 15:00:00 2 1, step B, 15:01:00 3 1, step C, 15:02:00 4 2, step B, 15:03:00 5 2, step C, 15:04:00 6 2, step E, 15:05:00 7 2, step F, 15:06:00 8 3, step C, 15:07:00 9 3, step D, 15:08:00 11 1, =max(if($A$1:$A$9=$A11,$C$1:$C$9,"")-min(if($A$1:$A$9=$A11,$C$1:$C$9,"") 12 2, =max(if($A$1:$A$9=$A12,$C$1:$C$9,"")-min(if($A$1:$A$9=$A12,$C$1:$C$9,"")
note: formulas are array functions so press ctrl-shift-enter after editing them.
To add to Kibbee's post, in reference to the comment, you can use ADO with Excel:
'From: http://support.microsoft.com/kb/246335 ' strFile = Workbooks(1).FullName strCon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strFile _ & ";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";" Set cn = CreateObject("ADODB.Connection") Set rs = CreateObject("ADODB.Recordset") cn.Open strCon strSQL = "SELECT TransID, DateDiff('n', Min([MyTime]),Max([MyTime])) AS Duration " _ & "FROM [Sheet1$] GROUP BY TransID" rs.Open strSQL, cn 'Write out to another sheet ' Worksheets(2).Cells(2, 1).CopyFromRecordset rs
EDIT: I have corrected some errors in the original post and changed the name of the time column to MyTime. Time is a reserved word in SQL and causes difficulties in queries. This now works on a very simple test.