Home » RDBMS Server » Performance Tuning » performance problem when use parallel and hash join on large tables
performance problem when use parallel and hash join on large tables [message #255427] Tue, 31 July 2007 17:57 Go to next message
fbucco
Messages: 3
Registered: July 2007
Junior Member
I´m trying to create a table using parallel and hash join. The first one has 27Gb and the second has 300Gb. I see at v$session_longops that oracle has already read both table and started to join them using hash join, but the expected time to finished this hash is 138 hours and it has only used 1:30 hours to read both tables. Is there anything that I can do to improve this step?

create table tmp
tablespace ncnologging
parallel 10
as SELECT /*+ use_hash(b,a) */ a.*, b.data_servico, b.id_subparticao
FROM a, b
WHERE a.if_ident_ano = b.ident_ano
AND a.if_ident_sequencial_remessa = b.ident_sequencial_remessa
AND a.if_ident_gerador = b.ident_gerador
AND a.if_ident_sequencial_registro = b.ident_sequencial_registro

Regards.
Re: performance problem when use parallel and hash join on large tables [message #255428 is a reply to message #255427] Tue, 31 July 2007 18:01 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
I have a couple of questions for you.

1) Why did you NOT follow any of the posting guidelines in the STICKY posts at the top of this forum?

2) When will you follow the posting guidlelines?

3) why create a new table as opposed to just creating a view on top of the two existing tables?
Re: performance problem when use parallel and hash join on large tables [message #255442 is a reply to message #255427] Tue, 31 July 2007 23:00 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
It's worth finding out whether the bottleneck is in the SELECT or the CREATE.

Try the following:
SELECT *
FROM (
SELECT /*+ use_hash(b,a) PARALLEL(a) PARALLEL(b)*/ a.*, b.data_servico, b.id_subparticao
FROM a, b
WHERE a.if_ident_ano = b.ident_ano
AND a.if_ident_sequencial_remessa = b.ident_sequencial_remessa
AND a.if_ident_gerador = b.ident_gerador
AND a.if_ident_sequencial_registro = b.ident_sequencial_registro
)
WHERE ROWNUM > 1


It will run the SELECT and get all of the data from disk, but will return no rows. It will give you a benchmark for the SELECT. If that is quick, then writing the data must be the slow bit.

Also, if both tables are very large, the Hash Join may require multiple passes and use heaps of temp space. There's not much you can do about that other than increase the hash area, or you could hash partition the tables (discussed here). But benchmark the SQL first (above), otherwise you might be wasting your time.

Ross Leishman
Re: performance problem when use parallel and hash join on large tables [message #255675 is a reply to message #255428] Wed, 01 August 2007 08:26 Go to previous messageGo to next message
fbucco
Messages: 3
Registered: July 2007
Junior Member
1)I havent seen anyone that help me on this problem.

2) Sorry, I´ve havent seen it, from now on I´ll try do it.

3) We will partition the bigger table, so I will have to recreate it, add some columns and delete some rows.
Regards.
Re: performance problem when use parallel and hash join on large tables [message #255688 is a reply to message #255675] Wed, 01 August 2007 08:53 Go to previous messageGo to next message
Michel Cadot
Messages: 68665
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
The question is: why do you want to hash join on big tables? It is a very bad idea.
Why using hints? Do you have something wrong when not using them?

Regards
Michel
Re: performance problem when use parallel and hash join on large tables [message #255768 is a reply to message #255688] Wed, 01 August 2007 13:08 Go to previous message
fbucco
Messages: 3
Registered: July 2007
Junior Member
Let me try to explain what I have and I need.
I have two tables, A with 27Gb and B with 300Gb. The B table will be partitioned by one column and subpartitioned by another column and both columns are new. Some rows from B table will be discarded. The rows that I have to migrate come from a join between A and B. The join that I need to get these rows is:
[
select a.*, b.data_servico, b.id_subparticao
FROM a, b
WHERE a.if_ident_ano = b.ident_ano
AND a.if_ident_sequencial_remessa = b.ident_sequencial_remessa
AND a.if_ident_gerador = b.ident_gerador
AND a.if_ident_sequencial_registro = b.ident_sequencial_registro
]

There is an index on B table with these columns: (if_ident_ano, if_ident_sequencial_remessa , if_ident_gerador, if_ident_sequencial_registro).

So, I have some options to execute it:
1) create table as select without any parallel
2) create table as select using parallel and hash join
3) create table as select using parallel and nested loop
4) use copy command

I thought that the second one will be the best one, but I´m having this problem with the hash join.

Any sugestions are welcome.

Regards.
Previous Topic: How to find out the bad sql statemets ?
Next Topic: How to Generate Statistics of a table and how to see in oracle 10g
Goto Forum:
  


Current Time: Mon Jul 01 01:57:36 CDT 2024